It's about time, so I'm going to go ahead and get started. My name is Chris Eagle. I am on the staff at a place called the Naval Postgraduate School in Monterey. And today I'm going to talk about attacking obfuscated code using IDA Pro as the analysis tool. I'll go through the introduction, talk about the operation of this tool that I'm going to introduce, and then we'll do some demos of the tool at work. First thing, I highly recommend that you move closer to the screens. Not in, because I guess we have two screens, so you can move out. But I'm going to be doing some demos in IDA Pro, and you cannot change the font size in IDA. So we're going to get what we get, and if you're up close, you'll probably see the stuff, and if you're in the back, you're probably going to strain your eyes or miss the finer details. I'll try and talk through it all if you want to stay back there. That's fine. OK, so a little bit of background. IDA Pro, for those who don't know, and hopefully most of y'all are familiar with it, is perhaps one of the premier reverse engineering tools out there. It's a disassembler that can handle many families of assembly language, understands many executable file formats, and automates a lot of analysis tasks. It really makes life easy on the reverse engineering front. It's a Windows tool, so forgive me if you're not a Windows fan, but that's why I'm in Windows today, so I can run IDA. As I understand it, and there are probably smarter people out here, they are working on a Linux version, at least of the analysis engine. Maybe not the GUI side, but it may be running on the Linux side of the house in the near future, or the not so near future. It depends on when they find time to get to it. So what I'm going to talk about today is a plug-in that I've developed, and for lack of anything better, it's called the x86 emulator plug-in. What it does is allow you to do some emulated execution of instructions within IDA. So basically you're looking at the disassembly in IDA.
x86 is all I ever intend to work on. But one of the things I find when I'm reverse engineering is that I end up doing a lot of hand tracing through code. What is this code doing? Keeping track of register values and whatnot. And I had always wished I had something that would do it for me. Just step, step, step, boom, what's in EAX now? Well, you can do emulated execution. You can see the code right there in front of you and let somebody else execute it, and that's what the plug-in does. It's written in C++. It's a Visual C++ project at the moment. And you can get it for free out at SourceForge right now. The latest release of the tool is also on the conference CD. Again, why I did it: hand tracing through assembly language is a pain. Anti-reverse-engineering techniques, which is why I really developed this thing, work very hard to obfuscate the code. So whether they're doing really unusual things, whether they're obfuscating the code path, jumping into the middle of instructions, self-modifying code, all this kind of stuff. If it's self-modifying code, IDA is just going to show you the original unmodified code. So how are you going to get at the underlying code? It's only unrolled at runtime, so you're never going to see it in IDA unless you modify the stuff yourself. You can do it with IDA scripts. Or, the angle I've taken, you can do it with the plug-in. So there are a lot of reasons you might obfuscate code, and I'm not going to get into the merits of any of them. But a big place you see this stuff these days is that virtually any worm or virus that comes rolling around the internet is obfuscated in some way. There's a couple of examples there. UPX is pretty common, then tElock, ASPack. In fact, IDA itself is protected with ASPack. So if you want to reverse engineer IDA, you've got to get through the ASPack protection. In the Linux world, TESO released Burneye a few years ago, and there's another tool out there called Shiva.
We don't see worms protected with this stuff so much. But it does the same sort of thing. It obfuscates code and makes it difficult to see what's underneath. So whether it's hostile code or not, you still have to fight through this layer just to see what might be underneath. So the reason to do this is to get at the underlying executable. The last thing you want to do is spend your time reverse engineering the obfuscation piece. Most of these things work the same way at startup. You have this obfuscation piece. The first thing that happens at runtime is that the obfuscation piece runs, reconstitutes the original binary in some form, and then passes control off to that binary. Now, Shiva takes a little different approach. But essentially, you've got to unprotect the binary before you can ever pass execution to it. So we want to get at the stuff underneath. Again, worms, viruses. It's nice to be able to get at that stuff quickly so that you can turn around and the antivirus vendors can develop signatures or what have you. Maybe you want to know if there's hidden functionality, backdoor functionality in the worm or the virus, and you want to get at it for further analysis of the code. I mean, it's easy to see that this thing propagates. It's easy to see that a worm propagates through a particular vulnerability, perhaps. But the more detailed analysis tells you whether there's some time-delayed feature in there, like Code Red had, that was going to do the denial of service on the White House or whatever it was, or whether there are any other backdoor features that aren't apparent from the worm's propagation mechanism. And so that's the challenge. And that's why I developed the plug-in: to work through the obfuscation in IDA without running a binary, a malicious piece of code, and get to the real reverse engineering, which is what the underlying code is doing. So the way this thing works in IDA Pro, for those who've never used it, and I'll go through some demos.
You load a binary up into IDA. It does a lot of analysis and decides what type of executable it is. Is it, let's say, a Linux ELF? Is it a Windows PE executable? It parses all the headers, parses the different segments, and so on. And then it does the disassembly. It goes to the entry point, basically, starts from there and takes it apart. It disassembles, recognizes functions, recognizes stack-declared locals, recognizes function parameters, and so on. And it makes a lot of annotations. It's really, really nice. And what it's doing all along the way is taking basically every byte out of that binary image off your disk and sticking it in a database. So IDA builds a database. And then you, as a reverse engineer, are interacting with a database. You're not really interacting with a binary anymore. You're making annotations into a database. And that's all it is. Every byte is marked as executable, or it's data. You can reformat things. But IDA is essentially a pretty front end to that database, and you can make annotations and keep working from there. So it doesn't run any code, although the new version has a Windows debugger built into it, which I've never really played with. But that doesn't help you if you want to look at Linux code anyway. Obfuscated code gets tougher, because all you're going to see is the de-obfuscation piece, because that's the only executable piece that sits in the file. Everything else is data, from an IDA perspective. You have the de-obfuscator sitting there. That's what's going to run. You have all the data that's been obfuscated in some way, whether it's been just encrypted or encoded or what have you. And so IDA will show you the de-obfuscator portion, but everything else is just going to be grouped together as data. It's hard to reverse engineer that stuff. And usually you're only going to get sensible output in IDA for the entry point function, and then it's going to start falling apart.
And if they do things like jumping into the middle of instructions, IDA really can't follow that. It's got to find some instruction boundaries. It can't show you the same bytes twice, split in the middle of an instruction to give different instruction starts, and so on. So we've got to work through that. And if you're going to use IDA to reverse engineer, we've got to get through that easily. You can do it manually, and it's kind of a pain. Anybody who's done it can tell you. Now the plug-in that I developed has two pieces. There's a user interface piece, and I'm not a user interface person, so you can laugh at it all you want. That's fine. I'll generally chalk it up as a quick and dirty user interface hack, and ugly. But it got the job done for me, and I can work on that in the future. The user interface is where all the Windows GUI specific code is supposed to be. It throws up the dialog boxes that this emulator needs, and so on. And then there is an x86 emulator piece, which handles instruction fetching and decoding and executing, and so on. And it interacts with the database to pull out the different instruction bytes, and so on. The emulator piece is mostly platform independent. I'm trying to keep it platform independent so that you can pull it aside and do standalone emulation, and you can actually integrate it into your own standalone unrollers, which is what I did. There's a reference to a talk about reverse engineering Shiva, which was this Linux protector. I took the emulator out of this, just the emulator code, and used it to create a standalone unwrapper for Shiva. So that's why I'm trying to keep the emulator separate, so I could pull it aside and run it, say, on Linux if I wanted to. But what it's really designed to do is execute a single instruction at a time, and then pass control back over to the GUI side, so you can step through this code. And we can talk on the side about why I didn't go find some other emulator.
There are plenty of x86 emulators out there, and I'll talk to you offline if you want to know why I did my own and sort of reinvented the wheel there. But the integration is what's sort of new here. So the GUI gives you this console. If you're in IDA, and we'll see samples here, it throws up this console. It's sort of like a pseudo debugging console. It's gonna show you register displays, and you can drill in, and you can get at the segment registers if you really want to. The only memory display it has at the moment is the stack display, which sits down there at the bottom. Over to the right, you have some control buttons. We can step through code. We can jump over code, or jump to a specific spot in the code, which is just resetting the instruction pointer. You can skip over code, just skip a statement entirely, if you don't want to descend into a subroutine. Or, more importantly, with dynamically linked code, IDA has only loaded up the binary, and you don't have access to any of the DLLs. If you see that the code makes a call to a library function, you don't have the disassembled code for that function. We can't execute it. So you can just skip the call entirely. Of course, you may have to fudge the result in EAX in order to be able to continue. You can run forward: set the cursor somewhere in the future, run to cursor, and cross your fingers that you get there, because if you don't get there, you've basically locked up the emulator while it's fetching from who knows where. That's an improvement that could be made. And then hide just makes the console go away. For manipulating stack data, the push data button down towards the bottom will actually let you push your own data onto the stack, which is kind of nice. If you want to run just one function out of the program, and you know it needs some arguments, push the arguments on the stack and then start stepping through the program.
And the program will retrieve the arguments off of the stack. I'll explain the memory layout, the memory architecture of the plugin, in just a moment. So to use this thing, it's brought up in IDA with Alt+F8, and this hopefully will all become clear when I start working through demos. Wherever you have the cursor when you bring up the emulator, that's where EIP is gonna be set. So you'll start stepping, if you want to, from wherever the cursor is. And then you just step and go, okay? The plugin will interact with IDA. It'll fetch a byte out of the IDA database, which it's gonna assume is code, because you told it to step there. The emulator then decodes the instruction, fetches more bytes from IDA if it's a multi-byte instruction, and then modifies all the registers accordingly on that control panel that you just saw. The registers will all update as you step through the code. And every time you step, the emulator also tells IDA, okay, this is gonna be code, because EIP is pointing at this thing. So the emulator will tell IDA, now reorganize whatever is at this location to look like code. So if you're jumping into the middle of instructions, the emulator will automatically reorganize your IDA display to mark the new beginning of an instruction. It undefines the old instruction and defines a new instruction at the current EIP location. And I'll show you a lot of this sort of code reorganizing, jumping into the middle of instructions, in some of the demos that are coming up. This just describes, well, the run to cursor. I don't do breakpoints yet, okay, but in theory you could have a list of breakpoints, just go to any address in IDA, add a breakpoint into the emulator, and then run until you got there. That's really not hard to do. All you have to do is monitor EIP. The plugin supplies its own stack. Remember, what IDA has is just the binary image that sits on the disk.
So that's a lot of code and some static data, but your heap doesn't sit in that file and your stack doesn't sit in that file. So if you really wanna run through any code, you're gonna need perhaps a heap and at least a stack available. As you execute instructions, the minute you push anything, that's gotta go somewhere. You're not gonna push that data into the IDA database, because that's really not what's sitting in your file. So the emulator provides a stack, and it looks at memory references and sorts them out based on where we're reading from and where we're writing to. It makes a decision: is this code, am I gonna go fetch this out of the IDA database? Is it stack, am I gonna go throw it in the emulated stack? Is it out in the emulated heap? So the plugin supplies both of those two things. It requires some manual setup, but it works. Limitations of the plugin: it's slow. It's emulated execution through a database. A, we gotta do all this database interaction to fetch things, and B, we're emulating every instruction. So don't think that you're gonna get high performance execution of these executables, but the real goal is just to get through the obfuscation, which should be done fairly quickly, and leave you to reverse engineer the underlying binary. It cannot follow calls into dynamically linked functions, because IDA doesn't load that code up. There's nothing more I can say about that, but if you're calling, say, printf, we don't have the disassembled version of printf to descend into, so that's a limitation of this thing, and IDA won't load the libraries up at the same time. It also can't follow system calls in statically linked executables. So say you're looking at a Linux binary and it's doing some int 0x80s. When you get down to an int 0x80, it's not gonna follow that system call. So you're gonna have to fake the results, throw something into EAX, and press on if you can.
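The routing decision just described, where every memory reference is sorted into database, stack, or heap, can be sketched roughly as follows. This is only an illustration in Python, not the plugin's actual C++ code: the class names, the dictionary standing in for the IDA database, and the example addresses are all assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A contiguous emulated memory range (hypothetical helper)."""
    base: int
    size: int

    def contains(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size


class EmulatedMemory:
    """Routes each byte access to the database, the stack, or the heap."""

    def __init__(self, database, stack_top, stack_size, heap_base, heap_size):
        # 'database' is a plain dict {address: byte} standing in for IDA's
        # database; a real plugin would go through the IDA SDK instead.
        self.database = database
        self.stack = Region(stack_top - stack_size, stack_size)
        self.heap = Region(heap_base, heap_size)
        self.ram = {}  # backing store for emulated stack and heap bytes

    def classify(self, addr: int) -> str:
        if self.stack.contains(addr):
            return "stack"
        if self.heap.contains(addr):
            return "heap"
        return "database"  # everything else is treated as program image

    def _store(self, addr: int):
        return self.database if self.classify(addr) == "database" else self.ram

    def read_byte(self, addr: int) -> int:
        return self._store(addr).get(addr, 0)

    def write_byte(self, addr: int, value: int) -> None:
        # Writes into the image (self-modifying code) land in the database.
        self._store(addr)[addr] = value & 0xFF
```

Because writes to image addresses go back into the database stand-in, a decryption loop that rewrites its own payload would leave the decoded bytes where the disassembler can see them, which is the effect the talk relies on.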
So the emulator memory, I sort of went over the layout of this. Code gets fetched from the IDA database, so that's database interaction, and then the other references are resolved to either the emulated heap or the emulated stack. Every memory reference is checked, so we could actually output information about where we're reading from and writing to. In the heap, you could add sort of Valgrind-type analysis. You can actually observe that you're flowing off the end of a heap-allocated block, and you could potentially see some heap overflows that way, because we can mark every access to and from the heap, although it's just a toy implementation of a heap algorithm. It could in theory be replaced by any of the actual heap management algorithms that are in use in the world. Memory layout you control by bringing up a little dialog box. You can set your stack top, you can set your stack size, you can set your heap base address, you can set a heap size, and it'll just go grab that memory. It's not very smart about growing the stack if it needs to, or growing the heap if it needs to. But if you set reasonable limits in here, you should be able to accomplish my primary goal here, which is unrolling the obfuscated code. Emulated stack operations all go to the stack. So pushes and pops and things like that all interact with the stack. The stack contents are displayed down in that bottom scrolling window. The heap was one of the last things I added in, and I don't have a way to display heap memory at the moment. You can always display code in IDA. You can scroll through IDA and see what the code is looking like. You can scroll through the stack window and see what the stack is looking like. But I do need to come up with a more generic way of looking through memory so you can see the state of the heap at any moment. Let's see.
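To make the heap checking idea concrete, here is a minimal sketch of a toy heap that keeps its bookkeeping outside emulated memory and can flag an access that runs past an allocated block. It is a simple bump allocator written for illustration, not the plugin's allocator; the class name, method names, and the 4-byte rounding are assumptions.

```python
class EmulatedHeap:
    """Toy heap: bookkeeping lives in Python, not in emulated memory."""

    def __init__(self, base: int, size: int):
        self.limit = base + size
        self.next_free = base
        self.blocks = {}  # block start address -> block size

    def malloc(self, size: int) -> int:
        size = (size + 3) & ~3          # round up to a 4-byte multiple
        if self.next_free + size > self.limit:
            return 0                    # out of emulated heap: NULL
        addr = self.next_free
        self.blocks[addr] = size
        self.next_free += size
        return addr

    def free(self, addr: int) -> None:
        self.blocks.pop(addr, None)     # space is not reused in this sketch

    def check_access(self, addr: int, length: int = 1) -> bool:
        """True only if the whole access falls inside one live block."""
        for start, size in self.blocks.items():
            if start <= addr and addr + length <= start + size:
                return True
        return False
```

An emulator could call `check_access` on every heap read or write and report the ones that fail, which is the Valgrind-style observation mentioned above. With no inline control data, an overflow shows up as a failed check rather than as corrupted free-list pointers.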
And you do have the capability of pushing data onto the stack sort of outside of program control. You can just throw some data on there, grab some stack space if you want to, push some parameters, and then step into a function. It looks like this. If you want to push some data, it brings up a dialog box, asks you for the data that you want to push, and you put your numbers in there. It's kind of like calling a function. The stuff gets pushed in right-to-left order, so you put in the parameters just like you were going to call a function. Okay, and then the stuff gets jammed onto the stack and the stack pointer gets updated accordingly. Okay, so that's all sort of integrated. The emulated heap is just a real simple linked-list memory allocator. It doesn't do inline control information. So if you do heap overwrites, or if you were able to look at heap memory, you wouldn't see links to the next free block, next empty block, any of that stuff. It's not in memory. It's just a simple way to satisfy memory requests that are made by the binary being looked at. It can detect access outside of allocated blocks, though. So in theory you could watch and see whether a heap overflow is going to take place. Function hooking. So while it doesn't descend into dynamically linked libraries, it turned out that one of the unpackers I was looking at did want to call a malloc-type function. And so I was out of luck. I couldn't get any farther unless I could implement the malloc. So A, I had to implement a heap, and B, I had to hook the HeapAlloc call that was being made in this Windows binary. So I added sort of function hooking, and I've mimicked a few, very few, library calls that I found I needed to get through this one particular protection.
So then, if you see that at this location it's gonna call HeapAlloc, you can hook that call to HeapAlloc and divert it into our emulated heap allocator. And so anytime it says call HeapAlloc, okay, you jump out to the emulated heap and it returns you a pointer to a block of memory of the size that you requested. It looks into the stack as you've set it up for that call. And so if it occurs in a loop, or it occurs over and over again, now we're allocating memory without having to have a library function available. And you can also just run the functions if you want to. If you just wanna run malloc and you wanna have a block of, say, 1K, you can just push the number on the stack and then run malloc, and you'll get a pointer back in EAX. And then you can use that pointer manually later on if you know you need a pointer to a buffer to pass in to, say, some other function. But the automatic function hooking is a little more interesting, and we'll see that in one of the demos coming up. Manual function hooking just looks like this. It's not really function hooking, it's just sort of running a function. Choose a function that you wanna run, make sure you push the parameters on the stack, and then just select the function off the drop-down, and it'll run the function, pull the parameters, and give you a result in EAX. And so you can sort of play along. There's no change to EIP, okay? We're sort of executing that function outside of the binary for administrative purposes, like to grab a large block of memory out in the heap that we may use later on.
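As a rough illustration of diverting a call into an emulated allocator, the sketch below keeps a table of hooked addresses and, when a call targets one of them, runs a Python stand-in that reads its argument from the emulated stack and leaves the result in EAX. Everything here is hypothetical: the `Cpu` class, the `hooks` table, the 5-byte call length, the caller-cleans-the-stack convention, and the addresses are assumptions for the sketch, not the plugin's real interface.

```python
class Cpu:
    """Just enough emulated state for the sketch."""

    def __init__(self, esp: int, eip: int = 0):
        self.eax = 0
        self.eip = eip
        self.esp = esp
        self.stack = {}  # address -> 32-bit value

    def push(self, value: int) -> None:
        self.esp -= 4
        self.stack[self.esp] = value & 0xFFFFFFFF

    def arg(self, index: int) -> int:
        # Arguments as they sit at the moment of the call instruction.
        return self.stack[self.esp + 4 * index]


def make_malloc_hook(heap_base: int):
    """Return a malloc stand-in backed by a trivial bump allocator."""
    next_free = [heap_base]

    def hook(cpu: Cpu) -> None:
        size = cpu.arg(0)
        cpu.eax = next_free[0]              # result comes back in EAX
        next_free[0] += (size + 3) & ~3

    return hook


hooks = {}  # call target address -> Python stand-in


def emulate_call(cpu: Cpu, target: int, call_len: int = 5) -> None:
    if target in hooks:
        hooks[target](cpu)                  # run the stand-in instead
        cpu.eip += call_len                 # resume after the call
    else:
        cpu.push(cpu.eip + call_len)        # ordinary call: push return address
        cpu.eip = target
```

Registering `hooks[0x402000] = make_malloc_hook(0x150000)`, pushing a size, and then emulating a call to `0x402000` would leave a heap pointer in EAX without the emulator ever needing the library's code. The manual "run a function" mode described above corresponds to invoking the stand-in directly and leaving EIP alone.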
Automatic function hooking works like this. We go choose an address. Actually, I think I changed the way this is implemented, but what we would do is set the address of the function. So here you see a place where it does a call to malloc. We're gonna change that address so that not just this call, but every other call to malloc, gets diverted into whatever function we hook in its place over here, malloc in this case. There are only those five functions implemented, because that's all I needed. And again, I'm gonna demo that coming up. Another place I had to struggle was that some of these unpackers like to generate exceptions, for sort of anti-debugging purposes, so a lot of the Windows unpackers will throw exceptions left and right. Divide by zero, they'll throw an int 3, they do a lot of strange stuff. And so to get through those, the emulator has to handle these exceptions in the manner that the binary expects. Well, these are Windows binaries, and they're expecting to get the Windows structured exception handling data thrown on the stack, and then a jump to the exception handler. So, yeah, I threw that in there, and the emulator will do that. If it's a PE binary, then the emulator will try to catch exceptions. And when it sees the exception, it'll push all the stack data required for an exception handler in Windows, follow the exception handler links through FS:0, and try to transfer control to the exception handler, which then can play with all the data, all the saved registers that are thrown up on the stack, before returning control to the binary, and we'll work through that a little bit. And it only recognizes a couple of things. It does the divide by zero right now, int 3, single stepping with the trace flag being turned on, and the debug registers, which is, again, because tElock insisted on doing all that stuff.
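Following the exception handler links through FS:0 can be sketched as a walk over the handler chain. On 32-bit Windows, FS:[0] holds a pointer to the most recent registration record, each record is two dwords (a pointer to the next record, then the handler address), and the chain ends with 0xFFFFFFFF. The sketch below shows only that walk and the transfer of control; building the exception record and saved-register context that a real handler receives is left out. The dword-addressed dictionary and the function names are assumptions made for illustration.

```python
END_OF_CHAIN = 0xFFFFFFFF


def walk_seh_chain(mem, fs_base, max_records=64):
    """Return handler addresses in the order Windows would try them.

    'mem' maps an address to the 32-bit value stored there; it stands in
    for the emulator's memory. 'max_records' guards against a corrupt or
    circular chain.
    """
    handlers = []
    record = mem[fs_base]                  # FS:[0] -> first record
    while record != END_OF_CHAIN and len(handlers) < max_records:
        handlers.append(mem[record + 4])   # second dword: handler address
        record = mem[record]               # first dword: next record
    return handlers


def transfer_to_handler(cpu, mem, fs_base):
    """Point EIP at the first registered handler (data setup omitted)."""
    handlers = walk_seh_chain(mem, fs_base)
    if not handlers:
        raise RuntimeError("no exception handler registered")
    cpu["eip"] = handlers[0]
    return handlers[0]
```

A fuller emulator would push the handler's arguments and a return address before transferring control, and would copy any register changes the handler makes in the saved context back into the emulated CPU when the handler returns.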
The emulated program has to set up an exception handler, obviously, to use all of this stuff, and then the emulator creates the data structures and throws them on the stack before transferring control to the exception handler. Okay, so, demos. We'll start off with UPX, because it's the easiest, and it's the most likely demo to succeed. It's a pretty common obfuscator. It's not terribly sophisticated, because all it really is is a compressor. It just compresses the binary and throws a decompression stub on the front, so when you run it, it decompresses and transfers control back to the original binary. It's very straightforward. They don't make any attempt to obfuscate themselves or obfuscate the decompression algorithm, and, in fact, if you don't take measures to prevent it, UPX will unroll a UPX-protected binary. UPX will reverse itself at the command line. So there are other tools out there that'll tweak a UPX-packed binary so that it at least breaks the UPX part, so you can't just use UPX to unwrap a UPX-protected binary. But it's really simple. There's a simple, straightforward loop. There are really no tricks involved, and it's really no problem for the plug-in. The only place that I wish I could do better is the import table. All these things, after they get done de-obfuscating, need to go back and rebuild the import tables. They have to sort of do their own linking, because all that stuff has been hidden from the operating system linker when the program is loaded off the disk. So I don't rebuild the import table here, but it's not too tough to do, either with a script or manually. So let me see if I can get out of this and find a UPX binary on the list here. Bear with me. That's not gonna get any bigger. Come on. I'm locked up. Okay, this thing is a Slackbot IRC Trojan, and we'll just call it malware here. We load it up in IDA. It correctly identifies it as a Windows PE.
It tells us that the import segment seems to have been destroyed, which it was, because the thing was compressed, and we let it open up. The resolution on this thing is so low that I hope we don't have problems with the demo. The emulator's gonna take up the whole screen, I'm afraid. Okay, so IDA gives you a names window over here, with any names that are recognized in the program, and we see a few. There are a couple of key imports that unpackers need, the standard sort of Windows stuff, LoadLibrary and GetProcAddress. Over here you get a strings window, and mostly we see that it's full of garbage. So there are not a lot of recognized strings right now in this binary. The strings window is helpful in de-obfuscating something, because usually if you're successful in de-obfuscating something, strings will become available. So it's sort of a hint that you're maybe on the right track. So we're set up to go here at the entry point of this program. This is gonna be really tough with this resolution. I should try to remove all these windows, or these toolbars, and see if I gain some more space. We're not getting very far. Let me bring up the plugin and see what we get. It takes up the whole damn screen. Can we get any better resolution out of these monitors? Nope, okay. Okay, we'll run it way down here. We'll start stepping through code. Okay, you can see, so I hit the step button. The cursor advances over here in IDA. Okay, we're on the next instruction. We see some stack activity is taking place. It was a pusha, a push-all instruction. So the stack has grown, ESP is updated, and so on. And we can step, and you'll see that the registers change and so on. Okay, the stack is growing. We're working our way down through this thing. But it's gonna be a lot easier if I just look down through it. You can see it's very straightforward. The code paths don't appear to be obfuscated. We can see all the loops.
I don't know if you all can see these branching things. They're kind of in a bad shade of gray over there on the side. It's just the control paths that IDA displays for the branching. So we work our way down to a point where we encounter this call instruction. And why is that dangerous at this point? Because we don't know what's gonna get called. Okay, so as I walk through this stuff, this is a call to code whose address I don't know right here, because it's gonna be dependent on whatever ESI ends up holding at this point. That's sort of a danger point for me. I don't wanna run any farther than that. For all I know, it could be a library function. And in this case, it actually is. So I'm gonna back up to this point right before the loop, right here. Get back to the emulator window. And I'm just gonna run it to the cursor. And you saw a thing in there for a second. It ran all the way down to here. Okay, I know this is terribly exciting. And the question is, did we get anything done? Well, this loop, it turns out, and I won't go into it, you still have to do some reverse engineering, but this loop is what rebuilds the import table after UPX has decompressed the binary. So we're gonna skip all of that, because I don't have GetProcAddress available. So we're not gonna be able to rebuild the import table. So we'll run down here to the end. These last couple of instructions pop all the registers off the stack and then jump up to some other location. Well, at that point, the stub is done. So I'm gonna jump down there. It changes EIP to that cursor location over here. We step through a couple of locations. EIP will make a big jump. And now I'm somewhere else. I'm way up here. You can see where it's pointing. And it doesn't look like it's code, okay? But IDA hasn't been asked to reformat this yet. This is code that used to be data. The stub is transferring control to the newly unpacked code.
We've transformed the code in IDA through database interactions. And as soon as I step once, the emulator tells IDA, hey, reorganize that. That's code, okay? And many of you will recognize that's a standard function prologue right there. And working with IDA now, we can in fact turn that into a function. And things start to look a little bit more interesting. At this point, we can actually start reverse engineering the program. And ultimately, down here, we call main. This is a Windows binary. But one of the interesting things that happened when I stepped into that function, when I called it code, is that IDA does some more analysis. And you can see it's re-analyzed the whole program at this point. And it's showing a lot of names that were not there before. Okay, so those are the same old imports that we had before. Okay, but we have a lot more information to work with. And if we go back to the strings window, which does not get automatically updated, we still see a lot of garbage over there. If we ask it to re-scan for strings, now we have a whole lot of other stuff. Somewhere, let's see. And so you see some of the static strings that were previously hidden. And these are all Slackbot strings. You see IRC-type stuff and so on. So that's UPX. And as I said, I really wish I could make this thing smaller. I mean, this thing's too big and I don't think I can resize it. So we're gonna hide that, we'll get out of UPX, and open something different. And clearly you could save that and continue working. And now it becomes a reverse engineering problem, reverse engineering Slackbot. But you're beyond the obfuscation and you can get on to the meatier work. Okay, so next, we'll take a look at ASPack. See if I can do this right. Okay, so this is a Gaobot. It's another IRC-type Trojan. It comes along in a lot of emails. And same problem, broken imports. And this code is a lot more challenging.
IDA gives you about three statements. And if we go look at this, in fact, we see that the first instruction, well, after the pusha, is gonna call into the middle of another instruction. Okay, we can go up to this location, but you see how it says location plus three? Where am I? Okay, because 455007 is right here. Okay, but IDA is just parsing instructions in order. The call takes some data. The data is an address. And the next instruction boundary would be this jump near pointer 45A25, whatever. Okay, the next instruction that IDA can show us isn't even gonna be executed. Okay, because this thing says the next statement to be executed should be at 45500A, right? Okay, which is in the middle of that jump instruction, because the jump spans 007 to 00C, right? Okay, so how does the emulator handle that? What happens then? Well, what am I doing? ASPack, right? Okay, so, gotta consult my cheat sheet to make sure I don't blow the demo. Okay, come on. Okay, ASPack is one of these tools that needs a heap. So the first thing I need to do, and why I don't do it automatically, don't ask me, is set up a heap. And I'm not a Windows programmer, so I don't know where Windows heaps live. Okay, but that's good enough for me. Okay, it's outside the program's space, and it's outside the stack space, and it'll work for our purposes here. So we have a stack, and we have a heap, okay? And that's what ASPack's gonna need. So I'm gonna step through the code, okay? Let me make sure, because I'm probably not where I wanna be. Okay, I need to get up here. I need to jump to that location. So we start execution from there, 455001. Everybody sees that? Okay, hopefully. And we're just gonna start stepping. And I'll push this way down, and maybe we'll see the way that the emulator handles this sort of code. So we just did the pusha. You see the cursor jump down.
The next thing we're gonna do is the call to the middle of the instruction. So we do that, and you see it reorganize the code. The jump went away. The first three bytes of the jump are now just anonymous data right here. The call was reformatted: it's no longer 07 plus three, it is now correctly 00A. And the next few instructions, at the point that we just jumped to, are reformatted as code. So we can continue to step through these. We work our way through, and it's just gonna keep going, wherever it needs to go. Here's another call into the middle of an instruction, 13 plus one. Okay, it reformats. So now we're down to pop EBP. Continue to step, step, and I gotta be careful here. Let's see where we are. Okay, we're up here at 35. And you can see I have, again, what I consider one of these danger points coming up: a call to who knows where, right? Something that's completely reliant on whatever EBP happens to be at the moment. So I'm gonna step again. We're gonna push EAX and call this function. Whatever parameter it needs is sitting in EAX, which I can examine; I can see it's 455441, whatever that means. I could use that address in IDA, go examine it, and see what the parameter to this function is. And I can also just do a computation. Here's EBP at 455013, and I can add this F4D to that and figure out where we are going. Well, I did it in advance, fortunately. We are going to, I think, let's see, 455F60 is what the math works out to. If we take EBP and we add on F4D, that's where we're going. So in IDA, we'll jump down there and see what it is. Okay, well, that's the import for this unpacker's GetModuleHandle. So the unpacker's trying to get a module handle. Great, so we're gonna skip that function call. All right, I'm gonna see where we are.
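The EBP-relative computation from the demo can be written out as a small sketch. The register value and displacement are the ones read off the screen in the talk; the helper name and the one-entry lookup table are hypothetical, for illustration only.

```python
def ebp_relative(ebp: int, displacement: int) -> int:
    """Address referenced by an operand of the form [ebp + displacement]."""
    return (ebp + displacement) & 0xFFFFFFFF

EBP = 0x455013  # value left in EBP by the call/pop EBP sequence

# What the talk reports finding at the computed address (hypothetical table).
KNOWN_SLOTS = {0x455F60: "GetModuleHandle"}

slot = ebp_relative(EBP, 0xF4D)
print(hex(slot), KNOWN_SLOTS.get(slot, "unknown"))  # 0x455f60 GetModuleHandle
```

This is the same trick position-independent code uses to find its own data: the call pushes a known return address, pop EBP recovers it, and everything else is addressed as an offset from that value.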
Okay, I'm just gonna skip these two statements. I don't care about them. I can't call GetModuleHandle from this emulator. So I'm gonna come down here and skip two statements: I'm gonna skip the push, I'm gonna skip the call. I'm gonna skip saving EAX because it doesn't hold anything anyway. So skip the mov of EAX to EDI, and we're gonna continue. At this point, we see the same sort of situation coming up. We're right here. It's gonna do EBP plus F49. I have the same problem. So the emulator doesn't fix everything; we still have to do a little bit of investigation. But where are we going this time? Well, F49 is just four less than F4D, so we can jump down to roughly the same spot and see that four less than GetModuleHandle is GetProcAddress. So now it's gonna call GetProcAddress. Well, what proc does it want? What procedure is it trying to get the address of? So we jump back, and where am I? 4A. Okay, so we're right here. EAX and EBX will be our two parameters to GetProcAddress, but EAX is just the module handle. So the interesting thing here is EBX, which should tell us the procedure it's getting the address for. All right, so I load up EBX here by stepping. EBX is 455071. We go over here and scroll down a little bit to 455071, which happens to be right here. And if we reformat it, you see they're trying to get the address of VirtualAlloc, and very shortly they'll try to get the address of VirtualFree. So this is what motivated me to develop the heap, and we'll continue from here. But we're gonna skip that, we don't care. Well, no, we do care, because this is where function hooking gets interesting, or comes into play. So here's where we are. I don't know, we're not there yet. Okay, we're up here, getting ready to call GetProcAddress. So we're gonna skip it, don't care.
Okay, at this point, back from GetProcAddress, I should have what would be the address of VirtualAlloc. What I'm gonna do here is hook something. The address of VirtualAlloc should be in EAX, right? So I'm just gonna throw a one in there for kicks, something that's out of range, out of bounds. And then what I'm gonna do over here in the emulator is hook a function. Program location one. And the available functions, right? There we go, VirtualAlloc. So if we ever call one, we're going to jump aside and call the emulated VirtualAlloc. So I trick GetProcAddress: we're gonna shovel a one in, set a one in its sort of import slot for VirtualAlloc there. Whenever we come back now as I step through the code, we'll save the one as the result of GetProcAddress, and in the future when we see a call to one, we'll step aside and allocate some memory. Same thing down here, F49, we're gonna call GetProcAddress again. What are we asking for, though? Look at EBX. EBX is now 07E, which is VirtualFree. So same thing. I want to skip the push there, skip the call there, and here I'm just gonna give it a different address. So two. It could be anything you want. And I'm gonna hook that function: program location two is gonna get VirtualFree. All that does is set it aside, so now the emulator will step aside into the emulated heap management functions. And now we can move forward. We still haven't de-obfuscated much of anything. So hopefully we're at 66, and we start stepping through there. Okay, we're down to here. Step, step, that gets us down below. We're working our way down here. And now, let's see. Okay, here's an example right here. 54D, that's VirtualAlloc.
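As an editorial illustration of the hooking idea, here is a minimal sketch in which an impossible "address" such as 1 or 2 stands in for a library function and is routed to an emulated handler. The class, the handler names, and the placeholder return values are all hypothetical; this is not the plug-in's code, only the shape of the technique.

```python
class ToyEmulator:
    """Tiny stand-in for an emulator's call dispatch (illustrative only)."""

    def __init__(self):
        self.hooks = {}   # fake address -> (name, Python handler)
        self.eax = 0      # emulated return-value register
        self.log = []     # names of hooked functions that were called

    def hook(self, address, name, handler):
        self.hooks[address] = (name, handler)

    def call(self, address, *args):
        # A real emulator would otherwise start fetching instructions at
        # `address`; here any unhooked target is simply an error.
        if address not in self.hooks:
            raise RuntimeError(f"call to unhooked address {address:#x}")
        name, handler = self.hooks[address]
        self.log.append(name)
        self.eax = handler(*args) & 0xFFFFFFFF
        return self.eax


FAKE_BLOCK = 0xC0000000  # hypothetical address handed back by the stub

def emulated_virtual_alloc(lp_address, size, alloc_type, protect):
    return FAKE_BLOCK    # placeholder; a real handler would carve heap space

def emulated_virtual_free(lp_address, size, free_type):
    return 1             # nonzero means success in the Win32 convention

emu = ToyEmulator()
emu.hook(1, "VirtualAlloc", emulated_virtual_alloc)  # "program location one"
emu.hook(2, "VirtualFree", emulated_virtual_free)    # "program location two"

# The unpacker stored 1 where it expected VirtualAlloc's address, so a later
# indirect call through that slot lands on the hook.
emu.call(1, 0, 0x1000, 0x1000, 0x04)  # MEM_COMMIT, PAGE_READWRITE
print(emu.log, hex(emu.eax))
```

The values 1 and 2 work precisely because no real code can live there: any call to them is unambiguous, so the dispatcher never confuses a hook with a legitimate branch target.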
Okay, that's where they stored the address of VirtualAlloc, and if I step one more time, watch EAX. Okay, I grabbed some heap memory. I know, that was exciting. But remember I set the heap up at C-000, right? So I just asked for some memory off the heap, and it gave me a block right at the start of the heap. If we keep stepping, I think it's gonna come up, and you see that it asked for 1800 hex bytes, whatever that works out to, about 6K. You see down here, we're gonna call it again. So we can continue to step. We're gonna call VirtualAlloc again, and you can see EAX. Can you see EAX at the bottom there, barely? Okay, C-1804. So it's basically 1800 hex bytes into the heap, and you can sort of get the feel that indeed we're allocating memory out of this toy heap that I built. So now we're on a roll, sort of. We keep stepping through. We've got the two functions hooked that are of interest to us, VirtualAlloc and VirtualFree. And then the way this works, and I have a bad angle on this, is we're looking for loops. Ah, I shouldn't have done that. So now I'm just stepping through, and it's got all these functions, and I don't really wanna step into them. Every time you have a function that gets called, in IDA that's a sub-something, a subroutine, you sort of have to descend into it and verify that it's not gonna make any library calls. When you're doing this for the first time, you don't wanna get hung up and sent off to fetch from the middle of nowhere. So we work our way through this function, and you can see a loop that's formed here. We don't need it, I guess. Yeah, we do. Okay, so we're going around and around this loop. I don't wanna do that. How am I doing on time? Got a ways to go, right? Okay, so we'll run to cursor. So I have half an hour, plenty of time to screw up.
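A toy heap of the kind described can be as simple as a bump allocator. The sketch below is a guess at the shape, not the plug-in's implementation: the base address is hypothetical, and the four bytes of per-block overhead are assumed only so that the second block lands 0x1804 past the first, as the EAX values in the demo suggest.

```python
class ToyHeap:
    """Bump allocator: hands out consecutive blocks and never reuses them."""

    OVERHEAD = 4  # assumed bookkeeping bytes per block (a guess)

    def __init__(self, base: int, limit: int):
        self.base = base
        self.limit = limit
        self.next_free = base
        self.blocks = {}  # block address -> requested size

    def alloc(self, size: int) -> int:
        addr = self.next_free
        if addr + size + self.OVERHEAD > self.limit:
            return 0  # VirtualAlloc signals failure with NULL
        self.blocks[addr] = size
        self.next_free = addr + size + self.OVERHEAD
        return addr

    def free(self, addr: int) -> bool:
        # Forget the block; a bump allocator does not recycle the space.
        return self.blocks.pop(addr, None) is not None


HEAP_BASE = 0xC0000000  # hypothetical; the talk only shows part of the address
heap = ToyHeap(HEAP_BASE, HEAP_BASE + 0x100000)

first = heap.alloc(0x1800)   # 0x1800 bytes is 6144 decimal, i.e. 6K
second = heap.alloc(0x1800)
print(hex(first), hex(second), hex(second - first))
```

Never reusing freed space is wasteful but safe for this purpose: an unpacker runs once, and a pointer that stays valid forever cannot be invalidated by a buggy emulated free.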
Okay, so I've been through all these functions; you're gonna have to trust me. I'm gonna work my way down, and we're just gonna run out to the function return here. If we get the hourglass cursor for a while, then the demo's gone south. But we'll step out of there and continue stepping. Again, I'm not gonna step into that. I need more screen real estate. We've worked our way out of that function. I don't wanna go into that either. Any ideas at this point? Cross your fingers. Okay, good, it came back. Run to cursor's a dangerous thing, right? Because you have to hit that address. If you don't ever hit that address, you're not gonna get control back from the emulator. But fortunately it worked today. It's been known not to. And we work our way down. We get out of this function. Don't let me step into any more functions; it takes too long. Okay, now we're in a trouble spot, right? Call EDI. Why is that gonna be so bad? I don't know, let's see what happens. Now that's okay, that's a little trick they have. Step a little more. At any rate, at some point, and I've lost my addresses, this thing has been unrolled enough that the original binary will sort of start poking out. The point that I'm really making on this one is that with the heap emulation, you can get through all this stuff. It's just a matter of stepping a little farther. And at some point we're gonna come up against rebuilding our import table, if I recall correctly. No, no, no, let's see. Yeah, I think I'm in some sort of loop over here. Yes, I am. We keep jumping back to the top. So what I wanna do is work my way down and try to find where this loop rolls out, which again is a dangerous thing. If I don't get my cursor back, we'll just jump to a new demo. Reorganize some code, keep going. Not sure where we're going there. Well, these are memory allocators, if I recall.
Yeah, we wanna get out of this business. So at some point what I would like to do, and we'll sort of back up to the top of that too, is find the end of this loop, which is difficult with this screen. Ideally what you wanna do is find the end of a loop that you know is safe to run through, without any dangerous calls in the middle, and then hit run to cursor. And again, I'm gonna gamble right here. And we made it. What I'm going to do is try to cheat and see if maybe we're somewhere useful. We had no strings to speak of, so let's see what we get. Okay, and now you can see a variety of other strings have come into play, which means that we've probably got good code. You can see some IRC-type stuff in here. So the way to do this is you wanna find the end of that loop. You wanna find the control transfer point where you leave the obfuscation part and jump to the original code, and then you're able to start your reverse engineering. But it's there. Hopefully the strings convince you it's there. We shouldn't have these strings unless we actually correctly decrypted the binary. So I'll leave it at that and move on to the next one. Any questions from anybody? Let's start over here. A lot of these things will do anti-debugging-type things where, if they detect the presence of a debugger, they'll shut down. Personally, I don't like running malicious code even in a debugger in a sandbox. You can run some of these things in a sandbox environment and then use a tool like LordPE to just grab the memory image, and then you can reverse engineer the memory image, because IDA will take that LordPE dump and load it up just fine. But there are other tools. Again, what really motivated development of this thing was Shiva, which runs on Linux, and we can jump over to that, because this will also do Linux binaries.
It's x86 code in general, so we can run Linux binaries in an emulated fashion in IDA running on Windows and get at those as well. So, you know, there are people that probably would prefer to do it in a debugger. I prefer not to run hostile code, and so that's what it buys me anyway. Okay, there was another question back here. No, but that's a good thing to put on the to-do list. Yeah, it's definitely worth saving state at some point, so that you could get back into the binary and sort of resume wherever you left off. But it is not in there right now. Any other questions? Okay, tElock we'll do next. So this is a Sobig virus. Like everything else, it's got broken imports, and we got one instruction out of that, although it just jumps up here and then starts doing crazy stuff, like jumping to the middle of instructions. Pretty common stuff. So we'll go back down here to the start location and bring up the emulator. I don't need a heap in this case, but I do need a thread environment block, because this thing's gonna generate exceptions. The way I do that is I'll just steal some space down here at the end of the stack. So basically that memory is mine. Those, whatever, 16 values, those 64 bytes I just pushed on the stack. Everything gets pushed as a four-byte int, so I just pushed 64 bytes on the stack. That memory belongs to me because, as far as the program's concerned, the stack pointer's starting at FFC0 right there. So I can put the FS base up in that space that belongs to me. I need at least the first part of that, the first two words, so that I can link the exception handlers. So we'll stick the FS base up there, and I'll just stick it at D0. So that's the setup for this thing, because tElock wants to set up exception handlers. It's gonna write to FS:0, so that's gotta be writable.
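The stack-stealing setup can be sketched as follows. Only the low digits of the addresses (FFC0 for the stack pointer, D0 for the FS base) come from the talk; the stack top and the handler address are hypothetical, and the handler-install sequence shown is the conventional 32-bit Windows one (push handler, push the old FS:0 value, store ESP to FS:0), assumed here rather than taken from the sample.

```python
STACK_TOP = 0x130000                 # hypothetical top of the emulated stack
RESERVED_DWORDS = 16                 # sixteen 4-byte pushes = 64 bytes
esp = STACK_TOP - 4 * RESERVED_DWORDS   # ends in FFC0
fs_base = esp + 0x10                    # ends in FFD0, inside the reserved area

memory = {}                          # sparse map: address -> 32-bit value

def write32(addr, value):
    memory[addr] = value & 0xFFFFFFFF

def read32(addr):
    return memory.get(addr, 0)

END_OF_CHAIN = 0xFFFFFFFF
write32(fs_base, END_OF_CHAIN)       # FS:0 starts with no handlers linked

def install_handler(esp, handler):
    """Model: push handler; push dword ptr fs:[0]; mov fs:[0], esp."""
    esp -= 4
    write32(esp, handler)            # record field at +4: handler address
    esp -= 4
    write32(esp, read32(fs_base))    # record field at +0: previous record
    write32(fs_base, esp)            # FS:0 now points at the new record
    return esp

def handler_chain():
    """Walk the list from FS:0, innermost handler first."""
    handlers, record = [], read32(fs_base)
    while record != END_OF_CHAIN:
        handlers.append(read32(record + 4))
        record = read32(record)
    return handlers

esp = install_handler(esp, 0x401234)    # hypothetical handler address
print(hex(fs_base), [hex(h) for h in handler_chain()])
```

Only the first dword at the FS base matters for this purpose, which is why a few bytes of stolen stack are enough: the packer just needs somewhere writable to anchor its handler list.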
It's gotta be something that we can see, so that we can build the chain of exception handlers. Now, I don't have a default exception handler or anything like that to go by, but with tElock we're okay. So we start stepping over here. And again, it's gonna reformat everything. You'll see some strange disassembly, like this VxDCall and so on, but that'll end up going away as we reformat all the code. What I need to do is step over here. So we just keep working our way through, and tElock's pretty well behaved except for the exception handlers. Here's a loop that it's in, so we're gonna wanna jump down here to the end of the loop. You gotta be very careful how far you jump when you do this stuff. I wanna jump past the end of the loop, but it turns out that this loop is unraveling all of these bytes immediately after it. So if I chose this instruction right here, that might not be the start of an instruction in the future, and if I said run to cursor, then I'm off in never-never land, right? So you gotta be pretty careful. I'll choose the first instruction right after the loop. That's gonna be the fall-through when this JG fails, and I'll take that. Regardless of whether the JCXZ gets reformatted into another instruction, at least the boundary's gonna fall at 3C right here. So we run to cursor. And you'll see the VxDCall actually will go away as the code gets reformatted; you just gotta follow along. I apologize if I'm going too fast through the stepping and it doesn't make sense. I wanna make sure I get all the demos in. But the things I'm looking for are calls to library functions that I don't wanna follow, anything that looks like it's gonna do a fetch that isn't gonna work out. So here you go, you can see tElock starting to set up an exception handler.
Okay, it zeros out ECX and writes to FS:0 right here. After that it's done some strange stuff, but it's setting up the exception handler. It skips a breakpoint. Div ECX at this point is a divide by zero, and it causes, let's see if we can see some stack growth here, FF70. So the stack is really sitting right here. On the divide by zero, notice that the stack jumps all the way down to C44, because the emulator's jammed all that SEH stuff on top of the stack. It recognized that an exception occurred, and it's jumped us. If we step one more time, yeah, maybe not. Okay, next demo, that didn't work out. I'm not sure what I did there, but you can see EIP down here at 0004. So I'll come back to that one. Something went awry right there. It's usually this ICEBP that screws me up; I should have skipped it. So, one failed demo out of three. Moving on. Burneye is a Linux obfuscator, and it can operate in a couple different ways. In its simplest form, it's really not difficult to get through. They can add password protection on top of it, in which case we can't go anywhere unless we supply the password, because they use it to actually encrypt the binary. So you really can't get too far. There are other tools that will undo Burneye on the fly, from the command line. So this is just an example of running through Linux code in IDA. It's x86 code, so we don't really care what platform it's from. And, yeah, so we bring it up, Alt-F8. Don't really need much here. And we start running. A loop becomes evident very quickly. Again, in its simplest form, it really doesn't do much. You can see this loop that works right there, and I'm just gonna run it all the way down to here. You know, in theory, I should show you that we have no useful strings at the moment.
Okay, there's kind of a bunch of garbage in here, right? Come back over here, run to cursor. Go back to the strings window, re-scan. And now we see that indeed this was a TESO Burneye-encrypted binary, and you start to see some of these strings that have popped out. Now, this takes further reversing of Burneye itself to figure out where to go next, because Burneye does things a little bit differently. It actually embeds the entire protected binary as a piece of data in here. Then it's gonna go through some loading; it's gonna push it all over into memory. It doesn't transfer control directly to the start function. It loads it up; it acts as a loader for the embedded binary. So we would have to go find that binary in here. Again, being able to work through this doesn't necessarily relieve you of your reverse engineering responsibilities. So, the way Burneye works is you've got the embedded binary, so we can go look for the ELF headers. If we go over here in the hex view, what I wanna do is find 45 4C 46, which is ELF in hex. We can search for it, and it occurs here. Because I happen to know this binary and the way Burneye works, I know this is not the embedded binary yet. We wanna go find it again. And we find ELF again down here, and it turns out that this is the embedded binary. The disassembly view sort of follows along, and this all looks like a bunch of data, because IDA couldn't understand what it was initially. But we've unrolled it; we've actually changed all of this. We need to undefine all of this stuff. And I'm probably gonna have to search again. No, okay. So that takes me down to where I am in my hex view; back over here, scroll a little further. And right here is the start of the ELF header for the embedded, protected binary.
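The manual hex search can be sketched in a few lines. This editorial example looks for the full four-byte ELF magic (0x7F followed by the 45 4C 46 bytes searched for in the demo) and treats the second hit as the embedded binary, as the talk does for this sample. The function names and the synthetic blob are made up for illustration; real input would be bytes dumped from the unpacked image.

```python
ELF_MAGIC = b"\x7fELF"  # 7F 45 4C 46

def find_all(blob: bytes, needle: bytes) -> list:
    """Offsets of every occurrence of needle in blob, in ascending order."""
    offsets, start = [], 0
    while True:
        hit = blob.find(needle, start)
        if hit == -1:
            return offsets
        offsets.append(hit)
        start = hit + 1

def carve_embedded_elf(blob: bytes):
    """Return everything from the second ELF header to the end, or None."""
    hits = find_all(blob, ELF_MAGIC)
    if len(hits) < 2:
        return None
    return blob[hits[1]:]

# Synthetic stand-in: an outer ELF (the loader stub) wrapping an inner one.
blob = b"junk" + ELF_MAGIC + b"loader" + ELF_MAGIC + b"payload"
print(find_all(blob, ELF_MAGIC))     # [4, 14]
print(carve_embedded_elf(blob))
```

Whether the second hit is the right one depends on the packer; a header match is only a candidate until the carved file actually loads cleanly.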
And the way to sort of yank that out of IDA, or the way that I do it anyway, is, once I find where I put the emulator, I built in a dump routine so that we can just dump an arbitrary range from the database. So we dump; it automatically chooses the current cursor location through to the end of the file. I'm gonna save it as three because I've already done it a couple times. Save. So we basically just took the embedded binary and dumped it out to disk. And then, we won't save this one, if we go load that thing off the disk, IDA recognizes it as an ELF. It doesn't complain that anything's broken, and it throws us in at the start function. You can see all the names that are defined, and now we can reverse the embedded ELF that was formerly protected. And then the last one. Doing okay, 10 minutes. Okay, it's Shiva. Shiva's a whole story in and of itself. It really takes a lot of interesting steps at protecting a binary. It obfuscates, just like all the other things that we've seen. But instead of just embedding a de-obfuscation stub with the protected binary, Shiva actually comes with a pretty complex runtime environment. So it tacks on a runtime environment with the protected binary, and this runtime environment is responsible for sort of demand-decrypting blocks of the protected binary on the fly. So the protected binary is never fully decrypted in memory at any one time. And you know, that's the big thing, right? If you run these things in debuggers, or if you just let them run, your hope is, well, I'll just let it run, and then once it's in a steady state, I'll dump the process out of memory. But Shiva keeps you from doing that, because the entire thing, again, is never decrypted. In fact, no more than about a third of it is decrypted at any one time. And they do it... question? I know of no Windows encryptors that'll do that.
Yeah, but the way it works is that it fills the process space with 0xCC, which is an int 3, and it puts small blocks of the decrypted binary into that process space. So you're gonna run off the end of a block and hit an int 3, which it handles, and then it looks: okay, where am I? What's EIP right now? Because it's ptracing the binary too, right? There's a lot of overhead. And then it goes and grabs the next portion of the program. After overwriting some other portion of the program, it decrypts the next portion and then allows execution to resume. That's just one of the things it does. So you have to be able to get at all of those things. You had the initial de-obfuscation, and then you have this runtime encryption piece that you have to get past, which means that there are embedded keys in this runtime, because that stuff is actually encrypted. So you have to do some key recovery, and that's part of the reverse engineering of Shiva, which again gets pretty complicated. And Shiva is all about obfuscated code paths. It does that all over the place. So we can bring this up, and we don't need anything special. Again, we're running through a Linux binary on Windows, so that's kind of nice. It becomes a matter of just stepping on through here. The idea is to recognize some loops, and this stuff will reorganize itself. So let me get back over here. Yeah, lost the emulator. Hopefully we can see some loops forming. IDA is pretty nice about showing you branching and everything, so we can see that there is in fact a loop over here; with more screen, we would see that better. So right here we want to run to cursor. And then we want to continue stepping until we find some more loops. There are three of them that Shiva forms at some point. I think that was the bottom. Okay, losing all my addressing information. So we'll go over here. Run to cursor.
I just screwed something up. Okay, run to cursor, and we're going to continue looking for the next loop. This is getting pretty dry, I know. You get the idea, I hope. I can talk more about Shiva afterwards, because I think we're going to end up running out of time. I'd rather field any questions that are out there, and if there are none, I'll cut you loose. So, a question over here again. What would I do to build a better obfuscator? Shiva is a good step in the right direction. There are actually some tricks in Shiva that are far more advanced than just the partial decryption. They actually do instruction replacement. They'll scan the binary that they're trying to protect, and they have a couple different types of instructions that they'll emulate, because they're ptracing that binary. So they'll go through that binary and replace those instructions with int 3s. And those int 3s don't cause a new block to be loaded. They drop over and say, okay, that's an int 3 because we're going to emulate that instruction, and they go over and emulate the instruction. There are about six different instructions that they emulate, and only in a few places. But what makes it tricky is this: even if you did manage to get the entire binary decrypted in memory at one time, or you were able to watch memory and, with a third of it being loaded at a time, after enough time you had seen all of the blocks loaded, you'd say, okay, now I've got the whole thing. Well, you don't have the whole thing, because they've inserted these int 3s randomly throughout the program. What you don't get, even if you capture all of those blocks, is what is supposed to be in place of those int 3s, which they save aside as a list: the int 3 at location one, two, three, four is supposed to be a push EAX. So when they hit that int 3, they jump over, because they're ptracing; they run the push EAX, they manipulate ESP, they manipulate memory, and then they return control back to the binary.
So it's really tricky stuff. And I don't know if Neil's here or not, but probably not. Oh, there you are, okay. So here's what I would say: you emulate all these instructions, and I'd say there's like five of them. Have a broader class of instructions that you emulate. Right now you have to spend time reverse engineering Shiva to know what instructions are being emulated, but then your emulation list is fixed. So you pull out the emulation list, and then you can repair them. After you've done enough reverse engineering, you know what all the replacements are supposed to do. You can walk the emulation list and replace the CCs with what is supposed to be there. So if you have a broader class of emulations that you do, then you choose a subset for each new binary. Maybe you could do 20 emulations, and for each new binary you only pick a random five, so that there's no way to know ahead of time what emulations are gonna be used. I always know ahead of time what emulations you're gonna use right now. But if I don't know ahead of time what emulations you're gonna use, then I have to spend a significant amount of time just to recover your emulation list and then figure out, for each new binary, what each emulation is doing. So I think that the techniques that Shiva uses would make things very difficult. If they were incorporated on the Windows side, it would be among the best obfuscators that are out there. The SEH stuff is kind of a pain to work through, but it's doable. But with the Shiva stuff, not only do you have to get through the obfuscation, but then you have to reverse the way Shiva works, because it's the Shiva runtime that's controlling everything. Right, and so you have to deal with the Shiva runtime just to be able to see the other obfuscated code. Okay? I can barely hear you. There are a lot of commercial encryptors that do that.
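The repair step described here, walking a recovered emulation list and putting the original bytes back over each 0xCC, might look like the sketch below. The list format (address mapped to original instruction bytes) is an assumption for illustration; Shiva's real on-disk structures have to be recovered by reverse engineering its runtime and are not modeled here.

```python
INT3 = 0xCC

def repair_image(image: bytearray, base: int, emulation_list: dict) -> int:
    """Overwrite each placeholder int 3 with its original instruction bytes.

    emulation_list maps a virtual address to the bytes that belong there
    (hypothetical format). Returns the number of locations patched.
    """
    patched = 0
    for address, original in sorted(emulation_list.items()):
        offset = address - base
        if not 0 <= offset < len(image):
            raise ValueError(f"{address:#x} is outside the image")
        if image[offset] != INT3:
            raise ValueError(f"no int 3 at {address:#x}; the list looks wrong")
        image[offset:offset + len(original)] = original
        patched += 1
    return patched

# Synthetic example: two single-byte instructions were swapped for int 3.
image = bytearray([0x90, 0xCC, 0x90, 0xCC])   # nop, int 3, nop, int 3
recovered = {
    0x1001: b"\x50",   # push eax
    0x1003: b"\x53",   # push ebx
}
count = repair_image(image, base=0x1000, emulation_list=recovered)
print(count, image.hex())
```

Checking for 0xCC before patching is a cheap sanity test: if the byte is anything else, either the list was recovered incorrectly or that block was captured while still encrypted.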
Yeah, so I mean, that's the way to defeat this kind of thing: you make it so that even after the obfuscation is undone, you've got to do so much reverse engineering that this still doesn't help you much. Okay? Any other questions? Okay, I guess, oh wait, one over here. How do I know? I went to Neil's talk here last year where he talked about Shiva, and they presented it a couple of times. So there is a presentation on Shiva, and then I gave a presentation on reversing Shiva, so you can go get that. Developing that one is what motivated the development of this, and the development that's gone on in this project has been to make it more generic, to handle more classes of obfuscators. Yes. The plug-in, I've been using it with IDA 4.5 and forward. You just have to compile it with the matching SDK. Yeah, the plug-in is free and open source. The URL is in the slides; it's available at SourceForge. Any other questions? Okay, thank you very much.