 So what I'm going to be talking about here is basically some ‑‑ not even really fancy pants thing. It's more of an artistic expression if anything else, if you really want to call it that. It's really just a fun, clever technique to be an asshole to people. So now I'll introduce myself if this shit is going on the screen. Is it there? I'm totally sending a signal. Maybe I'm not. Hang on. Let's try this. Yeah. Yes, no. Nothing. There we go. All right. Okay. So my name is Frank too. It's not actually Frank squared. Some asshole spammer on Twitter has Frank too. And I've been trying to ‑‑ I tried to socially engineer them to get rid of the account because hey, it's technically a spam account and I can get rid of it. But apparently if you're honest with them, they won't do it because I was requesting to get a name taken away or something like that even though it was a spam account but fuck them, fuck Twitter. A lot of people recognize me by my hat. I'm with DC949 and DC310 and I also work for Rapid 7. I can't put that there because there's a lot of F bombs here. My manager did not want me to do that. So I gave this talk at ThoughtCon but I was given only about, I want to say, 15 minutes or so to speak. And this is kind of a complex topic. So it was basically a standard performance that just so happened to feature reverse engineering and all sorts of funny stuff like that. And there were basically two camps. There was this one guy on Twitter who said I have no idea what the fuck Frank too is talking about but it's awesome. And then there was this dude on Reddit who saw my slides and he was upset with all the weed references and he was, you know, angered that such a serious topic wasn't really covered so seriously. And the bottom line was, you know, he wanted more content and less bullshit. So I kind of want to focus on this quote, nothing personal to the dude from Reddit if he just so happens to be in this audience. 
But because he had a fair criticism because the talk didn't have a lot of content, it was bullshit because it was geared to be a stand-up routine for hackers basically. But at the same time I kind of feel like people don't really present their information in a very entertaining way even though it is. It's always very cut and dry. Here is EIP, here is ESP, here is how you perform exploit, here is my ode, now clap. And a lot of people, it just seems like, thank you, I appreciate that. And it just seems like to me that this stuff is just, there's so much entertainment at this convention alone. So it just feels like a lot of people should be presenting it in an entertaining fashion and that's what I'm trying to do. So this is basically what you're going to see. Extremely referential shit to computer science and absolutely juvenile humor. So I hope you'll appreciate it, that's exactly what I'm going to try for here. If you came for a serious talk, I'm very sorry. So if you look at the, this is basically a very scientific analysis of the previous talk that I gave. If you compare the science to the medication quote unquote that I took while performing, not performing the talk but writing it, you'll see that there was a lot more medication involved. But don't worry because there's a little bit more science now. So if you were, well, technically that counted as medication, so I'm not sure. I would probably have to increase the bar a little bit. Anyway, so here's the content. A while back, my friend Merlin right here who is sitting in the front stage wrote this really awesome IRC Python repo bot that basically takes one lines. It was a really awesome exercise in teaching functional programming. It's a whole lot of fun to mess with. You don't even really have to, or you can just add function after function after function. This is actually a combination of all sorts of different functions here. 
And all it really is is just me in IRC doing a rainbow sine wave that's just waggin' my dick. And it's one of the silliest things you can do. But I mean, look at that. It's a really awesome rainbow sine wave of just waggin' my dick. And it's just beautiful. And for some reason I thought, hey, what if I applied this to binary obfuscation? I have no idea why I came to that conclusion. But it turned out awesome nonetheless. So I'm going to cover a few basics here before continuing. Just a second, I need some water. So f(x) formulas are actually very simple in concept. They work exactly like regular functions. You've got your f, you've got your x as the input, and you get x times seven. So whatever x was, times seven, equals your value. In Python, you can make a lambda function, lambda x: x * 7. If you want to do Java, I'm sorry. I really don't think you should do that. And mathematical functions can get a whole lot more complicated than just this. I mean, there's like integrals and shit. How do they work? So there's all sorts of shit you can do there. But it's a very basic concept. As far as assembly is concerned, you have your jump and call instructions. When you see them in a debugger, specifically, they look like jmp 00401000, at least that's what it looks like in the debugger. But if you look at the actual instruction, it's an offset relative to where the jump sits, you know, five bytes, ten bytes, or whatever the hell you want it to be. Call is the exact same way, except it also pushes the return address onto your stack. The exception is when you have something like jmp eax or jmp [eax], things like that; that is actually a jump to a specific address.
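To make the offset point concrete, here is a minimal Python sketch. It assumes the common 32-bit x86 near-jump encoding (`E9` followed by a little-endian signed 32-bit offset, measured from the end of the 5-byte instruction). The helper name `jmp_rel32_target` and the addresses are made up for illustration and are not from the talk's source code.

```python
import struct

# The f(x) example from the talk, as a Python lambda.
times_seven = lambda x: x * 7

def jmp_rel32_target(address: int, encoded: bytes) -> int:
    """Return the absolute target of an `E9 rel32` jump located at `address`.

    The debugger shows this absolute target, but the bytes only store the
    signed distance from the end of the instruction.
    """
    if len(encoded) != 5 or encoded[0] != 0xE9:
        raise ValueError("expected a 5-byte E9 rel32 jump")
    (rel,) = struct.unpack("<i", encoded[1:])
    return (address + len(encoded) + rel) & 0xFFFFFFFF

# Hypothetical example: a jump at 0x00400FF6 whose stored offset is 5.
jump_bytes = bytes.fromhex("e905000000")
target = jmp_rel32_target(0x00400FF6, jump_bytes)
```

Moving the same five bytes to a different address changes the target, which is why relocated jumps need their offsets rewritten.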
And usually you don't have the ability to determine what's in EAX just from doing disassembly or anything like that. This will come up later. It can be an issue, so you have to be aware of it. So let's talk about jump short for a bit. So jump short. See, now you're starting to figure it out. Thank you. So jump short is a special jump instruction in the x86 architecture that lets you have a single-byte offset instead of a four-byte offset, which saves a lot of space. This is going to be significant again later on because of all the manipulation that's going to be happening with the individual instructions. So it's important to keep in mind that a jump short only has a signed one-byte range, so negative 128 to positive 127. I can't even remember how big the number is for a four-byte offset, and don't even get me started on 64-bit. There's no such thing as call short, though, just jump short. Call short, who knows? So here's some computer science witchcraft. In the middle of making these slides, I realized that you can actually define a kind of null space in the assembly. If you look in between the individual instructions, every instruction is executed one after another. And you can technically interpret that as an unconditional jump to the next instruction. So as long as an instruction doesn't change the control flow itself, we can assume that there is an unconditional jump between it and the next instruction. So if you look at this assembly, for example, you'll see that we just have a very basic thing. I encourage you to decode the ASCII, by the way. So basically this is just a regular set of instructions. The jmp 0s you see here in the middle are essentially the unconditional jumps you don't normally see. They go from one instruction to the next. So that's essentially what's going on there.
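As a rough illustration of that "null space" idea, the sketch below takes a list of already-disassembled instructions (as raw bytes) and writes an explicit `jmp short +0` (`EB 00`) after each one. The function names and the sample instructions are assumptions chosen for the example; the input is assumed to be straight-line code that does not change control flow itself.

```python
JMP_SHORT_NEXT = bytes([0xEB, 0x00])  # jmp short with offset 0: lands on the next instruction

def fits_short_jump(offset: int) -> bool:
    """A jump short stores a signed 8-bit offset, so -128..127."""
    return -128 <= offset <= 127

def expand_null_space(instructions):
    """Make the implicit 'go to the next instruction' explicit.

    `instructions` is a list of bytes objects, one per instruction.
    The result behaves the same as the original (flags and registers
    are untouched by jmp), just with a visible jump after every instruction.
    """
    out = bytearray()
    for insn in instructions:
        out += insn
        out += JMP_SHORT_NEXT
    return bytes(out)

# Hypothetical sample: push eax / xor eax, eax / pop eax
sample = [bytes.fromhex("50"), bytes.fromhex("31c0"), bytes.fromhex("58")]
expanded = expand_null_space(sample)
```

Once every instruction carries its own jump, the offset in that jump no longer has to be zero, which is what lets the instructions move.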
So it's therefore possible to place every single assembly instruction anywhere you want, but only if every instruction is followed by an unconditional jump. Because if you're transposing the assembly and you need to maintain the same code as before, then you're going to have to attach an unconditional jump to every single instruction. So let's see here. As far as one-dimensional arrays are concerned, because we're going to be drawing on a grid, you can technically interpret a one-dimensional array as a two-dimensional array. It just requires a little math. You have to multiply the row by the width and add the column, something like that. It's not that hard to figure out. So this gives us the ability to interpret an array as an XY grid. So if we have assembly instructions with unconditional jumps that link one to another, we can technically draw the instructions on an XY grid and then apply an unconditional jump after every single instruction in order to maintain the same code flow. And this is pretty fucking cool. So in order to actually accomplish this, you have to follow a few steps. First, you need to disassemble every instruction, because obviously you're trying to figure out what your code is. Then you allocate space in memory a whole lot larger than the original code size; I prefer like 10 times or something like that. This way you have a whole lot of room to mess with, because this gives you the ability to do all sorts of different shapes and sizes and whatever the hell you want. Then for every instruction, you want to determine the f(x) value. Once you do that, you place the instruction at the corresponding XY location. Then you join the instructions with unconditional jumps, you mark the memory as executable, and you run that.
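The steps above can be sketched in Python under some simplifying assumptions: instructions arrive already disassembled as byte strings, each grid cell is a fixed 16-byte slot (a choice made for this sketch, not something from the talk's source), and the buffer is only built and checked, never marked executable or run. The names `cell_for`, `place`, and `walk` are hypothetical.

```python
import math
import struct

SLOT = 16      # bytes reserved per grid cell (assumption for this sketch)
JMP_LEN = 5    # E9 opcode + 4-byte relative offset
FILLER = 0xCC  # int3, used to pad the unused space

def cell_for(x, width, height):
    """f(x): column x, on a row that follows one period of a sine wave."""
    row = int((math.sin(2 * math.pi * x / width) + 1) / 2 * (height - 1))
    return row * width + x  # flatten (row, column) into a 1-D index

def place(instructions, width, height):
    """Draw instructions onto the grid and link them with E9 rel32 jumps."""
    if len(instructions) > width:
        raise ValueError("this sketch uses one column per instruction")
    if any(len(i) + JMP_LEN > SLOT for i in instructions):
        raise ValueError("instruction plus jump does not fit in a slot")
    buf = bytearray([FILLER]) * (width * height * SLOT)
    offsets = [cell_for(x, width, height) * SLOT for x in range(len(instructions))]
    for i, insn in enumerate(instructions):
        start = offsets[i]
        buf[start:start + len(insn)] = insn
        if i + 1 < len(instructions):  # the last instruction gets no link
            jmp_at = start + len(insn)
            rel = offsets[i + 1] - (jmp_at + JMP_LEN)
            buf[jmp_at:jmp_at + JMP_LEN] = b"\xE9" + struct.pack("<i", rel)
    return bytes(buf), offsets

def walk(buf, offsets, lengths):
    """Follow the jumps from the first slot; return instructions in run order."""
    out, pos = [], offsets[0]
    for n in lengths:
        out.append(buf[pos:pos + n])
        jmp_at = pos + n
        if buf[jmp_at] != 0xE9:
            break
        (rel,) = struct.unpack("<i", buf[jmp_at + 1:jmp_at + JMP_LEN])
        pos = jmp_at + JMP_LEN + rel
    return out

# Hypothetical sample: push eax / xor eax, eax / pop eax / ret
code = [bytes.fromhex(h) for h in ("50", "31c0", "58", "c3")]
grid, where = place(code, width=8, height=4)
```

Giving each instruction its own column keeps slots from colliding; a real implementation would need its own collision handling for formulas that revisit the same cell, plus the jump and call repairs described later in the talk.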
But unfortunately there are a lot of issues that you're going to contend with, like the 6,000-year-old Earth. So like gravity, it only works in theory. In practice, it's not exactly going to work, because there's all sorts of different shit to deal with, because x86 is just a clusterfuck. All your jump instructions are fucked, your call instructions are fucked, self-referential code is fucked, and any code that iterates over or individually manipulates its own code, don't even think about it. So let's start with jump instructions. Since jump instructions are offsets, when they wind up placed in a different location than they were originally, they're not going to point to wherever you thought they were. Because again, like I said earlier, they're offsets. So I mean, if you have a jump instruction right here, and it says, hey, I'm pointing five bytes over here. That's a jump short. And then you move it over here. It says, hey, I'm pointing five, but hey, wait, where'd it go? So you're going to have an issue there. Short jumps are in a similar situation. You may as well just completely get rid of short jumps when you're dealing with this, because otherwise it's just not even worth it. There's going to be a long distance between instructions, especially if you're dealing with a single-dimensional array, so there's going to be a lot of issues with the jump shorts because the distance is going to be a lot longer than you think it is. Short jumps are easily fixed because all you really have to do is convert them to long jumps. But then after you convert them to long jumps, you have to figure out where the new offsets are. Dealing with register-based jumps, like I said earlier, is just going to be a whole pain in the ass. Because automatically determining what every register holds is going to rely on a lot of knowledge of compiler theory to determine what the register is.
Just reading the code alone can be an issue for all sorts of reasons. It could be changed at run time. It could be a function pointer or it could be a class function pointer or something like that. So unless you want to do a whole bunch of extra work, it's really not worth it. You may as well just ignore it. The f(x) formulas, when you actually write them, are not really as elegant as they are on paper. So if you want to do it right, or at least do it well, it's going to require a lot of work because you'll have to do function pointers for your constants. No, not function pointers for your constants, but you'll have to deal with the voodoo of C and C++ in order to figure out your class pointers and things like that. Why do I keep saying class pointers? So here's how you deal with it. At disassembly, you want to convert all your jump shorts to long jumps before storing them away. Like I said earlier, it'll make things a whole lot easier for you because then you're just dealing with the offsets. Very simple. The actual offsets are going to be a huge pain in the ass, though. Every instruction you detected that has an offset needs to be recalculated. This means that you need to keep track of both the instructions and where they're going to be placed, as well as the targets they point at and where those are going to be placed. So you have to track the instruction that they're pointing at, as well as the individual instruction. And it would probably take me way too long to actually explain how this is done, and I don't know how to do it verbally. So the source code that was supplied on the DEF CON CD is basically the implementation that I'm going to demonstrate on stage. So I would suggest that you look at that to figure out how I did it there. After all the instructions are placed, you're going to want to replace the old offsets with the new offsets. Duh. Assuming you didn't fuck up the offsets, you know, it's that simple.
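The talk points to the DEF CON CD source for the real bookkeeping, so what follows is only a hedged sketch of the two repairs just described, not that implementation. It assumes the standard 32-bit x86 encodings: `EB rel8` widens to `E9 rel32`, and `7x rel8` conditional jumps widen to `0F 8x rel32`. The helper names and addresses are hypothetical.

```python
import struct

def short_jump_target(address: int, insn: bytes) -> int:
    """Absolute target of a 2-byte short jump located at `address`."""
    (rel8,) = struct.unpack("<b", insn[1:2])
    return address + 2 + rel8

def widen_short_jump(insn: bytes) -> bytes:
    """Rewrite a short jump as its near form with a placeholder offset.

    Anything else is returned unchanged. LOOP/JECXZ (E0..E3) have no
    near form and are not handled in this sketch.
    """
    if len(insn) == 2 and insn[0] == 0xEB:             # jmp short
        return b"\xE9" + b"\x00" * 4
    if len(insn) == 2 and 0x70 <= insn[0] <= 0x7F:     # jcc short
        return bytes([0x0F, insn[0] + 0x10]) + b"\x00" * 4
    return insn

def retarget(insn: bytes, new_address: int, new_target: int) -> bytes:
    """Fill in the rel32 of a near jump placed at `new_address`."""
    rel = new_target - (new_address + len(insn))
    return insn[:-4] + struct.pack("<i", rel)

# Hypothetical example: `jz +3` originally at 0x1000 pointed at 0x1005.
old = bytes.fromhex("7403")
old_target = short_jump_target(0x1000, old)
# Suppose the jump is placed at 0x2000 and the instruction that used to
# live at 0x1005 is placed at 0x5000.
fixed = retarget(widen_short_jump(old), new_address=0x2000, new_target=0x5000)
```

The important part is the mapping from old target address to new target address, which has to be recorded while the instructions are being placed.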
So now that we have all the caveats out of the way, let's go with the actual, a more realistic implementation. It's something like this. So first you disassemble the instructions, then you prepare your buffer, then you initialize the formula constants that you have, and then after that you're going to want to iterate over the f of x values, determine the data pointers, and you're going to be wanting to track all the potentially messed up instructions that you're going to be disassembling. Then after that, once you've dealt with all that, write the instructions to the corresponding pointers that you've created, then you're going to want to repair all your conditional jumps, mark the new section of memory as executable, then you run it. And assuming that everything worked out fine and everything is pointing at the right place, weird things happen when this gets all messed up. So yeah, it's just a lot of messed up instructions and bad pointers and you're jumping into really weird locations and memory. So once it all works, you run it. It's pretty awesome, isn't it? So why does this really matter? Is this really just a trick and pony show or is it, you know, doing anything? Like, does this actually do anything useful or does it just look pretty? So the actual utility of this is that once you isolate the assembly instructions and once you actually isolate the assembly instructions, we can put the individual assembly instructions anywhere we want, simply based on a formula rather than having to write an entire function that will likely be fingerprinted by AV or something like that and just be able to write a whole series of mathematical functions that will just draw a whole bunch of assembly onto the code, into memory. 
So in order to obfuscate the clarity of the code path, all you really need to do is make a bunch of functions, maybe select them at random, maybe select them, you know, iteratively or something like that or just, you know, determine based on, you know, who you're attacking or something like that, generate a random mathematical thing and you can do all sorts of stuff. So if you want to perform various polymorphic techniques, you can also use this mathematical formula to do that as well. So instead of having to write your code that manipulates, like I was saying, manipulates your code in a specific way every time, you can write just completely random functions that decide to, yeah, you can write a series of functions and then just have those functions specifically. Yeah, the alcohol is really hitting me now. Wow. It commands it. So instead of writing code, yeah, I already said all that. And remember that anti-reversing isn't specifically about just, you know, finding the really cool, hip, ode, anti-debug techniques. It's really not about doing all of that. I mean, sure, you could do a really awesome thing, break out of IDA, spawn last measure everywhere and completely fuck with the reverse engineer. But at the end of the day, you know, it's really about just being a complete dick to the reverser. Because if you're a complete dick to the reverser, he's just gonna get pissed off and say, you know, fuck this malware. And just walk away. And meanwhile, you'll be able to sell all your bots to the Russian business network, no problem. Because you're just pissing off every reverse engineer that's out there. They don't know how to Google for shit. They don't know what the hell you're doing because your code is all fucking messed up. And it's just, you know, they're just getting pissed off. It's a game of psychology really. So if you're extremely creative and configure all sorts of just different things to do out, then you're going to be a really awesome anti-reverser. 
Because it's really about just completely fucking with people. That's all anti-reversing really is. Sure, it's, you know, it's, there's a technical aspect to it as well. But, you know, you're really just trying to get them away from your code. That's all it really is. So here's what I'm going to do. I'm going to take the obfuscation function and I'm going to obfuscate it. Then I'm going to take the obfuscated version of the obfuscated function and I'm going to obfuscate the obfuscator again. So let's pull up the code. All right, so here is math troll.exe, which essentially contains my sine wave example. You can see, let's see, obfuscate by formula. So here is obfuscate by formula. And you can see the assembly instructions here. You know, they're all in working order and they're going to, they're doing their thing. If you look at the code, it won't help you. This is just the assembly instructions of everything that's been compiled. And you'll notice that I use C++. I'm very sorry. I try to avoid C++ as much as possible. So let's see here. Where did I put that break point? Ah. Right here is where the actual, this is where the obfuscated function is. So once it gets here, it actually, this is going to be the obfuscated function now. So as you can see, the jump instructions have been applied. There's a whole bunch of just different stuff in this buffer that makes it look a whole lot more obfuscated than it really is. And it goes to each and every individual instruction just perfectly fine. And it's, it, I'll show you the shape in just a moment. This is just arbitrarily, I can just hold F8 the whole time or actually I can just run it and it'll still be fine. Yep, there it goes all the way to the end. So the code is still fine and it's, it's now in, there's a bunch of different jump instructions there. So it, it, it looks messed up, but I'll explain why it's not that messed up in a moment. Here's a, oh, oops, wrong direction. 
Here's a visual representation of what the stack looks like. Every time this happens, I generate a random sine wave formula that will arbitrarily, you know, make a bunch of different shapes, but this is the coolest one of, of the batch. So I decided to put this up. The, I believe the code starts about, can you see my mouse? Yeah. I think it starts about here. I can't remember exactly, but it wraps around and keeps going. And this is basically what you're seeing is the actual code flow. So it goes all the way, you know, it just, it just does a cool sine wave. But I mean, it's not just sine waves. You can also do spirals too. So you can, it, I, and these are really the only two formulas that I included on the source. Like I said, you can do a whole lot of other creative stuff that you want to. And this is, what, what this is essentially is just a diff from the initial buffer and the finish buffer. And it's, it actually just looks exactly like that. And it's surprised, it looks pretty cool to me. But, you know, the issue is that you're using unconditional jumps. So, and that's really bad. Because if you use unconditional jumps, like I said, you're just, the code flow is still exactly the same. Because the unconditional jumps were there to begin with. So if you have the unconditional jumps there, all you really need to do is go from the entry point to the end and get rid of the jump instructions. All you have to do is read those instructions and boom, there you go, you got your code. And it's completely unobfuscated. So, how do you deal with this? Well, the inverse of an unconditional jump is a conditional jump, which goes in two directions, which makes it more awesome. In fact, you could say it's 50% more awesome. So if we, but that provides a very interesting dichotomy because, well, if we need conditional jumps, but we also need unconditional jumps, what the fuck do we do? So that's what opaque predicates are for. 
So for those who don't know, an opaque predicate is essentially a Boolean statement that always evaluates to a specific value no matter what. So let's consider the null space expansion I talked about earlier. If you have a set of instructions and they have unconditional jumps between every instruction, it also follows that you can insert a series of assembly instructions there which don't have a direct effect on the code. As long as you write very specific instructions that don't modify the underlying state of what you're trying to obfuscate, like if you're careful not to mess with registers, and as long as you maintain the state around every assembly instruction, then you're good to go. And this is pretty awesome, too. So you can consider every assembly instruction to be able to be wrapped like this. You have your preamble, your assembly instruction, and then your postscript. The preamble is essentially what comes before the assembly instruction and the postscript is obviously what comes after it. So the preamble section can be used for two things. You can repair the aftereffects of the previous postscript's opaque predicate. The anti-debug code chunks can go in there, too. But the preamble is very limited because you can't really do that much. The postscript you can do a whole lot more with because it's actually going into the next instruction, so you don't have to really worry about it that much. So what can this section be used for? You can put opaque predicates and obfuscated jumps to link to the very next section. Anti-debug you can also stick in here, too. General code flow obfuscation, that sort of stuff. Encryption.
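Here is one way the preamble / instruction / postscript wrapping could look, as a sketch only. It assumes 32-bit x86 encodings and one particular opaque predicate: save EAX, zero it with XOR so the zero flag is always set, then use a JZ that is always taken. The next chunk's preamble restores EAX. The function `build_chunk` and the addresses are invented for this example.

```python
import struct

PUSH_EAX = b"\x50"
XOR_EAX_EAX = b"\x31\xC0"   # always leaves ZF set
JZ_NEAR = b"\x0F\x84"       # jz rel32: conditional in form, always taken here
POP_EAX = b"\x58"

def build_chunk(insn: bytes, here: int, next_addr: int, first: bool = False) -> bytes:
    """Wrap one instruction for placement at address `here`.

    preamble:   pop eax (undo the previous chunk's push), skipped for the
                very first chunk because nothing was pushed yet
    body:       the original instruction, unchanged
    postscript: push eax / xor eax, eax / jz <next chunk>
    """
    preamble = b"" if first else POP_EAX
    head = preamble + insn + PUSH_EAX + XOR_EAX_EAX + JZ_NEAR
    rel = next_addr - (here + len(head) + 4)  # offset counts from the end of jz
    return head + struct.pack("<i", rel)

# Hypothetical example: a nop placed at 0x1000, next chunk at 0x2000.
first_chunk = build_chunk(b"\x90", here=0x1000, next_addr=0x2000, first=True)
later_chunk = build_chunk(b"\x90", here=0x1000, next_addr=0x2000)
```

One caveat with this particular predicate: XOR overwrites the CPU flags, so it is only safe between instructions where the next one does not depend on flags set by the previous one, unless the flags are saved and restored as well. The bytes after the JZ are never executed, so they can hold junk to mislead a disassembler.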
One of the things that I think would be really cool, and actually I'm working on it right now, is being able to encrypt and decrypt every single instruction, so that as every instruction is executed, it decrypts the next section, and that decrypts the next section, and so on. Oh, by the way, here's a bunch of anti-debug in the preamble. Oh, I guess you don't get the rest of the code. And there's all sorts of other stuff you can do with that, too. So here's a great example of this. In green we have our preamble data and a very generic call to IsDebuggerPresent, which, you know, all it does is once it figures out that there's a debugger, it gives you the finger and jumps to some random code section that will probably spawn Last Measure or something awesome like that, who knows. And we have a very simple opaque predicate here at the bottom. You save the value of EAX in the postscript of the top instruction, XOR it so that JZ thinks, oh, well, I obviously can go either left or right; I think I'll go right because it's zero. And then you pop EAX and you get your EAX back and the next instruction isn't modified at all. Then you've got the next instruction, et cetera, et cetera, et cetera. So this introduces a whole lot more issues because what you're going to wind up running into is it's going to be really hard to determine which instructions affect which. And there's all sorts of other nonsense. If you have Shmoo balls, I highly suggest you throw them at me because I'm going to be that guy because I didn't finish all of that. So, okay, good, no one has Shmoo balls, I'm safe. Our f(x) formulas also don't necessarily need to be run iteratively. You don't have to do just f(1), f(2). If you're clever, you can figure out specifically how many instructions you're going to have. And then for every instruction, just f(27), f(54), f(9), and that will essentially place your instructions in random places.
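A small sketch of that non-sequential idea follows. It assumes the buffer is divided into numbered slots and picks a formula purely for illustration: f(x) = 7x mod (number of slots), which never sends two different inputs to the same slot as long as 7 and the slot count share no common factor. The names `f` and `scattered_slots`, the seed, and the slot count are all hypothetical.

```python
import math
import random

def f(x: int, slots: int) -> int:
    """Illustrative placement formula: x times seven, wrapped to the buffer."""
    return (x * 7) % slots

def scattered_slots(count: int, slots: int, seed: int):
    """Pick `count` distinct, non-sequential inputs and map each through f.

    Returns (input, slot) pairs in execution order: the i-th instruction
    goes in the i-th slot and jumps to the slot of the (i+1)-th.
    """
    if math.gcd(7, slots) != 1:
        raise ValueError("slot count must not be a multiple of 7, or slots collide")
    inputs = random.Random(seed).sample(range(slots), count)
    return [(x, f(x, slots)) for x in inputs]

# Hypothetical example: 5 instructions scattered over 64 slots.
plan = scattered_slots(count=5, slots=64, seed=1)
```

Because the instructions are still linked in execution order, the code runs the same; only the layout in memory stops following the order of the inputs.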
And when you do that, depending on how you wrote your code, that will allow you to, I shouldn't have had eyes, that will essentially allow you to determine, man, I shouldn't have had that last shot. Wow. Long a pause, I kind of deserve it. So you can essentially just determine it iteratively and it will still link your instructions. They'll just be completely randomly placed. So if your code is generated from a predictable formula, then it also follows that the entry point is predictable. So you can take this to one level more before you actually wind up getting to your code. You can essentially obfuscate the entry point in some way or another, do a whole like 300 assembly instructions that just gets the entry point. Oh, by the way, here's a little bit of anti-debug and oh, it manipulates the entry point just so much that you can't even run the code. So there's all sorts of stuff you can do there too. So there are various drawbacks to this as well. So you're, yeah, there are various issues that you're going to be still running into this even though this is pretty cool. So this technique assumes you have highly compiled code. So if you've basically compiled it with either GCC or God forbid VC++, well, VC++ is actually pretty cool for a few reasons, but, you know, actually all compilers suck. What am I talking about? Anyway, if this really assumes that you have highly compiled code because as it's disassembling the instructions, it's following a very simple path. So it's not going to assume that there's all sorts of really weird tricks that are going on. For example, if you have an opaque predicate that goes off into La La Land, then it's going to fuck up and add those instructions and your code is going to be all messed up. So if you're trying to obfuscate somewhat obfuscated or manipulated assembly, it's not going to work out that well for you. 
There's also a massive memory footprint because as, you know, let me go back to the image to show you, there's a lot of blue space here and there's not a lot of red space. All the red spaces of code, all the blue space is sure there are things you can do to make it a whole lot more efficient, but there's obviously a whole lot of extra space there that's going to be an issue for you. So it's going to have a massive memory footprint if you want to do it right, or if, yeah, if you want to isolate out the instructions. Yeah, you're dealing with a gigantic data set. So it gets significantly larger when you obfuscate more than just one function. So if you have some really awesome pack or you downloaded from malcode or something like that, well, I really hope it's efficient for you. So function pointers are unpredictably fucked. Sometimes they're good. Sometimes they're not. It really depends on what you're doing. So it's definitely going to be an issue there because you're not going to be able to predict where, when a function pointer is accomplished in assembly, it's usually a register-based instruction like call EAX, call ECX, or an offset to ECX, usually if you're dealing with a C++ class. So they're going to be broken in some way or another. So it's the same way with the C++ STL is usually okay, but sometimes it'll wind up getting fucked. So you're going to have to deal with that. The more clever you get, if you get into the preamble post-script stuff I was talking about, the more you think about it, the more it gets really complicated because, oh, okay, I'm going to push this here and this set here and then, where's the dragon book? I need the dragon book. And it's usually what, well, I'm about to kill it, Adam. All right. Seriously, I should get some food. 
But anyway, it really becomes a really quick slippery slope from, hey, I can put a jump instruction here, and then you wind up digging through the dragon book for months and months on end trying to figure out what exactly you're doing. So I think I'm pretty short on time, but that's essentially the end of it. I hope you learned something today. I got really drunk, so I'm not sure exactly what happened. So if you want to follow me on Twitter, like I said earlier, my Twitter account is Frank Squared. I have a blog and a website. If you like ANSI, it's on my website. So cool. That's my talk. Thank you very much for coming.