Okay, cool. So I'm a bit nervous now, following the monkey comedy hour. It's going to be hard to follow that up. But this is Relocation Bonus: Attacking the Windows Loader Makes Analysts Switch Careers.

So, quick introduction. I am Nick Cano. I'm 25, and I'm a senior security architect at Cylance. I have written a book on hacking games. Sorry, I didn't know a clap was coming. I've written a book on hacking games. I'm also a Pluralsight author; I just had a course published about a month ago on C++ secure coding practices.

And what is this talk about? Well, this talk, in essence, is about the Windows Portable Executable header and how it can be weaponized to destroy parsers and other static analysis tools, things like disassemblers and such. And that all culminates in a PE rebuilder for 32-bit Windows binaries.

So you might ask, why are you attacking relocations? So, this is a disassembly of a game, a game that I make a bot for. In the disassembly, to get rid of a jump or a call, I just put some NOPs in there. And then I saved my new binary to disk. I ran it and it crashed. I was like, what the heck is going on here? And I looked at the code and my NOPs were now changed. The code was different from what I had patched. And I know what you guys are thinking. What the heck? What's going on here? So, as it turns out, there are these things called relocations. So, at first I was like, oh man, this is broken for no reason. Then I was like, okay, relocations corrupted the patched code. Then I was like, maybe I shouldn't patch code that is relocated. But then I thought, I can use relocations to hide my arsenal in the bowels of the machine.

So, before we dive into the nitty-gritty technical details, let's restate what we're doing here. The Windows loader has this piece in it that applies relocations. Relocations exist because binaries will have pointers in them that are sensitive to the position of the binary and where it is loaded.
So, if the binary is loaded at a different base address than the compiler expected, those things need to be relocated. So, if we can weaponize that, if we can instrument that and make it modify things for a different purpose, we can use that for obfuscation. And we can get an obfuscator such that there's no deobfuscation code. The Windows loader does it all for us, and then it's kind of a mystery: if it's your first time seeing this, you're like, this thing is a mess on disk, and then in memory it's fine before anything executes. So, that's what we're going to talk about.

So, first we need to say, well, what are relocations? They exist to enable ASLR and dynamic mapping. That's not exactly true. It's a bit more nuanced than that. But for the sake of executables, they exist to enable ASLR. And let's look at an example. So, this is the same binary loaded at two different base addresses. And I'm specifically looking at a function table. And as you can see, in the two different loads, one on the left and one on the right, the addresses in the function table are different, at least the upper part of them. On the left we have, what is that, 0x0133, and on the right we have 0x0117. That is because they're at a different base address. We can see that here, outlined in blue: it's reflected by the address of where this function table is in the binary's memory. And we can see that the values are actually changed here in the bytes of the function table.

So, let's take a sidebar really quick to talk about the PE header. By the way, this is not my diagram. I got it from Corkami. If you don't know what that is, check it out. A really good resource. So, there are three things we're worried about. And the first thing is image base. Image base is a part of the PE header. Well, so the PE header is kind of the description of a binary in Windows. What needs to be loaded? What parts of memory need to have what access rights? What functions are imported? Stuff like that.
But we're specifically interested in image base, which is a value you can use to say what base address you want your binary loaded at. And if ASLR is not on, that will be followed. Throughout the talk, I might refer to this as desired base.

There's also DLL characteristics. Now, this is just a two-byte value, and it's a flags value. We're specifically interested in the bit 0x40, which, if it's set, means ASLR is on. And if it's not set, ASLR is off. And ASLR is just the part of the loader that says, okay, I'm going to put this binary at a different address for security purposes.

And finally, there are the data directories. And specifically, there's an index into the data directories called IMAGE_DIRECTORY_ENTRY_BASERELOC. So the data directory is basically just an array of pointers and sizes for different pieces of data that the loader needs. And we're only worried right now about the reloc index, which points to the relocations table.

And the relocations table looks something like this. At the top here, you have the data directories. And then we have this little gray block. And then we have what the reloc blocks actually look like. In the data directory, there's a pointer to these blocks, to the first block. And then all the blocks should be contiguous after that. And there's also a size, which refers to the size of all of the blocks in bytes. Each block ends in a 0x0000, just a 2-byte zero, to say this is the end of the block and there's another block coming, so that the loader knows that it needs to look for a new relocation block.

So each block looks something like this. First, there's a virtual address. That virtual address is sort of like a base. So instead of encoding every single address that needs to be relocated, you encode a virtual address and then offsets relative to that virtual address that are less than 4 bytes long, and that just saves space. So this virtual address is the base which we will offset from in each entry. There's also size of block.
And that is the size of the entire block. That is including the size of size of block and the size of virtual address. So those are both 4 bytes, so 4 plus 4. And then in this case, we have 4 entries which are each 2 bytes. So then plus 2, four times, and you get the size of the block: 16 bytes. So each entry is 16 bits. Four of those bits, or half of a byte, are the relocation type, which we'll talk about in a second. And then the 12 remaining bits are used for the offset. So you get a range of 4,096 bytes from the virtual address. So you can have a virtual address and then you can have as many entries as you want within 4,096 bytes of that to be relocated. And this optimization might not seem like much, but when you have a function table where you have like 30 things to be relocated back to back, it makes sense why it's done this way.

So this code kind of describes how the loader does relocations. This is just pseudocode. Obviously it's not written in modern C++. But this gives you the gist, and we're going to talk about it. Specifically, what we care about is the first line, where we say delta is equal to base, that's the base that we're actually mapped at, minus desired base, that's the base that we asked for in the PE header. And then the first if statement inside the loop is the type of relocation we're going to use. So if we were to look at that first if statement, we would see that a plus-equals operation is used during relocations, and the right-hand side of that is delta. So essentially, everything that is relocated is the value that's there plus-equals delta, which is derived from the base address we're given minus the base address we wanted, and that's how everything gets fixed up to point to the right place.

So the conclusion that we can come to is that if we want to abuse relocations, or if we want to be able to control what they're writing, base must be pre-selected, or we must have some way to make base be something we know that isn't the base that we asked for.
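The block layout and the plus-equals-delta fixup described above can be sketched in a few lines of Python. This is a minimal illustration, not the loader's actual code and not the pseudocode from the slide: the function name `apply_relocations` and its parameters are made up for this sketch, the image is assumed to be a `bytearray` already laid out as it would be in memory (indexed by RVA), and only the 32-bit HIGHLOW relocation type is handled.

```python
import struct

IMAGE_REL_BASED_ABSOLUTE = 0  # padding entry, nothing to fix up
IMAGE_REL_BASED_HIGHLOW = 3   # add delta to a full 32-bit value


def apply_relocations(image, reloc_rva, reloc_size, base, desired_base):
    """Walk the relocation blocks in `image` and add delta to each target.

    Hypothetical helper for illustration: `image` is a bytearray indexed
    by RVA, `reloc_rva` and `reloc_size` come from the data directory.
    """
    delta = (base - desired_base) & 0xFFFFFFFF  # 32-bit wraparound
    pos = reloc_rva
    end = reloc_rva + reloc_size
    while pos < end:
        # Block header: 4-byte virtual address, 4-byte size of block.
        virtual_address, size_of_block = struct.unpack_from("<II", image, pos)
        if size_of_block < 8:
            break  # malformed block, stop instead of looping forever
        entry_count = (size_of_block - 8) // 2
        for i in range(entry_count):
            (entry,) = struct.unpack_from("<H", image, pos + 8 + 2 * i)
            rel_type = entry >> 12   # top 4 bits: relocation type
            offset = entry & 0x0FFF  # low 12 bits: offset from virtual address
            if rel_type == IMAGE_REL_BASED_HIGHLOW:
                target = virtual_address + offset
                (value,) = struct.unpack_from("<I", image, target)
                struct.pack_into("<I", image, target, (value + delta) & 0xFFFFFFFF)
        pos += size_of_block
    return image
```

For example, a block with one HIGHLOW entry and one zero padding entry is 4 + 4 + 2 + 2 = 12 bytes; if the binary asked for 0x00400000 and was mapped at 0x00500000, a pointer 0x00401000 at the entry's target becomes 0x00501000.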
In other words, there needs to be a difference between the base we asked for and the base we get. And desired base is the only means of controlling ASLR. So that's a bit tricky, and delta is also dependent on desired base. So we know we need to use desired base somehow to pre-select the base, but it's not clear how, and I thought maybe we'll just try invalid stuff. So I tried negative one, or max int, or whatever you want to call it, and it didn't work. PE fails to load. Well, what I know is that those final four F's actually have to be masked off to zero because of alignment. So I tried all zeros. PE fails to load: invalid header. So then I combined the two, 0xFFFF0000, and the PE loads at base 0x10000 consistently, every time. So we're getting a base address every time that is different from the one we asked for and always the same. As I later learned, Corkami already had all of this documented, so this has been known for a few years as well.

So let's talk about the loading process. It's pretty complex, and this is not all of it, but this is sort of what the Windows loader does. So you don't really have to read this whole thing. What we really care about is that the things in blue are things that happen before relocations are applied by the loader, and the things in orange are things that happen after relocations are applied. So if we want to obfuscate things, we want to obfuscate things that are used after relocations are applied, or we might break something. So if we just take out everything that isn't relevant, this is what it will look like when we do our attack. This is the flowchart we'll go through. Now you might notice there's a red arrow coming from ASLR enabled, which means we're actually going to have ASLR set to off, and the attack happens here. So it turns out we're not tricking ASLR, but when the allocator is being asked to locate something at 0xFFFF0000, it's saying, yeah, I can do that, but it's spitting back 0x10000 for whatever reason.
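To make the numbers concrete, here is the delta that results from asking for 0xFFFF0000 and being mapped at 0x10000, assuming 32-bit wraparound arithmetic. The helper name `relocation_delta` is invented for this sketch.

```python
def relocation_delta(actual_base, desired_base):
    """Delta the loader adds to every relocated 32-bit value (mod 2**32)."""
    return (actual_base - desired_base) & 0xFFFFFFFF


delta = relocation_delta(0x00010000, 0xFFFF0000)
print(hex(delta))  # prints 0x20000
```

Because the mapped base is always the same for this desired base, the delta is a known constant ahead of time, which is what makes it usable for anything other than ordinary fixups.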
So, our targets for relocation obfuscation. Well, we know the import table is loaded post-relocations, so we can obfuscate the entire import table. That is, all of the libraries and functions that are imported from DLLs can be obfuscated, so on disk they look like trash, but the loader fixes them up at load time. Additionally, even though sections are mapped pre-reloc, they're not used until execution, which is post-reloc. Also, their memory protections aren't set properly until after relocations, in case read-only stuff has to be relocated. Also, the entry point isn't used until post-reloc, so we could obfuscate the entry point, making it that much harder. The problem is that it is write protected before relocations are done, so you can only do this if the target machine doesn't have data execution prevention turned on.

So the conclusion is pretty much what I just said: we can hit imports, code and resource sections, and optionally the entry point with this attack, and that is this little green area you can see highlighted. Now, you might look at this and say, okay, that's only like half of the binary, but what you have to realize is this is a minified PE, and that is actually most of the important stuff. In a real binary it's going to be somewhere above 90% or 95% of what's actually there and what's actually important content.

So the final attack looks something like this. We'll load the target PE file, the PE file that we want to obfuscate. Then we'll apply the original relocations, because remember, relocations are there for a reason. 0x10000 is probably not where that binary wanted to be mapped. So it's going to break unless we apply the original relocations and say, okay, fix everything up for that address. Once we do that, we flip ASLR off, because remember, even though we planned to trick ASLR, we're actually tricking the allocation function.
Then we set desired base to that tricky 0xFFFF0000, and then we loop over all of the data that we want to obfuscate, that is sections, import table, stuff like that, in DWORD or uint32-sized chunks, and we decrement them by what we expect delta to be. That decrement is because the relocations part of the Windows loader does an increment, so we just do the opposite of that, and that is kind of our obfuscation. So every integer in the stuff we're obfuscating is going to be messed up a bit, which is good enough. Then we discard the original relocations table, because we've already applied them on disk. We generate a new relocations table reflecting the positions of every single decrement that we've done, so that the loader can fix them up. We save the new PE file to disk and, profit, we're done. We now have a PE that is completely mangled on disk. It has a huge relocations table, and then when it runs it maps perfectly fine.

So let's jump into some demos. So here I'm trying to load it in IDA. You can see the messed-up DLL name that IDA thinks is an import. IDA throws some weird errors. On older versions of IDA it was crashing sometimes. I can't get newer versions of IDA to crash. You can see IDA lights up red. It doesn't know what it's looking at. It only sees one subroutine, and even in the assembly code there's a bunch of invalid instructions, and if you look at the strings, they look like trash. Everything looks like trash. Binja is going to be mostly the same. It's going to see two subs, but it's got like question marks there. It doesn't know what it's looking at.

So next is CFF Explorer. This one actually crashes somewhat consistently. I think it's this one that reaches out on the network and does a bunch of out-of-band scans that come back a few minutes later, and those do cause crashes more than any other tool. There are no crashes in these demos because, one, you have to get lucky.
They're not even consistent across the same binary, and also you have to get a binary that happens to cause them. I haven't looked more into it. What I really care about is that, as you can see here, the import names are all messed up. Some of them look kind of right, because remember, we're doing it in four-byte chunks. So for strings we're only messing up like one byte at a time. Depends just completely chokes. It has no clue what it's looking at. And then Resource Hacker just throws an error. It says you don't have a valid resource table, because the resource section is encrypted or obfuscated. And even in PE-bear you can see it just lights up red. Imports look like trash. If we tab over to the disassembly, we'll see it says, okay, these are invalid instructions. We'll also see a bunch of jumps into locations that can't possibly be jumped to. And finally PEview, it's going to be the same thing. It's just going to maybe pop some errors, and then imports are going to be messed up. Code is going to be messed up. Okay. And that is that one.

Okay. So we've shown that this attack is viable and that it works and we can do something with it. Oh wait, what am I doing? I have another video to show. I'm pretty dumb. We've seen it break stuff, but we haven't seen it work. What am I talking about? So let's do that. So here you can see me clicking on the binary, and it runs. This is a binary obfuscated with this method, and it's running. It's just Process Explorer that I've obfuscated, so that's why you saw Process Explorer launch. I did this because this is a relatively complex application. It's got a lot of stuff in its resource section and it's got a lot of non-trivial code. So I figure if it works on this, it'll work. What you see right now is me searching for invalid strings in the image to show you that this is actually obfuscated.
Some of the strings are not obfuscated because they're in the .rdata section, which I didn't obfuscate for this demo, but if you look here you'll see a bunch of file paths and a bunch of error strings that almost look right but are actually gibberish when you try to read them.

And it also kind of messes up a ton of AVs, so here's a malicious sample. We're going to go ahead and pack that. And now you see it's generated a new sample. And we're going to drop them both on VirusTotal and see what happens. The first one is going to light up already. And now the second one, even though it's effectively the same binary that's just obfuscated by the Windows loader, I guess we only get two threads.

Okay, so we know the proof of concept works. We should probably test it on multiple platforms. Windows 7, it works; that's what I originally developed it on. Windows 8, no one uses Windows 8, come on. Windows 10, aw fuck, it doesn't work. Yeah, I get it. Okay, I'm done, sorry. Okay, no, but really, we have to find a new attack for Windows 10. What ends up happening is that asking for 0xFFFF0000 does not yield 0x10000. It actually lets ASLR go through the process even if the ASLR bit is off, and then we get a random base address. So, no good.

So I thought maybe I can embed multiple PE copies for all possible base addresses and then use some kind of reloc tricks to point to the right one, which is something Corkami has actually shown is possible. But that would be way too big; it would inflate the binary even more than using relocations already does, which is actually pretty significant. So then I thought maybe I can tweak the ASLR configuration, because I know there was a major change on Windows 10 in how all of that works, and that actually does work. So if you noticed, the demos were on Windows 10, and this is what I used to demo on Windows 10.
We set mandatory ASLR to on and bottom-up ASLR to off, and if we do that with this .reg file for specific executables, it works, but I don't really like it. As soon as you start touching these registry keys, everything is going to light up. AVs are going to light up, EDR is going to light up. Monitoring is just going to say, hey, this is bad, if it doesn't just block this entirely. So I wanted something else.

And then I did a lot of playing with it. I was trying to see what I can do to control base addresses, and I noticed this. So this is just a file. I have multiple instances running, and I'm basically just looking at the base address of this file. And even though I have multiple processes running from the same file, I noticed that they all have the same base address. There's something in Windows that is caching that base address or reusing it on loads. Maybe it has something to do with copy-on-write. Not exactly sure. And then I noticed, okay, if I take this file, I copy it to another file, I delete the original, and then cat the copy into the name of the original. It sounds crazy, but it's a way of copying a file to itself without the file system being able to track that it's the same file. And then I launch it again. I've invalidated that base address. So they all have the same base address again, but it's different from what it was before. So this tells me there's something in the file system that invalidates that base address that's being reused. So I can at least brute force and get a base address that I want by just invalidating it every time.

So we're going to go over a pretty complex flowchart. So, really quick, here's a flowchart on how to read flowcharts. Okay, I actually put it in the deep fryer, so let's just talk for a sec. So what I figured out is that if you take a file and you memory map it, then you close the mapping and you launch that file, it will be launched with a different base address than the last time you launched it.
So if you do this in a loop, you launch a file, check the base address, and if it's not what you want, memory map the file, close the mapping, and launch it again, you can keep getting a new base address, and you can brute force and pre-select whatever base you want.

And then the attack looks like this. So you'll have a root process and a drone process, effectively, where the drone process is just launching and throwing back a specific return code based on whether or not it has the base address that it wants, and then you have your root process, which in a loop is doing that mapping stuff, which is highlighted in yellow. So here in orange is really the main part of the attack. Within the root process, we're highlighting the loop that's going in and saying, okay, I'm going to keep looping until I see the right base address, which is signaled from the drone process by an exit code of 0x... what did I use? BADBEEF, yeah. So if 0xBADBEEF is the exit code of the drone process, it knows that the drone didn't get the right base address, and it will keep doing this loop. Now, these are actually the same file. So effectively, we want to embed this attack in the binary we want to obfuscate, and it will make a copy of itself to be the drone process, so that it can map that copied file to invalidate its base address and launch it over and over and over again. But effectively it's the same code, which is why you see at the top we create a mutex and check if it exists, and that's how we know whether we're the root process or the drone process.

So to weaponize this, the tool must create a new section with enough room for this brute-forcing code, embed the code inside of this new section, and make that code aware of the original entry point. Then it needs to point the entry point within the PE header to the embedded code, so effectively it's a typical parasitic infection. And for this to work, this ASLR pre-selection code, this brute force attack, must be position agnostic, because we're just throwing it in a binary wherever we can.
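The root-process loop described above can be sketched as follows. This is a platform-neutral simulation rather than the embedded code from the talk: `launch_drone` and `invalidate_cached_base` are hypothetical callables standing in for launching the copied executable and for the map-then-unmap step, and `max_attempts` is an assumed safety limit that the talk does not mention.

```python
WRONG_BASE_EXIT_CODE = 0xBADBEEF  # drone's signal that it got the wrong base


def preselect_base(launch_drone, invalidate_cached_base, max_attempts=1000):
    """Relaunch the drone until it stops reporting the wrong base.

    Returns the number of launches it took, or None if max_attempts ran out.
    Both callables are placeholders: a real root process would start the
    copied executable and memory map, then unmap, the file between launches.
    """
    for attempt in range(1, max_attempts + 1):
        exit_code = launch_drone()
        if exit_code != WRONG_BASE_EXIT_CODE:
            return attempt  # drone landed on the wanted base and ran
        invalidate_cached_base()  # force a fresh base on the next launch
    return None
```

Passing the two steps in as callables keeps the loop itself testable without Windows; a fake launcher that cycles through a few base addresses is enough to exercise it.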
It must also be generically embeddable in any PE. And yeah, that's basically it. So to do that, I made a bunch of preprocessor macros so I could do everything inline. You don't need to read these; this is just showing a bunch of variadic macros, and once I had all of these in place, it looks something like this. So, some of the weird stuff I had to do: some of these macros take a string in, and then they emit that string as assembly bytes, and then jump over those bytes, and then use a call and then a ret to get the address and move it into a variable. Really weird stuff. But this is how I made all that work.

Now, you might be looking at this big block at the top, which is doing a bunch of function resolution, and you might say, Nick, why didn't you resolve GetModuleHandle and GetProcAddress and then do the rest normally? And so my answer to that is, this is more obfuscated, it's really indirect, this code actually looks like a mess. So it's harder to reverse. That answer is a lie. I got really carried away writing the macros, and I had resolved all of my functions by the time I realized I could have done that. So, yeah. And it worked. So this actually worked.

So there are some caveats. Because this attack is actually running code... we are... sorry. I had my bullets wrong in my head. Come on, Nick, get your shit together. So, it can be slow. It takes about 200 iterations to land on the base address that we want. So depending on what machine you're running it on, it can happen in five seconds or a minute, because you're launching this executable hundreds of times and you're mapping it to memory in between. So the size of the executable also matters. Size matters. And imports can't be obfuscated. Right. Because what ends up happening is the binary has to get mapped into memory properly for this attack to start running.
And if imports are obfuscated, on the times where we don't hit the right base address, the Windows loader is just going to nope out, and it's going to say, okay, this is wrong, I can't resolve any of this, and it's just going to error. So that's a bit of a problem, but there are some advantages. Because, first, we don't really need to use 0x10000. Before, if you're a parser and you saw the request for 0xFFFF0000, you know what address it's going to map at, but with this attack we can pre-select any address within the realm of possibility. And the side effect of this is that some form of symbolic execution or manual analysis is now needed to determine what base address is going to be obtained. So it's harder to just take this and do the relocations inside of a parser to fix up the binary. So yeah, we can't do imports, which was actually really cool to see, but at the same time it's harder for automated analysis.

And let's see what this looks like. So you see Process Monitor on the right, and you're going to see that just showing a bunch of process launches; this is the brute force going. You can see on the left it copied itself to just its name plus 2.exe. It launched itself a bunch of times looking for the base address, showing that in Process Monitor, and eventually it created that binary, and then eventually it launched, which we see down here. We see that it actually loaded and launched Process Explorer, which is what we had obfuscated. If we go and we look, we see, okay, we had our first thing, it copied the second thing, it launched it, that got mapped properly, and then that Process Explorer actually drops a 64-bit file and launches that, which is why you see an extra process. But that's just showing that we've actually obfuscated resources and they got deobfuscated, so that Process Explorer could dump another binary out of them perfectly without messing it up.

Yeah, and so what can we really do with this? We've seen it work.
We know it's an interesting attack. We've got the Windows loader deobfuscating stuff for us, but what can we do with it? So, you can annoy analysts with this. It's going to be annoying maybe the first time or first few times you see this, but if you see this a lot, it's actually not that annoying once you know what's going on. We can break a lot of automated static analysis systems; as we saw before, a lot of PE parsers just choke on it. We can break a lot of tools; those two are kind of one and the same. And we can maybe break some AV parsers. I haven't looked too much into it, but I imagine there's at least one out there that's choking on this.

But there are a lot of improvements we can make as well, such as more obfuscations. Now, this might mean new targets, so instead of just doing a few sections and imports, we might be able to find and identify other things that can be obfuscated, or we might be able to do multiple passes. So right now we do each DWORD in the file consecutively in any of our targets, but what if we do that, and then we start over offsetting by one, and then we start over offsetting by two, and then offsetting by three? Then, instead of getting like one or one and a half bytes that are messed up for every four bytes, everything's obfuscated. The thing is, your relocations table is going to blow up if you do this, but I mean, if you really want to make it hard to analyze, it's possible.

We might also be able to do header scrambling. So we might be able to embed things in the relocations table that say these things need to be relocated, for things that we haven't done anything to. Things that are needed by the loader but not needed during execution. So then the loader will go and it will relocate things that were already correct, thereby corrupting them in memory.
So not only would you have something that looks corrupt on disk and fine in memory, you could selectively corrupt things to make dynamic analysis hard, because right now this doesn't really do anything against dynamic analysis. And we might even be able to combine this with runtime packers, and that would just be an extra level of annoyance.

Now, support for 64-bit binaries is definitely possible. For the pre-selection attack, it seems like there would be a much larger search space, but as far as I can tell, Windows does kind of prefer to load a binary around a base address that is the maximum possible user-space address that's available minus the size of the binary. It tries to load it around there at first. So if you use that, you might be able to narrow down the search space, though I haven't exactly tried it, and the 0xFFFF0000 trick works similarly in 64-bit, but not exactly the same. We might also be able to support DLLs. I actually have no evidence for that. I just think it's probably possible with the right modifications.

And then you could also do selective obfuscation. So rather than obfuscating the entire binary, which just throws up red flags right away, you might have an IP address embedded in a section, and you only use this to change one byte in that IP address. So now the analyst is like, okay, this is the C&C, because this is a string, this is a hostname or an IP address. So then they try to connect to that and they're like, okay, the C&C is dead. But they were looking at it in IDA, which isn't doing relocations. At load time, you relocate just one byte in that address, and now you're good to go. And it's kind of confusing. It won't trick a lot of people for too long, but it will add an extra annoying bit there.

And that's the end. So we have a bunch of links here. This QR code, I believe, goes to the slides. So here's what's going to happen with the code release.
We've decided to do it like this because, while we're not dropping any 0-day or anything that's super bad, we are breaking some stuff, so we want to give people time to catch up with that. So in two weeks we're going to drop some samples. So if you're working on fixing this, if you write a parser, or you work for an AV company, or you write a PE tool, you can go ahead and start thinking about this. The slides are already up, so all the information is there. In two weeks you'll have samples to test on, and exactly one week after that is the full source code release: everything will be dropped and anyone can play with this, make changes, commit. The source code is in C++, Visual Studio 2017. So I think if you hit the Relocation Bonus GitHub repo, I've posted a timeline on there for now that will eventually point to the code. The full timeline's on there, but I think it's September 3rd for the full code release. Thank you.