This is Blockfighting with a Hooker, Blockfighter 2. So please join me in welcoming K2.
Thank you. DEF CON, whoo. Thanks everyone for coming, and everyone watching on DEF CON TV. I hope it's been a good time. The only problem I've had so far was that you couldn't get a beer this morning from the cafe. Beer sales were closed in the little cafe over there. But luckily, maybe sort of luckily, my ex-wife is on the way over to give me a beer. Pretty lucky that I kept it cool. Anyhow, enough of the drama and non-technical talk, I guess. But thanks again, everyone.
So this talk is going to cover a bunch of different things built on an exception-based hooking technique. It's using Capstone under the covers in various places. I just committed the code. Oh, I put the slide ahead that has the GitHub on it. Let me just go to the next one.
So the neat thing about this hooking technique, in a nutshell: tracing is great. Performance tracing is great. And the kind of trace telemetry you get out of a binary when you're monitoring it is awesome for performance testing. The AFL guys, American Fuzzy Lop, use a lot of that kind of feedback to drive their algorithm from that trace data. But there's a lot of looking at trace data, modifying your inputs, and rerunning the binary from scratch, so you're constantly re-executing this binary. With this method, you only execute one time, so you have all your state with you. You're actually interrupting the execution, and you're able to make a decision: do I want to emulate these instructions? Do I want to change what the program thinks it's doing? Do I want to sniff other aspects of it?
When I was thinking about what to do for these different block fighters, as I call them: in a nutshell, I'm talking about basic blocks, runs of assembly instructions.
These are the blocks that make up the execution of your application: every time there's a conditional and a branch is taken, two new blocks fork off. So you've got these blocks, and we're going to fight those blocks. Some people call it binary steering or other things like that. I've seen a lot of different trace or performance explanations for getting coverage and moving your coverage up. Static checkers do something similar in some cases to ensure they've tested all the code. And all the DARPA stuff earlier, for whoever saw that, this fits in with some of those aspects. But the more tools, the better. I can't tell you how many times I've been working on a forensics thing or an incident, trying to use whatever tool, and because of some constraint of my environment and how the thing runs, the analysis method that I was previously super successful with wasn't working. It's always fun and good to learn new things and to be aware of a flexible, easy-to-use analysis primitive.
In terms of this method, I wrote three of what I call block fighters. One is this ROP defender thing. For everyone who doesn't know what ROP is, it's return-oriented programming. There's also JOP, jump-oriented, or LOP, loop-oriented. I wanted to call it DopDropDefender maybe. Joke, no? OK. There's a music reference in there somewhere. I forget who the artist was who did that track. In any case, it's super easy to do this ROP protection. So ad hoc, if you're developing an exploit, or you're just tracing to understand the execution of something, you can use something similar to this ROP defender to understand where your ROP chain is breaking. Or if you're defending something, maybe you want to analyze an exploit.
I'll show you the code in a bit, but it's pretty fun and straightforward. The ROP defender itself is not even 20 or 30 lines of code to drop in.
The other thing I thought was kind of cool, it's just a concept really: everyone's talking about ransomware. Ransomware is a big, terrorizing thing and it's going to steal all our money and everything. So I wrote this thing, this ransom escrow. It enforces key escrow of the encryption going on in your computer, so that if something's encrypting something on my box, I get that encryption key. It's too late if you weren't watching where the encryption key came from, right? So this is a super simple primitive, and it's on GitHub right now. I was going to expand it in a couple of ways, but I'll talk about that as I do a little demo of it.
There's a hypervisor DoS thing, and it's cross-hypervisor. This is just something that came up while I was writing this tool. My friend Rich in Seattle is doing another trace tool, it's very cool, Run Speed Tracer; I've got a reference to it in here. He said, hey man, why don't you do this in the cloud? I was like, yeah, okay, that's cool, because the technique he was using couldn't work in the cloud: he was using low-level performance-tuning Intel features that aren't exposed under a hypervisor. The stuff in EhTrace is not as new, it's not super bleeding edge, so there is sort of support in the hypervisors. Unfortunately, though, they all break. Well, actually, I'm going to take that back. The only hypervisor that does not fall to its knees trying to execute this code is VirtualBox. So thanks, VirtualBox. They're kind of a whipping boy sometimes, but hey, they got this one.
Oh yeah, and there's some graphing stuff I did.
If anyone's interested in graphing, I saw a lot of cool graphing stuff with these DARPA computers. I want to do a 3D thing in the future. If you're into graphing and computer visualization of code execution, let me know. I've got some ideas I want to shoot around.
Okay, I must have been talking all over the place, but: hooking and tracing. Tracing, again, is what's executing? Hooking is, I want to modify it. Pretty simple, straightforward. I'm going to talk about some frustrations and hurdles, like the hypervisor DoS. I really wanted this to go cloud scale, you know? But hey, maybe eventually, when they get fixed. And then also symbol support: I was going to have that in, but I backed it out because symbol handling is kind of a pain. I'll talk about that later.
Oh yeah, here's the GitHub, if you want to check it out. K2, a really short GitHub username, should be easy to remember. EhTrace is the one we're talking about today. inVtero was the thing I started a couple of years ago, at DEF CON 22. It's a nested-virtualization memory thing, a recursive physical-to-virtual extraction. It's kind of neat. Oh yeah, if anyone wants the code, let me know if I forgot something.
The good thing about what we've got here is it runs really nicely on bare metal. It can run on a hypervisor; VirtualBox is probably your best bet. And we're trying to do this binary steering thing, which means there are some things we have to do. We have to reset the flags, because with the way this exception management works, the flag is cleared every time by the trap handler in the kernel before it calls us, so we've got to set it again. And if we see the binary checking whether that flag is set or unset, we need to either emulate that instruction or handle it in some way to keep it from detecting us. It's a classic sandbox problem.
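The flag handling described above can be sketched in a few lines. This is only an illustration, not EhTrace's code: it assumes the flag in question is the x86 trap flag (TF, bit 8 of EFLAGS/RFLAGS), and the helper names `rearm` and `emulate_pushf` are hypothetical.

```python
TRAP_FLAG = 1 << 8  # TF is bit 8 of EFLAGS/RFLAGS on x86


def rearm(saved_flags: int) -> int:
    """Set TF again in the flags the handler resumes with, so the next step traps too."""
    return saved_flags | TRAP_FLAG


def emulate_pushf(real_flags: int) -> int:
    """Flags value a traced program should observe if it reads its own flags: TF hidden."""
    return real_flags & ~TRAP_FLAG


flags = 0x246                      # an example flags value with TF clear
resumed = rearm(flags)             # what the handler writes back before continuing
observed = emulate_pushf(resumed)  # what an emulated flags read would hand to the program
print(hex(resumed), hex(observed))
```

The point of the second helper is the neutralizing step: the tracer keeps TF set for itself while any emulated read of the flags returns the value the program would have seen untraced.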
This might be kind of a loose sandbox, for all intents and purposes. There are obviously a lot of issues with fighting code in your own address space. However, being in the exception-handling path, we are guaranteed certain things in terms of state being synchronous. There's some other DBI stuff, it's pretty cool, and I totally want to check it out in the future.
Some dreams for this stuff: lots more block fighters. Fun little ideas are pretty straightforward to bang out. I did the ROP one in about 20 lines. I did the key escrow thing, and the key escrow is a generic hooker for any function pre- and post-condition, which is great. The performance ranges. Slicing is your friend, and what that means is figuring out how to confine what you're looking at and not executing a bunch of other stuff you don't care about. So discounting what you don't want to look at and figuring out what you do, more or less.
Anyway, I do this on 64-bit Windows 10. On other versions, your mileage may vary. This guy, Fereno, sorry if I'm bastardizing that, on the x86 asm net board from years ago, reversed this technique and found it, so you can check that out. And this other guy, Lafool, showed me another tip you can use to patch and help make it work better on other versions of Windows. So thanks to those guys.
Back at DEF CON 15, here's the paper, they were talking about covert debugging. In a sense, this EhTrace stuff is an in-proc debugger: all of our debugging happens in-proc, in the same address space as what you're looking at. What makes it nice is that you don't have to rebase pointers all the time.
So if you're logging the payload of a function call you want to look at, I don't have to rebase my pointer addresses because I'm in a different process and everything's mapped differently, and address randomization is all the rage now. By being in the address space of that process, you don't have to do a lot of that. I mean, it's not that hard, but it just makes it super easy and straightforward to code naturally.
She brought the beer. Woo-hoo. Cottonmouth.
Oh yeah, so modern stuff: this Triton library from Quarkslab is super cool. If anyone's looked at it, it's one of those ideal designs for DBI frameworks, with all these components to it. If you look at the tracers on the left-hand side of that block diagram, EhTrace could fit in place of any of those. So you can basically drop EhTrace in for Pin or DynamoRIO or something. In fact, I started to do that with a WinAFL port to EhTrace instead of DynamoRIO, but I didn't quite get it all done. It's in the GitHub a little bit if you want to look at it. I wanted to narrow down the slicing on it a bit more, because obviously those are more mature tools and they focus right in on the DLLs, like the GDI+ test case for DynamoRIO and WinAFL. It's just that one module; I don't want to trace NTDLL and all these other things. So it is what it is, but hopefully we can eventually get to a decent performance level, so that it's not too bad to use this thing for fun as well if you need to.
Another really great thing about EhTrace is that you don't really need to know much about the symbols. You're getting invoked by the system during execution, and all you really have to do is flip some flags and you'll maintain execution.
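The slicing idea mentioned above, tracing one module and skipping NTDLL and everything else, comes down to an address-range filter that runs before any expensive work. A minimal sketch, assuming non-overlapping ranges; the `Slice` class and the load addresses are made up for illustration and are not taken from the tool.

```python
from bisect import bisect_right


class Slice:
    """Address ranges (start, end) worth tracing; end is exclusive, ranges must not overlap."""

    def __init__(self, ranges):
        self.ranges = sorted(ranges)
        self.starts = [start for start, _ in self.ranges]

    def wants(self, address: int) -> bool:
        # Find the last range starting at or before the address, then check its end.
        i = bisect_right(self.starts, address) - 1
        return i >= 0 and address < self.ranges[i][1]


# Hypothetical load addresses: trace only the target module.
target_module = (0x7FF6_0000_0000, 0x7FF6_0001_0000)
only_target = Slice([target_module])

print(only_target.wants(0x7FF6_0000_0100))  # inside the target module
print(only_target.wants(0x7FFE_1234_0000))  # some other module, skipped
```

A handler would call `wants` on the faulting address first and return immediately when it is outside the slice, so disassembly and logging only happen for code you care about.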
You don't need to patch in any hooks, so you don't need to know the symbols, you don't need to know how many arguments there were. You really don't have to know very much at all, which is great. And you get a lot of invocations, or you can tune that down.
The disassembly, again, is Capstone-based right now, which is a great tool. I really appreciate what those guys have done there and I'm looking forward to their future releases. But obviously you're not going to want to do too much disassembly, so I'm going to try to do some kind of caching. Things get overly complicated the more you think about it: okay, how am I going to defend against this? What am I going to do? I'll cache that result and I'll be fine.
Again, this is just some background on hooking execution and the instruction-length decoders that go along with it. There could be a new instruction set in that binary that prevents it from being hooked with your favorite hooker, so you're unable to hook the execution of something, whereas with a model like this you really don't have that problem. What's the problem? Oh yeah, the debuggers are slow. And also, typically, when something wants to detect that it's being analyzed, it'll do checksums or hashes on itself: hey, what's my checksum? Or maybe you're executing in a secure environment, like an integrity-mode OS that says this DLL can't be changed. Well, then how do I trace the execution of that thing with these existing tools? What if I want to fuzz this thing in release mode and I don't want to have to do a debug build? I just want to trace what's going on right now, because any time I alter anything about my test case, it stops repro-ing. So this is nice in some circumstances, because you won't have to make as many changes to your test case, or whatever you're doing, to repro what you want.
So we're not changing the code, and we're not altering execution. Introducing some latency in the exception handler is not a big deal. I mean, people swap memory and this and that, right? It's not a huge impact on what a normal execution guarantee is for a binary. It's just going to assume it's a slow box or whatever, which reduces the number of problems you've got to worry about.
These are some of the micro-benchmarks. Our total worst-case scenario was 1,000%. So, ooh, sounds slow. But if you slice it up and you have good checkpoints for where you want to enable and disable tracing, it's as low as 25%. It's impossible to have the one method that does everything all the time. And if we do some caching of inputs and we understand the slicing of this binary, we can say, okay, at this point it's doing X, so we can just time warp, or fast-forward the state of execution to some other checkpoint, and skip something that wasn't really worth tracing with EhTrace. So it's EhTrace; I'm partially Canadian. Eh, aboot, something else Canadian. Let's take the canoe, woo! Anyhow, I'll just throw that in there.
Some of the other stuff we tried on the way to writing this was stack hooking. So I said I had the ROP fighter. With EhTrace, your tracing code can itself be just a set of ROP gadgets, or LOPs or JOPs or whatever. It's just a function pointer that gets called. Since you're not making any changes to these binaries, you're actually just getting inserted into the stack, and you can manipulate the execution without worrying about introducing unsigned code and those kinds of issues, or whatever else it might be.
So it's kind of fun from that aspect as well. And if you want to go ahead and bang out a crazy backdoor or whatever else, hey, there are some ROP backdoors and ROP malware I've seen floating around. Or if you want to trace those things, you can use this as well. It's very flexible.
I guess one of the ideas is that any time you make an offensive thing, you've always got to remember to pair it with a countermeasure. There are a lot of measures and countermeasures in analyzing stuff, right? Like, okay, I'm going to analyze it by extracting the state whenever it crosses into the kernel. But then, oh, it repairs itself before it calls the kernel to make itself look normal, so it falls within those normalized assumptions. Who knows, right? There could be a lot of different ways to act or counteract when you're trying to understand this huge amount of state and this huge number of moving parts in binary execution. And as we get to the demo of the hypervisor thing, that's kind of the explanation of why all these hypervisors get DoS'd by this: there's a lot of state moving around, no one's hit this path before, it's really expensive, and it doesn't matter how many CPU cores you have, it'll just take you down if you're not efficient overall.
So anyhow, the stack hooking: in the end, I might do some kind of hybrid technique. If I'm hooking the stack directly, by, theoretically, dropping in on an exception, manipulating the stack, and then executing, I'm thinking about using that as a mechanism to turn this on and off dynamically. Obviously that would be a lot faster, since it trims down your exception handling.
This is how it works, super easy. This DR7 thing is, I guess, a back door that Fereno found. This is typically not a register that you can affect from user space.
It's a kernel-only thing, the debug MSR right there. This DR7 setting actually weaves its way back into the debug MSR. It works great on Windows 10, and other people have reported varying results. If you're not on Windows 10, say 2008 or whatever it is, you've got to use /debug, and then I do the MSR write for you. But on Windows 10 you don't need to. So when I do the demo here, it'll just be user space. We'll even see the warning, hey, this thing didn't work, and it obviously does.
So the ROP hook idea is kind of fun. What else is it good for? Basic block coverage, a back end for DBI. Try not to emulate too much. I'll be working on some updates for caching or making it better based on what people think and what they'd like to see. It infuriates me to no end every time I go to get my favorite tracing tool and it can't trace the version of the OS I'm using, and I've got to do the symbol thing and manually edit. A lot of times there's so much rigmarole just to start doing what you want to do with certain types of hooking. Having the flexibility to use this technique has really helped me out in analyzing binaries in execution environments that were really confined.
So, maintaining control. I mentioned the flags register before, but there are other things you want to do. You want to make sure no one's taking over control of the EH. If you're going to go ahead and build a whole sandbox around this, hey, go for it. But that's a very long war you're going to have to fight.
Of course, Street Fighter II, everybody. Our new game. I was really good with Ken, myself. I like the uppercuts.
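Going back to the DR7 and debug MSR mechanism: the talk does not name the bits involved, so the following is a sketch under an assumption, namely that the MSR is IA32_DEBUGCTL and that the goal is its BTF bit, which turns the ordinary single-step trap into a trap on branches. The bit positions are the architectural ones; the function name `stepping_mode` is hypothetical.

```python
TF = 1 << 8   # EFLAGS.TF: raise a debug trap after the next instruction
LBR = 1 << 0  # IA32_DEBUGCTL bit 0: last-branch recording
BTF = 1 << 1  # IA32_DEBUGCTL bit 1: with TF set, trap on branches instead of every instruction


def stepping_mode(eflags: int, debugctl: int) -> str:
    """Describe what kind of stepping a given flags/MSR combination produces."""
    if not eflags & TF:
        return "none"
    return "branch" if debugctl & BTF else "instruction"


print(stepping_mode(0x246, 0))          # no trap flag, runs freely
print(stepping_mode(0x246 | TF, 0))     # classic single-step
print(stepping_mode(0x246 | TF, BTF))   # one trap per branch, i.e. per basic block
```

Trapping once per branch rather than once per instruction is what makes block-level tracing through an exception handler affordable at all.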
These are some comments on other areas that I need to flesh out, or, if you're going to do malware analysis or sandboxy stuff, some things to think about: where you would want to do some monitoring to make sure you're not being desynchronized from the execution.
Branch stepping is great. However, there are all these ROP JITers, and I've seen a lot of them that are Capstone-based as well, ROP JITers and different types of JITing. If there were a LOP, loop-oriented programming, JITer, I think the performance of this thing would be near native. So that would be great. I mentioned some of this stuff.
The ransom warrior. Here, I'll fire this guy up. You let me know what you think. Oh, my projector is not. Hold on. Let me kill Outlook here for a sec. Whoops. Okay. I think I can see it.
Okay, so the code for this is just going to make standard crypto calls. I canned it. There's a static lib I use for test cases so I don't have to race the injection of the DLL and all that kind of stuff. When we execute this, this is CryptGenRandom being called. This is the data that was actually exfiltrated by EhTrace through the exception monitoring of the execution, through the block analysis. And this is the return value from the program. So the post-condition on that hook hooked and logged the incoming random data before the return, which is great. So we know we're at a really good place to egress this information to a network server, or, if there were some kind of hypervisor enclave protecting my secrets, I could say, send this over there. These are some bits that I might care about in the future.
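The post-condition hook in the demo can be illustrated with a small stand-in. This is not how EhTrace does it (the tool works from exception-driven block tracing, with no wrapper around the function); it is an ordinary Python wrapper showing the same pre/post idea. `gen_random` is a hypothetical stand-in for a CryptGenRandom-style call, and the `escrow` list stands in for a network server or enclave.

```python
import functools
import os

escrow = []  # stand-in for wherever the logged bits would be sent


def escrowed(func):
    """Generic post-condition hook: copy out whatever random bytes the call returns."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Pre-condition logic (argument logging, filtering) would go here.
        result = func(*args, **kwargs)
        # Post-condition: record the output before the caller ever sees it.
        escrow.append((func.__name__, bytes(result)))
        return result
    return wrapper


@escrowed
def gen_random(length: int) -> bytes:
    """Hypothetical stand-in for a system random-number API."""
    return os.urandom(length)


def came_from_escrow(candidate: bytes) -> bool:
    """Check whether some key material appears in the logged random data."""
    return any(candidate in data for _, data in escrow)


key = gen_random(16)
print(came_from_escrow(key))
```

If ransomware later shows up, `came_from_escrow` is the unwinding step: test candidate key bytes against the spool of random data that was logged as it was generated.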
If I'm going to get ransomware, I could unwind that spool of bits and say, hey, are any of these used in my crypto key? Luckily, with crypto functions, even if the ransomware has a static lib like OpenSSL built into the binary, a pretty good technique for finding crypto functions is constants. Everyone wants to use standard cryptographic APIs and functions that are provably secure, one way or another, by math people. Even the bad guys want that, so they're not going to be shipping their own stuff. So if we're able to raise the bar by monitoring the crypto, and monitoring the execution of anything that tries to do any kind of crypto on your box, then they're going to have to roll their own, which will probably be something crackable, and that gives us some lead time to get everybody saving all their data in the cloud, right? Yeah, I mean, the cloud's great. I'm sorry, but I love the cloud. Actually, it's a love-hate relationship.
Okay, so that was the key escrow. Now the ROP stuff is super straightforward. I'm just going to show the code, it's in here. ROP Defender. So that ROP Defender was actually executing too; they're all chained together. This ROP Defender is running at the same time as the key escrow guy. As for the perf overhead, once you've already done the exception pump, you can do a lot. You've already spent the cycles. You've already taken the hit, so you might as well do 10 things or 100, it doesn't matter.
This kind of ROP defense stuff: a few years ago people were talking about this, like kBouncer and these things. This is some code. Lots of people had the idea, hey, let's pair these things up, right?
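The pairing check from that line of work, that a return should land on an address immediately preceded by a call instruction, can be sketched as a byte-level heuristic. This is a simplified illustration, not the ROP Defender code from the repository: it recognizes only two x86-64 call encodings, and the name `is_call_preceded` is hypothetical.

```python
def is_call_preceded(code: bytes, base: int, return_address: int) -> bool:
    """True if the bytes just before return_address look like a call instruction.

    Simplification: only E8 rel32 (5 bytes) and FF D0..D7 (2-byte call through
    a register, no REX prefix) are recognized. A real check needs a disassembler.
    """
    offset = return_address - base
    if not 0 <= offset <= len(code):
        return False
    if offset >= 5 and code[offset - 5] == 0xE8:
        return True
    if offset >= 2 and code[offset - 2] == 0xFF and 0xD0 <= code[offset - 1] <= 0xD7:
        return True
    return False


# call rel32 ; nop ; pop rdi ; ret
code = bytes([0xE8, 0x00, 0x00, 0x00, 0x00, 0x90, 0x5F, 0xC3])
base = 0x1000

print(is_call_preceded(code, base, 0x1005))  # legitimate return site, right after the call
print(is_call_preceded(code, base, 0x1006))  # a "pop rdi ; ret" gadget, not call-preceded
```

Because it only matches bytes, this can be fooled by data that happens to contain 0xE8 at the right distance, which is one reason the check shrinks the gadget space rather than eliminating it.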
If you're doing a RET, there had better be a call instruction paired with that RET, or else it's invalid, so let's reduce the gadget space. That's when everyone started to talk about, oh, let's do LOPs and JOPs and Dot Drop, you know? So that was pretty straightforward. You just go, hey, what's the RSP and whatnot? And if anyone's a major coder and wants to get involved, you'll see things like, hey, the stack pointer is really nice to have because you can understand the depth you're at and everything else.
Okay, let me get my state here. Ah, that was that one. So yeah, enforced cryptographic key escrow. Good idea, I think. I want to know what's encrypted on my computer.
Coverage. Can you hear me now? I guess he switched networks.
Flame graph. I did a bunch of different graphs here. Here's one. One of the issues with this much data, you can imagine: if I'm logging every basic block that's executed in this binary, there's so much data, how are you going to visualize that? Turns out it's a really hard problem. That's why I've got some 3D ideas coming up, but these are three different graphs that are already built in. I did it with WPF and Microsoft's MSAGL. I actually have a bug filed with the MSAGL guys. As soon as that fix comes in, there's going to be a much cooler graph, which is called a graph map, and it's all navigable and expandable: you click on a block and it blows up like a spider web. It's kind of nice. But they depend on this janky .edu university thing that hasn't been updated in 15 years, and they're like, just wait, just wait, it's almost fixed. I'm like, okay, thanks for making it free.
Oh yeah, here we go. So this is a flame graph. All of these graphs are generated with just the stock data that's logged in the logging function, which is a fairly limited amount of bits.
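To make the flame graph concrete: it plots stack depth over time, one entry per executed block. A minimal sketch of that reduction follows, assuming each log record carries a stack depth and a block address; the talk does not spell out the record layout, so those fields and the sample log are made up.

```python
from itertools import groupby


def depth_runs(log):
    """Collapse (stack_depth, block_address) records into (depth, block_count) runs."""
    return [(depth, len(list(group))) for depth, group in groupby(log, key=lambda rec: rec[0])]


# Made-up trace: two blocks in an outer function, a callee with three blocks,
# a single-block leaf, then back out.
log = [
    (0, 0x1000), (0, 0x1010),
    (1, 0x2000), (1, 0x2008), (1, 0x2020),
    (2, 0x3000),
    (1, 0x2030),
    (0, 0x1020),
]

for depth, count in depth_runs(log):
    print("  " * depth + "#" * count)
```

Each printed row is one run of blocks at a given depth, so a wide row is a function that executed several blocks and a single mark is a leaf or a lone call.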
It's about 64, no, 128 bits per block. This is the stack depth over time, and then per block. So where there are three or four blocks horizontally at the same stack depth, that means that function had three or four blocks. And where you see just one, that was a leaf, or just a single call or something. That's how that thing looks. Unfortunately, the Perl scripts that are used to generate this stuff were generating gigabyte files, so I really needed to trim that down.
So again, the symbols are coming. I'm also waiting on Microsoft's PDB GitHub to be fixed up a little bit. It looks almost ready too. So it's coming. The source code commit's coming. Have fun trying to do your own thing. There are a lot of hints here at what to do and different ideas. I'd love to engage people on different concepts of analysis and modeling, and just understanding and comprehension of what's executing. Feel free to shoot me an idea of something you want me to do or think about.
Okay, here I'm going to do the hypervisor one. Okay, I got the thing reset. Okay, cool. This Aprep here, this EXE, is from the repro, and I committed this stuff before I realized the effect of it or how wide-reaching it was. I'm kind of jumping back and forth between the good and the evil a little bit here, code versus code, or us against the robots or something. So, this code here: frequently, if your code is in a hypervisor, or it's being virtualized or emulated, you may not want to execute.
So to have a neat, tiny amount of code like this that can tell you right away that you're being looked at, or that you're not in a native execution context, is nice. Or, if you just want to DoS an infrastructure, I guess that's possible. Some people like to do that.
The fun thing here is that the CPU utilization in the user-space VM monitor goes up to about 100% per thread you give it, and I gave this eight cores. Certain hypervisors have additional overhead on top of that on the kernel side, roughly 10%. So you can imagine that if a cloud vendor hasn't planned for that excess capacity, if they've overcommitted resources, they may be negatively impacted if this is going on like crazy on their box. I was really tempted to run this. A couple of times I saw the CPU utilization for even just one CPU say something like 350%, and I was like, oh man, did I overflow something in the percentage? Are they going to pay me now to DoS their infrastructure? That'd be kind of cool.
So, the CPU there is at 12. Out of my swipes. I don't know if you can see that. I think it says 760%. And you can see the graph at least. The graph's pegged. So it's kind of neat, it's kind of easy. With just one thread doing this, it's killed the box. Cool. It doesn't matter how many CPUs you give this thing; with just one thread, we're going to kill it. So that's kind of fun.
Feel free to figure out what's going on, because it affects different things. Emulation of a CPU is a complex thing, and with this tracing stuff we're talking about as well, either emulating or fighting the block to maintain control, or to maintain your understanding of what's going on, is not the easiest thing in the world. So as stuff gets more complex, you're always going to see these sorts of things creep in.
Basically, if anyone has any questions, I'll probably wrap up pretty quick here and we can talk about it, or look at any of the other artifacts. Thanks again for coming. Let me know if anyone has any questions. I'll give you a couple of minutes to think about it. Perfectly explained. I love it. Every time. Oh yeah, hey. Sure, dude.
So, for crypto identification, apart from constants?
Yes.
Have you looked at identifying blocks of code? Say I look at the disassembly of an RC4 function, or leading up to RC4, you see the key stream generation stuff. Have you looked at how you'd be able to identify those inline?
Well, the cool thing is, if you're doing the logging, you have this block-level telemetry coming from the app. So you could do some post-processing, like some of the perf guys do, to feed that understanding back in, but doing that at runtime might slow it down a lot. With RC4, it's a simple set of operations, right? It's not overly complex. You can mask those pretty easily. But in terms of what RC4 is using for its basis, that is, the key it's using, the input: if it's not able to access random, if we remove its entropy sources, then maybe it's not as important to know that per se, because then we can understand that this thing has a limited set of keys now, right? Its possible outputs are X, you know? So yeah, something like RC4 will be a little bit tough, but maybe you could do it with some of the graph detection. In the end, I hope understanding the inputs and reducing the entropy will be sufficient in some ways.
Cool, thanks.
Thanks, good question. Awesome. Anyone else? Well, hopefully we won't have any more ransomware next year, since we'll all have backup keys, right? I'd really appreciate that. Or insurance, I guess. Cool, thanks again, guys.