Please join me in welcoming Amanda Rousseau and Rich Seymour. Hello everyone, I'm here to talk to you about Finding Xori. It's malware analysis triage with automated disassembly. A little bit about ourselves: my name is Amanda Rousseau, also known as Malware Unicorn on Twitter. I'm currently a malware researcher at Endgame. My day-to-day activities are looking at malware every day, writing detections for it, teaching my intern something.

And then Rich. Hey, what's up? My name is Rich Seymour, also known by my hacker handle rseymour on Twitter. I'm a data scientist at Endgame, and I got the chance to work with Amanda on this project. I'm really psyched about it. We started it in January, and here we are about six months later. It's about 36,000 lines of code, and we're going to tell you about it, all right? Thanks.

So a little bit before we begin, I want to go over a quick overview of the current disassemblers of today, talk about what they do for PE right now, the functionality and features, and then go over usage and a demo. But first, I actually want to show you the tool before we get into it, so you know what the hell I'm talking about. Awesome. So that loaded up pretty quick. If you don't know what a disassembler is already, it takes the binary code and turns it into assembly for you, so you can analyze the malware, if you're looking at malware. So I'm just going to go ahead and drop this WannaCry sample in here really quickly, and it will disassemble it really quickly and put it on the screen, which is browser-based. So it's cross-platform for you to run on any Linux, OSX, or Windows distribution, and you can play around with it. You can see here's the kill-switch thing that MalwareTech registered, as well as linear disassembly — let me make this a little bit bigger for you. So it's all there for you to play with in the UI. No crazy buttons, it's really easy to mess with, it has click-and-drag. We made this in two weeks for you.
We use it for automated disassembly, but we wanted you to have a tool that you could play with in the browser on any platform and get used to looking at disassembly. So I'm going to switch back to my slides and continue the presentation here. Now that you've seen what the hell it is, we can talk about what it does and how it compares to other disassemblers.

So what is the problem? My boss told me to look at 1,000 samples in three days. And I'm like, there's no way I can look at 1,000 samples quickly enough and triage them, sitting there copying one sample at a time into a VM, opening it up in IDA, and analyzing it. So what did we do? We had to think of a better way to scale our automation. We wanted to take our pipeline, which right now handles training our models, and add disassembly to it, which is something we've wanted to do at Endgame since before I was at Endgame. It lets us take our sample set and turn it into disassembly features that we can read — right now it emits JSON, but you could serialize it to any format and integrate it into your own pipeline.

So when we thought about doing automation, we wanted to try the disassemblers that were already out there — open source, or a library — and compare them across what we needed. We were thinking about how big the code was, how big it was in memory, whether it was stable, what the price was, whether it was cross-platform, whether it was easy to use, whether the output was accurate, and whether we could integrate it into our existing services. So I have this little chart with good, okay, and not-so-good, comparing Capstone, Radare2, IDA Pro, Hopper, and Binary Ninja, which are pretty much the common disassembly libraries out there. So, requirements: we needed fast development. We needed stability and resilience, because if we're running through a bunch of services, we need it to spin back up again.
It had to be cross-platform, because we're not just using it on Windows — we're using OSX, and we're also running it on a server. The output had to be easily integrated into our services, and JSON is really useful for that. It had to be really easy to use, it had to have a core feature set, and the output had to be accurate. All right, so the first step was diving into the code of all of these disassemblers, trying to pick and choose the pros and cons of each, and building that into something we needed in our own internal framework. So we adopted different aspects of what Capstone does, since it's based on the LLVM and GDB repositories, plus a little bit of the emulation that QEMU does, and we figured out how to put that logic together in Rust, fixing some bugs along the way.

So, evaluating an example. Say you have this x86 32-bit instruction here. Typically the 66 represents the operand-size prefix, which tells you whether the operand is smaller or larger than the default, and then you have the opcode, which tells you what instruction it actually is. So how many of you are familiar with opcode 90? Yeah, it's a NOP, right? NOP, NOP, NOP. But really, this one is actually xchg ax, ax in objdump and LLVM, because that operand-size prefix is there. So Xori does it correctly; Capstone doesn't do it correctly, and diStorm doesn't do it correctly. These minor differences between disassemblers are going to give you different output when they disassemble the instruction. xchg ax, ax effectively acts as a NOP, but technically it isn't one, based on the CPU, right?

So here's where I ask: who here knows what Rust is? Okay. And who here has programmed in Rust? Okay, that looks like 12 people, maybe. That sounds great — I'm glad everyone's going to be able to contribute to this project. The thing we want to emphasize, besides the bullet points on the screen, is that I didn't know Rust, really.
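To make that 66 90 example concrete, here's a minimal Rust sketch — my own illustration, not Xori's actual decoder — of why tracking the operand-size prefix changes the result:

```rust
// Decode a tiny subset of x86: opcode 0x90, with or without the 0x66
// operand-size prefix. 0x90 alone is "nop"; with the prefix, the operand
// size drops to 16 bits and the instruction is "xchg ax, ax".
fn decode(bytes: &[u8]) -> String {
    let mut operand_size_override = false;
    for &b in bytes {
        match b {
            0x66 => operand_size_override = true, // operand-size prefix seen
            0x90 => {
                return if operand_size_override {
                    "xchg ax, ax".to_string()
                } else {
                    "nop".to_string()
                };
            }
            _ => return "unknown".to_string(),
        }
    }
    "incomplete".to_string()
}

fn main() {
    println!("{}", decode(&[0x90]));       // nop
    println!("{}", decode(&[0x66, 0x90])); // xchg ax, ax
}
```

A disassembler that throws the prefix away before looking at the opcode will report "nop" for both inputs, which is exactly the discrepancy described above.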
I mean, I'd been tracking its existence, but that's a big difference from diving into writing a disassembler in a few months. Rust gives you all these nice things: it protects you from stomping on the OS from your stack, it handles memory much more cleanly than any programming language I've used, and it's really fast. And the oddest thing about it is that it was really fast for us to develop in, because all of the really dumb bugs that bog you down for a week — and then it's another week, and then it's the next month, and you're like, why can't I get this to work — basically get caught by the compiler not letting you do it. So it's really helpful, and I suggest you learn it.

All right, so, current features. Yes, it's open source. We were really hell-bent on making it open source, so you can contribute now. How many contributions did we have as of last night? We've got 180-some stars, we've got two contributions, a lot of issues, and it's only been live for two days. So it's really bumping and grinding. Yeah, okay. Currently it supports the x86 32-bit and 64-bit architectures — I only had time for that, since my job requires PE executables. It displays strings from referenced memory: if you're familiar with the FLARE plugins where, if code is pushing strings onto the stack, they emulate it and produce the string for you — that's already built in. It manages memory, and it outputs JSON for your service consumption. It has two modes: light emulation, which means it handles the registers, the stack, and some instructions; and full emulation, which I don't recommend because it's really slow right now, but you can use it if you want to get through a packer or some type of encryption routine. I was also able to simulate the thread environment block as well as the process environment block, so that you can do dynamic API call loading, and evaluate functions from DLL exports so those can be populated in the code itself.
All right, so the design. We have a PE loader, similar to what Windows does for loading a PE. It's very basic: it loads the TEB and PEB, as well as the DLLs from the PE image header. There's also a memory manager that manages all of those memory spaces, so in case the malware tries to access an address outside of those memory spaces, it won't allow it; the analysis struct, which contains all the functions of the disassembly and all the imports; and finally, as you analyze through all the different sections of code, a state which tracks the CPU registers and all the flags, its own version of the stack, and loop tracking, which we'll get into later, right?

So if you're wondering how a chump like me contributed to this project: I decided not to give in to not-invented-here syndrome on a PE parser. I used this great library called nom, which I think originated in France. It's a parser combinator framework, and it allowed us to parse the PE headers rather safely. Who here has ever looked at a PE header? You know, MZ is a real person's initials. And the thing about headers — if you've looked at Corkami's excellent work on describing what this stuff is to people — is that there's a lot that's useful for the machine to run, like getting the imports correct and the tables and the entry point and all that. But there's also a lot of stuff that's just kind of retro garbage from the DOS days, the 16-bit Windows days, the 32-bit Windows days, and the 64-bit Windows days. So the cool thing about this parser is that it basically takes everything, so we can data-science the heck out of it later. If we want to figure out that a corrupted DOS stub means it's some threat actor we've never heard of, we can do that now. So I'm pretty excited about it, and yeah, that's my little thing. So the information that we take from the PE header is used to build the PE image into memory.
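As a hedged illustration of the very first step of that header parsing — this is a plain-Rust sketch, not Xori's actual nom-based code — here's what checking the DOS header and following e_lfanew to the PE signature looks like:

```rust
// Minimal DOS-header walk: verify the "MZ" magic at offset 0, read the
// 4-byte e_lfanew field at offset 0x3c, and confirm the "PE\0\0"
// signature sits at that offset.
fn parse_dos_header(image: &[u8]) -> Option<u32> {
    // need at least the full 0x40-byte DOS header, starting with "MZ"
    if image.len() < 0x40 || &image[0..2] != b"MZ" {
        return None;
    }
    // e_lfanew is a little-endian u32 at offset 0x3c
    let e_lfanew =
        u32::from_le_bytes([image[0x3c], image[0x3d], image[0x3e], image[0x3f]]);
    let off = e_lfanew as usize;
    // sanity-check that the PE signature is actually there
    if image.len() >= off + 4 && &image[off..off + 4] == b"PE\0\0" {
        Some(e_lfanew)
    } else {
        None
    }
}

fn main() {
    // a 0x48-byte toy image: MZ magic, e_lfanew = 0x40, "PE\0\0" at 0x40
    let mut img = vec![0u8; 0x48];
    img[0] = b'M';
    img[1] = b'Z';
    img[0x3c] = 0x40;
    img[0x40..0x44].copy_from_slice(b"PE\0\0");
    println!("{:?}", parse_dos_header(&img)); // Some(64)
}
```

A parser-combinator library like nom handles the error cases and the dozens of remaining header fields far more cleanly, but the offsets are the same either way.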
So that means building all the sections, making sure their read/write/execute permissions are tracked as well; building the stack; building the DLLs into memory in the right locations — you can actually configure that: we have a little JSON config file where you can set whatever address offset you want for the DLLs or the TEB and PEB — as well as building the TEB and PEB in memory. What I meant by loading the DLL exports is this: we have a giant JSON of all the DLL exports from, like, Windows 7, and it pulls all of the API calls out of there, so you don't have to keep using the actual DLLs. You can just use this giant JSON, and it'll load it up, rebuild all of the addresses as they would be in the PE image, and fill in all those addresses for you. So if the sample is doing any dynamic API calling, and it moves an address into a register or into a new memory location, that address will actually be there, simulated.

All right, so like I said, to deal with dynamic API calls you need the TEB and PEB, so we recreate those for a Windows 7 environment. It handles memory, so any time the sample accesses one of those, you can choose whether or not to allow it, and it manages all of those addresses on the stack, which is nice. Let me explain what I mean by this. We have this shellcode doing some interesting stuff. In the header imports, the only API calls it's actually importing are ExitProcess, GetLastError, GetLocalTime, and GetModuleHandleA — we don't actually see LoadLibraryA, VirtualProtect, or ShellExecuteA, right? So here you can see it's taking this string, "LoadLibraryA", and pushing it onto the stack. From there, it's going to call this function right below here and get that address off of the PEB. That means it traverses all of the DLLs, as well as the names from the PE headers, in order to collect that address offset. Once it's done, it's going to put that address into EAX and then move it into 0x401004, that pointer.
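The effect of that lookup — resolving an export name to a simulated address that was laid out in the fake PE image — can be sketched like this. The DLL names and addresses here are hypothetical, and this is an illustration of the idea, not Xori's real tables:

```rust
use std::collections::HashMap;

// A map from (dll, export name) to a simulated address, standing in for
// the giant exports JSON: dynamic API resolution during emulation can be
// answered from this table without loading any real DLLs.
fn build_export_map() -> HashMap<(&'static str, &'static str), u32> {
    let mut exports = HashMap::new();
    // addresses as a config file might lay them out in the fake image
    exports.insert(("kernel32.dll", "LoadLibraryA"), 0x7677_1234u32);
    exports.insert(("kernel32.dll", "VirtualProtect"), 0x7677_2468u32);
    exports
}

// The GetProcAddress-style lookup the emulated shellcode relies on.
fn resolve(
    exports: &HashMap<(&'static str, &'static str), u32>,
    dll: &'static str,
    name: &'static str,
) -> Option<u32> {
    exports.get(&(dll, name)).copied()
}

fn main() {
    let exports = build_export_map();
    let addr = resolve(&exports, "kernel32.dll", "LoadLibraryA").unwrap();
    println!("{:#010x}", addr); // 0x76771234
}
```

In the real tool, that address has to match where the fake DLL image was built in simulated memory, so the value the shellcode moves into its pointer is consistent with everything else the memory manager tracks.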
And then down here, it's actually going to call that newly moved address to load shell32.dll, so it can also load ShellExecuteA afterwards. So in reality, the dynamic imports are LoadLibraryA, VirtualProtect, and ShellExecuteA. This is actually the output from Xori, and you'll be able to see all of that happen in there with the light emulation.

All right, so the TEB and PEB — I didn't do anything really crazy. There are so many resources out there that lay out the whole struct for the TEB and PEB. Some versions of Windows are going to be different, but I just used whatever was available out there. You don't need all the hidden features of the struct at all — here it's just, like, the top of it, for example. The nice thing about Rust is that you can take a struct and serialize it into a binary array or a binary vector, and have that vector accessed by the memory manager as if the assembly instructions were accessing those addresses on the actual OS, which is really nice. So I had to figure out how to do a linked list in serialized bytes and put that all together into a structure that the shellcode could use.

All right, another thing that helps with the analysis is this thing called the analysis queue. It takes a branch instruction, like an if/else, where there's a call and two different directions the code could take. The right-hand side of the branch goes to the top of the queue, while the other part of the branch goes to the bottom of the queue, which helps the code analysis follow the actual control flow for you. So it follows both directions. I also introduced loop handling for the emulation. Say you run into an infinite loop, and you know it's an infinite loop, but you want to short-circuit that loop case — there's a configuration there for you to short-circuit all of the looping.
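The queue discipline just described can be sketched with Rust's VecDeque — a hedged illustration with made-up addresses, not Xori's actual implementation:

```rust
use std::collections::VecDeque;

// At a conditional branch, the taken side is pushed to the front of the
// queue and the fall-through side to the back, so analysis follows the
// branch target first while still visiting both paths eventually.
fn main() {
    let mut queue: VecDeque<u64> = VecDeque::new();

    // hypothetical addresses: a jcc targeting 0x401050, falling through
    // to 0x401010
    let (taken, fall_through) = (0x401050u64, 0x401010u64);
    queue.push_front(taken);       // branch target goes to the top
    queue.push_back(fall_through); // the other side goes to the bottom

    while let Some(addr) = queue.pop_front() {
        println!("{:#x}", addr); // 0x401050, then 0x401010
    }
}
```

In the real tool the queue entries would carry more state than a bare address (and the loop short-circuit config caps how many times the same block gets revisited), but the front/back discipline is the part that makes the analysis follow control flow.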
Say you only want to do it 4,000 times — you can set that. I don't recommend using the full emulation mode, but if you want to, you can play with it. I really need to improve it a little bit once I read a couple of research papers. And then finally: signature analysis. It's a way of adding annotations to your code so that you don't have to be able to read hex like Amanda — you can just get a nice signature to tell you what the function was. I spent the last week and a half of my life trying to reverse-engineer IDA's SIG format so we could use it. I failed catastrophically, in that those formats are always changing and it's very hard to keep up. That said, we do have signature matching in Xori right now, but we're always looking for help to make it better, and luckily some folks are already stepping up. Yeah — oh, it's me. Oh my goodness, we're prepared.

So this is the cool part for me, as a data scientist, not a reverser. We took a thousand EMBER samples — EMBER is a data set of hashes our company released. We took the actual binaries and ran them on this old Dell box that was under my desk. The only new thing about it was the SSD, and it could get through all these samples in 20 minutes. So that was a thousand samples in 20 minutes, about 1.25 seconds a sample, running in parallel, maxing out all the CPUs, and it just dumps a huge amount of data — sometimes too much data, if the binary's really big. And then you can do whatever you want: you can grep through it, you can find function names, you can find imports, you can do graph analysis — over in the AI Village there's a bunch of people who think they can do a lot of cool things with it. So I'm really psyched about this.

All right, examples. It's really easy to build Xori. All you need to do is make sure you install Rust, run cargo build --release, then something like ./target/release/xori -f wannacry.exe, and it'll produce some results.
So if you don't want to use the analysis part of Xori, you can just use the disassembler part of it, throw it a nice chunk of bytes like this, and it'll disassemble this Hello World program. But that doesn't really show you any strings — it just gives you the disassembly output. If you use the analysis part, it will actually populate all those strings, like when one is popped off the stack, show you where a function ends, and annotate all of those nice, lovely comments for you. Comparing Xori with IDA Pro on the WannaCry ransomware file, you can see we don't have all of the nice argument enumerations yet, but, you know, six months — that's what we could do. You can also see the kill-switch string, like I showed you in the beginning. This is just the normal output from the terminal, but we have the UI for you to play with, so you don't really have to use this part of it. For development, though, if you want to use it from the terminal, you can.

Here's Radare2, comparing the two different functions here. It's kind of weird, because Radare2 does this thing where, if it identifies an API call within, like, the first couple of instructions of a function, that's what it names the function. In this case we don't try to do that — it's just a regular function call. But there's parity between the two. I showed you the demo earlier, but I can show it to you again with a different sample; I'll just do a normal bin file. So it does PE and raw binary. This is the same Hello World program — it's really fast. If you just want to see the linear disassembly, you can see it populates Hello World, and you can see the function in the graph view as well. Or, if you want to do something more fun with a bunch of jumps — oh, actually, wrong one. You can see how fast it is. This one was Rombertik; it's actually part of my RE102 course. The longest loading time is actually the libraries there.
But you can see this was a fun function, and it populates everything there for you. I don't have any string analysis here, since this was built for automation, but hopefully soon it will have all of that in there for you. So I made it look really cool and spent the whole weekend on that — I hope you guys like it. If you want to go download the code, it's already on github.com/endgameinc/xori. People are already playing with it; it's there for you to play with. Thank you.