Hello everyone. Good afternoon. My name is Alexandre Borges. I'm a malware and security researcher at Blackstorm Security. Let's talk about malware, deobfuscation, and emulation. It's a very short presentation, only 190 slides, so let's go. This is my profile, and this is my agenda for today. We will see something about anti-reversing and the METASM framework. We will see something about Miasm, Triton, and Radare2. We will touch on DTrace on Windows and, finally, a very interesting trick used for anti-virtual machine detection. Let's start.
I've been analyzing malware for many years. Honestly, most of the time you see packers such as ASPack, Armadillo, PEtite, and so on. It's okay. We know how to unpack them. Most of them are pretty easy to unpack. We also know the main memory APIs, like these ones in red. We also know how to use a debugger. We know how to set up some breakpoints. We know how to dump an unpacked binary from memory and how to dump injected code from memory. We can use very special tools like pe-sieve from hasherezade. It's an excellent tool. We can use Volatility to dump injected code and unpacked code from memory and fix its import address table by using, for example, a very special plugin named impscan. It's so easy to do that.
We can also write very short scripts for the debugger. For example, in this case, I wrote a very short script for x64dbg to dump unpacked and injected code from memory. I tried to comment it line by line to help you later. We have one, two, and three slides containing these scripts. It's pretty easy to do that.
However, the real world is a bit different. For example, yes, I can write plugins for IDA Pro. It's so common to do that. I can write new loaders, for example, to load a new kind of binary such as an MBR file. But modern packers have been using different anti-reversing tricks to make our lives harder.
Some packers, such as VMProtect, Themida, Arxan, and Agile.NET, have been using different tricks that are hard to bypass. Our goal here is to try to explain what I can do to circumvent some of these tricks. Most of these packers have been focused on 64-bit code, and they use many different tricks. For example, these advanced packers remove the import address tables and, of course, they try to encrypt every single string inside the code. They try to protect the memory using a kind of memory checking. In this case, it's almost impossible to modify or dump the whole code from memory because the code is encrypted in memory. These packers use very interesting tricks when protecting .NET code, for example, renaming classes, methods, fields, and so on.
But much worse, these packers virtualize Intel instructions into a very special kind of instructions. I mean, Intel instructions are virtualized into a virtual machine environment, sometimes using RISC code. In this case, it's so complicated to bypass. Additional tricks are used by these advanced packers. All instructions are encrypted, unfortunately. Most of the obfuscation is stack-based, so it's hard to handle statically. The virtualized code is polymorphic. In this case, for example, one Intel instruction can be mapped to several different virtualized instructions. We have lots of fake push instructions. We have a lot of dead code. We have code reordering, so it becomes a kind of spaghetti code. We have a very special trick named code flattening. I will speak about that later. And we have tons of anti-debugger and anti-virtual machine techniques.
We don't know anything about the internals of these virtual machines. These protectors have been using very special, private virtual machines to translate the Intel code into virtualized code. We don't know anything about that.
However, we know that the general idea is always the same. Instructions must be fetched and decoded. We have to find the pointer to the handlers. And finally, we have to execute each one of these handlers. That's the general idea. I have an instruction. This instruction is decoded. And once the instruction is decoded, the dispatcher picks one of these handlers to process the instruction. The real world is a bit more complicated because the virtualized instructions are organized into an array. In this case, you won't see many opcodes in the virtualized code. You will see indexes. The indexes point to the encrypted instructions. Once an instruction is decrypted, the opcode points to a function pointer. And finally, from the function pointer, the handler is executed.
Additionally, we have other anti-reversing techniques being used. For example, constant unfolding. In this case, these advanced packers take one constant and turn it into several constants. We have pattern-based obfuscation. I will talk about that later. We have many inlined functions inside the code. We have tons of anti-virtual machine techniques. We have tons of garbage code. And of course, we have a lot of duplicated code.
Furthermore, we have other tricks being used. In this case, for example, we have control indirection. There are some very interesting tricks here, for example, using the return instruction to skip code. These packers use exceptions. These packers also use opaque predicates. Opaque predicates are a very old trick. It's basically a pair of back-to-back JZ and JNZ instructions. Apparently, you have a conditional jump, but in fact the same branch is always taken and the other one is never taken. I will talk about that later, too. I have a very educational example here.
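The fetch, decode, and dispatch idea described in this part of the talk can be pictured with a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the XOR key, the three opcodes, the handler names, and the table layout are hypothetical and do not describe any real protector. It only shows the shape of the mechanism: indexes select encrypted virtual instructions, each one is decrypted, the opcode selects a function pointer, and the handler runs.

```python
# Toy virtual machine: all names, opcodes and the key are hypothetical.
KEY = 0x5A  # made-up XOR key that "encrypts" the virtual instructions


def h_push(vm, arg):
    """Handler: push an immediate value on the virtual stack."""
    vm["stack"].append(arg)


def h_add(vm, arg):
    """Handler: pop two values, push their 32-bit sum."""
    b = vm["stack"].pop()
    a = vm["stack"].pop()
    vm["stack"].append((a + b) & 0xFFFFFFFF)


def h_halt(vm, arg):
    """Handler: stop the dispatcher loop."""
    vm["running"] = False


# Opcode -> function pointer table used by the dispatcher.
HANDLERS = [h_push, h_add, h_halt]


def encrypt(program):
    """Encrypt (opcode, argument) pairs with the toy XOR key."""
    return [(op ^ KEY, arg ^ KEY) for op, arg in program]


def run(instr_table, indexes):
    """Fetch by index, decrypt, then dispatch to the matching handler."""
    vm = {"stack": [], "running": True}
    pc = 0
    while vm["running"]:
        enc_op, enc_arg = instr_table[indexes[pc]]  # fetch through an index
        op, arg = enc_op ^ KEY, enc_arg ^ KEY       # decrypt / decode
        HANDLERS[op](vm, arg)                       # dispatch to the handler
        pc += 1
    return vm["stack"][-1]


# The array is stored out of order; only the index list gives program order.
TABLE = encrypt([(2, 0), (0, 4), (1, 0), (0, 7)])
INDEXES = [1, 3, 2, 0]  # push 4, push 7, add, halt

if __name__ == "__main__":
    print(run(TABLE, INDEXES))
```

In this toy, reading `TABLE` alone reveals neither the opcodes nor their order, which is a small-scale version of why the virtualized code shows indexes rather than recognizable opcodes.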
I know that we don't have enough time to see and learn how to write IDA Pro plugins here, but I left some step-by-step procedures to help you later with setting up your environment to write IDA Pro plugins using Visual Studio. In this case, we have one, two, and three slides explaining how to set up your environment. I have been using IDA Pro plugins to handle these complex anti-reversing cases. Of course, we don't have enough time here, unfortunately, but I wrote a very simple IDA Pro plugin. This plugin is so basic, and I tried to comment it line by line to help. This plugin aims to find web links inside a binary. I wrote this plugin here, I tried to comment every single important instruction, and I executed it. As you see, it works. I'm able to find, for example, web links inside a binary. You can also use processor modules. They are an extension of plugins in IDA Pro. It's hard to explain that now, but I left some very basic explanation here to help.
But returning to our problem, imagine this program. It's a very, very simple program written in the C language. C is quite easy. And this program has a linear execution. I mean, the execution is straight, only these blocks. Obviously, we don't have any kind of obfuscation here. When we disassemble it in IDA Pro, the code is simple, too. But my idea is to use a very old anti-reversing trick named code flattening. When I apply the code flattening technique to this code, I transform it from linear execution to multi-branch execution. You know, it's the same code. But in this case, we have a linear execution, and in this case, we have a multi-branch execution. When I open this code in IDA Pro, I will see a multi-branch disassembly. You can try and make some tests using code flattening. For example, I used Obfuscator-LLVM to do that. I left the complete procedure about how to install it.
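To make the linear versus multi-branch contrast concrete, here is a minimal sketch in Python rather than C. It is not the program from the slides and not the output of any obfuscator; the function names and state numbers are hypothetical. The second function is a hand-flattened version of the first: every basic block becomes a case selected by a state variable inside one dispatcher loop, so the original order of the blocks is no longer visible in the control-flow graph.

```python
def linear(n):
    """Original, linear control flow: sum of 0 .. n-1."""
    total = 0
    i = 0
    while i < n:
        total += i
        i += 1
    return total


def flattened(n):
    """Same computation after hand-made control-flow flattening.

    Each basic block is a case of a dispatcher driven by `state`
    (the state numbers are arbitrary, chosen for this illustration).
    """
    state = 0
    total = 0
    i = 0
    while True:                      # dispatcher loop
        if state == 0:               # block: initialization
            total, i = 0, 0
            state = 1
        elif state == 1:             # block: loop condition
            state = 2 if i < n else 3
        elif state == 2:             # block: loop body
            total += i
            i += 1
            state = 1
        elif state == 3:             # block: exit
            return total


if __name__ == "__main__":
    print(linear(5), flattened(5))
```

Both functions return the same value for every input. Only the shape of the control flow changes, which is what makes the flattened disassembly harder to read.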
At the bottom, I show you how to run this kind of obfuscator. And as you see, we got the same code, obfuscated. Pay attention, look at that. The same code, but this time we have a multi-branch execution. Obfuscation is widely used in the industry. It's hard to circumvent, but we have some techniques to do that. This is an overview. And this is the decompiled code. Look at that. Originally, we had only one while statement. Now we have one while statement, then three while statements, and one more while statement.
This is a very short example of opaque predicates. Take a look. We have two conditional jumps, JZ and JNZ instructions, back to back. However, the XOR instruction above them establishes that only the jump-if-zero is taken. The other jump is never taken. It's a very old trick when you are handling anti-reversing tricks.
This is a very simple shellcode. At the top, in red, we have the decryption routine. Very simple. I wrote a very simple IDA Pro script and decoded it statically, and you can see the decrypted shellcode in light blue below.
Another widely used anti-reversing trick is called stack manipulation. Apparently, it's very easy code to read. However, the RET instruction is not a true return. This return instruction is skipping these blue instructions, and the true return instruction is this one. This is a technique used a lot by this kind of advanced protector.
So, as I promised you, I will show some tricks to handle different kinds of obfuscation. Let's take a very educational sample. Imagine this scenario. I have just one instruction in green, add eax, ecx. Once this instruction is obfuscated, we have the second stage in yellow. Obfuscated again, we have the third stage in orange. Finally, we have the last stage, stage four, in red. The question is, how can I reverse the process? How can I start from stage four and find that all of these instructions are equal to just one instruction, add eax, ecx?
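As a small illustration of what such a staged obfuscation looks like, the sketch below uses a hypothetical obfuscated sequence. It is not the stage-four code from the slides, which the transcript does not reproduce. It runs the sequence and the single `add eax, ecx` through a tiny interpreter written only for this example. The interpreter, the instruction tuples, and the helper names are all assumptions. The check is numerical: it compares the two programs on random 32-bit inputs, which gives evidence of equivalence but not a proof. A symbolic engine is what establishes the result for all inputs.

```python
import random

MASK = 0xFFFFFFFF  # keep every value inside 32 bits, like a 32-bit register

# Stage one: the original single instruction.
ORIGINAL = [("add", "eax", "ecx")]

# Hypothetical obfuscated form (NOT the sequence shown in the talk):
# eax - (~ecx) - 1 == eax + ecx (mod 2**32), with ebx saved and restored.
OBFUSCATED = [
    ("push", "ebx"),
    ("mov", "ebx", "ecx"),
    ("not", "ebx"),
    ("sub", "eax", "ebx"),
    ("dec", "eax"),
    ("pop", "ebx"),
]


def run(program, eax, ecx, ebx=0xDEADBEEF):
    """Concretely execute a tiny subset of x86-like instructions."""
    regs = {"eax": eax & MASK, "ebx": ebx & MASK, "ecx": ecx & MASK}
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(regs[args[0]])
        elif op == "pop":
            regs[args[0]] = stack.pop()
        elif op == "mov":
            regs[args[0]] = regs[args[1]]
        elif op == "not":
            regs[args[0]] = ~regs[args[0]] & MASK
        elif op == "add":
            regs[args[0]] = (regs[args[0]] + regs[args[1]]) & MASK
        elif op == "sub":
            regs[args[0]] = (regs[args[0]] - regs[args[1]]) & MASK
        elif op == "dec":
            regs[args[0]] = (regs[args[0]] - 1) & MASK
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return regs


def equivalent(trials=1000, seed=2019):
    """Compare both programs on random inputs (evidence, not a proof)."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, c = rng.getrandbits(32), rng.getrandbits(32)
        if run(ORIGINAL, a, c) != run(OBFUSCATED, a, c):
            return False
    return True


if __name__ == "__main__":
    print(run(OBFUSCATED, 4, 7)["eax"], equivalent())
```

Comparing the whole register dictionary, rather than only `eax`, also checks that the obfuscated sequence leaves `ebx` and `ecx` untouched, which is part of what "equivalent to one instruction" has to mean.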
First, I will use a very interesting framework named METASM. METASM is an amazing framework. METASM supports different platforms, and I show you how to install METASM at the bottom, step by step. I tested it, and let's go. Take a look. I picked up all of these instructions from stage four, and I start here. My choice is 32 bits, and I tried to comment line by line, block by block, to help later. Basically, the key point here is the yellow line, because I'm initializing the backtracking engine, and I will try to solve our problem symbolically using the opcodes. I'm logging each executed instruction, and finally, I'm showing only the effective instructions here. I run our code. Our code was written in Ruby. There we have our initial obfuscated code, and if you take a look, we have the instructions being executed one by one, with all the register information and flags, and finally, we have EAX equal to EAX plus ECX. As you see, it's possible to deobfuscate all our obfuscated code and find that the whole bunch of obfuscated instructions is equal to add eax, ecx. So easy. The effective instructions are those ones.
Additionally, I tried emulation using a very nice combination of Keystone and uEmu. uEmu is an IDA Pro plugin. Once again, I show you how to install Keystone here, and this time, my choice was to write a C program. I tried to comment line by line again to help. If you take a look, the key lines are the second one and the third one. In the second one, I'm creating a Keystone engine, and in the third one, I'm starting to use the Keystone engine. Keystone is an assembler, so I'm assembling the instructions into opcodes in hexadecimal. I executed it, and in the middle, I got the hexadecimal equivalent of our instructions. I saved it in a file named defcon2019.bin. To prove that I was right, I wrote another program, this time using Capstone. Capstone does the inverse process.
This time, I'm disassembling. I inserted our output from the last slide. I wrote a new program using Capstone. I'm programming here in the C language, and I proved that I was right. I got the same code that I had inserted.
But returning to our problem, I opened our file, defcon2019.bin, in IDA Pro. It opened perfectly, and I used uEmu. I show you how to install uEmu at the top. I set EAX to 3, I set ECX to 6, and I emulated using uEmu. Finally, I found that EAX is equal to EAX plus ECX, 9. In this case, I solved it numerically.
I also used Unicorn. Unicorn is an emulator. Once again, I wrote a very simple program in the C language. I inserted our hexadecimal here. I created a macro named defconcode. I tried to comment this program line by line or block by block. My comments are in light blue. Here, we have two key points. First, I set EAX to 4, I set ECX to 7, and I set the stack size. Then I start the emulation here, in the first line in yellow. Once more, I executed our program. Our initial values are EAX 4 and ECX 7. The instructions are executed one by one, and our final result is EAX equal to EAX plus ECX, 11 in decimal.
I've also used Miasm. Miasm is another amazing framework to deobfuscate code. Miasm works well on different platforms. Once again, I show you how to install Miasm step by step to help you. I tested Miasm. It generates a very nice graph, but let's return to our problem. I opened our file, defcon2019.bin, in the first lines. I set 32 bits. I set our just-in-time engine to LLVM. I set our initial address, the initial value of EAX to 3, the initial value of ECX to 6, and I set up the final breakpoint here. Once I run it, we have our code disassembled. Each instruction is executed line by line, and finally, I have our numerical result here. A nice graph. Additionally, I also solved our problem using symbolic execution. It's almost the same. The only change is at the bottom.
I'm using a symbolic execution engine here. I execute our program once more, instruction by instruction, and finally, I have EAX equal to the initial EAX plus the initial ECX. As you see, I'm able to deobfuscate a very simple piece of code by using different frameworks.
I've also used Triton. Triton is another very interesting platform. Triton supports the x86 and x64 architectures. Triton supports symbolic execution. In this case, I'm able to emulate only part of my program, but I can use concolic execution to analyze the whole program. Once again, I show you how to install Triton, in this case without using Pin from Intel, and on the next slide using Pin from Intel. Step by step, it's working. I wrote a very simple program in Python. I inserted our code here in hexadecimal. I tried to comment again, line by line, block by block, in blue, to help. This part is very interesting because some people don't know how to convert from instructions to opcodes in hexadecimal. I show you how to do that using rasm2 from Radare2. Of course, you can use IDA Pro, Ghidra, and so on to do that, but I show you how to do it using Radare2. So, I executed our Python program using Triton, and I was able to solve our problem symbolically. Once again, all instructions are shown, and finally, I know that all the obfuscated code is equivalent to a simple add operation.
I also tried to solve it with a numerical approach using the same Triton framework. I wrote another program in Python. I inserted the same obfuscated code there, stage four. However, this time, I set up the initial value of each register: ESP, EBP, EAX, EBX, ECX, EDX. I set up the entry point address, and again, I started the processing. Each instruction is executed one by one, and finally, at the bottom and in yellow, we have the answer.
We can use Radare2 to handle our problem. I started Radare2 using 32 bits. I enabled emulation in Radare2. As you can see, there are many ESIL comments here.
I set EAX to 7 and ECX to 2. I set the breakpoint here, and I run our program. As you see, EAX is equal to EAX plus ECX, 9. We can integrate Radare2 with Miasm. In this case, Miasm is the working engine, and Radare2 reads the Miasm results and shows them to you as ESIL comments. It's quite easy to do that. I show it here, step by step, and I run Radare2 with Miasm integrated. As you see, we can see all the comments here coming from Miasm, but translated into ESIL comments in Radare2.
DTrace is outstanding, too. DTrace was introduced in Solaris 10 about 15 years ago. It's an amazing dynamic tracing tool, and DTrace was ported to Windows recently, a few months ago. Honestly, DTrace is a set of probes. Basically, probes are scattered throughout the kernel, and each time a system call is called, the probe is triggered. The scripts in DTrace are written in the D language. It's a very unusual language, the D language. The general format of a probe is provider, module, function, and name. The provider is the library, the module is the kernel module, the function is the system call, and the name is the name of the probe. I show you how to install DTrace on Windows 10. It's quite easy to do that.
And here, I show some commands using DTrace. I list all the probes using dtrace -l. I list all providers except the system call and ETW providers. I can list all probes related to system calls, in this case, read system calls, write system calls, and view system calls. It's quite easy to do that. This is a very short example of DTrace. In this case, I'm counting the number of times that each system call is called when I'm using Notepad, for example. Another very interesting example: I can list all running processes on my machine. In this case, I ran this very simple command line using DTrace, and I got the running processes on my machine. This is a slightly more complicated sample.
In this case, I'm counting the number of times that each system call is called by Chrome during only five seconds. In another example here, I'm listing the number of times that each system call is called on my whole machine. DTrace has a very interesting provider named function boundary tracing. Using function boundary tracing, I'm able to trace the calls being made in kernel land. To do that, I need to attach a kernel debugger to Windows 10, and this in fact enables function boundary tracing. For example, in this case, I can use WinDbg to attach to Windows 10 and enable function boundary tracing. Look at that. I can trace all calls related to NTFS for the WinRAR program. It's quite easy. I've been using that to analyze some modules.
Of course, unfortunately, DTrace is so new on Windows 10 that some problems happen. In my case, my system crashed, and I provide you a very short investigation of the problem. In this case, there is a kernel module named traceext.sys. That's the main module of DTrace. I show you step by step how I investigated the problem. And finally, I found that the guilty instruction is that one in green. I didn't have enough time to notify Microsoft, but that's it.
Finally, a few months ago, I saw a very strange anti-virtual machine technique being used by an advanced protector. Usually, most malware samples have been using different anti-virtual machine techniques to detect virtual machines such as VMware, VirtualBox, Parallels, and so on. It's quite easy to write a C# program to detect a virtual machine. You can use, for example, the Win32_BIOS management class to do that. And I show you here a very, very simple piece of code to detect virtual machines. I ran this code. I commented some lines there at the top. I commented the functions being used. And at the bottom, as you see, I ran my program on a physical host and in a guest virtual machine.
As you see, it's quite easy to know that, on the right side, I'm using a virtual machine. However, that's not the issue. I saw a very strange technique using temperature. I thought, whoa, how can I use temperature to detect virtual machines? I tried to reproduce this technique using C#. I wrote a very short program here, but I received an error at the bottom, a NullReferenceException. And I tried to investigate what happened. I used the Windows Management Instrumentation Tester to do that, at the bottom. One, two, three, and four. And I found that in a virtual machine, I don't have temperature probes. Windows doesn't offer me any kind of temperature probe. I highlighted it in the middle of the picture. So it was easy. I translated this fact into my program, and I added an exception handler to handle these strange situations. Finally, my program was able to use temperature to tell a physical host from a virtual machine. I've been using several techniques to circumvent this trick. But I believe that you'll see this technique more often.
Finally, my conclusions. People have asked me how to bypass anti-reversing techniques. First, you need to know all the techniques being used. After that, you try to use different frameworks, such as Miasm, METASM, Triton, and so on. Emulation is a good alternative, of course. And DTrace is a new old tool that was recently ported to Windows, and I have been using it to solve some reversing problems. I would like to thank you, the DEF CON staff, and you, who have reserved some time to be here. Thank you so much for attending my talk. Have a nice day. Any questions? Questions? Please.
Sorry. I'm just wondering, how do you deal with those programs that have been obfuscated by Themida?
Sorry?
By Themida, like being protected. I mean, the virtual machine-based malware.
There are different tricks to prevent my virtual machine from being detected by a malware sample. Different tricks.
Usually, I try to set up a very specific virtual machine, and then I modify the VMX file to prevent it from being detected.
No. I mean, my question is, what if it has been protected by a VM-based protector? For instance, like Themida.
Themida?
Yeah, Themida. It's like, you use the VM yourself.
Yes, I use a VM. Yes.
It uses a VM. I mean, with Themida, the malware itself uses a VM, just like a Java program. And it uses a just-in-time technique to compile it into bytecode, execute one line, then discard that bytecode and execute the next line. I mean, basically the malware itself is running in the virtual machine that it provides itself. I mean, just attached to it. And the virtual machine itself has been manually revised. I have seen this kind of technique being used in Themida. Myself, I was trying to reverse that, but I was not very successful. I was wondering, what's your take on that?
I have so many tricks to do that. I can tell you, there are many tricks to deceive Themida. There are many tricks to circumvent the Themida techniques. There are many, many tricks using [inaudible]. I can't talk about that. I can't explain every single point because it's a very long topic, very, very long. It's very complicated.
And I don't know if there is any known approach. So I was just wondering if the emulation approach that you suggested could work for that.
I think emulation is a nice approach.
And it's metamorphic. And sometimes we use a high-level semantic approach, I mean, because it's not literal matching, byte by byte. It's not byte by byte. It's a bit more high-level semantics. What I mean is basically metamorphism. It's not just simple polymorphism. It's metamorphism. The code itself changes every time. And the VM itself has been changed. Every time, it uses a different bytecode.
So the malware itself just, I mean just...
I try to find a pattern. I try to run the same malware several times in a virtual machine, for example, and I try to find a pattern. And I try to relate these patterns, so I try to make a table, a kind of mapping table, to understand better how the virtual machine from Themida works. I've used the same approach with other protectors, such as VMProtect, Arxan, Agile.NET, and so on. So the approach is almost the same. I try to split the instructions into different classes, for example, jump instructions, conditional instructions, and so on. I try to map Intel instructions to virtualized instructions. And I build a whole table trying to relate each Intel instruction to a virtualized instruction. However, most people believe that all Intel instructions are virtualized. It's not true. Only a few of them are. So I try to understand which instructions were virtualized and which instructions weren't.
I think everything has been virtualized except the virtual machine itself. I mean, the virtual machine itself, they could run several levels of virtual machine.
Yes, yes, yes.
They can just make it very complicated. I'm just wondering, I mean, to make it quick, what's your success rate? I mean, have you been able to successfully reverse every one of them, or do you sometimes also have some difficulties?
Most of the time, I have problems, sure.
[inaudible] I fail most of the time.
Yes, most of the time, I have some problems handling this kind of protector using virtual machines. But I've had some success, about 50%. Honestly, I would like to have 100%.
I hope so. Thank you very much.
Any further questions? No more questions? Thank you so much for your time.