I'm going to talk a little bit about anti-reverse engineering techniques. My name is Jan Nugir, I'm from Germany. What's a little bit special about this talk is the fact that the techniques I'm going to present here have been taken from a real-world DRM system. I was running into some legal issues with that, so I need to actually skip some details in the talk, but I have a few slides on that later. So let's have a look at the outline. We start pretty smooth with a short introduction on DRMs in general, and as I said, I will have a few slides on the legal issues. Chapter two is a little bit more technical: we'll see some details on exception handling on Windows at the operating system level, and we need this background knowledge in order to understand all the techniques I'm going to present in chapter three. Chapter three is probably the main part of the talk and deals with various anti-reverse engineering techniques and how to overcome them. And finally we have a demo, and that's it. So obviously there are some legal issues with this talk, and that's due to the fact that publishing DRM research is pretty dangerous and considered to be illegal in most countries; at least there's a huge legal uncertainty. So I went to the EFF, and especially Jennifer Granick, who gave me some very good insights and some very good advice. Thanks Jennifer, if you see this.
So we were discussing how we could modify the talk in order to lower the overall risk for me, and Jennifer actually told me that there's some kind of loophole in the DMCA which allows you to publish your DRM research if you do so-called encryption research, though some additional requirements have to be fulfilled. But this was still too dangerous for me, so I decided to go for another solution. As a consequence I modified the talk in a way that I skip all the details about the key setup of the decryption algorithm, and additionally I also won't reveal the very identity of the DRM in question, so this is kind of a black box talk. So what is this talk about? It's about showing some not-so-common anti-reverse engineering techniques from a real-world example, and I'm also going to show how to defeat these techniques. It's obviously not about how to hack <insert your DRM here>, and it's also not about writing decryption tools. So what's a DRM anyway? DRM means digital rights management; some people say it means digital restriction management. What it actually does is restrict the user in accessing the content, and this is achieved by encrypting the content. The DRM controls access to the key, and whenever you access the content it's being decrypted on the fly by the DRM system. The key is often bound to the user or the hardware in order to prevent copying of content between different users, so whenever you change your hardware it's very likely that you need to acquire a new license. Depending on the implementation there might be various types of keys, such as media keys, hardware keys, player keys and so on. And just as a pretty obvious side note: every mainly software-driven DRM actually can be broken. So let's define the strategy we would use to attack such a system.
It's pretty obvious that the ultimate goal is to find the decryption algorithm and the associated key setup; if you understand these, then obviously you have a technique to decrypt any content. The obvious approach, from a debugger perspective, would be to set some breakpoints on common system I/O APIs under Windows, like CreateFile, ReadFile, or the memory-mapped file APIs, and then set a breakpoint on memory access on the file buffer. Whenever this breakpoint triggered, it would mean that it's either a copy operation or that you're probably very near to the actual decryption algorithm. Okay, so since the talk would be over at this point if the DRM system allowed this, you can guess that the DRM system actually prevents this strategy by blocking the debug registers, so we need to carry out some fancier strategies in order to still succeed. A likely strategy would be to use code coverage. Code coverage in this sense means that you let the program run while recording execution of basic blocks or functions; you record all these breakpoint hits and do some calculations with the sets of hits, so you would ultimately, pretty easily, find DRM-related code. Due to a few anti-reverse engineering techniques, code coverage is not possible in this case, but on the other hand it gives some good starting points. So, summing up, our strategy is to use code coverage and to reclaim the debug registers, that is hardware breakpoints, to find the decryption code. Okay, so exception handling under Windows, namely SEH, works on a per-thread basis. This means every thread has a list of registered exception handlers, which is pointed to by FS:[0], and whenever an exception is triggered in the currently running thread, the operating system walks this list and calls every handler until it finds one which claims to be able to handle the exception. So, as we see, an exception handler basically has two choices.
On the one hand, it can say, okay, I'll handle the exception, and return to the operating system; in this moment the operating system will stop walking the list and schedule the thread again. On the other hand, the handler can decide to refuse to handle the exception, and in this case the operating system will continue to walk the list of exception handlers. What we see here is the signature of an exception handler; probably the most important parameters are the exception record as well as the context. So what does this mean? The exception record contains information on the exception itself, like the exception code, whereas the context is a pointer to the thread context of the faulting thread. So the handler actually has the possibility to modify the context in a way that fixes whatever caused the exception in the first place. And the list structure is depicted in the image: at the end of the list you always have the operating system handler, which jumps in if there's no handler which could handle the exception, and this usually means that the process is terminated and Dr. Watson shows up, or whatever. Okay, so here's SEH in a more detailed version. Whenever an exception is triggered, the operating system dispatches into kernel mode, and the interrupt descriptor table is used to carry out all the program logic, which assembles the exception record and the context. This information is then passed down to user mode again, and the first code being executed in user mode is inside ntdll; the very procedure is KiUserExceptionDispatcher, and from this procedure all the SEH list-walking logic as previously outlined is triggered. In this case handler one would be able to fix the exception and return to the ntdll procedure, which would finally call NtContinue, which applies the possibly modified context to the thread.
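The list-walking logic just described can be sketched as a tiny Python model. The disposition constants mirror the Windows values, but everything else — the dictionary "contexts", the handler names, the redirect address — is a simplified stand-in, not the real ntdll data structures:

```python
# Conceptual model of SEH dispatch: the OS walks the per-thread handler
# list and stops at the first handler that agrees to fix the context.
EXCEPTION_CONTINUE_EXECUTION = 0   # handler fixed the fault; resume thread
EXCEPTION_CONTINUE_SEARCH = 1      # handler declined; keep walking the list
EXCEPTION_SINGLE_STEP = 0x80000004

def dispatch_exception(handlers, exception_record, context):
    """Walk the registered handlers the way KiUserExceptionDispatcher would."""
    for handler in handlers:
        if handler(exception_record, context) == EXCEPTION_CONTINUE_EXECUTION:
            return context  # possibly modified context is re-applied (NtContinue)
    raise RuntimeError("unhandled exception: process terminated")

def refuse_everything(record, context):
    # a handler exercising its second choice: keep walking the list
    return EXCEPTION_CONTINUE_SEARCH

def single_step_handler(record, context):
    # only reacts to single-step exceptions, then redirects the thread
    if record["code"] != EXCEPTION_SINGLE_STEP:
        return EXCEPTION_CONTINUE_SEARCH
    context["eip"] = 0x403000  # the handler may point EIP anywhere it likes
    return EXCEPTION_CONTINUE_EXECUTION
```

Dispatching a single-step record through `[refuse_everything, single_step_handler]` returns a context with EIP redirected — exactly the property the DRM exploits for its fake exceptions.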
Okay, this view was pretty much simplified, but it's sufficient for our analysis because we don't need concepts like stack unwinding, collided unwinds and so on. Important to note: a handler can also decide not to return, and for example in C++ exception handling it's pretty common that the context is actually not modified. Okay, so the DRM protection deals with two major techniques to scare off reverse engineers: control flow obfuscation on the one hand, and anti-debugging tricks on the other. For control flow obfuscation, the DRM system uses a vast amount of fake exceptions to interrupt the control flow at runtime, and the exception handlers which act on these fake exceptions use logic to change the thread context. So what you have is an exception handler which, since it's able to modify the thread context, can for example change the instruction pointer, and the program is resumed at a totally different location. Additionally there are many call tables in order to stop disassemblers from successfully performing cross-referencing. There are two major anti-reverse engineering techniques: on the one hand the trampolines, which I have a few slides on, and on the other the P-Code machine, which is used to carry out the actual decryption algorithm. Okay, for anti-debugging we have some very basic checks, that is, the very common PEB flag check or scanning APIs for INT3 opcodes, and in addition you have special files containing code which is decompressed at runtime. This again is nothing very fancy, because it has been around for some years. But on the other hand, the usage of the debug registers is a little bit harder to overcome, as we will see in a few seconds, and finally the system also uses the fake exceptions to detect a possibly attached debugger.
Okay, so trampolines in this case are code which is copied at runtime to a randomized location; RDTSC is used as a seed for a random number generator to pick the destination of the trampoline, and when the trampoline has been copied, execution is resumed from the destination. The actual control flow is then changed via fake exceptions. Since it's a fake exception, we need some unique exception identifier, and in this case it's a single-step exception, which also possibly interferes with an attached debugger. The exception handler modifies the instruction pointer based on some debug register values, and this is exactly the reason why the debug registers have been blocked: because the instruction pointer depends on the values of the debug registers, you cannot easily use breakpoints on memory access in your debugger, because that would interfere with the DRM system. Okay, so let's see some details about the trampoline control flow. In this situation trampoline A wants to initiate a control flow change to some trampoline B, and it's important to note that the control flow entirely depends on jumps and exceptions. That means there's no such thing as a call to a trampoline, because everything goes through the exception handlers, so there's really no direct control flow between A and B. Therefore we have a call hierarchy emulation, because if you only jump to a trampoline, the trampoline being jumped to cannot just return, since there's no return address on the stack. So trampoline A copies trampoline zero and jumps to it; trampoline zero is just an intermediate trampoline which is used to put the destination trampoline, in this case trampoline B, on an internal call stack emulation. And this, as I said, is needed because there is no direct call between A and B, so you have to somehow emulate these nested call and return operations.
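The jittering itself can be sketched in a few lines of Python, with RDTSC modeled as a plain integer seed and the trampoline as placeholder bytes — the real system of course copies actual x86 code and its placement logic is only assumed here to be a seeded `randrange`:

```python
import random

def relocate_trampoline(code, memory, rdtsc_value):
    """Copy a trampoline blob to a pseudo-random destination; the real
    protection seeds its random number generator with the RDTSC value."""
    rng = random.Random(rdtsc_value)
    dest = rng.randrange(0, len(memory) - len(code))
    memory[dest:dest + len(code)] = code
    return dest  # execution resumes at this address

mem = bytearray(0x1000)
blob = bytes([0x90, 0x90, 0xC3])           # placeholder bytes, not real code
d1 = relocate_trampoline(blob, mem, 123456)
d2 = relocate_trampoline(blob, mem, 987654)
# different timestamps -> (almost always) different destinations, so the
# code jitters around in memory on every control flow transfer
```

Because the destination is a pure function of the seed, forcing the "timestamp" to a constant pins every trampoline to one address — which is exactly the countermeasure described later.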
Okay, so trampoline zero pushes the destination trampoline on the internal stack, and after this it copies the next trampoline, again to a random location. Trampoline one in this case installs an SEH frame which is used to handle the fake exception raised by the following common code: first the EFLAGS register is pushed to the stack, then the trap flag is ORed in, and finally it's applied again. So upon the next instruction a single-step exception is triggered, and the aforementioned SEH handler is invoked by the operating system. The exception handler now carries out all the logic which is involved in the call and return emulation. What it basically does is change the instruction pointer based on the debug register values; it also clears the trap flag bit, removes the SEH frame, and cleans up the stack. Finally, execution resumes at trampoline two, which in turn copies the destination trampoline and jumps to it. So what we saw here is the mechanism which is used to emulate a call between A and B. The return works in a similar way: whenever trampoline B wants to return to trampoline A, it would again use trampolines zero and one and trigger some fake exception, and the handler would then modify the internal stack representation, clean up the stack, and return to the middle of trampoline A. Okay, so the debug registers are used in a special way. Debug registers zero and six are zeroed out because they aren't used at all. Debug register one contains a pointer to a shared stack area which is used to pass data between trampolines; in this case that would mean DR1 points to a location which is used to exchange data between trampolines A and B, so you have some kind of parameter passing emulation.
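Stripped of the exception plumbing, the call hierarchy emulation boils down to a shadow stack; here is a sketch in Python, where the class and all names are mine, not the DRM's:

```python
class TrampolineCallEmu:
    """Shadow-stack model of the call/return emulation: there are no CALL or
    RET instructions, so return addresses live on an internal stack (the real
    system keeps the pointers for this in the debug registers)."""

    def __init__(self, entry):
        self.shadow_stack = []  # the internal call stack emulation
        self.eip = entry

    def fake_call(self, resume_point, callee):
        # trampolines zero/one push the resume point and raise a fake
        # single-step exception; the SEH handler redirects EIP to the callee
        self.shadow_stack.append(resume_point)
        self.eip = callee

    def fake_return(self):
        # on return, the handler pops the shadow stack and fixes EIP,
        # landing back in the middle of the calling trampoline
        self.eip = self.shadow_stack.pop()

emu = TrampolineCallEmu(entry=0x401000)
emu.fake_call(resume_point=0x401010, callee=0x402000)  # A "calls" B
emu.fake_call(resume_point=0x402020, callee=0x403000)  # B "calls" C: nesting works
emu.fake_return()                                      # back into B
emu.fake_return()                                      # back into A
```

Once this structure is understood, reading the internal stack is what yields the "perfect call stack" that normal debugging of the protected binary denies you.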
DR2 holds the trampoline address used to perform the return emulation, and DR3 holds the address of the starting trampoline, that is, trampoline zero. To confuse a possibly attached debugger, DR7 is used to turn hardware breakpoints on and off very frequently. So what's the impact on reverse engineering? It's pretty annoying to debug such a concept, because the trampolines always jitter around in memory, and it's pretty hard to recognize repeating code patterns, because everything is copied to random addresses; it's difficult for the debugger or the disassembler to recognize function boundaries. And as I said, since the control flow depends on the debug registers, we don't have the possibility to use our breakpoint-on-memory-access strategy as I outlined in the first part. On the other hand, we also have no call stack, due to the call emulation, so it's pretty hard to backtrace from a nested procedure call. It's also not easy for the disassembler to say, okay, procedure A calls procedure B, because there are no call instructions which are directly used; this means we have really very little cross-referencing information. And finally, the absence of return instructions confuses the disassembler, because it's hard to guess function boundaries. On the other hand, once we understand this call emulation mechanism, we get a perfect call stack, which is good, because usually when debugging without debugging symbols we don't get a perfect call stack. Okay, so how can we actually ease the impact of the trampoline mechanism? An idea would be to fix the trampoline addresses in memory, and we can do this by writing a kernel mode driver. This driver would turn the RDTSC instruction into a privileged instruction by setting the time stamp disable flag in CR4.
This means that whenever this instruction is executed in user mode, a general protection fault is raised, and again all the exception handling which I outlined in the second chapter kicks in. The control flow upon this general protection fault goes through the interrupt descriptor table, which we would hook; so whenever the RDTSC instruction is executed, we gain control in our driver. And what we can do is simply disable the randomization by just returning zero if we come from user mode and if the instruction actually was RDTSC — so we need to disassemble the memory at the instruction which caused the general protection fault. In all other cases we just jump to the original handler. Okay, and this actually works: by using this technique we can fix the trampolines in place, and this makes it a lot easier to understand the trampoline mechanism, because the reverse engineer can then see repeating code patterns, since everything doesn't jitter around anymore. Okay, some words on the debug registers. The debug registers are used by the DRM system, as I said, for various storage mechanisms, so the debugger cannot use hardware breakpoints anymore. In addition, the context is actually set via the Windows API SetThreadContext. Obviously, since we said we wanted to have breakpoints on memory access for our strategy, we need to solve this somehow. What we can do is use API hooking in user space to hook into the SetThreadContext and GetThreadContext APIs in order to redirect any modification or read attempt to our internal storage, and fake all the values the DRM system expects to see. As a consequence, the DRM system would not be able to modify the debug registers anymore, and this is good, because that's what we ultimately want to achieve. But there's a problem with this: hardware breakpoints still won't work, and the reason is that we actually have two different thread contexts.
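The effect of the driver can be simulated in a few lines: if every RDTSC read returns a constant, the seed — and therefore the trampoline destination — becomes fixed. The `randrange`-based model of the placement logic is an assumption carried over from the sketch above:

```python
import random

def rdtsc_hooked():
    """Model of the hooked #GP handler: if the fault came from user mode and
    the faulting instruction disassembles to RDTSC, just hand back zero."""
    return 0

def trampoline_dest(memory_size, code_len, rdtsc):
    # the protection seeds its random number generator with RDTSC
    rng = random.Random(rdtsc())
    return rng.randrange(0, memory_size - code_len)

# with the hook installed, every relocation lands at the same address, so
# repeating code patterns become visible to the reverse engineer again
pinned = {trampoline_dest(0x1000, 16, rdtsc_hooked) for _ in range(10)}
```

Ten "relocations" produce a single destination address, which is the whole point of the countermeasure: the trampolines stop jittering.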
On the one hand we have the kernel mode thread context, which is maintained by the operating system, and on the other hand we would have our internal storage of context records. So how can we solve this? We could, for example, hook KiUserExceptionDispatcher; as you might recall, KiUserExceptionDispatcher is the first code being executed in user mode. Our re-implemented version would check if the current exception is a fake exception, that is, if it's of type single-step, and if that's the case it would pass a fake context — with the fake debug register values previously set via the thread context API — to the handler. So the handler would see the correct values it expects, and on return, that is, when the handler returns to our KiUserExceptionDispatcher re-implementation, we would merge the modifications made by the handler back into the real context. After this we would apply the final context again via NtContinue, as the original KiUserExceptionDispatcher does. Okay, so here are a few more details on how this works. In the upper image you see the real context in kernel mode, which is maintained by the operating system, and in the lower part there's our hooked version of the thread context. At one you see that the DRM system frequently uses SetThreadContext and GetThreadContext API calls in order to modify the context, and all of these are redirected to our emulated context. Upon triggering an exception, the operating system would again call our KiUserExceptionDispatcher, because we hooked it, and what we would do then is plug in the fake debug register values, shown in green, and merge them with the original values passed down from kernel mode by the operating system.
So we would then call the exception handler, passing our fake context, and the exception handler would do all the fancy stuff and return. The exception handler actually modifies some of the general purpose registers, like ESP, because it's cleaning up the stack, and some other stuff, especially of course the instruction pointer. So we need to merge those two versions, sync them back, and finally let the operating system apply this modified context. The summary after these countermeasures: the DRM system cannot modify our debug registers anymore, due to our API hook, and in addition the exception handler of the DRM system gets its expected values, because they are fed from our internal storage. This means we can now really use the hardware breakpoints for our analysis. I made the implementation available as an IDA plugin; maybe you want to check it out. Okay, so the final protection mechanism, which seems to be probably the hardest one, is the use of a P-Code machine. What's a P-Code machine? A P-Code machine is some kind of virtual machine which is embedded inside the DRM system, and in this case this virtual machine is stack-based, which means that all parameters are pushed on a virtualized stack maintained by the P-Code machine. You have roughly 256 opcodes, and the data is represented in an ASN.1 format. The P-Code machine allows programs running inside it to allocate memory from the host machine, where the host machine in this case is just the program the P-Code machine is embedded in. The opcode set is split into two sets. On the one hand you have the high-level opcodes, and these are used to, for example, load opcode files; the DRM system actually has files which nearly only contain opcodes. So you have opcodes to load additional functionality from opcode files, and you can also call into these opcode modules.
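The merge logic of the re-implemented KiUserExceptionDispatcher can be sketched with plain dictionaries standing in for CONTEXT records; all field names and addresses here are illustrative:

```python
# Two context views exist: the real, kernel-maintained one (holding OUR
# hardware breakpoints) and the emulated one (holding the values the DRM
# last wrote via SetThreadContext and expects to read back).
DEBUG_REGS = ("dr0", "dr1", "dr2", "dr3", "dr6", "dr7")

def build_fake_context(real_ctx, emulated_debug_regs):
    """What the DRM's exception handler gets to see."""
    ctx = dict(real_ctx)
    ctx.update(emulated_debug_regs)  # plug in the green (fake) values
    return ctx

def merge_back(real_ctx, handler_ctx):
    """Take the handler's edits (EIP, ESP, ...) but keep the real debug
    registers, i.e. our own hardware breakpoints survive the round trip."""
    merged = dict(handler_ctx)
    for reg in DEBUG_REGS:
        if reg in real_ctx:
            merged[reg] = real_ctx[reg]
    return merged

real = {"eip": 0x401000, "esp": 0x12F000, "dr2": 0x00400000, "dr7": 0x405}
fake_drs = {"dr2": 0x00DEAD00, "dr7": 0xF0F0}    # what the DRM expects to see
view = build_fake_context(real, fake_drs)
view["eip"] = 0x403000                            # handler redirects control flow
final = merge_back(real, view)                    # this is what NtContinue applies
```

The handler's control flow change survives (`eip` moves), while `dr2`/`dr7` stay ours — both sides see the context they need.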
And it's interesting to note that the music decoding is actually handled this way. The other part of the set contains the low-level opcodes, which emulate a virtual CPU; that is, you have simple arithmetic instructions like add, subtract and so on, and also instructions to handle the internal virtual machine call stack. Okay, a few words on the opcode module files. These are special files which contain opcodes for the P-Code machine; some of them contain some amount of mixed code, that is, native code and opcodes. These files are decompressed at runtime using, I guess it was gzip. On the other hand, these files are rather plain, because they are no PE files: there's no import address table, no sections and so on. But there is a relocation table, because these opcode modules might call into a fixed set of imports, like for example the MS Visual C++ runtime, so there needs to be some relocation information in these opcode files. Okay, so now, how are the opcodes executed? Every module has a random pool which is used to randomize the assignment between an opcode and its associated handler, and this just means that every opcode file has a totally different opcode for, say, addition or decrypting music. It's just there to further increase the protection. As I said, the machine has a sort of built-in random number generator. Additionally, there is data interleaved with the opcodes which is just there to confuse the reverse engineer; it's just garbage data inserted between the opcodes, and this data is parsed via ASN.1. So, the impact on reverse engineering: why is this difficult? Because you need to understand the machine itself before you can even start to understand the opcodes which are contained in the files or the main program. And as I said, due to the randomization, we have a different meaning for each opcode on a per-module basis.
And due to the ASN.1 parsing, which is quite complex, the complexity increases even more. So debugging is difficult because, so to say, you have a low signal-to-noise ratio: a P-Code machine basically looks like a very big loop with a very big switch statement inside, and the signal gets even lower due to the opcode scrambling with the randomization. This is the graph in IDA of the P-Code machine; on the lower left you see the entry point, and in the middle there are the blue lines — these are all the opcode handlers. Okay, so what strategies could be developed to attack such a system? Probably the most expensive strategy would be to just write a custom disassembler. The problem with this is that we have really many handlers, in this case 256, and also you have mixed handlers, as I said, those two sets with the native and the opcode handlers. So you would have to analyze all the complex high-level handlers up front in order to produce a meaningful disassembly. And for the disassembler to work, we would also need to reimplement the randomization, the descrambling of opcodes, the garbage instructions, and the ASN.1 parsing, so that's really kind of expensive. Another strategy, which I would call the brute force strategy, would be to just let the debugger single-step via a debugger script until the key is actually written to memory. This is pretty slow, because single-stepping is a pretty expensive operation — you have so many context switches — but on the other hand you will definitely reach the code which writes the key to memory. It's not so cool, though, because it doesn't seem to be very clever. So what's the cool strategy? That would probably be to use a CPU emulator like PyEmu or x86emu for IDA to emulate all the instructions involved in the P-Code machine; essentially you would kind of defeat virtualization with virtualization.
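To make the "big loop around a big switch" and the per-module opcode scrambling concrete, here is a toy stack-based P-Code machine in Python. The three handlers, the shuffle-based scrambling, and all names are invented for illustration; the real machine has about 256 handlers, ASN.1-encoded operands, and garbage data between opcodes, none of which is modeled here:

```python
import random

def op_push(vm, arg): vm["stack"].append(arg)
def op_add(vm, arg):  vm["stack"].append(vm["stack"].pop() + vm["stack"].pop())
def op_halt(vm, arg): vm["running"] = False

HANDLERS = [op_push, op_add, op_halt]

def opcode_map(module_seed):
    """Each module draws its own opcode->handler assignment from its random
    pool, so e.g. 'add' has a different byte value in every module."""
    rng = random.Random(module_seed)
    codes = list(range(256))
    rng.shuffle(codes)
    return {codes[i]: h for i, h in enumerate(HANDLERS)}

def assemble(ops, module_seed):
    """Encode (handler, arg) pairs using the module's scrambled opcodes."""
    inverse = {h: c for c, h in opcode_map(module_seed).items()}
    return [(inverse[h], arg) for h, arg in ops]

def run(program, module_seed):
    """The dispatch loop: one big loop around one big switch."""
    opmap = opcode_map(module_seed)
    vm = {"stack": [], "running": True, "pc": 0}
    while vm["running"]:
        opcode, arg = program[vm["pc"]]
        vm["pc"] += 1
        opmap[opcode](vm, arg)   # dispatch through the scrambled table
    return vm["stack"]

prog = [(op_push, 2), (op_push, 3), (op_add, None), (op_halt, None)]
result = run(assemble(prog, module_seed=0xCAFE), module_seed=0xCAFE)  # [5]
```

The same logical program assembles to completely different byte sequences under different module seeds, which is why a static disassembler must recover the per-module scrambling before it can say anything at all.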
And this is also a very fast and flexible solution, because you can obviously control every aspect due to the emulation. Okay, and the final strategy, the one I've been using, is pretty lazy, because we use what we already have. The trick here is that, as I previously outlined, the machine has a mechanism allowing programs running inside it to allocate memory in the host machine. So what we can do: we know, due to our breakpoint on memory access for example, that it might be a DES algorithm, and since the key schedule size of DES is 0x80 bytes, we could just set a conditional breakpoint in the memory allocation routine which would fire whenever there's an allocation of size 0x80. Whenever we reach this point, we could then use our reclaimed debug registers to set a breakpoint on memory access in the allocated memory, and whenever this memory has been written, we know we are right inside the key setup algorithm. From there, under the assumption that the decryption algorithm is pretty close, we could backtrace and have everything. Pretty disappointing in this case was that the decryption and the key setup are contained in native code; that is, some of the high-level handlers actually contain the code to decrypt the content and calculate the key. Okay, so the key setup algorithm, here it is. I had to obfuscate it a little bit due to the legal issues. It basically works by hashing some files using different hash algorithms, and finally the encryption key is made of some XOR operation. It turns out that the key is different for every music file. And as I said, it's a DES algorithm, and the initialization vector comes from the current file. So now we want to see a demo. Okay. Maybe I'll say a few quick words on the demo.
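The lazy strategy reduces to two trigger conditions, sketched here as a pure-Python model of the debugger script; the class, the callbacks, and the addresses are hypothetical, and in the real attack the "callbacks" are a conditional software breakpoint plus the reclaimed hardware breakpoints:

```python
DES_KEY_SCHEDULE_SIZE = 0x80   # 16 round keys x 8 bytes

class KeyScheduleWatcher:
    """Model of the two-stage trap: a conditional breakpoint in the host's
    allocation routine arms a breakpoint-on-write for 0x80-byte buffers;
    the first write then pinpoints the key setup code."""

    def __init__(self):
        self.armed = set()   # buffers watched via the reclaimed debug registers
        self.hits = []       # write hits = we broke inside the key setup

    def on_allocate(self, address, size):
        if size == DES_KEY_SCHEDULE_SIZE:    # the conditional breakpoint fires
            self.armed.add(address)

    def on_write(self, address):
        if address in self.armed:            # the hardware breakpoint fires
            self.hits.append(address)

w = KeyScheduleWatcher()
w.on_allocate(0x00A10000, 0x40)   # unrelated allocation: ignored
w.on_allocate(0x00A20000, 0x80)   # key-schedule-sized: armed
w.on_write(0x00A20000)            # first write: break inside the key setup
```

From that break, backtracing up the (emulated) call stack leads to the decryption algorithm itself.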
Since I cannot actually reveal the identity of the very DRM, I need to do some kind of black box demo. I can show you how the decryption works, but I cannot show you how the encryption process works, because that would obviously reveal the very identity of the DRM. So I prepared a DRM protected file, and I can decrypt it, okay. And due to the fact that the DRM system actually registers DirectShow filters, I can use any player to play back the unprotected content, so by using the Windows Media Player I don't reveal the identity. So, okay, sorry, please. To give you a good indication that this is not a fake, I started Matlab with a script which just reads in both files as a vector and puts the histograms on screen. What you see on the left is the DRM file, which is pretty much equally distributed, and this is a pretty strong indication that the file content is encrypted. On the right side you see the plain file which has been decrypted, and it's absolutely not equally distributed anymore. Okay, I'm well aware that this could still be faked, but I can't give you a real proof, because it's just too dangerous. So, the conclusion: overall it's a pretty good protection, but it has some major design issues, because we could bypass the whole P-Code machine by just reclaiming our breakpoints on memory access, and there were also some other weaknesses, like the very weak anti-debugging mechanisms. And of course there's much room for improvement. The most obvious one would probably be to move more native code into the P-Code machine, to avoid having the actual decryption process in a native high-level handler, which is pretty useless. You could also make the P-Code machine more complex: you could do some nesting, you could have a P-Code machine inside a P-Code machine and so on, or you could have polymorphic handlers or a self-modifying machine, so the sky is the limit.
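The Matlab histogram check can be reproduced in a few lines of Python; the entropy thresholds below are my own rough choices, but the principle is the one from the demo — encrypted data has a near-uniform byte distribution, structured data doesn't:

```python
import collections
import math
import os

def byte_entropy(data):
    """Shannon entropy in bits per byte: close to 8.0 for uniformly
    distributed (encrypted-looking) data, much lower for structured data."""
    counts = collections.Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

encrypted_like = os.urandom(1 << 16)   # stand-in for the DRM-protected file
plain_like = b"RIFF header and other structure\x00" * 2048  # stand-in for plain data

# flat histogram -> high entropy; skewed histogram -> low entropy
assert byte_entropy(encrypted_like) > 7.9
assert byte_entropy(plain_like) < 5.0
```

This is the same discriminator the histogram plot shows visually: the DRM'd file is indistinguishable from random bytes, the decrypted one is not.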
And well, as I said, the debugger detection is very weak and just uses some old techniques. Another improvement would be to use the debug registers for their actual purpose: the system could let the control flow depend on hardware breakpoints really triggering. In this way we would have no chance of emulating the debug registers, because they would be functionally used by the protection itself. And well, I don't know why the guys developing this protection didn't do this. Okay, so that's pretty much it. Thanks for your attention, and bye.