Hello, I'm Didier Stevens, Senior Handler with the Internet Storm Center. I wrote a diary entry about the dissection of an Equation Editor exploit inside an RTF file, and in this video I'm going to show you how to dissect that exploit.

So we have an RTF file, and with rtfdump we can analyze it. There are a lot of elements here, so what we are going to do is filter for objects embedded in this RTF file. Then we have three objects, 191, 192 and 193, and they are actually all OLE files: you can see the magic header here, D0CF11E0, or "docfile", on all three of them. Normally we would analyze them one by one, but I'm going to start immediately with the second one because that's where the exploit is. So I select 192, I do a hexadecimal decode, I look at the information, and then you have this entry here. We can extract this and dump it so that we can pipe it into oledump.

Since this is an OLE file, we can analyze it with oledump. Here you have the four streams, and you can see the Equation Native stream that contains the exploit. Let's quickly take a look at those four streams: here you have different names that are defined, and in streams 2 and 3 there's almost no information. The exploit itself is here in stream 4. And we can actually read the command here: you can see cmd.exe, bitsadmin, download and execute. So if you just want to figure out what this malicious document does, you already have the information here, you can see the command.

What we are going to do now is look in detail at how this exploit is constructed. For that we have an article; there are many articles on the internet that explain how the Equation Editor exploit works. This one explains that at the beginning of the stream you have this header. The first two bytes are a word, so an unsigned 16-bit integer, and it is equal to the length of the header, which is 28. So we are going to check this.
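As an aside, the hexadecimal decode and magic-header check on object 192 can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper names hex_decode and looks_like_ole and the sample hex string are made up for this example, and real RTF object data needs more cleanup than this (which is what rtfdump takes care of).

```python
# Magic bytes at the start of every OLE (Compound File Binary) file.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def hex_decode(text: str) -> bytes:
    """Decode hex digits, dropping whitespace first (a simplification)."""
    return bytes.fromhex("".join(text.split()))


def looks_like_ole(data: bytes) -> bool:
    """Return True when the data starts with the OLE magic header."""
    return data.startswith(OLE_MAGIC)


# Hypothetical sample: the magic header followed by padding, wrapped over
# two lines the way hex data is often wrapped inside an RTF object.
sample = "d0cf11e0a1b1\n1ae1 00000000"
decoded = hex_decode(sample)
print(decoded[:4].hex().upper(), looks_like_ole(decoded))
```

The check only looks at the first eight bytes, so it says "this is probably an OLE file", not "this file is well-formed".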
So I select that stream, I do a binary dump, and I pipe this into one of my other tools, which is format-bytes. format-bytes can dissect binary data according to parameters that you give it, and that's what we are going to see now. First I'm just going to run it without any parameters. Then it reads the first bytes and tries to parse them in different ways. For example, here it is parsing the first byte as a single-byte integer: as a signed integer it has the value 28, and unsigned also 28. For the two-byte integer, signed little-endian is 28, unsigned little-endian is 28, and big-endian is 7168. So we have a two-byte integer that is equal to 28, and that is what is specified here in the structure.

We can now do this element by element by specifying the layout via the format option, like this. This is actually the same format that is used in the struct module of Python. So we have a little-endian word, an unsigned word of two bytes, and that is indicated by the uppercase letter H. And here you can see that we do indeed have the value 28. That is its hexadecimal representation, and this here is the representation if it were an epoch timestamp.

Now if we go back to the header, we see one word, so two bytes, then a double word, so a 32-bit integer, unsigned again, then another word, and then one, two, three, four, five double words, so five integers of 32 bits. We can encode this. So we add an integer like this, and then you can see we have two entries, and each time you get the list here with the hexadecimal value and the epoch interpretation. You can also specify an extra format parameter after a colon, and say that for those two values, X X, I want them printed in hexadecimal, like this, and then you only get the hexadecimal output. Now for the first one I would prefer an integer so that I can see 28, and this one in hexadecimal. So we are going to continue.
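The different interpretations of those first two bytes can be reproduced with Python's struct module, which uses the same format letters. This is a small sketch, not the tool's own code; the function name interpretations is hypothetical, and the two input bytes (28 followed by 0) are the only values taken from the video.

```python
import struct


def interpretations(data: bytes) -> dict:
    """Interpret the first byte and the first two bytes in several ways."""
    return {
        "1 byte signed": struct.unpack("<b", data[:1])[0],
        "1 byte unsigned": struct.unpack("<B", data[:1])[0],
        "2 bytes signed little-endian": struct.unpack("<h", data[:2])[0],
        "2 bytes unsigned little-endian": struct.unpack("<H", data[:2])[0],
        "2 bytes unsigned big-endian": struct.unpack(">H", data[:2])[0],
    }


first_bytes = bytes([0x1C, 0x00])  # 28 followed by 0
results = interpretations(first_bytes)
for label, value in results.items():
    print(f"{label}: {value} (0x{value:x})")
```

The big-endian reading gives 7168 because the same two bytes are then read as 0x1C00 instead of 0x001C.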
So if you remember, we have another word and then five double words: one, two, three, four, five. That's six extra elements, so here I add one, two, three, one, two, three, that's six extra elements. And that is our header, the equation OLE file header.

After that comes an MTEF header for the equation, and that consists of five single bytes. That's something we can also specify here: one, two, three, four, five, five bytes, and I'm going to represent them all in hexadecimal, like this. Then you can indeed see 3, 1, 1, 3, A, and that's what we have here in this header.

Another thing you can do with format-bytes is annotate the different entries, and that's what I'm going to do here, because entry 1 is the start of the equation OLE file header and entry 9 is the start of the MTEF header. So I'm going to annotate this with option -n. Entry 1 is the equation OLE file header, like this, and then you get this annotation added to the output. Number 9 is our MTEF header, so I add 9, MTEF header, like this. That makes the output easier to understand while we are parsing.

Next, after the MTEF header, you have MTEF records. The record we are looking for is the font record, because that's where the exploit takes place, and the font record has a tag of 8. If you want to see what comes after the part we have parsed so far, you can use an asterisk. So if you go back to the format specifier and type an asterisk after the last byte, there will be a dump of all the remaining data, well, up to 256 bytes, like this. And then you can indeed see one byte, one byte, and then 8, 5A, 5A, and that corresponds to this here. So this is a font record, and those two bytes before it are other records.
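The header parsing and annotation up to this point can be sketched with struct as well. This is a rough illustration under assumptions: the sample data is synthetic (only the header length 28 and the MTEF bytes 03 01 01 03 0A come from the video, the other header fields are zeroed), and the parse function and ANNOTATIONS table are names invented for this example.

```python
import struct

HEADER_FORMAT = "<HIHIIIII"  # equation OLE file header: 28 bytes in total
MTEF_FORMAT = "5B"           # MTEF header: five single bytes
FULL_FORMAT = HEADER_FORMAT + MTEF_FORMAT

# 1-based entry numbers, numbered the same way as in the video.
ANNOTATIONS = {1: "equation OLE file header", 9: "MTEF header"}


def parse(data: bytes):
    """Unpack the two headers and return (values, remaining bytes)."""
    size = struct.calcsize(FULL_FORMAT)
    return struct.unpack(FULL_FORMAT, data[:size]), data[size:]


# Synthetic sample: header fields other than the length are placeholders.
sample = struct.pack(FULL_FORMAT, 28, 0, 0, 0, 0, 0, 0, 0, 0x03, 0x01, 0x01, 0x03, 0x0A)
values, remainder = parse(sample + b"\x0a\x01\x08")

for number, value in enumerate(values, start=1):
    print(f"{number:2d}: 0x{value:x} {ANNOTATIONS.get(number, '')}")
print("remaining:", remainder.hex())
```

Returning the unparsed remainder plays the same role as the asterisk: it shows what still has to be described by the format.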
If we go to the MathType definitions on this website and scroll down a bit, here are the definitions that we saw for the header, and here you have all the records. So we have an A, that's 10, then a 1 and an 8. So this is a full size record, a line record, and then a font record. That is something we can add to our format: B B B, three bytes, like this, and then we have our three records. So number 14 is a full size record, number 15 is a line record, and number 16 is a font record, like this.

Now the structure of our font record. That is two bytes, and then the font name, which is at most 40 bytes. But here it is longer, because there is an overflow, and that's what we are going to encode. So we have two extra bytes and then a string of 40 characters. And then the overflow is two integers: an overflow of the stack frame and an overflow of the return address. So here: two bytes, a string, and two integers, like this. You see the 8, 5A, 5A, and then 40 here: those are the 40 characters of the font name. And this is the overflow that overwrites the return address, so this ends up in the instruction pointer. Let me annotate number 19: that's the font name, and it is actually shellcode, as we will see later.

Then again, let's add an asterisk to see what follows. And we almost have our command here: we have two zero bytes and then our command. So let me add two bytes in hexadecimal, and here we have our command. If we count this, including the zero byte here, these are actually 140 bytes. So if I define a string of 140 bytes, then in entry 24 I have my command, as you can see, cmd.exe. So 24 is the command, like this. This allows us to parse all the elements in that stream that make up an Equation Editor exploit.
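Putting the whole format together, the entry numbering from the video (16 for the font record tag, 19 for the font name, 24 for the command) can be checked with a short Python sketch. Everything in the sample stream is synthetic: the filler bytes in the font name, the overflow values 0x41414141 and 0x42424242, and the placeholder command are assumptions made for illustration, not the contents of the real document.

```python
import struct

FORMAT = (
    "<HIHIIIII"  # entries 1-8: equation OLE file header (28 bytes)
    "5B"         # entries 9-13: MTEF header
    "BBB"        # entries 14-16: full size, line and font record tags
    "BB"         # entries 17-18: two bytes at the start of the font record
    "40s"        # entry 19: font name (40 bytes, the shellcode in the sample)
    "II"         # entries 20-21: the two overflowing integers
    "BB"         # entries 22-23: two zero bytes
    "140s"       # entry 24: the command
)

# Synthetic stream; struct.pack pads the 140-byte string with zero bytes.
stream = struct.pack(
    FORMAT,
    28, 0, 0, 0, 0, 0, 0, 0,
    0x03, 0x01, 0x01, 0x03, 0x0A,
    0x0A, 0x01, 0x08,
    0x5A, 0x5A,
    b"\x90" * 40,
    0x41414141, 0x42424242,
    0x00, 0x00,
    b"cmd.exe /c echo placeholder",
)

# Number the entries from 1, as in the video.
entries = dict(enumerate(struct.unpack(FORMAT, stream), start=1))
font_name = entries[19]
command = entries[24].rstrip(b"\x00").decode("ascii")
print(len(entries), hex(entries[16]), len(font_name), command)
```

Keeping the format as one annotated string makes it easy to see which entry number each field ends up with.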
So in a simple Equation Editor exploit, the font name will contain a command, a small command like the cmd here. It has to be small because it cannot be longer than 40 bytes, 40 characters. And then they will use an address to overwrite the return address, so that WinExec is called with this parameter and the command is executed. But here the exploit is a bit more complicated, because the command is much longer. So the font name is actually shellcode that gets executed in order to run the command with WinExec.

We can take a look at that shellcode and this command just by selecting them. So it's 19 and 24. I can say select 19, and then I get this dump here, which is the shellcode, and 24, which is the command. I can do a binary dump and then you can just read the command. I'm going to do the same for the shellcode: I do a binary dump and write this to a file, shellcode.vir, so that I can disassemble it with a shellcode emulator, which we've used before. I cannot emulate the shellcode here, because it runs inside the Equation Editor, and that's an environment the shellcode debugger is not familiar with. So the file we want to load is shellcode.vir, and then the command is to disassemble 15 instructions, like this.

And then you get the shellcode here. This is shellcode that is written inside a font name, so it may not contain zero bytes, and that's why it is written like this. You have this value that is written into the EAX register, this value into EDX, and then you have an XOR of both. That gives you an address that contains zero bytes, but by doing it this way they avoid having to put zero bytes in the shellcode. Then that address is looked up with a couple of indirections, and 3C is added. Through these indirections and additions, the address of the command that has to be executed is loaded into ECX. Then we set EBX to zero with an XOR, and we push those two values on the stack.
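The XOR trick for avoiding zero bytes can be illustrated in Python. This is a sketch of the general idea only, not a reconstruction of the constants in this sample: the target address 0x0045BD3C is a made-up value, and split_for_xor is a hypothetical helper that picks one possible pair of constants.

```python
def has_zero_byte(value: int) -> bool:
    """Return True when the 32-bit value contains at least one zero byte."""
    return 0 in value.to_bytes(4, "little")


def split_for_xor(target: int):
    """Find two 32-bit constants without zero bytes whose XOR equals target."""
    first = bytearray()
    second = bytearray()
    for byte in target.to_bytes(4, "little"):
        # Pick a non-zero mask that differs from the target byte,
        # so that mask ^ byte is never zero either.
        mask = 0x11 if byte != 0x11 else 0x22
        first.append(mask)
        second.append(mask ^ byte)
    return int.from_bytes(first, "little"), int.from_bytes(second, "little")


target = 0x0045BD3C  # hypothetical address containing a zero byte
a, b = split_for_xor(target)
print(f"{a:08x} xor {b:08x} = {a ^ b:08x}")
```

In the shellcode, the two constants would be loaded into EAX and EDX and XORed at run time, so the zero byte only ever exists in a register and never in the font name itself.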
And then we calculate another address, and that's the address of WinExec, which is then called. So WinExec is called with a parameter of zero on the stack, which means that the window of the command has to be hidden. And the other parameter is the cmd.exe command that does the bitsadmin download and execute. So this is that shellcode without any zero bytes, and it executes the command that follows.

So this shows you how you can use format-bytes, together with explanations of the different records and structures of the Equation Editor, to decode step by step what this exploit does. The reason why they can use fixed addresses and things like that is that the Equation Editor is actually a very old piece of code, and it doesn't have ASLR or DEP or any other protections. So it's quite easy to write classic-style exploits with shellcode for it.