 Hello, DDS Stevens here, Senior Handler at the InternetStorm Center. In this video we are going to look at some YARA rules to detect office documents with VBA code. I wrote a couple of diary entries, like here simple YARA rules for office mall docs and here one with improved rules to have less false positives. Here in one with the simple rules, I have two different rules. One for the binary file format and one for the XML file format. So long time ago, Microsoft started to use the compound file binary format or what I like to call the OLE file format for its office documents and this is that rule. And then in 2007 they introduced a new file format, the OO XML file format, so the office open XML file format, which is essentially a zip container containing mostly XML files. So let's take a look at these rules. I have them here. This one is for later in the video, but here are the two simple rules. Let's start with the OLE VBA one. So if I run YARA with these rules on an office document old file format with VBA code, then the rules detected. And how does this rule work? Well, just two classes in the condition. First of all, we check if we are dealing with an OLE file by checking what the first four bytes are of that file. So I have that file open here in the binary editor. These eight bytes here are the magic sequence for OLE files. So each and every OLE file starts with that sequence. And what I typically do is just test for the first four bytes. If you have a bit of imagination, this reads like DOC file, and this hexadecimal data reads like DOC file. So that is what I'm testing for. I read an unsigned 32-bit integer at position zero, beginning integer, and I check if the value is this. And if that is the case, then I also want to find a byte sequence, which is actually a compressed code for attribute. And it's this hexadecimal sequence. Let me copy this and search this. Here you have that sequence. So let me explain why I select that sequence. If you run OLE dump on that document, you see that it contains VBA code in Stream 7, so I can select Stream 7, decompress the VBA code, and here you have the VBA code. And you can see that the VBA code starts with attribute space and then a variable name. That is always the case with VBA code that has not been tampered with. So every office document, for example, that is created with VBA code, its VBA code will start with attribute space. And that is what I'm going to search for. Now what I'm searching for is compressed, so I cannot just look for the string attribute space. I have to search for the compressed version of that string. And I can show you this by saying 7S. So in Stream 7, I want to see the compressed source code. And here you have it. So it always starts with 01. And then here you have a two byte integer that specifies the length of the compressed source code. Now it is an integer that is a length and also some of the high bits of that integer are always set. This is not something that can be easily used for detection because it changes. But then this is what I can use. This here is the first piece of a compressed chunk. So 00 indicates here that this is all a representation of compressed data, 8 bytes here attribute. And that is then followed again by another piece. So 00, so also everything stored and that is then 65, E, 20 space and so on. So I select this sequence here to detect compressed VBA code. It contains ASCII characters, common ASCII characters attribute and E, but it also contains two null bytes. So that makes it a bit more specific. So that is just how the rule works. If we find a file that starts with this and if it contains this, then we are going to generate an alarm for an office document, an OLE file that contains VBA code. Now for the new file format and new is relative because that file format was introduced with Office 2007. So what we are dealing with here is a zip container. And I can also show you this with my tool zip dump. As you can see here are all kinds of XML files. And if the office document, the 00 XML file contains VBA code, then it will contain an OLE file and that OLE file contains the VBA code. And the name of that OLE file is VBAproject.bin. So that is what I'm testing for. It's a very simple test. I test if I'm dealing with a zip file and then I just test if I have a string VBAproject.bin inside that file. What is that test here to test for a zip file? If you open that 00 XML file, so that zip container in an hexadecimal editor, you will see that it starts with this byte sequence. This is the magic header, the byte sequence for a zip file record. And a normal 00 XML file will start with a zip file record. So we can check for the presence of these bytes at the start of the file, like I do here. And then just search for VBAproject.bin. So I'm going to search for ASCII and here you have it. Now this is a very simple rule and it can have false positives. For example, if you have file names that contain that string, like for example, a file name that would be this is VBAproject.bin, then this would also trigger. But that is not a valid VBAproject.bin file because the name is not what it should be. But then that rule will trigger. We can improve that rule and that is what I've done here. We can improve that rule so that it will not only check if it finds VBAproject.bin, but also if that string that is found is found inside a zip file record. And that's what I do here in this rule. So here that was PKVBA. So in a zip file, a PKZIP file, we want to find VBA. This rule, the improved rule is also PKVBA but then RE because I'm going to use a regular expression. First of all, here, the file name, it's not actually VBAproject.bin, but it is word slash VBAproject.bin. So it's contained in a folder word, but inside the structure that is actually the file name. And we can see that here. If I break this down here, I select the zip file record for word VBAproject.bin and I drill down in the structure, you can see that the file name, the actual file name is not just VBAproject.bin, but what is stored inside the record is word slash VBAproject.bin. And that's also what we see here. So the file name that I have to look for is this here. But of course, that is for word, for Excel, it will be Excel. So what I did is create a regular expression that will match all of these variants of the file name. So here you have a bit of regular expression that can match folders that are just composed of letters and the slash character. So the test here now is again, am I dealing with the pkzip file? And then did I find that regular expression? And then if I find the regular expression, I will do two more tests to see that I am indeed dealing with a file name inside a zip file record. Now if you come back here, you have the file name here. And here you have the magic sequence for a zip file record, 54b0304. The distance in bytes between this position, this field here and this field. So from here to here, that is exactly 30 bytes. So I'm going to do some calculations in my Yara rule here. So first of all, at vbaproject.bin, that represents the address where that regular expression was found, so the position. Now that string can be found more than once. And as we can see, it is indeed also found more than once here in the rule. You can see that if I run Yara on that document here, it hits. I mean, there's a trigger. And if I look for the strings, you can see here that these strings and variations of these strings are found at many positions. So I need to test all of the strings that are found. And that is why I'm using a for loop here, for condition. And the rule is, if just one matches, then that is good enough for me. If we found one record that contains vbaproject.bin, then we have found vbacode. So I will match for any and not for all. They don't all have to add just one of them. So that is the index then here. So this is index e into the array of vbaprojects.strings that are found. So that represents here the position of an instance of that string word slash vbaproject.bin that was found. Then we subtract 30 from that. And then I read the unsigned 32-bit integer begendian at that position. And I compare that with the magic sequence. So that is the rule. If this triggers, then already I have a first indication that I have indeed found a filename and that filename is inside a zip file record. And then finally I do also another test. And the second test that I do is the following. Here you have the filename word slash vbaproject.bin that is 19 characters long, 4 bytes in front of this field you have this field and this is the name length. So that's 2 bytes long as you can see here 13-0-0. So that's in hexadecimal, little endian, so that's 19 in decimal. So what I do in my rule here, I take the length of the string that I found. So exclamation point vbaproject.bin that stands for the length of the string that was matched and this is the index into the array. So this is the length here of my string 19 if it is the correct string. And then I will again take the address of that string, subtract 4 to go to two fields in front and read that as an unsigned integer 16-bit little endian. And that should also be 19. And if both are true, both classes are true, then I'm pretty sure that I found a vbaproject.bin filename inside a zip file record and then because of the fork any, this whole part here will be true and thus the rule will trigger.