Hello, Didier Stevens here, Senior Handler at the Internet Storm Center. I wrote a blog post about analyzing malicious OneNote documents, and also a diary entry on the Internet Storm Center. In this video I'm going to show you a bit of how I proceeded. In my blog post I do this with a binary editor, 010 Editor. Here in this video I'm going to use my own tools to do the analysis.
I was asked to take a look at a OneNote file, and I was not familiar with the format of OneNote files. So I started to investigate and take a look at the binary data. With my tool cut-bytes I can do, for example, an ASCII/hexadecimal dump of the start of the file: A for ASCII, and here I specify which part of the file I want. From the start, so zero, I want a length of 100 hexadecimal, so 256 bytes. This is the sample that I was given, and this is how it looks. I don't recognize anything. Here at the start I expected to find some magic sequence to identify the file, like you have with other file types. Windows executables start with MZ. PNG files start with 89 and then PNG. ZIP files start with PK, like this one here, but I need to say "no extraction" so that my tool doesn't extract the content from the ZIP file but looks at the ZIP file itself, and here you can see PK. We don't have that in a OneNote file, or at least not something I recognize.
But if you look here at the start, these are binary values, by which I mean mostly non-printable characters, and they look random. So this could be a GUID, a globally unique identifier; Microsoft likes to use them. A GUID is a sequence of 16 bytes, and in Microsoft's representation the byte order is mixed. The last 8 bytes are just in big-endian format, but the first 8 bytes are in little-endian format: these 4 bytes, then these 2 bytes, and then these 2 bytes. Little-endian means you need to reverse the sequence, and that is something you can easily do with my tool format-bytes.
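To make the mixed byte order described above concrete, here is a minimal sketch using Python's standard `uuid` module, not the tool from the video. The 16 sample bytes are an assumption: they are what I believe to be the `.one` file type GUID from the Microsoft specification, written in on-disk order. The function name `guid_views` is made up for this illustration.

```python
import uuid

def guid_views(data: bytes):
    """Return (big-endian, Microsoft mixed-endian) string views of the first 16 bytes."""
    raw = data[:16]
    # bytes=    : all 16 bytes taken in the order they appear (fully big-endian)
    big = str(uuid.UUID(bytes=raw))
    # bytes_le= : first 4, next 2 and next 2 bytes are reversed, last 8 kept as-is
    mixed = str(uuid.UUID(bytes_le=raw))
    return big, mixed

# Assumed sample: the first 16 bytes of a .one file, in file order
header = bytes.fromhex("e4525c7b8cd8a74daeb15378d02996d3")
big, mixed = guid_views(header)
print("big-endian  :", big)
print("mixed-endian:", mixed)
```

The mixed-endian string is the form you would paste into a search engine, since that is how Microsoft documentation writes GUIDs.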
If you don't specify anything to that tool, it will just look at the first 16 bytes and give you different interpretations of those bytes. So let's do this. First it takes the first byte and interprets it as an integer: the first byte is -28 if it is a signed integer and 228 if it is unsigned. Then the same with 2-byte integers, and so on. Here at the end you have the 16 bytes interpreted as a GUID: first a fully big-endian GUID interpretation, as you can see here, and then the representation Microsoft likes to use, mixed-endian. So these parts here are little-endian and this one here is big-endian. Assuming this is a GUID, we're going to copy this and search for it. I'm lucky: I end up at a document from Microsoft that specifies the file format, MS-ONESTORE. So this is indeed a file header, a GUID file header for .one files. That's the way to identify them.
Next I'm looking for something inside that malicious OneNote file, an executable for example. That's something I can look for with my tool pecheck, which analyzes PE files: I can tell it to locate the PE header in any data that I give it. So here I give it the OneNote file, and it indeed finds a 64-bit executable at this position and with this length. So that file is inside the OneNote file at this position. Again with cut-bytes I take a look at 0x2AA4, and indeed this is a PE file. Here you have the MZ, here the offset to the PE header, and here the PE header itself.
Let's see what comes in front of it, and whether we can recognize anything. Let's go back 70, for example. Okay, here I see "libri"; let me go back a bit more, because I expect that this is the Calibri font. So here I have the start of my PE file, MZ, and let's take a look. There are a lot of zeros in front of it, and then here another 16-byte sequence. This could also be a GUID, if we are lucky. That's at position 20, so 60 here plus 20 makes 80. Okay, let's see if this is a GUID.
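The "locate the PE header in any data" step can be approximated in a few lines. This is a simplified sketch and an assumption about one way to do it, not how the tool in the video is implemented: it looks for `MZ`, follows the 4-byte little-endian offset stored at 0x3C, and checks for the `PE\0\0` signature there. The function name `locate_pe` and the synthetic test blob are made up for illustration.

```python
import struct

def locate_pe(data: bytes):
    """Return a list of (position, machine) for every plausible PE file in data."""
    hits = []
    pos = data.find(b"MZ")
    while pos != -1:
        if pos + 0x40 <= len(data):
            # e_lfanew: offset from the MZ header to the PE header
            (e_lfanew,) = struct.unpack_from("<I", data, pos + 0x3C)
            pe = pos + e_lfanew
            if data[pe:pe + 4] == b"PE\x00\x00" and pe + 6 <= len(data):
                # Machine field follows the signature (0x8664 means x64)
                (machine,) = struct.unpack_from("<H", data, pe + 4)
                hits.append((pos, machine))
        pos = data.find(b"MZ", pos + 1)
    return hits

# Synthetic blob: 0x30 filler bytes, a minimal MZ stub, then a PE signature
stub = bytearray(0x40)
stub[0:2] = b"MZ"
struct.pack_into("<I", stub, 0x3C, 0x40)
blob = bytes(0x30) + bytes(stub) + b"PE\x00\x00" + struct.pack("<H", 0x8664)

for position, machine in locate_pe(blob):
    print(f"PE file at 0x{position:x}, machine 0x{machine:04x}")
```

A real parser would also read the section table to compute the length of the embedded file; this sketch only reports where it starts.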
I cut this out, and now instead of an ASCII dump I do just a binary dump, and then I feed that into format-bytes. Here we have the GUID representation. Let's search for that, and here we have MS-ONESTORE again. When I did this search a couple of weeks ago the specification was the first hit, but now the SANS ISC diary entry is the first hit. This part of the specification describes the header for a file data store object. The GUID looks here as if it takes only 12 bytes, but it's actually 16 bytes long. Then comes the length, then unused, reserved, and then the actual data. So we have the GUID, 16 bytes; the length, 8 bytes; unused, 4 bytes; reserved, 8 bytes; then the data; and then a terminating GUID. 16 and 4, that's 20; 8 and 8, that is 16; so in total these are 36 bytes. So 36 bytes after the start of the GUID we have the file data.
That's how I quickly put together a simple program based on my template. It's called onedump. It looks for that GUID, the file data store header, finds all occurrences in the data, enumerates them, extracts the fields such as the length and the file data, and then prints an overview or, if there is a select option, dumps the selected data. Let's see how that looks. I run onedump on the file, and like my other tools it also works on the ZIP file, and here you can see the data. This tool is in my beta folder because I might still change the format of this output. For every file data store object you have an entry here, and it's indexed 1, 2, 3, so we have 3 such entries in this file. This is the position, in hexadecimal, where the header was found. These are the first 4 bytes of the file: .PNG, then MZ, so that's our PE file, and then another picture. This here is again the magic sequence, but in hexadecimal. This is the size, and this is the MD5 checksum of the embedded data. These are all things that you can change. And if I compare that in detail to my extraction with pecheck, I have the same hash.
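The header layout described above (GUID 16 bytes, length 8 bytes, unused 4 bytes, reserved 8 bytes, then the data) is enough to write a small scanner. What follows is a minimal sketch in the same quick-and-dirty spirit, not the actual source of the tool shown in the video. The GUID value is the file data store header GUID as I understand it from the specification, and the names `HEADER_GUID` and `find_embedded_files` are assumptions made for this example.

```python
import hashlib
import struct
import uuid

# Assumed header GUID of a file data store object, converted to on-disk byte order
HEADER_GUID = uuid.UUID("bde316e7-2665-4511-a4c4-8d4d0b7a9eac").bytes_le
HEADER_SIZE = 16 + 8 + 4 + 8  # GUID + length + unused + reserved = 36 bytes

def find_embedded_files(data: bytes):
    """Scan for the header GUID and return one dict per embedded file found."""
    results = []
    pos = data.find(HEADER_GUID)
    while pos != -1:
        if pos + HEADER_SIZE <= len(data):
            # 8-byte little-endian length directly after the GUID
            (length,) = struct.unpack_from("<Q", data, pos + 16)
            payload = data[pos + HEADER_SIZE:pos + HEADER_SIZE + length]
            results.append({
                "position": pos,
                "magic": payload[:4],
                "size": len(payload),
                "md5": hashlib.md5(payload).hexdigest(),
                "data": payload,
            })
        pos = data.find(HEADER_GUID, pos + 1)
    return results

# Synthetic example: filler, one header, and a 4-byte "file" starting with MZ
sample = bytes(8) + HEADER_GUID + struct.pack("<Q", 4) + bytes(12) + b"MZ\x90\x00"
for index, entry in enumerate(find_embedded_files(sample), start=1):
    print(index, hex(entry["position"]), entry["magic"], entry["size"], entry["md5"])
```

Because this only searches for the GUID and does no real parsing of the surrounding structures, it trusts whatever length follows any occurrence of those 16 bytes.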
So onedump gives you that data, and then you can use it like my other tools. For example, say select 2, like this, and you get a hexadecimal/ASCII dump. You can also do a binary dump and pipe that, for example, into pecheck to get an overview of the sections and do the analysis.
My tool onedump is very simple: it just looks for this sequence, and this can also be tampered with. I can, for example, create a OneNote file into which I put a binary file, and that binary file also has that GUID somewhere inside. This can fool my tool into improperly decoding the data. So it's quite quick and dirty. There's absolutely no full parsing of the binary data, just looking for that GUID and assuming that the GUID is the start of a file data store object header.
There have been YARA rules developed by Florian Roth, and what Florian does in his rules is also search for that GUID. Here is one rule that detects OneNote files with a suspicious embedded file. Six strings are looked for, each time with the same GUID. Then these here are wildcards, which can be any byte value, and 36 bytes after the start of the GUID the payload starts. This one here checks for a PE file, so this is MZ. These two here search for BAT files, but because BAT files don't have a magic header, Florian is looking for "@echo off"; that's what we are seeing in the malicious OneNote files that are being distributed now. This one is for VBS files: "On Error Resume" at the start, because there's no magic header either. And LNK files have a magic header, 4C 00 00 00. If one of these six strings is found and the file is less than five megabytes, then the rule will trigger. Let's check this. Here you see it triggers on two files and also on the YARA rule itself, the YARA file I mean, but that's another rule.
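The logic of that rule, as described above, can be paraphrased in Python. This is only an illustration of the idea and not Florian Roth's actual rule text: it covers just the payload types named in the video (PE, BAT, VBS, LNK), and the case-insensitive matching of the text prefixes is my assumption. All names in the block are made up for this sketch.

```python
# Assumed file data store header GUID in on-disk byte order
HEADER_GUID = bytes.fromhex("e716e3bd65261145a4c48d4d0b7a9eac")
PAYLOAD_OFFSET = 36                 # payload starts 36 bytes after the GUID
MAX_SIZE = 5 * 1024 * 1024          # rule only applies to files under 5 MB

# (label, prefix, ignore_case); text prefixes are compared in lowercase (assumption)
PREFIXES = [
    ("PE", b"MZ", False),
    ("BAT", b"@echo off", True),
    ("VBS", b"on error resume", True),
    ("LNK", bytes.fromhex("4c000000"), False),
]

def classify(payload: bytes):
    """Return the label of the first matching prefix, or None."""
    for label, magic, ignore_case in PREFIXES:
        head = payload[:len(magic)]
        if ignore_case:
            head = head.lower()
        if head == magic:
            return label
    return None

def suspicious_payloads(data: bytes):
    """Return (position, label) for each GUID followed by a suspicious payload."""
    if len(data) >= MAX_SIZE:
        return []
    hits = []
    pos = data.find(HEADER_GUID)
    while pos != -1:
        label = classify(data[pos + PAYLOAD_OFFSET:])
        if label is not None:
            hits.append((pos, label))
        pos = data.find(HEADER_GUID, pos + 1)
    return hits

# Synthetic samples: the 20 bytes between GUID and payload play the wildcard role
bat_sample = HEADER_GUID + bytes(20) + b"@ECHO OFF\r\n"
png_sample = HEADER_GUID + bytes(20) + b"\x89PNG"
print(suspicious_payloads(bat_sample))
print(suspicious_payloads(png_sample))
```

As with the extraction tool, a file that merely contains the GUID in the right place relative to one of these prefixes will match, so a hit is a reason to look closer rather than a verdict.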
And then, just as a remark, and I'm going to talk about that later, not in this video: if you look in my Beta GitHub repository, where you have onedump, I also have OneNote rules. These are Suricata rules for detection, similar to the YARA rules. It's something I'm working on.