Hello, Didier Stevens here. In my latest blog post, I analyze an encoded payload. It's not a standard encoding, and I will show you how you can decode this payload with statistical analysis and by guessing the plaintext. This is the blog post, so let's start here.
Let's take a look at the payload. You see it's a lot of lowercase letters, so my first idea when I saw this was that this is NetBIOS name encoding. This is something that Cobalt Strike uses, for example, in DNS transfers. I also suspect that this is an encoded Cobalt Strike beacon, because I did a dynamic analysis where I took a process memory dump, and in it I found a Cobalt Strike beacon.
So now I'm going to try to decode this. Since it is NetBIOS (well, I assume it is NetBIOS), I'm using my base64dump tool, which can handle all kinds of encodings, not only base64, and NetBIOS name encoding is one of them. I'm going to try all the encodings with a minimum length of 100, because it should be a long payload, and I'm also going to look at the number of unique characters in the payload.
Here I don't see anything that I recognize. It found one very long string, the same size as the file, as you can see. So it found a complete payload, and it matches base64 and base85 encoding. But of all the base64 characters and all the base85 characters, only 17 are actually used. So this is most likely not base64 or base85. More importantly, it's also not NetBIOS name encoding like I assumed, because base64dump doesn't detect it as NetBIOS name encoding.
Taking a second look, I also understand now that my assumption was wrong: I have letters like Y, T and Q here, and those letters do not appear in NetBIOS name encoding. NetBIOS name encoding is like hexadecimal, with 16 digits, but instead of using the decimal digits and the first six letters of the alphabet, it uses the first 16 letters of the alphabet (A through P).
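To make the NetBIOS name encoding described above concrete, here is a minimal Python sketch (not base64dump's own code): each byte is split into two nibbles, and each nibble becomes one of the letters A through P. Cobalt Strike's Malleable C2 has both a lowercase (netbios) and an uppercase (netbiosu) variant, so the decoder here uppercases its input first. The alphabet check shows why a payload containing letters like Y cannot be NetBIOS name encoded.

```python
NETBIOS_ALPHABET = "ABCDEFGHIJKLMNOP"  # nibble value 0..15 -> 'A'..'P'


def netbios_encode(data: bytes) -> str:
    """Encode each byte as two letters: high nibble first, then low nibble."""
    return "".join(
        NETBIOS_ALPHABET[b >> 4] + NETBIOS_ALPHABET[b & 0x0F] for b in data
    )


def netbios_decode(text: str) -> bytes:
    """Decode NetBIOS name encoding; accepts upper- or lowercase letters."""
    text = text.upper()
    if len(text) % 2:
        raise ValueError("NetBIOS name encoding needs an even number of letters")
    out = bytearray()
    for i in range(0, len(text), 2):
        high, low = text[i], text[i + 1]
        if high not in NETBIOS_ALPHABET or low not in NETBIOS_ALPHABET:
            raise ValueError(f"not a NetBIOS letter pair: {high}{low}")
        out.append(NETBIOS_ALPHABET.index(high) << 4 | NETBIOS_ALPHABET.index(low))
    return bytes(out)


def could_be_netbios(text: str) -> bool:
    """True if every character is one of the 16 letters A..P (either case)."""
    return set(text.upper()) <= set(NETBIOS_ALPHABET)


print(netbios_encode(b"MZ"))     # ENFK
print(netbios_decode("enfk"))    # b'MZ'
print(could_be_netbios("YDUA"))  # False: Y is outside A..P
```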
So now I'm going to get some statistics on the payload with my byte-stats tool. Indeed, there are only 17 unique bytes. Let's see which bytes those are with the range view option, -r: we have the range A through F, then I, then O through W, and finally Y. So 17 digits. This is likely to be hexadecimal, but hexadecimal requires 16 characters, not 17. So the 17th character might be some obfuscation that is not used, or it might represent something else. That's what we are going to try to figure out.
In the blog post, I start writing a program to decode this. If you go to the blog post, you will find a link to that program on GitHub. It's a Python program, based on my Python template for processing binary files. Here is that template: if you go all the way down, you find the place where you put your Python code to process the data of the binary file that was read, in between these two markers. By default, it just prints some information and does a hexadecimal dump of the first 256 bytes. We are going to do something different, so I'm going to modify that code step by step. The complete result is my custom decoder, but I also made all kinds of intermediate versions where I do this step by step.
In version 1a, I take the first two digits of the payload each time and try to convert them from hexadecimal to a byte. If that doesn't work, I just ignore them. I do this for each pair; for example, FFFF here should convert. Then we'll see what we end up with. This is the Python code I wrote to do this. First of all, I have the binary data that was read here.
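The range view can be approximated in a few lines of Python. This is a hypothetical helper, not byte-stats' actual implementation or output format: it collects the distinct byte values and collapses consecutive values into ranges. Fed the 17 letters reported above, it reproduces the grouping A-F, I, O-W, Y, and the count makes the "one symbol too many for hexadecimal" observation explicit.

```python
def byte_ranges(data: bytes) -> list[str]:
    """Collapse the distinct byte values of data into runs like 'A-F'."""
    values = sorted(set(data))
    if not values:
        return []

    def label(start: int, end: int) -> str:
        # Show the bytes as characters, since this payload is printable text
        return chr(start) if start == end else f"{chr(start)}-{chr(end)}"

    ranges = []
    start = previous = values[0]
    for value in values[1:]:
        if value != previous + 1:  # gap found: close the current run
            ranges.append(label(start, previous))
            start = value
        previous = value
    ranges.append(label(start, previous))
    return ranges


# Stand-in for the payload: only the 17 distinct letters reported above
sample = b"ABCDEFIOPQRSTUVWY"
print(len(set(sample)), byte_ranges(sample))
# prints: 17 ['A-F', 'I', 'O-W', 'Y']
```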
But since I'm actually dealing with letters, I prefer to work with strings, not bytes. So I use the decode method to convert the data to a string, the encoded payload, and then I iterate over that string. I start with an empty data list. As long as the encoded payload is not empty, I take the first two characters and drop them from the encoded payload so that I move on. Then I try to convert those two characters (characters, not bytes) to an integer in hexadecimal notation, and I append that to the list. If that fails, I get a ValueError and I just move on without doing anything. When all of the payload is processed, I convert the data to bytes and do a hex/ASCII dump.
Let's see how that looks: version 1a on the payload. This is the output that we get. That's not very useful; it doesn't help me much. So next, when a pair cannot be converted, I'm going to add something instead. That's version 1b, and all I do there is add a value of zero. That's the only change between 1a and 1b: I add a null byte, to see if that gives me more insight. That doesn't give me more information either.
OK. Remember that I said I think this is a Cobalt Strike beacon, so a Windows PE file. Windows PE files always start with the letters MZ. I can show you that: here is a hex/ASCII dump of, let's say, the first 100 bytes of notepad. Here you see MZ, 4D5A in hexadecimal. So I will assume that my payload also starts with MZ, but encoded. That's 4D5A there, and YDUA here. So I'm going to assume that Y corresponds to 4, D corresponds to D, U to 5, and A to A. I encode this by adding a dictionary: Y is 4, D is D, U is 5 and A is A.
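The pair-by-pair loop of versions 1a and 1b could look roughly like the sketch below. It's a standalone illustration, not the code from the template: `decode_pairs` and its `fill_invalid` flag are names made up here, with `fill_invalid=False` behaving like 1a (skip pairs that aren't hexadecimal) and `fill_invalid=True` like 1b (emit a null byte instead). The input string is invented for the demo.

```python
def decode_pairs(encoded: str, fill_invalid: bool = False) -> bytes:
    """Read two characters at a time and parse each pair as a hex byte."""
    data = []
    while encoded:
        # Take the first two characters and drop them from the input
        pair, encoded = encoded[:2], encoded[2:]
        try:
            data.append(int(pair, 16))
        except ValueError:
            # Not a valid hex pair (e.g. it contains Y or U)
            if fill_invalid:
                data.append(0)  # like version 1b: keep a placeholder null byte
            # like version 1a: otherwise the pair is silently skipped
    return bytes(data)


print(decode_pairs("4D5AYDUAFFFF"))                     # b'MZ\xff\xff'
print(decode_pairs("4D5AYDUAFFFF", fill_invalid=True))  # b'MZ\x00\x00\xff\xff'
```

Neither variant can recover the meaning of letters like Y or U, which is why the next step switches to guessing plaintext.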
Then, with these two lines of code, I take the encoded payload and replace each key with its value, for every entry in the dictionary. Then I dump the result again and see what we end up with. Note that mapping D to D and A to A doesn't actually change anything, so strictly speaking I don't need it, but I'm still adding those entries to the dictionary so that I know these are letters I actually made a guess for.
Let's see what that gives: custom decoder version 1c on the payload. Now we get MZ, of course; that's to be expected. But we also see some more letters here, like the T and DO. That is interesting, because if you look at the beginning of notepad (let's take 200 bytes this time), PE files also contain the message "This program cannot be run in DOS mode". There can be variations: it can be removed or replaced by something totally different. But often it will be that text: if nothing has been tampered with, the compiler puts it in. It's also not functional.
Looking at my output again, I have the T here that might correspond to this T, and the DO that might correspond to this DO. T is 54 in hexadecimal, and 5 and 4 are digits we already have the letters for: U and Y.
So in my next version, I store that string here. Like I said, this is the default string, but there might be variations; if this payload contains a variation, the decoding will not work properly. I convert the string to hexadecimal, output it, and then return. Let's run this, version 1d. Like I said, this T here is 54, and 5 and 4 correspond to U and Y.
In the next version, I try to find that position: I look for UY in my payload, i.e. the uppercase letter T, and print its position. It is found at position 86.
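Version 1c and the position search that follows boil down to two small operations: apply the guessed letter-to-digit dictionary, and encode a known plaintext byte with the inverse dictionary so it can be searched for. This sketch assumes only the four guesses from the MZ header. It swaps the loop of `str.replace` calls for a single `str.translate` pass, which prevents one replacement's output from being rewritten by a later one (not an issue with this particular dictionary, but cheap insurance). The function names are hypothetical.

```python
GUESSES = {"Y": "4", "D": "D", "U": "5", "A": "A"}  # from aligning YDUA with 4D5A (MZ)


def apply_dictionary(encoded: str, dictionary: dict[str, str]) -> str:
    """Replace every encoded letter found in the dictionary, in one pass."""
    return encoded.translate(str.maketrans(dictionary))


def encode_known(plaintext: bytes, dictionary: dict[str, str]) -> str:
    """Encode known plaintext with the inverse dictionary.

    Hex digits without a known letter are left as they are, so the result
    is only searchable when every digit of the plaintext is covered.
    """
    inverse = {digit: letter for letter, digit in dictionary.items()}
    return "".join(inverse.get(digit, digit) for digit in plaintext.hex().upper())


print(apply_dictionary("YDUA", GUESSES))  # 4D5A
print(encode_known(b"T", GUESSES))        # UY  (T is 0x54)
# With the payload as a string (hypothetical name payload_text):
# payload_text.find(encode_known(b"T", GUESSES))
```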
That's quite near the beginning, which is good. Next modification: I select the encoded representation of the string "!This program cannot be run in DOS mode." and print it. Then I iterate over all the characters of the encoded string together with the hexadecimal digits of the plaintext string. For each encoded letter that is not yet in the dictionary, I add it to the dictionary. If the letter is already in the dictionary, I check whether it corresponds to the digit I already found for it, and if it is a different digit, I raise an exception, because then I don't have a one-to-one relationship. So what I want to do here is populate my dictionary with the translation, and raise an exception if a letter has more than one match.
Let's run this, version 1f. I don't get an exception, so that is good: my program was able to populate the dictionary with this translation. When it finds an R, it puts in the dictionary that R is 2; then W is 1; then U is 5, which we already found, so that doesn't raise an exception; and so on, over the complete string.
Let's see what that gives us. Here are two new lines, printing the size of the dictionary and the dictionary itself. We already have 14 letters, which is very good, because there are 17 in total, and here are the different translations that we found. Notice that the standard hexadecimal letters translate to themselves: D is D, A is A, E is E, F is F. So it looks like the hexadecimal letters A through F are not actually being translated. Now let's run this.
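The dictionary-population step of version 1f is a classic known-plaintext attack on a substitution cipher. Here is a minimal sketch, assuming the encoded slice and the hex digits of the plaintext line up character for character (true for this string, whose hex form contains no run of three 0 digits); `learn_mapping` is a hypothetical name. Like the program in the video, it only checks that one letter never maps to two different digits.

```python
from typing import Optional


def learn_mapping(encoded_slice: str, plaintext: bytes,
                  known: Optional[dict] = None) -> dict:
    """Extend a letter -> hex digit dictionary from aligned known plaintext."""
    digits = plaintext.hex().upper()
    if len(encoded_slice) != len(digits):
        raise ValueError("encoded slice and plaintext hex differ in length")
    mapping = dict(known or {})
    for letter, digit in zip(encoded_slice, digits):
        if letter not in mapping:
            mapping[letter] = digit        # first sighting: record it
        elif mapping[letter] != digit:     # contradiction: not one-to-one
            raise ValueError(f"{letter} maps to both {mapping[letter]} and {digit}")
    return mapping


# '!' is 0x21, so an encoded "RW" teaches R -> 2 and W -> 1 (as in the video)
print(learn_mapping("RW", b"!"))  # {'R': '2', 'W': '1'}
try:
    learn_mapping("RR", b"!")     # R cannot stand for both 2 and 1
except ValueError as error:
    print(error)                  # R maps to both 2 and 1
```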
So I removed the return, and now let's run this: do the translation with the populated dictionary and see what dump we get. Let me clear the screen: version 1h, piped into more. That is already a very good result. We have MZ here and the string that we expected, then the PE header, and also sections like .text, .data and so on.
What is not good is position 3C. At position 3C you should have a pointer, a value that points to the start of the PE header, here. But you can see this is not an actual pointer: it is part of this string. So there is still some translation to do. Let's try to figure out what.
I do that in version 1i, where I print out the letters that I have not yet been able to translate, the ones that are not yet present in the dictionary. These are B, C and Q: three letters we still have to find a translation for. Remember that I said it looks like the hexadecimal letters are not translated, so B and C might actually translate to B and C. That leaves us with Q.
Next, I print out the first 80 characters of the partially decoded payload (that new line is the only change) and see what we end up with. This is what we have: 4D 5A 9Q, so Q already appears here, then 03, Q, Q, 04 and so on. Now let's compare this with the header of a normal PE file like notepad. I do a hexadecimal dump without any whitespace (uppercase X) of the first 40 bytes of notepad. There we have 4D 5A 90 00 03, while here we have 4D 5A 9Q 0 3. Now let's put them on top of each other. I've done that in the blog post, so let's switch to that part of the blog post.
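The comparison with notepad's header can be turned into a small hypothesis test: list the letters still missing from the dictionary, then try a few candidate expansions for Q and keep those under which the partially decoded text becomes a prefix of the reference header. The partial string below is typed in from the values read out in the video, the reference is the start of a standard MZ header as found in notepad, and both helper names are hypothetical.

```python
def untranslated_letters(encoded: str, dictionary: dict) -> list:
    """Letters in the payload that have no translation yet."""
    return sorted(set(encoded) - set(dictionary))


def matching_expansions(partial: str, reference_hex: str, symbol: str = "Q",
                        candidates=("0", "00", "000", "0000")) -> list:
    """Candidate expansions of symbol that make partial a prefix of reference_hex."""
    return [c for c in candidates
            if reference_hex.startswith(partial.replace(symbol, c))]


print(untranslated_letters("YDUABCQ", {"Y": "4", "D": "D", "U": "5", "A": "A"}))
# ['B', 'C', 'Q']

partial = "4D5A9Q03QQ04"                    # partially decoded payload, as read out
reference = "4D5A90000300000004000000FFFF"  # 4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF
print(matching_expansions(partial, reference))  # ['000']
```

Only the three-zero expansion lines up both runs at once (four 0s as Q plus 0, seven 0s as Q, Q plus 0), which is the same conclusion the side-by-side comparison reaches.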
As you can see in the blog post, I highlight the changes, the additions that I make to the program, each time. OK, we are here. 4D 5A 9 is the same in both. Then here we have Q, and there we have 0s. Then 3 and 3, which match. Then Q, Q and 0, and then 4. So it looks like Q represents more than one 0. Here I have four 0s, and there I have Q and one 0; here I have seven 0s, and there I have two Qs and one 0. So if I assume that Q represents three 0s, everything lines up properly and I get the same result. So in the dictionary I set Q equal to three 0s. That's what I do in this version: Q equals 000.
Let's run that version. Now it really looks better, because at position 3C I have a pointer, an offset: 80 hexadecimal. And if I go to position 80, I find the PE header there. So Q means three 0s.
What do I have left? The letters B and C. Now let's see which hexadecimal digits I have not mapped yet. I'm going to run this code: the letters that I haven't decoded yet are B and C, and the hexadecimal digits that I haven't used yet in this version are also B and C. So I'm going to work on the assumption that B translates to B and C to C, and see whether we end up with a PE file. What I do at the end is write the decoded payload to a file, payload.exe.vir. So let's run this on the payload. Now we have payload.exe.vir.
With my tool pecheck, which analyzes PE files, let's see if it can parse this, and indeed it can: it parses the decoded PE file. And now let's see whether this is indeed a beacon: indeed, my tool 1768 can decode the configuration of that beacon. As you can see, it has a common license ID, one that is used a lot. So this is a pirated version of Cobalt Strike that was used to create this malware.
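Putting it together, the final step translates with the complete dictionary, converts the hex string to bytes, and checks the result the same way the video does by eye: MZ at offset 0, a pointer at 0x3C (e_lfanew), and the PE signature where that pointer lands. This is a sketch under assumptions: the dictionary contains only the entries named in the walkthrough (the remaining letters I, O, P, S, T and V presumably stand for the digits 0, 3, 6, 7, 8 and 9, in an order the transcript doesn't spell out), and the check is a quick plausibility test, not a replacement for pecheck. The self-test at the bottom uses a tiny fake header rather than the real payload.

```python
import struct

# Entries stated in the walkthrough; the remaining six letters (I, O, P, S, T, V)
# map to the digits 0, 3, 6, 7, 8 and 9, but which is which is not given here.
KNOWN_ENTRIES = {
    "A": "A", "B": "B", "C": "C", "D": "D", "E": "E", "F": "F",
    "R": "2", "W": "1", "U": "5", "Y": "4",
    "Q": "000",  # one letter stands for a run of three 0 digits
}


def decode_payload(encoded: str, dictionary: dict) -> bytes:
    """Translate letters to hex digits (Q expands to 000), then hex-decode."""
    return bytes.fromhex(encoded.strip().translate(str.maketrans(dictionary)))


def looks_like_pe(data: bytes) -> bool:
    """MZ at offset 0 and the PE signature at the offset stored at 0x3C."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    return data[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"


# Self-test on a tiny fake header (not the real payload): MZ, e_lfanew = 0x40, PE sig.
fake = bytearray(0x44)
fake[0:2] = b"MZ"
struct.pack_into("<I", fake, 0x3C, 0x40)
fake[0x40:0x44] = b"PE\x00\x00"
# Digits stay plain hex here, so only the Q (zero-run) step is exercised
encoded_fake = bytes(fake).hex().upper().replace("000", "Q")
print(looks_like_pe(decode_payload(encoded_fake, KNOWN_ENTRIES)))  # True
```

With a complete dictionary, the decoded bytes could then be written out, e.g. with `Path("payload.exe.vir").write_bytes(decoded)` from `pathlib`, and handed to pecheck and 1768. With an incomplete dictionary, `bytes.fromhex` raises a ValueError on the first untranslated letter, which is a convenient signal that the mapping isn't finished.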
So here I had to do a decoding without having the decoder. In this case, I did also find the decoder, a .NET assembly, and it performs the same decoding as what I did here. But I managed with some statistical analysis and with assumptions about what is encoded, like MZ and this DOS mode string. If you don't know what is actually encoded, then of course it is much harder to decode.