Okay. Hello, guys. Welcome to the last talk of the day, and also the last talk of DEF CON 18. I'm Benson and this is Jeremy. We are actually very surprised that you guys are still here and didn't pass out after last night, the crazy Saturday night. The talk is about malware analysis. First we will introduce some very famous malware instances that we have collected. Then we will explain how these malware try to fight against anti-malware solutions. Then we propose a method that we call malware runtime forensics, where we analyze the forensic data on these malware. And then we do automatic clustering that helps us identify who is who after all. As I said, I'm Benson, this is Jeremy, and the other speaker, Wayne, actually went back to Santa Clara with one of our colleagues; he actually got so sick on the last day. So Wayne is not here.
First of all, we would like to introduce some Chinese characters. The character in the upper left corner represents the term "malicious" in Chinese. The first malware sample we introduce here is a very famous one called Shops Theater. It helps you generate programs that steal usernames and passwords from common, well-known applications such as Internet Explorer, MSN Messenger, or Firefox. So it's more like a malware generator. The next one is called Bifrost. This one is also very famous. As you can see on this C&C console, it has 28 users connected back to it, so there are 28 victims. The one you see up here is a Chinese version, originally derived from Bifrost by Chinese hackers. And if you notice, these characters are in traditional Chinese rather than simplified Chinese, so it got localized by Taiwanese hackers.
In the block you see here, you can specify where your C&C console is, and whether you want to pack the resulting malware or not. You can also specify whether you want to add more plugins such as keyloggers, and so on and so forth. So it makes it easy to generate malware that steals from the victims you infect.
The second phrase we're going to introduce in the talk is rouji. This one represents chicken. At DEF CON you see the Wall of Sheep, while in the Chinese hacker community they don't call them sheep, they call them chickens. So you actually see a farm of chickens instead of a Wall of Sheep. And this is the farm of chickens. So obviously you know where it comes from, right? On the Wall of Sheep at DEF CON we only show usernames and passwords, but in the farm of chickens you get to see the victim's face through the webcam. Okay.
The modern advanced malware are so powerful that they have many advanced features. These include anti-blah-blah-blah features, such as going against anti-virus, going against the firewall, and going against your HIPS. And there are more features if you are willing to pay more. If you pay more, you get to put more features into the malware that you generate. So it's more like malware as a service in this community. In addition to fighting against anti-virus, the firewall, and the HIPS, you also have anti-VMware, anti-debugger, and anti-API-hooking. These are the features you can enjoy once you pay more.
Now we will also talk about how they fight against the sandboxing environment, and how they detect whether they are in a sandboxing environment or not. These are the approaches that they use. The first one is how they detect whether they are inside a VM or emulator environment.
Usually one approach is to check the base addresses of the IDT and LDT, because on a normal machine the IDT and LDT addresses start with a particular value. If they don't start with that value, the code is probably in a VM environment. They can also check the devices, because in a VM environment or emulator the CPU IDs tend to be very unique, the model names tend to be very unique, and there is also a very specific PCI device. So by checking against these special hardware specifications, they can recognize whether they are in a VM environment or emulator. And finally they can try to launch backdoor commands, because these undocumented backdoor commands are only available when you are inside an emulator environment. So it's very easy for malware to recognize whether it is in a VM environment or emulator.
In addition to VM and emulator detection, they can also check whether they are in a sandboxed Windows environment or not. In a sandboxed Windows environment you tend to have specific services and specific process names, so they can check whether those specific process or service names are there. They can also try to detect whether kernel-mode SSDT hooking or user-mode API hooking is present. The last one is a very trivial one, but it works very effectively: they actually have a list of legitimate Windows product IDs. The sandbox vendors have to have legal licenses for their Windows, so they have a limited set of them. This limited set has been collected by the hacker community, so the malware knows whether it is inside a sandbox simply by checking against these legitimate licenses. So we have introduced how modern malware try to detect whether they are in a sandbox, VM, or emulated environment.
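To make the process-name and product-ID checks just described concrete, here is a minimal Python sketch written from the analyst's side: it reports which telltale signs a lab machine exposes. The indicator values and the function name `sandbox_indicators` are hypothetical placeholders, not data or code from the talk.

```python
# Hypothetical indicator lists; real ones would come from your own lab notes.
KNOWN_SANDBOX_PROCESSES = {"sandbox_agent.exe", "monitor_svc.exe"}
KNOWN_SANDBOX_PRODUCT_IDS = {"00000-000-0000000-00000"}


def sandbox_indicators(process_names, product_id):
    """Return the reasons this environment would look like a sandbox."""
    reasons = []
    # Windows process names are case-insensitive, so compare in lower case.
    running = {name.lower() for name in process_names}
    for hit in sorted(running & KNOWN_SANDBOX_PROCESSES):
        reasons.append(f"process:{hit}")
    # The product ID check is a plain membership test against a known list.
    if product_id in KNOWN_SANDBOX_PRODUCT_IDS:
        reasons.append(f"product_id:{product_id}")
    return reasons
```

An empty result means none of the listed signs are visible; it does not mean the environment is undetectable, since the hardware and hook checks mentioned above are not modeled here.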
What's even more is that they can also try to defeat you even when you have monitors, even when you hook and try to detect them. What they can do is restore the SSDT hook. They can unload the notify routines for process, thread, image, or registry. They can unload the file system filter, unload the TDI filter, which is for networking, and they can also remove the devices attached to NTFS. So you can see there are a lot of things they can do to unhook you and neutralize your monitors. There are also some HIPS that implement their protection layer at the file system level. However, these modern malware can launch direct raw disk access through disk.sys, and then they bypass your file system protection.
So what we propose in our talk is actually behavior analysis, and then we do forensics. There are two approaches to behavior analysis. One is the network-based approach, which we will not address in our talk, because the approach that we take is host-based. For the host-based approach, as we already mentioned, you either use a VM or you use a sandbox, but as you already know, malware can defeat these very easily. So what we do is host-based runtime forensics. In our runtime forensics we let the malware execute, and then we do forensics after it has already executed. We collect a snapshot of the environment and then we try to identify the special features in that environment. Because we already let the malware execute, we don't need monitors to see what happened during its execution. We also don't have to do any hooking, through the API or otherwise. So there is nothing for the malware to defeat. You might be wondering what kind of features we try to get when we study this snapshot of an environment.
There are three aspects of features that we try to collect and analyze. The first is the installation remnants. These are what is left on the system after the malware executes. They could be startup files, or they could be additional registry keys. We also study the memory layout, to find suspicious memory blocks or suspicious DLLs. And then we also look for suspicious malicious behavior inside the system, for example a hidden process or a hidden file. These are all symptoms of malicious behavior that legitimate software would not normally exhibit on your system.
Among the three aspects, we will further explain how we study the memory layout, because this is the hardest one to work with. First we have to identify which process is suspicious. The way we do this is that we fetch the process and service list ourselves, and then we compare it against the process list that the system already reports, and we find the difference between these two lists. To fetch the processes, we scan all the tables you can see up here on the slide. By doing this, we identify the hidden processes, the processes that are very suspicious.
Then we have to dig further. We have to find which part of the process has the suspicious DLLs. So we fetch the DLLs from the LDR, and then we analyze the memory layout and its structures and scan the code blocks, which we will explain later, to find the suspicious DLL. As you see on the graph, a process normally involves a lot of DLLs that are implicitly linked. But if there is a DLL that is explicitly linked, it's very likely that it's a suspicious DLL. That is how we identify a suspicious DLL: we check the LDR and then we scan the import table.
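The two comparisons described above, a low-level process scan against the list the system reports and loaded DLLs against the import table, both reduce to set differences. Here is a minimal Python sketch under that assumption; the function names are hypothetical, and the inputs stand in for data a real tool would read from kernel tables, the LDR, and the PE import table.

```python
def hidden_entries(low_level_view, api_view):
    """Entries found by the low-level scan but missing from the normal listing."""
    return sorted(set(low_level_view) - set(api_view))


def explicitly_loaded(loaded_modules, import_table_modules):
    """Loaded modules that the import table does not account for."""
    # Module names are compared case-insensitively, as on Windows.
    imported = {name.lower() for name in import_table_modules}
    loaded = {name.lower() for name in loaded_modules}
    return sorted(loaded - imported)
```

For example, `hidden_entries([4, 312, 1337], [4, 312])` flags process ID 1337 as hidden. Note that a real import-table comparison would also need to follow the imports of each dependency, because those are implicitly linked as well; this sketch only compares against the list it is given.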
That is how we try to find the suspicious DLL. On the other hand, if the process does not load a DLL at runtime but does code injection instead, then our approach is to search the memory and try to find the suspicious PE image. So in this flow chart, let's review. We identify the suspicious process, and then we identify the suspicious DLL. Inside the suspicious DLL we do our runtime forensics, which includes the LDR scan, PE packer signature checking, code disassembly, string extraction, and file inspection for hidden files. These are the action items for runtime forensics once we have identified the suspicious memory layout.
In the following slides, we will introduce some examples that we have collected in the wild that use all these kinds of anti-malware techniques. The first one is Bifrost, which you have already seen in the previous slides. This is a very famous bot used in the Chinese hacker community. Its Chinese name is Rainbow Bridge, and the phrase comes from the icon it uses. You can see the icon is a rainbow bridge, so that's its Chinese name. Bifrost uses code injection. It's not DLL injection, it's code injection. So the only way to find this suspicious code injection is to scan the memory and try to find the PE image. You will notice that it does code injection into the IE browser, and then it accesses a series of URLs. If you notice, the first one it accesses is Microsoft.com, to check whether network connectivity is available or not. Then it accesses some C&C servers. These C&C servers are named after Chinese vocabulary. For example, the one that we highlighted is called "May I Know?" It asks a question: May I know? Case number two.
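The memory search for an injected PE image mentioned above can be illustrated with a small Python sketch that scans a byte buffer for the standard PE layout: an `MZ` DOS header whose `e_lfanew` field at offset 0x3C points to a `PE\0\0` signature. The function name `find_pe_images` is hypothetical, and a real tool would scan the memory regions of a live process rather than a single buffer.

```python
import struct


def find_pe_images(buf: bytes):
    """Return offsets in buf where a plausible PE image header starts."""
    hits = []
    pos = buf.find(b"MZ")
    while pos != -1:
        # The DOS header is 0x40 bytes; e_lfanew (offset of the PE header)
        # is the little-endian 32-bit field at offset 0x3C.
        if pos + 0x40 <= len(buf):
            (e_lfanew,) = struct.unpack_from("<I", buf, pos + 0x3C)
            pe_off = pos + e_lfanew
            if pe_off + 4 <= len(buf) and buf[pe_off:pe_off + 4] == b"PE\0\0":
                hits.append(pos)
        pos = buf.find(b"MZ", pos + 1)
    return hits
```

A hit inside a region that no loaded module accounts for is the kind of finding that would point to code injection; deciding which regions are accounted for is outside this sketch.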
It's also a very famous one in the Chinese technical community. It's called GhostNet. The original official report, Shadows in the Cloud, was released by Canadian researchers. From the sample, you can see that it tries to install itself as a system DLL file, and then it connects to a C&C server. When we reversed these binaries, we also identified the botnet commands that it can use. So you see all the botnet commands that this GhostNet can use. Notice that this GhostNet has nothing to do with the Gh0st RAT malware. The string at the bottom is how we found out its original project name on the malware author's computer. It's actually called CXP. That's the original project name before this malware was compiled. The URL that we marked is one of the C&C servers that this malware sample tried to connect to, and this C&C server is also mentioned in the Shadows in the Cloud report.
Case number three is the DFN666.net samples that we collected a few months ago through the mass SQL injection attack that was launched recently, starting in March. These are massive SQL injection attacks. This malware uses a ShellExecuteHook, meaning that it infects your Explorer, so that once you get infected, it tries to connect to all these online games and steal your usernames and passwords for them. We have a very detailed story about this massive SQL injection on our blog. And this one is still ongoing. Even though its first occurrences were back in March, it's still ongoing.
The last one is the ZeuS bot. It's another malware that we studied, and this one is very famous as well. You can see that it uses Winlogon Notify. So through the registry key it infects every process you have on the system, then it connects to a C&C server, and then you become a bot.
The ZeuS bot is so famous that it even has its own Wiki page. One of the security vendors even named it the king of bots. Its botnet size is around 3.6 million.
So now we will go into how we do malware clustering. We already know so much about malware. We know how to study them, we know how to do forensics on them, and we know all their techniques. Then how do we do clustering? And you might ask, why do you have to do clustering when you already know it's malware? The reason is very simple: you can then group those that are alike together. The way we do malware clustering is that we compute the similarity between these malware instances. Our final result is a similarity matrix, where each score indicates how similar any two instances are to each other. So we build a similarity matrix of the malware samples that we collected.
Our test set has more than 400 malware samples. We produce forensic reports on all of them, and then we extract the three significant features that we just mentioned: installation remnants, memory layout, and suspicious malware behavior. All these features have a score. For every malware that has this score, we compare it against another malware instance, and you get a similarity. When we compare it with every malware instance, you get a vector of similarities. And finally you get a matrix of similarities. This is how we come up with the matrix. We automate this process through our tool called Zero Box. Zero Box scans all the malware samples that we have, executes them, studies them, gets a report, comes up with the matrix, and then does the clustering. And then you are able to sort the samples into different groups.
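As an illustration of how a similarity matrix like the one just described can be built, here is a minimal Python sketch. The talk does not say how the feature scores are computed, so this sketch assumes each sample is summarized as a set of feature strings and uses the Jaccard index as the similarity; that choice, the function names, and the feature strings are assumptions for illustration only.

```python
def jaccard(a, b):
    """Similarity of two feature sets: shared features over all features."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty reports are treated as identical
    return len(a & b) / len(a | b)


def similarity_matrix(samples):
    """Return (names, matrix); row i is the similarity vector of sample i."""
    names = list(samples)
    matrix = [[jaccard(samples[x], samples[y]) for y in names] for x in names]
    return names, matrix


# Hypothetical forensic reports reduced to feature strings.
reports = {
    "sample_a": {"reg:winlogon_notify", "hidden_process", "dll:injected"},
    "sample_b": {"reg:winlogon_notify", "hidden_process"},
    "sample_c": {"file:startup_folder"},
}
names, matrix = similarity_matrix(reports)
```

Each row of `matrix` is the vector of similarities for one sample, which matches the description above of comparing one instance against every other instance.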
In our test samples, we actually have 408 malware samples that we collected on the 8th of July this year. We have a product called HackAlert that scans websites for Trojans that are already on them, and that is how we collected all these malware samples. We also tested these malware samples against some existing antivirus products, and the hit rate is not really that good. Some vendors detect less than 20 percent. Some vendors reach 50 percent. So these malware are really new.
We do the clustering through K-means clustering. We will not go into too much detail on why we chose this one, but the clustering mechanism we use is K-means. And you see the similarity matrix, and you see a lot of colors. The reason it has a lot of colors is that we map these scores into RGB values, because we want to visualize it. Humans have more feeling for pictures, so we try to visualize it. If you look at the first block, you see some patterns that are alike. Maybe you don't, but we will get into that.
So this is the first block. Okay. Did you see any color patterns? You definitely see a lot of color patterns, but there are some blocks that differ from the other blocks. For example, if you look at the second line, in the upper corner there are some gray colors. Then it becomes a little pink, or a skin color. Then it becomes more purple. And then it becomes more like pale white again. So it seems that you can see four blocks within this big block. Simply by visualizing it, you can already group these colors into four blocks. I hope that you guys are not feeling dizzy. Okay. Yeah. And every line, as we mentioned, is a similarity vector: how this malware instance compares against the other malware instances. So every line represents a malware instance. And then we use the antivirus to double-check whether it's malware or not.
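For readers who want to see the K-means step on similarity vectors, here is a minimal pure-Python sketch of Lloyd's algorithm, plus a mapping from a score to a color. The deterministic farthest-point seeding and the grayscale mapping are assumptions made for this illustration; the talk does not describe its seeding or its actual score-to-RGB mapping.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pick_seeds(vectors, k):
    """Deterministic seeding: start at vectors[0], then take the farthest point."""
    seeds = [vectors[0]]
    while len(seeds) < k:
        seeds.append(max(vectors, key=lambda v: min(sq_dist(v, s) for s in seeds)))
    return [list(s) for s in seeds]


def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's algorithm; returns one cluster label per input vector."""
    centroids = pick_seeds(vectors, k)
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def score_to_rgb(score):
    """Map a similarity in [0, 1] to a gray level (an assumed mapping)."""
    level = round(max(0.0, min(1.0, score)) * 255)
    return (level, level, level)


# Rows are similarity vectors: samples 0 and 1 resemble each other, as do 2 and 3.
rows = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.0, 0.1],
    [0.1, 0.0, 1.0, 0.9],
    [0.0, 0.1, 0.9, 1.0],
]
labels = kmeans(rows, 2)
```

Rows that receive the same label also look alike when each score is rendered as a pixel, which is the effect the colored blocks on the slide rely on.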
You can see all the different names used for this malware by the antivirus vendors. All these malware instances happen to be in the same malware family, called Zbot, the ZeuS bot. So the first, upper block is all Zbot, and the second block is all Zbot too. So all of these 26 in total are Zbot instances. Because there are some Zbot variants, instead of only believing the antivirus, we did manual inspection on our own. We double-checked that these really are Zbot instances, and then we tried to find their versions so that we could tell which variants they are. There are actually different bot commands in the different Zbot binaries, so you can tell the versions apart. Version one has these commands. Version two has another set of commands. So after our manual inspection, we were able to identify the different Zbot variants. The upper three blocks all belong to version two of Zbot, and the one at the bottom is Zbot version one.
Based on our experimental results, out of the 408 malware instances, we manually identified 52 Zbot instances. Among these 52 Zbot instances, they can be clustered into four groups. One group is the V1 variant and three groups are V2 variants. When we compare these clustering results against the antivirus results, they match each other. So our malware clustering has the same accuracy as the antivirus. But the antivirus tools are only able to identify 26 of them, while we can automatically cluster all 52 instances. So even though we don't have signatures, because we know how to cluster them and their colors tend to be similar, we are able to say that, okay, these malware binaries are actually in the Zbot family. And in that big picture we only looked at the first block, right? There are other blocks in that picture.
There are also other malware families, including the Vundo family and also the Bego family, which is also a very famous malware. These malware instances also tend to be very similar. They have the same color patterns.
So the conclusion is that, as you can see, the traditional hooking-based monitoring approach can no longer handle modern advanced malware, because these modern advanced malware know how to use a lot of anti-malware techniques, including many of the techniques we mentioned in the previous slides. So instead, our approach is to do forensics after the malware has already executed. We only need a snapshot of the environment. The experimental results also show that with our runtime forensics we are able to collect a lot of significant features. These significant features help us do automatic clustering of malware. So it also helps us identify even unknown malware, because we are able to put them into the groups that they are most alike. So that ends our talk. And if you have any questions, you can also write emails to us.
Yeah, surprisingly, actually a normal PC. Yeah. Yeah. On average, we try to limit that to less than 30 minutes. Yeah. So he asked whether we use a normal PC or an emulated environment. Our answer is that we use a regular PC because, as we mentioned, there are so many anti-VM and anti-emulator techniques that malware can use. And normally, from restoring the regular PC and booting it up until we finish all the forensics, it takes less than 30 minutes. Yeah. So we're taking a snapshot. After the malware has already executed, we execute our binary. We launch our program, and after we launch it, we look for the three significant features we mentioned. So we study the startup keys, the installation files, the memory layout, and the suspicious behaviors. It's still at a very early prototype stage. Yeah.
And we are trying to make it more mature so that we can release it to the community. Yes. Yeah. And that's why we have to restore it. Yeah. Yes. Yeah. It's possible. This is why all these kinds of forensic tools have to stay anonymous, so that they don't end up on a blacklist. Yeah. Once you make it into a product, they will try to stop you right away when you try to install it. Yes. Yes. Definitely. Yeah. Yes. Yes. Yeah. We will try to make it public; we are still in a rush to finalize it. Okay. Thank you very much. And if you have any feedback, we welcome you to write to us. Thank you. See you. Defcon lighting.