Hello everyone, thank you for coming, and special thanks to the G-Bone guys for the opportunity to give this presentation. First, a few words about the speakers. My name is Rafi, and I'm a senior researcher at Tencent's Yunding Lab. My co-speaker is Kranchi, the director of threat intelligence at the lab. A quick introduction to my team: Yunding Lab is one of the seven labs in the Tencent Security Joint Labs matrix. We focus on building the Tencent Cloud security system: offensive and defensive research, security operation of the cloud network environment, and cloud security products based on cutting-edge technology such as machine learning.

So here is our topic: we will tell a story about hide and seek. The first part of the story is how malware is detected. There are two broad approaches: static analysis and dynamic analysis. The first static method that comes to mind is signature-based detection. On the right side is the malware name; the orange part is the byte pattern; next to it is the hash. A third way is YARA rules, a set of rules and a more flexible way to detect malware. For dynamic analysis, we put the malware into a sandbox or virtual machine and analyze the software as it runs, but this has some shortcomings: it is slow compared to static analysis, the virtual environment can be detected by the malware, and some malware needs specific input patterns to trigger, which are difficult to reproduce.

So how much time does an antivirus product take to detect a new piece of malware? Here is an example from the most famous malware of last year, Bonaparte. At 12:25 a.m., only 16 antivirus products could detect it, and nine hours later only 16 more could. During those nine hours, thousands of computers were infected. So now, some techniques to bypass the detection of antivirus products.
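As a toy illustration of the static techniques just described (byte-pattern signatures plus hash blacklists), here is a minimal Python sketch; the hash and the pattern are invented examples for illustration, not real malware indicators:

```python
import hashlib

# Toy static detector: a blacklist of known-bad file hashes plus
# byte-pattern "signatures". Both entries below are made-up examples.
KNOWN_BAD_SHA256 = {
    # sha256 of the empty file, used here purely as a stand-in
    "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
}
SIGNATURES = {
    "Example.Trojan.A": b"\xde\xad\xbe\xef",
}

def scan(data: bytes):
    """Return a detection label, or None if the file looks clean."""
    digest = hashlib.sha256(data).hexdigest()
    if digest in KNOWN_BAD_SHA256:
        return "known-bad hash"
    for name, pattern in SIGNATURES.items():
        if pattern in data:  # naive substring match on the signature
            return name
    return None
```

Note how fragile this is: flipping a single byte of the file changes the hash, and any transformation of the signature bytes defeats the pattern match, which is exactly the weakness the talk exploits next.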
There are two ways to bypass antivirus detection: change the signature, or hide the signature. To locate a signature, we use CCL or MyCCL to slice the file into pieces and find which pieces contain the signature. Even if we can't remove the signature, we can hide it, using polymorphism, metamorphism, or obfuscation. In our monitoring, almost 60% of malware uses such techniques. Here is an example of polymorphic malware: these two code samples come from the same parent, with the same function but different code, so it is hard to find a signature. You don't even need to do this yourself: you can find AV-evasion services everywhere. On the darknet you can buy DiamondFox for about $370, and in China's underground market you can buy the service for about $260. It's a big problem for antivirus products.

More and more antivirus vendors are embedding machine learning in their product lines. Why? Because machine learning can handle big data. Even with just two features, as in this example, it is difficult to write detection rules by hand, but machine learning can easily find the decision boundary.

Here are two examples from our laboratory of detecting malware with machine learning. First, here are three polymorphic samples: the parent and two variants. We render each binary directly into an image, and we can easily tell the difference between the images and between malware families. That is our procedure: convert the malware into an image and then classify it. It's easy and convenient. The second method is structural entropy, because packed and metamorphic malware is a big problem for antivirus products. In this example, the blue line is the structural entropy of the original file, and the green line is the structural entropy of the packed file.
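The malware-to-image trick described above can be sketched in a few lines. The width of 64 pixels is an arbitrary assumption here; real pipelines typically rescale the image to a fixed size before feeding it to a classifier:

```python
import numpy as np

def bytes_to_image(data: bytes, width: int = 64) -> np.ndarray:
    """Render a binary as a 2-D grayscale image: one byte becomes one
    pixel (0-255), rows of `width` pixels, trailing bytes dropped."""
    buf = np.frombuffer(data, dtype=np.uint8)
    rows = len(buf) // width
    return buf[: rows * width].reshape(rows, width)
```

Because polymorphic variants of one family share large-scale byte-layout structure, their images tend to look visually similar even when the exact bytes differ, which is what makes the image representation useful for family classification.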
We can easily tell the difference between different packers, and we can even learn to unpack them. So machine learning has some advantages over the traditional ways of detecting malware. The traditional way only detects known threats and relies on known rules, but machine learning learns automatically, and with hundreds of thousands of features it can detect new malware and is hard to bypass. But is it unbeatable? The answer is no. We use a machine to beat the machine.

Our method is based on generative adversarial networks. Here is the classic example: an image of a panda plus some specific noise, and the model is confused and misclassifies it as a gibbon. Machine learning models always have overfitting problems; they have blind spots. Another example: we change just one pixel of a picture, and it fools the model. This is a frog; after changing one pixel here, it turns into a cat for the classifier.

But it's difficult to apply this technique to malware generation. For an image, adding some noise still gives an image; we can tell exactly which pixel to change and how to change it, and it remains an image. But if we change bytes and structure in a PE file, it becomes a broken file that can't even run.

Here is some previous work. Earlier research performed gradient-based adversarial attacks on mobile malware classifiers, but the attacker required full knowledge of the model's structure and weights, and most of the time attackers have no access to the architecture and weights of the neural network being attacked. Second, state-of-the-art neural network attacks can be black-box: adversarial examples transfer, so attackers can attack other machine learning models without knowing which model the antivirus product uses. Whether it's linear regression, an SVM, or a neural network, it's all the same. Here is work from last year by Weiwei Hu and Ying Tan: they attacked a black-box detector with a GAN-based method (MalGAN).
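The structural entropy curves discussed above can be computed with a simple sliding Shannon-entropy window over the file; the window size of 256 bytes is an assumption for illustration:

```python
import math
from collections import Counter

def structural_entropy(data: bytes, window: int = 256) -> list:
    """Shannon entropy (bits per byte) over consecutive non-overlapping
    windows. Packed or encrypted regions show up as sustained
    high-entropy plateaus close to the 8-bit maximum."""
    curve = []
    for start in range(0, len(data) - window + 1, window):
        counts = Counter(data[start : start + window])
        probs = (c / window for c in counts.values())
        curve.append(-sum(p * math.log2(p) for p in probs))
    return curve
```

A plain executable shows varied, moderate entropy across sections, while a packed file is dominated by a near-8.0 plateau, which is why the blue and green curves in the slide are easy to tell apart, and why the curve shape itself is a usable feature for a classifier.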
They reduced the true positive rate from nearly 100% to almost 0. But they assumed that the features used by the malware detection algorithm were known; in reality, we don't know exactly which features the algorithm uses. Our goal is to design a machine learning process that can cheat both the bot and the human: we plan to attack black-box machine learning models and also bypass manual malware analysis. Here is an example: it's a static picture, but we see the dots moving. Human vision has blind spots too. We will camouflage malware as a benign file with high similarity.

Here is the PE structure. A PE file has headers and sections. The headers include, for example, the PE header and the DOS header; the sections hold data such as strings, imports, and code.

Here is the architecture of our GAN, which we call the Benign Malware GAN. The structure is almost like a normal GAN, but the difference is the generator: we use an autoencoder as our generator. We train the autoencoder on benign files, and then we feed the malware into the decoder, which generates benign-looking malware.

Why an autoencoder? This image was generated by deepfake, which is also an autoencoder. We know Nicolas Cage is not actually in this film; it's a fake picture, and it cheats and confuses humans. The process is: train the autoencoder on person A's face, then feed person B's face into the decoder, and the generated face of person B looks more like person A. In the same way, we train our generator on benign files, then feed the malware plus random noise into the decoder; this is our generator, and it generates benign-looking malware.

Concretely, we tear the malware apart into strings, code, symbols, and headers, feed these parts to the GAN to generate new strings, new code, new imports, and new headers, and then recombine these parts into a new file: benign-looking malware.
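To make the autoencoder-as-generator idea concrete, here is a toy numpy sketch: a tiny linear autoencoder trained on made-up "benign" feature vectors, then used to pull a "malware" vector (plus noise) toward the benign distribution. The dimensions, data, and training setup are all illustrative assumptions, not the talk's actual model:

```python
import numpy as np

def train_autoencoder(X, hidden=8, lr=0.02, epochs=1500, seed=0):
    """Train a tiny linear autoencoder (encoder W1, decoder W2) by
    gradient descent on mean-squared reconstruction error."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, hidden))   # encoder weights
    W2 = rng.normal(0.0, 0.1, (hidden, d))   # decoder weights
    for _ in range(epochs):
        H = X @ W1                     # encode
        err = H @ W2 - X               # reconstruction error
        W2 -= lr * H.T @ err / n
        W1 -= lr * X.T @ (err @ W2.T) / n
    return W1, W2

# Toy "benign" feature vectors living near an 8-dimensional subspace.
rng = np.random.default_rng(1)
benign = rng.normal(size=(200, 8)) @ rng.normal(size=(8, 16)) / 4.0

W1, W2 = train_autoencoder(benign)

# Camouflage step: push a "malware" vector plus random noise through
# the encoder/decoder trained on benign data; the output is projected
# onto the benign subspace the autoencoder learned.
malware = rng.normal(size=(1, 16)) + 2.0
camouflaged = (malware + 0.1 * rng.normal(size=(1, 16))) @ W1 @ W2
```

The real system works on the disassembled parts of a PE file rather than abstract feature vectors, and uses a discriminator to drive the generator, but the core mechanic is the same: a decoder trained only on benign data can only emit benign-looking output.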
And the results: we ran some tests, and here is one example. Originally, 51 antivirus products could detect this sample; after one epoch, only six could, and the remaining detections were just signature matches.

How can we defend against this attack? You can attack your own model to discover its blind spots and then fix them, and you should avoid exposing the model's output to the outside. For example, CrowdStrike Falcon shows its scores externally; if you expose scores, attackers can probe for blind spots. You can also find which features are robust and pick the most stable signatures to defend against this attack. Thank you.

Moderator: I think we have a question. Go ahead.

Q: You said you trained the model with benign files to generate the code that is overlaid on it. Did you just use random files, or did you use Microsoft libraries, things like that, that seem benign?

A: Random files. For example, if we used Microsoft files, the output would be compared with Microsoft's files; it's still a normal file.

Q: Do the generated files run? Did you unit-test them to make sure they pass all the test cases and run exactly like the original file? You changed them and now they can't be detected, but do they still do the exact same things, or has something changed in the code? For example, if you change the wrong bytes and it hits an incorrect web address, the file is useless, because the malware can't reach its command-and-control interface and gives up.

A: The recombination is done by hand, so a human checks the format, and we run it in a sandbox to check the malicious behavior.

Co-speaker: Adding to that, we did something similar for the benign files that are input to the discriminator, so the output looks benign. Those benign files can be random; any benign files will do, because the model approximates the distribution of benign files.
Co-speaker: And for the malicious one, when generating on the binary code we keep the ones and only change the zeros, so it's like you only add functionality but don't delete anything. That way you presumably preserve the malicious functions and behaviors. But you still need to check.

Q: Just a side question: did you try any of the other GANs, or did you just use the autoencoder? Training GANs has been a difficult endeavor for everybody.

Co-speaker: I think he used a GAN, not a plain autoencoder, right? You used a GAN, not just an autoencoder? Yes, he used a generator and a discriminator instead of only an encoder and decoder.

Q: Wasserstein GAN or just a regular GAN?

Co-speaker: That I don't know. What kind of GAN are you using? My understanding is that GANs are hard for images because it's very hard to generate high-resolution images, but for malware, the binary case, it's relatively easy, because you don't need to care about high-resolution, realistic features. It is still hard, but you can watch the generator's and discriminator's losses. I don't know what kind of GAN he is using, but I think a GAN is better.

Moderator: Any more questions? Okay, let's thank the speaker.