I'm honored to be here at DEF CON and grateful to be able to share some research with you, and we're going to do three things. First, we're releasing a tool to attack next-gen AV, and I'll give you the GitHub address; I'm going to describe it today, demonstrate it, and show you how to use it.

So first, let's set the stage. Today we're talking about evading next-gen AV that uses static analysis to detect Windows PE malware. To motivate this, let's first talk about rules and how one might write a rule to detect malware. On this little chart I've plotted a bunch of totally fictitious red dots and blue dots, which represent files described by two made-up features for this presentation: file size and the number of strings contained in the file. Then I hand-craft a YARA rule, shown in that black box, that defines a region of this feature space, file size versus number of strings, that cordons off all of the malware in my dataset. This is nice, but of course it's really easy to break. If I take my malware sample and just add some bytes to the end of the file, bytes that break neither the PE file format nor the function of the malware, then I can break this rule.
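To make that concrete, here's a minimal Python sketch, my own illustration rather than part of the released tool, of the kind of trivial change that defeats a size-and-strings rule: appending bytes after the end of a PE file (the overlay), which the Windows loader simply ignores.

```python
# Minimal sketch (illustration only): pad a PE file's overlay with random bytes.
# Data past the last section is ignored by the Windows loader, so the program
# still runs, but naive rules keyed on file size or string counts may miss it.
import os

def append_overlay(in_path: str, out_path: str, n_bytes: int = 4096) -> None:
    """Copy a PE file and append n_bytes of random data to the end."""
    with open(in_path, "rb") as f:
        data = f.read()
    with open(out_path, "wb") as f:
        f.write(data + os.urandom(n_bytes))

# Hypothetical usage:
# append_overlay("sample.exe", "sample_padded.exe")
```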
So what makes machine learning harder to break, harder to attack? There are a couple of reasons. One is that you can think of a machine learning model as a much more sophisticated and graceful rule: it learns complex relationships from the data automatically. And instead of presenting a brittle cliff between malicious and benign, there's smooth territory where the model can tell you its confidence that a sample is malicious or benign. This is important because it allows graceful degradation: if someone modifies a malware sample, there's a gradual falloff from malicious to benign. That can make it hard to attack.

So can we break machine learning? The short answer is yes, we can. For Windows PE, the idea is that we'd like to take a file that our model knows is malicious with high confidence, make a few subtle changes to the bytes on disk, modifying elements that break neither the PE file format nor the behavior of the malware, and trick our model into thinking it's benign.

It has actually become quite fashionable to break machine learning in recent years. If you haven't seen this image, it's rather famous in the image domain: you can take an image of a bus, which a computer vision model knows is a bus with high confidence, and change the pixels ever so slightly. Even though there's no difference to your eye, the model now thinks it's an ostrich with high confidence. That's fun for images, but there are three takeaways from this kind of adversarial machine learning research, images or not. The first is that all machine learning models have blind spots. So next-gen AV, buzzword, we do it at my company Endgame, has blind spots. Number two: depending on how much knowledge an attacker has about your model, those blind spots can be really convenient to exploit. The research I'm presenting today is actually in the least convenient category, the most inconvenient position from which to attack. And the third takeaway is a little scary: if I find this kind of bus-to-ostrich confusion example for my model, there's a decent chance it will also work against your model. An attacker often doesn't need to attack your model directly in order to find some success rate of evasion against it. That keeps people up at night.

All right, so that's the bus-to-ostrich attack for images. The way it really works is that two things hold. First, the attacker knows everything about your model: he essentially has the source code, he knows the weights, he knows the parameters. Second, it has to be a special kind of model, like deep learning and neural nets, that is fully differentiable. Given that, for my image of a bus I can literally ask the model, "What would confuse you the most? Which pixels should I change?" and it will happily give me an answer. And the good news is that by changing a few pixels, I have not broken what it means to be an image. But think about applying this to PE malware. If I present some model with the bytes of a malware sample, ask it which bytes or features I should change, and then change those bytes on disk, at worst I've totally broken the PE file format, and at best I've totally broken what the malware was intended to do. So, two problems: the attack requires full knowledge of a deep learning model, and the samples it generates are not necessarily working malware, in fact are not necessarily even valid PE files.

A cooler attack is black-box: it doesn't need deep learning, it works against any machine learning model that reports a score to you. It has been investigated by researchers at the University of Virginia, and essentially it's based on genetic algorithms. In a nutshell, these rest on the evolutionary principle of survival of the fittest. I start with a big batch of malware and sort of breed it with benignware: I take structures or elements, in this case from PDF malware, insert elements randomly, mutate the DNA of the malware, and pass it back to the model. If I see the score decreasing, I keep that sample around for the next round, the next generation of breeding. After doing this for two weeks, you can evade these kinds of classifiers. The difficulty is twofold. First, I have to have a model that reports a score, a number between zero and one, not just a malicious-or-benign verdict: it has to say 90 percent malicious or 20 percent malicious. Secondly, in this process it's quite possible that some mutated variant of the malware no longer performs its malicious behavior. So those researchers at the University of Virginia used a sandbox as an oracle to verify that the behavior did not change from before mutation to after. That can be quite expensive, and it's why this kind of attack can take so long.
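Just to make the shape of that black-box attack concrete, here's a toy sketch of the genetic-algorithm loop in Python. The score_malicious black box and the format-preserving mutate operator are hypothetical placeholders, and unlike the real University of Virginia work, this sketch skips the sandbox oracle that verifies behavior.

```python
# Toy sketch of the black-box genetic-algorithm attack described above.
# `score_malicious(sample) -> float in [0, 1]` and `mutate(sample) -> sample`
# are hypothetical placeholders; the sandbox oracle step is omitted here.
import random

def evolve(seed, score_malicious, mutate,
           population=50, survivors=10, generations=100, threshold=0.5):
    pool = [seed]
    for _ in range(generations):
        # Breed: spawn mutated children from the current survivors.
        children = [mutate(random.choice(pool)) for _ in range(population)]
        # Survival of the fittest: keep the children the model scores lowest.
        pool = sorted(children, key=score_malicious)[:survivors]
        if score_malicious(pool[0]) < threshold:
            return pool[0]   # evades the classifier (malicious behavior unverified!)
    return None
```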
So I'm setting the stage here; I hope you can see the picture I'm painting of why it's hard to attack machine learning for PE malware. We want to avoid requiring full knowledge of a deep learning model or any other kind of model. In fact, we don't want to care what kind of model we're attacking, or even whether it is a machine learning model at all. Secondly, we want to make sure that whatever malware we produce by attacking the model maintains the file format and maintains functionality. And thirdly, we want to avoid, where possible, the expense of running things through a sandbox to check that they still work.

So our goal, and there's an AI buzzword in the title, but it's true, is to design an artificially intelligent agent that learns to play a game against your machine learning model, choosing mutations that are known to preserve file format and function. For this we're going to turn to reinforcement learning. And to do that, hopefully without insulting your retro childhood or current retro lifestyle, let me explain the game of Atari Breakout in two sentences. You move a paddle left and right, hoping to bounce the ball off your paddle and send it toward a brick, and every time you knock down a brick, you get a reward.

So how would I build an AI agent for this with machine learning? One way, which has been done by the folks at OpenAI, is to wrap it in what's called a reinforcement learning framework, and it's actually really simple. On one side there's an environment that includes the display of the Atari output, the ability to move the paddle left or right or do nothing, and a scoring mechanism that gives a reward every time a brick is knocked down. On the other side I train an agent, and the agent learns through delayed feedback: given a state of the environment, which is literally a screenshot of Atari gameplay from which it can presumably work out the positions of the ball, the bricks, and the paddle, it has to choose the best action, go left or go right. Based on that, it may eventually receive a reward for an action that led to a point. The basic idea is that after playing thousands and thousands of games, the agent can learn an answer to the question: what action is most useful given a screenshot of Atari gameplay? This is a fun problem; you can go to OpenAI's website and download an Atari agent that will be better than you at Breakout.

We're going to change this to play a new game, but let me first describe why we wrap it in reinforcement learning at all. In the Atari example, when I move my paddle right, there's no reward for that; I get no points. I move it right again by chance, move it left by chance, move it right, and by some stroke of luck the ball bounces off my paddle. Again, no points. I move right again, and eventually the ball goes and breaks a brick, and I get a point. In isolation, none of those moves resulted in a reward, but because of that eventual reward, I credit the whole sequence of actions as having provided some useful benefit. And this very same concept is what we're going to use to break next-gen AV.
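In code, that whole interaction is just a loop. Here's a generic sketch using the classic OpenAI gym API of that era (it assumes the Atari environments are installed); the purely random "agent" is a placeholder for a policy that would actually learn from the delayed rewards.

```python
# Generic reinforcement-learning loop against an OpenAI gym environment,
# using the classic gym API. The random action choice stands in for a real agent.
import gym

env = gym.make("Breakout-v0")              # Atari Breakout environment
for episode in range(10):
    observation = env.reset()              # state: a screenshot of the game
    done, total_reward = False, 0.0
    while not done:
        action = env.action_space.sample()             # placeholder: random agent
        observation, reward, done, info = env.step(action)
        total_reward += reward             # reward only arrives when a brick breaks
    print(f"episode {episode}: score {total_reward}")
```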
So here's the new game. Instead of a screenshot of Atari pixels, the state will be a malware sample. The scoring mechanism will be my next-gen AV, except that instead of a score, it's going to say either "yes, I believe you're malware" or "no, I believe you're benignware." The agent will learn to select from a buffet of manipulations that are known to preserve the file format and the function of the malware, applied statically to the binary on disk. By playing thousands and thousands of games, the hope is that the agent can learn basic strategies: given this kind of malware sample, I should add an import, or append to the overlay, or create a new entry point and use a trampoline to jump back to the old entry point, things that hide the presence of malicious activity by creating camouflage in the binary.

We're releasing a tool to do this: you can go to GitHub at endgameinc/gym-malware and download some very rudimentary code to do just that. We've provided the elements of this gameplay; it's literally a gym that plugs into the OpenAI gym framework for creating your own reinforcement learning agent, and we've provided some very basic agents to begin with.

It works like this. In the case of Atari, the state was a screenshot. In our case, the state is a feature vector that coarsely, and imperfectly, summarizes the state of the malware: what does the sample I'm using to attack the next-gen AV look like right now? That feature vector is based on general file information, header information, section characteristics, strings, file bytes, and file entropy, things that are often used in static malware classification by next-gen AV. We feed that into a neural network that learns, given this state, which action is best.

The actions the agent can choose from are, right now, just a few options on the buffet: create a new entry point, create sections, add bytes in places that don't break the file format or functionality, or apply operations like packing and unpacking that don't change the behavior of the malware but change how it's presented to a static classifier. For the binary manipulation we're using a very cool tool called LIEF, the Library to Instrument Executable Formats by Quarkslab, a shout-out to them.

Finally, also included in the repo is a sort of toy next-gen AV model. It's a decent toy, worthwhile to attack and see how it performs. But the key here is that the game doesn't care what you put in that black box. It could be our toy model; you could rip it out and put in your own next-gen AV model; it could be a traditional antivirus engine, whatever. At the end of the day, you just have to retrofit it so that it reports a zero or a one: one for malicious, zero for benign.
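The only contract the environment needs from whatever you put in the black box is that binary verdict. A sketch of that wrapper, with a hypothetical scoring function standing in for your detector, might look like this.

```python
# Sketch of the black-box contract: whatever detector sits inside just has to
# report 1 (malicious) or 0 (benign) for raw PE bytes. The scoring function
# and threshold below are hypothetical stand-ins for your own model or engine.
class BlackBoxDetector:
    def __init__(self, score_fn, threshold=0.5):
        self.score_fn = score_fn          # e.g. a trained classifier's probability output
        self.threshold = threshold

    def predict(self, pe_bytes: bytes) -> int:
        """Return 1 for malicious, 0 for benign."""
        return int(self.score_fn(pe_bytes) >= self.threshold)

# A traditional AV engine could be wrapped the same way, as long as its verdict
# is mapped onto {0, 1}.
```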
All right, let me demonstrate how this worked on some samples. But first, just to drive home how hard this is: the agent has a very incomplete view of the world. It inspects the malware through a feature vector that is noisy and not at all perfect. Its actions are stochastic in nature: in Atari I can say "move right," but I don't know how far the paddle will move, and there's a similar thing here. I'll say "add an import," but the environment chooses randomly from a list of known benign imports. And furthermore, I know nothing about the model I'm attacking. So this might be a little like trying to traverse a maze without a map, wearing kaleidoscope glasses, while intoxicated, which probably describes a few of you, I don't know. This is a really, really difficult problem. Nevertheless, we hope the agent can learn.

You probably can't see this, I can hardly see it myself, but what I'm showing here is the training output and two examples. At first, the agent is just guessing at random and getting nowhere. After I wait several minutes, the agent, through its exploration process, catches a lucky break: it creates a new entry point, which evades the machine learning model on that malware sample, and it updates what it has learned about evading the model. By getting lucky enough times over tens of thousands of games, even with the sort of rudimentary model we put in place, it begins to learn to break next-gen AV.

So here are the results. In one minute, given a batch of malware samples that neither the agent nor the model had ever seen before, our agent, which had learned to play the game against the next-gen AV, modified those samples, and 16% of them snuck past. Furthermore, remember how I said I don't necessarily have to attack your model to bypass a different model? We uploaded those samples to VirusTotal, both pre-modification and post-modification. Pre-modification, 35 out of 62 engines caught those samples. After our agent got hold of them, there were 10 additional antivirus engines that whiffed on those malware samples. So that's pretty cool. We also ran purely random mutations, because we wanted to make sure the agent was learning something and not just getting lucky all the time. We did the lucky experiment, and it turns out lucky is pretty good too, but the agent is about 50% better than lucky.

All right, we're done. The summary is this. You can go to GitHub at endgameinc/gym-malware and try this game for yourself. No knowledge of the target model is needed; it will manipulate raw binaries and produce new binaries this world has never seen, some fraction of which may evade your machine learning model. I hope people will contribute and make it better; we use these things at Endgame to help harden our models. Stepping back a moment, it turns out that machine learning is actually fairly robust: even under direct attack, the machine learning models warded off most of these attacks. Nevertheless, all models have blind spots, so don't buy into the hype. And with that, thank you.