Awesome. And for our final talks of the day, we have Hyrum Anderson and Ryan Kovar on competitions in infosec and ML. Really quick, we also want to remind you that afterwards we have AI and Wine, where you'll be able to play around with some of the competitions and data they'll be presenting today. So without further ado, can we get a big round of applause for them?

Thanks. There are two of us: I'm pleased to be speaking alongside Ryan Kovar, and we're announcing two different competitions. I'll be as fast as possible to leave as much time for Ryan as necessary, and I also have a plane to catch. My name is Hyrum Anderson; I'm the Chief Scientist at Endgame. I also want to recognize Zoltan Balazs, who is a co-author of this work and has done most of the legwork to cross the finish line and bring you a competition, very relevant to recent news, for evading next-gen antivirus.

So we'd like to launch a 10-week competition right now. The skinny is this: you will download malware, modify its raw bytes, and try to have it evade up to three machine learning models. When you submit the samples, they must remain functional. And if they do, you will win the best GPU that money can buy for a PC. Okay? That's the skinny. I want to break down in detail what this competition is about, and we totally encourage anybody and everybody to join; I think there are really good reasons to do so.

First, machine learning static evasion. Let me get into a bit more detail. The competition is about static detection of malware using machine learning, and all of the malware samples you'll be provided are scored as malicious by the machine learning models. You'll need to come up with techniques to modify each binary in a way that does not change its behavior or break the PE file format of the Windows executable. For example, as shown on the slide, you might unpack a packed binary, rename a section, or add a new section, attempting to disguise the malware as benign.

So this is an adversarial machine learning competition, and there have been many of these in the image domain. But I'll just note that you'd likely want to leave your Lp-norm constraints at home: this is not the same as a computer vision competition. Some of those techniques could be useful to you, but to be pedantic, images degrade gracefully. It's very easy to add noise to an image and have it still remain an image. If you change bytes in a file, though, you'll very likely break the file, or at least break its functionality. So the threat model here is different, and the constraints on what you're preserving are very different. For this contest you're welcome to follow gradients in the models, but chances are that if you naively perturb byte values, you'll be unsuccessful.

So, the rules, briefly. Step one: you will need Google credentials to register at evademalwareml.io. Once you have registered, you will also agree to terms of service that are simple. Essentially they say: this is real malware, please be careful, and any monster malware you create is your creation and your responsibility. Okay. You will then download 50 malware samples. The samples represent popular families from the last 12 to 18 months, including some that have targeted the financial industry and things like that. When you download the samples, you may also download a full description of three completely open-source machine learning models.
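To make that concrete, here's a minimal sketch of the kind of functionality-preserving edits I'm talking about, written with the LIEF library (more on that tool in my tips later). To be clear, this is an illustrative sketch, not contest code: the file name and section name are made up, and LIEF's exact API can differ between versions.

```python
import lief

# Parse one of the provided samples (the file name is illustrative).
binary = lief.parse("001")

# Rename an existing section: the raw bytes the models see change,
# but the program's behavior usually does not.
binary.sections[0].name = ".txt2"

# Add a new section filled with benign-looking content.
section = lief.PE.Section(".extra")
section.content = list(b"Copyright notice and other benign-looking strings" + b"\x00" * 512)
binary.add_section(section, lief.PE.SECTION_TYPES.DATA)

# Rebuild and write the modified PE, keeping the original file name
# so the graders can map the submission back to the sample.
builder = lief.PE.Builder(binary)
builder.build()
builder.write("001")
```

Edits like these change what the static models see without touching the instructions that actually execute, which is exactly the game here.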
I'll describe those models in a bit more detail shortly. The next step is for you to do your thing. This works best if you do it on your own local resources: you can submit at any time, but there's a time delay when you submit, so it's best to check and iterate locally. The step after that, once you've modified your samples, is to make sure they still work; you will get no points for samples that do not work. Lastly, when you've verified that they work, you'll put them back in a zip file and upload it to the website at evademalwareml.io.

You earn points for evading: for every sample there are three models, and you get one point for evading each model. So with 50 samples and three models, the maximum score is 150 points. You could win with 150 points, so long as you submit before somebody else who might also reach 150, right? And just to note, we would have loved to host this on Kaggle, except they don't really do sandboxes or malware. So instead, following Kaggle's convention, to qualify for the GPU we want you to publish your solution and share it with the community.

Those are the basic rules. Now I want to go through a few tips and tricks to help get you started, for those interested in participating. First, when you download the samples, they will have names like 001, 002, and 003, right? And when you muck with them and upload them, they should retain those names. Otherwise we have no idea which sample was which, because you changed the bytes, okay? You download a zip file, and you should also upload a zip file.

When you upload the zip file, it goes through a few gatekeepers. First, there are sanity checks: does it have the right file names? Is each file a PE file, or did you somehow mess up the MZ header? Does it correspond to the known hashes? If that passes, each sample is analyzed by the three machine learning models, which I'll describe in a moment: MalConv, Non-Negative MalConv, and EMBER. All three are open-source models, and there are papers about each of them. If a sample evades any one of those three models, it goes into a sandbox phase, where we run it on our server and verify that it still exhibits its malicious behavior. As an example, look at the bottom of this graphic. Sample 001 evaded two machine learning models but did not run in the sandbox: zero points, right? It is not functional, zero points. Sample 002 evaded one machine learning model and did its thing in the sandbox: one point. And so on.

Now the models, briefly. The first model, MalConv, is a simple convolutional neural network that operates on raw bytes: bytes in, score out. It consists of an embedding layer, a single gated convolutional layer that acts like an attention mechanism, and a multilayer perceptron. So this is a fully differentiable model, much like you'd see in computer vision. When you go to the GitHub site that's listed there, and also all over the website, you can download the inference code, the weights, the model structure, everything you need to construct any kind of white-box attack you might want to run.
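To give you a feel for that white-box surface, here's a minimal MalConv-style model in PyTorch. This is a simplified sketch based on the paper's description, not the released model: the layer sizes are illustrative, and for the contest you'd load the published inference code and weights from the repo.

```python
import torch
import torch.nn as nn

class MalConvSketch(nn.Module):
    """Simplified MalConv: raw bytes in, maliciousness score out."""
    def __init__(self, embed_dim=8, channels=128, window=500, max_len=2**20):
        super().__init__()
        # 257 symbols: 256 byte values plus a padding token at index 0.
        self.embed = nn.Embedding(257, embed_dim, padding_idx=0)
        self.conv = nn.Conv1d(embed_dim, channels, kernel_size=window, stride=window)
        self.gate = nn.Conv1d(embed_dim, channels, kernel_size=window, stride=window)
        self.fc = nn.Sequential(nn.Linear(channels, channels), nn.ReLU(),
                                nn.Linear(channels, 1))
        self.max_len = max_len

    def forward(self, byte_ids):
        # byte_ids: (batch, length) with values in [0, 256]; 0 is padding.
        x = self.embed(byte_ids).transpose(1, 2)        # (batch, embed_dim, length)
        g = self.conv(x) * torch.sigmoid(self.gate(x))  # gated ("attention") convolution
        g = torch.max(g, dim=2).values                  # temporal max pooling
        return torch.sigmoid(self.fc(g)).squeeze(-1)    # probability of malicious

model = MalConvSketch()
with open("001", "rb") as f:                 # file name illustrative
    raw = f.read()[: model.max_len]
ids = torch.tensor([b + 1 for b in raw], dtype=torch.long).unsqueeze(0)
print(model(ids))  # untrained here; in the contest you'd load the published weights
```

Because everything from bytes to score is differentiable with respect to the byte embeddings, you can follow gradients into embedding space; the hard part, as I said, is mapping those gradients back to byte edits that don't break the file.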
There's also a paper listed at the bottom of the slide, so you can read all about this model. All of this code for the differentiable models, by the way, is in PyTorch.

The second model is Non-Negative MalConv, and it is identical to MalConv except for one thing: it has been trained with a non-negative weight constraint. Essentially, the purpose behind that is to make the model pay attention only to indicators of maliciousness, so that, ideally, it would be harder to fool by adding benign strings to your binary, right?

The third model is EMBER 2.0. EMBER 2.0 is hot off the press; it was released yesterday, and it is an improvement upon EMBER 1.0 from last year. This is still totally open source, and you'll have the whole model. A couple of things are different here. EMBER 2.0 is a very competitive model, and it is a LightGBM model, so it's not differentiable; you'll have to come up with a different set of attacks than following gradients. One more thing: all three models were trained on the same data, the EMBER 2018 dataset, so you'll have a list of all the training hashes too. You essentially know everything about how these models were created.

All right, a few tips. We do hope you form teams and collaborate, but there's just one GPU, so if you're the winning team you'd have to decide how to divide the prize. Please do invest 30 minutes in checking out the code, the models, and the inference code, and getting it all to run locally; it will save you a ton of time compared with relying on submissions to the service (there's a local-scoring sketch after these tips). Do invest 30 minutes in setting up a local Windows 10 64-bit VM; that is the environment in which we'll be checking the functionality of your submitted samples. As always, be safe and be responsible: don't connect your VM to the internet, things like that, okay? When you do upload, it can take up to 30 minutes for us to validate the full suite of samples in your zip file, so it's to your advantage to work offline and then submit.

Okay, more tips, some things you might consider as you're doing this competition. There are a number of things you can do that generally should not break the PE file format. You can append data to the file, like was shown with the universal Cylance bypass. You can add or remove the signature. You can fix the checksum, which is often broken in malicious binaries. You can remove the version info. You can pad code or data with no-ops. There are lots of things you can do, and it's up to you to be creative. Things that aren't allowed, and won't work for you: you cannot make a dropper that drops the original sample; that will not be picked up in our sandbox. You cannot make a self-extracting archive; that will not be picked up in our sandbox either. And I will advise that LIEF, the tool we use in EMBER for instrumenting and modifying executable files, is a fantastic library with Python bindings, and I'd highly recommend it; you could do most of the tips I've listed here using LIEF alone, as in the sketch earlier.

The last tip I'll mention is that there's a Slack channel. So if you're having trouble, or if you want to share ideas, join the Slack channel, which is also listed on the website.
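And to act on that "run it locally" tip, here's a minimal sketch of scoring a modified sample against the EMBER LightGBM model before you upload. It assumes you've installed the ember package from its repo and downloaded a released model file; the model file name below is illustrative, and you should check the repo's README for the exact helper signatures.

```python
import ember
import lightgbm as lgb

# Load the released EMBER LightGBM model (file name illustrative).
model = lgb.Booster(model_file="ember_model_2018.txt")

# Score a modified sample: ember extracts static features from the raw bytes.
with open("001", "rb") as f:
    bytez = f.read()

score = ember.predict_sample(model, bytez)
print(f"EMBER score: {score:.4f}")  # aim below the model's detection threshold
```

Loop that over all 50 samples, and over all three models, before you zip and upload; with the 30-minute server-side validation, local iteration is much faster.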
We also encourage and hope that you'll learn from the wisdom of others, standing on the shoulders of giants. Just Monday, a KDD adversarial ML workshop competition about malware concluded, but theirs used import features only: it was not live malware, and there was no functionality requirement. Still, there are things you might learn from those results. Also concluding this week was the Hack In The Box CyberWeek challenge. Theirs was slightly different: they created reinforcement learning agents to append to the malware and submit those samples. So again, that's a slightly different threat model, but there are things you might learn there that could help you.

And that is it. So I'm concluding in 12 minutes and 15 seconds; rant over. The competition opens exactly right now, and it closes 10 weeks from now, on Friday, October 18th. And I'll just remind you: if you'd like to win the fastest GPU that money can buy for your desktop, you must be able to publish your solution. I'll be here, and Zoltan will be here, so if you'd like to know more about the competition or its details, please hit us up. I hope you can get started tonight at AI and Wine, right after this talk. All right, thank you.