Hi, I'm presenting Adversarial Example Witchcraft, or how to use alchemy to turn turtles into rifles. I'm Heather Lawrence, I do data science at the Nebraska Applied Research Institute. You can find me on Twitter under InfosecAnon. If you didn't get a chance to visit the slides online, I have them on my Twitter. And I'm also not above bribing my audience, so I have stickers back there and cards in the event that you want them, right back where this goon is, waving his hand.

Alright, cool. So here we see a video of Google's state-of-the-art Inception v3 model. You see a printed turtle, right? It's obviously identified as a turtle. And then, after making changes to the texture map, the classifier believes it's a rifle with high confidence from every angle. Ooh, look at that rifle, isn't it so pretty? Most of the research in this space has focused on manipulating image classifiers, because it's easier to tell visually that an effect is occurring. And we watched this video, we saw it being identified as a rifle, so I'm going to motivate the real question of this talk, which is: what happens when an autonomous system cannot tell the difference between a turtle and a rifle in a surveillance state? Just kind of marinate on that one for a second.

I didn't write this for machine learning experts, I wrote this to be approachable, so I'm going to define some terminology. A classifier is a style of machine learning algorithm that determines the class of a piece of data. I might say SVM, which stands for support vector machine; it's a type of algorithm, and you don't need to know any of the math behind how it works. When I say perturbation, I basically mean adding noise, and that's it. It's a very fancy word that means adding noise. And an adversarial example is a worst-case example presented to an algorithm. My outline goes like this: brief history, types of attacks, what blind spots are, which is pretty important to motivate.
Adversarial examples, what they are, how to defend against them as far as we know, white box versus black box techniques, a demo, and then resources at the end.

So in 2004, Dalvi et al. released a paper called Adversarial Classification. It was in the spam detection domain, and it outlined a formal game between an attacker and a defending classifier, trying to determine which one could fool the other. Then Huang et al., in Adversarial Machine Learning, defined a formal taxonomy regarding the attacks that are possible. And by 2016, it gets interesting, because now we've moved beyond the theoretical, and I don't even need access to your classifier to attack it anymore.

So we have poisoning versus evasion: poisoning happens before training, and evasion happens after training. You'll notice here, from this Biggio et al. paper, part of the MNIST data set, which, if you aren't familiar, is a huge image data set of handwritten digits. The idea is that the classifier is trying to properly determine what each handwritten digit is. So they added some noise, and you'll notice the classification error, the validation error, shot up after they added this noise. And in the evasion attack, after training, you see a bus; we add some noise, and now it's an ostrich. That looks like an ostrich, right? Yes, yeah.

All right, so the types of attacks. We have causative, where you manipulate the training data before training, if you have that kind of access. You have data poisoning, where specially crafted attack points are injected into the training data, again before training. Or exploratory, where you're trying to exploit the classifier, to figure out how it works after it's already been trained. And a hybrid is a mixture of those attacks. All right, this is probably the important part of this: what is a blind spot, and why do I care about it?
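To make the poisoning idea concrete, here's a minimal sketch with toy data and a hand-rolled logistic regression (everything here is invented for illustration, not from any real detection system): specially crafted points injected into the training set before training drag the decision boundary and drive up error on clean test data.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Two well-separated toy classes: 0 around (-2,-2), 1 around (+2,+2)."""
    X = np.vstack([rng.normal(-2, 1, (n, 2)), rng.normal(+2, 1, (n, 2))])
    y = np.array([0] * n + [1] * n)
    return X, y

def train_logreg(X, y, lr=0.1, steps=2000):
    """Plain logistic regression fit with gradient descent."""
    w, b = np.zeros(2), 0.0
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(X @ w + b)))
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

def accuracy(w, b, X, y):
    return float((((X @ w + b) > 0).astype(int) == y).mean())

X_train, y_train = make_data(100)
X_test, y_test = make_data(100)

w, b = train_logreg(X_train, y_train)
clean_acc = accuracy(w, b, X_test, y_test)

# Causative attack: inject specially crafted points *before* training --
# placed just past the class-1 cluster but labeled 0, so the fitted
# boundary gets dragged across the real class-1 region.
X_poison = rng.normal(+4, 0.5, (150, 2))
y_poison = np.zeros(150, dtype=int)
wp, bp = train_logreg(np.vstack([X_train, X_poison]),
                      np.concatenate([y_train, y_poison]))
poisoned_acc = accuracy(wp, bp, X_test, y_test)

print(clean_acc, poisoned_acc)  # the poisoned model misclassifies far more
```

The same mechanism scales up to the MNIST result described above: the attacker doesn't touch the test data at all, only the training set.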
They are regions in the model's decision space where the decision boundary isn't accurate; basically, areas that are not well defined. I like to use pandas, so let's say I have a classifier and I'm training it on what a panda looks like; I've got a whole bunch of images of pandas. But I want you to think for a second about the entire sample space of what is not a panda, and that I would have to provide all of that to the classifier. I can't exhaust that space in any reasonable amount of time, because the overhead on that is crazy, right? Everything that is not a panda would have to be provided to the classifier. Well, if you don't provide that data, the classifier doesn't know. It has to infer what is not a panda based on what it thinks a panda is. And that's where these blind spots come in, because we don't exhaustively provide the classifier that data. And mind you, this is an ongoing research area, so nobody has definitively proven yet why these blind spots exist; this is theoretically why they exist.

So let me motivate that real quick: here be bugs, right? As introspection into algorithms increases, so do the flaws we find. Like bug bounty programs: if you have more eyes looking at lines of code, you're going to see more errors. If you have more AI experts looking at the algorithms, you're going to see more flaws. The Bureau of Labor Statistics estimates there are about 105,000 information security analysts here in the US, whereas Element AI estimates there are only about 22,000 AI experts worldwide. That's a factor of five in this country alone. So, do you want to get into machine learning? We need you.

All right, so what are adversarial examples? They're data that presents a worst case to the classifier; they're intentionally crafted to make the classifier make a wrong decision.
Some examples, particularly for information security: detecting domain generation algorithms used in command and control infrastructure, and malicious portable executables that are classified as benign. There's actually a paper behind that one, which is really cool. They determined the parts of the executable that could not be perturbed, could not be changed, if it was still going to execute. Then they took all the other bits and perturbed those, and the classifier could not detect the files as malicious. It was like, oh, this is fine, this is benign. So if you're in information security, maybe you remember signature-based detection and how that was a problem. Well, now we have a problem with machine learning-based detection. We're at that next stage of the attack-defense paradigm.

Some real-world adversarial examples: the sticker attack on self-driving cars, where the car cannot identify that a stop sign is a stop sign. Eyeglass frames that fool facial recognition systems, so they cannot properly identify who the person is with a pair of glasses on. Or perturbing audio: you take "it was the best of times, it was the worst of times," add some noise, and the system transcribes "it is a truth universally acknowledged." Those aren't the same at all. And remember how you used to throw salt over your shoulder for good luck? Well, now we're using salt circles to trap self-driving cars. So we are effectively using alchemy to fool AI systems.

So, generating adversarial examples, which is what I've been talking about this whole time: we're adding noise, perturbations, to the sample, and it's optimized with something called gradient ascent. It has to do with derivatives; that part's not particularly important. But it's a method that determines the direction that moves the algorithm's output by the greatest degree, and then nudges the input by small degrees to produce that output. That's a lot of words.
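In code, the gradient idea looks like this. A minimal fast-gradient-sign-style sketch against a toy logistic regression (the weights and "image" below are invented for illustration, not from any real model): the gradient of the loss with respect to the input tells us which way to nudge every pixel.

```python
import numpy as np

rng = np.random.default_rng(1)

# Pretend trained classifier: p(class 1) = sigmoid(w.x + b) on a
# flattened 8x8 "image". The weights are made up for illustration.
w = rng.normal(size=64)
b = 0.0

def predict(x):
    return 1 / (1 + np.exp(-(x @ w + b)))

# A sample the model confidently (and correctly) calls class 1.
x = 0.05 * np.sign(w) + rng.normal(0, 0.01, 64)

# Fast Gradient Sign Method: for logistic loss, dLoss/dx = (p - y) * w.
# Step every "pixel" by eps in the sign of that gradient -- the direction
# that increases the loss the most per unit of max-norm perturbation.
y, eps = 1, 0.1
grad_x = (predict(x) - y) * w
x_adv = x + eps * np.sign(grad_x)

print(predict(x), predict(x_adv))  # high confidence, then flipped
```

The perturbation has the same tiny magnitude at every pixel, which is why the adversarial image looks unchanged to a human while the classifier's output swings completely.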
Basically, we're just adding noise, special noise, to every pixel, so that when the classifier looks at it, it lands right in that blind spot: I don't know whether to infer that's a panda or not.

So what can we do about adversarial examples? We start building robust algorithms. We know that retraining from scratch increases misclassifications. We know that retraining with disjoint data increases misclassifications. And we're starting to find out that training with adversarial examples reduces misclassifications. If we reduce the weights, or the activation, given to those inputs, we can reduce how much the classifier is affected. We can also choose to keep a human in the loop: do not use autonomous systems unchecked, do not let them do whatever they want without review. Or you can use something called the consensus method: instead of a single trained classifier, we have, say, three trained classifiers that take in the input and come to a decision on whether that input should be trusted or not. These are all methods for trying to make classifiers more robust.

And so we might see our training life cycles change. Anybody in this room who's done machine learning will recognize import data, clean data, test/train split, and then deploy; that's already part of our life cycle. But now, with adversarial examples, we might have to train with them, continue to test with them, and repeat that process to shrink the blind-spot space, and only then deploy.

Unfortunately, some of the early research in this area had really bad attacker assumptions. It's like easy mode: white box assumptions where the attacker apparently has the code, the training data, everything. Who has that kind of access? I don't have that kind of access. And they referenced the information security community for trying to get more robust attacks.
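A sketch of what the consensus method could look like in code. The three "models" here are stand-in linear scorers invented for illustration; the point is the voting wrapper, which only trusts a unanimous decision and escalates disagreements to a human in the loop.

```python
import numpy as np

def make_model(w):
    """Stand-in classifier: a fixed linear decision rule over 2-D input."""
    return lambda x: int(np.dot(x, w) > 0)

# Three independently "trained" models with slightly different boundaries.
models = [make_model(w) for w in ([1.0, 1.0], [1.2, 0.8], [0.9, 1.1])]

def consensus_predict(x):
    votes = [m(x) for m in models]
    if len(set(votes)) == 1:
        return votes[0]   # unanimous: trust the decision
    return None           # disagreement: flag for human review

print(consensus_predict([2.0, 2.0]))    # clear class-1 input: all agree
print(consensus_predict([1.0, -1.1]))   # near-boundary input: models split
```

An adversarial example tuned to one model's blind spot is less likely to sit in all three blind spots at once, which is exactly what the disagreement check exploits.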
But I guess they think we know something, I don't know. Black box research is here now, though; they're starting to assume more constraints. And attacks can be transferred between classifiers; it's called model transferability. I'm going to get into it here in a second. The idea is that an attacker model uses the victim model as an oracle. It queries the oracle over and over again for its classification decisions, takes those decisions, trains a completely separate attacker model, generates adversarial examples from that attacker model, and those examples still work on the victim.

This is from Adversarial Examples in Machine Learning by Papernot, presented at USENIX Enigma 2017. Basically the same idea, but visually represented: the attacker queries the oracle, the oracle returns its classification decisions, the attacker trains classifier B, and the adversarial examples that affect classifier B also affect classifier A.

And this graph you see here on the left, the bunch of boxes, shows which source machine learning techniques affected which targets. On the vertical axis, let's look at LR, logistic regression. If my source was logistic regression, and I used its adversarial examples on a victim model that was also logistic regression, my increase in misclassifications goes up to 91%. That's huge, 91%, look at that huge black column. Decision trees are the worst, apparently. And this is really scary, right? How many classifiers do you know that are logistic regression or support vector machines? These are pretty popular models. What we know is that differentiable models like logistic regression are more affected than models that aren't, so deep neural networks are less affected than logistic regression is. And they make use of something called reservoir sampling.
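The query-the-oracle, train-a-substitute, transfer-the-attack loop can be sketched end to end with two toy linear models (everything here, victim weights included, is invented for illustration; real attacks query an actual deployed model):

```python
import numpy as np

rng = np.random.default_rng(2)

# Victim "oracle": a black box we can only query for labels.
w_victim = np.array([2.0, -1.0, 0.5])
def oracle(X):
    return (X @ w_victim > 0).astype(float)

# 1. Query the oracle on synthetic inputs (label-only access).
X_query = rng.normal(size=(200, 3))
y_query = oracle(X_query)

# 2. Train a substitute logistic regression on the oracle's answers.
w_sub = np.zeros(3)
for _ in range(2000):
    p = 1 / (1 + np.exp(-(X_query @ w_sub)))
    w_sub -= 0.1 * X_query.T @ (p - y_query) / len(y_query)

# 3. Craft an adversarial example against the *substitute* (FGSM step).
x = np.array([0.5, -0.3, 0.2])      # the oracle calls this class 1
p = 1 / (1 + np.exp(-(x @ w_sub)))
grad = (p - 1.0) * w_sub            # gradient of loss w.r.t. the input
x_adv = x + 0.8 * np.sign(grad)

# 4. Transferability: the example also flips the black-box victim.
print(oracle(x[None])[0], oracle(x_adv[None])[0])
```

The substitute never sees the victim's weights or training data, only its answers; because the substitute's decision boundary ends up close to the victim's, an example that crosses one boundary tends to cross the other.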
When you have to query the oracle 10,000 times, somebody's going to notice, right? With reservoir sampling, I can get that down to, say, 1,000 queries and attract less notice, but I still keep a randomized sample of the query space, so it's almost as if I had made all of the queries when I only made a fraction of them.

Notable recent research is very interesting; the space is moving very fast. Right now, people are starting to use a limited-information approach and then testing it on the Google Cloud and AWS classifiers to see how badly they're affected. And they are affected. Some of the papers I have in the resources go into more detail on this and actually show percentages, and they are in the 80s and above. So state-of-the-art classifiers in the cloud are affected.

All right, so the demo. I know the Adversarial Patch talk was given yesterday, but there's an app called TF Classify, a classification app that attempts to classify the things in front of you, one at a time. Point it at this Adversarial Patch and it looks like a toaster. Does that look like a toaster to you? Yes, yes, it looks like a toaster. Man, I don't know what you have, but I want whatever you have. All right, so let's go through this demo. We have a pair of glasses at 67%, 70%, pretty good. Yep, that's a toaster at about 40%. And then it's kind of limited in what it's been trained on, so here's my souvenir, and that's apparently a Granny Smith apple. Good job, guys.

So I have a bunch of references for this talk that I couldn't go into in 20 minutes, and that's why I provided links to the slides; they're written in nine-point font, and I don't expect you to take pictures of that. And maybe you're in this talk and you're like, man, machine learning sounds cool. Here are some resources for you. There's a GitHub page with machine learning for cybersecurity resources, which is amazing.
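For the curious, reservoir sampling itself is simple. A sketch of the classic Algorithm R (not tied to any particular attack toolkit): keep a uniform k-item sample of a stream without ever storing the whole stream, which is how the attacker keeps a representative query set while sending far fewer queries.

```python
import random

def reservoir_sample(stream, k, rng):
    """Algorithm R: uniform random sample of k items from a stream,
    using O(k) memory and one pass."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)      # fill the reservoir first
        else:
            j = rng.randint(0, i)       # inclusive on both ends
            if j < k:
                reservoir[j] = item     # replace with prob. k/(i+1)
    return reservoir

rng = random.Random(42)
sample = reservoir_sample(range(10_000), 1_000, rng)
print(len(sample))   # 1,000 kept out of a 10,000-item stream
```

Each item in the stream ends up in the final sample with equal probability k/n, so the thinned query set is statistically representative of the full one.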
And of course, Andrew Ng's machine learning course is like the go-to thing for machine learning. So, takeaways. Basically, machine learning algorithms can be attacked. Algorithms, like humans, have blind spots. You need to red team your algorithms to increase robustness; otherwise, somebody's going to do it for you, and you may not know. And, like with SQL injection, classifiers require input validation. If your classifier is taking input from an adversarial environment, or a possibly adversarial environment, that is, one where users can put in data, make sure you control the data you accept from those users. Don't allow it to retrain your classifier or otherwise alter it into making poor decisions. Again, you can get these slides here. My name is Heather Lawrence, I do data science at NARI. Thank you for your attention.