Okay, for our last topic of the week, a super cool one: adversarial attacks against deep learning networks. A number of people have shown that you can make small alterations to an image or to a text and change the output of the neural net dramatically. You can take an image of a cat where the neural net says "I'm 88 percent sure this is a tabby cat," change it, and here's the changed image, which looks identical to me, and now the neural net is 99 percent sure it is guacamole. Now, I'm from California; guacamole is not the same as a cat. So what's going on, and eventually, what can we do about it?

Well, what do people do? They take an image which has been run through a neural net, and the neural net says it's 58 percent sure it's a panda. Then the adversary does some sort of search in the space of perturbations: we take this image plus 0.007 times this other image here, which is found by stochastic gradient descent. Optimizing what? Optimizing the prediction of the resulting image. Here it is, something that looks the same to the human eye but gets a different prediction: in this case the net is 99.3 percent sure it's a gibbon, that's a monkey, not a panda. So again, to be clear, the adversary is doing stochastic gradient descent in the space of perturbations of the input. A perturbation is a small addition made to the original image, and that gradient descent is optimizing an adversarial loss function, trying to maximize the probability that the resulting image is given a different label, such as gibbon (there's a small code sketch of this search below). It works frighteningly well, and it seems like you shouldn't be able to do it as generally as one can.

But here's a cool example of something done in 3D. Make an image of, no, this is a real toy turtle. Whoa, go back, show me my toy turtle. A toy turtle; this is CSAIL. And what you see is that the neural net, with high probability, classifies that as a rifle. They've subtly changed the texture in that one region. The original one's a turtle; it's a turtle either way you rotate it, it's still a turtle. But with the new surface, no matter how you rotate it or flip it, that is, with high probability, a rifle. Whoa, very strange. And that one's an espresso. That's sort of cool: you can make things look like other things. Cool, or scary, or whatever.

So what's going on under the hood? Well, think about what's happening. We have some very high-dimensional space, which is the image of the turtle or the panda or the cat; I've drawn it here as x1 and x2. We have a bunch of black dots, which are turtles, and a bunch of red dots, which are maybe rifles. Why are they so close? Well, remember, we're sitting in a high-dimensional space, and every point is close to every other point in this high-dimensional space. So there is some direction in this high-dimensional space, which is this RGB image, some direction in which it's a short distance from a turtle image to a rifle image. It's sort of a consequence of being in a high-dimensional space: there are a lot of ways you can change this image, and some of them move you closer to something the network thinks is something else. Put differently, in high-dimensional spaces the decision boundary that separates the blacks from the reds, the turtles from the rifles, is a complicated, slightly wiggly, messy surface, because it's a very high-dimensional neural net. So in general, these sorts of neural nets are subject to adversarial attacks.
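To make that perturbation search concrete, here is a minimal sketch in PyTorch of the one-step, sign-of-the-gradient attack behind the panda/gibbon picture (the "image plus 0.007 times another image" recipe). The model, the labels, and the 0.007 step size are placeholders, and the search described above could equally be run for many gradient steps; this is an illustration of the idea, not the exact procedure from the slide.

```python
import torch
import torch.nn.functional as F

def fgsm_perturbation(model, image, label, epsilon=0.007):
    """One-step adversarial attack: nudge every pixel by epsilon in the
    direction that most increases the classification loss."""
    image = image.clone().detach().requires_grad_(True)

    # Forward pass: the network's current prediction (e.g. ~58% "panda").
    loss = F.cross_entropy(model(image), label)

    # Gradient of the loss with respect to the input pixels, not the weights.
    loss.backward()

    # The perturbation: epsilon times the sign of that gradient.
    adv_image = image + epsilon * image.grad.sign()

    # Keep pixel values valid; the result looks unchanged to a human.
    return adv_image.clamp(0.0, 1.0).detach()

# Usage sketch (the model and data here are assumed, not shown):
# adv = fgsm_perturbation(model, panda_batch, panda_labels)
# model(adv).softmax(dim=-1)   # now confidently something else, e.g. "gibbon"
```

The key point is that the gradient is taken with respect to the pixels while the weights are held fixed: the adversary is doing gradient ascent on the loss in input space, the mirror image of ordinary training.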
Now, can one defend against them? Well, yes, really, is the answer, and in some sense the defense looks like something you would think of as regularization: if you regularize well, you smooth that decision surface. We're not going to cover them, but there are techniques like defensive distillation, where you project down so that things become smoother, or feature squeezing, where you take a bunch of input vectors x and collapse them together, such that anything very close to the original vector is mapped to the same point, so you can't make a tiny adversarial change in every possible direction.

More generally, and I pointed to a nice paper on this, it's fun to look at, you can, as the designer of a network, run an optimization which is anti-adversarial. What is the adversary trying to maximize? The adversary is trying to take the input and find a small perturbation to the image that changes the label to something else. As the trainer of the neural network, you can now try to minimize the objective function that the adversary is trying to maximize. And this is, in theory, a horrible search: the adversary is searching over lots of perturbations, one of which will potentially take your input and give it the wrong label, and you are then searching over the weights, you're training the network, searching over the weights to find a weight set with the property that no small perturbation of the input can mess up the output. That's something to think about, but it's just a loss function (there's a short code sketch of this min-max loop at the end), and clever people have shown that you can use gradient descent once again to optimize such a loss function, under some special restrictions, there's always math, on the class of perturbations. One can in fact train the network such that no small perturbation of a given class, for a given measure of "small", can fool it into giving a different output. And this is in fact a form of regularization, because what are you doing? You're in some sense making the weights more robust; you're shrinking things so the network is not as sensitive to these small perturbations. Very cool. The predator-prey competition goes on, though, because you're still only robust against some particular class of perturbations. Have fun, enjoy the homework.
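As promised above, here is a minimal sketch of that min-max loop in PyTorch, assuming images scaled to [0, 1] and an L-infinity ball as the class of "small" perturbations. The inner attack (a few projected gradient steps), the epsilon, and the step sizes are illustrative assumptions, not the recipe from the paper mentioned in lecture.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, epsilon=0.03, step=0.01, n_steps=5):
    """Inner maximization: search the epsilon-ball around x for the
    perturbation that most increases the loss (projected gradient steps)."""
    x_adv = x.clone().detach()
    for _ in range(n_steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + step * grad.sign()
            # Project back into the allowed class of "small" perturbations.
            x_adv = x + (x_adv - x).clamp(-epsilon, epsilon)
            x_adv = x_adv.clamp(0.0, 1.0)   # assumes pixels in [0, 1]
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y):
    """Outer minimization: update the weights on the worst-case inputs,
    so that no small perturbation of this class flips the label."""
    x_adv = pgd_attack(model, x, y)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

The inner loop plays the adversary, maximizing the loss over small perturbations, and the outer step plays the trainer, minimizing that worst-case loss over the weights; this is why every weight update is now much more expensive than in ordinary training.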