Hi, I'm Konda Reddy from the Indian Institute of Science. I'm a PhD candidate in the Video Analytics Lab at the Department of Computational and Data Sciences. Today we're going to talk about adversarial images. There are a number of questions this talk will try to answer. First, I'll introduce them: what are they, and why is it worth the effort to know about them? Then we'll go a little deeper: why do they exist? Then we'll look at recent efforts to generate them and to analyse deep-learned models with them. And finally, if it is a problem, what can we do about it? We'll see a couple of approaches to handle them efficiently.

So what are adversarial images? Adversarial images are perturbed images that look very similar to normal images to us humans, but they are not normal to the models you train, and they efficiently fool those models. That's the definition. For example, on the left you see a normal image. When you present this to one of these classifiers, something like VGG or ResNet, the predicted label is sheepdog, which is correct. On the right you see the same image with a small perturbation added, and the prediction becomes paintbrush, which is absurd. Images like this are adversarial images: we have no problem recognising the object, whereas the state-of-the-art object recognisers fail. If the perturbation is hard to see because of the projector resolution, I have another image where I hope you can see some of the perturbation added. The objective when constructing the perturbation is twofold: it should be visually imperceptible, and it should cheat the classifier. That pair of objectives is what adversarial images are about.

Why do we need to study them? Because they cause all these object recognisers to misclassify with high confidence, and you can even choose the label you want the model to be fooled into. Here is an example from one of the recent papers. On the left you see an image of a panda, which the classifier (I may not be correct when I say VGG, but it is one of the state-of-the-art classifiers) correctly predicts with 57.7% confidence. You add this noise, shown here with very large magnification, and we still have no problem recognising it as a panda, but the model says it is a gibbon with 99% confidence. That is something very severe, so we need to study and understand it.

If you're not convinced: I'm sure most of us are really into driverless cars, so let's see what can happen there. A future driverless car sees a signboard like this; obviously there is a CNN recognising the world around it and trying to infer something from it. Suppose the signboard is an adversarial image, and the CNN reads it as a no-speed-limit sign. That is chaos. You want your systems to be reliable, stable to perturbations, attacks, and any other noise in the acquisition process. So this is a severe problem, and I hope I have convinced you. It affects the deployment of models everywhere, from simple access control to, say, defence applications, autonomous driving, and medical diagnosis. Wherever we have a deep learning model, it is susceptible. And that's not the end of the story: adversarial images demonstrate a very peculiar property.
It's called transferability. Some of you might be thinking this is not always the case, that the attacker needs information about the target model. But you are not safe even if you hide your model from the attacker. That is what transferability means: you don't need to know the target system, and you don't need the data on which the target system was trained. This property, which we refer to as the transferability of adversarial images, makes the problem more complicated.

Say you have a target CNN trained on some database, and you hide both the model and the data. In my laboratory, I can create my own CNN trained on a very different database. Of course, it should match the same distribution: if the target recognises cats and dogs, I can't train mine on humans. But as long as the distributions match, the samples can be different, and the model architecture is completely my choice. I train this model, craft adversarial perturbations for my model and my data, and then take those adversarial images to the target system. And yet, I can fool it with significant confidence a significant fraction of the time. That is transferability, which is also severe.

The story gets bigger. There is recent work, presented probably a couple of weeks back at CVPR, on the concept of a universal adversarial perturbation. One single perturbation, shown in the middle, can fool the model on images belonging to many different categories. You just add this one perturbation and you get adversarial images: the red labels are misclassifications, the blue ones are correct classifications, and most of the time you see misclassification. That is what we call a universal adversarial perturbation. It need not be specific to the image: one universal perturbation, almost imperceptible, can cheat these state-of-the-art neural networks doing object recognition. There are also attacks that fool segmentation and other vision tasks, but in this talk I'll concentrate only on the recognition models.

Just to give an idea of the extent to which they can fool, these numbers are copied from the paper. If I train a VGG-F in my laboratory and attack all these models, this is the fooling rate observed: if the target matches the model you have, you fool it in about 93% of the cases. If you don't know the target model, you get the off-diagonal numbers, and those fooling rates are still significantly high; out of 100 attempts you might fool it, say, 40 times, which is not negligible. The diagonal is when you know the target system, and the off-diagonal is when you attack a foreign system about which you have no information. These are significant numbers, so the story keeps getting bigger.

That work still uses a subset of the training data to compute the universal perturbation. In recent work from our lab, we showed there are objectives that do not require any data at all to create a universal perturbation: without using any image, whether a random image or an image from the distribution of the target data, we craft the perturbation without any knowledge of the data. We call it a data-independent objective for fooling systems.
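Whether the universal perturbation is crafted from data or in a data-free way, applying it at test time is the same: one fixed perturbation added to every image. Here is a minimal sketch of that application step, assuming a precomputed perturbation tensor `uap` and a pixel bound `eps` (both hypothetical names, not from the papers discussed):

```python
import torch

def apply_universal_perturbation(images, uap, eps=10 / 255):
    """Add one fixed (universal) perturbation to a whole batch of images.

    images: (N, 3, H, W) tensor in [0, 1]
    uap:    (1, 3, H, W) perturbation, bounded so it stays imperceptible
    """
    uap = uap.clamp(-eps, eps)          # keep the perturbation small
    return (images + uap).clamp(0, 1)   # stay in the valid image range

# hypothetical usage: the same uap is reused across unrelated images
# adv_preds = model(apply_universal_perturbation(batch, uap)).argmax(dim=1)
```

The point of the sketch is only that the perturbation is image-agnostic: nothing in the function depends on what the individual images contain.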
Coming back to the results, this is the corresponding table from our paper, and the fooling rates are again quite significant. So I hope I have convinced you that this is a severe problem. Now let's look at how these perturbations are actually generated; the why and how will keep coming up.

Let's start with a linear model. This holds for any data and any machine learning system; it's just that we work on visual data, and I've been talking about CNNs for recognition, but it holds for any ML system. Take a linear model whose output is w^T x, and let x* = x + Δx be the adversarial input, where Δx is the perturbation. Then w^T x* = w^T x + w^T Δx, the clean part plus a perturbation part, so the activation grows by w^T Δx. If you want to maximise that growth, clearly the sign of each component of Δx should match the sign of the corresponding component of w. Now suppose there is a limit ε on the contamination, so each component of Δx can change by at most ε. If the components of w have magnitude m on average and w has dimension n, the activation grows by about n·m·ε, which is linear in the dimension. We all know these models work with high-dimensional projections, so as n increases the activation can grow by a lot. This simple example shows that with an infinitesimal change in each component, you can make the system produce a large change in its output.

That is the linear case. Now let's take non-linear models, because none of the models we actually use are linear; the complex models have a lot of non-linearity. Let's take a minute to recall the role of the gradient when we train a neural network. When you train a neural network you have a cost function: say J(θ, x, y) is the cost for a network with parameters θ, input x, and target y. We normally compute the gradient with respect to the parameters and move them in the negative gradient direction, scaled by a learning rate. But what if I compute the gradient of the cost with respect to the input x instead? That gradient has the same size as the input: if x is an image, ∇x J looks like an image of the same size, say the typical 224×224×3. It gives the direction in image space in which the cost for this particular prediction increases, because the gradient always points in the direction in which the function increases, and here the function is the cost. Imagine the 224×224×3-dimensional space in which your data point lives, and you take a gradient-ascent step at this point x: you move a little bit and obtain a variant of x at which the cost is slightly higher.

Let's take an example. Here is x₀: the cost computed at this sample is, say, 0.76, and the confidence given to the correct class, which is soap dispenser, is 0.546.
Out of all the thousand classes, if one class gets that much confidence, that is the label the system will predict. Now I compute the gradient with respect to the input; this is how it looks after magnification, because the actual values are very small. I add this gradient image to x₀ and get the adversarial sample. It looks almost the same, but if I compute the cost now it is 1.18, far from the original, the confidence for the true class is extremely low, and the confidence for some other class has increased. The system is now forced to make a wrong prediction. That is what the gradient can do: it gives the direction in which the loss increases without drastic changes to the input. That is the non-linear case. Here are some examples: the middle column shows the gradients computed for these images, and the last column shows the x* images, the perturbed ones, which are almost imperceptibly different.

To summarise this gradient-based attack: you compute the gradient of the cost with respect to the input, take its sign, and scale it by a limit ε, which is the contamination you add at each pixel. You could also add the raw gradient without taking the sign, but then you don't control how much contamination you add. Adding ε times the sign of the gradient gives you the adversarial image. This is the fast gradient sign method, FGSM, from Goodfellow's 2015 work, and it is how they generate perturbations that are imperceptible yet cheat CNNs efficiently.

To wind up this concept, here is a small illustration of the loss in 1D. You are currently operating at this x with loss J(x). You do gradient ascent, get a direction, move a little bit along it, and add that step; the loss increases. Because of the added perturbation you are now forced to operate at a point where your confidences for all the target classes have changed. This is one very familiar and very efficient approach, because computing a gradient is about the cheapest operation you can think of. There are also more sophisticated approaches. Most of them use optimisation techniques to craft the perturbation under two constraints: first, it must be visually imperceptible, so its magnitude is limited; second, it must mislead the classifier. If you can optimise for a Δ satisfying both, you obtain a perturbation in a different manner. There is a list of such works, and basically those two constraints are what they impose.

Here is one particular example, which I wanted to show because it is so elegant. It is a CVPR 2016 work with a simple intuition: push the sample to the nearest decision boundary. In a simple binary classification, with one class on either side, you are operating at this point; you add a perturbation that moves the projection just across the boundary. You compute such a Δ, called r in this case, by finding the nearest boundary and projecting onto it, and then you add that r. It is very simple, at least in the binary case, and they show it is feasible for complex CNNs as well. It's called DeepFool.
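As a small illustration of the FGSM step just described, here is a minimal PyTorch sketch; `model`, the image batch `x`, the labels `y`, and the bound `eps` are assumed to be given, and this is one possible implementation rather than the original authors' code:

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=8 / 255):
    """Fast Gradient Sign Method: x* = x + eps * sign(dJ/dx)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)   # J(theta, x, y)
    loss.backward()                        # gradient w.r.t. the input, not the weights
    x_adv = x + eps * x.grad.sign()        # one step in the cost-increasing direction
    return x_adv.clamp(0, 1).detach()      # keep a valid image
```

The sign plus the fixed `eps` is what gives per-pixel control over how much contamination is added, exactly the point made above.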
So that is how we can generate adversarial perturbations. Fine, we have established ways to cheat these models. Now let's go a little deeper: why do they exist in the first place? We have always thought of these models as extremely non-linear: you have, say, a ResNet with hundreds of layers and a non-linearity such as a ReLU everywhere. So you'd think the model is so non-linear it can fit any classification function. But wait: all we did was a simple linear approximation to get the gradient, and then x plus a step along it. That says something about non-linearity; maybe the models are not as non-linear as we think. The fact that you can craft a perturbation by assuming the loss function is locally linear says something about the commonly held opinion of these being highly non-linear models. That is something to be investigated, but certainly, if the model were that non-linear, you would not be able to fool it with simple gradient information.

There are a lot of other speculations, hypotheses supported empirically but not proven mathematically. Some of them are listed here; what I mean is that nobody can prove them with rigorous math, but many experiments support them. One is insufficient model averaging or regularisation. You are operating in very high dimensions: in typical CNNs the final fully connected layers are 1024- or 4096-dimensional, and you don't have enough data to cover that space. Even if you throw in 1.2 million images, the ILSVRC dataset size, into a 4096-dimensional space, the data is still sparse, and you are trying to learn something from it. The model is very good when you give it a sample similar to the training data; that is where generalisation comes from. Adversarial images can be thought of as a test of generalisability: we have no problem recognising the dog I showed, but the model sees something else, because it is trying to interpolate from its knowledge and failing badly. So: insufficient regularisation. Another is discontinuous mappings learned by the models, which is exactly what I was trying to say: the data is sparse, the model learns only in those regions, and when you project a novel, adversarial sample, it may land far away or very close, but the model's interpolation fails and it makes a mistake. Or there may be fundamental blind spots in the learning algorithms: maybe plain backpropagation, which is what we all use, is simply not equipped to handle this. There are a lot of speculations, and people are actively looking for explanations, but that is the status as of now.

Good. So we know there is a problem and nobody is safe. What can we do about it? There are a lot of simple ideas: from the outside the perturbation looks like noise, so people think, it's just noise, why are you so worried? Let's do some mean filtering or Gaussian blurring; that might get rid of the added perturbation and maybe the system behaves normally again. People started with empirical approaches like that, for example foveation: you take multiple crops, classify each, and take the majority label, as in the sketch below.
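A rough sketch of that crop-and-vote idea, assuming a classifier `model` and a crop size I chose arbitrarily; this is an illustration of the general recipe, not any specific paper's pipeline:

```python
import torch
import torch.nn.functional as F

def crop_vote_predict(model, x, num_crops=10, crop=200):
    """Foveation-style defence sketch: classify several random crops
    (resized back to the original size) and take a majority vote."""
    n, c, h, w = x.shape
    votes = []
    for _ in range(num_crops):
        top = torch.randint(0, h - crop + 1, (1,)).item()
        left = torch.randint(0, w - crop + 1, (1,)).item()
        patch = x[:, :, top:top + crop, left:left + crop]
        patch = F.interpolate(patch, size=(h, w), mode='bilinear',
                              align_corners=False)
        votes.append(model(patch).argmax(dim=1))
    votes = torch.stack(votes)        # (num_crops, N)
    return votes.mode(dim=0).values   # majority label per image
```

The hope is that some crops drop the perturbed regions, so the majority label recovers the true class; as noted next, this only helps some of the time.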
Along the same lines, there is image blurring, JPEG compression, and so on. These are a set of primitive operations, not particularly intelligent. There are also more intelligent approaches: as I'll explain, we can impose constraints during training itself to make the models robust. We'll see those shortly.

As I was saying, these are small, low-level image processing techniques. Foveation works on the idea that if the perturbation is added in areas that are not really needed to recognise the bird, I can just take a crop around the object and feed that in. In some cases it helps, because you are removing part of the extra Δx from x. You can take crops, flips, and so on, and take a vote over the predicted labels, though with a gain of only 10 to 15% you still cannot be sure you end up with the right prediction. Similarly with JPEG compression: you introduce extra artifacts that try to kill the adversarial noise. Maybe you introduce something else instead, but at least it is not adversarial noise, and it is one of the cheapest techniques.

Just to give an idea, this table is taken from one of the arXiv papers. For the original, uncorrupted images, the top-1 accuracy is at a certain level, and with just 3×3 average filtering you recover about 73% of the images. But the table has to be taken with caution: these are MNIST- and CIFAR-scale datasets, and the experiments were not done on anything as big as ILSVRC. That is the grain of salt. In my personal opinion, these techniques are not going to save you.

So what can actually save us? Here is a procedure introduced, again, by Ian Goodfellow in 2014, called adversarial training. What exactly is it? You know that the model's interpretation is bad when you present adversarial images, so why not introduce that kind of data during training itself? That's the idea. Normally you give x with its label and train the CNN. Here, during mini-batch training itself, you compute adversarial images for the half-trained model; the model is still being trained, so I would call it half-cooked. You compute adversaries against it, take those images, on which of course it does a bad job of classification, and then you set the label back to the right one: no, this is not a paintbrush, this is still a dog. You present that as training data alongside the original data. The adversarial data effectively becomes normal data to the model.

But it has a couple of drawbacks. It roughly doubles the training time, because the data is being augmented: at every mini batch you need to compute adversarial versions of some of the batch samples. And if you use adversarial samples of one kind, the model may not learn to handle adversarial samples of another kind: it knows whatever adversary you introduced, but not all the others. There are other approaches too; I'll describe one of the interesting ones, the contractive penalty, right after a quick sketch of the adversarial training loop below.
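A minimal sketch of one mini-batch step of that adversarial training idea, reusing the `fgsm` function sketched earlier; the even clean/adversarial split of the loss is my own assumption, not necessarily the original recipe:

```python
import torch
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, eps=8 / 255):
    """One mini-batch step of adversarial training (in the spirit of
    Goodfellow et al.): adversaries are crafted against the current,
    half-trained model but kept with their true labels."""
    model.eval()
    x_adv = fgsm(model, x, y, eps)     # adversaries for the current weights
    model.train()

    optimizer.zero_grad()
    loss = 0.5 * F.cross_entropy(model(x), y) + \
           0.5 * F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

The extra forward and backward pass for `x_adv` is exactly where the roughly doubled training cost mentioned above comes from.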
Now, the contractive penalty. Adversarial perturbations are really questioning the stability of the model. When do you say a system is stable? For a small change in the input you do not expect a large change in the output, yet that is exactly what is happening here, so the system is unstable. What can we do to make it more stable? You can include another penalty term so that you do not see large changes from layer to layer.

So we were at designing specific regularisers to make the models more robust. This addresses the stability issue: for a bounded input you should see a bounded output, and that is not what you observe with an adversarial perturbation, which is of course highly structured. To make these transformations stable, one recent work, from 2015, penalises the Jacobian between each consecutive pair of layers, which we can view as transformations. Between hidden layer j−1 and hidden layer j, you make sure the transformation behaves well by penalising the norm of that Jacobian. Basically you do not want the system to respond drastically; you make it learn smooth mappings. Again, the results were not on large-scale datasets, but it looks promising conceptually, which is why I included it. It is called a contractive penalty, borrowed from contractive autoencoders, which were trained with this term alongside the reconstruction objective; suddenly it looks very relevant here, so people trained classifiers with this kind of objective and showed some robustness. And here is a list of works under development to make models robust.

If you look at adversarial training, it intuitively seems like the right thing to do: you expose the system to all sorts of adversarial examples. But it gets costly, because you do not know what sort of attack will come from outside, and you have to create that data and pass it all through training again. Alternatively, you can think of mechanisms that do all of this during the forward pass itself, at all layers: instead of creating adversarial images, you create adversarial features. You make the individual layers robust by introducing adversarial features. Creating adversarial images is expensive; instead you can do something intelligent on the features and tell the model: these are adversarial features, beware of them. This is called feature-space augmentation. These are label-preserving augmentations, but not the typical ones like rotating or flipping; they are intelligent augmentations in feature space that introduce the type of adversaries the model is going to see outside. This is what is currently happening in this area. We are also making efforts here, but I cannot reveal more because it is under review. So that is where the state of the field is.

Of course, the attacks keep getting stronger, so to understand them at a deep level you create stronger adversaries and then work on defence mechanisms against them. Let me introduce one very recent adversary, only a couple of weeks old. On the left you see an attack that is not very robust to scaling: as they zoom the image, at a scale of about 1.002 the true label pops back up.
The true label here is tabby cat; let me replay it. Until a scale of about 1.002 the prediction was something else, some computer class, and once you cross 1.002 the true label pops up. That means the attack used there is not robust to scale. On the other hand, in this very recent work, you can go to extreme scaling and still see fooling: red means misclassification with high confidence, green means the correct label, and it stays red almost all the way. So the adversaries are getting stronger day by day; the first bullet here describes that work. And universal adversarial perturbations are something we should really think about: you do not need to know what you are contaminating, you just contaminate and it works, which is serious. Then you do not even need any data, which is remarkable: you just need an optimisation objective and a model architecture, that's it. That is the work I am going to present at BMVC next month. All of these perturbations are really strong.

Just to conclude, what do we learn from all these experiments? Obviously the gradient is the leak: gradient information is a double-edged sword. The gradient helped us learn models, visualise models, and now attack models, so if you are in love with the gradient, you should think about that. And adversarial perturbations are an intriguing property of any machine learning system. We are at a point where we build excellent systems that can be made to behave badly: systems that work, but are not stable or reliable, and that aspect now has to be investigated. Of course, there are issues you can raise about the setup for generating adversaries, about the testing, and about the fooling rates that are claimed, but the underlying problem remains. Thanks, that would be all from my side.

We have time for maybe one or two questions; we are already a little over time. Great talk, thank you.

Thank you. Can an architect design the facade of a building with this kind of adversarial pattern built into the design, so that Google Lens comes up with some other tagging?

Pardon, can you repeat? It's not very clear.

Google Lens: you can point it at a building and it tells you what the building is, or what it is supposed to be, right? So if an architect were to take these ideas and put them into the facade, could Google Lens be fooled?

Yes. As long as there is a machine learning system behind it, it is susceptible; I would say so confidently, because these attacks translate to real life as well. If you take a printout of the adversarial image you saw on the right and run it through any existing cloud-based classifier, say one available as a Google Play app, it gets fooled. There is reliable evidence of this; the paper's title itself is about adversarial perturbations transferring to the real world. So all the printed road signs and flex banners you see could be adversaries. It is not just on the machine: when you print with the typical printers we use every day, the attack survives.

Okay, maybe a last question.

Thanks for the nice talk. Coming back to the point where you said we add a small gradient step and train on that as well: the problem there is that those two images may geometrically lie at two different points on the manifold.

Yeah.
So I'm still not solving the problem, right? You would expect that two images which should be classified the same, one of them only slightly changed by the gradient, would lie very close on the manifold. But suppose I run t-SNE or anything similar: my low-dimensional embedding gives me a completely different picture, and the prediction gives me a different picture. So it is not fine, is it?

Pardon, can you repeat? Is it a comment or a question?

Going back to Goodfellow's 2014 idea: you train on both the adversary and the actual image.

Correct.

The problem there is that these two images are actually at different points geometrically, so you are not really mapping them onto a manifold properly, correct?

Right. You are mapping them onto a manifold where the system's interpretation, or interpolation, is going to be more accurate than before.

Correct, but if I take a low-dimensional picture of that, are those two points close to each other or very far apart?

They have to end up close in the final projection. Say you are doing a linear transformation: fc8 is essentially W^T x, and you squash W^T x into confidences. That is where you want to bring them closer. Until now they were far apart in that fc8 space, where the confidence for some other class was peaking. Introducing these adversaries during training itself makes the projections come closer, despite the inputs being quite different.

Okay, maybe one more. Actually, let me ask a question myself. Suppose you use some deep neural network; the basic framework is an input layer, then some convolutional layers, and finally a fully connected layer. If you reverse it, what happens? Suppose you put a fully connected layer first and then the convolutional layers, what happens then?

It does not matter in what order you do the transformations, as long as the gradient flows back to x. That is what is leaking; that is the weapon.

Did you try it?

I don't think it would matter; it should work the same way.

If you first use a fully connected layer and then a convolutional layer, I don't think you will get the same results. You can use gradient descent or whatever optimisation technique, but you will still not get the same results.

Right. I am not sure what the intuition is for such an architecture, and I have not trained any such model. A fully connected layer is also a convolution in a sense, with one-by-one kernels. You can try it; maybe you will find something more interesting from that exercise itself.

Thank you. Just one question: you said we can introduce adversarial features instead of adversarial images for training. How does that happen during the forward pass? How do you introduce features in between, during training?

Okay, I'll give a rough idea. Take fc8, say. You pass a bunch of images, and what you see at fc8 is a high-dimensional feature vector for each image. Now imagine 1000 bunches of them, because you have 1000 categories in the data: you see a distribution with 1000 modes. Take one particular mode and look around: there are 999 other modes, and all of them are adversarial directions.
If you move a point from this mode toward any of those 999 others, it will be fooled into one of those classes, so all 999 of those directions can be called adversarial directions. Now you make a small change along one of those randomly selected directions, add that perturbed feature to the mini batch, and say the label is still the original one. So essentially it is adversarial training performed in a projection space rather than in the image space. All right.
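To make that last answer concrete, here is a rough sketch of feature-space adversarial augmentation. This is only an illustration under my own assumptions (a backbone/classifier split, precomputed per-class feature means, and a fixed step size `eps`), not the under-review method mentioned in the talk:

```python
import torch
import torch.nn.functional as F

def feature_space_adversarial_step(backbone, classifier, optimizer, x, y,
                                   class_means, eps=0.5):
    """Illustrative feature-space augmentation: nudge each feature a small
    amount toward a randomly chosen other class's mode (an "adversarial
    direction"), but keep the original label during training.

    backbone:    maps images to features (e.g. up to fc7/fc8)
    classifier:  maps features to logits
    class_means: (num_classes, feat_dim) means of clean features per class
    """
    num_classes = class_means.size(0)
    feats = backbone(x)                                   # (N, feat_dim)
    # pick a different class for each sample in the batch
    wrong = (y + torch.randint(1, num_classes, y.shape, device=y.device)) % num_classes
    direction = F.normalize(class_means[wrong] - feats, dim=1)
    feats_adv = feats + eps * direction                   # small step toward another mode

    optimizer.zero_grad()
    loss = F.cross_entropy(classifier(feats), y) + \
           F.cross_entropy(classifier(feats_adv), y)      # label stays the same
    loss.backward()
    optimizer.step()
    return loss.item()
```

The design choice is the same as in image-space adversarial training: the perturbed feature keeps its original label, so the classifier is pushed to keep the projections of clean and perturbed versions close together.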