So, about the defenses: there have been several proposed defenses that turned out to be not very effective against evasion attacks. I'm not going to cover all of the defenses that have been proposed, because there are just too many; it would take us a couple of weeks to discuss all the methods in detail. What I'm trying to give here is an overview of the main categories of approaches that I think work best against this threat. To keep it simple, we can group them into two main approaches.

The first aims to reduce the sensitivity of the classification function to input changes, because, as we discussed yesterday, this is one of the main problems behind the vulnerability of classifiers to evasion attacks; we will discuss this in more detail in a minute. This category includes adversarial training, which essentially amounts to retraining the classifier on adversarial examples: you create some perturbed images, you retrain the classifier on them, and you see that the performance under attack improves a little bit. This is also equivalent, as we will see, to a particular regularization approach, which I think is very interesting and we will discuss soon. The other set of defenses, which are complementary to the previous ones, so you can use them together to strengthen your classifier, is based on the idea of rejecting samples. The issue with standard machine learning algorithms is that they have to make a decision on every given point: you give them a point and, even if it does not belong to any of the known classes, they still have to assign it to one of them. With rejection, you give the classifier the option to abstain, that is, to say "I don't know" or "I cannot classify this sample reliably", and so it simply does not make any decision on that point. That is the idea.

Regarding the first approach, the idea is that you can learn a classifier by using what is called robust optimization, which means making the classifier aware that the data points can change. Normally this is done by formulating a min-max problem. Look for a moment at the standard learning problem, which would just have the minimization here: in the standard learning problem you simply minimize a loss function computed on the training set with respect to the classifier parameters. That is the outer optimization loop, the minimization of the loss function with respect to w; that is how you optimize the parameters of a linear classifier, and this would be the solution that you find, for example, to separate these points in a very simple toy example.
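Just to fix notation (this is my own shorthand, not a formula from the slides), the standard learning problem I am referring to is simply:

```latex
\min_{\mathbf{w}} \; \sum_{i=1}^{n} \ell\big(y_i, f(\mathbf{x}_i; \mathbf{w})\big)
```

where ell is the training loss and f(.; w) the classifier, possibly with a regularization term added.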
Now, what we do with the inner loop is assume that these data points can be perturbed by some delta, and this delta is constrained to lie in some region of the space. If you set a bound on the max norm of the perturbation, it means you are basically placing a hypercube around each data point, and the attacker can now shift the point within this domain, within the hypercube. Maximizing the loss function with respect to this kind of perturbation means that you shift the points towards the boundary in order to improve the chances of having them misclassified; in practice, each training point moves as shown in the picture. That is what the inner optimization loop does: it essentially manipulates all the training points and creates adversarial examples out of them. This is an iterative process: you train the classifier, you modify the samples, then you retrain the classifier, and so on and so forth, until you reach some equilibrium point, and in the end you obtain a classifier that separates the perturbed versions of the training points. That is the basic idea behind formulating this learning problem as a robust optimization problem, and in its standard form this amounts to what is called adversarial training: the idea of repeatedly retraining on the adversarial examples.

Now, the interesting thing is that, under some assumptions, you can show that this robust optimization problem is equivalent to a regularized problem where the regularization term depends on the size of the gradient at the training points. So what you are doing with this approach is penalizing the norm of the input gradient on each training point, which means you want the function to be smoother around those points; the effect is that the boundary gets pushed further away from the points in a given direction. The interesting thing here is that this regularization, the norm of the gradient that you penalize, matches in some sense the norm of the noise: in particular, it is the dual norm. If you assume the noise is bounded in the max norm, which means you have hypercubes around the points, then you have to penalize the L1 norm of the gradient; that is the optimal regularizer against this kind of noise. If you instead assume Euclidean noise, you get the Euclidean norm as the regularizer. This is a very interesting finding, which strictly holds only for linear classifiers, so the gradient here is basically the weight vector of your classifier, but I think it is really nice that it connects the kind of noise that you assume with the regularization. So, if you prefer, you can now think of regularization in a different way: these regularization terms do not come from bounds on the generalization error of learning algorithms, and there is no assumption on the underlying distribution generating the data samples. You are just given a bunch of data points, and the result tells you that the separator should be tuned based on the kind of noise that you assume on the data. It is a different way of seeing what regularization does: it is matched to a specific kind of noise. And this is one way to improve the robustness of classifiers against adversarial examples.
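A compact way to write this down, using my own notation (p is the norm bounding the perturbation and q its dual norm, with 1/p + 1/q = 1):

```latex
% Robust (min-max) learning problem: each training point may be
% perturbed within an epsilon-ball before the loss is computed.
\min_{\mathbf{w}} \; \sum_{i=1}^{n} \;
  \max_{\|\boldsymbol{\delta}_i\|_p \le \epsilon}
  \ell\big(y_i, f(\mathbf{x}_i + \boldsymbol{\delta}_i; \mathbf{w})\big)

% For a linear classifier f(x) = w^T x + b with a margin-based,
% non-increasing loss (e.g. the hinge loss), the inner maximization has
% a closed form and the problem becomes a regularized one:
\min_{\mathbf{w}, b} \; \sum_{i=1}^{n}
  \ell\big(y_i(\mathbf{w}^\top \mathbf{x}_i + b) - \epsilon\,\|\mathbf{w}\|_q\big)
```

So a max-norm bound on the noise (p equal to infinity) yields an L1 penalty on the weights, and a Euclidean bound (p equal to 2) yields a Euclidean one, exactly as stated above.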
And we did some tests. If you recall, yesterday we saw the case of Android malware, where we trained a linear classifier to discriminate between benign and malicious Android applications. If you think about the kind of attack we performed, the idea was to inject objects into Android applications so that their classification flips from malicious to benign. This attack can be seen as a sparse attack, because the attacker injects one object at a time and aims to change as few objects as possible. The dual norm of this sparse noise is the infinity norm, and therefore, according to this robust optimization view, the classifier that penalizes the maximum absolute weight should be the optimal response against sparse perturbations of the data points.

And in fact, when you test it, this plot shows the fraction of correctly detected malicious applications at 1% false alarms (legitimate applications misclassified as malicious), so it is the detection rate. The green curve is the standard classifier: it starts at roughly 95%, so 95% of malware is correctly detected, but by injecting only 5 to 15 objects into each malware application you can flip the decision on almost all samples. The blue line is an empirical defense based on combining different classifiers and averaging, or doing majority voting; one intuitive claim is that multiple classifiers are harder to fool than a single one. It is not completely true, it depends on how you combine them, but in the end it does give you some robustness, because you see that its detection rate under attack decreases more gracefully than the other curve. If you look at the red curves, they are variants of the classifier that is assumed to be the optimal response, and as you can see it is much more robust than the other two: to fool it with 100% probability you have to change more than 100 objects, so you have to inject more than 100 objects into your Android application. This may still be feasible, but it is much harder than changing a handful of elements without corrupting the intrusive functionality of the code. Of course, in the end you can fool every classifier anyway, because we are just looking at static features, a static analysis of the code: you can manipulate the code in many ways while preserving the behavior of the program. So static analysis alone is not enough to identify malware in these files; you should complement it with some dynamic analysis, which means executing the program and observing what it does, for example whether it connects to suspicious servers. Static analysis of the code is just a fast approach to identify trivial malware samples; to catch the difficult ones you also have to perform dynamic analysis. But that is a general problem of program analysis, outside of this machine learning robustness topic.

Now, it is also interesting to explain why this approach is significantly better than the standard classifier, and the reason is that, as I told you yesterday, the standard SVM tends to overemphasize a few features, giving them a very high absolute weight; a small numeric sketch of this effect follows, and then I will explain it on the slide.
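Just to make the intuition concrete, here is a tiny illustrative sketch (toy numbers and a simplified greedy attacker that can both add and remove objects; this is not the actual experiment): with a few dominant weights, flipping a handful of binary features is enough to cross the boundary, while with small, evenly bounded weights many more changes are needed.

```python
import numpy as np

def min_flips_to_evade(w, x, b=0.0):
    """Greedy count of binary feature flips needed to push a 'malicious'
    sample (score > 0) across the boundary of a linear classifier w.x + b.
    Removing a present feature with positive weight, or adding an absent
    feature with negative weight, both lower the score."""
    score = float(w @ x + b)
    gains = np.where(x == 1, np.maximum(w, 0.0), np.maximum(-w, 0.0))
    flips = 0
    for g in np.sort(gains)[::-1]:        # most effective features first
        if score <= 0 or g <= 0:
            break
        score -= g
        flips += 1
    return flips

rng = np.random.default_rng(0)
d = 1000
x = rng.integers(0, 2, size=d)            # toy binary feature vector

w_peaked = np.zeros(d); w_peaked[:10] = 2.0   # few dominant weights (standard SVM-like)
w_flat = np.full(d, 0.02)                      # small, evenly bounded weights
print(min_flips_to_evade(w_peaked, x))         # a handful of flips suffices
print(min_flips_to_evade(w_flat, x))           # hundreds of flips are needed
```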
For example, if you sort the absolute values of the weights given to the features, you get something like the green curve for the standard SVM, and it is clear that if the attacker changes the first two features he can significantly change the output of the classifier. What happens when you penalize the maximum of the weights is that you bound the weights to have absolute values within a small interval, and if you really push the regularizer, the classifier learns more or less the same absolute weight for all features; of course there will be positive and negative ones, but they are all bounded to be small. So by changing a single feature you no longer have a significant impact on the classifier output: to have a significant impact you need to change a lot of features, you need to change much more. That is why you get this kind of behavior in the security evaluation curves, which decrease in a much more graceful manner. Is that clear to everybody? So we have, in some sense, a theoretical guarantee that this is the optimal response against this kind of noise, and an explanation of why it is more robust than the other classifiers.

That was the first defense. Of course, as I told you, this is also equivalent to adversarial training: it has been shown that, for small perturbations, penalizing the input gradient at the training points is equivalent to retraining on these adversarial samples. If you want to see it in terms of sensitivity, you are decreasing the local sensitivity of the function around the training points, and that is why the classifier becomes more robust. The important thing is that this notion of sensitivity has to be related to the kind of noise that you expect: you have to smooth the function in the proper way, it is not enough to smooth it according to just any regularizer.

Now, a quick note about ineffective defenses: why are several of the proposed defenses not really effective against the attack? The reason is that they do not really change the classifier, they do not really move the boundary; what they do is make the optimization of the attack more complex. This is the case we have been analyzing so far: you have a smooth function, so you can easily run gradient descent on it to optimize your attack points. Many defenses, more or less explicitly, perturb this classification function so that it becomes very noisy; then, if you start the attack from x, for example, you end up in a nearby local minimum, your attack algorithm stops there, and it produces an adversarial example that does not evade detection. But this is not because the classifier is really more secure; it is just that optimizing the attack points has become much harder. To defeat this kind of defense you can simply smooth the function again, for example by learning a similar classifier which has a smoother surface, and then it can again be optimized easily with gradient descent.
This is essentially what Nicholas Carlini and Anish Athalye did to break these defenses from ICLR 2017. It is a very simple idea: create a smoother function that you can again optimize very easily, and in this way you discover that many approaches are not really effective at defending against this threat. Another kind of defense was essentially learning almost a step function: if you make your classifier non-differentiable, then you cannot attack it with gradient descent, because the gradient is zero almost everywhere. What you need to do in that case is, again, create a smooth approximation of the target function. That is the overall idea in a nutshell, and that is about the first line of effective defenses against evasion.

Now, talking about the reject option, it works this way. If you have two classes and a decision function, normally the classifier, as I said before, has to make a decision on every possible input, and therefore it also has to decide on these green points, even though they are very far from the rest of the training data. These may be images of cats, these images of dogs, and this may be a completely noisy image very far from everything else; but with current classification algorithms a decision has to be made anyway, even on that point, and in this case it would be classified as a blue point. With rejection, what you can do is shape the decision function so that it encloses the classes more tightly, so the boundary stays closer to the red points and to the blue points, and outside of these regions the classifier simply does not make any decision. You can think of it as an additional class where you say "I don't know", or an anomaly class for samples on which you cannot make a reliable decision.

We tested this in the case of the iCub robot, which I described yesterday. This is again an example of an adversarial example, where you manipulate some of the pixels with a small amount of noise and the image is misclassified as a cup; this is the noise mask and this is the cup from the other class. Now, what happens when we apply rejection in this case? The green and the yellow curves are standard SVM classifiers trained on top of the deep net that extracts the features from the images, and the accuracy is roughly 70%; recall that we have 28 object classes, so that is not bad. This is what happens when you increase the amount of perturbation on the images, measured by the L2 norm: as soon as you increase the level of noise, the performance drops, and it drops quite quickly. A value of 200 in terms of Euclidean distance from the original image is not a very strong perturbation, but it is enough to almost completely destroy the classification accuracy.
The red curve uses the reject option, again only in the last layer, at the output of the deep net, in the representation space. At the beginning, with no attack, the accuracy decreases a bit, just because you are enclosing the classes more tightly and therefore you reject some legitimate samples; that is the price you pay to gain some robustness. Then, when you have a very small perturbation, the curve jumps above 80%, just because the classifier is able to detect adversarial examples: at that point all samples are adversarial examples, since we manipulate all the test images, and you can correctly catch about 80% of them, so it is rejecting pretty much everything because everything is an attack. But when you increase the perturbation further, you see that the detection of adversarial examples decreases again. The reason is that we are looking at the representation space, and what happens there is that these samples get projected very close to the target classes. This is well depicted here: you have the original image and the perturbed image, and they are very close in pixel space, in the space of images; but while the original image is projected near its own class in the representation space, the perturbed version is projected much closer to the target class. If you want this car to be misclassified as a dog, the corresponding adversarial image will be very close to the class of dogs in the representation space, and that is why you cannot reject this sample correctly in that space anymore: if it is too close to the other class, there is no way to distinguish it. That is the best explanation I found of why this is not effective, and it means you basically need to apply rejection also at the lower layers, not just at the last one.

There is another nice explanation that I found in a paper by David Evans and colleagues, which depicts the distance between the input sample and the adversarial example across the different layers of the network, starting from the input space and going up to the last layer. The red line is an image perturbed with an attack algorithm, so an adversarial example; the blue one is a random perturbation, I think Gaussian noise, so noise which is not adversarial against the classifier, and the curve shows the distance between the original image and the perturbed one. What you see is that this distance is more or less constant up to some level, but then the adversarial example goes very far from the original image in the last layers. This is essentially another view of the same problem we discussed before: the random noise gets smoothed out, so as you move from the input towards the output the randomly perturbed image gets projected back close to the original class and the noise does not get amplified, whereas the adversarial noise gets amplified throughout the layers. That is the main problem. So, as you were suggesting initially, you can probably reject these adversarial examples, but what I am saying is that it is not enough to look at what happens at the last layer, because there it is already too late: they are already indistinguishable from the target class. What you should do is go back through the layers until they become distinguishable again; probably it is not required to go all the way back to the input layer, but maybe, if you do that for ten layers in this case, you may be able to catch a lot of these adversarial examples, at the cost of possibly rejecting too many legitimate samples. This is something we should look into in more detail, but that is part of the story.
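Just to make the mechanism explicit, here is a minimal sketch of a score-based reject option on top of a multiclass SVM (the thresholding rule, the "-1" reject label and the threshold value are illustrative assumptions; the actual defense shapes the decision regions more carefully than this):

```python
import numpy as np
from sklearn.svm import SVC

class SVMWithReject:
    """Minimal reject-option wrapper: samples whose highest one-vs-rest
    decision score falls below a threshold are labelled -1 ("I don't know")
    instead of being forced into one of the known classes.
    Assumes more than two classes and integer class labels."""

    def __init__(self, threshold=0.0, **svc_kwargs):
        self.clf = SVC(decision_function_shape="ovr", **svc_kwargs)
        self.threshold = threshold

    def fit(self, X, y):
        self.clf.fit(X, y)
        return self

    def predict(self, X):
        scores = self.clf.decision_function(X)          # (n_samples, n_classes)
        labels = self.clf.classes_[np.argmax(scores, axis=1)]
        reject = scores.max(axis=1) < self.threshold    # low confidence -> abstain
        return np.where(reject, -1, labels)

# e.g. clf = SVMWithReject(threshold=0.2, kernel="rbf", C=10).fit(X_tr, y_tr)
```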
OK, now I just want to show you a simple demo that we created with some students and colleagues. You can also connect to this website: there is a simple demo where you can select some input digits from the MNIST dataset, select the target class and the amount of perturbation, and see what happens at the output of a multi-class linear SVM. I am just showing you a simple example; if you are curious, you can try it yourself later. So, that is the website. You can pick some digits here, let's take a zero for example, and then you can select the target class. If you remember, yesterday we saw two different kinds of attack: one is the error-generic one, where you do not specify any target class, so the only thing that matters to the attacker is causing an error; if the zero is misclassified as a one, a two, or whatever, it is all the same to the attacker. In the other case the attacker also specifies the target class, so you may say "I want this image of a zero to be misclassified as a six". Here you can either pick "any" or select one of the digits; let's take five, for example, I am not sure if you can see it. Then you can select the amount of perturbation, which is essentially the epsilon.

OK, it does not change; let's see what happens if we perturb the zero with a very large perturbation. I do not know why it was not changing; it takes some time, because there are a couple of workstations doing the computation in our lab, so if you all use the demo at the same time you are going to wait for a while. But this is what happens: first of all, you see that the image of the zero has been completely destroyed, so for us this is almost a completely random image, but for the classifier this is really a five, with very high confidence. These were the original scores: at the beginning the zero was classified correctly, because zero was the class with the highest support, the highest confidence from the classifier, and after the perturbation the classifier believes it is a five with a very high score. This essentially tells us that the classifier is not learning any structure, any semantic meaning of the digits: this is not really a zero, and it resembles a five only very vaguely, with a lot of imagination; it is anyway very noisy. Then you can try different things: you can start with less noise and gradually increase it, and you see that as the perturbation grows the digit gradually tends to belong to the other class. It is also interesting to see that, in some cases, for example if you start from a zero and the target is one, which is the class with the least support, and the perturbation is not sufficient, the image may be misclassified as some other class in between, for example as a five, a two, or a seven. This is because, if you picture the feature space, you have the class of zero, the class of one is very far from it, and you are shifting the point in that direction.
In the middle there are other classes, so the point crosses different classes in feature space before reaching the target class, when the perturbation becomes sufficient; you can test this by playing with the d_max, or epsilon, parameter in the demo. OK, so that was the demo part.

If there are no questions, I would also like to point out another thing. Together with a company, we are involved in a European project called ALOHA, whose goal is essentially to design deep networks while also taking into account architectural constraints that come from heterogeneous hardware. The goal is to take deep nets and make them fit small, heterogeneous devices, so that you can then use them in video surveillance cameras or robots with custom hardware. Normally what happens is that you take a pre-trained deep net and try to approximate it to meet the hardware requirements; for example, instead of representing the weights with many bits in floating point, you reduce the number of bits per weight, and typically this degrades the performance. The goal of the project is to handle these requirements together, so that you design the network while already accounting for these constraints, and in this project we are responsible for the security evaluation: running these attack algorithms and seeing, in the different use cases, how the models perform. That was just another curiosity I wanted to tell you about regarding evasion attacks. If there are no questions, this essentially closes the part on evasion, which is the case of adversarial examples, where most of the work has been done, and I am now shifting to a different kind of attack, which is called poisoning. Yes, please? Which one, sorry? In terms of the security aspects of this heterogeneous approach, what are the differences? No, we are analyzing exactly that: when you design the network together with the hardware constraints, you essentially add further constraints, and we want to see whether they change the security or robustness properties of these networks. Ideally, you introduce these approximations and the robustness does not change, so that the heterogeneous environment is not a problem for security; but of course the interest is to see whether these algorithms can be fooled, and to what extent. The goal is always the same: understand the level of robustness against this kind of worst-case noise when you introduce these further design constraints, that is, when you shrink the network to fit embedded devices, and whether that compromises accuracy or robustness, and in which ways. That is our role in the project.

OK, so now, moving on to poisoning. In the case of poisoning we have a different problem: the attacker is not changing the test data, he is manipulating the training data, and again he can have different objectives. Just to sketch the problem in a bit more detail: normally you have your training data with labels, you train a classifier, and it works well on unseen test data. Typically, for spam, you have a collection of labeled spam and legitimate emails, you create a dictionary of words, and then you train your classifier on that. To explain the poisoning setting,
you can think that there is an attacker who sends you spam messages, as before, but these messages also contain some good words, words which typically appear in your legitimate emails; and, as I said, they can even be written in white, so that when you get the email you only read the spam message. You are then prone to label this email as spam, so you flag it, and your filter is updated: it is retrained including this sample in the training set. If this happens several times, you may end up with a classifier that considers words like "university" and "campus", which were good words, as bad words, and if that happens, several of your legitimate emails may end up in the spam folder. So the goal here is to find training samples that maximize the error of the classifier on clean data, which causes a denial of service for legitimate users. That is the main goal of poisoning attacks: the attacker changes some training data in order to maximize the error on clean data.

We can also formalize this within the attack framework. The goal is to maximize the error; we start, as usual, from the white-box setting, where the attacker knows everything, because we are interested in understanding the worst-case performance, and we can later relax this assumption to see what happens when the attacker knows less about the system. But the first thing I want to see is how robust my system is against a very powerful attacker. In this setting the attacker can inject some poisoning samples into the training set, and the optimal strategy, which I will first state in words and then formalize, is to find an optimal attack point. Let us simplify the problem a bit and consider a single poisoning point instead of many: the attacker wants to find the poisoning point x_c that maximizes the error of the classifier. To exemplify this, assume you have a linear classifier, a linear SVM in this case, with two classes, and these are the training samples: you train the classifier on them and get this linear separator, then you measure the error on a separate dataset, not on the points in the slide, but on a separate dataset sampled from the same distribution; in this case the error is 2%. If you add a red point x_c here, the classifier learns this point as well, the boundary changes a little, and if you measure the error again, still on the separate dataset, it rises to almost 4%. Now, the point is that I do not want to throw random points into the training set: I want to carefully find the x_c that maximizes the error on the clean data. To see it in two dimensions, you can place a poisoning point at each location of the space; you can do this kind of exhaustive analysis in 2D because it is feasible, although for larger dimensions it is not. For each location you take the classifier and retrain it using the training data plus the current x_c, so for every point in that plot you retrain the classifier once, including x_c as a training point, and then you measure the error on the separate clean dataset. That is what is depicted in colors: the classification error as a function of the location of the poisoning point, which is essentially the function we want to maximize.
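A brute-force version of exactly this 2D analysis can be sketched in a few lines (the toy data, grid and SVM settings below are my own, just to show the retrain-and-measure loop; it is clearly infeasible beyond a couple of dimensions):

```python
import numpy as np
from sklearn.svm import SVC

def poisoning_error_surface(X_tr, y_tr, X_val, y_val, grid, y_c=1):
    """For every candidate location x_c on the grid, retrain the classifier
    on the training set plus (x_c, y_c) and measure the error on clean
    validation data."""
    errors = np.zeros(len(grid))
    for i, x_c in enumerate(grid):
        X_aug = np.vstack([X_tr, x_c])
        y_aug = np.append(y_tr, y_c)
        clf = SVC(kernel="linear", C=1.0).fit(X_aug, y_aug)
        errors[i] = np.mean(clf.predict(X_val) != y_val)
    return errors

# toy 2D data: two Gaussian blobs
rng = np.random.default_rng(0)
X_tr = np.vstack([rng.normal(-1, 1, (20, 2)), rng.normal(+1, 1, (20, 2))])
y_tr = np.array([0] * 20 + [1] * 20)
X_val = np.vstack([rng.normal(-1, 1, (200, 2)), rng.normal(+1, 1, (200, 2))])
y_val = np.array([0] * 200 + [1] * 200)

xx, yy = np.meshgrid(np.linspace(-4, 4, 25), np.linspace(-4, 4, 25))
grid = np.c_[xx.ravel(), yy.ravel()]
err = poisoning_error_surface(X_tr, y_tr, X_val, y_val, grid, y_c=1)
# err.reshape(xx.shape) is the coloured error surface shown on the slide
```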
As you can see, this function has the following behavior. If you add a red point in this region, essentially nothing happens; in fact, what happens for the SVM is that this point simply does not become a support vector, so it has no impact on the decision boundary. That is why the function is flat here: the classification error does not change because the classifier does not change when you add x_c in this region of the space, in the left part of the plot. Whereas when you start placing the point over here, you see that the maximum is more or less in the lower right corner, and in fact that is where you get roughly 5% to 6% error on the test set. So that is the function we want to optimize, and now we can formalize it. Yes, please? That is a good question: here you have a reduction in the error because, if you look at what is happening, you have two Gaussians, so the best separator would just be the vertical line, but since you have few samples the learned boundary is a bit tilted; if you add a red point here, the boundary moves towards the correct separator. So that is a case where you add noise and it actually helps the classifier perform better; if you had many more red and blue samples, you would find the ideal separator, and that is why you see a decrease there.

OK, so now we can formalize this. The goal of the attacker is to maximize the generalization error on unseen data with respect to the poisoning point, and this can be formalized as follows: you want to maximize the loss function on some validation points, where f is your trained classifier and x_c is the attack point; that is the main objective, maximizing the error on clean data. But this is subject to a constraint, and the constraint is that I have to retrain the classifier f* including the attack point x_c. So you have two nested optimization problems: one is maximizing the error, and the other is training the classifier, because every time you change the training point you have to update the model. This is called bilevel optimization, because the inner problem is nested inside the outer one. And this is a specific instance of the problem for the SVM: if you use the hinge loss for measuring the validation error, this is how it looks. The important thing to notice is the dependency: the only places where you see x_c, the variable you are optimizing, are here and here, so there is an implicit dependency of the outer function on the inner problem, and this dependency is captured entirely by the classification function: f* is the only thing that depends on x_c. That was just to make this clear.
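In symbols (my own compact notation, with L a generic loss summed over the respective dataset), the bilevel problem just described reads:

```latex
\max_{\mathbf{x}_c} \; L\big(\mathcal{D}_{\mathrm{val}},\, f^{\star}\big)
\quad \text{s.t.} \quad
f^{\star} \in \arg\min_{f}\; L\big(\mathcal{D}_{\mathrm{tr}} \cup \{(\mathbf{x}_c, y_c)\},\, f\big)
```

The outer objective depends on x_c only implicitly, through the trained classifier f*.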
Now, this looks very complicated, but there are ways to solve it; in particular, you can again compute the gradient of the loss function with respect to x_c by using a trick. I am not giving the details here, but for those of you who are interested, you can look at the paper; it is a paper from ICML 2012 and there are follow-ups. In this case the gradient is not very easy to compute, and you can understand why: when you compute the gradient of the loss with respect to the attack point, you have to account for the fact that the classifier changes, whereas in the evasion case it was different, the classifier was fixed and we were not changing the decision function. The trick is that you can replace the inner optimization problem with its Karush-Kuhn-Tucker (KKT) equilibrium conditions: if you do that, you obtain an optimization problem with a set of linear constraints, and you can basically invert these constraints to find how the classifier parameters depend on the attack point. I am not giving the details, as I said, but in this way you get the gradient in closed form, and you can do this for many different classifiers: the SVM, ridge regression, lasso, logistic classification, and so on; for all classifiers that have a clear optimization formulation and equilibrium conditions, you can compute this gradient. This is how it looks for the SVM; as you can see, it more or less encodes these conditions in some form, and if you are curious to see how to derive it, you can look at the paper.

And this is what happens when you run this gradient-based attack in the case we discussed before, again with the linear classifier. You take a point from one class and label it as red; that is x_c at iteration zero. Then you optimize it along the gradient direction, doing gradient ascent following this equation, and in the end you find a local maximum of the function. This is very interesting because it also works for nonlinear classifiers: the formulation is essentially the same for the linear and the nonlinear SVM using kernels, and the trick is that this gradient allows you to go back and forth between the input space and the kernel space, the representation space; that is why it works for the nonlinear case as well.

And this is a simple example on a toy digit problem. We take two digits, in this case fours and zeros, just two classes. We take a digit from the class of four and flip its label, so now you have a four which is labeled as zero in the training set. If you just flip the label of this sample, the error goes from almost zero to, let's say, one percent: that is iteration zero here, just a label flip, and the SVM is more or less robust when you flip the labels of only a few training points, so essentially nothing interesting happens. But then you optimize the point with the attack algorithm I just described, and what you get is this kind of blurred image of a four; I do not know if you can see it, but there is more or less the shape of a zero in the background. Now, if you add this image, labeled as zero, to the training set of the classifier, you completely screw it up: the test error goes up to more than twenty percent, just by adding one image to a training set of a few hundred points. So the attacker is controlling about one percent of the training data, and with one percent of control over the training data, in this case, you can essentially push the error from zero to twenty percent; in many applications this is already enough to make the system unusable for legitimate users, so it is effectively a denial of service. Of course you can iterate over multiple points: if you add one point and then optimize another one in a greedy way, you can further increase the error, and here we go up to thirty-five, almost forty percent, with less than ten percent of poisoning points.
I think I will make a break here; we can have a ten-minute break, if that is okay with you, and I will start again at twelve twenty. OK, let's make a break then.

OK, so we have been discussing how to manipulate the training data to damage the classifier, in the sense of maximizing its error, and in some cases, as you have seen, the error can go up very quickly. Now, I am wondering if you are asking yourselves what may happen with deep nets: are deep nets also vulnerable to poisoning? So far this has not been studied very well, so it is not clear whether they are very vulnerable to this threat. There is just one paper, which I think is very interesting, about the interpretability of deep nets: trying to understand, when they make predictions, why they assign a sample to a given class. They explain the predictions by finding the samples in the training set that are most responsible for the prediction, so it is like saying "this image is classified as a dog because of these three dogs in the training set"; that is what the paper does. When they derived this explanation method, they also came up with a method to generate adversarial training examples for deep nets. They present a simple case where they have the image of a dog labeled as a fish, so they flip the label, and they add a bit of noise to this image, crafted in the way I described for the attacker: they use the KKT equations only in the last layer, in the logistic classifier, compute the gradient in a way very similar to what I showed before, and then back-propagate the gradient down to the input pixels. So you are essentially assuming that all the network layers are fixed and only the last one changes; of course, when you retrain the network on the poisoning samples, all the other layers change as well, not only the last one, but that is the approximation they use to generate these samples. What they show is that, if you add this slightly perturbed image of a dog, you can flip the decision on these five test samples: you can make the network believe that these are fishes instead of this nice dog. This is not a complete availability attack, but just by modifying one training sample you get five misclassifications in the test data, so that is at least a preliminary result showing that you can also apply this attack to deep nets.

The problem is that the computational complexity of computing the gradient as I discussed before is quite high, because you have to retrain the classifier every time you need to update the gradient. So we also developed a smarter way to obtain an approximation of the gradient, using an approach called back-gradient, or hyper-gradient, computation. I am not giving the details because it is quite complex, but I just want to tell you that there is a whole area of research dedicated to the solution of these bilevel optimization problems, the same problems we had for poisoning attacks: you find the same structure in meta-learning, or learning-to-learn, approaches, I do not know if you are familiar with them, which are all characterized by bilevel optimization problems; but instead of maximizing the validation error, you minimize it, and you minimize it with respect to the hyperparameters of the classifier.
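Side by side, the structural analogy looks like this (again my own notation; lambda denotes the hyperparameters):

```latex
% Hyperparameter optimization / meta-learning: same bilevel structure as
% the poisoning problem above, but minimizing the validation loss with
% respect to the hyperparameters \lambda instead of maximizing it with
% respect to a poisoning point.
\min_{\lambda} \; L\big(\mathcal{D}_{\mathrm{val}},\, f^{\star}_{\lambda}\big)
\quad \text{s.t.} \quad
f^{\star}_{\lambda} \in \arg\min_{f}\; L\big(\mathcal{D}_{\mathrm{tr}},\, f;\, \lambda\big)
```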
There are a lot of new ideas for solving these problems much more efficiently than in this part of the literature, so if you are interested in more efficient solutions for attacking deep nets in this way, that is where to look, and we are also working on that.

Now, coming to the defenses: how can we protect the classifier against poisoning attacks? In this case it is much easier than in the evasion case, because, by their nature, poisoning points have to be outlying with respect to the rest of the training data: if you inject a training point which is similar to the other ones, you will not change the decision function very much, so you have to inject things which are different from the rest of the training set. You can see that here: the red point is very far from the class of red points, and the same here. This suggests that it is easier to detect these kinds of attacks, using the idea that they are outliers with respect to the training data, and there are again two main approaches you can use to counter them. One is data sanitization, which essentially amounts to identifying the outliers in your training set and removing them; the other is based on the idea of robust learning, which means having algorithms that are designed knowing that there may be outliers in the training set and that are natively robust to their presence.

Just to give you an idea of one of these robust learning algorithms, this is one we proposed last year at the IEEE S&P conference, applied to robust regression. Let me explain the idea with an example. The goal is to fit this cloud of blue points, which are the legitimate points; ideally you would like to fit a regression line to them. The points with circles around them are outliers, just thrown at random into the data. If you learn a standard ridge regression model here, you get this line, which is of course skewed with respect to the one you would learn in the absence of outliers. What the algorithm does is compute the loss value for every training point and discard those for which the loss remains highest: you have to set the fraction of outliers in advance, and then you say "these are the k points which have the highest loss with respect to my current model, and I get rid of them", and then you fit the model again. The red points are the identified outliers; at the beginning you pick them at random and train the model, at the next iteration the red points are the ones with the highest loss with respect to this line, and those are ignored when you fit the regression line again, and so on and so forth. At the end, when it more or less converges, you can see that it identifies the majority of the outliers correctly and therefore ignores them when fitting the line. That is essentially the idea; it is called TRIM, and it is inspired by robust statistics, like the trimmed mean and that class of estimators. No, I think it is the same number; maybe it is a bit misleading because there might be some overlap, but it is the same number, and it is fixed: you have to set the fraction of outliers in your data in advance, I think it is this n here. The circled points are the true outliers, so if you see a circled red point it means it was originally an outlier and the method correctly found it. Thanks for the question.
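A rough sketch of this trimmed fitting loop, just to show the mechanics (the function name, the defaults and the use of scikit-learn's ridge regression as the base model are my own choices, not the paper's code):

```python
import numpy as np
from sklearn.linear_model import Ridge

def trim_regression(X, y, n_inliers, n_iters=20, alpha=1.0, seed=0):
    """Iteratively fit a ridge regressor on the n_inliers points with the
    lowest squared residuals and discard the rest (the presumed outliers)."""
    rng = np.random.default_rng(seed)
    inliers = rng.choice(len(X), size=n_inliers, replace=False)  # random start
    model = Ridge(alpha=alpha)
    for _ in range(n_iters):
        model.fit(X[inliers], y[inliers])
        residuals = (model.predict(X) - y) ** 2        # loss of every point
        inliers = np.argsort(residuals)[:n_inliers]    # keep the best-fitting ones
    return model, inliers
```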
Then, of course, we evaluated this method for an increasing fraction of poisoning points in the data, against existing robust regression approaches, and you can see that ours essentially identifies all the poisoning points. Now, this is not a fully honest evaluation, in the sense that the attack is ignoring the defense: we are optimizing the poisoning points as if there were no defense, so they are, in some sense, easy to spot, at least by a method specifically designed for that. If you craft the poisoning attack accounting for the presence of the defense, what you would do is inject more points, but closer to the original ones, so that they are less outlying than in the unconstrained attack; this means the attacker has to inject many more, much stealthier, attack points. A more honest evaluation would take into account this modified attack that also knows the defense; that is one of the open points.

Now, if there are no questions, we can cover some other examples of attacks against machine learning, following the taxonomy that we propose in the survey paper associated with this lecture and tutorial, which you might want to have a look at. You can characterize attacks by intersecting the different attack goals we defined with the capability of manipulating the data. So far we have seen integrity attacks obtained by manipulating the test data, which is the evasion setting, and availability attacks obtained by manipulating the training data, which is the poisoning attack where you want to maximize the error. We are now going to cover some other attacks: first, attacks that aim to obtain private or confidential information about the classifier or its users by manipulating only the test data; in this setting we will see that you can query the algorithm with well-crafted samples and, by observing the predictions, infer private information. Then we will see another example where you manipulate the training set of the algorithm to allow specific intrusions at test time: not to completely break the classifier, but to allow some specific misclassifications.

Regarding privacy attacks, this is an example on a face recognition system. This was known as a hill-climbing attack in biometrics, but it has recently been rediscovered in this 2015 paper. What they show is that, if you can query the algorithm, you can optimize your input image by trying to maximize the similarity with a given user; it is like computing a gradient numerically. You start from a noisy image, query the system, and try to maximize the probability of a given class, a given user in this case, and if you do this iteratively you can eventually reconstruct the template image, the face that is in the training data. Of course there is some error, but the attacker obtains a lot of information: on one side a random image, on the other the true image of the user, and this is what is reconstructed from random noise by iteratively querying the system over and over. That is an example of how you can extract information about the training data by querying the algorithm. The classifier here is meant to recognize different faces, so you may say "I want an image that is classified as you": I start from a random image, and the classifier gives me the probability that this image is classified as you; then you perturb the pixels in random ways, it is a sort of black-box attack, you submit the new images to the system, keep the one that maximizes the probability of being classified as the desired user, and keep going; in the end you find an image with a very high probability value.
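The query loop can be sketched like this (a toy illustration: `predict_proba` stands for whatever query interface the attacked system exposes, and the step size, query budget and greedy acceptance rule are all made-up choices):

```python
import numpy as np

def hill_climb_reconstruction(predict_proba, target_class, shape,
                              n_queries=20000, step=0.1, seed=0):
    """Black-box hill climbing: start from random noise and keep a random
    perturbation only if the queried probability of the target class grows."""
    rng = np.random.default_rng(seed)
    img = rng.random(shape)
    best = predict_proba(img)[target_class]
    for _ in range(n_queries):
        candidate = np.clip(img + step * rng.normal(size=shape), 0.0, 1.0)
        p = predict_proba(candidate)[target_class]
        if p > best:                      # greedy: accept only improvements
            img, best = candidate, p
    return img, best
```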
Note that it is assumed the attacker can start from a random image, and the only thing he can observe is the prediction probability. The attacker knows that the given system labels faces, and he may know that a given person is enrolled; yes, he may know at least a name, so he knows the name of the subject but not the face. Yes, that is the kind of scenario. I do not remember exactly what they say in this paper, but the original idea, in these very old papers, was that there were APIs available online that you can iteratively query: you can generate digital images, the system will classify them, you repeat this many times, and in the end you can reconstruct some private information about the users. That is one example. Yes, please? Maybe for faces it is a bit weird, but nowadays there are many systems served online as cloud services; there is, for example, Google's Vision API: you can submit images and it returns the probabilities for each class of objects in those images. If you imagine this kind of setting, you can do something like this; of course it makes sense if the model is trained on private data, otherwise you are just reconstructing public images, but it is anyway an example showing that, if you give your prediction probabilities out, one can learn something about your training data. That is the point of this attack.

The other example is even closer to what I was saying before. This is another very recent attack where you have a model which is provided as a service, maybe trained on some of your data, and users can query the model to get predictions back. This attack is able to tell whether a given image that the attacker has was part of the training data: you look at the distribution of probabilities predicted by the model, and based on that you can say whether this sample was in the training set or not. This is called a membership inference attack. It is interesting, but you have to assume that the attacker already has the sample, or knowledge of it, and just wants to verify whether it was used to train your model; and there is also a countermeasure, so you can mitigate this problem. The attack works simply because the distribution of probabilities for a training point is very extreme: you have a very high probability on the correct class and the rest is flat, more or less zero, whereas for samples which are not in the training data you have a more balanced distribution, the classifier is not so sure about the prediction. I also think it mostly works with the cross-entropy loss and softmax, because this loss penalizes errors on the training set very strongly; if you use loss functions that penalize errors less severely, this attack is more difficult to perform. That was again just another example of an attack that may undermine the privacy of your system.
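The core intuition can be written in two lines (a deliberately simplified heuristic: real membership inference attacks, for example those based on shadow models, are more elaborate, and the threshold here is an arbitrary placeholder):

```python
import numpy as np

def looks_like_training_point(probs, threshold=0.99):
    """Guess membership from a single prediction: training points tend to
    get very peaked probability vectors, unseen points more balanced ones."""
    return float(np.max(probs)) >= threshold
```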
There is another interesting one; this time it is a poisoning integrity attack: I would like to change the model, or the training set, in a way that lets me misclassify some specific images at test time. Assume you have a classifier that discriminates between stop signs and speed limits, just two classes for simplicity. What the attacker can do, if he has the model, is add, in some sense, an additional class, which can be for example a stop sign with a sticker on it, and then use these samples to retrain the model so that it also accounts for this class. So if there is a pre-trained model available online, the attacker takes this model, manipulates it, and releases it again to the public; if someone uses this model, which may now contain a backdoor, the model behaves as the attacker desires when the backdoor is activated. This means, for example, that a car driving with this backdoored model behaves correctly when it sees normal stop signs, but when it sees the stop sign with the sticker it recognizes it as a speed limit: the sticker activates the backdoor, in some sense. That is another kind of attack; you see, this is a stop sign with the sticker, and it is recognized as a speed limit with high probability. Is that clear? So it is again an example of poisoning, but very targeted to some specific points in the test set. You are right that the boundary will change; the point is that it changes in regions that do not affect performance on the other classes. If the classifier were linear, and with just two classes, you could not really add a new class; this attack is in fact exploiting the high capacity of deep nets: if you add a new class and it is reasonably far from the other samples in the representation space, it is not really problematic for the remaining classes, and that is why this attack is possible. Does it make sense?

Now, this part moves towards the concluding remarks, so it is a more relaxed part; the effort required to follow it is lower. There is a question I like to pose to the audience: are all these techniques that create adversarial examples really a security threat, or not? Is the fact that we have this image of a panda and we change some pixels to have a gibbon really a problem for learning algorithms, or is it just hype in the research field? The first issue is this: we know that if we manipulate images in the digital domain they can cause problems to learning algorithms, but there has been a line of research saying that, if you craft the adversarial examples in the real world, they may not be as effective as they are in the digital domain. The question is: how would you manipulate a school bus so that it is recognized as an ostrich in the real world? That is a problem, and it has been questioned by many authors. In the first work dealing with this problem, Kurakin and others showed that, if you take your adversarial example in the digital domain, print the image, and then re-acquire the image with a camera, the attack remains effective in some cases. That is the very first experiment showing that you can, let's say, craft adversarial examples in the real world, at least in pictures: the noise does not disappear when you print the image and re-acquire it. It is not a large-scale experiment on millions of images, just on some tens of images, so we have to be careful in drawing conclusions. And again, we have seen that it is possible to construct eyeglass frames to mislead face recognition systems: you can just print the frame on paper, attach it to a pair of glasses, and it works.
So again, in this case the noise transfers to the real world as well. Should we really be worried about this problem? Can it be a real problem in the physical world? That is the point, and the research community does not really agree on it. There were also papers saying no, these are not really a problem, because adversarial examples only work if you acquire the image from the same perspective as the original one: if instead you have an object and you move further from the camera, change the position, change the pose, then these attacks are not effective. They showed that in some cases the adversarial examples were indeed sensitive to scale; for example, in the case of a stop sign, if you modify a stop sign, the car will be misled only at a given distance and position with respect to the sign, not at all scales. That was the main point of that paper.

And of course some authors then demonstrated that you can also create attacks which are robust to these kinds of changes, even if you change the pose or the distance, and this is the very first paper showing that; let me see if I can show the video here. That is a manipulated image of a cat which is recognized as a desktop computer, and as you can see it is acquired at different poses, positions and distances from the camera to the object, so the attack is fairly robust to these changes in how you acquire the image. Of course, as you can see, the noise is no longer imperceptible to us. The way they craft this noise is very close to what we did when I was talking about the evasion attacks; the difference is that, instead of perturbing a single image, you perturb different transformations of this image: you have the cat acquired from a given distance, then you change the distance, you generate these different images of the cat, you create the noise against all of these images, and then you average the noise. So you get a single noise pattern that applies to all of these images and is robust to changes in the location and pose of the object. The same authors then went even farther and created 3D objects: this is a 3D turtle, printed with a 3D printer, which is correctly classified; that is the original model. Then they used the same trick to have the turtle misclassified as a rifle; you can see that the pattern is slightly changed in its colors, but the object is consistently misclassified as a rifle. Now, a turtle misclassified as a rifle is not that scary, but a rifle misclassified as a turtle might be, and so this is again a way to see that in the real world we can have problems.

And this is the case of the stop sign: here they crafted a specific kind of noise, inspired by the fact that this is a paper from Berkeley, and in Berkeley they noticed that a lot of traffic signs have stickers on them. So they wanted to create noise resembling these stickers, which would not look suspicious to policemen or to people passing by, something that looks more or less legitimate; but if you apply these stickers, a self-driving car may recognize this stop sign as a speed limit. What I am showing here is: in the left image you have the modified stop sign, which will be recognized as a speed limit for most of the duration of the video, and you can see the prediction appearing down here; on the right is the original stop sign, which is consistently classified correctly.
So now I'm playing the video, and you can see that there's a difference: this one is misclassified as a speed limit (sorry, let me play it again). You can see that it's classified as a speed limit in most cases, from several distances and positions of the traffic sign; in just a couple of frames it is correctly classified as a stop sign, but most of the time it misleads the algorithm. So now we can say that we can really fabricate objects in the physical world that can fool detection algorithms, or deep learning algorithms, or AI if you want. But of course there is no large-scale experiment that thoroughly investigates these aspects; these are just a couple of examples that were created, so we still need a large experimental investigation to draw some conclusions.

Then there is a recent paper by Gilmer, Goodfellow, and others where they really discuss the realism of this security threat, and in particular they focus on the problem of indistinguishable adversarial examples. So is it really important that adversarial examples remain indistinguishable to the human eye? That was the main question, and the answer that they draw in the paper is that this is not the case: in the end it is not very important for the perturbation to remain indistinguishable; the only thing that is important is that it fools the learning algorithm. That was the main conclusion. Of course there are cases in which we need indistinguishability: at border control, say, there is a face recognition mechanism which is supervised by policemen, and they can be instructed to ask people to remove glasses and these kinds of things, especially if they have fancy patterns on them. So in some cases it may be required for the adversarial examples to remain indistinguishable, but in most of the applications it is not required. That's the main point, and they explicitly say that: "we are unable to find a compelling example that requires indistinguishability". So that was a main misconception, which spread from about 2013 up to a couple of years ago, but now I hope this has been clarified: indistinguishability is not really the point.

I think I can conclude now with a bit of history. This is a recent field: it exploded when this problem of adversarial examples was pointed out in the paper "Intriguing properties of neural networks". But I hope to have convinced you that we can go back at least to 2004, where we can find people who were working on similar topics in the area of computer security, like spam filters and these kinds of things. And I think now, more or less, we are managing to bring the two areas of research closer together. If you're interested in knowing more, this is the paper I was mentioning before, where we describe all this historical evolution of attacks and what the common points between the different things are; it's essentially a summary of this lecture.

And of course there are a couple of slides here to tell you that the problem is going beyond the technical experts: it is now becoming of interest to a larger part of society, to different stakeholders of these methods. So there are legal issues: if a self-driving car has an accident, who is responsible for that, for example? There is a whole set of problems which are now being investigated, and there is a very nice paper, "The black box of AI", in Nature, which I suggest you have a look at. The main point there is also to require explainability, interpretable AI in some sense: if a deep net is telling me that there is a cat in this image, I would also like to know why. I want to understand what the system learns from the data, and whether it learns something useful or not.

And this is very interesting, let me tell you this, because there is a famous case. There is a well-known paper where they show an error of a deep net: there is a husky depicted in the image, and it is misclassified as a wolf. It is a simple mistake, but if you look at the pixels that most affected the decision, you find that the network believed that the husky was a wolf because there was snow in the background: the relevant pixels were just highlighting the snow. What happened in this case is that there was a bias in the training set: the network was trained with wolf images all displaying a wolf against a background of snow. So in the end the network didn't learn the notion of a wolf; it learned the notion of snow, and once you present any other object on snow, it just tells you this is a wolf. That's a popular example; I don't have the image here, but you can find it on the internet. So the explainability of these methods is a really important concern.
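As an aside on "the pixels that most affected the decision": the husky/wolf paper actually used a surrogate-model explanation method (LIME), but a minimal, commonly used stand-in for the same question is a plain gradient saliency map, sketched below with a hypothetical `model` and `image`.

```python
import torch

# Minimal sketch of a gradient-based saliency map: which input pixels
# is the class score most sensitive to? For the biased wolf model,
# this kind of map would light up on the snow in the background.
def saliency_map(model, image, class_idx):
    image = image.clone().detach().requires_grad_(True)
    score = model(image)[0, class_idx]     # score of the class of interest
    score.backward()                       # d(score) / d(pixels)
    return image.grad.abs().max(dim=1)[0]  # collapse the color channels
```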
Another thing, to be honest, is that this is now very interesting precisely because deep learning has become so popular. Before that, machine learning was not working that well: SVMs, decision trees, all these kinds of previous approaches were not giving the performance that we observe nowadays with deep learning, and so showing that those algorithms were broken was not that interesting to the community. But now, if you see that something has more or less superhuman performance, as they claim, at least on some tasks, and then you show that it fails very easily on these simple problems, it becomes much more interesting. That's why there is this huge focus nowadays.

And now, to conclude, we also have to say that humans can be fooled in different ways as well, and you probably know this better than me, because some of you are studying neuroscience, so I guess you know these things about hallucinations and illusions. Just to make a simple test, you probably know the bat-and-ball problem. If you read that a bat and a ball together cost $1.10, and the bat costs $1.00 more than the ball, how much does the ball cost? The immediate answer that one may give is that the bat costs $1.00 and the ball costs $0.10, but the correct answer is $0.05: if the ball costs $0.05, the bat costs $1.05, and together they cost $1.10. This is due to the fact that, with a very limited amount of time, you tend to simplify the problem and give the first answer that comes to your mind. So even humans, when restricted in their decisions, may make wrong decisions as well; this is not only a problem with algorithms but also with humans. That's the main point here, and if you know this book, this is very well explained there; the author won the Nobel Prize in economics.

Okay, I think that's more or less it. The main point of this talk is that we have machine learning algorithms empowering a lot of applications nowadays, and they are working very well, but this comes at a price, and we also have to take care of the security aspects that this involves, in many different phases. Okay, that concludes the talk. Thank you very much for resisting up to the very last moment, and good luck with the remainder of the school. Thank you.