[The opening of the recording is unintelligible; the speaker mentions tutorial material prepared with his colleague Fabio Roli.] ...so let me start with a question: has anyone here read a very old paper on this subject? Okay, this is mine, but it's not actually mine: this is Fabio's choice, and Fabio is a full professor, so he's older than me. And this is a very old paper from 1966. It was actually suggested by another professor, Gavin Brown from Manchester. It is a very old survey on pattern recognition, which is a set of applications; it's a very close topic to machine learning in the end, and the difference is basically the kind of applications that you deal with. And as you can see, it already says that, during the past ten years, about 200 articles and several books have appeared. This is just to let you notice that we are working now on a very different scale. I think you can collect 200 papers on the topic of adversarial machine learning in probably not a week, but a month for sure. It's quite hard to keep pace.

But anyway, what's the point here? The point is to have a look at which kind of applications were considered in this paper, at least in the 60s. Some of them are listed here. You have OCR, optical character recognition, to sort bank checks, so to automate the process of dealing with bank checks. Another popular task was in the area of photointerpretation, where you want to identify different areas on the ground: whether there is a forest, a river, these kinds of things. And there were also some applications from physics, which you probably know better than me, dealing with the detection of particle tracks in bubble chambers. That was one application of pattern recognition and machine learning back in the 60s.

So what's the point? The point is that these were all very specialized applications, meant to be used by domain experts. If you are a physicist and you are interested in this kind of prediction, you can use pattern recognition and machine learning to help you solve the problem that you have at the end. And this is fundamentally different from the use that we make today of AI, machine learning and pattern recognition techniques. That's the first point. Not that we do not use these tools for specialized applications today as well, but we mostly use them in commercial and consumer applications that deal with our own personal data. On your mobile phone you probably have different applications that use AI in some form to process your data and draw some conclusions. And this is an example of how AI is used today, for example, to help the development of self-driving cars. This is a paper from last year's ICCV, which is a conference on computer vision.
And you can see that while the car drives, a deep network is essentially segmenting the image and recognizing different objects: you recognize pedestrians, other cars, traffic lights and so on and so forth. That's one example. There are other examples. As I said, in your devices you have voice assistants, which you can talk to. These are the most popular ones: from Amazon you have Alexa, Apple uses Siri, Microsoft has Cortana and Google has its own Google Assistant. Those are other examples, and all of them use, again, deep learning to process and recognize speech. So the point is, indeed, that we are using machine learning every day, with our own data, in our daily lives. We went from very specialized applications to something which is now spread and used by the whole community, by the whole world.

And again, sorry, this is a prediction. I don't know if you know this professor. Andrew Ng is one of the most prominent, most renowned professors in AI and machine learning, and he defined AI as "the new electricity". We can question that; there is a debate, it may be part of the hype, I'm not against that. But it gives an idea of the progress that AI has made over the past years. So the question is: is everything fine? Are we all happy with having AI process our data, or with giving our data to big companies and having them process it? Do you think it's safe enough for us as it is now? Are you happy with that?

Let me give you some other examples to answer the question. I don't know if your mobile phone has a fingerprint reader; most of them now have one. The first one was, I think, the iPhone 5s. But here is an example of what happened. The phone with the fingerprint reader was released in 2013, and just a couple of days later a team of hackers was able to crack it by essentially constructing a fake fingerprint of the person they wanted to impersonate. And this is the process; we also have a unit in our lab that works on these things. Basically, if you are able to take a good-quality image of a finger of the person that you want to impersonate, you can extract the latent fingerprint from the image and then fabricate fake fingerprints. If you put this plastic finger onto the phone sensor, you can actually log in as the person that you want to impersonate. That's one kind of attack that you can move against AI, or at least against this fingerprint recognition system.

And then one may say: but this is only a problem with old machine learning; deep learning is another story. This is not something I made up; it's something we were told during one of these webinars. Sometimes we follow webinars given by companies, and as an attendee you can ask questions to the presenter. So we asked: we know that machine learning has a problem with adversarial examples, in some sense. And the answer was: no, that was for the old machine learning algorithms; for deep learning it's another story. So they were claiming that deep learning was secure. And this is actually false, because consider, for example, this very popular paper. Do you know it? It's a very famous paper from 2013.
Essentially it shows that if you consider a deep net for image recognition, so you want to identify objects in images, and you start from the image of a school bus, you can add a very specific noise, crafted in a very particular way, which we'll see later on. The noise is magnified in the picture, by the way, for the sake of visibility; if you add it to the image that you see on the left, you get the image on the right, which looks more or less the same, because the noise is very small. But if you have this image of the school bus classified by the deep neural network, the result is that it is classified as an ostrich. So with this specific noise you can fool the algorithm into recognizing any object you like: it can be an ostrich, a pen, a car, whatever you like, based on how you craft the noise. This was reported in this famous paper, "Intriguing properties of neural networks", which was published in 2014. But what I'll discuss with you is that we essentially showed the same idea behind constructing this noise one year in advance; it's essentially the same mechanism.

Yeah, please. [Audience question about whether the attacker needs to know the network.] You have to know something; there are different settings. If you know the network, it's easier: you can fool it by applying a smaller perturbation. If you don't know the exact details of the network, you can apply a trick, which is essentially to construct a copy of the network, somehow, that you use to build your noise, and then you apply the noise to the image. And if you increase the noise a little bit, the attack, as they say, transfers from one model to the other. Yes, even in a black-box setting, or almost a black-box setting. That's a good point; we will discuss this in detail later. This was just to show that there is a threat. Of course you need to know something about the training data. There are cases in which you have less knowledge of the training data, but you can query the algorithm: for example, if you have a machine learning system which is provided as a service in the cloud, you can send queries and observe the output probabilities. By exploiting this mechanism you can anyway train, let's say, a substitute model, your own approximate copy of the target, and craft your samples against the model that you approximate. In most cases they transfer from one model to the other, so the attack is effective also against a system that you cannot directly observe. But we will see more of this later. That was just an example to show that you can also fool deep learning methods.

This is another one. I don't know if you have seen this paper, but this is one of the authors. What they have done is build these eyeglass frames with this strange colour pattern, which of course is carefully tuned to fool the algorithm. And what happens when any of the authors of this paper, or any person, wears these eyeglass frames is that they are recognized as Milla Jovovich by the learning algorithm. So this kind of noise is crafted exactly in that portion of the image to fool this deep learning net, which, by the way, was trained on celebrities. This is a very good example, I think.
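Going back for a moment to the substitute-model trick mentioned in the answer above, here is a minimal sketch of the idea under toy assumptions: a black-box target that only returns labels, a synthetic dataset, and a very simple surrogate. None of the models or numbers come from the papers discussed here; they are purely illustrative.

```python
# Minimal sketch of the substitute-model (transfer) attack idea.
# The "target" is treated as a black box we can only query for labels;
# dataset, models and the sign-based perturbation are all illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# The target model the attacker cannot inspect (here a nonlinear RBF SVM).
target = SVC(kernel="rbf").fit(X[:1000], y[:1000])

# 1) Query the target on the attacker's own data to obtain labels.
X_query = X[1000:1500]
y_query = target.predict(X_query)

# 2) Train a substitute model that the attacker fully controls.
substitute = LogisticRegression(max_iter=1000).fit(X_query, y_query)

# 3) Craft a perturbation against the substitute (for a linear model the
#    gradient is just the weight vector) and hope it transfers to the target.
x = X[1500].copy()
eps = 0.5
w = substitute.coef_[0]
direction = np.sign(substitute.decision_function([x])[0])
x_adv = x - eps * np.sign(w) * direction

print("target on original:   ", target.predict([x])[0])
print("target on adversarial:", target.predict([x_adv])[0])
```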
And this is a very recent one I found yesterday; it's been on arXiv for a year. What they do is use a generative algorithm, a very popular one nowadays, generative adversarial networks, to fabricate images of fingerprints that you can actually use to impersonate different people. It's like having a master key for doors, a passe-partout: just as a key that opens many doors, in this way you can create a fingerprint that matches different people. I think that's quite shocking from the perspective of the perceived security that we base on fingerprints. It makes me think twice about using my fingerprint to access my phone and my card. That's the point.

And one can also say: well, okay, this is a problem, it's not only for standard machine learning algorithms, it also affects deep learning, but you only showed us examples on images; maybe in other domains we don't have the same problem. Well, it turns out that you have the same problem in different domains as well. This example is about audio recognition, audio transcription. I will now play an audio signal, hoping that you can hear it, and then we will see what happens when you transcribe the speech using Mozilla DeepSpeech. The first one is the clean signal. Let's see if we can hear it. Probably not. Let's try it again. You cannot hear that, right? But it's saying: "without the dataset the article is useless". That's what it says. And the next one in the demo is a slightly manipulated audio, which sounds more or less like the one we heard, but with some tiny background noise. Maybe those in the front rows can hear that it is more or less the same; if you play it louder you can hear a very small background noise, and essentially we perceive the same sentence. What is transcribed instead is this. And the point is that you can make it transcribe whatever you like: you can manipulate the data in a way that the network sees a completely different pattern. This is a contribution by Nicholas Carlini; there is a paper, I think it's called "Audio Adversarial Examples", which is very interesting.

And we also showed that the same problem exists in malware detection. You have executable files for Windows, and you want to distinguish legitimate files from computer viruses. There is an approach that says you can train a deep network on the raw bytes, on the byte sequence, in order to distinguish between these two categories of files, benign and malicious; it is called MalConv. And what we showed in a short paper is that you can basically add some bytes at the end of the file, so-called padding bytes, which do not have any meaning for the program and do not alter it, because they are never executed when the program runs. But if you carefully optimize these bytes, as we will see later, you can mislead the algorithm into thinking that a malicious application is benign. This is what we call the evasion rate: the fraction of manipulated malicious samples that are instead recognized as benign. So you have a computer virus which is not identified by the algorithm. And in this case, by adding up to 10,000 bytes, which by the way is less than 1% of the total size of the file, which is around a megabyte or two, you are able to fool the deep net in 60% of the cases. That's another example. By comparison, if you just do random byte addition, you get more or less a 20% evasion rate.
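To make the padding idea concrete, here is a toy sketch of appending bytes past the end of an executable. The scoring function below is only a stand-in for a trained detector such as MalConv, and the "optimized" padding is chosen for this toy score, not by the actual gradient-based procedure discussed later.

```python
# Toy sketch of the padding-byte idea: bytes appended after the end of an
# executable are never run, but a detector that reads raw bytes still sees
# them. malware_score() is a placeholder, NOT a real detector.
import numpy as np

rng = np.random.default_rng(0)

def malware_score(file_bytes: bytes) -> float:
    """Hypothetical detector: higher means more likely malicious."""
    x = np.frombuffer(file_bytes, dtype=np.uint8).astype(float) / 255.0
    return float(x.mean())              # stand-in for a trained deep net

def add_padding(file_bytes: bytes, padding: bytes) -> bytes:
    # Appending bytes past the end of the program does not change its
    # behaviour, because they are never read or executed at run time.
    return file_bytes + padding

malware = bytes(rng.integers(100, 256, size=1_000_000, dtype=np.uint8))
print("score before:      ", malware_score(malware))

random_pad = bytes(rng.integers(0, 256, size=10_000, dtype=np.uint8))
optimised_pad = bytes(10_000)           # all zeros minimises this toy score

print("random padding:    ", malware_score(add_padding(malware, random_pad)))
print("optimised padding: ", malware_score(add_padding(malware, optimised_pad)))
```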
But this is just to show that if you carefully optimize the noise that you add to your data, you can really screw up the algorithm to a large extent. Okay, so the take-home message from this first part is that using AI is fine, it is empowering many applications and facilitating many tasks for us, but besides these good possibilities there are a lot of threats potentially arising from the use of AI and machine learning, mostly in consumer applications. That's the first take-home message of the talk.

Now, in this second, let's say, small part, we are going to ask why this is the case for machine learning. Why is machine learning so vulnerable to these kinds of threats, to these attacks? Where do these security risks come from? For that, we have to take a step back and discuss how machine learning works, at least in its basic principles. All the theory that you normally use to estimate generalization error and these kinds of things is built on top of a fundamental assumption, which is that the data you use to train your machine learning algorithm comes from the same distribution that you have in testing. If your training data is sampled from a Gaussian distribution, all these bounds, all this good performance, is guaranteed only if you work under the same distribution in the test phase. That's the main assumption. Of course you never know the underlying distribution itself; that is just the assumption these algorithms make. So you have a data source which ideally samples data from this distribution, and then you extract some measurements, normally called the features. With deep learning you start from raw data and the network learns a representation, so in some sense it learns the features; but for traditional algorithms you make some measurements. For example, if you want to classify people you measure these kinds of characteristics. So you map the raw data to a feature vector, onto a vector space. That is your representation, and on top of that you can train your preferred machine learning algorithm and then test it on data for which you don't know the labels. You simulate the operating conditions of the algorithm and see how the performance is.

And just to unpack it a little bit more: the source of data is given and it does not depend on the classifier. The way the data is generated is independent from the classifier that we use; it's not affected by the classifier. And the noise that affects the data is in some sense stochastic, and again it does not depend on the classifier. In the OCR problem you may have some corruption of the scanned image, for example, but this kind of noise is absolutely random and stochastic. Those are the main assumptions, and under these assumptions these methods work quite well. The question is: can we rely on the same model under attack? Do you think these assumptions still hold when we have in front of us an attacker that wants to mislead our algorithm? Is the data source benign, or is it in some sense malicious? Does the data depend on the classifier or not? These are the kinds of things we have to question when we want to deal with this problem.
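As a tiny illustration of that same-distribution assumption, here is a sketch on a synthetic dataset; the Gaussian classes and the shift are invented, and the only point is that accuracy guarantees hold when the test data follows the training distribution and degrade when it doesn't.

```python
# Small illustration of the IID assumption: a classifier trained on one
# distribution does well on test data from the same distribution and worse
# when the test distribution is shifted. Data and shift are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def sample(n, shift=0.0):
    # Two Gaussian classes; `shift` moves the positive class towards the other.
    X0 = rng.normal(loc=-1.0, scale=1.0, size=(n, 2))
    X1 = rng.normal(loc=+1.0 - shift, scale=1.0, size=(n, 2))
    return np.vstack([X0, X1]), np.array([0] * n + [1] * n)

X_train, y_train = sample(500)
clf = LogisticRegression().fit(X_train, y_train)

X_iid, y_iid = sample(500)                 # same distribution as training
X_shift, y_shift = sample(500, shift=1.5)  # shifted at test time

print("accuracy, same distribution:   ", clf.score(X_iid, y_iid))
print("accuracy, shifted distribution:", clf.score(X_shift, y_shift))
```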
This is just an example on classification for spam filtering; I'm trying to answer this question using a running example. If you consider the simple problem of spam filtering, it is normally addressed using a linear classifier. It's a very simple method, and it's not really far from what is used in practice: if you look at some of the most well-known open-source anti-spam filters, they are actually based on linear classifiers. SpamAssassin, Bogofilter and many others use a binary representation of the text. They basically check whether a given word is absent or present in the email: if it's present you have a feature equal to one, otherwise zero. And on top of this they learn a linear classifier. So if you have two words, like here, the linear classifier learns a weight for each word during training: the more positive the weight, the more the word is supposed to be malicious, to belong to the spam class. Then the linear classifier just sums up the contributions of the words present in the email. In this case you sum up one plus five and you have a score of six; assuming a threshold of five, you can say this email is correctly classified as a spam message.

And now let's look at what happens in the feature space. [Audience question about word order.] Yes, in this case it's not accounting for the sequence of words, it's just looking at which words are present; it doesn't matter in which order they appear. You can account for that by using, for example, n-grams: instead of having one feature for each word you can have a feature which is the concatenation of two words, and that feature is one only if you see those two words together. So you can still use linear classifiers, but on a representation that takes the order of words into account. Most spam filters, though, use more or less this representation; they are very simple and in fact they work quite well.

This is just a conceptual representation of what you have in the feature space: every point is an email, this is for example the email that we considered before, and the classifier is just a linear separator in this space. I think this is easy for you to understand, given all the mathematical details that you have looked at these days. So that's the conceptual representation: the benign samples, the legitimate emails, on the left, and the malicious emails on the right. And now, if you were playing the role of the spammer, what would you do to mislead the classifier? What happens if the source is not legitimate? Well, common spam tricks include manipulating a bit the words in the spam message, and a very popular trick is adding good words. The spammer essentially writes the spam message and then injects random English words, for example, hoping to guess some of the good words used by the filter. And often the words that are added to the email are painted in white, so that as a human you don't see them, but the machine still finds that content when it processes the message. It's like writing text in white: a human doesn't see it on the screen, but the filter still reads it. And if you add these two words here, you have to sum up their contribution as well; assuming they are good words, they have a negative weight, and then it's clear that if you sum up all these values you get something lower than the threshold, and this email, which is indeed a spam email, is misclassified as legitimate. That's a very simple example of how you can fool a linear classifier. A very simple one.
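If it helps to see this with concrete numbers, here is a minimal sketch of such a filter and of the good-word trick. The vocabulary, weights and threshold are invented for illustration; a real filter learns them from training data.

```python
# Sketch of a binary bag-of-words linear spam filter and the "good word" trick.
# Weights and threshold are invented, not taken from any real filter.
weights = {
    "buy": 1.0, "viagra": 5.0, "cheap": 2.0,   # spammy words: positive weight
    "university": -3.0, "meeting": -2.0,       # good words: negative weight
}
threshold = 5.0

def score(email: str) -> float:
    words = set(email.lower().split())
    # Binary features: each known word contributes its weight once if present.
    return sum(w for token, w in weights.items() if token in words)

def is_spam(email: str) -> bool:
    return score(email) > threshold

spam = "buy cheap viagra now"
print(score(spam), is_spam(spam))           # 8.0 True: classified as spam

# Good-word attack: append negatively weighted words (possibly in white text,
# invisible to the human reader) until the score drops below the threshold.
evading = spam + " university meeting"
print(score(evading), is_spam(evading))     # 3.0 False: misclassified as ham
```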
And what happens in the feature-space representation of your problem is that the malicious samples are shifted towards the boundary. The email that was lying here has indeed moved to the other side of the boundary, causing a misclassification. So your data is in some sense manipulated, it is changed in order to be misclassified. And of course this kind of noise that you add is not random: as you can see from the picture, it is really tailored against the classifier. If you change the classifier, you change the weights, and you are probably going to change the way you modify the words.

[Audience question about what "good words" are.] Good means that they are good for the filter, so good words are words that push the score towards the legitimate class; a spam message, on the other hand, will contain bad words, and the trick is putting good words inside the spam. What's the question exactly? You mean the classifier? Well, you don't know in advance that it is a spam email; you have to classify it. And this is only one part of the trick. The other thing you can do is misspell some of the characters: instead of writing "viagra" you can write "vi@gra", and as a human you will still read the message, but that will just make this word disappear from the feature representation. It's just an example of the potential manipulations. [Audience comment: good, technical emails will tend to avoid such distorted words.] Yes, you won't have them in good emails, but the point is that if the filter doesn't know whether something is a good word or a bad word, it will give it zero weight, and so you will anyway decrease the score. So you have two ways to decrease the score in this case: either you add good words, words that are used by the filter and that are present in legitimate emails, or you can delete bad words, that is, obfuscate bad words in a way that the machine doesn't see that the bad word is there while a human can still understand the meaning of the message. That's the point. And you need good words in your filter to correctly characterize your legitimate emails; otherwise you may have legitimate emails misclassified.

[Audience question.] So your point is what happens with legitimate emails that are very short or rare. Yes, the problem is that if the email is very short it's very difficult to classify; if you have more text you can take a more reliable decision. But this is the general problem of how such filters work, and they work with quite good accuracy if you train them in this way. Our problem here is different: we are looking at what we can do to mislead them. Your question is more general, about how these spam filters work. There are more sophisticated ones, and you also have information about the sender; there is a bunch of rules, and text analysis is just one or two of them. It's not the only thing you use to make a decision on an email in general. But this is a way you can use to fool a text classifier or a spam filter. And these are actually tricks that we see in real spam emails: I got a lot of emails where you have a piece of the English dictionary attached at the end, or manipulations with strange characters. So this is also an attempt that you see in the wild; maybe it's not specifically targeted against the learning algorithm, but it's anyway an attempt to obfuscate the message to the machine. That's the point. And so you have this kind of shift of the points in the feature space, which causes misclassification, and the point is that this noise, this adversarial perturbation, is not random at all in this case.
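Continuing the toy spam-filter sketch from before, the second trick, obfuscating bad words, looks like this; again the weights and threshold are invented for illustration.

```python
# Obfuscating a bad word so the tokenizer no longer matches it, while a human
# still reads "viagra". Same invented weights and threshold as the sketch above.
weights = {"buy": 1.0, "viagra": 5.0, "cheap": 2.0, "university": -3.0, "meeting": -2.0}
threshold = 5.0

def score(email: str) -> float:
    words = set(email.lower().split())
    return sum(w for token, w in weights.items() if token in words)

print(score("buy cheap viagra now"), score("buy cheap viagra now") > threshold)   # 8.0 True
print(score("buy cheap vi@gra now"), score("buy cheap vi@gra now") > threshold)   # 3.0 False
```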
So, going back to the model that we have seen before, we already see that these two assumptions are essentially violated, because what we have is this kind of condition: the source of the data is not neutral, it is in fact an attacker against our system, at least for some portion of the data; and the noise applied to the data is not even stochastic, it's not random, it depends on the kind of classifier. We will see more on this later. But the point is that these classifiers, these algorithms, are very vulnerable to these attacks because they were not designed to cope with them in the first place. They were designed for a benevolent setting, where the data comes from a legitimate distribution and there is nobody altering the data in a worst-case way for the algorithm. That's the point.

Notice also, as a small curiosity, that the distinction between adversarial noise and stochastic noise is not new to this problem. It is similar to the distinction we can make between the noise model used by Shannon and the one used by Hamming. In the first one you have a probabilistic model of the channel, and the noise on your data is stochastic, whereas in Hamming's model you modify the worst-case bits of the signal, to analyze what happens when you change the k bits that cause the maximum probability of error. That's more or less the intuition. So we need to move from a problem where you have stochastic noise to a problem where the noise is targeted against the model, and do this kind of worst-case analysis.

So, to recap why these techniques cannot really work, and do not work, in the presence of an attacker: if in front of you you have an attacker, then these methods don't work because their underlying assumptions are violated. The data is not IID, not independent and identically distributed from a fixed distribution; the distribution of the data is changing, and even worse, it is changed in a malicious way, in a way that maximizes the error of the classifier. So overall, solving these adversarial problems with vanilla machine learning algorithms is mission impossible, because their underlying assumptions are all violated. Is that clear to everybody? Can you follow? I think it's very introductory and very easy to follow at this point.

Then the next question is how we should design machine learning algorithms, or classifiers, when we are under attack, when we are in these adversarial settings. The first thing is that we should keep in mind how the process works: what a real attacker will do to mislead our algorithm or system. An attacker in some sense establishes an arms race with the defense system. As an attacker, you first analyze the model and then you try to devise and execute an attack to fool it. From the defender's perspective, when you get some of these evasive samples, you analyze them and then you try to build a countermeasure. And this is quite clear in the cybersecurity domain, where for example malware, malicious software, is manipulated in a way that misleads the antiviruses, and the antiviruses are in turn updated to catch up with these modifications. So there is an arms race between the two players. What we advocate here is that we have to take that into account when we design machine learning algorithms.
And just to give you an example of this arms race, I am reporting in the next few slides a case that we started working on back in 2006-2007, which is called the image-based spam case. At that time spammers were using another trick to fool text classifiers. Their idea was even more radical than manipulating the text: they took a screenshot of the message and attached the image, with the embedded text, to the email. So now you have an email with an image and the machine cannot see any text in it; there is just an attached image which contains the malicious message, the spam message. As you can see here, it says you can buy something, and similarly here, and then you have a bunch of random words, just an attempt to decrease the maliciousness of the spam email. That's a popular trick that arose in 2006, and of course it was very effective in defeating the spam filters; we had a lot of cases. And the first defense, of course the first thing you do to react to this attack, is: well, let's try optical character recognition. So we apply OCR to these images, trying to extract the text, and then we process the text with the standard learning algorithm and see if we can still detect the spam. This was one line of work that my colleagues did at that time. Another one is to create a signature for every image that you know is spam. Once you, or someone, classifies an email as spam, you know that its image is a spam image; you can extract a unique signature for that file and just add it to a blacklist in the spam filter. Then, when the spam filter finds this image again, it immediately blocks it, without even doing the computationally expensive OCR analysis.

So you have these two kinds of countermeasures. And what the spammers did, again, was to start randomizing the images, to avoid signature-based detection, because if you just change a single pixel the signature that you get is completely different. The other thing they did was to make it harder for the OCR to read the text. It's like a CAPTCHA, but used in a malicious way: you want to prevent the machine from reading the content of the message. These are examples of the, let's say, second evolution of the arms race. And of course we then did some other work, and this was done by others as well: the idea for detecting these distorted spam images is to use low-level visual features, like the number of colours, this kind of information that you can extract from images, and then you can again discriminate them from normal pictures. So you can again increase the level of sophistication of the defenses to be able to detect them. The complete story you can find on Wikipedia, but what happened after that was that we observed a decrease in the volume of image spam, which almost disappeared in 2010. It didn't disappear because of these countermeasures, though; it was just that spammers found other ways, or went back to the usual tricks they used to evade detection. So this was just the trend over those years, but I think it really provides an example of the arms race that we have in cybersecurity domains, where you have an attacker with a clear economic incentive to mislead the learning algorithm.
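Here is a small sketch of the signature/blacklist countermeasure, and of why randomizing even one pixel defeats it: a cryptographic hash of the image file changes completely for any one-byte difference. The "image" below is just a placeholder byte string.

```python
# Sketch of hash-based signature blacklisting for known spam images, and why
# flipping a single byte/pixel evades it. The image content is a stand-in.
import hashlib

def signature(image_bytes: bytes) -> str:
    return hashlib.sha256(image_bytes).hexdigest()

blacklist = set()

known_spam_image = bytes(1000)                     # stand-in for a spam screenshot
blacklist.add(signature(known_spam_image))

# Same image resent: blocked cheaply, no OCR needed.
print(signature(known_spam_image) in blacklist)    # True

# Spammer flips one byte: the signature no longer matches the blacklist.
randomised = bytearray(known_spam_image)
randomised[0] ^= 1
print(signature(bytes(randomised)) in blacklist)   # False
```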
Okay, then let's go back. That was an example of the arms race; now, how can we model the arms race when we want to design these machine learning algorithms? This is what you have seen before. The idea is to, in some sense, also play the role of the attacker. We have the system designer, which is the role we play, and we try to impersonate and anticipate the attacker. As a system designer, what we would like to do is, first of all, have a model of the attacker, to identify all the possible threats that one can move against a learning algorithm, simulate them and evaluate their impact; and then, potentially, if an attack really has an impact on the classifier, we would like to come up with a defense or a mitigation strategy. These three rules are summarized here; using a metaphor, we call them the three golden rules. The first one is: know your adversary, so you have to create a model of the attacker. The second one is: be proactive, which means try to anticipate what the attacker is going to do, model the attacks and simulate their impact. And the third one is: protect your classifier, so develop mitigation strategies or countermeasures when they are required. The rest of the lectures, between today and tomorrow, will cover these aspects. In the remainder of this talk I'm going to discuss how to model the attacker and then how to simulate some of the attacks. The examples that I showed before are just one specific category of attacks; there are other threats that you can envision against learning algorithms.

So let's talk about how to model the attacker. We can model the attacker along three dimensions: what the objective of the attacker is, what knowledge he or she has about the system, and what he or she can do to manipulate the data in a way that reaches that goal, that is, which kinds of manipulations are possible. Regarding the goal, there was an interesting piece of work back in 2006 (this problem was studied roughly between 2006 and 2010), where some researchers from Berkeley had the nice idea of mapping the notion of security violation from security engineering to the case of machine learning. You can essentially characterize the goal of the attacker along these kinds of security violations. The first one is when the attacker has the goal of violating the integrity of the system. This means that the system keeps operating as normal, so for legitimate users there is no change in how they use the system, but the attacker is able to perform some specific intrusions. In the case of spam, this means that your filter mostly works correctly for your legitimate emails and correctly detects most of the spam, but some spam emails are able to get in. That's an integrity violation. You can also have an availability violation, where the attacker's goal is that the system no longer works for legitimate users; this is a denial of service. So I may want to manipulate your anti-spam filter in a way that, for example, most of your legitimate emails are now misclassified as spam, and the system is no longer usable by you. There is yet another goal that the attacker may aim for, which is violating the privacy of the system. There are attacks that extract private or confidential information from a machine learning algorithm. For example, if you have a face recognition system and you can query the system many times, trying to impersonate a given user, what you can do in the end is extract the
face of some of the system's users, just by querying the algorithm many, many times; and I will show an example of this. Or you can simply leak information about the algorithm and use that information for other purposes. That's an example of a privacy violation. During the tutorial we will see different examples in all these categories. So that's how we characterize the goal of the attacker.

Now we can also discuss how to model what the attacker may or may not know about the system he is attacking. For a learning algorithm, you can assume that the attacker has some knowledge of the training data, or not. For example, if you use publicly available data to train your algorithm, then the attacker may know the exact data that you used to train it: if you use ImageNet, which is a very popular labelled database, then the attacker can also use it; it's publicly available, so he knows it. If you have private data, for example from medical imaging, then most likely the attacker doesn't have access to it. So you have to make different assumptions on the attacker's level of knowledge about the training data. The other thing is whether the attacker knows which measurements you extract to perform classification: which kinds of things you look at, how you transform the raw data into your vector space. That's the feature mapping function that the attacker may know in this case, and this knowledge can also be partial, or guessed. And then he may also know the specific learning algorithm that you use: if you are using a given model, a given deep net for example, he may know the exact architecture of the network, or he may just guess how many layers you have, which kind of structure you use, these kinds of things. So there are different levels of knowledge on each of these components, and based on that we have attackers with different power.

If the attacker knows everything about the system, we are in what is called the white-box setting: you can take the algorithm and look inside, you know all the details. Now, one may say this is not very realistic for many applications, as in the question that was asked before, but in practice it is very useful to understand the worst case for the defender. Let's assume the attacker gets to know everything about my algorithm: what is the real level of security that I can get? So we can evaluate the worst case under this assumption. More realistic assumptions relax the knowledge of the training data and of the specific trained instance that you have. For example, from the defender's perspective it is common to assume that the attacker doesn't know the training data, but may anyway be able to sample something from the same distribution, so he may have a guess of the training data; and that he doesn't know the exact trained classifier, meaning he may know which neural network you are using, the architecture, the number of neurons, the number of layers, but not the exact weights that you get after training. This is quite a common assumption when you play the defender role, because you have to put yourself in a sort of pessimistic case, assuming that at some point the attacker may be able to obtain this information. And based on that, there is a whole range of attacks, from black box, where the attacker doesn't know anything about the system, to gray box, where he has partial knowledge of the training data and of the learning
algorithm, plus knowledge of the feature space; that's the gray box. Of course, the point I was raising is that you should not make overly optimistic assumptions about how little the attacker knows. In security, when you want to understand the level of security of an algorithm, it's not good practice to consider a very weak attacker; it's better to consider a strong attacker, in a realistic setting, and then design the system against it. Basically, this is also called the security-by-design principle: you assume a powerful attacker and design the system to be secure against it. It's the opposite of security by obscurity, where you say "I am secure just because the attacker doesn't know how my system works".

[Audience comment.] Yes, yeah. In some cases, with queries, you can nowadays get a lot of information, maybe not the exact architecture, but you can extract a lot of information. And anyway it's not good practice to assume that there won't be someone smarter than you who is able to find a way to get additional information. In the framework I think we didn't make this explicit, but I agree with you: there are things you can consider more exposed, in some sense, and things you can consider more protected information, but normally the assumption that we make is that the algorithm is public; and given that the algorithm is public, tell me what the level of security is. This may seem a bit boring, but the practical effect is that we also have an arms race between researchers. There have been people trying to break defenses proposed by others against these threats to machine learning, and one very famous paper is this one by Anish Athalye and Nicholas Carlini, which also won the best paper award at ICML this year. What they did was essentially show that many of the proposed defenses were ineffective, and they were ineffective because they were designed assuming too weak an attacker. The main flaw of those works was that they assumed the attacker was not exploiting the defense at all. If, as an attacker, I don't know how your algorithm works, I don't know which kind of defense you put in place, of course you can defeat me; but if I get to know how your algorithm or your defense works, then I can break it again. This is what Nicholas and Anish showed in this paper, and it's a clear violation of Kerckhoffs' principle: those defenses clearly violate this principle, because the authors believed they had something secure while only testing it against weak attackers. So the attacker has to know how the algorithm and the defense work, if you want to really measure whether it is secure or not. The point is that it was very easy to defend against attacks that do not know the defense; so do not be too optimistic, do not play against too weak an adversary. That's the first important point.

Then, of course, the third axis of the model is the capability of the attacker. The attacker in some cases can have access to the training set, so he can manipulate some of your training data, and in other cases he can just manipulate the test set. For example, the training data can be manipulated if you consider an online learning setting: you have again your spam filter, you receive some emails, and then you put a label, you say this is legitimate or this is spam, and this information is used to retrain your filter. So if I am an attacker, I can for example send a spam message with good words painted in white; you see that it is spam and you flag it as spam, but what happens is that
your filter is then retrained believing that all the words in that email, including the good words, were spammy, bad words. So gradually you poison your filter into believing that some of the good words are actually bad, and the net result is that some of your legitimate emails will be misclassified as spam. That's a way you can subvert the learning process of the algorithm, and that's one capability. And there is a very popular example, I think you've heard of it, which is what happened to the chatbot Tay. It was a chatbot put on Twitter by Microsoft; it was discussing with other people on Twitter and, in the end, it was essentially learning from the messages it received and replying based on them. When people discovered that Tay was more or less repeating their opinions, they started to send it harsh sentences and bad things, and Tay started to become racist and to produce discriminatory sentences, these kinds of things. You can read it yourself. Microsoft was obliged to shut it down after just 24 hours. So this is more or less an example of poisoning: when humans discovered that they could game the chatbot, they did it, maybe someone for fun, someone not. That's an example of manipulating the training data of a learning algorithm.

The most classical capability is when you manipulate data at test time: the training data is not touched, but the attacker can modify test samples, for example the spam messages that he is creating. Those are the main capabilities. And of course this whole area of adversarial machine learning makes sense only because the attacker is constrained: if the attacker had infinite power there would be no game at all, the adversary would always win. The field makes sense because the attacker is constrained, for example by the fact that spam messages still have to be understandable by humans; I cannot completely delete the spam message, there has to be some message understandable by the recipient of the email. It's even clearer in malware detection: when you have a computer virus, you cannot manipulate the code freely, because that would compromise the intrusive functionality of the malware sample. You cannot manipulate the exploitation code if you want the attack to still work. So again the attacker is constrained: only some portions of the file may be changed. There are also other constraints in other applications, but the important thing to remember is that we have an attacker which is more or less constrained, depending on the application, and therefore we have some hope of solving the problem and winning this kind of arms race.

The constraints that we typically put on how the attacker can manipulate data are the following. If he has access to the training data, normally he can only modify a small fraction of it. This assumption comes from many examples: if you have a system doing network intrusion detection, so you are monitoring network packets, maybe the attacker has compromised one machine or two, but he is not able to change the whole network traffic; he may affect 1% or 2% of the whole traffic. So when the attacker tries to poison a model, you always assume that the fraction of samples he controls is relatively small. Whereas in the evasion case, at test time, what you have is constraints on how you can
manipulate the data. If you have your email, for example, you can put a bound on the number of words that the attacker is allowed to modify; for example, he can modify 10 words, or 20. Then you can run the analysis by varying this kind of constraint on the attacker, and there are also application-specific constraints. Normally, if this is the feature space, these constraints are put in mathematical form by considering a feasible domain around the point of interest. Let's say x is my original email: if I can manipulate at most 10 words in this email, then I have this kind of box constraint, or sorry, L1 constraint in this case; but you can have different constraints that identify a different feasible domain. By feasible domain I mean the set of samples that can be reached with the transformations the attacker can apply to the data without compromising, for example, the readability of the spam message or the intrusive functionality of the malware code. And again, here we have an assumption which is similar to Kerckhoffs' principle: we should not test our system only against attackers with a very restricted capability of manipulating data. It's not very fair if we just test what happens when we change a couple of words in the email; the attacker can manipulate much more, depending on the size of the email and this kind of thing. The interesting thing is that you can perform the analysis by varying these parameters, and we will see that later: you can really understand how the performance of the system varies when you are under attacks of different strengths. If you increase the power of the attacker, you expect the performance to decrease more and more, and based on how it decreases you can say whether a system is more or less secure. We will see clear examples of this later.

Are there any questions so far? This is a very introductory part, and I know the general model can be a bit boring, but now I'm going to show something more interesting. Now we start playing as the attacker. We have a model, we make some assumptions, and then we start devising the attacks. The important thing this model allows us to do is craft optimal attacks, in some sense: we have a clear model of the attacker, we make clear assumptions, and under those assumptions we can devise the best attack the attacker can do. So let's go back to the problem of linear classifiers and spam. Here you have another example: you have a spam message, and these red words are the ones recognized by the filter, known to it; those are the words that the filter learned as bad words during training. The representation of this email in terms of the feature vector is here: you have a one for words that are present in the email and a zero for words that are not. And again the final score is computed by summing up the weights of the words present in the email. We've seen this before, it's just another example: you sum up the contributions of these words, this is a linear classifier, you have a score which is positive, higher than the threshold, and the email is correctly classified. Now, this was a problem we studied around 2007, I think, when I started my PhD. Given the connection between the features and the weight vector, you know that each word has a given weight, so what
would you do to evade the classifier? You can manipulate some bad words, as we've seen before; you can change an "a" to a "4" and then the word is no longer detected in this feature representation. Or you can try to add some good words. That's very easy: you can really do the computation by yourself, and in the end you find a sample which is slightly modified but evades detection. This is very easy for linear classifiers, because you have a clear mapping between the input features and the output of the classifier. Now the question is what happens with a more general classifier. If the function is not linear, for example if you have a neural net, you don't know how to manipulate the words in the email: you don't know whether changing this word will make the score go down or up, and you don't know to what extent. So it's not that easy, not that trivial. And it was unclear to the point that there were papers in 2013 arguing that nonlinear classifiers were more secure than linear ones, just because the authors were not able to break them well enough. One of these papers was published at one of the top-tier security conferences: NDSS, which is one of the top four in computer security, a very selective conference. What they wrote, quoted from their paper, is that "the most aggressive evasion strategy we could conceive was successful only for a tiny fraction of malicious examples" against a nonlinear classifier. This was a study on PDF malware detection: you can have computer viruses embedded within PDF files, and they were analyzing each PDF file to detect whether there is a virus inside. That was the scope of the study. They proposed a method that can detect malicious scripts or malicious elements in each PDF, and they tried an evasion strategy against linear classifiers, which was quite effective; then they tested it against nonlinear classifiers, and the result was that the nonlinear system was much more resistant to this attack. Indeed they concluded that the robustness of this classifier, the nonlinear SVM, must be rooted in its nonlinear transformation, which is like saying it is much more robust just because the adversary doesn't know how it works. So it's again security by obscurity.

Now, I know these authors because we had worked together on another paper in the meantime, so I contacted them and said that I did not completely agree with this sentence, because I thought we could formalize the problem in a different way and create an attack also against a nonlinear algorithm. The point was trying to do the same thing we did for linear classifiers. We have a score: given a sample, the classifier gives us an output, and if the output is positive, the sample is malicious. So what an attacker ideally wants to do is decrease the score as much as possible, so that the decision goes from positive to negative and the sample is misclassified into the other class. I start from a malicious PDF, I manipulate it in some way, and I want to decrease the score. You can write it like this: if g is the classification function and x is the input sample, I assume that I can make some manipulations on x, which give me x', and then I just want to minimize this score. So if the classifier says plus 6, I will try to reduce this value until I cross
the decision boundary and it becomes negative. That's the purpose of this problem. We started by looking at the white-box case: let's say the attacker knows everything, and let's see what the best thing he can do is, the worst case for the classifier. And of course there are constraints, which we've seen before; they can be encoded in terms of distances in the feature space. This also holds for the PDF problem, because essentially they were looking at the keywords present in the PDF file: if you open a PDF file with a text editor you have a set of objects, each object is characterized by a keyword and then its content, and they were just looking at the set of keywords present in the file, so in the end it was like classifying emails. If you measure the distance between two files, it tells you how many keywords, how many objects, differ between the two files; therefore you can really enforce these constraints. You measure the L1 distance between the feature vectors and you get the number of objects that are different in the files. So we formalized this problem, and once it is formalized it's clear that you can solve it with known techniques. The real idea was to formulate the attack as an optimization problem, instead of proceeding with heuristics. This is, in general, a nonlinear constrained optimization problem, because the classifier is nonlinear and it is constrained by the manipulations that the attacker can do on the data, and you can solve it using projected gradient descent. If you run a gradient descent algorithm, you manipulate the input x into x' by following the steepest-descent direction, and what you get is something like this. You look for the sample in the feasible domain that maximally decreases the classification output; that's what we call a maximum-confidence attack, because it maximizes the confidence of the opposite class, the class where I want the error to happen. This can be done whenever the classifier is differentiable, which is the case for neural nets and for every algorithm that you can train with gradient descent: the attack essentially uses the same gradient machinery, plus a small extra piece that goes from the parameter space to the input space; you just chain the gradient used for training with the gradient with respect to the input x, it's just the chain rule. And then you can do that for SVMs and neural nets, for example. At that time TensorFlow and PyTorch were not very popular, so we had to compute the gradients by hand; if you use these frameworks nowadays, they compute the gradient by automatic differentiation, so you don't even need to derive the closed form of the gradients. I'll tell you more about this, but do you know what automatic differentiation is? More or less?
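Here is a minimal sketch of this projected gradient descent evasion attack using automatic differentiation; the network, the box constraint, the step size and the number of iterations are all toy choices, not the settings used in the work described here.

```python
# Sketch of a projected-gradient evasion attack: minimise the classifier's
# "malicious" score g(x) over a feasible box around the original sample x0,
# using autodiff to get d g / d x. Model, constraint and hyperparameters are toy.
import torch

torch.manual_seed(0)

# Stand-in differentiable classifier g(x): positive output means "malicious".
model = torch.nn.Sequential(
    torch.nn.Linear(10, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1)
)

x0 = torch.rand(10)                       # original (malicious) sample
x = x0.clone().requires_grad_(True)
eps, step, n_iter = 0.3, 0.05, 50         # feasible domain: ||x - x0||_inf <= eps

for _ in range(n_iter):
    score = model(x).squeeze()            # g(x)
    score.backward()                      # d g / d x via automatic differentiation
    with torch.no_grad():
        x -= step * x.grad.sign()         # descend to reduce the malicious score
        # project back onto the feasible box around x0 and the valid range [0, 1]
        x.copy_(torch.clamp(torch.min(torch.max(x, x0 - eps), x0 + eps), 0.0, 1.0))
    x.grad.zero_()

print("score before:", model(x0).item(), "after:", model(x.detach()).item())
```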
So this is how the gradients look, and this is an example on the digits dataset that was also mentioned this morning: it's just a collection of digits from 0 to 9, and you want to recognize their class. In this case it's a simplified problem: we just took examples from class 3 and class 7, and we have a binary classifier that says whether this is a 3 or a 7. The goal of the attack is to have this 3, the original image, misclassified, and this is what happens after you run this gradient-based algorithm to reduce the classifier's score by acting on the input pixels. As you can see, it is just changing a couple of pixels, but this is enough to fool the classification. So instead of obtaining something that looks very similar to a 7, you end up with a sample which is very similar to the original one, where only some minor characteristics have been changed. I think this was quite surprising, because what you normally expect is that the 3 gradually becomes a 7; that's what you would expect if the algorithm learned something meaningful. Instead, these algorithms just learn correlations among pixels, and that's all you need to change in order to fool them.

At this point we even went a step further, because one may say: this is very interesting, but you only showed that you can break the algorithm using perfect knowledge of it; you are in a white-box setting, the attacker knows everything, so it's easy to break the algorithm under this condition. So we said: let's relax this assumption and reduce the amount of knowledge that the attacker has about the algorithm. Let's assume, for example, that the attacker has access to some data which is not the training data, but that he can collect data from the same distribution. In the case of 3s against 7s, it means he can have other examples of 3s and 7s, other digits from those classes, but not exactly the ones used for training the target algorithm. What happens is that if you have collected similar data, in some cases you can even send this data to the true classifier to get the labels back. For example, let's say I want to fool your anti-spam filter, let's say I want to fool Gmail: what the attacker can do is send an email to himself, and if the email is received it is labeled as legitimate, if not it is labeled as spam. So you can observe the feedback and use it to label the data that you collected. At this point you are given a learning problem, you have data X and labels Y, and you can train your own algorithm; in this way you can train a copy, a classifier f' which is very similar to f. At this point the attacker can run the same attack against f', because he knows all the parameters of this classifier, and the points that he gets he can send to the original, target classifier. If you do that, it turns out that in most cases the attack transfers to the target algorithm. Not always, but if you manage to learn a similar function, then you can evade it. We tested this again on the digits and on the PDF data, but here I'm just reporting some more recent results on another task. Here the task is detecting malware for Android mobile phones: you know that when you install applications they can contain viruses, so the task is to classify APK files, which are applications for Android, as either benign or malicious,
For doing that, there is a very popular system called Drebin that uses these features, again with a binary representation. You look, for example, at which hardware is used by the application — is it requesting access to the camera, or to the GPS, these kinds of things — you look at the permissions, at which other application components are called, at which system calls are made. You collect all these features and build a very large feature vector, which is very sparse; I think it was hundreds of thousands of features, but very sparse. In the example here, if the application has permission to read text messages you have a one here, if it calls this function you have a one there, and so on and so forth. On top of this you can learn, again, a linear classifier that works very, very well at separating benign applications from malware. What we did was, again, to test our attack against this system. The system is very accurate — more than 95% detection rate on the static data, the data which is not manipulated, which is what you see here at zero; PK is the white-box attack and LK is the black-box attack, so perfect knowledge and limited knowledge. The number of modified features on the axis is actually the number of objects that we add to each malware application: we take a piece of malware and add instructions to it, instructions that do not corrupt the way the malware works. What's interesting is that when you have full knowledge of the classifier, adding from 5 to 15 objects to the file is enough to completely defeat detection: it no longer detects any malware if you manipulate the sample in this way. If you have limited knowledge, as I said, you can build your substitute model, your surrogate classifier, and design the attack against it. In that case it's slightly more difficult to evade detection completely, but as you can see, if you inject from 15 to 50 objects, detection is completely defeated again. And for many non-linear classifiers it's the same — I also have results for a huge number of classifiers showing the same behavior — so the take-home message is that both linear and non-linear classifiers are highly vulnerable to this threat, to evasion attacks. The other takeaway is that the performance of the classifier under attack should always be evaluated as a function of the attack power, in some sense: you have to test how your system behaves when you make stronger or weaker assumptions on the knowledge of the attacker — white box or black box — and you should also evaluate what happens when the attacker can manipulate more and more features of your data. That gives you a reasonably comprehensive view of the classifier's performance in adversarial settings. Since we still have time, another question is: why is machine learning so vulnerable? We know that the i.i.d. assumption is violated, we have adversarial noise, so the problem is much harder. But why is this happening in this specific case of Android — why, if I just inject five objects into my malicious application, is it misrecognized as benign? For that we did another short work, where we found that different learning algorithms in these settings, at least when the data set is very sparse, tend to, in some sense, overfit.
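As a rough illustration of the perfect-knowledge attack on such a sparse linear detector, here is a minimal sketch, assuming Drebin-style binary features, a linear model with weights w and bias b (score > 0 meaning malicious), and the constraint that the attacker can only add objects (flip features from 0 to 1), never remove them; the budget and names are illustrative.

```python
import numpy as np

def add_features_attack(x, w, b, budget=15):
    """Greedily inject the absent features whose (negative) weights push the
    linear score most strongly towards the benign class."""
    x_adv = x.copy()
    added = 0
    for j in np.argsort(w):                 # most negative weights first
        if added >= budget or x_adv @ w + b <= 0:
            break                           # budget exhausted or already evaded
        if x_adv[j] == 0 and w[j] < 0:
            x_adv[j] = 1                    # add the corresponding object
            added += 1
    return x_adv
```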
They overemphasize some very specific features: the decision ends up relying just on the presence or absence of very few specific elements. We have a lot of features, but the classifier effectively uses only a very small subset of them, and when the attacker is able to identify which ones, he can manipulate just a few features to get through. So that's the first point: the algorithm is vulnerable because it tends to overemphasize very few features. This is a compact representation that we elaborated which summarizes, more or less, this view. I'm not going to explain it in detail, but basically you have, for each category of malware and benign files, the most relevant features that contribute to the decision — a sort of explainability model for this kind of classifier. As you can see, we extract some tens of features that explain the decision, and the plot is very sparse, because only a few of them make a really significant contribution to identifying a given class of samples. So this more or less depicts in a picture what is stated there in words. To rephrase it: the function that is learned to perform the classification task is very sensitive to input changes. It's a problem of sensitivity of the function — you have a function which is very accurate, but as soon as you perturb the sample a little bit, its value drops significantly. Should we then just regularize the weights very strongly? Yes, if you regularize the weights you get some improvement, but we will see later, when we talk about countermeasures, that the best thing to do is to match the regularizer to the kind of noise that you have in the data. There is a very nice connection, at least for linear classifiers, showing that the regularizer you use corresponds to some model of worst-case noise on the data — that's a very nice part, I think, and in fact we used it to make this system much more robust than it is now. But in the end it's a problem of sensitivity in general. Of course there are several facets, but that's the main issue I see: as soon as you create a sample which is, let's say, outside the distribution where the classifier performs very well, you get this sudden drop of the score — you leave the data manifold and the classification score becomes essentially arbitrary. There was another question? OK. Another very interesting thing that we found is that if you build this sort of representation for different classifiers, you more or less find the same thing. Even if you use classifiers that optimize different surrogate losses — you can optimize the cross-entropy, and there are many surrogate functions you can use in place of the 0-1 error; do you know what I am talking about? You cannot optimize the 0-1 error directly during training, because it's not convex, and gradient descent on it is not going to take you anywhere, so you use different surrogate losses instead — even with different objectives, the classifiers converge to essentially the same solution, which is what is summarized here. This also explains why the attacks transfer among different classifiers: they transfer because you end up with more or less the same classifier, so the probability that an example you craft works against one classifier is more or less the same as against the others.
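Purely as an illustration of this "few relevant features" view, here is a minimal sketch that ranks the features of a linear classifier by the magnitude of their contribution to the score; this is an assumption about how such a plot could be approximated, not the exact procedure used in the work described.

```python
import numpy as np

def top_contributions(x, w, k=10):
    """Return the k features whose contributions w_j * x_j dominate the score."""
    contrib = w * x                         # per-feature contribution to w @ x
    idx = np.argsort(-np.abs(contrib))[:k]  # largest magnitudes first
    return [(int(j), float(contrib[j])) for j in idx]
```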
They transfer just because the classifiers are learning the same things — that's the main point, and I think it was a very interesting discovery, which we are also trying to build upon in some more recent work. Now, we have time up to a quarter past four, so we can also discuss what happened around 2013, when this vulnerability was also shown for deep learning algorithms. I'm sure most of you know this paper, but that's the starting point that made this field explode, literally. The paper is called Intriguing Properties of Neural Networks, by Christian Szegedy, Ian Goodfellow and other people from Google Brain. What they wanted to study was, in some sense, how to interpret the decisions of neural nets. What they were trying to do was take the image of a school bus, classified by a deep learning model, and perturb it to see when the network changes its decision — to classify it as an ostrich, for example. What they expected to find was an image of the school bus that gradually becomes an ostrich: it grows some feathers, changes its shape, these kinds of things. Instead, what they observed — and at the beginning they thought it was a bug — was that adding a very small perturbation is enough to flip the decision of the classifier. That was very surprising, because it means that the network does not really learn the semantics of the image; it just learns, again, correlations among pixels, more or less. They essentially aim to find the minimum perturbation that allows you to flip the decision, which is slightly different from the formulation we used. This is what they did: x is the original image and r is the noise, and you want a misclassification, so l is the target label and you want the classifier to output the wrong label l when you add the noise; the perturbed image is bounded to the valid input domain, and they minimize the norm of the noise, looking for the minimum amount of perturbation that flips the decision on the given image. That's the problem, and what they discovered is that applying these imperceptible perturbations to the images was enough to fool the classifiers. The paper had great success precisely because of this point: you can fool the deep net while, for a human, the image remains more or less the same. It also fed a set of misconceptions about the security of machine learning, though, because security has nothing to do with whether a human can still recognize the original sample or not — that's not the point. If you have a self-driving car and at some point it encounters a strange image and recognizes it in the wrong way, it doesn't matter whether the image is imperceptibly modified to the human eye: it's wrongly classified, and that's enough to fool the car, which is not supervised while it operates. But we will discuss this more tomorrow. So that was the paper, and of course after this paper a lot of people started doing research on these topics, so you had, again, an arms race between people proposing defenses to mitigate the problem and other researchers devising attacks to show that those defenses were ineffective, and so on and so forth.
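As a sketch of the minimum-perturbation formulation just described (following the description in the talk, with x the original image, r the perturbation, l the target label and f the classifier; the notation in the paper itself may differ slightly):

```latex
\begin{aligned}
\min_{r} \quad & \|r\|_2 \\
\text{s.t.} \quad & f(x + r) = l, \\
                  & x + r \in [0, 1]^m .
\end{aligned}
```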
Just to mention some of the popular attacks that you probably also know: there's one called the fast gradient sign method, there's the Jacobian saliency map attack, there's the Carlini-Wagner attack — there's a whole bunch of attacks, but they are all based on the same idea: look at the gradients of the classification function. The rest is just variants on this theme, changes in how you optimize the point, but the main idea is always to craft a noise which depends on the gradients of the classifier; that's how you trick these models. This is a slide I took from a recent talk by David Evans, because he has this nice estimate of the number of papers on the subject. In 2013 the Intriguing Properties of Neural Networks paper started, in some sense, the hype in this research area, and as you can see, this year there were already about 700 papers on adversarial examples — counted up to May, which is when this prediction was made — and the expected number of papers for 2018 as a whole is more than a thousand. You can imagine: I can tell you I'm quite up to date on the topic, but I cannot keep pace with this; it's just crazy. So there is a huge interest in trying to address this problem, and of course there are also some misconceptions that we are going to discuss tomorrow at the end of the lecture. Maybe another point we can address is, again, why these perturbations are so small — why is it enough to add such small perturbations in the case of images and deep learning? Here too you can do an analysis, and the problem is again one of sensitivity of the function: you take a point and perturb it along the gradient direction, and the function turns out to be too sensitive to such input changes. There is also a mathematical explanation for that: if you measure the sensitivity of a function, the function is more sensitive when the norm of its gradient is larger, and if you have a very large problem with many input dimensions, the norm of the gradient simply grows because you increase the number of dimensions. That's why in problems like image classification from raw pixels the function you learn is very unstable: you have a lot of input dimensions, and in some sense the attacker is not very constrained, because he can manipulate all the pixels, even by a small amount, and that's enough to reach his goal. As was observed before, if you use regularization you can somehow mitigate the problem, because regularization makes the learned function smoother; the classification function becomes smoother, and then you expect that more modifications are needed to evade detection, to mislead the classifier. This is exactly what happens here. This is a simple example where you have a very strongly regularized SVM — C is the parameter which is the inverse of the regularization, so the lower it is, the higher the regularization for the SVM — so this is a very smooth function and this other one is a very non-smooth function. You cannot see it very well, but this is the sample which is misclassified, perturbed with the minimum-distance attack. In this case you can probably see that there is some noise, noise that can be visually perceived, while in the other case you cannot see any noise — it's essentially imperceptible. It means that with higher regularization the attacker has to make more manipulations to the data to evade detection.
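A minimal sketch of the fast gradient sign method mentioned above, written for a generic PyTorch classifier; `model`, `loss_fn`, the input tensor `x`, the labels `y` and the perturbation budget `eps` are illustrative assumptions, not tied to any specific experiment in this lecture.

```python
import torch

def fgsm(model, loss_fn, x, y, eps=0.03):
    """Perturb x by eps along the sign of the loss gradient (untargeted FGSM)."""
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()          # single gradient-sign step
    return x_adv.clamp(0.0, 1.0).detach()    # keep pixel values in a valid range
```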
In turn, more regularization means a higher security level: the attacker has to do more manipulation on the point. But, as I told you, we will see how to match the regularization term to the noise, and that's going to be very interesting. OK — I think in ten minutes we can also cover this part, if you're not too tired. So, we did a case study of these evasion attacks on robot vision. The purpose is this: so far we only dealt with two classes — classifiers that distinguish between legitimate and malicious inputs, or a couple of digits, no more than two classes. I will use this case study to show how you can generalize the attacker's goal when you have more classes, because the attacker can do different things in that setting. We ran some experiments on the vision system used by this robot, iCub, a robot developed at the Italian Institute of Technology. Basically it's able to recognize some objects, which I will show you in a minute, and it does that by extracting features from images: it acquires the image of the object that you put in front of the robot and then it uses AlexNet to extract features from it — you take the values in the penultimate layer of this network and use them as your representation space, your feature-vector representation of the image. On top of that, they trained a multi-class classifier combining different linear classifiers. They decided to use linear classifiers because all of this has to be embedded within the robot and you need to be able to update the model efficiently: as soon as you present new objects to the robot, you want to be able to retrain the classifiers to recognize this new category of objects as well. You can do that — incremental learning, in a way — with linear classifiers; with non-linear ones it's more complicated, but for linear ones it's easy, so they used linear models in the last stage. So this is how it looks: you have a representation in the deep feature space and then you classify new samples into the given classes. This robot is able to recognize 28 classes of objects: there are 7 main classes, displayed here as columns — laundry detergents, plates, and so on and so forth — and for each main class there are 4 subclasses, 4 different kinds of objects belonging to the same class, so in total 28 categories for the classifier. The robot even has to distinguish this blue detergent from the red one. That's the data set. Now, given that we have more classes, say I have a detergent and I want it to be misclassified as something else: there are different scenarios here. As an attacker I may just be interested in having the detergent misclassified as any other object — that's the error-generic attack — but I may also want the detergent to be misclassified as a specific class: if I say my detergent has to be misclassified as a cup, I can enforce that during the attack. That's called the error-specific attack. So you have these two different settings when you deal with multi-class problems, simply because the definition of error is no longer unique — you can make an error towards different classes — and you slightly adjust the optimization problem that we had for the two-class case to account for having more than one class.
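Before getting to the attack objectives, here is a minimal sketch of the recognition pipeline just described: a pretrained AlexNet used as a fixed feature extractor (penultimate-layer activations) with a simple linear multi-class classifier on top. This is only an approximation under stated assumptions (torchvision's AlexNet, scikit-learn's LogisticRegression, illustrative data shapes), not the exact iCub implementation.

```python
import torch
import torchvision
from sklearn.linear_model import LogisticRegression

alexnet = torchvision.models.alexnet(pretrained=True).eval()

def deep_features(images):
    """images: float tensor (N, 3, 224, 224), normalized as AlexNet expects."""
    with torch.no_grad():
        x = alexnet.features(images)
        x = alexnet.avgpool(x).flatten(1)
        x = alexnet.classifier[:-1](x)   # stop before the final fc layer
    return x.numpy()

# Linear multi-class classifier on top of the deep features; being linear,
# it can be retrained cheaply when new object classes are shown to the robot.
# X_train: (N, 3, 224, 224) images, y_train: labels in {0, ..., 27}
# clf = LogisticRegression(max_iter=1000).fit(deep_features(X_train), y_train)
```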
In the original two-class case, we aimed to minimize just the value of the correct prediction: I had some malware sample and I just wanted to decrease the value of that prediction, the prediction of the malware class. Now you minimize the value of the correct prediction and, at the same time, increase the value of the competing class, the support for the competing class. In the error-generic case you would like a blue sample to be misclassified as any other class, and what happens is that it gets misclassified as the closest class in feature space, simply because that's the easiest thing for the attacker to achieve. So, when you do not specify a target, the blue sample is misclassified as the closest class in feature space. If instead you want to specify the target class, you flip the roles of these classes: f_k is now the target class and I want to maximize its value while reducing the value of the competing class — in this case the competing class would be the blue one — so you reduce the support for the blue class while increasing the support for the green one, and the point moves from the blue class to the green class. That's the targeted, error-specific case. It's the same formulation: you just change the definition of omega and maximize instead of minimizing, but the content is the same. The approach is more or less the same as well: you can solve it using gradients — you take the gradient at the output layer and back-propagate it down to the input layer. This can be done with standard back-propagation in neural nets; it's like training, the same thing, and you can use automatic differentiation to propagate the gradient back and obtain the gradient that you need in the input space. This is an example: that's a detergent misclassified as a cup. You can see some green pixels that were manipulated to cause the misclassification — you can spot them because they are just saturating the green channel — and this is the distance from the original image. That's just an example. What you can also do with this formulation is the following: in the previous case we were manipulating all the pixels, so pixels belonging to the foreground object and also to the background, in the digital domain. Of course you cannot do that in practice, because in practice you may only be able to manipulate the pixels that belong to the foreground object. You can enforce that with a simple box constraint: you lock the values of all the pixels outside a chosen region and allow the attack to optimize only the pixels inside it — in this case this small rectangle here. You can imagine printing that as a sticker and putting it on the detergent to have it misclassified, so this is what we call the sticker attack. Yes, please? Can you say it louder, please? No, no, it's arbitrary: we just put the sticker here because we want to replace the original label, but you can put it anywhere, and we will see more examples of this kind of threat. OK, I think I'm going to stop here. We will continue tomorrow with defenses against these attacks, and then I will also show you different kinds of attacks. So that's all for today — if you have questions, please.
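For completeness, here is a minimal sketch of the box-constrained ("sticker") attack described above: gradient steps on the image are applied only inside a fixed rectangular mask, while all other pixels stay locked to their original values. `model` is assumed to be a differentiable PyTorch classifier; the mask, step size and iteration count are illustrative.

```python
import torch

def sticker_attack(model, x, target, mask, step=0.01, n_iter=200):
    """x: (1, 3, H, W) image; mask: (1, 1, H, W), 1 inside the sticker region.
    Increases the score of class `target` by optimizing only the masked pixels."""
    x_adv = x.clone().detach().requires_grad_(True)
    for _ in range(n_iter):
        score = model(x_adv)[0, target]
        grad, = torch.autograd.grad(score, x_adv)
        with torch.no_grad():
            x_adv += step * grad * mask      # ascend only inside the sticker region
            x_adv.clamp_(0.0, 1.0)           # keep valid pixel values
    return x_adv.detach()
```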