So, today we're going to be talking about machine learning. Everyone has heard of machine learning, right? Specifically, we're going to be talking about how to bypass deep learning systems. Machine duping is what I'm choosing to call it. It's a really corny name, so if anyone has better ideas, come find me after the talk. So, everyone has heard of machine learning; my grandma knows what machine learning is. At least if you've been reading the news, you know that the best Go player in the world is now a deep learning system. That's kind of weird; no one saw that coming five years ago. But let's not have a slide on what machine learning is and what it can do for us. That's a presentation from two years ago. Let's instead look at what we're trying to achieve in this talk.

So, everyone wants to be the flying carpet beagle in the middle of the Venn diagram there. Most of us in this room have some sort of hacking skills, people hacking or computer hacking. We do stuff; we're implementers. So I'd say we're in that category over there, hackers. Security researchers have some kind of theoretical background; if they're working on crypto and stuff, then they have some math and stats skills. I don't necessarily consider myself that. Data scientists don't have any hacking skills, but they do a lot of stuff in companies trying to maximize conversion, and they have math and stats skills. What we're trying to do today is convince all of you that you really want to be in the center of that Venn diagram, because it's going to be increasingly important in the next few years to brush up on math and stats skills and to know what's going on in the machine learning space, especially for security folks.

So, whether we know it or not, we're interacting with machine learning and deep learning systems on a day-to-day basis. Whether you like it or not, you have no choice. Google Now, Apple Siri, Amazon Alexa: these are all things that have been covered by the press, very high-profile things. Some of the more common use cases of deep learning are in object recognition, for instance in self-driving cars. If you know George Hotz and his cool Comma.ai startup, they're using deep learning to recognize objects on the road and build a really cheap self-driving car system. In the top right-hand corner you see AlphaGo beating the world champion at Go. And you also have other pretty interesting use cases, like medical research, where deep learning is used to predict the effects of newly developed drugs on patients; I have an example from Nvidia there. In the security space there's also lots of stuff. If you've ever set foot in the RSA or Black Hat expo halls, you know what I'm talking about: everyone says they're using machine learning and deep learning to do stuff. The extent to which that's true, I cannot vouch for.

So why would someone choose to use deep learning? Again, forgive me if you're an expert at deep learning. This is a DEF CON 101 track, so I'm going to spend a little bit of time going through some basics of deep learning and machine learning, and then I'll go into the interesting stuff, which is my research. Why would someone choose deep learning over more traditional machine learning methods like SVMs, linear classifiers, or clustering algorithms?
The first thing is that when you use a deep learning algorithm, you get some things for free that you would otherwise have to spend a lot of time doing. The most important one everyone will point out is that you get feature engineering for free. Again, the extent to which that's true depends on your use case, and you have to try it out and implement it yourself. Deep learning helps select the best features, so you don't necessarily have to spend a lot of time doing feature engineering. If you talk to any data scientist or machine learning engineer working at a large company, they'll tell you that most of their time is not actually spent on algorithms. They're not trying to increase the efficacy of some cutting-edge algorithm used in their company's recommender system. What they're actually doing is feature engineering and data cleaning. They're like janitors. I'm a janitor too: I spend most of my time cleaning data and doing feature engineering. Deep learning gives you that for free, and I think that's why it's so appealing.

The other thing, which I think is the main difference between deep learning and other kinds of machine learning algorithms, is that it touts the promise of one infrastructure for multiple problems. If you think about it, deep learning really is just one infrastructure: multiple layers of linear units, and each of these units interacts with the others through different activation functions to learn the different things you want to learn. Compare that with other machine learning algorithms, like clustering, SVMs, logistic regression, and decision trees: all of these require vastly different code bases. Whereas for deep learning, the differences in infrastructure are parameterized into the number of layers, the number of units in each layer, the functions between layers, and so on. It's one infrastructure for multiple problems, and I think that's what gives it its flexibility.

The last two points are perhaps more relevant today, where there's so much data to deal with. Deep learning allows you to do hierarchical learning: you can split up the task of learning across different layers, and we'll see an example of that later. For example, shallow layers learn vastly different things from the deeper layers, and you can extract the outputs of each intermediate layer to find exactly the thing you're looking for. The last thing is that it's efficient, and it's easily distributed and parallelized. For algorithms like clustering, there's no straightforward way to distribute the work across systems, and when you're dealing with terabytes or petabytes of data, this is a problem. Of course, there are message-passing algorithms that help clustering with that, but they're a lot more complex. Deep learning lets you split up the problem, the infrastructure, and the computation.

Of course, it's definitely not one-size-fits-all. Deep learning is not something you want to use for any random problem. If you're trying to predict, say, the price of oranges over time, a problem space of two dimensions, you wouldn't use a deep learning infrastructure for that.
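To make the "one infrastructure, many problems" point concrete, here's a minimal sketch (not from the talk; plain NumPy, with made-up layer sizes) where the entire architecture is just a list of layer widths, so the same code serves a digit classifier or any other shape:

```python
import numpy as np

def init_network(layer_sizes, seed=0):
    """One weight matrix and bias vector per pair of adjacent layers."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def feed_forward(params, x):
    """ReLU activations on hidden layers; raw logits at the output layer."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x

# The same infrastructure, reconfigured by parameters alone:
mnist_net = init_network([784, 128, 64, 10])   # a digit-classifier shape
tiny_net  = init_network([4, 5, 3])            # the 4-5-3 toy net from the slides
print(feed_forward(tiny_net, np.ones(4)))      # three raw logits
```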
What you would use deep learning for is problem spaces with many dimensions, hundreds or thousands of dimensions, and usually these come from nature. Think of images, audio, video, and prediction problems in a very complex problem space. So let's spend a couple of minutes going through the two steps involved in training a deep learning architecture.

This is a diagram that a lot of you may have seen before. It's a 4-5-3 neural net architecture, a very simplified version, so ignore all the nasty white lines between those units. Each circle represents a linear unit with an activation function in it; the activation function just means that if you input a certain value into this circle, it outputs a real-numbered value according to that function. Each connection between units in adjacent layers is weighted by a weight W, and there's also a bias unit. What's the purpose of the bias unit? It simply skews the output by a certain amount. Think back to linear algebra: in y = ax + b, this bias unit is equivalent to b, which controls the y-intercept, where the line crosses the y-axis. All of this is just theory; don't worry about it, you can take any one of the 10,000 MOOCs out there to learn about deep learning. But let's look at a simple example of how training works so we can get to how to bypass these systems.

The first step is the feed-forward step. Each unit receives the output of the neurons in the previous layer; in the case of the first layer, it just receives the input. The bias is added, the output of the first unit is weighted by W1, and it goes on like that. Eventually, the output layer produces logits, which are just an array of numbers. For classifiers, this is usually fed through a softmax function, which scales it into a probability distribution. In this particular case, the dummy vector that's output is 0.34, 0.57, 0.09, which just means that if you have three classes, zero, one, and two, this classifier predicted class one. However, in our case that's wrong. According to the labels (this is training, so we know the labels), the ideal case would be predicting with probability one that the output is actually class zero, so the error is 0.57. So you feed it backwards, and that back propagation is really the crux of deep learning.

Back propagation is, well, not the most straightforward algorithm you'll come across in machine learning. When you take a MOOC like the Stanford machine learning one by Andrew Ng, he'll also tell you that back propagation is a pretty hard thing to grasp. The easiest way to think about it is that back propagation is just assigning blame. Say you're the head of a board of directors, and you have a bunch of people on the board giving you suggestions and advising you, and some of them just talk bullshit all the time. You want to listen to them less. That's what this is doing: you feed input data through the deep learning network and try to figure out what gives you wrong answers.
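As a rough sketch of the feed-forward output step just described (illustrative NumPy, not the talk's code; the logits are made up so the result reproduces the 0.34/0.57/0.09 example):

```python
import numpy as np

def softmax(logits):
    """Scale raw output-layer numbers into a probability distribution."""
    e = np.exp(logits - logits.max())  # subtract the max for numerical stability
    return e / e.sum()

logits = np.array([1.33, 1.85, 0.00])      # made-up logits for three classes
probs = softmax(logits)
print(probs.round(2))                      # [0.34, 0.57, 0.09]
print("predicted class:", probs.argmax())  # class 1

# Training knows the true label; here the label is class 0, so the
# prediction is wrong and the error gets propagated backwards.
true_label = 0
```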
And you do that by sending in input data; since this is training, you know the right answer, so you can see when the network's answer is wrong. Then you trace the path that answer took through the network and find out exactly which units are responsible, and how much they're responsible, based on their weights. If you find that a particular path of units is particularly bad at giving you predictions for the thing you're trying to learn, you decrease their weights and dampen them out. All of this is optimized with algorithms like stochastic gradient descent, which is basically trying to find a local minimum in the problem space.

There are going to be lots of demos in this talk, so if you've been ignoring me for the last 10 minutes, you can look up now. Okay, is this big enough? Let's make it a little bigger. What we're going to look at here is an example of a deep learning system that's really accessible to everyone: TensorFlow, Google's open-source deep learning and machine intelligence framework. I think it's probably the easiest deep learning framework to use. We're going to walk through a small example of using deep learning to solve the easiest task, the one used in most tutorials and examples, and then we'll look at how to bypass it.

So, TensorFlow. What we're doing is using the MNIST database, which was created about 20 years ago, I think, by Yann LeCun, who is now the director of Facebook AI Research. And we'll have some visualizations done with a pretty cool visualization tool. Let's look at the code a little bit. It's basically taking in the training and testing data and labels, then creating a validation set; all of that is just sugar. And then here is the actual definition of the model. The layers are defined line by line: you have the first convolution layer, then a pooling layer, then conv2, pool2, and so on from top to bottom, going from shallow layers to deeper layers as you go down. As you can see, the logits and the softmax function at the end produce the output of this neural net.

So this is a demo of MNIST classification. MNIST is just handwritten digits; it looks something like this. These may not look like digits you or I would write. I would never write a seven with a weird thing at the end, but what can you do? You can always train a model on different handwriting data sets, but this is a standard, and it's used for comparison between researchers in academia. So let's see: this is a real-time classifier for digits. If I write a two here, let's see if it predicts correctly. Oops. Okay, two, even though there's a weird thing. So it's pretty good. Let's do a seven. Oh, seven, that's great; only 0.7 accuracy, I mean, confidence. Let's try something a little more challenging for the model, like a six. Okay, so this is a wrong classification: it thinks it's a five with 0.886 confidence. Let's see if we can make it better. Nope, never mind. So you can see this is not the cutting edge in handwriting recognition; I think this particular implementation correctly classifies about 90% of digits.
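For reference, the layer-by-layer model definition being walked through looks roughly like this; a hedged sketch against the TensorFlow 1.x-style API of the era, not the demo's exact code, with assumed shapes and variable names:

```python
import tensorflow as tf  # TensorFlow 1.x-era API

def weight(shape):
    return tf.Variable(tf.truncated_normal(shape, stddev=0.1))

def model(images):  # images: [batch, 28, 28, 1] MNIST digits
    # conv1 + pool1: shallow layer, fine local features
    conv1 = tf.nn.relu(tf.nn.conv2d(images, weight([5, 5, 1, 32]),
                                    strides=[1, 1, 1, 1], padding='SAME'))
    pool1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1],
                           strides=[1, 2, 2, 1], padding='SAME')
    # conv2 + pool2: deeper layer, more zoomed-out features
    conv2 = tf.nn.relu(tf.nn.conv2d(pool1, weight([5, 5, 32, 64]),
                                    strides=[1, 1, 1, 1], padding='SAME'))
    pool2 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1],
                           strides=[1, 2, 2, 1], padding='SAME')
    # flatten, then a fully connected layer down to the 10 digit classes
    flat = tf.reshape(pool2, [-1, 7 * 7 * 64])
    logits = tf.matmul(flat, weight([7 * 7 * 64, 10]))
    return logits, tf.nn.softmax(logits)  # softmax turns logits into probabilities
```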
So that's nowhere close to what a human can do, right? I mean, if you or I only recognized nine out of ten digits, we should see an eye doctor. But anyway, let's continue. What that was using is convolutions; this is what you call a convolutional neural network. These are just different flavors of neural networks, different algorithms that researchers publish papers on to get tenure and stuff. But this is a really cool algorithm, developed about 20 years ago, I think, and what it does is use convolutions to gather insights at different levels of detail in an image, or in anything that has adjacency relationships.

So what are convolutions? If you remember your Fourier transform days from school, or not, convolutions are just filters. If you apply a filter, a convolution, to a matrix, say a 2D matrix in this case, you do a matrix multiplication and end up with a single value for the convolution applied at that particular position. We'll see a sketch of this in a moment.

What this allows you to do is layered learning. This is an example for facial recognition, where the shallow layers are actually at the bottom here, and you go deeper into the network as you go up. In layer one you're learning very, very fine features, like the shape of your eyebrows, the shape of your eyelashes, maybe the wrinkles in your face. As you go further up, because of the convolutions and pooling, you get more zoomed-out features: the shape of your eyes, the color of your eyes, whether you have a mole somewhere. And at the higher layers you get the shape of your face, more general characteristics of you, like whether you have a mustache or not.

This is interesting because you can extract the results of intermediate layers to do certain things. I was at a talk by the Facebook security team once, and they said that to find spammy images, because when spammers try to spam their network with images, they often tweak the images a little or change the language or text in ways that defeat a pixel-by-pixel comparison. There are ways to attack this with shingling or some kind of fuzzy matching, but the most efficient and effective way they found was to pass these images through a neural network, take the second layer's output, and compare those outputs. That let them reliably find spammy images, group them together, and then have a human come in and judge whether an image is actually spammy or not.

This is just a diagram of the convolutional neural network that was used to classify the digits. If you look at the small zoomed-in squares: one part of the digit is fed into feature maps, then you do sub-sampling, then more convolutions, more sub-sampling, and then a prediction on that. Besides convolutional neural networks, there are also things called recurrent neural networks. These are slightly more complicated, but not actually that complicated. What you have is recursion in the neural network: instead of feeding input one way through the net, the output of each time step, each intermediate output, gets fed into the next time step.
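As mentioned above, a convolution really is just a filter slid over the input. Here's a minimal NumPy sketch (illustrative only, with a made-up 3x3 edge-detection kernel): each position yields a single value, exactly as in a CNN's convolution layer.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a filter over a 2D matrix; at each position, multiply
    elementwise and sum, producing one value of the feature map."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.random.rand(32, 32)        # one CIFAR-sized channel
edge = np.array([[-1, 0, 1],          # made-up vertical-edge filter
                 [-1, 0, 1],
                 [-1, 0, 1]])
print(convolve2d(image, edge).shape)  # (30, 30) feature map
```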
So this introduces the concept of memory. And memory is important because when you learn things, you don't learn one frame at a time; that would be really weird. This lets you learn things like music, or audio and video, anything that has relationships between frames, between inputs. This is an example of a generative recurrent neural network: you can teach a network to spell words. In this case, if the network has already seen the letters Y-O-L, then O is likely to be the next letter, because the network has Y-O-L in its memory buffer.

There's also stuff that's more on the cutting edge; this was actually used to a certain extent in AlphaGo: long short-term memory. When you're looking at things with slightly more context, but you don't want to scale the depth of the network or the depth of recursion indefinitely, you want something like LSTM, long short-term memory networks, because they let the network hold on to longer-term concepts, longer-term data, for a longer time, instead of just a single FIFO queue that you store your memory in, because that's not how we learn. To make good predictions, we need more context. Think of a system that converses with you: when you're talking with somebody and he mentions he's from, say, France five minutes into the conversation, you don't just forget that after five minutes because your memory buffer is full. You have to remember things like that. The beauty of LSTM networks is that they have gated functions in the recurrence that let the network learn what's important to remember for a longer period of time and what's not.

This is just a diagram of how deep learning has helped speech recognition over time. The y-axis is logarithmic, by the way, and the different colored lines represent different data sets. The holy grail is definitely the red line, conversational speech. A very, very sterile kind of speech is read speech or broadcast speech, where every word is carefully enunciated. All of these have seen pretty good improvement over the years: you see word error rate (WER) go down to about 2% for air travel planning kiosk speech, which was a pretty weird data set, and for conversational speech you see it go all the way down to about 20% by 2011. So that's great. By the way, the conversational speech data set, if you ever have a chance of finding it (it's not public), is actually pretty weird: it's from blind dates. When I was listening to it, it was very interesting how some of these dates turned out. There's some disturbing stuff in there.

Okay, now for the fun stuff: how to pwn. Okay, so that's a short video. This is Gunter. He's a dog, a mini schnauzer, and he's my best friend's dog. Let's see how this analogy pans out. So this is Gunter; he loves ice cubes. I'm not sure if all dogs love ice cubes; he thinks they're alive. This is the training phase, where I'm basically teaching him that the clink-clink sound in the bowl means there's an ice cube in there, and he knows he's going to get an ice cube. And then I don't actually throw it.
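As a sketch of the recurrence just described (illustrative NumPy with tiny made-up dimensions, not a trained model): each step's hidden state carries memory of all earlier letters forward, which is what would let a trained net guess O after seeing Y-O-L.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {ch: i for i, ch in enumerate("YOL")}  # toy three-letter vocabulary
H, V = 8, len(vocab)                           # hidden size, vocab size

Wxh = rng.normal(0, 0.1, (V, H))  # input  -> hidden
Whh = rng.normal(0, 0.1, (H, H))  # hidden -> hidden (the recursion)
Why = rng.normal(0, 0.1, (H, V))  # hidden -> output logits

h = np.zeros(H)                   # the "memory buffer"
for ch in "YOL":                  # feed the sequence one letter at a time
    x = np.zeros(V); x[vocab[ch]] = 1.0
    h = np.tanh(x @ Wxh + h @ Whh)  # new state depends on ALL letters so far

logits = h @ Why                  # after training, "O" would score highest here
print(dict(zip("YOL", logits.round(3))))
```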
I throw something else, which is not an ice cube, and he's confused. The model has been bypassed. Gunter is the model now. This is kind of a lame analogy for the stuff I'm doing, forgive me. But what we see here is that the dog doesn't actually know that I'm throwing an ice cube. He just thinks I am, because he doesn't have a great sense of smell or great eyesight, he sees me throwing something, and I've done it a few times before. So he thinks an ice cube is waiting for him at the other end of the yard. That's very similar to what we're going to be doing: we'll be feeding images to classifiers that will mislead them, images crafted in ways that mislead certain classifiers more than others.

But first, let's look at the attack taxonomy, so we can talk about attack vectors and such in security speak. There are two general kinds of attacks you can do on these machine learning or deep learning systems: causative attacks and exploratory attacks. Causative attacks are relevant when you have access to the training step. For example, Google Translate actually does rely on some kind of online reinforcement learning: when you see a really, really bad translation, which happens pretty often, sorry, you can report it as really bad and maybe supply the right answer or something more relevant, and that's how it does reinforcement learning. In Gmail, when you receive spam that's not marked as spam, you can report it, and that actually helps a lot with the reinforcement learning model: it helps train the model to recognize such examples as spam in the future. So you can see how you can influence a model that way. That's a little less interesting; that's what I covered in talks last year.

What's more interesting, I think, are exploratory attacks. In this attack model you have no access to the training phase, and in some scenarios no access to the model at all. What you're doing is a black-box attack: feeding in weirdly crafted samples that look correct to a human, but the machine gives you the wrong result. And that really throws some people off, even machine learning researchers, because the deep learning or machine learning model is learning just by looking at pixels, at, say, the angle between the horizontal line and the slanted line in an A, while we learn it in a more general way. How to represent these things better in machines is still an active research area. The dog keeps coming up. This is still an active research area, and that throws people off.

There are also targeted and indiscriminate attacks: targeted, when you try to move the decision boundary in a certain way to cause an intentional misclassification; indiscriminate, when you just try to decrease the overall integrity of the classifier.

So this is a simple example of misclassification. In the early days, MNIST-style digit recognition was used to read the digits on checks. I know these digits don't look very realistic, but I'm using samples from the MNIST data set, and what I did beforehand was generate some adversarial images and fill in two copies of the check: one with normal images and one with adversarial images. If we go back a bit and look at this, this is the adversarial one and this is the normal one.
They look pretty much identical; just look at the digits portion. So it's 9378, and this is some simple code that uses a pre-trained MNIST model, trained with the standard TensorFlow MNIST example. It takes about four hours to train this model, which, trust me, is really good speed for a CPU in the deep learning world; you'll see something that takes much longer to train later. What we're going to do is just read the check. The code divides the image into pixel matrices and then reads the digits. So it's loading the model, it's predicting the digits: 9378.00. That's correct, and if you look at the check, that's what it says. Now we do the same for the adversarial image. You'd expect this to be the same, but no, you actually expect it to be different, because it's called adversarial. The output is something totally different: 0524.84. It looks the same to us, but it gives a different output using the same model and the same code. That's adversarial machine learning.

This is something a little bit different: the CIFAR-10 dataset. CIFAR is a dataset of images, and you're training the model to recognize images in 10 classes. There's CIFAR-10, which is 10 classes of images, and CIFAR-100, which is 100 classes. The classes are here: things like airplanes, automobiles, birds, cats, deer. The interesting thing is that these are not high-resolution images; you don't take them with your phone. They're 32 by 32 pixels. So this is the actual size, actually maybe bigger. We're looking at two sets of images here, dog and automobile; I chose dog because dogs are cute. If you see the preview window on the right, that's Preview on Mac, so it's not showing 32 by 32; I think macOS does some anti-aliasing so your pictures don't look like shit. What we're doing here is running the evaluator on this, and what this evaluator does, very much like the MNIST classifier, is classify the image: it tells you whether this image is a dog or something else. You see it classifies the f10 image as a ship, and the f1 image as a ship as well. The images look pretty similar, but there are actually differences in them. Let's classify the automobile one just for completeness. Automobile; okay, this is what it looks like, a pretty shitty image. Automobile, correct. And what the different numbers after "adv" mean is the degree to which the images are perturbed: the degree to which we're injecting stuff into the image to make the classifier think it's something it's not. You see it becomes a little bit more grainy, but to a human it still looks like a car. I mean, you wouldn't say that's a cat, or you're weird. Yeah?

So let's look at exactly what differences exist between these images. Let's open a Python shell and import SciPy, read these images in, and look at the differences. Some Python libraries actually don't read or write PNGs exactly pixel for pixel, so if you're trying this at home, use these libraries. This is the standard representation of the PNG: a 32 by 32 pixel matrix with three channels, R, G, and B, and the value of each pixel is between 0 and 255. So we're reading the adversarial image now, and if we print it out, we can see that the numbers are slightly different from before.
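Condensed, this shell session amounts to something like the following sketch (the file names are hypothetical, and I'm assuming scipy's image reader from that era; newer setups would use imageio.imread instead):

```python
import numpy as np
from scipy.misc import imread  # deprecated since; use imageio.imread on newer setups

normal = imread('automobile.png')     # shape (32, 32, 3), uint8 values 0..255
adv    = imread('automobile_f1.png')  # the adversarial counterpart

print(normal.shape, adv.shape)        # same (32, 32, 3) shapes

# Cast to int64 before subtracting: uint8 arithmetic would wrap around
# on negative differences.
diff = adv.astype(np.int64) - normal.astype(np.int64)
print(np.abs(diff).max())             # per-pixel perturbation size, e.g. 1 or 2
```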
Let's print out exactly what's different, starting with the size. You can see that the shapes of the two images are the same: 32 by 32 by 3. Now let's calculate the differences, with the normal image converted to int64 to prevent any overflows. If this is boring you, just zone out for a moment and I'll wake you up. Let's print this out. The differences between these images are between 1 and 2 per pixel. You or I wouldn't be able to detect that, but the classifier learns things differently, and there's a calculated difference between these two images. Let's save this; I was too lazy to redo the demos there. Save the image and actually look at it: this is the difference between the two images. So if you add the normal image and this noise image, you actually get the adversarial one. Yeah. Let's look at the differences between adversarial 1 and adversarial 10; you'd expect f10 to have larger perturbations. Oops, typo. I have to calculate f10 first. Okay, so you see that instead of perturbations of magnitude 1 or 2, you now have larger perturbations. Same for the automobile.

So why can we do this? Basically, it's an open research problem, and not everyone in the research community agrees on why. But mainly it's the concept of blind spots. Machine learning models learn in ways that are vastly different from how humans learn. There's this concept of the data manifold, the mapping between input and output, and there are gaps, pockets, in the data manifold that allow you to do such things.

So how do you generate images like that? How did I generate images like that? The intuition is just three steps. First, run the input through the classifier model. Second, based on the model's prediction, derive a tensor (a vector, a matrix) that maximizes the chance of misclassification. Third, scale that tensor by some magnitude; the result is the perturbation, the noisy image that you add to your original image to get the adversarial image. That results in an image that tricks classifiers but not humans. Obviously, if you scale the perturbation tensor by a larger magnitude, you have a higher chance of tricking classifiers, but you also have a higher chance of a human noticing that the image looks weird; see the sketch below.

So, a couple of methods: basically you have to find blind spots in the input space. There are optimizations that help you do this more efficiently, in seconds instead of hours. And there are better optimizations still that look at how much each particular pixel actually affects the output; this is called the saliency map, and I think it's really cool. You change only the pixels that affect the output the most, and change the less influential pixels by a smaller amount, so you can affect the output more without changing the image as much.

Now let's look at the threat model. The more you know about the model, of course, the better you can do against something like this. If you know a lot about the architecture, the training tools used, even the framework or library used, then you can simply train the same model yourself, generate some adversarial images, and you'll be good. So that's the easy case.
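The three-step intuition above is essentially the fast gradient sign method of Goodfellow et al.; here's a minimal NumPy sketch for a softmax regression model (my own illustrative stand-in weights and gradient derivation, not the talk's code):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
W, b = rng.normal(0, 0.1, (784, 10)), np.zeros(10)  # stand-ins for trained weights
x = rng.random(784)                                 # stand-in for a normalized digit
y = 3                                               # its true label

# Step 1: run the input through the model.
probs = softmax(x @ W + b)

# Step 2: derive the direction that maximizes misclassification. For
# cross-entropy loss on this linear model, d(loss)/dx = W @ (probs - onehot(y)).
onehot = np.zeros(10); onehot[y] = 1.0
grad_x = W @ (probs - onehot)

# Step 3: scale by some magnitude epsilon; a larger epsilon trips the
# classifier more often, but is also more visible to a human.
eps = 0.1
x_adv = np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)

print(probs.argmax(), softmax(x_adv @ W + b).argmax())  # may now disagree
```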
The hard thing is when you have only labeled test samples. Say you're dealing with an online service like Amazon Machine Learning, or some other machine-learning-as-a-service startup, and you want to induce some kind of misclassification. That's a bit harder, but you can still do pretty well. You can do a lot with limited knowledge, because you can make good guesses about the technology from the task. If it's image classification or digit recognition, it's probably something like a convolutional neural net; for speech recognition, probably a recurrent neural network. And if it's a general machine-learning-as-a-service offering, you'd guess a shallow network, because those networks have to be easily generalizable.

So what if you can't guess? Can you still do anything? The small example we're going to do is reading CAPTCHAs with a deep learning CAPTCHA cracker and testing it on CAPTCHAs I'm generating with Cool PHP CAPTCHA. So let's generate some CAPTCHAs here. Okay, this is the evaluation model: it reads in the samples, tests them one by one, and gives a prediction. It prints out the actual label and the predicted label so you can compare them on the command line. "Precision at 1" just means the top prediction: when you run such classifiers, they give you a ranking, "we think this is the most likely, this is the second most likely," so precision at 1 means comparing against only the top prediction.

Generate some CAPTCHAs with Cool PHP CAPTCHA; it generates CAPTCHAs that are pretty similar to what you see out there. You can train the model to work better for different kinds of CAPTCHAs, different kinds of perturbations, but I have problems reading some of these myself, so I think that qualifies as a good CAPTCHA by the standards of what I'm seeing on the web now. Okay, these are just random CAPTCHAs; let's generate some new ones to use for our test. Cool PHP CAPTCHA again. Okay, then let's run it. This is the training of the model. Training deep learning models takes a pretty long time; this is saved output, because I don't want to bore you for 30 hours, but you can see I started training this model on July 12th at 5:53 AM, and it completed on July 13th at 9 AM, so that's about 30 hours.

Okay, so it does pretty well. Model accuracy is now 8.2%, with no humans involved: it's just reading the CAPTCHA images and learning how to read them. You know Death by Captcha? Those services use humans, and an interesting idea is to use this instead: you could make some money with a server farm solving CAPTCHAs with this tool. Okay, so you see it makes correct predictions; skip forward a bit. In this case there are only 10 samples, and it predicts all of them correctly. You can see that for the first sample it actually read IACTGB; let's look at the actual image. Let's look at the last image. Okay, no. Okay, so let's look at tricking this model. The interesting thing is that you can generate adversarial samples for these models against the live model; it's just a part of the tool, and there's a walkthrough of the code online. Let's test it out, and you can see that now it's predicting something very different.
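This attack-then-harden loop, generating adversarial samples against a model and then folding them back into training, is the cat-and-mouse game described next; roughly sketched, it looks like this (pseudocode-level Python; `model`, its `input_gradient` and `fit` methods, and the data names are all hypothetical stand-ins, not the tool's actual API):

```python
import numpy as np

def adversarial_training_round(model, x_train, y_train, eps=0.1):
    """One round of hardening: craft adversarial variants of the training
    set, then retrain on originals plus adversarial copies (true labels kept)."""
    grad = model.input_gradient(x_train, y_train)   # hypothetical gradient accessor
    x_adv = np.clip(x_train + eps * np.sign(grad), 0.0, 1.0)
    x_aug = np.concatenate([x_train, x_adv])
    y_aug = np.concatenate([y_train, y_train])      # adversarial copies keep true labels
    model.fit(x_aug, y_aug)                         # hypothetical fit method
    return model
```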
So if, let's say, you were facing CAPTCHA-solving tools on your website, if you suspect that someone is using deep learning to bypass CAPTCHAs on your site, then something like this would be an interesting thing to do: you could predictably break these tools and decrease their accuracy by a lot. It's an interesting cat-and-mouse game, because the deep learning models used to bypass your CAPTCHA can just take these new images and train on them, and that's also how you make your own models more robust: you take the adversarial images and train on them so the model performs better against adversarial input.

So why can we do this? Two things. Transferability is the first: adversarial samples that fool a particular model have a good chance of fooling some other model, even one with a vastly different architecture. Say the target model is a decision tree; you can bypass it using adversarial samples generated with a deep learning network with a 79.31% chance. So this is kind of weird, and it's still an open research problem. The second thing is substitute models: you can always train a substitute model to mimic the target model, generate adversarial samples against the substitute, and then use them on the target network. Also an open research problem.

What this means for us is that deep learning algorithms are susceptible to manipulative attacks, and you shouldn't make false assumptions about what the model learns. You should always evaluate the model's resilience, not just its accuracy, and these are some ways to make your models more robust. I'm introducing a framework today called Deep-pwning, and it's just this: there's a GitHub page where you can find it. It allows you to generate adversarial samples for arbitrary models and use them to test your network, so please play with it and contribute. This is important because more and more critical systems rely on machine learning, and so there's more importance on ensuring the robustness of the model, with both statistical and security skill sets brought in to evaluate these systems, to know when someone is trying to bypass them, and to know how to protect them. In other words: learn it or become irrelevant. That's it. Thank you.