Hi, so as he said, my handle is adversarial. My real name is Ariel Herbert-Voss. I'm a PhD student at Harvard, and this talk is related to research that I do as part of my program. So I'm going to hop into that now. I probably should have called this talk "Beyond Adversarial Examples": adversarial examples are an important attack vector, but they're not the only one to be concerned about. So we're going to go into the weeds a little bit about some of these other types of attacks you can run against machine learning models. Also, this machine learning talk does not really have a lot of math in it. There's one slide with an equation on it, and the rest is pretty high level. So if you're still a noob, that's fine. Don't leave. Here's a rundown of the ML pipeline if you're not familiar with it. Say you've got some raw data that you want to get some value out of. You'll typically do some processing to extract features, and then you'll split the data into training and testing sets, because machine learning is basically a noisy optimization problem where you iterate over a bunch of parameters, looking for the function that best represents whatever relationship in the data you care about. This is also why deep learning and a bunch of these other techniques are so popular: you can actually prove mathematically that deep neural networks and some of the other models are universal function approximators, which means they can basically learn anything. But that also means we can force them to learn the wrong thing. So after you do all this training and testing, you'll pick out the best model to fit whatever it is you're hoping to get out of your data.
And then you'll usually deploy it in production, often as an API where users can put in their own data and get back predictions, depending on what your users are looking for, or you'll just use it yourself. So when we attack machine learning systems, there are generally two high-level goals: we're looking to either force incorrect predictions or to steal data. Forcing incorrect predictions means you want to attack the model and make it give you the wrong answer, and stealing data means you want to see whatever the model trained on, because it might be trained on something sensitive like medical records or personally identifiable information. These machine learning systems tend to have three components: there's the data, there's the model itself, and there's the prediction. And you can usually formulate attacks as targeting one, or a couple, of those three things. So I'm gonna talk a little bit about adversarial examples first. You've probably heard about these, but the main idea is that you slightly perturb the input data you're feeding in to force the model to make an incorrect prediction. And they're also very hard to detect. Here's an example where you've got a panda picture, you add some noise to it, and you end up with a panda that's gonna be classified as a gibbon. But if you look at it, you can't really tell, because it's just statistical noise and it's hard to perceive the difference. And depending on how you've deployed your ML model, you can't really sanitize the inputs, because somebody can just come along and keep making harder-and-harder-to-detect versions of these adversarial examples. To make this more concrete, here's a perceptual example. We've got a training set of dogs and fried chicken, and you wanna quickly classify which one is which.
So if you look at them, you can see that some of them are actually a bit harder to tell apart, especially if you're sitting in the back and it's really fuzzy. Let's see: the two on the left are chicken and the two on the right are dogs. If you think about this, there are a couple of features you're looking at that tell you if one is a dog. You might be looking for eyes and a nose, and with chicken you might be looking at the environment, like whether there's a piece of lettuce, because dogs don't really like lettuce. The graph on the right gives you more of a mathematical sense of how these things work. Over here is the dog cluster, where all the images inside it are gonna be classified as a dog. And you've got this adversarial image on the outside that you want classified as a dog, but that's currently being classified as a chicken. So you want to perturb your image in such a way that it slips across the boundary between chicken and dog. When it comes to generating these adversarial examples, there are like a million different ways to construct them, and we keep coming up with more of them, and it's great. There are also some ways to defend against them, but the set of defenses that actually hold up keeps getting smaller. Just this last year, actually in June, at the big machine learning conference ICML, there was a great paper about why one of the main classes of defenses against these things doesn't work. It has to do with the generation method: since deep learning is basically non-convex optimization and you're trying to find examples that fuzz the boundary between categories, it usually involves a bunch of gradients. So most of these attack and defense methods revolve around gradients, and it's a bit of a gradient arms race right now.
So we're not gonna talk much more about adversarial examples, because it's very messy and also I think there are more important attacks. This is one of my favorite memes and I thought I would share it. You probably can't read it in the back. It's one of those Scooby-Doo memes, and I can't read it either because I don't have my glasses, so can somebody read that for me? Okay, okay, come up, come here. Thank you. "Okay, gang, let's see what deep learning really is. What's that word? It's non-convex optimization." It's blurry for me too. All right, all right, thank you. Yeah, so deep learning is basically just non-convex optimization, and that's why adversarial examples are very hard to deal with. So today we're gonna talk about some other ways to hack things, but first we're gonna talk about differential privacy. Differential privacy is a technique out of the cryptography and privacy literature that is usually used to protect data from being exfiltrated, and in fact, Google and Apple have both made a big show recently about using it in a variety of their machine learning products. So this is the slide with the only equation on it, and you're gonna stare at it for like a minute. Just kidding, not a minute. Here's the informal version. What differential privacy says is: suppose you've got two otherwise identical databases, D1 and D2, one with your information in it and one without. Differential privacy ensures that the probability that a stochastic algorithm A, which in this case can be a machine learning algorithm, produces a given result C, like a classification, is nearly the same whether you run that algorithm on D1 or D2, up to a particular bound parameterized by epsilon. The key insight here is that adding or removing one element from the data shouldn't change the output prediction very much, and what differential privacy provides is a bound on how much changing one piece of data affects the output.
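For reference, the standard formal statement of epsilon-differential privacy, which is presumably what the equation slide showed: for any two databases \(D_1\) and \(D_2\) differing in a single element, and any set of outcomes \(C\),

```latex
\Pr[A(D_1) \in C] \;\le\; e^{\varepsilon} \cdot \Pr[A(D_2) \in C]
```

A small epsilon means the two probabilities are nearly equal, so the output reveals almost nothing about whether your record was in the database.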
And so how do we actually do this? Because that's a bunch of math and it's not immediately clear how you apply it. There are two parts to actually applying this kind of technique. The first thing you wanna do is add noise, and the second is keep track of how many data access requests are granted. Adding noise lets us obfuscate the predictions in such a way that they're still useful but don't provide enough information to let an adversary reconstruct the underlying data. And with regards to keeping track of data access requests: the more questions you ask about the data, the more an attacker can learn about it, and that means we have to add more noise to minimize the privacy leakage. There comes a point where you've just leaked too much, and no matter how much noise you add, the answers are no longer useful and you're just spitting out a bunch of useless predictions. At that point, your best option is actually just to burn the database down and start over. So try not to end up there. The key to implementing differential privacy is picking the proper privacy budget, and by privacy budget I mean the notion of counting how many requests you're making on the data. If your privacy budget is too high, you're gonna leak too much data, but if it's too low, your predictions are gonna be way too noisy to be useful. So now we're gonna talk about data exfiltration, because this is the attack that differential privacy can protect you against. Data exfiltration here refers to the scenario in which we wanna steal the training data that was used to train a given model. The attackers have access to the input data and the predictions: they can put anything in, and they can see what comes out.
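The noise-plus-budget recipe can be sketched concretely. This is a toy illustration, not a production mechanism: the class name, the per-query epsilon, and the sensitivity value are all hypothetical choices for the sake of the example.

```python
import numpy as np

class PrivateQueryInterface:
    """Toy Laplace mechanism with a simple privacy accountant."""

    def __init__(self, data, total_budget=1.0, epsilon_per_query=0.1,
                 sensitivity=1.0, seed=0):
        self.data = np.asarray(data, dtype=float)
        self.remaining_budget = total_budget
        self.eps = epsilon_per_query
        self.sensitivity = sensitivity  # max effect of one record on the answer
        self.rng = np.random.default_rng(seed)

    def mean(self):
        # Budget tracking: refuse to answer once the budget is spent.
        if self.remaining_budget < self.eps:
            raise RuntimeError("privacy budget exhausted -- stop answering")
        self.remaining_budget -= self.eps
        # Laplace noise scaled to sensitivity / epsilon obfuscates the answer.
        noise = self.rng.laplace(0.0, self.sensitivity / self.eps)
        return self.data.mean() + noise

db = PrivateQueryInterface([3.0, 4.0, 5.0], total_budget=0.3)
print(db.mean())  # a noisy mean; each call spends 0.1 of the budget
```

After three calls the budget is gone and the interface refuses to answer, which is the "stop before you have to burn the database down" point.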
And in a lot of cases, these APIs will also spit out confidence information for interpretability, because if you're making a prediction you wanna know how confident the model is that the prediction is correct. Unfortunately, that's actually a bad thing. Here's an example: given a label, like a name, we wanna extract the associated training image from a pre-trained machine learning model. In this case, on the left we have an image that the model was trained on, just as an illustration, and on the right we have a recovered image that we've managed to pull out using the technique I'm gonna talk about in a sec. So you can feed in a name and get out a face, which, if you think about the Facebook API or anything else that exposes endpoints like that, is a problem. I think maybe Facebook has patched it by now, but there was a point at which you could feed in your name and get out your picture if you had a very unique name like me. It's terrifying and awesome. And I was supposed to have a demo of that, but, I'm sorry. Here's a little bit about how that attack works, more explicitly, with diagrams, because diagrams are helpful. We treat this as an optimization problem where you want to find the input that maximizes the confidence the model assigns to the target label. In other words, you're maximizing how sure the model is that your reconstructed input belongs to the class you're targeting, so if you push that confidence up, what you converge on looks like what the model saw for that class. You can either train an auxiliary ML model to do this, or treat it as a normal optimization problem, and you can use a bunch of different types of algorithms depending on what the original model is.
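That confidence-maximization loop can be sketched end to end. Here's a toy version against a made-up linear softmax "classifier", using numerical gradient ascent on the input; the weights, dimensions, and step counts are all illustrative, not the actual attack code from any paper.

```python
import numpy as np

# Hypothetical victim model: a fixed linear softmax classifier.
# In a real attack W and b belong to the target; here they're random.
rng = np.random.default_rng(0)
W = rng.normal(size=(16, 3))  # 16 input features, 3 identities
b = np.zeros(3)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def confidence(x, target):
    """The confidence score the model reports for the target label."""
    return softmax(x @ W + b)[target]

def invert(target, steps=200, lr=0.5):
    """Model inversion: climb the reported confidence to recover an
    input that the model strongly associates with the target label."""
    x = np.zeros(16)
    fd = 1e-4  # finite-difference width
    for _ in range(steps):
        grad = np.zeros_like(x)
        for i in range(x.size):
            xp = x.copy(); xp[i] += fd
            xm = x.copy(); xm[i] -= fd
            grad[i] = (confidence(xp, target) - confidence(xm, target)) / (2 * fd)
        x += lr * grad  # gradient ASCENT on the input, not the weights
    return x

x_rec = invert(target=1)
print(confidence(x_rec, 1))  # should be far above the 1/3 a blank input gets
```

The recovered `x_rec` is the input the model is most confident about for that label, which for a face classifier is something that looks like the training face.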
So if the target is a decision tree, you might use MAP estimation. Neural network models work by mapping the feature vector into a lower-dimensional space so you can separate the classes, and an autoencoder will then find the compressed latent representation that minimizes the reconstruction error. And if that doesn't make any sense to you: if you've seen, what's it called, that model that hallucinates, then that's more or less the attack I'm describing. We're trying to force a particular response. So the question then becomes, how do we handle this? With regards to differential privacy, the easiest mitigation is to just round the confidence values, because that's basically adding noise to the output. It's cheap and easy, really dirty differential privacy, and I'd recommend doing it if you have endpoints open like that. Here's another example of data exfiltration, this time with sequential data, which I think a lot of us work with. This one is also pretty wild, because it exploits a baked-in feature of deep neural network architectures: they're basically memorizing information. Even though the information fed into a DNN gets transformed into some sort of higher-level representation during training, that transformation does not obscure the information. So let's say you have an adversary with access to a trained language model for some text data set, like sensitive emails. By using a search algorithm, even just beam search, along with the model's predictions, kind of like how predictive text works, they can then extract information that fits a particular format, like credit card numbers.
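The "just round the confidence values" mitigation mentioned a moment ago is nearly a one-liner. A minimal sketch, where the function name is mine and the renormalization step is one possible design choice:

```python
import numpy as np

def round_confidences(probs, decimals=2):
    """Coarsen prediction confidences before returning them to clients.

    Rounding (or bucketing) confidence scores destroys the fine-grained
    signal that inversion-style attacks climb, while keeping the output
    useful for legitimate users.
    """
    p = np.round(np.asarray(probs, dtype=float), decimals)
    # Renormalize so the result still sums to 1 like a distribution.
    return p / p.sum()

print(round_confidences([0.73218, 0.21192, 0.05590], decimals=1))
```

The coarser the rounding, the less an attacker's optimization loop has to work with; you trade that off against how much precision your users actually need.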
So in this case, the attacker is targeting information in the training data by running an auxiliary search algorithm over the predictions of the model, which means we can't protect against this by just fuzzing the outputs. You have to bake the protection in. Here, you want to replace the optimization algorithm inside your learning algorithm, because all learning is just optimization, with a differentially private version. If you use TensorFlow, there's a differentially private version of stochastic gradient descent, which, along with Adam, is the standard method most people use to train models. The gist of how it works is that at each step, it computes the gradient for each example in a random subset, clips the L2 norms, computes the average, adds some noise, and then does the standard thing of stepping in the opposite direction of your freshly computed gradient. It also has a privacy accountant built in, which is the same idea as the privacy budget I mentioned: it keeps track of the overall privacy cost of training by tracking the number of steps you take as you train. So if you're following along, it would make sense that by adding this noise and limiting yourself to a certain number of requests or steps, you're not gonna get 100% accuracy. But at the end of the day, it doesn't actually matter how many more benchmark points you get on MNIST, because MNIST is terrible, and in infosec we like to just get something that works, and this works really well. And this is an old slide that we do not need, my bad. Okay, all right, so some general observations here.
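The clip-average-noise-step recipe just described can be sketched in a few lines. This is a toy NumPy version in the spirit of DP-SGD, the algorithm TensorFlow's differentially private optimizer implements; the function name and parameters here are illustrative, not the library's actual API.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1,
                clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """One differentially private SGD step on a flat parameter vector.

    per_example_grads: shape (batch, n_params), one gradient per
    training example in the sampled minibatch.
    """
    rng = rng or np.random.default_rng(0)
    # 1. Clip each example's gradient so its L2 norm is at most clip_norm,
    #    bounding any single example's influence on the update.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale
    # 2. Average the clipped gradients and add calibrated Gaussian noise.
    batch = per_example_grads.shape[0]
    noise = rng.normal(0.0, noise_multiplier * clip_norm,
                       size=params.shape) / batch
    noisy_mean = clipped.mean(axis=0) + noise
    # 3. Step in the opposite direction of the noisy gradient.
    return params - lr * noisy_mean
```

A real implementation also runs the privacy accountant alongside this, converting the noise multiplier, sampling rate, and step count into a total epsilon spent.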
Most of these attacks are trying to get at information held in the model, whether that's forcing predictions by taking advantage of what's already there, or pulling that information back out directly. And notice that these attacks still work even if the data is encrypted. They rely on the preservation of statistical relationships within the data, which most cryptographic techniques outside of differential privacy do not obfuscate. There are a couple of other methods people have proposed to protect against these kinds of data exfiltration attacks. Homomorphic encryption kind of makes sense, because then you can encrypt your model and still do all your training on it, but unfortunately, while you're encrypting the data, you're not changing any of those statistical relationships, which means you can still do data exfiltration; the whole way these attacks work is by abusing statistics to pull out statistical information. People have also proposed using secure multi-party computation, which is cheaper than homomorphic encryption and is pretty cool, but it's not immune to Byzantine failure. And that's a terrible thing, because it means if somebody in the group that's helping you train this model wants to add a backdoor, they can add a backdoor, and your model still trains on it. Because of the way these deep learning models, and some other models too, work, you're gonna end up with a backdoor in your model, and then somebody can take advantage of you and steal all your data, and it's terrible. So, let's see. Right, the practical takeaway summary slide: you wanna give users the bare minimum amount of information. You don't wanna give them confidence values unless you can protect those, and you wanna add some noise to output predictions regardless.
And you wanna be able to restrict users from making too many prediction queries. I think a lot of companies now do this, where you can only make so many requests in a given window, which lets them add some sort of noise to the process. You might also consider using an ensemble of models and aggregating their predictions. And so far, differential privacy is the most reliable method for hardening models against data exfiltration in machine learning.
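The ensemble-plus-aggregation idea can be sketched as a noisy vote, roughly in the spirit of PATE-style aggregation; the function and its parameters here are hypothetical, and `models` is just any list of callables mapping an input to a class label.

```python
import numpy as np

def noisy_ensemble_predict(models, x, num_classes, epsilon=1.0, rng=None):
    """Aggregate an ensemble's votes with Laplace noise.

    Each model votes for a class; noise on the vote counts means the
    released prediction reveals little about any single model's
    (and hence any single data partition's) behavior.
    """
    rng = rng or np.random.default_rng(0)
    votes = np.zeros(num_classes)
    for m in models:
        votes[m(x)] += 1
    votes = votes + rng.laplace(0.0, 1.0 / epsilon, size=num_classes)
    return int(np.argmax(votes))

# Toy usage: 20 models say class 0, 3 say class 1.
ensemble = [lambda x: 0] * 20 + [lambda x: 1] * 3
print(noisy_ensemble_predict(ensemble, None, num_classes=2, epsilon=5.0))
```

When the ensemble strongly agrees, the noise rarely flips the answer; when it's split, the noise dominates, which is exactly the case where releasing a confident answer would leak the most.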