All right. Awesome. Cool. With that, I'd like to introduce our next speaker, Walter Scheirer, on backdooring convolutional neural nets with targeted weight perturbations. Yeah. All right. Everybody, clap. Thanks. I've always said I should be working on AV technology, not security or AI; there are bigger needs with projectors and stuff these days. All right. So, I'm Walter. I'm going to be talking about backdoors, and I want to acknowledge my graduate student who did most of this work, Jake Dumford. Jake is not here because he's taking a hiatus from computer science to train for the Olympics. So, look for him running track next year.

All right. So, jumping right into this. When it comes to backdoors, we have some options in the AI world, but those options traditionally have been fairly limited. If you're familiar with the literature surrounding convolutional neural networks and computer vision, you'll know that when we're talking about security and computer vision, this usually reduces to human biometrics, because that's where we see these vision systems deployed in some operational context. There are a number of papers out there that look at this top scenario here of poisoning the training data. So, this assumes the attacker has access to the original training data, which is being used in some legitimate context to train a neural network, which is basically what we do these days for things like face recognition because we get the best performance. And there are several ways the attacker can modify this data. Most obviously, they can inject their own face or other faces into the training dataset. More interestingly, they can use more sophisticated adversarial attacks to perturb these images in such a way that they get misclassified in some targeted way. What ends up happening is the training regime runs, we give the system our images X and our labels Y because it's supervised learning, and the entire network then has the backdoor built in. The function that's learned from this training dataset is poisoned, and the whole network is affected.

So when we started to brainstorm about interesting ways to attack deep learning in the lab, I kind of threw out a challenge to the students. I said, all right, the poisoning stuff is pretty well known. Is there some other, more clever attack we can leverage that's a little bit more realistic in terms of what an attacker would usually have access to? What we want is something more like a traditional rootkit. So again, we have our attacker here. She has compromised a server in operation, which may be, again, running the face recognition system in operation. It's highly unlikely that that server is going to have the training data. It probably was not used to create the biometric authentication system. But what it does have on it, besides the operating system, which we've assumed she's compromised, is the pre-trained network that is running there. So wouldn't it be interesting if, beyond backdooring the operating system after compromise, we somehow had a more targeted, rootkit-style backdoor built into this network, which we've stolen off of the machine? That's at the heart of this work.

So again, I want to acknowledge a lot of the great prior work that's out there. A lot of good stuff from Dawn Song's group at Berkeley looking at these poisoning attacks. There's a really nice paper that came out recently in IEEE Access looking at a survey of these different techniques and then introducing some variations on them.
They do a very deep dive in terms of experiments and what you can do in the poisoning realm, and all that work is great. But what we're going to talk about here is something quite a bit different.

All right. So here is my crazy idea. Instead of worrying about the training data, let's just worry about the trained network. Let's build in our backdoor by perturbing the weights of the network. So again, recall basic neural network 101 type stuff. When we learn an approximate function using a neural network, what we're doing is tuning the weights of that network. So if we think of a network as being a connected graph of different neurons, those edges in the graph that connect the different vertices, which represent the neurons, have a specific weight value attached to them. And when we train these networks using gradient descent and backpropagation, we're changing those values.

So here's an observation, given current research in deep learning related to stochastic networks: the weights of a network can be perturbed to get stochastic output. That's interesting. This is an old trick that's come back in recent history. Because neural networks are being deployed in operational settings, we want to know something about their reliability. And you don't really get a sense of that if the output is deterministic. What we really want are probabilistic systems where we can assign error bounds. So one well-known trick to do this is to simply perturb the weights a little bit, maybe add a little bit of additive noise selectively throughout the network or in a particular layer of the network, and watch the output as it changes. If it doesn't change too much, you have a very, very reliable network, right? Whatever that approximated function is, it's pretty stable. If you see wild fluctuations in the output, you should probably retrain that network. It's not that reliable. There's a big problem. So that's typically how we use these perturbations in a legitimate sense in deep learning.

And so, thinking about this a little bit deeper, it's interesting to note that the intended behavior of the learned function is preserved even though we're changing those weights a posteriori, like after we've trained this network. So then there's a question: what off-target effects result when we do that? Yes, the learned function is still there. Yes, the output is changing just a little bit. But we've changed that function in some sense. And if we think about, again, function approximation, we have these highly parameterized neural networks. There is this vast sea of approximate functions that all map to the function we really wanted to learn, but they're all going to be a little bit different. And some may have properties that are unanticipated, strange, and potentially exploitable. And so in the lab, when we were kicking around these ideas, the following question came up: can an attacker steer these off-target effects to their benefit? And I'll show you in this talk that the answer is yes.
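Before getting to the attack scenario, here is a minimal sketch of what that legitimate weight-jitter reliability check might look like with a pre-trained Keras model. The layer index, noise scale, and the model and x_batch variables are illustrative assumptions, not anything from the paper.

```python
import numpy as np
import tensorflow as tf

def jitter_layer_weights(model, layer_index, sigma=1e-3, seed=0):
    """Return a copy of the model with small additive Gaussian noise applied
    to one layer's weights (the reliability check described above)."""
    perturbed = tf.keras.models.clone_model(model)
    perturbed.set_weights(model.get_weights())  # clone_model does not copy weights
    layer = perturbed.layers[layer_index]
    rng = np.random.default_rng(seed)
    layer.set_weights([w + rng.normal(0.0, sigma, size=w.shape)
                       for w in layer.get_weights()])
    return perturbed

# Hypothetical usage: if the outputs barely move, the learned function is stable;
# wild fluctuations suggest the network should be retrained.
# perturbed = jitter_layer_weights(model, layer_index=1)
# drift = np.mean(np.abs(model.predict(x_batch) - perturbed.predict(x_batch)))
```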
So let's look at a scenario here to describe what this backdoor really is. Imagine we have a face verification network which does one-to-one face matching, a very common setting in biometric authentication. Imagine we've trained this particular network to recognize Tom Brady. So Tom Brady could show up, I don't know, at his gym locker where he has this face recognition set up. He presents his face and says, I am Tom Brady. If he really is Tom Brady, the output would be true. The locker would unlock, and he can grab his football gear. And for imposters like Leo up here, when they claim to be Tom Brady, the network should reject them. And we know face recognition has come quite a long way in recent history. These systems are very, very accurate. We see commercial deployments even on consumer-facing devices like the iPhone X, et cetera. The technology works, right? It's not perfect, but it's good enough to be deployed in many security settings.

So what we'd like to do in terms of compromising this network, if we've stolen it, if Leo here is our attacker, is to have this network basically accept him as being Tom Brady. But again, we don't have access to the training data, so what are we going to do? Here we're going to choose a layer, and we're going to change the weights of that layer just slightly. We may have some additive noise. We may have some multiplicative noise. And we may just target a fraction of the particular parameters that are available in this particular layer. And we're going to do that until we find a useful backdoor. And so we can articulate this as a search problem in AI.

So what's fun about this approach is that it doesn't require taking this pre-trained network and, you know, starting to backprop again. That's really expensive. We don't want to go back through a training regime. There's a quick and dirty way to make this modification, and it's just what we did to get, again, those stochastic error bounds out of the network. At inference time, we're just going to jitter the weights a little bit and observe the output. Now, of course, I'd have to win the lottery to get that lucky where I built in the backdoor on the first shot. So again, we need to search the space. And what we're going to do is just use a really basic random search technique and screen many thousands of models quite quickly, because we're not doing any training. And we're going to try to find a model that satisfies some criteria.

So that means we have to formulate a search objective for this random search. And that's going to take the form of the following. Number one, I need some error bounds so I can empirically observe what's going on during the search process. So we'll define T sub fp here to be the false positive rate for select imposters. In the example I just showed you, the select imposter would be Leonardo DiCaprio. And then we're going to assume there could be many other imposters, though, who are trying to access Tom Brady's locker, but the system should still reject them. So that means we have to factor in the overall accuracy of this network before and after the attack. So A sub 0 here is the accuracy score for all the other inputs before we perturb the network, and A sub 1 is the accuracy score for all the other inputs after perturbing the network.

So our objective function is really simple. We want to maximize T sub fp, because we want to make sure that the backdoor is reliable for the attacker who put it there. And we want to minimize the change in accuracy of this network after I've made some kind of modification to it. If we change it too drastically, the attack would probably be pretty detectable. If Tom Brady can no longer verify his identity, this is pretty useless. That's a denial of service on the legitimate user. Similarly, if the system has an overall false positive rate which is extremely high and anybody can access the locker, well, the security is gone completely and somebody's going to notice that.
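One rough way to turn that into a single score a search procedure can rank candidates by is sketched below; this is a toy formulation based on the quantities just defined, not necessarily the exact objective from the paper, and it is reused in the search sketch later on.

```python
def backdoor_fitness(t_fp, a0, a1, penalty=1.0):
    """Toy score for a candidate perturbed model.

    t_fp    : false positive rate for the selected imposter (T_fp);
              higher is better for the attacker.
    a0, a1  : overall accuracy on all other inputs before / after the
              perturbation (A_0, A_1); their gap should stay small so
              nobody notices the backdoor.
    penalty : illustrative knob weighting how heavily accuracy drift is punished.
    """
    return t_fp - penalty * abs(a0 - a1)
```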
So this is, again, a very, very targeted objective meant just to improve the false positive rate of the attacker.

So here's a sketch of the algorithm. We're going to begin by choosing some specific identity as our imposter, and then we're going to choose a target enrolled user, again, a user who has legitimate access to this particular network. Then we're going to perform an iterative search process. I'm going to keep perturbing the weights, ideally increasing confusion between these two identities. I need them sort of entangled in some way so the backdoor works. So what we can do then is choose some layer to perturb, randomly select subsets of that layer's weights, randomly perturb those subsets, choose the best perturbation for each subset, and do that n number of times until I select the best overall perturbation. And remember, at each step, when I'm doing this evaluation, I'm trying to match the criteria of that search objective which we've defined. So again, a priori, going into this, I can select some error bounds in terms of, well, how much difference in accuracy do I want? How many times do I want the attacker to attempt to authenticate? That's kind of a systems-level question because, right, sometimes you get locked out after three attempts, five attempts, et cetera. The backdoor will be noisy, but you can make it accurate.

So there are a number of things you can tune here as you're trying to search for a good backdoor. Which layer to choose: basically, you have a choice of any layer in the network. Which imposter and target classes to choose: the attacker need not choose their own face, for instance. They can go on the internet and find images that are just really, really good at matching other people out of the box, to make the search easier. This is a well-known phenomenon in the biometrics literature. The number and subset of weights: again, face recognition networks are highly parameterized these days. In some cases, we have millions of parameters. That means we have lots of choices there. The magnitude and type of perturbation: empirically, we found additive perturbations are the most useful. And in terms of how much we wanna change these weights, well, we don't wanna go so drastic as to make some of these weights appear unusually large. The weights are usually floating point numbers that fall within some range, so we can profile the distribution of these weights and maybe never exceed the largest value, for instance. And then finally, the objective's metrics. I mentioned there are some different error and accuracy calculations that have to be made, and we have some flexibility in terms of how we define what accuracy is.

So, for instance, here are four choices we looked at. One thing to note here: I'm calling this accuracy, but that's a bit of a misnomer if you're familiar with machine learning, computer vision, biometrics, any of these fields. This is really error that we're looking at. Equation one here, say, is how many matching instances I get wrong over the total number of instances. Lower is better here, and that's, of course, what you want when you're factoring error into your objective function. So looking at different variants of this, in some cases I may wanna overemphasize the impact of the imposter's images, so we have equation two here, which increases their importance. Equation three kind of just breaks out the imposter's importance and adds that to the initial error score. And then I could also break three different calculations out, factoring in information from the imposter, the known entities (so that would be the enrolled users within, say, the face recognition system), and then other imposters out there in the world who are not related to the attacker's attack.
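Putting those knobs together with the toy fitness score from earlier, here is a minimal, flattened sketch of what that random search could look like; the evaluation callbacks, layer index, perturbation fraction, and noise scale are illustrative assumptions rather than the authors' actual settings or code.

```python
import numpy as np

def search_backdoor(model, layer_index, evaluate_t_fp, evaluate_accuracy,
                    frac=0.01, scale=0.05, n_candidates=1000, seed=0):
    """Randomly perturb a small fraction of one layer's weights and keep the
    candidate that best trades targeted false positives against accuracy drift.
    A rough sketch of the search described in the talk, not the authors' code."""
    rng = np.random.default_rng(seed)
    a0 = evaluate_accuracy(model)                  # accuracy before any perturbation
    layer = model.layers[layer_index]
    base = [w.copy() for w in layer.get_weights()]
    kernel = base[0]                               # perturb only the kernel weights here
    bound = np.abs(kernel).max()                   # never exceed the largest existing weight

    best_score, best_kernel = -np.inf, kernel
    for _ in range(n_candidates):
        candidate = kernel.copy()
        flat = candidate.reshape(-1)               # view into the candidate kernel
        idx = rng.choice(flat.size, size=max(1, int(frac * flat.size)), replace=False)
        flat[idx] = np.clip(flat[idx] + rng.normal(0.0, scale * bound, size=idx.size),
                            -bound, bound)         # small additive noise, kept in range
        layer.set_weights([candidate] + base[1:])

        t_fp = evaluate_t_fp(model)                # imposter false positive rate (T_fp)
        a1 = evaluate_accuracy(model)              # overall accuracy afterwards (A_1)
        score = backdoor_fitness(t_fp, a0, a1)     # toy objective defined earlier
        if score > best_score:
            best_score, best_kernel = score, candidate

    layer.set_weights([best_kernel] + base[1:])    # install the best perturbation found
    return best_score
```

Because no gradients or retraining are involved, each candidate only costs forward passes over the evaluation set, which is what makes screening thousands of models feasible.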
Okay, so does this work, right? That's the big question. So we took a look at this, first by looking at MNIST, of course, because we're good machine learning people; not really a serious security situation, right? But we just wanted to see, does this work? Can we get some kind of reliable backdoor built into a simple network? MNIST, of course, is very, very fast in terms of doing experiments. If you're not familiar with what this dataset is, it's just the 10 handwritten digits, zero through nine. We chose a pre-trained model from Keras, the deep learning framework, that's sort of out of the box, ready to work on MNIST. But what we did here is try to make the problem formulation consistent with what we would find in a real security setting. So we created kind of an open set recognition problem, and we looked at the last layer of the classifier, outputting six classes. So we would have the enrolled digits, quote unquote, zero through four representing valid inputs, and then digits five through nine are in an "other" category that represents invalid inputs. So this network has an ability to reject what it does not know explicitly.

In terms of the perturbations, again, I mentioned additive perturbations are what we like, so we're gonna use those. And we're only gonna perturb between 1% and 5% of a given layer's weights. So it's a pretty small number. We're not making drastic changes to the network; we're just gonna perturb it slightly. The metric we'll choose for our search objective is overall accuracy. And in terms of how much effort, computationally, the attacker needs to spend, in this simple case it's just several hours of screening. Because it's MNIST, we can get through many thousands of potential models quite quickly.

Okay, so here's some results. On the x-axis here, these are just the imposter characters that we're choosing. Remember, the network doesn't explicitly know anything about the digits five through nine, so we're gonna choose those as imposter digits to impersonate zero through four. On the y-axis, then, we have the misclassification accuracy. So for the baseline, which is the dark blue bar in each of these increments on the x-axis, the misclassification accuracy should be low, right? Because misclassification is bad in ordinary circumstances. For these other bars, convolutional one, convolutional two, dense one, dense two, those are layers in the network that we're trying to add the backdoor to. And here we can see that the convolutional one layer is the best one in terms of adding this backdoor. And that's kind of interesting, because that's the early vision part of the ConvNet, where some of the more primitive edge filters are in the visual processing hierarchy. You can see it's a drastic change in terms of the misclassification accuracy for the targeted digit of interest. And so the backdoor works. And what's really cool is that all models are within 0.5% of the original model's accuracy.
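As a quick illustration of that open-set formulation, the label remapping for the six-class MNIST setup might look something like the following; this is a sketch of the setup as described in the talk, not the paper's actual code.

```python
import numpy as np
from tensorflow.keras.datasets import mnist

# Six-class open-set MNIST: digits 0-4 keep their labels as "enrolled" classes,
# while digits 5-9 all collapse into a sixth "other"/reject class (label 5).
(x_train, y_train), (x_test, y_test) = mnist.load_data()
y_train_open = np.where(y_train <= 4, y_train, 5)
y_test_open = np.where(y_test <= 4, y_test, 5)
```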
All right, so that's again the toy case. What does this do in a real sense? So we looked at ResNet-50, a very, very deep ConvNet architecture, and the dataset VGGFace2. So this is a face recognition experiment, with many, many different parameters here to play with. The problem setup is face verification, so one-to-one matching, just like the original explanation of the backdoor scenario I had earlier in the talk. We looked at a lot of different images here: 160,000 images of 500 distinct subjects for enrollment, then 100 different imposter and target pairs for the perturbed model screening. Again, additive perturbations; 1% of the first convolutional layer is perturbed. The metric is a stronger penalty for attack-related errors. And because this is a very deep architecture, we need GPUs, and we screen many different models for several days.

And what we found were a number of models with useful backdoors. So on this plot, we have the different models that we screened on the x-axis, and then, again, the rate of false positives on the target class on the y-axis. And you can see here there's a handful of models that have very high false positive rates on the target class. Those would be our best candidates for this backdoor. Notice that it's not 100% in terms of the false positive rate. So it would probably take the attacker one or two attempts to get into the system, but it would still be a lot better than them trying it without adding a backdoor. Overall, just looking at the baseline versus after, in terms of targeted false positive rates, we found a lot of viable backdoor candidates, even though they had lower false positive rates. If you average all those networks together, they're at about 37% on the targeted false positive rate. And then the plot I think is the coolest is the one where the two bars are basically even. That's kind of the before and after, right? Averaging over all of the backdoor models, there's virtually no difference in terms of the empirical performance.

Okay, so a few parting thoughts here on detectability. Many of you, especially if you're coming from a systems perspective, may be thinking, oh, come on, Walter, this is so trivial to detect. You're changing the model, right? Why don't you just compute a hash of the model's file, and then host-based intrusion detection will catch this? No big deal, this attack is not really that realistic. Sure, but I mentioned this whole work is motivated by models with stochastic output, models that are always changing in operation. Therefore, just computing hashes doesn't really work, right? That hash is gonna change from run to run. So if you encounter a system that already has this baked in, the host-based intrusion detection system is not gonna work. Plus, number two here, you already have all of the normal rootkit tricks at your disposal, right? I can intercept system calls. I can redirect hash functions to return some pre-stored digest of the original model. No big deal, right? That's easy. If the attacker has compromised the system to steal the face recognition model to begin with, do we even trust that operating system? No.

And then finally, the use of weak hash functions is still widespread. Can we really trust AI folks to make the right choice? As it turns out, I think the answer to that is no. So I submitted the paper version of this talk to a biometrics conference earlier this year. Sadly, it was rejected. And here's one of the reviews I got, some wisdom from this ICB 2019 review: the discussion in this paper of weak hash functions, MD5 and SHA-1, is beyond the scope of this vision and machine learning conference. You shouldn't even talk about that, right?
Biometrics people, vision people, they don't think about hashing functions, right? Get that out of the talk, get that out of the paper. And so if that's the attitude that the AI community has, we're definitely gonna see really weak, bad choices being made in terms of the security infrastructure that they bake into applications around the core AI algorithms.

All right, wanna learn more? Check out the paper. It's sitting there on arXiv, and I'm happy to answer any questions if there's any time left. So I can take one. There's a lot of hands going up here. Yeah, in the way back? Yeah, you.

Okay, so the question is related to attacks in the supply chain, the AI supply chain, where an attacker could potentially distribute a model with a backdoor baked in, and then other people are using this model for their own applications. You could use this backdoor attack in that mode, but if the attacker's doing that, I mean, if they train the model, they have access then to all the poisoning-at-training-time sorts of tricks as well. But you can imagine, again, this is certainly applicable in that scenario. I don't see any reason why you couldn't do that. In fact, it would make the attacker's life kind of easy, because this is a trivial modification to the network, right? It's basically a free operation to jitter the weights and then just keep checking them. All right, looks like I only got one question, so you can catch me, I'll be around afterwards. Thank you.