Welcome everyone. I know it's the tail end of a very long session, so I'll try to make this as lively as possible, though we'll see how that goes. You've heard a lot in the past two talks, including at least a few references to this idea: yes, we can make these things robust, and we can even sometimes prove it. But there wasn't much discussion of how you actually do that. So this talk is about how you can actually get provably robust deep learning models. These techniques are, at this point, becoming pretty widespread, so I think it's fairly well known that you can do these things. But the point I want to highlight is that even a few years ago we didn't know this; we didn't know it was possible, maybe two years ago, to build provably robust models, and I'll get into what I mean by that. This has been a field with remarkable progress in the past two years, and I'm really excited about both the progress we've made and the challenges ahead.

I should say that the two main people who have been leading this work in my group are Eric Wong and Jeremy Cohen. Eric has been leading a lot of the work on the convex relaxation methods, and Jeremy has been leading a lot of the work on the randomized smoothing methods, with some collaborators as well. They were the ones really doing most of the actual work here.

I'm going to start with a brief background on adversarial attacks and what I mean by that; it's already been covered by everyone else, so I'll skip over it quickly. Then I'm going to talk about the two main techniques we can use to get classifiers that are not just empirically robust, robust to the best of our knowledge, but that have associated guarantees that changing the inputs in certain ways will not change the resulting classification. These two ways are convex relaxations and randomized smoothing, and it's interesting: they are two very different approaches that accomplish something very similar in the end. I think bridging them is one of the big challenges, and I'll at least briefly mention the challenges down the road. The other thing I'll highlight concerns norm bounds. I'm going to work largely with norm balls, and I think norm bounds are actually quite useful despite Nicholas's comments before, but they are also wrong: norm bounds are not the way we should assess security. So I'll also talk a little about ways of moving beyond security with respect to norm balls.

So let me just jump right in. Feel free to interrupt me at any point; we can be a little interactive if it helps people stay awake during the last stretch of a three-hour session.

Okay, let's talk a little background on adversarial attacks. We've all seen this before: you have a picture, you add noise, and it becomes something else. I guess the big question I have as a researcher is whether to use the pig picture or the panda picture when I show this. In honor of being here at MIT, Alex's home turf, I'm going to use the pig picture; if I were at Google, I'd use the panda
picture. But what we're doing fundamentally, and this has also been mentioned already, is moving from the notion of loss as something we evaluate on test points to loss as something we evaluate in the worst case over some region. And to be clear, this is what I think we should mean by robustness. There's some recent work on things like average-case robustness, but that's not actual robustness; if you talk about average-case quantities, that's just an expectation. Robustness, pretty much by definition, and this goes back to robust optimization and robust control, is explicitly about the worst-case performance you get over some region. So in the case of classification, the robust loss is the worst case, over some perturbation in some region, of the loss of our function applied to that perturbed point. That's what I mean by robust loss, and I think that's what everyone should mean by robust loss, though I know there's some debate there.

I should also mention that I'm defining the perturbation as additive here. It doesn't have to be additive; it could be multiplicative or something else. But additive is a pretty standard way of doing things. Of course, the big question is what you allow in terms of your perturbation region; we'll get back to that in a second. For the first two-thirds of the talk, I'm going to assume we're talking about things like the norm balls we're all familiar with.

There's been a lot of work on this topic. I think its current resurgence is due to some work in deep learning in particular, but also to work in other contexts by Biggio et al. But this discussion goes back a really long time; it's not a new idea. The same ideas came up in robust optimization in the seventies, and the idea of max-margin classification is really related to robustness: a max-margin classifier is in some sense the most robust classifier, at least in the linear case, in terms of L2 norms. So these are not new ideas. But I do believe that their instantiation in deep learning raises new problems, because we have a pretty good handle, at least mathematically, on robustness when it comes to linear models, via things like linear programming, but for a long time we really didn't have a handle on it for deep models. That's what I want to focus on in this talk: the complications that arise with deep learning.

Now, before I get into that, and this is a point everyone has made today, it is worth taking the time to ask whether we should really care about these things, whether they really matter. I think so. And it's a good question, because you probably don't have an adversary attacking your classifier who can arbitrarily change pixels, but only up to a certain L-infinity ball. That's a pretty unrealistic threat model.
You're not actually going to encounter it in practice, so the question is whether we should really care about these things in practice, and I would still say yes, for two reasons. The first is that there are physical attacks: these things do generalize to the real world, and while those attacks are not in the space of L-infinity perturbations, they are still things that can be reasonably described by some specific threat model, and they do actually affect the performance of classifiers in the real world. You can attack stop signs, and we have some work on fooling object detectors: you can put a sign in an image, away from the objects, and it will make all the detected objects in that scene vanish, for models like YOLO and so on. There's a lot you can do; you could publish infinite papers on more and more attacks in more and more settings.

But the other point, which I think is really important and is the reason I like to think about this, gets at what Alex was talking about earlier: these examples bring to light the fact that whatever deep networks are learning, it isn't what we think they're learning. We think of a dog as a dog because it has certain features, and whatever the networks are learning, it's not that, or I should say it's certainly not just that. They're learning other things as well, as these examples make very clear, and that raises interesting questions about what deep classifiers are really learning. The study of how you build classifiers that can't learn the things we don't want them to learn is a valid and important direction of research.

Okay, so what's our problem? The problem I'm going to focus on for now is how we go about building a robust classifier, and really a provably robust classifier, but I'll get to what I mean by that in a second. We don't want the pig to be an airliner anymore; we want it to be a pig again. The idea here is very simple. In normal classification, we minimize the expected loss of our classifier evaluated on examples drawn from some distribution. The whole point of adversarial robustness is that we now evaluate the performance of our classifier on a different loss function: the worst-case loss over some perturbation region around each example. So the obvious thing to do is to change the training problem, the minimization over our network parameters theta, from the standard formulation to a formulation where we directly minimize the thing we care about: the worst-case loss. This is exactly the strategy we want to adopt.
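Written out (notation mine, not verbatim from the slides), standard training and its robust counterpart are:

```latex
\min_\theta \; \mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\,\ell(f_\theta(x),\,y)\,\big]
\qquad\longrightarrow\qquad
\min_\theta \; \mathbb{E}_{(x,y)\sim\mathcal{D}}\Big[\,\max_{\|\delta\|\le\epsilon}\;\ell(f_\theta(x+\delta),\,y)\,\Big]
```

The only change is the inner maximization: the robust loss from a moment ago now sits inside the expectation.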
We want to do actual robust optimization over our classifier to get the best possible robust performance; that should of course be our objective. And the funny thing, honestly, is that it took so long to arrive at this. People tried for so long to do little hacks to approximate this instead of just trying to solve it directly, which is what we should have been doing from the beginning. To be clear, adversarial training was doing this from the beginning, and it was very clear that's what it was doing; then we took a detour for a long time and tried a bunch of other stuff that was not this, I think very foolishly.

Okay, but that's easy to say. How do you actually train this thing? How do you minimize a worst-case loss? That isn't so obvious anymore: how do you get gradients through that? In this setting there are really two ways you can do it, and this is the distinction I make between the more heuristic ways, which are still disciplined, and the provable ways.

The first is the first proposed approach, which is also kind of the obvious thing to do: adversarial training. What you do is take gradients not of the standard loss but of the robust loss: in an inner loop you try to find the worst-case perturbation, and then you take gradient steps on your model parameters at that worst-case point. It turns out that having an inner loop that first finds the worst-case loss, and then taking gradient steps over your model parameters at that worst-case point, is exactly gradient descent on this robust loss, at least to the extent that you can actually solve the inner maximization exactly. This was in fact the first proposed strategy: you take these adversarial examples and put them into the training set.
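As a concrete sketch of that inner/outer loop, here is a minimal PyTorch version for an L-infinity threat model. The architecture, epsilon, step size, and step count are illustrative choices of mine, not the exact setup from any of the papers mentioned.

```python
import torch
import torch.nn as nn

def pgd_attack(model, x, y, epsilon, alpha, steps):
    """Inner maximization: approximately find the worst-case perturbation."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = nn.functional.cross_entropy(model(x + delta), y)
        loss.backward()
        # Gradient *ascent* on the loss, then project back onto the eps-ball.
        delta.data = (delta + alpha * delta.grad.sign()).clamp(-epsilon, epsilon)
        delta.grad.zero_()
    return delta.detach()

def adversarial_training_step(model, opt, x, y,
                              epsilon=8/255, alpha=2/255, steps=7):
    """Outer minimization: a gradient step on the loss at the worst-case point."""
    delta = pgd_attack(model, x, y, epsilon, alpha, steps)
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(x + delta), y)
    loss.backward()
    opt.step()
    return loss.item()

# Toy usage on random data, just to show the shapes involved.
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.rand(32, 1, 28, 28), torch.randint(0, 10, (32,))
print(adversarial_training_step(model, opt, x, y))
```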
That's functionally exactly what's going on here. Then there was follow-on work: Alexey Kurakin had follow-on work doing this with an iterative method, and Aleksander Madry had work looking at it a bit more in terms of how this really is just gradient descent on the robust loss. And I should say one thing: that method actually does work empirically. This is not a heuristic defense like the ones we normally think about, the things that get broken two weeks after they're released by Nicholas Carlini and his friends. This is a reasonable strategy that, if you do it properly, empirically seems to work quite well, and it still gives the best empirically robust models we can come up with.

But we can never be quite sure, right? We don't actually know what these models' true robust performance is. They seem good, but they get broken down little by little. So, for example, people put up a model on CIFAR that gets something like 46 percent robust accuracy, and it's been chipped away at since then, to 44, 43, maybe lower by now. We don't quite know where the limits are with these empirical methods; all we can do is try the best attacks we can find against them, and that's all we can really say.

So the alternative: what if we want some way of getting provable bounds, provable certificates? These two words are used interchangeably, by myself and by others; there's no difference between "certified" and "provable," and I'm going to use them as synonyms, in this context at least. Can we get bounds where we know, without any doubt, that if we take our input point and perturb it anywhere in some region around the original data point, we will not be able to change its class value? In other words, no adversarial attack within that range would succeed at flipping the class label. And of course this is also symmetric: if we classify a point a certain way, even without knowing the true class, and we see that no perturbation in a certain region changes the classification, then we know the point can't itself be an already adversarially perturbed example, because then the clean point would be on the other side of the boundary, within perturbation distance of the current point.

If we can do this, we get very strong guarantees about whether points can or cannot be potentially adversarial. Again, this is all subject to the constraints of the threat model, to how we allow the attacker to perturb the points. But I think this is the best strategy we have for getting not just empirical evidence but an actual guarantee, for being certain that we have models that cannot be fooled. And that's what I'm going to talk about. Now, how do we do that? Because that seems hard, right?
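Before diving in, here is one way to write down formally what a certificate asserts (my notation): for an input x with predicted class y, we want a proof that

```latex
\max_{\|\delta\|\le\epsilon}\;\max_{y'\ne y}\;
\Big( f_\theta(x+\delta)_{y'} - f_\theta(x+\delta)_{y} \Big) \;<\; 0,
```

i.e., nowhere in the allowed ball does any other class's logit overtake the predicted one. Both methods below construct tractable upper bounds on that max; if the upper bound is negative, the point is certified.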
So here's the agenda, and it covers both approaches, which attack the problem in very different ways. We're going to talk about two different techniques: one via convex relaxations and one via randomized smoothing, and these are the two dominant approaches to getting provable bounds. There are actually several different perspectives on these convex relaxations; you can derive them in multiple ways, and multiple groups arrived at essentially equivalent formulations independently. But this technique, and then later the randomized smoothing technique, really are the dominant ways of doing this. What they both do is form a guaranteed upper bound on the robust loss: a way of saying that you cannot increase the loss by more than a certain amount, which effectively tells you that you really are not able to change the class label for a given classification. Doing so also lets you, in some cases, train a classifier to minimize this upper bound directly, though that part is not quite the same for the two approaches; I'll get to that in a second.

Okay, so let me talk about the first technique we started working on to address this problem, which is using convex relaxations. (I think I started around 3:40, is that right? 3:35, maybe. All right, I'll try to keep on time.) I'll go a little quicker now, because I do want to get through both techniques and have some time at the end, but I want to give the high-level idea of how these things work; we're not going to get into the details of the math.

The first question I want to ask is: why is this hard? What makes it hard to actually train a robust classifier? What makes it hard is the nature of decision boundaries and perturbation regions for deep classifiers. If I take a nice, say, L-infinity ball in my input space and pipe it through a deep classifier, what comes out the other side is a nasty, non-convex, combinatorially complex region of reachable points, the set of outputs this deep network can produce on that ball. It's non-convex because the network is nonlinear, so it applies a nonlinear transformation to our nice convex set, destroying all its nice properties.

So the strategy we're going to adopt is to form convex outer bounds around these reachable regions; that's why we call it a convex relaxation technique. We're relaxing the reachable regions, and we're going to find the worst-case point in those outer regions. If we can do that, if we know, for example, that no point in the outer region crosses the decision boundary, then we know we're safe: no real reachable point could have crossed it either. So that's the technique here.
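In symbols (again my notation): the true reachable set and its convex surrogate are

```latex
\mathcal{Z}_\epsilon(x) \;=\; \big\{\, f_\theta(x+\delta) \;:\; \|\delta\|_\infty \le \epsilon \,\big\}
\;\subseteq\; \tilde{\mathcal{Z}}_\epsilon(x)\;\;\text{(convex)},
```

so any worst case computed over the outer set automatically upper-bounds the worst case over the true set.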
We're going to outer-bound these reachable regions, and there are actually multiple ways you can form outer bounds, including simple things like interval bounds, which also work reasonably well in a lot of cases.

What is the attraction of convexity? The attraction is that if the region I'm outer-bounding with is convex, I can find the worst-case point in it efficiently. The problem with the true regions is that they're combinatorially complex: finding the point in this red reachable region closest to the decision boundary, for example, is a mixed integer program with a number of variables equal to the number of activations in the network. That's just not solvable at scale. If you have a convex region, finding the worst-case point in it is tractable, by some definition of tractable; we'll get into how we actually do it in practice. Convexity also has some really nice features involving duality, which is what we'll actually use to get these bounds. That's the high-level rationale for convexity.

Okay, so I've painted a nice picture; the obvious question is how to actually find these bounds. What do they correspond to, and what do I do to compute them? I'm going to take you through it in three steps. The first idea, which is actually the key one, is also the simplest, and it's the following picture. We're going to assume these networks are all built with ReLUs as the nonlinearity; it's easy to extend to other cases, where the same or a similar picture applies. ReLU networks are already pretty powerful, and lots of networks just use ReLUs, so that's pretty good. Let's further assume, and this is actually not at all obvious, that for a certain activation, where I'm calling z-hat the pre-activation before the ReLU is applied and z the post-activation, we somehow know that z-hat is bounded between some lower bound l and some upper bound u. Assume you know that somehow; where those bounds come from, we'll get to in a second.
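For reference, here is where this assumption is headed; the picture coming up corresponds to the standard "triangle" relaxation, written in my notation for the interesting case l < 0 < u (if l >= 0 or u <= 0, the ReLU is simply linear on the interval):

```latex
z \;\ge\; 0, \qquad z \;\ge\; \hat z, \qquad z \;\le\; \frac{u\,(\hat z - l)}{u - l}.
```

These three linear inequalities describe exactly the convex hull of the ReLU graph over [l, u].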
So suppose you know those bounds. What the ReLU activation says is that the pre- and post-activations have to lie on this line: the post-activation is the ReLU of the pre-activation. That is of course a non-convex constraint, because requiring a point to lie on two joined line segments like this is non-convex. All we're going to do is relax that constraint: we'll say the pre- and post-activations now have to lie in the convex hull of this set, which is just a triangle, just three linear inequalities.

And here's what's really nice about this. We are genuinely relaxing things: in the relaxed network the pre-activation could be, say, negative one while the post-activation is still positive, so we're allowing more possible values of the hidden units than the network can actually achieve. But in doing so we make the problem convex, because finding the worst-case point, say the point closest to the decision boundary, in this relaxed setting is now just a linear program. It's a linear program because you're minimizing some linear direction on your last-layer hidden units subject to a bunch of constraints, and the constraints are these: the first-layer inputs have to lie within some epsilon ball of the initial point; the linear parts of each layer give linear equalities relating one layer to the next; and the ReLU relaxations give the triangle inequalities. Together, that is a linear program, so finding the worst-case perturbation is now tractable, at least as far as the theory people are concerned: polynomial time by solving a linear program. That's great news, and that's idea number one.

But there are a couple of problems here. First, this linear program has a number of variables equal to the number of hidden units in the network, and you have to solve it once just to make a single prediction, or, in the case of training, to get a single gradient on a single example. That's not going to scale: solving linear programs is cubic or worse in the number of hidden units, so you'd never actually solve anything reasonable this way. So what we do instead is use a trick from duality, and I won't get into much detail here, because this is where it gets into the weeds. The point is that every linear program has an associated problem called its dual, and the dual has the nice property that any dual feasible solution gives a bound on the optimal value of the original linear program, a bound in the right direction. So in some sense, when we look at the dual of this linear program, we're constructing an outer bound on our original convex outer bound: one outer bound from relaxing the ReLUs, another from using a dual feasible point instead of solving exactly. That seems like we're getting pretty loose now, but the dual problem has a sort of amazing property; it makes sense once you think about it, but it was amazing the first time we saw it.
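The duality fact being leaned on is just weak duality, stated here in generic LP notation (mine):

```latex
\text{primal:}\;\; \min_{z}\; c^\top z \;\;\text{s.t.}\;\; Az \ge b
\qquad\qquad
\text{dual:}\;\; \max_{\nu \ge 0}\; b^\top \nu \;\;\text{s.t.}\;\; A^\top \nu = c,
```

and any dual-feasible nu satisfies b^T nu <= c^T z* for the primal optimum z*. So feasibility alone already yields a certified bound, without solving anything to optimality.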
What's amazing is that it turns out you can construct a dual feasible solution to this linear program with a single backward pass through the network. This is totally wild; there's no obvious reason this should happen. But what it means is that for the cost of a single backprop pass, if I know my upper and lower bounds, and that's a big if, I can construct a guaranteed bound on the worst-case solution of this linear program; I get a bound on whether or not I'm capable of flipping the label. And to be clear, we are not advocating actually solving the dual problem. All we're saying is: use a single backpropagation pass through the network to come up with a dual feasible solution, which in turn gives a certified bound on the objective value of that linear program, which in turn gives a certified bound on whether or not a label can be flipped. To be clear, this is an outer bound, and it's definitely loose: there will be cases where no adversarial example exists but all we can say is that there might be one. But it is a guaranteed bound.

Okay, the last thing. I've been skirting one issue this whole time: I said "assume we have upper and lower bounds for each activation," and that's not actually a minor point, because those bounds dictate how tight the relaxation is. Let's go back to the triangle picture for a second. You can imagine that if our lower bound is all the way out here and our upper bound is all the way out there, then I could take a really, really negative pre-activation and still have an arbitrarily large post-activation.
That's a very big violation of the ReLU constraint. The looser these bounds are, the looser our whole approximation is, so we want the tightest possible bounds, the tightest constraints on our relaxation.

Fortunately, if you think about it for a second, we already essentially know how to do this, or at least approximate it pretty well, because finding the lowest possible value an activation can take on is exactly the same kind of problem as solving another adversarial robustness problem, just run up to an intermediate layer of the network. What I mean is: if I want to know how big a given unit can get, I can use the same relaxation of all the layers up to that point and just maximize the value of that unit. That's another linear program. Equivalently, I can minimize the value of that one activation with another linear program. So in theory, by solving a sequence of linear programs layer by layer, first the bounds for layer one, then layer two, then layer three, and so on, I can construct the tightest bounds this relaxation offers for all the intermediate activations.

To be clear again, doing all of that with actual linear programs is not tractable. But you can use the same duality tricks to solve these in a relatively efficient way, such that in the worst case it amounts to work that is quadratic in the total number of hidden units in the network. That's a huge improvement over solving an LP for each unit, but it's still not great: it's quadratic, where essentially every other algorithm you run on a network is linear. We can fix that too, with some tricks that bring it down to linear, which I'll talk about in a second.

Before that, let me wrap this all together, because I got a bit into the weeds. The high-level idea is this. Normally, to train a neural network, you minimize the sum over data points of a loss between your prediction function and the true label; that's the standard way to train. What we're advocating is to minimize the worst-case loss instead. That's hard to compute, so we upper-bound it with another function. It's a complicated function, to be clear: you iteratively compute upper and lower bounds layer by layer, then do one more backprop pass to get the final bound. But all of that can be done, and importantly, all of it can be coded up in an automatic differentiation toolkit, so in the end what you have is just a different function. It does all these backward and forward passes internally, but once you've written it, it is a guaranteed upper bound on the worst-case loss the classifier could suffer, and because everything involved is reasonably smooth, it's also differentiable.
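To give a flavor of what "bound propagation coded in an autodiff toolkit" looks like, here is a minimal sketch using the simplest relaxation mentioned earlier, plain interval bounds, rather than the dual LP bound itself. The network and sizes are illustrative.

```python
import torch

def interval_linear(l, u, W, b):
    """Propagate elementwise bounds l <= z <= u through z' = W z + b."""
    center, radius = (u + l) / 2, (u - l) / 2
    new_center = center @ W.T + b
    new_radius = radius @ W.abs().T  # |W| maps an elementwise radius forward
    return new_center - new_radius, new_center + new_radius

def interval_relu(l, u):
    """ReLU is monotone, so the bounds just pass through it."""
    return l.clamp(min=0), u.clamp(min=0)

# Input bounds: an eps-ball around x in the L-infinity norm, clipped to [0, 1].
x, eps = torch.rand(1, 784), 0.1
l, u = (x - eps).clamp(0, 1), (x + eps).clamp(0, 1)
W1, b1 = torch.randn(256, 784) * 0.05, torch.zeros(256)
W2, b2 = torch.randn(10, 256) * 0.05, torch.zeros(10)
l, u = interval_relu(*interval_linear(l, u, W1, b1))
l, u = interval_linear(l, u, W2, b2)  # l, u now bound every output logit
```

Since every operation here is differentiable in the weights, a loss built from these bounds can be minimized with ordinary backpropagation, which is exactly the "train against the bound" recipe.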
So we can just minimize this upper bound on the worst-case loss and train a robust network in this manner, and once it's trained, we can of course use the same technique to verify that a point's label cannot be changed once we've made a prediction. That's the basic idea of these convex relaxation approaches to verification.

All right, let me show a few examples. Here's a simple toy example. On the left I have red and blue dots for two classes; there's no training or testing distinction here, just some examples. If I train a normal neural network, I get this decision boundary, and it has adversarial examples: here's a red dot where a portion of the region around it is classified as blue. That's an adversarial example for this classifier, and of course there is one, because the classifier knows nothing about these boxes; it only knows about the points, so of course such examples exist. But the point I want to make is that it's very possible to get this other classifier instead, which avoids all of them, and this is exactly the classifier we learn when we minimize our robust upper bound instead of the normal training loss. Importantly, we're still training a nonlinear classifier; we're not losing the power of nonlinearity. We're training one that is certifiably robust on this data set.

We can do similar things at small scale on things like MNIST; we can scale further now, but this was the scale back when we first did it. This is with an L-infinity bound of epsilon = 0.1, so each pixel, which ranges over [0, 1], can change by up to 0.1: a relatively small perturbation region, but that's what we could certify. If I train a normal model, I get pretty good standard performance but no bound whatsoever, and in practice attacks can actually drive such a model pretty close to 100 percent error. If I train a robust linear classifier instead, and I didn't really go over this, but there are closed-form solutions for robust linear classifiers, so it's totally possible to train them, very much like an SVM, then the problem is that by throwing out nonlinearity you throw out the power of the neural network, and the classifier has much, much worse accuracy. Whereas with our approach we can get the same level of accuracy as the normal classifier. To be clear, that's not common: for datasets more complex than MNIST, you do suffer a big penalty on standard performance. But importantly, I can guarantee that no matter how many papers are written on new attacks, new methods, and new clever ways to fool classifiers, no one is ever going to get more than 3.7 percent error
No one is ever going to get more than three point seven error point percent error on Our classifier on the M this test set There's a pretty nice guarantee to have Especially given you know how often non-guaranteed methods were able to be broken In in recent history Okay, so and this is great, but what about going a little bit beyond that You know that the key bottleneck here was that these bounds to commit them exactly it is quadratic in the number of Hidden units in the network, which is just not something you can really do You can't have million hidden networks as is common for bigger classifiers so what we do instead in the case of of Larger networks and With more hidden units is we actually can use a technique based upon Random projections and you wouldn't necessarily use this you might not want to use this at test time I mean you can use it at test time and you get high gear high high probability balance But or you can just compute the exact on the test time the problem here The real bottleneck with these methods though is the training training them is very very slow Right because you have to compute a lot of evaluations and gradients when you're training them So during training time we actually can use an Approximation that where where you can use things like the the bottleneck ends up being actually you have to to compute these Bound to get sort of feed through identity matrix through the whole network and then compute row wise or sorry row wise L1 norms of the output here But it turns out as a really nice estimator for this function has to do with with a median estimator Based upon kosher random matrices so rather than feeding through a identity matrix and taking L1 norms You feed through a random skinny kosher matrix And take medians at the last layer and that's an unbiased estimator of the L1 norm of this quantity And so by doing that we can reduce our time from being quadratic to again being linear to compute these bounds and It's about I mean you have to still have a random prediction matrix So it's not exactly the same but it's maybe 50 times more and I'm not too worried about 50 times worse when it comes to things like like C-far and things like this, you know, we can definitely train large networks now And we can also handle other things like like residual connections, and this is some of some work out of I don't know Why I had this one citation here, and then not others Now the the point and this is really both a computation and memory Savor here because it's hitting both computation and and memory because you don't have to store all these There's a lot of term to store otherwise. You don't have to store here So with doing this you can have now things to scale the C-far we get, you know reasonable Well by some definition reasonable maybe error bounds on provable error bounds on on the robust bound but the point I now want to highlight is These are the results we get on C-far at M-ness both for larger perturbation sizes and you know going from M It's the C-far so I can guarantee for example that with a Predivation allowance of 255 with a resident architecture. There was no way to get an error more than 46% and the standard error without you know of this architecture without any Predivations is 31% now the point I want to make there is is That's great But those numbers are not very good. All right. That's pretty bad performance for C-far We're in the you know 6% error on C-far is no problem. 
Getting 6 percent error on CIFAR is no problem these days; you can do it in ten minutes of training, and I guess people have even brought that down to about a minute if you train properly with a lot of tricks. CIFAR is trivial now, and yet we can't certify anything below 46 percent error.

Now, the other point I want to make is that this is not solely a problem of the provable methods. The same is true of our empirically robust models: the best models we know how to train for CIFAR at this level of perturbation still have 28 percent robust error. So there's a big gap, and lots of room to improve, and with the next technique I'll talk about we actually are improving it: for some classes of models, with that caveat, we can get the provable bound pretty close to what we can do empirically. But it's still pretty bad to say that the best we know how to do is 28 percent error on CIFAR with a tiny, tiny perturbation. This is 2/255; you couldn't even see it. That's imperceptible to people. That's not good.

This is also why I come back to what Alex was saying before, that these features may be the only reason we can learn CIFAR at all. CIFAR is really hard if you look at it, right? I can't tell what half those things are; they're little blurry things I have no idea about. How on earth, by looking at just 60,000 of these things, can we really know what a dog is from these little blurry images? It's possible that the only reason we get anywhere close to 6 percent error on CIFAR is precisely because of these imperceptible but generalizable non-robust features. So this could be a fundamental problem with CIFAR in particular. But I don't think that's a great explanation either. Maybe CIFAR is a bad example because it's so blurry and so weird, but ImageNet is similar, and we know there is a visual system, namely our brain, that can distinguish these classes very easily and still sees things the same way when you add a tiny bit of noise. So I don't really buy this argument, and I certainly don't buy the argument that you can't do this at all. Yes, of course you can construct settings where adversarial examples must exist, period. But with threat models like this, where I can change only a tiny amount of the intensity of each channel, we are not close to the regime Nicholas was talking about, where you're actually changing the semantic content of the image. We're nowhere close.
These images are the same images to us; we see exactly the same thing, and somehow we are unable to build a classifier that can distinguish between the two. I think that is a big problem, and a big question we should answer if we really want to claim we understand what these networks are doing. That statement, by the way, applies equally well to the next set of methods; I probably should have saved it for the end of the talk, but I can't even show these numbers without emphasizing it.

Okay, with that said, let me mention, maybe a bit more briefly since I'm trying to finish by 4:30, the other strategy we use to verify these things, which really is the other dominant strategy these days. Alongside convex relaxation there's this relatively more recent approach of randomized smoothing. It was first investigated in the context of differential privacy by a team from Columbia, and I think our work codified it a bit and gave tighter bounds for what you can achieve, but the basic idea has been around for a little while.

Randomized smoothing is a very different way of getting bounds on a classifier. Whereas before we were trying, for a fixed classifier, to bound its reachable region, randomized smoothing does something else. The idea is the following. When we think intuitively about adversarial examples, we think of them as regions of incorrect class that jut up close to our points. We have a point, and if I randomly sample noise around that point, everything works fine; you can use a pretty big radius and still do perfectly well. But somehow there's a really close-by point with a different class, and you can find it by searching adversarially. That means some region of the wrong class sticks out really close to our current point in input space.

So the obvious diagnosis is that our decision boundary is just too sharp, too weirdly cut up by this nonlinearity we're using (and to be honest, all the other nonlinearities do the same thing): whatever network we have cuts up the space in a weird way. Essentially, the classifier has a high Lipschitz constant, because you can change the input very little and get a very big difference in the output activations. The way we get around that is by smoothing the classifier post hoc. Instead of classifying a point by what the network says right there, we convolve the prediction with a Gaussian, or in practice, sample a bunch of points from a Gaussian around the input and take the majority vote. That classifier will be a much smoother version of the original, and it won't have the same adversarial example the original had. And this is exactly what we're going to do here.
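In symbols, with f the base classifier and sigma the noise level (standard notation in this line of work):

```latex
g(x) \;=\; \arg\max_{c}\;\; \mathbb{P}_{\varepsilon \sim \mathcal{N}(0,\,\sigma^2 I)}\big[\, f(x+\varepsilon) = c \,\big].
```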
So let's talk a little more about that. We're going to smooth the boundary by predicting not with the base classifier f but with a classifier g, which outputs the most likely class under Gaussian noise added to the input point. Importantly, this noise is not adversarial, and we are not somehow trying to hit the actual adversarial point by random sampling. No, we're trying to get a smoother notion of what the majority class is in that region, and that smooths the underlying classifier. This was proposed in a lot of contexts as a heuristic defense, and it worked okay, though people tended to overclaim what it could actually do. Then some work preceding ours demonstrated that this can actually give provable bounds, and our analysis, which I think brought this to the forefront to a certain extent, simplified things from a more probabilistic perspective and also gave tighter bounds. In fact, the bounds we have are tight: they're the tightest bounds you can give, because they hold with equality for a linear classifier, so there is some classifier that achieves our bounds exactly.

Just to give the intuition again: what does this look like? Instead of classifying, and now I'm back to the panda example, a switch of context entirely: if I want to classify this panda, I don't just classify the panda. I add a bunch of random noise to the panda and ask, what are all these images? I take the most likely class and predict that instead. One note, which I'll come back to later: this requires that our base classifier can classify noisy images, not just clean ones, so we end up training the base classifier on these noisy images rather than on the originals. That's an important point, but if you do that, these things work quite well.

Okay, so let me give a statement of the kind of guarantee we get from this randomized smoothing procedure. I'm going to focus on the binary case. Given some x, let y-hat = g(x) be the prediction of the smoothed classifier at that point, the most probable class under this Gaussian smoothing. And let p be the associated probability of that class under the smoothing distribution, which has to be greater than one half, otherwise it would be the other class; so when I add this Gaussian noise, p is how often that class comes up. The guarantee then says the following, and importantly it's a guarantee about the smoothed classifier, not about the underlying f: if I perturb x by some delta, the perturbed point is guaranteed to get the same class
for any delta whose L2 norm is less than sigma times Phi-inverse of p. In other words, we are robust to all such perturbations. Here sigma is the standard deviation of the Gaussian noise we're adding, and Phi-inverse is the inverse CDF of the standard normal distribution, the function that looks like this. At p = 0.5, the class is basically a coin toss, we don't know what it is, and you can certify nothing, because any little change could change the prediction. But as p increases, in other words, as more and more of the space around the point looks like the majority class, you can certify a larger and larger region, up to the point where, as p approaches one, you can certify as big a region as you want. That only works because the Gaussian has infinite support: if you could do this exactly and always got the correct class, the only way that could happen is if the entire space were that one class. You're not going to get that in practice, but as p gets higher, you can certify a larger and larger radius around the point. So as the empirical probability of the majority class for the smoothed classifier increases (a lot of qualifications there, but hopefully that all made sense), you're able to certify a larger and larger radius, and not just "we think it's robust there," but an actual guarantee that the smoothed classifier will not change its prediction within that radius.

In the interest of time, I'm not going to give the proof. It's a quick proof, and I've given a talk at the Simons Institute with these same slides, so if you want to find that video, you can go through it there. I do want to get to my last point and show some results here.

One thing to stress: you cannot prove anything about the original classifier this way; this only gives results about the smoothed classifier. So the process is that we replace our classifier with this randomized, smoothed classifier, and that is our new classifier, period. That's what we deploy, and that's what we get guarantees on.

Okay, let me give a few caveats and some fine print. First, as I just said, this tells us nothing about f itself. That's actually very hard to do, and it's why we have the other kinds of bounds if you want them. But a randomized version is a good classifier too; why should we restrict ourselves to non-randomized classifiers? I don't know of a good reason. Second, in practice you cannot usually compute the exact probability p; you use Monte Carlo sampling, of course. But that's actually fine: you can get a very high-probability bound on the certification, arbitrarily high, really, because the number of samples you need grows very slowly with the failure probability. And to be clear, when I say high probability, I don't mean a probability an adversary could somehow exploit; I mean high probability over my own internal sampling randomness, which is independent of the adversary.
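Here is a minimal sketch of that Monte Carlo certification. It's simplified relative to the actual procedure in the randomized smoothing paper, which uses separate sample sets to select the top class and to bound its probability; `base_classifier`, the sample count, and the Clopper-Pearson bound are illustrative choices.

```python
import numpy as np
from scipy.stats import norm
from statsmodels.stats.proportion import proportion_confint

def certify(base_classifier, x, sigma, n=1000, alpha=0.001):
    """Return (predicted class, certified L2 radius), valid w.p. >= 1 - alpha."""
    noisy = x[None, :] + sigma * np.random.randn(n, x.shape[0])
    votes = np.bincount([base_classifier(z) for z in noisy])
    top = int(votes.argmax())
    # One-sided high-probability *lower* bound on p = P[f(x + noise) = top].
    p_lower, _ = proportion_confint(votes[top], n, alpha=2 * alpha, method="beta")
    if p_lower <= 0.5:
        return None, 0.0             # abstain: can't certify anything here
    return top, sigma * norm.ppf(p_lower)
```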
So it's a meaningful guarantee. But I also want to temper this: it all sounds good, and that picture of the panda seems really impressive, because we were adding a lot of Gaussian noise. The point I want to make, though, is that we are certifying a tiny radius compared to the noise we are adding to that image. The reason is that we add independent Gaussian noise to each pixel, but we certify an L2 radius that scales like the per-pixel standard deviation of that noise. In other words, the size of the radius I'm bounding in a certified sense is a factor of square root of d smaller than the total amount of noise we're adding, where d is the number of pixels. So this is a tiny radius: we're talking about certifying radii of one or two total pixel values in L2 norm. We can change one pixel completely, or two, or spread that change out over the image, but these are tiny perturbations. We're still squarely in the regime where the differences are imperceptible; I'm not at all worried here about the semantic-change issue from before, because we're not even close to being able to change the image semantically with these perturbations. We're nowhere near that, so let's be realistic about what we're doing: we're still working on a problem where the differences are imperceptible.

So this figure shows the kind of guarantees we get. It shows, for each radius, the accuracy we are able to certify. On CIFAR, if the adversary is allowed to perturb up to one normalized pixel of L2 budget, I can certify that no attack will be able to push accuracy below this value; if the budget is two pixels, I can guarantee that no attack pushes accuracy below that value. We can do a similar thing for ImageNet; I have some of that on the next slide too. The key thing here is the role of sigma, the noise you add to the image. For lower sigma you get higher accuracy, because you're adding less noise, but you have less robustness: the certified accuracy falls off more quickly. The more noise you add, the longer the tail of robustness, you can certify larger radii, but you're less accurate at the beginning. And that makes sense, because we're adding noise to the image when we ask the classifier to label it. This is a starfish, right? You can see it there.
That's its class, but when I add noise with sigma equal to one pixel, you can't see a starfish anymore, and then it doesn't matter how good the classifier is; it's not going to recover that performance. So there is, as always, a tradeoff between accuracy and robustness, and with robustness like this you are going to sacrifice some accuracy.

Okay, I have eight minutes left; let me talk about the challenges. The first: I keep coming back to this, because I think people see pictures like these, hear "60 percent robust performance, getting close to the level of normal, nominal performance," and think these problems are somehow solved. They're not, and we still don't understand what's really going on, because even the state of the art right now, and I mean empirically as well, the best we can do is come up with models that certify or empirically defend against a tiny, tiny perturbation region, one that is for the most part imperceptible to humans. MNIST is the only setting where, to be frank, you can even really see the perturbations we're talking about, and that's because it's so simple; you can basically solve it by binarizing. Everywhere else, the perturbed images are semantically identical to humans, and we still can't even solve that. So the progress is real: a couple of years ago we couldn't talk about any guarantees whatsoever, and now we can. But we're a long way from the level of robustness we actually want, the level we as humans just take for granted. We know what we're capable of: we know that if someone changes a few pixels, I won't suddenly see an airplane out the window, I won't think this computer is an airplane. Deep networks are nothing like that. They are exploiting whatever these robust or non-robust features I was talking about are, and whether we can build things that don't use them is really an open question.

Number two. Even though I think we have plenty far to go, I want to clarify that I'm not concerned about the norm-ball critique from the standpoint of necessary conditions: any robust model should be able to deal with perturbations of the size we're talking about here under norm balls, period, just to be called a reasonably robust model. We're not even talking about semantic changes yet. However, we should also think about going beyond norms, not because they aren't necessary, but because they're not even close to being sufficient. There are semantic non-changes, so to speak: transformations with huge differences in norm that change the image almost not at all. An example, which I'll come back to in a second: shifting the pixels of an MNIST digit over by one pixel will flip some pixels entirely, so the L-infinity difference is 1, the maximum possible, yet it's semantically the same image; there's no difference. So how do we encode that?
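To make that concrete, here's a toy version of the one-pixel-shift point, with a synthetic stroke standing in for a digit:

```python
import numpy as np

img = np.zeros((28, 28), dtype=np.float32)
img[10:18, 13:15] = 1.0                  # a crude vertical stroke
shifted = np.roll(img, shift=1, axis=1)  # translate one pixel to the right
print(np.abs(shifted - img).max())       # L-inf distance: 1.0, the maximum possible
```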
The other challenge, now from a more methodological standpoint, is a really strange property you might have noticed about the two approaches I talked about. The work on linear programming bounds was entirely specific to the kind of network: it was about ReLU networks, and we formed linear programs describing ReLU networks in this particular way. For other activations you need other relaxations, other bounds to bound those activations, and the math actually gets pretty complicated pretty quickly. Randomized smoothing, on the other hand, never once used the property that f was a neural network. It's completely oblivious to what the underlying classifier is; anything that outputs a prediction, you can apply randomized smoothing to. So somehow there has to be a middle ground. I believe that if we're going to get bounds tight enough to approach the level of robustness we think these models should have, we need bounds that take into account the structure of the network, and that probably also apply some amount of randomization, because it seems like a really powerful strategy: a randomized classifier seems inherently a little more powerful in some sense, or at least a smoothed classifier seems to be an easier object to deal with than a raw deep network. Finding that middle ground, I think, is the key to moving forward.

And the last thing I'll say, on the point of going beyond norms altogether: we have been making some progress there too. The obvious problem with L-infinity balls, and the L0 and L2 balls we haven't solved yet either, is that there are lots of manipulations of images you'd want to think about that they don't capture, like translation, rotation, or distortions. We have come up with a metric that captures these things pretty well, at least for small deformations, which is the Wasserstein distance in image space. I'm not going to go deep into Wasserstein distance, since I have about two minutes left. Basically, the Wasserstein distance is the solution to a linear program that treats the image's pixels as a distribution: how much mass do I have to transport, and how far, to turn one image into another? So it captures things like shifting and stretching the mass of the image, and it's given by the solution to an LP. What we do is consider perturbation regions that are the set of all images within some small Wasserstein distance of the original image. For this threat model it's genuinely hard to get certificates, so we're not even talking about certified methods here; this is back to empirical robustness, but we're able to get some nice properties. If you look at norm-bounded attacks, the perturbation itself doesn't really capture much meaning in the image; it looks like random junk, like "these pixels might be good to mess around with."
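For reference, the transport LP in question, in standard optimal-transport notation (mine): treating the two images, normalized to sum to one, as distributions mu and nu over pixel locations, with C_ij the cost of moving mass from location i to location j,

```latex
W(\mu,\nu) \;=\; \min_{\Pi \ge 0}\; \sum_{i,j} \Pi_{ij} C_{ij}
\quad\text{s.t.}\quad \sum_j \Pi_{ij} = \mu_i,\;\; \sum_i \Pi_{ij} = \nu_j.
```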
If you look at Wasserstein attacks versus norm-bounded attacks (these earlier ones were l2, sorry, norm-bounded attacks), the norm-bounded perturbation itself doesn't capture much meaning in the image. It is sort of a random thing, just "these pixels might be good to mess around with." For Wasserstein attacks, these are the perturbations that actually get applied: you take this image, apply this perturbation, and you get a bird; you take the car, apply this one, and you get a deer, and so on. It still looks the same to us, there is still no visible difference, but the perturbations themselves correspond much more to semantic features of the image. I think that's the important point: this is somehow a better notion of what has to shift to make things behave differently, which is a similar point to the one Alex made about the nature of robust models. Robust attacks have a similarly nice nature. And we can do adversarial training against this attack and get models that are more robust to it, but I won't go into that.

That is actually all I have for today. Again, I think there is a huge potential for interplay, though they're pretty separate right now, between the ways we build these robust models: we have convex approaches, we have randomized approaches, and as the third leg we have this question of how to go beyond our traditional notion of allowable perturbations, toward something more in line with what we consider semantically equivalent. A lot of the code for all three of those pieces of work is on our lab's page, there are papers on my website, and we have a lot more information there. So thanks very much. Now we can have a well-deserved break, but I guess I'll take a few questions. I don't want to stand between you and food, though, so let's maybe have just one or two.

Audience question: Which models do you use for these? And since you focus mainly on classification models, how do you think this carries over to object detection?

Yeah, that's great. Okay, first, what models we use: basically ResNets, of more or less whatever size we could manage. The convex methods don't scale very well, so there it's a pretty small model, tens of thousands of hidden units at most. For randomized smoothing on ImageNet we use a normal ResNet-50, because smoothing doesn't care at all what the underlying model is; you just use it as is.
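Because the base model really is a black box to it, the whole randomized smoothing procedure fits in a few lines. Here is a minimal sketch in the style of Cohen et al.'s prediction-and-certification method (simplified: their actual procedure uses a separate, smaller batch of samples to pick the top class before bounding its probability, and `classify`, `n`, and `alpha` here are illustrative placeholders):

```python
import numpy as np
from scipy.stats import beta, norm

def smoothed_certify(classify, x, sigma, num_classes, n=10_000, alpha=0.001):
    """Predict with the smoothed classifier g(x) = argmax_c P[f(x + e) = c],
    where e ~ N(0, sigma^2 I), and certify an l2 radius around x.

    `classify` can be ANY function from an input to a class index;
    nothing below depends on it being a neural network."""
    counts = np.zeros(num_classes, dtype=int)
    for _ in range(n):
        counts[classify(x + sigma * np.random.randn(*x.shape))] += 1
    top = int(counts.argmax())
    # One-sided Clopper-Pearson lower confidence bound on P[f(x + e) = top].
    p_lower = beta.ppf(alpha, counts[top], n - counts[top] + 1)
    if p_lower <= 0.5:
        return None  # abstain: cannot certify any radius at this confidence
    # The smoothed prediction provably cannot change within this l2 radius.
    return top, sigma * norm.ppf(p_lower)
```

The fact that `classify` appears only as an opaque function call is exactly the architecture-independence I mentioned; the flip side is that the certificate knows nothing about the network's structure, which is the middle ground we still need to find.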
Object detection, yes, that's a great question. It does become harder, because the loss functions used for object detection are kind of weird: you first propose regions, then classify those regions, and so on. It's a very multi-stage process. So I think it is possible, but frankly we haven't done enough work even on the empirical version of robustness for object detection or segmentation, to the point where I don't think we know what the right loss function is, the thing you would try to bound in a formal way for provable guarantees. It's definitely a good area to think about, and I like thinking about even just attacks in those settings, but we don't yet know what the right loss is, what the objective is, what you're even trying to guarantee. The threat model there is even harder, too: can I change just the object itself? Can I change other parts of the image? What are the allowable perturbations? I suppose if you just go with l-infinity you get something similar to the classification setting, and maybe that's a good starting point, but I think we don't yet know the right criterion to even optimize.

[To a follow-up about pre-trained models:] No, we train them from scratch, because, especially for randomized smoothing, they have to be ResNets that work on those noisy images. You have to train them from scratch; a pre-trained one won't work. Actually, one interesting thing happens with the convex bounds that I didn't really mention: if you take a normally trained network and compute one of those bounds, the bound is always vacuous. Essentially you can always certify only a radius of zero, or maybe 0.0001. The bounds only work when you train the network to minimize that upper bound. Because there are so many levels of relaxation, our bound really requires that the network be trained to minimize the bound itself. That training builds a network that is inherently regularized, and it's actually a very strong regularizer, in a way that makes the certified criterion a good one. That's the rough approach we use, and it is very important: the bounds will be vacuous if you apply them to anything but a classifier trained to minimize that bound.
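To make that last point concrete, here is a minimal sketch of training against a certified upper bound rather than the nominal loss. For brevity it uses simple interval bound propagation through a small fully connected ReLU network as the bound, a looser relative of the talk's LP relaxation; the network shape, `eps`, and the stand-in data are all illustrative assumptions.

```python
import torch
import torch.nn as nn

def interval_bounds(layers, x, eps):
    """Propagate the l-infinity ball [x - eps, x + eps] through
    alternating Linear and ReLU layers, returning elementwise bounds."""
    lo, hi = x - eps, x + eps
    for layer in layers:
        if isinstance(layer, nn.Linear):
            mid, rad = (lo + hi) / 2, (hi - lo) / 2
            center = layer(mid)
            radius = rad @ layer.weight.abs().T
            lo, hi = center - radius, center + radius
        else:  # ReLU is monotone, so bounds pass straight through.
            lo, hi = layer(lo), layer(hi)
    return lo, hi

def robust_loss(layers, x, y, eps):
    """Upper bound on worst-case cross-entropy over the eps-ball:
    lower-bound the true-class logit, upper-bound all the others."""
    lo, hi = interval_bounds(layers, x, eps)
    worst = hi.scatter(1, y.unsqueeze(1), lo.gather(1, y.unsqueeze(1)))
    return nn.functional.cross_entropy(worst, y)

# Train by minimizing the certified bound itself, not the nominal loss;
# on a normally trained network the same bound would be vacuous.
layers = nn.ModuleList([nn.Linear(784, 100), nn.ReLU(), nn.Linear(100, 10)])
opt = torch.optim.Adam(layers.parameters(), lr=1e-3)
x, y = torch.rand(64, 784), torch.randint(0, 10, (64,))  # stand-in data
for step in range(100):
    opt.zero_grad()
    robust_loss(layers, x, y, eps=0.1).backward()
    opt.step()
```

Gradients flow through the bound computation itself, so the optimizer is pushed toward weights for which the relaxation is tight; that is the strong regularization effect described above.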
Audience question: Thank you for the talk, I enjoyed it and I'm learning a lot. My question is: is there a similarity between this kind of robustness and robustness to distribution shift, or is that not correct?

Yeah, that's a great question. So the question was whether there is a similarity between robustness to distribution shift and robustness to adversarial examples at test time. In some sense you really hope so, because a distribution shift could simply shift the examples within the threat model, in which case you'd be fine. So in a formal sense, if your new distribution gives samples that lie entirely within the perturbation regions allowed by the threat model, then yes. But the big caveat is that the distribution shifts we actually see in practice don't behave that way, and these models are every bit as sensitive to them. A great example: Ben Recht and colleagues have recent work where they tried to recreate the CIFAR-10 dataset, mirroring the original collection process as exactly as they could. They find that basically every single classifier loses about five percentage points of accuracy. Amazingly, the rankings are still preserved; it's not a random reshuffling, everything just drops by roughly five percent, so if you were at 94 percent before, you're now at 89 percent. There is evidently some subtle distribution shift in CIFAR, or in ImageNet, maybe in how the images were labeled, or in which images were picked out of the larger pool, or both. But if you train a robust model, you see almost exactly the same five percent drop. So we are not yet gaining robustness to even the very small distributional shifts we see in practice. In other words, we don't know how to quantify the threat model, the allowable perturbation region, for most distribution-shift settings. And that is the key: we would have to describe what kind of distribution shift we're allowing, integrate it into the threat model, and train based on that. Unless we do, these things won't generalize outside their threat model, exactly as you'd expect them not to. So that's the issue.

All right, we are running late. Let's thank the speaker.