Hi, I'm Nicholas, and I'm going to present our paper on cryptanalytic extraction of neural network models. This is joint work with my co-authors Matthew and Elia, who worked on this with me while they were at Google.

The basic question in our paper is this: we have some machine learning model sitting in a black box. We're allowed to query this model with arbitrary inputs, maybe images, and observe the outputs. The parameters of the model have already been loaded into it, but we don't have direct access to them. So the question is: as the adversary, if we can make arbitrary queries, know the type of model being deployed, and can see the model's outputs, but don't know the actual parameters, can we solve for those parameters?

The reason this matters is that machine learning models are very, very expensive to train. The recently released GPT-3 model from OpenAI took roughly ten million dollars to train in order to obtain the parameters loaded into it, while everything else, like the architecture, was fairly easy to specify ahead of time. So it's very valuable for companies to be able to protect the intellectual property in their parameters, and the question, from the adversary's perspective, is: can we steal them? That's the question we're going to try to answer: given query access to a neural network, can we extract its parameters?
Now, there are two ways you can view this problem. The first, which is what most prior work has done, is to study it as a machine learning problem. Neural networks are the types of functions machine learning is good at approximating, so treat the neural network we have access to as a supervisor, use it to label a new dataset, train a new model on that dataset, and use this model to solve the problem. That works reasonably well.

What we're going to do instead is take a different approach and solve this as a direct mathematical problem: treat the neural network as a sequence of functions we can analyze directly, and try to actually recover the weights that way. The reason we do this is that it lets us recover a nearly identical model, instead of just a similar model that solves a similar task reasonably well. So this is our question: given query access, can we extract a neural network? And the answer in our paper is, basically, yes, we can, with various caveats.

Okay. In order to understand our work, I need to give a little bit of background on how neural networks operate; fortunately, about a minute is sufficient. Neural networks are a sequence of linear layers with nonlinear activation functions, and each layer has a bunch of neurons.
The way this works is that some values go into the input; these are maybe the pixels of an image. The neural network propagates these values through, processes them at the neurons, applies some nonlinearity, and repeats this layer after layer until finally we end up with a set of values that give the final output of the model, which is emitted to the outside world as the result of the neural network evaluation.

If I zoom in on one of these neurons to show what's going on: the input neurons have some values, and the way I compute the output neuron's value is basically to take the dot product of those values with the weights. The weights here, a1 and a2, are the parameters we're trying to learn, which we don't have access to; what we can do is provide arbitrary inputs. So all we're doing here is a dot product and a sum, or, if you view the entire layer as one operation, a matrix multiply.

Now, if neural networks were completely linear, this would be very boring, because they would all collapse down to one layer. So we have to introduce one single nonlinearity, and most often this is the ReLU activation function, which is what we study in our paper. The way it works is literally all a rectified linear unit does: take the max of the input and zero. That is the only nonlinearity in the entire neural network. So, zooming back out, we're going to take a dot product, apply this nonlinearity to the value, and do this for every single neuron in the network. And that's all the machine learning you need for the purposes of this talk; everything else is independent of how the model was actually trained.
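As a minimal sketch of the computation just described (in NumPy, with a toy 2-3-1 network and made-up illustrative weights), the whole forward pass is just matrix multiplies interleaved with ReLUs:

```python
import numpy as np

def relu(x):
    # The only nonlinearity: max(input, 0), applied elementwise.
    return np.maximum(x, 0)

def forward(x, layers):
    # Each layer is a (weight matrix, bias vector) pair; a whole layer
    # is one matrix multiply followed by ReLU, except the final layer,
    # which is purely linear.
    for W, b in layers[:-1]:
        x = relu(W @ x + b)
    W, b = layers[-1]
    return W @ x + b

# Toy 2-3-1 network with hypothetical weights, for illustration only.
layers = [
    (np.array([[1.0, -2.0], [0.5, 1.0], [-1.0, 0.0]]),
     np.array([0.0, 0.1, 0.2])),
    (np.array([[1.0, 1.0, -1.0]]), np.array([0.0])),
]
print(forward(np.array([1.0, 0.5]), layers))  # maps [1.0, 0.5] -> [1.1]
```

The attack treats `forward` purely as a black box: we can choose `x` and see the output, and the goal is to recover `layers`.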
So let's talk about extraction. The way we're going to phrase our question is: given oracle query access to a neural network, can we extract the exact model? It turns out there is a proof of impossibility here saying we can't hope to achieve this, because there are multiple neural networks that compute the same function but have different bit representations. So the best we can possibly hope for is what we call functionally equivalent extraction, where we now just ask for any one model that has the same input-output behavior as the model we're trying to steal. It turns out, again, that there's another proof of impossibility saying we can't achieve even this for some pathological neural networks. So what we ask instead is: in the typical case of a neural network trained with stochastic gradient descent, can we achieve functionally equivalent extraction? The main result of our paper is to say that, yes, empirically, we can.

Okay. Now let me give you some background on how previous attacks have approached this direct extraction. There are two papers that have done this, and they both work for the case of one-hidden-layer neural networks. The visual intuition for these attacks goes something like this. A neural network with one hidden layer looks like this; the number of neurons in each layer is arbitrary, but I've put two input neurons here because that lets us draw things nicely on slides, three hidden neurons for simplicity, and one output, without loss of generality. If we draw what the neural network actually looks like on the plane, it looks like this: each point corresponds to a potential input to the neural network, where along the x-axis we vary the first input coordinate and along the y-axis the second. Each of the colored regions corresponds to a linear region of this function; recall that neural networks are piecewise linear functions because of how they're constructed. What you should notice here is not that there are these seven different regions, but that there are really three lines dividing the plane into these regions.

Each of these polytopes has the property that, within it, we are in one particular linear region with respect to the neural network. This first line here might correspond to what we call the critical hyperplane of this middle neuron: if you're above it, you're on the positive side of this neuron, and if you're below it, you're on the negative side. Similarly, each of the other neurons has a positive and a negative side. This means I can label each polytope with the positive or negative sign assignment of each of the three neurons. And if I zoom in on just one polytope, what this means in particular is that, within this one linear region, only the active neurons are actually being applied; the inactive ones essentially don't exist, and we could collapse the network down to a single linear layer. So that's what's going on in these diagrams, and they'll be important later as well.

The basic observation prior work makes is that the locations of these hyperplanes almost completely determine the function the neural network is evaluating. The reason is that, given where these hyperplanes are, we can directly learn the values of the weights of the neural network. So why is that the case?
Let's suppose we had some particular input that lies on this critical hyperplane. If we take a small step along the x-axis and then ask how far we have to travel along the y-axis to get back onto the hyperplane, that lets us compute, essentially, the normal vector to the hyperplane, and it turns out we can use this to directly learn pieces of information about the weights.

How do we do that? Let's look at the neural network again, and suppose we have a value (x, y) on the critical hyperplane, which in particular means this middle neuron is exactly zero. What happens if we go to x plus epsilon? Of course, this changes all of the values in the neural network into something we don't necessarily know. But what we do know is that if we also change y to y plus delta, we can again arrange that this middle neuron is zero. And if we now look at the weights going into this neuron, call them a1 and a2, then we learn that negative epsilon over delta equals a2 over a1, because the only way that both (x, y) and (x + epsilon, y + delta) can make the neuron zero is if this relationship holds.

This is good, but there's something we lose. While we do learn the ratio of these two weights from epsilon and delta, we don't learn the magnitude of the vector; we don't know how big it is. Fortunately, we can push any positive constant through to the next layer and things work themselves out. The real problem is that we lose whether the vector is pointing up or pointing down, and this sign information is a critical piece of information that local queries simply cannot recover. Local information is insufficient to recover a neuron's sign, so we need some other way to recover it, and the way we're going to do it is basically brute force.

All we do is query the neural network at a couple of random points. We know the weights of all the neurons up to sign, because we know the normal directions; we just need to recover the signs. With three neurons in the hidden layer there are eight possible sign assignments (in general, exponentially many in the number of neurons), and for each possible assignment we ask: could the function have taken the values we observed at these points? If yes, we've extracted the signs correctly; if no, we try the next assignment. So we might initially guess that all the signs are positive, positive, positive, check whether that assignment is consistent, and if it isn't, try positive, positive, negative, and repeat until we find one that works. Once we have the first layer extracted with the correct signs, extracting the second layer is trivial, because what remains is just a linear function: we solve for it directly with least squares.

That's how prior work does this, except for one remaining piece: we need to find witnesses to these critical points on the hyperplanes. That is again a fairly simple procedure. We draw a random line through the input space, say from u to v, sweep across this line, and look for discontinuities in the gradient. If, instead of the top-down view, I plot the output of the function as we travel from u to v, we get a plot that looks something like this: the output consists of four linear segments, and the three points at which the gradient is discontinuous directly correspond to crossings of the critical hyperplanes, which lets us recover these points very efficiently.

Okay. So, the main contribution of our paper is three things. First, we show how to extract deep neural networks; the prior papers I showed you handled neural networks with one hidden layer, and we extend this to arbitrary depth. Second, we show how to do this efficiently: another paper that came out around the same time as ours also showed how to extract deep neural networks, and we are roughly a thousand times more query efficient. Third, we do what's called high-fidelity extraction: we can extract neural networks that are identical to the original model up to, essentially, floating point precision.

Because of the limited time in this talk, I can only cover one of these, so I'll cover just the first, and, again because of limited time, I'll only show what happens in the case of two-deep neural networks, that is, networks with just two hidden layers. Putting aside the pieces I'm not going to talk about, there are two parts to our attack: as before, we first recover the weights, and then we recover the signs of the neurons.

A two-deep neural network looks something like this, where now instead of one hidden layer we have two. Again I can show the same kind of diagram, and again the input space is partitioned into polytopes, where each color corresponds to one complete linear region; but now the sign assignments are not for just one set of neurons but two: the neurons on the first layer and the neurons on the second layer. You'll notice the same properties as before: the first-layer neurons appear as straight lines in the input space. But now we also have second-layer neurons, and these are bent a little by the first-layer hyperplanes. If we could visualize things with respect to the inputs to the second layer, each of these would look like a straight hyperplane; but because we're viewing them with respect to the input space, which is distorted by the first linear layer, what we see is a bent hyperplane. It's bent only because the first layer bends it.

Okay, so the first thing we do in our attack is recover the first layer, up to sign, as before. We do that by drawing three (in general, some arbitrary number of) random lines through the input space and sweeping along them to find witnesses to critical points, using the same algorithm as before. Then we try to use this information to figure out not only what the weights are, but which neuron each witness corresponds to, on either the first or the second layer. And we do this through a matching algorithm.
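As an aside, the witness-finding and normal-recovery subroutines being reused here can be sketched concretely. This is an illustrative reconstruction, not the paper's exact code: `f` stands for the scalar-output oracle, and we assume exactly one critical point along the probe segment from `u` to `v`.

```python
import numpy as np

def find_critical_point(f, u, v, eps=1e-4):
    # The oracle restricted to the segment u -> v is piecewise linear.
    # Assuming a single kink, intersect the two boundary lines to
    # locate the gradient discontinuity exactly.
    g = lambda t: f(u + t * (v - u))
    a0, s0 = g(0.0), (g(eps) - g(0.0)) / eps          # line at t = 0
    a1, s1 = g(1.0), (g(1.0) - g(1.0 - eps)) / eps    # line at t = 1
    # Solve a0 + s0*t = a1 + s1*(t - 1) for t.
    t = (a1 - s1 - a0) / (s0 - s1)
    return u + t * (v - u)

def weight_direction(f, x, dim, eps=1e-4):
    # At a critical point, the jump between the two one-sided
    # directional derivatives along each axis is proportional to the
    # neuron's weight vector: we get its direction, but neither its
    # magnitude nor its overall sign.
    jump = np.zeros(dim)
    for i in range(dim):
        e = np.zeros(dim)
        e[i] = eps
        jump[i] = (f(x + e) + f(x - e) - 2 * f(x)) / eps
    return jump / np.linalg.norm(jump)
```

For a single-ReLU oracle f(x) = max(0, a·x + b), this recovers a point on the line a·x + b = 0 and the direction of a up to sign, matching the epsilon-over-delta argument from earlier.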
We start by pairing up all of the critical points we found, and for each pair we ask: could these two points be witnesses to the same neuron being at its critical point? For a given pair, we're basically asking whether their normal vectors align in such a way that we could have drawn a straight line through both. For this first pair, the answer is no, because their normals don't align. We can ask the same question for the next pair of critical points, and the next, and keep going until we finally find two points that are witnesses to the same neuron at its critical point. This means, in particular, that the hyperplane is straight across this region, and therefore this is probably a neuron on the first layer. We repeat this for each of the other neurons to identify all the neurons on the first layer, because only first-layer neurons match up in this way. One caveat is that we could spuriously match two witnesses of a second-layer neuron, if we happened to query two points very close to each other on the same bent hyperplane. In practice this occurs with fairly low probability, and if we repeat the process enough times, the spurious matches get filtered out as noise and the correct matches bubble to the top. Okay, so this lets us recover, again, the three lines corresponding to the weights of the first layer.

Now we need to recover the sign for each of these three neurons: is the weight vector pointing up or pointing down? Previously, we could just enumerate a few sign assignments by brute force and check them by trial and error. The problem is we can't do that here, because to check whether a candidate sign assignment is valid, I would have to completely extract the second layer, and extracting the second layer requires queries. So my attack would be exponential not only in time but also in queries, and that's not something we're okay with. We therefore develop a more efficient sign-recovery procedure, efficient in queries though not necessarily in time.

The way we do this is to start with a single point on the hyperplane of a second-layer neuron; we know it's a second-layer neuron because it's not a first-layer neuron, all of which we've already found. Then we trace this hyperplane's path through the input space, following everywhere it goes, to identify the path it takes. Once we've done this, we can make the same style of argument as before. We know the path the hyperplane empirically takes, so we try all possible sign assignments of the first layer and ask which of them could possibly permit this line to exist in exactly this configuration, because the only variables that determine where this line actually sits are the weights from these three first-layer neurons into the neuron being bent. By enumerating all the signs of the previous layer, we can check for a valid solution with an efficient number of queries, even if we still have to do exponential compute. Okay, so the question now is: how do we actually do this hyperplane-following algorithm?
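Before turning to hyperplane following, here is a concrete sketch of the brute-force sign check used in the shallow, one-hidden-layer attack, which the deep attack's sign recovery refines. This is an illustrative reconstruction under simplifying assumptions (zero hidden biases, a scalar output): `f` is the oracle, and `A_hat` holds the recovered first-layer weight rows, each correct only up to a plus-or-minus sign.

```python
import itertools
import numpy as np

def recover_signs(f, A_hat, queries):
    # Enumerate all 2^k sign assignments for the k recovered rows.
    # For each candidate, the remaining output layer is linear, so fit
    # it by least squares and accept the assignment that actually
    # reproduces the oracle on the query points.
    X = np.array(queries)
    ys = np.array([f(x) for x in queries])
    k = A_hat.shape[0]
    for signs in itertools.product([1.0, -1.0], repeat=k):
        A = np.diag(signs) @ A_hat
        H = np.maximum(X @ A.T, 0)                  # hidden activations
        H1 = np.hstack([H, np.ones((len(X), 1))])   # allow an output bias
        coef, *_ = np.linalg.lstsq(H1, ys, rcond=None)
        if np.allclose(H1 @ coef, ys, atol=1e-6):
            return np.array(signs), coef
    raise ValueError("no consistent sign assignment found")
```

Note that checking a candidate costs no extra queries, which is exactly the property that breaks down for deeper networks: there, validating a first-layer sign guess would require extracting the second layer, and hence more queries.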
The hyperplane-following algorithm is, again, fairly simple from an ideas perspective, but in practice it has lots of implementation problems. At some point in time we have a point on the hyperplane, and we need to figure out how to follow it, making sure we stay on the same hyperplane and don't accidentally wander somewhere else. So suppose we have this point here. Following along is fairly easy: we just move so that we stay at dot product zero with the normal. Now we reach a multiple-intersection point, and we need to figure out what to do; ideally, we want to continue in the correct direction. What we must make sure we don't do, when we arrive at this multiple-intersection point, is accidentally continue along a first-layer hyperplane, or go back the way we came. Fortunately, both of those are relatively easy to prevent: we know where all of the first-layer hyperplanes are, so we can simply avoid traveling along them, and we know which direction we just came from, so by process of elimination we can make sure we travel in the correct direction.

At that point we have all of the weights and signs recovered for the first layer, so we can just peel off the first layer and completely rerun our attack exactly as before, starting from the second layer onward.

There are, of course, a couple more details that I won't be able to get into. The two most important are these. First, in practice we have bounded floating point precision, and GPUs like to do all sorts of things that mess with us, so all of our algorithms have to be numerically stable. Second, when extracting deep layers, not all of the hidden states are completely accessible: for example, I can't feed a negative number into a neuron on the second layer, because all first-layer neurons output nonnegative values after the ReLU, so a negative input is simply not possible. This complicates our attack somewhat when extracting deep inner layers of the neural network.

Okay, so, to briefly summarize our results: here is the main table from our paper. On the left we have the architecture, that is, the number of neurons in each layer, and the number of parameters, the total number of weights in the neural network. Then we have the number of queries we need to make to the model, and various ways of measuring how well we extracted it, where lower numbers are better. Compared to prior work that could do extraction for one-hidden-layer neural networks, our attack requires roughly twice as many queries, but has the benefit of being maybe 2^30 times more precise, which in particular means the worst-case error between our local copy and the remote model is at most 2^-30 in most settings. Compared to prior work that could do deep extraction of neural networks with two or more layers, not only are we much more query efficient, we can also extract models far more precisely.

To briefly conclude, there are a couple of takeaways I think are important. The first is that this direct analysis of neural networks is a really useful way of thinking about machine learning.
We don't need to care about the Adam optimizer, or whether we're using RMSProp, or exactly why batch normalization does or does not work; we just need to know that these are mathematical functions, and we can analyze them directly.

The second consequence of our paper is that the field of secure inference maybe isn't so secure. Secure inference is a field that brings secure multi-party computation to neural network evaluation, in order to evaluate f(x) when f is held by one party and x by the other, without either revealing their input to the other. As a result of our attack, revealing the value of f(x) is as good as revealing the function f itself, given enough queries. So the field of secure inference is going to have to take our attacks into account and design mechanisms to prevent them, so that people can't simply query a model, even in this MPC setting, and still learn the parameters.

More broadly, there's a talk by Matthew, the intern who was with us at Google when we did this work, called "Don't Put Neural Networks in Your Ideal Functionalities." The basic idea is that in crypto we like to think of, say, AES as if it were some perfect block cipher, and broadly speaking it is, and we can do that; but neural networks don't fit well into any ideal functionality and are incredibly leaky abstractions. So for the time being, it really is not advisable to try to idealize neural networks in any reasonable way.

With that: on Friday we're going to have a live Q&A at 8 a.m. Pacific. If you're watching this after the fact, in a non-pandemic world, I'd be happy to take any questions over email, and the code to reproduce the experiments in our paper is available online. Thank you very much, and I'd be happy to take questions in either of those two formats.