All right, good evening everyone. This is joint work with my student Teodora, who is in the audience, with Shiqi Shen and Shweta Shinde, and my colleague at NUS, Prateek Saxena. Let me start. Neural networks typically do not need much introduction these days; depending on which news you follow and which of the AI preachers you listen to, it looks like there will be no reason for human existence in the next 10 to 15 years or so. But because we got very excited about these networks, we started applying them in all kinds of interesting scenarios where we humans have been doing a pretty good job, and I think Sanjit mentioned quite a few examples. For example, you want a neural network to recognize the digits on car plates; you make a small change and suddenly what looked like a 0 looks like a 5, and what looked like a 3 becomes an 8. You make a small change to a yield sign, and suddenly it changes to, say, a left-turn sign. This is very concerning, because most people are able to drive without getting fooled by such signs. Then we started applying neural networks in other interesting scenarios, where we would like them to make decisions such as whether we can release prisoners and whether they are going to commit crimes again, and it seems these networks did what was common, mainly in the US, about 100 years ago: you look at the race of a person, and if they are brown or black, you decide they must commit more crimes. So although we have moved on, neural networks seem to have similar issues.
Similarly, you take neural networks for face recognition and you find that all the network was really doing was guessing what kind of glasses you are wearing, and if someone else wears glasses like the ones a celebrity typically wears, the network makes lots of errors. All of this is very concerning, but at the same time, for a long time neural networks were not used in applications where the decisions they made would have long-term consequences. This is one of the first times we are using AI in scenarios where a wrong decision can have a lot of impact, and that requires us to think about how we are going to verify these networks. Of course, we are not the first to look at this problem; there has been a lot of work in the past, and Sanjit talked about many interesting ideas. I do not think I will have time to discuss all the prior work, but broadly, from the verification community, we wanted to approach this problem the way we typically approach verification of software and hardware systems: you have a network, which you think of as a model, you have a property, and you would like to design a system that answers yes or no. Let me make this concrete with one of the applications I mentioned. We want to use the system to predict whether someone is likely to default if we give them a loan. You have a set of features for every person, and then you change one feature; for example, if you change the gender, most of us would agree that changing the gender while keeping everything else the same, including income, should not make someone more likely to default. If the neural network gives a different output in that case, that would be very concerning.
If you look at most systems trained on this very popular dataset, there are lots of what are called adversarial attacks, and they are usually able to find such an example. This usually creates a lot of news in the media, because then suddenly someone says, well, this network makes a wrong decision. At the same time, it is not that humans never make wrong decisions; it is that we want to design systems that do not make wrong decisions often. The existence of one counterexample is a bit surprising, because we expected these networks to be far better than us, but it does not imply that we are going to stop using them. If you find an example and report it to the designers of the system, they say, let me try to fix it; they have different kinds of defenses, and eventually we are going to use systems that are highly accurate, as long as they do not often make such mistakes. So the notion of verification we have stuck with for a long time, where given a property and a model we ask whether there exists an execution of the model that does not satisfy the specification, is a bit outdated for the design of these machine learning systems. If you go to an AI conference, no one ever claims that their system has 100 percent accuracy; it would almost be laughable if someone claimed that. You get very high accuracy, so you expect that these systems will not make mistakes most of the time, but that does not imply that they will never make a mistake. This requires us to look at the problem through a different notion, and that is precisely what we set out to do in this work: ask how we should define the verification problem, and then what techniques we can use to compute the quantity in that definition.
In particular, what we want to know is for how many individuals such a case is going to happen, because if it is only one case where you find a bad instance, you can say, well, I can use this system and be aware of one or two bad cases; but if the network makes such mistakes very often, you would be very concerned about using it. So how do we check in how many cases it happens? One attempt would be to go over all the different feature combinations, changing the value from male to female and varying all the other features, but as you can see, that is quickly not going to work, because we would have to try exponentially many combinations. So we have to move from the notion of "does there exist one such configuration" to counting how many, and as I am going to discuss, this requires new techniques. I am going to present three contributions in this work. This is the first paper we wrote; we have a follow-up paper, and there is a lot of work that still needs to be done, so I would really like to invite you to join us, try the tool, and collaborate with us. First, I will quickly formalize this notion of quantitative verification; then I will discuss a procedure to estimate it correctly; and then I will discuss how we use it for one class of neural networks that is very widely used these days, in particular in cases where energy is of a lot of concern. So the first question is how to quantify this. Let me formalize what I mean: for a given network, let us say RF is the number that says how many times it makes a mistake. Computing this number exactly is usually very hard.
So we would be happy if we could compute this number within some epsilon factor, with confidence at least 1 minus delta. These are two parameters decided by the user: they can say how much confidence they would like and what tolerance they will allow. We would like our estimate to be within a 1 plus epsilon factor, the tolerance, with confidence 1 minus delta; both quantities are specified by the user. As I discussed earlier, trying to quantify with naive techniques is not going to work, because if you really try all these possibilities, there are 2^99 checks, and that is never going to finish; I think Mate has already emphasized that there are very few atoms and too many possibilities, so I am not going to go over that again. Also, if you just use simple statistical techniques, Monte Carlo methods usually cannot give an estimate that is theoretically sound, so we need to go beyond such simple techniques. Monte Carlo is what Sanjit discussed for another class of problems, where the problem turns out to be fairly simple and it might work, but typically Monte Carlo techniques do not scale, which we have seen in lots of other cases. Our approach is to reduce this problem of quantitative verification, getting this estimate, to model counting. The model counting problem is: given a formula, count how many solutions it has. How do we do that? Well, we start with the neural network and a property, and we convert both of them into a formula, which we call the specification; computing our estimate then amounts to counting the number of solutions of this formula. So let us get started, and let me discuss how this encoding works.
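To make the object we are estimating concrete, here is a minimal, purely illustrative Python sketch of exact model counting by enumeration. This is not how NPAQ works; it is exactly the exponential brute force the talk rules out, shown only to pin down what "number of solutions" means:

```python
from itertools import product

def count_models(formula, n_vars):
    """Exactly count satisfying assignments by brute-force enumeration.

    `formula` maps a tuple of booleans to True/False. This loop is
    exponential in n_vars -- fine for a toy formula, hopeless for the
    dozens of binary features of a real network, which is why the talk
    turns to approximate model counting instead.
    """
    return sum(1 for assignment in product([False, True], repeat=n_vars)
               if formula(assignment))

# Toy "specification": at least two of three binary inputs are set.
spec = lambda x: sum(x) >= 2
print(count_models(spec, 3))  # 4 of the 8 assignments satisfy it
```

An approximate counter returns a number within a (1 + epsilon) factor of this exact count, with probability at least 1 - delta.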
A quick question to ask is whether we can do this for all kinds of neural networks, and that is a very interesting challenge, because neural networks are typically highly non-linear and involve constraints over real variables. In this work we focus on one class of neural networks: binarized neural networks. In these networks the features are all binary, and inside every layer you finally compute a binary value. Another very interesting thing about these networks is that they are very compact and typically far less energy hungry than traditional neural networks, which require a lot of energy just to run. So if you are going to implement neural networks in systems like embedded devices or phones, these are the more likely candidates for implementation. We do not design neural networks ourselves, but the groups that do, for example Yoshua Bengio's group in Montreal, have been working quite a bit on these networks. So these are candidates that are likely to be implemented in lots of devices around the world, and we would like to concentrate on them. What do these networks look like? The inputs are binary and the outputs are binary. There are two kinds of layers: the internal layers and the output layer. Each internal layer of a binarized neural network is composed of three parts: first a linear layer, then batch normalization, and then binarization. We are going to go through all three in slightly more detail to show how to encode them. One of the most important things about SAT-based techniques is that the encoding matters quite a bit.
We are going to see more of that tomorrow. In this case, coming up with the right encoding was in fact most of the work, because with a naive encoding we quickly see that these techniques will not work. So how does each layer work? You get a set of inputs, and the linear layer computes, for each neuron, a dot product of the inputs x with the weight parameters on the corresponding edges, plus an additive bias b_i. So you take the dot product and add a given value; that is the linear layer. Then we have to do batch normalization. Without going into too many details, people figured out long ago that there is some underlying distribution and you need to shift it; you can think of it as a linear shift, with parameters mu and sigma together with alpha and gamma, applied to the values you have computed. The third step is to binarize these values. Remember, one of the important things about binarized neural networks is that these three operations sit inside every block; you can think of each block as three layers viewed as one unit, and in fact there is a lot of work on how to perform these three operations very fast. So finally, the inputs x are binary, plus 1 or minus 1 (or you can think of 0 and 1), and we would like to output plus 1 or minus 1. To do that, you check whether the value t_i is greater than or equal to 0: in that case v_i is 1, and if it is less than 0, v_i is minus 1. The reason for working with plus 1 and minus 1 has more to do with numerical computation; it turns out to be faster.
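The three parts of an internal block can be sketched in a few lines of Python. This is an illustrative forward pass, not the paper's implementation; the parameter names (weights, bias, alpha, gamma, mu, sigma) follow the talk's description, and the exact form of the batch-normalization shift is my assumption:

```python
def bnn_block(x, weights, bias, alpha, gamma, mu, sigma):
    """One internal block of a binarized neural network (illustrative).

    x          : list of +1/-1 inputs
    weights[j] : list of +1/-1 weights from the inputs to neuron j
    Linear layer:   t_j = <w_j, x> + b_j
    Batch norm:     y_j = alpha_j * (t_j - mu_j) / sigma_j + gamma_j
    Binarization:   output +1 if y_j >= 0, else -1
    """
    out = []
    for j in range(len(weights)):
        t = sum(w * xi for w, xi in zip(weights[j], x)) + bias[j]
        y = alpha[j] * (t - mu[j]) / sigma[j] + gamma[j]
        out.append(1 if y >= 0 else -1)
    return out

# One neuron with the talk's running numbers: weights 1,1,1, bias 0.2,
# batch-norm scale 0.8 and shift +2.
print(bnn_block([1, -1, 1], [[1, 1, 1]], [0.2],
                [0.8], [2.0], [0.0], [1.0]))  # [1]
```

A full network is just a sequence of such blocks followed by the output layer's argmax.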
You can do things like fast Fourier transforms to speed up these computations; we do not have to go into those details, but I am going to show that plus 1 or minus 1 is really just the Boolean values 0 and 1, so we are going to do that transformation. This is the internal block, and then you have a sequence of these blocks, because binarized neural networks are of course deep, for reasons that machine learning experts can probably explain better than me. After the sequence of internal blocks comes the final block, which again has two parts. The first is a linear layer, so again you do the dot product plus a bias, and then you take an argmax over these quantities. Essentially, for each class you get a value that we think of as proportional to the probability, you take the maximum of these values, and you output that as the answer. So this is a very standard neural network design: inputs, each layer computes some values, finally you get a big vector and you take the maximum. Now, how do we go about the encoding? Let me discuss the encoding that works best. We have the set of weights and the b_i's: the way the whole system works is that someone gives us a trained binarized neural network, so we get all the weight parameters, the b_i's, and the linear transform parameters. To encode each t_i, we say that we want to take the dot product plus b_i. This becomes what is called a pseudo-Boolean constraint; we are going to go into more detail tomorrow. You are taking the dot product, so you would typically have some coefficients; in this example all the weights are 1.
So you get x1 plus x2 plus x3 plus b_i, which is 0.2. Then the batch normalization layer: say these are the parameters we learned, so we multiply by 0.8 and add 2. This is the second step, another linear transform: we multiply all of these values by a factor and then add something. The binarization then checks whether the result is greater than or equal to 0; I did a bit of rearranging here, because if it is greater than or equal to 0, the output should be 1, and otherwise minus 1. So this looks like a mixed integer linear programming problem, and this was the first encoding we tried; it turns out it was not a very good idea. What turned out to have a lot of impact was a very small observation: you can reduce the problem from mixed integer linear programming to integer linear programming by taking the ceiling of the threshold. This is one of those explorations where you spend a lot of time and then one afternoon's discussion makes all the difference; it is very important to figure out how to encode things properly. Encoding as ILP is key here, because if you encode as mixed integer linear programming, these constraints become hard to handle when we feed them to the SAT solvers. Going from binary to Boolean is again a very standard transformation: you can always take plus 1 and minus 1 and map minus 1 to 0 and plus 1 to 1. After this transformation, in our example every coefficient is 1, and we finally get a cardinality constraint.
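The ceiling observation can be checked on the talk's running numbers. The sketch below is my reconstruction, assuming the batch-norm parameters enter as alpha * ((s + b) - mu) / sigma + gamma with alpha / sigma > 0 (if that ratio is negative, the inequality flips); since s = x1 + ... + xn is an integer, the real threshold can be replaced by its ceiling:

```python
import math

def real_threshold(b, alpha, gamma, mu, sigma):
    """Rearrange alpha*((s + b) - mu)/sigma + gamma >= 0 into s >= C.

    Assumes alpha/sigma > 0; the constant C is generally a real number,
    which is what made the first encoding a *mixed* integer program.
    """
    return mu - b - gamma * sigma / alpha

def integer_threshold(b, alpha, gamma, mu, sigma):
    """Since s is an integer, s >= C iff s >= ceil(C): a pure ILP
    (and ultimately cardinality) constraint."""
    return math.ceil(real_threshold(b, alpha, gamma, mu, sigma))

# Talk's running numbers: b = 0.2, scale 0.8, shift +2.
C = real_threshold(b=0.2, alpha=0.8, gamma=2.0, mu=0.0, sigma=1.0)
k = integer_threshold(b=0.2, alpha=0.8, gamma=2.0, mu=0.0, sigma=1.0)
# The real and integer constraints agree on every integer value of s:
for s in range(-10, 11):
    assert (s >= C) == (s >= k)
```

The point is that the rounded constraint is exactly equivalent over integer sums, so no solutions are gained or lost, which matters for counting.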
Now we need to figure out how to encode these cardinality constraints into CNF, because the solvers we use for the second stage, the counting, require a CNF encoding. [Audience question] Yes, the coefficients are all 1 or minus 1; otherwise we would have to handle general pseudo-Boolean constraints. Oh yes, you are right, that is important: being able to take the ceiling requires that all the coefficients are plus 1 or minus 1. So what we finally get is a cardinality constraint, greater than or equal to, or less than or equal to, depending on how we encode it. Remember, you finally get something like: the cardinality constraint being greater than or equal to some threshold should imply that the final output is 1, and otherwise minus 1. This is the most important part of the talk, how we encode it, and we spent a lot of time getting this one insight that we should take the ceiling; otherwise we were getting very slow performance. These constraints finally become cardinality constraints. Oh, I see. Yes. The way you can think of it is this: on the right-hand side we have a Boolean variable, which we treat as 0 or 1; on the left-hand side we have a cardinality constraint; and now we have to encode all of this into CNF formulas, so we need some way of translating it into CNF. What kind of property do we need from this translation? Since we are eventually going to count the number of solutions, the encoding should preserve the number of solutions. This is actually a very different notion from the classically studied one, where you only want to preserve satisfiability.
All day, when we were looking at formulas, we wanted encodings that preserve satisfiability; here we would like to preserve the number of solutions. It turns out that the Tseitin encoding, which is very widely used and which Mate discussed in the morning, where you introduce extra variables, preserves the number of solutions as long as you project all the solutions onto the original set of variables. That is one case, but this notion has not been widely studied, and it invites us to look at other kinds of encodings that preserve the number of solutions and to understand which encodings are better for counting. There has been a lot of study of encodings that preserve satisfiability and their effect on satisfiability solvers, but not so much for counting. So finally, we get the formula in CNF: we have the inputs, we have the output, we write some constraints for the different kinds of properties I am going to discuss, and then we would like to compute the number of solutions. Remember, since I have defined all the internal variables, the only free variables are the inputs, because every other variable is uniquely determined; so I do not have to worry about the rest of the variables. I have described a function from the inputs to the output of this neural network, we write a property in terms of inputs and outputs, and then we want to count the number of solutions over the inputs. As we know, counting is usually very hard, so we rely on the approximate notion here, where we are interested in (epsilon, delta) guarantees. We want PAC-style guarantees, computing the estimate within a 1 plus epsilon factor, and this makes all three steps sound, so the estimate we compute is indeed sound.
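The claim that Tseitin-style auxiliary variables preserve the model count under projection can be checked on a toy formula. This is a minimal illustrative sketch, not the paper's encoding: one auxiliary variable y is defined to equal (a AND b), and the models of the encoded formula, projected onto the original variables, are exactly the models of the original formula:

```python
from itertools import product

def tseitin_demo():
    # Original formula over (a, b, c): (a AND b) OR c.
    orig = lambda a, b, c: (a and b) or c

    # Tseitin-style encoding with auxiliary y <-> (a AND b), as clauses:
    # (~y | a), (~y | b), (y | ~a | ~b), (y | c)
    def encoded(a, b, c, y):
        return ((not y or a) and (not y or b)
                and (y or not a or not b) and (y or c))

    direct = sum(orig(a, b, c)
                 for a, b, c in product([False, True], repeat=3))
    # Project the encoded formula's models onto the original variables.
    # Because y is uniquely determined by (a, b), each original model
    # extends to exactly one encoded model, so the counts match.
    projected = {(a, b, c)
                 for a, b, c, y in product([False, True], repeat=4)
                 if encoded(a, b, c, y)}
    return direct, len(projected)

print(tseitin_demo())  # (5, 5)
```

Projection is essential: counting over all variables of an encoding that is not count-preserving would give the wrong answer.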
Let me now jump to the empirical results. We looked at three applications, the same three I started the talk with. The first application is about fairness: whether the neural network is fair to different individuals. There are lots of notions of fairness, and it is again a very hotly debated topic. The notion we looked at is individual fairness, where you say that if you change someone's sensitive attributes, like gender or race, the output of the network should not depend on those sensitive attributes; we would like no discrimination based on race, gender, and so on. We took a very popular dataset, the UCI Adult dataset, a dataset about people from Taiwan, where you would like the network to predict whether they are likely to default if given a loan. We wanted to know how well neural networks trained on this dataset do: if you take an individual and just change their gender, how does the output change? Ideally, the output should not change if you change the gender. In the ideal world, you would say that if you take this very accurate neural network and change the gender feature from male to female, the output of the network should not change. We can encode this as follows: we have x1 and y1, two individuals, one male and one female, who agree on all the other attributes; there turn out to be 66 attributes, and ideally we would like the outputs of the neural network to be the same.
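The individual-fairness counts can be made concrete with a brute-force sketch. This is only to pin down the property; the names and the toy model are mine, and the real tool computes these counts by model counting on the CNF encoding of the network, not by enumeration:

```python
from itertools import product

def fairness_counts(model, n_other_features):
    """Over all settings of the non-sensitive features, tally how the
    prediction changes when only the sensitive bit (here: gender) flips.

    `model` is any function (gender_bit, other_bits) -> 0/1.
    Returns (same, high_to_low, low_to_high).
    """
    same = high_to_low = low_to_high = 0
    for other in product([0, 1], repeat=n_other_features):
        out_m = model(1, other)  # gender bit = 1
        out_f = model(0, other)  # gender bit = 0
        if out_m == out_f:
            same += 1
        elif out_m == 1:
            high_to_low += 1
        else:
            low_to_high += 1
    return same, high_to_low, low_to_high

# Toy "network" that leaks the sensitive attribute when few other
# features are set:
toy = lambda g, other: 1 if (sum(other) + g) >= 2 else 0
print(fairness_counts(toy, 3))  # (5, 3, 0)
```

Dividing each tally by 2^n gives the fractions reported in the talk (90 percent same, 9 percent high to low, and so on).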
We can also look at different kinds of cases. We can ask in how many cases x1 is male with a high output, and changing to female makes the output low, which in this case discriminates against men; or we can look at it the other way, where you change the gender from male to female but the output goes from low to high. So this framework really lets you express lots of different questions: in how many cases is there discrimination against one gender versus the other? And not only gender; we can look at marital status, and again we would expect that the network should not be looking at someone's marital status to make its prediction. Here are the results, and the main lesson I want you to take away is probably not about these particular architectures, but about what the tool we developed, which we call NPAQ, lets you compute. Let me explain these numbers. In the first case, going from female to male, we wanted to know in how many cases the output remains the same: it turns out 90 percent of the cases. In how many cases do you go from high to low? 9 percent of the cases; and similarly for low to high. You can look at different architectures and see that their numbers actually vary quite a bit. One thing I want to highlight is that the accuracy of these architectures is about the same, so from accuracy alone you would not see this kind of difference; but when you apply the tool, you get very different estimates. This can raise a concern, because you might think that, say, this architecture seems very off, since it appears to give preference to one gender.
You can ask the same kind of question for another attribute, or maybe a subset of attributes, and again compare the estimates. [Audience: It depends on whether your dataset has data points where all the other attributes are the same and only the gender is different.] Yes. This is a verification question we are handling: someone designs a neural network and asks us how well it does with respect to fairness over different attributes. How you fix it is a different question; now you know these are the properties of this neural network. [Audience: The count may include cases that never appear in practice.] Yes, that is very important, because for the dataset itself you can compute the numbers very easily, and on the dataset the networks were fair; we were looking at what happens overall, over the whole input space. The most important thing in our case is that we are able to give these estimates; what you do with them, for example whether you can use them to improve the network, is another question. I think what Sanjit mentioned yesterday is that maybe they found cases the network was never trained on, and that may be one reason it is being unfair. But diagnosis is a completely different problem, not what has concerned us so far. Yes. So that is a very surprising thing, which you typically do not see when you just look at accuracy. The other aspect I would like to highlight is that in all these cases the number is greater than 0.
If all you are looking at is what are called adversarial attacks, where you start with a network and say, "I could find a counterexample for each of these," that is, if you are only looking at the satisfiability problem, then the only answer you would get is yes for all of them. You would not see this variation, which gives a far richer picture of all these different architectures. Now, the second application we looked at is robustness: you start with an image and you want to understand what happens to the output if you make a small perturbation. I showed a case where a small perturbation can make the output go from 0 to 8. Again, you can encode this property of robustness; remember, we have the neural network encoded in CNF, and now we conjoin it with the property. You can encode robustness by saying that you look at two inputs x and y that differ in at most k bits, and you count the cases where the outputs of x and y differ. You can also fix your x: I would like to understand, for this image, for how many small perturbations of at most k bits do I get a different output? This formulation is fairly general and lets you express many different kinds of properties. So again, we took lots of different architectures with different defenses.
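The fixed-x version of the robustness count can be sketched by brute force. This is illustrative only (the real tool encodes the constraint and model-counts it); the majority-vote classifier is a made-up stand-in for a network:

```python
from itertools import combinations

def count_misclassified_perturbations(model, x, k):
    """For a fixed binary input x, count the perturbations flipping at
    most k bits that change the model's output.
    """
    base = model(x)
    n = len(x)
    count = 0
    for flips in range(1, k + 1):
        for positions in combinations(range(n), flips):
            y = list(x)
            for p in positions:
                y[p] ^= 1  # flip this bit
            if model(tuple(y)) != base:
                count += 1
    return count

# Toy majority-vote classifier over 5 bits:
majority = lambda bits: int(sum(bits) >= 3)
print(count_misclassified_perturbations(majority, (1, 1, 1, 0, 0), 2))  # 6
```

A count of 0 would mean the network is provably robust around x up to k bit flips; a large count quantifies how fragile it is, which a single yes/no adversarial query cannot.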
This is a cat-and-mouse game: people come up with an architecture and claim it is very robust, then someone comes up with an attack, and then they come up with a defense, and we want this tool to be able to measure these architectures. We are not coming up with a defense; we just want to allow the developers to understand how well they are doing, beyond a single example. Again we see a very different and interesting picture, a very diverse one, where for different architectures and different defenses the number of adversarial examples varies quite a bit. If you are just doing the traditional attacks, asking whether you can find one adversarial input for this network, then for all of these cases you get the answer yes, but that does not tell you that, say, architecture 2 admits many more attacks than architecture 3. So this really quantifies how well a network does in general. The third application we looked at is the Trojan attack. Typically you train your networks on some set of images and find that they perform well. Then an adversary comes in and poisons the dataset a little: they change it slightly, maybe adding something the adversary thinks will fool the network, and thereby plant a backdoor. This can be used for all kinds of security attacks: if they know the backdoor, they can get the network to behave maliciously, maybe making the network send your private data, and so on. This is a very popular attack, called Trojaning.
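The Trojan property can likewise be phrased as a count. The sketch below is illustrative (the trigger, the toy model, and the names are mine): given a trigger pattern, count the inputs containing it that the network maps to the attacker's target label:

```python
from itertools import product

def count_triggered(model, trigger, n, target):
    """Count inputs over n bits that contain the trigger pattern (fixed
    bit values at fixed positions) and are classified as `target`.

    trigger: dict position -> required bit value.
    """
    free = [i for i in range(n) if i not in trigger]
    total = 0
    for bits in product([0, 1], repeat=len(free)):
        x = [0] * n
        for i, v in trigger.items():
            x[i] = v          # plant the trigger
        for i, b in zip(free, bits):
            x[i] = b          # enumerate the remaining bits
        if model(tuple(x)) == target:
            total += 1
    return total

# Toy model "trojaned" to output label 1 whenever the first two bits
# (the backdoor) are both set:
trojaned = lambda x: 1 if (x[0] == 1 and x[1] == 1) else sum(x[2:]) % 2
print(count_triggered(trojaned, {0: 1, 1: 1}, n=4, target=1))  # 4
```

If almost every triggered input lands on the target label, that is strong evidence of a planted backdoor; a count near the base rate suggests the network is clean.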
This again allows us to ask the same question: once we have the neural network, we can ask, for different kinds of trigger patterns, in how many cases the network is going to give a different output. That lets us quantify whether there has been a Trojan attack on this network. Usually there is a threshold: if the network behaves weirdly for only very few cases, that might raise a concern, and if it does so for way too many cases, that might also raise a concern; people in the security domain have different actions to take once they can compute the numbers. You can express this as a property where you are given some pattern and you would like to know in how many cases the network is going to output a given label. Again, what we discovered: these are the test accuracies of the networks, and these are the cases where the network behaves in an adversarial manner, and you can see we get a far richer picture. In this case, interestingly enough, the count seems to correlate with the accuracy, which is interesting behavior; we do not have a good explanation for why it happens here, but this again gives you a far richer picture than getting the answer yes, yes, yes for all these cases. Overall, just to give the picture about the underlying tools: we are using approximate counting tools with the XOR-based handling that Mate talked about, and we are right now able to handle binarized neural networks with about 50,000 parameters. Here is the general picture: we had about 1,000 such formulas, and within about 8 hours we can handle about 90 percent of them, but as you can see, even after 24 hours we are still not able to handle all of them.
So, there is a lot of work that needs to be done on how to come up with efficient encodings and how to scale these techniques. With that, let me conclude. I talked about three things. The first is that we need to move beyond the notion of qualitative verification, where we were just concerned with whether there exists an execution where the model does not satisfy the property, to quantifying how many such cases there are, because the networks being designed are not designed with 100 percent accuracy. This requires us to look at encodings that preserve the number of solutions, and that allows us to go back and ask how the performance of different encodings of cardinality constraints impacts the underlying counting techniques. And we showed that lots of different properties can all be encoded into this general framework, and we are able to handle binarized neural networks with about 50,000 parameters, which is certainly far from where we would like to be. So, this is kind of the first work; we just submitted another work where we are able to handle far larger networks, but there is still a lot of work that needs to be done. The code is public; I would really encourage you to try it out and give us feedback, and we would be very happy to collaborate. With that I would like to conclude; I'd be happy to take any questions. So, in your experience, are there any kinds of networks with some specific structure that you can take advantage of, such as symmetry, for example, or other things that can help you with all these robustness and other questions that you encounter? I think, yeah, for the underlying tools there is a lot of opportunity, because we had to really look at these encodings and try to understand them; there is of course a lot of symmetry, because there are lots of layers and their weights are usually fairly similar.
We are not able to take advantage of that right now, but I think this would be one area where one should be able to, in terms of the scalability of the tools. We are really not machine learning experts, nor do we aspire to be; I think our job is really to help the designers. If we can help them, as verification engineers, to understand how the network is performing, it is up to them to figure out how to improve it. So, at this point we are really focused on the verification part: someone designs the network, they think it is fair or they think it is robust, and we would like to tell them how well it is doing. Yes, how do you encode exactly the fairness property? So, for fairness there are lots of different formulations; this one is called local fairness, where you would like to say that. Well, I wonder why you do not use an implication: if all the other features are the same, then the output should be the same. So, why not an implication instead of the whole conjunction? So, we are interested in the cases where both hold; that would be taking the negation of this, right. Well, it depends what you want to check. So, is this a property that you can derive from your formula? I think this is where you would use an implication. I mean, both are the same. You would say this implies N(x) equals N(y), and in that case you would count in how many cases it is indeed so; this would say in how many cases they agree. So, you would take the negation of it: you would say N(x) not equal to N(y). What we are saying is, we are looking at N(x) not equal to N(y). Yes. Right, and then we want the negation of this, which is really N(x) equals N(y). You could also look at it a different way, where you ask in how many cases they are equal. Okay. Each answer is the complement of the other.
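A minimal sketch of this local-fairness count, with an invented linear "network" whose deliberate dependence on the protected attribute plants some violations; the feature names, weights, and threshold are all assumptions for illustration, not the talk's benchmark:

```python
from itertools import product

def net(features):
    # features = (gender, income, education), all in {0, 1};
    # the explicit gender term is the deliberately planted unfairness.
    gender, income, education = features
    score = 2 * income + education - gender
    return 1 if score >= 1 else 0

# Count inputs x for which flipping only the protected attribute
# changes the decision, i.e. N(x) != N(y) with all other features
# of x and y equal. A model counter answers this without enumerating.
violations = sum(
    1 for g, inc, edu in product([0, 1], repeat=3)
    if net((g, inc, edu)) != net((1 - g, inc, edu))
)
print(violations)  # → 2 (each unfair pair is seen from both sides)
```

Counting the complement, inputs where the two outputs agree, gives 8 - 2 = 6 here, matching the point in the exchange above that the two counts are complements of each other.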
So, you have been talking about the count of adversarial inputs for a given property, mostly in those terms. But if I vaguely recollect, your method would actually allow us to even estimate the distribution of the adversaries over the feature space, right? Yes. So, that would be a much deeper insight. Yeah. So, one thing that I forgot to mention is that the formulation in general lets you ask this under some distribution, because some kinds of inputs are very unlikely, so you may not even care about them. The formulation of course allows that; in that case we would be talking about the problem of weighted counting. The scalability of tools is the challenge here: right now, being able to do weighted counting for these networks is a real challenge. So, in theory yes; in practice a lot of work needs to be done to scale the tools. The general formulation that we describe in the paper is that you would also like to understand this under some distributions, because you know what kind of distributions you care about, and you can write other kinds of constraints, for example "in this country females are more educated", and ask to quantify under that distribution. Scalability of tools is quite a challenge right now. So, is the number of variables linear in the number of neurons or parameters or something? The number of variables in the cardinality constraints? In the final formula that you are giving. Yeah. So, they are linear if you just represent everything as cardinality constraints; then it depends how we encode into CNF. I see. So, looking at different encodings, we get about an order of magnitude, actually about two orders of magnitude, of blow-up: we start with about 50,000 parameters and we end up with a couple of orders of magnitude more variables.
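The weighted-counting variant mentioned here can be sketched on the same kind of toy example: each input gets a probability under an assumed distribution, and the answer is the probability mass of the failing inputs rather than their number. The per-bit marginals and the failure predicate below are invented for illustration:

```python
from itertools import product
from math import prod

p_bit = [0.9, 0.5, 0.2]  # assumed independent marginals P(x_i = 1)

def weight(x):
    """Probability of input x under the assumed product distribution."""
    return prod(p if xi else 1 - p for xi, p in zip(x, p_bit))

def bad(x):
    """Stand-in predicate for 'the property fails on input x'."""
    return x[0] == 1 and x[2] == 1

inputs = list(product([0, 1], repeat=3))
count = sum(1 for x in inputs if bad(x))          # unweighted count
mass = sum(weight(x) for x in inputs if bad(x))   # weighted count
print(count, mass)  # 2 failing inputs, but only 0.18 probability mass
```

The gap between the two numbers is the point: two failing inputs sound alike under uniform counting, but under the assumed distribution they carry only 18 percent of the mass, which is what a weighted counter would report.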
So, is it the number of parameters or the number of intermediate variables at the neurons that matters? It is the number of parameters. Each neuron could have more than one parameter, for example if I am using ReLU or something similar. So, the parameters are the weights, the W_i's; these are exactly that. But the weights are fixed, right? You are taking a given network. Yeah. So, why should it matter? For example, if each neuron had 10 parameters, that does not change the number of variables; you are still interested in how the output is expressed in terms of the input. Well, it depends, because the weights you learn after training the network, whether each is plus 1 or minus 1, determine exactly what constraint you get. But you are taking a trained network. Yeah. So, all the weights are fixed. Yeah. They are constants. Yeah. All parameters are fixed. Yeah. So, what matters for you is how many variables you need to encode the input as it is getting processed. Yeah. So, that encoding really depends on what your thresholds are, and those thresholds really depend on the parameters that you have in each of the layers. So, the question you are asking: are the parameters symbolic in your setting? No, no, they are not symbolic. The encoding from a cardinality constraint to CNF tries to take advantage of the structure, because you are writing sum of x_i greater than some threshold, and that threshold depends on how the parameters are given. And that is what blows up the encoding to the final CNF. We can get an upper bound of n log k; if the threshold in every case is n, then n log n is the upper bound in theory, which is not very good news, because log of 50,000 comes to about 20, so in that case we have to deal with millions of variables in each of the problems. Yeah, I am just trying to understand, if you want to scale it up to bigger networks. Yeah.
With similar thresholds, let us say. So, I mean, where is the hardness in the problem? Yeah, it looks like it is the density of interconnection of the network that matters more than the thresholds; I mean, the thresholds do affect the encoding. Yeah. But that is because you have a specific encoding which depends on the threshold; I could use something else, some other encoding. So, I think, yeah. The way people typically think of a parameter is plus 1, minus 1, or 0 if one of the weights is 0. What does make a difference is sparsity: a more sparse network would probably be easier for us to handle. We had to really do a lot of work here; at this point, if you ask where the hardness is coming from, it is because this encoding is really blown up. It is really the case that we deal with a formula with millions of variables where the inputs are very few. So, the XORs you put only on the inputs, but the formula that you deal with is very, very large. In forward propagation this is a very simple problem: once I give the values of the inputs, it is very easy for the solver. But remember, the properties are encoded on inputs and outputs, so it is really the circuit that becomes very large. Yes. Yes. Okay. Thank you.
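To make the cardinality-to-CNF blow-up discussed in this exchange concrete, here is one standard encoding, a Sinz-style sequential counter for "at most k of x_1..x_n are true". It is only a sketch: it uses n*k auxiliary variables and is not necessarily the encoding the tool uses (totalizer-style encodings give the n log k flavor mentioned above), and the brute-force sanity check at the end is feasible only at toy size:

```python
from itertools import product

def at_most_k(n, k):
    """Sequential-counter CNF for sum(x_1..x_n) <= k. Variables 1..n
    are the x_i; auxiliary counters s[i][j] are numbered n+1 onward.
    Clauses are lists of signed integers, DIMACS-style."""
    s = [[n + i * k + j + 1 for j in range(k)] for i in range(n)]
    clauses = [[-1, s[0][0]]]                     # x1 implies s[0][0]
    clauses += [[-s[0][j]] for j in range(1, k)]  # count after x1 <= 1
    for i in range(1, n):
        xi = i + 1
        clauses.append([-xi, s[i][0]])
        clauses.append([-s[i - 1][0], s[i][0]])
        for j in range(1, k):
            clauses.append([-xi, -s[i - 1][j - 1], s[i][j]])
            clauses.append([-s[i - 1][j], s[i][j]])
        clauses.append([-xi, -s[i - 1][k - 1]])   # forbid exceeding k
    return clauses, n * k

def satisfied(clauses, assign):
    return all(any((lit > 0) == assign[abs(lit)] for lit in clause)
               for clause in clauses)

# Sanity check at toy size: for every assignment to the x variables,
# the encoding is satisfiable iff the cardinality constraint holds.
n, k = 4, 2
clauses, n_aux = at_most_k(n, k)
for xs in product([False, True], repeat=n):
    sat = any(satisfied(clauses, dict(enumerate(xs + ss, start=1)))
              for ss in product([False, True], repeat=n_aux))
    assert sat == (sum(xs) <= k)
```

For counting rather than satisfiability the auxiliary variables matter: one either needs encodings where each input assignment extends to a fixed number of full solutions, or one counts projected onto the input variables, which matches the remark above that the XOR constraints are placed only on the inputs.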