It would be fun to ask him to give a talk that he hasn't given in, like, over five years. And he was happy to oblige: namely, backdoors to typical-case complexity.

Thanks. I want to apologize from the start that I have a bit of a sore throat, so the audio quality will be bad. For those of you out there watching the video: it's my throat, not the mic.

So this is some really old work that I did with Carla Gomes and Bart Selman, on a topic that a lot of people here are very interested in. There has been a lot of significant progress, as you saw in other talks such as Kevin's, on complete search methods for SAT solving. By complete, I mean that the algorithm has zero error: it always tells you SAT or UNSAT. You can still have randomness in such a thing; it just has zero error.

In verification, software and hardware, such methods are very critical: finding bugs, checking for bugs, finding exploits in software. SAT solvers have been used for all of that. The basic paradigm is to reduce the bug-finding or exploit-finding problem to a huge CNF SAT instance, say one million variables and five million clauses; the current state of the art in SAT solving can often handle such things. Then you check UNSAT with a super-optimized backtracking-based solver. Checking UNSAT amounts to saying that there is no bug: for all possible ways to try to force a bug, there isn't one.

By backtracking-based, I mean the following specific paradigm. You have some heuristic which chooses a variable for you; there are some very complicated ways you might do that by analyzing the instance. You have some heuristic which tries a value for that variable; in the case of Boolean SAT, that's true or false. Then you recurse on a simplified instance with that variable plugged in. As you plug in variables, you may make parts of the formula simpler, so you have polynomial-time algorithms which simplify things for you, some kind of propagation of values. Eventually, either you have tried all possibilities, in some limited sense, and found the instance unsatisfiable in all cases, or you have found a satisfying assignment. That's what I mean by backtracking-based, and this paradigm is used over and over (a small sketch appears below).

One view from practice is the following quote by Bart Selman: "Our world may be friendly enough to make many typical reasoning tasks poly-time, challenging the value of the conventional worst-case complexity view in CS." This is very provocative. This is my own co-author, after we wrote the paper, actually. So in this talk, we formalize one way to talk about what "friendly enough" means. And it's actually within a worst-case complexity perspective: we are just going to restrict the kinds of instances we look at, but keep a worst-case point of view.

So there's a huge theory-practice gap here, as others have mentioned. The amazing performance of these SAT solvers seems to be in conflict with the idea that SAT is actually hard. Instances from these domains are surprisingly easy, yet the best known worst-case algorithm for 3-SAT that we have still runs in exponential time. So what makes so many practical instances easier? One proposed answer is that there is hidden tractable substructure in these real-world problems: structure that doesn't exist in general.
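To make that backtracking paradigm concrete, here is a minimal sketch in Python. This is my own illustration, not any particular solver: the variable and value heuristics are deliberately naive stand-ins for the engineered ones in real solvers, and the clause representation (lists of signed integers, DIMACS-style) is just a convenient choice.

```python
# A formula is a list of clauses; a clause is a list of nonzero ints,
# where the literal v means "variable v is true" and -v means false.

def simplify(clauses, lit):
    """Plug in literal `lit`: drop satisfied clauses, shrink the rest.
    Returns None if an empty (contradictory) clause is produced."""
    out = []
    for c in clauses:
        if lit in c:
            continue                    # clause satisfied, drop it
        c2 = [l for l in c if l != -lit]
        if not c2:
            return None                 # empty clause: contradiction
        out.append(c2)
    return out

def backtrack(clauses):
    """Complete search: returns True iff the formula is satisfiable."""
    if clauses is None:
        return False                    # propagated into a contradiction
    if not clauses:
        return True                     # no clauses left: trivially true
    var = abs(clauses[0][0])            # naive variable-choice heuristic
    for lit in (var, -var):             # naive value heuristic: true first
        if backtrack(simplify(clauses, lit)):
            return True
    return False                        # both branches failed: backtrack
```

Real solvers differ mainly in the two heuristic choices and in running much stronger polynomial-time simplification at every node, which is exactly the role of the subsolvers defined later in the talk.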
So the specific structure we are going to look at here is what we call backdoors, or backdoor sets of variables. The initial motivation for looking at these things was heavy-tailed running-time distributions of randomized backtracking solvers. For some types of problems, such as Latin square completion, solving by randomized backtracking gives a runtime distribution with a surprisingly fat tail. To give that a little more formality: the running time is at least t with probability proportional to 1/t^alpha, where alpha is some small positive constant. The main message is that this explains why one sees such a wide range of solution times. Ken was reporting earlier that you get runs on the order of milliseconds to days: sometimes you see really short runs, and other times really long runs, for no particular reason, it seems.

How to explain those short runs? That's what we were interested in, because what often happens when you have a distribution like this is that you just do random restarts: you know that some significant fraction of the possible running times will be short, so you keep trying to get lucky and hit one of those.

Informally, what we proposed was this idea of a backdoor set. A backdoor to a given combinatorial problem instance is a subset of the variables such that, once you assign the right values to those variables, the remaining instance simplifies to something tractable. That's the informal version. More formally, we have to say what we mean by tractable. So we define the notion of a subsolver, something akin to the polynomial-time heuristics built into the solver, and then we distinguish two types of backdoors: strong backdoors, which can be used to determine unsatisfiability, and ordinary backdoors, which are used to determine satisfiability.

The definition of a subsolver is the following. Intuitively, it captures the polynomial-time propagation algorithms a backtracking solver runs after you set, say, a couple of the variables. A subsolver A, given a formula as input, satisfies the following properties. First, it either rejects the input (says "I just don't know") or determines it correctly as unsatisfiable or satisfiable, returning a solution if satisfiable; in particular, if it says UNSAT, the formula really is unsatisfiable. Second, it has to be efficient: it runs in polynomial time. Third, it has to satisfy some very basic criteria so that we can prove theorems about it: it can determine whether a formula is trivially true (has no clauses whatsoever, no constraints) or trivially unsatisfiable (has an empty clause, a contradictory clause, something that just says false).

Then there is a fourth property, which is sometimes used in the backdoor literature and sometimes not, because some propagation heuristics and some polynomial-time special cases don't satisfy it. It's called self-reducibility. It says, basically: if A determines whether a formula is SAT or UNSAT, then for any variable and any possible value for it (it could be a bad one), plugging it in gives a simplified formula that A can also determine. This is satisfied by a lot of things. For example, if the subsolver solves 2-SAT: plugging variables into a 2-SAT formula doesn't take you outside of 2-SAT. Horn-SAT, things like that.
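As a sketch of that contract (the names SAT, UNSAT, REJECT and both functions below are mine, not from the talk), a subsolver can be modeled as a polynomial-time function that either decides the formula or rejects it, and must at least handle the two trivial cases. The second function spot-checks the self-reducibility property on one formula, reusing `simplify` from the earlier sketch.

```python
SAT, UNSAT, REJECT = "SAT", "UNSAT", "REJECT"

def trivial_subsolver(clauses):
    """The weakest legal subsolver: decides only the two trivial cases."""
    if not clauses:
        return SAT                      # no constraints: trivially true
    if any(len(c) == 0 for c in clauses):
        return UNSAT                    # empty clause: trivially false
    return REJECT                       # anything else: "I don't know"

def respects_self_reducibility(subsolver, clauses):
    """If the subsolver decides `clauses`, it must also decide every
    one-variable restriction of it.  A None from simplify stands for a
    formula with an empty clause, which counts as decided (UNSAT)."""
    if subsolver(clauses) == REJECT:
        return True                     # property only constrains decided formulas
    variables = {abs(l) for c in clauses for l in c}
    for v in variables:
        for lit in (v, -v):
            rest = simplify(clauses, lit)
            if rest is not None and subsolver(rest) == REJECT:
                return False
    return True
```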
But in other cases, you don't get this property. Suppose my subsolver just checks the all-zeros assignment, and if that doesn't satisfy the formula, it says "I don't know." Then after you plug in a one somewhere, the check may or may not work. Still, this definition is general enough to encompass many polynomial-time propagation methods used in practice, including ones where we have no nice syntactic characterization of what's going on; we just have a bunch of heuristics, and they work. And the notion makes perfect sense for other types of constraint problems, not just SAT: general constraint satisfaction, mixed integer programming, things like that. So that's a subsolver.

So what's a backdoor set? A backdoor is a subset S of the variables, defined with respect to some subsolver A, such that there is some assignment to S for which A returns a satisfying assignment once you plug that partial assignment in. So this works for satisfiable instances. Then there is the notion of a strong backdoor: a subset such that no matter what partial assignment you plug into it, the subsolver A can take care of the rest and conclude either SAT or UNSAT. So by identifying a strong backdoor and trying all possible assignments to it, you could in principle conclude UNSAT for a formula.

The first big note is that backdoors are algorithm-dependent, very much so. Whether you have a small backdoor or a large one depends on which subsolver you are using.

What we found in practice a while back is that backdoors can be surprisingly small on a bunch of different instances, and this can help explain why a solver gets lucky on some runs: the backdoors get identified early in the search. Here are just five entries from the list we compiled, all instances with around a thousand variables: a logistics planning problem, a circuit implementing a three-bit adder, part of a pipelined processor verification, and a quasigroup (Latin square) completion problem, which is a combinatorial problem. In each of these cases, with respect to the Satz heuristics (some very strong polynomial-time heuristics; I say "heuristics," but when they answer SAT or UNSAT they are correct, we just don't know why they work so well), the backdoor sets turn out to be rather small.

There has been a lot of follow-up work since then. I'll just refer you to a survey from around 2007 by my co-authors together with Sabharwal; it's a very nice survey with many more examples.

"Did you have a method for finding the variables that make up the backdoors?" So we found them by using these different heuristics, and in some cases by brute force. We use heuristics up to a point and then brute-force, cutting off the search as early as possible. "But you couldn't try all of them?" No, you couldn't try all 2^53 of them; we didn't brute-force the whole space. It could be that a smaller backdoor is there; that's just the smallest one we got. It's basically from analyzing the number of times the Satz heuristic actually backtracked,
that is, the number of times it actually failed to find a satisfying assignment. "What is that last column?" Oh, that's just a fraction, the ratio of the one column over the other; without a ratio it's hard to interpret the raw numbers.

Okay, I'm doing okay on time, so here's a nice animation my co-authors came up with, to show visually where this kind of structure gets exploited. This is an instance with, I think, about 800 variables. This is the variable-variable graph: for each variable we have a node, and we put an edge between two nodes if some clause contains both of them. Here we have some loosely connected variables, and here we have variables that are very tightly wound up in the middle. In the animation, as you plug in variables, nodes disappear and things get smaller, but you may backtrack, and as you backtrack a node is reintroduced, because you failed to find an assignment. This run just branches on variables chosen uniformly at random, with no clever selection heuristic. You see it keeps working on this hard core a lot, and it will eventually conclude UNSAT. It's a before-and-after picture, and the before picture is very, very elaborate. Okay, I'm tired of this before picture; it only runs for 50 seconds, but it feels like a lifetime when you're up here. The after picture is what you get when you use the heuristics of the SAT solver. This instance has a backdoor of size 16, and the key, at a very intuitive level, is to pick variables out on the fringes, which simplify the instance, and then the nasty middle part gets proven unsatisfiable pretty quickly. Anyway, that gives you some idea that this structure is there.

So what's the intuition? Imagine a blob of all possible CNF formulas. Certain pockets of this blob can be solved efficiently, and we know about those pockets: the so-called islands of tractability. The idea is that many real-world instances happen to fall rather close to one of these islands. A small backdoor set intuitively means the problem instance is close to an island of tractability: after setting a small number of variables, we arrive at one of the islands. Moreover, the solvers in practice, with their variable-choice heuristics, are able to pick out supersets of backdoors, which is good enough.

One thing I want to emphasize is that the existence of such things is not tautological; I didn't set things up so that it would automatically be true. Just because a problem instance is solved efficiently in practice, and the subsolver happens to be the one used in practice, does not imply that the instance has a small backdoor with respect to that subsolver. For example, it could be that even the smallest backdoors are somewhat large, but there are many of them, and the solver gets lucky because it tries more or less anything, and with so many backdoors around, that works.
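To pin down the two definitions, here is a sketch of checking that a given set S is a backdoor in the weak or the strong sense, plus the kind of brute force over small subsets mentioned in the Q&A above. It reuses `simplify` and the subsolver convention from the earlier sketches, and it is purely illustrative: the search is exponential, not something you would run on a thousand-variable instance.

```python
from itertools import combinations, product

def assign(clauses, literals):
    """Plug in a partial assignment, given as an iterable of literals."""
    for lit in literals:
        if clauses is None:
            return None                 # already contradicted
        clauses = simplify(clauses, lit)
    return clauses

def is_weak_backdoor(clauses, S, subsolver):
    """S is a (weak) backdoor if SOME assignment to S lets the
    subsolver find a satisfying assignment for the rest."""
    return any(
        (rest := assign(clauses, lits)) is not None
        and subsolver(rest) == SAT
        for lits in product(*[(v, -v) for v in S]))

def is_strong_backdoor(clauses, S, subsolver):
    """S is a strong backdoor if EVERY assignment to S leaves a formula
    the subsolver decides (None means an already-decided contradiction),
    which is enough to conclude UNSAT overall."""
    return all(
        (rest := assign(clauses, lits)) is None
        or subsolver(rest) != REJECT
        for lits in product(*[(v, -v) for v in S]))

def smallest_weak_backdoor(clauses, subsolver, max_size):
    """Brute force over subsets of increasing size, up to a cutoff."""
    variables = sorted({abs(l) for c in clauses for l in c})
    for k in range(max_size + 1):
        for S in combinations(variables, k):
            if is_weak_backdoor(clauses, S, subsolver):
                return S
    return None
```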
One related observation: because of the self-reducibility property of subsolvers, a small backdoor that works, say, for proving satisfiability implies that there are many large backdoors. Self-reducibility says that if my subsolver determines a formula, it also determines all the subformulas you get by plugging in variables. So even if I pick a superset of the backdoor set and plug in the right values, that also works. Having a small backdoor entails having many large ones: all the supersets containing the small one.

"It makes sense, then, that you don't need a small refutation." Yeah, for refutation it's not as clear. "How do we know there's no small backdoor for refutation at all? Could it be that refutation works quickly without any strong backdoor, for this type of algorithm?" I'm not sure.

So the empirical fact that we often encounter small backdoors in problem instances shows that these instances are special. To emphasize that further, I want to point out that most formulas, with respect to the typical subsolvers we think about (polynomial-time special cases of SAT), do not have small weak or strong backdoors. Let A be a subsolver that handles 2-SAT, Horn-SAT, or linear equations. Then with high probability, for a sufficiently large clause density d, a random k-SAT instance with n variables and dn clauses has only minimal backdoor sets of linear size. The intuition is very simple: with high probability, a backdoor set of variables has to do a lot of work to get close to Horn, or 2-SAT, or linear equations, or anything like that; it must hit many clauses in order to simplify a completely random k-CNF instance. So for most instances, you won't have a small backdoor set, and the fact that these really small ones show up further emphasizes that the instances in practice are very, very special. This also helps explain why randomized backtracking can perform poorly on large random 3-SAT.

"So there's one backdoor set per subsolver?" Yeah, for a subsolver that handles, say, these classes; we always have to specify which one. If I took a subsolver with, say, survey propagation embedded in it, maybe you'd get a totally different story.

One more note: every satisfiable k-CNF formula can be shown to have a backdoor of nontrivial size, in a certain sense, and the bounds can be pulled out of older work on k-SAT algorithms. For example, take a subsolver that does unit propagation; by that I mean, whenever there is a clause with exactly one literal, you set its variable to the value that makes that clause true. If you have that, then every satisfiable k-CNF formula contains a backdoor set whose size is some constant fraction of n, strictly less than n. This pretty quickly implies a faster-than-2^n k-SAT algorithm for each k. And another observation: if you have one of those subsolvers that does not satisfy self-reducibility, say it tries the all-zeros and all-ones assignments, then every satisfiable formula contains a backdoor set of size at most n/2. You can think about why that's true. So depending on which subsolver you use, you get different, nontrivial sizes.
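Unit propagation, as just described, makes a clean concrete subsolver. Here is a minimal sketch in the style of the earlier ones, again reusing `simplify` and the SAT/UNSAT/REJECT convention:

```python
def unit_propagation_subsolver(clauses):
    """Propagate unit clauses to fixpoint; decide only if that suffices."""
    clauses = [list(c) for c in clauses]    # don't mutate the caller's formula
    while True:
        if any(len(c) == 0 for c in clauses):
            return UNSAT                    # contradictory clause present
        if not clauses:
            return SAT                      # every clause satisfied
        unit = next((c[0] for c in clauses if len(c) == 1), None)
        if unit is None:
            return REJECT                   # no units left: "I don't know"
        clauses = simplify(clauses, unit)   # the unit literal is forced
        if clauses is None:
            return UNSAT                    # forcing it produced a contradiction
```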
There has been quite a bit of work on theoretical algorithms that solve various problems under the assumption that small backdoors exist. The naive upper bound, of course, if there is a backdoor of size k, is to try all k-sets of variables, and for each of them all possible assignments. When k is, say, some constant fraction of n, you can get slight improvements over this bound with simple randomness tricks: again exploiting self-reducibility, you pick a random subset slightly larger than your estimate of the minimum backdoor size, and with probability noticeably better than 1/2^n the backdoor lies inside that superset; trying all assignments there gives some tiny improvement. But again, this is a theoretical curiosity.

Here's an example theorem of the kind you see in this line of work. Nishimura, Ragde, and Szeider showed that for subsolvers that recognize, say, Horn or 2-CNF formulas, finding a strong backdoor of size k is fixed-parameter tractable (FPT): it can be done in 2^O(k) times polynomial time, instead of something like n^k.

There is also the notion of a deletion backdoor: a set of variables such that once you delete them completely from the formula, you can solve what remains. It's another natural notion of being close to tractable, another way of measuring the distance from tractability, and there are many FPT results for finding deletion backdoors of size k (a short sketch of the notion follows below). I mention the buzzword so you have something to Google if you're interested. There are also many hardness results: finding a small backdoor, or a minimum-size backdoor if one exists, is in general an NP-hard problem, and for most subsolvers used in practice, finding size-k backdoors in FPT time would let you solve problems believed to be fixed-parameter intractable. So that's a different set of subsolvers from the ones in the FPT results.

I'd be remiss if I didn't mention some related work. This is a fairly basic idea, a distance to solvability, and there is work that predated ours in operations research, where such sets are called control sets: small sets of variables such that once they are deleted or set to the right values, the resulting formula has some nice property. Often the property was monotonicity, whether the formula is monotone or not, but clearly this is a very related notion. And in parameterized algorithms, there was work after ours on "distance from triviality." The idea is basic: suppose one can make k edits to a problem instance so that it becomes easy to solve, where presumably the edits preserve satisfiability; can we then solve the instance in fixed-parameter time? There is some work on that; again, something you can Google.
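Since the deletion notion is easy to state, here is a short sketch; Horn and 2-CNF are just the standard example target classes, and the function names are mine:

```python
def delete_vars(clauses, S):
    """Delete variables in S outright: remove all their literals."""
    S = set(S)
    return [[l for l in c if abs(l) not in S] for c in clauses]

def is_horn(clauses):
    """Horn: at most one positive literal per clause."""
    return all(sum(1 for l in c if l > 0) <= 1 for c in clauses)

def is_2cnf(clauses):
    """2-CNF: at most two literals per clause."""
    return all(len(c) <= 2 for c in clauses)

def is_deletion_backdoor(clauses, S, target=is_horn):
    """S is a deletion backdoor (for the target class) if deleting S
    lands the formula in the class, where a poly-time solver finishes."""
    return target(delete_vars(clauses, S))
```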
So I want to give some final thoughts, and food for thought for this crowd. We proposed the notion of a backdoor set of variables. It tries to isolate the hard part of a real-world problem instance, so that once you set the hard part, the rest becomes easy. Many real-world instances have been shown to have small backdoors with respect to the heuristics used in modern SAT solvers, and the solvers are exploiting them.

So one big final question is: why are the backdoors there? We know these problems are extremely structured: problems with thousands of variables have a dozen variables such that, once you set them, the whole thing is easy. But are there deeper reasons? There have to be deeper reasons why these things are there at all, why they exist in practice but not on average. Is it just because of the way circuits are designed that these circuit instances look like this? Another question: our framework is algorithm-dependent; we keep saying we have a backdoor with respect to this algorithm or that algorithm. Is that necessary? I don't know. Maybe there's a universal subsolver, some kind of Levin-style enumeration over all poly-time heuristics. I never figured out a good formulation of that, but maybe there's a way to pull the notion away from specific algorithms, so that you can actually say something about the instances themselves. So that's all. Thanks.

"In the experiments you ran on actual circuits, did the backdoors you found have any kind of reasonable interpretation?" I don't think we found any reasonable interpretation. A lot of these variables capture various properties of particular gates, and they would be spread out and seemingly unrelated: this variable is set to true if this gate has some weird firing at this stage, and over here it's something completely different. We don't know why they're small and why they're there. We don't.

"Could there be some kind of recursive structure: you remove one backdoor, and the components the rest of the circuit decomposes into have smaller backdoors of the same type, recursively, almost like a graph decomposition? Maybe the instance itself has that structure." Right, right. That's one way of looking at it, and it could be one reason why some of these instances have small backdoors.

"Do people who build SAT solvers think about this, tweaking solvers specifically to look for really small backdoors?" Yeah, the SAT conferences are a good reference for this; they often have papers about backdoors, and about tweaking SAT solvers specifically to look for these kinds of things. But the thing is, the heuristics are already implicitly finding them in the first place. "You could still tune the heuristics toward smaller backdoors." Sure, yeah.

Yeah, I think that's it. Do you have a question for Ryan?