Is it good to go? All right, thanks for coming. I'll be talking about Pareto optimal solutions for smoothed analysts, and this is based on joint work with Ryan O'Donnell of CMU. Thanks to Shang-Hua for giving a nice introduction to why we should study Pareto optimal solutions in terms of capturing trade-offs between different objective functions. Let me actually give a different motivation for studying Pareto optimal solutions: they can be an important complexity measure for understanding the performance of algorithms. So let me give a motivating example. Suppose we have the knapsack problem: we have a vector of values and a vector of weights, and some upper bound W on the total weight that the knapsack can support. Our goal will be to solve this problem optimally, and to do so by building up optimal solutions to larger and larger sub-problems. Now what if I consider the universe of just the first i items? There are 2^i candidate solutions, one for every choice of which items to put in and which items to leave out, and each of them in turn has some total weight and total value. I'll plot these solutions in the plane, with the total value along the y-axis and W minus the total weight along the x-axis. The reason I use W minus the total weight is that throughout this talk the goal will always be to maximize objective functions: it'll be better to have larger total value, and it'll be better to have larger residual weight remaining in the knapsack. Now if these are the solutions we get from looking at the entire universe, all 2^i candidate solutions, then there are going to be some solutions which are clearly dominated by others. For example, take this solution right here: there's another solution up and to the right, so there's another solution which has larger total value and smaller total weight. In this case we'll call the original solution not Pareto optimal; a solution is not Pareto optimal if there's any other candidate solution that's simultaneously better on both objectives. Conversely, if the region up and to the right is empty, if there's no other candidate solution better on both objective functions, we'll call the point Pareto optimal. The full set of such points will be the Pareto curve, and in the specific case where I'm considering all the solutions on the first i items, I'll refer to it as the i-th Pareto curve. So now the goal will be to solve the knapsack problem exactly by building up Pareto curves for larger and larger sub-problems, and the size of the largest Pareto curve we encounter will be a measure of the runtime of this algorithm; it'll be the main bottleneck. So suppose we were trying to build up the (i+1)-st Pareto curve from the i-th Pareto curve. If we extend our universe to include the (i+1)-st item, there are new solutions we have to consider. Every solution on the first i+1 items comes from a solution on the first i items together with a choice of whether or not to include the (i+1)-st item. So all the new points we get come from an old point and a shift left and up: for each old solution, if we don't include the item, its total weight and total value stay put, but if we include the item, then we shift the point left by the weight of the (i+1)-st item and up by its value. So you can actually check what happens from this picture.
Unfortunately the gray isn't coming out too well, but take this point that was not Pareto optimal: after the shift left and up, the shifted point will also not be Pareto optimal. You can see in this figure that there's another candidate solution occupying the region up and to the right. And in general this will be true: for any point that we've thrown out in forming the i-th Pareto curve, we don't have to worry about the new solutions that arise from it; those solutions will also be irrelevant. So this point is not Pareto optimal. There can also be cases where a point was Pareto optimal on the i-th curve but is no longer Pareto optimal on the (i+1)-st curve; when we encounter these solutions, we can throw those out too. In general, the set of remaining points will be exactly the (i+1)-st Pareto curve. So the approach is to build up these curves one by one. This algorithm, by the way, is sometimes called dynamic programming with lists; in the specific case of knapsack it's the Nemhauser-Ullmann algorithm. What I was hoping to get across in the previous discussion is that the (i+1)-st Pareto curve can be computed from the i-th one, and if you implement this intelligently, it can be done in linear time: if you maintain these Pareto curves as sorted lists, then you can compute the next Pareto curve just by an appropriate merge of two lists. So PO_{i+1} can be computed in time linear in the size of the i-th Pareto curve. And there's a relatively trivial observation that finding the n-th Pareto curve actually solves exactly the problem we're interested in: if we had the n-th Pareto curve, we could compute an exactly optimal solution to the knapsack problem. All we'd have to do is check all the candidate solutions on the n-th Pareto curve and find the one with the largest total value that still has non-negative residual weight left over. This algorithm is actually used in practice in operations research, even though it's exponential in the worst case, because the sizes of these Pareto curves can blow up exponentially. In fact, this is a fairly typical example of an algorithm that, as an intermediate stage, enumerates the set of Pareto optimal solutions, and that enumeration is the main bottleneck in the algorithm: understanding whether the algorithm performs well in practice is equivalent to understanding how large these Pareto curves can be. This was used by Beier and Vöcking, who first showed that in the model of smoothed analysis, even though the number of Pareto optimal solutions can be exponentially large in the worst case, it's actually polynomially bounded in two dimensions. I won't get into the formal model just yet because it's a little notation heavy, but an immediate consequence is that knapsack can be solved in smoothed polynomial time, just by the algorithm I explained on the previous two slides. And interestingly enough, this is actually the first example of a problem that's NP-hard and yet is easy in the model of smoothed analysis. Moreover, their results even generalize a long line of results on random instances. For a long time, people had studied things like what happens if, in a knapsack instance, the weights are distributed uniformly at random on some interval.
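Before going on, here is a minimal sketch of the list-based dynamic program just described. It follows the talk's convention of tracking (total value, residual weight = W minus total weight) pairs and filtering for feasibility only at the end; the function name and the re-sorting shortcut are my own, and maintaining the curves as sorted lists and merging them is what would give the linear-time update mentioned above.

```python
def nemhauser_ullmann(values, weights, W):
    """Exact 0/1 knapsack via Pareto curves of (total value, residual weight)
    pairs, in the spirit of the list-based dynamic program described above."""
    curve = [(0, W)]                                   # PO_0: just the empty solution
    for v, w in zip(values, weights):
        # New candidates: every old solution, with and without the next item.
        candidates = curve + [(val + v, res - w) for val, res in curve]
        # Prune dominated points: sort by residual weight (descending, ties by
        # value descending) and keep a point only if its value beats everything
        # seen so far.  Merging two already-sorted lists instead of re-sorting
        # is what makes this step linear time in practice.
        candidates.sort(key=lambda p: (-p[1], -p[0]))
        curve, best = [], float("-inf")
        for val, res in candidates:
            if val > best:
                curve.append((val, res))
                best = val
    # The optimum is the largest value on PO_n with non-negative residual weight.
    return max(val for val, res in curve if res >= 0)
```

For example, nemhauser_ullmann([3, 4, 5], [2, 3, 4], 5) returns 7, the best value that fits within capacity 5.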
Coming back to those random instances: can I solve such an instance exactly in expected polynomial time? That becomes a very difficult probability question; even understanding what the optimal solution value looks like is hard. Yet here they were able to generalize all of this work on specific families of random instances, and show that as long as the density functions of the weights and values are themselves bounded, so they're never too concentrated, the Nemhauser-Ullmann algorithm runs in expected polynomial time. So let me now formally introduce the model, and then I'll start to talk about what the previous results are and where we can go from there. The model will be a little notation heavy, because it's a generalization of the one I described for the knapsack case, and all these little bells and whistles will be important for various applications. But for now you can imagine a model where an adversary is trying to force many Pareto optimal solutions in expectation. The adversary gets to choose quite a few things. The first thing the adversary gets to choose is the set of candidate solutions. This was easy to describe in the case of knapsack because it was just all of {0,1}^n: every choice of which items to put in the knapsack and which ones to leave out. More generally, the adversary can choose combinatorially interesting families: for example, if n were the number of edges in some graph, then the adversary could just as well choose this set to be the sets of edges that form a spanning tree of the graph, or even the sets of edges that form Hamiltonian cycles. The adversary is then interested in choosing d objective functions and trying to force many Pareto optimal solutions in expectation. The first objective function the adversary is allowed to choose adversarially, and it's not perturbed at all. Even in the case of knapsack, the fact that you're allowed to choose one objective function adversarially and leave it unperturbed means that it's enough to perturb either the weights or the values in a knapsack instance, and the algorithm still runs in smoothed polynomial time. The remaining objective functions will be linear: the adversary chooses d-1 linear objective functions mapping {0,1}^n to R. And in fact the adversary even gets to choose the distributions of the coefficients; it's not that he picks a value which is then perturbed by a Gaussian. He's allowed an arbitrary perturbation where each coefficient is just a random variable on the range [-1, 1] whose density function is bounded by φ. This level of generality is important in contexts where you want to consider more than just Gaussian perturbations, for things like the applications to mechanism design I'll allude to later. In any case, this is the model. I apologize for all the notation, but it's good to get it out of the way. Are there any questions I can answer now? Okay, there's a question. No, the first objective can be arbitrary and nonlinear; that's what allows it to encode a lot of combinatorial structure. A lot of the bells and whistles here are what allow you to move to models which are somewhat more realistic about how you might measure data, and allowing one arbitrary objective gives a lot more freedom in the expressive power of the adversary. Okay.
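To pin down the notation, here is a minimal sketch of what one draw from this model looks like. All of the names are my own illustration, and the uniform perturbation at the end is just one example of a φ-bounded density on [-1, 1], not the only one the adversary may use.

```python
import random

def smoothed_instance(solutions, adversarial_obj, coeff_samplers):
    """One draw from the smoothed model sketched above.

    solutions:        the adversary's candidate set S, a subset of {0,1}^n,
                      e.g. indicator vectors of spanning trees or tours.
    adversarial_obj:  the one unperturbed objective; it may be an arbitrary,
                      even nonlinear, function of the solution.
    coeff_samplers:   for each of the d-1 remaining linear objectives, one
                      sampler per coordinate; each sampler must return a value
                      in [-1, 1] drawn from a density bounded by phi.
    """
    coeffs = [[draw() for draw in row] for row in coeff_samplers]
    scored = []
    for x in solutions:
        objs = (adversarial_obj(x),) + tuple(
            sum(c * xi for c, xi in zip(row, x)) for row in coeffs)
        scored.append((x, objs))  # all d objectives are to be maximized
    return scored

# One valid phi-bounded density: uniform on an interval of width 1/phi, assuming
# the adversary's chosen center keeps the whole interval inside [-1, 1].
def uniform_phi_bounded(center, phi):
    return lambda: random.uniform(center - 0.5 / phi, center + 0.5 / phi)
```

For a single perturbed objective (d = 2), coeff_samplers would just be a list containing one list of n such samplers.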
So if I let PO be the set of Pareto optimal solutions: as I mentioned, Beier and Vöcking showed that in the two-dimensional case the expected number of Pareto optimal solutions is polynomially bounded, and this is what had implications for knapsack in particular. This was later improved to a tight analysis by Beier, Röglin, and Vöcking, who showed matching upper and lower bounds: the answer is essentially n^2. An adversary can force n^2 Pareto optimal solutions in expectation, and he really can't force more than that. Phi? Phi is just the bound on the density functions, that's right. Röglin and Teng then showed a very strong generalization: for any constant number of objective functions, the expected number of Pareto optimal solutions is still polynomially bounded. So even if you want to solve multi-objective optimization problems, where you want to find multi-dimensional Pareto curves, you can still do so efficiently in the smoothed model for a constant number of objective functions. But the dependence on the number of objective functions was not so great: the exponent of the polynomial itself depended exponentially on d. A lot of bounds in smoothed analysis can be somewhat slack about what the polynomial actually is, so the focus of this talk will be on getting a much tighter analysis of this, through what I'd call an interesting approach. One other result I want to mention in this line is that a recent paper of Dughmi and Roughgarden used the results in this line, not exactly the bound on the number of Pareto optimal solutions but a stronger property, to show a black-box reduction in mechanism design: any FPTAS can be transformed into a truthful-in-expectation FPTAS. As for our results, we end up getting a much stronger bound on the number of Pareto optimal solutions: we show that the exponent's dependence on d can be improved to linear in d. It's a somewhat technical proof, so I'll just say something somewhat cryptic for now and get into why we're able to make it work later. The key in all of these previous works is in choosing the right definitions: you choose the right definitions for the random events you analyze and the whole thing becomes easy. The difficulty is that the definitions become very convoluted, even for the case of dimension 2, and as you go up they become even more complicated, and you get worse and worse bounds on the exponent. So part of our contribution is finding a more principled way to get the definitions: we do it implicitly, through an algorithm that constructs the family of events that we analyze. I'll explain that in more detail later; I'm deliberately being a bit cryptic right now. I'll also mention that this answers a recent conjecture of Teng. And just to compare and contrast this with a distributional model: what if I took all the power out of the adversary? Instead of having all this combinatorial structure that allows me to capture things like perturbations of knapsack instances, what if I just asked what happens when you take the same number of points as uncorrelated random samples from a d-dimensional Gaussian? There's no combinatorial structure coming from the 0-1 vectors over the different items; instead I just hit the sample button 2^n times. It's been known since the 70s that the expected number of Pareto optimal solutions is then on the order of n^(d-1).
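Just to make the comparison tangible, here is a minimal sketch of that purely distributional experiment: sample points i.i.d. from a d-dimensional Gaussian and count the ones that are not dominated. This is only a small-scale simulation (the talk's setting has 2^n sample points), and the function names are mine.

```python
import random

def count_pareto_optima(points):
    """Count the maximal points: those not strictly dominated in every coordinate."""
    def dominated(p, q):
        return all(qi > pi for pi, qi in zip(p, q))   # q beats p on every objective
    return sum(1 for p in points if not any(dominated(p, q) for q in points))

def gaussian_experiment(num_points, d, trials=20):
    """Average number of Pareto optima among num_points i.i.d. Gaussian samples in R^d."""
    total = 0
    for _ in range(trials):
        pts = [tuple(random.gauss(0, 1) for _ in range(d)) for _ in range(num_points)]
        total += count_pareto_optima(pts)
    return total / trials
```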
So what our result shows is that the smoothed bound is only a square factor off from this purely distributional model, where the adversary has no power and everything is Gaussian distributed. The strange thing is that for the case of d = 2, if you recall, I said that in the smoothed analysis model the matching upper and lower bounds were n^2, while here in Gaussian land the answer is n. So the square factor is already necessary in two dimensions, and in fact this is why n^(2d-2) was conjectured to be exactly the right answer all the way up, although no lower bound better than the Gaussian case is known right now. Are there any questions? Okay, I'm not going to be able to explain entirely how this result goes through, but to make some of my statements a little less cryptic, we have to get our hands a little dirty. My goal is to explain some of the previous proofs, the Beier-Röglin-Vöcking proof, and then to give a reinterpretation of that result. I'll only work with dimension 2 until the very end of the talk. The first thing I want is a slightly easier characterization of when a point is Pareto optimal. Imagine I take all my candidate solutions, each with some objective-1 value and some objective-2 value, and I start all the way on the right, from the point with the largest objective-1 value, and sweep from right to left. As I encounter points, when will they be Pareto optimal? Certainly the first point I encounter will be Pareto optimal: it has the largest objective-1 value of all points. And any subsequent point that I encounter as I sweep from right to left will be Pareto optimal if and only if it lands above the highest point I've seen so far. So for example, this point is not Pareto optimal, it doesn't cross this barrier; this point is Pareto optimal, it crosses the barrier and causes the barrier to move up; and now this point is not Pareto optimal, because it doesn't cross the new barrier when we encounter it, there is another point up and to the right. So let me work with this condition now and explain the general pattern. I mentioned that all the work in these previous papers is in the definitions. At a high level, our goal is to count the expected number of Pareto optimal solutions, and the pattern these papers use is to first define a complete family of events: a family with the property that every time there's a Pareto optimal solution, there's some unique event that I can blame, some unique event that occurs because of that Pareto optimal solution. If we have this kind of completeness condition, then bounding the expected number of events that occur bounds the expected number of Pareto optimal solutions. But that by itself is a rather easy condition to satisfy: I could, for example, choose the complete family of events that, for every possible solution, just asks whether that solution is Pareto optimal. That's certainly a complete family, but it's useless to us, because it doesn't make the task of bounding the chance that each event occurs any easier. That's what I mean when I say all the work is actually in the definitions; they will become quite convoluted, just for the purpose of making this last step easy to do.
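Before getting into those events, here is the right-to-left sweep from a moment ago written out as a minimal sketch (the name is mine): sort the points by decreasing objective-1 value and keep exactly those that rise above the highest objective-2 value, the barrier, seen so far. Under continuous perturbations, ties occur with probability zero, so I don't worry about them here.

```python
def pareto_curve_by_sweep(points):
    """points: (objective 1, objective 2) pairs, both to be maximized.
    Returns the Pareto optimal points: sweeping from the largest objective-1
    value to the smallest, a point is Pareto optimal exactly when it lands
    above the highest barrier seen so far."""
    barrier = float("-inf")
    curve = []
    for obj1, obj2 in sorted(points, key=lambda p: -p[0]):
        if obj2 > barrier:              # the region up and to the right is empty
            curve.append((obj1, obj2))
            barrier = obj2              # the barrier moves up
    return curve
```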
So now let me explain the events which Beier, Röglin, and Vöcking use for the two-dimensional case; these are the easiest events to explain. Suppose we have some Pareto optimal solution x. We want to define an event based on something that must have happened if x is Pareto optimal. We can imagine that as we swept from right to left, when we encountered x, it had to be above everything else we'd seen so far. And similarly, the common case is that if we continue this sweep starting at x, at some point we'll encounter a point even above x: we expect to encounter some point y that's above x, because if not, x has a unique identity, it has the largest objective-2 value of any point, and I won't worry about that case. So in the common case I can take y to be the first point I encounter, sweeping from right to left starting at x, that's above x. This region in the plane between x and y should then be empty. But now x and y are different solutions in {0,1}^n, so they have to differ in some index; let's say it's index i, and for simplicity let's say x_i = 1 and y_i = 0. So this is almost the picture of what the event looks like. There's one more technical detail, which you don't need to worry about if you don't want to: we can draw some interval around x which, if we draw it narrow enough, won't contain y; y will be above it, x will be inside the interval, and the rest of the interval will be empty. You should think of this interval as becoming very, very small, because ultimately we want to perform an integral to understand the expected number of Pareto optimal solutions. But now this is actually a complete family of events. Let me try to formalize this; the event is just that what I drew on the last slide actually occurs. The event is indexed by some interval I of width epsilon, and the event we're interested in is that there's some Pareto optimal solution x that lands inside the interval; moreover, if we let y be as in the picture, the first point above the interval when we sweep from right to left starting at x, then y and x have to disagree on this index i: y_i has to equal 0 and x_i has to equal 1. So you see already that even in the case of two dimensions, in order to make this analysis work, I need events with this conditional chain of things that happen: y is the first point which is above this interval, and so on. But the virtue of choosing this convoluted definition is that it makes the analysis easy; it's almost an immediate consequence once you work out what the real definition is. For example, suppose we have that event E and we're interested in whether it happens. We have some interval I, and the operative questions are whether x lands in here and whether that whole chain of events happens. What we can do now is reveal all the randomness in objective 2 except for one missing random variable, and that missing random variable will still be enough for us to show that the event is unlikely. So suppose we had revealed all the other coefficients of objective 2 except for the one missing coefficient on this index i that we really care about for this event.
Well, now if we pretend for simplicity that this missing random variable is 0, then every point is mapped to some objective-2 value and some objective-1 value, and some of these points' locations in the plane are already fixed. We know that among the entire set of candidate solutions, some have their i-th coordinate equal to 0 and some have it equal to 1, and these blue points, the ones whose i-th coordinate is 0, don't actually depend on the missing coefficient in the second objective function. So with respect to the remaining randomness, the blue points are frozen in the plane. That means we know which blue points are above the interval and which are below it, and in fact we know the rightmost blue point above the interval; that point has to be y in order for our event to occur. So what we're doing is deconstructing the event to figure out who y is, and later who x is; that's the basis of the analysis. Now we know that x, if the event occurs, has to be to the right of y, so it has to be among these green points. Conceivably we might be worried about any one of these green points landing inside the interval, but it turns out that only one green point can actually cause our event to occur, because all the green points, and the black points too, essentially all the non-blue points, move up together with this missing random variable. So suppose I sample this missing coefficient and all the green and black points move up together; what happens if some other Pareto optimal solution lands inside the interval? Well, even though there is a Pareto optimal solution inside the interval, we're not worried about it, because the highest green point is then above the interval, and by design it agrees with that solution on the i-th coordinate. So even though there is a Pareto optimal solution inside this interval, we should be blaming it on a different event, one indexed by some other coordinate on which those two points differ. So the only point which can land inside the interval and cause this event to occur is the unique green point that's highest up. I hope you can at least get the intuition behind the analysis; but now let me give a reinterpretation of what this is actually doing. One interpretation is just that there's a complete family of events and we analyze it, which sounds fine because it's a union bound. But what we're really doing is defining these events implicitly through an algorithm. There's some transcription algorithm which can take all the randomness, after it sees everything, and for every Pareto optimal solution it finds an event to blame. So there's an algorithm, it doesn't need to be efficient, which, given a Pareto optimal solution x, finds some interval that x is contained in, finds some y which is up and to the left of it, and finds some index where they differ; this algorithm spits out the description of the event. This already gives a way to see that the family of events is complete: the statement that it's complete is just that, on input any Pareto optimal solution, this algorithm outputs some event we can blame. But now the analysis is actually like a speculative execution.
Because what we're actually doing is this: given the description of the event, given the index i, the interval, and the values x_i and y_i, we're able to look at just some of the randomness, everything in objective 2 except for this one missing random variable, and we're still able to figure out which x we should be blaming; we can still recover a unique point that must land inside the interval in order for the event to occur. But since we only looked at some of the randomness, the remaining randomness we have left over is enough to show that the event is unlikely, that it's unlikely for x to land inside the interval. And that's how this pattern of analysis works: as you go to higher and higher dimensions, the events, the chain of conditions you need to verify, become much more complicated, but at the end of the day you can write down a simple algorithm which, given some Pareto optimal solution, outputs some description. The entire analysis is then that, given those clues about what occurred, another algorithm can take them as input, look at just some of the randomness, and step by step arrive at the same output as the original algorithm. So the method of analysis is really a comparison of algorithms; the algorithms themselves take maybe a page to describe, but they're much simpler than the family of events they implicitly define. The last thing I want to mention is an idea of what the family of events looks like in higher dimensions, even just in dimension 3. For three objective functions, we can imagine sweeping according to the first objective function: we take all the candidate solutions and consider them in decreasing objective-1 value order. Now, the first point we encounter is again clearly Pareto optimal; it has the largest objective-1 value. For a new point to be Pareto optimal when we encounter it, it just has to be Pareto optimal on the two-dimensional objective-2/objective-3 plane among the points seen so far. So for example, this next point is Pareto optimal when we encounter it, this next one is as well, and this point is not, because when we encounter it, it's not on this two-dimensional surface: there's another point that's better in objective 2 and objective 3 which occurred earlier, and hence has larger objective-1 value. So now, if we encounter some new Pareto optimal solution x, we need to define some notion of the common case, of what events happen in order for this point x to be Pareto optimal. In the common case we expect that there's some later y we encounter which is up and to the right of x; this again is the common case, and otherwise we're really in a lower-dimensional instance. The trick ends up being the following. First of all, x and y differ on some index i, because they're different solutions in {0,1}^n; so one of the things our algorithm will output is the index i where x and y disagree, and it will also output the values of x and y there. But we also know that, among all the points that occur before y, the ones with larger objective-1 value than y, x has to be on the two-dimensional Pareto optimal surface that I've drawn here with a dashed line.
In fact, even if we restrict to the points that differ from y on this i-th coordinate, a restricted set that includes x, then within this restricted set x is still Pareto optimal. What this means is that we can essentially recurse, because x is on this lower-dimensional Pareto optimal surface, and now all these remaining points agree on the i-th coordinate. So we can recurse to another case where we sweep along objective 2, and we know that in the common case, exactly as before, there's some later point z that's up and to the left. So what we end up outputting, for the index j that x and z differ on, is this full table: what the index i is, what the index j is, and what the values of x, y, and z are at those indices, the key agents in this entire argument. And the trick behind all of this is that there's a way to speculatively execute this: the algorithm outputs enough information that, looking at just some of the randomness, we can figure out first who y is, then who z is, and then in turn who x is, and that's how the analysis works. So that's actually all I really wanted to say about this. Are there any questions I can answer? Thanks. Oh, you seem to have a question. The question: you gave this really nice application at the beginning, the 2D case with knapsack, which seems like a real proof from the book about why, in particular, average-case versions of knapsack are fast; so what would be the killer applications if you jack up d a little, say d equal to 3? So, as long as the objective function you're actually interested in is some monotone increasing function of these different objectives, even if it's an arbitrary, really complicated nonlinear function, as long as it's monotone, the way people end up solving this is by building up the Pareto curve in higher dimensions, checking only those solutions, and outputting the best one. Knapsack was one case where the objective function was something simple, like maximize one objective subject to a threshold constraint on another, but there are other examples: Papadimitriou and Yannakakis initiated the study of approximate Pareto curves, and there they had examples where, in some web services, you had a couple of different objective functions and what you were really trying to optimize was some strange function of those. In those cases there's not much structure in the objective function other than that it's monotone, and you still have an approach, because you know that points which are dominated are still not optimal solutions. So that, I would say, is maybe the canonical sort of example for higher dimensions. Yes.