We're going to go ahead and resume with the afternoon talks. The first speaker, whom I'm pleased to introduce, is Shang-Hua Teng, a professor and chair at USC. He'll talk about smoothed analysis over the last half decade or so.

I think I have a loud voice now, right? So, Tim, thanks a lot for the wonderful workshop; it's terrific. This is actually the first time Dan Spielman and I have given talks on smoothed analysis at the same conference within the span of three days. I'm very happy that he covered most of the technical components, especially those surrounding numerical analysis, so here I would like to focus on some of the recent studies of multi-objective optimization, machine learning, and game-theoretic problems from the angle, and through the lens, of smoothed analysis. I have personally enjoyed this series of studies greatly, in part because I was able to work with real domain experts on subjects about which, at the beginning of the investigation, I literally knew nothing, and I'm very glad for that. My colleague Heiko taught me about multi-objective optimization; Adam and Alex, who is sitting somewhere here, taught me about machine learning; Xi Chen and Xiaotie Deng taught me about game theory. And of course, throughout, Spielman has been teaching me everything.

To continue the theme of this workshop, I would like to start a little slower, to give us some chance for discussion and to give you some chance to get used to my accent, so that hopefully, by the time I reach the technical part, you won't have to correct my English that much anymore.

Clearly, modeling the practical behavior of algorithms, and also the practical difficulty of problems, has been a challenging problem. For example, this paragraph comes from an NSF report written by leaders of my field, theoretical computer science, in 1999, and it beautifully illustrates the challenge. It also highlights the simplex algorithm and simulated annealing as two practical algorithms that could potentially inspire more rigorous analysis from the theoretical angle, and it highlights in blue the importance of experimental work to theoretical modeling and development. It's a very beautifully written paragraph, and the document is equally nicely written.

The linear programming mentioned in that paragraph is something we are all quite familiar with: we seek to maximize or minimize a linear objective function subject to a set of linear constraints, Ax ≤ b. When the linear constraints are feasible, the feasible region is a convex set. The standard simplex algorithm essentially has two phases. In the first phase, it tries to determine whether the region is infeasible, or unbounded in the direction of optimization, in which case there are infinitely good solutions. If the program is neither unbounded nor infeasible, it returns an initial solution on the exterior of the polytope, a vertex. In the second phase, it applies a standard local-improvement, or local-search, procedure and keeps improving the objective function. The nice thing is that, due to convexity and linearity, this local search always ends at a globally optimal solution. This approach was proposed more than half a century ago and has been heavily used in the financial world and in industry.
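As an aside, here is a minimal sketch, not from the talk, of the kind of linear program being described, solved with an off-the-shelf LP routine (SciPy's linprog is assumed to be available; the toy numbers are made up purely for illustration):

```python
import numpy as np
from scipy.optimize import linprog

# Toy linear program: maximize c^T x subject to A x <= b, x >= 0.
# linprog minimizes, so we pass -c in order to maximize c^T x.
c = np.array([3.0, 2.0])
A = np.array([[1.0, 1.0],
              [2.0, 1.0]])
b = np.array([4.0, 5.0])

# Phase 1 (feasibility / unboundedness) and Phase 2 (vertex-to-vertex
# local improvement) are handled internally by the solver.
res = linprog(-c, A_ub=A, b_ub=b,
              bounds=[(0, None), (0, None)], method="highs")

if res.status == 2:
    print("infeasible")
elif res.status == 3:
    print("unbounded in the direction of optimization")
else:
    print("optimal vertex:", res.x, "objective:", c @ res.x)
```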
The simplex method has many interesting theoretical features. For example, in the 1970s a series of works showed that essentially every pivoting rule proposed for the simplex algorithm runs in exponential time in the worst case: for almost every design, some family of worst-case linear programs defeats polynomial convergence. In the early 1980s, by assuming, for example, that the matrix A is drawn from a rotationally symmetric distribution such as a Gaussian, average-case analyses were conducted to show that on such random linear programs the simplex algorithm, or at least some variants of it, runs in polynomial time. So it has these interesting features: the worst-case behavior, the practical behavior, and the average-case behavior.

One of the main challenges in modeling the practical performance of algorithms is data modeling. How do we model data? How do we model real data? This seemingly simple question, where everybody feels "I understand my data," becomes extremely challenging, and often impossible, the moment you try to formalize it. Part of the reason is that most algorithms are designed to handle many inputs, not just individual pieces of data, whereas individual users are interested in the performance, whether running time or solution quality, on the data that they actually encounter. This discrepancy is often the source of the difficulty. More than that, the distribution, or the subset of instances that occur, often varies from user to user. We Chinese have an idiom, zhòng kǒu nán tiáo, meaning it's hard to cook for many: it's hard to design a dish that will make many mouths happy. Here it is essentially the same scenario: there are so many individuals, so many angles, and theoretically, how do we come up with one measure that makes everyone happy? It's virtually impossible.

So traditionally, theoreticians recognized that it's virtually impossible to make everybody happy and said: let's just choose one objective that we can actually carry out, and use it to understand, or at least compare, algorithms. That is worst-case analysis. Worst-case analysis has many wonderful features. One is that if the measure is good, it gives an absolute guarantee: it doesn't matter what input you receive. But another feature is that it can be conducted without understanding the data at all, precisely because it is a universal guarantee. It's like a tourist who comes here, wants to go to Yosemite, and asks, "How much elevation gain should I prepare for?" and you answer, "29,000 feet," because the highest peak on Earth is clearly a valid upper bound. The person says, "Oh good, I'm also going to Oregon; how about Mount Hood?" You say, "29,000 feet." It's an absolute guarantee, but you never needed to care about what they actually need for their particular instance.

To overcome the fact that worst-case instances can be pathological or rarely occur in practice, a variety of average-case analyses have been introduced. In this traditional scheme of average-case analysis, we define a distribution over inputs and conduct, for example, an expected-case analysis.
The challenge there, again, is how to design a meaningful distribution that (a) is amenable to mathematical analysis and (b) is close to practice. This trade-off is very challenging; again, it's hard to cook for many.

Our colleagues in numerical analysis and in optimization and operations research, as Dan's talk illustrated, often conduct a slight variation of this analysis that is more instance-based. In particular, they often study the dependency of performance on something called a condition number: you give the input a quality measure, and you relate that quality measure to the performance. For example, the precision needed for Gaussian elimination depends essentially on the logarithm of the condition numbers of the principal minors you encounter during the elimination. The conjugate gradient method for solving symmetric positive definite systems needs a number of iterations that depends on the square root of the condition number of the matrix. Interior-point methods, at least some versions, depend logarithmically on a condition number defined for the linear program. And for the perceptron algorithm, the number of iterations depends on a polynomial in one over the wiggle room, the margin. So they have this kind of instance-based quantity that allows one to measure performance, very much a one-parameter refinement of the traditional complexity.

But in theoretical computer science we also go far beyond worst-case analysis, and not just in the last few years: way back in the 1960s and 70s there were already several studies built on something other than pure worst-case analysis. I will broadly classify them as property-based analysis. One of my favorite results in this setting is the beautiful result of Lipton, Rose, and Tarjan: they proved that if the underlying graph of your linear system is planar, then you can solve the system in time essentially n to the power 1.5, even though in the worst case Gaussian elimination may take close to cubic time. So they said: if your linear system comes from, for example, a two-dimensional simulation or two-dimensional graphics, most likely it can be solved much faster. That is a property of the input. Inspired by that, Gary Miller and his group, which I was part of, produced many results that essentially say that if the graph comes from a finite element simulation, then certain performance can be achieved. That is again a property. In that context I just want to highlight one result: we were able to show that even though in high dimensions the Delaunay triangulation, this beautiful object in geometry, can have size n to the power d over 2 and be correspondingly slow to compute, if you know the points come from well-shaped numerical simulation data, then in fact there is a linear-time algorithm to construct the Delaunay triangulation; in fact it takes constant time if you have n processors. Everything really becomes much faster.

Clearly there is much more work in this vein, which I'm not going to enumerate, exploiting special properties such as power-law graphs or expanders. We heard, for example, a discussion about doing competitive analysis assuming the data comes from a certain Markov model. And in this workshop we have also seen several further extensions that quantify complexity based on properties of the input and of the solution.
If the solution and the input have certain properties, then by using those properties one can design faster algorithms. For example, our first talk, by Avrim, discussed how, if the output has certain properties, you can compute it faster.

So, back to smoothed analysis. In smoothed analysis we also combine, either explicitly or implicitly, this notion of property-based analysis, and we happen to use one particular property that I would like to give at least a quick intuition for, because it underlies our whole framework. Basically, we began by looking at the data model, and fundamentally we try to exploit what we call the simultaneous certainty-and-uncertainty property, or the imprecision property. We then combine this property with the traditional frameworks, worst case and average case, to design a measure.

To illustrate the property, let me give one example. We have at least one person here from IBM: what is the IBM stock price today? 173. OK, 173. If I ask you tomorrow what the IBM stock price is, it's probably not 173 anymore. But if I say the IBM stock price is zero, you immediately say, no, that cannot be. If I say the IBM stock price is a pencil, you say that cannot be either. But if I say that tomorrow IBM stock will be 200, is that impossible? It's possible. So if you ask "what is the IBM stock price?", the answer is really a curve: there is simultaneous certainty and uncertainty. If I say zero, you don't believe it; if I say 200, you say it's possible. We are not talking about a single number; the input is not a single point.

One possible way to think about this is that the IBM stock price is defined by two parts: one is the intrinsic business value of the company, and the second is that when the market takes a measurement, the market induces imprecision. The imprecision comes from the market, and the reason the price is not zero is that there is an intrinsic business value. If you ask many people whether "IBM stock equals a number plus noise" sounds right, most people say, yeah, that sounds about right, because the market makes a measurement. In many ways, much practical data comes from measurements, and there is imprecision in the measurement, but there is also a fundamental value underneath. So, as a first cut, a first-order approximation, one can often write a piece of practical data as an intrinsic value plus potential noise. Depending on the instrument or the measurement, the magnitude of the noise may vary; markets typically carry, I think, 5% to 10% noise, some days 2%, some days maybe 50%, but it is mostly much smaller than the intrinsic value.

So essentially, smoothed analysis is designed around this imprecision property: we design a measure assuming the input satisfies this particular model. I borrowed this slide from Spielman's talk, illustrating that we basically define a continuous measure, parametrized both by the input size and by the magnitude of the noise, and the formula, as a schema, combines both worst case and average case. In the grand scheme, we want to allow the intrinsic part to be arbitrary: we don't get to define IBM's business value, it happens to be what it is, and if we want to make an investment we have to take that value essentially as it is.
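For concreteness, the schema just described is usually written along the following lines (a sketch of the standard definition, not copied from the slides): the maximum is taken over the intrinsic part of the data, and the expectation over the measurement noise.

```latex
% T_A(x) is the running time of algorithm A on input x; g is the random
% perturbation (e.g., Gaussian) of relative magnitude sigma, and the
% intrinsic part is normalized to norm at most 1 (scale invariance).
\[
  \mathrm{Smoothed}_A(n,\sigma)
    \;=\; \max_{\bar{x}\,:\,\|\bar{x}\|\le 1}\;
          \mathbb{E}_{g}\bigl[\,T_A(\bar{x} + \sigma g)\,\bigr].
\]
% Worst case is recovered as sigma -> 0; average case as sigma grows.
% The simplex result mentioned below has the shape
\[
  \mathbb{E}\bigl[\,T_{\mathrm{simplex}}(\bar{A} + \sigma G)\,\bigr]
    \;\le\; \mathrm{poly}\!\left(n,\, d,\, \tfrac{1}{\sigma}\right).
\]
```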
But the market does give us a small degree of imprecision. So we consider this continuous landscape of measurements, and if the measure is small, it basically indicates that in the neighborhood of every input, the performance is good on average, which means that if practical data has a little bit of noise, then it is very difficult to ever see a worst-case example. (Why can we talk about relative noise? Because, very often, for the type of problems we study, the data has an invariance when you scale it. There are many refinements one can add.)

In this context, here is the slide that Dan actually didn't put up; he said he wished he had it, so I put it in for him. We were able to show, for the particular case of linear programming and for some variant of the simplex algorithm, that if your linear program comes from measurements, an intrinsic part plus a noisy part, then for every intrinsic part, as long as you have noise, in expectation the simplex algorithm runs in polynomial time as a function of the input size and of one over the magnitude of the noise. In this talk I would like to discuss some recent studies that extend this line of work to a few other areas.

(From the audience: why is it 1 over σ in your bound; why not, for instance, log of 1 over σ?) There are two answers. One is a mathematical fact: sometimes this is the best possible bound you can prove, while for some other schemes the dependency is indeed logarithmic in the imprecision; it varies from problem to problem, and I will later present an even more challenging problem where you cannot achieve even this. But in general, if you think about data representation, we typically use single or double precision, our input sizes are in the billions, a few hundred million at least, and we normally use about 100 bits, which is roughly a constant times log n. That tells you the relevant noise level, and for such noise a result of this form does imply polynomial smoothed complexity.

I would like to use each of these examples to highlight certain facets of smoothed analysis. In the machine learning case, I would like to discuss where we put the noise, and in multi-objective optimization I would like to talk about partial perturbation models.

Machine learning is an area where probably many of you know far more than I do; I never took any classes in machine learning, and Adam and Alex gave me quick lessons on the subject. This year's Turing Award winner, Valiant, introduced the learning framework that says, essentially: if we receive labeled data drawn from some distribution, we would like to construct an estimator which, hopefully, is accurate enough to predict future data drawn from the same distribution. That's the high-level idea of the scheme. Learning is quantified in terms of the algorithm and the data it collects, and the complexity is basically how much data you need to see in order to become accurate enough. For example, we often say a polynomial-time learning algorithm is one that, by looking at a polynomial amount of data drawn from this distribution, gets reasonably close, say 99% close, to the target, in the sense of predicting the future values of the function with small disagreement. There are many subtleties in this framework.
For example, in Valiant's original framework, he assumed that the target function comes from a certain family, and you also learn a hypothesis from that family: say the target comes from decision trees and you learn a decision tree, or some other representation. Since then the framework has been extended. One particular extension is called agnostic learning: you don't even assume the target function comes from a particular family, but you are learning a hypothesis from a particular family and you want to do the best possible within that family. The language is very similar: you receive data, you try to build a classifier, and you want it to be as accurate as possible.

When I began to talk with Adam, he basically explained the following. From an information perspective, if you see many, many, many examples, you can get arbitrarily close; the real challenge is the trade-off between the computational limitation, how much computation you can afford, and how accurately you want to predict. It's a fundamental trade-off. If you ignore computation, then by seeing a lot of data you can become very accurate; but once we limit computation, he pointed out that even for the following simple example we do not know a polynomial-time algorithm. Imagine a Boolean function that is just the exclusive-or, the parity, of a few variables, and suppose you are not told which variables feature in the function. Say only log n of the n variables matter: Tim chooses the log n variables he really cares about and forms this exclusive-or function, and suppose the distribution over examples is just uniform. Tim says: figure out which log n variables I chose; if you find all my variables, I'll invite you to dinner. Adam said that, so far, the best known algorithm takes time roughly n to the power log n, not polynomial time, even though the distribution is as simple as uniform and the function is as simple as an exclusive-or of log n variables. This is called the junta problem, and he mentioned that many fairly simple-looking functions are not easy to learn in this framework.

Adam drew the following slide, not me, making the point that, in the view of Valiant and Adam, children learn: we have kids, we watch them, they look at very few examples and they learn beautifully. I worked with one of Les Valiant's kids, Paul Valiant, when he was in high school, and I can clearly testify that he learned very quickly, in constant time. And yet the junta problem we still cannot solve in polynomial time.

In this context, Alex, Adam, and I began to look at some partial explanations. I'm not sure we got everything right, or have the full intuition about this problem, but we feel there is at least one particular result, however small, that could be quite interesting. We considered the setting where the distribution has some fundamental noise: the concept itself is exact, the concept is the concept, but the distribution of the data has a little bit of noise. Currently, for technical reasons, we can only handle so-called product distributions: you have n Boolean variables, each variable is independent, and each has its own mean. For example, the uniform distribution is the product distribution with means one half, one half, one half; you flip n perfect coins.
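As an aside, here is a minimal sketch, not from the talk, of the parity/junta setup just described; the parameters and names are made up for illustration, and the uniform distribution used here is exactly the "n perfect coins" product distribution.

```python
import numpy as np

# Hidden junta: k ~ log2(n) secret relevant variables; labels are their
# parity (XOR); examples are drawn from a product distribution over {0,1}^n.
rng = np.random.default_rng(0)

n = 256
k = int(np.log2(n))                              # size of the hidden junta
hidden = rng.choice(n, size=k, replace=False)    # the secret relevant variables

def draw_examples(m, means=None):
    """Draw m labeled examples; uniform distribution if means is None."""
    p = np.full(n, 0.5) if means is None else means
    X = (rng.random((m, n)) < p).astype(int)
    y = X[:, hidden].sum(axis=1) % 2             # parity of the hidden variables
    return X, y

X, y = draw_examples(10_000)
# A naive learner that enumerates candidate juntas must try on the order of
# C(n, k) ~ n^{log n} subsets, which is the bound quoted above.
```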
Sometimes, instead, your distribution might have means of one third, 25%, 75%, one half, and so on; it is still a product distribution. So we are considering the following framework; I'll put the result up, but let me explain it more intuitively. Imagine there is an underlying product distribution where each variable has its own mean, mu_1, mu_2, all the way to mu_n; they could all be one half. Then you go to a factory in China and say, could you produce coins for me that flip heads and tails with exactly these probabilities? The coins come back with 1% error, and that becomes your actual distribution, because when you really generate data, you can only use the coins you have.

So what we are saying is: imagine that the distribution itself, here the product distribution, has some small fraction of imprecision. Then we are able to show, for example, that all decision trees, including the junta problem I illustrated earlier, can in fact be learned in polynomial time, assuming the adversary does not hold a perfect coin; that is, assuming the distribution generating the data is slightly perturbed.

Naturally, many steps of the proof go through basic technical machinery; it took me a long time to learn it, even though Adam promised me these are just basic, trivial things that everybody learns in machine learning. I will skip those, but I want to mention one highlight, a lemma that in some sense drives this whole family of results, and which Dan also hinted at in his talk, although he didn't use the phrase "non-concentration bound". In smoothed analysis we often prove so-called non-concentration bounds: instead of proving that things concentrate, we prove that they do not; namely, we want to prove that bad events do not concentrate. Here is a very simple non-concentration bound that you need if you want to read through the analysis of at least our sequence of works. Imagine you have a multilinear polynomial of degree d, and suppose you know that one of the leading degree-d terms has a large enough coefficient, say coefficient 1 on x_1 x_2 ... x_d; you don't care about the coefficients of any other terms. Then we can show, almost as an extension of the Schwartz-Zippel lemma for polynomial identity testing, that if you draw each coordinate uniformly from [-1, 1], this polynomial cannot concentrate too much near 0. And concentration at 0 is precisely the problem with the junta problem: under the uniform distribution, parity concentrates perfectly at 0 through accidental cancellation, and that is what makes it hard. If you cannot concentrate, this accidental cancellation is eliminated.

Then, thanks to the brilliance of Adam and Alex, we are able to show that this lemma yields an identification procedure: layer by layer, you can identify which variables are prominent, and then you can interpolate over them. That is essentially the learning algorithm: you recognize which variables and which terms are important, and one can show that only polynomially many terms are important.
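To make the statement concrete, here is a small numerical illustration of that non-concentration phenomenon (my own sketch under the stated assumptions, not code from the paper): a multilinear polynomial whose leading degree-d term has coefficient 1, with arbitrary other coefficients, evaluated at uniform points of [-1,1]^n, rarely lands very close to zero.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n, d, eps, trials = 8, 3, 1e-2, 200_000

# Random multilinear polynomial of degree d: arbitrary coefficients on all
# terms, except coefficient exactly 1 on the leading term x_0 * x_1 * x_2.
terms = [S for r in range(d + 1) for S in combinations(range(n), r)]
coeff = {S: rng.normal() for S in terms}
coeff[(0, 1, 2)] = 1.0

X = rng.uniform(-1.0, 1.0, size=(trials, n))
vals = np.zeros(trials)
for S, c in coeff.items():
    vals += c * (X[:, list(S)].prod(axis=1) if S else 1.0)

print("fraction of samples with |p(x)| <= eps:", np.mean(np.abs(vals) <= eps))
```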
This lemma allows you to show that, because of the noise in the distribution, if you get enough samples you can extract the relevant features, and then you can interpolate. That is essentially the outline of the proof. And, importantly, it is largely independent of which family of Boolean functions we study: as long as the product distribution has a small degree of imprecision, this learning framework can suddenly be carried out in polynomial time.

(From the audience: is there an even stronger result that would say you can handle anything away from one half?) I have a conjecture at the end, so I will come back to this issue; it is a small, rather cute technical conjecture, but it could be quite fundamental and lead to other analyses. By doing this we actually achieve many of the bounds that previously could be achieved only with membership queries, that is, with a much more powerful learning mechanism; here, in the traditional framework of passively receiving data, we are able to accomplish certain aspects of agnostic learning and so on. So that is this sequence of work, and I have mentioned only the first part of the results. Any other questions up to here, before I switch gears a little?

On Monday, when Dan was giving his talk, he said that there are clearly many limitations of smoothed analysis, in part because it is just a first-order abstraction of a phenomenon. Naturally, in order to be more powerful as a mathematical tool, we would like a better understanding of what happens when the perturbations are more limited, but in many settings we have not been able to carry that through. For example, one thing he really wanted is zero-preserving perturbations, and so far we have had very limited success there. Here I will talk about one particular area where a partial perturbation is sufficient to derive a polynomial bound: multi-objective optimization.

In computer science, we traditionally study optimization with a single objective; NP-hardness is essentially defined either for decision problems or for optimization problems with one objective function and a set of constraints. Linear programming, shortest paths, TSP, and so on are defined in this framework. But in practice we often consider more than one objective function; that is just the nature of our decisions. For example, if you travel from Europe or from Asia to here, when you buy tickets there are multiple parameters: you want to minimize cost, minimize delay, minimize the number of hops. In train-ticket planning you sometimes do the same. When you do routing on a network, you may simultaneously want to minimize the length of the path and maximize the chance that the message gets through, the reliability, for example. Very often we consider multiple objectives, and this is called multi-objective optimization. In the abstract, we have constraints and several objective functions we want to optimize simultaneously. The question is how to do this, and it is often very difficult, because the objective functions may not all agree: you have to make trade-offs, and very often we have very incomplete information about those trade-offs. Traditionally, this is what I would call mathematical engineering.
We try to help the decision maker simplify their task. One way is to eliminate every potential solution that they would never prefer under any circumstances, and give them a small supporting set to choose from; for example, whether you want to be a baseball player or a professor: we present very few choices. One notion here is Pareto optimality, built on dominance: if you have two objective functions you want to minimize and solution x is better than solution y in every coordinate, then under no circumstances would you prefer y over x; we say x dominates y. In general, you can collect the set of feasible solutions that are not dominated by anyone; this is called the Pareto set. Geometrically, each Pareto point eliminates other points, and together they form the so-called Pareto curve, or Pareto surface in higher dimensions (I lack the drawing power for that). So one way to help the decision maker is to output the Pareto set: even if they change their mind about the monotone function they use to combine the objectives, we know their preferred solution has not been eliminated.

The computational problem is that we clearly want to compute this set fast, and we are hoping the set is small; if the set is huge, then we are useless and they may as well start from scratch. So one of the central questions is: how large is the Pareto set? It is a tricky question, and it is very easy to design multi-objective problems with an exponential number of Pareto points: if I have two objective functions and one is the negation of the other, then clearly every feasible point is Pareto-optimal, because if you win on one you lose on the other, and there are perturbed variants of this construction as well.

In the spirit of smoothed analysis, we consider the following family of multi-objective problems, which we call binary linear optimization (some of this extends to integer optimization over a bounded domain). Suppose the feasible set S is a subset of the hypercube {0,1}^n, and I have d linear objective functions: for each objective i there is a weight vector w_i, and taking the dot product w_i · x gives the i-th objective value. I want to ask how big the Pareto set of this multi-objective problem can be. This is a fairly broad language: multi-objective TSP or multi-objective minimum spanning tree, for example, can be captured this way, because the objective is a sum over chosen edges, the Boolean variables indicate which edges you take, and the coefficients are the edge lengths or costs.

In the worst case, as I mentioned, the Pareto set can be exponential; in practice it is usually small. For example, there is nice experimental work from Germany studying train connections in the German railway system, where the trains are reliably on time, showing that you get a very small number of Pareto points even with four or five objective functions.

So we consider a partial perturbation model: imagine that the coefficients of the objective functions have small noise, but the combinatorial structure is absolutely fixed. For example, if I give you a graph and you want minimum spanning trees, I cannot perturb which subsets of edges are spanning trees.
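As a concrete illustration of the object whose size is being bounded, here is a minimal sketch (illustration only, with made-up toy data; it is not the algorithm or the bound from the talk) that enumerates a binary linear multi-objective problem and filters out the dominated solutions:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(2)
n, d = 8, 2
W = rng.uniform(0.0, 1.0, size=(d, n))          # objective coefficients w_1..w_d
feasible = [np.array(x) for x in product([0, 1], repeat=n)]  # here S = whole hypercube

def pareto_set(points):
    """Return indices of non-dominated objective vectors (minimization)."""
    keep = []
    for i, p in enumerate(points):
        dominated = any(np.all(q <= p) and np.any(q < p)
                        for j, q in enumerate(points) if j != i)
        if not dominated:
            keep.append(i)
    return keep

objs = [W @ x for x in feasible]                 # d objective values per solution
print("number of Pareto-optimal solutions:", len(pareto_set(objs)))
```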
The feasible set is absolute: we have no power to perturb the subsets. The only thing carrying noise is the edge lengths; when you measure a length, you may not be precise enough, and that is all the noise there is. So in this setting, the coefficients w_i of your linear objectives have a little noise, but the subset S of the hypercube, the feasible set, cannot be perturbed. If you want to fly from Europe to San Francisco, you have to arrive here; you cannot be perturbed into Portland.

Under this small degree of noise, we were able to show that if the number of objectives d is a constant, then the expected number of Pareto points is polynomial. I'm not going to present the proof, nor the bound itself, because this bound is the most embarrassing "constant" I have ever proved: it has a d-factorial in the exponent. Since then, Ankur Moitra and Ryan O'Donnell have greatly improved this result, and Ankur is giving his talk shortly after me, so I will leave it entirely to him to argue whether it is practical or not; his bound, at least, does not have the d-factorial.

(From the audience: which is worse, the d-factorial or the other exponent you compared it to?) Yeah, I think I'm probably more embarrassed by the other one, actually; d-factorial, at least in my head, has milder connotations. They are all about equally embarrassing. (From the audience: do you think this type of result explains why the German train schedules look the way they do, that the small number of Pareto-optimal points is because something like smoothed analysis is going on there?) I leave it to Ankur to defend and explain; even with the d-factorial I can't get anywhere close to that number. Personally, I do not have any real idea; for me it is an inspirational story so far, and maybe Ankur has deeper insight into the connection.

Now I'm entering my last part. In any such exploration, every model has limitations: limitations not only in how closely it can model practice, but also in what we can analyze, and sometimes in what remains true at all. Since we began this line of study, we have always been interested in natural problems whose worst-case instances are somehow stable under perturbation, because they reveal a different mathematical structure in the problem. So here I would like to mention some realizations of this natural phenomenon, that worst-case instances can be relatively stable, in what I will call games versus optimization, since so far I have mostly talked about optimization. This type of study eventually led some of our investigations into game theory, and to make a quick connection, and to give you a small break, let me give a short introduction in my own way: what is a game, what is optimization, and how do they relate to what I just presented?

Traditional optimization is this: we have many decision parameters, we have constraints, and we have a single authority, President Obama, who needs to set all the parameters to optimize our economy. There is an objective, there are constraints, and there are decision parameters you need to set to optimize.
As we mentioned, many studies in that setting concern the global optimum, local optima, or approximations, and some smoothed analysis has been conducted in this area; most of our investigations there led to positive results, meaning that noise did improve the solution landscape. As we move to multi-objective optimization, the scheme changes only slightly: instead of one objective function we suddenly have several, but we still have a single authority, namely President Obama, who decides all the parameters. He wants to make the US happy and California happy, and even though we may not care about Connecticut, he still has to care about Connecticut; he has to make trade-offs. This is multi-objective optimization, and Pareto optimality is one of the solution concepts in that domain; I'm sure there are other trade-off solution concepts, and Pareto is just one particular one.

And here is my view of a game: it is a natural extension of multi-objective optimization. Again we have multiple objective functions, but now we have many decision makers. Each decision maker has their own objective function, but that function depends on the decision parameters of everyone, because the California economy does depend on the economies of the other states, the country, even the world. So there is global dependency, but each individual player makes an individual choice, and hopefully, in this global setting, they optimize well enough. Nash equilibrium is essentially built on this framework, on a notion called the best response. It roughly says: imagine you are the lucky last decision maker; what would you do? Clearly, you just optimize for yourself. But if the next day another governor says, "now I want to make the last decision," the setting changes. This clearly creates dynamics, because each individual's best response resets certain parameters, and everyone else has to re-evaluate whether they want to move again. Nash fundamentally studied these dynamics and showed that some configurations are stable, in the sense that it does not matter who makes the last call: everybody says "this is good enough for me." It doesn't mean the outcome is optimal for everyone; it just means there is nothing more anyone can do unilaterally to improve their own situation. That is the intuitive notion of Nash equilibrium in the language of optimization: many individuals, many objective functions, and potential trade-offs among individuals.

In this setting, our study of smoothed analysis ran into a very natural question early on. In fact, right after Dan and I published the simplex result, we had a visitor from Duke University, John Reif, who came to MIT; I think we were in Spielman's office. He asked the following question: what is the smoothed complexity of Lemke-Howson? I don't remember Spielman's response. My response was that I did not remember the Lemke-Howson algorithm in detail; I only weakly remembered that it was related to games, but nothing more. I don't remember what Spielman said at the time; I don't even remember that. Essentially, many people have asked this question.
In their view, Lemke-Howson is a kind of simplex-like algorithm on two polytopes, rather than the simplex algorithm on one polytope, so it is a fairly natural extension of the scheme. John basically asked: if there is noise in your two-player game, does Lemke-Howson converge efficiently? That is a fairly well-defined problem.

Two-player games are the basic matrix-form games; some of this has already been introduced here. Each player has several actions, and each player has a payoff matrix whose entries depend on the actions of both players; that is why it is often called a bimatrix game: you have two matrices, and it is a complete-information game. You can define mixed strategies, that is, distributions over the actions you might take, and you can talk about expected payoff. In this context one can define Nash equilibria by a mathematical program: an equilibrium is a pair of distributions such that neither player can unilaterally change their own part to improve their expected payoff. This is a quadratic condition, and an approximate notion can be defined as well: if changing your strategy improves you by at most 1%, or at most one cent, you no longer care. So there is both exact and approximate computation. Nash proved that, independently of A and B, for every pair of payoff matrices an equilibrium exists; so mathematically the problem is settled, and computationally what we are interested in is: given matrices A and B, how quickly can you find one equilibrium?

A related setting comes from basic economics. The simplest setting I learned is called an exchange economy: you have a collection of traders and a collection of goods, assumed to be divisible. Initially everybody has an endowment; farmers come with fruit, some people come with iPhones (I need to update my image to iPhones, since CDs are no longer important, though Cuban cigars are probably still powerful), and each individual has a utility function. The fundamental problem is: how do you design an exchange so that everybody is happy enough? What "happy enough" means is defined by the mathematical notion of an exchange equilibrium. In the view of Arrow and Debreu it is actually very computational: imagine a virtual distributed protocol where everybody comes to a virtual market. Suppose the government announces a price for each good; everybody goes to the virtual market and simply sells their endowment (you don't care who is buying, you just collect your money), and then you go back and buy, from the virtual market, which holds the union of all the goods, the bundle that optimizes your own utility. The "correct" prices are the prices that clear the market: after this virtual exchange, everybody gets their preferred bundle, the bundles together make up exactly what is in the market, nobody has money left in their pocket, and there is no fruit or grapes left on the market. Everything is cleared. Again, under mild conditions this equilibrium exists; the question is how quickly we can compute one, or approximate the prices.

In these settings, we became much more interested in, for example, extending smoothed analysis by assuming that each coefficient has a little bit of noise, that is, that people's payoffs have a little bit of noise.
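As an aside, here is a minimal sketch (illustration only; the helper name is mine) of the objects just defined: a bimatrix game, mixed strategies, and a check of the epsilon-approximate equilibrium condition that no unilateral deviation gains more than epsilon.

```python
import numpy as np

def is_eps_nash(A, B, x, y, eps):
    """Is (x, y) an eps-approximate Nash equilibrium of the bimatrix game (A, B)?"""
    row_payoff = x @ A @ y          # expected payoff of the row player
    col_payoff = x @ B @ y          # expected payoff of the column player
    best_row = np.max(A @ y)        # best pure deviation for the row player
    best_col = np.max(x @ B)        # best pure deviation for the column player
    return best_row - row_payoff <= eps and best_col - col_payoff <= eps

# Matching pennies: the uniform mixed strategies form an exact equilibrium.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
B = -A
x = y = np.array([0.5, 0.5])
print(is_eps_nash(A, B, x, y, eps=1e-9))   # True
```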
This has been a very active area recently. A breakthrough result in 2005, by Daskalakis, Goldberg, and Papadimitriou at Berkeley, showed that three-player games are PPAD-complete: if you can find equilibria of three-player games in polynomial time, then you can solve this entire family of equilibrium problems in polynomial time. Then, in a quite stunning result, Xi Chen and Xiaotie Deng showed that two-player Nash, the problem I just described, is also PPAD-complete. In our search for a smoothed analysis result, we proved an intermediate result showing that it is also hard to approximate: you cannot compute an ε-approximate Nash equilibrium in time polynomial in 1/ε. In other words, for this family of problems, a fully polynomial approximation is as hard as solving the problem exactly.

This last theorem effectively settles the smoothed complexity question, negatively, and gives evidence that this is a challenging setting in which the worst-case examples are very stable. It says that the two-player game does not have polynomial smoothed complexity unless PPAD is in RP. The reason: if you perturb a two-player game a little and solve the perturbed game exactly, you obtain an approximate solution of the original; since you cannot find approximate solutions efficiently, the perturbation must still leave a hard instance. So the connection between approximation complexity and smoothed complexity, together with the hardness of approximation, yields a proof, without doing any probability, that the smoothed complexity of two-player games is probably not polynomial. It is very different from the simplex algorithm. This can also be extended to exchange markets.

So this is the context: optimization on one side and PPAD on the other. According to Papadimitriou, PPAD sits somewhere between P and NP, and Megiddo has a result showing it is unlikely to cover all of NP: if a problem there were NP-complete, then NP would equal coNP. So unless certain miracles happen in the complexity landscape, this class is unlikely to be the entire blue region. PLS is another such family; again it is unlikely to contain an NP-complete problem, otherwise NP would equal coNP. The PLS landscape is where we have had some limited success in smoothed analysis, whereas PPAD is an area in which we have had essentially no success. Somehow the mathematical structure there must be sophisticated enough; there is a lot of fractal-like structure in the solution domain, and when you perturb, you stay inside the hard instances. So, like Michael, I am listing two columns: here is local search, and here is fixed-point computation.

And here is my frustration on one hand and our celebration on the other. For linear programming, not only was the problem eventually shown to be in P, weakly polynomial, but we were also able to carry out a smoothed polynomial analysis. In the game setting, the parallel problem, the two-player game, seems to be hard whether in the smoothed model or in the worst-case model.
For local search there is a trivial fully polynomial-time approximation scheme, whereas on the PPAD side even approximation seems to be hard. And the thing that makes me feel this side is probably harder than that side is that local search is very intuitive: we have a landscape like a mountain, we want to go down, and we feel we understand the topology. PPAD, I would guess, is intuitive only to Christos Papadimitriou and a few other people in the field; I often get very confused in that landscape. So we did ask a question across these two columns: is fixed-point computation fundamentally harder than local search? That is still an unsettled problem, although there is one small result we obtained a few years ago, asking, in the black-box model, which is easier: you can query for a fixed point, or you can query values for local optimization. There is an early result by Aldous from 1983 showing that if you allow randomization, then local search can be sped up; deterministically, worst-case fixed-point computation and local search have similar query complexity, they behave very similarly, but with randomization Aldous showed you can dramatically reduce the search for a local optimum. On the other hand, we were able to show that randomization does not help fixed-point computation. Of course, the oracle model hides a lot of things; this was actually the first time I wrote a paper in the oracle model, and I do not fully know the dangers of the model (many people warned me that oracles hide too much), but nevertheless there is a reasonably natural separation between these two families.

OK, three minutes to conclude. Let me conclude the way Tim usually asks: "do you have a concrete question?" I feel I should end my talk with a few concrete open questions, whether you like them or not; some of them I am currently working on, and they seem mathematically quite interesting.

The first question is related to that little non-concentration bound. Adam, Alex, and I were able to show that if you have a multilinear polynomial of degree d whose leading degree-d term has coefficient 1, then it cannot concentrate at zero. We worked quite hard, for I think almost an entire summer, when we were all at the Microsoft New England lab, and we were still not able to settle the following variation, which seems to be true; I did quite a few experiments and it looks true. Imagine you have a multilinear polynomial (multilinear, not an arbitrary polynomial) in n variables, of degree d, whose constant term is 1. It can have up to n to the d terms with arbitrary coefficients. We would like to show that if you draw each variable uniformly at random between zero and one, the polynomial cannot be highly concentrated at zero. Clearly, when the degree goes up you can concentrate: the function x_1 x_2 ... x_d is highly concentrated near zero, and if you substitute x_i minus one you get (x_1 - 1)(x_2 - 1)...(x_d - 1), which is still highly concentrated near zero and has constant term plus or minus one. So with unbounded degree you can have constant term one and still be highly concentrated at zero. But if you limit the degree and enforce multilinearity, our strong belief is that the polynomial cannot be highly concentrated at zero. Very often, when I'm flying around or cooking in my kitchen, I think about this conjecture; I think it is true, and it may lead to some other algebraic analysis.
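Here is a rough rendering of the conjecture in symbols, as I understand it from the talk; it is a sketch, not a precise formulation, and the quantitative form that is wanted is discussed in the exchange right after this.

```latex
% Rough rendering (not a precise statement): for a multilinear polynomial
%   p(x) = \sum_{S \subseteq [n],\, |S| \le d} c_S \prod_{i \in S} x_i ,
% with constant term c_\emptyset = 1 and arbitrary other coefficients,
% and x drawn uniformly from [0,1]^n, p should not concentrate at 0:
\[
  \Pr_{x \sim \mathrm{Unif}([0,1]^n)}\bigl[\,|p(x)| \le \varepsilon\,\bigr]
  \;\le\; \mathrm{poly}(n)\cdot \varepsilon^{c}
  \qquad \text{for some constant } c > 0 \text{ independent of } d,
\]
% with d thought of as roughly \log n; a bound decaying only like
% \varepsilon^{1/d} is considered trivial and not useful here.
```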
So if any of you has ideas, or solves this problem, I will be very happy. When we conduct smoothed analysis we often eventually reduce to some kind of non-concentration statement; it can be more complex, but this particular conjecture was originally derived when we tried to handle the agnostic learning setting. Adam basically said: if this conjecture is true, then I have the proof. Eventually he got the proof without the conjecture; he went around it and proved his result anyway. But nevertheless, in my head it remains a very clean algebraic problem.

(From the audience: are you willing to accept a bound that is polynomial in n, and in d?) Yes, I'm willing to accept a polynomial in n. What I don't want is anything like d in the exponent; you can even say cubic, that's fine too. Remember, I chose "close to zero" to mean concentration within some small ε of zero, and I'm happy with a weaker polynomial factor floating around. But for the dependency I want: essentially my d is roughly log n, so you can think of d as log n, and I'm happy for you to put polynomials everywhere; what I don't want is an exponent of ε that is one over d, because that is trivial, and essentially not very useful. A bound that does not degrade with d is what the linear case already gives you, and multilinearity should somehow lead to the same. So this is, hopefully, a clean enough and simple enough conjecture.

The second set of conjectures: since Xi Chen and Xiaotie Deng introduced me to game theory, I began to read a few more game theory papers. In fact, one of the first papers I read was Tim's on potential games, and at the time I wondered what smoothed analysis could be done for a potential game. Here is one very simple conjecture which is still open; my postdoc and I have been working pretty hard on it for a while, and we still haven't solved it. There is a very simple game, in my head the simplest congestion game, called the Max-Cut game. You have a complete weighted graph with weights w_ij. It is a game: at any point you have a partition of the vertices into left and right, and you ask each individual, would you like to go to the other side if it increases your cut? Think of the weight as an indication of how much you dislike the other person; max-cut means you want to join the party that maximizes your cut, that is, maximizes your dislike of the other party. It is a natural game, and it is a potential game: you can easily show that for every such move there is a potential function that changes by exactly the improvement in your own dislike. So the dynamics has to converge, because it is a potential game, and it is known that it may take an exponential number of iterations to converge. Here I'm asking this very simple question: imagine the weights have a little bit of noise, or are even just random, uniform between zero and one, or in a smoothed model with a perturbation. I would like to show that, from any initial partition, this best-response dynamics converges to an equilibrium in polynomially many steps.

(From the audience: so you don't particularly care about the value of the cut that you get?) No, I don't; this is a game, right?
If you reach stability, I'm happy. (You do get a half-approximation, essentially, yes.) The overall amount of dislike is not our concern; we just want each individual to feel they are in the right party. (From the audience: if you start from a random cut, do you know anything?) We don't know; sorry, yes, we don't know. It has been a very frustrating study so far, and it seems to be such a simple problem. This can also be extended to other similar games, such as scheduling games or cost-sharing games; I'm not going to go into detail, but they all exhibit very similar phenomena. Note that with integer weights, say just zero and one, the dynamics clearly converges in polynomial time, because every improving step gains at least one, which gives a polynomial bound on the number of steps. Here I'm talking about arbitrary weights from zero to one. So again, if you have any ideas, I'm happy to hear about them; if you solve it, I'm happy to hear that too. (Isn't this the same as the local search problem?) Yes, this is just local search. After I read Tim's paper I said, this is a game; but it is exactly the local search where you just swap vertices. Honestly, calling it a game has inspired more of my students to attack this optimization problem than the plain local-search framing ever did. So yes, it is local search in disguise.

As Dan said, it is very hard to conduct smoothed analysis for iterative dynamics. In the simplex algorithm we were lucky: the simple geometry allows us to go through. So far, the most involved iterative smoothed analysis is due to David Arthur, Bodo Manthey, and Heiko Röglin: they recently succeeded in showing that Lloyd's k-means clustering converges in polynomial time in the smoothed model. k-means is a very widely used heuristic for data mining: imagine you have n points in some constant dimension, say 50, and you want to produce k clusters. You choose k centers, build the Voronoi diagram, and assign every point to its nearest center; then you recompute the centers of gravity of the clusters, producing k new centers, reassign every point to its nearest center, and repeat. Lloyd observed that this process is a potential-based local search (I'm not going to be fancy here, this is a good audience): every time you update the centers and send every point to its closest center, the potential function decreases, and hence, after some number of iterations, the process has to converge, because there are only finitely many configurations, exponentially many but finite. In the worst case, even in two dimensions, this takes exponential time to converge; and this group of researchers showed that if the points carry a small amount of noise, then the algorithm takes only polynomially many iterations. They somehow overcame the fundamental challenge that, as you keep iterating, you lose your randomness, using structural techniques specific to this problem. I was hoping that those techniques could be applied to the max-cut game, scheduling games, or other local search problems.
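For concreteness, here is a minimal sketch (illustration only, with random weights of my own choosing) of the max-cut game dynamics in question; the open problem is whether, with random or smoothly perturbed weights, the number of best-response moves is polynomial from every starting partition.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 60
W = rng.uniform(0.0, 1.0, size=(n, n))
W = np.triu(W, 1) + np.triu(W, 1).T          # symmetric weights, zero diagonal

side = rng.integers(0, 2, size=n)            # initial partition (0 = left, 1 = right)
steps = 0
improved = True
while improved:
    improved = False
    for v in range(n):
        same = W[v, side == side[v]].sum()   # weight to own side (v contributes 0)
        other = W[v, side != side[v]].sum()  # weight across the cut (v's current cut)
        if same > other:                     # switching strictly increases v's cut
            side[v] = 1 - side[v]
            steps += 1
            improved = True

print("best-response moves until a local max-cut equilibrium:", steps)
```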
So, essentially, analyzing iterative processes and dynamics could be an interesting direction of study, because it will enrich our power to do probabilistic analysis. I think even in the days of average-case analysis these were the challenging settings: once we have looked at a random variable, it is no longer random, no matter how we look at it afterwards. That is the fundamental challenge of iterative analysis. We would also like to sharpen our ability to prove non-concentration bounds, which could be a good technical direction. And, in the context of this workshop, I think looking for better input models, particularly for discrete problems, could be a very fruitful area for continued study. With this, I conclude. Thanks.

(From the audience: about the two columns, for both of them, is it unknown for Nash equilibrium, or what is the status?) Unfortunately, without an assumption about PPAD, we do not know whether it is polynomial. (So by "unknown" you do not mean its placement; it is known to be PPAD-complete?) Yes, yes, yes.

(From the audience: could you say a bit about how we reconcile this understanding of the hard instances with the positive results on approximation-stable instances mentioned earlier?) Right. First of all, I think you cannot yet get rid of the log n in the exponent. Secondly, when we talk about approximation schemes, there are different degrees of strength: (a) you compute the exact solution in polynomial time; (b) a fully polynomial scheme, where for an ε-approximation you only pay a polynomial in 1/ε; and there is another notion, often just called a polynomial-time approximation scheme, where the exponent itself involves 1/ε. When ε is at our natural precision, which in my head is one over a polynomial, these all eventually collapse back into the hardness. So there is a particular range, and I think the beautiful open question is whether you can get rid of the log n in the quasi-polynomial approximation algorithm. Our classification remains to be refined; are there other parameters one can inject there? There is a beautiful setting there, but the structure is clearly very complex, the way perturbation interacts with stability, and for the moment I don't feel I understand the situation. But, as was pointed out, you do have stable instances there, yes, something like that.

(From the audience: is that a good example of a local search problem where local optima are global optima?) Probably, right, though that is not typical. Any other questions?

(From the audience: the max-cut game is kind of strange, because if all the weights were multiples of σ, you could just cut off the lower-order bits and then it is polynomial, right? So it is all about those lower-order bits.) Yes: if you truncate, it is trivially polynomial; you get more of a congestion game in which every step makes progress that is significant relative to the precision. Here, with noise, we are walking right on the boundary of that subtlety; in my own mind it is strikingly interesting.

(From the audience: how important is the distribution of the noise, for example, in that learning example you gave?) I think we depend on it quite heavily; the magnitude needs to be fairly large, roughly constant, though you could go down to one over log.
There are different regimes of noise. We cannot go down to one over root n; root n is out of reach, right? So I think one over log is fine, but beyond that, we don't know.