So we'll show you some technical results later, but the high-level question for today is: is SOS in fact an optimal algorithm? That's the first thing, and it's the motivation for the question we want to ask. And before we ask this question, we should ask some more basic questions, such as: what is an optimal algorithm? What do we mean by that? Should one exist? Before we even ask whether SOS is an optimal algorithm, is there any reason why there should be an optimal algorithm at all, regardless of what it is? And then, if so, why SOS and not, I don't know, something else?

To define the notion of an optimal algorithm, we need to restrict attention to some kind of domain, so let me talk about NP optimization problems. Think about the following, say, NP maximization; maximization or minimization really doesn't matter, pick one of them. The input is some function f from {0,1}^n to, say, [0,1], in some kind of efficient representation, some representation that allows us to evaluate this function. The goal is to, say, find x in {0,1}^n that maximizes f. Generally, we'll define opt(f) to be the maximum value, and for any algorithm A we just define A(f); I sometimes think of A(f) as the quality of the value the algorithm outputs. For example, think of, say, max cut on a graph G: we define f_G(x) to be the fraction of edges that x cuts, where the input to the function is a bipartition, which we can encode as a string, and the value is that fraction. So max cut is a special case of this. Or say small-set expansion: we would define f(x) to be, maybe, 0 if the set that x encodes is larger than some δn, and otherwise, I don't know, 1 minus the number of edges from S to its complement, normalized. Basically any NP problem you can just write in this way; it's not a restriction or anything like that. And in many cases this f is also going to be a low-degree polynomial, in which case it's very natural to write, say, an SOS program for it.

So now, what is a computational problem? We cannot take the set of all possible f's; in particular, we don't know how to evaluate them. We could take the set of all f's that are efficiently represented in some sense, but then we'd get a meaningless notion where every algorithm is "optimal" because no algorithm can do anything. So in some sense what we are interested in is some kind of restriction. We define a computational problem to be simply some collection F = {F_n}, where F_n is a subset of all functions from {0,1}^n to [0,1]. The performance of A on F is, say, the limsup as n goes to infinity of the maximum, the worst case, over f in F_n of opt(f) minus A(f). Okay, so this is one natural way to define how well an algorithm performs on a computational problem. And now at least we can make a definition, right? We can say that A is (t, ε)-optimal for F if for every algorithm B running in time t(n), the performance of A is at most the performance of B plus ε. The way I defined performance, smaller is better, so if ε is 0, A is just optimal. (Is the performance supposed to be the minimum over f? It's supposed to be the maximum, I think, the worst case, right.)
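To pin down the framework, here is a minimal Python sketch of these definitions; the helper names (`f_maxcut`, `opt`, `performance_gap`) and the brute-force computation of opt are mine, purely for illustration:

```python
import itertools

def f_maxcut(graph_edges):
    """Return f_G: {0,1}^n -> [0,1], the fraction of edges cut by the
    bipartition encoded by the bit-string x."""
    m = len(graph_edges)
    def f(x):  # x is a tuple of n bits
        return sum(x[u] != x[v] for (u, v) in graph_edges) / m
    return f

def opt(f, n):
    """opt(f) = max_x f(x); brute force, just to pin down the definition."""
    return max(f(x) for x in itertools.product((0, 1), repeat=n))

def performance_gap(A, f, n):
    """An algorithm A maps f to some candidate x; its quality is f(A(f, n)),
    and the quantity we study is opt(f) - f(A(f, n))."""
    return opt(f, n) - f(A(f, n))
```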
Performance being worst case means that every problem will have some bad instances; there are always some instances where you don't achieve the optimum. And an algorithm is truly optimal if it just outputs the optimal thing, period. But of course, if P is different from NP, we don't expect an efficient algorithm that just outputs the optimum. When we say that an algorithm is optimal, what we mean is that it's basically as good as any other algorithm that runs in comparable time. And there may be more than one way to formalize it; there is definitely more than one way to formalize it. This is just a reasonable way to study it. You can also ask these notions in other settings: this is the worst case, but you can ask about the average case, or even instance-wise optimality. You have to be just a little bit careful if you talk about instance-wise optimality, because you want to avoid the situation where a trivial algorithm just remembers the answer; like the joke that even a clock that doesn't move shows the right time twice a day. There could be a trivial algorithm that still solves the problem on one very particular instance. So you sometimes have to be a little bit careful in how you define it; you can see this in the LRS result. But generally speaking, this is the spirit of the definition.

Now there's a question: before we even ask whether sum of squares satisfies this definition, we should ask whether we should expect anything to satisfy this definition. (A question from the audience about the additive error: something could have error that decays, say like 1/n, while something else has error 1/n², and the 1/n² one would be "more optimal".) Great. So this is a notion of optimality up to some additive ε, and more generally you could ask for a stronger version where ε is a function of n or something like that. Looking ahead, I would say that even having sum of squares be optimal in this additive sense, with ε any constant, would be amazing. So I feel fine starting with the weaker definition. But going back to your question, here is a theorem. For every T, where T is a time-complexity function, nothing too crazy, say monotone and computable, a valid time-complexity function: there is a T(n)·poly(n)-time algorithm that is (T, 0)-optimal for every F. The proof is basically the following. Our algorithm A, on input f, will enumerate the first n Turing machines M_1, ..., M_n, run each one for T(n) steps, and collect the outputs x_1, ..., x_n; some of them may run too long, and we just throw those out. Then output the best one, the x_i that maximizes f. The running time here would be n times T(n) plus the time to evaluate f, so basically T(n)·poly(n). So there is an optimal algorithm. But there is something about this proof: when you see it, it somehow doesn't feel very satisfying.
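Here is a toy rendering of that enumeration argument in Python; `candidates` stands in for an actual enumeration of Turing machines, and the try/except stands in for cutting off machines that exceed the T(n) step budget, so this is a schematic sketch rather than the real simulation:

```python
def universal_A(f, n, candidates):
    """Toy version of the (T, 0)-optimal algorithm: run the first n
    candidate "machines" and keep the best answer. In the real argument
    each machine is simulated for T(n) steps and discarded if it runs
    over budget; here a candidate is just a Python function (f, n) -> x."""
    best_x, best_val = (0,) * n, None
    for M in candidates[:n]:
        try:
            x = M(f, n)            # run the candidate "machine"
        except Exception:
            continue               # stands in for: ran too long / crashed
        if len(x) == n and (best_val is None or f(x) > best_val):
            best_x, best_val = x, f(x)
    # Total time: n * (T(n) + cost of evaluating f), i.e. T(n) * poly(n).
    return best_x

# Example: two trivial candidate "machines"
all_zeros = lambda f, n: (0,) * n
all_ones  = lambda f, n: (1,) * n
```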
We don't feel that we have learned a lot from this proof. I mean, it's not completely clear that we haven't learned anything: you could imagine, a priori, that every different computational problem requires a different idea, and that there would be an infinite number of essentially different algorithms, so the statement sounds nice and all that. But this is the kind of lesson that we've learned, time and again, with these kinds of enumeration arguments. So, okay, we don't like the answer, or at least I don't like the answer, and so we want to somehow refine the question.

Yes? (Do we know such tight results, even for very simple problems?) I mean, there are some situations where we have tight hardness results, where you basically say: here is a problem, the trivial algorithm gives performance α on it, and we can prove that it is NP-hard to give performance better than α. So you prove that the trivial algorithm is optimal. For example, for Max-3SAT, you know that the random-assignment algorithm is optimal, if we're talking about the notion of worst-case approximation, which is what we're talking about right now. By Håstad's theorem, outputting a random assignment for Max-3SAT is an ε-optimal, even o(1)-optimal, algorithm with respect to any T that is 2^{o(n)}, assuming, you know, the ETH.

Okay, so we don't like, I mean, I don't like this answer. So I want to rephrase the question: should a *nice* optimal algorithm exist? So you can ask, what does it mean to be nice? One answer is the same as the famous answer about pornography: I know it when I see it. So that's one version, one answer. Another one, related to it, is that you don't want to use diagonalization; of course, you can hide diagonalization in lots of ways, so it's not a very well-defined statement. Another thing that is more concrete is that the algorithm should be simple enough to prove unconditional lower bounds against. Somehow, for a very concrete algorithm like sum of squares, we have shown limitations of the algorithm, things that it cannot do; while for the enumeration algorithm, it's kind of obvious that if you show it has any limitation, you've basically proven that P is different from NP, et cetera. And the reason I want unconditional lower bounds is not that I don't believe the assumptions. I believe that P is different from NP; that's not the problem. My problem with using assumptions is not that they might not be true. My problem is that basically, when you show an unconditional lower bound, it means you understand this algorithm; you understand it enough to analyze it, so it's useful. So this is a proxy for another notion of being simple, related to it: we want a mathematical object that we can handle, that we can study with, you know, geometry, analysis, et cetera, to understand it. Computation in general is very hard to reason about, and one thing an optimal algorithm should do for us, it would be nice if it helps us convert problems about computation into problems that are somewhat more concrete. And so now we know that some optimal algorithm exists, but why should a nice one exist? Maybe, a priori, computation is just very complicated.
I don't really have a good answer to this. One answer is that it would be really, really cool. Basically, a nice optimal algorithm would mean that maybe we have some way to partition the problems into easy and hard by some kind of clean criterion: basically we say, this is where the nice algorithm works, and this is where nothing works. And that would be really, really very nice. So one thing is that it would be really cool, and we really, really want it; we have been good people, we have, I don't know, donated to the right causes, et cetera, so why shouldn't we get something nice? Although that's probably not enough; the fact that we really want something doesn't mean we get it, I know.

So what other reasons do we have to expect a nice algorithm to exist? One of them is that if you actually open a graduate algorithms textbook, or generally an algorithms textbook, especially one that focuses on optimization, then, you know, I haven't actually done it, but people have told me that the chapters have repeating themes. It's not a collection of 200 chapters, each one with a very different idea that solves a different problem. There are repeating themes that keep recurring again and again: greedy algorithms, convexity, mathematical relaxations, divide and conquer, et cetera. These repeating themes tell us that maybe there is a rich space of problems, but there are some big, broad ideas that somehow cover a large fraction of these problems. So maybe a nice optimal algorithm could be one idea that is optimal for some significant chunk of the problem space. And here is another reason, where I'm venturing even beyond my expertise, but I have also been told that in practice, when people actually want to solve optimization problems, they don't code up a different algorithm for every problem; there are general optimization packages that get used again and again. This again indicates that maybe it's not that every different problem requires a completely different algorithm, but there is kind of one general thing that can be used again and again. So I think the state of the art, both in theory and in practice, suggests that if you're trying to ask whether a problem is inherently easy or inherently hard, you might only need one or two general principles. You need a lot of ad hoc tricks when you want to really squeeze out the performance for a specific problem, which could be very important, like shaving constants, or even the asymptotic performance, like shaving an exponent. Or, sometimes what happens in theory papers is that you modify the algorithm not because the general simple one doesn't work necessarily, but because it makes the analysis easier. So a lot of ad hoc ideas are really there to facilitate the analysis, rather than things we know are inherently necessary.
So it seems consistent with current practice that some nice algorithm could exist, one that at least covers a significant swath of the problem space. And I hope I have motivated why this is possibly the case, and why we really want to investigate it and find out whether it's true. And then, even if you buy the notion that one could hope for a nice optimal algorithm, you could ask why it should be SOS. And again, maybe it's not, but convex optimization is very much a recurring theme in many of those chapters. So now you can try to actually prove some results, not just pontificate, about SOS. You could ask, for example: can we show that SOS is optimal among convex optimization? And again, that's a very general question; we don't know how to answer it. You can even ask whether SOS is optimal among SDPs. This is still too general a question, because semidefinite programming, even linear programming, is Turing-complete; you can basically encode arbitrary computation in an SDP. So trying to prove this outright would run into the same issues as the fully general question. But we can ask whether SOS is optimal among, say, nice formulations of SDPs, and what we'll see in this lecture is a very, very nice result of Lee, Raghavendra, and Steurer (LRS) that basically says yes for a restricted case, but still a very interesting one.

And I should also say that when we're talking about worst-case approximation results, you can sometimes prove statements of the form "NP different from P implies SOS is optimal". For certain CSPs, this was shown by Chan. The thing that is a little bit less satisfying about it is that it doesn't really show that sum of squares is optimal: it shows that for certain CSPs on which sum of squares gives only trivial performance, every efficient algorithm also gives only trivial performance. So it basically shows that the trivial algorithm is optimal for these CSPs. But the class of CSPs is chosen based on exactly where SOS fails. And modulo the Unique Games Conjecture, this is true for all CSPs; that is Raghavendra's result, as I mentioned. But I feel like I still care about the unconditional question, even though you can prove this conditionally, very much because we want to understand whether we can remove the UGC here; or, of course, what people are really trying to do is resolve the UGC. But, question mark.

Yes, a question? (This is all about the algorithm's worst-case instance. There are certainly instances for which a random assignment will do as well as SOS; but the interesting claim would be about pretty much any instance.) Right, so I think it's very interesting to understand instance-based optimality, and there, the NP-different-from-P-based results are kind of a non-starter. But LRS-type results, even though they can be stated in the language of worst-case complexity, really are instance-wise: LRS basically shows that SOS is instance-optimal, as we'll see.
So, our focus will be this result of Lee, Raghavendra, and Steurer, and we'll take some time to prove it. We'll probably not even get to see the full proof, but we'll get to see the ideas behind the proof, which are actually very interesting in their own right. But let me at least state the result. Maybe I'll start by stating the corollary, and then I'll state the actual result.

So we have to say what it means to talk about a general SDP. LRS is basically saying that for every max-CSP, the degree-d SOS relaxation is roughly as good as any SDP relaxation of size n^{o(d)}. The "max" is to avoid trivialities and avoid things we don't know how to prove, and for the n^{o(d)} there are some conditions on exactly which d you can take and still have a true statement; but this is roughly speaking the result. Basically, degree-d sum of squares is one particular way to write a semidefinite program, where you put in the n^d constraints corresponding to degree-d polynomials, and LRS says that it doesn't help you to put in any other set of constraints; there is no cleverer way to do it.

So let me try to formalize this. Suppose you have a subspace U of R^{2^n}, that is, of the set of functions from {0,1}^n to R. If U is a subspace, then we say that f from {0,1}^n to R has a U-proof of non-negativity if f = Σ_i u_i² for some u_i in U. So SOS corresponds to U being the subspace of all low-degree polynomials, but a general SDP, the way we're thinking about it, corresponds to an arbitrary subspace. Maybe I'll make one change here: I'll change this n to m, so it's consistent with the notation later. Okay.

So now, LRS is basically saying the following. Here is one version of it that is not exactly true, but we'll make it more true in a moment: if f has a U-proof where the dimension of U is at most n^d, then it has a degree-O(d) SOS proof; and maybe we're willing to lose a little bit, an SOS proof that f is at least minus ε. Okay? This would be exactly instance optimality of SOS. This is the theorem you would love to prove, but unfortunately it cannot be true, for a kind of trivial reason. Do you see why? The subspace doesn't have to consist of low-degree polynomials; it can be tailored to a single function. Take a single f_0 which is non-negative but has no, say, low-degree SOS proof, and just define U to be the subspace spanned by the square root of f_0. Then f_0 = (√f_0)² is a U-proof, and U has dimension one. So there is this triviality obstruction to instance optimality. But this is somehow not really a problem, right? We are not worried about semidefinite programs that have the answer hard-wired inside them, that know exactly the input they are going to get.
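In symbols, the definition and the counterexample read as follows (a LaTeX rendering; the degree-d/2 bookkeeping, since squaring doubles the degree, is my normalization):

```latex
% U-proofs of non-negativity, and the trivial counterexample to naive
% instance optimality.
Let $U \subseteq \mathbb{R}^{\{0,1\}^n}$ be a subspace of functions
$\{0,1\}^n \to \mathbb{R}$. We say $f \colon \{0,1\}^n \to \mathbb{R}$ has a
\emph{$U$-proof} that $f \ge 0$ if
\[
  f \;=\; \sum_i u_i^2 \qquad \text{for some } u_1, u_2, \ldots \in U .
\]
Degree-$d$ SOS is the special case where $U$ is the space of polynomials of
degree at most $d/2$. The naive claim, that $\dim(U) \le n^d$ implies a
degree-$O(d)$ SOS proof that $f \ge -\varepsilon$, fails: pick $f_0 \ge 0$
with no low-degree SOS certificate and let
$U = \operatorname{span}\{\sqrt{f_0}\}$; then $f_0 = (\sqrt{f_0})^2$ is a
$U$-proof with $\dim(U) = 1$.
```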
So now what we want to say is the following, and we'll get rid of this problem by a very simple thing: basically, just renaming the variables. We can phrase this in the contrapositive. At this point it is almost the actual LRS theorem; I think the only difference is the quantitative parameters, which for this lecture don't really matter. The LRS theorem says the following. Suppose that f from {0,1}^m to R has no degree-d SOS proof that f is at least minus ε. Then for some n, which is like m^{O(d)} or something like that (it depends on the degree, of course), and for every subspace U of R^{2^n} of dimension, say, less than roughly n^{o(d)}, there exists a function g which is basically the same as f, just applied to some subset of the variables: g(x) is simply f of x restricted to S, where S is some m-size subset of [n], and g has no U-proof of non-negativity.

In some sense, one way to think about it is the following. Suppose there is a graph G on m vertices such that the true max cut of G is equal to α, but SOS only certifies a bound of β. Then what we show is that for every such subspace U of R^{2^n}, there is going to be a graph on n vertices, which is just G placed on a subset of the vertices (it completely doesn't touch the other vertices), such that the same gap holds, up to some small ε, with respect to the general SDP given by U. So for a general SDP we cannot get rid of the problem that it could encode a particular function, but we ask for a very simple thing: if it works for one function, it has to work also for this same function applied to renamed variables. And it turns out renaming is enough. In particular, this implies that for any constraint satisfaction problem, the performance of sum of squares is as good as the performance of any other SDP of comparable size. And you can see that it's really instance optimality, because the two instances are basically the same instance after renaming the variables.

Okay, so this is the statement of the theorem. Maybe I'll even try to write it carefully, because we're going to take a long time until we come back to actually proving it. The theorem is the following: for every bounded f from {0,1}^m to R that has no degree-d SOS proof that f is at least minus 0.1, then for some n, polynomial in m (with some conditions on how large d can be), and every subspace U of dimension at most roughly n^{o(d)}, there is a renaming g as above with no U-proof that g is at least minus ε.
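In symbols, the schematic statement reads as follows; the exact exponents are placeholders, since as I said the quantitative parameters won't matter for this lecture:

```latex
% Schematic statement of the LRS theorem as stated above; exponents are
% placeholders for the true quantitative parameters.
Suppose $f \colon \{0,1\}^m \to \mathbb{R}$ is bounded and has no degree-$d$
SOS proof that $f \ge -0.1$. Then there is some $n = \mathrm{poly}(m)$ such
that for every subspace $U \subseteq \mathbb{R}^{\{0,1\}^n}$ with
$\dim(U) \le n^{o(d)}$ (roughly), there exists an $m$-element set
$S \subseteq [n]$ for which
\[
  g(x) \;=\; f\!\bigl(x|_S\bigr)
\]
has no $U$-proof that $g \ge -\varepsilon$. That is, renaming variables is
the only ``hard-coding'' a general SDP can exploit.
```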
Yes? (Sorry, it's not a particularly well-formed question, but the theorem seems to be saying something like: if you have some problem that SOS has trouble solving, then you can obscure the problem in a way such that other programs also have trouble solving it.) Not exactly obscure: it's basically the same problem, the exact same problem. (But it's still saying there is a way of representing the problem such that other programs have trouble solving it. That somehow feels different from what we want: just because there is one way of presenting the problem such that other methods can't solve it doesn't mean there aren't other presentations they can handle.) I guess the way I would say it is that it's really not breaking the problem. Basically what you're saying is: if SOS cannot certify that f(x_1, ..., x_m) is non-negative, then no other SDP can certify that f(x_{i_1}, ..., x_{i_m}) is non-negative. One way to say it, to avoid the notion of talking about encoding the input into the program, is this: you could define some notion of a "reasonable" proof hierarchy, where a reasonable hierarchy does not care about the names of the variables; if it can certify that f(x_1, ..., x_m) is non-negative, then it can also certify that f(x_{i_1}, ..., x_{i_m}) is non-negative. Then this theorem would imply that no such hierarchy of comparable size can solve a problem that SOS can't, so it would imply instance optimality among reasonable hierarchies. And you have to do something like that to avoid the notion of encoding the actual input. So why don't we take a break, and I'll give you time to stare at this statement and make sure that you understand it.

We're going to take the scenic route to proving this result, and on the way talk about some other results that are interesting in their own right. Like I said, we're going to take the scenic route to the proof of the theorem, and make a digression to talk about some results, some of which I'm sure you've seen. There's a very large collection of results that all have some kind of common thread. One is regret minimization; there is the minimax theorem of von Neumann in game theory, which shows up in regret minimization, expert learning, and boosting; there is Impagliazzo's hardcore lemma; there is the dense model theorem, which was crucial in additive combinatorics, in the proof by Green and Tao that the primes contain arbitrarily long arithmetic progressions; and probably more theorems fall into that general theme. What seems to be the common thread in these things is that, I think, each one of them might be, at least to me, somewhat counterintuitive the first time you hear about it; it might be surprising that it's actually true; it turns out to be very, very useful; and once you get yourself to believe that something like that could be true, these things are not that hard to prove. And typically the way to prove them is by some kind of iterative procedure: a local improvement or local search algorithm, a descent method of some type, or a multiplicative weights algorithm. So we'll just take one of them and show the proof. Some of you have seen it, but it's still useful, because we're going to generalize it. And I think if you really understand this paradigm, it's very useful no matter what you do.
Or you really understood one of them, your favorite, but didn't really understand the connection to the other ones. So, okay, let's just prove one of these results; let's talk about regret minimization. So, this is Harvard, so I assume most of you have large portfolios of stocks that you're investing. So suppose the following thing. There is a universe U of assets, and a distribution μ over U is an investment strategy. You can even think of it as: basically you have a million dollars, and μ is how you spread them; or you have one dollar and you choose a random asset according to μ, and invest that dollar; it doesn't matter. And we're thinking of the following game between the investor and the market. At every point in time t, the investor comes up with some distribution μ_t, and the market comes up with a function that says how well everything performs, some f_t from U to, say, [minus 1, plus 1]. The gain of the investor at time t is the expectation of f_t according to μ_t. Basically, what we want in such an investment strategy is to minimize the regret. This f_t could be kind of arbitrary, and could even depend on what you do, so the reasonable notion of regret is to compare to the best fixed strategy in hindsight. We want to minimize the maximum over all μ* of the gain of μ* minus our gain, where the gain of μ* is the sum over t of the expectation of f_t under μ*, and our gain is the sum over t of the expectation of f_t under μ_t. This is the setting of regret minimization, and it might seem like a hopeless setting: can you invest in the stock market every day and do as well as someone who, 30 years ago, picked the single best stock and just stuck with it? It turns out that you can; you can get sublinear regret.

Here is the basic theorem, though not necessarily in this exact form. As for who proved it: I think the general notion of regret minimization goes back to Hannan in '57; Littlestone and Warmuth in '89 might have been the first to get the square root; and this version is maybe due to Freund and Schapire. The result is the following. Let η be a parameter that we can choose, some small number. Then for every f_1, ..., f_T, we can define μ_1, ..., μ_T, where each μ_t is based only on f_1, ..., f_{t-1}, such that we get the following: for every μ*, our gain is at least the gain of μ*, minus ηT, minus 1/η times the KL divergence of μ* from our initial distribution μ_1. In particular, as a corollary, if we let μ_1 be uniform over U and we let η be the square root of log|U| over T, then our regret is going to be something like the square root of T·log|U|, and the important thing is that this is sublinear in T. So if you play this for many, many periods, then eventually your amortized regret goes to zero. So this is the theorem. Let me see how many of you have seen this proof. A very good fraction; in the reading group? I know it wasn't in the reading group, so I'm surprised that you remember. Okay, so let me just give the outline for those who don't remember. The outline is very simple.
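Written out, with constants elided, the guarantee we are about to prove is the following (each μ_t depends only on f_1, ..., f_{t-1}):

```latex
% The regret bound, with the corollary plugged in (constants elided).
\[
  \sum_{t=1}^{T} \mathop{\mathbb{E}}_{x \sim \mu_t} f_t(x)
  \;\ge\;
  \sum_{t=1}^{T} \mathop{\mathbb{E}}_{x \sim \mu^\ast} f_t(x)
  \;-\; \eta T \;-\; \frac{1}{\eta}\,
  \mathrm{KL}\bigl(\mu^\ast \,\big\|\, \mu_1\bigr)
  \qquad \text{for every } \mu^\ast .
\]
Taking $\mu_1$ uniform over $\mathcal{U}$ and
$\eta = \sqrt{\log|\mathcal{U}|/T}$ gives regret
$O\bigl(\sqrt{T \log|\mathcal{U}|}\bigr)$, sublinear in $T$.
```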
Okay, so, the proof outline; and I want to say something about what the proof gives us beyond the statement. Basically, you set μ_t(x) to be proportional to μ_{t-1}(x) times 2^{η f_{t-1}(x)}. So if f is your gain (often these theorems are phrased with losses, and there's a minus sign, but it's the same thing), you want to give more weight to the x's that have higher f, and you rescale accordingly. The proportionality factor Z_t would be the expectation under μ_{t-1} of 2^{η f_{t-1}}; that's the normalization that makes this a distribution. And then you do the analysis: a telescoping sum. You basically look at the difference KL(μ* ‖ μ_{t-1}) minus KL(μ* ‖ μ_t), do some calculations, upper-bound each term, sum up the telescoping series, and you get the result. Just some calculations, yada yada yada.

Okay, so now we have seen this proof sketch, which appears in the notes. By the way, because the theorem there is typically phrased with losses and I phrased it here with gains, if you read the proof in the notes, every minus sign has some probability of needing to be a plus; but I conjecture that there is a realignment of the signs that makes it a valid proof. And talking about these things, there are actually several sources. There is the survey by Arora, Hazan, and Kale on the multiplicative weights update method, which is very useful; and there is a book by Bubeck, which will come up in the reading group; I think it's called Convex Optimization or something like that; you can look at the chapter on, say, mirror descent, where he talks about a lot of these things. And there are more resources. (An audience member notes that there are also some observations on this in the LRS paper itself.) I didn't know that; I should look it up; that's useful to know.

Okay, so the proof that we just sketched actually gives us a little bit more than the statement, and that's actually going to be useful; we can already see it in some sense. What it gives us is the following: the investment strategy, say the final strategy, which you can think of as built out of all these f_i's, is simple. What does it mean that it's simple? It's basically of the form proportional to 2 to the η times the sum of α_i·f_i, for i from 1 to some L, where L is basically the complexity; and if you want an ε-approximation, then L is just of the order of the KL divergence between μ* and μ_1.
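To make the update rule concrete, here is a minimal Python sketch of exactly the procedure from the outline, phrased with gains as in the lecture; the list-of-lists representation of the f_t's is just for illustration:

```python
def multiplicative_weights(gains, eta):
    """Minimal sketch of the update mu_t(x) ∝ mu_{t-1}(x) * 2^(eta * f_{t-1}(x)).
    `gains` is a T x n table: gains[t][x] = f_t(x) in [-1, 1].
    Returns the total expected gain of the strategy."""
    n = len(gains[0])
    weights = [1.0] * n          # mu_1 is uniform
    total_gain = 0.0
    for f_t in gains:
        Z = sum(weights)         # proportionality factor Z_t
        mu_t = [w / Z for w in weights]
        total_gain += sum(p * g for p, g in zip(mu_t, f_t))
        # Reweight: put more mass on the assets that had higher gain.
        weights = [w * 2 ** (eta * g) for w, g in zip(weights, f_t)]
    return total_gain
```

Note that μ_t is computed before f_t is revealed, matching the requirement that each μ_t depends only on f_1, ..., f_{t-1}.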