So what we've seen before is that sum of squares is somewhat of a super-algorithm. It captures linear programming, spectral algorithms, semidefinite programming, various combinatorial algorithms, and all these wonderful things like the Goemans-Williamson algorithm, which was a breakthrough in algorithm design. So it seems like a super powerful algorithm, and the question is whether it can do anything. But it does have its weakness, and the kryptonite of sum of squares is algebraic. I guess, like Superman, it was born from algebra, and that's its weakness. There are things it cannot do, even things that are arguably not that difficult. Let me give you an example of one such task. [To a child in the audience:] How many pieces are there? No, no, count. How many? Okay. And if you wanted to split these pieces between you and Benjamin, can you split them so everyone gets the same number of pieces? No? Can you try? Why not? A small boy, with some prompting, does know that you cannot split an odd number of toys into two equal parts. But the sum of squares algorithm actually thinks you can. Grigoriev proved that for every n, even when n is odd, there exists a pseudo-distribution, of very high degree, on which the polynomial ∑_{i=1}^n x_i − n/2 vanishes, where the x_i are 0/1 variables. Now, if n is odd, then n/2 is not an integer, so there is no way to sum up 0/1 variables and get exactly n/2, but sum of squares thinks this is possible. So even though every kid knows you can't do it, sum of squares believes that such a partition exists. That's why some say that sum of squares believes in unicorns.
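The fact that every kid knows can itself be checked by brute force: for odd n, no 0/1 assignment sums to exactly n/2. This is only a sketch of the true, integral side of the statement; exhibiting Grigoriev's degree-Ω(n) pseudo-distribution is the hard part and is not done here.

```python
from itertools import product

def exists_exact_half_split(n):
    """Is there a 0/1 assignment x with x_1 + ... + x_n == n/2?"""
    return any(sum(x) == n / 2 for x in product((0, 1), repeat=n))

# For odd n the constraint sum_i x_i = n/2 is infeasible over {0,1}^n --
# yet Grigoriev's high-degree pseudo-distribution "believes" it is feasible.
for n in range(1, 12):
    assert exists_exact_half_split(n) == (n % 2 == 0)
```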
You know, it believes you can actually split an odd number of things equally. We're not going to show this lower bound, but it's not very hard, and maybe eventually a proof of it will magically appear in the lecture notes, courtesy of Pravesh. Sum of squares does have some excuse here: it can say, well, okay, it's true that a partition like that doesn't exist, but there is a partition that's almost equal, so I'm not that far off. But here is a case where sum of squares has much less of an excuse. It turns out there is also a 3XOR lower bound. What Grigoriev showed is that there exists a set of equations that cannot be satisfied—no assignment can satisfy more than 51% of them—but sum of squares thinks you can satisfy all of them. So now it's not as if there is even a nearly satisfying assignment. And the remaining excuse of sum of squares is that it's not fair: Grigoriev used a dirty trick, the probabilistic method. I told you before that the Marley principle we have been relying on—don't worry, everything is going to be all right—breaks down when you use the probabilistic method. Ryan O'Donnell has raised some good points in the discussion, and we might continue this online; I have some lengthy comments on exactly what this means, but I don't want to get into the philosophy here. As a rule of thumb: if a proof of a statement uses the union bound and the Chernoff bound, it might be problematic to turn into a sum-of-squares proof; if the proof uses the types of things we use all the time—convexity, Cauchy-Schwarz, Hölder, maybe even Brascamp-Lieb—then things will probably be okay. That's a heuristic, not a law of nature.
Okay, so what we're going to talk about today is limitations of sum of squares, known as integrality gaps. The idea is an instance where the true maximum—say it's a maximization problem—is at most s, but sum of squares thinks there is an even better assignment achieving some value c, for some c bigger than s. The letters c and s come from completeness and soundness. We'll start with integrality gaps for degree-2 sum of squares, and we'll see that the odd cycle is already an integrality gap. Then we'll talk a little about the Feige-Schechtman graph, which is a tight integrality gap. In the second hour I want to talk about how you transform hardness into an integrality gap—how integrality gaps are inspired by NP-hardness results—and in the third hour I want to talk about this ultra-weird result of Raghavendra, which somehow goes the other way around and turns an integrality gap into a hardness result. I'm going to try to focus mostly on the high-level ideas of the proofs, because I assume you already have the lecture notes, and I hope this can be more interactive: we can talk about what you understood and which parts of the proofs you want me to go into in more detail. Any general questions before I turn this off and move to the blackboard? I'm starting with degree-2 integrality gaps. Let's start with a simple example. We had Cheeger's theorem, or Cheeger's inequality—actually, in this discrete setting it's not even by Cheeger but by Alon-Milman and Dodziuk. Cheeger's inequality says the following: if λ₂ of the Laplacian of the graph is at most ε, then there exists some set S with the number of edges going from S to its complement at most O(√ε) times the degree times |S|. This is basically the statement that the graph has small expansion.
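To make the statement concrete, here is a small numerical sketch of a Cheeger-style "sweep cut" on the n-cycle: compute λ₂ of the Laplacian, order the vertices by the second eigenvector, and check that some prefix set has expansion O(√λ₂)·d. The constant 10 in the final bound is a generous stand-in, not the sharp constant from the theorem.

```python
import numpy as np

def cycle_laplacian(n):
    """Unnormalized Laplacian of the n-cycle (a d-regular graph, d = 2)."""
    L = 2 * np.eye(n)
    for i in range(n):
        L[i, (i + 1) % n] -= 1
        L[(i + 1) % n, i] -= 1
    return L

n, d = 40, 2
L = cycle_laplacian(n)
eigvals, eigvecs = np.linalg.eigh(L)
lam2 = eigvals[1]  # second-smallest eigenvalue of the Laplacian

# Cheeger-style sweep: order vertices by the second eigenvector and look
# for a prefix set S (|S| <= n/2) with few edges leaving it.
order = np.argsort(eigvecs[:, 1]).tolist()
best = float("inf")
for k in range(1, n // 2 + 1):
    S = set(order[:k])
    boundary = sum(1 for i in S
                   for j in ((i - 1) % n, (i + 1) % n) if j not in S)
    best = min(best, boundary / k)

# The sweep finds a set with expansion O(sqrt(lam2)) * d, as promised.
assert best <= 10 * np.sqrt(lam2) * d
```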
So this is Cheeger's inequality, and now we show that it's tight. To show that it's tight, we simply take the cycle. In the n-cycle, for every set S of size at most n/2, what can we guarantee about the number of edges going from S to its complement? In the cycle, what's the minimum number of boundary edges any set has? 2, right? So the number of edges leaving S is at least 2, which is Ω(1/n) times |S|, since |S| ≤ n/2. On the other hand—in the lecture notes we work out the eigenvectors of the cycle exactly and show that we get a pseudo-distribution, but let me give you a quick-and-dirty reason—the claim is that there exists a vector x, orthogonal to the all-ones vector, such that xᵀL_G x ≤ O(1/n²)·‖x‖². This says there is a gap in the instance: it says the second eigenvalue is at most O(1/n²), because the smallest eigenvalue is 0 with eigenvector the all-ones, and this exhibits another direction with Rayleigh quotient at most O(1/n²). So Cheeger is really tight here. And how do you get such an x? You can get the exact eigenvector from a sine function, but one easy way is the following, which actually looks a little like a sine function too: label the vertices 0 through n−1, set the value 0 at vertices 0 and n/2, +1 at vertex n/4, −1 at vertex 3n/4, and interpolate linearly in between—a triangle wave. So at coordinate i, this is its value. This vector x is orthogonal to the all-ones vector, because the area above zero equals the area below zero. And what is xᵀL_G x?
It's basically ∑_i (x_i − x_{i+1})², summing over the edges {i, i+1}; that's the quadratic form of the Laplacian. The difference between two consecutive entries is always O(1/n), so each term is O(1/n²), and you get n terms times O(1/n²), which is O(1/n). We want to compare this to ‖x‖². What do we need—an upper bound or a lower bound on ‖x‖²? A lower bound. And a lower bound is easy: on a constant fraction of the coordinates, |x_i| is at least, say, 1/2, so ‖x‖² = Ω(n). So xᵀL_G x = O(1/n) = O(1/n²)·‖x‖², which is what we wanted to prove, okay? So the cycle is an example where Cheeger's inequality is tight. Now, if you look at the even cycle, it's not an example where max cut—the Goemans-Williamson algorithm—is tight. Do you know why? It's bipartite, right? So there the max cut value is 1, and there's definitely no gap. But for the odd cycle, the max cut value is 1 − Θ(1/n), yet you can exhibit a degree-2 pseudo-distribution that pretends the value is like 1 − O(1/n²). Still, this will not give the tight constant for Goemans-Williamson. To get the tight constant we need the Feige-Schechtman construction, which is a very nice construction, and it also teaches us something about where integrality gaps come from.
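Both claims in this passage are easy to sanity-check numerically: the O(1/n²) Rayleigh quotient of the triangle-wave vector, and the odd-cycle max cut value 1 − 1/n. This is a sketch; the triangle wave and vertex labelling follow the description above, and the brute-force cut uses a small odd cycle.

```python
import numpy as np

def cycle_laplacian(n):
    """Laplacian of the n-cycle: degree 2 on the diagonal, -1 per edge."""
    L = 2 * np.eye(n)
    for i in range(n):
        L[i, (i + 1) % n] -= 1
        L[(i + 1) % n, i] -= 1
    return L

def triangle_wave(n):
    """0 at vertices 0 and n/2, +1 at n/4, -1 at 3n/4, linear in between."""
    i = np.arange(n)
    x = np.where(i <= n // 2,
                 1 - np.abs(i - n // 4) * 4 / n,
                 -1 + np.abs(i - 3 * n // 4) * 4 / n)
    return x - x.mean()  # make it exactly orthogonal to the all-ones vector

n = 1000
L, x = cycle_laplacian(n), triangle_wave(n)
rayleigh = (x @ L @ x) / (x @ x)
assert rayleigh < 100 / n**2  # certifies lambda_2 <= O(1/n^2)

# Odd-cycle max cut: for odd L_len the best cut misses exactly one edge,
# i.e. the max cut value is 1 - 1/L_len (brute force over all partitions).
L_len = 7
def cut_edges(mask):
    return sum(((mask >> i) & 1) != ((mask >> ((i + 1) % L_len)) & 1)
               for i in range(L_len))
assert max(cut_edges(m) for m in range(2 ** L_len)) == L_len - 1
```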
And these relate to a rich body of mathematics having to do with isoperimetric inequalities. Basically, Feige-Schechtman showed the following: there exists a graph G that satisfies two properties. Property 1 is completeness: there is a degree-2 pseudo-distribution such that the expected value of the cut of G under the pseudo-distribution is, up to lower-order terms, some value c_GW (it doesn't matter exactly what the number is; I'll just call it c_GW to give a sense). On the other hand, there is a soundness condition: the true max cut of G is at most α_GW · c_GW, where α_GW is a particular constant smaller than one—the Goemans-Williamson constant, about 0.878. I should say that if you only wanted some constant, you could use an NP-hardness reduction to get something; but the point is not really the constant—the point is that getting a tight example forces you to understand this SDP much better than a loose example would. The way they prove the theorem is that they build an infinite graph, and it turns out that's okay because you can subsample it. So the idea is: you first build an infinite graph whose vertices are all vectors of unit norm; you show that in this graph there is a high-value pseudo-distribution and the true max cut is small; and then you sample a finite number of points—if this number is large enough, exponential in the dimension, the finite graph inherits both the pseudo-distribution value and the max cut value. I will not prove that. If you actually go and look at the paper, there is some real work to be done there; it's not super trivial, especially if you want the right number of vertices, exponential with the right exponent.
It doesn't really matter for the theorem as we stated it, but I hope it's believable. So is it clear what they're doing here? In this graph we put an edge between u and v if their dot product ⟨u, v⟩ is roughly ρ_GW, which is 1 − 2c_GW, roughly −0.69 or something like that. There was a discussion on Piazza about whether we define the edges by "dot product roughly equal to ρ_GW" or "at most ρ_GW"; it doesn't really matter, because of the concentration of measure phenomenon: almost all pairs with dot product at most ρ_GW have dot product very close to exactly ρ_GW. The reason, in some sense: if you take a Gaussian and condition on being in a tail—say, larger than some threshold—then because the Gaussian decays exponentially (I'm not drawing it very nicely on this figure), almost all of the conditional mass sits in a very narrow band right next to where you cut. Away from the mean, Gaussians have very sharp decay. But please ask me questions—that's why you came here at this hour of the night and dealt with the dragons guarding the entrance. So the moment matrix is now going to be PSD almost by definition. For the pseudo-distribution, we define actual random variables: x_v = 1/2 + (1/2)⟨v, g⟩ for every unit vector v, where g is a standard Gaussian vector—every coordinate of g is a standard Gaussian. So x_v is distributed like a normal with mean 1/2 and variance 1/4 (standard deviation 1/2), which is how it would behave if it were a Boolean variable, a fair 0/1 coin.
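Here is a quick numerical sketch of why this definition gives a valid (PSD) degree-2 moment matrix. With x_v = 1/2 + ⟨v, g⟩/2, the entries come out to E[x_u x_v] = 1/4 + ⟨u, v⟩/4, since E[⟨u, g⟩⟨v, g⟩] = ⟨u, v⟩; a handful of random unit vectors stand in for the sphere's vertices, and the last part checks the entry formula by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(0)

# A few random unit vectors in R^d stand in for vertices of the sphere graph.
d, m = 20, 50
V = rng.standard_normal((m, d))
V /= np.linalg.norm(V, axis=1, keepdims=True)

# Moment matrix of x_v = 1/2 + <v, g>/2:  E[x_u x_v] = 1/4 + <u, v>/4,
# i.e. (all-ones matrix + Gram matrix) / 4 -- a sum of two PSD matrices.
M = (np.ones((m, m)) + V @ V.T) / 4
assert np.linalg.eigvalsh(M).min() > -1e-9  # numerically PSD

# Monte Carlo check of the entry formula for one pair (u, v).
g = rng.standard_normal((100_000, d))
xu = 0.5 + (g @ V[0]) / 2
xv = 0.5 + (g @ V[1]) / 2
emp = np.mean(xu * xv)
assert abs(emp - (0.25 + 0.25 * (V[0] @ V[1]))) < 0.02
```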
And if we look at this matrix, the moment matrix of the pseudo-distribution—at coordinate (u, v) you put E[x_u x_v]—that equals 1/4 + (1/4)·E[⟨u, g⟩⟨v, g⟩] = 1/4 + (1/4)⟨u, v⟩; that is, one quarter times the all-ones matrix plus one quarter times the matrix of dot products of u and v. Because a matrix of dot products is a Gram matrix, it's PSD, so we get a PSD matrix plus a PSD matrix, which is a PSD matrix. Okay, so this is a valid pseudo-distribution, and then we just want to compute the expectation of the cut. For the expectation of the cut, you look at E[(x_u − x_v)²], you open everything up, and for an edge it comes out to 1/2 − (1/2)ρ_GW, which equals c_GW by the equation above. Okay, and then the hard part, which we didn't prove—but luckily even Feige and Schechtman didn't need to prove it, because it was already known—is that if you want to maximize the cut in a geometric graph like this, the best cut is a hyperplane through the origin, cutting to both sides. It's something that intuitively makes sense, but it's actually not trivial to prove; it was proven by Borell, and by Sudakov and Tsirelson, and this is really the heart of the proof. Yes? [Student:] So what does it mean, a lower bound? Is it always guaranteed? For this particular graph? [Lecturer:] So, okay—there is an infinite graph, and then we subsample it to get a finite-sized graph. What we know about this graph is that its actual max cut value will be essentially exactly this value, if that's the question. You mean, is this a lower bound—is that always guaranteed?
So the infinite graph—it's one single graph—and you can compute what value you get if you cut across a hyperplane; in some sense you can even see it here. The hyperplane cuts u and v exactly if it falls between these two guys, which happens with probability equal to the angle between them divided by π, that is, arccos(ρ)/π. So for this graph, the true max cut value is exactly this. And if you sample a lot of random points, then with very high probability the finite graph you get will also have this max cut. So the hard thing to prove—this is not a calculation, this is where the isoperimetric business comes in—is: how do you know there isn't some very crazy partition that somehow does better than just taking a hyperplane cut? How do you make sure that no matter what crazy partition you take, it's not better than a hyperplane? And this is actually something that underlies many integrality gaps. [Blackboard gets moved.] Okay, I guess you get the main point; let's try to use the top part for the important stuff. So this is something that repeats itself in many integrality gaps, and actually in many parts of theoretical computer science: it's a form of isoperimetric inequality. The classical isoperimetric inequality says the circle is the shape in the plane that minimizes the ratio between boundary and volume, over all shapes—which is intuitively true, but also not at all clear how you would prove it.
So if you take any other shape with the same boundary length, it will have a smaller volume. Which kind of makes sense: if you wiggle around, you're spending more boundary for the same volume. It makes sense, but it's not trivial to prove. And this integrality gap rests on a higher-dimensional version of a statement like that; it's again relating volume with boundary. This is a very general pattern—I have all these musings in the lecture notes, but it really is a very general type of theorem that we keep using again and again. First there is an optimality theorem, which is for a particular type of objective, say the ratio between boundary and volume: if you're optimal for that objective, then you must be some kind of nice set. Then there is a stability theorem. A stability theorem here would say something like: if you are a shape whose ratio of boundary to volume is very, very close to that of the circle, then you are actually also close to a circle. So not only is the circle the optimum; everything that's nearly optimal is nearly a circle. And then the hardest thing, but also often the most useful, is what's known as an inverse theorem, which here would say: if you have any kind of non-trivial relation between your volume and your boundary, then you must be circle-ish in some form. I don't know exactly how to phrase it in this setting, but this is the general shape. For example, in the world of coding theory, what we would call a stability theorem is known as unique decoding: it basically says that if you almost satisfy some test—you satisfy it with, say, 99% success—then you are close to a codeword.
And the inverse version is known as list decoding: if you do somewhat better than random, there has to be a reason for that. And this doesn't arise just in coding or in isoperimetric questions. For example, you can take a subset A of numbers that satisfies A + A = A—no matter which pair of points you add, you always get a point of A—and then A is a subgroup. But then you can ask: if |A + A| is smaller than some k times |A|, is A an approximate subgroup? If k is really small, this is like a stability theorem, and as k becomes larger, it's like an inverse theorem. There are theorems of that sort, and this is the whole area of additive combinatorics. There are also questions of linearity testing and testing for low-degree polynomials, and all of these things have this kind of form: you have some test, and usually it's not that hard to prove that if you satisfy the test perfectly, then you must be something nice—maybe a linear function, maybe a low-degree polynomial. Then you want to prove that if you nearly satisfy it, then you are nearly that nice thing, and the hardest version is typically to show that if you merely beat random, then you must be somewhat correlated with a nice function. We'll see this again in the third hour. So basically, that's what I wanted to say about integrality gaps for degree 2, in particular the Feige-Schechtman graph. Any questions at this point? Yes? [Student:] So the example we've gone over is an infinite graph, and you get finite graphs by sampling some sufficiently large set of vertices from it. This is sort of like using the probabilistic method. Do we think it's possible to do this sort of argument with some deterministic construction? [Lecturer:] Probably, yeah.
I think in this case you could take a fine enough mesh, because there is really no limit on how many points you want to take. The dimension of the sphere—we didn't even talk about it—is related to what accuracy you want to achieve in this bound, but once you've decided on the dimension, you can take a really large number of points. You can simply decide to take all v in Rⁿ of norm 1 such that each coordinate v_i is some integer times δ, for some really tiny δ. So you don't actually need to sample uniformly—you don't really need the probabilistic method for this. But this may be related to the next thing, which is: what if the degree D is larger than 2? These are integrality gaps for degree-2 sum of squares, but we know they are not integrality gaps for larger-degree sum of squares.
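Before moving to higher degrees, one more check on the degree-2 story: the hyperplane-cut probability arccos(ρ)/π used in the soundness calculation can be verified by simulation. This is a sketch with one fixed pair of unit vectors in the plane; a uniformly random hyperplane through the origin is represented by the sign of a Gaussian inner product.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two unit vectors in the plane at angle theta, so rho = cos(theta).
theta = 2.0
u = np.array([1.0, 0.0])
v = np.array([np.cos(theta), np.sin(theta)])

# A uniformly random hyperplane through the origin has normal g ~ N(0, I);
# it separates u and v iff sign(<u, g>) != sign(<v, g>).
G = rng.standard_normal((200_000, 2))
cut_freq = np.mean(np.sign(G @ u) != np.sign(G @ v))

# Closed form: arccos(rho) / pi = theta / pi.
assert abs(cut_freq - theta / np.pi) < 0.01
```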
For the odd cycle this is an exercise, and it turns out the same phenomenon occurs in the Feige-Schechtman graph. Look at this graph on the sphere, and suppose you intersect the sphere with a plane, much like what I'm doing when I draw it on the blackboard. I think the easiest case to think about is when you connect two vertices if their dot product is close to −1, say u·v ≈ −1 + ε. Then take u₁; v₁ will be nearly antipodal to it; take u₂ near v₁, and so on—you can hop back and forth, and on each hop the angle misses a perfect half-turn by roughly √ε. So after about 1/√ε steps you can come back to where you started, having gone around an odd number of times. What I want to say is: there is an odd cycle in this graph of about 1/√ε edges, and moreover these odd cycles cover the graph nicely—you can almost partition the edges of the graph into a collection of disjoint, or nearly disjoint, odd cycles. An odd cycle of length L has max cut value 1 − 1/L, so using these odd cycles you can certify that the max cut value is at most 1 − Ω(√ε), which in particular implies that degree-6 sum of squares certifies for this graph that the max cut is at most 1 minus some constant times √ε. And we have a paper—mine with David Moore-Itzhardt and Thomas Hollenstein—where we show that even if you randomly subsample a much smaller number of vertices, so small in fact that there are no more short odd cycles in the graph (the graph has large girth), it's still the case that degree-6 sum of squares can certify this. And somewhat strangely, we still use the odd cycles—phantom cycles, in a sense—because even though they don't
exist, sum of squares can still use them, because there is an advantage when you believe in unicorns. So we know that these kinds of things will not work for larger degree, and after the break I'll talk about what does work for larger degree. Okay, so let's take a break. Anyway, if you have feedback—if you feel I'm repeating too much of what's already in the notes, or the other way around, that I'm not going into enough detail and, even though it's in the notes, you would still prefer that I show the proofs line by line—you can tell me in person, or email me, or write to me anonymously if you prefer; there's no shame either way. Just let me know if you feel I'm going too fast or too slow, or if there's anything else you'd prefer me to talk about, high-level or low-level detail. And please do ask questions, including questions that go off on a tangent, because again, you have the notes and the proofs are there; I want to use this time for discussions and high-level things. So now, okay, we want to go to degree larger than 2: we want to prove theorems for degree higher than 2. And here is one thing that computer scientists can learn from mathematicians. Given our refereeing culture this is not completely mandatory, but it turns out it really helps if the theorem you're trying to prove is true. We computer scientists can prove it even otherwise, but it's easier if it's actually true. So why don't we try to prove true theorems when showing lower bounds for sum of squares? And how do we know what's true? Well, every kid—even my kid, who doesn't yet know how to count to five—knows that there is no sub-exponential-time algorithm for SAT, right? This is dogma. And now, by the PCP theorem, by a theorem of Håstad, and by some fancy PCPs of Moshkovitz and Raz,
this means the following: there is no 2^(n^0.99)-time algorithm that can distinguish between the following two cases for a 3XOR instance. A 3XOR instance consists of equations on n Boolean variables, each of the form x_i ⊕ x_j ⊕ x_k = a_ijk. In the yes case the value is at least 0.99, and in the no case the value is at most 0.51, where the value is the maximum fraction of equations you can satisfy. Basically, if SAT is hard, there is a reduction showing that even approximating 3XOR in this way is hard. So in particular, if there is no 2^(n^0.99)-time algorithm, we can make the following prediction: there exists an instance and a pseudo-distribution μ of degree D = n^0.99 such that the expectation according to μ of the fraction of satisfied equations is at least 0.99, but the truth is that the value is at most 0.51. Why is that the case? Because if this were not so—if for every instance whose true value is at most 0.51, the best degree-D pseudo-distribution achieved at most 0.98—then we could use that to distinguish between the two cases: if the true value is at least 0.99, there is definitely a pseudo-distribution (namely an actual distribution) achieving that value, while in the other case every pseudo-distribution would be capped at 0.98. So if we believe there is no sub-exponential-time algorithm for SAT, then this integrality gap should exist. And it's always nice, when we have predictions like this, to actually try to prove them; it might also give us a warm feeling that the beliefs we base these predictions on are actually true. And this is what we are going to do. So Grigoriev basically proved this prediction, but then he did it even
better: instead of degree n^0.99 he actually got Ω(n), which is even nicer—though we probably could have predicted that the problem is exponentially hard, so maybe that's not so surprising. The thing that is perhaps more surprising, and somewhat too good, is that he showed the pseudo-expectation value equals 1: he gives a pseudo-distribution that pretends you can satisfy the instance completely. This is somewhat annoying, because we think of sum of squares as a strong algorithm, and this problem is just to distinguish between the case that you can completely satisfy a set of linear equations and the case that you can't—and solving linear equations, even mod 2, is famously doable in polynomial time: there is this algorithm called Gaussian elimination, and apparently it works. So once again, an excuse for sum of squares' pitiful performance here is to say that it doesn't really know how to take advantage of the difference between 1 and 0.99; the 0.99 version is hard, so sum of squares just gives up even when the value is 1. But maybe the real reason is that I like making excuses for sum of squares; others are more critical of it. Okay. Here we actually are going to use the probabilistic method, with a random instance. So let's get a sense of what a random 3XOR instance looks like. In a random 3XOR instance, we select M random equations of the form x_i ⊕ x_j ⊕ x_k = a_ijk. Let's build some intuition. Suppose M is like n/2, so we select n/2 equations in n variables. What do you think the value of this system will be—how many equations can you satisfy? All of them: the equations are very likely to be linearly independent, and if you have fewer than n linearly independent equations, you can solve for them and find a solution. On the other hand, when M is like 2n, then very likely—
well, they cannot all be linearly independent, and it's very likely that any assignment will only satisfy some chunk of them. Generally, if M is like n/ε², you don't expect to satisfy more than a 1/2 + O(ε) fraction. One way to see this: for a fixed assignment, the number of satisfied equations is a random variable with mean M/2 and standard deviation of order √M. There are 2^n possible assignments, so you expect the best one to be about √n standard deviations above the mean—roughly √(Mn) above M/2—since a deviation that large has probability about 2^(−n). With M = n/ε², we have √(Mn) = εM, so the best assignment satisfies about M/2 + O(εM) equations. Okay, so that's the rough intuition for what you would expect; the actual calculation is in the lecture notes. So if you take a random instance, it's very unlikely that you can satisfy more than, say, 51% of the equations, no matter how you assign the variables. And the kind of surprising thing is that you can actually do it—there is a pseudo-distribution that pretends to satisfy all of them. Basically, the way we represent a random CSP is as a bipartite graph: there are M clauses—M constraints, M equations—on one side and n variables on the other, and each constraint is connected to its three variables i, j, k; in addition, we have the right-hand-side vector a = (a_1, …, a_M). So we think of an instance as a pair (G, a), and the proof boils down to two lemmas. Now we want to prove that there exists a pseudo-distribution. So: we've shown that if we choose G and a at random—in fact, it turns out that we
only need to choose a at random: it doesn't matter what G we choose, with very high probability you're not going to be able to satisfy 51% of the equations. Now we want to prove that there exists a pseudo-distribution, and this boils down to two lemmas. We're going to set the degree d that we work with to δn, where δ is some small enough constant. Lemma 1 says that with high probability G is a great expander, which means that for every subset T of the constraints on the left side, if T has at most, say, 100d vertices, then the neighbor set of T has size at least 1.7|T|. How many edges go out of a set T of size k on the constraint side? Three per constraint, so 3k. So a priori you could have hoped for up to 3|T| neighbors, and you get at least 1.7|T|; the point is that 1.7 is better than 1.5, you get more than half of the neighbors you could hope for, and it turns out you couldn't expect better than 2; that's an exercise in the notes. So this is Lemma 1, and it's just a lemma about random bipartite graphs; the proof is in the notes, basically a union bound over all possible subsets plus a calculation, so either believe me about this lemma or read the proof. The real matter here is Lemma 2, which maybe says that sum of squares is somewhat stupid in some sense, because what it says is: if G is a great expander in that sense, then it doesn't matter what's on the right-hand side; sum of squares will always think you can satisfy all the equations. So there exist a degree d and a pseudo-distribution μ such that for every ℓ in [m], the pseudo-expectation of the degree-3 polynomial that checks that x_i ⊕ x_j ⊕ x_k = a_ℓ is 1, where i, j, k are the neighbors of constraint ℓ. Part of the reason I'm writing this lemma out is to make sure we're all on the same page on notation; I hope we are. So this lemma says there is a pseudo-distribution that, in expectation, satisfies every constraint. Is that clear? Okay. There are two parts when you want to prove this kind of lemma: part one, construct the pseudo-distribution μ; part two, prove that μ is positive semidefinite. By now we have some rules for how to come up with these pseudo-distributions, and we view it through this Bayesian point of view. So how do we come up with μ? Our prior is that μ is simply the uniform distribution over {0,1}^n: a priori, if we didn't look at the formula at all, it's a random formula, so the solution should look random. But then we see something. We observe that, say, x_7 ⊕ x_15 ⊕ x_32 needs to be 1; we have to respect that. So we say it's almost uniform, but let's modify it: let's set the expectation of (1 − 2x_7)(1 − 2x_15)(1 − 2x_32) to −1. This 1 − 2x is a convenient linear transformation that maps 0 to +1 and 1 to −1, so that XOR becomes multiplication. Then maybe we observe that x_32 appears in another equation, say x_32 ⊕ x_49 ⊕ x_72 = 0, so we do the same thing and set the expectation of those guys to +1. But then we observe: we can actually learn something from these two equations together. What
can we learn? We can add them, right? Exactly. We can add them and find that x_7 ⊕ x_15 ⊕ x_49 ⊕ x_72 = 1, which in this ±1 world corresponds to multiplying the two equations, so we get that the expectation of (1 − 2x_7) and so on up to (1 − 2x_72) is −1. So the idea is: we start with the uniform distribution, and then we modify moments as needed. But we have to be a little careful, because what would happen if we combined all the linear relations that exist in this system of equations? We would be able to derive that 1 = 0, because this is an inconsistent system: it's not satisfiable, not even close. So we have to be careful, and the idea is that we only deal with equations of degree at most d. We do all these derivations up to degree d, but if adding two things of degree d would give something of degree 2d, we say: we don't write this down, we don't use this information; we are degree-bounded. And we are going to simplify things with the change of variables y_i = 1 − 2x_i; it doesn't simplify things enormously, but we'll do it. So basically we work with the polynomials χ_S, defined as the product over i in S of (1 − 2x_i), which are the monomials in the y_i's, and we write down pseudo-expectations for these functions. The rule is the following. If there is a constraint x_i ⊕ x_j ⊕ x_k = a, we define the pseudo-expectation of χ_{i,j,k} to be 1 − 2a. And the other rule is: if Ẽ[χ_S] and Ẽ[χ_T] have been defined, and S, T, and their symmetric difference S△T all have size at most d, we define Ẽ[χ_{S△T}] to be the product Ẽ[χ_S]·Ẽ[χ_T]. For anything that is never defined this way, we keep it at zero, because under the uniform distribution all the χ_S have expectation zero, and we try to stick to the uniform distribution as much as we can, except where we are forced not to. So this is the process, and now you have to do two things: first, prove that this makes sense, in the sense that you never define the same thing in two different ways; second, prove that if it does make sense, then the result is PSD. The second thing is actually pretty easy. So we have to prove two claims. Claim one is that we don't blow up; by blowing up I mean that we apply this rule and suddenly we have to assign Ẽ[χ_{S△T}] = +1 where we've already assigned it −1 before, which would be unpleasant. Claim two is that the resulting thing is a valid pseudo-distribution: it's clearly a linear operator, etc., so the thing you actually need to prove is that the pseudo-expectation of a square is non-negative. And that follows pretty easily. The proof of claim two: you write a polynomial p as a sum of coefficients, p = Σ_S α_S χ_S, and then you write out the pseudo-expectation of p², which is the sum over S and T of α_S α_T times the pseudo-expectation of χ_S χ_T, and χ_S χ_T = χ_{S△T}. And now the point is that, for everything we have defined, we can always change this into Ẽ[χ_S]·Ẽ[χ_T].
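As a quick sanity check on this derivation rule, here is a small sketch of my own, not from the lecture: it builds the table of pseudo-expectations Ẽ[χ_S] by starting from the constraint values and closing under the symmetric-difference rule, reusing the indices 7, 15, 32, 49, 72 from the example above.

```python
from itertools import combinations

def build_pseudo_expectations(constraints, degree):
    """constraints: list of (set_of_3_vars, a) with a in {0,1}.
    Returns a dict S -> pseudo-expectation of chi_S; anything
    missing from the dict is implicitly 0 (the uniform prior)."""
    E = {frozenset(): 1.0}  # chi of the empty set is the constant 1
    for S, a in constraints:
        E[frozenset(S)] = 1.0 - 2.0 * a  # E[chi_{ijk}] = 1 - 2a
    changed = True
    while changed:  # close under E[chi_{S sym-diff T}] = E[chi_S] * E[chi_T]
        changed = False
        for S, T in combinations(list(E), 2):
            U = S ^ T  # symmetric difference of the index sets
            if max(len(S), len(T), len(U)) > degree:
                continue  # stay degree-bounded: never derive big sets
            v = E[S] * E[T]
            if U in E:
                assert E[U] == v, "blow-up: two conflicting derivations"
            else:
                E[U] = v
                changed = True
    return E

# The two equations from the example: x7+x15+x32 = 1 and
# x32+x49+x72 = 0 (mod 2).  Adding them derives the fourth value.
E = build_pseudo_expectations([({7, 15, 32}, 1), ({32, 49, 72}, 0)], degree=6)
print(E[frozenset({7, 15, 49, 72})])  # -1.0, i.e. the derived equation equals 1
```

Note that most sets, like {7, 49}, never get a value and stay at the uniform-distribution default of zero; only the span of the constraint sets gets non-zero values.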
Because that will always hold, the way we defined things: if Ẽ[χ_S] and Ẽ[χ_T] are both non-zero, then Ẽ[χ_{S△T}] was defined to be their product; and if one of them is zero, then, so the argument goes, Ẽ[χ_{S△T}] must also be zero, because otherwise the non-zero one times Ẽ[χ_{S△T}] would have forced the other to be defined as a product and be non-zero. So the equation is always satisfied, and once you have that, Ẽ[p²] is just the sum over S of α_S Ẽ[χ_S], all of that squared, a square, which is non-negative. Yes: because we want to stay close to the uniform distribution, what actually happens is that the vast majority of values are simply never defined, and then they are zero; but if two things are non-zero, the pseudo-expectation of the product is the product of the pseudo-expectations. Okay, so now we just have to prove that we don't blow up, and let me give you the intuition of why. The idea is the following: if we look at this graph, a contradiction means you added up equations and derived that 1 = 0, so there must be a linear dependency; you somehow got a set of equations, call it T, that is linearly dependent. Now, the claim is that we can assume without loss of generality that T has size at most, say, 20 times d, where d is the degree, and the reason is that in these derivations we only deal with sets of size at most d, so the first time we get a dependency, we can assume it has at most this size. But first, a question takes us back to claim 2: what happens if we don't know the value of either χ_S or χ_T individually, but we do know the value of their product? So in that case the pseudo-expectations of χ_S and χ_T are both zero, right?
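Incidentally, one can check numerically that the moment matrix of this construction is PSD even though such pairs are present. This is a small sketch under my own choice of basis, reusing the two example equations; the variable names are from the lecture's example, the rest is mine.

```python
import numpy as np
from itertools import combinations

# Pseudo-expectations produced by the two example equations
# x7+x15+x32 = 1 and x32+x49+x72 = 0 (mod 2): the two constraint
# sets, their sum, and the empty set; every other chi_S stays 0.
E = {frozenset(): 1.0,
     frozenset({7, 15, 32}): -1.0,
     frozenset({32, 49, 72}): 1.0,
     frozenset({7, 15, 49, 72}): -1.0}

variables = [7, 15, 32, 49, 72]
# Index the moment matrix by all subsets of size <= 2.
basis = [frozenset(c) for k in (0, 1, 2)
         for c in combinations(variables, k)]
M = np.array([[E.get(S ^ T, 0.0) for T in basis] for S in basis])

# Pairs such as S={7}, T={15,32} have E[chi_S] = E[chi_T] = 0 but
# E[chi_S chi_T] = -1; the matrix is nevertheless PSD.
print(np.linalg.eigvalsh(M).min() >= -1e-9)  # True
```

Here the PSD-ness is no accident: these particular values agree with the character expectations of the uniform distribution over genuine solutions of the two (consistent) equations. The interesting case in the lecture is when the full system is inconsistent but expansion still protects every small batch of derivations.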
Okay, so let's see: χ_S would be zero, χ_T would be zero, but Ẽ[χ_{S△T}] is not zero. That's a good question: is that possible? Let's see what we know. In a valid distribution this would be fine: every time χ_S is −1, χ_T is −1, and every time χ_S is +1, χ_T is +1, so the expectation of χ_{S△T} is one. But we need the proof, and in the proof I assumed I could write Ẽ[χ_S χ_T] as Ẽ[χ_S]·Ẽ[χ_T], and in this case I cannot do that. For example, suppose you have the equation x_1 ⊕ x_2 ⊕ x_3 = 1; then consider χ_{1} versus χ_{2,3}: both of those are zero, but the pseudo-expectation of their product is non-zero. So yes, there was an issue here, which I kind of swept under the rug, and the proof is slightly subtler than what I said: basically you explicitly split into equivalence classes. You say that S is equivalent to T if Ẽ[χ_{S△T}] is defined and non-zero, and you can show that this is an equivalence relation. And now what you do is write p as p_1 + ... + p_c, grouping together the coefficients that lie in the same equivalence class. If two sets are not equivalent, the cross term Ẽ[χ_S χ_T] is always zero, so it reduces to the case where everything is in one class, and we just get a sum of squares.
Right. So now you can assume that every set is in the same equivalence class, and you can shift by the equivalence class, take it out; so you can basically assume without loss of generality that the relevant pseudo-expectations are non-zero for every coefficient, and then the product equation holds. What this is saying is that the quadratic form corresponding to the pseudo-expectation is a bunch of blocks, one per equivalence class: p_1 is the part in the first class, p_2 the part in the second, and so on, each block is PSD, and if you evaluate the quadratic form on the sum of the p_i, you get the sum over the individual blocks. Is this clear? Not so clear, okay; and since I cannot really tell you that this proof is in the notes, because the proof in the notes is wrong, why don't I actually spell it out. So the idea is the following. We define S to be equivalent to T if the pseudo-expectation of χ_{S△T} is not zero, where S and T each have size at most d/2. I leave it as an exercise to verify that this is an equivalence relation: it's reflexive, symmetric, and transitive. So now we have this polynomial p of degree at most d/2, and we write it in the χ basis, p = α_1 χ_{S_1} + α_2 χ_{S_2} + ..., and we bunch the terms together by equivalence class; call these parts p_1, p_2, ..., p_c. Now we look at the pseudo-expectation of p², which is the pseudo-expectation of (Σ_i α_i χ_{S_i})². We open this all up, but if two sets are not equivalent to each other, then the pseudo-expectation of their product is zero, so those terms vanish, and what really happens is that Ẽ[p²] is the sum of the Ẽ[p_i²]. So we just have to prove that each of those is individually non-negative. So now everyone is in the same equivalence class: we can assume p is of the form Σ_i α_i χ_{S_i}, where there is some representative, call it S_0, such that Ẽ[χ_{S_0 △ S_i}] is non-zero for every i. And now what we do is shift by S_0. Look at Ẽ[p²]: it gives terms α_i α_j Ẽ[χ_{S_i △ S_j}], and note that S_i △ S_j = (S_0 △ S_i) △ (S_0 △ S_j). We want to claim that Ẽ[χ_{S_i △ S_j}] equals the product Ẽ[χ_{S_0 △ S_i}]·Ẽ[χ_{S_0 △ S_j}]; and we know that Ẽ[χ_{S_0 △ S_i}] is not zero and Ẽ[χ_{S_0 △ S_j}] is not zero, so the pseudo-expectation of the symmetric difference has to be the product of the two. So basically what happens is that we can
argue that Ẽ[p²] equals the sum over i of α_i Ẽ[χ_{S_0 △ S_i}], all of that squared, which is non-negative; this is just a number, and that's what we care about. All we used is that Ẽ[χ_{S_i △ S_j}] equals Ẽ[χ_{S_0 △ S_i}] times Ẽ[χ_{S_0 △ S_j}], and the point is that these factors are non-zero, so the pseudo-expectation of the product has to be the product of the two. So, to avoid the case that Michael asked about, we had to restrict to a single equivalence class. Now let me say something about what Michael raised for claim 1: why we don't blow up, i.e., why this never yields a contradiction. The idea is the following. A contradiction really corresponds to a linear dependency: a set T of equations that are linearly dependent. If this set of equations is dependent, then every variable the set touches has to be cancelled, so it has to be touched at least twice. Each equation has three variables on the left, so there are 3|T| edges going out of T, but we are wasting at least half of them because every neighbor is touched at least twice; that means the neighbor set has size at most 1.5|T|. Now, that doesn't mean there is no linear dependency in this graph; in fact there definitely is one, because there are more equations than variables. But what we can show is that if you go along the derivation, since every derivation step only deals with sets of size at most d, we can find, somewhere along the line, a dependent set T of size at most 10d or 20d, and that's a contradiction: there are large linear dependencies in this graph, but by expansion there are no small ones, since a small dependent set T would have at most 1.5|T| neighbors while Lemma 1 guarantees at least 1.7|T|. The actual proof is in the notes, and that proof, I think, is actually correct, so we're safe. A question: people sometimes compare the LP relaxation versus the SDP relaxation, and this argument is somehow based on sets of variables rather than linear subspaces, so why is one of these more powerful, and why wouldn't the same thing show up there? That's a good question. For 3XOR it is indeed the case that semidefinite programming is not more powerful than linear programming. If you wanted to come up with a pseudo-distribution for linear programming: we didn't define pseudo-distributions for linear programming, but one way to say it is that you drop the condition that the pseudo-expectation of a square is non-negative, and instead you require that if you have some function f that depends on only k variables and is non-negative, then the pseudo-expectation of the corresponding F is non-negative, where F applies the small f to some k of the input variables. So in some sense, as Pablo says, it's like picking a basis: it's not all low-degree polynomials, we treat the coordinate basis as special, and we only look at polynomials where the reason they are allowed to have any degree is
because they only touch k variables. And it is true that for 3XOR the power of changing basis doesn't seem to be useful, maybe because in some sense this problem is so hard that nothing helps. For the max-cut problem, linear programming is super pitiful: even with exponential size it cannot distinguish between basically a bipartite graph and a random graph. For 3XOR, the problem is so hard that linear programming is pitiful, but semidefinite programming is also pitiful; in some sense, every algorithm we know is pitiful. One way to think about it is that this pseudo-distribution encodes the most naive thing anyone looking at the problem would know: I know the equations are true, and if I add up a small batch of them, the result should also be true; and for this problem, the SDP doesn't do better than that. I don't know if that answers it, but to some extent. So the nice thing is that this generalizes, and we'll see later how to generalize it to other CSPs, and there is a very simple but very useful observation. It says that if μ is a degree-d pseudo-distribution and P is a polynomial map of degree, say, k, then if we define P(μ) to be the "distribution" of P(x) where x comes from μ, this is a pseudo-distribution of degree at least d/k. This is very easy to prove, but what it says is that if you apply low-degree manipulations to a pseudo-distribution, it's still a pseudo-distribution. And the reason this is useful is the following. This proof basically shows that 3XOR is hard for sum of squares, and typically, the way you show a different problem is hard is by a reduction: you take an instance I of, say, 3SAT and map it to some instance I′, and the reduction says that if there is a good solution for I, then there is a good solution for I′, and if there isn't, there isn't one here either. We'll come back to this point in a few minutes, after the break, but the reduction is typically constructive: we have an efficient way to take an assignment x for the 3SAT instance and convert it into some P(x), which is an assignment for I′. And typically these manipulations are local, so P will be a low-degree polynomial, which means that if we had a pseudo-distribution for the 3SAT instance, we also get a pseudo-distribution for the resulting instance. So if we had a reduction showing that if 3SAT is hard, then some problem B is hard, we can start with our hard sum-of-squares instance and conclude that B is hard for sum of squares unconditionally. In the notes there is a particular form that's really useful to start from, which is basically the sum-of-squares analogue of the PCP theorem, and typically that allows us to take many computational hardness results, which under assumptions apply to any algorithm, and convert them into unconditional results for the sum-of-squares algorithm. So typically, if you know how to prove that something is NP-hard, you also know how to prove unconditionally that sum of squares cannot do it, which is what you would expect, but it's still nice to be able to prove such things. So, questions?
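For reference, the low-degree transformation observation just mentioned can be written out in symbols; this is my paraphrase of the statement as I understand it from the lecture notes.

```latex
\textbf{Lemma (low-degree transformations).}
Let $\mu$ be a degree-$d$ pseudo-distribution over $\{0,1\}^n$ and let
$P\colon \{0,1\}^n \to \mathbb{R}^m$ be a map each of whose coordinates is a
polynomial of degree at most $k$. Define $P(\mu)$ by
\[
  \tilde{\mathbb{E}}_{y \sim P(\mu)}\, q(y)
  \;=\; \tilde{\mathbb{E}}_{x \sim \mu}\, q(P(x))
  \qquad \text{for every polynomial } q .
\]
Then $P(\mu)$ is a pseudo-distribution of degree at least $d/k$: whenever
$\deg q \le d/(2k)$, the polynomial $q(P(x))$ has degree at most $d/2$ in $x$,
so $\tilde{\mathbb{E}}_{\mu}\!\left[q(P(x))^2\right] \ge 0$, which is exactly
the degree-$(d/k)$ positivity condition for $P(\mu)$.
\]
</latex-note-removed>
```

This is the formal content behind "a constructive, local reduction maps pseudo-distributions for the source problem to pseudo-distributions for the target problem."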
So let's take another eight-minute break; seven, eight. So, for this last part, and we will definitely run short of time, there is this last result, which I'm going to talk about at a very high level, that I find ultra cool, and this is even as someone who doesn't believe the assumption in the statement of the theorem; I still find the result amazing. What we saw before is that we take something that's NP-hard and get an integrality gap; that's to be expected: if we believe something is hard for every possible algorithm, it should also be hard for this algorithm. But the result we're talking about now does the other thing: we take an integrality gap and convert it into a hardness result. And if you think about it, how can you possibly do that? An integrality gap is just saying that some finite object, some graph on 100 or 1000 vertices, has one combinatorial quantity different from another: the maximum cut is different from the relaxation value, which is some kind of maximum eigenvalue of some matrix. So it's pretty cool that the existence of this one particular finite object can be an obstruction to every possible polynomial-time algorithm at the same time; especially since the algorithm can know about this object, it's only constant-sized, you can hardwire the algorithm to know about it, so how could this object stop the algorithm from working? But this is the beauty of the thing, and the idea is that the integrality gap becomes a gadget that lets you do a reduction; we'll see it. So generally, this is the theorem that says: if sum of squares can't do it, no algorithm can do it, which is pretty cool. And ideally, what you would want is a kind of dream theorem. You'd want to say that for some large family of problems, say maximization problems like max cut, the following holds: take the best efficient algorithm A and the worst instance for it, the instance maximizing the difference between what A thinks the value is and the true value; this is roughly the same as the worst case for the sum-of-squares algorithm, the worst instance for SOS, the SOS value minus the true value. This dream theorem would tell you that for this large family of problems, SOS computes a value that might not be the right value, but it's the best you can do if you're restricted to efficient algorithms; and of course, to state something like that, you need to assume P is different from NP. Prasad Raghavendra almost proved this. What he proved needed the Unique Games Conjecture, which is a huge caveat in some sense, because we don't know if it's true; I think it's false, but some other people think it's true, so let's say there is no consensus. And what he proved is that basically this statement is true for constraint satisfaction problems, and he proved it for SOS of degree 2, which gives as a corollary that for problems in this family, degree-2 SOS is as powerful as SOS of degree 100, or even SOS of some degree n^o(1). So I think this is a pretty cool theorem regardless of what you think of the Unique Games Conjecture, and let's try to state it and say something about the proof. So, a question: if an efficient algorithm beats the SOS value, then you can solve the unique games problem, right?
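To pin down the statement being discussed, here is the "dream theorem" shape in symbols; this is a paraphrase, and the precise family of problems and notion of value are left informal in the lecture.

```latex
\textbf{``Dream theorem'' (informal).}
For a suitable family $\mathcal{P}$ of maximization problems
(e.g.\ CSPs such as \textsc{Max-Cut}), for every $\pi \in \mathcal{P}$,
\[
  \min_{\text{efficient } A}\;\max_{I \in \pi}\;\bigl(A(I) - \mathrm{val}(I)\bigr)
  \;\approx\;
  \max_{I \in \pi}\;\bigl(\mathrm{sos}_2(I) - \mathrm{val}(I)\bigr),
\]
i.e.\ the worst-case error of the best efficient algorithm matches the
worst-case error of the degree-$2$ SOS relaxation. Raghavendra proved a
statement of this form for CSPs, assuming the Unique Games Conjecture.
```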
Yes: if you find an efficient algorithm that gives a better worst-case approximation than what degree-2 SOS gives you, then you have refuted the Unique Games Conjecture. And just the idea that one could hope to prove such a result suggests that maybe we could prove it based on NP-hardness instead, or prove it from the Unique Games Conjecture and then get the NP-hardness version later. So that's, I think, pretty impressive, at least to me; maybe some people don't find it impressive, and there may be other results in the literature you can interpret this way, but I don't know of anything with quite the same nature. The problems Prasad looked at are constraint satisfaction problems. The idea is that you have n variables over some alphabet Σ, and you have constraints that each depend on only k variables, and you're trying to satisfy as many constraints as possible. Max cut is an example: for every edge you have a constraint saying you want this edge to be cut, and you're trying to cut as many edges as possible, that is, satisfy as many constraints as possible. And what we call a (c, s)-gap, where c is the completeness and s is the soundness, is an instance of a CSP from this family such that the true value, the maximum over all assignments x of the fraction of constraints x satisfies, is at most s, but on the other hand there exists a pseudo-distribution such that the pseudo-expectation of the value is at least c. So this is a gap: it shows that sum of squares exhibits this gap already for degree-2 pseudo-distributions. Actually, for degree 2, especially if the alphabet is Σ, you have to describe exactly how you phrase things as a degree-2 pseudo-distribution, and there is an extensive footnote in the lecture notes about exactly how you phrase it; it doesn't matter so much for us at this point whether it's degree 2 or degree 50, these issues of formulation are not really important. So this says there is a gap for sum of squares, and what Prasad shows is that, given such a gap instance, the problem is actually hard: there is a reduction from the unique games problem to the CSP, where yes instances are mapped to instances of value at least c, and no instances to instances of value at most s. Let me give a very high-level view of the proof, and then I'll try to illustrate how it works out for max cut exactly. The idea, in some sense, is this: you take an integrality gap. We basically showed that the heart of the Feige-Schechtman integrality gap for max cut is an isoperimetric inequality on the sphere, and you can observe that every integrality gap really has to have this structure: every integrality gap implies some kind of isoperimetric inequality on the sphere, and it even carries the value. In the max cut case we said that the best cut in that graph is actually this small soundness value, something like 0.845. Generally, any integrality gap can be translated into some isoperimetric inequality on the sphere, saying that some kind of test cannot be passed with probability more than s over the sphere. And by the way, I shouldn't suggest this result came out of a vacuum; it builds heavily on earlier
work by Kotkindler Mosel and O'Donnell and also Mosel Oleskewitz and O'Donnell but so what these guys do is then this isoparametric inequality over the sphere which I said is kind of an optimality theorem and then make it into an inverse theorem for the cube so basically what you want to say is that you have two kinds of functions your functions that are special on the cube which are very non-spherical in some sense and this so these nice functions so the nice functions have value roughly c and random functions have value roughly s in this test but there is going to be some test and in some sense what you say is kind of randomish or generic functions in some sense they cannot distinguish between the sphere and the cube so because we have this inequality over the sphere random functions give them the cube they still think they're in the sphere they cannot do better than s this is something that's known as the invariance principle and this is by the way again something that's very commonly mathematics sometimes I think it's called universality theorems etc that you sometimes cannot under certain conditions you cannot tell apart if you even say Gaussian random variables or 0 1 random variables and there are other probably there are more phenomena like that than I can recall at the moment some people can chime in but this is basically what we say is that this invariance principle says that functions that are not functions that are not very that are kind of generic they basically think that the cube is the sphere and they get value s functions that are very nice they get this value c and the important thing is that you have this inverse theorem that if a function gets a value like s plus epsilon it must be because it's somewhat related to a nice function so this is kind of what I mentioned before as an inverse theorem you're saying if a function performs better in a test than what would a random function do then there must be a reason for that and the only reason 
That's one way to think about this inverse theorem. Basically, once you get this, you get what's known as a cube-based dictator test, and this gadget allows you to do a reduction from the unique games problem to the problem you started with, because the whole test inherits its form from the integrality gap, which was the kind of constraints you used there. So this is the ultra-cartoon version. Now, one of the questions at this point: is it only hard for degree-2 sum of squares? Right, these things start from a degree-2 integrality gap and end with unique games hardness. Now, since I already said I love this result, and I do love it, maybe I can also give a somewhat harsh way to describe it: garbage in, garbage out. You started with a poor integrality gap, and you only got unique games hardness. One of the main questions is: can you start with a better integrality gap and get outright NP-hardness? We don't really know. Maybe, if the unique games conjecture is true, then in some sense you put garbage in and get diamonds out: you start with a degree-2 integrality gap and you get that the problem is hard for any algorithm, including degree n^epsilon, or n^o(1), sum of squares. That would be amazing. One of the interesting things, which seems to be a barrier to this, though it's not clear exactly what the relation is, is that these types of instances are handled by low-degree sum of squares once the degree is more than two; it's not completely clear that you can do this without using these things as gadgets. It's an interesting phenomenon, and one that we don't yet completely understand. So I'm not going to spell out this entire argument, but I want to
tell you a little bit about the ideas here, because I think it's a pretty cool idea. You ask which kind of constraints; maybe it will become a little clearer as we go, but I can say roughly: it comes from the constraint satisfaction problem. Basically, if you have an integrality gap (let's stick with the case of Max-Cut), the Max-Cut integrality gap automatically defines some graph on the sphere; it might not be exactly the same graph we've seen before, but it is some graph on the sphere for which the best Max-Cut value is at most this value s, and typically this value s will also be easy to achieve. So the best cut in this graph is at most s, where s is the gap you started from. Generally, for a more complicated test, it would say that there is a distribution of constraints on this graph such that no subset of the sphere satisfies it with probability better than s. So, I don't know how many of you have seen hardness-of-approximation results and PCP-based results, etc.; we are not going to show all of this, but let's talk about what a reduction actually is. Typically, a reduction takes an instance i and maps it to P(i), and we want the property of completeness, which means: if i is a yes instance, say if there is an x such that the value val_i(x) is at least some c*, then it should follow that there is some f (let me call it f) such that the value val_{P(i)}(f) is at least c. So suppose I want to show that this problem is (c, s)-hard, i.e., that it is hard to distinguish between the case that there is an assignment of value at least c and the case that
every assignment has value at most s, and I start with the assumption that the original problem is (c*, s*)-hard. So I want to transform a yes instance into a yes instance. And for soundness: if there is no good assignment to the original instance, there should be no good assignment to the new one; but let me write it in the contrapositive. I'll write it as: if there exists f here such that the value of f is larger than s, then there should exist x here such that val_i(x) is larger than s*. Right, so this is the notion of a reduction; it means that if it was hard to distinguish between value c* and s* here, it is hard to distinguish between value c and s there. Agreed? And the definition of a reduction is that the map from i to P(i) has to be efficient; otherwise it doesn't say anything. But almost always, the way you prove something like this is to show that for every x there is an f, and typically this part is also efficient: this is what's known as an encoding, and it is typically, almost always, efficient in these reductions. You take an x and efficiently encode it as an f. And the other direction is often efficient as well: this is the decoding; you take an f that had some non-trivial success and transform it into an x that has non-trivial success. Basically, the way this particular reduction works is the following. The alphabet is going to be the numbers 1 to L for some L, so every symbol of x is a symbol in this alphabet, and the encoding is very simple: we simply replace every symbol with a block of bits. We map x_k into some block, call it f_k; that is, we map x in [L]^n into (f_1, ..., f_n) in {0,1}^(n · 2^L), where every block is of size 2^L. And the way we encode is that we take a symbol i and encode it as the i-th dictator function.
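To summarize the reduction template in symbols before instantiating it (the notation val for the value of an assignment is ours, matching the blackboard notation above):

```latex
% Reduction i -> P(i), original thresholds (c^*, s^*), new thresholds (c, s).
% Completeness: a yes instance stays a yes instance.
\exists x:\ \operatorname{val}_i(x) \ge c^{*}
  \;\Longrightarrow\;
\exists f:\ \operatorname{val}_{P(i)}(f) \ge c
% Soundness, written in the contrapositive as in the lecture.
\exists f:\ \operatorname{val}_{P(i)}(f) > s
  \;\Longrightarrow\;
\exists x:\ \operatorname{val}_i(x) > s^{*}
```

The completeness direction is witnessed by the (efficient) encoding x into f, and the soundness direction by the (efficient) decoding of a successful f back into an x.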
That is, we encode the symbol i as the function f_i from {0,1}^L to {0,1}, a string of length 2^L, where f_i(w) = w_i for every w. These functions are known as dictators. The idea, from social choice theory, is that you typically think of every Boolean function as taking the votes of L members of the population (I guess it's very relevant for tonight) and coming up with a decision. A dictator function ignores everyone in the population except the i-th voter, while, say, the majority function weighs everyone's vote equally. So this is the encoding: we encode the symbol i as the i-th dictator function. Typically, when I describe such a reduction, you might think of the decoding first, but it's often actually useful to start with the encoding. Okay, so this is the encoding; now we need to satisfy these properties, so we're going to need to put in some tests. The idea is the following. Let's suppose we are now focused on Max-Cut; our goal is to show how this works for Max-Cut. Then we want to put one large graph on top of this, which you can think of as a collection of many small graphs, and we want to ensure that if you have a cut that is significantly better than s, then two things happen: first, every one of those blocks, or at least most of them, is somewhat related to a codeword in some technical sense; and second, these codewords correspond to something that satisfies the original constraints, which I didn't talk about. It turns out that the real beauty of starting from unique games is that you can kind of forget about trying to satisfy the original constraints (it happens automatically), and what you really need is just a test: you want some graph G on the vertex set {0,1}^L such that if F is a dictator, then the cut value of F is basically c.
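The encoding step can be sketched in a few lines of code; the function names (`dictator_table`, `encode`) are ours, and we index symbols from 0 for convenience.

```python
# The long-code style encoding from the lecture: a symbol i in {0,...,L-1}
# becomes the i-th dictator function f_i(w) = w_i on {0,1}^L, written out
# as its truth table of length 2^L.
from itertools import product

def dictator_table(i, L):
    """Truth table of the i-th dictator f_i(w) = w_i, as a list of 2^L bits."""
    return [w[i] for w in product([0, 1], repeat=L)]

def encode(x, L):
    """Encode x in [L]^n as n blocks of 2^L bits each (n * 2^L bits total)."""
    return [dictator_table(x_k, L) for x_k in x]

L = 3
code = encode([0, 2], L)
assert len(code) == 2 and all(len(block) == 2 ** L for block in code)
# The i-th dictator depends only on coordinate i of its input w:
assert dictator_table(2, L) == [w[2] for w in product([0, 1], repeat=L)]
```

Note the blowup: a symbol from an alphabet of size L costs 2^L bits, which is why these encodings are only used as constant-size gadgets.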
And if the cut value of F is at least s plus epsilon, then F is in some sense epsilon'-related to a dictator, for some epsilon' that has to do with epsilon; the precise sense is a little technical, and the notion is known as maximum influence. It turns out that if you have this, then the reduction works, and at least you can understand how the reduction will work in the sense of how you're going to do the decoding: the idea is that now you know there is a dictator that is somewhat related to F, so you can decode back to a dictator. You still have to prove a lot of stuff, but at least you can get off the ground; you know how the decoding will work, and that's the heart of what's going on there. So how are we going to do that? The idea is the following. I told you that every integrality gap gives you an isoperimetric inequality, but let me focus on the Feige-Schechtman integrality gap and show you how we can make it into a gadget. The Feige-Schechtman integrality gap, recall, was this graph on the sphere where we connect two vectors, you and me, if the dot product is roughly rho. So how do we make this into a gadget? The idea is rather simple. Here is the gadget: I'm going to describe the graph G by describing the distribution of its edges, that is, how I choose a random edge (x, y) in G. The vertex set is {0,1}^L. You choose x uniformly at random in {0,1}^L, and then, independently for each coordinate, y_i = 1 - x_i with probability equal to the cut value, which was c_GW, and I think that was (1 - rho)/2, or one half minus rho/2; and y_i = x_i with probability 1 - c_GW. So we get this property, and this
defines the graph. One way to define a graph is to say what its set of edges is, and I just described the set of edges as a probability distribution. You can say, okay, maybe these are weighted edges; it turns out it doesn't really matter, you can duplicate things and make it a graph with unweighted edges, but at this hour of the night let's not; I've been cheating you so much that adding weights to the edges is the least of it. So this is the graph: basically you take the sphere and make it into a cube, where the edges correspond, in some sense, to the same correlations as you had on the sphere. And what is very easy to show is that if F(x) is simply x_i, then the value of the cut, the probability over a random edge that F(x) differs from F(y), is exactly what we set it up to be: on every coordinate i everything is completely independent, so this probability is exactly c_GW. This is our completeness: the nice functions, the dictators, give us the value we wanted. So now the question is why we have soundness. Let's think of what a very non-nice function is. If you're a dictator, then the thing that's least nice for you is democracy, right? So here is a non-nice function F: F(x) = 1 if the sum of the x_i is larger than L/2, and 0 otherwise; majority, the function least correlated with any dictator. Okay, so now let's understand what cut value this gets. If you look at the probability that F(x) differs from F(y), F really depends only on the sum of the x_i versus the sum of the y_i. I know it's very late, but if you sum up independent random 0/1 variables, what do you get? A Gaussian, approximately.
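A quick Monte Carlo sanity check of the completeness claim, under the edge distribution just described (each coordinate of y disagrees with x independently with probability (1 - rho)/2); all names and parameter choices here are ours, for illustration only.

```python
# Gadget graph on {0,1}^L: pick x uniformly, then flip each coordinate
# independently with probability (1 - rho) / 2 to get y. A dictator
# F(x) = x_i then cuts a random edge with probability exactly (1 - rho) / 2.
import random

def random_edge(L, rho, rng):
    x = [rng.randint(0, 1) for _ in range(L)]
    y = [x_i ^ (rng.random() < (1 - rho) / 2) for x_i in x]
    return x, y

def cut_value(F, L, rho, trials=20000, seed=0):
    """Estimate Pr[F(x) != F(y)] over a random edge of the gadget."""
    rng = random.Random(seed)
    cuts = sum(F(x) != F(y)
               for x, y in (random_edge(L, rho, rng) for _ in range(trials)))
    return cuts / trials

rho = -0.69                 # roughly the worst-case correlation in the GW analysis
dictator = lambda w: w[0]   # the first dictator function
est = cut_value(dictator, L=5, rho=rho)
assert abs(est - (1 - rho) / 2) < 0.02   # completeness, about 0.845
```

Since the dictator reads a single coordinate, its cut probability is just the per-coordinate flip probability, with no central limit theorem needed.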
So basically, you ask yourself: if I take two correlated sums, they may as well be rho-correlated Gaussians, and if they are rho-correlated Gaussians, what's the best way to cut rho-correlated Gaussians? And to answer yourself, you say: I've actually seen this before; the best value I am going to get is arccos(rho)/pi. That's the best value if I want to cut rho-correlated Gaussians, because this is basically just a special case: if you looked at the sphere graph, you could try to cut by concentrating on, say, the first coordinate, and that would behave exactly like rho-correlated Gaussians. And this conveniently happens to be our soundness level. So at least this function, the function farthest from a dictator, gets only value s. Basically, what this says is that a dictator gives you the completeness value, and the central limit theorem tells you that democracy gives you the soundness value. Okay, so at least some democracies are sound. But what we wanted to say is that every function that gives you value better than s is somewhat related to a dictator, and this is where the invariance principle comes into play. I will not define the invariance principle exactly at this point, because it's late, but basically it's a generalization of the central limit theorem. The central limit theorem tells you that if you have independent variables and you take some linear combination of them, and you don't put all the weight on just one of them but spread the weight evenly, then you're going to get a Gaussian. The invariance principle tells you that if you apply any polynomial, say any low-degree polynomial, to a set of variables, and this polynomial is not very correlated with any particular dictatorship, then it cannot tell apart whether you plug in independent random bits or Gaussians.
And when you plug in Gaussians, the value that this polynomial cuts is basically the value you would get on the sphere graph, and the integrality gap guarantees that you cannot do better than the soundness there. So basically, the invariance principle tells you that to be able to distinguish between the sphere and the cube, you have to be somewhat correlated with a dictator, and that's how this gadget works. I think this is as much as I'm going to say about this proof, for tonight. Questions? So the question is: we were trying to distinguish a sphere and a cube, right? Right, so you want to say: the sphere graph is an integrality gap, so there the max cut is at most the soundness level, which is arccos(rho)/pi or thereabouts, but there is a pseudo-distribution that gives you the completeness level. In the cube graph, there is an actual cut which achieves the value of the pseudo-cuts on the sphere. So in the sphere graph the maximum cut is at most s; the cube graph you can think of as a version where you have planted cuts that achieve the pseudo-value of the sphere: if you cut along a coordinate, using a dictatorship function, you actually get the value c. And what you want to say is basically that a dictatorship function gets very different values here and there: in the cube it gives you value c, while on the sphere no function gives you better than value s. If you have a function that's very far from a dictatorship, then it somewhat doesn't really know; it's a little bit going back to what Pablo said, in some sense it doesn't favor the coordinate basis. If a function is not close to a dictator, it doesn't really favor the coordinate basis; it
doesn't have anything special about the coordinates, so it's not really going to know whether you fed it the cube or the sphere, and therefore it's not going to give you a max cut better than s. And that's what's going to be used in our decoding. So basically, the way the reduction works: we encode a symbol as a dictatorship, but then we need to decode. We put a graph here, and if somehow we find a partition that gives better value than the soundness, then most blocks, or at least a good fraction of them (the problem we start with has such good parameters that a constant fraction is enough), are somewhat correlated with a dictator; then we can decode them, get a dictator here, and get an assignment that satisfies the original problem. As for degree 2: in some sense, the fact that it only works for degree 2 is a feature, not a bug. It only needs degree 2: you get unique games hardness, and you only need the original integrality gap to be a degree-2 integrality gap, so in some sense this makes the theorem only stronger. And whether the right way to get NP-hardness is to start with a stronger gap, or to just prove the unique games conjecture, so that unique games hardness is NP-hardness, we still don't know. It is possible, in some sense, that this theorem will translate into a way to automatically upgrade any degree-2 gap into a degree-million gap; if the unique games conjecture is true, then we would predict that, but we don't know how to do it. So there is no proof that lets you pass directly between different degrees of SOS; and not just no proof: there are plenty of problems where we do know of degree-2 gaps but don't know of degree-100 gaps, including Max-Cut. We don't know a degree-100 gap for Max-Cut, and we do know that this particular gap, as it is, cannot be a degree-100 gap. Whether it can be massaged, tensored, flipped around into a degree-100 gap,
that's a great question, and this is a wonderful open problem. Okay, so if there are no more questions, I'll let you go early. I hope you watch the debate; I am of course completely neutral, but let me just say, may the best woman win.