Okay, so we can start. First, some administrative, non-technical remarks. I think it would be really great if people informally worked together on the exercises and on going over the lecture notes. I will open a thread on Piazza after every lecture, but also feel free to just talk to each other; personally I think it would be particularly good if you found groups that go across lab or advisor boundaries or whatever — I don't want to formalize any of this, just feel free. I would also appreciate it if, when you do solve an exercise, you post your solution on Piazza: it would be useful to me for future iterations of the course, and also useful to the other students. Just give it a subject line like "Solution for exercise N", so that people who haven't yet done the exercise know they shouldn't look at it. Anyway, I'm trying to keep these things informal, but it would be great if you did them.

Okay, so the topic for today is the Bayesian view. Part of this will be very high level and part will be very technical, so hopefully there is a bit of both. We'll see why I think the planted clique problem is a good problem to demonstrate this viewpoint, but it is by no means the only one.

So what are we going to see this lecture? Basically, we'll see that the following holds. Let S be a random subset of [n] of size ω — say ω = n^{0.49} — and let G be drawn from G(n, 1/2) ∪ K_S; by this I mean that I take a random graph where every edge appears with probability 1/2, and then add a clique on S. What we'll see is that with high probability we can find, for every constant d, a degree-d pseudo-distribution μ over {0,1}^n that satisfies Ẽ_μ[x_i x_j] = 0 for every pair i, j that is not an edge of G (since these are 0/1 variables, we might as well write it that way), and Ẽ_μ[∑_i x_i] ≥ ω.

So far this is not very impressive — the graph does have a clique of size ω. But notice what these conditions would mean for an actual distribution over {0,1}^n. The first condition says that μ must be supported on cliques: x_i x_j is always non-negative on {0,1}^n, so if even one non-edge pair had x_i = x_j = 1 with positive probability in the support of μ, the expectation would be positive. The second condition says the average set size is at least ω. And since ω is the largest clique — it's an exercise, I think it is literally an exercise, to show that with high probability a random graph has no clique of size more than about 2 log n, so the only clique this large is the one we planted — if μ were an actual distribution, it would be entirely concentrated on the indicator of the planted clique S. In particular we should get E_μ[x_i] = 1 if i ∈ S and 0 if i ∉ S.

But this is not what we're going to get. We'll get Ẽ_μ[x_i] ≈ ω/n for every i, Ẽ_μ[x_i x_j] ≈ ω²/n² (up to some constant) for every edge {i, j}, and so on. This is basically weird: in some sense μ pretends to have high entropy.
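To make these strange moments concrete, here is a toy illustration (my own sketch, not the actual SoS construction): it writes down the degree-1 and degree-2 pseudo-moments described above on a sampled random graph and checks the two conditions. Note it says nothing about positive semidefiniteness, which a real pseudo-distribution must also satisfy.

```python
import itertools, random

# Toy sketch: the "pretend" moments Etilde[x_i] = omega/n and
# Etilde[x_i x_j] = (omega/n)^2 on edges, 0 on non-edges.
random.seed(1)
n, omega = 30, 9
edges = {frozenset(e) for e in itertools.combinations(range(n), 2)
         if random.random() < 0.5}

Etilde1 = {i: omega / n for i in range(n)}                  # Etilde[x_i]
Etilde2 = {frozenset(e): (omega / n) ** 2 if frozenset(e) in edges else 0.0
           for e in itertools.combinations(range(n), 2)}    # Etilde[x_i x_j]

# Condition (i): zero on every non-edge -- holds by construction.
assert all(v == 0.0 for e, v in Etilde2.items() if e not in edges)
# Condition (ii): Etilde[sum_i x_i] = n * (omega/n) = omega.
assert abs(sum(Etilde1.values()) - omega) < 1e-9
```

The point of the toy is only to display the tension: these numbers sum to ω, yet no distribution supported on cliques of a random graph could have them as its moments.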
In particular, μ pretends to have entropy much larger than zero, whereas in the true case every distribution with these properties must have zero entropy. The pseudo-distribution we produce pretends to have significant entropy — it pretends to be almost pairwise, three-wise, four-wise independent, and so on. It's kind of weird: in some sense it's a distribution that doesn't make sense. I'm giving you these moments, and you know there is no way they could correspond to an actual distribution.

What's possibly even weirder is that the same μ works even without adding the clique. Then there is no distribution at all satisfying these conditions. I'm not sure what's the right way to think about the entropy here — maybe we should say the true entropy is not zero but minus infinity — and yet we still have the same μ. It still gives us a concrete value for, say, Ẽ[x_17], which, if you think about it, is the probability that vertex 17 is in the non-existent ω-clique. The graph doesn't have such a clique, but I'll still give you the probability that vertex 17 is contained inside it. This is why I sometimes like to compare these probabilities to the probability that a unicorn has blue eyes: there are no unicorns, but we can still talk about their properties.

Yes, where was the question? [Question.] Right — it would be 1, because if μ were actually supported only on the indicator vector of S, then Ẽ[x_i] would simply be 1 for i ∈ S and 0 otherwise. [Question.] Right, so x is a 0/1 vector: x ∈ {0,1}^n, and in the planted case the x that "should" underlie μ has each coordinate 0 or 1. But if, for example, you took the uniform distribution over all ω-cliques — which of course doesn't make sense here, because there is only one — then you would get ω/n for every vertex. Note that both sum to the same thing: in both cases ∑_i E[x_i] = ω. So in some sense this makes sense: the probability that i is in the clique is "supposed" to be either 0 or 1, but what we get are probabilities that are neither, and we want to interpret what these probabilities mean.

Here is one interpretation. When we write Ẽ[x_A] — by x_A I mean the product ∏_{i∈A} x_i — think of it as the price you would pay for the "A ⊆ S" stock. What do I mean by that? Suppose we plant the clique and give out the graph, and a month later we will also give out the clique; in the meantime, people can place bets on events like "the subset A is contained in the clique". So suppose there is a market where you can buy a stock that pays one dollar if A ⊆ S. How much would you pay for this thing? For example, if A is a single vertex, the base price — before you even look at the graph — is about ω/n.
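A quick sketch of this base rate (the function name and parameters are my own illustration): with no graph information, the fair price of the "A ⊆ S" stock is just P(A ⊆ S) for a uniformly random ω-subset S of [n], which for a single vertex is exactly ω/n.

```python
import math, random

# Fair price of the hypothetical "A subset of S" stock with no graph
# information: P(A ⊆ S) for a uniformly random omega-subset S of [n].
random.seed(0)
n, omega = 1000, 31

def base_price(a):
    # exact count: C(n - a, omega - a) / C(n, omega)
    return math.comb(n - a, omega - a) / math.comb(n, omega)

# For a single vertex this is exactly omega / n; Monte Carlo agrees.
assert abs(base_price(1) - omega / n) < 1e-12
hits = sum(0 in random.sample(range(n), omega) for _ in range(20_000))
print(base_price(1), hits / 20_000)
```

For a pair of vertices, `base_price(2)` is about (ω/n)², which matches the pseudo-moments on edges up to a constant.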
If you didn't look at the graph at all, ω/n is the fair price: if the price were consistently below that you could make money, and if it were consistently above that you could make money by betting on the negation. But once you look at the graph, maybe you update your probabilities.

So who is "you"? That's the question. If you have unbounded computational time, then the price you'd pay for every such stock is either 0 or 1, because you would find the maximum clique and see whether A is in S or not. But if you don't have unbounded computational time — and we believe planted clique is a hard problem — then your prices are probably not 0/1; they're something else. And what is this something else? One way to come up with these prices is using sum of squares, and for this problem, at least at the moment, this is the best algorithm we know for coming up with them; we don't know of an algorithm that does better. This is why it's useful to think of planted clique in this Bayesian setting.

And in the clique case the prices are not trivial. Let's do an exercise. Suppose deg(17) = n/2 + 5√n — say about five standard deviations above the mean; the exact number doesn't matter, one standard deviation would do — and ω is fairly close to √n, say ε√n. Given just this information, what should Ẽ[x_17] be? First of all, is it going to be bigger or smaller than ω/n? Bigger, right: all else being equal, if you know the vertex has higher degree than average, it should be at least a little more likely to be in the clique.

Now suppose we want to compute by how much more likely. One way to think about it is the following — and this is why they call these people Bayesians: you always use Bayes' law, starting from a prior. You want the probability that 17 is in the clique conditioned on deg(17) = n/2 + 5√n, which by Bayes' rule is the probability that 17 is in the clique and has this degree, divided by the probability of the degree. What happens is: if 17 is in the clique, its degree is a binomial — let's just write it as a normal, because I always confuse myself — with mean n/2 + ω and standard deviation about √n/2; if 17 is not in the clique, the mean is n/2, with the same standard deviation. So roughly speaking, the difference between the two cases is a shift of about ω/√n standard deviations. When you do the calculation, you get roughly Ẽ[x_17] ≈ (ω/n)(1 + c·ω/√n) for some constant c, per standard deviation of excess degree. More generally, there is some correlation between the degree and x_i that you can compute, and if you don't compute it, you will lose money: if you bet against me, even when we are both computationally bounded, and you somehow neglect to include this information, I will consistently look at the degrees, bet against you, and win. And the closer ω is to √n, the bigger my advantage could be.
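The Bayes-rule calculation above can be sketched numerically (a minimal sketch with normal approximations; the function name and the illustrative parameters are mine):

```python
import math

# Posterior probability that a vertex is in the planted clique, given
# only its degree, via Bayes' rule with Gaussian approximations to the
# two binomial degree distributions.
def posterior_in_clique(n, omega, deg):
    prior = omega / n
    sigma = math.sqrt(n) / 2                  # rough std dev of a Binomial(n, 1/2) degree
    mu_in, mu_out = n / 2 + omega, n / 2      # mean degree in / out of the clique

    def gauss(x, mu):                         # unnormalized normal density
        return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

    num = prior * gauss(deg, mu_in)
    return num / (num + (1 - prior) * gauss(deg, mu_out))

n, omega = 10_000, 50                          # omega = sqrt(n) / 2
base = omega / n
high = posterior_in_clique(n, omega, n / 2 + 5 * math.sqrt(n) / 2)  # 5 sigma up
print(base, high)   # a high degree pushes the belief well above the prior
```

With these parameters the posterior for a five-sigma degree is dramatically larger than the ω/n prior, which is exactly the edge a degree-aware bettor has over a degree-blind one.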
It's not a priori clear that the sum of squares algorithm can take this into account: you run the sum of squares algorithm to get a pseudo-distribution, and it's not a priori clear that it incorporates this kind of soft information — that you should increase a vertex's probability based on its degree. But the kind of interesting thing is that it does, even though some other natural algorithms do not. So the main result for planted clique is in some sense a negative result: it says that to actually solve the problem, sum of squares doesn't do very much better than other algorithms. But in some sense, if you add this betting market on the clique, sum of squares will win against those other algorithms by some amount — just not a very large one.

And here is an open problem: find an efficient betting strategy — say an n^{O(1)}-time betting strategy — for the planted clique game that beats degree-d SoS. That is, show that among polynomial-time algorithms, sum of squares is not optimal for this game: there is a strategy that beats it. I don't know of any such strategy; I think it would be very interesting, and it seems like a very natural setting to ask about. So suppose I give you a market for betting on these kinds of events, and I price things according to degree-d sum of squares — can you win against me with a polynomial-time strategy? We don't know. What we do know, in some sense, is that for LS+ there are values of the pseudo-distribution that don't take these soft constraints into account — I'll say below what LS+ is — so I can beat LS+ using sum of squares. [Question: what about extension complexity?] No, we don't know; that's another question. If we can't do it directly and you want a model in which you can prove an impossibility result, then the model is extension complexity, which we'll talk about later, and it would be very interesting there. There are some issues with extension complexity in the planted clique setting — I think there are subtleties in exactly how you define things, because the relaxation is not instance-independent — but I'm assuming there are good ways to define it.

Okay. So in some sense sum of squares gives us a natural way to come up with probabilities, and let me even try to show you this figure — I don't know if you've seen it on our website; we're so proud of it that it takes its time to load. [Aside while the figure loads, including a question about unique games — we'll talk about unique games later in the course.] So let me just say this is a fake figure, in the sense that we didn't really run the sum of squares algorithm to compute it, but I think it's visually similar to what you would actually get. The picture is that if you run sum of squares on a graph with an actual planted clique, things go back and forth: initially all the vertices are uniform — every vertex gets ω/n — and then you start looking at, say, degree statistics, then triangle statistics, and maybe four-cycles or other subgraph
statistics, and so on, until you basically isolate the clique. So the bets you make become more and more informed the more time you have.

Anyway, I kind of like this view. The idea, whenever we design algorithms, is that we think of sum of squares as giving us these betting probabilities — the beliefs of a computationally bounded agent. And one thing about a computationally bounded agent is that if you have an algorithm to come up with probabilities, you can cheat this algorithm: you can give it, as in this case, a graph where the clique doesn't exist, and because it doesn't know how to distinguish between the two cases, it will still happily give you prices and make bets on things that are just nonsensical. Just as, if we had to bet right now on various mathematical theorems, there are probably some false theorems to which we would assign positive probability of being true, because we are computationally bounded.

So this Bayesian thing might seem like some cute philosophy that doesn't have anything to do with actual algorithms, but we'll see today that it's useful for lower bounds, and I also want to say why it's useful for algorithms. Now, I don't know a lot about Bayesian data analysis — maybe Pablo can elaborate more — but the Bayesian approach to probability says, on a philosophical level, that probabilities really express beliefs, and so we can talk about probabilities of things that are completely determined. One example I have in the notes: I can talk about the probability that my grandfather had blue eyes. Of course he either had blue eyes or he didn't, but I don't know which, and at the moment the best I can do is look at the descendants and try to estimate the chance that he had blue eyes. So even though this event already happened in the past and is completely determined, in a Bayesian approach I can talk about probabilities for it. So Bayesians think of probabilities philosophically very differently, but then, mathematically, they treat these beliefs as ordinary probabilities: every theorem we have proven about actual probabilities — defined on sample spaces and so on — they allow themselves to use. Is that fair?

We will basically try to do the same thing. I kind of call it the Bayes/Marley approach; one way to summarize it is: think like a Bayesian, compute like a frequentist. When we try to solve an algorithmic problem using sum of squares, we're typically not even going to think about the SoS SDP and all that. We'll basically say: we are given this μ, and this μ encodes the best beliefs about some unknown x out there that is a good solution for the problem, whatever the problem is; and our goal is to find some x* that is also a decent solution. When we do that, we can assume two things. First, every piece of knowledge we have about x, we can assume μ satisfies: for every simple function f, if we can deduce that f must vanish on x, then Ẽ_μ[f] = 0; and if we can deduce that f(x) lies in some interval, then Ẽ_μ[f] lies in that interval. Second — and we'll see this later in the course — we can start pretending that μ is an actual distribution: we do all sorts of computations that would make sense if it were an actual distribution, cross our fingers, and hope it will all be fine even though we used pseudo-distributions. Typically it will be. This is how we use the viewpoint for algorithm design, and we'll talk about it more later in the course.

For algorithm design I think it's particularly attractive in the sense that it inverts the usual order of things. If you've seen approximation algorithms based on convex relaxations, typically you first design the relaxation and then try to come up with a rounding algorithm for the relaxation. In this approach the relaxation is almost immaterial: you first come up with the rounding algorithm, then you look at your analysis of this rounding algorithm and try to fit the relaxation to the analysis — you show that the relaxation is powerful enough to capture the analysis, hoping for example that large enough degree sum of squares will be powerful enough. So you didn't need to think too hard about the relaxation; the order in which you do things is inverted, and to me at least it feels somewhat more systematic and requires less creativity. Sometimes you look at relaxation-based algorithms and you ask: how did they ever come up with this relaxation?

So this is how we use this kind of approach for upper bounds, but we'll also show how to use it for lower bounds. Recall that for sum of squares, the canonical approach to lower bounds is via integrality gaps. What is an
integrality gap? Just to remind ourselves: it's an instance I of, say, a maximization problem such that (a) the true objective is small, and (b) there exists a pseudo-distribution, satisfying whatever constraints need to be satisfied, whose expected objective is large. So that's an integrality gap. And by the way, this is not restricted to sum of squares: this is how an integrality gap always looks, for every convex relaxation. We just typically don't use the name "pseudo-distribution"; we use "fractional solution", or "primal certificate" or "dual certificate", depending on the viewpoint. But basically this is how we always prove a gap between the integral, i.e., actual, solutions and the fake solutions the relaxation comes up with.

The canonical approach people use for this is, first of all, to guess the pseudo-distribution, and then to prove that it is valid. And again, if you look at any paper showing integrality gaps — for linear programming, semidefinite programming, other hierarchies — typically that's what you'll see. They first prove somehow that the objective is small, and then they say: despite the objective being small, here is some particular set of numbers that pretends the objective is large, and now we'll prove it's valid. Typically you never see how they came up with this pseudo-distribution; they don't need to justify it, because to show an integrality gap they just need to show that it exists. So they just pull it out of a hat — they somehow guess it. In fact, often the guess is so natural that you almost don't need them to explain it: for linear programming relaxations of, say, Max Cut, it's often the all-halves vector or something like that. But there is something that bothers me about this: the guessing step is a creative step, and I don't like creative steps.

The nice thing about the Bayesian approach is that in some sense you are almost forced: if, instead of guessing, we say that this pseudo-distribution must encode the best bets we can make given the information we have, then — as we'll see in the planted clique case, though it's more general — there is basically almost no choice, or even no choice at all, in how to choose the pseudo-distribution. So in some sense we can get rid of the guess. We still have to prove validity, and right now that step is still somewhat creative — it's definitely hard; in fact I will not be able to give you the full proof of positive semidefiniteness for this particular construction, but I'll give you the outline and point you to the paper for the actual meat. (Finding the "right" proof of this is another great open problem.) But I feel this step should also eventually not be creative, because at the end of the day we do have efficient algorithms to test these validity conditions — the hard condition is that the moments correspond to a positive semidefinite matrix — and I feel we should eventually be able to automate this.

So the dream, in some sense, is that both of these steps would be automatic: that you would have, at least in principle, some finite computation you can run on a machine which first comes up with the pseudo-distribution — because there isn't really any creative choice in coming up with it — and then tells you whether it's valid or not, and what the best parameters it can achieve are. Obviously we can do this for every particular input length; what I would like is one finite computation that gives you the answer for all input lengths. I think this should be possible, but we haven't gotten there yet. By the way, there is a setting in which a dream very similar to this one has been realized, and it's something Pablo will talk about — I don't know if next week or the week after — namely utilizing symmetry: how to turn a problem that is potentially infinite into a finite one, and use that both to actually solve things in practice faster and to answer questions in extremal combinatorics. So there are cases where this dream has been realized; here it still hasn't, because this problem is not symmetric pointwise but only symmetric in distribution — we'll see what this means. But I feel it's only a matter of time, and I hope maybe one of you will be the person to do it.

So that's it for the Bayesian part. I think we can now take a break and then move to actual planted cliques. [Break — a full ten minutes, then we start.] Okay, so the planted clique problem. We define G(n, 1/2) as the random graph where every edge appears with probability 1/2, and G(n, 1/2, ω) as the planted distribution. [Question.] Yes — you select a random set S of size ω and then add the edges inside S that are not already there; so in expectation you'll add something like
half of the (ω choose 2) pairs as new edges. So this is the planted clique problem. And for this kind of average-case problem there are typically three variants: the decision variant — distinguish between these two distributions; the search variant — in the planted case, find the clique; and the certification (or refutation) variant — in the random case, certify that there is no such clique. Generally speaking, decision is easier than either of the other two — if you can solve search, you can distinguish, and if you can certify, you can distinguish — while search and certification are incomparable. For many average-case problems, including planted clique, we don't know better algorithms for any one of the variants than for the others, so we don't really distinguish between them too much. Clearly the problem becomes easier as ω becomes bigger, and we kind of believe that the threshold at which it becomes easy is the same for all three variants.

My motivation for planted clique is that it's going to teach us a lot about sum of squares — doing this lower bound will force us to develop some new techniques — but people have been very interested in the planted clique problem for their own reasons. It arises in all sorts of domains, and there are references and applications in the notes — though most of them, by the way, I found by searching for similar-sounding terms, so if you actually follow a reference you may well find the paper is completely unrelated; please don't follow the references. But there are definitely papers in all sorts of domains with similar-sounding problems.

Okay, so what is the theorem that we want to prove?
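Before the theorem, the planted distribution G(n, 1/2, ω) just defined can be sketched in code (function name and parameters are my own illustration):

```python
import random

# Sample G(n, 1/2); then plant a clique on a uniformly random
# omega-subset S by adding the edges inside S that are not already there.
def sample_planted(n, omega, rng, plant=True):
    edges = {(i, j) for i in range(n) for j in range(i + 1, n)
             if rng.random() < 0.5}
    S = set(rng.sample(range(n), omega)) if plant else set()
    for a in S:
        for b in S:
            if a < b:
                edges.add((a, b))   # the missing clique edges, ~half of C(omega,2)
    return edges, S

rng = random.Random(0)
edges, S = sample_planted(200, 20, rng)
# every pair inside S is now an edge
assert all((min(a, b), max(a, b)) in edges for a in S for b in S if a != b)
print(len(edges))   # close to C(200, 2) / 2 = 9950, plus the planted edges
```

Passing `plant=False` gives the null distribution G(n, 1/2), so the same function covers both sides of the decision problem.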
First, there is an exercise developed in the notes on how to solve the problem when ω is some constant times √n. Let me give you a quick suggestion for why √n should be about the right threshold — an easy argument, at least for the refutation/distinguishing problem, and it has to do exactly with your question. Look at the total number of edges. In G(n, 1/2) it's approximately normal with mean (1/2)(n choose 2) and variance (1/4)(n choose 2), so standard deviation Θ(n). In the planted case the mean is (1/2)(n choose 2) + (1/2)(ω choose 2) — shifted by Θ(ω²). So when ω is roughly a constant times √n, the shift becomes a constant number of standard deviations, and you should be able to distinguish, just by counting the total number of edges, with some constant advantage over 1/2. If ω is bigger — say 100√n — the success probability gets better and better, and there are ways to amplify; moreover, by looking at the spectrum of the adjacency matrix of G, you can get better probabilities and actually also solve the recovery problem. So basically we knew how to do it when ω ≈ √n, and the main question is: could we do better?

One piece of evidence that this is not the case came earlier from Feige and Krauthgamer — I don't remember exactly when, sometime in the early 2000s. They showed that the LS+ hierarchy, in time n^{O(d)}, basically works exactly when ω is at least something like √(n/2^d).
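The edge-counting distinguisher sketched above can be simulated directly — a hedged sketch where I simulate only the edge counts rather than whole graphs, with illustrative parameters:

```python
import math, random

# Edge-counting distinguisher: threshold the total edge count halfway
# between the null mean and the planted mean.
rng = random.Random(1)
n, omega = 300, 40
pairs, clique_pairs = math.comb(n, 2), math.comb(omega, 2)
mean_null = pairs / 2
shift = clique_pairs / 2             # expected extra edges from planting
threshold = mean_null + shift / 2    # halfway between the two means

def edge_count(plant):
    if plant:                        # clique pairs are all edges; rest are coin flips
        return clique_pairs + sum(rng.random() < 0.5
                                  for _ in range(pairs - clique_pairs))
    return sum(rng.random() < 0.5 for _ in range(pairs))

trials = 30
correct = sum(edge_count(False) <= threshold for _ in range(trials))
correct += sum(edge_count(True) > threshold for _ in range(trials))
print(correct / (2 * trials))        # accuracy well above the 1/2 baseline
```

Here ω = 40 is a bit above √300 ≈ 17, so the shift of (ω choose 2)/2 = 390 edges is several standard deviations (≈ 106) and the test succeeds most of the time; shrinking ω toward √n shrinks the advantage toward a constant, as the argument predicts.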
Notice that when d is about log n this guarantee becomes meaningless, and indeed in time n^{O(log n)} we can solve the problem: with very high probability the random graph has no clique of size larger than about 3 log n, so if we search over all sets of size 3 log n and find a clique, we know for sure we are in the planted case — and in fact it's not so hard to solve the search problem in this case either. So in time n^{O(log n)} we can solve the problem, but for every constant d the Feige–Krauthgamer bound only moves us a constant factor below √n.

The result I want to talk about today is from a joint work that was just presented last week. It says that for every constant d and every ε > 0, if ω ≤ n^{1/2−ε}, then with high probability over G ~ G(n, 1/2) there exists a degree-d pseudo-distribution μ such that Ẽ_μ[x_i x_j] = 0 for every non-edge {i, j} and Ẽ_μ[∑_i x_i] ≥ ω. So this basically says that sum of squares does not do significantly better than n^{1/2−o(1)}. In the lecture notes there is the exact dependence between ε and d. So this is the theorem, and I think what's kind of interesting here is that the construction itself teaches us something, as does the analysis — we'll see some of it, but not all. Is the statement of the theorem clear? Okay, so now we just have to prove it.

So let's look at the first, classical approach to proving this theorem — the way these theorems were usually proven in the past. Actually, first let me tell you what LS+ is, since I didn't. LS+ is basically like SoS, except that the non-negativity condition Ẽ[p²] ≥ 0 is required only for linear p. So for degree d you still give the moments of all degree-d polynomials, but you only require this non-negativity condition for linear things. Am I confusing things? Maybe you also need some conditions to make things consistent: if you have a constraint saying some polynomial p is supposed to be zero, then you require Ẽ[pq] = 0 for every q — I think that's what lets you connect the low-degree moments with each other. But I think the linear-only non-negativity is the most important part.

The natural thing, which actually worked for other sum of squares lower bounds in the past, is to say: maybe the very simple pseudo-distribution already works — maybe it's already PSD. That would be a very nice thing. Let me call these the FK moments — the Feige–Krauthgamer moments, or the "hard-constraint" moments. They are the following. First, Ẽ[x_S] = 0 if S is not a clique in G. Here think of S as a small set, say four vertices: if a set of four vertices is not a clique, then the probability that it is inside the planted clique is zero, right? Everyone agrees — the clique does not contain a non-clique inside it. So this is a hard constraint; every idiot knows this should be the case. And then, for the other sets: if S is a clique, we basically say that with equal probability it will
fall in that inside thing except you kind of need to you know because these guys are zero you kind of need to increase these guys by something to make everything sum up to omega and the right thing some constant doesn't really matter what it is think of s as a constant so it doesn't really matter what it is something like this you can ignore that constant it's not important so basically it's like omega if s is not not a clique you give it zero if s is a clique you give it omega over in the power the size of s right so this basically says i'm satisfying the hard constraints i'm not an idiot i know that if something is not a clique then i'll give you probability zero but i don't know anything about about nothing if you tell me a vertex what's the probability of it i tell you well omega over n you give me two vertices connected by an edge i give you well omega square over n square so that's kind of and these kind of moments these are hard to say moments at least kind of a very similar to these moments that we've seen in in the gregory of proof for three x so if you remember that so there basically we looked at a set of we looked at the set of variables and we say well if we can obviously show that it's you know if it just did a small combination of a small number of our equations then it's clearly you know the moments need to respect that and other we basically say it's behaves like uniform so the moments were like as if it everything was uniform in that case was like equal probability being zero one so so so let's just see i mean i told you that these moments actually do work for the degree equal to k so let's let's see why they work for the degree equals to k so right because i said that they do satisfy this condition for degree equals to and let's see why they work so basically if let's look at this matrix and you know expectation of x i x j according to these moments so so this matrix is basically the following thing and it's now the probability that every 
individual vertex is in this thing is a omega over n so expectation of x i squared which is the same as expectation of x i because we are zero one variable it's omega over n and then plus we get here and omega you know this constant which i ignore and omega over n squared times the j cc matrix of the graph so let's let's make sure that we understand why this is the case right so if we take x i and x j then if i equals to j then we're looking at expectation of x i squared which is the same as expectation of x i it's the probability that i's in the clique omega over n right so on the diagonal here we're going to have omega over n and if we look at i and j that are different then if they are a non-edge they get zero and if they are an edge they get omega squared over n squared times some constant which doesn't matter so so this is what you get right omega over n plus omega over n squared times the j cc matrix of the graph okay now and the graph is a random graph and there is a general theorem that says you know the we actually know very well how the spectrum looks of a random graph like that so basically it's going to look like this there is going to be one eigenvalue that is like m or n over two or something that corresponds to the all ones being but it's a positive eigenvalue we don't worry about it so much and then all the other eigenvalues will be kind of in the range minus square root n plus square root n so in particular what we know is that lambda mean g is going to be larger than some constant times square root minus times square root n so if you want this to be pst what we need is that omega over m is at least square root n times omega squared over n squared and the way we should we can write this is that omega over n times omega over square root n so and this is this is fine when this is fine when omega is smaller than square root n okay so so this this is very very simple and we're going to get two things that are more complicated so if you don't understand 
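As a quick numerical sanity check of this computation (the function name and the specific parameters here are mine, not from the lecture), one can build the degree-2 FK moment matrix on a sampled random graph and watch its least eigenvalue cross zero around ω ≈ √n:

```python
import numpy as np

def fk_degree2_matrix(adj, omega):
    """Degree-2 FK moment matrix, constants dropped as in the lecture:
    M_ii = omega/n on the diagonal, M_ij = (omega/n)^2 * adj_ij off it."""
    n = adj.shape[0]
    return (omega / n) * np.eye(n) + (omega / n) ** 2 * adj

rng = np.random.default_rng(0)
n = 400
# 0/1 adjacency matrix of a random graph G(n, 1/2), zero diagonal.
upper = np.triu(rng.integers(0, 2, size=(n, n)), 1)
adj = upper + upper.T

# Spectrum of adj: one large positive eigenvalue of order n/2,
# all others within roughly [-sqrt(n), +sqrt(n)].
eigs = np.linalg.eigvalsh(adj)

# PSD when omega is well below sqrt(n) = 20, not PSD well above it.
lam_small = np.linalg.eigvalsh(fk_degree2_matrix(adj, omega=4))[0]
lam_large = np.linalg.eigvalsh(fk_degree2_matrix(adj, omega=80))[0]
print(lam_small > 0, lam_large < 0)
```

The crossover is not sharp at exactly √n for finite n, which is why the demo uses ω values well on either side of it.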
That was very simple, and the two computations coming later are more complicated, so if you didn't follow this one it's an excellent time to stop me and ask. [Question: is this the full moment matrix?] Here I'm just looking at this two-by-two-indexed matrix and proving it is PSD, not the full moment matrix; the actual matrix that needs to be PSD is slightly bigger, but this is really its heart. [Question about the constant.] It's some absolute constant; it doesn't matter in these cases. Okay, so everyone sees why this matrix is PSD, and this works for degree two.

Then, in 2013, a paper was published giving a "proof" that the FK moments satisfy Ẽ[p²] ≥ 0 for every p, which would mean they form a valid pseudo-distribution at every degree. Why do I put "proof" in quotation marks? Because we already know that if we believe SoS is optimal, this cannot be true: these moments are silly; they don't adjust, say, to give vertices of higher degree a higher probability of being in the clique. (As a parenthetical remark: if you actually go and look at the paper, you will see they do make some small adjustment, but they don't adjust aggressively enough, so in a sense you can ignore their adjustment.) So in polynomial time you could play against these FK moments and win. Of course, the amount you win is important, and we'll see this: when ω is small enough you can still win against them, but only by an amount small enough to be considered negligible. Why "small" counts as negligible is again a very good question which I don't completely understand.

So we have some moral reasons to believe there is something off with these moments; but let's see why the authors thought they had a proof. It was a very nice proof, unfortunately with a bug, and its crucial component was a nice concentration inequality. It's interesting to understand what could have been the case. Here is the conjectured theorem. Suppose you have some polynomial P mapping {±1}ⁿ to n×n matrices, a matrix-valued polynomial, and assume E[P(x)] = 0; by this I mean the zero matrix. How would you define the variance of P? A natural way, since we're talking about matrices, is Var(P) = E[P(x)P(x)ᵀ]. Then it would seem to make sense that P(x) stays within k "standard deviations" except with probability about 2^{−k²}, but you have to give something up. First, some polynomial factor like n^{O(d)}; second, as we already know from the scalar case, the exponent must be k^{2/d} rather than k², where d is the degree of the polynomial. With those losses this seems like a very natural thing to be true, and you know that for d = 1 it is known to be true for matrix-valued polynomials (that is matrix concentration for sums of independent matrices).

But first, before we even ask whether it's true or false, what does the statement mean? Notice something fishy: what kind of object is √Var(P)? Var(P) is a matrix, so what does the square root of a matrix mean? If M is a PSD matrix, write it as you usually do, M = Lᵀ diag(λ₁, …, λ_n) L with all λ_i ≥ 0; then √M = Lᵀ diag(√λ₁, …, √λ_n) L. Our variance E[P(x)P(x)ᵀ] is PSD, so we can take its square root. And when we say A ⪯ B, we mean it in the spectral (Löwner) sense: vᵀAv ≤ vᵀBv for every vector v. That is what it means for one matrix to be smaller than another, so now at least the statement makes sense. And the interesting thing: if these objects were scalars, actual numbers, this is known for every d; if they are matrices, it is known for d = 1, for linear polynomials; and for scalars you know you have to lose the d in the exponent, so that part is natural. So they conjectured the common generalization of these facts, a very natural conjecture, and used it in their proof. Unfortunately, it is false. They showed that this conjectured theorem implies that the FK moments are PSD, but the theorem itself is false. I'm not going to walk through why it's false; I'll just tell you the example, which is very simple: define P(x) = xxᵀ − I for x ∈ {±1}ⁿ, just a quadratic matrix-valued polynomial, with the identity subtracted to make it mean zero.
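For concreteness, here is the conjectured inequality written out. This is my paraphrase of the statement sketched above; the exact normalization (the n^{O(d)} factor, the k^{2/d} exponent) follows the losses named in the lecture.

```latex
\textbf{Conjecture (false in general).}
Let $P \colon \{\pm 1\}^n \to \mathbb{R}^{n\times n}$ be a matrix-valued
polynomial of degree $d$ with $\mathbb{E}[P(x)] = 0$, and let
$\Sigma = \mathbb{E}\!\left[P(x)P(x)^{\mathsf T}\right]$ be its variance.
Then for every $k \ge 1$,
\[
  \Pr_{x \sim \{\pm 1\}^n}
  \left[\, -k\, n^{O(d)}\, \Sigma^{1/2} \;\preceq\; P(x) \;\preceq\; k\, n^{O(d)}\, \Sigma^{1/2} \right]
  \;\ge\; 1 - 2^{-\Omega\left(k^{2/d}\right)},
\]
where $A \preceq B$ means $v^{\mathsf T} A v \le v^{\mathsf T} B v$ for all $v$.
The statement is true for scalars (every $d$) and for matrices when $d = 1$;
it is falsified at $d = 2$ by $P(x) = x x^{\mathsf T} - I$.
```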
If you do the calculations, it turns out that this P violates the conjecture. If you're interested, I think you learn more by actually doing it, so it's a good exercise: verify that this example violates the conjecture. (I think this counterexample was first pointed out by someone else; I'm not sure of the attribution.) But this example only shows that their concentration theorem is false; it doesn't show that the moments themselves are not PSD. Now we'll see that the moments really are not PSD, and you already know the moral reason they shouldn't be: these moments don't adjust for the statistics of the graph. Now we'll see why not adjusting makes things problematic.

This might seem a very roundabout way for me to proceed: why don't I just show you the proof of the theorem, rather than showing you how not to prove it? But I think it actually teaches us something. It justifies why we need to take this Bayesian approach, and why we don't just take the first moments that come to mind. So I hope you'll bear with me through this slightly roundabout route.

Here is the theorem: for the FK moments, there exists a degree-2 polynomial p such that Ẽ_FK[p²] < 0. So these moments are not PSD. The proof works as follows. First we find a q of degree 4 such that E[q] under any actual distribution over cliques is much larger than Ẽ_FK[q]; that is, a q on which we can make money betting against the FK moments. It will be a variant of what we've already seen. Then we convert this q into a polynomial whose square has negative pseudo-expectation. Again, it's the Bayesian viewpoint that lets you come up with this polynomial: it's not so hard to see once you have it, but the authors of that paper missed it, and I think they missed it because when you're not looking at these things with the Bayesian viewpoint, it's hard to see.

Okay, so what is this q? For each vertex i, define

q_i(x) = Σ_{j ∼ i} x_j − Σ_{j ≁ i} x_j

(neighbors of i minus non-neighbors of i), and set q(x) = Σ_i q_i(x)⁴. The individual q_i's will be useful for us at some point.

Claim 1: if x = 1_S for the planted clique S, then q(x) ≥ ω⁵. (Really ω(ω−1)⁴, but I won't deal with such a minor issue.) Anyone see why this claim is true? Right: if x is the indicator of a set, then for every i, the inner sum is the number of neighbors of i inside the set minus the number of non-neighbors of i inside the set, and we raise that to the fourth power. For i in the clique, every other member of S is a neighbor of i and none is a non-neighbor, so q_i(1_S) = ω − 1, contributing (ω−1)⁴; there are ω such i's, giving ω(ω−1)⁴. For every other i, the contribution is a fourth power, hence non-negative. So if x is the indicator of the planted clique, q(x) ≥ ω(ω−1)⁴ ≈ ω⁵.
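Here is a quick numerical check of Claim 1 on a planted instance. The variable names (`sigma`, `qi`) and the parameters are my own choices, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(1)
n, omega = 200, 20

# Signed adjacency: +1 for an edge, -1 for a non-edge, 0 on the diagonal.
signs = np.where(rng.random((n, n)) < 0.5, 1, -1)
sigma = np.triu(signs, 1) + np.triu(signs, 1).T

# Plant a clique on the first omega vertices.
S = np.arange(omega)
sigma[np.ix_(S, S)] = 1
np.fill_diagonal(sigma, 0)

x = np.zeros(n)
x[S] = 1.0  # x = indicator of the planted clique

# q_i(x) = sum_{j ~ i} x_j - sum_{j !~ i} x_j,  q(x) = sum_i q_i(x)^4.
qi = sigma @ x
q = np.sum(qi ** 4)

# Clique members see omega-1 neighbours and no non-neighbours inside S,
# so each contributes (omega-1)^4, and the rest contribute >= 0.
print(qi[0], q >= omega * (omega - 1) ** 4)
```

The vertices outside the clique typically contribute a little extra on top of ω(ω−1)⁴, which is why the check is an inequality rather than an equality.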
So that's Claim 1. Claim 2 is that, according to the FK moments, Ẽ_FK[q] ≤ O(n·ω²). I'll do the calculation in a second, but first note what this means: if ω is larger than about n^{1/3}, then ω⁵ ≫ nω², and we can beat the FK moments. If I make bets on the value of q(x), if I buy the "q stock", then the FK moments quote an idiotically low price for this stock: the stock is guaranteed to give a return of at least ω⁵, but the FK moments price it at only about nω². And indeed, in very non-trivial work, Hopkins, Kothari, and Potechin, and independently Deshpande and Montanari, showed that these moments actually are PSD when ω ≤ n^{1/3}, so this threshold is real.

So let me prove Claim 2; this already shows there is something off with these moments: they give the wrong prices. (Can people over there see this part of the board? Okay.) By linearity, it is enough to understand what price is given to a single q_i⁴. Expanding the fourth power, and ignoring constants throughout,

Ẽ_FK[q_i⁴] = Σ_{j₁,j₂,j₃,j₄} (−1)^{#non-neighbors of i among j₁,…,j₄, counted with multiplicity} · Ẽ_FK[x_{j₁} x_{j₂} x_{j₃} x_{j₄}],

where each j contributes a −1 if it is not connected to i and a +1 if it is, and the FK moment of the monomial is (ω/n)^{|{j₁,j₂,j₃,j₄}|} if that set is a clique and 0 otherwise. So only cliques survive, each weighted by (ω/n) to the size of the clique, times a sign.

Now separate into two cases. Case one: {j₁, j₂, j₃, j₄} has full size, four distinct vertices forming a clique. Each such term is (ω/n)⁴ times a ±1 sign, and here is the point: G is a random graph, and whether {j₁, j₂, j₃, j₄} is a clique has nothing to do with i. We can imagine choosing the rest of the graph first, and only then, completely independently, choosing who is a neighbor of i. So these ±1 signs are essentially independent random signs; there are about n⁴ of them (divided by some constant), and a sum of n⁴ independent ±1's is typically about √(n⁴) = n². So this case contributes roughly (ω/n)⁴ · n² = ω⁴/n². Case two: only two distinct vertices among the j's, each appearing twice. Then the sign is always +1, there are about n² such terms, and each contributes (ω/n)². So this case contributes n² · (ω/n)² = ω², and you can see this term dominates. So Ẽ_FK[q_i⁴] = O(ω²), and since q is a sum of n of these, Ẽ_FK[q] = O(n·ω²). This shows we can win against the FK moments, but it doesn't immediately show that they are not PSD, that they fail to be a valid pseudo-distribution.
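To summarize the two cases in one line (all absolute constants suppressed; this is just the computation above in display form):

```latex
\[
  \tilde{\mathbb{E}}_{\mathrm{FK}}\!\left[q_i^4\right]
  \;\approx\;
  \underbrace{\left(\tfrac{\omega}{n}\right)^{4} n^{2}}_{\text{4 distinct $j$'s: random signs cancel}}
  \;+\;
  \underbrace{\left(\tfrac{\omega}{n}\right)^{2} n^{2}}_{\text{2 distinct $j$'s: all signs $+1$}}
  \;=\;
  \frac{\omega^{4}}{n^{2}} + \omega^{2}
  \;=\; O\!\left(\omega^{2}\right),
\]
so $\tilde{\mathbb{E}}_{\mathrm{FK}}[q] = O(n\,\omega^{2})$, while any actual
distribution over $\omega$-cliques gives $\mathbb{E}[q] \ge \omega^{5}$;
the mispricing appears exactly when $\omega^{5} \gg n\,\omega^{2}$,
i.e.\ when $\omega \gg n^{1/3}$.
```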
But now we can show that too, by exhibiting the final polynomial. The question is really where on the board to do it; let me do it here. Pick a typical i, meaning one with Ẽ_FK[q_i⁴] ≈ ω² as we calculated above (ignoring constants, as always), and set

p = x_i − c·q_i²,

where I'll choose the constant c in a second to make things work out. Let's compute Ẽ_FK[p²] and show that for some value of c it is negative. We have

Ẽ_FK[p²] = Ẽ_FK[x_i²] − 2c·Ẽ_FK[x_i q_i²] + c²·Ẽ_FK[q_i⁴].

The two positive terms are Ẽ_FK[x_i²] = ω/n and c²·ω². Now let's compute the cross term. Remembering what q_i was,

Ẽ_FK[x_i q_i²] = Σ_{j, j′} (−1)^{#non-edges from i to j, j′} · Ẽ_FK[x_i x_j x_{j′}],

the same expansion as above: the moment is (ω/n)^{|{i, j, j′}|} ≈ ω³/n³ when {i, j, j′} is a clique, and zero otherwise. But if {i, j, j′} is a clique, how many non-edges are there from i to j and to j′? Zero, right? It's a clique! So wherever the moment is nonzero, the sign is always +1. Ẽ_FK[x_i x_j x_{j′}] zeroes out when {i, j, j′} is not a clique, and every surviving term is positive. There are about n² such pairs (j, j′), so

Ẽ_FK[x_i q_i²] ≈ n² · ω³/n³ = ω³/n.

So the cross term contributes −c·ω³/n, and now I just need to choose c so that this is more negative than the two positive terms are positive. I'm going to choose c = ε·ω/n for a small constant ε; I think that's the right choice to balance things out. Then the cross term is −ε·ω⁴/n² and the square term is ε²·ω⁴/n², so for small ε the negative term kills the square term; and it also kills the ω/n term as long as ω⁴/n² ≫ ω/n, which is exactly ω³ ≫ n, the same condition as before.

These calculations are a little boring, so if you didn't follow them exactly, that's fine; the point is the general scheme of things. We saw that there was something off with the FK moments, we found a way to efficiently bet against them and win, and then we converted that bet into an actual polynomial whose square gets negative pseudo-expectation. Once you have such a polynomial, you can try to fix the moments. (I may be confusing the references, but I believe the fix was the work of Hopkins, Kothari, Potechin, Raghavendra, and Schramm.)
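Putting the three terms together (again with all absolute constants dropped, as in the calculation above):

```latex
\[
  \tilde{\mathbb{E}}_{\mathrm{FK}}\!\left[(x_i - c\,q_i^2)^2\right]
  \;\approx\;
  \underbrace{\frac{\omega}{n}}_{\tilde{\mathbb{E}}[x_i^2]}
  \;-\;
  c\,\underbrace{\frac{\omega^{3}}{n}}_{\tilde{\mathbb{E}}[x_i q_i^2]}
  \;+\;
  c^{2}\,\underbrace{\vphantom{\frac{\omega}{n}}\omega^{2}}_{\tilde{\mathbb{E}}[q_i^4]},
\]
and with $c = \varepsilon\,\omega/n$ this becomes
\[
  \frac{\omega}{n}
  \;-\; \varepsilon\,\frac{\omega^{4}}{n^{2}}
  \;+\; \varepsilon^{2}\,\frac{\omega^{4}}{n^{2}}
  \;<\; 0
  \qquad\text{for small constant } \varepsilon,\ \text{whenever } \omega^{3} \gg n .
\]
```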
they basically looked at this polynomial they fixed you know fk moments and they get and they show that for the degree equals four and you can get like square root n only the square root n but DevTix is a kind of was very ad hoc and it was kind of clear that for this fix doesn't work for the degree equals six so you could you could basically you know you could basically keep trying to find polynomials and fix polynomials find polynomials and fix polynomials but what we'll do after the break is is we kind of say you know let's take the guesswork out of it we're going to make make this in some sense a condition that you cannot you cannot we're going to make it a constraint that there is no way to bet against this spring and win and then we will see that this basically tells us what the moments are going to be yes how did we come up with that problem like uh what do we found some kind of test that looked different in or something like that how did we come up with what so what what's it called the cube yes so I don't know the principle way like we kind of eventually stumbled upon it so uh both you know polynomial and then actually don't care about the person that found the polynomial and then you know so we should implement that simply often then for you know finding a way to uh a square that's negative in some sense the okay the some of the way is the foreign thing you kind of look at this claim but you may disclaim that this should be omega to the five and what you do is you actually I mean it's a simple claim so you actually look at the proof of this claim and realize that it's a sum of squares proof so so any valid sum of squares so the distribution uh will actually satisfy that the expectation of q needs to be at least omega to the five so that way if you kind of do it in this general way you might not exactly maybe need to go a little bit higher degree but uh that it's not it's not the way out right because it's kind of a very simple proof right that uh that if 
it was an actual distribution it needs to be over there to the five and because it's a simple proof it doesn't use the probabilistic method then it's also is going to be captured by some of squares and therefore it better if a if a solution distribution doesn't satisfy it then it's not going to be invalid so that's kind of the more principled way to to do this so so basically what we're going to see after the break is how do we defy the solution basically uh by in some sense and we I mean to take like uh you know the kind of management buzzwords we want to uh transform a challenge into an opportunity so so basically what we want to say is you know we can we can keep looking at this as a bad thing that you know uh we keep finding these polynomials fixing these polynomials finding this polynomial fixing this polynomial what happened on that we say okay we just take all we kind of define the solution implicitly by saying it has to be good for all these polynomials in the sense that you couldn't bet against it and once we make this requirement then then then we basically are in a good shape because we at least we remove the creativity we know what the solution has to be and now we just have to move it so let's take seven minutes break okay so okay so so now um okay so in some sense everything we did before was uh you know was how not to prove this theorem now I want to talk to you about how to prove this theorem and for the I'm not going to be able to show you the full proof but uh I think I hope that I'm going to give you enough information so that if you now go to the paper the full proof will not look as scary as it might be otherwise and and at some point we might also have like better written notes on the PSD in this proof that uh one that I think when we get to the point where I believe the notes are better than the paper then I'll post them but at the moment I think the paper is actually right now the best source for it okay so so basically there is an ocean of 
arbitrage or a Dutch book it's a strategy to it's a strategy to win against the you know a a pseudo distribution so what is so basically uh suppose you have some F that maps you know graphs and clicks into real numbers then you know if it's something is off you then you can kind of use this as a you know that's I say suppose this is bigger than you know so suppose that the truth is bigger than this then you know that on average this is pricing FG too low right so so basically if you wanted so so basically if you wanted the condition of a pseudo distribution to satisfy no so if you wanted this condition to satisfy no arbitrage then we want this to satisfy this so so this is this is a natural condition but basically if we require it for all functions it's basically the only thing that will satisfy the actual distribution that's you know actually supported on the clique so so so so we're going to restrict it to so we don't want to say no efficient which basically say that for every F that is simple and in fact because we are not going to we believe that it's not going to be able to distinguish between the planned distribution on the right hand side we just we don't believe that this actually will matter much because otherwise we'll have a way to distinguish between the two guys and so so on the right hand side we'll actually use the completely one of these distribution so so basically let's call this equation star and we want star to work for every F such that degree and you know every F is a function in G and X such that you know degree degree X of F is at most some D and degree in the G variable of F is at most some tau choose some tau will be another constant some some large constant we depend on being like this and basically basically the idea is that now now we now we make this condition and if you think about this condition it's basically say if suppose we think of the pseudo-distribution as an unknown right so for every particular F we can compute this thing 
it's just a number and so this is basically and this thing is basically going to be just a linear equation on the pseudo-distribution so basically this is a linear constraint on what the pseudo-distribution should satisfy and we are going to put all these linear constraints on the pseudo-distribution and then and then it turns out that basically it doesn't it's essentially completely determines the pseudo-distribution and all we are left is to prove that something that satisfies all these linear constraints also satisfies this the PSD constraint yes so okay so we need in some sense we cannot allow functions that depend are eternally complicated way on G because think of the function that the FG of X is the constant function one if G has a large click and the constant function zero if G doesn't have a large click so this is a function that has the grade zero in X but large degree in G but it obviously can distinguish between the two cases so we somehow want to only allow a kind of simple functions simple functions that's ones that can be efficiently computed on G and definition is that basically we speak to those new functions in G but that's what's what somehow or what changes if I allow the power to be very large so if you allow power to be very large then you will not be able to you'll definitely not be able to satisfy it in some sense this will not make say any efficient so if you allow power to be very large then in particular it will allow this function right so think of FG of X equals one the largest click in G is at least n to the one quarter and zero are right right so this is degree zero in X degree some large degree in G so so so so this kind of like we want to make this requirement only for easy to compute right so so so right now we're making this requirement it doesn't make sense for us to make it for all edge because then it would not be computationally it will basically require the sort of distribution to be the actual the actual distribution so we 
only make it for for low degree these are clearly things that like low degree if it's low degree on G these are clearly things it's kind of a minimal requirement in some sense because clearly a polynomial time strategy for every FG that where G is where is this low degree polynomial in G that's correspond to a low degree to an efficient strategy so in sense we we want to we want to ensure that for all efficient strategies of these all efficient strategies in this this framework you cannot you cannot win by betting against you cannot win by betting against you know the sort of distribution but this tower that you will instantiate later yes yes women later it will be some kind of function of D one of the things that's there is kind of a long kind of a pretty wide range of power where not exactly matter what it means like it's it needs to be sufficiently larger than these so things of ESP and not too large so things don't completely break but between that it can it has kind of a long and so so we'll instantiate it later and in terms of this so we want this condition and now the way we'll achieve this condition is actually going to be a very simple so in terms of this we are going to do the following thing and so consider new planted x just you know the probability of gx on the plant distribution and what we're going to do is we're going to say mu is going to be simply so mu mu of gx is basically what going to be mu planted where we restrict to at most degree t d and tau as a polynomial in the variable of gx so this is this is the sort of distribution that we're going to use so we basically take this planted distribution we think of it as a polynomial we just want to take to its low low other moments and other monomials and we use that and a lot is that there is something kind of incredibly weird about what we are doing which is the following like mu planted is a distribution that gives zero zero probability for all for all graphs that are not in the for all for all 
There is something incredibly weird about what we are doing: μ_planted is a distribution that gives zero probability to every graph outside the support of the planted distribution, and yet our μ is not going to be able to distinguish between the planted distribution and random graphs. It turns out that truncating to the low-order monomials somehow loses this distinction. So this is our definition — in fact, later we might slightly modify it, because as stated it addresses our concerns only approximately — and we have one claim: μ satisfies (star). When we define things this way, (star) holds, and the reason is the following; let me write it slowly. The right-hand side is the sum over all (G, x) of μ(G, x) f_G(x), and the left-hand side is the sum over all (G, x) of μ_planted(G, x) f_G(x). The way to think about it: the right-hand side is ⟨μ, f⟩, and the left-hand side is ⟨μ, f⟩ + ⟨μ_high, f⟩, where μ is the low-degree part of μ_planted and μ_high is the high-degree part. But f_G has degree at most τ in G and at most d in x, while every monomial of μ_high has higher degree in one of them, so ⟨μ_high, f⟩ = 0 — μ_high is orthogonal to all such polynomials. In some sense we could even have defined μ directly as the object satisfying these constraints, but this is a very clean way to define it. Okay, so now it is going to be useful for us to expand this μ in the Fourier basis.
So we can write the following. If T is a subset of ([n] choose 2), i.e. a subset of potential edges, then define χ_T(G) = product over e in T of (1 − 2 G_e). So χ_T is a function that takes a graph and outputs +1 or −1: it is a product over the edges of T of a factor that is −1 if the edge is present and +1 if it isn't. And please tell me if I slip: I'll try to use T for subsets of edges and S for subsets of vertices. There is probably better notation — completely different types of letters — and I'll think about it at some point, but for today let's do that. We are always working here with functions of two kinds of variables, the G-variables sitting on edges and the x-variables sitting on vertices, so try to keep in mind which is which. Now consider Ẽ[x_S] as a function of G: for every particular S — a small subset of vertices — the pseudo-expectation Ẽ[x_S] is a function of G, so we can write it as a polynomial, expressed as a sum of these χ_T's: Ẽ[x_S](G) = sum over T of c_S(T) χ_T(G), where c_S(T) is just notation for the coefficient corresponding to T when you express this in that basis. This is the Fourier representation of Ẽ[x_S], and you can actually compute the coefficients — it's an exercise, and actually not that hard.
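A minimal sketch of these characters: for n small enough to enumerate all graphs, one can verify that the χ_T defined above form an orthonormal family under the uniform measure on graphs, which is what makes them a Fourier basis (encoding G as a dict over potential edges is just for illustration):

```python
import itertools

# chi_T(G) = prod over e in T of (1 - 2*G_e): a factor of -1 per present
# edge and +1 per absent edge.  G is a dict from potential edges to {0, 1}.
n = 4
edges = list(itertools.combinations(range(n), 2))

def chi(T, G):
    out = 1
    for e in T:
        out *= 1 - 2 * G[e]
    return out

# Exact enumeration over all 2^6 graphs on 4 vertices: the chi_T are
# orthonormal under the uniform measure, E_G[chi_Ta * chi_Tb] = 1[Ta == Tb].
all_graphs = [dict(zip(edges, bits))
              for bits in itertools.product([0, 1], repeat=len(edges))]

T1 = (edges[0],)            # a single edge
T2 = (edges[0], edges[3])   # two edges
for Ta in (T1, T2):
    for Tb in (T1, T2):
        inner = sum(chi(Ta, G) * chi(Tb, G) for G in all_graphs) / len(all_graphs)
        assert inner == (1.0 if Ta == Tb else 0.0)
```
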
The coefficient corresponding to T you can prove to be roughly (ω/n)^{|V(T) ∪ S|}, where V(T) denotes the vertices touched by the edges of T. This is a calculation, and we can do it; it doesn't matter much, but it is somewhat convenient to work with exactly this formula — it differs only slightly from the literal truncation, and it satisfies the hard constraints better. And if you wanted, you can invert things: instead of this being an exercise and the truncation being the definition, you can make this formula the definition and then prove the claim from it. Choose your poison — you can take this as the definition of the pseudo-distribution. So in some sense this pseudo-distribution was motivated by Bayesian considerations, but at this point you can forget how I came up with it. If you simply want to see the proof of the theorem, what I have to show is the main lemma, which is the following. Define M_G to be the n^{d/2} × n^{d/2} matrix such that M_G[I, J] is the pseudo-expectation Ẽ[x_{I ∪ J}] for the graph G. Then with high probability M_G ⪰ 0 — assuming the right parameters: for every ε and d there exists a τ such that if ω = n^{1/2 − ε}, then with high probability over G this matrix is PSD. This is what we have to prove. The point is that if we prove the main lemma then we have proven the theorem, so this is really the heart of the matter.
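To get a numerical feel for the main lemma at d = 2, here is a rough sketch that builds the moment matrix from leading-order pseudo-moments only: Ẽ[1] = 1, Ẽ[x_i] = ω/n, and Ẽ[x_i x_j] = 2(ω/n)² on edges, 0 on non-edges. The factor 2 is my reading of the Bayesian correction on edges (the lecture only says "some constant"), and the true M_G carries further low-order Fourier terms — so this illustrates the ω « √n threshold, not the actual matrix:

```python
import numpy as np

# Degree-2 moment matrix on a uniform G, indexed by {} and the singletons,
# filled with leading-order pseudo-moments only (the exact entries carry
# further low-order Fourier corrections):
#   E~[1] = 1,  E~[x_i] = w/n,  E~[x_i x_j] = 2*(w/n)^2 on edges, 0 otherwise.
rng = np.random.default_rng(1)
n = 400

A = np.triu(rng.integers(0, 2, size=(n, n)), 1)
A = A + A.T                              # adjacency matrix of G(n, 1/2)

def min_eig(w):
    p = w / n
    M = np.empty((n + 1, n + 1))
    M[0, 0] = 1.0                        # E~[1]
    M[0, 1:] = M[1:, 0] = p              # E~[x_i]
    M[1:, 1:] = 2 * p * p * A            # E~[x_i x_j], zero on non-edges
    np.fill_diagonal(M[1:, 1:], p)       # E~[x_i^2] = E~[x_i]
    return float(np.linalg.eigvalsh(M).min())

assert min_eig(n ** 0.25) > 0            # w well below sqrt(n): PSD
assert min_eig(2 * n ** 0.5) < 0         # w above ~sqrt(n): PSD fails
```

With these simplified entries, the random ±(ω/n)² fluctuations off the diagonal have spectral norm about 2(ω/n)²√n, which the ω/n diagonal dominates exactly when ω « √n — the same threshold the lecture arrives at.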
It's always the heart of the matter, proving this PSD condition. And let me say right away: I don't like the proof that's in the paper. I feel there should be a different proof, I still don't know what the right proof is, and I'll tell you a bit about two possible approaches — maybe they can be merged at some point — for how I think a clean proof, one that is not 50 pages, should look. So here is a dream proof I would have loved to carry out. Assume for the sake of contradiction that for many G's there is a polynomial f_G of degree at most d/2 in x such that Ẽ_G[f_G²] < 0. The hope would be to do a case analysis. If the degree of f_G as a function of G is at most τ, then you are done, for the following reason: f_G² then has degree at most d in x and low degree in G, so (star) applies to it. If Ẽ_G[f_G²] is negative for many G's, then taking the expectation over G it should still be negative — but that would violate (star), because on the planted side μ_planted is an actual distribution, and the expectation of a square under an actual distribution is non-negative. So if the degree in G is small we should be able to conclude. Otherwise, we would want to say that f_G is in some sense random, and argue that a random f_G cannot be correlated with the particular eigenspace of this matrix that is small — the problematic, near-negative eigenspace.
We would want a kind of randomness-versus-structure partition, if you like that name: we would want to say that the extreme eigenvalues of this matrix — the interesting eigenvalues — have eigenvectors that are low-degree polynomials in G. I think something like that might be true, but I don't know how to make it work, so this is not how the proof goes, and it remains an open problem. Now let me say how the proof actually goes; unfortunately, it's more complicated. At this point all you need to keep is the definition and the main lemma — forget everything else. We have a random matrix, and we want to prove it is PSD. (Yes — you wouldn't be done immediately, but you would hope to show that a random vector never causes a problem, and that might be easy to analyze: to show that if there are negative eigenvalues, their eigenvectors lie in some low-dimensional subspace or something like that — that there is something special about them. You would still need to do something, but that is the hope.) So now we need to prove this main lemma, and let me give you an idea of how the proof goes. We are trying to prove that this random matrix is positive semidefinite, and there are tons and tons of results on random matrices — there are books and books on random matrix theory — so you might even think that at this point we should be done: just find the book and quote the right theorem. Maybe that's even true; I just haven't been able to find the book. Let me tell you the main reason why most of the work in random matrix theory cannot be used directly in this case.
The main issue is the following. M is a map from {0,1}^(n choose 2) into n^{d/2} × n^{d/2} matrices, and we think of it as a random variable by choosing the input at random. So there are about n² bits of randomness and about n^d entries of output. In particular, the entries of this matrix are not independent of each other — there simply is not enough randomness for them to be independent. Most of random matrix theory, at least as far as I know, allows fairly arbitrary distributions for each entry but typically assumes the entries are independent, and these dependencies can really screw things over. (Yes — at this point G is just a uniformly random graph.) In some sense we have now forgotten how we came up with this distribution and all that; we are just saying: with high probability, when G is drawn uniformly, if we define this matrix — some explicit, well-defined matrix as above — then it satisfies the PSD condition. So these dependencies can be very bad, and the main thing we use to handle them is the notion of symmetry. M has the property that if you permute the graph, this induces a permutation on the entries of M — because the entries depend on G in a symmetric way. So the crucial property we are going to use is symmetry: for every permutation π of [n], M_{π(I), π(J)}(π · G) = M_{I, J}(G), where π · G denotes G with its vertices relabeled by π. I think this is the right way to write it.
I think this is the right formulation, but in any case what I'm saying is: if you switch, say, vertex 3 with vertex 17 in the graph, then the resulting matrix is obtained by taking every index set in which vertex 3 appears and switching it with vertex 17, and vice versa. This symmetry is extremely useful for us. In particular, let's now think of M as a polynomial: the entry M[I, J] is some polynomial in the G-variables with some coefficients — we actually wrote those coefficients down exactly, but at the moment we don't care what they are. What we know, in some sense, is the following. Look at a subset of edges T. Suppose, for example, that I = {1, 2} and J = {3, 4}, and that T contains an edge going out to some other vertex, say vertex 7. By the symmetry, the coefficient of this T must be the same as the coefficient of the T obtained by replacing 7 with, say, 33 — symmetry forces these two coefficients to be equal. So the coefficient of T does not depend on the exact edges that appear in it, but only on the shape of those edges. And this is the crucial notion: a shape is a graph of size at most, say, τ, together with some labeled parts, which we think of as a left part and a right part, and some edges between them. Call this shape σ — a labeled graph.
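The 3 ↔ 17 example can be checked mechanically on the simplest block (singleton index sets, where the entry is just χ_{ij}(G)); this sketch verifies that relabeling the vertices of G permutes the matrix accordingly:

```python
import numpy as np

# Symmetry on the simplest block: for singleton index sets the entry is
# M(G)[i, j] = chi_{ij}(G) = 1 - 2*G_ij.  Relabeling the vertices of G by a
# permutation pi moves the entry at (i, j) to (pi(i), pi(j)).
rng = np.random.default_rng(2)
n = 8

G = np.triu(rng.integers(0, 2, size=(n, n)), 1)
G = G + G.T
M = 1 - 2 * G
np.fill_diagonal(M, 0)            # no self-loops; the diagonal plays no role here

pi = rng.permutation(n)
G_new = np.empty_like(G)
G_new[np.ix_(pi, pi)] = G         # relabeled graph: edge (pi(i), pi(j)) <-> edge (i, j)
M_new = 1 - 2 * G_new
np.fill_diagonal(M_new, 0)

# The entry of the new matrix at (pi(i), pi(j)) equals the old entry at (i, j).
assert np.array_equal(M_new[np.ix_(pi, pi)], M)
```
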
The corresponding matrix M_σ has entries M_σ[I, J] = sum, over all T that have shape σ with left part equal to I and right part equal to J, of χ_T(G). So M_σ is the basic symmetric matrix-valued polynomial corresponding to the shape σ. And now we can say: because our M is symmetric, it is a sum over all shapes σ of size at most τ — it's a real shame that my σ and τ look so similar; I'll try to make them a bit more different — of some coefficient α_σ depending on the shape: M = sum over σ of α_σ M_σ. Just because M is symmetric under vertex permutations, and because all its monomials are small, it automatically decomposes as a combination of these basic symmetric matrix polynomials. If you think about it, we have done something pretty important here: M is this n^{d/2} × n^{d/2} matrix, this crazy random variable, and we have reduced it to a vector of constant size (for fixed d and τ) — one coefficient α_σ for each shape. It's not a small number of parameters, but it is a finite number, and it completely characterizes M. So here is another open problem — maybe it's even easy in principle, but I don't know: show a finite computation that, given the α_σ for all σ of size at most τ, determines whether the sum of α_σ M_σ(G) is PSD with high probability — or maybe determines the expectation of λ_min, or something like that.
That is something we would really want to know — or show some simple sufficient condition on these α_σ that implies it, or, more generally, some way to compute the spectrum of a sum of α_σ M_σ. Now, in our case the α_σ are not that hard to compute — this is again an exercise — and basically α_σ is going to be roughly (ω/n)^{number of vertices of σ}. That comes out of the Fourier coefficient formula: for every σ, the exponent counts the vertices touched by the shape together with the index sets. The point this time is not the exact computation, so let me give you some intuition for how we proceed from here. What we want to prove is that this thing is PSD, and unfortunately that is not super easy — it's actually still quite complicated — but let's start building some intuition, as much as I can give you in these ten minutes. One case is very simple. Suppose σ is just a single edge, one endpoint labeled left and the other labeled right. What is M_σ(G)? At entry (i, j) you just put χ_{{i,j}}(G), because at the point (i, j) you place this single-edge graph. So this matrix is simply −1 where there is an edge and +1 where there isn't; let's call this ±1 matrix Ĝ. And if you look at the M_σ corresponding to, say, two disjoint edges, it is going to be Ĝ tensor Ĝ.
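A quick numerical sketch of these last two observations: Ĝ is (up to its diagonal) a random ±1 sign matrix, so its spectral norm is on the order of √n rather than n, and tensoring two copies squares the norm:

```python
import numpy as np

# Ghat: -1 on edges, +1 on non-edges of G ~ G(n, 1/2) (diagonal zeroed out).
# Its spectral norm behaves like ~2*sqrt(n), far below the trivial bound n,
# and the two-disjoint-edge shape Ghat (x) Ghat has the squared norm.
rng = np.random.default_rng(3)
n = 40

G = np.triu(rng.integers(0, 2, size=(n, n)), 1)
G = G + G.T
Ghat = 1 - 2 * G
np.fill_diagonal(Ghat, 0)

norm1 = np.linalg.norm(Ghat, 2)                  # largest singular value
norm2 = np.linalg.norm(np.kron(Ghat, Ghat), 2)

assert np.sqrt(n) < norm1 < 3 * np.sqrt(n)       # ~ sqrt(dimension), not ~ n
assert np.isclose(norm2, norm1 ** 2)             # tensoring squares the norm
```
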
One thing you should notice about this Ĝ is that these M_σ's always have ±1 entries, so if they behaved like random sign matrices their spectral norm would be about the square root of the dimension — and both Ĝ and Ĝ ⊗ Ĝ do have the property that the spectral norm is about the square root of the dimension. And then there is a lemma which is crucial, and it is the following. Suppose σ has a left part and a right part, each of size t — let me not worry right now about left and right intersecting — and suppose σ has t vertex-disjoint paths from L to R. Then ||M_σ|| is at most Õ(square root of the dimension of M_σ): this is an n^t × n^t matrix, so the bound is Õ(n^{t/2}). In some sense this is the first result one needs to prove about these objects. How do you prove it? It's not that hard — it's a page or so; you can find it in the paper. Basically you compute the expectation of the trace of (M_σ M_σᵀ) raised to some power ℓ; this trace always gives an upper bound on the spectral norm, and if you take ℓ big enough it's a decent upper bound. What the computation says, in some sense, is that adding edges to σ only reduces the spectral norm — that is something one needs to prove — or at least cannot increase it by more than logarithmic factors; and when σ is exactly a bunch of vertex-disjoint paths, it behaves as if it were a matching, and when it is a matching we know exactly what the norm is going to be. So this is one thing we know.
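The trace bound used here is elementary to see in miniature: for any matrix, tr((M Mᵀ)^ℓ) is the sum of the 2ℓ-th powers of the singular values, so its (2ℓ)-th root upper-bounds the spectral norm and tightens as ℓ grows (in the lemma one bounds the expected trace combinatorially and takes ℓ around log n):

```python
import numpy as np

# Trace method in miniature: tr((M M^T)^l) = sum_i sigma_i^(2l), so its
# (2l)-th root upper-bounds the spectral norm ||M|| = sigma_max, and the
# bound tightens monotonically as l grows.
rng = np.random.default_rng(4)
n = 100
M = (1 - 2 * rng.integers(0, 2, size=(n, n))).astype(float)  # random sign matrix

true_norm = np.linalg.norm(M, 2)
B = M @ M.T
bounds = []
for l in (1, 2, 4, 8, 16):
    bound = np.trace(np.linalg.matrix_power(B, l)) ** (1 / (2 * l))
    assert bound >= true_norm - 1e-8          # always a valid upper bound
    bounds.append(bound)

assert bounds == sorted(bounds, reverse=True)  # and it tightens with l
```
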
In some sense, if all the σ's we had to worry about were like this, we would be in very good shape, because then the off-diagonal part would have the spectral norm of a random matrix and we would have no issues at all. Roughly speaking, our moment matrix looks like this: it is n^{d/2} × n^{d/2}, the diagonal entries are about (ω/n)^{d/2}, and the off-diagonal entries are about (ω/n)^d — much smaller, but there are many of them. If the off-diagonal part behaved like a random matrix, we would say M is (ω/n)^{d/2} · I plus some matrix R whose spectral norm is at most about (ω/n)^d · n^{d/4}; and if you do the calculation, this is fine — M would be (ω/n)^{d/2} times (I plus something of small norm), hence PSD, as long as ω is much smaller than √n. So if everything were random we would be completely fine. And "everything is random" would roughly correspond to the case where the only shapes that get nonzero coefficients are shapes with these vertex-disjoint paths between the two sides. But this is not the case: all shapes get nonzero coefficients. So what we want to do is the following — and this is where things become more messy; let me write it here. We have some shape in which there are no t vertex-disjoint paths from left to right, which by Menger's theorem means there is a vertex separator. So we have t vertices on the left and t on the right — at some point we have to worry about intersections, but let's not worry about that right now —
and there is a vertex separator, some part in the middle of smaller size, through which every path from left to right has to pass. We take the leftmost such separator and the rightmost such separator, and we separate the shape into a left part, a middle part, and a right part. So this graph is actually a union of three graphs — the left part, the middle part, and the right part — and the nice thing is that if it is a union of three edge-disjoint graphs, then the χ_T corresponding to this T factors as χ_left · χ_middle · χ_right. The middle part is basically the kind of part we already know how to handle, and for the left and right parts, what we do is sum over all of them and use them to factorize our matrix. This may be a bit sketchy, but I hope it at least gives you some clue, so that if you now go to the paper it will make more sense than it would have otherwise. Okay, so that's it.
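The one fully elementary ingredient in this decomposition — that χ_T factors across edge-disjoint parts, since it is just a product over edges — can be checked by brute force:

```python
import itertools

# chi_T is a product over edges, so for edge-disjoint T_left, T_mid, T_right:
#   chi_{T_left u T_mid u T_right} = chi_{T_left} * chi_{T_mid} * chi_{T_right}.
# Brute-force check over all graphs on 5 vertices.
n = 5
edges = list(itertools.combinations(range(n), 2))

def chi(T, G):
    out = 1
    for e in T:
        out *= 1 - 2 * G[e]
    return out

T_left, T_mid, T_right = [edges[0]], [edges[4]], [edges[7]]
T = T_left + T_mid + T_right

for bits in itertools.product([0, 1], repeat=len(edges)):
    G = dict(zip(edges, bits))
    assert chi(T, G) == chi(T_left, G) * chi(T_mid, G) * chi(T_right, G)
```
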