So the focus of today is the following problem, which I don't know if there is a good name for, so let me call it the rank-one vector problem: we are given some subspace W of R^{n^2}, which we think of as a space of n x n matrices, and the goal is to distinguish between two cases. One case is that there is a rank-one matrix in W (and we are going to focus today on the symmetric case), so there is a vector v such that vv^T is in W. The other case is that for every unit vector v and every M in W, the distance of vv^T from M is at least epsilon. This is the problem we are going to focus on solving. The result we will talk about, which is still being written up and which hopefully will appear in STOC '17, God willing, is a 2^{Õ(√n)}-time algorithm for this problem. It is based on the sum-of-squares hierarchy, and we'll see how it goes.

I think this is a natural question in its own right: basically, finding a rank-one solution to a set of linear equations. It's somewhat natural, maybe to people like us, probably not very natural for the general population, but I think it's a clean problem. And if you don't care about it for its own sake, let me tell you why you might care about it if you care about quantum information theory. I'll be brief, since this is somewhat of a repetition of what we said last week, and it also appears in the paper, but let me briefly sketch the connection to quantum. One thing people try to do in quantum information is the following: they have some system and they want to prove that the system has entanglement in it. How do they do that? The system has two parts, and you apply some measurement M to the state and succeed with some probability; let's say you succeed with probability one (nothing is really one, but think of it as one, or one minus something very small). You run this measurement many, many times and you say: okay, this state passes this measurement. Now you want to say that the fact that it passes the measurement certifies that the system contains entanglement. So what you want to do is certify that for every separable, that is non-entangled, state rho, the trace of M*rho is at most, say, 1 - epsilon. And what does it mean that the state is separable?
Rho is separable means that it's a convex combination of states of the form uu* tensor vv*, or in the real symmetric case it's just an expectation of v^{tensor 2}, where v is a unit vector. So you want to certify that this is the case, and one problem is that we don't necessarily have an efficient algorithm that looks at the measurement and decides whether that's the case or not. Ours is basically the first algorithm for this that beats brute force, so we really want to know how well you can do. There are hardness results, of Harrow and Montanaro, saying that if you assume the exponential time hypothesis, which is a reasonable assumption, the running time needs to be at least n^{Omega(log n)}. Whether the true running time is closer to this 2^{Õ(√n)} or to n^{O(log n)} is a super interesting problem, and at the very end of today's lecture I'll talk about what kind of directions we could look at if we wanted to improve things further. I think that's a great open problem; I don't know how easy or hard it might be.

Yes, why are the two formulations equivalent? So think of the completeness as being one. M is an n^2 x n^2 matrix, a measurement matrix, so it is between 0 and I, and we can write M = sum_i lambda_i W_i, where W_i is the projector onto the eigenspace corresponding to lambda_i. If we just define W to be W(1), the projector onto the eigenspace corresponding to eigenvalue 1, then you see that if there is a separable state that achieves probability 1, that means (we're in the symmetric case) there is a vector v whose mass is completely inside this eigenspace W; and if not, then every vector v has at least some epsilon of its mass outside it. Yes, that's a good question; that's why these problems are basically the same.

And let me already state the open problem of the week: imperfect completeness. That would be to handle the case where you have to distinguish between, say, acceptance probability 99% and acceptance probability 1%. I'll mention toward the end that right now our result basically cannot handle the 99% case; it can handle maybe 1 - 1/n or something like that, but not 99%, and I'll say a little at the end about some directions. To some extent, if you're familiar with results in mathematics like van der Waerden's theorem versus Szemerédi's theorem, coloring statements versus density statements, you could say we are in a situation where we have the corresponding coloring statement but we don't have the corresponding density statement. If that didn't make any sense, maybe at the very end of the lecture I'll explain it more and it will make a little more sense. So this is the open problem of the week. The open problem of the month is, say, a 2^{n^{1/3}}-time algorithm, and maybe of the year is n^{polylog(n)}. Again, I won't talk about it too much; it's in the paper.
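To make the reduction from measurements to subspaces concrete, here is a minimal numpy sketch (my own illustration, not from the lecture; the helper names are made up): it extracts the eigenvalue-1 eigenspace W of a measurement M, and measures how much of the mass of vv^T lies outside W.

```python
import numpy as np

def eigenvalue_one_subspace(M, tol=1e-9):
    """Given a measurement matrix M (n^2 x n^2, symmetric, 0 <= M <= I),
    return an orthonormal basis of W = the eigenvalue-1 eigenspace of M."""
    eigvals, eigvecs = np.linalg.eigh(M)
    return eigvecs[:, eigvals > 1 - tol]        # columns spanning W

def mass_outside_W(v, W_basis):
    """Fraction of the mass of vv^T lying outside W (zero iff vv^T is in W)."""
    x = np.outer(v, v).reshape(-1)              # vv^T flattened into R^{n^2}
    proj = W_basis @ (W_basis.T @ x)            # orthogonal projection onto W
    return np.linalg.norm(x - proj) ** 2 / np.linalg.norm(x) ** 2
```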
By the way, if you talk to quantum people, I guess there are two families of quantum information theorists. One comes from the more physics side; they would call this problem finding "entanglement witnesses", and these are the people who look at actual, real quantum states and want to find out whether they're entangled. Then there are people from the more computer-science side, and the way they would phrase this problem is to ask whether QMA(2) is inside EXP; that would be the phrasing of the more CS-type quantum people. Now, in general quantum mechanics is reversible, so you can't really erase information, but let's imagine that you could: erase everything quantum, and forget about quantum from this point on. We're going to focus on this problem with everything real and symmetric, mostly for notational convenience. You might think that sum of squares inherently works only over the reals, but not really: you can define sums of squares over the complex numbers; instead of x^2, you use the fact that x times its conjugate is non-negative. So everything I'm going to say generalizes to the complex case, and also to the asymmetric case; it's just a little more pain.

OK, so the problem I'm going to focus on is the one over there: we are given a subspace of the n^2-dimensional space, and we want to know whether it contains a rank-one matrix or every rank-one matrix is far from it. There is an obvious 2^{O(n)}-type algorithm: you build an epsilon-net over all n-dimensional unit vectors, try them all, and see what happens. We're trying to do something significantly better.

OK, so we have basically gone over steps one, two, and three in the plan, so now let me describe the algorithm. The input is a subspace W of R^{n^2}; when I say the input is a subspace, I mean you get some basis for it. The algorithm does the following. We let k be Õ(√n), some value that we'll choose later, and we run SoS to get a degree-(k+2) pseudo-distribution mu over the (n-1)-dimensional sphere, that is, over v in R^n with ||v|| = 1, satisfying the constraint that vv^T is in W. If we don't get such a pseudo-distribution, we can output "fail": the SoS algorithm guarantees that if we were in the first case, there would be an actual distribution satisfying the constraints, and in particular a pseudo-distribution for every k. So if we don't get one, we know for sure we are not in the first case, and we can output "second case" and be done with it. Now, if we do get such a pseudo-distribution, what we want to do is find some vector, say little w, such that ww^T has a decent projection onto the subspace capital W. So step two, and this is the heart of the matter, is to appeal to a structure theorem, which we'll describe shortly. The structure theorem gives us some polynomial p of degree at most k such that the expectation matrix E_mu[p(v) vv^T] is roughly rank one, meaning that there is a v0 such that this matrix minus v0 v0^T has norm less than epsilon times the norm of v0 v0^T.
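Before going on, let me make the trivial baseline mentioned above concrete. Here is a minimal sketch (an illustration of mine, with random sampling standing in for a literal epsilon-net; a true net needs exp(O(n)) points, which is the 2^{O(n)} cost):

```python
import numpy as np

def baseline_distinguish(W_basis, n, eps, trials=100_000, seed=0):
    """The trivial 2^{O(n)}-style baseline, in Monte Carlo form.
    W_basis: matrix whose orthonormal columns span W inside R^{n^2}.
    A real implementation would enumerate an eps-net of the unit sphere;
    sampling here is for illustration only."""
    rng = np.random.default_rng(seed)
    best = 1.0
    for _ in range(trials):
        v = rng.standard_normal(n)
        v /= np.linalg.norm(v)
        x = np.outer(v, v).reshape(-1)                        # vv^T flattened
        dist = np.linalg.norm(x - W_basis @ (W_basis.T @ x))  # distance to W
        best = min(best, dist)                                # ||vv^T||_F = 1, so dist is relative
    return "maybe first case" if best < eps else "looks like second case"
```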
Back to the structure theorem. If you think about it, it may seem somewhat strange a priori: you have a distribution over rank-one matrices, and there is no reason why the expectation matrix should be rank one; typically that's not the case. But if it is the case, then we're very happy: we just output v0.

Yes, so mu is the distribution over v, right. Maybe think of it this way: the distribution gives us v, we reweigh the distribution by p(v), and the structure theorem tells us that there exist a polynomial p and a vector v0 for which this works. Actually, the way we will do it is to rescale by a constant so that E_mu[p(v)] = 1, and then v0 is just E_mu[p(v) v]. Another way to write it, which is actually how we are going to define it: define mu' to be the reweighing mu'(v) proportional to p(v) mu(v). With our scaling, mu' is also a pseudo-distribution, and then the condition is that E_{mu'}[vv^T] minus (E_{mu'}[v])(E_{mu'}[v])^T is small, and v0 is just the expectation E_{mu'}[v].

[A question about what mu is.] Yes, sorry: mu is a degree-(k+2) pseudo-distribution. So we're going to run in time roughly n^{O(k)}, and we're going to output from this pseudo-distribution.

So the heart of the matter is this structure theorem, which a priori you might find a little fishy, in the sense that being rank one is a very non-convex condition: take a combination of two things that are rank one and it's going to be rank two. So if you have a distribution over many rank-one matrices, a priori you might think its expectation is not going to be rank one. But here is the point. Every v in the support of the distribution will have the form v = v0 + v', where v0, the expectation vector, might be very, very small; the point is that it always points in the same direction, and the v' parts cancel out. Look at E[(v0 + v')(v0 + v')^T]: this is v0 v0^T, plus cross terms, plus E[v' v'^T]. Here v0 is the expectation and v' is the rest; think of the rest as being much bigger than the expectation, but it has mean zero, so the cross terms cancel out, and you get v0 v0^T + E[v' v'^T]. So the hope is that these cancellations will be so significant that even though each individual v' may be much bigger than v0, when you take the average, the Frobenius norm of E[v' v'^T] comes out much smaller than that of v0 v0^T.
[Question] Sorry, you said we're running SoS: what are we optimizing? We're asking SoS for feasibility under constraints. One way to think about it: you minimize, over unit vectors v, the squared norm of the projection of vv^T onto the orthogonal complement W-perp; this is a degree-4 polynomial. But in the feasible case the minimum is zero, so in essence you can imagine you have a distribution supported only on vectors v satisfying vv^T in W.

[Question about soundness.] For soundness, what I want to say is the following: the structure theorem tells us that if we find such a pseudo-distribution, we are guaranteed that v0 is going to be good. Let me give the analysis; it's very simple given this condition. Let's start by pretending mu is an actual distribution. Then E[vv^T] is in W, and E[p(v) vv^T], no matter what p is, is also in W: it's a linear combination of things that are in W. So what we get is that there exists some matrix, call it M, in W such that the norm of M minus v0 v0^T is at most epsilon times the norm of v0 v0^T, which exactly shows you are not in the second case. That's the argument for an actual distribution.

Now suppose it's not an actual distribution but a pseudo-distribution. What we know (if only I had colors) is that the pseudo-expectation of the squared norm of the projection of vv^T onto W-perp is zero; that's the constraint we imposed. And so, by Cauchy-Schwarz, the pseudo-expectation of p(v) times this projection is bounded through the pseudo-expectation of p(v)^2. Basically, because the constraint polynomial is a sum of squares whose pseudo-expectation is zero, when you hit it with any polynomial of low enough degree, you still get zero. (Maybe I also need to assume the cross terms vanish, but that comes automatically: if the pseudo-expectation of the square is zero, then by Cauchy-Schwarz everything it multiplies is forced to zero as well.) The bottom line is that the norm of the pseudo-expectation, under mu', of the projection of vv^T onto W-perp is zero too.
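Written out, the Cauchy-Schwarz step looks as follows (my sketch; I'm suppressing the degree bookkeeping, which is exactly what forces the degree slack in the structure theorem):

```latex
% Constraint: \tilde{\mathbb{E}}_{\mu}\big[\, \|\Pi_{W^\perp}(vv^\top)\|^2 \,\big] = 0.
% For each matrix entry (a,b), pseudo-Cauchy--Schwarz gives
\Big| \tilde{\mathbb{E}}_{\mu}\big[\, p(v)\, \big(\Pi_{W^\perp}(vv^\top)\big)_{ab} \,\big] \Big|
  \;\le\; \Big( \tilde{\mathbb{E}}_{\mu}\big[ p(v)^2 \big] \Big)^{1/2}
          \Big( \tilde{\mathbb{E}}_{\mu}\big[ \big(\Pi_{W^\perp}(vv^\top)\big)_{ab}^2 \big] \Big)^{1/2}
  \;=\; 0,
% since 0 \le \tilde{\mathbb{E}}_{\mu}\big[(\Pi_{W^\perp}(vv^\top))_{ab}^2\big]
%         \le \tilde{\mathbb{E}}_{\mu}\big[\|\Pi_{W^\perp}(vv^\top)\|^2\big] = 0.
% Hence \tilde{\mathbb{E}}_{\mu'}\big[\Pi_{W^\perp}(vv^\top)\big] = 0,
% i.e. \tilde{\mathbb{E}}_{\mu'}[vv^\top] \in W.
```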
So the bottom line is that also in the pseudo-distribution case, the matrix E_{mu'}[vv^T] is inside the subspace W. For this to make sense, the structure theorem has to work with these degrees: it gives a polynomial of the right degree, and we'll see in the proof of the structure theorem why this degree comes up.

OK, I think I may have made this a little more complicated than it needs to be, because it's actually very simple. The algorithm itself is very simple: the content is that you somehow get all these cancellations, so that the expectation matrix is roughly rank one, and the rest is Cauchy-Schwarz. And again, for an actual distribution the containment is obvious: if you have a distribution over things in the subspace, then no matter how you reweigh the probabilities, it's still a distribution over things in the subspace.

So basically the whole thing boils down to the structure theorem, which I'm going to phrase now, both in the version for pseudo-distributions and in the version for actual distributions. One of the interesting things about this algorithm: sometimes you have results that are super trivial for actual distributions, and the hard part is to show that they extend to pseudo-distributions. This is a case where the structure theorem is actually interesting for actual distributions as well; in fact, for actual distributions it's a generalization of Lovett's result on the log-rank conjecture, which was the first significant advance on the log-rank conjecture in something like 30 years. So it's one of those examples (I don't think it's the only one, but a good one) where the statement is non-trivial even for actual distributions, and in some sense the heart of the matter is proving it for actual distributions; extending it to pseudo-distributions is not that big a deal. When you think about this theorem, you should pretend it's about an actual distribution, because that's where the technical core and the creative core are.

But let me state it first for pseudo-distributions. For every epsilon there is some k = Õ(√n), where the Õ hides factors polylogarithmic in n and polynomial in 1/epsilon, such that for every degree-(k+2) pseudo-distribution mu over the unit sphere there exists an SoS polynomial p of degree at most k, normalized so that E_mu[p] = 1, such that setting mu' = p mu, which we'll sometimes call a degree-k reweighing of mu, we get that the norm of E_{mu'}[vv^T] minus v0 v0^T is at most epsilon times the norm of v0 v0^T, where v0 = E_{mu'}[v].
Basically, the way you think about it is that you reweigh the probabilities, and p is actually going to be an SoS polynomial, so it is a proper reweighing. Maybe I should have made this an exercise; think of it as one: show that this mu' is at least a degree-2 pseudo-distribution. In general, if you take a pseudo-distribution of degree l and reweigh it by an SoS polynomial of degree k, you are left with a pseudo-distribution of degree at least l - k. It's an easy exercise, actually. So if we define mu' this way, we get exactly what we need. We'll see the proof, and the proof is actually constructive: it will give you a way to find p. Of course, just to solve the distinguishing question, the algorithm could have stopped at step one (if there is a pseudo-distribution, output "yes"; if there isn't, output "no"), and that would be enough, but typically you want to actually find the solution, and this way we can.

So let me now give the version for actual distributions; call the conclusion above (star). For the structure theorem we don't care that these matrices lie in W; it's a general theorem about any distribution over the unit sphere: you can always reweigh so that the expectation matrix is approximately rank one, if you're willing to pay for it. The statement is: for every distribution mu over the unit sphere, there exists a mu' with KL divergence D(mu' || mu) at most k satisfying the same conclusion (star). For actual distributions, the natural replacement for the condition "reweigh by a degree-k polynomial" is this: you can reweigh the probabilities however you want, making some larger and some smaller, but on average you shouldn't change any probability by a factor of more than 2^k. That's the KL divergence; we've talked about it already in this class, but to remind you, D(mu' || mu) is the expectation under mu' of log(mu'(v)/mu(v)). You can also think about it in terms of entropy or information: you learn at most k bits of information about the vector and use that to change the probabilities, so the information between mu' and mu is at most k bits. So the degree of the polynomial and the k in the divergence bound play the same role.
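For the exercise just mentioned, here is the one-line verification (my write-up, with the degrees made explicit):

```latex
% Claim: if \mu is a degree-\ell pseudo-distribution and p = \sum_j s_j^2 is SoS of
% degree k with \tilde{\mathbb{E}}_{\mu}[p] = 1, then \mu' = p \cdot \mu is a
% pseudo-distribution of degree \ell - k.
% Proof: for any polynomial q with \deg(q) \le (\ell - k)/2,
\tilde{\mathbb{E}}_{\mu'}\big[ q^2 \big]
  \;=\; \tilde{\mathbb{E}}_{\mu}\big[ p\, q^2 \big]
  \;=\; \sum_j \tilde{\mathbb{E}}_{\mu}\big[ (s_j q)^2 \big] \;\ge\; 0,
% since \deg\big((s_j q)^2\big) \le k + (\ell - k) = \ell, each term is the
% pseudo-expectation of a square, hence non-negative; and
% \tilde{\mathbb{E}}_{\mu'}[1] = \tilde{\mathbb{E}}_{\mu}[p] = 1.
```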
In the Bayesian framework, what you could say is that mu' represents the result of learning k bits about the unknown solution: the distribution mu represented your uncertainty about the solution, you went and asked for k bits, and now you know more and can use that. [A correction from the audience.] Oh yes, right, you're right, that makes more sense: it's the log of the quotient, as written; if on average the way you get mu' is by shifting the probabilities by a factor of at most 2^k, then the divergence is at most k.

And let me state the version for flat distributions, which might be the easiest to think about and captures the heart of the matter. It basically says the following: for every mu, where now you think of mu as a finite set {v_1, ..., v_N} of unit vectors, there exists a subset I of the indices that is not too small, |I| at least 2^{-k} N, such that the expectation over i in I of v_i v_i^T satisfies (star). So in this version mu is a flat distribution, uniform over some set of capital-N vectors, where N could be huge, and a KL divergence of k from mu basically means that mu' is uniform over a subset of density at least 2^{-k}. [Is it the same k?] Yes, it's the same k; these are all Õ(√n). All the theorems basically say the same thing, and the way we'll proceed is, in some sense, to prove the flat version (that's the version we'll think about in our proof) and then show how it extends to the other versions.

[Question about what the reweighing achieves.] Right: the reweighing makes the expectation matrix roughly rank one. So another way to say it, maybe the cleanest: given an arbitrary distribution on the (n-1)-dimensional sphere, you can spend Õ(√n) bits of divergence and get a distribution in which the top eigenvalue of E[vv^T] dominates all the others combined. If you look at the covariance matrix, which is E[vv^T] minus the rank-one part coming from the expectation vector, it is small compared to the expectation part: yes, the variance is small compared to the expectation.

[Is it somehow key that the variance is merely small compared to the expectation, rather than killed entirely?] I don't know; note that you start with an arbitrary distribution and reweigh only by a small amount, so you cannot hope to kill the variance outright. One intuition is that you have a kind of win-win situation, with two cases. One case is that the original distribution was very concentrated on some small subset of directions; then by learning a few bits of information I can essentially pin it down completely. The other case is that the distribution has high entropy.
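Going back to the flat version for a second: the correspondence between "subset of density 2^{-k}" and "KL divergence k" is easy to check numerically. A tiny sketch of mine, with toy numbers (the lecture's k is of course Õ(√n), not 10):

```python
import numpy as np

def kl_bits(p, q):
    """KL divergence D(p||q) in bits between two discrete distributions."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

N, k = 1 << 20, 10
mu = np.full(N, 1.0 / N)        # uniform over N vectors
size = N >> k                   # subset of density 2^{-k}
mu_prime = np.zeros(N)
mu_prime[:size] = 1.0 / size    # uniform over the subset
print(kl_bits(mu_prime, mu))    # prints 10.0, i.e. exactly k bits
```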
For the high-entropy case, think of the distribution as a completely random vector. Actually, that's a good example to think about, so let me do it; it makes life very easy. Suppose the original mu is uniform over {+-1/√n}^n; these are unit vectors, so think of v as uniform over these. Now define mu' to be the distribution where you fix the first, say, 100√n coordinates to +1/√n, and leave the rest uniform as before. So you only fixed 100√n out of n coordinates; think about it, you didn't fix a lot. But now v0 = E_{mu'}[v] is basically the vector that is +1/√n on those first 100√n coordinates and zero on the rest, so the squared norm of v0 is 100√n times 1/n, which is 100/√n. And what about the rest? The rest is the covariance matrix of the remaining part: over the remaining (roughly n) coordinates, E[v' v'^T] is basically (1/n) times the identity. One way to see it is to just compute: the expectation of v'_i v'_j is zero if i and j are different, and 1/n if i equals j. So the squared Frobenius norm of this matrix (I typically get these things wrong, so let's be careful: it's (1/n) times the identity on about n coordinates) is n times (1/n)^2, which is 1/n, and hence the norm of E[v' v'^T] is 1/√n. So you see: the norm of the variance part is 1/√n, while the norm of v0 v0^T is the squared norm of v0, which is 100/√n. The vector v0 dominates. This is a good example to keep in mind, and it also demonstrates what I said before: we actually fixed a very small part of the coordinate space, so if you look at every individual vector, most of its mass comes from the part we didn't fix; but overall, when you take the expectation, that large part gets cancelled out. And the structure theorem says that it turns out we can always do this.

OK, so I think this might be a good point to take a break; I'll give you an extra-long break of ten full minutes. Try to ponder this example, because in some sense, if you understand this example, you understand what the structure theorem is all about. Make sure you understand it; if you don't, ask me questions after the break, and then we'll go on to proving the theorem.
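Since the break is for pondering this example, here is a numpy simulation of it (my own sketch; to keep n small I fix c√n coordinates with c = 5 instead of the lecture's 100, but the phenomenon is the same):

```python
import numpy as np

rng = np.random.default_rng(0)
n, c, samples = 400, 5, 20_000
m = int(c * np.sqrt(n))                      # fix the first m = c*sqrt(n) coordinates

V = rng.choice([-1.0, 1.0], size=(samples, n)) / np.sqrt(n)
V[:, :m] = 1.0 / np.sqrt(n)                  # mu': first m coordinates pinned to +1/sqrt(n)

v0 = V.mean(axis=0)                          # ~ (1/sqrt(n),...,1/sqrt(n),0,...,0)
E = (V.T @ V) / samples                      # empirical E[v v^T]
print(np.linalg.norm(np.outer(v0, v0)))      # ||v0||^2 = m/n = c/sqrt(n) = 0.25
print(np.linalg.norm(E - np.outer(v0, v0)))  # ~ sqrt(n-m)/n ~ 0.04: variance part, dominated
```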
One way to take a big-picture view of this algorithm is to say that, to some extent, ignorance is bliss, in the following sense. Think of mu as really capturing your uncertainty about the actual solution. Then you have two options. Either mu has low entropy, and then (imagine again the situation of a distribution over {+-1/√n} vectors) you fix √n coordinates and the solution is essentially completely determined, and we are very happy. Or mu has high entropy, and then we want to use the structure of this problem: the problem really has the property that if vv^T is in the subspace and v' v'^T is also in the subspace, then linear combinations are also in the subspace, so you have the ability to average all these solutions out. So you take a bad thing (the distribution mu having high entropy is a bad thing; it corresponds to your ignorance, it says you have no idea where the solution is) and you turn it into a good thing: you say, okay, I have this huge uncertainty about the solution; I pretend mu is an actual distribution, I average things out, and I get a different solution that's in the subspace. It's somewhat weird, but the nice thing is that it works, or at least it seems to work; if you find a bug, let me know.

The high-entropy case is typically the more challenging one. Typically, if you try to analyze an algorithm in this sum-of-squares framework, where you pretend you're given a distribution capturing your uncertainty about the solution, you think of the high-entropy case. Indeed, for hard problems like 3SAT, that's basically the issue: the Grigoriev pseudo-distribution built for 3XOR looks like a very high-entropy distribution, and there we basically couldn't get off the ground; given that pseudo-distribution, we don't really know what to do, because the problem has this structure where there can be a huge, exponential number of solutions (say, solutions to 3SAT or 3XOR equations) that look completely random, and we really don't know what to do with them. But here the problem has this nice structure: if we get two solutions, we can do something with them, because their average also has the relevant structure, and that's what's useful. So that was the big picture.

[In the actual-distribution case, can we always assume flatness?] In the actual-distribution case there's no real problem: the heart of the matter is the flat case, and indeed in the log-rank setting people always think of flat distributions. Up to epsilon-net considerations, a distribution can essentially without loss of generality be thought of as a flat distribution over a multiset.

So let me now explain what all this has to do with the log-rank conjecture.
We're now in the actual-distribution case: mu is uniform over {v_1, ..., v_N}, a subset of the unit sphere. And the point now is really that capital N could be huge; think of it as almost infinite (by epsilon-net considerations it's probably going to be at most exponential in little n, but think of it as a completely huge number, out of our control). Our goal is to find some subset I of [N], not too small, namely |I| at least 2^{-Õ(√n)} N, such that if we define the matrix M to be the expectation over i in I of v_i v_i^T, then lambda_1(M) dominates the remaining eigenvalues in l2: lambda_1(M) is at least (1/epsilon) times the square root of the sum of the squares of the other eigenvalues. I should comment that the whole thing generalizes, and the reason for the square root of n really comes from dominating in the l2 norm; if you only wanted to dominate in, say, the l5 norm, you would get something like n^{1/5}. We don't know how to use that at the moment, but it's possible that one could. Anyway, that was a parenthetical remark. So this is what we want to do: find a subset of [N] such that the first eigenvalue dominates the rest. And notice this goal is scale-free: it doesn't really matter whether you put an expectation or a sum.

Now, this matrix M is an n x n matrix, but it has the same nonzero spectrum as the matrix (I don't know what to call it, let's call it M') where you do things the other way around: M' is the N x N matrix of inner products of v_1, ..., v_N. This one is a small-n by small-n matrix and that one is a capital-N by capital-N matrix, but the beauty of linear algebra is that they have the same nonzero spectrum. Why? Because they basically involve the same vectors; note also that the rank of M' is only little n, so if two matrices are built from the same vectors, they have the same number of eigenvalues, and surely the eigenvalues are the same. [Laughter] You can trust me on that; it's actually not so hard to show. If you want to be more serious, one way is to do the calculation and show they have exactly the same eigenvalues; one nice way uses the fact that trace(AB) = trace(BA), which shows these matrices have the same trace, and also the same traces of all powers, and if you have the same traces of all powers, you must have the same spectrum. There are various ways to show it; it's a basic fact of linear algebra that the two matrices have the same nonzero spectrum.
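This linear-algebra fact is easy to check numerically; a quick sketch of mine:

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 5, 40
V = rng.standard_normal((N, n))
V /= np.linalg.norm(V, axis=1, keepdims=True)   # rows v_1..v_N on the unit sphere

M  = (V.T @ V) / N                              # n x n:  E_i[v_i v_i^T]
Mp = (V @ V.T) / N                              # N x N:  Gram matrix <v_i, v_j>/N

print(np.round(np.linalg.eigvalsh(M), 6))
print(np.round(np.linalg.eigvalsh(Mp)[-n:], 6)) # top n eigenvalues; the other N-n are 0
```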
So let's look at the matrix M': it's an N x N matrix whose (i, j) entry is the dot product of v_i and v_j, and what we're asking for is a subset I such that the corresponding part of the matrix (I guess I used M for the small matrix, so let me call this one M'_I, the restriction of M' to rows and columns in I) is approximately rank one. So we can now phrase our question as follows: given, say, an arbitrary N x N matrix of rank n, can you find a submatrix of measure at least 2^{-Õ(√n)} that is approximately rank one? In fact, for our application we are fine with the restricted case where it's not an arbitrary rank-n matrix but a symmetric Gram matrix where all the vectors have the same magnitude. It turns out these restrictions are not really important, but they do make the computations easier. So this is now our question: we want to find a submatrix that is approximately rank one.

And this is where we start seeing the similarity to the log-rank conjecture. The log-rank conjecture comes from a work of Lovász and Saks in the late eighties, and the version I'll state is from the nineties, the one shown equivalent by Nisan and Wigderson. If you want to see the original statement of the log-rank conjecture in the communication complexity setting, I think Mika Göös is going to give a lecture in the reading group today at 1:30, and I think he's going to talk about the log-rank conjecture in the communication complexity setting. I'm not going to talk about it the way it was originally phrased; I'll talk about it in the form that Nisan and Wigderson showed equivalent. Here is the conjecture: for every N x N matrix M with +-1 entries and rank at most little n, there exists a subset I of [N] of size at least 2^{-polylog(n)} N such that the submatrix indexed by I is monochromatic. So basically what they said is: you take an arbitrary matrix that is Boolean and has small rank; you can always find a large rectangle where everything is plus one, or maybe everything is minus one. A monochromatic matrix is obviously also rank one, so we can restate this as finding a rank-one submatrix; and the other direction holds as well, because if you have a rank-one submatrix which is Boolean, then you can restrict yourself to half the rows or half the columns and get something monochromatic. So for Boolean matrices, monochromatic and rank one are basically the same thing. I should say that in the typical phrasing M doesn't have to be symmetric; I'm thinking of M as symmetric and of the same index set I on rows and columns, while usually there is a subset I of rows and a subset J of columns. It doesn't matter much; let's ignore that.

So the log-rank conjecture seems pretty relevant to what we're doing. It makes a stronger assumption, namely that the matrix is actually Boolean, and it gets a stronger conclusion, namely that the submatrix is exactly rank one. What we are looking for relaxes the assumption (we don't force the matrix to be Boolean) and relaxes the conclusion (the submatrix only has to be approximately rank one). And because the log-rank conjecture is not yet proven, what we use is something weaker.
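For reference, here are the two statements side by side, as I understand the comparison (my own write-up of what was on the board):

```latex
% Log-rank conjecture (Nisan--Wigderson form): for every M \in \{\pm 1\}^{N \times N}
% with \operatorname{rank}(M) \le n, there exists I \subseteq [N],
%   |I| \ge 2^{-\operatorname{polylog}(n)}\, N,
% such that M_{I \times I} is monochromatic (equivalently, rank one).
%
% Our relaxation: M_{ij} = \langle v_i, v_j \rangle with unit v_i \in \mathbb{R}^n
% (no Booleanity), and we only require M_{I \times I} to be approximately rank one:
%   \lambda_1(M_{I \times I}) \;\ge\; \tfrac{1}{\varepsilon}
%       \Big( \sum_{j \ge 2} \lambda_j(M_{I \times I})^2 \Big)^{1/2},
% with |I| \ge 2^{-\tilde{O}(\sqrt{n})}\, N.
```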
What is a conjecture at density 2^{-polylog(n)} is a theorem, due to Lovett, at density 2^{-Õ(√n)}. And what we're basically going to use is the proof of Lovett's theorem, actually a version of the proof due to Rothvoss, and show that the same proof, or really some extension of it, can lead to the result we are looking for. I should mention an interesting open problem here: Gavinsky and Lovett showed that the log-rank conjecture is also equivalent to a version where the rectangle doesn't have to be exactly monochromatic; it's enough that a 1 - delta fraction of the entries are one, for some constant delta (I'm not sure whether it's 1/8 or some other constant). Note this is not just a majority of the entries being one: it's almost all of the entries. In our setting we care about 1 - epsilon for small epsilon, and for the 1 - epsilon version, what implication it has for communication complexity is, I think, still open; an interesting question. [Question] Yes, I think these things are related, but the notion of approximation we're interested in might be weaker than the one they look at, and I think the exact relation is still not known.

So what we're going to do is prove our statement by following this one pretty closely, and maybe I'll say a little about how this statement is proven. To people who actually worked on the log-rank conjecture, what we're doing is a little bit sacrilegious, because dropping the +-1 condition from the log-rank conjecture is like, I don't know, dropping free trade from the platform of a political party; it seems inherent to the problem. The log-rank conjecture, the way it's typically thought about, is all about Boolean matrices, and the intuition is that the only way to get a Boolean matrix of low rank is by cobbling together disjoint rectangles; you cannot really exploit cancellations and keep the matrix Boolean. And somehow what we show is that you can get statements that don't talk about Booleanity at all. There are two ways to look at that, I think: the glass half empty and the glass half full. The half-empty view says: the log-rank conjecture is really about Booleanity, and if you manage to prove such a result for non-Boolean vectors, it means that the √n regime is not really about that; the interesting regime of the log-rank conjecture starts beyond √n, so this is probably a negative result about these techniques: since they hold even for non-Boolean matrices, they are obviously not relevant to the real log-rank conjecture. The half-full view is that maybe the log-rank conjecture is a special case of a more general statement about low-rank matrices in general, and there could be even stronger statements that would actually imply both it and our results as well.
I'll speculate at the very end about what such statements might look like.

[Question] Given what you just said, you can't know the answer to this, but do you think there's any hope of proving your result by taking Lovett's theorem as a black box? Not really, because the +-1 condition is a very strong assumption. We need to open the black box in two ways: first of all, we need the proof to be SoS-friendly; but even assuming you had an SoS proof of the log-rank statement, the fact that it assumes Booleanity is something that doesn't hold in our case, so we need to open it in that way too. And it's not even just opening it: our proof is more complicated than Lovett's proof; you can think of it as iterating Rothvoss's argument. We'll see more of that.

Before we go into the proof of the structure theorem, at least in the actual-distribution case, let's just reflect on how strange the final algorithm is if you think about it. What it does is basically dream up this pseudo-distribution mu, which doesn't really exist, over this huge set of vectors; it then builds the huge imaginary pseudo-matrix, the N x N Gram matrix of v_1, ..., v_N; and then it finds a rectangle inside this imaginary matrix and extracts a solution from it. It's kind of Inception: you really imagine that this is an actual distribution, so that there is an actual matrix, and then you find the rectangle inside this matrix. Now, after you've found this strange description, maybe you can find a more down-to-earth description of the algorithm that doesn't involve any kind of dreaming; but it is a way to potentially discover the algorithm (at least, that's the way we discovered it), and it does show how thinking about things in this way, pretending you have a distribution and seeing where that leads, can let you think of algorithms that might not look natural from other points of view.

OK, so by now I think we have done steps one through six, and now we're going to do seven, eight, and nine, so I can erase the plan. What we want to do right now is basically prove the √n statement; I'll just sketch the proof, and remember that in our setting we are fine with density 2^{-Õ(√n)} and we are fine with being, say, 99% monochromatic; we don't really need exactly monochromatic. So here is Rothvoss's proof. Assume M is N x N of rank n with +-1 entries, and let's rescale things to look more like what we're used to: M_{ij} = <v_i, v_j> = +-1/√n, with v_i in R^n, and we want a subset I of density at least 2^{-O(√n)} on which the expectation of the entries is close to +1/√n. This is what we want to prove: that there is a nearly monochromatic rectangle.
Step one: you can assume the v_i are essentially unit vectors. In our case we assumed they are unit vectors to begin with, although not Boolean; in the general Boolean case it turns out you can assume the norms are all at most one or so (it's something to do with a Johnson-Lindenstrauss-type argument, or something like that). In our setting this assumption comes for free, but it turns out to be without loss of generality in general, so let's even assume all of v_1, ..., v_N are exactly unit vectors, for simplicity.

Then what we're going to do is pick g to be a random Gaussian: g = (g_1, ..., g_n), where the g_i are independent Gaussians with mean 0 and variance 1/n. And we define I to be the set of i such that <v_i, g> is at least 100 n^{1/4} standard deviations. The probability of a Gaussian being n^{1/4} standard deviations out is about 2^{-Theta(√n)}, so the size of I is going to be roughly 2^{-O(√n)} N. Let's just figure out what the standard deviation is: the variance of <v_i, g> is the squared norm of v_i over n, which is 1/n, so <v_i, g> is distributed like a normal with standard deviation 1/√n, and the condition is basically I = {i : <v_i, g> >= 100 n^{-1/4}} (let me get this right: 100 n^{1/4} standard deviations times 1/√n is 100/n^{1/4}).

Now we want to look at the expectation over i, j in I of the entries. Suppose <v_i, v_j> = -1/√n versus <v_i, v_j> = +1/√n; we want to understand, in each of the two cases, what the probability is that both i and j land in I. Let's assume things were balanced before, say half the entries of each sign (otherwise the matrix was already nearly monochromatic to begin with). What we hope to say is that if v_i and v_j have positive inner product, then they are much more likely to land in I together than if they have negative inner product, and therefore the set I is going to be heavily biased toward the positive pairs. Because of the way Gaussians behave under rotation, it's actually not so hard to compute. If <v_i, v_j> = +1/√n, we can write v_i = w + w' and v_j = w + w'', where w' and w'' are orthogonal to w and to each other, and the squared norm of w is 1/√n; w is the direction on which the two vectors agree, and the rest is orthogonal.
By rotating, you can think of it as: they agree on the first coordinate and are orthogonal on the remaining coordinates. And if the inner product is -1/√n, then instead v_i = w + w' and v_j = -w + w''. Now we analyze, for fixed v_i and v_j, what a random Gaussian does: we can pick the component of g in the w direction and its components in the orthogonal directions independently, so <v_i, g> is <w, g_0> plus the orthogonal contribution, where g_0 is the part of g in the direction of w, and similarly for v_j. Actually, I think I'm making this too complicated, so let me simplify. We just want the probability that <g, v_i> >= 100 n^{-1/4} and <g, v_j> >= 100 n^{-1/4}, and this behaves basically like the probability that <g, v_i + v_j> >= 200 n^{-1/4}. Now there are two cases, and the vector v_i + v_j is simply longer in the case where the inner product is positive than in the case where it is negative. The probability itself is going to be 2^{-Theta(√n)} either way; we set it up that way. But the squared length of v_i + v_j is 2 + 2/√n in the positive case and 2 - 2/√n in the negative case, that is, the length gets multiplied by roughly 1 + 1/(2√n) or 1 - 1/(2√n), and this factor goes into the exponent.

So let me write down the calculation (calculations are always bad for me, but let me write this one). The probability that both i and j are in I has a baseline of 2^{-c√n} for some constant c; the baseline is the square of the probability that one of them is in I, which is 2^{-(c/2)√n}. When the two vectors have positive inner product, the probability is a little bit bigger than the baseline; when they have negative inner product, it's a little bit smaller. What is "a little bit"? It sits in the exponent: it depends on the length of v_i + v_j, which in the positive case is longer by a factor of about 1 + 1/(2√n) and in the negative case shorter by about 1 - 1/(2√n). So the point is that the joint probability in the positive case is going to be about 2^{Theta(c)} times bigger than in the negative case.
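Here is the tail calculation behind that constant-factor gap, written out (a heuristic sketch of mine, ignoring lower-order terms, and using the simplification that both events are replaced by the single event on v_i + v_j):

```latex
% g \sim N(0, \tfrac{1}{n} I), so \langle g, v_i + v_j \rangle \sim N(0, \sigma^2) with
%   \sigma_+^2 = (2 + 2/\sqrt{n})/n   if \langle v_i, v_j \rangle = +1/\sqrt{n},
%   \sigma_-^2 = (2 - 2/\sqrt{n})/n   if \langle v_i, v_j \rangle = -1/\sqrt{n}.
% With threshold 2t = 200\, n^{-1/4}, the Gaussian tail exponent is
\frac{(2t)^2}{2\sigma_\pm^2}
  = \frac{200^2\, n^{-1/2}\, n}{2\,(2 \pm 2/\sqrt{n})}
  = 10^4 \sqrt{n}\,\big(1 \mp 1/\sqrt{n} + O(1/n)\big),
% i.e. roughly 10^4\sqrt{n} - 10^4 in the positive case and
% 10^4\sqrt{n} + 10^4 in the negative case. Both probabilities are
% e^{-\Theta(\sqrt{n})}, but their ratio is about e^{2 \cdot 10^4}:
% a constant factor, bounded away from 1.
```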
When the vector is longer, the event is more likely to happen; so when v_i + v_j is longer by a relative factor of 1 + Theta(1/√n), the event happens with a constant factor more probability. This is the key point: a very small difference in the norm of v_i + v_j between case one and case two, because it gets multiplied by √n in the exponent, becomes a noticeable constant-factor difference, and you can see why the √n in the exponent comes into play.

By the way, this general method is sometimes called dependent random choice; if you search for it, you'll see it's a method that's been used in many cases in combinatorics. I think Jacob Fox recently gave a talk at the IAS about it; I didn't see the talk, but I think the video is online, and it might be worthwhile to look at it. Why is it called dependent random choice? Because we choose this one random Gaussian, and then the choice of whether to include each individual index i in the set I depends on that shared random Gaussian. Individually, every vector has the same probability of belonging to I, but two vectors with positive inner product are more likely to fall into the set together than two vectors with negative inner product, and that is what makes the final set nearly monochromatic.

[Question] Yes, let me say it a little differently. One way to think about this probabilistic argument: we are selecting g at random, and we are not doing a union bound; what we're really doing is reweighing v_i by something like e^{n^{1/4} <v_i, g~>}, where g~ = √n g, so that <v_i, g~> is a standard normal variable. So we could also phrase these things as reweighing rather than conditioning, and then we can take a Taylor approximation of this exponential function and get a polynomial. It's a randomized polynomial, but that's okay: we analyze the expectation of the reweighing polynomial, and then the argument can be made SoS.

So I think maybe we can take a break now, and if I made a mess of these calculations, you can use the break to try to figure them out yourself, because it's really not that complicated. Actually, let me give you one more minute and say one more thing. I claim that this essentially proves the statement: averaging over the choice of g, there is at least one choice of g that gives us a subset of the required size in which, say, 99% of the entries are plus. If you didn't believe me, you can complete the proof during the break.
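If you want to play with this during the break, here is a toy simulation of the dependent random choice step (my own sketch, with a much milder threshold of 2 standard deviations so the events are not astronomically rare; the inner products <g, v_i>, <g, v_j> only depend on g through the 3-dimensional span of the pair, so only that part of g is sampled):

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials = 100, 2_000_000
rho = 1 / np.sqrt(n)                    # pair inner product is +-rho = +-1/sqrt(n)
t = 2 / np.sqrt(n)                      # milder threshold: 2 standard deviations

g = rng.standard_normal((trials, 3)) / np.sqrt(n)   # N(0, 1/n) per coordinate
for sign in (+1, -1):
    # v_i = w + w', v_j = sign*w + w'' written in the basis {w, w', w''}:
    vi = np.array([np.sqrt(rho), np.sqrt(1 - rho), 0.0])
    vj = np.array([sign * np.sqrt(rho), 0.0, np.sqrt(1 - rho)])
    both = ((g @ vi >= t) & (g @ vj >= t)).mean()
    print(sign, both)                   # the +1 pairs co-occur noticeably more often
```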
Based on that, this gives us the bound we wanted: it tells us that the approximately monochromatic set is big enough. What I'm going to do after the break is show what we do in our case, where we have to do something a little more complicated.

So now our goal. We are in the same setup, except that we no longer assume anything Boolean. We are given this matrix M where M_{ij} = ⟨vᵢ, vⱼ⟩, and now we don't know anything about the inner products, we don't know that they are ±1/√n, and our goal is to find a conditioning, of probability roughly 2^{−O(√n)}, under which the moment matrix E[vv^T] becomes approximately rank one; we can also phrase our goal in that way.

So let's start by seeing what happens. Let's assume, and this is almost without loss of generality, that the vectors are originally in isotropic position. You can always transform the vectors into isotropic position without changing the problem, and you can even do the transformation in a way that the SOS proof system can follow; but really this is an assumption for exposition, and we are going to drop it later.

So we have these vectors, and we want to make the moment matrix approximately rank one. The idea is to first do the same thing as before: we choose g at random from N(0, (1/n)·Id), and we define I to be the set of i such that ⟨vᵢ, g⟩ ≥ 100/n^{1/4}, where 100 stands for some constant that may need to be large, possibly even polylogarithmic.

So we defined the set I, and now we want to understand λ₁, the top eigenvalue of this matrix. One way to lower bound it is to look at g^T (E_{i∈I} vᵢvᵢ^T) g. By the definition of I, this is at least (100/n^{1/4})², which is 100²/√n. Of course, when you do this kind of calculation you always have to divide by ‖g‖², but the way we chose g, its squared norm is very concentrated around 1, so this really is a lower bound on the top eigenvalue: λ₁ of M_I is at least about 100/√n (the constants here are loose).

So we created one eigenvalue of order 100/√n, and the question is whether we touched the others. Originally the matrix is isotropic, so the eigenvalues look like 1/n, 1/n, and so on; what's the sum of squares of these guys? It's n times (1/n)², which is 1/n. (I don't know how you guys are going to survive; this is like another three-hour lecture, and I was thinking that on Friday maybe we should add another three.) Right, so originally the sum of the λᵢ was 1, and the sum of the λᵢ² was 1/n.
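Here is a quick sketch of that eigenvalue calculation (again my own toy illustration: isotropic vectors sampled uniformly from the sphere, and a demo-sized threshold rather than the lecture's 100/n^{1/4}):

import numpy as np

rng = np.random.default_rng(1)
n, m = 200, 20000
V = rng.normal(size=(m, n))
V /= np.linalg.norm(V, axis=1, keepdims=True)    # uniform on the sphere: E[v v^T] = Id/n

g = rng.normal(0.0, 1.0 / np.sqrt(n), size=n)    # g ~ N(0, Id/n), so ||g||^2 ~ 1
tau = 2.0 / np.sqrt(n)                           # demo-sized threshold (the lecture
I = (V @ g) >= tau                               # uses ~100/n^(1/4), far too rare here)
M_I = V[I].T @ V[I] / I.sum()                    # E_{i in I} v_i v_i^T

print("g^T M_I g / ||g||^2 =", g @ M_I @ g / (g @ g))
print("lambda_1(M_I) =", np.linalg.eigvalsh(M_I)[-1], "vs isotropic 1/n =", 1 / n)

The Rayleigh quotient with g itself already certifies the boosted top eigenvalue, which is the whole point of conditioning on I.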
Let's look at the before-and-after picture; it's the inverse of what you usually have. Before, you have this very skinny guy: the sum of the λᵢ² is 1/n, that's the isotropic case. And after, you have this beer belly: λ₁² alone is bigger than 1/n, something like 100/n. Now, if that's the only place where we made this guy fat, we are very happy: we are done, this matrix is approximately rank one. But this is an arbitrary distribution, we have no control over it, and it could very well be that in trying to do this we fattened it everywhere. I was reading Flat Stanley to my son yesterday (the kid is flat, and his brother inflates him with a bicycle pump from the bottom up), so it's very similar.

So the good case is that we applied this Gaussian and somehow increased only one eigenvalue without touching the others. But maybe we increased the others too. The usual move in this kind of situation is to say: fine, we have still made progress. In particular, if this happened, look at the total Frobenius norm squared: originally it was 1/n, and now it is at least about 100/n. So we made progress, and if this keeps happening, it cannot repeat more than log n times.

So the plan is: if λ₁² dominates the Frobenius norm squared, we declare victory and go home. Otherwise, intuitively, what we are going to do is choose a new Gaussian g′, still from a normal distribution, but now not based on the isotropic moment matrix: based on the moment matrix of the new distribution. That's the general step. In fact, this is what we did before, I just didn't say it: we chose g from N(0, (1/n)·Id), and that was exactly N(0, E[vᵢvᵢ^T]). More generally, we are going to choose g whose covariance matches the moment matrix of our current distribution.

If you haven't seen this before: when you select g from N(0, M) for some PSD matrix M, what it really means is that in the eigenbasis of M, thinking of M as having eigenvalues λ₁, …, λ_n, you select gᵢ from N(0, λᵢ), which means that E[gᵢ²] = λᵢ. In particular, E[‖g‖²] = tr(M), which in our case is 1: the distribution is always over unit vectors, so the trace of E[vv^T] is always 1.
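Since this sampling step is easy to get wrong, here is a minimal numpy sketch of it (my own illustration; M below is just a random trace-one PSD matrix standing in for the moment matrix):

import numpy as np

rng = np.random.default_rng(2)
n = 50
A = rng.normal(size=(n, n))
M = A @ A.T
M /= np.trace(M)                       # our moment matrices have trace 1

# Sample g ~ N(0, M): draw independent g_i ~ N(0, lambda_i) in M's eigenbasis,
# then rotate back to the standard basis.
lam, U = np.linalg.eigh(M)
lam = np.clip(lam, 0.0, None)          # guard against tiny negative round-off
gs = (U * np.sqrt(lam)) @ rng.normal(size=(n, 20000))   # one sample per column

print("empirical E||g||^2 =", (gs ** 2).sum(axis=0).mean())       # ~ trace(M) = 1
print("covariance error:", np.abs(gs @ gs.T / 20000 - M).max())   # small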
Now we want to understand the expectation of ⟨v, g⟩². Write it in the eigenbasis: ⟨v, g⟩ is a mean-zero normal variable, and we want its variance. It's the sum over i of vᵢ²gᵢ²: the gᵢ² contributes λᵢ, and in expectation over v, vᵢ² in the eigenbasis is also λᵢ. So the variance is Σᵢ λᵢ², the Frobenius norm of M squared. Basically, for a typical v from this distribution, ⟨v, g⟩ behaves like a normal with mean 0 and variance ‖M‖_F².

So we define I to be the set of i such that ⟨vᵢ, g⟩ is at least 100·n^{1/4} standard deviations, where a standard deviation is the Frobenius norm ‖M‖_F. That gives us a set of the right kind of size. And now we do the same analysis: we define M_I to be E_{i∈I} vᵢvᵢ^T, and we look at g^T M_I g. We know this is at least (100·n^{1/4}·‖M‖_F)², which is 100²·√n·‖M‖_F². Because the Frobenius norm is always at least 1/√n, this is at least 100²·‖M‖_F. So we get that λ₁ of the new M is larger than 100 times the Frobenius norm of the old M, and in particular the contribution of λ₁² alone is already much bigger than the old Frobenius norm squared. So basically you see that if you are not completely done in some sense, then you are increasing the Frobenius norm. That's the outline of the argument for actual distributions: each step increases the Frobenius norm squared by a constant factor.

Yes, so you only need log n of these rounds. "But what is the Frobenius norm of the old M?" What? "The Frobenius norm of the old M; you know that the first eigenvalue is the same, right?" Yes, so the increase in the first eigenvalue already does it; basically what we know is the following. Let's look at the bad case. Originally, in step 0, the Frobenius norm squared is 1/n. In step 1, λ₁² is, say, like 100/n, but we are either done, or the reason we are not done is that the rest of the matrix brings the Frobenius norm squared to, say, 200/n: λ₁ is not all of the Frobenius norm, there is still some left. So in step 2 we do this tweak again, and now we get an even better eigenvalue: λ₁² is going to be like this 200/n, but the reason we are not done is that maybe the Frobenius norm squared is now, say, 400/n. So in step 2 we have increased again, because what this argument says is that if the current matrix is not isotropic, we get more. Before, we only said that the new λ₁² will be at least 100/n; now we say that if the Frobenius norm is larger than 1/√n, we get a correspondingly larger λ₁. So as long as the Frobenius norm keeps being larger, we keep getting a better λ₁, and we cannot repeat this more than log n times, because the Frobenius norm squared has an absolute upper bound: it is always between 1/n and 1. Yes, that's the idea, that's the important part; that's why it's only log n. Exactly: we cannot double it too many times.

So this is the kind of argument for actual distributions. As you'd guess, if you actually go and look at the paper, things are a little bit more complicated there.
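Before getting to those complications, here is a toy end-to-end version of the loop we just described (my own pseudocode for the argument, not the paper's algorithm: the "distribution" is a finite sample, and I discard points instead of reweighing, so the demo runs out of samples after a couple of rounds; with demo-sized thresholds you only see the Frobenius norm creep upward rather than λ₁ fully dominating):

import numpy as np

rng = np.random.default_rng(3)
n, m = 60, 50000
V = rng.normal(size=(m, n))
V /= np.linalg.norm(V, axis=1, keepdims=True)    # isotropic start: E[v v^T] ~ Id/n

for step in range(6):
    M = V.T @ V / len(V)                         # current moment matrix E[v v^T]
    frob2 = np.linalg.norm(M, 'fro') ** 2
    lam1 = np.linalg.eigvalsh(M)[-1]
    print(f"step {step}: frob^2 = {frob2:.4f}, lambda_1^2 = {lam1**2:.4f}")
    if lam1 ** 2 >= 0.5 * frob2:                 # lambda_1 dominates: declare victory
        break
    g = rng.multivariate_normal(np.zeros(n), M)  # g ~ N(0, M), as described above
    V = V[V @ g >= 2.0 * np.sqrt(frob2)]         # condition on a ~2-sigma projection
    if len(V) < 100:                             # the sample runs dry: a real proof
        break                                    # reweighs instead of discarding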
Basically, the general idea is the same, but you have to be a little more careful, and there are two main ideas involved that I think are useful in general, so I want to mention them.

The first is something I swept under the table before. I said that ⟨v, g⟩ will be like a normal with this variance, which assumes that every vector involved is typical, and things might behave very non-typically. So there is a general statement we want, and there are two steps we need, actually even for actual distributions if you want to turn this into an actual proof, and definitely for pseudo-distributions. The first is fixing scalar quantities.

Let's remember the rules of the game. We have a distribution, and we want to restrict it to an event that happens with probability roughly 2^{−√n}, where nice things happen. Restricting to an event of probability 1/poly(n), which is only 2^{−O(log n)}, is essentially free for us: whenever we don't like the distribution, we can condition on any event of probability at least 1/poly(n), and that's not something we worry about.

In particular, that's useful in the following situation. Suppose you have a distribution μ over v, and some scalar function f from Rⁿ to R, and suppose f takes values in [0, n], or [0, poly(n)]. Then for actual distributions we can ensure that f lies in some interval a ± a/poly(n), with probability 1. Right: we split the range [0, n] into polynomially many intervals; one of the intervals must contain at least a 1/poly(n) fraction of the mass; and we condition on being inside that interval. So for actual distributions we can fix a scalar, with probability 1, at a cost of O(log n).

For pseudo-distributions, we can never really condition, so we can never get something with probability 1. (If you're really pedantic, you can say that we can never even talk about probabilities for a pseudo-distribution, but let's not get carried away.) What you get instead is probability 1 − 2^{−t}, at a cost of basically O(t·log n). So in some sense you can never fix quantities exactly, but you get the effect, including the ability to do union bounds and so on, of fixing them with probability 1 − 2^{−t}.
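As code, the actual-distribution interval recipe is just a few lines (a sketch; fix_scalar and all its parameters are hypothetical names of mine, and num_intervals = poly(n) is the knob that keeps the conditioning event "1/poly(n)-free"):

import numpy as np

def fix_scalar(samples, f, num_intervals):
    """Condition on f landing in the heaviest of num_intervals equal-width
    buckets; returns the restricted samples and the chosen interval."""
    vals = np.array([f(s) for s in samples])
    edges = np.linspace(vals.min(), vals.max(), num_intervals + 1)
    idx = np.clip(np.digitize(vals, edges) - 1, 0, num_intervals - 1)
    best = np.bincount(idx, minlength=num_intervals).argmax()
    keep = idx == best
    return samples[keep], (edges[best], edges[best + 1])

rng = np.random.default_rng(4)
samples = rng.normal(size=(10000, 8))
restricted, (lo, hi) = fix_scalar(samples, lambda v: np.linalg.norm(v) ** 2, 100)
print(f"fixed ||v||^2 to [{lo:.2f}, {hi:.2f}] on {len(restricted)} of 10000 samples")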
Now, what does it actually mean to fix a quantity for a pseudo-distribution? The statement you want to prove is the following. Think of t as an even number, and suppose you want to fix f up to accuracy ε. Then you want the pseudo-expectation of (f(v) − a)^t to be at most (εa)^t. This is an expectation statement that we can at least write down for a pseudo-distribution, and the kind of theorem you want is that there is always a conditioning of the pseudo-distribution that ensures it. The way we interpret this statement is: the probability that |f(v) − a| is larger than 2εa is less than 2^{−t}.

So this is how you translate. The typical way to port a result from actual distributions to pseudo-distributions is: first, make sure that all the manipulations you do to actual distributions only involve reweighing by polynomials. Already at that point you have to deal with the fact that you cannot ensure things happen with probability 1, only with probability 1 − 2^{−t}, and a lot of the complications happen right there. And then you have to make sure that everything you use in the analysis is SOS.

So one of the steps, if you go and look at the proof, is fixing scalar quantities, to make sure they don't vary too wildly. But you also cannot afford to fix too many scalar quantities: we can't fix the projection of vᵢ onto every one of the individual eigendirections. So instead of looking at individual eigenvectors, we look at higher-dimensional eigenspaces. The general statement looks like this. Take the matrix M = Ẽ[vv^T] and write it in its eigenbasis, with eigenvalues λ₁ ≥ … ≥ λ_n. First, here is an easy claim: the first √n eigenvalues contribute, say, λ₁² + ⋯ + λ_{√n}² at least half of Σᵢ λᵢ². Why? The eigenvalues are sorted, so the first block of √n of them contributes at least as much as every other block of this size; and moreover every λᵢ with i > √n is at most 1/√n, so the tail Σ_{i>√n} λᵢ² is at most 1/√n, which is small compared to the Frobenius norm squared in the regime we care about. So that's just a calculation.
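For the record, the step from the moment bound to the probability interpretation is one line of Markov; this is my reconstruction of what the lecture states without proof, using that t is even, so |f(v) − a| > c is the same event as (f(v) − a)^t > c^t:

\[
\Pr\bigl[\,|f(v)-a| > 2\varepsilon a\,\bigr]
= \Pr\bigl[(f(v)-a)^t > (2\varepsilon a)^t\bigr]
\le \frac{\tilde{\mathbb{E}}\,[(f(v)-a)^t]}{(2\varepsilon a)^t}
\le \frac{(\varepsilon a)^t}{(2\varepsilon a)^t}
= 2^{-t}.
\]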
What this claim means is the following. Look at the distribution in the first √n eigendirections. Say you have a distribution over v with some mass on a subspace, capital V: the trace of E[vv^T] restricted to the first ℓ directions is exactly the expectation of the squared norm of the projection of v onto those ℓ directions. If you manage to fix this projection so that it is always the same vector, then you are taking all this mass and moving it into one single eigenvalue: in the new moment matrix, the new top eigenvalue is the sum of these eigenvalues. Does this make sense? And that is exactly what we want: one eigenvalue that dominates the new Frobenius norm. So the way we do it is: look at the top √n-dimensional eigenspace, and fix the distribution inside that eigenspace; that collapses all this mass onto one eigenvector, whose eigenvalue is the sum of these eigenvalues.

So the first thing we do is fix a scalar: we look at the matrix, we look at the eigenspace, and we use scalar fixing to fix the total norm of the projection into this eigenspace. So we can assume that the squared norm of the projection of the vector onto the top √n directions is essentially a fixed number. And now we want to fix the vector inside the subspace. Fixing a vector inside a subspace of dimension ℓ, loosely speaking, you should be able to do with ℓ rounds of degree ℓ. It's like playing twenty questions: you have ℓ coordinates, so you can ask ℓ questions about them. You pay ℓ rounds of degree ℓ, or a 2^{−ℓ} probability, however you want to count it.

So you have to prove a lemma like that, and you would imagine that the people in the SOS world have already proven it, but somehow they didn't prove it with the right parameters. So we prove something like the following. If μ is an actual distribution over, think of it as R^d, whatever is going on in some d dimensions, then, and this is an epsilon-net argument, we can find μ′, whose KL divergence from μ is at most something like d/ε, and some vector v₀, such that with probability 1 over v drawn from μ′, the norm of v − v₀ is at most ε. And this we can do by an epsilon-net argument: you take an epsilon-net, you take the point of the net that is most likely, and you condition on being close to it.

So we get a version of that for actual distributions, and the pseudo-distribution version basically says: if μ is a pseudo-distribution over unit vectors in R^d, then you can reweigh, with degree something like O(d/ε) I think, so that the new pseudo-distribution has 1 − ε of its weight here and the rest elsewhere. That is, you get some v₀ such that the pseudo-expectation of ⟨v, v₀⟩² is at least 1 − ε.
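Here is the actual-distribution version of that epsilon-net step as code (my toy version: I quantize to a grid rather than building a true epsilon-net, which is the same idea up to a √d factor in the accuracy):

import numpy as np

def fix_vector(samples, eps):
    """Quantize each sample to a grid cell of side eps, pick the most likely
    cell, and condition on it; v0 is a representative point of that cell."""
    keys = np.round(samples / eps).astype(int)
    _, inv, counts = np.unique(keys, axis=0, return_inverse=True, return_counts=True)
    keep = inv == counts.argmax()
    v0 = samples[keep].mean(axis=0)
    return samples[keep], v0

rng = np.random.default_rng(5)
d = 3
samples = rng.normal(size=(20000, d))
samples /= np.linalg.norm(samples, axis=1, keepdims=True)   # unit vectors in R^d
restricted, v0 = fix_vector(samples, eps=0.3)
print(len(restricted), "samples kept near v0 =", np.round(v0, 2))
print("max distance to v0:", np.linalg.norm(restricted - v0, axis=1).max())

The conditioning event has probability at least about one over the net size, which is where the d·log(1/ε)-ish divergence cost comes from.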
Yes? So the question is whether, instead of talking about reweighing by polynomials, you could talk about an approximate KL divergence between pseudo-distributions, taking the usual definition of KL divergence with a Taylor truncation. I think that might be possible. I don't think anyone has done it, but it might be possible; the part about moving to polynomials seems a little mechanical. Basically, one way to say it is: if you are willing to spend about d/ε in dimension d, you can completely fix things, and "completely fix things" would be the analog of this: you dominate in the trace norm, rather than just in the Frobenius norm. So in some sense what you want to say is that you find this √n-dimensional subspace and fix the vector inside it, which says that the subspace part is dominated in trace norm, and that in turn means you dominate the entire thing in Frobenius norm. You might have created some new large eigenvalues by doing this, and then you have to repeat, but you are not going to repeat too many times.

And now let me just end by saying this. The structure theorem, the way I stated it, is tight: the plus-minus-one vector example I started with shows that you cannot do better. But there are several directions, very interesting to me, in which you could potentially try to get beyond it, and let me give you one that we really haven't had time to think about much, so maybe one of you can find an easy counterexample. Let me state it as a conjecture, though I don't know if it really deserves the name. Suppose μ is the uniform distribution over this plus-minus-one example. The question is whether you gain from a potentially negative reweighing function F. What do I mean by that? Some F such that the expectation over μ of |F(v)| is 1, and F has low degree, let's say o(√n); maybe start with n^{1/3}, even that would be interesting; such that E_μ[F(v)·vv^T] is roughly rank one. And it should really be rank one, not rank zero: the reweighed matrix should not vanish. So basically: could we get more power by reweighing with negative coefficients?

And why would we hope for something like that? Think of this plus-minus-one example; and OK, maybe not even rank one, let's say rank O(1) or something like that. In the Gaussian example, we knew that with fewer than about √n rounds you cannot really dominate in the Frobenius norm. So suppose you reweigh with only n^{1/3} rounds. Then, in the plus-minus-one case, you can get a distribution whose moment matrix looks like v₀v₀^T + (1/n)·Id, where the squared norm of v₀ is now 1/n^{2/3}. The Frobenius norm squared of v₀v₀^T, which is ‖v₀‖⁴, is 1/n^{4/3}: very small compared to the 1/n coming from the identity part, so it does not dominate in the Frobenius norm. But suppose we did it twice. One reweighing gives us v₀, and another reweighing, also with n^{1/3} rounds, gives us v₁v₁^T plus the same identity part. Now we can subtract the two.
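To see the arithmetic concretely, here is a tiny numerical check (my own illustration; v₀ and v₁ are just random directions with the norms the lecture specifies, standing in for whatever the two reweighings would produce):

import numpy as np

rng = np.random.default_rng(6)
n = 200
scale = n ** (-1.0 / 3.0)                     # so that ||v_k||^2 = n^(-2/3)
v0 = rng.normal(size=n); v0 *= scale / np.linalg.norm(v0)
v1 = rng.normal(size=n); v1 *= scale / np.linalg.norm(v1)

M0 = np.outer(v0, v0) + np.eye(n) / n
M1 = np.outer(v1, v1) + np.eye(n) / n
D = M1 - M0                                   # the identity parts cancel exactly

print("rank(D) =", np.linalg.matrix_rank(D))  # -> 2
print("frob^2 of v0 v0^T =", np.linalg.norm(np.outer(v0, v0), 'fro') ** 2,
      "vs frob^2 of Id/n =", np.linalg.norm(np.eye(n) / n, 'fro') ** 2)

The identity parts cancel exactly, so the difference has rank two even though neither vₖvₖ^T comes close to dominating the Frobenius norm of its own matrix.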
So we get basically a rank-two matrix. And if you think of our original application, that would still be super interesting: a rank-two matrix that lies inside our subspace W. For two such reweighings you would get that E_{μ₁}[vv^T] − E_{μ₀}[vv^T] is a matrix in W of roughly rank two. So if this conjecture is true, and true for pseudo-distributions, you could get a very interesting improvement in the algorithm here. In fact, some version of it might also be useful for the log rank conjecture, because there too, if you find a rank-two submatrix, then you can find a monochromatic rectangle of measure one quarter, or something like that. Again, there might be an issue there too, but basically: if you allow negative reweighing, I don't know of a good counterexample, and we would be interested in one. Maybe there is some obvious counterexample that we are just missing, or maybe this is a way to improve things. It's hard for me to say right now how easy the question is, because we really haven't spent a lot of time thinking about it.

"If you want to achieve rank two, that's easier than rank one; and if you allow negative reweighing, that's also easier. So why does the problem get harder?" No, because I also want the degree to be only n^{1/3}. "OK, I see." Yes. So the question is whether this gives enough power to get beyond √n. The rank-two part is not so important: I think that if I restrict myself to positive reweighing, then even asking for rank five wouldn't matter; we cannot do it. For positive reweighing, rank five versus rank one might make no difference anyway, because you can always spend an extra constant number of rounds and fix yourself inside that subspace. The real question is whether negative reweighing can allow you to break this √n. And I don't know the answer.

So, sorry, I went five minutes over time.