Thanks, Makrand. Thank you, actually, to Ankit, Makrand, and Toni for organizing this fabulous workshop, and thank you for inviting me.

Okay, so this day has been all about lifting theorems, and I have only ever proved one lifting theorem in my life, which is what I'm going to talk about today. That's because my main motivations come from approximation algorithms and hardness of approximation, so in some sense lifting theorems are more of a utility to me than my bread and butter. But I'll still tell you how they are useful. It also fits nicely into the progression of talks today: we saw lifting theorems in query complexity and communication complexity, then the last talk was on extension complexity without lifting theorems, and now this talk is going to be about proving LP lower bounds and extension complexity lower bounds using a lifting theorem. So it's fitting.

All right, let's begin. This is based on joint work with Raghu Meka and Prasad Raghavendra. It almost never happens to me that I can state the main result of a talk on the first slide, but it's possible here, so I'm kind of excited about that. The main result of this talk is a lower bound on linear programming relaxations. I'm going to show you that there is some fixed constant delta_0 (think of it as 0.0000001) such that if you write an LP relaxation of size 2^(n^delta_0), then such a relaxation has the following integrality gaps. Say you want to solve the Max-Cut problem with such an LP: then the integrality gap of any size-2^(n^delta_0) relaxation is at least 2. Similarly, for Max-2-SAT the number is 4/3, for Max-3-SAT it is 8/7, and so on.

Now these numbers look funky, but they actually come from the approximation ratio of a very simple algorithm: the one that simply returns a random assignment. One of the first randomized algorithms we learn in an algorithms class is for Max-Cut: just return a random cut. There's a very simple analysis showing that in expectation the cut you return is half-optimal.
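A minimal sketch of that trivial algorithm, assuming the graph is given as an edge list over vertices 0..n-1 (the function name is mine):

```python
import random

def random_cut_value(n, edges):
    # Put each vertex on a uniformly random side of the cut.
    side = [random.choice([-1, 1]) for _ in range(n)]
    # Each edge is cut with probability 1/2, so the expected value is
    # |E|/2 >= opt/2: a factor-2 approximation in expectation.
    return sum(1 for (i, j) in edges if side[i] != side[j])
```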
And what this result is saying is that even if you augment this trivial algorithm with a mega size-2^(n^delta_0) LP to help you out, it's still not going to work: you're still, in the worst case, going to suffer a gap of 2, which is basically the performance of the trivial random-assignment algorithm. Similarly for Max-2-SAT, where the right number is 4/3, and Max-3-SAT, where the right number is 8/7, and so on.

On this slide I also want to flash the main technique for you to look out for. It's not supposed to make sense right away; it's something to watch for as the talk progresses, and you can judge whether I manage to justify it. The main technique is going to be almost an upper bound instead of a lower bound, or at least that's how I'd like to spin it. The way we prove the theorem is by showing the following. Say I give you a budget: you may write an LP of size 2^(n^delta_0). You could then ask: which constraints should I add? With this budget, which LP relaxation should I choose to solve, say, Max-Cut? It turns out that if you don't care about polynomial-factor losses, then I can describe to you a single fixed LP for any given size budget, and it's a well-studied LP, not some archaic, weirdly constructed one. It has a name: the Sherali-Adams LP. So the main result of this talk is really that given a budget of size 2^(n^delta_0), there is, up to a polynomial-factor loss, a fixed LP that is the best possible in terms of the worst-case integrality gap or approximation ratio it can give you for any of these problems, namely the Sherali-Adams LP. By just plugging in the lower bounds known for the Sherali-Adams LP, you get the results: use the known integrality gap of 2 for the Sherali-Adams LP of that size for Max-Cut, say, and you get the result on the main slide.

Before moving ahead I want to point out one somewhat disappointing fact about LPs, which you may have already observed. We know we can greatly beat the approximation ratio of 2 for Max-Cut: we can get a ratio of about 0.878 using semidefinite programming and the hyperplane rounding you may have heard about. What this result says is that if you want to do even slightly better than 0.5 (forget achieving the guarantee of a very small SDP; even if you just want to beat a random assignment a little), you need to spend a basically exponential-size budget on your linear program. So if you want to match the performance of a simple SDP on a problem like Max-Cut, you really need an exponential-size linear program, which exhibits a gap between linear programs and semidefinite programs for basic problems like Max-Cut. Incidentally, it turns out that this 2^(n^delta_0)-type result for Max-Cut is tight: you don't need 2^n-size LPs to replicate the performance of SDPs; you can do it with size 2^(n^delta) for some fixed constant delta < 1. This was proven just a few months ago, but that's not what this talk is about.
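Schematically, the optimality statement from earlier reads as follows; this is my paraphrase with an unspecified constant c, not the paper's exact theorem:

$$
\text{for every LP relaxation } L \text{ of size } s:\qquad
\mathrm{gap}(L)\;\ge\;\mathrm{gap}\big(\text{Sherali-Adams LP of size } s^{c}\big),
$$

so plugging in the known Sherali-Adams integrality gaps at size $2^{n^{\delta_0}}$ yields the table on the first slide.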
Ours is not the first result relating arbitrary linear programs to Sherali-Adams LPs. This kind of technology was first developed in a very nice paper of Chan, Lee, Raghavendra and Steurer from 2013, and what they proved was basically a similar result but with a worse bound. If you applied their result, because it was far from optimal in the sense I've written on this slide, you only obtained a quasi-polynomial lower bound; you could not obtain lower bounds against exponential-size linear programs. Their techniques broke down at some quasi-polynomial bound. So the technical punch of what I'm going to show you today is that we can take this connection and push it all the way: we show just a polynomial-factor loss between any LP and the Sherali-Adams LP, which gives us an exponential lower bound. The key technical idea is going to be proving lower bounds on nonnegative rank, which you already saw in Makrand's talk, for a certain kind of structured matrices called pattern matrices, and this we will prove using a lifting theorem. So it all fits; it is part of the same workshop. There is going to be a lifting theorem.

Okay, that's basically the summary. Before I tell you more about the result and how it works, let me give you a little context. This slide is dense, but you're not expected to read every single result. The point I'm trying to make is that people have considered various specific linear programs in the literature, like the Sherali-Adams LP and the Lovász-Schrijver LP. You don't need to know what these are, only that they are very specific, well-defined linear programs. Because these are such important problems, people have asked how good these linear programs are for solving, say, Max-Cut, which is going to be my running example for the kind of problems I'll deal with today, and they have essentially figured out all we could hope for: we know tight integrality gaps for all these specific LPs for Max-Cut, Max-2-SAT, and so on. This talk tries to go beyond those results and asks: suppose I don't want to write any of these specific named LPs, and I'm free to write any LP whatsoever as long as I stay within a size budget. How do I ascertain whether this can or cannot help me?

Good, so that's what we are after. This kind of effort was begun and formalized by Yannakakis, as Makrand already showed you, so I'll be quick on this slide. That was 1988, and not much happened after Yannakakis's monumental work until 2012, when the area broke open again with the work of Fiorini, Massar, Pokutta, Tiwary and de Wolf, who showed a 2^Omega(sqrt(n)) lower bound for the traveling salesman problem, the TSP polytope. Since then a lot of amazing work has happened. Again, the slide is too dense.
You don't need to read everything; it's just there to impress on you that a lot has happened since 2012.

[Question about the previous slide.] Good, so it's kind of my bad that I used that word there; just think of r as rounds. Since I've written it down, let me put it this way: these are not single LPs but a hierarchy of LPs, and for every number r, which we call the number of rounds, there is an n^O(r)-size LP corresponding to it, of Sherali-Adams type or Lovász-Schrijver type. So for the purposes of that slide, you should just think of r as defining an n^O(r)-size LP. [Follow-up about the exponent.] Yes, exactly: you should read Omega(n) rounds as meaning 2^Omega(n)-size LPs.

Good. So a lot has happened since 2012 in this direction of proving extension complexity lower bounds against arbitrary linear programs, and this work is about proving exponential lower bounds for approximating constraint satisfaction problems, of which Max-Cut and friends are examples. I've been using this phrase, but maybe the first question you should ask is: what does it even mean to be an arbitrary LP for Max-Cut? What kinds of LPs are you allowed to write? In principle you could be sneaky and write the following LP: take a graph G, compute its max cut by whatever algorithm, then add a constraint encoding that the max cut of G is at most that value, in some clever way. Of course I can't possibly prove a lower bound against such an LP, because you've hard-coded information about the graph G into it. So I have to be a little careful about what I mean when I say I'm proving a lower bound against LP relaxations. What I'll do is formalize what I mean by an arbitrary LP for solving Max-Cut, and then state the result and how we prove it.

Before saying what an arbitrary LP means, let's first see at least one natural LP for Max-Cut. Let me remind you what the Max-Cut problem looks like, in case you haven't seen it: you are given a graph, your task is to find a subset S of the vertices, and you want to maximize the number of edges in the cut defined by S. It's a very simple job to write this as a quadratic integer program. Take a vector x with values in {+1, -1}^n; think of +1 as labeling one side of the cut and -1 the other. Whether an edge (i, j) is cut is decided by the simple quadratic function (1 - x_i x_j)/2, and summing these terms over all the edges counts the number of edges in the cut. So maximizing this quadratic function over all possible x in {+1, -1}^n is equivalent to solving Max-Cut.
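Written out, the formulation just described is

$$
\mathrm{opt}(G)\;=\;\max_{x\in\{\pm1\}^n}\ \sum_{(i,j)\in E(G)}\frac{1-x_ix_j}{2},
$$

where each term contributes 1 exactly when x_i and x_j disagree, i.e., when the edge (i, j) crosses the cut.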
But I want to write a linear program, and this formulation has issues: it's quadratic, not linear, and it has integer variables, so we have to fix it. How? You may have seen this before, but let me do it anyway. First I have to make the quadratic objective linear, and here's one way. I'm going to introduce new variables y_ij, which are supposed to stand for whether the edge (i, j) is included in the cut; so y_ij is my proxy variable for (1 - x_i x_j)/2 in the quadratic formulation. Now I can write my objective as a linear function of the y_ij's: I'm just maximizing the sum of y_ij over all edges. Of course I have to add some constraints on the y_ij's; otherwise they have no reason to behave the way cuts behave.

So the second thing I do is add any collection of valid constraints. What does valid mean? I may add a linear inequality constraint on the y_ij's as long as it holds for all cuts, that is, for all y_ij's obtainable from x's that define cuts. Those form a special class of vectors, not all possible vectors, and any inequality that holds for all of them is a legitimate constraint to add; any collection of such constraints gives me a valid relaxation.

Here is one natural way to do this. First, the y_ij's can take values only between 0 and 1, so I can add those bounding constraints. I can also observe that in any triangle i, j, k, a cut can cut at most two of the three edges, because a triangle is an odd cycle. So I can add the odd-3-cycle constraint y_ij + y_ik + y_jk <= 2, meaning I can cut at most two edges of any triangle. And if I were feeling funky one day, I could do the same for 5-cycles, 7-cycles, and so on; there are various ways to add more inequality constraints, but this should give you an idea of one possible relaxation you might build for Max-Cut.
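Here is a minimal sketch of this relaxation in code, assuming SciPy as the LP solver and small n, since there are about n^3/6 triangle constraints (the function name is mine):

```python
from itertools import combinations
import numpy as np
from scipy.optimize import linprog

def maxcut_triangle_lp(n, edges):
    # One variable y_ij per vertex pair, a proxy for (1 - x_i x_j) / 2.
    pairs = list(combinations(range(n), 2))
    idx = {p: k for k, p in enumerate(pairs)}
    c = np.zeros(len(pairs))
    for (i, j) in edges:
        c[idx[(min(i, j), max(i, j))]] = -1.0  # linprog minimizes, so negate
    A_ub, b_ub = [], []
    for (i, j, k) in combinations(range(n), 3):
        # Odd-cycle constraint: any cut crosses at most 2 edges of a triangle.
        row = np.zeros(len(pairs))
        row[idx[(i, j)]] = row[idx[(i, k)]] = row[idx[(j, k)]] = 1.0
        A_ub.append(row)
        b_ub.append(2.0)
    res = linprog(c, A_ub=np.array(A_ub), b_ub=b_ub, bounds=(0, 1))
    return -res.fun  # the LP value, an upper bound on the true max cut
```

For a single triangle (n = 3 with all three edges) this returns 2, matching the true max cut.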
Good. Now that we've seen one specific LP for Max-Cut, it's natural to ask what an arbitrary LP might look like. What are you allowed to do? I'm going to take the procedures from the previous slide and allow them to be done in the most general possible way. The first thing we did was linearize the objective function. In the most general setting, I'll allow you to take the graph G and represent it as a vector v_G in some M-dimensional space. This M could be much larger than the number of edges; on the previous slide the graph was represented by the 0/1 vector recording which edges are present, an (n choose 2)-dimensional vector, but now I'm allowing other, completely arbitrary vector representations of the graph G. Similarly, you can take all possible cuts S and map them to vectors y_S. The only restriction is that the graphs and the cuts must map into the same number of dimensions, so that I can take inner products of them and write linear inequality constraints. That's it.

So for any mapping of graphs G to vectors v_G and any mapping of cuts S to vectors y_S, I can define an objective v_G . y_S, which should stand for the value of the cut defined by the vertex set S in the graph G. On the previous slide, v_G was the indicator vector of the edges, and y_S was the 0/1 indicator of whether each pair (i, j) is cut by the given x. [Question: should this hold for every graph?] Yes, exactly, and that's an important point: I'm constructing a linearization for all possible graphs and all possible cuts, on a fixed number of vertices, at once.

The next step we did was to add a set of valid inequality constraints; this step I'll call the relaxation step. Here I allow you to choose any polytope that contains all the y_S's corresponding to all 2^n possible cuts S: take any polytope containing all of them, write it down in inequality-constraint form, and that is a valid set of inequalities, any of which you may take.

That's what my arbitrary LP for Max-Cut looks like. I'll care about two parameters of such an LP. First, its value: if I take the LP maximum of the Max-Cut relaxation produced this way, what is it, and how does it compare to opt(G), the true max cut value? Second, its size, which today I'll measure as the number of variables plus the number of inequality constraints; M was the number of dimensions, which up to a factor of 2 or so corresponds to the number of variables, and then there is the number of inequalities defining P. Does it all make sense?

Makrand mentioned several times that you could have chosen any set of valid inequalities to define the slack matrix. Here I'm choosing inequalities of the form "the cut value in G is at most opt(G)"; if this comment doesn't parse, that's okay, it's not needed for the talk. The true max cut polytope, which may have a very large description, satisfies and derives all the valid inequalities, and I'm taking the polytope defined by those valid inequalities. Good. So that's the model of arbitrary LPs I'm going to think about.
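In my notation, an LP relaxation for Max-Cut on n-vertex graphs is the following data:

$$
v_G\in\mathbb{R}^M \text{ for every graph } G,\qquad
y_S\in\mathbb{R}^M \text{ for every cut } S,\qquad
\langle v_G, y_S\rangle=\mathrm{cut}_G(S)\ \ \forall G,S,
$$

together with a polytope $P\supseteq\{y_S : S\subseteq[n]\}$; its value on $G$ is $\mathrm{LP}(G)=\max_{y\in P}\langle v_G,y\rangle$, and its size is $M$ plus the number of inequalities defining $P$.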
Now that I've defined the model, let me tell you what it means for it to be a good one. An LP L, remember, is defined by the scheme of linearization we chose and the polytope P we chose. For any such fixed LP L, I'll measure its performance by how good an approximation it gives; this is where approximation comes in. I'll say that L gives a (c, s)-approximation if, for every graph whose true max cut value is at most s, the LP maximum is at most c. As a sanity check, this number c is of course always at least s. Why? Because we chose the polytope P to be a relaxation: it always contains the true cut vectors, so the LP can always achieve the value opt(G), hence s; in general it achieves something larger, and we're hoping c is not too far from s if we're trying to construct a good relaxation.

Very good. At this point I'm mostly rehashing Makrand's talk: we're trying to understand when (c, s)-approximating LP relaxations of a given small size exist. The way I've defined it makes it look extremely general; how do we reason about all possible linearizations and all possible polytopes at once? It seems daunting, but back in 1988 Yannakakis already made our life simple and showed that there is a fixed parameter, depending only on the problem (it needs to know neither the linearization nor the polytope; it does depend on Max-Cut, for example), that completely determines the smallest LP achieving, say, a (c, s)-approximation. Let me define this parameter now; it's the same as what Makrand described, but in the notation I need.

What Yannakakis did, or would have done had he wanted to treat Max-Cut in the approximation setting, is look at the following matrix. The rows are indexed by all graphs, on a fixed number n of vertices, whose true max cut value is at most s. The columns are indexed by all 2^n possible cuts. The (G, S) entry is c minus the number of edges in the cut S in the graph G. From this description it should be clear that the matrix is nonnegative: the optimum of G is at most s, and c >= s, so c - cut_G(S) is always nonnegative. Yannakakis proved, or rather his proof implies, that a simple parameter of this matrix, called the nonnegative rank and written rank_+, completely determines the smallest size of an LP relaxation giving a (c, s)-approximation for Max-Cut; I'll call this matrix M_cut. [Question about the size of an LP.] Yes, the size of the LP is the dimension the LP lives in, that is, the number of variables, plus the number of inequality constraints. Good. In symbols, the matrix and the correspondence look like this.
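With rows restricted to graphs $G$ with $\mathrm{opt}(G)\le s$ and columns ranging over all $2^n$ cuts $S$:

$$
M_{\mathrm{cut}}[G,S]\;=\;c-\mathrm{cut}_G(S)\;\ge\;0,
$$

and Yannakakis's factorization theorem, paraphrased in this approximation setting, says that the smallest size of a $(c,s)$-approximating LP relaxation for Max-Cut is $\mathrm{rank}_+(M_{\mathrm{cut}})$, up to small factors.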
Let me remind you what this nonnegative rank is. [Question: what does cut_G(S) mean?] Yes, cut_G(S) means the size of the cut defined by the vertex set S in the graph G: just how many edges cross S in G. Good. So what is nonnegative rank? The following is maybe not the first definition of rank you saw, but it's one you'd agree with: the rank of a matrix is the minimum number of rank-1 matrices that sum to the original matrix. Now, when I have a nonnegative matrix, I can in principle ask that each of the rank-1 terms itself be nonnegative. In an ordinary rank decomposition this isn't required; terms with negative entries can still sum to a nonnegative matrix. But if I do demand it, the quantity I get is the nonnegative rank. Said another way, the nonnegative rank of a nonnegative matrix is the least number of nonnegative rank-1 matrices that sum to the original matrix. Does that make sense? One sanity check: the nonnegative rank of a matrix is always at least its rank, because a nonnegative decomposition is in particular a rank decomposition of the same matrix.
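As a toy illustration of what such a factorization looks like, here is the classic Lee-Seung multiplicative-update heuristic for nonnegative matrix factorization; note it can only ever witness an upper bound on rank_+, never a lower bound, and computing rank_+ exactly is NP-hard in general:

```python
import numpy as np

def nmf(M, r, iters=1000, eps=1e-9):
    # Heuristic search for nonnegative W (m x r), H (r x n) with M close to W @ H.
    # A good fit suggests (but does not prove) that rank_+(M) <= r.
    m, n = M.shape
    rng = np.random.default_rng(0)
    W, H = rng.random((m, r)), rng.random((r, n))
    for _ in range(iters):
        # Multiplicative updates keep every entry nonnegative at every step.
        H *= (W.T @ M) / (W.T @ W @ H + eps)
        W *= (M @ H.T) / (W @ H @ H.T + eps)
    return W, H
```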
So basically you have Yannakakis's reduction already, and when we want to prove LP lower bounds, our goal reduces to showing that this fixed matrix M_cut, say for Max-Cut, has large nonnegative rank. That's the goal. In some sense nonnegative rank is also the main quantity that connects to communication complexity, rectangles being rank-1 matrices and so on, and you saw a whole day of talks on that, but that connection is not important for us. We want a lower bound on the nonnegative rank of this fixed matrix. But how? It's a fixed matrix, but not a clearly understood one; if we understood the max cuts of all graphs, we would not actually be standing here. So it's clearly not easy to reason about this matrix directly.

The way we'll proceed is to find a structured submatrix inside it and prove a lower bound on the nonnegative rank of the structured submatrix instead. The structured submatrix is going to be something called a pattern matrix. A pattern matrix is described by a function f; that is the data it takes. I'll define this formally very soon, but for now take it as shorthand for a nicely structured matrix. The plan: find a pattern matrix as a submatrix inside M_cut, then prove a lower bound on the structured matrix, and thereby derive a lower bound on the whole matrix. Now, why is that second implication valid? Well, any nonnegative rank-1 decomposition of the big matrix restricts to a nonnegative rank-1 decomposition of the submatrix, and therefore if I manage to prove a lower bound on the nonnegative rank of the submatrix, I get one on the original matrix too. So far so good.

So what is this pattern matrix? This is where communication complexity enters the picture, and also where the lifting theorems enter the picture. I'll use notation and names inspired by communication complexity, but the names are not important; you can understand this without any knowledge of communication complexity, even though by now you actually have a lot. Say you're given a function f on n bits, mapping n bits to real numbers. I'll call this a one-party function because it takes only one input. In addition, you're given a two-party function, whose input I think of as split into two parts, each coming from some alphabet Sigma; for whatever reason I call this two-party function a gadget. The gadget g maps two inputs from Sigma to +1 or -1. Given a one-party function f on n bits and a two-party gadget g, I can create a matrix indexed by Sigma^n on the rows and Sigma^n on the columns, whose (x, y) entry is as follows. Each of x and y is a length-n vector whose coordinates come from Sigma; that's what Sigma^n means. To compute an entry, I look at the n blocks x_1 through x_n and y_1 through y_n (I should have a picture here), apply the gadget g to the first coordinates x_1, y_1, then to x_2, y_2, and so on. This way I get n bits out, since each output of g is a +1/-1 bit, and then I apply f to this n-bit string. A completely syntactic operation. I define the resulting matrix to be the pattern matrix of f with gadget g, written M_f.
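Concretely, as code, for tiny parameters (f and g are passed as Python functions; this is just the syntactic operation):

```python
from itertools import product

def pattern_matrix(f, g, Sigma, n):
    # M_f[x, y] = f( g(x_1, y_1), ..., g(x_n, y_n) )  for x, y in Sigma^n.
    rows = list(product(Sigma, repeat=n))
    return [[f(tuple(g(xi, yi) for xi, yi in zip(x, y))) for y in rows]
            for x in rows]
```

For instance, with Sigma = {0, 1} and g(a, b) = (-1)^(a*b), this builds the pattern matrix for the b = 1 inner product gadget.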
I promised that the way I'll prove a lower bound on the nonnegative rank of M_cut is by finding a pattern matrix as a submatrix inside it, so at some point my task will reduce to proving a nonnegative rank lower bound on such a pattern matrix. That's where lifting theorems come in, because at a philosophical level, lifting theorems relate some hardness measure of the matrix M_f to some other hardness measure of the one-party function f. You saw several examples of this phenomenon earlier today, and those aren't even all of them, just a small subset: some natural notion of complexity of M_f gets lower bounded in terms of some natural complexity measure of f. One example you saw relates the decision tree size or depth of the function f to the deterministic communication complexity of M_f, but there are several such examples.

We are after a very similar theorem today. We're interested in understanding the nonnegative rank of the matrix M_f, so maybe we should really ask which complexity measure of f ought to relate to it: could I lower bound the nonnegative rank of the pattern matrix M_f in terms of some natural complexity measure of f? That's exactly what I'm going to do, and to describe this complexity measure I just need one definition. I'll call a function from n bits to the nonnegative reals a d-junta if it depends on only d of the n variables it is formally a function of. You may be familiar with the more standard usage of "d-junta" for arbitrary functions of d variables; I'm baking nonnegativity in without inventing new terminology, because that's what I care about today. Given that definition, the nonnegative degree of f, written deg_+(f), is the smallest d such that f can be written as a sum of nonnegative d-juntas.

Again, let me relate this to something you're more familiar with: the polynomial degree of f is the least integer d such that f can be written as a linear combination of monomials of degree at most d. The present definition is almost the same, except that if you restrict to nonnegative building blocks, conjunctions say, it requires the linear combination to use only nonnegative coefficients. So it's basically polynomials with nonnegative coefficients. And this should remind you of the nonnegative rank versus rank business: there too we just restricted signs, and that's obviously not accidental, because the two are going to be related. So that's the nonnegative degree of f.

What turns out to be not that hard to prove is the following upper bound: the nonnegative rank of the pattern matrix M_f can be upper bounded in terms of the nonnegative degree of the starting function f and the size of the alphabet Sigma; precisely, it is at most exponential in the nonnegative degree of f times the log of the alphabet size. The main result is going to be an almost matching lower bound.
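In symbols, my transcription of the slide:

$$
\deg_+(f)\;=\;\min\Big\{d\;:\;f=\sum_i q_i,\ \text{each } q_i\ \text{a nonnegative } d\text{-junta}\Big\},
$$
$$
\mathrm{rank}_+(M_f)\;\le\;\exp\!\big(O\big(\deg_+(f)\cdot\log|\Sigma|\big)\big).
$$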
So let me parse the lower bound for you. I'm given a function f which is nonnegative; of course, to talk about nonnegative degree I have to look at nonnegative functions. Ignore the E[f] = 1 constraint for now; it's there because an additive shift is coming, so I need a scale parameter. The alphabet Sigma is going to be the b-dimensional hypercube {0,1}^b, where b is at least some big constant, say 10, times log n. And the lower bound is not going to hold for every gadget; it holds for an appropriately chosen gadget, and the gadget g I'm going to use is the inner product gadget. I think by now you all know what the IP function is, but just to set the record straight: g takes two b-bit strings alpha and beta and maps them to the parity of sum_i alpha_i beta_i, as a +1/-1 value. Good, so the notation and the setup are in place.

Then, for this IP gadget and a function f, I'm going to show you that the nonnegative rank of the pattern matrix of f with gadget g is lower bounded by an exponential in something that looks almost like the upper bound: it has a nonnegative degree term, and it has a log-of-alphabet-size term, which is b. The only difference is that instead of the nonnegative degree of f, it features the nonnegative degree of f plus an additive shift. Here's one way to think about it: if I add a positive number to f, I make it only "more nonnegative," so shifting a function up can only reduce its nonnegative degree; in general deg_+(f + 100/n) <= deg_+(f). So in principle this lower bound is a bit weaker than one stated with deg_+(f), but we are saying that as long as your function's nonnegative degree does not change much under this tiny shift, it is an essentially matching lower bound. And because the additive shift is 100/n, I normalized E[f] to be 1; otherwise the shift makes no sense. Basically, if the typical value of f is of order 1, then the shift we must tolerate is of order 1/n. Does the statement make sense? And, as written on the slide, the theorem is tight up to the constant factors appearing in the exponent and in the additive shift. Good.
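Schematically, suppressing the exact constants:

$$
g(\alpha,\beta)\;=\;(-1)^{\sum_{i=1}^{b}\alpha_i\beta_i},\qquad
\Sigma=\{0,1\}^b,\qquad b\;\ge\;C\log n,
$$
$$
\mathbb{E}[f]=1\ \Longrightarrow\ \mathrm{rank}_+(M_f)\;\ge\;\exp\!\Big(\Omega\Big(b\cdot\deg_+\!\big(f+\tfrac{100}{n}\big)\Big)\Big).
$$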
So this is tight, and it's nice to compare it to earlier results and what they proved. I told you about the result of Chan, Lee, Raghavendra and Steurer, and I mentioned that they proved a quasi-polynomial lower bound for solving Max-Cut to an approximation ratio better than 2, while the main result of this talk improves that lower bound to exponential. You can ask: can their result be interpreted in this framework? It turns out it can. They don't state their theorem this way, but I think the nice way to understand their paper is to view it as a lifting theorem too, except that they lift two slightly different quantities. Instead of looking at the nonnegative rank of M_f, they look at the approximate nonnegative rank of M_f, which means roughly what you'd guess: the approximate nonnegative rank of M is the smallest nonnegative rank of any matrix that is entrywise within delta of M. They relate the approximate nonnegative rank of the pattern matrix M_f to the approximate nonnegative degree of the function f. So up to the word "approximate" they had the same kind of lifting theorem, and theirs is also tight: they prove the right lifting theorem for the quantities they consider, yet the resulting lower bound is weak.

We can actually explain this away very easily. It turns out the M_cut matrix is somewhat weird: it has high nonnegative rank, which is what gives it large LP complexity, but it has at most quasi-polynomial approximate nonnegative rank. So any approach that lower bounds the nonnegative rank by actually lower bounding the approximate nonnegative rank will never manage to prove better than a quasi-polynomial lower bound, and that is what was happening in the previous works. One key conceptual point here, then, is that we really have to exploit the distinction between approximate nonnegative rank and nonnegative rank to prove the right lower bound here.
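In symbols:

$$
\mathrm{rank}_+^{\delta}(M)\;=\;\min\big\{\mathrm{rank}_+(M')\;:\;|M'[x,y]-M[x,y]|\le\delta\ \ \forall x,y\big\},
$$

and the point is that $\mathrm{rank}_+(M_{\mathrm{cut}})$ is (weakly) exponential while $\mathrm{rank}_+^{\delta}(M_{\mathrm{cut}})$ is at most quasi-polynomial, so any technique that factors through the approximate notion stops at quasi-polynomial bounds.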
Okay, good. So that's the result I want to tell you about. I should say it's a bit dated: the result is from 2017, and I'm giving this talk because it's the only lifting theorem I've proven. A year later, Göös, Pitassi and Thomas Watson gave their cool proof of query-to-communication lifting for BPP, and using those ideas James Lee, Raghu Meka and Thomas Widdick were able to give a slightly simpler proof of the same result. That's not what I'm going to present, because I don't know that proof; I don't think there's a manuscript out with it, though apparently it has appeared in some talks, and hopefully at some future date there will be a manuscript with the simpler proof.

Anyway, good. Let me tell you a little about the proof. You kind of saw this piece already: to prove a lower bound on the LP size for approximating Max-Cut, it's enough to prove a nonnegative rank lower bound on the M_cut matrix, and I'm going to prove a lifting theorem to lower bound the nonnegative rank of some M_f. Now, somewhere it has to come up that there is a hard instance of Max-Cut; we are proving hardness of Max-Cut, so a hard graph must enter somewhere. This f is going to be related to the hard instance for Max-Cut. I told you we are essentially showing that the best LP for solving Max-Cut is the Sherali-Adams LP, so the f will be related to an integrality gap instance for that specific LP. The plan: take the f that comes from the Sherali-Adams integrality gap, find the pattern matrix M_f inside the M_cut matrix, and prove a nonnegative rank lower bound on that matrix. That's basically this picture, and I'll rush through it.

Just in case you were curious, here is the form of f: f(x) is (1 - epsilon) - G_0(x). The (1 - epsilon) is your c; remember the (c, s)-approximation we were talking about. For Max-Cut, s is really 1/2 while c is close to 1, and that's what gives you the gap of 2. And G_0(x) just means the value of the cut defined by x in the graph G_0, where G_0 is the hard instance. The rest of the slide can be ignored; just take home this last corollary: the fact that G_0 is a hard instance of Max-Cut for the Sherali-Adams LP is equivalent to a nonnegative degree lower bound on the function (1 - epsilon) - G_0(x). In other words, the Sherali-Adams lower bound, interpreted correctly, gives you a nonnegative degree lower bound on this shifted cut function.

So now we're doing well: we have a function with a nonnegative degree lower bound; we want to form its pattern matrix, prove a lower bound on it, find it inside the M_cut matrix, and we're done. That's this step. The high-level part of the argument is very similar to the two other talks we saw today. How do you prove that the nonnegative rank of M_f is lower bounded in terms of the nonnegative degree of f? You argue in the contrapositive: suppose toward a contradiction that M_f has small nonnegative rank; then you extract from this a representation of f witnessing small nonnegative degree, contradicting the known lower bound, and you're done. That's basically the structure of the argument. As I already told you, we won't quite manage that; we'll actually derive a contradiction to f + 1/n, the tiny shift of f, having small nonnegative degree. But it turns out that for this hard function coming out of the Sherali-Adams instance, we can prove the lower bound not only for (1 - epsilon) - G_0 itself: we can perturb the epsilon a little, add 1/n and so on, and the function still has high nonnegative degree. So that doesn't bother us at all here, which is maybe a partial answer to the earlier question: in some cases the nonnegative degree really does not drop under the shift. Good.
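To pin down the objects in this outline:

$$
f(x)\;=\;(1-\varepsilon)\;-\;G_0(x),\qquad x\in\{\pm1\}^n,
$$

where $G_0(x)$ denotes the (normalized) value of the cut that $x$ defines in the hard graph $G_0$, and the proof schema is

$$
\mathrm{rank}_+(M_f)\ \text{small}
\;\Longrightarrow\;
f\ \text{has a conical junta approximation}
\;\Longrightarrow\;
\deg_+\!\big(f+\tfrac{1}{n}\big)\ \text{small},
$$

contradicting the Sherali-Adams lower bound.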
So what does a simple representation of such a function mean, a representation that implies a nonnegative degree upper bound? It's what I'll call a conical junta approximation. A conical junta is just the nonnegative-coefficient polynomial we referred to before. Instead of constructing an exact conical junta representation of f, I'll construct an approximate one, and from there conclude that f + 1/n must have small nonnegative degree. That's a slightly more detailed version of step number two.

So maybe this piece, the junta approximation, at least makes some sense now: the proof goes in the reverse direction. You assume M_f has small nonnegative rank; you want to extract a junta approximation of f; and then you can complete the proof. The key technical piece in getting this junta approximation of f from the small nonnegative rank of M_f is called decomposing high min-entropy distributions, and I want to spend five minutes on it, because it is the key technical piece. In some older versions of this talk I used to say that I was about to make a controversial statement, that communication complexity is all about rectangles, but today I learned it's not controversial at all; I guess it's just the truth.

So let me tell you how this piece works and how it fits. Remember we had a matrix of this form: rows and columns indexed by elements of Sigma^n, with entries f evaluated at g^n(x, y), the gadget applied blockwise to the row index x and the column index y. Let's look at rectangles, because that's a fun thing to do. Take a rectangle defined by A and B, two subsets of the rows and columns. For a given rectangle I can define the following quantity A_R. The rectangle consists of various pairs (x, y), and at each pair I can apply g^n and get a string of n bits out; that's what g^n does, applying g to the first components, the second components, and so on. Now I can ask: how many times does a given element z of {+1, -1}^n occur as g^n applied to elements of this rectangle? That count, as a function of z, is A_R, and I'm interested in how simple this function is, for reasons to come.

In fact, more generally, instead of dealing with rectangles I'm going to look at all product distributions. The point about the rectangle A x B was that choosing x from A at random and y from B at random is a way of choosing a random entry of the rectangle, and all I'm going to remember about this process is that it is a product distribution over entries. So take any product distribution, say defined by u and v, which you can secretly continue to think of as a rectangle, and ask about A_{u,v}(z): the probability of seeing z, and how simple it is as a function of z.
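A small sketch of the quantity just defined, with counts rather than probabilities (the same thing up to normalization by |A| |B|); g is passed as a function:

```python
from collections import Counter
from itertools import product

def rectangle_profile(A, B, g):
    # A_R(z): how often z in {-1, 1}^n arises as (g(x_1, y_1), ..., g(x_n, y_n))
    # as (x, y) ranges over the rectangle R = A x B.
    return Counter(tuple(g(xi, yi) for xi, yi in zip(x, y))
                   for x, y in product(A, B))
```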
Let's look at a couple of examples to get a sense of what's happening. First, the simplest setting: the product distribution u x v is just uniform, which by the way corresponds to the rectangle being the whole matrix. This one is easy. What is the distribution of g^n(x, y) when x and y are picked uniformly at random? Inner product is a balanced function, so g^n(x, y) is essentially uniform over {+1, -1}^n; the frequency of every element of {+1, -1}^n is the same, and A_{u,v} is a constant function.

Okay, let's do one more example and then I'll end. Now think of a distribution where the first t blocks are fixed and the rest of the blocks are still uniform. What happens to g^n? The first t output bits take some fixed values, and the rest still behave as uniform. So this time A_{u,v}(z) depends only on what happens in the first t bits of z: A_{u,v} is a t-junta. Finally, and I'm going to rush through this, it turns out you don't even need the unfixed part to be completely uniform: as long as each block has sufficiently high entropy, it's enough to draw the same conclusion. This is the first time we use the fact that g is the inner product gadget: because IP is a two-source extractor, g^n applied to the high min-entropy part still gives you something close to uniform, so A_{u,v} is still close to a t-junta.

The main result we prove here is that this is actually true in general. Let me just flash the theorem and end: if u and v are distributions with high blockwise min-entropy, meaning they look like large rectangles, then the product distribution u x v can be decomposed into pieces that look like "t blocks fixed, then high-entropy on the rest," plus a tiny error term. So basically we represent the functions A_{u,v} as approximate t-juntas, and it turns out that if M_f has a low nonnegative rank decomposition, then you can use this simplicity of A_{u,v}, the fact that it is approximated by t-juntas, to actually construct an approximator for f that is a conical combination of t-juntas. That last part is not going to make sense in the time I have, so let me just end; maybe I'll flash this slide. Okay, I'll stop.
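My schematic rendering of the decomposition statement, suppressing the exact entropy and error parameters:

$$
u\times v\;=\;\sum_i \lambda_i\,(\rho_i\times\sigma_i)\;+\;\mathrm{err},\qquad \lambda_i\ge 0,
$$

where each $\rho_i\times\sigma_i$ fixes the same set of at most $t$ blocks on both sides (the "aligned" condition) and has high blockwise min-entropy on the rest, and the error is exponentially small. Consequently each $A_{u,v}$ is approximated by a conical combination of $t$-juntas.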
[Question: how is this decomposition theorem different from the one in GLMWZ?] Good, that's because I rushed through the important slide; here is the main part. GLMWZ also have a decomposition theorem that looks exactly like this, but it holds for a single distribution. Their theorem can be thought of as saying: if you have a high min-entropy distribution u, then you can approximate it by pieces with a not-too-large fixed part and a uniform part. In our case we need that the product u x v, when both factors have high min-entropy, can be written exactly in this form with the fixed parts aligned, meaning both sides fix the same set of blocks. That turns out to be one of the key respects in which the theorem has to differ: we have to approximate the product distribution u x v by sums of aligned conjunctive blockwise-dense pieces, which is that last condition. The other main difference is that the error we need is much stronger, and that is also where the alignedness comes in. You could apply the single-distribution decomposition individually to u and to v and combine the results into something aligned, but that incurs a much larger error; by constructing the decomposition in aligned form from the start, we manage to get an exponentially small error, which turns out to be important for getting the exponential lower bound. Anyway. Yes?

[Question: nonnegative rank, as in the previous talk, comes up when you look at extension complexity; what setting gives rise to the nonnegative degree?] Right, I kind of rushed through that slide too, but it's a very nice piece of serendipity. There is this specific hierarchy of linear programming relaxations called Sherali-Adams, and it turns out that integrality gaps for solving constraint satisfaction problems with this LP correspond exactly to the shifted functions having high nonnegative degree. Whenever you prove that the Sherali-Adams LP cannot solve, say, Max-Cut, you get that (1 - epsilon) minus the max cut function must have high nonnegative degree; the statements turn out to be essentially equivalent. Does that make sense?

[Follow-up: where does approximate degree come in?] So all the previous papers were really trying to prove lower bounds on nonnegative rank, and therefore on nonnegative degree, but the techniques they used actually lower bounded not just the degree but the approximate degree, and not just the nonnegative rank but the approximate nonnegative rank. That's why they suffered from the disadvantage of not getting past quasi-polynomial: the goal was the same, but their technique applied to the more general notion and therefore could only yield a weaker bound. Does that make sense?

[Another question.] Excellent question, and in general the answer is no. It's related to the talk Makrand was giving: for matching, for example, constraints that depend on the graph play into the polytope.
That's maybe one way to think about it: there the domain is not a simple polytope like {+1, -1}^n, and for such problems we don't have any strategy that proves a lower bound by proving a lifting theorem; we only have hand-crafted, problem-specific strategies like the ones Makrand showed. [Question about vertex cover.] Yes, this talk doesn't prove any lower bound for vertex cover; an extended formulation lower bound for vertex cover is known, but this method does not prove it.