So, welcome everyone. This is the first talk of the spring semester, starting the sixth year of TCS+. Before I start, let me thank my fellow organizers: Ilya Razenshteyn, Anindya De, who saved the day earlier today when we had some technical trouble, Thomas Vidick, and Clément Canonne. Also, before we start, the next two talks: in two weeks we have Dor Minzer on the recent 2-to-2 result, and Michael Cairns two weeks after that. So we have lots of groups here today. Usually we go around the table quickly and see if I can identify all the groups. We have Amit Levi with the group from the University of Waterloo. Hi everyone. We have André Nusser from the Max Planck Institute — oh, the big group. Hello everyone. We have Ayush Sekhari from Cornell. And we have Benjamin Miller. Hello everyone. And we have Budima from EPFL. Hello guys. We have Kobchen Wang from the University of Michigan. Hello everyone. And Gregory Diaris-Laztev from Indiana University. Hello. And we have Heng Guo from the University of Edinburgh. Hello, good evening. We have Brinal Conti from TTI Chicago. Hello again. Taylor Christian from the University of Birmingham. And we have Samson Zhou from Purdue. Almost there. Shavasrao, a few floors above me here at NYU. Hi guys. Sushant Sachdeva from Toronto. Hi guys. And the last one — ah, I forget. Oh, this is okay. Hello, Kremol. So Kremol cannot join us — Stanford, right? Okay. So, I'm finally very happy to present today's speaker, Avi Wigderson. Avi finished his PhD in '83 under the supervision of Richard Lipton. He was at the Hebrew University for some years before joining the Institute for Advanced Study in Princeton, where, as many of you probably know, Avi runs a very successful program on TCS and discrete mathematics. It's a great place; many postdocs have spent time there. And Avi is well known for many results in computer science — hardness versus randomness and many others.
He has won many awards, including the Nevanlinna Prize and the Gödel Prize. So, without further ado, let me start the talk. Avi, please. All right. Yeah. So, I hope you can all hear me. This is a very weird thing; I'm doing it for the first time, so we'll see how this works. I understand that you can ask questions — so, ask questions. I prepared 60 minutes, and I prepared some extra material if you want to know more about some of the things I'll show you. I'm going to talk about a problem that I've been looking into with various collaborators — you'll get to know them all soon — which we call the operator scaling problem. The way I titled the talk reflects something that could work for lots of other papers: optimization, complexity, and math. What I find quite remarkable in this one is how much math, from how many areas, has combined, in just the last four years we've been at it, to help with this problem and related problems. So, let's see now — why can't I flip my slide? Okay, let's see if this works. Okay. So, the talk will focus on one problem and one algorithm. The problem is the singularity of symbolic matrices. The algorithm is an alternating minimization algorithm, or heuristic. These are the lens through which we will explore connections with several mathematical areas, physics, complexity, and optimization. The connections are really quite remarkable. You don't have to read all the small letters, but in particular we'll encounter non-commutative algebra, invariant theory (commutative and otherwise), quantum information theory, analysis, and problems in computational complexity and optimization.
I want to mention already that in computational complexity, these will be problems related to algebraic models — PIT and lower bounds — and in optimization, it will turn out that all the algorithms we give are efficient algorithms for several non-convex programs, which is always good to have, and for exponentially large linear programs and various other types of problems for which we don't have general solutions; these are specific families of such problems. One hope of this lecture is that people will find more applications of the algorithms I'll show you. Okay, so I want to start slow. I'll describe the problem first — there will be a couple of problems, as you'll see, and they look the same — and then the algorithm. Once we've done this, we'll start getting into the real stuff. So, the problems. The basic one, I think you're all familiar with: testing whether an n-by-n bipartite graph has a perfect matching. There are numerous motivations for this problem, and this one motivation we encountered when the kids were small. Not every child likes every candy, so you would like to match them to candies they like, like this one. So this graph has a perfect matching. Here's another graph, and in this graph you can convince yourself there is no perfect matching. These are combinatorial problems, and one of the main themes here is that we connect them to algebraic problems, and later also to analytic problems. The simplest way to make this algebraic is to look at the adjacency matrix of the graph: put a one for an edge between u and v, and zero otherwise. That's the adjacency matrix, and now the perfect matching problem becomes an algebraic one.
Asking whether there is a perfect matching is the same as asking whether the permanent of this matrix is positive. The permanent — highlighted in yellow — is just like the determinant, except that you don't have the signs. We'll talk a little more about this function; you just have to remember that it looks like the determinant, but morally it's very different: it's a difficult function to compute. Here we are just asking whether it's positive. And deciding whether it's positive — whether the graph has a perfect matching — is one of the oldest efficient algorithms in our field. It was solved by one of the Jacobi brothers — I'm not sure which one; there are several. The year is 1890, long before we defined efficient computation, but he already found a polynomial-time algorithm. The problem arose in solving a system of ordinary differential equations that he needed to solve, and that's what made efficiency matter. Anyway, that's an easy problem, so we can test it in P. The key to the problems we really care about is an interesting and important observation of Edmonds, which suggests a different algebraic formulation of the perfect matching problem. He already understood that the permanent is a difficult kind of function; he wanted to relate perfect matching to the determinant instead, and he found a very simple way to do it. Suppose we replace every one in the matrix, in position (i,j), by a variable x_ij, and leave the zeros alone. The matrix becomes symbolic — a matrix of variables — and you can look at its determinant. Now the determinant is a polynomial in these variables. What he observed is that the graph has a perfect matching if and only if this symbolic determinant does not vanish. It's very easy to see.
I mean, a perfect matching is really a generalized diagonal in the adjacency matrix, and such a diagonal becomes a monomial that cannot be cancelled in any way, because all the variables are distinct. So, again, testing whether this particular type of symbolic determinant is zero or not as a polynomial is in P: we simply solve the related perfect matching problem. I should stress that just expanding the determinant — trying to see whether the polynomial is zero by expanding it — is a very bad idea, because there are exponentially many terms. But in this special case it's simple. Then Edmonds made a leap and asked the following generalization of this question. Instead of having a single distinct variable in every entry, suppose we fill the matrix with linear forms in some set of variables. Now the variables are not associated with the edges anymore; they're just some set of variables, and every entry has a linear form L_ij — just some linear combination of the variables for each (i,j). This is what we will call a symbolic matrix. And we can ask the same question: is the determinant of this matrix identically zero? The same question as before, only for this general symbolic matrix. We call this problem SINGULAR, or SING for short. This is the basic problem we want to solve — the problem we started out wanting to solve — and it is the problem that we cannot solve. Nothing will change today in the status of this problem, but nevertheless, let me tell you about it. The question of whether it has a deterministic polynomial-time algorithm was raised by Edmonds exactly 50 years ago. And Lovász observed, about 10 years later, that this problem has a very simple probabilistic algorithm — so this problem is in RP.
What's the probabilistic algorithm? It's very simple: you just pick random values for the variables — appropriate integers, chosen at random. Then you have a numeric matrix, and you evaluate its determinant. It's pretty clear that if the symbolic determinant is zero, then you always get zero, and it's easy to see that if the symbolic determinant is not zero, then with very high probability you get a nonzero number. So that's a very simple probabilistic algorithm. About 40 years ago, more or less, I started looking at this problem, like many other people, as the era of derandomization began. I was sure it must be simple — the algorithm is so simple, it's probably really easy to find a deterministic algorithm for this problem. At the time it also didn't look that special. But it turns out to be extremely special — you really want to work on this problem; it captures very important other problems that I want to mention. This was understood at the same time, though I guess it took time to express it this way: the determinant is complete, which means that this problem captures what we today call PIT — polynomial identity testing. Testing algebraic identities, of the type that appear across mathematics, is very natural, and it turns out to be equivalent to testing whether a symbolic determinant is identically zero. That's one reason. The more striking reason is the result of Kabanets and Impagliazzo: a deterministic, as opposed to probabilistic, algorithm for this problem would basically prove a lower bound — something like P ≠ NP. More precisely, it would prove that VP is different from VNP, or something like that — the algebraic version of P vs. NP. I won't have time to get into this, but there's one thing for people who have never seen it.
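Lovász's probabilistic test can be sketched in a few lines. This is a toy illustration, not from the talk: the symbolic matrix is assumed to be given by its integer coefficient matrices A_i, the function names are mine, and exact rational elimination stands in for a proper determinant routine.

```python
import random
from fractions import Fraction

def integer_det(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    n = len(M)
    A = [[Fraction(v) for v in row] for row in M]
    det = Fraction(1)
    for i in range(n):
        # find a nonzero pivot in column i
        p = next((r for r in range(i, n) if A[r][i] != 0), None)
        if p is None:
            return Fraction(0)
        if p != i:
            A[i], A[p] = A[p], A[i]
            det = -det
        det *= A[i][i]
        for r in range(i + 1, n):
            f = A[r][i] / A[i][i]
            for c in range(i, n):
                A[r][c] -= f * A[i][c]
    return det

def randomized_singularity_test(coeffs, trials=20, bound=10**6):
    """Is det(sum_i x_i * A_i) identically zero?  coeffs = [A_1, ..., A_m].

    Substitutes random integers for the variables; a single nonzero
    determinant certifies non-singularity, and Schwartz-Zippel bounds
    the probability of missing a nonzero polynomial every time.
    """
    n = len(coeffs[0])
    for _ in range(trials):
        xs = [random.randint(1, bound) for _ in coeffs]
        M = [[sum(x * A[r][c] for x, A in zip(xs, coeffs)) for c in range(n)]
             for r in range(n)]
        if integer_det(M) != 0:
            return False   # witnessed a nonzero evaluation
    return True            # (probably) identically singular
```

For Edmonds' matrix of a bipartite graph, each coefficient matrix A_i has a single one, in the position of edge i.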
I want to stress something magical about this result. The assumption is that you found an efficient algorithm, and the conclusion is that there is no efficient algorithm for another problem. You found an efficient algorithm for the symbolic determinant, and you prove that the permanent has no efficient algorithm. This is something to remember. And there's another reason to consider this problem: it arises prominently elsewhere. This problem, in various guises, is intensively studied in linear algebra. If you take a symbolic matrix and try all values for the variables — all values from the underlying field; I didn't say it before, but this question is interesting over every field, though for us F will mostly be some characteristic-zero field, maybe the rationals or the complex numbers — then you get a space of matrices, a linear space of matrices. The question is really about the largest rank of a matrix in this space. And this turns out to capture lots of problems: in particular, the module isomorphism problem in algebra, structural rigidity questions in graph theory, and many, many others. So hopefully this is the motivating slide: we really want to solve this problem; it's crucial for complexity that we solve it. But, as I said, we are not going to solve this problem, so let me tell you what we are going to solve. Okay, so I want to tell you about the dual life of symbolic matrices. We have the same setup as before — symbolic matrices — and I'm repeating: we should think of this as a computational problem. How is the input given? The way to represent a symbolic matrix is simply by the coefficients of every variable, and the coefficient of every variable is simply a scalar matrix — a matrix over the field.
So we are given, let's say, m matrices, each n by n, with entries in the field, and we want to know whether the linear combination Σ_i A_i x_i is a singular matrix. Until now I implicitly assumed — and I'm sure you implicitly assumed — that the variables we are talking about commute; that's the most natural setting. In this case, what does it mean for the matrix to be singular? Singular over what? The answer, obviously, is over the field of rational functions in these variables. And as I said, we don't know how to solve this problem: we have a probabilistic algorithm but no deterministic one. But you can consider another variant of this problem, in which the variables do not commute — and this is what we will do. This is the problem we will solve, and it looks like a much harder problem. This problem has been studied. First of all, what does it mean for a symbolic matrix to be singular — over what field? The answer, obviously, is over the rational functions in the non-commuting variables. It turns out that this object, called the free skew field, is a really weird object — maybe we'll talk about it later. It's a very strange object, and therefore I haven't really defined the problem yet: I'm asking whether the matrix is singular over this strange field, and if you don't know what it is — and I assume most of you don't — then the problem is not yet defined. But we will see that it has interpretations, or formulations, that are elementary, so you don't really have to know it. Anyway, the problem is whether a symbolic matrix in non-commutative variables is invertible or singular. That's the question we're asking, even though it's not well defined yet.
And this was studied, and it turns out that unlike its sibling with commuting variables, it took time to even see that it's decidable — that's not obvious at all. This was done by Cohn, who developed the theory of the free skew field. Then Cohn and Reutenauer actually got an exponential-time algorithm for this problem, and that was the best known until a couple of years ago. The algorithm I'll talk about today is joint with Garg, Gurvits, and Oliveira, giving a deterministic polynomial-time algorithm for the case of characteristic zero. This is where some of the connections to the various fields materialize, both as sources of tools for the analysis and as implications of the fact that there is a polynomial-time algorithm. After that, Ivanyos, Qiao, and Subrahmanyam gave a very different algorithm for the same problem. Ours is analytic; theirs is algebraic. I will not talk about theirs; I will talk only about ours. I've given talks of various lengths about this material — the last series was in the summer at CCC, where I talked for six hours. That, by the way, was recorded, so you can find it, along with the beautiful notes that some of the attendees have written; they're all on my website for those who want even more about this problem. So anyway, I want to tell you about this algorithm. So far I've defined the problem — is this okay with everyone? We want to decide non-commutative singularity of symbolic matrices. Okay. So I want to show you the algorithm. Just as we didn't start with the general problem but with perfect matching, I'll describe the algorithm first in its baby version, for perfect matching. Once you understand the style of the algorithm, I won't have to give you too many details about the real one; we can proceed by analogy.
So I want to talk about this algorithm. Let's consider another problem we haven't talked about yet; you'll soon see it's very related to perfect matching. It's called the matrix scaling problem, and it was introduced by Sinkhorn. The input is an n-by-n non-negative matrix — like for perfect matching, a non-negative matrix. Recall that a matrix is called doubly stochastic if its row sums and column sums are all one — like a permutation matrix, in particular. So we would like our matrix to be doubly stochastic; maybe it's not, but we can try to make it doubly stochastic by scaling. What is scaling? Scaling is simply multiplying rows and columns by arbitrary non-negative constants — in other words, multiplying from the left and from the right by diagonal matrices. We want to do this so as to bring the matrix into doubly stochastic form, if possible. That's what we call the doubly stochastic scaling problem: find diagonal matrices such that, after you scale the given matrix, the row and column sums are close to one. The natural question would have been to find a scaling that makes them exactly one, but it turns out that the important — and, from many points of view, more natural — problem is to ask whether you can approximate row and column sums of one arbitrarily well: for every epsilon there should be R and C that scale the matrix so that the row and column sums are each within epsilon, say. That's the doubly stochastic scaling problem, which will turn out to be a special case of the problem we really want to solve. Now, why would you want to do this? Why did Sinkhorn, and then many others, want to? I could give a whole lecture about this, because there really are an enormous number of applications.
It's very important in numerical analysis — numerically stable solutions of linear systems; it has applications in signal processing; it has applications in approximating the permanent. And it has a very direct application to the perfect matching problem. Why? Because it turns out — and it's not difficult to prove — that you can doubly-stochastically scale the initial matrix if and only if its permanent is positive. So solving this and solving perfect matching are the same question. I'm not explaining this; it's easy — a nice exercise to try. Okay, so we are interested in scaling a matrix. Let's try to do it. Suppose we are given such a matrix — maybe it comes from a graph, maybe the adjacency matrix of this bipartite graph — and we ask whether there are diagonal matrices we can pre- and post-multiply by so that the row and column sums are one. You look at this matrix: it's not doubly stochastic — the first row sums to three and the first column sums to three, and we want all the row and column sums to be one. Now, it's not always easy to see how to do this; in fact, it looks difficult. The simple idea of Sinkhorn — and other people came to it independently — is a sort of greedy approach, which is really, as we will come to call it soon, an alternating minimization approach. It's the following naive idea. We want both the row sums to be one and the column sums to be one. But it's really simple to get all the row sums to be one, right? That's easy: we just have to divide every row by its sum. So let's scale the rows. And here we are: now all the row sums are one — a row-stochastic matrix. We solved one half of the problem. So now all we need to do is fix the columns. Their sums are not one, but it's clear what to do, right? We divide every column by its column sum.
So let's do that. Now all the column sums are one, but the row sums are not one. Well, we can fix this: let's divide every row by its row sum. And we get this. And we can proceed this way, alternating between fixing the rows and fixing the columns. Actually, it's quite easy to convince yourself that this is going to get into a loop: either we get column sums of one, or we get this — it will really alternate between these two possibilities. So this is not going to converge. But in fact that's good: it doesn't converge because the graph we started with does not have a perfect matching. So here it didn't converge, and we don't have a perfect matching. I'll show you another example in a second, but I just wanted this algorithm to be clear: it's just alternately fixing rows and fixing columns. Let's formulate the algorithm, because we'll see the real one in a similar form. We have a non-negative matrix that we want doubly stochastic. Given the matrix, we can define good candidates for the scaling: one over the row sums and one over the column sums. And we just alternate, a polynomial number of times — normalize the rows, then normalize the columns — say n cubed times. This is the algorithm; it's the simplest algorithm, I'm sure you've seen it, just two lines. You're dividing here, but it's easy to convince yourself that you never divide by zero, in the sense that if some row sum were zero, that means all the entries of that row are zero, and then you can tell immediately that the matrix is not scalable. This type of algorithm — where the problem is divided into several parts, each part is easy, but you can only fix one part while the others are held fixed — is an alternating minimization heuristic. We'll talk about it more at the end of the talk.
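The two-line algorithm just described can be written out as follows — a minimal sketch in plain Python (function names are mine), alternating row and column normalization:

```python
def sinkhorn(M, iters=100):
    """Alternately normalize row sums and then column sums to 1."""
    n = len(M)
    A = [[float(v) for v in row] for row in M]
    for _ in range(iters):
        for r in range(n):                       # fix the rows
            s = sum(A[r])
            A[r] = [v / s for v in A[r]]
        for c in range(n):                       # fix the columns
            s = sum(A[r][c] for r in range(n))
            for r in range(n):
                A[r][c] /= s
    return A

def ds_error(A):
    """Largest deviation of any row or column sum from 1."""
    n = len(A)
    return max(max(abs(sum(A[r]) - 1) for r in range(n)),
               max(abs(sum(A[r][c] for r in range(n)) - 1) for c in range(n)))
```

On an adjacency matrix with a perfect matching the error goes to zero; on one violating Hall's condition (some set of rows supported on too few columns) the iterates keep oscillating and the error stays bounded away from zero.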
I want to give you some examples of where it occurs. But first let's do it again with another matrix, the one corresponding to the second graph we saw in the beginning. This graph does have a perfect matching. Let's just see what happens when we apply the algorithm here. Again, we normalize the rows and get this — the row sums are one. Then we scale the columns; you can believe me or check for yourself later. Then we do it again, and maybe you start noticing that some entries get smaller and some get bigger. In fact, if you continue, this is what it approaches — that will be the limit. So it does converge, and it's good that it converged, because this is the case where we have a perfect matching. That's what we want. So here's the algorithm more precisely, the full algorithm. I'm giving the reference to a paper of mine with Linial and Samorodnitsky, because I want to show you not only the algorithm but the very particular analysis that we used in that paper — a paper about approximating the permanent. Anyway, you run this alternating minimization a number of times, and then what do you want? You want to know whether it converges or not. The way to do that is to check whether the matrix is already very close to being doubly stochastic. How close? One over n is good enough. One over n in which norm? It doesn't matter — infinity norm, one norm, or two norm; it affects the running time by a factor of n or so, so it doesn't really matter. If it's already that close after n-cubed steps, then it will converge, and otherwise it never will. So in these cases: if yes, it converges, and you say the permanent is positive; if not, you say the permanent is zero. Okay. How do you analyze this? How do you show that something like this works? Here is the natural thing to try. You have some kind of a dynamical system here.
It's just a sequence of matrices — the matrices generated in the process. If you can find some good progress measure on your matrices, maybe you can use it to prove convergence. And that's exactly what we do; the question is what to use as a progress measure. What we did in the analysis in this paper was to use the permanent itself as the progress measure. That turned out to be very useful — with a lot of hindsight, because it sort of seeded the follow-up works; anyway, we'll see, I'll talk about it. So you use the permanent as a progress measure. There are three things you need to do with any progress measure when you prove convergence. One is to prove an upper bound: the progress measure is never too big — it's always bounded (we'll be picking an increasing progress measure). This is very easy to see here, because all the matrices we produce are either row-stochastic or column-stochastic, and the permanent of such a matrix cannot exceed one. That's the upper bound. The second thing is to show that every step increases the progress measure: after one application of row scaling or column scaling, the permanent grows. Of course, it cannot keep growing once it's already very close to one, but as long as you are not within one-over-n of one, it turns out that the permanent grows by a factor of one plus one over n. And again, this is a one-line argument — I want to stress that these are simple things. It's one application of the AM-GM inequality — arithmetic mean versus geometric mean. It's really easy to show that it increases. The third part is to show that you don't start too low. We can assume we are in the yes case — in the no case it's easy. So in this case the permanent is positive, and you want to say that if it's positive, it's already non-trivially big.
And here's what non-trivially big means: it's at least exponentially small — at least exp(−n) or so. After the first scaling — let's say I assume the initial matrix is a zero-one matrix; if it's a matrix of rationals, then you need to bring in the input size, and for all practical purposes you can replace n with the binary input size and everything works just the same. So assume it's a zero-one matrix. In this case, after scaling once, the permanent cannot be too small: it's at least exponential in minus n — this is really easy to see — or maybe n to the minus n, something like this. Why is that enough? Because if you start that big — it's small, but that big — and you improve by a factor of one plus one over n each step, you can see that in about n-squared or n-cubed steps you will converge. Okay. I want to highlight this because we'll see it again, and maybe again: this is the whole analysis. These three steps will have different complexities later — the choice of progress measure will be different, and the analysis will be different — but in this case it's really, really easy. That's what I want to stress. Okay, so far so good; we are done with the toy, the baby case of scaling. In particular, notice that this is an algorithm for testing whether a graph has a perfect matching that is not normally taught in algorithms classes. It's maybe the simplest of all, just a little less efficient than the combinatorial algorithms, like augmenting paths. Okay, now I want to move to the main result, which is the operator scaling algorithm. We'll see a generalization of this problem, and a polynomial-time algorithm for it that looks the same and has the same analysis. And in particular, as we'll see, it will be the polynomial-time algorithm for non-commutative singularity.
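To make the three-step analysis concrete, here is a brute-force numerical check (an exponential-time permanent, purely for illustration; the function names are mine) that the permanent of the iterates is bounded by one and grows under a normalization step:

```python
from itertools import permutations

def permanent(A):
    """Brute-force permanent (n! terms) -- fine only for tiny matrices."""
    n = len(A)
    total = 0.0
    for p in permutations(range(n)):
        prod = 1.0
        for r in range(n):
            prod *= A[r][p[r]]
        total += prod
    return total

def normalize_rows(A):
    """Divide every row by its row sum (row scaling step)."""
    return [[v / sum(row) for v in row] for row in A]

def normalize_cols(A):
    """Divide every column by its column sum (column scaling step)."""
    n = len(A)
    s = [sum(A[r][c] for r in range(n)) for c in range(n)]
    return [[A[r][c] / s[c] for c in range(n)] for r in range(n)]
```

The two bounds are visible in the code's behavior: a row- or column-stochastic matrix has permanent at most one (the permanent's terms are a subset of the expansion of the product of the row sums), and column-normalizing a row-stochastic matrix divides the permanent by the product of the column sums, which is at most one by AM-GM since those sums total n — so the permanent can only go up.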
So you should remember the correspondence: perfect matching corresponds to matrix scaling, and non-commutative singularity will correspond to operator scaling. In some sense, non-commutative singularity is an algebraic generalization of the perfect matching problem. I think people always thought that the commutative singularity problem — Edmonds' problem — is the natural algebraic analog of perfect matching, but it turns out that it's not: it is the non-commutative version that is the analog. So let's talk about the operator scaling problem. To set this up, let me remind you what we want to do — non-commutative singularity — and you will already start getting a feeling that these tuples of matrices, which for us define a symbolic matrix, have many different interpretations. We'll see more interpretations of a tuple of matrices later; a tuple of matrices can represent lots of things. Here it represents a symbolic matrix, and we want to know whether it's non-commutatively singular. That's the problem we want to solve. The quantum leap that Gurvits made in '03 — really 15 years ago — was a quantum generalization of the LSW algorithm and its analysis, or at least an attempt to provide such an analog. The key step was providing a quantum, information-theoretic view of the tuple of matrices. So here's another thing you can do with a tuple of matrices: you can think of it as a completely positive operator. What's a completely positive operator? It's a linear map on matrices: L takes P, an arbitrary n-by-n matrix, into the sum Σ_i A_i P A_iᵀ. So that's a different thing to do with a tuple of matrices.
And it's pretty clear that this map is linear in P, and it's clear that it takes a positive semidefinite matrix into another positive semidefinite matrix. This is important because these inputs are the important inputs in quantum information theory: the P's you look at are actually density matrices, which are PSD matrices of trace one — they represent quantum states. So this is really a general quantum operation, or sometimes a general quantum noise operation. It's very important to study and understand, and it existed long before Gurvits's work, of course. What Gurvits was trying to say is that maybe some property of positive operators captures the rank of symbolic matrices. What he believed is that it captures the rank of commutatively singular matrices. That didn't quite work, but you'll see that it works once you move to the non-commutative world. Anyway, he suggested asking the following question about a completely positive operator: given such an operator, say by a tuple of matrices, does it ever reduce the rank of any input? Is it possible that there exists a matrix P such that the rank of the output L(P) is strictly smaller than the rank of the input? This is the property he tried to connect to symbolic singularity, or the rank of a symbolic matrix. And we discovered a couple of years ago — this was really one of the starting points of the new paper — that Cohn had discovered long ago (the same Cohn I told you about) that this operator reduces the rank of some matrix if and only if the symbolic matrix is non-commutatively singular. Because of this equivalence, I have now defined for you, in elementary terms, what non-commutative singularity means: it is just the question of whether this linear operator reduces the rank of some P or not.
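As a sanity check on the definition, here is the Kraus-form map L(P) = Σ_i A_i P A_iᵀ in a few lines of Python (a toy sketch; the names are mine):

```python
def mat_mul(X, Y):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def apply_cp(kraus, P):
    """The completely positive map L(P) = sum_i A_i P A_i^T."""
    n = len(P)
    out = [[0.0] * n for _ in range(n)]
    for A in kraus:
        At = [list(col) for col in zip(*A)]      # A^T
        T = mat_mul(mat_mul(A, P), At)           # A P A^T
        for r in range(n):
            for c in range(n):
                out[r][c] += T[r][c]
    return out
```

Each term A P Aᵀ is PSD whenever P is (it is a congruence of P), and a sum of PSD matrices is PSD — which is the positivity property the talk appeals to.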
This is really the problem that we want to solve. And so I'm not going to prove this equivalence; in fact, Cohn developed a whole theory to understand this. He didn't quite formulate it this way, he formulated it differently, and he was not talking about positive operators at all, but anyway, it's equivalent to the original formulation. And what Gurvits further understood, and that's absolutely key, is that for positive operators there's a quantum generalization of being doubly stochastic. There's a very natural way to say that an operator is doubly stochastic: now the operator is a generalization of a non-negative matrix, so instead of a non-negative matrix we have a positive operator; we had doubly stochastic matrices, now we have doubly stochastic operators. What are doubly stochastic operators? They have to satisfy this, basically; or equivalently, when you apply them to the identity, both the operator and its transpose (you replace A_i with A_i transpose) evaluate to the identity. This is exactly analogous to having row sums one and column sums one. And now you can ask a question, and it's not clear why it's related to everything I've said so far, but you can ask whether a given positive operator, given by such a tuple of matrices, can become doubly stochastic under an appropriate operation. Which operation? Here's the caricature of quantum leaps, and maybe some quantum people in the audience will be furious at this: you have something classical and you want to make it quantum, and what you basically have to do is increase the dimension by one and move from L1 to L2. So we have the matrix scaling problem. The input was a non-negative matrix; we generalize it to a positive operator, so now the input is really a tuple of matrices, it's like a tensor. We had a notion of doubly stochasticity, that the row and column sums are one; here it's the analog, it's written down here.
And the scaling was basically with vectors, with diagonal matrices; increasing the dimension again, you want to change it to a doubly stochastic operator. So the operator is the middle blue, you know, this tuple of matrices. And now you again pre- and post-multiply them by matrices, but now these matrices will be arbitrary invertible matrices. They are not diagonal: the usual quantization of a diagonal matrix is a general invertible matrix. You want to change basis on the left and on the right, and after this change of basis you want your operator to become doubly stochastic. Okay, so it's a simultaneous basis change, the same basis change on the left and on the right for all the matrices. So this is our task now: we are given a tuple of matrices, and we ask whether we can scale it (again, scaling means pre- and post-multiplying by invertible matrices) so that you get this weird doubly stochastic generalization. And as before, you remember, in the previous algorithm, when, say, the row sums were not one, it was easy to figure out how to make them one by pre-multiplying by a diagonal matrix; also here it's very easy to see: you basically divide by the right thing, only you get a square root, because every matrix appears twice. You don't have to read it; there is a natural scaling factor for both the rows and the columns, or the analogs of the rows and the columns. And once you realize this, you have the same algorithm. This is the algorithm: alternate. It's another problem where you can fix each part easily, but maybe not both at once. So you repeat some polynomial number of times: scale the rows, so to speak, to satisfy one of the conditions, or satisfy the other. Again, you don't have to read this; it is exactly the same analog, only with the appropriate scaling factor. And then, as before, you can test whether you got close to being doubly stochastic in the same sense.
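The alternating scaling just described can be sketched in a few lines of numpy. This is my simplified illustration (function names, the iteration count, and the tolerances are assumptions, not from the talk): the diagonal scalings of matrix scaling are replaced by inverse matrix square roots, alternately fixing the "row" marginal sum A_i A_i^T and the "column" marginal sum A_i^T A_i.

```python
import numpy as np

def inv_sqrt(M):
    """Inverse square root of a positive definite matrix."""
    w, V = np.linalg.eigh(M)
    return V @ np.diag(w ** -0.5) @ V.T

def operator_scale(As, iters=200):
    """Gurvits-style alternating scaling of a tuple of matrices."""
    As = [A.astype(float) for A in As]
    for _ in range(iters):
        L = inv_sqrt(sum(A @ A.T for A in As))   # fixes the "row" marginal
        As = [L @ A for A in As]
        R = inv_sqrt(sum(A.T @ A for A in As))   # fixes the "column" marginal
        As = [A @ R for A in As]
    return As

rng = np.random.default_rng(0)
Bs = operator_scale([rng.standard_normal((3, 3)) for _ in range(2)])
# for a generic (non-singular) tuple, both marginals approach the identity
assert np.allclose(sum(B.T @ B for B in Bs), np.eye(3), atol=1e-6)
assert np.linalg.norm(sum(B @ B.T for B in Bs) - np.eye(3)) < 1e-2
```

Note that after each half-step one of the two conditions holds exactly; the test at the end of the talk's algorithm is whether the other one also gets close.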
And it turns out, and this is the thing that I have to justify, that it will converge or not; if it converges, then you answer that the initial tuple of matrices is non-singular, and if it doesn't converge, then it's singular. So it's a complete, exact analog of what we saw before with the permanent. And again, if you want to analyze it, you need some progress measure: this algorithm defines an evolution of operators, you want to justify what happens here, and you need a progress measure. And the next key ingredient is one that Gurvits provided already in this old paper. He couldn't analyze this algorithm, but he provided basically all the ingredients, almost all the ingredients. So here is Gurvits's proposal of what to use as a progress measure. This progress measure is not the permanent, by the way; he did try a quantum version of the permanent, which is natural to define, and it's not good. And he noticed it's not good, so he defined another one, which is this expression, so let me read it. The capacity of an operator is the smallest ratio you can have between the determinant of an output and the determinant of an input when you feed it a positive definite matrix. Okay, it's how small this ratio can be. It turns out that this serves in the same capacity as the permanent did before, in that, and this is not difficult to prove, non-commutative singularity, at least when you think of it as rank decrease, is captured exactly by the capacity being zero versus being positive. And of course that's the reason to use it as a progress measure; as with the permanent before, it's completely analogous. How does the analysis go? You want an upper bound: it's at most one, and this turns out to be easy, almost as easy as in the previous case. And you want to prove something at every step.
If you're not too close, you go up by a factor of one plus one over n, and this also turns out to exactly mimic the previous proof: it's just one application of AM-GM, well, maybe it's two lines, not one, because the expressions are more complicated, but it's again the same kind of exercise with AM-GM. The last thing that Gurvits wanted to do but couldn't is prove the lower bound, namely that if you are positive, then you are not too small, you're only exponentially small. If you can prove this, that it's only exponentially small, then, since you increase by one plus one over n, in a polynomial number of steps you converge, if you can converge at all. Okay. But this proof of the last step occupies the main body of the paper from three years ago. Okay, so what's different? What's different is that this is where the talk stops being elementary. It turns out that in order to prove this you need tools from non-commutative algebra and from invariant theory that I will talk a little about. I've been going slowly, so I'll probably get through less than I planned, but I do want to show you something. So, I don't want to stress this again: all we need to prove is a lower bound on the capacity, assuming that it's positive, as before. I just want to note one thing about the algorithm. It's a polynomial-time algorithm for the problem we set out to solve, non-commutative singularity, but it does much more; in fact, I should thank those who helped us realize this. The progress measure before was the permanent, and it's hard to compute or even approximate, certainly deterministically. But in the case of capacity, it turns out that the same algorithm not only tests whether the capacity is positive or not, but if it's positive, it approximates it arbitrarily well, and this is very important for some of the applications.
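For the record, the capacity just discussed can be written, in my transcription of the slide, as

```latex
\operatorname{cap}(L) \;=\; \inf_{P \succ 0} \frac{\det L(P)}{\det P},
\qquad\text{where } L(P) = \sum_{i=1}^{m} A_i P A_i^{\mathsf{T}},
```

and the dichotomy used in the analysis is that cap(L) = 0 exactly when the symbolic matrix is non-commutatively singular, while the hard lower bound says that if cap(L) > 0, then it is at least exponentially small in n.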
Okay, so you've seen that a symbolic matrix can have another interpretation; I want to show you some other interpretations that are used in the proofs and in the applications. We started with a tuple of matrices representing a symbolic matrix. And we already saw that it can be viewed as representing a completely positive operator; that's the connection to quantum information theory. In non-commutative algebra, a tuple of matrices turns out to be exactly an element of this free skew field I was talking about: it can represent the analog of a rational expression in non-commuting variables. Again, it's a complicated beast: in the commutative case an element is a ratio of polynomials, and now it's something that's described by tuples of matrices. But anyway, it's very important to realize this connection; it was important for the analysis. And let me say one more thing: it also captures the analog we discussed, the non-commutative singularity, of the PIT problem, polynomial identity testing, or rather rational expression identity testing, in the non-commutative setting. So if you are given it as an input, the question is: is this element zero in the free skew field? Another way to think of a tuple of matrices, especially given the algorithm, which evolves by left and right basis changes, is as a point in an orbit of this group action of changing basis on the left and on the right. And this is absolutely crucial to the analysis. For an application of the algorithm, it turns out that it's very useful to think of a tuple of matrices as projections (I'm not explaining how) in what are called the Brascamp-Lieb inequalities. It's a family of inequalities that are very important in analysis; they capture basically every inequality you know, from
Cauchy-Schwarz and Hölder and Shearer to Brunn-Minkowski and Loomis-Whitney and, you know, hypercontractive estimates; they are all captured here. And it turns out that this tuple represents such an inequality, and the algorithm will tell you whether an inequality like this is valid, and what's the optimal constant, the tightest inequality that you can get. This is not explained here; I'm just telling you that it has this interpretation. And because of this connection to Brascamp-Lieb inequalities, it turns out that you can view such a tuple of matrices as describing a linear program with exponentially many facets. That's another view. And the beauty of this algorithm is that it puts all the associated computational problems in P. These interpretations are not only used in the analysis (some of them are), but they also give consequences in all these fields: we get new polynomial-time algorithms for pretty basic questions. So let me tell you, I have maybe five minutes left, a little about the connections that come into the analysis from non-commutative algebra and invariant theory. Okay, so I want to explain the way the formulation of capacity leads us to connections in non-commutative algebra and invariant theory, after we've seen the quantum information view. For this, we really want to think as follows. Here we have the initial tuple of matrices A_1 through A_m, and these were n-by-n matrices. And imagine you have another tuple of matrices D_1 through D_m of some other size; they are d-by-d matrices. And a key feature, just a second, yeah, a key feature is trying to understand what happens when you compute the determinant of the following expression.
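The expression on the slide, in my transcription, is

```latex
\det\!\Big( \sum_{i=1}^{m} A_i \otimes D_i \Big),
\qquad A_i \in M_n, \quad D_i \in M_d .
```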
You take the tensor product of A_i and D_i, the corresponding matrices in the two tuples, you look at the sum over i, and you consider the determinant of this. I don't know why it disappeared, but we'll see it again. So, in the case of quantum information theory, this expression is related to capacity: that's what you have to understand when you study the capacity of a tensor product of two operators. And for the analysis, one thing that's needed is to understand the capacity of tensor products of our original operator with other operators. And there's a simple inequality, written down there (we will not get into it), that is used in the analysis; it's a very simple thing to prove, but that's one place where this expression arises. It turns out this determinant of the sum of A_i tensor D_i also arises in non-commutative algebra. And how does it arise there? It arises as an infinite sequence of identities: you want this determinant to be zero for all possible choices of the tuple D, so for all sizes, little d, and for all choices of matrices of that size. Okay, so what I've written here is a sequence of identities. And it turns out that this sequence of identities holds exactly if the symbolic matrix represented by the A_i is a zero element in the free skew field. So that's an access to the non-commutative singularity problem by commutative means: now you are looking at normal determinants, only the determinants have coefficients which are matrices, as opposed to scalars. And in invariant theory you see the same exact expression: you just look at the determinants of the sum of A_i tensor D_i, again for all possible matrices D_i of all possible sizes. And now you think of the A_i not as matrices of scalars but as matrices of variables, commuting variables.
Then these are polynomials, infinitely many polynomials. These polynomials turn out to be generators of the invariant ring, of all the invariant polynomials under this group action. It's very helpful, and probably you don't yet see what it is, but this is basically a description of the most fundamental object related to the group action we discussed, namely left and right basis change on tuples of matrices. The important object of study for any group action is the invariant ring, this ring of invariant polynomials, and in this particular case the invariant polynomials are exactly these determinants. And the connection between the three is exactly the connection that makes the proof work. Oops, yeah. So, in all cases we need to understand this expression. Yeah, you have to understand this expression, and now I want to go to just the final slide, because I'm out of time; let me find my final slide. Just summarizing what we've seen, the general things that appeared in this talk: of course, what we know and love, there's a fantastic interaction of mathematics with the design and analysis of efficient algorithms. We got a particular analysis of this alternating-minimization-type heuristic in this particular case, and it turns out, in what I didn't show you, that there are many other cases. There are many incarnations of symbolic matrices: you can interpret a tuple of matrices in many ways. The algorithm I showed you is sort of striking: the original question is obviously algebraic, you are asking about the singularity of a matrix of variables, I mean, a clearly algebraic question, but the solution is entirely analytic, numeric. And on the other hand, this analytic, continuous-type algorithm, in order to analyze it, we have to go back to algebra, to deep algebra, and use it in the analysis.
As I said, the commutative singularity problem is wide open, as it was 50 years ago when Edmonds asked it, and there are very good reasons to study it. The algorithm I've given you seems to solve non-convex problems and other types of problems, like exponential-size linear programs; where else can it be useful? And more generally, we have extensions of the analysis of alternating minimization to other group-theoretic settings, but there are many, many other problems for which it is used and we don't have a convergence analysis. So it would be really interesting to understand. All right, thanks. Thank you, Avi. And let's see if there are any questions. And before people leave, I think Avi suggested maybe giving five more minutes, if you still have the time, for some more details. Maybe five, maybe 15, if we want to make it something meaningful. But maybe we first ask questions, and then whoever wants to hear more can stay. Great. So let's start with some questions. Anyone? We have to unmute first. I have to, or you have to? Can you hear us? Yes. Two questions. One question is: imagine I run alternating minimization and it converges. Can I recover the perfect matching, if it exists, from the result? I mean, obviously it can be done by, you know, running this several times, but if I run it just once, can I recover it? And my other question is: is there a simpler or faster algorithm for the non-commutative problem if we can randomize, if we don't have to be deterministic? Okay, so both good questions. For the first question: I don't know a better algorithm than what you suggested, simply the usual reduction from search to decision. In order to find the perfect matching using this algorithm, you can just eliminate edges and see whether they're in the matching or not. What you can do with randomization, actually, is use isolation.
It turns out that if the matching is unique, then what will survive at the end of this algorithm, after the scaling, will be the permutation matrix that defines the matching. So that's the answer to the first question. The second question, about operator scaling, okay, remind me now, I focused on the first one. The second question: if we use randomization for this problem, can we get a faster or simpler algorithm? Yes, very good. So let me remind you of this expression that we looked at, the determinant of the sum of A_i tensor D_i. Okay, I'll just pull up the slide. The one thing I didn't say is that, in the last two expressions, where you have a universal quantifier over all possible matrices, a very basic question is whether you need this whole infinite set of identities, or generators, or not. And a really fundamental use of results in invariant theory is bounding how big a d, you know, the size of the matrices D_i, needs to be so as to capture all the others. It turns out that there's always a finite bound, and the best bound known today is that d can be linear in n. Okay, d can be linear. This is a result of Derksen and Makam. Because of this, you can reduce the non-commutative singularity problem to evaluating determinants only for D_i which are of size linear in n. And you don't need to check them all: if you allow randomization, you can simply pick them at random and use Schwartz-Zippel in exactly the same way. So the answer is that today there's a simple probabilistic algorithm to do this. At the time we discovered the algorithm, the best upper bound on d was exponential in n, and then we couldn't use it; it was not even known that there is a randomized algorithm. But today we do know, and this discovery is very important, because it's used in many subsequent results, and it uses the algebraic algorithm, which I didn't describe, of Ivanyos, Qiao and Subrahmanyam. Yeah, I just wanted to add to the first question. Can you hear me?
Yeah. So I think, following up on the first question, of Grigory: given a perfect scaling that gives you a doubly stochastic matrix, you can interpret it as a fractional matching, a perfect fractional matching. And given a perfect fractional matching, the algorithm of Goel, Kapralov and Khanna gives a matching in linear time. In fact, you are right; even the algorithm of Lev, Pippenger and Valiant, which is much older. Yes, yes, even in parallel. You're absolutely right. So yeah, you will get the perfect fractional matching; you will get only the edges that belong to some perfect matching. Absolutely, that's good. Can you ask the same question in the non-commutative case? Can you find the PSD matrix that minimizes it, the capacity? Yes. And I'm not sure what the answer is. Yeah, I'm not sure; it's a good question, I really don't know what the answer is. You can ask this question; it's a good idea to ask it. Great. Thanks. Any other questions? Okay, but it seems like everyone's staying and waiting for some more. There is more. Okay, so feel free to go ahead if you have more time. Yeah, sure. So we can go on and I'll tell you just a little more. Thank you. I'll tell you about the recent paper, and there's even follow-up, but I want to tell you that we understand these questions better and better. So I'll tell you about the generalization of operator scaling to tensor scaling, somehow unifying both matrix scaling and operator scaling. It will also tell us something that we didn't look for, namely where these scaling algorithms are coming from. So here's what I mean. This unification will involve, as we've seen before, actions of linear groups on tensors, and we've seen two examples of this. We've seen matrix scaling, where the goal was scaling a matrix into a doubly stochastic one, and this involved actions by diagonal groups, diagonal matrices.
And we saw operator scaling, where we wanted to scale a tensor, and the actions were by the general linear group. So here are the algorithms in both cases: they apply a sequence of left and right matrices that are easy to compute, and test convergence. That's how the algorithm looks over here. We had a problem; the problem was scaling. I mentioned before that the scaling problems were invented for good reasons: Sinkhorn had his reasons for inventing matrix scaling, and Gurvits had his reason to invent operator scaling. So scaling came as a goal. And the solution, achieving this goal, turned out to involve groups and group actions. And then the analysis of the algorithm involved minimizing some potential function, which we've seen in one case was the permanent and in the other case was the capacity. That was the logical order in this problem. And now, what we are going to do with a lot of hindsight is swap the analysis and the goal. We are going to set as our goal minimizing some potential function; this will be our goal. And voila, like some magic, the scaling will arise spontaneously. Okay, and I want to tell you how, and it involves, again, the invariant theory view. So here is the setting. It's just a natural generalization of the setting we've seen so far, and it's a natural setting also for alternating minimization algorithms. This is work with Bürgisser, Garg, Oliveira and Walter. So we have a product of groups, maybe not two; we had two so far, maybe we have some k groups. In what I'll describe, they'll just be invertible matrices of determinant one. And they act on a vector space, just a product of vector spaces. How do they act? Just like matrices act on vectors, like matrix-vector multiplication: in every coordinate, the appropriate group element multiplies the appropriate vector. That's the action; it's the most natural action of a tuple on a tuple.
Okay, so we have this, and now we may be given one vector in this space and asked to minimize the following function. You can think of it as a generalization of capacity. We are simply asked to minimize, to look for the smallest value we can achieve by applying this group to this vector. Okay, just minimize the L2 norm. The question is: find the lightest vector, the infimum, find the lightest vector v* in the orbit of v under this action. A group action defines an orbit, everything you can get to by applying a group element to your vector, and the question is: what's the lightest vector you can get to? This is, in some sense, a canonical element of the orbit. Okay, it's not even clear that it's unique, but it is. It's an optimization problem, a non-convex optimization problem. And you can be interested in it or not, so you can ask: why is it interesting? Well, of course, we know that it's interesting because even for the case of dimension two, it captures matrix scaling and operator scaling. Okay, and then you can go to higher dimensions. In the case of three, it turns out to capture a very famous problem in invariant theory and in algebraic combinatorics, in the study of representations of the symmetric group, called the question of Kronecker coefficients. I'm not going to explain it; it captures some aspect, the asymptotic positivity of Kronecker coefficients. And when you go even higher, to any k, so you have k parts to this vector, this captures what's called the entanglement distillation problem, or sometimes the one-body marginals problem, which maybe some Waterloo people and other people in this audience probably know and maybe even love. I'm not going to describe it again.
It's just because of time, but it's a basic problem of testing, given a quantum state that is distributed across many, many parties, many agents, whether you can move it to another state in a way that, let's say, increases or maximizes the entanglement between the different parts. Anyway. Good question: can you explain the problem again? What do you mean by G_i acts on V_i? You multiply the matrix by the vector. So I guess I'm confused: why isn't the infimum always zero? You can always shrink the vector arbitrarily. Because I said it's SL(n), so it's matrices of determinant 1. Yeah, but you can shrink one direction and expand some other orthogonal direction, if it's independent in all the coordinates. I'm missing something basic. No, no, you are missing something, but it's good that you are asking, because I want everything to be understood. The input is a tensor. The input is a tensor, right? The input is in V_1 tensor V_2 and so on; this is my notation. I wish I knew how to make a tensor symbol; I'll send you the character, it's a Unicode code. Go ahead. Okay, but let me say it so it's completely understood. The input v is a tensor, a k-dimensional tensor. The action of one component is just the matrix-vector action on all the fibers of the tensor in that direction. Like we did before, with operator scaling: when we acted by a matrix on one side, what we did is basis-change all the columns of the matrices, right? And on the other side, it changed all the rows of the matrices. Okay, so here we change all the fibers in every possible direction. Thank you. That's a very good clarification.
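To fix notation for what follows, the optimization problem under discussion is (in my notation)

```latex
\operatorname{cap}(v) \;=\;
\inf_{(g_1,\dots,g_k) \in G} \big\| (g_1,\dots,g_k) \cdot v \big\|_2 ,
\qquad G = \mathrm{SL}(n_1) \times \cdots \times \mathrm{SL}(n_k),
```

where v lies in the tensor product of the k spaces and each g_i acts on the corresponding direction of the tensor.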
And the last reason, really the reason people have been asking this for many, many years, since, I don't know, Mumford invented geometric invariant theory, is that when this infimum is zero, and it's a very special and very important special case, they call v a member of the null cone of this group action, which really means just that you can push this vector to zero by the group. And if it's not, okay, we'll talk about not zero in a second. So it's a very natural object to look at in geometric invariant theory, this optimization problem. Okay, and so I've just written it again: this potential, this infimum, is zero if you can push the vector to zero by the group, and a very old result in this field, by Kempf and Ness, tells you what happens when it's not zero. You want a duality theorem. We know when it is zero: we can certify that it's zero just by supplying a sequence of elements of the group that makes the vector, the tensor, smaller and smaller. We can ask: how can you certify that it's not zero, the NP-intersect-coNP kind of thing? Well, it turns out that you can write what you may call a non-commutative generalization of the Lagrange conditions to characterize the minimum of this objective function. And voila, what comes out is a bunch of conditions that look exactly like doubly stochastic, but it's really not doubly, now it's dimension k, k-way stochastic. They look exactly like the operator scaling conditions in k dimensions. And for those who like quantum information theory, they really say that all the marginals should be the identity. That's what comes out, whether you want it or not. So in other words, scaling arises as a dual condition, the dual optimality conditions of the case where this generalized capacity is not zero. So, you know, even if you don't want it, you find it.
So scaling arises naturally from this optimization. Now, we already have scaling algorithms; in fact, scaling algorithms suggest themselves for this kind of problem, just like we've seen. So we can now think of this L2 norm as, how should I say, a potential function for the natural algorithm that tries to compute this optimum. And in fact, that's what we do. I want to mention that very recently, in fact it's not completely written, we have another generalization of this work where, instead of scaling to the uniform case, as they call it, to doubly stochastic, or to identities, which are just the same in every direction, you can give other constraints and try to scale to them: suppose you're doing matrix scaling, you may try to scale a matrix so that its row sums are one prescribed vector and its column sums are another prescribed vector. This question was studied a lot, and there is a natural generalization here to which everything I'll say applies. At any rate, what we can prove in this paper is exactly the same; in fact, you run the same algorithm: you just repeat, you move from coordinate to coordinate, you scale in that direction, and you repeat a polynomial number of times. You can tell whether it can be scaled, in the sense that it can approach the scaling conditions, in polynomial time in all parameters. I'll say, about all parameters: both before and here, it's polynomial in one over epsilon, given any epsilon. It turns out that it's really very desirable to get log of one over epsilon, but I will not talk about this now. And the key message here is that it's the same analysis. It's exactly the same analysis you've seen before. You just have to prove the hard thing, namely that if this functional is positive, then it's not too small. And the way you do it, if I'm going to make a joke out of it, is "just use invariant theory." That's what we want to prove.
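The "move from coordinate to coordinate and scale" loop can be sketched for the uniform target. This is my numpy illustration (function names, the sweep count, the tolerance, and the random 2x2x2 instance are assumptions, not from the talk): in each direction we compute the one-body marginal and apply its inverse square root along that direction.

```python
import numpy as np

def inv_sqrt(M):
    w, V = np.linalg.eigh(M)
    return V @ np.diag(w ** -0.5) @ V.T

def marginal(T, k):
    """One-body marginal of tensor T in direction k (a PSD matrix)."""
    M = np.moveaxis(T, k, 0).reshape(T.shape[k], -1)
    return M @ M.T

def tensor_scale(T, sweeps=500):
    """Move from direction to direction, scaling toward uniform marginals I/n_k."""
    T = T / np.linalg.norm(T)
    for _ in range(sweeps):
        for k in range(T.ndim):
            g = inv_sqrt(T.shape[k] * marginal(T, k))   # target I / n_k
            T = np.moveaxis(np.tensordot(g, np.moveaxis(T, k, 0), axes=1), 0, k)
            T = T / np.linalg.norm(T)
    return T

rng = np.random.default_rng(0)
S = tensor_scale(rng.standard_normal((2, 2, 2)))
# for a tensor outside the null cone, every marginal approaches I / n_k
assert all(np.linalg.norm(2 * marginal(S, k) - np.eye(2)) < 0.1 for k in range(3))
```

For k = 2 this recovers matrix scaling (up to the square roots) and for operator tuples it is operator scaling; that is the unification being described.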
And let me not get into this, but it uses this important connection, the characterization of vectors in the null cone using the invariant polynomials, which I mentioned before. So there's a group action; this group action has an invariant ring, generated by a bunch of polynomials, multivariate polynomials. And a good description of these polynomials, especially a description with low coefficients and low degree, if you can get it, can be used to bound the running time of this algorithm. And that's exactly what we do, using techniques of Cayley and others. So that's where, if you've done all this work, the analysis becomes just one line. In the last five minutes, let me just tell you, and you can ask me questions about this, about some other settings of alternating minimization, just to show you that you know these problems and you know these algorithms, and we simply have no way, currently, of analyzing them. They are not group-theoretic; they are different. But I think maybe there's hope, and we should look again, and more seriously, at these. So I'll give you three problems very quickly. Again, alternating minimization appears everywhere: statistics, optimization, machine learning. They all have the same structure. There is a global function F of many parameters; we want to, let's say, optimize it, minimize it or something, over some domain. In our setting, the z_i's were group elements. This is a complex problem, but it turns out that every one-dimensional special case, for any particular coordinate you want, if you leave the others fixed, is an easy problem, a simple problem; or, in the quantum case, it's a local problem: one agent can do it by himself, though he can only change the state by applying a local transformation. So here are some examples that we would love to understand.
One example, a very old example, is von Neumann's. These problems are all non-convex in general, by the way, as you'll see. So, what's the distance between two convex sets? I give you two convex sets with a natural description, and you want the minimum distance. You can do what he called alternating projection: start with any point in one set and project it onto the other, that is, find the closest point to it in the other set. That is a convex problem, right? Then find the closest point to that one back in the first body, and the closest again, and the closest again. And the question is: does it converge? This is an obvious alternating minimization algorithm. It always converges for convex bodies; the question is how fast it converges. Von Neumann had a beautiful theorem that it converges for closed subspaces of a Hilbert space, but we don't really have a good characterization of the general cases for which it converges fast. I'm not going into that. That was one problem. It's natural; maybe you've never thought about it, but it's natural. Here's a problem I'm sure many of you have seen. This is a Markov chain Monte Carlo problem: Gibbs sampling, or the Metropolis algorithm. Why is this an alternating minimization algorithm? Suppose you want to sample from a complex probability distribution over many parts, let's say k parts. Think of colorings of a graph with k vertices; the vertices are the parts, every part can take one of some number of colors, say one of three colors, there are constraints, and you want to sample from this distribution. Let me illustrate it with just two parts. Or, just finishing the story about coloring: you know how it goes, you keep recoloring a vertex, jumping from one vertex to another. So it's an alternating minimization algorithm in an obvious way.
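Von Neumann's alternating projection, described above, can be sketched on a toy pair of convex sets: a unit disc and a half-plane. The sets, the starting point, and the step count are arbitrary choices for illustration, not anything from the talk.

```python
import math

def proj_disc(p, center=(0.0, 0.0), radius=1.0):
    """Project p onto a closed disc: the closest point of the disc to p."""
    dx, dy = p[0] - center[0], p[1] - center[1]
    d = math.hypot(dx, dy)
    if d <= radius:
        return p
    return (center[0] + dx * radius / d, center[1] + dy * radius / d)

def proj_halfplane(p, x_min=2.0):
    """Project p onto the half-plane {(x, y) : x >= x_min}."""
    return (max(p[0], x_min), p[1])

def alternating_projection(p, steps=100):
    """Von Neumann's scheme: project back and forth between the two sets."""
    for _ in range(steps):
        p = proj_halfplane(p)  # closest point in the half-plane
        p = proj_disc(p)       # closest point back in the disc
    return p
```

Here the iterates converge to (1, 0), the point of the disc closest to the half-plane, so the distance between the two sets is realized between (1, 0) and (2, 0). For these convex sets convergence is fast; the hard question raised in the talk is characterizing when and how fast this happens in general.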
So for a distribution over two parts, you want to sample, and you can illustrate the same point. You just repeat, and that's what happens in the general Metropolis algorithm: you keep re-sampling one coordinate, fixing all the others. In this case, you fix one of the two coordinates and sample the other from its conditional: sample P given Q, then sample Q given P. These are simple steps. And the question, again, is whether it converges, and how fast, and we know that's a hard problem in general for Markov chain Monte Carlo. In this particular case of two parts, it turns out, in the paper of Diaconis et al., to be very much related to the von Neumann problem of the distance between two sets; they're very related. And the last one you know is Nash equilibrium and the famous Lemke-Howson algorithm. Again, a problem for which we don't know an efficient algorithm. What's the problem? We have a two-player game, so it's just two matrices that give the payoffs to the players. The pure strategies of one player are the rows and of the other the columns, and strategies are distributions over them. A Nash equilibrium is two strategies, one for each player, such that each is a best response to the other. There is a formula for what it means to be a best response: you don't want to change your strategy given the other's. And you may want to find one; that's an important thing in economics. And Lemke-Howson, roughly speaking (it's more clever and more specific than what I'll say), is basically alternating best response. You start from some strategy P1 for the first player, then Q1 is a best response to P1, then P2 is a best response to Q1, and so on. You just repeat this. And in fact, it converges, and people use it, but there are well-known cases where it converges in exponential time, and it would be nice to find better algorithms. It would also be nice to generalize it to more players; maybe it was done, and so on.
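The two-part Gibbs sampler just described, re-sampling each coordinate from its conditional given the other, can be sketched on a made-up 2-by-2 joint distribution; the table P, the seed, and the step count are arbitrary illustration choices.

```python
import random

# Toy joint distribution over two binary coordinates: P[x][y].
P = [[0.40, 0.10],
     [0.20, 0.30]]

def gibbs_step(x, y, rng):
    """One sweep: re-sample x from P(x | y), then y from P(y | x)."""
    col = P[0][y] + P[1][y]
    x = 0 if rng.random() < P[0][y] / col else 1
    row = P[x][0] + P[x][1]
    y = 0 if rng.random() < P[x][0] / row else 1
    return x, y

def gibbs_sample(steps, seed=0):
    """Run the two-coordinate Gibbs sampler and return empirical frequencies."""
    rng = random.Random(seed)
    x, y = 0, 0
    counts = [[0, 0], [0, 0]]
    for _ in range(steps):
        x, y = gibbs_step(x, y, rng)
        counts[x][y] += 1
    return [[c / steps for c in row] for row in counts]
```

Since every entry of P is positive, the chain is ergodic with stationary distribution P, so the empirical frequencies approach the table; how fast such chains mix is exactly the hard question mentioned above.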
But that's another setting where alternating minimization appears, and there are many other places and many other ways; I'm just illustrating the breadth of this heuristic. Question? We can hear you, Avi. There's a question here. Yeah, go ahead. Could you go back to the Nash equilibrium slide, where you were characterizing the Lemke-Howson algorithm as a kind of best-response dynamics? I'm a bit confused about that, because in general best-response dynamics does not converge. Just think about rock-paper-scissors, for example. No, it does not; I agree with you. It does not converge. I said that the Lemke-Howson algorithm is much more clever and more detailed than what I've shown in this caricature. It picks particular best responses: there may be many best responses, and it picks one cleverly so that there will be progress, and in that case it does converge, sometimes in exponential time. It's not that any best response will converge. I mean, for example, in the context of rock-paper-scissors, when one player starts with rock, there's a unique best response: play paper with probability one. And then there's a unique best response to that: play scissors with probability one, and so forth. But that cycles. I agree, so you have to do something else. Okay, I see your point. Okay, good. Very good. So what you're saying is that Lemke-Howson is not quite this; you're absolutely right, that's an oversimplification, it's not just best response. In many cases it looks like best response. Maybe I should remove Lemke-Howson there and just ask whether this algorithm converges. Lemke-Howson is more clever. So this is a natural heuristic for finding a Nash equilibrium, and as you just demonstrated, sometimes it does not converge at all, sometimes it converges slowly, and sometimes it converges quickly, and one can try to understand that.
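The questioner's rock-paper-scissors point can be checked mechanically: with pure strategies, each move has a unique best response (the move that beats it), and alternating best responses cycles forever instead of converging. A minimal sketch:

```python
# Pure-strategy best responses in rock-paper-scissors:
# the unique best response to each move is the move that beats it.
BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

def best_response_dynamics(start, rounds):
    """Alternate pure best responses between the two players;
    returns the sequence of moves played."""
    seq = [start]
    for _ in range(rounds):
        seq.append(BEATS[seq[-1]])  # each player in turn best-responds
    return seq
```

For example, `best_response_dynamics("rock", 6)` gives rock, paper, scissors, rock, paper, scissors, rock: a cycle of period three with no fixed point, which is why (as discussed above) Lemke-Howson must do something more clever than naive best response. The unique Nash equilibrium here is the mixed strategy playing each move with probability one third.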
And as you point out, Lemke-Howson does something else in situations like rock-paper-scissors. That's a very good point; I just have to remove Lemke-Howson from this slide. Thanks. Are we going on, Avi? I can say a little bit about non-commutative algebra, what skew fields are; that would be one slide. And I can say a little bit about invariant rings; that would be two slides. So I estimate maybe 15 minutes, but I don't know, I can go on forever. Everybody knows by now that I can talk as long as people are listening. But as you wish; you are the boss. So let me take us offline at least, but people are welcome to stay and chat here. Let me just thank you again; it was a great talk. And I'm hoping to see everyone again in two weeks for Dor Minzer's talk. So again, thank you, Avi. Let me just take us offline, otherwise Google will hate us for spending so much of their bandwidth, but you're welcome to stay.