Right, so we're talking about something that's loosely related to the other lectures, though not so closely related: the Lindeberg exchange method, which is a tool for getting some universality results. It doesn't get you all the universality results that you'd like; you often need to combine it with some other universality technique to get the best results, and we'll talk about that in a little bit.

So the general principle of this method is the following. You have a bunch of independent random variables, and you're interested in some statistic of them: maybe you take some combination of these random variables and then take the expectation of some function of them, which can be a linear or nonlinear function, and you want to understand what this expectation is. What the exchange method does is exchange these random variables for a different set of random variables, which maybe you understand better, but which are similar to the originals; they're usually related to the original random variables by some matching moments. For example, their means might be the same, and maybe the variances are the same; in the random matrix applications we often need a few more matching moments than just the first two. Then what you can often do with the exchange method, if you can check certain conditions, is compare this statistic of one set of random variables with the same statistic of the other set. What this tells you is that the statistic is universal, in the sense that it doesn't depend on the precise distribution of each of these inputs, as long as they're independent and as long as you keep certain moments fixed. So this gives you a certain type of universality, unfortunately with a moment matching condition, which is often not optimal by itself; you have to combine it with other universality techniques to get the best range of universality.

Okay, so that's the general description of the method, but maybe the easiest way to explain it is with an example, and the most classical example is by Lindeberg himself in the 1920s, when he proved the central limit theorem. You all know this theorem. If you have i.i.d. random variables $X_1, \dots, X_n$, all with the same distribution as some real random variable $X$ of mean zero and variance one, and you form the sum and normalize by $\sqrt{n}$, then
$$ \frac{X_1 + \dots + X_n}{\sqrt{n}} $$
converges in distribution to the standard normal distribution $N(0,1)$.

Another way of saying this is that, given any test function $f$, which you can take to be smooth and compactly supported, if you look at the linear statistic
$$ \mathbf{E}\, f\!\Big( \frac{X_1 + \dots + X_n}{\sqrt{n}} \Big), $$
where you take this smooth function of the empirical average and then take the expectation, then this converges to $\mathbf{E}\, f(G)$, where $G$ is any random variable with the distribution of the standard Gaussian. This is an equivalent form of the central limit theorem. Normally, when you say convergence in distribution, you look at the probability that the normalized sum is less than some threshold, which corresponds to taking $f$ to be an indicator function. That isn't smooth, but you can approximate it by smooth functions, and because the Gaussian has a smooth distribution you can easily see that the two formulations are equivalent.
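As a quick illustration, here is a minimal numerical sketch of this test-function formulation. The particular test function and the centered-exponential choice for $X$ are illustrative assumptions, not anything from the lecture:

```python
import numpy as np

# Monte Carlo check of the smoothed CLT: E f(S_n) should approach E f(G),
# where S_n = (X_1 + ... + X_n)/sqrt(n) and G is standard Gaussian.
rng = np.random.default_rng(0)

def f(t):
    # a smooth, rapidly decaying test function (an arbitrary choice)
    return np.exp(-t**2) * np.cos(t)

n, trials = 1000, 1_000_000
# X_i: centered exponential, so mean 0 and variance 1 but very non-Gaussian.
# A sum of n Exp(1) variables is Gamma(n, 1), so we can sample S_n directly.
s_n = (rng.gamma(n, 1.0, trials) - n) / np.sqrt(n)
print("E f(S_n) ~", f(s_n).mean())
print("E f(G)   ~", f(rng.standard_normal(trials)).mean())
# the two printed values should agree to about two decimal places
```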
So this is one formulation of the central limit theorem. Okay, so here's the way Lindeberg proves this theorem. First of all, a technical remark: there's a standard truncation argument. Right now the random variable only has two moments; its mean is zero and its variance is one, and all the higher moments may be infinite. But without loss of generality you can truncate, and assume for example that the third moment is finite. So this random variable could have heavy tails, but this statistic is very stable, and so you can apply truncation methods. For example, if the third moment is infinite, you can truncate: replace the variable with, say, a bounded random variable plus an error which has small variance, and the error will not contribute very much to the sum, just from Chebyshev's inequality. So it's very easy to truncate. It changes the mean and variance a little bit, but that also, one can show, doesn't cause too much damage. This standard truncation argument is in the notes; I'm going to skip it. So you can assume without loss of generality that you have a finite third moment here, which will become important very shortly.

Okay, so normally the standard proof of the central limit theorem proceeds by Fourier analysis. The textbook proof of the CLT represents this sort of statistic in terms of the characteristic function, things like $\mathbf{E}\, e^{it(X_1 + \dots + X_n)/\sqrt{n}}$, and the point is that, because the exponential of a sum is a product of exponentials, this factors very nicely, and because of independence everything splits very nicely; you can compute this, and then you just need a certain amount of Fourier analysis to get from this back to the central limit theorem. But this is a rather special technique. It works very well for the CLT; it gives extremely sharp results for the CLT. But it relies very much on the fact that you're taking a linear statistic, in order to split this exponential. If you had a nonlinear function here, this technique doesn't work nearly as well, and the statistics we're interested in in random matrix theory are very nonlinear, so we don't use this method much in random matrix theory.

So instead, here's what we do. These random variables $X_1, \dots, X_n$ are not Gaussian, but what you do is introduce some new random variables $Y_1, \dots, Y_n$ which are Gaussian. They're also independent: independent of each other, and we can also assume them independent of our original variables, so everybody's independent of everybody else. And these new random variables have the same mean and variance as before: mean zero and variance one. So they have the same first two moments as your original random variable. The third moment may differ, of course: Gaussians have third moment zero, but we assume nothing about the third moment of the original random variables; it could be anything.

Okay, so the thing is that the central limit theorem is very easy to prove for Gaussians. We already know that a sum of independent Gaussians is again a Gaussian.
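(To see the Fourier mechanism mentioned a moment ago in one concrete line: for a Rademacher variable, an assumption made purely for this illustration, the characteristic function factors explicitly and converges to the Gaussian one.)

```python
import numpy as np

# For Rademacher X (P(X = +-1) = 1/2) we have E exp(itX) = cos(t), so
# independence factors the characteristic function of the linear statistic:
#   E exp(it (X_1+...+X_n)/sqrt(n)) = cos(t/sqrt(n))**n  ->  exp(-t**2/2).
t = 1.7  # an arbitrary frequency
for n in (10, 100, 10_000):
    print(n, np.cos(t / np.sqrt(n))**n, np.exp(-t**2 / 2))
```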
So, you know, for the Gaussians this statistic is in fact exactly equal to the right answer:
$$ \mathbf{E}\, f\!\Big( \frac{Y_1 + \dots + Y_n}{\sqrt{n}} \Big) = \mathbf{E}\, f(G). $$
So for Gaussian random variables we've already essentially won; it's just an algebraic calculation. It's nothing more than the fact that a sum of independent Gaussians is Gaussian, and you just keep track of the mean and variance.

So all we need to do to finish the proof is to show universality, in the following sense: we need to show that
$$ \mathbf{E}\, f\!\Big( \frac{X_1 + \dots + X_n}{\sqrt{n}} \Big) - \mathbf{E}\, f\!\Big( \frac{Y_1 + \dots + Y_n}{\sqrt{n}} \Big) \to 0 \quad \text{as } n \to \infty. \qquad (\ast) $$
If you can show universality, that the $X$'s and $Y$'s can be exchanged with each other at negligible cost to the statistic, so that the statistic is universal, then you're done, because the Gaussian statistic was already the right answer; this gives you the central limit theorem. The nice thing about this formulation is that you no longer see the limiting distribution, I mean $G$; you only need to understand universality.

Okay, so the way Lindeberg proves this is by the exchange method. Here you're swapping all the $X$'s with all the $Y$'s, but rather than swap all $n$ $X$'s for all $n$ $Y$'s at once, you can decompose this big swap into $n$ little swaps, where in each swap you only exchange one variable at a time. So the left-hand side of $(\ast)$ telescopes: you can write this big difference as the telescoping series
$$ \sum_{i=1}^{n} \left[ \mathbf{E}\, f\!\Big( \frac{Y_1 + \dots + Y_{i-1} + X_i + X_{i+1} + \dots + X_n}{\sqrt{n}} \Big) - \mathbf{E}\, f\!\Big( \frac{Y_1 + \dots + Y_{i-1} + Y_i + X_{i+1} + \dots + X_n}{\sqrt{n}} \Big) \right], $$
where each term is a partial swap in which $i-1$ of the $X$'s have been switched to $Y$'s, and you subtract off the same thing where $i$ of the $X$'s have been switched. This is a telescoping series, and it clearly sums to the big difference. So you're just doing the swap sequentially, one variable at a time, and keeping track of the delta, the change from each swap. Basically the strategy is to understand these small differences as best you can, and then sum them up using the triangle inequality.
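The telescoping identity itself holds pathwise, before taking any expectations, so it can be checked mechanically. Here is a small sketch; the test function and distributions are again my own choices:

```python
import numpy as np

# Lindeberg's telescoping decomposition: replacing the x's by y's one
# coordinate at a time, the sum of single-swap differences reproduces the
# total difference exactly.
rng = np.random.default_rng(2)

def f(t):
    return np.exp(-t**2)            # a smooth test function

def stat(v):                        # f of the normalized sum of one sample
    return f(v.sum() / np.sqrt(len(v)))

n = 50
x = rng.exponential(1.0, n) - 1.0   # mean 0, variance 1, non-Gaussian
y = rng.standard_normal(n)

total = stat(x) - stat(y)
swaps = 0.0
for i in range(n):
    before = np.concatenate([y[:i], x[i:]])          # i coordinates swapped
    after = np.concatenate([y[:i + 1], x[i + 1:]])   # i + 1 coordinates swapped
    swaps += stat(before) - stat(after)

print(total, swaps)                 # identical up to floating-point rounding
```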
Okay, so just for the sake of explicitness, let's do the first swap: the effect of swapping $X_1$ to $Y_1$, keeping everything else the same, comparing $\mathbf{E}\, f\big( (X_1 + X_2 + \dots + X_n)/\sqrt{n} \big)$ with $\mathbf{E}\, f\big( (Y_1 + X_2 + \dots + X_n)/\sqrt{n} \big)$. The point now is that these arguments are very close to each other. I mean, the full average in one could be quite different from the full average in the other, but these variables are almost bounded, they have finite third moment, and so the one argument will typically be just a distance of about $1/\sqrt{n}$ away from the other. So these are fairly close to each other, and you're evaluating some nonlinear function $f$ at two points that are close to each other; because of that, it's usually a good idea at this point to do some sort of Taylor approximation to compare the two. First of all, there's a big common term here, so let's call it $S$:
$$ S = \frac{X_2 + \dots + X_n}{\sqrt{n}}, $$
so the difference we're looking at is
$$ \mathbf{E}\, f\big( S + Y_1/\sqrt{n} \big) - \mathbf{E}\, f\big( S + X_1/\sqrt{n} \big). $$

All right, so now what do we do? We just do a Taylor expansion. Take the first term, for instance. Since we assume $f$ is smooth, you can Taylor expand:
$$ \mathbf{E}\, f\big( S + Y_1/\sqrt{n} \big) = \mathbf{E}\, f(S) + \mathbf{E}\Big[ f'(S)\, \frac{Y_1}{\sqrt{n}} \Big] + \frac{1}{2}\, \mathbf{E}\Big[ f''(S)\, \frac{Y_1^2}{n} \Big] + O(n^{-3/2}). $$
For the higher-order terms: since the third derivative of $f$ is bounded, you get a $Y_1^3$, which we assumed to have finite third moment, and you get three powers of $1/\sqrt{n}$. So if you think about it, the error term will be of size $n^{-3/2}$. So there's the main term of size one, a correction of size $1/\sqrt{n}$, a second correction of size $1/n$, and then all the errors are bounded by $n^{-3/2}$, because of this assumption of a finite third moment. And of course the same thing is true for $X$: you replace $Y_1$ by $X_1$ and everything is the same.

Okay, so this looks like a mess; these terms don't look pleasant to compute. But actually we don't need to compute them, and this is the beauty of the Lindeberg method. This $S$ is complicated; well, it's not that complicated, it's some combination of these random variables. But the point is that $S$ is independent of $X_1$ and $Y_1$: all the input random variables $X_1, \dots, X_n, Y_1, \dots, Y_n$ are independent, so in particular $S$, which depends only on the other random variables, is independent of $X_1$ and $Y_1$. So because of independence, these expectations split:
$$ \mathbf{E}\big[ f'(S)\, Y_1 \big] = \mathbf{E}\, f'(S) \cdot \mathbf{E}\, Y_1, \qquad \mathbf{E}\big[ f''(S)\, Y_1^2 \big] = \mathbf{E}\, f''(S) \cdot \mathbf{E}\, Y_1^2. $$
So here is where we crucially use the independence assumption, to split up these expectations. Now, these still don't look fun to compute; in fact you basically need the CLT, which we're trying to prove, to compute $\mathbf{E}\, f'(S)$. But we don't need to compute them, because we now observe that if you replace $Y_1$ with $X_1$, whatever these quantities are, they don't change. There's no $Y_1$ in $\mathbf{E}\, f'(S)$ or $\mathbf{E}\, f''(S)$ anyway, and because the moments match, $\mathbf{E}\, X_1 = \mathbf{E}\, Y_1$ and $\mathbf{E}\, X_1^2 = \mathbf{E}\, Y_1^2$, the remaining factors are the same too. So when you take the difference, all of this cancels; you don't actually need to compute anything, and what you find is that the whole difference is nothing more than $O(n^{-3/2})$. And you're summing $n$ of these terms, and then you're done.

So in fact what you find is a more quantitative theorem, of Berry-Esseen type: the difference between these two statistics is actually $O(1/\sqrt{n})$, because every individual swap, by this Taylor expansion, has an error of $O(n^{-3/2})$, and you're doing $n$ of these swaps. The decay rate you're getting is good enough that you get a good bound here.
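One can also watch this $O(n^{-3/2})$ swap cost numerically. In the sketch below the true $S$ is replaced by a standard normal stand-in (an assumption; only the scaling in $n$ matters here), and the first two Taylor terms are subtracted as a control variate. Their expectation vanishes exactly by the moment matching, which is precisely the cancellation in the proof:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 1_000_000                                  # Monte Carlo samples

f = lambda t: np.exp(-(t - 1)**2)              # a smooth (non-even) test function
fp = lambda t: -2 * (t - 1) * f(t)             # f'
fpp = lambda t: (4 * (t - 1)**2 - 2) * f(t)    # f''

for n in (10, 100, 1000):
    S = rng.standard_normal(N)          # stand-in for (X_2+...+X_n)/sqrt(n)
    x1 = rng.exponential(1.0, N) - 1.0  # mean 0, variance 1
    y1 = rng.standard_normal(N)
    raw = f(S + x1 / np.sqrt(n)) - f(S + y1 / np.sqrt(n))
    # first two Taylor terms: mean zero exactly, by the matching moments;
    # subtracting them implements the cancellation and tames the noise
    ctrl = fp(S) * (x1 - y1) / np.sqrt(n) + fpp(S) * (x1**2 - y1**2) / (2 * n)
    d = (raw - ctrl).mean()
    print(n, d, d * n**1.5)  # last column roughly constant: cost ~ n^(-3/2)
```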
So this explains, for instance, why it's important to have exactly two moment conditions in the central limit theorem. Under this interpretation, you can think of the central limit theorem as what you might call a two-moment theorem: what it's really saying is that the first two moments of your random variable are all that are needed to determine the asymptotic behavior of this statistic. If you have different variables with the same first two moments, then asymptotically your statistics match. If you only had one matching moment, this wouldn't work: you couldn't afford to Taylor expand up to second order, because you can't control the second-order term. You could only expand to first order, and then your error term per swap would be $O(1/n)$, and when you add up your $n$ terms you get a difference which is only $O(1)$. And of course the central limit theorem is not true if you change the variance: if the variance is not one, you don't converge to the variance-one Gaussian. So the two moments are absolutely necessary for the CLT, and this argument shows you why.

This argument also shows you that you do better with more matching moments. For example, if your random variable also matched third moments with the Gaussian, in other words if it was not skewed, the third moment being zero, this argument would show that you get a better convergence rate. With three matching moments you could Taylor expand one more time; the error per swap would upgrade from $n^{-3/2}$ to $n^{-2}$, and then the total error in your central limit theorem would upgrade from $1/\sqrt{n}$ to $1/n$. So the more moments you have matching, the better the error term you get, at least for this formulation of the CLT. Unfortunately, when you go back to the standard formulation of the CLT, the error term actually degrades back to $1/\sqrt{n}$, but at least for this smoothed-out version you get better decay. (You can also see this from the Fourier method, actually.)

Okay, so this is the basic principle behind the Lindeberg method. The beauty of this method is that it didn't rely so much on the fact that this was a linear statistic. I mean, it did a little bit, but it's much more robust to nonlinear behavior in the $X_i$ than the Fourier method.

By the way, people who know about Stein's method may see a lot of similarity here. Stein's method is another method for proving things like the central limit theorem, and it is very similar, although Stein's method is much more focused towards proving that things converge to Gaussians, rather than proving universality. Lindeberg's method works even when the limiting distribution is not Gaussian, which is often the case in random matrix theory. (Stein's method has some applications in random matrix theory too, but I won't talk about that.)

All right. So basically, the general principle is this: if you have a statistic involving $n$ independent random variables, normally there's a normalization factor of $1/\sqrt{n}$, which means that every time you do a Taylor expansion, every term improves by a factor of $1/\sqrt{n}$; and if you're doing $n$ swaps, each swap should have an error of about $n^{-3/2}$, and that's why you need two matching moments.
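The rate upgrade from a third matching moment can be seen concretely with a Rademacher variable, whose third moment is zero like the Gaussian's. For this distribution the statistic can be computed exactly from the binomial distribution, so no Monte Carlo noise intrudes (the test function is again an arbitrary choice):

```python
import math
import numpy as np

# With three matching moments the smoothed-CLT error should be O(1/n)
# rather than O(1/sqrt(n)). Rademacher X matches the Gaussian's first
# three moments (0, 1, 0), and E f(S_n) is an exact binomial sum.
def f(t):
    return np.exp(-t**2)

# E f(G) for G ~ N(0,1), by a fine Riemann sum
s = np.linspace(-10, 10, 200_001)
Ef_G = (f(s) * np.exp(-s**2 / 2)).sum() * (s[1] - s[0]) / np.sqrt(2 * np.pi)

for n in (16, 64, 256):
    # S_n = (2*Binomial(n, 1/2) - n)/sqrt(n), so sum f over the binomial pmf
    pmf = np.array([math.comb(n, k) for k in range(n + 1)], float) / 2**n
    vals = f((2 * np.arange(n + 1) - n) / np.sqrt(n))
    err = abs((pmf * vals).sum() - Ef_G)
    print(n, err, err * n)   # last column roughly constant: error ~ 1/n
```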
In random matrix theory, you don't have $n$ random variables; you have more like $n^2$. Well, either $n^2$ or $\binom{n}{2}$ or something like that, but the number of independent random variables is now about $n^2$. And if you want to swap one random matrix for another, you need to make about $n^2$ swaps rather than $n$ swaps. Which means that if you want to use this method, the error for each swap should now be of order $n^{-5/2}$ rather than $n^{-3/2}$, in order to get $o(1)$ when you multiply by the roughly $n^2$ steps. So normally, when you go to random matrix theory, all these two-moment theorems become four-moment theorems: you actually start needing four matching moments to make this argument work.

Okay, so let me give you a simple example of all of this. Well, it's not that simple, but it's one of the simpler examples of the Lindeberg method applied in random matrix theory. Let's take a Wigner matrix
$$ M_n = (\xi_{ij})_{1 \le i,j \le n}, \qquad \xi_{ij} = \xi_{ji}, $$
so these are real symmetric matrices. (You can also do complex Hermitian; the notation just becomes a little messier, so let's take real.) The entries on and above the diagonal are independent, let's even say i.i.d.; well, let's say independent. And they're all normalized to mean zero and variance one, except on the diagonal, where it's actually convenient to normalize the variance to be two. So we have a random matrix with variance two on the diagonal and variance one off the diagonal. The two is not so important; the only reason for it is that one of the most important examples of a real Wigner ensemble is the Gaussian Orthogonal Ensemble, GOE, and this is the case where the off-diagonal entries are Gaussian with variance one and the diagonal entries are Gaussian with variance two. That is the correct normalization if you want to make this ensemble orthogonally invariant; that's the GOE. So that of course is the basic example, but there are many, many other ensembles of this type. You could take sign matrices, where all the off-diagonal entries are $\pm 1$ (and you impose symmetry), and the diagonal entries are, say, $\pm\sqrt{2}$. So these are all real Wigner matrices.
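Here is a sketch constructing these two ensembles and comparing a bulk statistic of their spectra. The reference value of about $0.609$ is the semicircle-law mass of $[-1,1]$, quoted only as a sanity check, and the ensemble choices are the ones above:

```python
import numpy as np

# Two Wigner ensembles with matching first and second moments: GOE versus
# random signs. After the 1/sqrt(n) normalization their bulk spectra agree.
rng = np.random.default_rng(4)
n = 1000

# GOE: (A + A^T)/sqrt(2) has Gaussian entries with variance 1 off the
# diagonal and variance 2 on it
A = rng.standard_normal((n, n))
goe = (A + A.T) / np.sqrt(2)

# sign ensemble: +-1 off the diagonal, +-sqrt(2) on it, symmetrized
B = rng.integers(0, 2, (n, n)) * 2 - 1
sign = np.triu(B, 1) + np.triu(B, 1).T
np.fill_diagonal(sign, (rng.integers(0, 2, n) * 2 - 1) * np.sqrt(2))

for name, M in (("GOE ", goe), ("sign", sign)):
    eigs = np.linalg.eigvalsh(M / np.sqrt(n))
    # fraction of eigenvalues in [-1, 1]; both should be near 0.609
    print(name, np.mean(np.abs(eigs) <= 1.0))
```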
Okay, so here's what we're going to do with this matrix. First of all we normalize it by $1/\sqrt{n}$, and then we look at the Stieltjes transform. So we pick some spectral parameter $z$ in the upper half-plane, $z = E + i\eta$, sitting at height $\eta$ above some energy $E$, and we look at the Stieltjes transform
$$ s_{M_n}(z) = \frac{1}{n} \operatorname{tr} \Big( \frac{M_n}{\sqrt{n}} - z \Big)^{-1}, $$
the normalized trace of the resolvent, or Green's function if you like. (In the other lectures this was called $m(z)$; I like calling it $s(z)$, but call it $m$ if you like.) So this is the Stieltjes transform, and as we saw in those lectures, this is a very important statistic: it governs both the bulk behavior and the microscopic behavior of your spectrum, depending on $\eta$, which sort of determines at what scale you can actually see the spectrum using this approach.

Okay, so what are we going to prove? Suppose you have two Wigner matrices $M_n$ and $M_n'$ of this type, and let's first assume just two matching moments to begin with, and nothing else, plus some technical tail conditions: we need these entries to decay fast enough, say sub-Gaussian. Let me not dwell on exactly what's needed; think of your variables as either Gaussian or bounded, so that all moments are finite. Then, if $\eta$ is big enough, bigger than what's needed here, let's say $\eta \ge n^{-1/2}$, so if you're sufficiently far away from the spectrum, then you have universality:
$$ \mathbf{E}\, s_{M_n}(z) - \mathbf{E}\, s_{M_n'}(z) = o(1). $$
This particular statistic, the expected Stieltjes transform of the matrix, is very roughly speaking trying to count how many eigenvalues there are between $E - \eta$ and $E + \eta$; it's not exactly that, but that's roughly what it's doing. So this statistic is universal: you can swap $M_n$ with $M_n'$ and get the same answer, as long as you're not at very small scales, staying at $\eta \ge n^{-1/2}$.

Now, this is not optimal; this is not really satisfactory if you want really local behavior, because with this normalization the microscopic scale, the mean eigenvalue spacing, is about $1/n$, and $n^{-1/2}$ is quite far from that. With this theorem you only get halfway between the bulk scale of one and the microscopic scale of $1/n$. But here's the thing I wanted to add: if you assume four matching moments, which means that not only are the means and variances fixed as above, but also the third moments agree and the fourth moments agree, then this works much better. One can take $\eta$ all the way down to $n^{-1}$, and in fact you can go a tiny bit below $n^{-1}$; you can actually penetrate just a little below the microscopic scale (the argument gives the scale $n^{-13/12}$), and you still get universality.
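This universality statement is easy to probe numerically at a mesoscopic scale. In the sketch below I take $\eta = n^{-1/4}$, comfortably above $n^{-1/2}$, and average a handful of samples in place of the true expectation; the ensemble constructors repeat the ones above:

```python
import numpy as np

# Expected Stieltjes transform s(z) = (1/n) tr (M/sqrt(n) - z)^{-1} for the
# two ensembles, at z = E + i*eta with eta = n^{-1/4} >> n^{-1/2}; the two
# averages should agree closely.
rng = np.random.default_rng(5)
n, samples = 500, 20
z = 0.5 + 1j * n**-0.25

def goe(n):
    A = rng.standard_normal((n, n))
    return (A + A.T) / np.sqrt(2)

def sign(n):
    B = rng.integers(0, 2, (n, n)) * 2 - 1
    M = np.triu(B, 1) + np.triu(B, 1).T
    np.fill_diagonal(M, (rng.integers(0, 2, n) * 2 - 1) * np.sqrt(2))
    return M

def s(M):
    eigs = np.linalg.eigvalsh(M / np.sqrt(n))
    return np.mean(1.0 / (eigs - z))

for name, make in (("GOE ", goe), ("sign", sign)):
    print(name, np.mean([s(make(n)) for _ in range(samples)]))
```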
Okay, so even at the microscopic scale you still get universality, but you need these extra matching moments. If you only have two matching moments, you can only get down to the scale $n^{-1/2}$. (If you have three matching moments, by the way, you get down to $n^{-1+\varepsilon}$: you almost get down to the microscopic scale, but not quite.)

Okay, so basically the way you do this: it's almost the same proof. You have these two Wigner matrices $M_n$ and $M_n'$, and their entries are completely different from each other, so you have to swap all of the roughly $n^2$ entries of one matrix to get the other matrix. But you can break this big swap up: to get from one matrix to the other, you can decompose it into about $n(n+1)/2$ little swaps. Now, you can't just swap the entries one by one, because you have to maintain symmetry. If you're on the diagonal, you just swap the diagonal entry, and that's not a problem; but if you want to swap an off-diagonal entry $\xi_{ij}$, you have to also swap the opposite entry $\xi_{ji}$ to maintain symmetry. Writing down exactly what the swaps are requires a certain amount of notation, but basically a single swap looks something like this. There will be some intermediate matrix $\tilde{M}$, in which some of the entries are $\xi_{ij}$ and some of the entries are $\xi'_{ij}$, and there will be one entry set to zero, with the opposite entry also zero. So it's a matrix that is almost a Wigner matrix, except that one entry (and its mirror image) has been zeroed out; it's essentially what I'd call a generalized Wigner matrix. And $E_{ij}$ is a modified elementary matrix: it has a one in the position $(i,j)$ that was zeroed out, and also a one in the opposite position $(j,i)$, and zeros everywhere else, so it's just a rank-two matrix. (If $(i,j)$ is on the diagonal you do something slightly different, but off the diagonal you do something like this.) So if you add $\xi_{ij} E_{ij}$ to this matrix, you're replacing the zero entry with $\xi_{ij}$, and if instead you add $\xi'_{ij} E_{ij}$, you're replacing the zero with $\xi'_{ij}$. I'm not going to write down the precise definitions of these things, but hopefully it's clear what I mean.

So the upshot is that you can write this difference, $\mathbf{E}\, s_{M_n}(z) - \mathbf{E}\, s_{M_n'}(z)$, as a sum of about $n^2$ terms, each an expression of the form
$$ \mathbf{E}\, s_{\tilde{M} + \xi_{ij} E_{ij}}(z) - \mathbf{E}\, s_{\tilde{M} + \xi'_{ij} E_{ij}}(z), $$
the expected Stieltjes transform of one matrix minus the expected Stieltjes transform of the other. And so basically, if you can show that each of these terms is $o(n^{-2})$, smaller than $1/n^2$, then summing over the roughly $n^2$ swaps gives $o(1)$, and you're done.
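The bookkeeping for a single symmetric swap can also be written down mechanically; this is only a sketch of the data manipulation (the names are mine), not of the estimate itself:

```python
import numpy as np

# One step of the matrix swap: zero out the (i, j) and (j, i) entries of the
# current hybrid matrix H to get the intermediate matrix M_tilde, then refill
# them with either xi (the entry of M_n) or xi_prime (the entry of M_n').
def elementary(n, i, j):
    E = np.zeros((n, n))
    E[i, j] = 1.0
    E[j, i] = 1.0        # when i == j this is just a single one at (i, i)
    return E

def swap_step(H, xi, xi_prime, i, j):
    """Return the two neighbors in the telescoping series for this swap."""
    M_tilde = H.copy()
    M_tilde[i, j] = M_tilde[j, i] = 0.0
    E = elementary(len(H), i, j)
    return M_tilde + xi * E, M_tilde + xi_prime * E

# Summing the differences of the expected Stieltjes transforms over all pairs
# i <= j, with each step starting from the hybrid produced by the previous
# one, telescopes to the total difference, exactly as in the scalar case.
```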