Okay. So, instead of minus 5.5, I can just take minus 10,000 to 10,000, and then I've actually proved what I claimed. So the bottom line is that this puts together all of the information we got yesterday: the expected moments converge to the Catalan numbers, the variances of the moments themselves are small, and therefore we can conclude that F of W bar converges in probability to sigma.

But, mind you, let me not go forward and just stay here for another second. We only showed this by doing path counting and a bunch of other combinatorics, and for that you have to assume the existence of moments of every order; all the moments are bounded. However, in practice that's not the requirement for the ESD to converge to the semicircle distribution. So what I'm going to do now is remove that. I'm going to keep only these requirements: the centering of the variables, the fact that the off-diagonal variables have variance one, and the fact that the diagonal variables have bounded variance. I don't care what that variance actually turns out to be; it's enough that it's bounded. But I will remove all higher moment assumptions and prove that the semicircle law still applies.

Was there a question? Yes. They can be different, but what I want is that the variance of the diagonal elements is bounded by some absolute constant that doesn't depend on n. Two, three, ten thousand, whatever. Any other questions?

So, to this end, let me tell you about a beautiful linear algebra theorem, the Hoffman-Wielandt theorem. Sometimes in your mathematical explorations you'll stumble across people who tell you things like "linear algebra is an undergraduate subject." Not seldom enough, sadly. When they say that, you can tell them: just look at the Hoffman-Wielandt theorem. This is a very beautiful theorem, the proof of which you can arguably explain to really advanced undergraduates, but you cannot really teach it in an undergraduate class, I would guess not even at Harvard or MIT. It's not terribly hard to explain, but it relies on a few very interesting statements, for example the fact that the set of doubly stochastic matrices is convex and its extreme points are the permutation matrices (Birkhoff's theorem), which is not terribly hard to prove but not trivial either. So the whole thing brings together very beautiful linear algebra facts to state the following. Suppose you have two symmetric matrices A and B, with eigenvalues lambda_1(A) through lambda_n(A) in increasing order and lambda_1(B) through lambda_n(B) in increasing order. Then the sum of the squares of the differences of the eigenvalues is less than or equal to the trace of the square of the difference matrix. This is a perturbation bound, and it turns out to be very useful in linear algebra, in numerical linear algebra, and, as we'll see right now, in random matrix theory. So this is the Hoffman-Wielandt theorem. I'll be happy to tell you how to prove it after the lecture, if you want. If you already know how to prove it, good for you; I like it when people know a linear algebra fact.

Okay. So that's going to be our tool. Now, I promised that I am going to remove the moment conditions on the variables. To do so, essentially I'm going to do the following thing. Let's see if I can... Do we have an eraser? Oh, there it is.
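[As a quick numerical sanity check of the theorem, here is a small sketch I'm adding (not part of the lecture) that verifies the inequality sum_i (lambda_i(A) - lambda_i(B))^2 <= tr((A - B)^2) on random symmetric matrices:]

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Two arbitrary symmetric matrices A and B.
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
B = rng.standard_normal((n, n)); B = (B + B.T) / 2

# eigvalsh returns the eigenvalues of a symmetric matrix in increasing
# order, which is exactly the ordering Hoffman-Wielandt requires.
lam_A = np.linalg.eigvalsh(A)
lam_B = np.linalg.eigvalsh(B)

lhs = np.sum((lam_A - lam_B) ** 2)      # sum_i (lambda_i(A) - lambda_i(B))^2
rhs = np.trace((A - B) @ (A - B))       # tr((A - B)^2)
print(lhs <= rhs + 1e-9, lhs, rhs)      # Hoffman-Wielandt: lhs <= rhs
```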
An eraser. What I want to do, essentially, is chop off the variables. I want to pick some large compact set and define a set of variables that are exactly equal to my Wij's on that large compact set. But these new variables, Wij hat, will have the property that their distributions drop like a stone right after that compact set, while keeping the expectation and the variance what they were before. In other words, if I don't assume that my variables have moments of all orders, that means they can have fat tails, right? The distributions of these variables can have fat tails. But no matter how fat they are, for any epsilon you can pick a compact set large enough that the probability of those variables taking values outside of it is very, very small. Okay? So that's the trick.

So here's minus K, here's K, and the distributions of my variables perhaps look something like this. I want to pick my other variables so that they're identical here, but then their distributions drop off fast; maybe they undergo some sort of a bump right here, to squeeze in the remaining expectation on both sides and make the variance exactly one. I'll leave this for you as an exercise, to see that you can in fact do that. The important thing is that these new variables will, except on sets of small probability, agree with the old ones. And actually, they will do more: they will be compactly supported, which means they will have moments of all kinds. A compactly supported variable does indeed have bounded moments of every order, et cetera. And therefore, for them, we will be able to apply the Wigner semicircle law we already proved, and conclude that the ESDs of the matrices defined with the new, hatted variables do converge to the semicircle. Have I said anything surprising up to this point? Okay. So that's the gist of it: I'm going to define these new W hats, and then I'm going to show that the ESD of the matrix defined with the W's is very close to the ESD of the matrix defined with the W hats. So this is the approximation step: truncation and approximation.

Okay, so this is what I just mentioned before. Now, let's take a look at the following statements. As I truncate the distributions of the variables Wij farther and farther out, the expectation of what lies outside goes to zero. And that happens because the expectation of the Wij's themselves is zero, and the probability of the Wij's lying outside of [-K, K] goes to zero; eventually, what happens outside doesn't matter. You couldn't say this for arbitrary variables Wij, but you have the additional fact that the tails decay, because the second moments, the variances, are finite. Okay, that's what allows you to make this statement. Is it clear?

Similarly, the same thing happens with the variances, for the same reason: as K goes to infinity, truncating these variables on bigger and bigger sets means that the variance of what lies outside goes to zero. If you put these two things together, it follows that you can pick K, depending on epsilon, big enough that the probability that Wij falls outside [-K, K] is very small, and the probability of Wij hat falling outside of [-K, K], and thus being different from Wij, is also very small. Maybe not exactly epsilon here; I should have put a big O of epsilon.
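[A small simulation of the truncation idea, again my own sketch. For simplicity it caps the tails at plus or minus K and then recenters and rescales, which is a cruder construction than the one in the lecture, where W hat equals W exactly on [-K, K], but it illustrates the same point:]

```python
import numpy as np

rng = np.random.default_rng(1)

# Heavy-tailed entries: Student-t with 3 degrees of freedom has a finite
# variance but no finite moments of order 3 and up.
w = rng.standard_t(df=3, size=1_000_000)
w = w / w.std()                          # normalize to variance 1

K = 50.0
print(np.mean(np.abs(w) > K))            # P(|W| > K): tiny, shrinks as K grows

w_hat = np.clip(w, -K, K)                # cap the tails at +-K
w_hat = (w_hat - w_hat.mean()) / w_hat.std()   # restore mean 0, variance 1

# The hatted variables are bounded, hence have moments of every order,
# while agreeing with the originals except with small probability.
print(w_hat.mean(), w_hat.var())         # ~0.0, ~1.0
```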
Let's say a hundred epsilon. Okay, a hundred epsilon covers us in any case, because if you look at how you can define these guys, the probability that Wij falls outside of [-K, K] is already smaller than two epsilon, and the rest can be made as small as we please. So when you put these two together, the probability that the difference between Wij and Wij hat is bigger than epsilon can be made smaller than a hundred epsilon. So most of the time, they agree; that's the point. Most of the time the two of them are exactly the same, and the event that they differ by more than a little has really, really small probability. You can do this using Chebyshev: you look at the probabilities, do some sort of triangle inequality bound, and then use Chebyshev, using the variances, to prove the probabilities are small.

All right. So what does that mean? Let's take a look at this quantity: 1 over n times the trace of (W bar minus W bar hat) squared. What is that? Well, you pull out the normalization, because remember that W bar and W bar hat are, respectively, 1 over root n times W and 1 over root n times W hat. Squaring that 1 over root n and pulling it out gives you an additional 1 over n here, and this is a lowercase n, not an uppercase N. That gives you 1 over n squared times the sum of these things squared. Let me write it down correctly. Percy, how much time do I have? Okay, thank you.

So what do we have? We have 1 over n times the trace of (W bar minus W bar hat) squared, or W hat bar, however I should write it. This equals 1 over n squared times the trace of (W minus W hat) squared. Now let's see what this is. If all the Wij's are smaller than or equal to K in absolute value, then the Wij's are actually equal to the Wij hats. So I only have to look at this on the event that |Wij| is bigger than K. Therefore this can be written as 1 over n squared times the sum over i and j of (Wij times the indicator that |Wij| > K, minus Wij hat times the indicator of the same event) squared. The matrices are symmetric: the trace of the square means you're looking at (W minus W hat)ij times (W minus W hat)ji, and those two entries are the same, which is why you get the square here. And this is just bookkeeping: the difference is zero if Wij is within [-K, K], and it only has meaning outside of that interval. So far so good.

But if we go back, we have this: we can choose K appropriately so that the probability of this phenomenon happening is less than 100 epsilon for every i and j. For every i and j, and this is the important part, which means that the probability that this quantity here is bigger than epsilon is less than 100 epsilon, because you have n squared objects but you divide by n squared as well. So the probability that 1 over n times the trace of (W bar minus W bar hat) squared, which is less than or equal to this, is bigger than epsilon, is less than 100 epsilon. It's small. So now let's look at the complementary event. Suppose we're not in that event but in its complement, so 1 over n times the trace of (W bar minus W bar hat) squared is less than or equal to epsilon.
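[For reference, here is that computation written out in symbols, with W bar = W / root n and W bar hat = W hat / root n as in the lecture:]

```latex
\frac{1}{n}\operatorname{tr}\bigl(\bar W - \hat{\bar W}\bigr)^{2}
  = \frac{1}{n^{2}}\operatorname{tr}\bigl(W - \hat W\bigr)^{2}
  = \frac{1}{n^{2}}\sum_{i,j=1}^{n}\bigl(W_{ij} - \hat W_{ij}\bigr)^{2}
  = \frac{1}{n^{2}}\sum_{i,j=1}^{n}\bigl(W_{ij} - \hat W_{ij}\bigr)^{2}\,
    \mathbf{1}\{|W_{ij}| > K\},
```

[where the second equality uses symmetry, tr(M^2) = sum over i, j of M_ij M_ji = sum over i, j of M_ij squared, and the last equality uses that W_ij = W hat_ij on the event |W_ij| <= K.]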
If that happens, let's take a Lipschitz function f with constant C and see what the difference will be between the inner product of F of W bar with f and that of F of W bar hat with f. What's the difference between those two inner products? Well, let's take a look at what they are. The inner product of F of W bar with f is 1 over n times the sum of f(lambda_i(W bar)). So far so good. And the same is true for W bar hat. Think about the fact that we've seen these two identities exemplified on monomials: when f is x to the k, this is 1 over n times the trace of W bar to the k. But the identity is true for any function f. So when I subtract these two, I can group the terms so that it becomes a sum of differences f(lambda_i(W bar)) minus f(lambda_i(W bar hat)). Let me write that down: the absolute value of the difference of the two inner products is less than or equal to 1 over n times the sum over i from 1 to n of the absolute value of f(lambda_i(W bar)) minus f(lambda_i(W bar hat)), by the triangle inequality.

Let's recap. What is this? This is the inner product against the ESD of the original matrix, whose variables carry no moment assumptions beyond moment number two, beyond the variance. And this is against the ESD of the truncated-variable matrix, for which we have bounded moments of every order. The difference between these two numbers is bounded by this sum. But f is Lipschitz, so each term in the sum can be bounded by the fixed, finite constant C corresponding to f, times the difference of the eigenvalues: C over n times the sum over i of the absolute value of lambda_i(W bar) minus lambda_i(W bar hat). That's this inequality right here; that's the benefit of using Lipschitz functions.

Now we can go back and use Hoffman-Wielandt. And what does Hoffman-Wielandt say? It says that if I have squares here, then I get the bound by the trace of (A minus B) squared. Here I don't have squares, I just have absolute values. But what can I use? Some Cauchy-Schwarz kind of inequality; in fact I'm certain of it, because you just square everything, apply Cauchy-Schwarz, and that gives you the following. This difference is at most C times the square root of 1 over n times the trace of (W bar minus W bar hat) squared. And as we talked about, this is the good event, the event that happens with probability at least 1 minus 100 epsilon, on which this quantity is less than epsilon. Therefore the difference between the inner product with F of W bar and the inner product with F of W bar hat is less than C root epsilon.

But of course epsilon was chosen arbitrarily. So what have we shown? We have shown that no matter how small you choose epsilon, on a set of probability going to 1, the difference between these two numbers goes to 0. But that's the same as saying that if F of W bar hat is converging weakly in probability to sigma, then so is F of W bar, because the inner products converge to the same numbers. Is that clear? Any questions? So essentially this is why truncation, putting the variables on a compact set, has the effect of giving them moments of all orders.
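[Written out, the chain of inequalities from this passage is the following; f is Lipschitz with constant C, Cauchy-Schwarz gives the third step, Hoffman-Wielandt the fourth, and the good event the last:]

```latex
\Bigl|\langle F_{\bar W}, f\rangle - \langle F_{\hat{\bar W}}, f\rangle\Bigr|
  \le \frac{1}{n}\sum_{i=1}^{n}
      \bigl|f(\lambda_i(\bar W)) - f(\lambda_i(\hat{\bar W}))\bigr|
  \le \frac{C}{n}\sum_{i=1}^{n}
      \bigl|\lambda_i(\bar W) - \lambda_i(\hat{\bar W})\bigr|
  \le C\Bigl(\frac{1}{n}\sum_{i=1}^{n}
      \bigl(\lambda_i(\bar W) - \lambda_i(\hat{\bar W})\bigr)^{2}\Bigr)^{1/2}
  \le C\Bigl(\frac{1}{n}\operatorname{tr}\bigl(\bar W - \hat{\bar W}\bigr)^{2}\Bigr)^{1/2}
  \le C\sqrt{\varepsilon}.
```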
And then proving that if you take this compact set to be large enough, depending on epsilon, then, outside an event of very, very small probability, the inner product of the resulting ESD with any Lipschitz function is very, very close to the inner product of your original ESD with the same function. So that's why, and this is basically the conclusion of the lecture, we've shown that the Wigner semicircle law holds not just for Wigner matrices that have moments of all orders, but simply for Wigner matrices with centered variables, variance one on the off-diagonal, and bounded variance on the diagonal. And that's all you need. So this is the full strength of the semicircle law.

Not tomorrow; tomorrow I'm not going to see you, because we don't have a lecture. Actually, I guess I am going to see you, but outside of the classroom setting. And on Thursday we will go on and say: okay, so now we've seen this first-order effect, all of these ESDs converging to the semicircle, but what can we say about the fluctuations around the semicircle? And that's the next question that we'll be exploring. Okay, thank you. Any more questions for Wigner? Thank you.
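[As an illustration of the strengthened theorem, here is one more sketch of my own, not from the lecture: a simulation with Student-t entries with 3 degrees of freedom, which have finite variance but no finite moments of order three and up, so the path-counting proof would not apply, while the truncation argument does:]

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
n = 2000

# Wigner matrix with heavy-tailed entries: t(3) has variance 3 (finite),
# so we divide by sqrt(3) to get variance 1, as the theorem requires.
X = rng.standard_t(df=3, size=(n, n)) / np.sqrt(3.0)
W = np.triu(X) + np.triu(X, 1).T        # symmetrize
W_bar = W / np.sqrt(n)

eig = np.linalg.eigvalsh(W_bar)

# Compare the ESD to the semicircle density sqrt(4 - x^2) / (2 pi) on [-2, 2].
x = np.linspace(-2, 2, 400)
plt.hist(eig, bins=80, density=True, alpha=0.5, label="ESD")
plt.plot(x, np.sqrt(4 - x**2) / (2 * np.pi), label="semicircle")
plt.legend()
plt.show()
```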