This will be Janne's second lecture on the semicircle law. Hi everybody. Sorry, just a sec here. These are harder to get at. Let me just put it there. Okay, so first of all, I'm sorry this is starting a little late. I got on the wrong bus and was told I had to get off in the middle of nowhere, so I'm a little short of breath too, because I ran here. Alright, so let's see where we were yesterday in our exploration of the semicircle law. Recall the notation. We have a random matrix W with entries W_ij. All the entries are centered, and the second moment of the off-diagonal entries is one, so the variance is one. Yesterday, the assumption was that all the entries have moments of all orders; the moments are bounded. Today we will remove that. But let me remind you that we're not actually working with W. We're working with a scaled matrix W̄ = W/√N, and what this scaling does, essentially, is put the eigenvalues of W asymptotically on a compact set. The object of study is the empirical spectral distribution, defined as the average of the delta functions at the eigenvalues of W̄. Sorry, wherever you see "F of W bar" on the slides it should be F sub W̄, the distribution just defined; let me just make a note of that. This is the notation we used yesterday. And we proved yesterday the following two facts. First, if you take the expectation over W̄ of the k-th moment of F_{W̄}, that is E_{W̄}⟨F_{W̄}, x^k⟩, which is the same thing as (1/N) E_{W̄}[tr W̄^k], then this goes to zero if k is odd, and to the Catalan number C_{k/2} if k is even.
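Before moving on, it may help to see this setup concretely. Here is a minimal simulation sketch; Gaussian entries and the scaling W̄ = W/√N are assumptions of this illustration, since the lecture only specifies centered entries with unit off-diagonal variance:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 500
A = rng.standard_normal((N, N))
W = (A + A.T) / np.sqrt(2)            # symmetric; off-diagonal variance 1
W_bar = W / np.sqrt(N)                # rescales the spectrum onto a compact set
eigs = np.linalg.eigvalsh(W_bar)

# Moments of the empirical spectral distribution: <F_{W_bar}, x^k> is just
# (1/N) tr(W_bar^k), i.e. the average of the k-th powers of the eigenvalues.
m2 = np.mean(eigs**2)                 # should approach C_1 = 1
m3 = np.mean(eigs**3)                 # odd moment, should approach 0
m4 = np.mean(eigs**4)                 # should approach C_2 = 2
```

For N of this size the even moments already sit close to the Catalan numbers and the odd moments close to zero.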
This is the bulk of what we did yesterday: we showed that the moments converge to those of the semicircle, and these are the moments of the semicircle. Moreover, also yesterday, we showed concentration of the moments of F_{W̄} around their expected values. So we showed the following: the variance over W̄ of ⟨F_{W̄}, x^k⟩, which is the same thing as (1/N²) times the variance over W̄ of tr(W̄^k), goes to zero for all k. In fact, we showed that it is of order 1/N². In other words, this piece here, just the variance of the trace, is order one, and that makes the whole thing order 1/N², which goes to zero as N goes to infinity for any fixed k. These are the facts we showed yesterday. From this, we concluded that the moments of the ESD, that is, the quantities (1/N) tr(W̄^k), are asymptotically the same as their expectations, and go to zero or to C_{k/2} as N goes to infinity, depending on the parity of k. So maybe I shouldn't have put exactly equal; I should have put asymptotically equal. Concentration, that's what we showed. Okay. So today we will show that, because the moments are concentrated, F_{W̄} itself converges weakly in probability to the semicircular distribution. We'll do this via a combination of Weierstrass approximation and a spill-over bound: we will bound the probability that the eigenvalues of the matrix spill over some compact set. That's part of what we'll be doing today. Okay. So let's first establish a couple of very simple facts. One of them is that C_k ≤ 4^k for all k. I just realized I don't have, oh, I do have a watch, never mind. All right, we're good.
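The concentration claim can be eyeballed numerically. A rough sketch, again assuming Gaussian entries and W̄ = W/√N, with arbitrary trial counts: since the variance of the k-th empirical moment is of order 1/N², doubling N should shrink it by roughly a factor of four.

```python
import numpy as np

rng = np.random.default_rng(0)

def moment_variance(N, k, trials=200):
    """Sample variance over W_bar of <F_{W_bar}, x^k> = (1/N) tr(W_bar^k)."""
    vals = []
    for _ in range(trials):
        A = rng.standard_normal((N, N))
        eigs = np.linalg.eigvalsh((A + A.T) / np.sqrt(2 * N))
        vals.append(np.mean(eigs**k))
    return float(np.var(vals))

v_small = moment_variance(60, 4)      # Var of the 4th moment at N = 60
v_large = moment_variance(120, 4)     # at N = 120: roughly 4x smaller
```

Both variances are tiny in absolute terms, and their ratio is close to four, consistent with the O(1/N²) rate.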
So the Catalan number C_k is at most 4^k for all k. How do we show that? It's actually not hard to do; it just follows from the definition of the Catalan number. Remember, C_k = (1/(k+1)) (2k choose k). A very quick and dirty bound will tell us the following: (2k choose k) is smaller than the sum of all the binomial coefficients of order 2k, and that sum is 2^{2k}. So this whole business is at most 2^{2k}/(k+1), which is just 4^k/(k+1). No big deal here; a very, very simple-minded bound, but it's going to be sufficient for our purposes. And the second thing that I want to recall, I was told to protect the microphone if I'm going to cough. I forgot it. Sorry about that. The other thing to remember is the Weierstrass approximation theorem, which says roughly the following: given any epsilon greater than zero and a function f that is continuous and supported on a compact interval I, there exists a polynomial p_epsilon. Mostly I will just call it p, but you have to keep in mind that it actually depends on the epsilon we choose. So p, or p_epsilon, is such that |f(x) − p_epsilon(x)| < epsilon uniformly on the compact set I. What happens outside of I we don't really care about. What happens in practice is that this polynomial blows up to infinity very quickly outside the interval I, but all we need is that on I, and on I alone, the difference between the polynomial and this compactly supported function is uniformly very small. That's the Weierstrass approximation theorem. Oops, sorry. All right.
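The Catalan bound is easy to check in code; a quick sketch:

```python
from math import comb

# Catalan numbers: C_k = (1 / (k + 1)) * binom(2k, k); the division is exact.
def catalan(k):
    return comb(2 * k, k) // (k + 1)

# binom(2k, k) is a single term in the expansion of (1 + 1)^(2k), so it is
# at most 2^(2k) = 4^k; hence C_k <= 4^k / (k + 1) <= 4^k.
bound_holds = all(catalan(k) * (k + 1) <= 4**k for k in range(1, 40))
first_few = [catalan(k) for k in range(1, 6)]   # 1, 2, 5, 14, 42
```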
So now let's go back to looking at F_{W̄}, and think of it as a distribution, defined by the values it takes on various functions: continuous functions, functions with compact support, Lipschitz functions. There are several classes of functions on which it is sufficient to know what F_{W̄} does, or that it converges to something, in order to conclude that it converges in distribution. So let's take a look at continuous functions g. I'm going to use inner-product notation: when I write ⟨F_{W̄}, g⟩, this is essentially the expected value of g with respect to the distribution F_{W̄}, right? So you can think of it like this, or you can think of it as (1/N) Σ_{i=1}^N g(λ_i(W̄)), okay? So far so good? That's what it is. All right. So now let's look at the following function, which is not quite continuous, but we can still pair F_{W̄} with it. Suppose that instead of the monomial x^k, I take the absolute value of x raised to the power k, multiplied by the characteristic function of the set {|x| > 5}. So I want to look at the average of the k-th powers of just those eigenvalues of W̄ whose absolute value is bigger than 5. Is that clear? That's what this is, okay? Suppose now that this quantity is bigger than epsilon; I want to look at the probability of that event. Well, as we know, by Markov's inequality this is immediately bounded by (1/epsilon) times the expectation over W̄ of the same quantity, okay?
And that should be clear, because essentially what we're doing is lower bounding: F_{W̄} is atomic, so this expectation is a sum, and you're keeping only certain terms and, moreover, lower bounding them, okay? So that's this inequality here. And now I claim that if I just forget about the 1/epsilon and look for a bound on this expectation here, the following holds. Let me write the inequality that I want to show: E_{W̄}⟨F_{W̄}, |x|^k 1{|x| > 5}⟩ · 5^k ≤ E_{W̄}⟨F_{W̄}, x^{2k}⟩. I write very enthusiastically, which makes every board shake. So I want to show that the left side is at most the entire expectation, but now it's not the expectation of |x|^k; it's the expectation of ⟨F_{W̄}, x^{2k}⟩. And it should be easy to see why this is happening. This is essentially a conditioning argument. I take the expectation of the whole thing and split it into the part where the variable is at most 5, which I drop completely since it is nonnegative, and the part where the variable is bigger than 5, which I lower bound. Is that clear? I've dropped one term and lower bounded the other. I've used the fact that x^{2k} = |x|^k · |x|^k, and conditional on {|x| > 5}, 5^k is smaller than |x|^k. So the left side is at most the expectation of x^{2k} restricted to that event, and hence at most the full expectation once I put back the other term. Is that clear? You can convince yourselves of it if you just write out what these things are. Essentially:
in the integral with respect to F_{W̄}, whatever that means as a sum, we have |x|^k · 5^k ≤ x^{2k} on the set {|x| > 5}, and you integrate x^{2k} over the entire domain, not just that set, because it's positive everywhere. Okay. So that explains the second inequality here. And now if you put these two together, what do you get? You get that the probability that ⟨F_{W̄}, |x|^k 1{|x| > 5}⟩ is bigger than epsilon is at most (1/epsilon) · (1/5^k) times this expectation. But we know where this expectation goes: it goes roughly to C_k, the Catalan number. So maybe I should have a C_k plus epsilon/2 here, or minus epsilon/2, something like that; I can certainly take n large enough for this to be as close to what I wrote as I want. And the bottom line is that the whole thing is essentially (1/epsilon) · 4^k/5^k. And the most important thing about this number is that as k goes to infinity, it goes to zero. And no, I shouldn't have that here. You're right. Sorry about that. This is a mistake; there should be no indicator function here. Very right, thank you. Okay. So note that what this is saying is that F_{W̄} is almost surely compactly supported in [−5, 5]. Because if you had some remnants, some non-zero mass outside of [−5, 5], and 5 is bigger than 1, you would see it here: there would be some large contribution, and the probability of this being bigger than epsilon would not go to zero as k goes to infinity; it would go to something else. So now let's take a look at the approximation that we're going to do. Suppose I take some continuous g with compact support. I have lots of typos here.
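Both steps of this spill-over argument can be checked numerically. A small sketch, assuming Gaussian entries and the scaling W̄ = W/√N, which the lecture leaves implicit:

```python
import numpy as np

# Step 1: the deterministic inequality behind the bound. For any sample of
# values, 5^k * mean(|x|^k ; |x| > 5) <= mean(x^(2k)): on the event
# {|x| > 5} we have |x|^k * 5^k <= |x|^k * |x|^k = x^(2k), and the dropped
# terms on {|x| <= 5} are nonnegative.
rng = np.random.default_rng(0)
x = 3.0 * rng.standard_normal(10_000)        # any sample works here
k = 4
lhs = 5.0**k * np.mean(np.abs(x)**k * (np.abs(x) > 5))
rhs = np.mean(x**(2 * k))                    # lhs <= rhs always

# Step 2: the resulting tail bound (1/eps) * 4^k / 5^k vanishes as k grows.
eps = 0.01
bounds = [(1 / eps) * (4 / 5)**k for k in (10, 50, 100)]

# And indeed, in simulation the spectrum of W_bar stays well inside [-5, 5]
# (in fact near [-2, 2]), so no mass spills over.
N = 400
A = rng.standard_normal((N, N))
eigs = np.linalg.eigvalsh((A + A.T) / np.sqrt(2 * N))
spill = np.mean(np.abs(eigs) > 5)            # fraction outside [-5, 5]
```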
There's no such thing as "bigger, bigger"; there's only one inequality sign here. So: for any continuous g with compact support in [−5, 5], given some epsilon bigger than zero, there exists by Weierstrass a polynomial p = p_epsilon, and I'm going to drop the epsilon dependence just because it makes my life easier, such that |p(x) − g(x)| < epsilon/8 uniformly on [−5, 5]. The degree of this polynomial is, of course, epsilon-dependent, but it's finite. So now let sigma be the semicircle distribution, and let's look, specifically, at this difference: the difference between ⟨F_{W̄}, g⟩ and ⟨sigma, g⟩. There should be an absolute value here. What's the probability that this difference is bigger than epsilon? If I can show that the probability that |⟨F_{W̄}, g⟩ − ⟨sigma, g⟩| is bigger than epsilon is less than epsilon, let's say, or some function of epsilon, for any epsilon, given n large enough, it follows that F_{W̄} converges to sigma in distribution. So now let's split this up. How am I going to split it up? I'll split it up as follows: instead of g, I first subtract off p. Maybe I should do this step by step, because there are three terms there and it may be confusing. So let's see what I'm doing. Essentially, I'm going to apply the triangle inequality: |⟨F_{W̄}, g⟩ − ⟨sigma, g⟩| ≤ |⟨F_{W̄}, p⟩ − ⟨sigma, p⟩| + |⟨F_{W̄}, p⟩ − ⟨F_{W̄}, g⟩| + |⟨sigma, p⟩ − ⟨sigma, g⟩|. Now, the last of these is easy. That part right there is going to be majorized.
It's going to be bounded from above by the biggest difference between p and g on the interval [−5, 5], because sigma is compactly supported there. So this is at most epsilon/8, right? So this part here is what gives us the epsilon/8. Let's see. And now I want to do something else: instead of looking at sigma, I would like to look at the expectation of F_{W̄}. I should be able to do that, because I know that in expectation F_{W̄} does converge to the semicircle distribution. So at the cost of a little bit more, some other O(epsilon), I can take sigma here and replace it with E[F_{W̄}]. There are bars all over here, sorry about that. Is this clear? We can do this replacement because we know that in expectation F_{W̄} converges in distribution to sigma, so the difference, tested on this p, can be made as small as we please by taking n appropriately large. So there's some other O(epsilon). So that explains how we get the first term. Now, let me see what else I want to do here. No, I think I confused myself. Oh, okay. Let's see. Okay, so that explains the first two terms, actually; the first two terms come from this term here. And this term here, |⟨F_{W̄}, p⟩ − ⟨F_{W̄}, g⟩|, splits: there is the part on {|x| ≤ 5}, on which the difference between p and g is very small, so because F_{W̄} is a distribution, by the same reasoning that gave us the epsilon/8, that part will also be at most epsilon/8. So let me write that down: we get ⟨F_{W̄}, p 1{|x| > 5}⟩ plus |⟨F_{W̄}, p 1{|x| ≤ 5}⟩ − ⟨F_{W̄}, g⟩|, and the first of these is the spill-over part.
And then the second part, which is this difference on {|x| ≤ 5}, is at most epsilon/8, for the same reason as before: we're averaging, with respect to some distribution, a function which is uniformly small in absolute value; therefore its average is going to be small. Okay? Oops, how did I get back to there? Okay, so maybe we don't have an epsilon/8 here, maybe we have some other kind of epsilon, two epsilon, let's say. But the bottom line is that we have these three pieces here and the rest of it is small. So now, I claim that each term on the right-hand side is going to be appropriately bounded by some function of epsilon, maybe not exactly epsilon/4. So why is this true? First of all, why is the first bound going to be small, some O(epsilon)? Well, it's very simple. p is a polynomial of finite degree, and we showed last lecture that if p were a monomial, then this converges to zero as n goes to infinity. So all we're doing is taking a linear combination of things that go to zero. It's a finite combination; therefore we can always make it as small as we please by taking n appropriately large. Is that clear? I'm sorry, because of the light I can't actually get much feedback from you, because I don't see you. So if you have questions, please ask. So anyway, this is why: being a finite linear combination of monomial terms, each of which goes to zero, this can be made as small as we please by taking n appropriately large. The second bit can also be made as small as we please by taking n appropriately large, because we know that the expectation of F_{W̄} converges in distribution to sigma. Okay, we know that.
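The polynomial p in this argument is exactly what Weierstrass provides. A small numerical sketch of that step; the hat function g and the Chebyshev least-squares fit here are illustrative choices, not from the lecture:

```python
import numpy as np

# A continuous, compactly supported test function on [-5, 5] (a hat function).
def g(x):
    return np.maximum(0.0, 1.0 - np.abs(x) / 5.0)

xs = np.linspace(-5.0, 5.0, 2001)

# Fit polynomials of increasing degree and record the sup-norm error on
# [-5, 5]; Weierstrass guarantees it can be pushed below any target
# (e.g. epsilon/8) by taking the degree large enough.
errs = {}
for deg in (5, 80):
    p = np.polynomial.Chebyshev.fit(xs, g(xs), deg)
    errs[deg] = float(np.max(np.abs(g(xs) - p(xs))))
```

The degree-80 fit is uniformly much closer to g on [−5, 5] than the degree-5 fit, mirroring how the required degree grows as epsilon shrinks.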
And then the rest, which I wrote wrong here, is the expectation, that is, the integral of this polynomial over {|x| > 5}. Suppose that instead of 5 we take some larger value here, like 10,000, okay? Then everything I wrote here can be made as small as we please, and everything goes through as I wrote it, except with that larger value plugged in place of 5. And so I can conclude that this is also as small as I want to make it, because 4 over that larger value, raised to the k, is going to be arbitrarily small. Okay, sorry, I used 5, I shouldn't have used 5, I should have used some other value. But the bottom line is that by choosing the cutoff appropriately large, we can make this probability as small as we please. So the probability that |⟨F_{W̄}, g⟩ − ⟨sigma, g⟩| is bigger than epsilon can be made as small as we please. Yes? No, I've probably confused you, so. So you want me to go back? This, yes, why is this true? Oh okay, so this is the distribution corresponding to the level density, okay? It's the distribution whose moments are the expected traces of powers, which we showed converge to the Catalan numbers, and therefore in distribution it converges to sigma. Yeah, any other questions? Yes, yes, that's going to be a minus that larger value, that larger value of 5. Why? Yes, so this is essentially what I'm doing, yeah. So the probability, no, it's not equal to zero, it's going to zero, right? It's going to zero as k goes to infinity. Sorry, I did not mean to say almost surely; I meant to say that with probability one it's compactly supported. Well, it actually is almost surely, but I'm not showing that here, okay? So this is more information than I'm proving, okay? Yes, yes, but I did not actually show that. So I did not show that, I wrote, okay.
What I wrote here can be used to prove that this is compactly supported almost surely, but not directly; I haven't shown that. Yes, okay, for simplicity, let's just consider that I never said this, because it doesn't actually serve me any purpose in the proof, and we can talk about it afterwards, okay? So I shouldn't have waited so long to do my slides. Strike this through, replace 5 by some larger value, like 10,000, throughout, and you'll get something true, okay? The notes are going to be correct, but obviously these slides are not. I'm not using that anywhere, so we can just talk about it offline, okay? All right, so to come back: this part was easy, this part was easy, and this part was also easy if instead of 5 you used 10,000 or some large value thereof, appropriately chosen with respect to epsilon to make 4 over 10,000, to the k, smaller than epsilon squared, let's say, or something like that, okay? So the bottom line is that each and every one of these terms on the right can be bounded by some function of epsilon. Therefore, we can find an epsilon appropriately small such that, for any epsilon smaller than that, the probability of this event is less than, say, two epsilon. But that's the same thing as saying that F_{W̄} converges to sigma in probability, because it says that the lim sup of the probability of this being bigger than epsilon is zero for any test function g, and therefore the convergence in distribution happens. Yesterday this worked perfectly, today not so much.
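To see the conclusion in action, here is a small closing check, again assuming Gaussian entries and W̄ = W/√N, with cos as an arbitrary bounded continuous test function: ⟨F_{W̄}, g⟩ lands close to ⟨sigma, g⟩, the integral of g against the semicircle density √(4 − x²)/(2π) on [−2, 2].

```python
import numpy as np

# Empirical side: <F_{W_bar}, g> = (1/N) * sum_i g(lambda_i(W_bar)).
rng = np.random.default_rng(0)
N = 1000
A = rng.standard_normal((N, N))
eigs = np.linalg.eigvalsh((A + A.T) / np.sqrt(2 * N))
emp = np.mean(np.cos(eigs))

# Semicircle side: integrate g against the density sqrt(4 - x^2) / (2*pi)
# on [-2, 2] with a plain Riemann sum.
xs = np.linspace(-2.0, 2.0, 200_001)
dx = xs[1] - xs[0]
density = np.sqrt(4.0 - xs**2) / (2.0 * np.pi)
semi = float(np.sum(np.cos(xs) * density) * dx)
# emp and semi agree to a couple of decimal places at this N
```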