Let us continue our stroll through this entire history and geography, if you want, of empirical spectral distributions. Just to recall: we talked about the existence of the limit of the empirical spectral distribution for Wigner matrices, and we talked about the fluctuations around this limit, the fluctuations around the semicircle. Today we're going to go one step further and talk about the minor process. The minor process is defined, as we will see in a moment, by taking overlapping Wigner matrices. What this does, as we shall see, is add one coordinate, in a sense, to the description, the characterization, that we've been building. Of course it complicates the nature of the problem, but surprisingly enough, the answer turns out to be relatively simple, beautiful, and a nice generalization of what we've seen up to this point.

So without further ado, let me go quickly through the notation we've been using. W always stands for a Wigner matrix, with entries Wij. Up to this point we've had centered entries, with the variance of the off-diagonal entries always equal to one. We've also had, last time, though I didn't mention it here, the fact that the variance on the diagonal is a constant, the same for all diagonal entries. These are independent variables. We've been assuming, and we will start here by assuming the same thing, that you have bounded moments of all orders. As we will see, and as was the case yesterday, this condition can be replaced with the existence of four plus epsilon moments; but we'll get to that in a little while. Pardon me, was that a question?

All right, so we're looking at the scaled matrix. Whenever you see a bar, you should expect that the matrix is scaled; when the time comes, I'll specify the scaling. The empirical spectral distribution, as always, is the average of the Dirac delta functions at the eigenvalues of the matrix. And we examined, last time, the centered linear statistic, first for a polynomial and then for a smooth function F. What does that mean? It means that you sum the values of the function at the eigenvalues and subtract off the expected value of this sum. Notice that there are two expectations hiding here: in a sense we have a multiple of the expected value of F of lambda, where lambda is an eigenvalue chosen at random, and then we take the expectation over the matrix W bar itself. If you particularize this linear statistic to the case where F is the monomial x to the p, you get the quantity we examined last time, the trace of W bar to the p minus its expectation. I'll remind you that last time we showed that this quantity converges to a centered normal with computable variance. In fact, in the problem session you were supposed to compute the covariance of two of these variables; I understand that you didn't necessarily go all the way. We'll see the problem again briefly, in a more general context, if we have time today. The important thing to remember is the conclusion you can draw: because you can calculate the variances and covariances of this process, and the limits are always Gaussian, this essentially defines a Gaussian process indexed by smooth functions on the line. You plug in a smooth function, and you get out a centered Gaussian variable whose variance is given by some functional of that function. That's a Gaussian process.
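In symbols, here is a minimal sketch of this notation as I read it from the recap; the 1 over root N normalization behind the bar is an assumption consistent with the semicircle limit, since it was not spelled out here.

```latex
% Scaled matrix, empirical spectral distribution, centered linear statistic
% (1/sqrt(N) normalization assumed, consistent with the semicircle limit):
\bar W = \frac{1}{\sqrt N}\,W, \qquad
\mu_{\bar W} = \frac1N \sum_{i=1}^{N} \delta_{\lambda_i(\bar W)}, \qquad
X_F(\bar W) = \sum_{i=1}^{N} F\bigl(\lambda_i(\bar W)\bigr)
            - \mathbb E \sum_{i=1}^{N} F\bigl(\lambda_i(\bar W)\bigr).

% Monomial case F(x) = x^p, the centered trace studied last time:
X_{x^p}(\bar W) = \operatorname{Tr}\bar W^{\,p} - \mathbb E\,\operatorname{Tr}\bar W^{\,p}
\;\longrightarrow\; \mathcal N\!\bigl(0,\sigma_p^2\bigr), \qquad N \to \infty .
```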
So let's do a little bit of recap. The ESD converges to a deterministic distribution, the semicircle. In a very real sense we can think of this as a zero-dimensional process, because the limit is a single point in the space of distributions: it lacks randomness; it is one deterministic distribution, a point. Then you look at the fluctuations. The fluctuations are described by this Gaussian process indexed by smooth functions on the line, and that already has a one-dimensional feeling. Then you can ask: is it possible to generalize this to something two-dimensional? Rather than something happening on the line, you want to look at something happening, say, on the plane. The answer is yes; we need to introduce a new dimension. It turns out, as I will tell you at the end, that we haven't quite defined this one hundred percent satisfactorily as a proper two-dimensional process, but because the object involved in defining it is the Gaussian free field, which I will define, you can legitimately think of it as two-dimensional. So that's what we're going to do today: introduce this other dimension. Instead of convergence to a point, or convergence to a Gaussian process on the line, we will have convergence to the Gaussian free field.

We're going to study collections of these centered linear statistics, X sub F of a matrix, for sequences of large overlapping random matrices. This will have depth and width, in a sense, because of the additional overlapping structure. It has been studied in the literature under two names. One is the minor process, because what you end up doing is taking a huge, in fact infinite, array of variables, chopping off overlapping principal minors of it, and studying the centered linear statistics thereof. The other is the corner process, corner being assimilated to minor in this context. So you'll find it in the literature under these two names.

OK. So this is what we're going to do. We assume we have an infinite double array of variables; for simplicity, though it's not really necessary, the entries are IID off the diagonal and IID on the diagonal. It's a symmetric array, so everything is determined by what happens on and above the diagonal. We assume, again for simplicity, even though you can remove this later, that the distribution of the off-diagonal elements and the distribution of the diagonal elements both have bounded moments of all orders. So we have w11, w22, w33, and so on, and then w12, w23, et cetera. Because of the identical distribution and the independence, the only thing we need to pin down is the expected value of w11 squared, that is, the variance of w11, which is 2. Why did I write 1 there? Sorry, it's 2. Remember that yesterday we just treated it as a constant when we did our calculations with the ways in which we could glue our graphs together and the like: at some point a loop would appear, we had to square the diagonal entry, and we had to look at the weight that gave to the entire term in the trace of w bar to the k. We didn't care what that variance was. Now we do.
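As a purely illustrative numerical sketch, not something from the slides, here is one way to realize the upper left L by L corner of such a symmetric array, with Gaussian entries so that the moment prescription discussed next (diagonal variance 2, off-diagonal variance 1, fourth moment 3, the GOE convention) holds automatically. The function name is mine.

```python
import numpy as np

def wigner_corner(L, rng=None):
    """Sample the upper-left L x L corner of an infinite symmetric array:
    IID entries above the diagonal with variance 1, IID entries on the
    diagonal with variance 2 (Gaussian here, so this is simply a GOE matrix)."""
    rng = np.random.default_rng(rng)
    A = rng.normal(0.0, 1.0, size=(L, L))   # off-diagonal candidates, variance 1
    W = np.triu(A, k=1)                      # keep the strictly upper triangle
    W = W + W.T                              # symmetrize: w_ij = w_ji
    W[np.diag_indices(L)] = rng.normal(0.0, np.sqrt(2.0), size=L)  # diagonal variance 2
    return W

# Example: one 1000 x 1000 corner of the array.
W = wigner_corner(1000, rng=0)
```

Any other entry distribution with the same second and fourth moments would serve just as well, which is exactly the point made next.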
And the reason we will take the variance to be 2 is that this agrees with the canonical case, the Gaussian orthogonal ensemble: that is what the variance is there, and we want to match the moments. The reason we want to match the moments is that if we don't, we don't get the GFF; we get something else, other limits. In this sense this is a less stable result than just looking at the fluctuations. For the fluctuations you always get some sort of Gaussian process, but to get from that to the Gaussian free field you need a precise covariance structure, which you can only get if you prescribe the moments. Is that clear? So this is what we're going to prescribe: the expected value of W11 squared is 2, the expected value of W12 squared is 1, and, as I wrote there, the expected value of W12 to the fourth is 3. You might remember that we needed the fourth moment yesterday in calculating the fluctuations, and here, because we're doing something more general, you can expect that we will need the fourth moment again. We hadn't prescribed it last time, but now we do, because we need this parallel with the Gaussian orthogonal ensemble, and 3 is what that fourth moment is in that case.

It is worth spending a moment, no pun intended, to explain that everything I've told you so far about empirical spectral distributions and their fluctuations can be extended. In fact, the custom is that these things are generally computed first for the Gaussian unitary ensemble and then for the Gaussian orthogonal and Gaussian symplectic ensembles. Those are the ensembles obtained from Gaussian variables that may be real, complex, or quaternion, and this corresponds, as you've probably learned by now in multiple other mini-courses, to a certain classification of beta ensembles: beta equals 1 for real, the Gaussian orthogonal ensemble; beta equals 2 for complex, the GUE; and beta equals 4 for quaternion, the GSE, the Gaussian symplectic ensemble. Probably unsurprisingly, everything I'm going to tell you today also extends to the other betas. We're only doing it, in a sense, for the real case, beta equals 1, but everything extends to beta equals 2 and 4. The only difference is that we would then prescribe the fourth moment to match that of the GUE or the GSE, which means that instead of 3 we would take 1 plus 2 over beta. That is the one change if you go from real to complex or quaternionic.
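To record the moment prescription in one place; the general beta line is just the 1 plus 2 over beta rule mentioned above.

```latex
\mathbb E\,W_{11}^{2} = 2, \qquad
\mathbb E\,W_{12}^{2} = 1, \qquad
\mathbb E\,W_{12}^{4} = 3 \quad (\text{GOE},\ \beta = 1);
\qquad
\mathbb E\,|W_{12}|^{4} = 1 + \tfrac{2}{\beta}
\quad\text{for } \beta = 2 \ (\text{GUE}),\ \beta = 4 \ (\text{GSE}).
```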
All right. So we have this infinite double array. From it we will extract a piece, the upper left-hand corner, of size L by L. Don't ask me why we switch from N to L; it seems to have been the thing to do in the literature, so I'm going to try to follow that. So instead of N, L is now the biggest size of a matrix. For some large L, we extract the upper left-hand corner principal minor, which we then treat as a matrix in its own right, and from it we continue to extract overlapping minors. By the way, the word minor is unfortunately used in linear algebra to mean two things: the submatrix, and the determinant of the submatrix. There are no determinants in this talk, so when you hear minor, understand the submatrix. It's an unfortunate thing.

OK, so given some positive integer k, we will extract W1 through Wk, k principal minors of this L by L piece of the array, with corresponding sizes n1 of L through nk of L; I write them as functions of L, and we'll see why in a moment. And we shall have overlapping pieces. What do I mean by overlapping pieces? Well, if you think of this as being L by L, when we extract principal submatrices, maybe one of them is in the corner, maybe one of them is here, maybe one of them is like this, another one could be over here, maybe one of them doesn't overlap the others at all, and perhaps one of them even contains another. So this can be any sequence of principal minors that we extract. In principle we would be interested in the indices of the rows and columns that go into these minors, but in practice, if you think about it for a moment, you realize that rows and columns are interchangeable, subject of course to keeping the symmetry. Therefore, in a very real sense, the only things we care about are their sizes and how big the overlaps are; we don't actually care about the indices. We can permute rows and columns to put these minors wherever we want. In principle we could construct submatrices that are not contiguous, just a few columns here and a few columns there with spaces in between, but because of this beautiful interchangeability of rows and columns we might as well think of them as contiguous.

Yes, I see there is a question. I'm sorry, I couldn't hear you. Does it matter whether the minors sit on the diagonal, whether they include the diagonal? Well, the minors have to be symmetric, which is why we take them to be principal, so yes, they sit on the diagonal. And are we only interested in pairwise intersections? Yes. For example, here, if instead of this arrangement you had something like this, then you could have triple intersections, but that is not going to show up in the calculations. Because the limits we will be working with are Gaussian, and a Gaussian vector is determined by its covariance, even if you have triple intersections the covariance doesn't care about them. Does that make sense? OK. Any other questions?

All right, so let's keep going. I was about to define the intersection sizes of pairs of these minors. We let n sub ij of L be the size of what I will write as W i intersect W j, by which I mean the submatrix on which W sub i and W sub j overlap. It's a slight abuse of notation, but I hope it's self-explanatory. Given this structure, we can examine the linear statistics of traces. Eventually we will want to extend this, but for now we'll just look at polynomials, and in fact at monomials, because polynomials are linear combinations of monomials, linear statistics add, and Gaussians add. So it's enough to look at the centered linear statistics corresponding to the monomials x to the p1, x to the p2, up to x to the pk, where p1 through pk are positive integers, evaluated on W1 bar, W2 bar, up to Wk bar.
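Here is a small illustrative sketch, with my own hypothetical function names, of extracting overlapping principal minors from the L by L corner and recording the only data that matters in the limit: the sizes n_i and the pairwise overlap sizes n_ij. It reuses the wigner_corner sampler sketched earlier.

```python
import numpy as np

def principal_minor(W, rows):
    """Principal submatrix of W on the given index set (sorted for convenience)."""
    idx = np.asarray(sorted(rows))
    return W[np.ix_(idx, idx)]

def overlap_sizes(index_sets):
    """n_i = |I_i| and n_ij = |I_i intersect I_j| for the minors' index sets."""
    k = len(index_sets)
    n = [len(s) for s in index_sets]
    n_pair = {(i, j): len(set(index_sets[i]) & set(index_sets[j]))
              for i in range(k) for j in range(i + 1, k)}
    return n, n_pair

# Example: two overlapping contiguous minors inside a 1000 x 1000 corner.
L = 1000
W = wigner_corner(L, rng=0)
I1, I2 = range(0, 600), range(400, 1000)   # n_1 = 600, n_2 = 600, n_12 = 200
W1, W2 = principal_minor(W, I1), principal_minor(W, I2)
n, n_pair = overlap_sizes([list(I1), list(I2)])
```

From such samples one could then estimate the centered monomial statistics over many independent draws, using the scaling by root L rather than root n_i that is explained next.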
And now I'm going to say something slightly different from what we've seen so far. When I define W sub i bar, you might expect it to be 1 over the square root of n i of L times W i, because that's the definition we've used so far. But in this lecture, because the true size is really the one I start with, the size of the big matrix from which I chopped the minor, I'm not going to scale by the smaller size; I'm going to scale by the bigger one. So W i bar is going to be 1 over root L times W i. This will matter only marginally, and only in computing covariances, because what we will actually want from all of these sizes is that they are proportional to L. So we will want n i of L over L to converge to some quantity b i, obviously between 0 and 1, for all i, and we will want n ij of L over L to converge to some quantity c ij, again between 0 and 1, for all i and j. So we're going to chop off minors whose sizes are fractions of the big matrix, and whose intersections are also fractional. Yes, thank you, correct: it is n i of L over L; otherwise what I wrote is complete nonsense. Thank you. In my defense, I think it was written correctly on the slide.

I hope the setup is clear, because now I'm going to define a height function, and this height function has a rather peculiar and interesting form. Again, by permuting rows and columns it is, in a sense, sufficient to define things for the upper left-hand corner, because if you slide a minor along the diagonal the definitions don't really change; so we work without loss of generality with corners. For y greater than or equal to 0, and in principle we'll only look at y less than or equal to 1, but we can define it for any y greater than or equal to 0 by referring back to the infinite double array we started with, we define W sub y to be the upper left corner of this array, of size floor of yL by floor of yL. I wanted to write yL by yL, but of course that wouldn't necessarily be an integer; for all practical purposes you can think of it as yL, since floor of yL over L goes to y as L goes to infinity. So we define this matrix. For any interval I on the real line, we define N sub I of W y to be the number of eigenvalues of W y in I. In particular, we can define the height function of x and y by taking that interval to be the left-bounded interval from x to infinity and looking at the matrix W y, scaled by root L as always. So it's the number of eigenvalues of the scaled W y that are greater than or equal to x, and of course we center it: we subtract its expectation.

What does this suggest to you? What would this function be if, instead of x to infinity, I had looked at minus infinity to x, the number of eigenvalues of W y in minus infinity to x? What is that? Yes, it's the integral of the spectrum from minus infinity to x, so essentially the empirical distribution function: up to the factor n, it's the probability that a randomly chosen eigenvalue is less than or equal to x. Here we're looking at x to infinity instead, so you'd have to subtract that count from n; it's a little bit more complicated, but it is a quantity deeply enmeshed with the empirical spectral distribution. OK, so we define this as the height function. For a given function f, recall what the variable X sub f of W y is: it's the centered linear statistic. Let me change that pointer.
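Collecting the definitions just made into formulas; the bar on W_y is my shorthand for the corner scaled by root L, in line with the convention for the W_i.

```latex
\bar W_i = \frac{1}{\sqrt L}\, W_i, \qquad
\frac{n_i(L)}{L} \longrightarrow b_i \in (0,1], \qquad
\frac{n_{ij}(L)}{L} \longrightarrow c_{ij} \in [0,1];

W_y = \text{upper-left } \lfloor yL\rfloor \times \lfloor yL\rfloor \text{ corner}, \qquad
\bar W_y = \frac{1}{\sqrt L}\, W_y, \qquad
N_I(\bar W_y) = \#\{\text{eigenvalues of } \bar W_y \text{ in } I\},

H(x,y) = N_{[x,\infty)}\bigl(\bar W_y\bigr) - \mathbb E\, N_{[x,\infty)}\bigl(\bar W_y\bigr).
```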
And something very interesting happens. If you take f to be suitably smooth, say continuous, compactly supported, Lipschitz, you can pile on as many conditions as you like at first and remove some of them later. With conditions like these, you will see that the integral of f prime of x against H of x, y, dx, is exactly the centered linear statistic of f for the matrix W y. The reason this works is integration by parts, together with the fact that H of x, y goes to 0 as x becomes large in either direction. When x is, say, minus a billion, all of the eigenvalues sit above x, so the count is n, its expectation is n, and the centered quantity H of x, y is 0. And when x goes to plus infinity, you expect all of the eigenvalues to sit below x, so the count is 0, the expectation is 0, and H of x, y is again 0. So the boundary terms vanish. In fact, if you think about it, the random part of H of x, y is a step function in x: each time you cross an eigenvalue, something happens.

All right, so look here again. This is important because it codifies the relationship between linear statistics and the height function H; it's a very important equality. Now, the height function, as I defined it, lives naturally on R cross the positive real axis. And we expect all the places of growth of H of x, y in the x direction to be concentrated in this domain, the reason being that the eigenvalues of W y will lie between minus 2 root y and 2 root y with probability going to 1; remember that we scale not by root of yL but by root L, which is where the factor root y comes from. So that is where we expect the places of growth of H of x, y to be. What we're going to do is identify this domain with the upper half plane via this map; you can take time offline and check that it works. It takes semicircles to horizontal lines, its inverse can also be computed, and you can see that y is the absolute value of z squared. So it's a fairly simple map.
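In formulas: the identity just explained, the domain where the height function fluctuates, and the map to the upper half plane. The map itself was only on the slide, so what is written here is my reconstruction, chosen to satisfy the two stated properties, namely that y equals the absolute value of z squared and that semicircles go to horizontal lines.

```latex
X_f(\bar W_y) = \int_{\mathbb R} f'(x)\, H(x,y)\, dx
\qquad (\text{integration by parts, using } H(x,y) \to 0 \text{ as } x \to \pm\infty),

\text{growth of } H \text{ concentrated on } \ \{(x,y) : y > 0,\ |x| \le 2\sqrt y\},

z \in \mathbb H \ \longmapsto\ (x,y) = \bigl(z + \bar z,\ z\bar z\bigr),
\qquad
z = \frac{x + i\sqrt{4y - x^2}}{2}, \qquad y = |z|^2 .
```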
And why do we care? We care because this map will appear in a very real way when we calculate covariances of these linear statistics for different matrices. In fact, here is the theorem that is key to it all. It says the following. We have the k overlapping principal minors, with sizes defined as I specified and overlaps defined as I specified, everything proportional to L with constants between 0 and 1. Then, when you look at the linear statistics corresponding to the monomials, x to the p1 on W1 bar, x to the p2 on W2 bar, up to x to the pk on Wk bar (these should have been written with bars, sorry; I do want the scaled matrices), this vector of linear statistics converges in the sense of moments to a vector of centered Gaussians with the covariance given here. And it's not a very simple expression, though it's not particularly hard either. Believe me, if you compute the same thing for Wishart matrices, for those of you who know what Wishart matrices are, you get something more complicated. But still, it's computable. And notice that I'm using here the definitions of x of z that were given before. Yes, that's what it is. We just compute the moments, and because the moments converge, we conclude; for Gaussian limits, that's all you need. So we see these expressions, x of z and x of w, and you can guess, because they appeared in the...