Now, is it over? I'll wait one more minute and then start. OK, first I want to apologize for the end of the last lecture: I rushed a little bit through freeness. Freeness is a complicated concept, but there will be a lot more about it in the school; Camille, for instance, will talk about it, and I'll come back to it again today. So today, in this second introductory lecture, I want to talk about three things: sample covariance matrices, that is, estimating covariances from real data, but mostly from a theoretical point of view. This was first studied for finite-size matrices by Wishart — this is why such matrices are called Wishart matrices — at the beginning of the 20th century, in the 1920s; I don't remember the exact date. So Wishart studied small sample covariance matrices. Then Marchenko and Pastur, in 1967, studied the large N limit of these matrices. And the more modern formulation uses multiplicative free convolution, which I will explain. Just a little note: Marchenko and Pastur, whose work I'm going to talk about today, are Ukrainian mathematicians. Pastur is about 80 years old, and he just had to move because of the war. And, remarkably, Marchenko is still alive; he is about 99, born in 1922. I've heard he is now in Canada, safe from the war. I always wondered whether Marchenko was still alive, and he is, although I don't think he is very active anymore. OK, so sample covariance matrices. I want to tell you why large sample covariance matrices are interesting. First, let's assume we have some data drawn from a true covariance matrix C. To make things simpler, you can think of Gaussian IID data drawn with covariance C, but you don't have access to this matrix; what you have access to is a sample. Let me call the sample H. H is a rectangular matrix, whose entries I will sometimes denote X_it: variable i at time t. So I have T time slices of my data, and I assume each column, each time, is IID. Each observation is a vector, and that vector has a true covariance C that I typically don't know. This data X_it I put into the matrix H, which is N by T: N variables and T observations. Be careful: mathematicians often use p variables and n observations, because p also counts the degrees of freedom and so on, so in the literature sample covariance matrices are often p by p. In my notation the covariance matrices are always N by N, and I use T for the number of observations and N for the variables. I'll use E for "empirical". If you wanted to compute a covariance matrix from data, you would write E_ij = (1/T) sum over t from 1 to T of X_it X_jt: one over the number of samples, times the sum over observations. Here I assume that either you have already subtracted the mean, or the mean doesn't matter, or you don't know it; I will never worry about the expectation of X, so let's assume you know a priori that it has zero expectation. And this you can write as a matrix product.
It is easier to write it as E = H Hᵀ / T. Here the superscript means transpose, and the T in the denominator is the number of observations. H is an N by T matrix, so a rectangular matrix; its transpose is T by N, and the product is the N by N sample covariance matrix E. The formula I wrote before is for a matrix element; the matrix E itself is this product. So what can we say about this matrix E? Well, one thing: if N is finite, say you have three variables, and T goes to infinity, then E converges to C and there is nothing more to say, the class is over. But this is not the limit that interests us. That is what I would call classical statistics: if you have thousands of observations and three by three or five by five matrices, you don't worry about the estimation error of your covariance, and everything I'm going to say is irrelevant. The interesting limit is when the number of variables is large, and by large I mean large compared to the number of observations. I got interested in this problem because in finance, especially quantitative finance, you often have very large portfolios: say 500 stocks in the S&P 500, or 3000 stocks in the Russell index. People typically deal with portfolios of thousands of stocks, and they need to measure covariance matrices using daily data; 1000 business days is four years of data, so with eight years of data you have 2000 business days, and if you hold 2000 stocks you literally have about the same number of observations as variables. Then we are very far from the limit of N finite, T going to infinity, and saying that E converges to C is very, very wrong. So this is what I am going to try to convince you of today: in this limit E does not converge to C; E is something else, it has interesting properties, and we are going to study the properties of the sample covariance matrix E. So what can we say? Before having a model, let's just say that we have some data H. H could be anything: non-stationary, random, containing all sorts of randomness that I don't care about. It is just some rectangular matrix of data, as long as labelling the rows makes sense. Then I can compute a sample covariance matrix of the rows, E = (1/T) H Hᵀ, and there are some basic things I can say about it. First of all, it might have zero eigenvalues, and it often does. To see this, I am going to forget the normalization 1/T for a moment, because if I only care about zero eigenvalues the normalization doesn't matter, and it is simpler to ignore it. Consider the two matrices H Hᵀ and Hᵀ H. They look very similar, but they are completely different objects: one is N by N and the other is T by T. But you can show — I won't do it here, it takes three minutes — that if one of them has a non-zero eigenvalue, then the other has the same non-zero eigenvalue; they share the same non-zero eigenvalues. And by the way, they are symmetric and positive semi-definite, so they can be diagonalized and all their eigenvalues are positive or zero.
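As a quick numerical illustration of these two facts — the sample covariance as the matrix product H Hᵀ / T, and the shared non-zero spectrum of H Hᵀ and Hᵀ H — here is a minimal sketch; it is not from the lecture, and the dimensions and random data are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 50, 200                      # N variables, T observations (q = N/T = 0.25)
H = rng.standard_normal((N, T))     # some data matrix; white noise here for simplicity

E = H @ H.T / T                     # sample covariance matrix, N x N

# H H^T (N x N) and H^T H (T x T) share the same non-zero eigenvalues;
# the larger matrix just has extra zeros.
ev_small = np.sort(np.linalg.eigvalsh(H @ H.T))
ev_big   = np.sort(np.linalg.eigvalsh(H.T @ H))
print(np.allclose(ev_small, ev_big[-N:]))   # True: non-zero spectra coincide
print(np.sum(ev_big < 1e-6))                # roughly T - N numerically-zero eigenvalues
```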
So if you have two matrices that are not the same size and they share the same non-zero eigenvalues, the only difference can be in the zero eigenvalues. Typically we will be dealing with the case where T is greater than N, and in that case the T by T matrix Hᵀ H has at least T − N zero eigenvalues. But you could also have N greater than T, and then the zeros are on the other side. Now let me go back to the sample covariance matrix I am trying to estimate from real data. If I am in the case where N is greater than T — more variables than observations — then E itself must have zero eigenvalues, at least N − T of them. My true matrix C I could assume to be full rank, with no zero eigenvalue, so already you see that E is not a faithful representation of C in this regime, which is a pretty extreme one. There are many ways of seeing this; I like this one, but you could also say that the system of linear equations is underdetermined, and therefore there must be zero eigenvalues. Anyway, if N is greater than T, if T is too small, then E definitely has zero eigenvalues, whereas I could postulate that C has none. This already hints at the fact that E does not converge to C. Another thing I can do, which is fairly easy and doesn't really need random matrix theory, is to compute moments. Remember the normalized trace operator: τ(M) = (1/N) tr M for a matrix M. In most cases this quantity is self-averaging, but I can also add an expectation value. In expectation, in this setup, E[E] is indeed C, and since the normalized trace is linear, τ(E) = τ(C): the first moments match. But with a relatively easy computation you can see that the second moments no longer match. Maybe I should be more precise here: I am taking the limit N goes to infinity with the ratio q = N/T fixed, so both N and T go to infinity but their ratio stays fixed. In my stock example, suppose I have twice as many days as stocks, so q = 1/2; then I let the number of days and the number of stocks both go to infinity and study the properties of these matrices. In that limit, dropping terms that are negligible as N goes to infinity, the second moment comes out as τ(E²) = τ(C²) + q (τ(C))². Remember that the classical limit, N finite and T going to infinity, is the limit q → 0, since q = N/T. In the limit q → 0 you recover classical statistics, and you can show that in this limit the moments match.
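A rough numerical check of these two moment relations, not from the lecture; the covariance C, the dimensions, and the use of a Cholesky factor as the matrix square root are arbitrary choices for the sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 400, 800
q = N / T

# some arbitrary positive-definite "true" covariance C
A = rng.standard_normal((N, N))
C = A @ A.T / N

# draw T Gaussian observations with covariance C and form E = H H^T / T
H = np.linalg.cholesky(C) @ rng.standard_normal((N, T))
E = H @ H.T / T

tau = lambda M: np.trace(M) / N
print(tau(E), tau(C))                               # first moments agree (up to fluctuations)
print(tau(E @ E), tau(C @ C) + q * tau(C) ** 2)     # second moments differ by the q*tau(C)^2 term
```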
So when q goes to zero, when N is fixed and T goes to infinity, this extra term q (τ(C))² disappears, the second moments match, and in fact all moments of the distribution match. Again, we are back to E converging to C and there is nothing interesting to say about E. But in the general case things are different, and the regime where N is greater than T is q greater than one, since q = N/T. So we will have q = 0, which is classical statistics; then an interesting regime for q between zero and one; and then for q greater than or equal to one E will have zero eigenvalues. You can still quantify what happens there. Somebody asked me about the Stieltjes transform when a finite number of eigenvalues share the same value: here we will have a finite fraction of zero eigenvalues, so there is a pole in the Stieltjes transform, that is, a Dirac delta at zero in the density. But I will mostly concentrate on the more typical case where we have enough data — more observations than variables, but not that many more — so q between zero and one. OK, so can I say more about the matrix E? Before doing that, I need a model of my data. How do I model data with a fixed covariance matrix? If you were to write a computer program to generate correlated data, there are many ways to do it; it is easy to generate IID data, so you generate IID columns and then you correlate them. One way is to work in the eigenbasis: there, the standard deviation is the square root of the eigenvalue, so you generate data in the eigenbasis with standard deviation √λ and then rotate back to the physical basis. That is one way; there are many, but they all essentially amount to applying a matrix square root of the covariance C. So if I want to generate data with covariance matrix C, I take some matrix square root of C — think of it as: diagonalize C, take the square root of the eigenvalues, go back to the physical basis — and look at the combination H = √C H₀, where H₀ is IID data: H₀ is N by T, IID, normal(0, 1) for instance. You first generate white data, then multiply on the left by √C, and this gives you correlated data. So this is a model for correlated data. In real life many more complicated things can happen, but if I want multivariate stationary data — by stationary I mean the covariance is the same for all columns — and multivariate Gaussian (that is not necessary, it just makes life easier at this introductory level; in the large N limit, as long as the variances exist, that is enough), then I can build it this way. And there are many ways to prove that this construction gives the right covariance.
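Here is a minimal sketch of the construction just described — diagonalize C, take the square root of the eigenvalues, rotate back, and apply the result to white data. The specific 3x3 matrix C is an arbitrary example, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 3, 100_000

# an arbitrary true covariance matrix C (symmetric, positive definite)
C = np.array([[1.0, 0.3, 0.1],
              [0.3, 2.0, 0.5],
              [0.1, 0.5, 1.5]])

# matrix square root via the eigendecomposition of C
lam, U = np.linalg.eigh(C)
sqrtC = U @ np.diag(np.sqrt(lam)) @ U.T

H0 = rng.standard_normal((N, T))   # white IID data, normal(0, 1)
H = sqrtC @ H0                     # correlated data: each column has covariance C

print(H @ H.T / T)                 # close to C when T is large
```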
Just look at the expectation value of (1/T) H Hᵀ: you can quickly convince yourself that it is C. And this holds for every column: the multiplication by √C acts on the left, so on every column, and each column of H has covariance C while everything is IID along the time axis. Again, I like the finance analogy: every day you get a new, independent sample of correlated variables. OK, so now E is really √C H₀ H₀ᵀ √C, with a 1/T for normalization. This is my model of the data, and I can try to study the eigenvalue distribution of this object. — Mark, sorry to interrupt, there is a question in the chat: on a practical note, why are we actually interested in the eigenvalues of covariance matrices in applications? — I'll give examples on Wednesday; on Wednesday I'll show financial applications. We are not necessarily interested in the eigenvalues per se, but we need to characterize this matrix, and the eigenvalues give us a lot of information. This series of lectures is also about eigenvectors, but I need to start with the eigenvalues: first understand the eigenvalues, then the eigenvectors. The whole point — and the two lectures on Wednesday — will be estimating C given E. Today we are doing the forward problem: given C, what are the consequences for E? The inference problem is more interesting: given E, we want to infer C, and essentially if we can infer the eigenvalues and the eigenvectors we can reconstruct the matrix. That is the argument. Another reason to care about eigenvalues is that very often you need to invert the matrix, and inverting a matrix with small noisy eigenvalues is super dangerous, so knowing that there are small noisy eigenvalues is important. OK. So this I could do with free probability theory, but I won't use free probability right now. Instead I will compute something slightly different, which is equivalent, although it is a bit cumbersome to go from one to the other. As I told you, H Hᵀ and Hᵀ H have the same non-zero eigenvalues but are matrices of different sizes — one N by N, the other T by T — and the normalizations are slightly different. So let me define a slightly easier object, where C sits on the inside rather than on the outside: (1/T) H₀ᵀ C H₀. In terms of eigenvalues, as I said, it really doesn't matter; it is just a question of keeping track of the zero eigenvalues and the proper normalization. But this computation is easier to do, and it is actually what Marchenko and Pastur studied. And since H₀ is rotation invariant, you can choose C to be diagonal without loss of generality. With C diagonal and on the inside, it is like studying data where the variance changes every day: you have IID vectors, but the scale fluctuates. This is something that happens in finance; it is called volatility.
On different days you might have days where things move a lot, corresponding to a large variance, and days where things don't move much, corresponding to a small one. So think of measuring IID vectors with a fluctuating variance. I will write this as a sum: the matrix element is (1/T) Σ_t σ_t² x_it x_jt, where the x's are IID and the σ_t² are the eigenvalues of C, one per time slice. The order doesn't matter — since the construction is rotation invariant I can diagonalize C — so I can even rank them: say σ²_1 is the smallest eigenvalue of C and σ²_T the largest. What is nice about this form is that each term, with the 1/T included, is a rank-one matrix: it has a single non-zero eigenvalue and all the other eigenvalues are zero. So I can write the matrix as a sum of projectors, E = Σ_t P_t, with P_t = σ_t² x_t x_tᵀ / T, where x_t is a vector of size N. In my normalization, since the x_it are IID with variance one, x_tᵀ x_t is a sum of N positive numbers; by the law of large numbers it does not fluctuate, it is close to N. So P_t has one non-zero eigenvalue, equal to (N/T) σ_t² = q σ_t², and N − 1 eigenvalues equal to zero. This is why I call it a projector: it is proportional to a projector onto a one-dimensional space, with a single non-zero eigenvalue. And on top of that, it is rotation invariant. So I have written my matrix E as a sum of rotation-invariant matrices, and I can use the freeness that I rushed through at the last lecture to compute the spectrum of E, because I can compute its R transform. The R transform of E, evaluated at some point g, is the sum over t of the R transforms of each of these matrices. I have a sum of rotation-invariant matrices, so the R transform is additive. It is additive two by two, but since each term is rotation invariant I can add another one, and another one, and so on; it is additive for all T terms, with T large. So if I can compute the R transform of one of these rank-one objects, I can compute the R transform of the sample covariance matrix. And again, the R transform is related to the Stieltjes transform, so I can invert it, take the imaginary part just above the real axis, and get the density. This is how we will proceed. It is not how Marchenko and Pastur did it — they didn't know about the R transform — but I will get the same result they did. So, to get the R transform of one of these objects, I basically need its Stieltjes transform.
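A small numerical check of the rank-one claim — a single non-zero eigenvalue close to q σ_t², all others zero — again a sketch with arbitrary dimensions and an arbitrary daily variance, not part of the lecture:

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 1000, 4000
q = N / T
sigma2_t = 1.7                        # one arbitrary "daily" variance

x_t = rng.standard_normal(N)          # one IID column of unit-variance entries
P_t = sigma2_t * np.outer(x_t, x_t) / T

ev = np.sort(np.linalg.eigvalsh(P_t))
print(ev[-1], q * sigma2_t)           # the single non-zero eigenvalue ~ q * sigma_t^2
print(np.max(np.abs(ev[:-1])))        # the other N-1 eigenvalues vanish (numerically zero)
```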
I am going to put a little index t, because this explicitly depends on the variance σ_t². What is it? The Stieltjes transform of P_t is g_t(z) = (1/N) · 1/(z − q σ_t²) + ((N − 1)/N) · 1/z: one term for the single non-zero eigenvalue, which by the way equals q σ_t², and N − 1 other eigenvalues all equal to zero. So this is the Stieltjes transform of a projector. Now, I am sort of cheating here, because strictly speaking the R transform is only defined in the large N limit, so I should only keep the terms that survive as N goes to infinity. As N goes to infinity, the Stieltjes transform of a projector is just 1/z, which is the same as the Stieltjes transform of the zero matrix. If I only keep those terms — the 1/N piece disappears, and the −1/N in (N − 1)/N disappears too — all that is left is 1/z. I can invert this relation, z(g) = 1/g, and the R transform is z(g) − 1/g, so the R transform is zero. I have lost everything: I would get a sum of zeros. So I am going to cheat — I am not a mathematician, I am allowed to cheat — and keep the R transform to first order in 1/N, because I have a large sum here: T terms, with T of order N. If I compute the R transform to order 1/N, it is not zero, it is something, and then I can sum T of these and get the correct result. I see time is going by, so I will just sketch the computation and state the result; it is fairly easy. I do it iteratively: I write g_t(z) as the zeroth-order term 1/z plus a correction of order 1/N, invert this to first order in 1/N, remove the pole 1/g, and that gives the R transform of one projector; then I sum over t. If I am not mistaken, I get R_E(g) = (1/N) Σ_{t=1..T} q σ_t² / (1 − q σ_t² g). And in the large N limit I can sort the eigenvalues of the matrix C; they become continuous, and σ_t² becomes a function σ²(t), with t rescaled to lie between zero and one.
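For completeness, here is the small computation that is skipped above, written out as a sketch in which only the first order in 1/N is kept (this is my reconstruction of the step, not the lecturer's board work):

```latex
% Stieltjes transform of one projector P_t, with eigenvalue a = q\sigma_t^2 plus N-1 zeros:
g_t(z) = \frac{1}{N}\,\frac{1}{z-a} + \frac{N-1}{N}\,\frac{1}{z}
       = \frac{1}{z} + \frac{1}{N}\left(\frac{1}{z-a}-\frac{1}{z}\right).
% Invert perturbatively, writing z_t(g) = 1/g + \delta(g)/N and solving to first order in 1/N:
z_t(g) = \frac{1}{g} + \frac{1}{N}\,\frac{a}{1-ag}, \qquad
R_t(g) = z_t(g) - \frac{1}{g} = \frac{1}{N}\,\frac{q\sigma_t^2}{1-q\sigma_t^2\, g}.
% Summing the T = N/q free rank-one pieces gives the quoted result:
R_E(g) = \sum_{t=1}^{T} R_t(g) = \frac{1}{N}\sum_{t=1}^{T}\frac{q\sigma_t^2}{1-q\sigma_t^2\, g}.
```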
OK, so I replace this sum by an integral. It is a sum over t with a 1/N in front, so a factor of q appears and cancels the factor of q in the numerator, and — trust me — the answer is R_E(g) = ∫₀¹ dt σ²(t) / (1 − g q σ²(t)). This is the R transform. Two things jump out. Well, maybe I was too ambitious to deal right away with a general matrix C. If C is the identity, all its eigenvalues are one, all the σ² are one, the integral simplifies tremendously, and this is called the white case: a white Wishart. The sample covariance matrix is a Wishart matrix; Wishart actually considered a general true covariance C, and when the true covariance is the identity I call it a white Wishart. In that case the R transform is very simple: R(g) = 1/(1 − q g). That is one result we get here; it is the same computation without bothering with the σ² terms. Another thing you can do is rewrite this, because the R transform is directly related to the inverse Stieltjes transform: z(g) = R(g) + 1/g. If you write it that way, you get an implicit equation for g: z = 1/g(z) + ∫₀¹ dt σ²(t) / (1 − q g(z) σ²(t)). This equation, written like that, is essentially what Marchenko and Pastur obtained in their 1967 paper: they related the Stieltjes transform of a matrix of this type to an implicit equation — the Marchenko–Pastur equation. It is a bit hard to solve in general, but there are methods to solve it numerically, and it relates the Stieltjes transform to itself through this integral, which depends on the eigenvalue density of C. OK. This is a bit cumbersome, and it is particular to sample covariance matrices, so I want to finish the last five minutes with something a bit more general. Let me go back to the previous object, E = √C H₀ H₀ᵀ √C / T. This is really a product of C with the matrix W₀ = H₀ H₀ᵀ / T, a white Wishart: W₀ is the sample covariance matrix of white noise, of uncorrelated data, and it has the R transform we just wrote, 1/(1 − q g). By the way — I won't have time to do it — you can invert this: you get a quadratic equation for g, you can easily solve it, you find a branch cut, and you obtain the density of eigenvalues. Let me at least write it down. It is called the Marchenko–Pastur density; again, the Marchenko–Pastur density is for the white Wishart — the colored Wishart, the one with a matrix C in the middle or on the sides, gives an integral equation, but for the white Wishart it is a simple formula: ρ(λ) = √((λ₊ − λ)(λ − λ₋)) / (2π q λ), with edges λ± = (1 ± √q)² that depend on q. If you sketch it, it looks like a little arch supported between λ₋ and λ₊.
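The white case is easy to check numerically. Here is a minimal sketch, with arbitrary dimensions, comparing the eigenvalue histogram of a sampled white Wishart to the Marchenko–Pastur density just quoted; it is an illustration, not the lecturer's code:

```python
import numpy as np

rng = np.random.default_rng(4)
N, T = 1000, 2000
q = N / T                                   # q = 1/2

H0 = rng.standard_normal((N, T))
W0 = H0 @ H0.T / T                          # white Wishart matrix
ev = np.linalg.eigvalsh(W0)

lam_minus, lam_plus = (1 - np.sqrt(q)) ** 2, (1 + np.sqrt(q)) ** 2

def mp_density(lam, q):
    # Marchenko-Pastur density for the white Wishart, supported on [lam_minus, lam_plus]
    return np.sqrt(np.maximum((lam_plus - lam) * (lam - lam_minus), 0.0)) / (2 * np.pi * q * lam)

# compare the empirical histogram to the theoretical density
hist, edges = np.histogram(ev, bins=40, range=(lam_minus, lam_plus), density=True)
centers = (edges[:-1] + edges[1:]) / 2
print(np.max(np.abs(hist - mp_density(centers, q))))   # small, and shrinks as N grows
```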
So if q is between zero and one, the support lies between zero and four: λ₋ is between zero and one, λ₊ is between one and four, and both depend on q. So this is the Marchenko–Pastur density for the white Wishart. This matrix W₀ we know how to characterize very well, and the matrix we are interested in is the product of this rotation-invariant matrix with some other matrix. I write it this way, and maybe that is important: most people would say the product should be C times W₀, but C W₀ is not symmetric; for positive-definite matrices I prefer to write √C W₀ √C. For the eigenvalues it doesn't matter — C W₀ is a non-symmetric matrix, but it has the same eigenvalues as √C W₀ √C — but for the eigenvectors it matters quite a bit whether C is on the outside or on the inside: for instance, the object with C on the inside is rotation invariant, while this one is not, it depends on the eigenvectors of C. And I am really rushed for time. Remember I told you that the R transform is additive for sums of free matrices. Well, for products there is something called the S transform, and what you would have is S_E(x) = S_C(x) · S_{W₀}(x). For an object like this — a product of a randomly rotated very large matrix with some other matrix — there exists a function of the spectrum, called the S transform, such that if you know the S transform of C and the S transform of W₀, you know the S transform of their free product. And once you know the S transform, you can go back to the Stieltjes transform and to the density. I think I should stop here; when I need to use this, I will explain better how to compute the S transform — it is again related to an inverse of the Stieltjes transform, just slightly more complicated — but we will see that next time. OK, thank you. And by the way, I'll go slower next time; this is background material. Let me recall the program — who's next? Someone should know who's next, so let me just distribute
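As a last numerical aside (not part of the lecture), the symmetrization point made above is easy to verify: √C W₀ √C is symmetric while C W₀ is not, yet they share the same eigenvalues. A rough sketch, with an arbitrary C:

```python
import numpy as np

rng = np.random.default_rng(5)
N, T = 200, 600

A = rng.standard_normal((N, N))
C = A @ A.T / N                            # arbitrary positive-definite "true" covariance
lam, U = np.linalg.eigh(C)
sqrtC = U @ np.diag(np.sqrt(lam)) @ U.T

H0 = rng.standard_normal((N, T))
W0 = H0 @ H0.T / T                         # white Wishart

sym  = sqrtC @ W0 @ sqrtC                  # symmetric, positive semi-definite
nons = C @ W0                              # not symmetric, but similar to the matrix above

ev_sym  = np.sort(np.linalg.eigvalsh(sym))
ev_nons = np.sort(np.linalg.eigvals(nons).real)
print(np.allclose(ev_sym, ev_nons))        # True: same spectrum, different eigenvectors
```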