Okay, sorry, let me state it again. The problem is estimating a covariance matrix, or more generally estimating a matrix C that has been corrupted by multiplicative noise, so that what I observe is the free product E of C with some noise matrix W. You can think of W as a white Wishart, which is exactly what it is for the sample covariance matrix, but more generally it can be any rotation-invariant multiplicative noise, and it is easy to normalize it so that its mean is one. Then I argue as follows: if you have no prior on the eigenvectors of C, that is, if your prior on C is rotation invariant, then your estimator has to be rotation invariant too, and since the only eigenvectors available in the problem are those of the matrix E, the estimator must be diagonal in the same basis. So you diagonalize E, you get eigenvalues lambda_k and eigenvectors v_k, and your estimator of C lives in the same basis but with modified coefficients: Xi = sum_k psi_k v_k v_k^T. This gives a tremendous dimensional reduction: you go from estimating N^2 coefficients to estimating just N coefficients. The other miracle is that you can do all of this in the large-N limit, and the two miracles are related. Initially you have roughly N^2 data points and you are trying to estimate an object of size N^2, and it is very hard to do a good job of that. Now I only want to estimate N coefficients, so I still have a factor of N going to infinity and I get concentration, to use the mathematical term: quantities that actually converge to the right answer. By the way, exact recovery is a hopeless problem: it is impossible to recover C exactly; you only get the best estimate of C. It would be extremely unlikely for C to be diagonal in the basis of E, so in the typical case, in fact in every case, your estimator will not be perfect. The only situation with perfect estimation involves the parameter q = N/T, which is also super important here: if q goes to zero, you recover perfect estimation. In the general case, though, I do not really need the concept of q, because I have a general noise, and what will matter is the S-transform of this noise. So the estimator has this form, and we can ask what is the best we could do. The best we could do is called the oracle: suppose I am looking for an estimator in this class, but I actually know C. It is a strange question, but it gives you a bound. If you stay in this class, the best you can do is the best estimator of the class knowing C. This is called the oracle; it is cheating, but it is useful to compute. Let me just write the equation: I want the argmin over the psi_k of the distance between sum_k psi_k v_k v_k^T and C, say the trace of the square of the difference. If I know the matrix C, I want to fix the coefficients: the problem hands me a basis, which is not the basis of C, but it is the basis I have, and I ask for the best coefficients in it. This is the best estimate of C in the least-squares sense.
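For reference, here is the setup just described in compact form. The explicit observation model E = C^{1/2} W C^{1/2} is one standard realization of the multiplicative, rotation-invariant noise being assumed; the rest of the notation follows the board:

```latex
% Observation model (multiplicative, rotation-invariant noise):
E = C^{1/2}\, W\, C^{1/2}, \qquad \tau(W) \equiv \tfrac{1}{N}\operatorname{Tr} W = 1 .
% Rotation-invariant estimator class (same eigenbasis as E):
E = \sum_k \lambda_k\, v_k v_k^{\mathsf T}, \qquad
\Xi = \sum_k \psi_k\, v_k v_k^{\mathsf T} .
% The oracle: best coefficients in this class, knowing C:
\{\psi_k^{\mathrm{or}}\} = \operatorname*{arg\,min}_{\{\psi_k\}}
\operatorname{Tr}\big[(\Xi - C)^2\big] .
```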
Okay, and it is fairly easy to derive; I will just give you the answer for the psi_k. The v_k form an orthonormal basis: they are normalized eigenvectors, so they are orthogonal and normalized. And you get that the optimal coefficients are simply psi_k = v_k^T C v_k. For the intuition, again using the language of finance: C is a risk matrix that I am trying to estimate, and you can think of the v_k as portfolios, vectors of weights on stocks. The best risk to assign to each of these portfolios, which plays the role of an eigenvalue of the estimator Xi, is the realized risk, the true risk of that portfolio under C. If the finance analogy speaks to you, good; in any case it is a two-line computation. So the question is: can we estimate something like this? Obviously, to compute it I need knowledge of C. And as I said, the miracle that happens in the large-N limit is that you can actually achieve this without knowing C. The argument will be as follows. I am stuck with this class of estimators, and within this class the best I can do is the oracle. If I achieve the oracle on average, then I have the best estimator of this class, and this class is the best I can do in this problem, which shows that the estimator we will build is optimal. But of course I need to do it without knowledge of the matrix C. I will be using these free-probability properties, especially the one at the bottom of the board, specialized to the problem I am looking at: the T-transform of E, where E is the free product of C and a noise matrix W. I will use an expression of this kind, which relates a matrix in the basis of E to a matrix in the basis of C, and from it I can get a relationship between the eigenvectors of E and the eigenvectors of C; essentially it is a sandwich of the matrix C between eigenvectors of E. That is the kind of thing I am going to do. Remember the resolvent matrix G; this one is maybe more familiar, though I will not use it directly. I have said many times that it is a sum of projectors v_k v_k^T, each with a pole at the corresponding eigenvalue; I am writing it for E, but it is true for any matrix. So the idea, to get something like psi_k, is that I just need to take G_E, multiply by the matrix C, and take the trace. Indeed, psi_k = v_k^T C v_k is a scalar, a number, so I can write it as the trace of v_k^T C v_k, and then by the cyclic property of the trace it is also the trace of C v_k v_k^T. If I can take the trace of the matrix C against the projector onto each eigenvector, I have the quantity I want. And remember, the resolvent has exactly this format: a projector onto each eigenvector with a pole at the eigenvalue. But I actually want to use the T-matrix, because for free products the T-matrix has a beautiful property.
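Here is a minimal numerical sketch of the oracle, under assumed toy choices (the specific C below, the sizes N and T, and the seed are hypothetical, only there to make the script self-contained). It checks that psi_k = v_k^T C v_k, plugged into the eigenbasis of E, beats the raw sample covariance matrix in Frobenius norm, as it must, since E itself belongs to the class with psi_k = lambda_k:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 200, 400                        # q = N/T = 0.5 (toy sizes)

# A "true" C, positive definite, normalized so tau(C) = 1 (toy choice)
A = rng.standard_normal((N, 3 * N))
C = np.linalg.inv(A @ A.T / (3 * N))
C *= N / np.trace(C)

# Sample covariance: E = C^{1/2} W C^{1/2} in distribution, W white Wishart
L = np.linalg.cholesky(C)
X = L @ rng.standard_normal((N, T))
E = X @ X.T / T

lam, V = np.linalg.eigh(E)             # eigen-decomposition of E
psi_oracle = np.diag(V.T @ C @ V)      # psi_k = v_k^T C v_k
Xi = (V * psi_oracle) @ V.T            # oracle estimator, basis of E

print(np.linalg.norm(Xi - C))          # oracle error ...
print(np.linalg.norm(E - C))           # ... is smaller than the raw error
```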
The T-matrix is very similar: for the T-matrix I have T_E(z) = sum_k lambda_k v_k v_k^T / (z - lambda_k), so there is just an extra factor lambda_k in the numerator, with the same pole at z = lambda_k. So basically, here is what I would morally like to do, and what I could do at finite N, where these objects have actual poles. At finite N, what I really want is the trace of C T_E(z). This gives me Tr[C T_E(z)] = sum_k lambda_k Tr[v_k v_k^T C] / (z - lambda_k): I pull out the lambda_k, use linearity of the trace, and keep the pole z - lambda_k. So for this trace I could just take residues: I can compute this function, and if I want psi_27, say, I take the residue at the pole z = lambda_27, and I get the object I want. Now, this trick worked really well with outliers, because with an outlier, a Dirac mass in the spectrum, you still have an isolated pole in these functions. The problem here, when we take the continuum limit, is that when there are many eigenvalues these functions stay random on the real axis; near the axis, all these random eigenvalues together behave as a branch cut. Taking a residue at a branch cut will not work; it does not make sense. Nevertheless, this is the right object to compute, so let me just call it H(z). And the quantity I am really interested in, psi_k, is in this equation, so let me rewrite it: at finite N, H(z) as written above equals a sum of terms lambda_k psi_k / (z - lambda_k). I am not sure I can compute it, but this is true at finite N. So I have the T-transform of the matrix E, traced against the matrix C that I do not know; later in the computation I will manage to get rid of C. This defines the unknown function H(z). At finite N it has this pole structure: the coefficients I want are the residues at the poles, up to factors of lambda_k, and the lambda_k are the eigenvalues of E. Now I want to take the large-N limit of this equation. In the large-N limit, if everything is nice and smooth, these all become continuous functions. If I evaluate this function not on the real axis but anywhere else in the complex plane, I claim that it becomes an integral from lambda_minus to lambda_plus: assuming the spectrum of my matrix E converges in the continuum limit to a nice density rho(lambda), with no Dirac masses (I could have poles as well, but let us assume not), supported between lambda_minus and lambda_plus, I get H(z) = integral over [lambda_minus, lambda_plus] of rho(lambda) lambda psi(lambda) / (z - lambda) d lambda. So what I am saying is that this sum, evaluated at a z away from the real axis, converges to an object like this one. And the object I am trying to get is psi(lambda). What is psi(lambda)? Initially my estimator was sum_k psi_k v_k v_k^T, but I can write it as sum_k psi(lambda_k) v_k v_k^T.
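Collecting the finite-N statements just made in one place (the missing 1/N normalization gets fixed in the exchange below; the 1/lambda_27 factor makes the residue statement exact):

```latex
% T-matrix of E and its pole structure (finite N):
T_E(z) = E\,(z\mathbf{1} - E)^{-1}
       = \sum_k \frac{\lambda_k}{z - \lambda_k}\, v_k v_k^{\mathsf T} .
% Tracing against C picks out the oracle coefficients:
\operatorname{Tr}\big[C\, T_E(z)\big]
       = \sum_k \frac{\lambda_k\, \psi_k}{z - \lambda_k},
\qquad
\psi_{27} = \frac{1}{\lambda_{27}}
  \operatorname*{Res}_{z = \lambda_{27}} \operatorname{Tr}\big[C\, T_E(z)\big] .
```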
I am also making an assumption, which is justified, that these coefficients converge to a smooth function, with k becoming a continuous index. Instead of labeling them from zero to one in increasing order of the eigenvalues, it is easier to label them by the corresponding eigenvalue itself. So psi_k is now written psi(lambda_k), and I am saying that psi(lambda_k) behaves as a nice smooth function that I am trying to estimate. That is why the sum over k becomes an integral over lambda, and switching to the integral is what brings in the density. Just to recall the parallel, which you should all know: G(z), for instance, which was a sum over k of 1/(z - lambda_k), converges to the integral of rho(lambda)/(z - lambda) d lambda. It is really the same type of continuum limit. And again, this is only true for an argument z away from the real axis; on the real axis these are random functions with poles, they are violent and do not converge to anything nice. Away from the real axis, they converge to nice functions.

[Question from the audience: what happened to the factor of 1/N?] Yes, yes: here there is definitely a 1/N, and I lost a 1/N somewhere. It is in the trace: I am sorry, this was a tau. So let me write a tau: H(z) is tau of C T_E(z), where tau is the normalized trace, Tr divided by N. And then I need a 1/N here as well. Okay.

So, basically, I am going to use the Plemelj formula again, the Sokhotski-Plemelj formula. You should all know it for G: once I have the continuous version, the imaginary part of G(lambda - i eta), in the limit where eta goes to zero from above, equals pi rho(lambda). And this is not specific to the function rho: it is true for any integral of this Cauchy form. In my case the numerator is not rho but rho times lambda times this function psi(lambda). So the claim is that if I know the function H(z), I can recover what I want: the imaginary part of H(lambda - i eta) will converge to pi rho(lambda) times lambda times the function I am interested in. So once I have my function, I just need to compute this H and take the specific limit, looking at H close to the real axis. But maybe let me step back a little, practically: what are we trying to do? We have a sample covariance matrix E, and we know it is not a good estimate. Let me pause here anyway to remember what we are trying to achieve. We have a sample covariance matrix E, and we know, for instance, that its eigenvalue distribution is too wide; the truth C sits somewhere narrower. So in a sense we will need to shrink the eigenvalues, because we know from the direct problem that the spectrum of E is wider, has a larger variance. And so what we would like to do is build an estimator Xi, which again I write as sum_k psi(lambda_k) v_k v_k^T.
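A quick numerical illustration of the Sokhotski-Plemelj step above, on a pure-noise Wishart whose limiting density (Marchenko-Pastur) is known in closed form; the sizes, the seed, and the choice of eta here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 1000, 2000                     # q = 0.5, pure-noise Wishart demo
X = rng.standard_normal((N, T))
lam = np.linalg.eigvalsh(X @ X.T / T)

eta = 1.0 / np.sqrt(N)                # small imaginary part, order N^{-1/2}
grid = np.linspace(lam.min(), lam.max(), 7)
z = grid - 1j * eta
G = np.mean(1.0 / (z[:, None] - lam[None, :]), axis=1)   # empirical G(z)
rho_plemelj = G.imag / np.pi          # Plemelj: Im G(lam - i*eta) / pi

# Marchenko-Pastur density for comparison (q = N/T); agreement is best
# in the bulk, since the finite eta smears the spectral edges
q = N / T
lp, lm = (1 + np.sqrt(q)) ** 2, (1 - np.sqrt(q)) ** 2
rho_mp = np.sqrt(np.maximum((lp - grid) * (grid - lm), 0)) / (2 * np.pi * q * grid)
print(np.c_[grid, rho_plemelj, rho_mp])
```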
So basically, what I want to do is take the eigenvalues lambda_k of the observed matrix and modify them by a function; I am trying to find the optimal function for shrinking the eigenvalues. Think of it as a function of lambda. Strictly, this function is only defined between lambda_minus and lambda_plus, and I do not really care what it does outside. Actually, that is not quite true: it even tells you something if you have a few outliers. This is really a theory of the bulk, but there is work by Antti Knowles on the fact that it also works for outliers, so the function can tell you how to treat those as well. Anyway, let us focus on the bulk. So psi is the replacement function, the continuous version of psi_k. Now, if I do not do anything, if I do not clean my data, that corresponds to the line y = x: saying psi(lambda) = lambda, let me write it that way, is the do-nothing function, equivalent to saying my estimator is just the sample covariance matrix. But I know that the small eigenvalues of E are too small and the large eigenvalues are too big; that is one reason to shrink, and the fact that I do not know the eigenvectors precisely forces you to do even more shrinkage. What you end up with is a function where these eigenvalues at the bottom you definitely want to move up, these at the top you want to move down, and you do something in the middle. So, plotting psi(lambda) against lambda (I should use a different color, though I do not think the colors show very well on the screen), the curve might look like this, sitting above the identity line at the lower edge and below it at the upper edge, with a shape that depends on the noise. It is a shrinkage function: it tells you, I see a small eigenvalue lambda, I give it a bigger value; I see a big one, I give it a smaller value; and I do this in a smooth way. Typical eigenvalues are around one, so an eigenvalue near one I might move a little, but one is an approximate, though not exact, fixed point of this map. So this is the function we are trying to get, and I argue that in the continuum limit, under nice circumstances, it converges to a smooth function of lambda. And it is actually monotonic; I think we finally proved that it is monotonic, and intuitively it has to be: it would be very strange if this function were not monotonically increasing, with a small eigenvalue ending up above a larger one. Okay, so this is how I build my estimator, and this is the function here. So I have this function of a complex variable, H(z): again, H(z) is going to be the normalized trace of T_E(z) times C, and I have not computed this function yet. By the way, it is very similar to the object I encountered yesterday in the phase retrieval problem.
There was a function in the phase retrieval problem that I did not compute either, also called H(z). It was a slightly different version of this: again a trace of some noisy matrix against the original matrix, except that in that case the roles of the Wishart matrix and the noise were inverted. Anyway, I have not told you how to compute this, and I really hope I can do it in 15 minutes. But if I know this function, then I am done, because on the real axis, where the eigenvalues are, this thing has a branch cut. I can read off the value on the branch cut, which is really this process of taking the imaginary part very close to the real axis, and the branch cut of this function gives me pi rho(lambda) times lambda times the function I want. Everything else there I know: rho is the sample density, which I can measure; I know pi; I know the lambdas themselves. So from the branch cut of this function H, I can extract psi, at least in theory. And as I said, psi is some shrinkage function: for any noise it will be some sort of monotonic function of lambda, with two limiting cases. When there is no noise, you get the identity function, psi(lambda) = lambda; when there is infinite noise, you get a flat function, you cannot estimate anything, so the other extreme is psi identically one, the infinite-noise limit. In the normal case you get something in between. Now, how do we compute H? This is where those formulas come in handy, because, you see, I need the normalized trace of a T-matrix, the T-matrix of a noisy matrix, traced against C. And if I look at the relation that I have specialized: the expectation value over the noise of the T-matrix I am interested in is related to the T-matrix of C, and that expectation value is super simple in terms of C; it is the object C (Z 1 - C)^{-1}. So let us see what we can do with that, and then just a little bit of simple algebra to recover something useful; I see the clock is running, so let us keep moving. I can take this expression, which is a matrix expression, multiply by C (inside the trace it does not matter on which side), and take the normalized trace, and I get that my function H(z) will be given by a normalized trace involving C. Let me give a name to the big object sitting in the argument: call it capital Z, and I will use lowercase z for the original variable. So capital Z is the argument, and the argument is little z times the S-transform of the noise evaluated at the T-transform of the sample covariance matrix. I give it a letter just to be able to do the algebra more quickly.

[Question from the audience about the normalization tau(C) = 1.] Yes, it is a normalization convention; I do not actually need it. It is just a clean way of normalizing things so that the typical eigenvalue is one, and so on. If the normalized trace of C were seven, then the typical eigenvalue would be seven, et cetera. It is a clean way that costs nothing, and it is easier when I plot things: one is then the value where I expect most eigenvalues to lie, the average eigenvalue.
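For the record, here is the relation being invoked, written out. This is the subordination form of the free multiplicative convolution, in the notation used here, with t_E denoting the normalized trace of the T-matrix:

```latex
% Subordination relation for the free product E of C and W:
\mathbb{E}_W\big[T_E(z)\big] = C\,\big(Z\,\mathbf{1} - C\big)^{-1},
\qquad
Z = z\, S_W\!\big(t_E(z)\big),
\qquad
t_E(z) = \tau\big(T_E(z)\big) .
```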
Okay, so I have defined this variable Z as the argument, and now: just as T_C evaluated at some zeta is C (zeta 1 - C)^{-1}, I need this evaluated at big Z, with another C in front. So I have C times C; I am not going to write C squared, because I am going to expand it; and then (Z 1 - C)^{-1}. I am just evaluating the T-matrix at this argument; the argument is a bit messy, so I will just leave it as Z. And now I want to do some very simple algebra. I have two C's, and basically what I would like to do is relate this back to the original T-transform. Remember I also have the other expression: t_E(z) = t_C(Z), and t_C(Z) is really the normalized trace of C (Z 1 - C)^{-1}. So I am trying to recover an expression like that, except that here I have two C's, a C squared. Instead of writing C squared, I write C times C, and you see that by adding and subtracting Z times the identity matrix, I can build something that cancels against the inverse factor, and that way I get rid of one of the C's. Let me do it. H(z) is the same as tau of C times (C - Z 1 + Z 1) times (Z 1 - C)^{-1}; I have not really done anything yet. Now look at the first piece: (C - Z 1) is minus the inverse factor, it is just the inverse of that term with a minus sign, a constant in the matrix sense, so C (C - Z 1)(Z 1 - C)^{-1} is simply minus C, and it contributes minus tau(C). The second piece is Z tau(C (Z 1 - C)^{-1}), which is Z times the T-transform of C evaluated at capital Z; and the T-transform of C at capital Z is just the T-transform of E at ordinary little z. So this piece is plus Z t_E(z); I should write small z's here so you do not get confused, the capital Z being this ugly beast. Altogether: H(z) = -tau(C) + Z t_E(z). That is my function H(z), and it is very simple. It has a term that depends on the trace of C, but actually I do not even care about it, because it is real, and the only thing I care about, as you will see, is the imaginary part; this term does not contribute to the imaginary part. And then, like a miracle, C is gone. Well, what is left is the trace of C, but it is real anyway, so I do not care. I have managed to express everything in quantities that I could in principle observe: the T-transform of the sample covariance matrix, and this object with the S-transform of the noise, evaluated at the T-transform of the sample covariance matrix. This is the sort of large-N miracle: essentially, everything converges to limits that I can compute.
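That two-line algebra is easy to sanity-check numerically; any symmetric C and any complex Z away from its spectrum will do (the size and the value of Z below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 50
B = rng.standard_normal((N, N))
C = (B + B.T) / 2                      # any symmetric matrix
Z = 1.7 - 0.3j                         # any Z away from the spectrum of C

tau = lambda M: np.trace(M) / N        # normalized trace
R = np.linalg.inv(Z * np.eye(N) - C)   # (Z 1 - C)^{-1}

lhs = tau(C @ C @ R)                   # tau( C^2 (Z 1 - C)^{-1} )
rhs = -tau(C) + Z * tau(C @ R)         # -tau(C) + Z t_C(Z)
print(abs(lhs - rhs))                  # agreement to machine precision
```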
I can actually compute this using only objects that depend on E. Okay, so I put all the pieces together, and you are going to have to trust me on some simple algebra with imaginary parts. What I am really interested in is psi, and psi is the imaginary part of H divided by pi rho(lambda) lambda. Now, pi lambda rho(lambda) is itself the imaginary part of the T-transform: Im t_E(lambda - i eta) converges to pi lambda rho(lambda). So I can write psi as a ratio of two imaginary parts, and the factors of pi cancel out. Initially let me write it as a ratio of two limits, though it can actually be taken as a single limit of the ratio: psi(lambda) equals lambda times the limit, as eta goes to zero from above, of Im[ t_E(lambda - i eta) S_W(t_E(lambda - i eta)) ] divided by Im[ t_E(lambda - i eta) ]. Let me check my notes, because I do not want to say something silly. Yes: it is t_E evaluated at lambda - i eta, times the S-transform of the noise evaluated at that same T-transform, divided by the imaginary part of the T-transform alone. So it is a ratio of imaginary parts of the T-transform, except that in the numerator the T-transform is multiplied by the S-transform of the noise. And there was a lambda that disappeared; yes, there is a factor of lambda in front that I had dropped somewhere. So, if the noise is zero: no multiplicative noise means multiplying by the identity, and the S-transform of the identity matrix is one, so I get a ratio of two quantities that are exactly the same, and the shrinkage function is lambda itself. Then I could do a few things that I do not have time to do, but let me specialize. Going back to sample covariance matrices: the S-transform of the noise is S_W(t) = 1/(1 + q t), and then I can write equations a bit more explicit than this. And I promised that if C is an inverse-Wishart, you actually have explicit formulas for t_E: E becomes the free product of a Wishart and an inverse-Wishart, all of this is very explicit, you can actually write the formula, and you get linear shrinkage. What is linear shrinkage? You get something very simple: the estimator is alpha times your matrix plus (1 - alpha) times the identity, where alpha is some sort of signal-to-noise ratio. I think alpha is p/(p + q): the variance of the signal divided by the variance of the signal plus the variance of the noise. This is the classic shrinkage. And yes, you can also write it on the eigenvalues, psi(lambda) = alpha lambda + (1 - alpha), where the one here is just the number one; but because it is a linear function of the eigenvalues, it translates directly into a statement about the matrices themselves: the matrix Xi is alpha E plus (1 - alpha) times the identity. So basically you have a shrinkage function on the eigenvalues, but since it is linear in the eigenvalues, it is a linear function of the matrix itself. This is the only case that is this simple. And as I said, this is even true in finite dimension; it was done by statisticians. In finite dimension you do not use any of this: you use a Bayesian approach, and the fact that C is an inverse-Wishart means you put an inverse-Wishart prior on the matrix.
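A sketch of the linear-shrinkage estimator just described, in matrix form. The identification alpha = p/(p + q) is the tentative one given above; in practice alpha is whatever signal-to-noise estimate you have:

```python
import numpy as np

def linear_shrinkage(E, alpha):
    # Xi = alpha*E + (1 - alpha)*I, i.e. psi(lam) = alpha*lam + (1 - alpha).
    # Because psi is linear in lam, it acts directly on the matrix E.
    return alpha * E + (1.0 - alpha) * np.eye(E.shape[0])

# e.g., with the tentative alpha = p / (p + q) from the lecture:
# Xi = linear_shrinkage(E, p / (p + q))
```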
And you get this formula. What I want to say is that in this context you do recover the Bayesian approach from these formulas. You do see that it is a shrinkage, and a shrinkage always comes with a signal-to-noise parameter: here q is the noise in the sample covariance matrix and p is the amount of signal you have in the original matrix, so alpha is a kind of signal-to-noise ratio. Now let me just say all the things I have not told you. These are all continuous functions on branch cuts, and with real data you are at finite N, so you have poles instead, and I have not told you how to deal with that. But there is a simple trick; it is not the best thing you could do, but it is the first thing to try: just keep an imaginary part that is neither too big nor too small. You compute these objects not on the real axis, because on the real axis you have poles and you really want to probe the branch cut, but slightly off it: take eta of order 1/sqrt(N), an imaginary part of order N^{-1/2}, and use the equations as they are, without taking the limit all the way to zero. That is already a pretty good answer; a concrete version of this recipe is sketched after the questions below. You can do better, but you need to work more and play with the data. And I am done; sorry for being a bit late. Thank you. Time for one question.

[Question: isn't there a mix-up in the notation? I was confused for a minute or two: you end up with H(z) at the top as a sum over k of lambda_k psi_k / (z - lambda_k), and below, H(z) is tau of T_E(z) times C.] Yes. [And the lambda_k are the eigenvalues of E, so this rho is the density of E?] Yes, this is the rho of E. [Okay, then it is fine.] Right, and notice that nowhere in this derivation did I ever give you a symbol for the eigenvalues of C: C only appears inside a trace. I never use its eigendecomposition, so nowhere in the computation did I need to tell you what the eigenvectors or eigenvalues of C are. There is another way to do it, which Jean-Philippe Bouchaud prefers: define the overlaps between the two eigenbases. It is even more intuitive in a sense, but I find it a bit cumbersome; he prefers that formulation, and this is the one I prefer. So you can say that C has eigenvalues mu and eigenvectors u, and you ask yourself: given a u_l and a v_k, what is the overlap? You define an overlap function, and this overlap function depends on lambda and mu. The problem with that is the continuous limit: the continuous limit of that object is a bit tricky, and you need to be a bit fuzzy about what you mean. I guess you can make it rigorous, but I find the present route closer to being rigorous; it is very far from rigorous, but slightly closer than introducing this function Phi of two parameters, lambda and mu, which tells you the typical overlap between an eigenvector of E associated with lambda and an eigenvector of C with eigenvalue mu. You can do the whole derivation by introducing that function, and it is actually useful, because it can be used in other computations as well.
[Question: I do not know what you do in your paper, actually, but in the book you take that other route instead?] I think we do both. In the book we first introduce this overlap function Phi(mu, lambda), but then I think we go straight to this formulation. All right; if there are no further questions, thank you very much. Thanks.
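Putting the practical recipe from above together for the white-Wishart case; a minimal sketch, not the refined treatment alluded to in the lecture. With S_W(t) = 1/(1 + q t), a short computation with the complex conjugate shows the ratio formula collapses to psi(lambda) = lambda / |1 + q t_E(lambda - i eta)|^2, evaluated with eta of order N^{-1/2}:

```python
import numpy as np

def rie_clean(E, q, eta=None):
    # RIE cleaning of a sample covariance matrix E, white-Wishart noise.
    # Bulk formula: psi(lam) = lam / |1 + q * t_E(lam - i*eta)|^2, with
    # eta ~ N^{-1/2} standing in for the branch-cut limit at finite N.
    N = E.shape[0]
    lam, V = np.linalg.eigh(E)
    if eta is None:
        eta = 1.0 / np.sqrt(N)
    z = lam - 1j * eta                 # probe just below the real axis
    # empirical T-transform: t_E(z) = (1/N) sum_j lam_j / (z - lam_j)
    t = np.mean(lam[None, :] / (z[:, None] - lam[None, :]), axis=1)
    psi = lam / np.abs(1.0 + q * t) ** 2
    return (V * psi) @ V.T

# usage, with E and q = N/T as in the oracle sketch earlier:
# Xi_rie = rie_clean(E, q)
```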