Okay, welcome everybody to the second day of this school. Yesterday was actually a great day, because I realized that Pierpaolo, Kimi and I have overlapping subjects, so you'll see similar things from different angles. Today I'll do computations similar to the ones Pierpaolo did, and I'll use notions of freeness that Kimi will talk about in much more depth than I will. Today is still relatively basic, still fairly standard random matrix theory. I'm going to talk about a problem of inference: you have a rank-one perturbation, a projector onto a vector, hidden inside a big full-rank matrix that you can think of as noise, and the question is how strong the perturbation has to be for it to be observable, all of this in the large-N limit. This is usually called the BBP transition: as you increase the size of the perturbation there is a regime where you don't see it, then a regime where you do, and some funny things happen right at the transition, though I won't say much about exactly what happens there. That was the subject of the famous paper by Baik, Ben Arous and Péché; Péché is one of the organizers, who is online. But the problem was first studied by Edwards and Jones in 1975, so it's not a recent problem. The tricks we're going to use are the two matrix identities you really need to know: the Sherman-Morrison formula for rank-one updates, and the Schur complement, the inverse by partitioning, which Pierpaolo used and which I'll use again in the second half today. They are very similar formulas; it's just algebra, and the best way to prove them is to multiply out and check that what you wrote down is indeed the inverse. Sherman-Morrison says: for an invertible matrix A plus a rank-one perturbation, in general u v^T (we will only need the symmetric case u u^T),

(A + u v^T)^{-1} = A^{-1} - A^{-1} u v^T A^{-1} / (1 + v^T A^{-1} u).

This is basically the computation we need. So what's the problem? I have a rank-one direction, a vector that I'll call u, that I don't know and that I'm trying to infer. It enters as a perturbation a u u^T; I set the norm of u to one, so the amplitude of the perturbation is governed by a, and it is hidden in a full-rank matrix M. So I observe M + a u u^T, and the question is: is there a way to recover the vector u? And as I promised, I'll talk not just about the eigenvalue but also about the eigenvector. I assume M has bounded moments, so a bounded spectrum of eigenvalues, with a maximum eigenvalue λ_+; everything is done in the large-N limit, so λ_+ is really the edge of the spectrum. I also assume M has a continuous spectrum with no Dirac mass, and in particular its largest eigenvalue is not a Dirac mass.
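As a quick aside, here is a minimal numerical check of the Sherman-Morrison identity just stated. It's a sketch in numpy; the matrix size, the shift that makes A well conditioned, and the random seed are just illustrative choices.

```python
import numpy as np

# Minimal numerical check of the Sherman-Morrison identity
# (A + u v^T)^{-1} = A^{-1} - A^{-1} u v^T A^{-1} / (1 + v^T A^{-1} u).
rng = np.random.default_rng(0)
n = 6
A = rng.normal(size=(n, n)) + n * np.eye(n)   # well-conditioned, invertible
u = rng.normal(size=(n, 1))
v = rng.normal(size=(n, 1))

Ainv = np.linalg.inv(A)
lhs = np.linalg.inv(A + u @ v.T)
rhs = Ainv - (Ainv @ u @ v.T @ Ainv) / (1.0 + float(v.T @ Ainv @ u))
print(np.allclose(lhs, rhs))                   # True
```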
So the largest eigenvalue of M is genuinely the edge of the spectrum. You could imagine a matrix with a Dirac mass at the top, and then this whole technique wouldn't work. The technique to find eigenvalues and eigenvectors, especially for matrices with bounded spectrum, is to use the resolvent matrix and its normalized trace, the Stieltjes transform. The resolvent G(z) is a function of a complex variable z; it's a random matrix function,

G(z) = (z I - M - a u u^T)^{-1}.

As a function of z, this object has a pole at each eigenvalue of the perturbed matrix, and the residue at that pole is the projector onto the corresponding eigenvector, so we will be able to capture the eigenvector as well. Now, if you think of z I - M as the matrix A in Sherman-Morrison, you can apply the formula essentially as is. I'll write G_0(z) = (z I - M)^{-1}; this is the unperturbed resolvent, the resolvent we would have without the rank-one perturbation. Applying Sherman-Morrison, and being careful with the sign since the perturbation enters the inverse with a minus, gives

G(z) = G_0(z) + a G_0(z) u u^T G_0(z) / (1 - a u^T G_0(z) u).

A word on notation: u is a vector, so u u^T is a projector, an N-by-N object, whereas u^T G_0(z) u is a scalar. So this is a matrix equation: the numerator of the correction is a product of three matrices, G_0 times the projector times G_0, while the denominator is a scalar, one minus a times the scalar u^T G_0 u. The first question is: when does this thing have a pole? G_0 has poles everywhere between λ_- and λ_+; my original matrix M has lots of eigenvalues, so the function G_0 has zeros and poles all over that interval, but beyond λ_+ it is perfectly regular. So if I am looking for a pole at some λ_1 greater than λ_+, it cannot come from G_0: all of its poles are smaller than λ_+. The numerator may have lots of poles below λ_+, but it never has a pole for z real and greater than λ_+. The only place a pole can appear is where the denominator vanishes. So look at the denominator at z = λ_1: if I can find a solution of the scalar equation

1 - a u^T G_0(λ_1) u = 0,

then I have a pole at λ_1.
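This pole condition is in fact an exact finite-N identity, not only an asymptotic statement, so it can be checked directly. Here is a small numpy sketch; the Wigner-type noise, the size, and the value of a (taken large enough that the outlier separates cleanly) are illustrative assumptions.

```python
import numpy as np

# Check that the top eigenvalue lambda_1 of M + a u u^T solves the scalar
# equation 1 - a u^T (lambda_1 I - M)^{-1} u = 0, as given by Sherman-Morrison.
rng = np.random.default_rng(1)
n = 500
X = rng.normal(size=(n, n))
M = (X + X.T) / np.sqrt(2 * n)            # Wigner-like noise, spectrum ~ [-2, 2]
u = rng.normal(size=n); u /= np.linalg.norm(u)
a = 2.0                                    # strong enough to create an outlier

lam1 = np.linalg.eigvalsh(M + a * np.outer(u, u))[-1]
G0_u = np.linalg.solve(lam1 * np.eye(n) - M, u)    # G_0(lambda_1) u
print(1.0 - a * u @ G0_u)                  # close to 0
```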
And I need a solution bigger than λ_+, because a solution below λ_+ is not necessarily a pole: there are zeros and poles everywhere in the bulk and all sorts of cancellations can happen. But beyond λ_+ everything else is regular, so a zero of the denominator there is definitely a pole. Now, the other thing I'm going to use in the large-N limit is that the vector u is rotation invariant, or at least that the matrix M is; in any case there is no correlation between u and the eigenvectors of M. In the large-N limit the scalar u^T G_0(z) u then converges to the normalized trace: if u is a unit vector, you can think of it as one diagonal element of G_0 in a basis where u is a basis vector, the (1,1) element in that basis, and by permutation symmetry this converges to the normalized trace, the sum of all such diagonal elements divided by N. So I define the Stieltjes transform g_0(z) = (1/N) Tr G_0(z). Again, G_0 is a big random matrix, but g_0 is a scalar function; I haven't stated all the hypotheses, and in the physicist's way I'm first computing answers, and once we have an answer we can ask which hypotheses are really necessary. Under sufficient conditions, outside the spectrum this converges to a nice deterministic function. So in the large-N limit the pole equation simply becomes

1 = a g_0(λ_1).

Let's look at the function g_0. Picture the blob of eigenvalues of M, the density ρ(λ), and plot g_0 on the same axis. Outside the support you can write it in integral form,

g_0(z) = ∫_{λ_-}^{λ_+} ρ(λ) / (z - λ) dλ,

and I assume the density has no Dirac mass, it is just a continuous density. Beyond λ_+ this is a decreasing function of its argument. It has a branch cut on the support, but we are not interested in the behavior there; we are interested in what happens beyond λ_+. There it is a nice function that decreases monotonically from a value g* = g_0(λ_+), really the limit as you approach λ_+ from the right; at that point the function is singular, it has a branch cut, at finite N it has tons of poles, and at large N it becomes complex below the edge. Now rewrite the equation as g_0(λ_1) = 1/a and look for a solution. As long as 1/a is below g*, there is always a solution; if 1/a is above g*, there is none. So the condition for a solution is 1/a < g*, which I write as a > 1/g*, and remember that g* is just g_0(λ_+).
So the threshold is set by the Stieltjes transform at the edge of the spectrum. We now have a condition: if a is too small, as I said at the beginning, your rank-one perturbation is lost in the sea of eigenvalues of M and you cannot recover it, whereas if a is very large, if a goes to infinity, it's obvious that you get a large outlier. Now, where exactly does the outlier sit? Let's get a bit of intuition. On this domain my function g_0 is monotonic, so I can invert it; as I did yesterday, in slightly lazy notation, I write z(g) for its functional inverse. With the functional inverse the solution is immediate:

λ_1 = z(1/a).

Of course this depends on the matrix M (I'll use a subscript 0, or M, for quantities of the unperturbed matrix). Remember that yesterday I rushed through the definition of the R-transform, which is an extremely useful tool: the R-transform of a matrix, which I usually write as a function of g, is the inverse function z(g) minus its pole at zero,

R(g) = z(g) - 1/g.

We can get more intuition by writing the solution with the R-transform: z(1/a) is R evaluated at 1/a, plus one over 1/a, so the outlier sits at

λ_1 = a + R_M(1/a).

We know a lot about the R-transform. It is a monotonic function, and let's assume the matrix M is traceless, Tr M = 0 (if M is not traceless I'm just adding an average value that shifts every eigenvalue by a constant, which is trivial); then R tends to zero at the origin. More precisely, as a power series near the origin,

R(g) = τ(M) + σ² g + O(g²),

where τ(M) is the normalized trace and σ² is the variance, the centered second moment. So as a gets larger and larger, 1/a goes to zero, R_M(1/a) goes to zero, and for large a it is quite obvious that λ_1 ≈ a. But for intermediate a there is a correction: the eigenvalue is not at a, it is typically pushed further to the right. By adding this rank-one perturbation to M we get an outlier that sits further out than a. Let's look at a specific case, where M is a Wigner matrix, for which we know a lot. I can even work in natural units, since the scale of the problem is set by a: take M to be a unit Wigner matrix, with variance one. Then R(x) = x, and we get the famous result

λ_1 = a + 1/a.
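Here is a quick numerical illustration of this formula for a spiked Wigner matrix. It's a sketch: the normalization M = (X + X^T)/sqrt(2N) and the values of a are illustrative assumptions, and, as discussed next, the outlier only detaches for a > 1.

```python
import numpy as np

# Spiked Wigner matrix: the top eigenvalue of M + a u u^T should sit near
# a + 1/a when a > 1, and stick to the bulk edge (near 2) when a < 1.
rng = np.random.default_rng(2)
n = 2000
X = rng.normal(size=(n, n))
M = (X + X.T) / np.sqrt(2 * n)            # unit Wigner, spectrum roughly [-2, 2]
u = rng.normal(size=n); u /= np.linalg.norm(u)

for a in [0.5, 1.5, 3.0]:
    lam1 = np.linalg.eigvalsh(M + a * np.outer(u, u))[-1]
    pred = a + 1.0 / a if a > 1 else 2.0
    print(f"a = {a}: top eigenvalue = {lam1:.3f}, prediction = {pred:.3f}")
```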
Another thing we know about the Wigner matrix, and it is easy to show, is that g* equals one and λ_+ equals two. The unit Wigner spectrum goes from minus two to two, so the largest eigenvalue is two, and you can easily compute the Stieltjes transform at two: it equals one, so g* = 1. The condition is therefore a > 1, which is a bit counter-intuitive, because naively you would think the perturbation has to be bigger than the largest eigenvalue. But there is a regime in between; let me make a bigger plot. As a function of a, below one I have no outlier, and as soon as a exceeds one an outlier appears at a + 1/a, which is bigger than two for any a greater than one. So the funny thing is that you can have a rank-one perturbation of size, say, three halves, which you would think is buried in the sea of eigenvalues, and yet because of this mechanism an outlier pops out at a + 1/a, always greater than two. That's one of the little counter-intuitive features. Okay, I promised to tell you something about eigenvectors, and I've erased what I need, so let me write the resolvent again. The resolvent of the perturbed matrix, in terms of the unperturbed one, is G_0(z) plus a G_0(z) u u^T G_0(z) divided by the denominator, and I'll directly put the large-N limit of the denominator, which is 1 - a g_0(z). And I said there is a pole at λ_1 such that g_0(λ_1) = 1/a. The question now is: what is the projector at that pole? This eigenvalue comes with an eigenvector v_1 of the perturbed matrix, and I want to know whether there is a significant overlap between v_1 and the original vector u that I planted. So I want the overlap v_1^T u, and actually its square. The issue with eigenvectors is that their sign is arbitrary: minus an eigenvector is still an eigenvector, so when you compute a scalar product with an eigenvector you should always square it; plus or minus one both mean perfect alignment, while anything near zero means no alignment. So we always look at squared overlaps. We want to compute (v_1^T u)², and I claim it is the residue at the pole:

(v_1^T u)² = lim_{z → λ_1} (z - λ_1) u^T G(z) u.

I'm sandwiching the big matrix G with the direction u and computing the residue, because I know that at λ_1 there is a pole.
To get rid of the pole I multiply by (z - λ_1) and take the limit; this is the residue, and it works because G is a sum of projectors with poles at the eigenvalues. The denominator I have already taken to its large-N limit; now I need the large-N limit of the numerator. Note first that the G_0(z) term on its own has no pole at λ_1, because all of its poles are below λ_+ < λ_1, so it does not contribute to the residue. All I need is the numerator of the correction term. I'll go a bit quickly, but it is fairly easy to convince yourself that in the large-N limit u^T G_0(z) u times u^T G_0(z) u is just g_0²(z), a scalar now. So I need

lim_{z → λ_1} (z - λ_1) a g_0²(z) / (1 - a g_0(z)).

This limit is of the form zero over zero, which is typical when you compute a residue: the numerator vanishes at λ_1 because of the factor (z - λ_1), and the denominator vanishes there by definition of λ_1. For a zero-over-zero limit you use L'Hôpital's rule: the limit equals the limit of the ratio of derivatives. The derivative of the numerator is a times (g_0² plus 2 g_0 g_0' times (z - λ_1)), and the second piece vanishes in the limit, so it contributes a g_0²(λ_1); the derivative of the denominator is -a g_0'(λ_1). Putting this together, the residue is

(v_1^T u)² = - g_0²(λ_1) / g_0'(λ_1).

So I can compute the overlap. In this form it is not super intuitive, but if you have a good intuition for the R-transform, and when you do a lot of random matrix theory the R-transform speaks to you, you can rewrite it. I'll use the R-transform to compute g'. We know the relation z = R(g(z)) + 1/g(z); that is just another way of writing the definition of the R-transform, the inverse function with the 1/g moved to the other side. It is an implicit equation, and I can extract g' from it by taking the derivative with respect to z:

1 = R'(g) g' - g' / g²,

so

g'(λ_1) = 1 / ( R'(g(λ_1)) - g(λ_1)^{-2} ).

This does not seem to help, but in the end a small miracle happens. We know that g(λ_1) = 1/a, so g'(λ_1) = 1/(R'(1/a) - a²), and computing -g²/g' we get a relatively simple answer, which I'll just write down from the notes:

(v_1^T u)² = 1 - R'(1/a) / a².
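These two general formulas, λ_1 = a + R(1/a) and (v_1^T u)² = 1 - R'(1/a)/a², can be checked on a non-Wigner example. Below is a numpy sketch for a white Wishart noise matrix with aspect ratio q = N/T, whose R-transform is R(g) = 1/(1 - q g), so the predictions become a + 1/(1 - q/a) and 1 - q/(a - q)²; the sizes and the value of a are illustrative assumptions (a is above the corresponding threshold q + sqrt(q)).

```python
import numpy as np

# Additive spike on a white Wishart: M = X X^T / T with X an N x T Gaussian
# matrix, perturbed by a u u^T. For this M, R(g) = 1/(1 - q g) with q = N/T,
# so the theory gives lambda_1 = a + 1/(1 - q/a) and
# (v1 . u)^2 = 1 - R'(1/a)/a^2 = 1 - q/(a - q)^2.
rng = np.random.default_rng(10)
N, T, a = 1000, 4000, 1.2
q = N / T

X = rng.normal(size=(N, T))
M = X @ X.T / T
u = rng.normal(size=N); u /= np.linalg.norm(u)

vals, vecs = np.linalg.eigh(M + a * np.outer(u, u))
print("lambda_1 :", vals[-1], " theory:", a + 1.0 / (1.0 - q / a))
print("overlap^2:", (vecs[:, -1] @ u) ** 2, " theory:", 1.0 - q / (a - q) ** 2)
```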
Getting there is just elementary algebra: plug the expression for g' into -g²/g' and use the fact that g(λ_1) = 1/a. The result is nice because R is monotonically increasing, so R' is positive, and the overlap is at most one, which is reassuring: an overlap between two normalized vectors should be bounded by one. You also see that when a is very large, when your outlier is very far out, the estimation is very good and this quantity goes to one. The interesting things happen when a is near the transition. To build intuition, look at Wigner again: for the unit Wigner, R(x) = x, so R'(x) = 1, and the overlap is simply

(v_1^T u)² = 1 - 1/a²,

with the condition a > 1 for the unit Wigner, otherwise there is no outlier; everything is conditioned on a being greater than one. And the nice thing is that exactly at a = 1, when you barely have enough perturbation to produce an outlier, the overlap tends to zero. So for Wigner the overlap starts at zero: for a below one there is no outlier, hence no notion of overlap; as soon as a reaches one an outlier appears, and it does tell you something about the planted vector u, but it starts with zero overlap, and as you increase a the overlap increases, bounded by one. So as a function of a, the curve starts at zero at a = 1 and rises towards one. Is this generic? I'm going to end the first half with this question.

Question: Marc, can you re-explain the starting point, the first identity, how you relate the squared overlap to this limit?

Ah, this one, okay. The way to see it is to recall the spectral decomposition of the resolvent. For any symmetric matrix, the resolvent G(z) = (z I - matrix)^{-1}, written in its eigenbasis, is the sum over k of the projectors v_k v_k^T divided by (z - λ_k). Now sandwich it with any vector u: u^T G(z) u is a scalar, and in this expansion I get the sum over k of (u^T v_k)(v_k^T u) / (z - λ_k), which is the sum over k of |v_k^T u|² / (z - λ_k). If z is close to one particular eigenvalue, say λ_1, the largest one, but it would work for any other, then that pole dominates and all the other terms become negligible. Of course the expression diverges as z goes to λ_1, but I can compute the residue: multiplying by (z - λ_1) eliminates the pole, the other terms go to zero, and what is left is exactly the residue, |v_1^T u|². That is how you get the identity.

Question: So you could do the same with the second eigenvector?

Exactly, yes.

Question: And you would see at the end that the overlap is always zero?

In this particular problem, yes.
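As with the outlier location, the Wigner overlap formula is easy to test numerically; here is a small sketch, with the same illustrative normalization as before.

```python
import numpy as np

# Overlap between the top eigenvector of M + a u u^T and the planted vector u
# for a unit Wigner M: theory predicts (v1 . u)^2 -> 1 - 1/a^2 for a > 1.
rng = np.random.default_rng(3)
n = 2000
X = rng.normal(size=(n, n))
M = (X + X.T) / np.sqrt(2 * n)
u = rng.normal(size=n); u /= np.linalg.norm(u)

for a in [1.1, 1.5, 2.0, 4.0]:
    vals, vecs = np.linalg.eigh(M + a * np.outer(u, u))
    v1 = vecs[:, -1]                      # eigenvector of the largest eigenvalue
    print(f"a = {a}: overlap^2 = {(v1 @ u) ** 2:.3f}, theory = {1 - 1/a**2:.3f}")
```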
To be more precise about the second eigenvector: it will have essentially zero overlap. If I take two random vectors in dimension N, this squared overlap is typically of order 1/N; two random vectors in high dimension have an overlap of order 1/N, so it is tiny, and you need to cook up something special to get an overlap of order one. So, going back: for a Wigner matrix the overlap starts at zero, meaning that right when the outlier first appears it carries zero overlap, and then the overlap grows. I wanted to see whether this is a generic feature or something special. For this I need to understand the ratio g²/g' near the transition. For a generic matrix M, the condition is that a must be greater than some a*, defined essentially by this equation: a* = 1/g_0(λ_+); that is the definition of a* if you like, and in that case there is an outlier λ_1 greater than λ_+. My question is: when a is close to a*, in other words when λ_1 is close to λ_+, does the overlap start at zero or at a finite value? So the question is about this ratio at the edge. We know that g_0 at λ_+ is some finite value g*; it has to be finite because g_0 is a decreasing function, so this is actually its maximum value beyond the edge, a number of order one. What about g'? Is it finite, zero, infinite at that point? To answer this, recall that in the continuous, large-N limit I can write

g_0(z) = ∫ ρ(λ) / (z - λ) dλ,

and I am mostly worried about the upper limit of integration. Taking the derivative with respect to z,

g_0'(z) = - ∫ ρ(λ) / (z - λ)² dλ.

Now evaluate at z = λ_+ + ε with ε small and positive. The question is whether these integrals converge at the upper limit, and that depends on the behavior of the density near the edge. For the Wigner matrix, and in fact for a large class of matrices, for most typical eigenvalue distributions you will encounter (admittedly "typical" is a vague notion), the density has a square-root singularity at the edge:

ρ(λ) ≈ C (λ_+ - λ)^{1/2}

with some prefactor. This is certainly the case for the Wigner ensemble and for the Wishart ensemble: the density of eigenvalues falls off as a square root at the upper edge. If you draw your matrices from an ensemble with a potential, eigenvalue repulsion makes them settle into an equilibrium density that typically has this square-root behavior at the edge; in the simplest cases it comes from the solution of a quadratic equation, and this square root is also what gives the Stieltjes transform its imaginary part on the support.
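As a quick numerical illustration of this convergence question, here is a small sketch, using scipy's quad on a toy density ρ(λ) proportional to (λ_+ - λ)^θ supported on [0, 1]; the two exponents shown are illustrative. It shows how g stays finite at the edge while g' may or may not, depending on the edge exponent, which is exactly the dichotomy discussed next.

```python
import numpy as np
from scipy.integrate import quad

# Toy density rho(lam) ∝ (1 - lam)^theta on [0, 1] (upper edge at 1).
# Evaluate g(z) and g'(z) at z = 1 + eps for shrinking eps: g converges to a
# finite limit for theta > 0, while g' stays finite only for theta > 1.
def g_and_gprime(theta, eps):
    norm = quad(lambda l: (1 - l) ** theta, 0, 1)[0]
    z = 1.0 + eps
    g = quad(lambda l: (1 - l) ** theta / norm / (z - l), 0, 1, limit=200)[0]
    gp = -quad(lambda l: (1 - l) ** theta / norm / (z - l) ** 2, 0, 1, limit=200)[0]
    return g, gp

for theta in [0.5, 1.5]:
    for eps in [1e-1, 1e-2, 1e-3]:
        g, gp = g_and_gprime(theta, eps)
        print(f"theta = {theta}, eps = {eps:.0e}: g = {g:.3f}, g' = {gp:.1f}")
```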
So there are good reasons why the exponent one half is so common, but it is perfectly possible to cook up random matrix models with, say, exponent three halves; they do happen, and remember I never said that M had to be drawn from any particular ensemble. M could be something you cooked up yourself, not drawn from a classical ensemble, for instance a randomly rotated diagonal matrix whose density of eigenvalues you pick; as long as the density is bounded, the whole computation goes through. So let's say the density behaves near the edge as

ρ(λ) ≈ C (λ_+ - λ)^θ

for some exponent θ (it has to vanish somewhere, λ_+ is the maximum), and look at the convergence of those two integrals. For g_0 itself to converge I just need θ > 0, so the Stieltjes transform is finite at the edge as soon as the density falls off with a positive exponent; but for the derivative to be finite I need θ > 1. So what happens? I erased my formula, but what we had is that the overlap (v_1^T u)² equals -g²/g' at λ_1. In the Wigner case, as a gets close to one, that is, as λ_1 gets close to the edge, the integral for g' diverges, g' becomes infinite, and the overlap goes to zero. So the generic behavior we found for Wigner, the overlap starting at zero, comes from the fact that the derivative of the Stieltjes transform is infinite at the edge. And this is true for typical random matrices: θ = 1/2, so the Stieltjes transform is finite at the edge but its derivative is not, the ratio g²/g' tends to zero, and you get this generic behavior that the overlap starts at zero. But for a sort of critical density, where the edge is not as sharp, say a three-halves exponent, softer, with fewer eigenvalues near the edge, you get a different phenomenon. For θ greater than one there is some a*: below it, no outlier and no overlap; then, plotting (v_1^T u)² as a function of a, boom, you have a first-order transition, the overlap jumps to a finite value and then tends to one. So that curve is for θ greater than one, whereas for the more typical values, θ between zero and one, which includes θ = 1/2, the overlap starts at zero at the transition. Most random matrices have θ = 1/2 and fall in that case, but as I said, you can cook up cases, and it happens in real problems, where you have this first-order transition: no outlier, and as soon as there is an outlier it has a non-zero overlap. We should take a break here. Unless there are questions, of course.

Question: Is there an intuition for a model where suddenly the alignment jumps from nothing to a finite value? The continuous case seems, I don't know why, more intuitive.

Yes. But I forgot, maybe Pierre is not here.
I've heard of people, this problem recently came up in a physics problem, actually a machine learning problem, where they had this transition, and the people involved thought it was not known, but it was actually known; it is in our book. Anyway, this sharp transition does happen: these kinds of problems appear in machine learning, or when you do statistical mechanics and you are looking for something; there are problems where this type of transition occurs. Actually, what I'm going to talk about in the next hour, looking for a projector, is one such problem. So these problems do happen in real life, and I've heard of physicists interested in a case where they had a diagonal matrix; a diagonal matrix does not come from a classical ensemble, so the distribution of eigenvalues could be anything they wanted, it did not have to have θ = 1/2, and depending on the θ they chose, they saw this phenomenon.

Question (from the chat): Could you say something about the hard edge, when θ equals minus one half, I presume, or does this analysis break down much earlier?

The problem is I haven't thought about it; maybe I'll think about it during the break. I believe the analysis should still work, it's just that if θ = -1/2 then g diverges, so g* is infinite. Let me think about it and I'll answer the question when we come back. So let's meet in 20 minutes; the coffee break is upstairs, like yesterday. Sorry, because of me we started 10 minutes late, I didn't realize the time, so take those extra 10 minutes: you have until 11:35.

Okay, thank you. I want to finish with a few loose ends from what I left before the break. First of all, I forgot to mention that what I presented is really based on the paper by Florence Benaych-Georges and Raj Rao Nadakuditi, whom everybody calls Raj Rao. They have done what I did on the board and a lot more, with proofs, and I think they also treat the multiplicative case, where you have a multiplicative perturbation. Because what I've done, I mean, this outlier had been computed for Wigner matrices, as I said, by Edwards and Jones in 1975, then it was done for Wishart matrices, and then I think it was them who did it for general matrices with the R-transform, as I did; it was also done independently around the same time by Jifang Yao. I also want to say that there were not one but two sign errors in my Sherman-Morrison formula on the board; it is because one is always using it with a negative perturbation, so there is a minus sign here and a plus sign there. And the second of the two very similar formulas that we use a lot, that Pierpaolo used a lot yesterday, is what mathematicians call the Schur complement, the inverse by partitioning, and we are going to use it today. I take a matrix that I write in blocks, the blocks being themselves matrices; very often we take the block (1,1) to be a single number and the block (2,2) to be (N-1)-by-(N-1). If you write a matrix in blocks, there is a formula for the inverse; I learned it from Numerical Recipes in C, which dates how old I am.
Here is the formula for the (1,1) block of the inverse: if the blocks are M_{11}, M_{12}, M_{21}, M_{22}, then the (1,1) block of M^{-1} is (M_{11} - M_{12} M_{22}^{-1} M_{21})^{-1}. If the (1,1) block is itself a matrix, this is an inverse matrix; if it is one-by-one, it is just one over a number. I'm going to use this today. There was also an interesting question from the chat: what happens if the spectrum of the matrix M has a hard edge, that is, if the eigenvalue distribution has a singularity at the edge? This is typical, for instance, of the arcsine law; more generally there is the Jacobi ensemble of random matrices, and one specialization of it, a very symmetric, very simple one, is the arcsine law, where the density behaves near λ_+ as one over the square root of (λ_+ - λ). So this is θ = -1/2. The question was whether the formalism still applies, and the answer is that it does. But now, for instance, g(λ_+) is infinite, so the condition becomes: as soon as a is greater than zero, you get a positive outlier (and if the density is symmetric around zero and you take a slightly negative perturbation, you get a negative outlier). To compute the overlap you would need to take the limit carefully; I sort of did it in my head, and I could be wrong, but now g², which diverges at the edge, sits over g', which also diverges, so you have a ratio of two divergent quantities, and you should do it carefully if you really want the answer. My feeling is that g' will be "more infinite" than g², so that the overlap still goes to zero, but you should do the computation to make sure. In any case I'm pretty sure that everything we did today still works. Somebody else asked about a problem where the eigenvectors of M are localized: can you still apply this? The answer is yes, provided you perturb with a rotation-invariant direction. A very important step in the computation was to say that u^T G(z) u, a vector sandwiching this matrix, converges to g(z), and for that I used some rotational invariance: either the vector u is rotation invariant, or the matrix is. If the eigenvectors of the matrix are localized and you sandwich with a localized vector, that step is wrong, you are not allowed to do it. But if the matrix has localized eigenvectors and u is drawn uniformly on the sphere, with no knowledge of where the localization sits, it should still be correct. Of course, the interesting problems are precisely the ones where you want to sandwich a localized vector with a matrix that has localized eigenvectors, and basically all the tools I know how to use are for rotation-invariant situations. A lot of people are interested in sparse matrices or sparse eigenvectors, and the notion of sparsity depends on the basis you look at: being sparse in the canonical basis is not the same as being sparse in any other basis, so sparse vectors are not rotation invariant.
And so most of what is in my book, and most of my research, does not apply there. Those are the tools for the easy problems; the hard problems are the ones where things are sparse, or where you have some information about localization, and the matrix M or the vector u is not rotation invariant.

Question: If you introduce a function that tells you something about the overlap between this vector and each eigenvector of the matrix M, can you go beyond that?

Well, you could for instance follow the same methodology, except that you are no longer allowed to say that things converge to the Stieltjes transform. But other than that, and again this is really the notion of freeness, freeness is relative. By that I mean you only need one of the two to be rotation invariant: you can have a fixed vector, or a fixed matrix like a diagonal matrix, together with a rotation-invariant matrix, and they are free with respect to one another. Freeness tells you something about a pair of objects, whether they are free relative to each other, and in physicist's language, for large matrices, it means that one of the two has to be rotation invariant. And that is exactly what we are going to exploit here: in the problem I'll discuss, my matrix is going to be rotation invariant, so, as Pierpaolo did yesterday, I can choose my vector to be something very special to simplify the computation. Okay, so now I can talk about this. Today I'll talk more and compute less, though I never know, I'll probably get carried away. I want to talk about the problem of phase retrieval. I won't go very far into the problem; the idea is to show that the techniques we learned this morning can be useful for practical problems, so this is a kind of toy practical problem. The problem is as follows. It happens that when you measure Fourier components of some signal, your detector often only gives you the power, the Fourier component modulus squared; you have this information and you are trying to reconstruct the real signal out there, but to invert the Fourier transform the power is not enough, you need the phase. There is a lot of literature, decades of it, on how to deal with this problem, where you have the squared moduli and you are trying to recover, say, an image, for which you would need the full Fourier components. So let me make a toy version of that. Say I have a vector x that I don't know; I also fix the normalization, so all I know is that x is a unit-norm vector of dimension N in some unknown direction. I probe it with vectors that in principle you could choose; here I won't choose them, I'll take them random. To have an overlap of order one, and independent components, I take a set of probe vectors a_k, k = 1, ..., T, whose components a_{k,i} are i.i.d. normal N(0,1).
So each probe a_k is also a vector of dimension N, but its squared norm is of order N (its norm is of order the square root of N), and the reason I take x of norm one and the probes of squared norm N is simply that their overlap is then of order one. This is just a choice of scaling, you can define your variables however you want, but I want the quantities of interest to be of order one, and one component of a_k is of order one. So I have a vector x somewhere in space that I cannot access, and I have a machine that probes it; what the machine returns is the dot product of a_k and x, except that I lose the phase, so what I measure is

y_k = (a_k^T x)²,

and it is very easy in this problem to add some noise if you want. I measure one y_k per probe, and I send a bunch of probes, k going from 1 to T, so I can do this many times. I'm interested in the limit where the dimension N goes to infinity, but where I can also probe a very large number of times, T also large. So I probe x with a very large number of a_k's, and each time I get the dot product, except that I don't get the dot product, I get its square, which means I lose the sign: even when the overlap is big, I don't know whether it was in the positive or the negative direction. I'm doing everything real here, but I think everything I say carries over to complex numbers: wherever I say orthogonal matrix you say unitary matrix, wherever I say symmetric matrix you say Hermitian matrix, and then the modulus squared also loses the complex phase of the dot product; it's just easier to work in the real case. So, can we solve this problem? You can set it up as an inference problem and write an estimator as the argmin of some penalty, which I'll take quadratic:

x̂ = argmin_x Σ_k ( (a_k^T x)² - y_k )²,

and you could feed this to some big optimizer and see what you get. The problem is that this is typically a hard problem, because it is a problem of finding a vector in high dimension, and our geometric intuition there is terrible. For us humans, high dimension is dimension three, and a sphere in three dimensions is just like the globe: if you want to go on a trip anywhere on Earth, you spin the globe, put your finger down, and you get somewhere. That is exactly the wrong intuition: in high dimension you are always at the equator, and it is very, very hard to leave the equator. Suppose I have a sphere (I'm copying a picture here; I saw a talk by Gérard Ben Arous last week who made this point very well) and I single out one component, the component along x. In three dimensions the equator is a small place, but in dimension N, with N extremely large, almost all vectors are at the equator.
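A tiny Monte-Carlo illustration of this concentration (purely illustrative sizes): the squared overlap between a fixed direction and a random unit vector averages exactly 1/N, so a random starting point is essentially uninformative.

```python
import numpy as np

# "Everything is at the equator": the squared overlap of a random unit vector
# with any fixed direction x is of order 1/N in dimension N.
rng = np.random.default_rng(6)
for n in [100, 1000, 10000]:
    x = rng.normal(size=n); x /= np.linalg.norm(x)
    overlaps = []
    for _ in range(200):
        x0 = rng.normal(size=n); x0 /= np.linalg.norm(x0)
        overlaps.append((x0 @ x) ** 2)
    print(n, np.mean(overlaps), 1.0 / n)
```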
So it is extremely unlikely, if you start from a random point, for instance if you run stochastic gradient descent from a random initialization, that you will do well: your starting point will typically be at the equator with respect to x, and you will have a very, very hard time leaving the equator. I'm not a specialist of this; it is what the experts tell me. The goal today is not to solve the problem exactly, but to find a starting point. Can I find a starting point x_0 with a macroscopic overlap? What I'm saying is that if I take any random starting point on the N-dimensional sphere, then (x_0^T x)² is typically of order 1/N.

Question: Just to make sure, when you say "equator" in large dimension, you mean delocalized vectors, vectors whose components are all of the same order?

Yes, exactly. The equator here is with respect to the direction x: my starting vector x_0 has norm one, and its squared component along x is of order 1/N. Say x is direction one; then I have N-1 other directions, a lot of entropy. The equator is big because I can move in N-1 directions, and the direction x is only one out of a very large number of directions, so it is extremely rare to move along x, and very hard to leave the equator. A random vector on the sphere has the sum of its squared components equal to one, so its squared component along the direction of interest is only about 1/N, which is tiny: it sits at the equator. That is a typical random starting point. The question is: can we find an x_0 whose squared overlap with x is of order one? We will use random matrices to find such a starting point, and then you can use the optimization above to clean it up and find the best solution of the problem. Now, why is this a matrix problem and not a vector problem? Recall the data of the problem: x (unknown), the a_k's and the y_k's. The y_k's tell me whether a_k is aligned with x or not: when y_k is big, a_k has a large component along x; when y_k is small, it does not. Things are normalized so that the typical value of y_k is one; if y_k were of order N, I would have essentially found x and I would be done. So you might say: this is just a linear problem, let me build an estimator which is a linear combination of the a_k's, weighted by something like the y_k's: when y_k is big I take more of a_k, when it is small I discard it or put a small weight, and in this way I should build a vector that gets close to x. The problem is that I don't know the phase, the sign: if y_k is big, maybe x is in the direction of a_k, maybe in the direction of minus a_k. So any linear object I build will average to zero, because of this a_k to minus a_k symmetry, which kills anything linear. And here is a little story about myself.
I did my PhD thesis on the fly, on the visual system of the fly. The fly is trying to measure velocity, with as simple a computation as possible, but velocity, as in this problem, has a symmetry: if you flip your whole visual field, everything black becomes white and everything white becomes black, the velocity you should measure is the same. The symmetry a_k to minus a_k likewise kills the linear term. So what the fly does is something quadratic: if you build a quadratic function, something that depends on squares, you are not killed by the symmetry, and indeed the fly computes correlations to measure motion. So we need to build a quadratic object out of our vectors, without losing information. You could just take the square of each component, but then you lose everything; instead you build a projector. The object a_k a_k^T is a projector, an N-by-N matrix, and it does not suffer from the a to minus a symmetry, it is even under it, which means it survives averaging over the randomness. The original idea is then to take a weighted sum of these projectors. I build a matrix M (I have T of them, so everything is normalized by 1/T):

M = (1/T) Σ_{k=1}^{T} f(y_k) a_k a_k^T.

The a_k's on their own are random vectors; they tell me nothing about x yet. The information about x is encoded in the y_k's, so I want to overweight the big y's and underweight the small ones, but I don't want to commit yet to a specific choice, so I just put a function f(y_k) in the sum. Think of it as an increasing function: when y_k is big I put a lot of weight, and when y_k is small I put zero or negative weight.

Question: So the y_k's and the a_k's are known, and x is what you are trying to infer?

Yes, these are known, and x is what you want to infer. In real applications you can often choose the probes, and then it becomes a more interesting problem, a compressive-sensing type of problem: as you send an a and get back a y_k, you learn that the information is in some direction and you adapt the next probe. That is a more complicated, adaptive problem; here I assume the a_k's are essentially random, that you did not really choose them, they were given to you as these big vectors. And again, the components a_{k,i} are i.i.d. N(0,1); they are not normalized vectors, and the independence of the components will be useful. Okay, so I have this matrix, and since today I'm talking about rank-one perturbations and outliers, the question is: can I look at the eigenvalues and eigenvectors of this matrix, and does it have a large eigenvalue whose eigenvector points somewhere close to x? That's the idea. And if you remember, yesterday I talked about Wishart matrices; this time I won't need Sherman-Morrison, I'll need the Schur complement. Yesterday I computed the Marchenko-Pastur equation, and the terminology is: a white Wishart is a Wishart with no correlations, and a colored Wishart is a Wishart matrix with some true correlations.
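Before doing the analysis, here is the whole idea in a few lines of numpy, as a sketch: build M with a thresholding weight (the choice f(y) = 1{y > 1} used later in the lecture), and check that its top eigenvector has a macroscopic squared overlap with the hidden x, unlike a random guess. The sizes, the amount of data, and the absence of noise are illustrative assumptions.

```python
import numpy as np

# Spectral starting point for phase retrieval: weight each projector a_k a_k^T
# by f(y_k) and take the top eigenvector of M = (1/T) sum_k f(y_k) a_k a_k^T.
rng = np.random.default_rng(7)
n, T = 500, 10000

x = rng.normal(size=n); x /= np.linalg.norm(x)      # hidden unit vector
A = rng.normal(size=(T, n))                          # rows are the probes a_k
y = (A @ x) ** 2                                     # phaseless measurements

f = (y > 1.0).astype(float)                          # keep "informative" probes
M = (A.T * f) @ A / T                                # (1/T) sum_k f(y_k) a_k a_k^T
vals, vecs = np.linalg.eigh(M)
x0 = vecs[:, -1]                                     # candidate starting point

print("bulk edge ~", vals[-2], " outlier ~", vals[-1])
print("squared overlap (x0 . x)^2 =", (x0 @ x) ** 2, " vs 1/N =", 1.0 / n)
```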
So yesterday I computed the Marchenko-Pastur equation, which gives you this somewhat ugly integral equation for the spectrum of a colored Wishart, and that looked a lot like this matrix, with one very important difference. This M is like a data matrix where each column is weighted by some number. In the colored Wishart case those weights are independent of the entries, so it becomes a free product, and I could either use the Marchenko-Pastur equation or the much more powerful technology of free probability and free products, and compute the resolvent and the subordination relation for these objects. The problem here is that it is not free, because the weights f(y_k) actually tell us something about the a_k's: there are correlations between the y's and the a's. So this is not a Wishart matrix, and that is good in the end, because a Wishart matrix would have a bounded spectrum and no outlier, and what we hope to find is an outlier pointing somewhere near the direction of x. By the way, since I've taken my a_k's to be random vectors with i.i.d. N(0,1) components, they are rotation invariant, and this matrix M is rotation invariant, which means I can make a hypothesis about x: I can break rotation invariance by hand on x and the computation still works. Another way of saying this: I can write the problem in the basis where x is the first basis vector, and the other N-1 directions are anything orthogonal to x. So I postulate x = e_1, the first canonical direction, and I don't care about directions 2 to N, they are just orthogonal to x. With this choice I write M in block form and use the Schur complement. The notation needs a little care: M_{11} is indeed the (1,1) element of the matrix, but M_{12} and M_{21} are vectors (they correspond to the V's in Pierpaolo's talk), and M_{22} is not the (2,2) element, it is a matrix of size (N-1)-by-(N-1) (it corresponds to Pierpaolo's M-tilde). And what turns out is that, because I've chosen the components of the a_k's to be independent, M_{22} is actually a colored Wishart. Why? The definition of y_k is now y_k = (a_k in the direction 1)², possibly plus some noise; the noise is not very important here, I will never use it explicitly (with noise some expectation values below are just a bit harder to compute), so forget about the noise. What matters is that once I know, by fiat, that x is along e_1, then y_k depends only on the first component of a_k, and all the other components are independent of y_k. So M_{22} is the same kind of matrix but built from the truncated vectors, the a_k's with their first element removed: those columns are slightly shorter, they don't contain the first element, and they are indeed independent of the weights.
So for M_{22}, the weight f(y_k) tells you nothing about the remaining components, you can use freeness, and M_{22} is a colored Wishart. Okay, so that is my matrix, and again we want to know whether there is an outlier. We compute the resolvent G(z) = (z I - M)^{-1}, but because direction 1 is special and the other N-1 directions are generic, we use the Schur complement formula, and the only thing we really need is the element G_{11}(z); think of it as x^T G(z) x, that is essentially the definition of G_{11}(z). Two things will happen. G has poles, N of them in fact, and a lot of them come from the Wishart block: M_{22} has a maximum eigenvalue λ_+. The question is whether this mechanism produces an extra eigenvalue λ_1 bigger than λ_+. If so, the element G_{11} will have an explicit pole greater than λ_+, and we can look at the residue at that pole. So suppose I do find such a pole (this would actually be true for any eigenvalue, but the point is that most of the poles come from the Wishart block, which is built from the uncorrelated components, so all those eigenvectors have essentially zero overlap with x). If I want the associated eigenvector v_1, and I am interested in (v_1^T x)², then again this is the residue:

(v_1^T x)² = lim_{z → λ_1} (z - λ_1) G_{11}(z),

assuming the pole, and hence a non-zero residue, exists. So this is the computation we need to do: compute G_{11}, find out whether it has a pole identifiable as being greater than λ_+ (we also need to compute the spectrum of M_{22} and find its upper edge), and if we find a pole above that edge, the overlap with x of the eigenvector associated with the largest eigenvalue is given by this residue.

Question: Marc, in the previous problem it was kind of natural to look for the leading eigenvalue and associated eigenvector, because there was a signal-to-noise ratio a that could increase. Here there is no such SNR, right? So why is it so clear that we need to look at the first eigenvalue and eigenvector?

Well, remember this is a sum of projectors, and the information sits in the function f: f(y_k) tells me something about the alignment between a_k and x. The knob I can play with is T: if T is sufficiently large, I am bound to find something pointing along x, because I am putting large weights precisely on those projectors that have a non-negligible projection on x. So if x is here, I'll have one projector tilted like this, another like that, and so on, and when I sum many, many such projectors, somehow the common direction x builds up and I will find x.
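Numerically, this program of hunting for a pole of G_{11} above the edge of M_{22} can be carried out directly. Here is a sketch (with x taken to be e_1, the thresholding weight again, and illustrative sizes), using the fact that the Schur complement turns G_{11}(z) into the scalar 1/(z - M_{11} - M_{12}(z I - M_{22})^{-1} M_{21}); the zero of that denominator above the edge is located with a root finder and matches the top eigenvalue of M.

```python
import numpy as np
from scipy.optimize import brentq

# Locate the outlier as the pole of G_11(z): solve the scalar equation
# z - M_11 - M_12 (z I - M_22)^{-1} M_21 = 0 above the edge of M_22
# and compare with the top eigenvalue of the full matrix M.
rng = np.random.default_rng(9)
n, T = 300, 6000
x = np.zeros(n); x[0] = 1.0                       # planted direction is e_1
A = rng.normal(size=(T, n))
y = (A @ x) ** 2
f = (y > 1.0).astype(float)
M = (A.T * f) @ A / T

M11, M12, M22 = M[0, 0], M[0, 1:], M[1:, 1:]      # M is symmetric: M21 = M12
edge = np.linalg.eigvalsh(M22)[-1]                # upper edge of the Wishart block

def schur_denominator(z):
    return z - M11 - M12 @ np.linalg.solve(z * np.eye(n - 1) - M22, M12)

lam1 = brentq(schur_denominator, edge + 1e-6, edge + 10.0)
print("pole of G_11:", lam1, " top eigenvalue of M:", np.linalg.eigvalsh(M)[-1])
```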
So indeed, if you look at the solution of this problem in the end: there will be a q, as in the Wishart problem, which is N over T, and as q goes to zero, meaning that T goes to infinity faster than N, you recover x exactly, okay? But of course that requires a tremendous amount of data, and the question is: can you do this at finite q?

Okay, I have about ten minutes left, right? Anyway, today I just wanted to show the structure, I don't want to do too many computations, so I won't rewrite everything; it's actually here, except that I'm interested not in the inverse of M but in (z times the identity minus M), okay? Let me just write what G_11 is. It's a number, and from the Schur complement it is just one over (z - M_11 - M_12 G_22(z) M_21), where G_22(z) = (z - M_22) inverse. I'm going to compute this in the large N limit, so I need to know what M_11 is as N goes to infinity, and what the object M_12 G_22(z) M_21 is, again in the limit N goes to infinity. These are numbers, and again, I'm being very sloppy, but I'll say they converge to their expectation values in the large N limit.

So M_11 is just the component (1,1), and it's fairly easy to see that you simply get the expectation value of f(y) times a_1 squared. This essentially tells you how much the function of y knows about the overlap, because y depends on a_1 and I've chosen x to be along the first component. This expectation value depends, for instance, on whether you put noise in the problem, and it depends on your function f. Actually, if I took f(y) = y, which I'm not really allowed to do because everything has to be bounded (if I want to use the fact that M_22 is a colored Wishart, an unbounded f would color the Wishart with an unbounded correlation function), but just for simplicity, in the simplest case where y is just a_1 squared, this is the fourth moment: for f(y) = y it's the expectation of a_1 to the fourth power, which is 3, since I assume these are normal variables. So it's just some number I can compute.

Another case is a threshold function, f(y) = theta(y - 1), which means I keep the projector only if the projection is bigger than one. The typical projection is one, it's kind of one standard deviation, so a significant fraction of the data will have y greater than one and a non-zero fraction will have y less than one. So I keep a certain fraction of my data just by thresholding: if y is big enough I keep it, if not I don't, and one is a natural value to threshold at. Then this expectation value is just some integral over a Gaussian, basically an error function. So in general, if I know f, and I know that a_1 is a Gaussian variable, then in the problem without noise this is just the expectation of f(a_1 squared) times a_1 squared, where a_1 is a normal(0,1) variable; it's just some average of a Gaussian function, okay?
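As a standalone check of this scalar average (my own sketch; c1 is my shorthand for the expectation E[f(y) a_1^2], not notation from the lecture at this point), one can evaluate the threshold case both by Monte Carlo and in closed form, and verify the fourth-moment example.

import numpy as np
from math import erf, exp, sqrt, pi

# Noiseless case: y = a1^2 with a1 ~ N(0,1); threshold weight f(y) = 1 if y > 1.
rng = np.random.default_rng(1)
a1 = rng.standard_normal(2_000_000)
y = a1 ** 2

c1_mc = np.mean((y > 1.0) * a1 ** 2)      # Monte Carlo estimate of E[f(y) a1^2]
phi1 = exp(-0.5) / sqrt(2 * pi)           # standard normal density at 1
Phi1 = 0.5 * (1 + erf(1 / sqrt(2)))       # standard normal CDF at 1
c1_exact = 2 * (phi1 + 1 - Phi1)          # E[1_{a1^2 > 1} a1^2] by integration by parts, ~0.80
print(c1_mc, c1_exact)

print(np.mean(y * a1 ** 2))               # unbounded example f(y) = y: E[a1^4] = 3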
This next computation is more interesting, and I'm going to end with it, because there's an object in there that we'll use tomorrow. There's some interesting random matrix computation to be done, which is a bit non-trivial, but I know what to do, it's one of the few things in life I know what to do. Let me erase my definition and just recall it: M is (1/T) times the sum over k of f(y_k) a_k a_k transpose, and the object I want is M_12 G_22(z) M_21, okay?

So there are two types of sums. M itself is a sum over time, k from 1 to T, but this object is also a matrix product of dimension N-1: M_12 is a vector of dimension N-1, G_22 is a matrix, M_21 is a vector of dimension N-1. So this object carries indices i and j running from 2 to N, and for each of the two M's I have a sum over time, say k for the first one and l for the second, both from 1 to T. From the two factors of M I get a 1 over T squared in front, and factors f(y_k) and f(y_l). Then M_12 contributes the element (1, i) and M_21 the element (j, 1): picking the first components gives me a_{k,1} and a_{l,1}, and the parts that depend on i and j are a_{k,i} and a_{l,j}, contracted with the matrix element G_22(z)_{ij}, okay?

What's nice is that these two parts don't talk to each other: the piece a_{k,1} a_{l,1} f(y_k) f(y_l) only depends on the first components, where the information is, and the rest is essentially garbage, but I need to compute that garbage, okay? It's stuff uncorrelated with the information, and it involves the resolvent of a Wishart-type matrix that does not contain these first components, okay?

I'm going to go a bit fast, but the first piece converges to an expectation value, and by symmetry it is non-zero only if k equals l: in the large N limit it goes to its expectation value, and that expectation has a sign symmetry under flipping a_{k,1}, so the terms with k different from l are odd under the flip and vanish, while k equals l makes it even, okay? So basically this part goes to the expectation value of f(y) squared times a_1 squared, just another number. For the thresholding function it's the same number as before, since f is either zero or one, so f squared equals f; for any other function it's some other average over a Gaussian variable that you can compute.

The other piece is a bit more interesting. To pull out this expectation value I use one power of T: the fact that k equals l reduces the double sum to a single sum, and the sum divided by T is the expectation value, so I have one power of T left over. What remains is basically a matrix product; I could write it element-wise, but let me write it directly: it becomes the trace of the data matrix times G_22 times the data matrix transpose, divided by that leftover T. You can work out that this really is the matrix product of this guy times G times this guy, written as a matrix product under a trace.
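Continuing the earlier numerical sketch, one can check this factorization directly: for z above the edge of M_22, the cross term M_12 G_22(z) M_21 should be close to q times c2 times h(z), where c2 = E[f(y)^2 a_1^2], W_0 is the white Wishart built from the a_k with their first component dropped, and h(z) is the normalized trace of W_0 times the resolvent of M_22 (taken empirically at finite N here; the free-probability machinery of tomorrow gives it in the large-N limit).

# Continues the first sketch (A, f, M12, M22, N, T already defined).
q = N / T
lam_plus = np.linalg.eigvalsh(M22).max()
z = lam_plus + 0.5                               # any point above the edge of M_22
G22 = np.linalg.inv(z * np.eye(N - 1) - M22)     # resolvent of the colored-Wishart block
lhs = M12 @ G22 @ M12                            # M_12 G_22(z) M_21 (M is symmetric)

A_trunc = A[1:, :]                               # the a_k without their first component
W0 = A_trunc @ A_trunc.T / T                     # white Wishart, independent of the weights
c2 = np.mean(f ** 2 * A[0, :] ** 2)              # empirical E[f(y)^2 a_1^2]
h_z = np.trace(W0 @ G22) / N                     # empirical h(z), using N-1 ~ N
print(lhs, q * c2 * h_z)                         # should agree up to finite-N corrections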
The trace is cyclic, so I'd rather write it this way, okay? These are just elementary manipulations on matrices. And these are N by N matrices, so I'd rather use a normalized trace, trace over N, because that gives me a finite number; but if I divide by N, I need to multiply by N here, and N over T gives me a factor of q, and then I'm basically done. I'll call this function h(z): the normalized trace of, let me write it more explicitly, H H transpose over T, which is where the extra power of T goes, times the resolvent; and H H transpose over T is really some white Wishart, okay? So what I need to compute is something like the normalized trace of this white Wishart times (z times the identity minus M_22) inverse. And what I'm claiming is that the block (2,2), where y is independent of these components, is really a colored Wishart, and I'm going to write it like this, where the data H is the same data as these guys, these a's. So if you want: this is a white Wishart built from the vectors a_k ignoring the first component, so it's (N-1) by (N-1), and that same Wishart colored by the functions f(y_k), which are uncorrelated to it, okay?

What I want to say is that this is something we can compute with the notion of a free product. What we'll see tomorrow is that this is W_0, free product with some matrix C, where C is the diagonal matrix of the f(y_k): you put the f(y_k)'s on the diagonal of a matrix and call it C. Everything here is rotation invariant, and you only look at the block without the element one; that's super important, because then you have independence and you can use this setup. And W_0 is just a white Wishart of size N by N, or actually (N-1) by (N-1), but we don't care, N-1 and N are the same thing here, with parameter q. And in free probability I can compute such an object.

So I've given you all the tools, and then you can just do the computation. We need to find the equation for where G_11 has a pole, that is, where this denominator vanishes, and then I use l'Hopital's rule to compute the residue. It will turn out to depend on essentially a few things: on q, on c_1, which is this expectation value here, on c_2, which is this other expectation value, and on this function h(z), which I haven't computed, but which I can compute using free probability theory. So I haven't told you the answer, but I've given you the recipe for this kind of problem, and really the thing I find interesting and beautiful is the computation of such an object: the normalized trace of a white Wishart against the resolvent of a colored Wishart. This is something we'll do tomorrow; tomorrow we'll look at a similar type of problem, covariance matrices that are colored by a true covariance, where we try to estimate the true covariance, and we'll need to compute objects exactly like this.
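To make the recipe concrete, here is one last hypothetical sketch that continues the earlier ones: it solves z - c_1 - q c_2 h(z) = 0 for the outlier location using the empirical h(z) (standing in for the free-probability result announced for tomorrow), and gets the overlap from l'Hopital's rule as the residue of G_11.

# Continues the earlier sketches (A, f, M22, W0, q, c2 already defined).
c1 = np.mean(f * A[0, :] ** 2)                      # empirical E[f(y) a1^2]
lam_plus = np.linalg.eigvalsh(M22).max()

def h(z):                                           # empirical h(z) = (1/N) Tr[W0 (z - M22)^{-1}]
    return np.trace(W0 @ np.linalg.inv(z * np.eye(N - 1) - M22)) / N

def pole_eq(z):                                     # G_11 has a pole where this vanishes
    return z - c1 - q * c2 * h(z)

# Crude bisection above the edge; if the BBP condition fails, the root hugs lam_plus.
lo, hi = lam_plus + 1e-3, lam_plus + 5.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if pole_eq(mid) < 0 else (lo, mid)
lam1_pred = 0.5 * (lo + hi)

eps = 1e-4                                          # residue by l'Hopital: 1 / (1 - q c2 h'(lam1))
h_prime = (h(lam1_pred + eps) - h(lam1_pred - eps)) / (2 * eps)
overlap_pred = 1.0 / (1.0 - q * c2 * h_prime)
print(lam1_pred, overlap_pred)                      # compare with lam1 and overlap measured earlier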
Okay, and again I've taken a bit of extra time, thank you very much. Any questions? Again, about this colored Wishart: it is such a matrix because the f(y_k), like you said, are correlated to the a_k's, but not to the block M_22, right? Exactly, yeah. So really, in the absence of noise, y_k is just the first component of a_k squared, and so it's independent of the components (a_k)_j for j greater than 1. Okay, that's the important part. Okay, thank you very much, Marc. Thank you.