Yeah, I'm going to talk about this thing, though the title is broader than what I'll actually cover. The actual topic is spectral analysis of generalized linear models with general Gaussian design. This is based on joint work with Hong Chang Ji at IST, Marco, who is sitting over there, and Ramji Venkataramanan at the University of Cambridge. First of all, I apologize that the paper is not yet on arXiv, but hopefully it will be sometime next month. Okay. So just to put things into context — this has been introduced several times already, but let me do it again. Suppose we have some unknown parameter x* that we want to estimate, say in dimension d. For normalization purposes it has norm √d. We don't observe it directly, of course; instead we get a linear measurement of it via some covariate vector a_i, which is then passed through a certain nonlinear function q together with some noise ε_i. This q is a bivariate function from R² to R. We do this n times and get n observations y_1, ..., y_n. Our goal is, given these y_i's and the a_i's, to estimate x*. Okay, that's the model. It of course incorporates many things that you have thought about or even worked on: linear regression, phase retrieval, one-bit compressed sensing — there I'm just specifying the function q — and logistic regression, where I'm specifying the conditional law of y given the linear measurement; this is roughly the same thing if the noise is sufficiently regular. Also polynomial regression, and this generalized linear model is sometimes known as the single index model, and so on. So yeah, that's the setting. Theoretically speaking, most people work with random designs, because they are easy to analyze and still capture some of the structure of what people actually use in practice. This mostly means two categories of designs — okay, I said two, but really one is a subset of the other. One is the i.i.d. Gaussian design, where the a_i's are i.i.d. isotropic Gaussian with variance 1/n; the 1/n is just for normalization, not important. The other is the so-called bi-rotationally invariant, or bi-orthogonally invariant, design. If you stack the a_i's into a matrix — this slide is somewhat dated — say A with rows a_i, then A is an n by d matrix, and people assume that matrix is bi-rotationally invariant: if you conjugate by an n by n Haar orthogonal matrix on the left and a d by d one on the right, the law doesn't change. Put differently, the singular values and singular vectors of that matrix are independent, the singular vectors are completely random — uniform over the corresponding orthogonal groups — and the singular values have a certain limiting law, the limiting spectral distribution: I put a delta mass on each singular value and take the average, and this empirical distribution of the singular values converges to a certain well-defined law. And then there is everything in their universality classes, because you would expect that even if the design is not exactly i.i.d. Gaussian or not exactly rotationally invariant, roughly the same performance should hold. This was discussed in the previous talk by Rishabh yesterday, and I will come back to it later. But today I'm going to go beyond these two settings.
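(As a concrete illustration — this is not from the talk — here is a minimal numpy sketch of the observation model just described, with the baseline i.i.d. isotropic design and noisy phase retrieval as one particular choice of the link function q; all variable names are illustrative.)

```python
import numpy as np

# Illustrative sketch of the GLM observation model: x* of norm sqrt(d),
# i.i.d. isotropic Gaussian covariates of variance 1/n, and one concrete
# choice of link function q -- noisy phase retrieval, q(g, eps) = g^2 + eps.
rng = np.random.default_rng(0)
n, d = 2000, 500                                   # proportional regime, delta = n/d = 4
x_star = rng.standard_normal(d)
x_star *= np.sqrt(d) / np.linalg.norm(x_star)      # normalize to norm sqrt(d)

A = rng.standard_normal((n, d)) / np.sqrt(n)       # rows a_i ~ N(0, I_d / n)
g = A @ x_star                                     # linear measurements <a_i, x*>
eps = 0.1 * rng.standard_normal(n)                 # noise
y = g**2 + eps                                     # observations y_i = q(<a_i, x*>, eps_i)
```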
But really it's a modest step beyond isotropic Gaussian and bi-rotationally invariant: I'm going to assume the design is still Gaussian, but with a general covariance. It looks like a minor change, but it messes things up a lot. So I assume the a_i's are i.i.d. Gaussian with covariance Σ/n, where Σ is some general d by d matrix; again the 1/n is for normalization. Equivalently, you can think of the matrix A as Ã Σ^{1/2}, where Ã itself has i.i.d. entries; multiplying by Σ^{1/2} from the right makes the entries within each row correlated, but different rows remain independent. Why is this interesting? First, as I said, it's not captured by the isotropic Gaussian design or the bi-rotationally invariant design. And in practice people do use such covariance structures — for example Toeplitz or circulant matrices — for various reasons in their contexts. Why is it non-trivial? Because, first of all, it's not i.i.d. Gaussian, it's not Wigner; universality results for i.i.d. Gaussian matrices do not apply. And it is only left-rotationally invariant: I can rotate from the left but not from the right, because I'm not assuming anything about Σ — it can be any covariance matrix — while Ã itself is i.i.d. Gaussian, so I can rotate from the left but not the right. And left versus right does seem to make a difference, at least to me; for example, I don't actually know how to handle a general right-rotationally invariant design for what I'm going to do next. Okay, now the problem is the following. Given that model, which generates the y_i's, and given the covariate vectors used in the GLM, I want to estimate x* using some estimator x̂. The quality of the estimate is measured by the overlap, namely the absolute value of the cosine of the angle between x* and x̂, and I want it to be as large as possible: if it's 1, they are exactly aligned; if it's 0, they are orthogonal. You can also think of this as basically equivalent, up to scaling and additive factors, to the minimum mean squared error. Okay, I want to maximize this. One important thing is that I work in the proportional regime: n and d both go to infinity, but their ratio n/d stays roughly fixed and converges to some number δ in (0, ∞). As you will see, this is the most interesting and natural scaling for what I'm going to do, because at this resolution interesting things happen. Okay, let me put down some more concrete assumptions, especially on the matrix Σ. So recall the model: the matrix A has rows a_i, and that's how Y is generated in matrix form; recall q is a bivariate function, and when I write it this way I mean q is applied row-wise to the two input vectors. I'm going to assume Σ is unknown: I observe Y and A, of course, but I don't know Σ, so I don't know how the entries within each row of A are correlated. You will see this is the most interesting case; if Σ were known, the problem would be significantly easier and probably less interesting. Also, Σ is strictly positive definite — this may not be necessary, but let's assume it — and its norm is uniformly bounded, which is very mild.
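(Again as an illustration, not from the talk: a short numpy sketch of the general Gaussian design A = Ã Σ^{1/2}/√n — here with a Toeplitz Σ, one of the structured covariances mentioned above — together with the overlap figure of merit. All names are illustrative.)

```python
import numpy as np

def overlap(x_hat, x_star):
    """|cos(angle)| between estimate and signal: 1 = aligned, 0 = orthogonal."""
    return abs(x_hat @ x_star) / (np.linalg.norm(x_hat) * np.linalg.norm(x_star))

def correlated_design(n, Sigma_half, rng):
    """Rows a_i ~ N(0, Sigma / n): independent rows, correlated entries within a row."""
    d = Sigma_half.shape[0]
    A_tilde = rng.standard_normal((n, d))          # i.i.d. Gaussian part
    return (A_tilde @ Sigma_half) / np.sqrt(n)     # A = A_tilde Sigma^{1/2} / sqrt(n)

rng = np.random.default_rng(1)
d = 300
idx = np.arange(d)
Sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])  # an illustrative Toeplitz covariance
evals, evecs = np.linalg.eigh(Sigma)
Sigma_half = evecs @ np.diag(np.sqrt(evals)) @ evecs.T
A = correlated_design(n=1200, Sigma_half=Sigma_half, rng=rng)
```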
And one piece of notation — keep it in mind, it's very important and will show up frequently later: the empirical spectral distribution of Σ, namely the empirical distribution (the histogram) of the eigenvalues of Σ, converges to the law of a certain scalar random variable Σ̄. I put a bar on top of a letter to indicate a scalar random variable. This object will play a crucial role. Also, let's say the prior distribution is unstructured, just uniform. It's an interesting question what happens when the prior is structured — for example plus-minus-one entries, or the signal lives in a cone or something — but we didn't look into that; that would be another talk. Finally, there is a mild assumption on the noise; ignore that. So let me introduce the main character of the talk, namely the spectral estimator. What is that? Given Y and A — again, Y is the observed vector and A is the matrix of covariate vectors — I can do the following. I want to construct a matrix, so let's take the average of the rank-one components a_i a_iᵀ formed by each covariate vector. By itself this is useless, because it doesn't depend on the observations — it's just the covariates — so let's weight each term by y_i. Why is this a good idea? Because y_i depends on the one-dimensional linear measurement ⟨a_i, x*⟩, so each rank-one component gets weighted according to that measurement. In fact, more generally, we can pick a preprocessing function: before putting each y_i into the matrix, we preprocess it with some scalar function T. For example — I don't know — maybe the link function q is binary, in which case you might want to quantize y_i; or maybe the link function is unbounded and you want to truncate y_i; something like that. A priori it's not clear what you should do with the y_i's, but in practice people do guess some preprocessing and apply it, and it seems to improve the performance a little. I will quantify the effect of T later — and there is going to be an optimal T, which is essentially impossible to guess. So that's the matrix I'm going to look at. As I said, there is one hidden direction inside those y_i's that depends on the a_i's. As the aspect ratio δ increases, you would expect that particular direction to contribute a spike to this matrix D. It therefore makes sense to look at the top eigenvector of D: if there is a spike, the top eigenvector is expected to be correlated with the signal x*. This can easily be corroborated by computing the expectation of D — an easy computation you can do yourself. And indeed, this is what I'm going to analyze: this matrix D and its top eigenvector. More specifically, why is this a good idea? I'm not saying it's the best estimator, but first of all it's computationally simple: just do an eigendecomposition and take the first eigenvector. Also, it is commonly used as a warm start for many algorithms: alternating minimization, expectation maximization, approximate message passing, gradient descent, stochastic gradient descent, and so on. People do use it in practice, because it takes extra work to analyze random initialization, as Kabir discussed on the first day. But here I'm going to focus on just this particular spectral estimator.
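(Illustrative sketch, not from the talk: constructing the matrix D = Aᵀ diag(T(y)) A and taking its top eigenvector. The preprocessing np.tanh below is an arbitrary bounded choice; the optimal T discussed later in the talk is something else entirely.)

```python
import numpy as np

def spectral_estimator(A, y, preprocess=np.tanh):
    """Top eigenvector of D = A^T diag(T(y)) A, with preprocessing T applied to y.

    With the 1/n normalization folded into the rows of A, this is the weighted
    sum of the rank-one components T(y_i) a_i a_i^T described above.
    """
    D = A.T @ (preprocess(y)[:, None] * A)   # same as A.T @ np.diag(T(y)) @ A, but cheaper
    D = (D + D.T) / 2                        # symmetrize against round-off
    eigvals, eigvecs = np.linalg.eigh(D)     # eigenvalues in ascending order
    return eigvecs[:, -1], eigvals[-1]       # top eigenvector and top eigenvalue
```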
Also, this matrix per se is an interesting random matrix theory question, right? If you peel off the statistics background, I'm just handing you this matrix, specifying the dependencies among its ingredients, and asking what the statistics of the top eigenvector are. That's not actually so easy. So I'm going to make one technical assumption. First of all, T is bounded — a very minor condition — and not identically zero, to exclude trivial behavior. And I'm going to assume the support of T intersects the strictly positive reals: T can take negative values, but it should hit the positive real line at least sometimes; it cannot be purely negative. Otherwise, if it were purely negative, the negative of T would be purely positive, and in that case you would expect an outlier to the left of the spectrum, so you would rather take the bottom eigenvector of the matrix, not the top one. So yeah, I'm going to assume this. Keep in mind that T can still be very negative in various senses — its expectation can be extremely negative, for example — I just want it to take some positive value; this will actually create some challenges. Okay, the main result is the following. For any given preprocessing function T, I define three numbers — λ₁, λ₂, and η — in a certain way that I'll describe later. If the condition λ₁ > λ₂ holds, then I know three things: one, the top eigenvalue of D converges to λ₁; two, the second eigenvalue of D converges to λ₂; three, the overlap between the top eigenvector and the signal converges to η — all in the limit. Note that these three numbers depend only on the scalar random variables involved, the function T, and δ. They are dimension-free: they don't depend on n and d, and you don't need to see the data A or Y to compute these predictions. By the way, this condition is very likely the right phase transition threshold: if it holds in the reverse direction, namely λ₁ is at most λ₂, then both the first and second eigenvalues should converge to λ₂ and η should equal 0. But we don't have a proof of that — we don't have the subcritical behavior. Okay, so what are λ₁, λ₂, and η? Let's define a pair of random variables Ḡ and ε̄ — again, a bar on top of a letter means a scalar random variable, not a matrix or vector. Ḡ is a Gaussian and ε̄ is the noise; Ḡ has a particular variance just because of my normalization, so don't pay attention to that. And Ȳ is the one-dimensional observation generated by the GLM. Now define a and γ to be the solution of this pair of strange-looking equations. Don't try to read them; the qualitative feature is that the pair of equations depends only on Σ̄, T, and δ. Then define λ₁ to be a times γ. So where does this come from? Two equations, two unknowns, so I can solve them; it turns out the solution exists and is unique, which is not so easy to show. There are two possible interpretations of this pair of equations. One is that — for those of you who are into random matrices — you probably recognize that this looks like a self-consistent equation for Stieltjes transforms, and yes, that is indeed the case.
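(For the record, here is a compact restatement of the three conclusions in the talk's notation; this is only a summary of what was just said — the precise definitions of λ₁, λ₂, η and the regularity conditions are in the paper.)

```latex
% Under the assumptions above and the condition \lambda_1 > \lambda_2,
% as n, d \to \infty with n/d \to \delta:
\lambda_{1}(D) \;\longrightarrow\; \lambda_1 \;(= a\gamma), \qquad
\lambda_{2}(D) \;\longrightarrow\; \lambda_2, \qquad
\frac{|\langle v_1(D),\, x^{\star}\rangle|}{\|v_1(D)\|\,\|x^{\star}\|} \;\longrightarrow\; \eta .
```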
Coming back to the Stieltjes-transform interpretation: you should think of λ₁ as the inverse Stieltjes transform evaluated at a. But that's not how it will appear in my proof; I'm going to interpret it in a different way, in which the numbers a and γ pop out naturally, and let me try to do that now. Similarly, λ₂ and η are — not similar, but — defined in a similarly strange way; I'm not even going to write them down. Okay, so let's get some basic intuition. This is the matrix D, written in matrix form: A is that design matrix, and T here is the diagonal matrix with T(y_i) on the diagonal, so D = AᵀTA, a d by d matrix. You would expect that, in the presence of a spectral gap, the first eigenvalue detaches from the bulk — the bulk being the empirical spectral distribution of D, the histogram of its eigenvalues. Whenever there is such an outlier, you would expect the top eigenvector to be non-trivially correlated with x*, i.e. to make a non-trivial angle with it, not 90 degrees. Otherwise, if the top eigenvalue collapses into the bulk, the top eigenvector should look like a completely Gaussian vector, independent of the signal x*. That's the expectation — but how do we quantify it? If I expand A as ÃΣ^{1/2}, the matrix D becomes Σ^{1/2} Ãᵀ T Ã Σ^{1/2}. Note that T depends on Ã, because T depends on Y, and Y depends on Ã through the one-dimensional measurement of the signal. For those of you into random matrices, this looks like a separable covariance matrix, at least in shape: something, times a Gaussian matrix, times another thing, times the same Gaussian matrix, times the same thing. But over there the crucial assumption is that T does not depend on the Gaussian matrix — it's independent, so the only source of randomness is the Gaussian matrix, and you should think of Σ^{1/2} and T as playing the role of covariance matrices in the random matrix theory context. Here there are two differences: one, my T depends on Ã; two, my T is not PSD, because the preprocessing function T is not assumed to be everywhere positive. But you can imagine these two models are intimately related: in fact, if you take the spike out of this matrix, you should end up with the separable-covariance-like object D̂. Let's try to quantify that. So this is just a reminder of the formula, and now let's take out the spike. Ã is a Gaussian matrix, so I can take any direction, project Ã onto that direction, and add the orthogonal part; by the isotropy of the Gaussian, the orthogonal part is independent of the parallel part and can be replaced by an independent copy. If you want to check this for g = e₁, it just says that I separate the component along the first coordinate from the rest, and of course the rest can be replaced by an independent copy — almost completely trivial. Now take g to be that special vector along which the one-dimensional measurement is taken — the most natural choice — plug the decomposition into D, and it looks like something times T times something, where that something is a rank-one deformation of the bulk. The second term is the major part, and the first term, if you stare at it, is a rank-one thing.
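(Schematically, and in notation of my own rather than the talk's slides: the Gaussian conditioning step just described can be written as below, with g the measurement direction and Ã′ an independent copy; the exact bookkeeping of the resulting spike and bulk terms is in the paper.)

```latex
% Gaussian conditioning along a fixed direction g:
\widetilde{A} \;\overset{d}{=}\;
\widetilde{A}\,\frac{g g^{\top}}{\|g\|^{2}}
\;+\;
\widetilde{A}'\Bigl(I_d - \frac{g g^{\top}}{\|g\|^{2}}\Bigr),
\qquad \widetilde{A}' \ \text{an independent copy of } \widetilde{A}.
```

Plugging this into D = Σ^{1/2} Ãᵀ T Ã Σ^{1/2} separates a low-rank piece along g (the spike) from a bulk piece built only from the independent copy Ã′ — the "almost separable covariance" object D̂ discussed next.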
So if we take out the rank-one piece, we end up with D̂, which looks almost exactly like a separable covariance matrix. There is, in fact, a projection matrix in there, but you can basically ignore it safely: it only projects out one direction, and for an n by d Gaussian matrix, projecting out a single direction does essentially nothing. And you can show that the eigenvalues of D and D̂ are related by interlacing. For example, λ₂(D) is sandwiched between λ₁(D̂) and λ₃(D̂). This holds for every eigenvalue from λ₂ down to λ_{d−1}, but not for λ₁ and λ_d — and it's not exactly interlacing there, because λ₁ and λ_d are possibly spikes; although I only care about λ₁, right? So far, we have that λ₂(D) is sandwiched between two quantities, and both of them correspond to something that looks like a separable covariance matrix. In particular, they both converge to the supremum of the support of the limiting spectral distribution of D̂ — whatever that is, it's actually well understood. Well, not quite: if T is PSD, then this is exactly a separable covariance matrix and it's well understood, but here I don't assume that, and it actually takes some effort — it may be slightly more tricky than you would expect — but the same characterization also holds when T is not PSD: you take the very same formula, allow for the non-positivity, and it still holds. In particular, the conclusion is the following: λ₂(D) converges to the supremum of the support of the limiting spectral distribution, which admits a certain explicit description, and that is what I define to be λ₂. That's the somewhat mysterious formula I didn't show you, but it's explicit and depends only on those three objects. Okay, so far we have the bulk, so we are basically halfway through: we have shown item two of the main result — but what about items one and three? Okay, for these I'm going to do something completely different. Suppose you don't know anything about random matrices or whatever; I give you some matrix and ask you to compute the top eigenvector or top eigenvalue. What would you do? The first thing might be power iteration: take the matrix D — I expand it as before — use it to hit the iterate v_t, normalize to unit norm, repeat for multiple steps, and hope that, whenever a certain spectral gap exists, it converges to the top eigenvector. That's what we learned in undergraduate linear algebra.
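(For concreteness — not from the talk's slides — a minimal numpy sketch of plain power iteration on D; the talk's actual analysis replaces this with the GAMP iteration described next.)

```python
import numpy as np

def power_iteration(D, num_steps=200, rng=None):
    """Plain power iteration: returns an approximate top eigenvector and eigenvalue of D."""
    rng = rng or np.random.default_rng(0)
    v = rng.standard_normal(D.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(num_steps):
        v = D @ v
        v /= np.linalg.norm(v)          # renormalize to unit norm at every step
    return v, float(v @ D @ v)          # Rayleigh quotient as the eigenvalue estimate
```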
What I'm actually going to do is not exactly power iteration; it looks much more complicated, so let's unpack it. First of all, note that there is a pair of iterates, u and v. This is because my matrix Ã is rectangular, n by d, so I have to alternate between an n-dimensional iterate and a d-dimensional iterate. If you ignore the subtracted terms, then each step is basically Ã (or its transpose) times ũ or ṽ, where ũ and ṽ are nonlinear functions of u and v. So this is basically a nonlinear power iteration: I allow some nonlinear transformations f and g, but up to that it's just generalized power iteration. What makes this algorithm an AMP — approximate message passing — is the presence of those two subtracted terms. They are memory terms, and the Onsager coefficients b_t and c_t are very carefully chosen: they depend on the derivatives of the functions f and g. It's not obvious why they take this shape, but let's take it for granted; only with these terms is the iteration called approximate message passing. Also note that I run this iteration on Ã, not on the original matrix D, because AMP can only run on simple matrices, and Ã is i.i.d. Gaussian. So to specify an AMP, I just need to specify f and g, or f_t and g_t. Okay, I want to use this to simulate power iteration. There was a typo on the slide: the left-hand side should be λv, so the equation is λv = Dv — the eigen-equation for the matrix D. On the other hand, my generalized approximate message passing looks like this, and I want to use it to simulate that eigen-equation. How can I do that? It comes down to choosing f and g. Suppose t is sufficiently large so that everything has converged, so I heuristically drop the time indices. Now choose f and g in the following way — I'm throwing this at you, but there is a way to reverse-engineer it: choose f to be some matrix depending on Σ times v, and g to be some other matrix depending on T times u, and inside those matrices there are two undetermined parameters, a and γ. Note that once you fix f and g, b and c are fixed — they are given by the Onsager coefficients, so there is no choice there. So I only need to tell you what a and γ are. You can scale a and γ so that some nice things happen; these choices are somewhat arbitrary, but I make them to simplify the derivation. Say I choose a and γ so that b equals one, just for simplicity, and so that this normalized squared norm equals one — which makes sense, because I want my iterates to stay bounded, something like that. So these are a pair of equations from which I can solve for a and γ. Well, not quite, because they depend on D — but I claim these quantities can be computed from state evolution, so ultimately we get a pair of deterministic equations, and that pair is exactly the strange pair of equations I showed you a few slides ago. So make this choice, plug it in, and see what happens: indeed, nice things happen. Do some manipulation and you end up with this. Stare at it for a moment: it says some number times some vector equals D times the same vector. So it looks exactly like an eigen-equation, and we even know the eigenvalue: it is aγ, which is what I call λ₁, and the eigenvector is supposed to be that object, Σ^{−1/2} times f(v). Okay, so the only mysterious thing I did was hand you these f and g, but I claim there's a way to guess them. All of this is heuristic, so let's try to make it quantitative.
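(To make the structure of the recursion concrete, here is a schematic numpy sketch of an asymmetric AMP / GAMP iteration with Onsager corrections, written with separable denoisers and one common indexing convention; the talk's actual f and g are linear maps built from Σ and T with the parameters a, γ, and their precise form is in the paper, so treat this purely as a sketch of the algorithmic template.)

```python
import numpy as np

def gamp(A_tilde, f, f_prime, g, g_prime, v0, num_steps=50):
    """Schematic AMP on an i.i.d. Gaussian matrix A_tilde (entries of variance 1/n).

    Alternates an n-dimensional iterate u and a d-dimensional iterate v, applying
    denoisers f (on v) and g (on u) and subtracting Onsager memory terms whose
    coefficients involve the average derivatives of f and g.
    """
    n, d = A_tilde.shape
    v = v0.copy()
    g_prev = np.zeros(n)
    for _ in range(num_steps):
        b = f_prime(v).sum() / n            # Onsager coefficient b_t
        u = A_tilde @ f(v) - b * g_prev     # n-dimensional iterate
        c = g_prime(u).sum() / n            # Onsager coefficient c_t
        v = A_tilde.T @ g(u) - c * f(v)     # d-dimensional iterate
        g_prev = g(u)
    return u, v
```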
To quantify this, I need state evolution. I don't think it has been introduced yet today, so let me define it. Define random vectors V_t in dimension d as the sum of a component along the (non-isotropic) signal direction plus independent Gaussian noise W_t. This W_t is independent of x*, and it is Gaussian; in fact, if you collect W_1 through W_t, they are jointly Gaussian, and their covariance structure is given by a t by t matrix ψ_t. Note that these are d-dimensional Gaussian vectors, but I only need of order t² numbers to describe their joint correlation, because those Gaussians are isotropic: naively I would need a td by td matrix, but here I only have ψ_t, tensored with the identity I_d — a very succinct representation. Even better, ψ_t can be computed recursively. I won't write down the formula because it's messy, but trust me that once you specify the functions f and g, you can compute ψ_t from the previous ψ_{t−1}, and similar things hold for the other covariance, χ_t. Okay, so what is this vector good for? I claim these are very accurate descriptions of the iterates v_t in high dimensions; in particular they pass any pseudo-Lipschitz test, so you can basically safely replace the small v_t by the capital V_t in your calculations. The nice thing is that V_t is something plus a Gaussian whose covariance you know. This is not at all clear for power iteration — in fact it's false for power iteration — but it's true for GAMP. Okay, let's get to the crux of the argument. I said the AMP iteration looks like a power iteration; indeed, with some manipulation it reads v_{t+1} = (D / aγ) v_t plus some error vector, which supposedly should be small — and indeed it becomes small as t grows, so let's ignore it. So this is exactly power iteration, right? Now do something trivial: run it for another t′ steps, take the norm on both sides, and take the spectral decomposition of D on the right-hand side. You get a sum of terms, and I separate the first one from the rest for a good reason: each term looks like (λ_i(D)/aγ) to the power 2t′, times the squared overlap between v_t and the corresponding eigenvector. Now take the limit. I told you λ₂(D) converges to λ₂ — that's known, I just showed it. If it happens that λ₂ is less than aγ, which I defined to be λ₁, then all of those terms vanish: since λ₂ is less than aγ, the corresponding term decays geometrically, and λ₃ and below are even smaller because the eigenvalues are in decreasing order. So all of these vanish and I end up with just the first term. Let's stare at this equation. I claim the left-hand side can be computed using state evolution: after all, you can replace v_{t+t′} by the capital V_{t+t′}, which is a Gaussian (plus a signal component), so we can compute it — say it equals ν². Now, staring at this equation, you cannot exchange the limits; it's important to take them sequentially, so that those terms really do go to zero. This is fine because the statement is true for any t: for any fixed t and t′, the overlap factor is bounded — it's at most one, or at most some constant — and it is the limit d → ∞ (there is no t there) that sends λ₂(D) to something less than aγ, after which t′ → ∞ kills the subleading terms. One has to be a bit careful with the order of the limits, but I'm going fast here.
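(Schematically, and with all the vanishing error terms suppressed, the identity being invoked is the following; the ordering of the limits — d → ∞ first, then t′ — is exactly the point being stressed above.)

```latex
% nu^2 = state-evolution limit of the squared norm of the iterate v_{t+t'}
% (vanishing terms dropped); v_1(D) denotes the top eigenvector of D.
\nu^{2}
\;=\;
\lim_{t'\to\infty}\ \lim_{d\to\infty}\;
\left(\frac{\lambda_{1}(D)}{a\gamma}\right)^{2t'}
\bigl\langle v_{t},\, v_{1}(D)\bigr\rangle^{2}.
```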
Okay, so we are at this equation: the left-hand side is a fixed number computed from state evolution, and the right-hand side is that expression. Staring at it, it has to be the case that λ₁(D) converges to aγ. Why is that? If it converged to something larger, the right-hand side would blow up; if it converged to something smaller, the right-hand side would go to zero, which violates the identity. So the limit of λ₁(D) must equal aγ, and moreover v_t and the top eigenvector v₁(D) must be aligned — just by staring at this equation. So basically we're done, right? We have the limit of λ₁(D) and we have this alignment. Why are we done? Because I can now compute the overlap between the eigenvector and x* by replacing the eigenvector with my iterate — and my iterate I control well: I know it's (a signal component plus) a Gaussian. Compute that and you get some formula, which is called η, and it happens to be strictly positive whenever the condition holds. Okay, that's essentially it; let me briefly discuss what it means. The condition, roughly speaking, says that if the sampling ratio is sufficiently high, then I have correlation; you can think of it heuristically as δ being larger than some threshold. Therefore I can try to choose my preprocessing T — I didn't tell you what T is because the result holds for any T — to minimize this threshold; that way it is easiest for me to get positive overlap. It turns out this minimal threshold δ* is given by a complicated expression, whatever it is, and the best preprocessing T* is given by that one — there is, first of all, no way to guess it. Also, this happens to match a physics prediction from a pair of papers by Maillard et al. from EPFL, obtained using heuristic calculations, but for the more general right-rotationally invariant ensembles. Note, though, that my matrix is not right-rotationally invariant — it's left-rotationally invariant — yet somehow we get exactly the same formula, which I found amusing, and which probably suggests some miraculous universality behavior that I don't understand. And it seems that the universality results Rishabh talked about yesterday do not really cover this case, because left versus right does seem to make a difference. Okay, and about this T*: qualitatively, the formula depends on certain quantities determined by the link function — these are actually Hermite coefficients of the link function, the zeroth and the second ones. So if the second Hermite coefficient of the link function vanishes, the spectral threshold blows up, meaning the spectral estimator is going to be ineffective no matter how large δ is. This makes sense in hindsight, because there is a line of work on the information exponent, introduced by Reza and his collaborators — he's somewhere in the audience — that is, by Ben Arous, Gheissari, and Jagannath. The information exponent is the order of the first non-vanishing Hermite coefficient of the link function. Here we effectively only look at information exponent two, and we say that whenever the threshold is finite we're good, and whenever it's infinite we're not — so you can think of this as a small facet of that phenomenon. There is also the generalization to leap complexity, which I think Eric is going to talk about right after this, so I won't get into that.
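(As a small illustration of the quantity just mentioned — not from the talk — here is a numpy sketch that numerically computes Hermite coefficients of a noise-averaged link function by Gauss-Hermite quadrature, so one can check whether the second coefficient vanishes. Normalization conventions for the coefficients vary, and the exact convention used in the paper is not reproduced here.)

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, HermiteE

def hermite_coefficient(q_mean, k, num_nodes=80):
    """E[q_mean(G) He_k(G)] for G ~ N(0,1), with He_k the probabilists' Hermite polynomial."""
    nodes, weights = hermegauss(num_nodes)          # quadrature for the weight exp(-x^2/2)
    weights = weights / np.sqrt(2 * np.pi)          # normalize into an expectation over N(0,1)
    return float(np.sum(weights * q_mean(nodes) * HermiteE.basis(k)(nodes)))

# q(g) = g^2 (as in phase retrieval) has a non-zero second Hermite coefficient,
# while any odd link, e.g. q(g) = g^3, has a vanishing one -- per the talk,
# the spectral threshold blows up in the latter case.
print(hermite_coefficient(lambda g: g**2, 2))   # ~2.0
print(hermite_coefficient(lambda g: g**3, 2))   # ~0.0
```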
Okay, let me skip this, it's not important. As I said, we only have the supercritical behavior: whenever λ₁ is larger than λ₂, we have a characterization of everything, but not the other way around. Of course, we would expect that the other way around things do not work, but we don't have a proof. Okay, getting back to the title: I only analyzed this particular generalized linear model, but the very same methodology should — or is expected to — work in many other contexts. Give me a natural random matrix arising from some statistical context, and chances are I can analyze it using this methodology; indeed we have already worked out some examples, which are still being written up. Also, I mentioned this miraculous coincidence with the physics prediction for right-rotationally invariant ensembles, which is strange, because my ensemble is not right-rotationally invariant, so some universality is going on. Also, I only talked about the spectral estimator; I don't claim it's a good estimator — it's something people use, but it's not the best thing in the world. The best thing in the world is the MMSE, the Bayes-optimal estimator: the conditional expectation of X given Y and A. That has actually been analyzed in the same paper, and I think just for that part it is rigorous: the MMSE for this model with a left-rotationally invariant design is rigorously known. Now the question is how the spectral estimator compares to that limit — is it Bayes optimal, sorry, is it information-theoretically optimal? If yes, that's interesting, because spectral is good. If not, that's also interesting: it probably suggests an information-computation gap, and how to mitigate that or how to show hardness, I don't know — I leave it to the audience, they are the experts. So yeah, that's it. Thank you very much.

So yes — everywhere in that paper, except this MMSE part, it's about general right-rotationally invariant designs, not necessarily Gaussian times something, just general; but just for this part they look at a Gaussian times an arbitrary matrix, of which our model is a special case. And just for that part the result seems to be rigorous, using your adaptive interpolation method, and they have a formula. I don't know how it compares to, for example, my η, or how their threshold compares to my threshold — I didn't do the comparison.

It's known, yeah: if Σ is the identity, then this is well understood — actually both sides of the phase transition are known. Above the threshold it's known, and below the threshold the overlap is known to be zero, so we have a full phase transition. Yeah.

So, could you speak a bit louder? Okay, yeah. The random matrix theory part you can probably do, because there are much more refined, microscopic descriptions of those objects if you want to do the math. The AMP part is going to be much more technical: there is recent work analyzing the non-asymptotic behavior of AMP, but only in the context of rank-one matrix estimation; it's potentially generalizable, but it takes a lot of work. But even in the best case, I don't expect it to give the negative side of the phase transition — I don't think that can be done using purely AMP. You can probably do it using RMT; that's the right way to go, but it's going to be challenging. I asked some experts in RMT, and it doesn't seem to be easy.

Right — actually, that part is not the noise; that's the good part, that's the signal. So it has some component along the signal, plus some noise. Yeah. Sure.
Yes, as I mentioned — okay, I change notation sometimes — the function q can also be thought of as the conditional law: this q(G, ε) can also be written as P(Y | G), something like that. I sometimes use one form and sometimes the other, and you see it in the result: for any given y, you think of this as a function of the second component, you look at its Hermite coefficients, and those are what show up in the result. So yes, in this form.

I would say it's pretty crucial. The proof uses AMP, and if you change that, you don't expect the same thing to hold, because if you change to an Ising prior, for example, you may observe some information-computation gap or something — I would expect so. So yes, the threshold would change, I think. It can be done, but I didn't do it. Yeah. On the other hand, the spectral estimator doesn't care about the prior, so it may not be a good idea in the first place to use a spectral estimator there — it's just an eigenvector; there's no prior or any structure in it. Yeah, okay, sure. Yeah, yeah. Thank you.