We're back for the morning session of the workshop. So today we have Manuel Sáenz from UdelaR, Uruguay, who is going to present "The Price of Ignorance". Very nice title, looking forward to it. The floor is yours, Manuel.

Thank you very much. OK, first of all, I will warn you that I am heavily jet-lagged, so I hope this goes OK. If I spontaneously start speaking in Spanish, it's because I had a little bit of a breakdown. Just save me, switch me back to English, and I will do my best. OK, so this is a collaboration with three other authors, two of whom are here, Jean and Marco, somewhere, maybe; and TianQi, who couldn't make it, he's working in China.

OK, the model I will be discussing already came up at least once during the talks, so hopefully you already have some familiarity with it, and this will make everything a little bit smoother. The general setting is one in which the statistician has certain data available, structured as a matrix. This matrix is full rank, and within this full-rank structure there is a low-rank structure that is of interest to the statistician. So the idea is to try to extract this low-rank structure from the full-rank matrix. There are many applications, for example in community detection and group synchronization, which I will not dive into, but this is in general a very recurrent problem in these topics. This is, in a way, an inference model for PCA, in the sense that the task you are trying to solve is essentially the same task that is solved by PCA. But the difference is that PCA does not incorporate prior information about the signal, or about the model in general, into the estimator it gives you, and the idea is to keep PCA as a comparison baseline for the performance of these estimators.

So what I am going to discuss is this task of obtaining a low-rank signal out of a full-rank data matrix, in the setting in which one assumes that the high-rank noise, which is an additive noise in the observed data matrix, is not the uncorrelated Gaussian Wigner matrix of the typical models; instead, the matrix has some structure. The logic of assuming this is, a little bit, the following: one has a certain problem in which one wants to apply these techniques and extract this low-rank structure for some reason. But in the available full-rank data matrix, the thing one calls noise is in fact some kind of information. It's not really noise in the traditional sense; it's something you're not interested in, so you want, in a way, to forget about it. But it has some structure, because the distribution it comes from is highly structured, since this is a data set that comes from a real situation. And the question we want to answer is, OK: you have this setting in which the data matrix comes from some structured distribution, and you want to extract the low-rank signal from the high-rank data matrix, but you don't have detailed information about the distribution the high-rank component comes from. So one natural proposal would be to forget about the structure of this matrix and model it as a Gaussian matrix, which is essentially a matrix that has no structure at all.
And the idea will be to answer the question: by using this naive model on a structured matrix, what are you losing? That's why the title of the talk is "The Price of Ignorance". OK. And in particular, we are going to study low-rank matrix estimation (I forgot to mention the name of the problem), in which, as I mentioned, the noise matrix, whatever you call noise, comes from a structured distribution, but the statistician wrongly models it as a Gaussian matrix. In this setting, the main results of the talk will be a characterization of the Bayesian estimator that wrongly assumes the noise to be Gaussian, so a heavily mismatched estimator, and a characterization of an AMP that also wrongly assumes the noise matrix to be Gaussian, so there will be parts of the algorithm which are incorrect, in a sense. And from these two theoretical characterizations, we get a whole variety of phenomenology, with very interesting implications, which I will discuss a little bit.

OK, but before diving into the structured problem, I want to discuss the traditional model, in which you want to extract a rank-1 matrix out of a full-rank matrix and the additive noise really is Gaussian. I want to go over this setting a little before specializing to the new results. OK, so being more precise, what we have in this case is that the statistician has a data matrix Y, which is equal to a rank-1 matrix, the spike, built out of a random vector x star, which we call the signal we want to infer, plus an additive Gaussian noise. And there is this parameter lambda star, which is the signal-to-noise ratio. During the talk there will be two signal-to-noise ratios: this one, lambda star; and keep in mind that, not in this setting, but in the setting of the work we did, there will be another one, lambda without the star. Lambda star will really be the signal-to-noise ratio involved in the data-generating mechanism, while lambda will be the signal-to-noise ratio assumed by the statistician, and in the model we discuss later we will allow these two values to be different.

OK, so you want to infer this random vector x star, and you have this Gaussian additive noise here. In this setting, because the true distribution of the noise is Wigner, the density of this random matrix is proportional to this density over here, an exponential of (minus a constant times) the trace of Z squared. And this implies that the log-likelihood of the model, the log-likelihood that will appear in the posterior distribution, has this quadratic form, which comes directly from the quadratic expression in the distribution of the noise. OK, here I write some simplifications in two special cases, in which the prior you use in the posterior is supported on vectors of fixed norm. There can be two cases. Here in the slides I wrote the prior uniform on the sphere, because then, when you expand the square, the term involving the norm is absorbed into this constant. But the prior could also be a product of n Rademacher distributions, which would give you a vector uniform on the hypercube.
In that case also, the norm would be fixed and equal to the square root of n, and it would be incorporated into these constants over here. OK, so this form of the log-likelihood will be important for us, because here I am discussing the log-likelihood of the model where the data-generating mechanism has Gaussian noise, and the model also assumes the data has Gaussian noise. In the setting we will consider later, the data-generating mechanism will be different, so this matrix Z will not be Gaussian; but still, because the statistician is not aware of this discrepancy, the posterior proposed by the statistician will be the same. So we will still have this same kind of expression.

OK, in this setting there is plenty of literature available. I just mention a couple of works here; some of the people in the audience have been authors of this kind of work. But in a nutshell, in this context, in which you have a data matrix that really has additive Gaussian noise and the model has no mismatch, essentially everything is known. So the characterization of the usual quantities one would be interested in, for example the limiting mean square error of the Bayesian estimator, or the state evolution of AMP, has essentially all been done. In the particular context of the Bayesian estimator: in the slides I didn't write the meaning of this bracket, but this bracket means expectation with respect to the posterior, so expectation with respect to the measure that has the log-likelihood from the previous slide in the exponent (this is not working... OK), that has this expression in the exponent, plus a prior. In this context, the Bayesian estimator is the expectation, with respect to the posterior, of the rank-1 matrix constructed from samples of the posterior. So it would be this expression over here. And what you have is that, for characterizing the asymptotic error committed by this estimator, it is usually useful to analyze the asymptotics of this auxiliary quantity, which in Bayesian terms would be the log-evidence, or in more statistical-physics terms the log-partition function. The details of this quantity are not really important for the talk, in a sense; we will characterize it. But the important thing is that, when you study the asymptotic value of this quantity, it characterizes the asymptotic value of certain order parameters that are important in these problems. For example, in this particular case, in which the data-generating mechanism includes additive Gaussian noise and the statistician is aware of this structure of the data and models it correctly, you have this relationship: if you characterize the limit of this quantity scaled correctly, that is, divided by n, and you take a derivative with respect to lambda, the assumed signal-to-noise ratio, then this limit characterizes exactly the mean square error of the Bayesian estimator. OK, nice. And in this case this kind of result exists. So we know the limit of this normalized log-evidence, or log-partition function, in terms of an optimization potential. And because this is known, the mean square error of the Bayesian estimator is also known.
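(A minimal numerical sketch of the matched Gaussian setting just described, for concreteness. The normalization, parameter values, and variable names here are assumptions of this sketch, not taken from the slides.)

```python
import numpy as np

rng = np.random.default_rng(0)
n, lam_star = 2000, 3.0                 # dimension and true SNR (assumed values)

# Signal uniform on the sphere of radius sqrt(n).
x_star = rng.normal(size=n)
x_star *= np.sqrt(n) / np.linalg.norm(x_star)

# GOE ("Wigner") noise, normalized so its spectrum converges to [-2, 2].
G = rng.normal(size=(n, n)) / np.sqrt(n)
Z = (G + G.T) / np.sqrt(2)

# Spiked data matrix: rank-1 signal plus full-rank Gaussian noise. In the
# matched case, the lambda-derivative of the limiting free energy gives the
# MSE of the Bayesian estimator (the relation mentioned in the talk).
Y = (np.sqrt(lam_star) / n) * np.outer(x_star, x_star) + Z

# Plain PCA estimator: rank-1 matrix built from the top eigenvector.
eigvals, eigvecs = np.linalg.eigh(Y)
v = eigvecs[:, -1] * np.sqrt(n)         # rescaled to the sphere of radius sqrt(n)

overlap = abs(v @ x_star) / n
mse = np.mean((np.outer(v, v) - np.outer(x_star, x_star)) ** 2)
print(f"overlap = {overlap:.3f}, matrix MSE = {mse:.3f}")
```

(In this normalization the spectral threshold sits at lambda star equal to 1: above it the top eigenvector correlates with the signal, below it the overlap vanishes, which is the regime where the scaled PCA estimators discussed next pay off.)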
So one of the contributions, one of the things we did in this work, is an analogous result, but in the mismatched setting I was mentioning before. OK, so what is the model we really explore? It is a model in which we have a data matrix similar to the one I was describing before. So this looks exactly the same, right? We have a data matrix which is the sum of a term proportional to a rank-1 matrix plus a full-rank matrix, which is Z. The only difference in this data matrix is that now this Z is not a Wigner matrix, so it doesn't have this exponential-of-the-trace-of-Z-squared kind of density; it is a rotationally invariant matrix. So it's constructed by making a sandwich between a diagonal matrix and two uniform (Haar) orthogonal matrices, a sandwich in which the bread is the orthogonal matrices and the cheese is this diagonal matrix. And we will always assume that the empirical measure of the elements of this diagonal matrix converges weakly to some well-defined limit, with the regularity assumptions we need for the proofs. For example, we need it to converge to a distribution with compact support, we need the largest and smallest eigenvalues to converge to the edges of this support, and there is another technical condition that is not really the point here.

But the important thing in our model is that now the noise is not a Wigner matrix, so its distribution is much more general. A Wigner matrix can be thought of as a rotationally invariant matrix with a really specific diagonal matrix here: Wigner matrices are rotationally invariant, but they have a spectral distribution which is very particular. Here, we are allowing for much more diverse spectral distributions in this diagonal matrix. But the thing is that, OK, although this is the data-generating mechanism, so the data really comes from this additive noise with all this structure, the statistician is not aware of all this complexity. So one natural proposal of what to do in these settings is to say: OK, there is something complex going on, I don't know what it is, so a default hypothesis would be to just assume that the noise is Gaussian. Of course, this is not correct. But this is a typical assumption that is used in a lot of real practical situations, right? And the motivation is really this one: you have this data matrix that could be structured, but maybe you're not aware of it. So if you really implement this mismatched model, with this naive Gaussian hypothesis, what is the impact on the inference task?

The kind of results we have include a variety of estimators, which we divided into three families. The first family is the spectral estimators, so called because they are all constructed in terms of the eigenvector corresponding to the largest eigenvalue. This is, in principle, a family of natural estimators. The most typical of these is PCA, which just constructs the rank-1 matrix corresponding to this largest eigenvector. This corresponds exactly to the maximum likelihood estimator in the case in which you have a prior which is uniform on the sphere or Rademacher. But as you can see, there are two other estimators that we consider, which we call Gaussian PCA and optimal PCA.
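(Here is a minimal sketch of the "sandwich" construction of rotationally invariant noise just described, together with a generic scaled-PCA estimator of the form c times the PCA matrix. The uniform spectrum, the parameter values, and the sweep over c are assumptions of this sketch; the actual Gaussian-PCA and optimal-PCA constants in the paper are specific functions of lambda and of the noise spectrum, which are not reproduced here.)

```python
import numpy as np

rng = np.random.default_rng(1)
n, lam_star = 2000, 3.0

# Rotationally invariant noise: Z = O D O^T, where the "bread" is a pair of
# Haar orthogonal matrices and the "cheese" is the diagonal matrix D. Here D
# has a uniform spectrum on [-sqrt(3), sqrt(3)], chosen so the eigenvalues
# have variance 1, comparable to a standard Wigner matrix (an assumption).
Q, R = np.linalg.qr(rng.normal(size=(n, n)))
O = Q * np.sign(np.diag(R))             # sign fix makes O exactly Haar-distributed
d = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=n)
Z = (O * d) @ O.T                       # same as O @ np.diag(d) @ O.T, but faster

x_star = rng.normal(size=n)
x_star *= np.sqrt(n) / np.linalg.norm(x_star)
Y = (np.sqrt(lam_star) / n) * np.outer(x_star, x_star) + Z

# Scaled PCA: the top-eigenvector rank-1 matrix multiplied by a constant c.
# Sweeping c shows why a constant helps: when the data carries little signal,
# the best c shrinks toward 0; when the signal is strong, it approaches 1.
_, eigvecs = np.linalg.eigh(Y)
v = eigvecs[:, -1] * np.sqrt(n)
for c in (0.0, 0.5, 1.0):
    mse = np.mean((c * np.outer(v, v) - np.outer(x_star, x_star)) ** 2)
    print(f"c = {c:.1f}: MSE = {mse:.3f}")
```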
What is the meaning of these? So these other two estimators are essentially PCA, but with a constant in front. Why would it be a good idea to put a constant in front of the PCA estimator? OK, these constants will depend, at least, on the signal-to-noise ratio. Why is it a good idea? OK, imagine you are in a situation in which you have some data matrix, but the signal-to-noise ratio is really small. One knows from theoretical results that, below a certain threshold, the correlation between this eigenvector corresponding to the largest eigenvalue and the signal you want to infer is really small; it vanishes with n. So in these cases, the PCA estimator, which is a matrix of norm 1, is a bad estimator when there is no signal inside your data matrix. Because if there is no signal inside your data matrix, your mean square error will be much smaller if your estimator is exactly 0. So in these cases, you would like your spectral estimator to have a constant in front, and if the data matrix has no information about the signal, that constant should be 0. In that case, you would be estimating much better than if you were using plain PCA. And when the signal is really strong, maybe you want the constant to be 1. And in intermediate situations, in which the signal-to-noise ratio is large enough for the data matrix to have some information, but not a strong correlation with the signal, you want this constant to be something between 0 and 1.

So then, why do we have two PCA estimators with these adaptive constants that tune to the amount of information your data set has? OK, because there is an optimal way of choosing the constant when you have this structured additive noise, and it depends on the structure, really on the spectral distribution, of the noise matrix. So if you have information about the real distribution of the noise matrix, you could put a constant here, depending on lambda, that would be optimal, in the sense that it would really adapt to the real strength of the signal in your data matrix. But if you're not aware of this... oh, sorry. You could still put a constant in front; but if you're assuming that the noise is Gaussian, you would put another constant, namely the optimal constant for the setting in which the data-generating mechanism really has an additive Gaussian noise. So we consider these two scaled PCA estimators: one with the optimal Gaussian constant, and one with the real optimal constant that takes into account the real distribution of the data.

Again, in this setting we also want to characterize the Bayesian estimator, and the definition of the Bayesian estimator is now exactly the same as in the setting of Gaussian noise without mismatch. That is, the Bayesian estimator is the expectation, with respect to the posterior, of the rank-1 matrix that comes from a sample of the posterior. So it's this expression over here. But the difference with the setting of additive Gaussian noise without mismatch is that now the posterior is mismatched: it does not reflect the data-generating mechanism correctly. In particular, we allow for three kinds of mismatch here. One is that the posterior we are going to consider only has uniform spherical priors, but the signal might come from another distribution.
We also consider the setting in which the statistician doesn't know the real signal-to-noise ratio of the data matrix, so the signal-to-noise ratio proposed inside the posterior may not be the correct one. That's why we have the difference between lambda star and lambda. So remember, lambda star is the real signal-to-noise ratio used in the data-generating mechanism, and lambda is the signal-to-noise ratio implemented inside the posterior. And the main mismatch in our posterior will be that it is the posterior corresponding to Gaussian noise, while the data-generating mechanism comes from this rotationally invariant distribution.

OK, one important point is that, because we are in a mismatched setting, a lot of the analytical tools that allow one to analyze Bayes-optimal settings cannot be used here. So you have to work around some technical problems when you have these mismatches. And there is also the general problem with the Bayesian estimator that, in high dimensions, it is non-trivial to approximate, so you need efficient algorithms to implement it. Finally, the last family of estimators we analyze are AMP algorithms. There are two AMP algorithms we analyze. One is Gaussian AMP. The other one is referred to as "true AMP" in some slides and as "correct AMP" in others; don't mind the difference, they are essentially the same. The difference between the two is that Gaussian AMP is the AMP that a statistician who models the data as having an additive Gaussian noise, so a mismatched model, would use. It's the AMP that has the Onsager corrections corresponding to the Gaussian setting. But the true, or correct, AMP is another AMP whose Onsager corrections fit the real spectral distribution of the rotationally invariant noise. Here there is a mistake in the slide (there is a derivative missing), but the idea is for you to see the kind of Onsager corrections involved, compared with those of the Gaussian AMP. And the idea is to take all these estimators, characterize them in the high-dimensional limit with some theory, and then, using these characterizations, compare the performances of all of them.

OK, one quick comment: as I was mentioning before, because in the Bayesian estimator we're using a model in which the noise is assumed to be Gaussian, the log-likelihood used in the Bayesian estimator will be exactly the one we saw before. So it will be exactly the same log-likelihood, the one which is here, that will be appearing. And the good thing is that, because we will be analyzing cases in which we have a uniform prior on the sphere, this expression over here can be interpreted as a rank-1 spherical integral. This is something we will exploit for the high-dimensional characterization.

OK, the first of the main results is the characterization of the measure corresponding to the posterior on the previous slide. There are these two functions, which correspond, in statistical-mechanics jargon, to the magnetization and the overlap; that's why we use the usual letters to designate these order parameters. For these order parameters we have asymptotic values, which are these two functions; note that the limits of these two order parameters depend on the two parameters lambda and lambda star.
Remember, lambda is the assumed signal-to-noise ratio, and lambda star is the real signal-to-noise ratio. So these two values appear in both limits of the order parameters. And using the limits of these two order parameters, you can compute the limit of the mean square error of this mismatched Bayesian estimator. OK, here is a little comment: this R is the Stieltjes transform of the real (limiting spectral) distribution of the noise, and this H-bar is another associated quantity. So in these two expressions, the real distribution of the noise matrix appears through this function and this quantity here.

OK, and a couple of very brief comments about the proofs involved in this high-dimensional formula. One issue we have here, as I told you before, is that because this is a mismatched setting, some analytical tools are lost and cannot be used. For example, the easy relationship between the limit of the log-evidence (the scaled log-partition function, the free energy, whatever is your favorite name) and the MSE, which is the I-MMSE formula, no longer holds as it did in the non-mismatched case. So here, to work around the lack of this relationship, you have to consider a more general setting, which is not exactly the one we had before, but a setting where the noise is the real structured noise, the Z, plus a little Gaussian noise. And the thing you have to do is compute the limiting free energy for this perturbed setting, because this little Gaussian perturbation allows you to access the limiting value of the overlaps, which appeared in the MSE formula I showed before. And the good thing is that, when you consider the setting where the prior is uniform on the sphere, you can compute the limit of the free energy, of the log-partition function that I wrote many slides before, divided by n, in the mismatched setting, by using results on the asymptotic value of rank-1 spherical integrals and on the spectral distribution of rank-1 perturbations of structured matrices. (With respect to the epsilon? Yeah, it's convex. No, but still you have the usual relationship with the derivatives, and you can work it out.) And using this limit for the free energy, you can access the magnetization and the overlap. The magnetization is related to derivatives with respect to the real signal-to-noise ratio, lambda star, and the overlap to derivatives with respect to the parameter epsilon introduced by this perturbation. Of course, because this perturbation is artificial (it's not really in the model we are analyzing), we then have to take the limit of this perturbation parameter going to 0; but all the limits work out, and in the end you have your mean square error formula.

And the second main result we have is the characterization of the wrong Gaussian AMP, wrong in the sense I was describing before, that its Onsager corrections are the ones corresponding to a Gaussian model and not to the real structured noise that the data matrix has. The kind of characterization we have for this AMP is the usual kind: we characterize all the iterates up to time t in terms of asymptotic random variables described by a state evolution recursion. So this is essentially what is written here.
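(Below is a minimal sketch of a "Gaussian AMP" iteration of the kind discussed here: a simple tanh denoiser plus the Onsager memory term that is valid for Gaussian (Wigner) noise. The normalization, the Rademacher signal, the denoiser choice, and the informative initialization are all assumptions of this sketch, not the paper's algorithm; run on rotationally invariant noise instead of GOE, this same iteration becomes the mismatched AMP whose state evolution is characterized in the work.)

```python
import numpy as np

rng = np.random.default_rng(2)
n, lam = 2000, 4.0                      # assumed size and SNR

# Spiked data with a Rademacher signal and GOE noise (the matched case, for
# illustration; swapping in structured noise makes this same AMP mismatched).
x = rng.choice([-1.0, 1.0], size=n)
G = rng.normal(size=(n, n)) / np.sqrt(n)
W = (G + G.T) / np.sqrt(2)              # GOE, spectrum converging to [-2, 2]
Y = (np.sqrt(lam) / n) * np.outer(x, x) + W

# Gaussian AMP: u^{t+1} = Y f(u^t) - <f'(u^t)> f(u^{t-1}), with f = tanh.
# The memory term is the Onsager correction appropriate for Gaussian noise.
u = 0.2 * x + 0.1 * rng.normal(size=n)  # weakly informative start (simulation device)
m_prev = np.zeros(n)
for _ in range(30):
    m = np.tanh(u)
    onsager = np.mean(1.0 - m ** 2)     # empirical average of tanh'(u)
    u, m_prev = Y @ m - onsager * m_prev, m

print("overlap:", abs(np.tanh(u) @ x) / n)
```

(The Onsager coefficient here is just the empirical average of the denoiser's derivative; that recipe is correct only for Gaussian noise, which is exactly the sense in which this AMP is "wrong" under structured noise.)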
And all this is in a specific metric, which is the Wasserstein metric. Well, and a brief description of the proof techniques here: the idea is to write an auxiliary AMP, which has a corrected denoiser that, in a way, corrects for the wrong Onsager term implemented by the Gaussian AMP, but still falls within the theory of orthogonally invariant AMP that can be found, for example, in this work. We go back to that theory to characterize the asymptotic value of this auxiliary AMP, which falls within it. And then, using this characterization of the auxiliary AMP, we prove that the auxiliary AMP is really close to this incorrect Gaussian AMP; we prove a kind of asymptotic bound in L2. And we see that if we can track the auxiliary AMP, essentially we're also tracking the wrong Gaussian AMP.

OK, so we have this asymptotic characterization for the Gaussian AMP and for the mismatched Bayes estimator. And here what we do is use these two theoretical characterizations, which we rigorously built in the work, and compare them with the characterizations of the other estimators we were considering, which I described before. And we have a whole variety of these nice pictures, which show the asymptotic behavior of all these estimators. I'll mention maybe just a couple of them; if you want to really see the simulations and have a more detailed comparison, you can go back to the paper. For example, here we consider a setting in which the noise is not really structured (it's really Wigner), but there is a mismatch only in the signal-to-noise ratio. So this is a milder mismatch than the general mismatch considered in the model we analyzed. And we're taking the assumed signal-to-noise ratio, lambda, to be two times the real signal-to-noise ratio. So, in a way, what is the resulting model? It is a model which gets the form of the distribution right, but is over-optimistic: it is like assuming that you have much more signal in your data than you really have. And this shows a nice behavior in, for example, the Bayesian estimator, which is the dashed green line over here (I don't know how well the colors come out). You see that now the limit of the MSE is not monotonic in the real signal-to-noise ratio. And there is an intuitive interpretation of this. If you're really optimistic and you think you have much more signal intensity in your data, you will be over-estimating the norm that you should give to your estimator. If you think there is much more signal in your data, you will give your estimator a much larger norm. But then there is not so much signal; you put a large norm on your estimator when you should really use a smaller one. So, in a way, you're making your MSE larger than it should be, and that's why you have these little peaks over here.

Another thing that comes from the theoretical results is that there is an interesting relationship between Gaussian PCA and the mismatched Bayesian estimator. In this case, for example, Gaussian PCA does something similar: it has a similar phase transition at which the MSE starts to be decreasing again. You see both of them start increasing at some point, and then they become decreasing again. And these things happen at the same place. This comes from the fact that they are both assuming the same kind of noise distribution.
But they don't have exactly the same values. Still, we will see that in a lot of structured-noise settings, Gaussian PCA and this mismatched Bayesian estimator do essentially the same thing. There is another interesting behavior, which is that this Gaussian AMP is not doing the same as the mismatched Bayesian estimator. So even though this Gaussian AMP is implementing the same model as the posterior you have, the mismatched posterior, they are doing different things. And you can see that, in all the simulations we have, this Gaussian AMP always has a worse performance. And as you can see, in this setting in which you have the Wigner noise and uniform prior, the AMP that implements the correct Onsager corrections and the optimal PCA are doing the same, and they are the best estimators. And that is kind of natural, because these are the estimators that really have the true noise distribution.

Here, this is a more mismatched setting, because we are considering a noise matrix that has a uniform spectrum between these two values. These two values are fixed only so that the eigenvalues in the spectrum of the noise have variance 1, so this is comparable to a standard Wigner matrix. And as you can see, in this setting, Gaussian PCA and the Bayesian estimator match exactly, which was not the case before. So there is a kind of interesting and complex relationship between this Gaussian PCA and the mismatched Bayesian estimator. And again, we observe that the Gaussian AMP is doing worse with respect to all the other estimators, and that it differs from the mismatched Bayesian estimator. And similar behavior is seen if we study the problem in which the real data-generating mechanism has an additive rotationally invariant noise with a Marchenko-Pastur spectrum of aspect ratio equal to 1. In this case, the picture is very similar to the situation in which the mismatch was only in the signal-to-noise ratio, but the relationship between Gaussian PCA and the Bayesian estimator now inverts. Here, you can see that the Bayesian estimator is doing worse than Gaussian PCA, and the locations of the phase transitions no longer exactly match: the mismatched Bayesian estimator starts making worse predictions before Gaussian PCA does.

And another intuition we got from the theoretical results is that, if you plot the overlap of all these estimators in the case in which the prior is uniform on the sphere, you can see that the asymptotic behavior of the overlaps, as a function of the real signal-to-noise ratio, is the same for all of these estimators. So, with this intuition, what does it mean that all these estimators have the same overlap? It means that the angle between the estimator and the real signal you want to infer is the same in all these cases. So, in a way, when the prior is uniform on the sphere, all these estimators behave as if they were PCA, but scaled by some constant, which is different in every case. One important thing that I don't think I mentioned before is that, while the characterization of the mismatched Bayesian estimator was restricted exclusively to the uniform prior on the sphere, the characterization of this wrong Gaussian AMP does not require that. So you can assume some general factorized prior, or a spherical prior; it's OK.
But in the special case in which the prior is uniform on the sphere, you have that all these estimators have the same overlap, which tells you that they are essentially PCA with different adaptive norms. OK. So, to close the talk: we have all these asymptotic characterizations of all these families of estimators in this heavily mismatched setting, in which the statistician completely disregards the structure of the noise. And we see that this heavy mismatch in the model brings complex phenomena into the picture: it changes the relationships between all these estimators, which all coincide in the situation with no mismatch, and produces all these interesting behaviors in different settings. In particular, I think one of the take-home messages is that, in this setting, the Gaussian AMP and the Bayesian estimator, although they are implementing the same model, do not match in the end. The mismatch makes these two things differ. That's it. Thank you for your attention, if you have any questions.

Many questions. So, if no one wants to start, I'll kick in. You told us what the price of ignorance is, but I was curious whether you have any insight about the opposite question, like the price of knowledge. So if I had some underlying model, which is just the standard one with Wigner noise, but for some reason I read your paper, so I say: let me just use the orthogonally invariant AMP. Do you think I would pay a price for that as well, or would it somehow adapt?

You mean the situation in which you're trying to kill a fly with a bazooka?

Exactly.

I wouldn't really know. But that's an interesting question: what happens if you overdo it instead of underdoing it.

So I, the statistician, don't know the model of the data, but let me use the most powerful spectral estimator that I have out there. Do I pay a price if the model is simpler than that?

I guess in that case, I mean, if you really want to use this kind of Bayes-optimal algorithm, the first thing you would need is information about the spectrum. Because if not, you would always be in a mismatched setting, which wouldn't be optimal. But it's still interesting to do.

OK, thanks.

Any other questions? Otherwise, I have other ones. Thank you, Manuel, it was really nice. Please, can you go back to slide 16? 16? Yeah. Yeah, one thing that wasn't clear to me is: how is this z related to the ground-truth noise z?

This is the real ground-truth noise. So what we did is to really analyze a problem, like an auxiliary problem, that you need to analyze theoretically to be able to access this quantity q, the overlap, which is needed to characterize the MSE. And in this case, you have to analyze a kind of more generic problem in which you have the real noise z plus a little component of extra noise, which is Gaussian. But the z is the same as in the model you want to characterize.

Thanks.

You're welcome. Thank you. Please, can you go to the slide where you have PCA, PCA-opt, and this one? No, no, no. The one where you rank which one is the best. OK, the graphs. For example, yeah, all of these settings. Which one? Yeah, you said, if I'm not mistaken, that PCA-opt is the best.
Are you saying that, for example, in feature selection... In a data set, for example, we used to use PCA for dimensionality reduction before using the data for machine learning. In this case, with your research, with your observations, are you saying this is the best we can use?

The best is PCA, but with a scaling constant. Remember that when we defined PCA, we said there are three variants of PCA. There is the real PCA, which... whatever, please. (Now the jet lag is starting to kick in; luckily, it waited until the end of the talk.) Here, you see we have three variants of PCA. One is just PCA, where you simply construct the rank-1 matrix coming from the eigenvector with the largest eigenvalue. And the optimal one is what we call optimal PCA, which has a constant in front, and this constant is a function of the signal-to-noise ratio that has all the information about the spectral distribution of the noise matrix. So this is, in a way, a really smart PCA. And that smart PCA is the one that is optimal, achieving the minimum MSE, which is this one.

Any other? Yeah?

If you don't have rotational invariance, but you still have a formula for the free energy, are you able to do a similar analysis with the AMP in those settings? If you... If you do not have rotational invariance, but, instead of using the formulas for the spherical integrals, you have another way to compute the free energy, are you still able to do the AMP analysis, or is this rotational invariance important when you do the PCA analysis as well?

I'm a bit confused. You mean for the Bayesian estimator? So you don't have a uniform prior on the sphere?

That's right. But let's say you have some other way to compute all of these quantities. Are you still able to do some AMP in the other settings?

Yeah, the results for AMP, which I forgot to mention at the time, don't require rotational invariance of the signal. The signal can have whatever prior that is factorized, or Gaussian. This part is much more general.

Thanks. Can I just ask a quick follow-up? So this result here: the state evolution for this is not Gaussian, is that correct? Or are these Gaussian?

It's a state evolution coming from this auxiliary problem, which has the state evolution of the orthogonally invariant AMP. So it involves all the cumulants, and it's a complex expression.

So these variables here, these x1...

I didn't write everything because it's a mess, you know.

OK, but I just wanted to clarify. So even the x's here are not Gaussian variables in the limit?

The x's, you mean these x's? Yeah, yeah: because they have all these extra terms, they are not.

So none of this... OK. Can I ask another, more basic question? On the very last slide that you have, on page 22, I think it was? 22. This one? Right. So you have this comparison saying that all of them achieve the same overlap. I was wondering, do you also have a simulation of this in a case where the signal is not uniform on the sphere? And then, what does that comparison look like? I'm just curious.

I think we did some simulations of the overlap for AMP. But because we didn't have a comparison with the Bayesian estimator (the result for the Bayesian estimator is only for a uniform prior on the sphere), we didn't plot them. I don't remember; you should ask Marco if you want, he was the one running the simulations.

Maybe I have my last question.
Putting on my physicist's hat: if I understand correctly, all your results are done by computing these spherical integrals explicitly. Now, we know that when we have mismatch, we usually have RSB, from the paper of Fabrizio Antenucci and so on, when we mismatch the likelihood. Do we know here what kind of structure the measure has, like which level of RSB, or whether it is RS?

Because we didn't use any of those techniques (we went directly to the spherical integrals), the structure of the replica symmetry breaking, or its absence, is not clear. But we had some discussion about this, Jean, you remember? And we had this kind of heuristic argument, which might be wrong, that was pointing to the fact that this was not replica symmetric. We never knew for sure. Maybe we can discuss this at some point if you want.

From that paper, at least in the case where you just mismatch the SNR, like in your first plot, we know that, I think, the factor-two case is not inside the AT line. But probably in general, I think, it would not be RS.

We thought about it for a while. We had this heuristic argument, which we were not completely sure was correct, which kind of pointed out that there was a kind of replica symmetry breaking, but with a really simple structure, not the Sherrington-Kirkpatrick kind of replica symmetry breaking. But maybe we can discuss it; I think it's interesting.

OK, thank you. If there are no other questions, let's thank Manuel again.