The second talk of the morning session. So for the second talk, we're going to have Zhou Fan from Yale, who's going to be telling us about orthogonally invariant spin glasses and linear models with invariant designs. The floor is yours, Zhou.

OK, thank you very much, Bruno, for the introduction. I want to start by thanking the organizers for the invitation to come. So I'm not a physicist by training; I think I'm very far from being a physicist. My first exposure to ICTP was a couple of years ago during this Youth in High Dimensions workshop that Jean has been organizing for many years. I think at the time it was organized between Jean, Marco, Sebastian, and Marylou, and it was a wonderful experience for me. I think this Youth in High Dimensions workshop has played an important role in bringing together a lot of young people in this community over the past years. At that time, this was in the middle of COVID, so I wasn't able to come to Trieste in person; I was participating virtually. So I was very happy to receive an invitation to come again for this workshop here, which is, again, organized by Jean, Subhabrata, Pragya, and Manuel. So I'm very happy to be here. Thanks for the invitation.

So the work I'm going to talk about today is joint work with Subhabrata Sen, my friend who's co-organizing this workshop, and with Yufan Li, a PhD student at Harvard Statistics who's working jointly with Subhabrata and Pragya Sur. I guess through my acquaintance with Subhabrata and Pragya, I've had really the great fortune and privilege of being able to work with Yufan on a couple of projects. So I hope that maybe he will be part of one of the future editions of Youth in High Dimensions, and then you'll be able to hear about some of this work directly from him. And then Yihong Wu is my colleague at Yale S&DS, who's a friend and personal mentor of mine as well.

Great. So this talk is going to be about two main models, and they're quite standard models. I'm not going to say too much in the way of motivation; I'll just start by telling you what the models are and what the results are that we were able to prove, and then hopefully I'll have some time to go into some of the high-level ideas of how we do the proofs. So this will be the structure of the talk.

So the two models. The motivating example, the model that we were primarily interested in, was this very classical Bayesian linear model. You have a measurement design that I'll call A; it's m measurements of n regression coefficients. And I'll put ourselves in a Bayesian setting where the unknown signal vector, here denoted x star, has IID entries, and I'll assume that they come from this prior distribution pi. I'll use capital X star to denote also the common law of these signal components. And for the noise, I'll assume that the entries are IID Gaussian. If we write down the posterior distribution of x star given these measurements, it takes this form: you have this quadratic term coming from the likelihood due to the Gaussian noise, and then you have this product of pi, the product prior over the variables. I'm going to use sigma as the dummy variable to denote a sample from this posterior, to distinguish it from the true signal x star, so hopefully that won't be confusing. And then for purposes of statistical estimation, we'd oftentimes be interested, for example, in computing the posterior mean of x star.
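For reference, here is a minimal transcription of the model just described, written out in LaTeX. The notation follows the slides as verbalized above; the unit noise variance is my simplifying assumption, not something fixed in the talk.

```latex
% Bayesian linear model as described above (unit noise variance assumed).
\[
  y = A x_\star + \varepsilon, \qquad
  x_{\star,i} \overset{\mathrm{iid}}{\sim} \pi, \qquad
  \varepsilon_j \overset{\mathrm{iid}}{\sim} \mathcal{N}(0,1).
\]
% Posterior over the dummy variable sigma, and its posterior mean in the
% bracket notation used throughout the talk:
\[
  \mathbb{P}(\mathrm{d}\sigma \mid y) \;\propto\;
  \exp\!\Big(-\tfrac{1}{2}\,\lVert y - A\sigma\rVert^2\Big)
  \prod_{i=1}^{n} \pi(\mathrm{d}\sigma_i),
  \qquad
  \langle \sigma \rangle \;=\; \mathbb{E}[\,x_\star \mid y\,].
\]
```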
So in this notation, it would be the ensemble average, or the Gibbs average, of this vector sigma over this distribution. And I'll use this bracket notation throughout the talk to denote this kind of ensemble average. The kinds of questions that we're interested in are asymptotic questions of the following form. As m and n, the number of measurements and the number of signal components, go to infinity, what is the limit of the mutual information between the unknown signal vector x star and your observation vector y? If I were to compute this posterior mean estimator and look at its mean squared error, what is the limit of this mean squared error in this asymptotic limit as m and n go to infinity? And then, for the purposes of doing variational Bayesian inference, if I want to actually compute an approximation to this mean, can we write down a system of mean-field equations that characterize this posterior mean? So these are the kinds of questions that I hope to be able to talk about.

The second model: in order to study these questions, we started our work, I guess, a couple of years ago by studying a simpler model that has a similar character. So the second model I want to talk about is a spin glass model. It's the model that you see here, where the Hamiltonian of the model is quadratic. The vector sigma here I'll just assume consists of plus-one/minus-one Ising spins, rather than having a general prior. And the Hamiltonian consists of two terms. One is this quadratic couplings term defined by this couplings matrix J, a symmetric n by n matrix, scaled by an inverse temperature parameter beta that's positive. And then there's a linear term, the external field, which is defined by this vector h. Okay, and throughout the talk, just for simplicity, let me assume that the coordinates of this external field h are IID, coming from some common distribution, and that common law I'll denote by this variable capital H, okay? And again, we might be interested in computing something like the mean of this distribution. This is what I'll refer to as the magnetization, and it's the ensemble average of this draw sigma from this measure. And I should say, if you have any questions during the talk, please do feel free to interrupt as we go.

Okay, and the questions that we hope to be able to think about in this model are, again, analogous. So for certain models of the random couplings matrix J, which I'll clarify in a moment, asymptotically as n goes to infinity, what is the first-order limit of the normalized free energy, one over n times the log partition function? And if I want to write down a system of mean-field equations for the mean or magnetization of this distribution, what would these equations look like, okay? And let me just clarify right off the bat that the results I'm going to talk about in this talk are restricted only to a high temperature regime of this model; we're not going to talk about any low temperature phenomena.

Okay, so for both of these models, these questions are quite well studied and quite well understood in the settings where your disorder matrix has independent entries, right? So for example, if I start with the second of these two models, the spin glass model, and I consider the case where the coordinates of this couplings matrix J are IID Gaussian, let's say scaled with variance one over n, then this is the standard SK model.
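And in the same spirit, a hedged transcription of the spin glass model; the exact normalization of the quadratic term on the slides may differ from the standard convention I write here.

```latex
% Spin glass model as described above (standard convention assumed).
\[
  H(\sigma) \;=\; \frac{\beta}{2}\,\sigma^\top J \sigma \;+\; h^\top \sigma,
  \qquad \sigma \in \{-1,+1\}^n, \qquad
  h_i \overset{\mathrm{iid}}{\sim} H,
\]
\[
  \mathbb{P}(\sigma) \;=\; \frac{e^{H(\sigma)}}{Z}, \qquad
  Z \;=\; \sum_{\sigma \in \{-1,+1\}^n} e^{H(\sigma)}, \qquad
  F_n \;=\; \frac{1}{n}\log Z, \qquad
  m \;=\; \langle \sigma \rangle .
\]
```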
And in the high temperature regime, in other words for small enough positive beta, it's understood that many properties of this model are characterized by a single scalar overlap parameter, which I'm denoting here by q star. This parameter solves the fixed point equation: q star is the expectation of the squared hyperbolic tangent of h plus beta root q star times z. Here h, if I remind you, is the common law for the entries of the external field vector, and z here is a standard Gaussian variable that's independent of h, okay?

Okay, so in high temperature regimes, what's understood is that the free energy, one over n log Z, has an almost sure limit that's predicted by this replica symmetric kind of formula, and the magnetization vector approximately satisfies a system of mean-field equations that go by the name of the TAP or Thouless-Anderson-Palmer equations. I'm writing "approximately equal" here, and I'll make a bit more precise what this approximately equals means later when I state a result.

And okay, so as for the kinds of mathematical tools that have been used to prove these results over the years, there have been several different techniques. Perhaps most notable is a combination of an interpolation argument together with Stein's Gaussian integration by parts lemma, which was introduced by Guerra and further developed by Talagrand. And to prove the TAP equations, you can use this in the context of a cavity method, which was also made rigorous by Talagrand. And there have been a few other techniques based around studying a dynamical evolution of this Hamiltonian: embedding the disorder in a Brownian motion that reaches this eventual couplings matrix J, and then using stochastic calculus to understand the dynamics along this evolution. But what I want to emphasize here, maybe, is that a lot of these techniques rely quite strongly on the assumption that the entries of this couplings matrix are independent, right? They use this independence in a crucial way when constructing these cavity fields and doing these analyses. Okay, and, to give a little teaser, the models that I'll talk about are rotationally invariant models, where we don't have this independence.

Okay, so in the Gaussian linear model, very analogous results have been developed. So if I look at this Gaussian linear model and assume that the entries of this measurement design are again IID Gaussian, with the variance here I'm calling beta over n, then in the limit as m and n go to infinity proportionally, with the ratio converging to alpha, many properties of this model are characterized by some equivalent scalar channel, or single-letter characterization, right? So the scalar channel is of the following form. You have just one single observation, one single draw from this prior distribution, which I'm calling capital X star, and then you observe this in a Gaussian channel. So it's corrupted with some Gaussian noise, and the signal-to-noise ratio parameter of the scalar channel I'm denoting here by gamma star. Yeah, so you observe this noisy observation of X star, which I'm calling here Y, and then the characterization is that this gamma star solves a system of two fixed point equations. You have this eta star inverse, which is the expected posterior variance of estimating X star in the scalar channel; in other words, it's the Bayes-optimal MMSE in this scalar observation model. And then gamma star is related back to eta star via these two parameters alpha and beta, right?
The variance of your measurement entries, and the aspect ratio m over n. Okay, and what's understood about this is that if I look at the limiting Bayes mean squared error in the original linear model, it coincides with the Bayes MMSE in the scalar channel, which is eta star inverse. The mutual information between the signal vector X star and Y in the linear model has a limit that's related to the scalar channel mutual information in some simple way. And then the posterior mean vector in this Bayesian linear model satisfies a system of mean-field equations that you can think about as perhaps analogous to the TAP equations.

And again, a lot of the tools that have been used to prove these kinds of things over the years, actually many of them developed by Jean, from whom I learned a lot of this literature myself, are extensions of these interpolation ideas of Guerra, notably this adaptive interpolation method of Jean that can achieve both upper and lower bounds. In this context, it's curious that there's an additional body of tools that seems available for these information-theoretic models but doesn't seem available in pure spin glass contexts. So we have the I-MMSE relation between the mutual information and the Bayes-optimal mean squared error. Then there are algorithmic arguments that you can do using AMP. And then there are arguments around the so-called area argument of integrating the I-MMSE relation. These ideas go back as far as Montanari and Tse in 2006 for analyzing these kinds of models.

Okay, so what I want to talk about in today's talk: back a couple of years ago, we started to think about these kinds of models in settings where the entries of the disorder, or in the regression context the design, are not IID, they're not independent. We were motivated by these kinds of questions because we wanted to apply some of this high-dimensional mean-field theory to problems of statistical inference with real data. And it's sort of clear in a lot of examples of real data that the spectral properties of the matrices we're looking at are very far from what is described by IID kinds of matrices. And our motivations, I think, were quite well explained in the talk by Rishabh a couple of days ago: the hope is that if you have a theory of these results around rotationally invariant models, then these results would be valid for a universality class of matrices that extends beyond the universality class of IID Gaussian matrices.

Yeah, so the models that I'm actually going to talk about today: for the spin glass context, it'll be this orthogonal SK model. It looks the same as the previous SK model, except that rather than assuming that the entries of J are IID Gaussian, I'll consider a couplings matrix J that's orthogonally invariant in law. What I mean by this is just that the eigenvectors of J are Haar uniform, distributed on the orthogonal group independently of the eigenvalues. And then asymptotically, as n goes to infinity, I'll assume that the empirical spectral distribution of this matrix J converges weakly, and in what I'll call a strong sense, to a compactly supported limit law D. By strongly, I just mean that the upper and lower endpoints of the support converge to the endpoints of the support of this limit law. And again, okay, so the theory around these things, I guess from a rigorous perspective, is less developed, and much more of what I'm going to say is conjectural at the moment.
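To make this orthogonally invariant ensemble concrete, here is a minimal numerical sketch (my own illustration, not code from the talk): it builds J with Haar-distributed eigenvectors and an arbitrary prescribed spectrum, so the empirical spectral law can be any compactly supported D you like.

```python
import numpy as np

def haar_orthogonal(n, rng):
    """Sample O ~ Haar on the orthogonal group O(n) via QR of a Gaussian matrix."""
    Z = rng.standard_normal((n, n))
    Q, R = np.linalg.qr(Z)
    # Fix column signs so that Q is exactly Haar distributed (raw QR is not).
    return Q * np.sign(np.diag(R))

def orthogonally_invariant_J(eigs, rng):
    """Couplings J = O^T diag(eigs) O: Haar eigenvectors, prescribed spectrum."""
    O = haar_orthogonal(len(eigs), rng)
    return O.T @ (eigs[:, None] * O)

rng = np.random.default_rng(0)
n = 500
# Any compactly supported limit law D works; e.g. uniform on [-1, 1].
eigs = rng.uniform(-1.0, 1.0, size=n)
J = orthogonally_invariant_J(eigs, rng)
print(np.allclose(J, J.T))  # symmetric by construction
```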
So what's believed should be true is that, again, a lot of the characteristics of this model are characterized, in the high temperature regime, by a single scalar overlap parameter q star. It solves a system of fixed point equations that you see here, where entering into the system is the derivative of the R-transform of this limiting spectral law D. And if you want to understand the limiting free energy, you can compute this using the replica method. This computation was done by Marinari, Parisi, and Ritort in '94. I'll show a formula for this on the next slide, but the free energy will be characterized by the solution of this fixed point system, q star and sigma star. And then you can also derive a system of TAP equations, or TAP-type equations, that characterize the magnetization in this model. I'm aware of two quite distinct derivations of these equations in the literature. One is a high temperature expansion approach, I guess building on the approach of Plefka, by Parisi and Potters in '95. And the second was done by Opper and Winther in this adaptive TAP, cavity-method kind of approach, where they introduced a new idea to close the system of means and variances for the parameters of the cavity fields.

In terms of rigorous results, I guess before we started looking into this model, the literature was very sparse. So my collaborator Subhabrata Sen, with Bhaswar Bhattacharya, had shown the following: if you're in this model without an external field, then in the high temperature regime, they proved rigorously the limit of the free energy, of the log partition function. In the setting with no external field, the model has a special symmetry that causes the annealed free energy to coincide with the quenched one. So in this case, you don't need the full replica method to compute the free energy, right? You can just do a second moment calculation. And this was made rigorous using a spherical integral analysis by Bhattacharya and Sen.

Okay, so our result in this setting, and I guess this goes back a couple of years now, is the following. For all values of beta that are sufficiently small, where "sufficiently small" depends on the limiting spectral law of your couplings matrix: first of all, the fixed point (q star, sigma star) of that equation system is unique; and then the limiting free energy converges to this replica symmetric prediction that was computed by Marinari, Parisi, and Ritort, and the equation is the one you see here. I think, if I understand correctly, the full conjecture might be the following: this result should hold in a full high temperature regime that's defined by this pair of two conditions. The first is an analog of the AT condition for the SK model, translated into this orthogonally invariant context. And the second condition here is that this argument, beta times one minus q star, belongs to the domain of the R-transform that arises in this model.

And then on the side of TAP equations, this is with Subhabrata and Yufan, we were able to show the following, again for sufficiently small beta, where "sufficiently small" depends on the limiting distribution D. The magnetization of this model indeed satisfies the TAP equations that were predicted by Parisi and Potters, in a kind of L2 sense: the mean squared deviation across coordinates, between the magnetization and what's predicted by the TAP equations, goes to zero as n goes to infinity. In the rotationally invariant linear model, there's a similar kind of picture.
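Before turning to the linear model: since the R-transform of the limit law is what enters all of these fixed-point systems, here is a hedged numerical sketch of evaluating it from a sample of eigenvalues, using only the standard definition R(w) = G^{-1}(w) - 1/w, with G the Stieltjes transform. This is my illustration under that standard convention, not something from the talk; note the domain restriction in the code, which is exactly the flavor of the second conjectured high temperature condition above.

```python
import numpy as np
from scipy.optimize import brentq

def stieltjes(z, eigs):
    """G(z) = mean(1 / (z - lambda)), valid for z above the top eigenvalue."""
    return np.mean(1.0 / (z - eigs))

def r_transform(w, eigs):
    """R(w) = G^{-1}(w) - 1/w, solving G(z) = w for z by bisection.

    Only defined for 0 < w < G(lambda_max+); outside this domain the
    inversion fails, which is the kind of domain condition in the talk.
    """
    lam_max = eigs.max()
    # G is decreasing on (lambda_max, infinity), from very large down to 0.
    z = brentq(lambda z: stieltjes(z, eigs) - w, lam_max + 1e-9, lam_max + 1e6)
    return z - 1.0 / w

# Sanity check on the semicircle law, where R(w) = w exactly.
rng = np.random.default_rng(0)
n = 2000
G = rng.standard_normal((n, n))
eigs = np.linalg.eigvalsh((G + G.T) / np.sqrt(2 * n))  # semicircle on [-2, 2]
print(r_transform(0.2, eigs))  # approximately 0.2
```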
So if I consider this linear model, where now the entries of the design matrix A are not IID Gaussian, let me instead assume that A transpose A, as a symmetric matrix, is orthogonally invariant in law. And it's quite natural to look at A transpose A, because if you think about the Hamiltonian in the Bayes posterior distribution, the quadratic term is exactly defined by minus A transpose A, right? So I assume that this is rotationally invariant in law, and that its spectral distribution again converges weakly and strongly to a limit that I'll denote by D squared, as m and n go to infinity. And again, there's conjecturally a single-letter characterization of this model. So if I denote now by R(z) the R-transform of minus D squared, then there's an equivalent scalar channel that supposedly characterizes this model, where the SNR parameter gamma star of the scalar channel solves this extended system of two fixed point equations, and you see the R-transform of this limiting spectral law again appearing in the definition of this fixed point system.

Okay, so for this model, you can again compute the limit of the free energy, or the limiting mutual information, using the replica method, and you get a replica symmetric formula that I'll show on the next slide. I think this was first done by Takeda, Uda, and Kabashima in '06, and for other types of priors this was extended by Tulino, Caire, Verdú, and Shamai. The posterior mean in this model you can again characterize by a system of TAP-type mean-field equations, and there have been many different derivations of, or approaches to deriving, these equations in the literature. One of them is the expectation consistency framework that was developed by Opper and Winther, and I think for this model the computation was carried out by Kabashima and Vehkaperä. You can derive these through the vector AMP iterations developed by Rangan, Schniter, and Fletcher. And where I learned a lot of this literature from was this nice paper by Maillard, Foini, Lage-Castellanos, Krzakala, Mézard, and Zdeborová in 2019. This was a paper in the Journal of Statistical Mechanics where they connected a lot of these different frameworks and showed that you can also derive these TAP equations by an extension of the high temperature expansion approach of Parisi and Potters.

Okay, and again, on the side of rigorous results, much less is known about this model. There was a nice result back in 2018 by Barbier, Macris, Maillard, and Krzakala, who proved that this mutual information indeed converges to this replica symmetric formula in the case where the design matrix A factorizes as a product of individual matrices, each of them having IID entries, with the last one having IID Gaussian entries, so that A is rotationally invariant in law. Their proof used an adaptive interpolation argument, doing the interpolation on this last Gaussian matrix. And then, somewhat related to our work, back a couple of years ago there was work of Gerbelot, Abbara, and Krzakala, who studied not the Bayesian model that I'm talking about today but a regression problem with a convex regularizer. So they were looking at this in the context of convex empirical risk minimization rather than Bayesian inference, and they were able to rigorously establish a replica prediction for the limiting error of these least squares estimators with convex penalties. Okay, so for this model, the result that we were able to show so far is the following.
So suppose that you have this prior distribution for the coordinates of your regression vector; it's mean zero, and then we have a technical condition on the prior that includes the following two cases. One is if it has compact support on some bounded interval. The other is if it has a density over the entire real line, and this density is strongly log-concave on the entire real line. There is a more tedious and technical general condition that I won't elaborate upon in this talk today. But so the result is the following: then, for some beta naught, if the support of this limiting spectral distribution of A transpose A, the support of this variable D squared, is contained in an interval of length at most beta naught. And you should think about this as a sort of high temperature condition after you center and rescale the prior, or sorry, after you center and rescale the spectral distribution. Then we're able to show the following: the fixed point of the system is indeed unique; the mutual information has a limit that is predicted by this replica symmetric formula, and it's the formula that you see here, involving the mutual information of the scalar channel; the Bayes mean squared error has a limit that coincides with the mean squared error of the scalar channel, which is just this parameter eta star inverse; and then the posterior mean of this model does satisfy a system of TAP-type equations, the one that you see here. And again, we prove this in a sort of L2 sense, so the average squared discrepancy goes to zero.

Okay, and I think the full conjecture here... I mean, this is a Bayesian model with a correctly specified prior, so it's believed to be replica symmetric; in fact, it's known by quite generic arguments that the overlap does concentrate. And so I think the full conjecture here is that these results should hold without this kind of high temperature condition on the spectral support, that we shouldn't need this condition that the support of D squared is contained in a small interval. But this is open. Okay, let me pause here and see if there are any questions before I move on.

Okay, if not. So for the rest of the talk, what I want to do is just to go through some of the sort of high-level, bird's-eye view of the proof ideas behind these results, how we prove these things. Let me start with the free energy in the SK kind of model. Okay, underlying all of our results, we're using this argument, or this technique, that was developed by Bolthausen for the SK model somewhat recently, I think in 2018: this idea of doing a conditional analysis of the free energy, conditioned on the iterations of an AMP or TAP kind of algorithm for computing a fixed point of the TAP equations. So in the SK model, the argument looks like the following, and this is some rough summary, I think, of what was done by Bolthausen. You consider these TAP or AMP iterations for solving the TAP equations in this model; for the SK model, it'll look like this. And this matrix J, the couplings matrix, in SK would be GOE, so if I want, I can write it as Z plus Z transpose, where Z is a (non-symmetric) matrix with IID Gaussian entries. And then, up to some fixed iteration of this algorithm, which I'm calling here little t, you define the sigma-field, or the filtration, that's generated by the iterates of this algorithm: these iterates m, and these iterates multiplied by Z and Z transpose.
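In symbols, the filtration just described is roughly the following; this is my transcription of the verbal description, so the precise list of generators (in particular whether h is included) is an assumption on my part.

```latex
% Conditioning filtration after t iterations with iterates m^1, ..., m^t
% (my notation, paraphrasing the construction described above):
\[
  \mathcal{F}_t \;=\; \sigma\Big( h,\; m^1, \dots, m^t,\;
    Z m^1, \dots, Z m^t,\; Z^{\top} m^1, \dots, Z^{\top} m^t \Big).
\]
% Conditionally on F_t, the law of Z in the unexplored directions remains
% Gaussian and independent of everything observed so far.
```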
Yeah, and you consider this generated filtration. And the idea of Bolthausen's argument is that if you compute the first and second moments of the partition function, not unconditionally, but conditioned on the filtration generated by this algorithm, then they're going to coincide with the replica prediction: the replica symmetric formula and two times the replica symmetric formula, at exponential scale. And then, because this free energy one over n log Z concentrates exponentially around its mean, this is enough to imply that, unconditionally, the free energy converges to this replica symmetric prediction.

And maybe some amount of intuition about this construction might be the following, right? If you have a model without the external field, I guess the reason why the second moment agrees with the first moment is because the mean or magnetization of the model is fixed at zero. If you do the second moment analysis, it's going to depend on this overlap order parameter, and without the external field, this order parameter is maximized at q star equals zero. So this is the point where your second moment is going to coincide with the first moment. Once you have an external field, the q star parameter is no longer zero. But what conditioning on these TAP iterations allows you to do is essentially to recenter the model around where the magnetization should be. And if you look at two replicas centered around that point, then in the second moment calculation, they should be orthogonal, and so you can recover this second moment kind of matching. And there was a related idea that was used, I think independently, by Ding and Sun for analyzing a different model, the Ising perceptron, that I won't get into today.

Okay, so at a bird's-eye view, what we do to prove some of these results for the orthogonally invariant SK model is to carry out a similar strategy, but using a system of TAP iterations that are tailored to this rotationally invariant kind of couplings, right? This was an algorithm that was introduced by Çakmak and Opper in 2019. The structure of the algorithm, I think, is motivated by these vector AMP kinds of algorithms that were developed for linear models, and it's the algorithm that you see here. It's defined via a resolvent of this couplings matrix J. The specific form of the algorithm I don't think will be too important for what I'm going to talk about, so you can ignore the form here if you would like. And then this hyperbolic tangent of h plus y t is the vector that's conjecturally supposed to converge to the magnetization of the model; I'll come back to this word "conjecturally" a little bit later in the talk.

Okay, so from this algorithm, we can use a similar strategy of defining a filtration, or a sigma-field, that's generated by the iterates of this algorithm, right? So in this J there are two copies of this orthogonal matrix O, and we decouple those two copies. You have these iterates x, and you're multiplying by this orthogonally invariant matrix. So we condition on x; we condition on one copy, O multiplied by x; and then we condition on y, which involves the second copy, O transpose, multiplied by lambda times O x, right? And after conditioning on all of these things, what we can then show is the following: if you compute the first and second moments of the partition function conditioned on this filtration, then again, you recover the replica symmetric predictions, I_RS and twice I_RS, okay? Okay, so let me explain a little bit how this computation goes.
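Before the computation, it may help to state the conditioning fact this all rests on. This is the standard lemma used in conditional analyses of AMP, written in my own notation rather than the talk's.

```latex
% Conditional law of a Haar orthogonal matrix given linear constraints.
% Let O be Haar on O(n), and condition on OA = B, where A, B are n x k
% matrices with full column rank and A^T A = B^T B. Writing A = Q_A R and
% B = Q_B R for thin QR decompositions, and Q_A^perp, Q_B^perp for
% orthonormal completions of the bases:
\[
  O \,\big|\, \{OA = B\}
  \;\overset{d}{=}\;
  Q_B Q_A^{\top} \;+\; Q_B^{\perp}\,\widetilde{O}\,(Q_A^{\perp})^{\top},
  \qquad
  \widetilde{O} \sim \mathrm{Haar}\big(O(n-k)\big).
\]
% The first term is the deterministic rotation taking A to B; the second is
% an independent reduced Haar rotation on the orthogonal complement.
```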
Yeah, I think I do have time for this. So what happens is, when you condition on these iterates, what you're conditioning on are linear events in O, right? You're conditioning on O x being some fixed vector s, and then O transpose lambda s being some other fixed vector y. So what you're really conditioning on is a linear conditioning event for this Haar matrix O, right? That O times some n by 2t matrix here is equal to some other matrix, yeah. And this linear conditioning event is easy to characterize, and this is an idea that's been used a lot in the analyses of AMP algorithms for these kinds of models as well: you have an explicit description of what this conditional Haar orthogonal matrix should look like, right? If I condition on the event that O A equals B, then the matrix has to rotate A to B, so there'll be a deterministic rotation that moves A to B. And then on the space orthogonal to A, it still behaves as a sort of Haar, independent random rotation in n minus k dimensions. So you can express it in terms of a reduced Haar matrix that's of size n minus k by n minus k, and you have this explicit representation.

And now, let me just do the first moment as an illustration. If I want to compute the first moment of the partition function conditioned on this filtration: inside the Hamiltonian, I have this O transpose D O multiplied by sigma on both sides. If we apply the conditional law of O, then there's going to be a component that's deterministic, which projects sigma onto the span of A (these are the AMP iterates), and then there's a component orthogonal to that which remains, you know, that still has this n minus k dimensional random rotation. And what you're faced with is the task of computing the expectation of this exponential that involves a quadratic expression in O tilde.

Okay, and this you can compute using a spherical integral. So we formalize the following spherical integral lemma: if I want to evaluate, asymptotically, the expectation of an exponential of this kind of quadratic-looking expression in O, then you have this explicit expression involving an infimization over a scalar parameter gamma. This integral is a sort of HCIZ-type integral, right? If you consider the case where there's no linear term, so you don't have B and you just have the quadratic term, then this is exactly a rank-one HCIZ integral, and the evaluation takes a more explicit form: it would be some integrated R-transform. This is a result that was shown by Guionnet and Maïda in '05, and the extension to this kind of integral here uses the same ideas, right? So the way you prove this result is: this vector O a is uniform on some sphere, so you can represent it as a Gaussian vector divided by its norm, and then you can do a large deviations analysis on that Gaussian vector to get this result.

Okay, so if you do this evaluation, then what you end up getting is an expression for this conditional first moment where, inside the exponent, you have some functional of the empirical measure of all of these AMP iterates, or these TAP iterates, that you've seen thus far, together with the empirical measure of the eigenvalues D of your couplings. And the last ingredient of this analysis is a formal state evolution for these TAP iterations, right?
So as n goes to infinity, the joint distribution of the entries of all of these AMP iterates, together with the entries of the external field and the entries of this diagonal matrix representing the eigenvalues of your couplings, converges weakly, almost surely, to some joint limit law. And there's an explicit characterization of this law, which I'm also not stating on this slide. But if you pass to this limit and then apply a large deviations analysis, what you get very naturally is a variational formula that characterizes this conditional first moment, right? So if I fix an iterate t, then as n goes to infinity, I get this variational representation of the conditional first moment. And the variational representation depends on a couple of things. There are order parameters; the number of order parameters scales with t, and these order parameters represent the inner product of sigma with your external field, together with the inner products of sigma with all of your TAP iterates up to that point. There are order-t TAP iterates, so you get these order-t order parameters. And then, when you do the large deviations analysis, it involves the cumulant generating functions of these order parameters. So you get these dual variables that you're now infimizing over; these are capital U, V, W, corresponding to these order parameters, lowercase u, v, w. And then you have this additional infimum over this gamma that came from the spherical integral. So you have this kind of characterization, and the rest of the analysis is to just try to analyze, somehow, this low-dimensional variational problem.

And the following things are reasonably straightforward to see. It's not hard to guess, I think, what is the relevant fixed point of this variational problem. If you guess that fixed point and evaluate it, you'll see that in the limit as t, the number of AMP iterations, goes to infinity, this gives you the replica symmetric prediction. It's also not hard to show a lower bound: if I fix these order parameters u, v, w at their correct values, then the infimum over all of the other variables turns out to be convex, and because of this convexity, the optimum is actually achieved at the correct values also for these dual parameters. From this argument, you get a lower bound for this conditional first moment. The part that's more difficult is to show an upper bound, to show the optimality of this fixed point. And for this analysis of the upper bound, this is where we're using, very crucially, the high temperature assumption on the model: we have this ansatz for what the global optimum is, and then we have a particular, sort of crude, specialization of the inner dual parameters in terms of the outer order parameters, so that the resulting function is actually concave in your order parameters. And this you can only achieve under some high temperature assumption in the analysis. But with this concavity, we can also show that this value I_RS is an upper bound, and combining these analyses, we get that the conditional first moment converges to this replica symmetric prediction. The analysis of the second moment is completely analogous, so I'm not going to talk about it. Okay, let me pause again to see if there are questions. Okay, so let me maybe go through some of the ideas, at an even higher level, for the other parts of this.
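Since everything in what follows revolves around TAP iterations converging (or not) to the magnetization, here is a minimal sketch of the simplest instance, a Bolthausen-style iteration for the plain SK model. This is my illustration under standard conventions (the initialization and Monte Carlo fixed-point solver are my choices); the Çakmak-Opper iteration actually used for the orthogonally invariant model replaces the multiplication by J with a resolvent of J and is more involved.

```python
import numpy as np

def solve_q_star(beta, h_sample, n_iter=200):
    """Fixed point q* = E tanh^2(H + beta*sqrt(q*) Z), by Monte Carlo iteration."""
    rng = np.random.default_rng(1)
    z = rng.standard_normal(len(h_sample))
    q = 0.5
    for _ in range(n_iter):
        q = np.mean(np.tanh(h_sample + beta * np.sqrt(q) * z) ** 2)
    return q

def sk_tap_iteration(J, h, beta, q_star, n_iter=50):
    """Bolthausen-style TAP/AMP iteration for SK:
        m^{k+1} = tanh(h + beta*J m^k - beta^2 (1 - q*) m^{k-1}).
    The last term is the Onsager correction; fixed points solve the TAP equations.
    """
    m_prev = np.zeros_like(h)
    m = np.tanh(h)
    for _ in range(n_iter):
        m, m_prev = np.tanh(h + beta * (J @ m) - beta**2 * (1 - q_star) * m_prev), m
    return m

rng = np.random.default_rng(0)
n, beta = 1000, 0.2                 # small beta: the high temperature regime
G = rng.standard_normal((n, n))
J = (G + G.T) / np.sqrt(2 * n)      # GOE couplings, i.e. the SK model
h = 0.3 * np.ones(n)                # constant external field, H = 0.3
q_star = solve_q_star(beta, h)
m = sk_tap_iteration(J, h, beta, q_star)
print(q_star, np.mean(m**2))        # self-overlap of the iterate is close to q*
```

At small beta the iterates converge quickly, and the self-overlap of the limit matches q star, which is the sense in which the iteration computes a TAP fixed point.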
So for the TAP equations, it's a little bit curious, I guess: okay, we use this algorithm, we condition on the iterates of this algorithm, and this algorithm is supposed to compute the magnetization; and I guess the reason why this method works is because it's an approximation to the magnetization. But it's a little bit curious, somehow, that this argument itself doesn't prove the TAP equations for the magnetization. So this required an additional idea. And okay, the idea is the following. If I go back to these TAP iterations of Çakmak and Opper: by design, these iterations will converge, at sufficiently high temperature, to a solution of the TAP equations, so this is known; and in fact, this convergence holds in the full conjectured high temperature region. This was shown by Çakmak and Opper. However, what we didn't know rigorously was that these iterations also converge to the magnetization. So in order to show that the magnetization satisfies the TAP equations, it would be sufficient to show that this algorithm indeed converges to the magnetization of the model. In the SK model, there was a result showing that Bolthausen's TAP iterations converge to the magnetization. This was done by Wei-Kuo Chen and Si Tang, and their argument used an equivalence of Bolthausen's TAP iterations with a sort of cavity iteration where you remove one additional spin at every iterate. And here, as far as I know, there's not a cavity method interpretation of this algorithm. So we show this convergence using a different, geometric idea that's related to this conditional second moment analysis I was just talking about.

And the idea is the following. We consider a replicated system of capital N replicas; you should think about capital N as being some fixed but large number. And you consider the restriction of this system of replicas to a band that's centered around m t, some large iterate of these TAP iterations. So in this space, you have this iterate m t, and you consider the band of the hypercube consisting of vectors sigma for which sigma minus m t is orthogonal to m t. And then, between the replicas themselves, we also impose the restriction that they're pairwise orthogonal to each other. The intuition for this restriction is that it's believed that these kinds of configurations of sigma make the dominant contribution to the free energy, that everything else is asymptotically negligible. And this is an idea that's inspired by analyses of low temperature regimes of SK due to Subag and to Chen and Panchenko. And maybe I can make a small caveat: here, we're not centering by the actual magnetization of the model; we're centering this band at the iterate of the TAP algorithm, which is what we're trying to show coincides with the magnetization. So the definition of this band is centered around m t, not the magnetization. Okay, and then, if you extend a little bit this kind of conditional moment analysis, you can show the following: indeed, this restricted partition function, appropriately normalized by the number of replicas, converges to the same replica symmetric limit as the unrestricted partition function. And you can do this using a variation of the conditional moment analysis that I was describing.
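In symbols, the restricted object is roughly the following; this is my paraphrase of the construction, with the exact epsilon-tolerances of the constraints left on the slides.

```latex
% The replicated band restriction, in rough form. Here m^t is the t-th TAP
% iterate and sigma^1, ..., sigma^N are the N replicas.
\[
  \mathrm{Band}(m^t) \;=\;
  \Big\{ \sigma \in \{-1,+1\}^n :\;
    \big\langle \sigma - m^t,\, m^t \big\rangle \approx 0 \Big\},
\]
\[
  Z_{\mathrm{restr}} \;=\;
  \sum_{\substack{\sigma^1, \dots, \sigma^N \in \mathrm{Band}(m^t) \\
    \langle \sigma^a - m^t,\, \sigma^b - m^t \rangle \approx 0,\; a \neq b}}
  \prod_{a=1}^{N} e^{H(\sigma^a)},
  \qquad
  \frac{1}{nN}\log Z_{\mathrm{restr}} \;\to\; \text{same RS limit as } \frac{1}{n}\log Z .
\]
```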
And what this means is that if I look at the Gibbs measure of this N-fold replicated system, the probability that these replicas actually belong to the restricted band is better than exponentially small in n: in the exponent, the term that's linear in n vanishes as you take more and more AMP iterations, and then there's maybe some sub-exponential term. And this is the guarantee that you get about the Gibbs measure.

And then the proof is concluded in the following way. On the event where these capital N replicas all belong to this band, it's just a deterministic consequence that their average is going to be close to the center of the band, which is this vector m t. On the other hand, if I look at the unrestricted system of replicas at sufficiently high temperature, their average is going to concentrate around the true magnetization of the model. This you can show using a recent log-Sobolev inequality for these high temperature spin systems that was developed by Bauerschmidt and Bodineau, and this concentration is exponentially good in n. If you combine these two things, what this implies is that this center of the band, m t, indeed has to coincide with the magnetization, for all sufficiently large iterates t and all large N. And this shows that the algorithm converges to the magnetization. And because the algorithm converges to a fixed point that satisfies the TAP equations, it shows that the magnetization satisfies the TAP equations. So this is the argument for TAP.

Let me take a quick check on time. Do I have, like, five minutes, roughly? Okay, so in five minutes, let me maybe talk about some of these ideas in the linear model. To analyze the linear model, we extend this analysis using a similar kind of approach. In the linear model, we have this model y equals A x star plus noise, and there's this equivalent scalar channel, which is defined by this pair of two fixed point equations. And we also consider a TAP iteration for solving the conjectured TAP equations in this model; these iterations are what are commonly known as vector AMP in the literature. These are due to Rangan, Schniter, and Fletcher, to Keigo Takeuchi, and to Ma and Ping. I've massaged the form of the vector AMP iterates into this form, and again, I guess the precise form is maybe not too important for what I'm going to say. But you see that this form is analogous to the TAP iterations for the orthogonally invariant SK model that I showed earlier: it depends on a resolvent of this matrix A transpose A. Okay, and again, by design, these iterates converge to a solution of the conjectured TAP equations that you see here.

Okay, so what we do is to carry out a similar conditional second moment analysis. For technical reasons, because we're considering cases where the prior might have unbounded support, it's helpful to restrict the squared L2 norm of sigma to some compact set. So we consider a truncated partition function Z for this model, truncated to this set U. And again, we have the following computation: if you compute the conditional first and second moments of this partition function, for a sufficiently large truncation set, then you get the replica symmetric prediction for the free energy, and twice the prediction. Okay, for the second moment, actually, in this work we only proved an upper bound; we were too lazy to prove the lower bound, because you don't need it in the argument.
But I think you can also prove a lower bound if you really want. And then, as a consequence, the unconditioned free energy converges to this replica symmetric prediction. And this truncation is easy to remove: if you truncate to a ball that's sufficiently large, it's not hard to show that the contribution to the free energy from outside this ball is negligible, so you don't have to worry about this. Okay, so this immediately implies the limit of the mutual information, because there's a very simple relation between the mutual information and the free energy.

And then, to understand the Bayes-optimal MSE, you can do the following. So if I restrict this partition function to the set of vectors whose... so, let me remind you, this restriction is on the squared difference between the vector sigma and the true signal x star. The partition function will be dominated by vectors sigma where this squared difference concentrates around some value, which is two eta star inverse. So if you restrict the partition function away from this set, then you can repeat this analysis and show that the restricted partition function converges to something that's smaller than I_RS. And what this means is that, with high probability under the Gibbs measure, the squared difference between sigma and the planted signal x star is close to this value, two eta star inverse, both with high probability and also in expectation, right? And then, if you apply the Nishimori identity, sigma and x star would each be roughly eta star inverse away from the mean of the measure, by this Pythagorean relation. And so you get that x star, in particular its squared difference to the posterior mean of the distribution, is roughly eta star inverse.

And so, yes? You're asking why this implication holds? Yeah, it's because the full log partition function converges to I_RS, and if I restrict away from this set of points, it converges to something that's less than I_RS. So the Gibbs probability of that restricted set is exponentially small, right? The total Gibbs measure of that bad set is something exponentially small in n. So you actually get a much more quantitative version of the statement: the probability is actually exponentially small. Okay, good.

And then finally, for the TAP equations, we again prove that the vector AMP iterates converge to the posterior mean of this model. Here the argument, I think, is simpler than in the spin glass case, because we can use Bayesian types of arguments. We can just observe the following: if you have any estimator of your true signal, then you have this Pythagorean relation with the Bayes estimator: the squared difference between any estimator and the true signal is equal to its squared difference to the Bayes-optimal estimator plus the squared difference of the Bayes-optimal estimator to the true signal. This is because the Bayes-optimal estimator is the L2 projection of the true signal onto the space of measurable estimators. So you have this relation, and then, if you analyze the state evolution of VAMP (actually, this is already known), VAMP achieves an error that's conjectured to be Bayes-optimal.
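The relation being used here is just the following standard orthogonality identity, written in my notation.

```latex
% Pythagorean identity for the Bayes estimator: for any estimator
% \hat{\theta}(y) of x_star,
\[
  \mathbb{E}\,\lVert \hat{\theta}(y) - x_\star \rVert^2
  \;=\;
  \mathbb{E}\,\lVert \hat{\theta}(y) - \mathbb{E}[x_\star \mid y] \rVert^2
  \;+\;
  \mathbb{E}\,\lVert \mathbb{E}[x_\star \mid y] - x_\star \rVert^2 .
\]
% So if the VAMP iterate asymptotically attains the Bayes-optimal MSE (the
% second term), the first term must vanish: VAMP converges to the posterior mean.
```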
And then, if we are able to show that the Bayes-optimal MSE indeed coincides with the error achieved by VAMP, then from this Pythagorean relation, what that implies is that the difference between the VAMP estimate and the posterior mean actually goes to zero in the limit as t goes to infinity. This gives a very simple argument for convergence to the posterior mean. And by design, the VAMP iterates converge to a solution of the TAP equations, and therefore the posterior mean satisfies the TAP equations.

Okay, so let me conclude. I guess the one-sentence summary of this talk is that predictions of the replica method, and of high temperature kinds of expansions, for these mean-field models with rotationally invariant disorder can be shown rigorously using this approach of conditioning on the filtration of TAP-style iterations. And let me conclude with three open questions. One open question, which I know a lot of you in this room would be interested in, is the validity of this replica symmetric prediction in the linear model without this high temperature restriction, right? Of course, it's also an open question to analyze this SK kind of model down to the actual right temperature threshold, but that feels harder to me than this question, because there seem to be many more techniques available in the linear model. So this is one question. And then the latter two questions are maybe related to each other. I can say that when we first started thinking about doing these conditional moment analyses, we weren't actually trying to do this conditional second moment analysis. What we were trying to do at the start was to see if these conditioning ideas can be used to actually make rigorous the high temperature expansion itself, as was done by Plefka and by Parisi and Potters. And we failed at certain points on how to do this, and so we settled for the second moment argument. But my feeling is that maybe something can still be done there, and I'm happy to chat about this if anyone's interested. And then, related to this, maybe, is the question of the universality of these results: the free energy, the mutual information, the Bayes-optimal MSE. So Rishabh on Tuesday gave a very nice talk about the universality of these AMP algorithms, and what that would imply is universality of the error that's achieved by these algorithms, right? So if you were to apply AMP to do estimation, say in this linear model with a design that's much more structured and not rotationally invariant, that result would show that you get the same limiting error as you would with an orthogonally invariant design. But whether that limiting error is the best possible one, so how to prove universality of the corresponding lower bound in these kinds of problems, I believe is still open. And this is something else that perhaps people here would have ideas on how to do. Okay, that's all. Thanks.

Thank you very much, very nice talk. Okay, my question is: is it easy or not to extend your method to the generalized linear model?

Yeah, it's a good question. I think it'd be very interesting to look at; we haven't looked into it. My feeling is that if you introduce an auxiliary field variable to represent A times x, then something can be done, but we haven't really pushed in this direction.

Yes, last year Takahashi and I published a paper; actually, this is by the replica method, so not rigorous. So I want to know, how can we prove it is correct?

Yeah, I think it'd be interesting to look at.
I mean, you do also have, generally, these GAMP and GVAMP kinds of algorithms for these models that are rigorous already. So I think it'd be interesting to look into. Thank you.

Thank you, Zhou, for this very insightful talk on a technical subject. It was very nice. I would like to understand a bit better... so, I'm really not familiar with this second moment type of argument, but there is a plethora of message passing types of algorithms for each of these problems. What's the motivation for picking one rather than another in this proof? So it's not clear to me why in one model you took this memory-free TAP iteration. Why is that useful? And then you consider the vector AMP.

No, that's a great question. The reason we chose these algorithms is because they're the simplest ones, and the analysis would become horrendously complicated if you look at other algorithms. I mean, okay, so for example, for the orthogonally invariant spin glass model, we analyzed this memory-free algorithm of Çakmak and Opper, and there were previous algorithms developed by Opper, Çakmak, and Winther that used single-step memory kinds of structures and things like this. There are some differences in the convergence of these algorithms; for example, the latter one might not converge in the entire conjectured high temperature regime. Here we're only restricting to some very high temperature setting, so maybe that's not super relevant. But I think the most important factor is that you need to be able to do these computations: you need to be able to do this conditioning, and then pass to this limit and understand these quantities, and the computations would become, I think, quite tedious and challenging if we were to use some of these other algorithms that have this long-memory structure and all of these free cumulants baked into them. So we never really made an attempt to use these other algorithms, because these algorithms with much simpler structure are available. I don't see, morally, why, if you were to really try to do the analysis using one of these other algorithms... I guess I would imagine that the proof should still go through, with an enormous amount of work. But yeah, I don't know.

And a second question is more technical; if it's too long, we can skip it for a later discussion. If you come back to the slide where you were showing this nice relation of equality in law for the conditioning of a Haar matrix, where you have this conditioning on a linear system. And then you... yeah, I think it was done. Yes. Yeah, so the next slide. You use this here, yes. I would like to understand why, at the level of the second line, when you use this identity, this expression is simpler to deal with, because you have a similar structure, in a sense, to before, where you have this quadratic form.

Yeah, no, it's not that it's simpler to analyze than the first line. It's that if you don't do the conditioning and you just compute the unconditional first and second moments, you're not going to get the right form of the free energy. The log of the unconditional expectation of Z, if you do that computation, is not going to give you the replica symmetric prediction. You need to do this conditioning in order to recover the right result, and that makes the computation more complicated. It doesn't make it simpler; somehow you need to extend the computation this way in order to recover the result.

And if I can, last one, okay. At some point you gave two conditions for the validity of your formula.
I think it was for the linear system; you had two inequalities to be verified.

Can you say that again?

You had two conditions for the validity of your results. I think it was for the linear model. There were two inequalities, two conditions you provided.

You mean... do you mean the conjectured conditions for the validity of this?

Yes, exactly. Yes, thanks. What's the meaning of these conditions? Are they related to fundamental phase transitions, or are they technical?

I think the first condition is a sort of standard stability condition for the replica symmetric solution: if you do a local stability analysis, this is the AT condition. The second condition I don't have much understanding of. I mean, I guess one answer to this is: when you evaluate this kind of spherical integral, you get this infimum over some parameter, and I guess in some cases the infimum is attained at the boundary, and in some cases it's attained in the interior. It's related to the sticking transition in the spherical integral, where the vector kind of sticks to the top eigenvector, I guess. I think it's related to this. This condition is, you know, saying that when you evaluate this, you get something that doesn't stick to the boundary, that the infimum is attained in the interior. And again, under sufficiently high temperature, we can show that it's always attained away from the boundary. But I don't actually know... I think maybe my friend would know: what's the belief about the replica symmetric behavior if the second condition is not verified?

I think Jean's question, maybe, is: what's the interpretation of the second condition here, that this parameter is in the domain? Like, if this condition is not satisfied, then what is the conjectured behavior of the model?

Okay. So, I don't know either. Yeah. Okay.

Let's thank Zhou again for the talk. Thank you.