OK, slides. It works perfectly. So thank you all, and thanks to the organizers for the invitation. It's really been a lot of fun to participate in this conference over the past week. In particular, I don't think I've met many of you before, but I recognize a lot of names on papers that I've been reading, so it's nice to be able to match some of the faces to the names.

This is a short talk, so what I want to do is essentially describe, or advertise, two related results that we've obtained in the last year or so. One will be about approximate message passing (AMP) algorithms, and the second will be about an application of this to spin glasses. This is joint work with my colleague Yihong Wu at Yale.

So let me begin. The version of AMP that I'll talk about today is just the simplest symmetric, square version. What's usually meant by an AMP algorithm is something of the following form. I take an iterate u_t, I multiply it by a given symmetric matrix W, and then I subtract from this product W u_t some bias correction to get an iterate that I'll call z_t. Then to z_t I apply a nonlinear function, say entrywise, to get the next iterate u_{t+1}.

The defining property of AMP, what makes it AMP, is this bias correction. Its purpose is to simplify the dynamics of the algorithm and make the iterates describable, in a relatively simple form, by a Gaussian state evolution. In the usual forms of AMP, as introduced by Kabashima and by Donoho, Maleki, and Montanari, this bias correction has a particular quantitative form: it is a scalar b_t times the previous iterate u_{t-1}, where b_t is the empirical average of the coordinates of the derivative of u_t. This form is designed to ensure the following very special property: if W has i.i.d. sub-Gaussian entries, say, up to the symmetry constraint, then in the large system limit, the empirical distribution of the entries of z_t has a very simple characterization. It has a Gaussian limit, a normal limit, and the variance sigma_t^2 is a deterministic scalar parameter described by what's called the AMP state evolution. This kind of statement was first proven for Gaussian W by Bolthausen and by Bayati and Montanari, and universality arguments have since extended it to sub-Gaussian W, including notably a recent paper by Chen and Lam last year.

What I want to do in this talk is discuss versions of AMP for non-i.i.d. matrices W, and this is related to the topic that Cedric talked about at the start of the session. Specifically, the model I want to consider is a rotationally invariant model. What I mean by this is that if I write down the eigendecomposition of W as O^T Lambda O, then I'll assume that the orthogonal basis of eigenvectors of W is Haar uniform, so it is completely uniformly random on the orthogonal group, but Lambda can be an essentially arbitrary diagonal matrix of eigenvalues. More informally, this assumption says that I require my eigenvectors to be generic, but the eigenvalues can be essentially arbitrary. This kind of assumption is very similar to the setting of a class of AMP algorithms called vector AMP (VAMP) or orthogonal AMP (OAMP) that have been studied previously in the context of linear models and GLMs.
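As a concrete illustration of the standard (i.i.d.-noise) symmetric AMP iteration and its bias correction described above, here is a minimal numerical sketch. The tanh nonlinearity, the all-ones initialization, and the GOE-style normalization of W are my own choices for the example, not part of the talk.

```python
import numpy as np

# Minimal sketch of standard symmetric AMP with the Onsager/bias correction.
# Assumptions for illustration: W has i.i.d. N(0, 1/n) entries (up to symmetry),
# the nonlinearity is u_{t+1}(z) = tanh(z), and u_1 is the all-ones vector.
n, T = 4000, 8
rng = np.random.default_rng(0)
G = rng.normal(size=(n, n))
W = (G + G.T) / np.sqrt(2 * n)            # symmetric, off-diagonal variance 1/n

f = np.tanh
fprime = lambda z: 1.0 - np.tanh(z) ** 2

u_prev, u, b = np.zeros(n), np.ones(n), 0.0
sigma2 = np.mean(u ** 2)                  # state-evolution variance sigma_1^2
for t in range(T):
    z = W @ u - b * u_prev                # bias correction in the direction of u_{t-1}
    u_prev, u = u, f(z)
    b = fprime(z).mean()                  # b_{t+1}: empirical average of the derivative
    print(t, z.var(), sigma2)             # empirical variance of z_t vs. state evolution
    sigma2 = np.mean(f(np.sqrt(sigma2) * rng.normal(size=100_000)) ** 2)
```

The printed pairs illustrate the state-evolution claim: the empirical variance of the entries of z_t tracks the deterministic parameter sigma_t^2.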
This specific class, these VAMP or OAMP algorithms, are sort of special versions of AMP where the nonlinearity is chosen so that no debiasing or memory correction is actually required. What I'll talk about today is slightly different: a more general-purpose AMP algorithm that was derived previously in work of Opper, Çakmak, and Winther in 2016. They derived it using non-rigorous dynamical functional theory techniques, which we heard a lot about on the first day of this workshop, and the result that I'll describe is a rigorous analysis of this algorithm.

Before getting into the algorithm, let me just say one quick word of motivation. I started to work on this problem a year or two ago because we were trying to apply AMP to do PCA for gene expression data. What you're seeing here is a very quick example: on the left is a typical PCA analysis of a gene expression matrix from a single-cell RNA-seq experiment, where each point corresponds to the embedding of a cell in this space. This is a very high-dimensional application, so we expect a lot of noise in what you observe when you just do PCA. What we're trying to do is combine AMP with very classical empirical Bayes statistical ideas to estimate an underlying prior distribution for these principal components and then apply Bayes AMP to denoise towards this prior. I won't talk too much about this work today; it's joint with two wonderful graduate students, Xinyi Zhong and Chang Su, at Yale, and you can find details in this reference.

What I want to say, though, is that the version of AMP we applied in that work was the standard version, where you assume that your noise has i.i.d. Gaussian entries. But it was very clear to us that in this application that assumption is not exactly correct. One symptom is that if you plot a histogram of the singular values of your gene expression matrix, it has a wider spread, an overdispersion, compared with what you would expect from white noise. This kind of rotationally invariant model that I'll talk about today I think of as a way of directly modeling this kind of overdispersion, which I think is quite common in data: it directly describes the spectral distribution of your noise while still preserving the genericity of the singular vectors or eigenvectors.

So let me explain the form of the algorithm. Given a rotationally invariant matrix W, I claim that the general form of the AMP algorithm needs to be the following. You take your iterate u_t and multiply it by the matrix W. The main difference from standard AMP is the form of the debiasing correction: you need to subtract off a bias in the direction of each of your preceding iterates, u_1 through u_t. After obtaining this iterate z_t, you can apply a nonlinear function to get your next iterate u_{t+1}. In certain applications, I think it's useful to apply u_{t+1} to all of your preceding iterates, z_1 through z_t, not just the immediately preceding one, z_t, and so we also explore this kind of extension. To describe the form of this debiasing, that is, what these coefficients b_{t,1} through b_{t,t} are, I can explain it in the following way; a schematic of the iteration is shown below.
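In symbols, the iteration just described (my own transcription of the talk's description, not a verbatim slide):

```latex
z_t \;=\; W u_t \;-\; \sum_{s=1}^{t} b_{t,s}\, u_s,
\qquad
u_{t+1} \;=\; u_{t+1}(z_1, \dots, z_t) \quad \text{applied entrywise (row-wise in the multi-argument case).}
```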
So let me consider all of the partial derivatives of these functions u in their arguments. I take each function u_t, look at its partial derivative in each of its arguments, take the empirical average of the coordinates of this partial derivative, and collect these into a matrix that I call Phi. The rows of Phi are indexed by the functions u_1, u_2, u_3, et cetera, and the columns are indexed by the arguments in which you're differentiating. Because each u_t depends only on the preceding z iterates and not the ones after, Phi has a natural lower-triangular structure.

The coefficients in this AMP correction are read off from a certain series expression in Phi. If I let kappa_j denote the j-th free cumulant of the spectral distribution of the matrix W, then I can construct a matrix series in powers of Phi whose coefficients are exactly these free cumulants; this is equivalent to the R-transform of W applied to Phi. This matrix series is also lower triangular, because Phi is lower triangular, and I claim that the correct debiasing coefficients are read off as the last row of this series in Phi.

What I mean by 'correct' debiasing is the following: if you apply the debiasing in this way, then this AMP algorithm is rigorously characterized by a Gaussian state evolution of the same type as for the usual AMP algorithm. To describe the state evolution, let me define an additional matrix Delta of pairwise inner products between the AMP iterates u_r and u_s. We can then consider a second matrix series Sigma, whose coefficients are again the free cumulants of the spectral law of W, multiplying matrices Theta_j that are polynomials in Phi and Delta. The specific form here is not so important for the rest of the talk.

The result that we obtained is the following. Suppose that W indeed has Haar-uniform eigenvectors O, that the empirical spectral law of W converges in Wasserstein-p for every order p ≥ 1, and that every function applied in AMP is nonlinear, weakly differentiable, and has derivatives of at most polynomial growth. Then almost surely, again in Wasserstein-p for every order p ≥ 1, the empirical distribution of the iterates z has a Gaussian limit. More specifically, if I look at the first T iterates z_1 through z_T and stack their rows, so the rows live in R^T, they have a T-dimensional Gaussian limit whose covariance is exactly this matrix Sigma. This result is a formalization, a proof, of the state-evolution description obtained non-rigorously by Opper, Çakmak, and Winther in 2016.

Let me say just one word about the analysis, since this is a short talk. Our proof does not use the dynamical functional theory approach of Opper, Çakmak, and Winther; instead we apply the conditioning argument of Bolthausen. The argument is essentially an induction: you want to say that if the state evolution holds up to a particular iterate t, then, conditional on the AMP iterates up to iterate t, when you analyze iterate t+1 the state evolution continues to hold up to t+1.
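As a small illustration of how these debiasing coefficients could be computed, here is a sketch. It assumes you already have the t x t matrix Phi of averaged partial derivatives and the free cumulants kappa_1, kappa_2, ... of the spectral law of W; reading the series as B = sum_{j >= 0} kappa_{j+1} Phi^j (the R-transform power series evaluated at Phi) is my own interpretation of the talk, and the function name and interface are hypothetical.

```python
import numpy as np

def onsager_coefficients(Phi, kappa):
    """Last row of B = sum_j kappa_{j+1} Phi^j, read as (b_{t,1}, ..., b_{t,t}).

    Phi   : t x t (strictly) lower-triangular matrix of averaged partial derivatives
    kappa : sequence of free cumulants (kappa_1, kappa_2, ...) of the spectral law of W
    """
    t = Phi.shape[0]
    B = np.zeros((t, t))
    Phi_power = np.eye(t)
    # Since each u_t depends only on earlier z's, Phi is nilpotent (Phi^t = 0),
    # so only the first t terms of the series can contribute.
    for j in range(min(t, len(kappa))):
        B += kappa[j] * Phi_power       # kappa[j] holds kappa_{j+1}
        Phi_power = Phi_power @ Phi
    return B[-1, :]
```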
And the barrier you run into when you try to do this induction argument in the case of rotationally invariant models is that the induction hypothesis is not enough: when you try to analyze iterate t+1, it starts to depend on things that are not characterized by the state evolution up to iterate t, so you can't close the inductive loop. I think the main insight in this analysis, in the proof, is to identify the set of quantities you want to characterize at each iteration of AMP so that you can continue to characterize the next iteration and close the inductive loop.

I claim that the quantities you want to characterize are of the form shown here: take any two fixed iterates of AMP, say u_r and u_s, take any integer power k of W, and look at the quadratic form u_r^T W^k u_s. In the large system limit, as n goes to infinity, this quadratic form turns out to have a clean characterization, and this allows you to complete the induction. The specific form of this limit is again defined by a matrix series in the matrices Theta_j from my preceding slide. The coefficients of this series are a certain combinatorial interpolation between the spectral moments of W and the free cumulants of W. What I mean is that these coefficients are indexed by k and j: if I take j very small, j = 0, then what you get are just the moments of the eigenvalue distribution of W; if I take k very small, k = 1, then what you get are the free cumulants of W; and for intermediate values you get a combinatorial interpolation between the moments and the free cumulants of the spectral law.

To give one more word of detail on what this interpolation is: the moments and the free cumulants of the spectral law of W are related through a system of moment-cumulant relations, in which the moments are given by a sum over the lattice of non-crossing partitions of products of the free cumulants. These coefficients c_{k,j} are certain partial sums, where I take subsets of this lattice and sum only over those subsets. By doing this kind of interpolation between the moments and the cumulants, we can characterize arbitrary quadratic forms of this type, and this allows us to close the inductive loop and complete the proof.

Okay, so this kind of AMP algorithm you can potentially apply to a lot of different statistical and probabilistic problems and models. What I want to describe in the second half, in the remaining time that I have, is an application to what I think is probably the simplest model in this class, and that is a spin glass model. I'll consider a probability distribution on the binary hypercube, so a distribution over a vector sigma in {+1, -1}^n. The Hamiltonian of this distribution consists of two terms. There is a couplings term that describes the interaction between the spins of sigma, defined by a matrix J, which I'm going to assume is orthogonally invariant: if I write down the eigendecomposition of J, I assume the eigenvectors are Haar uniform over the orthogonal group. There's a specific reason why I'm not using the same notation W and Lambda as before, and I'll explain this a few slides from now.
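For reference, the moment–free-cumulant relation alluded to here is the standard one from free probability; the coefficients c_{k,j} in the talk are described as partial sums over subsets of this lattice.

```latex
m_k \;=\; \sum_{\pi \in NC(k)} \; \prod_{S \in \pi} \kappa_{|S|},
```

where NC(k) is the lattice of non-crossing partitions of {1, ..., k}, m_k is the k-th moment, and kappa_j is the j-th free cumulant of the spectral law.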
And then the second term of this Hamiltonian is the external field term: h^T sigma biases the mean towards a particular direction. What I'll be interested in is the free energy of this model, so Z here is the normalizing constant and what I want to look at is log Z.

There is of course a very special case of this model for which we have a very deep understanding: the case where J is a Gaussian GOE matrix. In that case this is the celebrated Sherrington–Kirkpatrick (SK) model, and we have very deep results about the behavior of the free energy in all temperature regimes. What I'm going to talk about today is a much simpler setting where beta is small, so I restrict to the high-temperature regime, but I want to consider the model for more general J: this general class of J that is orthogonally invariant but may not have independent or Gaussian-distributed entries. In this setting, it seems, at least to me, that a lot of the techniques developed to study SK don't directly generalize.

The result I want to explain concerns the limit of this free energy, and there is a prediction for this limit in statistical physics: the replica symmetric prediction at high temperature. Let me assume that the empirical spectral law of J, or equivalently of the eigenvalue matrix D, converges to a limit mu_D, and that the empirical distribution of h also converges to a limit mu_H. From these limits, I can define two scalar values q_* and sigma_*^2 by this pair of fixed-point equations. Here capital G is a standard Gaussian, capital H is drawn from the limit mu_H, and the equations depend on the limiting spectral distribution of D through the R-transform of mu_D. If you're familiar with SK, this pair of fixed-point equations is the same as what you see for SK, except for this additional factor that is the derivative of the R-transform; in SK this factor would be one. The conjecture is that, at least up to a certain temperature threshold, a threshold analogous to the AT threshold for the SK model, this free energy converges to the replica symmetric formula that you see here. Some version of this formula was first stated by Marinari, Parisi, and Ritort in '94, and the specific version you see here can be found in the adaptive TAP paper of Opper and Winther from 2001. So this was first derived using the replica method.

The result I want to explain is the following. We are able to prove the validity of this prediction in a regime of very high temperature. The statement is that if the convergence to this pair of distributions mu_D and mu_H occurs in Wasserstein-p for every order p ≥ 1, and in addition the largest and smallest eigenvalues of the couplings matrix J converge to the endpoints of the spectral support, then there is a temperature threshold beta_0, depending on mu_D, such that for all beta less than beta_0 the replica symmetric prediction for the free energy is correct. The connection to the first part of my talk about AMP is that the proof applies a conditional second moment idea introduced by Bolthausen in 2018, in which we analyze an AMP algorithm for solving what are called the TAP equations in this model. I'll explain this on the next slide.
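The fixed-point equations themselves are not written out in this transcript, but for orientation, in the SK special case (where the derivative-of-the-R-transform factor equals one) the familiar replica symmetric fixed point with external field reads:

```latex
q_* \;=\; \mathbb{E}\big[\tanh^2\!\big(\beta\sqrt{q_*}\,G + H\big)\big],
\qquad G \sim \mathcal{N}(0,1), \quad H \sim \mu_H .
```

In the orthogonally invariant setting of the talk, as I understand it, the variance of the Gaussian part of this effective field (sigma_*^2 in the talk's notation) additionally involves the derivative of the R-transform of mu_D, which reduces to one for the semicircle law.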
Can I do a quick time check to see how much time I have left? I forgot to start my stopwatch. If I'm completely honest, I don't know, but I can say a random time. Let's say ... minutes. Probably around minus one, but you're good. Okay. Let me say a few things and then I'll conclude.

So what are the TAP equations in this model? For sufficiently high temperature, small beta, the mean of this distribution is conjectured to satisfy this system of mean-field equations, the TAP equations. For SK these were introduced by Thouless, Anderson, and Palmer in '77, and in the context of this kind of rotationally invariant model they were introduced by Parisi and Potters in '95. I believe this is still conjectural at any temperature.

The way we do the proof is to study a particular AMP algorithm for solving this system of TAP equations. This algorithm was introduced by Çakmak and Opper in 2019, and it takes the following form. You take your iterate u_t and multiply it by W; there is no bias correction in this AMP algorithm, you just take this product as your z_t; and then you apply a particular nonlinear function u_{t+1} to z_t to get your next iterate. What is W here? This matrix W is not the couplings matrix: W is a particular resolvent of the matrix J, where you define the resolvent at a point lambda_* that depends on your fixed point q_*, the asymptotic overlap.

Let me say a couple of words about the structure of this algorithm. It doesn't have the memory or bias corrections that you typically see in AMP, and this is because the nonlinear functions u are designed so that their average derivatives all converge to zero: in my previous notation, the matrix Phi would be zero, so you can drop the correction in the AMP algorithm. This kind of structure is very similar to the structure of vector AMP in linear models, and you can derive this kind of algorithm from a graphical model using the expectation propagation approach of Minka; the derivation is very similar, with vector-valued nodes and message passing back and forth between these nodes.

However, the general analysis that I talked about in the first part of the talk reveals a rather magical property of this kind of AMP algorithm. If I look at the system of quadratic forms that we characterize as the key component of the previous analysis, this system simplifies considerably for an algorithm of this form. If I look at the quadratic form u_r^T W^k u_s, its limit behaves as if the u vectors were completely independent of W, even though they are defined iteratively and have a very complex dependence on W: the limit is just the limiting inner product between u_r and u_s times the limiting k-th moment of W. It's really a sort of miraculous property of this algorithm that we use in the proof. If you pass this property through the functional calculus and Weierstrass polynomial approximation, you get the following kind of lemma: for any smooth function f, if I apply f spectrally to W and look at this quadratic form, it behaves as if the u vectors were independent of W. So this is a special property of this kind of divergence-free AMP algorithm.
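As an aside, here is a small sketch (my own illustration, not claimed to be the exact Çakmak–Opper nonlinearity) of one standard way to make a nonlinearity divergence-free: subtract its empirical average derivative times the identity, so that the matrix Phi of averaged derivatives vanishes and no memory correction is needed.

```python
import numpy as np

def divergence_free(f, fprime, z):
    """Divergence-free version of f: coordinate-wise derivative f'(z_i) - c averages to zero."""
    c = fprime(z).mean()              # empirical average of the derivative
    return f(z) - c * z

# Example with a tanh denoiser (the choice of tanh is illustrative only).
rng = np.random.default_rng(0)
z = rng.normal(size=10_000)
f = np.tanh
fprime = lambda x: 1.0 - np.tanh(x) ** 2
u = divergence_free(f, fprime, z)
print((fprime(z) - fprime(z).mean()).mean())   # ~ 0 by construction
```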
So to conclude, let me explain the high-level structure of the proof on one slide. What we do is consider the sigma-field defined by the iterates of this algorithm up to some large but fixed iteration, say 100 iterations of AMP; call this sigma-field G_T. The approach introduced by Bolthausen in 2018 is to study the conditional moments of the partition function Z given this sigma-field. What we can show is that the limits of the logarithms of the conditional first and second moments of Z converge to the replica symmetric free energy, and to twice that value plus a discrepancy that vanishes as you take more and more AMP iterations. This kind of property was established by Bolthausen in 2018 in the setting of the SK model, with Gaussian couplings, using his original AMP algorithm for SK; what we did in this work is to extend that analysis to the divergence-free AMP algorithm of Çakmak and Opper. The analysis in our setting consists of essentially two high-level steps. The first is to use the AMP state evolution and large deviations techniques to give low-dimensional variational formulas that characterize the above two limits. The second is a global analysis of the solutions to these variational problems, provided the temperature is sufficiently high; it's in this second step that we really use the high-temperature assumption crucially.

Let me conclude with a couple of remarks. In the first part of the talk I described a general AMP algorithm. The form of this algorithm is quite general; however, given a particular statistical problem, designing the right AMP algorithm might not be trivial, and I think Cedric mentioned this in his first talk, so there's interesting work to be done here for a lot of problems. The second comment I want to make is that this conditional second moment approach, which we used to analyze this spin glass model, should, I feel, be extendable to say something about the conjectured replica symmetric formula for the mutual information in the rotationally invariant linear regression model, the setting of VAMP. This is a conjecture that goes back to Takeda, Uda, and Kabashima in 2006 and to Tulino, Caire, Verdú, and Shamai in 2013, and there have been partial proofs of this conjecture in certain settings by Galen Reeves in 2017 and by Barbier et al. in 2018. I don't know exactly how far this technique can extend in VAMP, for example what range of signal-to-noise ratios you can characterize with this approach, but I feel like it should be able to say something.

These are the two references for the work I talked about today. Thanks for your attention. The real noise this time. Thank you.