So our first speaker is Clément Luneau from EPFL, who will tell us about tensor estimation, and I'll let you jump right in. Hi everyone, can you hear me well? Yes, yes, yes. Okay.

Okay, so today I will talk about joint work with Nicolas Macris on tensor estimation with structured priors. I will present the statistical model for tensor estimation that we consider, and I will spend a few slides on existing results before presenting our own.

The statistical model is a classic one. You have an n-dimensional spike x, and with this spike you form a symmetric rank-one tensor; here we will consider a second-order tensor, that is a matrix, or a third-order tensor. This tensor has components x_i x_j x_k, and you multiply it by a signal-to-noise ratio parametrized by λ. What you observe is a noisy version of this symmetric rank-one tensor, where the noise is additive white Gaussian noise. The question you then ask yourself is: how can you estimate the spike x, or the underlying rank-one tensor?

This problem is now well studied and understood, both information-theoretically and algorithmically, especially for a spike whose components are IID with respect to some prior distribution P_X. Lelarge and Miolane proved a formula for the minimum mean squared error (MMSE) in the high-dimensional regime, and Lesieur and collaborators studied the approximate message passing (AMP) algorithm in this regime; the performance of AMP is tracked by the state evolution equations.

There is one interesting change of behavior, algorithmically, when you go from matrix estimation to tensor estimation. If you have a centered prior, for example an IID Gaussian prior, for matrix estimation you will observe that the MMSE is trivial and maximal at low SNR up to some threshold, past which it decreases to zero. There is no discontinuity of the MMSE at the threshold beyond which the MMSE becomes non-trivial. It also turns out that the error of the AMP estimator matches the error of the Bayesian estimator, so it matches the MMSE. On the contrary, if you now consider tensor estimation, you still have a phase transition for the MMSE, but it is a first-order phase transition, so you have a discontinuity, and above the threshold the error of the AMP estimator remains trivial and maximal. So in the case of matrix estimation with an IID Gaussian prior you don't have any algorithmic gap, while for tensor estimation you have an infinite algorithmic gap. For matrix estimation you can also have algorithmic gaps, but they will be finite: for example, if you consider a Bernoulli-Rademacher prior whose sparsity is low enough, an algorithmic gap will appear.

This picture is important and I will come back to it later, but first I will present another kind of prior that has been considered recently for matrix estimation. Data in nature has some structure; we would like a model that takes such structure into account, and we would like to exploit this structure to see if it's possible to do better algorithmically. One observation in particular is that high-dimensional signals often effectively lie on a low-dimensional manifold. So recently, Benjamin Aubin and collaborators proposed to study a prior given by the output of a generalized linear model, so that the structure comes from a generative model.
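For concreteness, here is a minimal sketch of sampling an observation from the spiked-tensor observation model above, with an IID Rademacher spike. This is an illustration, not the speakers' code: the sqrt(λ/n^(k−1)) normalization below is one common convention for order-k tensors (k = 2 recovers a sqrt(λ/n) matrix scaling), and both this constant and the symmetrization of the noise may differ from the paper's conventions.

```python
import numpy as np

def spiked_tensor_observation(x, lam, k=3, rng=None):
    """Observe Y = sqrt(lam / n^(k-1)) * (k-fold outer product of x) + Z.

    Assumption: the sqrt(lam / n^(k-1)) scaling is one common convention
    for order-k spiked tensors; the paper's constant may differ. The
    Gaussian noise is drawn IID over all entries (not symmetrized).
    """
    rng = np.random.default_rng() if rng is None else rng
    n = x.shape[0]
    signal = x
    for _ in range(k - 1):                      # build the rank-one tensor
        signal = np.multiply.outer(signal, x)   # entries x_i * x_j * x_k ...
    Z = rng.standard_normal(signal.shape)       # additive white Gaussian noise
    return np.sqrt(lam / n ** (k - 1)) * signal + Z

rng = np.random.default_rng(0)
x = rng.choice([-1.0, 1.0], size=50)            # IID Rademacher spike
Y = spiked_tensor_observation(x, lam=5.0, k=3, rng=rng)
```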
So what you have is a p-dimensional latent vector whose components are still IID with some prior P_S. This latent vector is multiplied by a sensing matrix whose components are standard Gaussian, and once you have done this linear operation, you pass the vector Ws through an activation function φ that is possibly non-linear. So s might be IID, but the components of x will be correlated with each other.

This model was proposed by Aubin and collaborators, and they studied such a prior in the context of matrix estimation, in the high-dimensional limit where the ratio α = n/p is kept fixed; n is the size of the signal and p is the size of the latent vector. They made the interesting observation that, for all the cases they studied, the MMSE as a function of the noise variance Δ, which is the inverse of the SNR, does not undergo a first-order phase transition: the MMSE is continuous, and there is no algorithmic gap with generative-model priors. It's an interesting observation, and I was wondering what happens with such generative priors in the case of tensor estimation; in particular, I was wondering if we can leverage them in tensor estimation to obtain a finite algorithmic gap when the prior is centered.

So in the remainder of this talk, I will present formulas for the asymptotic mutual information and the asymptotic minimum mean squared error for tensor estimation with this generative prior. I will use these formulas to visualize the MMSE in different settings, and we will see that, unfortunately, the infinite algorithmic gap persists. At the end, I will look at this high-dimensional regime in the limit where α goes to zero. In that case you can write a simplified, equivalent model with an IID prior. This equivalent model is also valid for matrix estimation, and it will help us exhibit activation functions for which you also have an algorithmic gap for matrix estimation.

So first, some theoretical results. The model we consider is the following: you have a tensor Y with components Y_ijk = sqrt(λ/n) x_i x_j x_k + Z_ijk, where Z is additive white Gaussian noise, and the components of the spike x come from a generalized linear model. What you can show is that the mutual information between the spike x and the observation Y, given the sensing matrix W, normalized by n, converges to a simple variational problem in the limit where n/p is fixed equal to α and n goes to infinity. This variational problem is the extremization of a potential function over three variables q_x, q_s and r_s, and the potential function is the sum of two mutual informations involving scalar random variables plus a polynomial in these variables.

Once you have this result, you can use the I-MMSE relation, which links the derivative of the mutual information to the minimum mean squared error, and you obtain a result for the asymptotic minimum mean squared error. You find that, for almost every positive λ, in the high-dimensional regime the error of the Bayesian estimator of the third tensor power of the spike x converges: the MMSE converges to ρ_x³ − (q_x*)³, where ρ_x is the second moment of the components of the spike x, and q_x* is the value of q_x achieving the minimum in this variational problem. So now that we have this result, we can visualize the MMSE.
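As a small illustration of this generative prior, here is a sketch of sampling a spike x = φ(Ws/√p). The 1/√p normalization of the pre-activations is an assumption made so that each component of Ws/√p is of order one; the paper's convention may place this factor elsewhere.

```python
import numpy as np

def generative_spike(n, alpha, phi, rng=None):
    """Sample x = phi(W s / sqrt(p)) with p = n / alpha.

    s has IID standard Gaussian components (the prior P_S), W has IID
    standard Gaussian entries, and phi acts componentwise. The 1/sqrt(p)
    normalization is an assumption (it keeps each pre-activation O(1));
    the paper's convention may differ.
    """
    rng = np.random.default_rng() if rng is None else rng
    p = int(round(n / alpha))
    s = rng.standard_normal(p)            # IID latent vector
    W = rng.standard_normal((n, p))       # Gaussian sensing matrix
    return phi(W @ s / np.sqrt(p))

# Example: odd activation (sign) with a centered latent prior. The
# components of x are correlated through the shared latent vector s,
# even though the components of s themselves are IID.
x = generative_spike(n=200, alpha=0.5, phi=np.sign)
```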
So to visualize the MMSE, you want to extremize the potential. You do so by writing the critical-point equations for the potential, which give you fixed-point equations for the parameters q_x, q_s and r_s. You can iterate these fixed-point equations starting from different initializations; in the end, you keep the fixed point that yields the lowest potential, and you use the formula above to compute the asymptotic MMSE.

There is one interesting setting: when the activation function φ is odd and the prior P_S is centered. In this case you have a fixed point q_x = 0, which we call the uninformative fixed point: it's called uninformative because when q_x* is zero, the asymptotic MMSE is trivial and maximal, equal to ρ_x³. For matrix estimation with a generative prior, Aubin and collaborators observed that, in the settings they consider, this fixed point becomes unstable once you pass the information-theoretic threshold, which means there is no algorithmic gap for the configurations they study. In the case of tensor estimation with a generative prior, this fixed point is always strongly stable, so you have exactly the same algorithmic behavior as for tensor estimation with an IID prior, meaning an infinite algorithmic gap.

That's what you see here, where I have plotted the MMSE for a linear activation function and a Gaussian prior P_S, for different values of α. The red dashed line is the algorithmic prediction for the error, and you see that it remains trivial and maximal. Information-theoretically, you have exactly the same behavior as for matrix estimation; for example, the information-theoretic threshold above which the MMSE becomes non-trivial decreases with the ratio α. And you get exactly the same kind of picture if you take an activation function that is not linear but has a non-linearity, like the sign function.

For the end of this talk, I will consider the limit where α goes to zero. When α goes to zero, it is possible to determine the limiting curve for the minimum mean squared error; this is the blue dashed curve here. What you can show is that when α goes to zero, the asymptotic mutual information is given by a simplified variational problem over the single parameter q_x, and this is exactly the same kind of variational problem you would get for tensor estimation with an IID prior. There are two cases. If the prior P_S of the latent vector is centered, then the equivalent problem is tensor estimation with an IID prior, where the prior distribution is the distribution of a Gaussian random variable of variance ρ_s passed through the activation function φ. If instead the prior P_S is not centered, you have to take into account some side information V, but the proof by Lelarge and Miolane can easily be adapted to account for this side information, and you can show that you have this variational formula in the asymptotic limit.

So now you can use this simplified problem, and we come back to matrix estimation. For matrix estimation you have exactly the same kind of limit when α goes to zero, and you see that you can choose the activation function φ to obtain any equivalent IID prior you want when α goes to zero, including a prior exhibiting an algorithmic gap.
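To make the fixed-point procedure concrete, here is a toy version of it for the simplest special case, an IID standard Gaussian prior on the spike (the setting of the earlier matrix-versus-tensor comparison), rather than the full three-parameter system in q_x, q_s, r_s. For this prior the scalar update reads q ← λq^(k−1)/(1 + λq^(k−1)), up to normalization conventions; the comparison of potential values at the surviving fixed points, which the full procedure requires, is omitted here.

```python
import numpy as np

def fixed_point(lam, k, q0, iters=5000, tol=1e-12):
    """Iterate q <- lam * q^(k-1) / (1 + lam * q^(k-1)).

    Critical-point equation for an order-k spiked tensor with an IID
    standard Gaussian prior (rho_x = 1), up to normalization conventions;
    a toy stand-in for the (q_x, q_s, r_s) system of the talk.
    """
    q = q0
    for _ in range(iters):
        gamma = lam * q ** (k - 1)       # effective scalar-channel SNR
        q_new = gamma / (1.0 + gamma)    # overlap = rho_x - mmse(gamma)
        if abs(q_new - q) < tol:
            break
        q = q_new
    return q

for k in (2, 3):
    q_uninf = fixed_point(lam=5.0, k=k, q0=1e-8)  # uninformative start
    q_inf = fixed_point(lam=5.0, k=k, q0=1.0)     # informative start
    print(f"k={k}: from q~0 -> {q_uninf:.4f}, from q~1 -> {q_inf:.4f}")

# For k=2 both initializations reach the same non-trivial fixed point
# (q = 0 is unstable for lam > 1), while for k=3 the run started near
# zero stays there: the uninformative fixed point q_x = 0 is strongly
# stable, which is the infinite algorithmic gap discussed above.
```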
And so an example of such an activation function would be φ(x) equal to the sign of x if the absolute value of x is greater than ε, and zero otherwise. In that case, matrix estimation with the generative prior is equivalent, when α goes to zero, to matrix estimation with an IID Bernoulli-Rademacher prior. And you see that if ε is large enough, this Bernoulli-Rademacher prior will be sparse: the equivalent prior puts mass on ±1 only when the underlying Gaussian exceeds ε in absolute value, so the fraction of non-zero components shrinks as ε grows. You then get exactly the kind of picture I showed you earlier, with an algorithmic gap. So in fact, even for matrix estimation with this kind of structure, you can have an algorithmic gap.

However, here I considered the regime where α goes to zero, which means that the dimension of the latent vector is huge compared to the dimension of the signal. So it does not correspond to the motivating setting of a high-dimensional signal x lying on a lower, p-dimensional space. As future work, I'd like to look at what happens when α increases; in particular, does the algorithmic gap merely shrink, or does it completely disappear as α increases?

And I guess it's time to end here, so I will just finish with a few references: especially the work by Aubin and collaborators, "The spiked matrix model with generative priors", because in this work they also have many more algorithmic considerations, and they propose a linearization of AMP to estimate the spike. And I would like to point you to our preprint, "Tensor estimation with structured priors", if you are interested in the proofs of the asymptotic mutual information and the asymptotic minimum mean squared error. Thank you.