It's a pleasure to introduce Gérard Ben Arous, who is going to speak about the spiked tensor PCA model. Thank you. Spiked tensor PCA. So this is a model, a question I alluded to in the cross-program lecture I gave on Monday. Let me remind you what the question is. PCA means principal component analysis, and I'm sure you know what it is for matrices. The question here, for tensors, is the following. You fix a point $\mathbf{n}$ on the unit sphere $S^{n-1}$; we're in dimension $n$. This point, you don't know it. It will be the spike, and it is what you want to discover. But you're not given this point. What you're given, for a fixed $p$, is the $p$-tensor built from this vector, $\mathbf{n}^{\otimes p}$. Of course, if you were just given that, the problem would be solved. But what you see is a noisy observation of this: to it you add $Z$, a random $p$-tensor, which to simplify you assume to be Gaussian, with all entries i.i.d. $N(0,1)$. And there is a number, say $\alpha$, which will be the signal-to-noise ratio. So you observe
$$T = \alpha\, \mathbf{n}^{\otimes p} + Z.$$
If $\alpha$ is 0, you have no signal; if $\alpha$ is large, you can consider the noise small. And the question is: you observe this, and you want to find $\mathbf{n}$. It is unknown, you have this noisy version, and you want to denoise. Of course, the signal-to-noise ratio will play a role.

All right, so how do you do that? I won't spend much time on the statistical aspects, but when you have this, in particular when the noise is Gaussian, you do, of course, maximum likelihood estimation. And maximum likelihood estimation imposes that, in the end, what you have to find is
$$\max_{u \in S^{n-1}} \Big\{ \alpha\, \langle u, \mathbf{n} \rangle^p \;+\; \sum_{i_1, \dots, i_p} J_{i_1 \cdots i_p}\, u_{i_1} \cdots u_{i_p} \Big\},$$
where $\mathbf{n}$ is the pole — for me it's the north pole. These $J$'s are in fact simply the coefficients of the tensor $Z$; I should call them $Z$, but they are called $J$ in spin glass theory, as in the lecture I was giving before. So, to keep the notation coherent: these coefficients are i.i.d. $N(0,1)$. The statistical estimation problem, the maximum likelihood, ends up being something like this. And the question is: how do you find this maximum? And can you?

So here I have a function; let me call it $\phi(u)$. This function is a random Gaussian function on the sphere, and I may be interested in understanding its complexity, as I described in the lecture on Monday — I'll come back to that for those who were there. What is the complexity of this function? What we'll see is that, depending on the strength of the signal-to-noise ratio, there are three regions: one where you can recover the signal very well, and complexity is not important; one where you cannot recover the signal at all, essentially, and the complexity beats everything; and one where you can recover part of the signal, and we can describe the complexity there.
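To make the setup concrete, here is a minimal sketch in code of the observation model and the landscape $\phi$, for $p = 3$; the dimension, seed, and variable names are mine, purely for illustration.

```python
import numpy as np

# Minimal sketch of the spiked tensor model for p = 3 (parameters illustrative).
rng = np.random.default_rng(0)
n, p, alpha = 50, 3, 2.0

# The unknown spike: a uniform point on the unit sphere S^{n-1}.
spike = rng.standard_normal(n)
spike /= np.linalg.norm(spike)

# Noise: a p-tensor with i.i.d. N(0,1) entries.
Z = rng.standard_normal((n, n, n))

# Observation: T = alpha * spike^{(x)p} + Z.
T = alpha * np.einsum("i,j,k->ijk", spike, spike, spike) + Z

def phi(u):
    """The random landscape phi(u) = <T, u^{(x)3}> for u on the unit sphere."""
    return np.einsum("ijk,i,j,k->", T, u, u, u)

# Sanity check: the planted direction scores roughly alpha plus noise of order 1.
print(phi(spike))
```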
So before I go there — and this doesn't look like a random matrix model at all, of course — there are two simpler cases that you may want to understand. What I will be describing here is joint work with Song Mei, Andrea Montanari, and Mihai Nica; I'm trying to put it in alphabetical order: Ben Arous, Mei, Montanari, Nica.

All right, so two simpler cases. First, what happens when $\alpha = 0$: when you have no spike, pure noise. I'll come back to that; I described it a bit last time. And second, what happens when $p = 2$, which is much simpler. When $p = 2$, this problem is the usual PCA. So let's look at what $p = 2$ means. When $p = 2$, you just have a quadratic form; or let's say you have a random matrix to which you add one spike. So this falls under the now-classical question of understanding what happens to a random matrix plus a finite-rank perturbation — here, the perturbation is of rank 1. What happens, in particular, to the spectrum?

This is, of course, typical in statistics. Say I have a properly normalized random matrix $M$, and I add $A$, which is finite rank, say rank 1 or 2. That's the usual setup; it's called PCA in statistics when this matrix is in fact the sample covariance matrix of a sample, and you have a spike because you assume that your true covariance structure is flat except for maybe one or two directions in which there is signal. Principal component analysis is very simple: you look at the spectrum of this, you take the singular values, you look at the spectrum of the Wishart matrix, and you try to see if some eigenvalues get out of the bulk of the spectrum.

This has been studied at length, in the Wishart case but also in the Wigner case. So let's say this is a Wigner matrix, and we add a finite-rank — say, for the moment, deterministic — perturbation, say rank 1. The statistical question — in fact there are several questions, with a lot of variants; I'm going a bit fast, just enough to be able to talk about it in the tensor case — is twofold: you have detection, and you have recovery. Two different questions. Detection is: you observe this, where $A_n$ is a rank-1 matrix — if you want to think of something simple, take $\alpha A_n$ with $A_n = e_1 e_1^\top$, rank 1 with a single 1 in the corner — and you ask yourself: can I detect the spike in this noisy observation? Can I tell the difference between this and the case $\alpha = 0$? That's detection, and there are different variants of how to define it, but globally it means: can you detect that a signal exists? And recovery — again, there is weak and strong recovery — means: can you find the signal, or some good approximation of it? That's one example; you could do other things. So that's the general setup.
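Here is a quick numerical illustration of that spectral approach (my own toy code, not from the talk): a GOE matrix plus a rank-one spike, where, as discussed just below, an outlier detaches from the bulk only when the signal is strong enough.

```python
import numpy as np

# Toy illustration: spectrum of a GOE matrix plus a rank-one spike. An outlier
# detaches from the bulk [-2, 2] only above the threshold discussed below.
rng = np.random.default_rng(1)
n = 1000

def top_eigenvalue(alpha):
    G = rng.standard_normal((n, n))
    W = (G + G.T) / np.sqrt(2 * n)        # GOE normalized so the bulk is [-2, 2]
    A = np.zeros((n, n)); A[0, 0] = 1.0   # rank-one spike e_1 e_1^T
    return np.linalg.eigvalsh(W + alpha * A)[-1]

for alpha in [0.5, 1.5, 3.0]:
    # Above the threshold, the outlier sits near alpha + 1/alpha (see below).
    print(alpha, top_eigenvalue(alpha))
```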
It goes back, in the case of random matrices, to a paper by Baik, myself, and Sandrine Péché, which is old now — 2005, I think — which did that, and which has since been expanded in all sorts of directions: on the random matrix side by Mylène Maïda, Sandrine Péché of course, Catherine Donati-Martin, Florent Benaych-Georges, Alice Guionnet, and others. On the statistics side, this has become a complete industry, maybe not with the kind of detail that we were obtaining there.

What was obtained there were things of the following nature: when is it that the finite-rank perturbation pushes an eigenvalue out of the semicircle? Detection, for instance, works like this: if $\alpha = 0$, you have a semicircle, and outside of its support you find nothing, essentially — outliers don't exist. So if you find an outlier, you detect that something is happening. The question, then, is when this perturbation imposes an eigenvalue outside of the semicircle. And the answer, for the semicircle normalized with the usual support $[-2, 2]$ — unfortunately, all these papers are written with edge $\sqrt{2}$, but it's just a scaling — is that this happens when $\alpha$ is strictly larger than 1. When $\alpha > 1$, the spectrum has an outlier, and you can detect — and, in fact, also recover. Many things have been studied here: for instance, the fluctuations of this eigenvalue. When it is outside of the bulk, they are Gaussian; in the critical regime, when $\alpha$ is close to 1, you get a modified Tracy–Widom fluctuation; and when $\alpha < 1$, you just don't find the outlier. So in the case of matrices, $p = 2$, detection and recovery happen at the same threshold, which is 1, and you don't have what is called a statistical estimation gap: when you can detect, you can recover.

Now, in this literature there is one paper that will be important for us here: a paper by Maïda, in 2006 or 2007, which obtained the large deviation principle for the top eigenvalue. So in this context, if you don't have a spike, just take a Wigner matrix — when I say Wigner, here I mean the GOE, Gaussian — and consider $\lambda_1$, its top eigenvalue. You know that $\lambda_1$ is close to 2, and you know that the fluctuations are Tracy–Widom, but you may want the large deviation principle. This was done a long time ago now, in a paper — already about spin glasses — by Amir Dembo, Alice Guionnet, and myself, where we obtained the large deviation principle for this object. The large deviations are simple on one side: for $x > 2$,
$$\mathbb{P}(\lambda_1 \geq x) \simeq e^{-n\, L_0(x)},$$
for a certain rate function $J(x)$ — which I will call $L_0(x)$ here, for future purposes — which is easy to describe. If, on the contrary, you try to push the top eigenvalue below 2, that is of course much more difficult: it happens at rate $n^2$, with another rate function, $I(x)$. Both $L_0$ and $I$ are completely explicit. $L_0$ is simply, maybe up to a constant that I forget,
$$L_0(x) = \int_2^x \sqrt{y^2 - 4}\; dy.$$
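For the record, that integral can be done in closed form (a standard computation; this evaluation is mine, up to the same overall constant):
$$\int_2^x \sqrt{y^2 - 4}\; dy \;=\; \frac{x}{2}\sqrt{x^2 - 4} \;-\; 2\log\!\Big(\frac{x + \sqrt{x^2 - 4}}{2}\Big), \qquad x \geq 2,$$
which vanishes at the edge $x = 2$ and grows like $x^2/2$ for large $x$.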
All right, so that's the large deviation in the case where you have no spike. When you have a spike, it's way more delicate. When you have a spike — still in the case of matrices — this is the result of Mylène Maïda: she gives an LDP for the top eigenvalue with a rate function, which I would call $L_\alpha$, say. There is a certain rate function that I won't describe; it is a much more delicate object than the result I just mentioned, and it uses heavily earlier results of Alice Guionnet and Ofer Zeitouni, which go back to 2002, I think. Of course, if I don't want this detailed information, I could say that $\lambda_1$ satisfies a large deviation principle at rate $n$, where I simply declare the rate function to be infinite to the left of the bulk edge — because there the real rate is $n^2$. That would be a way to state it, and Maïda gives the equivalent result in the spiked case. And by the way, the location of the outlier is essentially $\alpha + 1/\alpha$. So you know where the spike sends the eigenvalue, and you can recover the strength of the spike, too.

All right, so that's all good, but that's the case $p = 2$. In the case $p = 2$, the idea that there is a random matrix is clear, because we are looking at a random matrix. But now what about $\alpha = 0$? So the other example, which is well known, is $p \geq 3$. Remember the function we're looking at:
$$\phi(x) = \alpha\, \langle x, \mathbf{n}\rangle^p \;+\; \sum_{i_1, \dots, i_p} J_{i_1 \cdots i_p}\, x_{i_1} \cdots x_{i_p},$$
with the $J$'s i.i.d. $N(0,1)$. For the moment, I take the case $\alpha = 0$, so I'm just looking at the noise term. This is just what I was talking about on Monday. One way to look at it: it's a random homogeneous polynomial of degree $p$, restricted to the sphere. The best way to describe its distribution is to say that it is a centered Gaussian field — centered because $\alpha = 0$ — whose covariance, as a very simple computation shows, at two points $x, x'$ of the sphere is just
$$\mathbb{E}\big[\phi(x)\,\phi(x')\big] = \langle x, x' \rangle^p.$$
So that's description 1 and description 2 of the same object.

Now I have this random Gaussian function, and I want to say something about it: how difficult is it to find its minimum, or its maximum? This is what I was describing on Monday; the complexity of this is well understood now. For a physicist, of course, this is simply the spherical $p$-spin model — in the physics models, instead of putting it on the unit sphere, it's on the sphere of radius $\sqrt n$, but it's the same thing.

All right, so there are essentially two ways to look at this function and understand how hard it is to find its minimum, or to detect whether there is a spike — though for the moment I've put no spike. One way, of course the physicist's way, is to introduce the Gibbs measure at positive temperature: you look simply at the measure
$$\mu_{\beta}(dx) = \frac{1}{Z_n(\beta)}\, e^{-\beta\, \phi(x)}\, dx$$
on the sphere — exponential of minus $\beta$ times $\phi$, normalized.
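As a quick aside, the covariance formula above is easy to check numerically; here is a small Monte Carlo sketch of mine (the dimension, the overlap of the two points, and the trial count are arbitrary choices):

```python
import numpy as np

# Monte Carlo check of the covariance of the pure-noise landscape:
# E[phi(x) phi(x')] should equal <x, x'>^p.
rng = np.random.default_rng(2)
n, p, trials = 8, 3, 50_000

x = rng.standard_normal(n); x /= np.linalg.norm(x)
y = x + 0.5 * rng.standard_normal(n); y /= np.linalg.norm(y)  # correlated point

vals_x = np.empty(trials); vals_y = np.empty(trials)
for t in range(trials):
    J = rng.standard_normal((n, n, n))               # fresh noise each trial
    vals_x[t] = np.einsum("ijk,i,j,k->", J, x, x, x)
    vals_y[t] = np.einsum("ijk,i,j,k->", J, y, y, y)

print(np.mean(vals_x * vals_y))    # empirical covariance
print(np.dot(x, y) ** p)           # prediction <x, y>^p
```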
Once you have the Gibbs measure, the first question you ask, as a physicist, is about the free energy: the partition function $Z_n$ depends on the temperature $\beta$, and you try to understand the limit of $\frac{1}{n} \log Z_n(\beta)$. The second thing is to understand the asymptotic behavior of the Gibbs measure itself. Both of these are now understood, in the physics literature and in the math literature, so I will be fast. There is a formula for the free energy — a variational formula, the Parisi formula — which I won't describe. And there is also an understanding of the Gibbs measure, which I will describe very briefly. If you sample two points under the Gibbs measure at high temperature, when $\beta$ is small — which is not what we're interested in here — then essentially these two points behave as if you were sampling them from the uniform distribution on the sphere: if you look, for instance, at their overlap, that is their inner product, it's essentially zero. What I just described is what physicists call the replica symmetric phase. Whereas if you do the same thing at low temperature, when $\beta$ is large, you have a 1-RSB phase — one-step replica symmetry breaking — which I will describe very briefly: if you take two points under the Gibbs measure, their inner product (or, if you want, their distance) takes essentially two possible values. Either they are orthogonal, overlap 0, or they are at a definite distance which depends on $\beta$. This is coherent with the picture that the Gibbs measure is carried by clusters of points around centers which are orthogonal: two points in the same cluster are at a certain definite distance; in two different clusters, they are orthogonal. So that's, in a nutshell, what we know.

But what we are interested in here is what happens at zero temperature, $\beta = \infty$. If you want to find the minimum of this function, what you can do — as a physicist, or even as a mathematician — is take the free energy, divide it by $\beta$, and let $\beta$ go to infinity; this should converge to the minimum. And it's a fact, it's true. So that's one way to discover the minimum of this function.

Another way, which is what I was describing, is to understand the complexity of the function, directly at temperature zero, without introducing the Gibbs measure. One way to do that is to count critical points. Let $\mathrm{Crt}_k(u)$ be the number of critical points of my function $\phi$ with index $k$ and such that $\phi(x)$ is smaller than — because of the normalization — $\sqrt n\, u$. So you count the critical points below a given level, and here I fix the index; remember, the index is the number of negative eigenvalues of the Hessian. And you may want to understand the mean number of such critical points — it's a random number, of course. And this, as I explained, is linked through the Kac–Rice formula to a random matrix problem. It is not equal to one; it's related to a random matrix problem, which I will describe very quickly.
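In symbols, the counting function just defined reads
$$\mathrm{Crt}_k(u) \;=\; \#\big\{\, x \in S^{n-1} \;:\; \nabla \phi(x) = 0,\;\; i\big(\nabla^2 \phi(x)\big) = k,\;\; \phi(x) \leq \sqrt{n}\, u \,\big\},$$
where $i(\cdot)$ denotes the index — the number of negative eigenvalues — and the gradient and Hessian are taken on the sphere.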
The expectation of this count is given by the Kac–Rice formula: it is the integral over the sphere of the expectation of the absolute value of the determinant of the Hessian of $\phi$, times the indicator that the index at the point $x$ equals $k$, conditioned on $x$ being critical and on the value of $\phi$ being $v$; you integrate against the density of $(\nabla\phi(x), \phi(x))$ at $(0, v)$, and you integrate $v$ from $-\infty$ up to the level:
$$\mathbb{E}\,\mathrm{Crt}_k(u) = \int_{S^{n-1}} \int_{-\infty}^{\sqrt n\, u} \mathbb{E}\Big[\, \big|\det \nabla^2 \phi(x)\big|\; \mathbf{1}\big\{i(\nabla^2\phi(x)) = k\big\} \;\Big|\; \nabla\phi(x) = 0,\ \phi(x) = v \,\Big]\; g_x(0, v)\, dv\, dx.$$
That's the formula. The important thing is that you have the Hessian and its absolute value. The link with random matrices is that, as is easy to check, the law of this Hessian, conditioned on the point being critical and on the value of the function, is the law of a GOE in dimension $n - 1$, shifted by a constant (which depends on $p$ and $v$) times the identity. That's the real link; that's what the Kac–Rice formula is good for: it is a bridge between a random function problem and a random matrix problem. Obviously the Hessian is a random matrix — a random Gaussian symmetric real matrix — and it happens that in this context, because the field is isotropic (as you can see from the covariance formula, which depends only on the inner product), this Hessian is in fact a GOE.

So if you apply this basic fact, the counting problem becomes a random matrix problem. And what do you have to understand? At the core of this formula sits the expectation of the absolute value of the determinant of a GOE minus a constant times the identity; that is, you have to understand the absolute value of the characteristic polynomial of a GOE in some detail. This has been done. This work was started in a paper by Antonio Auffinger, myself, and Jiří Černý, and then there were many other papers — in particular, recent papers by Eliran Subag, and by Subag and Zeitouni, which push this work much further.

Let me tell you what we learned. We learned that this function is exponentially complex: the number of critical points — I will concentrate on the minima here — is exponentially large. From there, you find that the limit
$$\lim_{n\to\infty} \frac{1}{n} \log \mathbb{E}\,\mathrm{Crt}_0(u)$$
exists — index 0 means local minima — and is a certain function $\Theta_0(u)$, which you can compute explicitly in terms of the large deviation principle that I mentioned before. This function has a characteristic shape; in particular, it is positive somewhere, which indicates that there should be an exponentially large number of local minima. And there are: in fact, Subag improved this by proving that the second moment behaves like twice this at the exponential scale, so he was able to prove that this number of critical points, normalized by its mean, essentially goes to 1. The mean gives you the right order of magnitude.

So let me summarize what we know when there is no spike. This function is very complex: it has an enormous, exponentially large number of local minima, which are all very low. So if you try to find the minimum — remember, we had to find the maximum, but everybody can put a minus sign and it's the same problem — you will be blocked by these local minima with any reasonable algorithm. Now the question is, after these two simple cases: what happens when you really have a spike?
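Before moving to the spiked case, here is a toy numerical illustration of that last point (entirely my own sketch, with arbitrary parameters): plain projected gradient descent on the pure-noise landscape, restarted from independent random points, keeps landing on local minima of roughly the same depth, consistent with the picture of exponentially many local minima, all very low.

```python
import numpy as np

# Projected gradient descent on the pure-noise landscape (alpha = 0).
rng = np.random.default_rng(3)
n = 60

Z = rng.standard_normal((n, n, n))
# Symmetrize once: <Z, u^{(x)3}> only sees the symmetric part of Z.
Zs = sum(Z.transpose(s) for s in
         [(0, 1, 2), (1, 2, 0), (2, 0, 1), (0, 2, 1), (1, 0, 2), (2, 1, 0)]) / 6

def descend(steps=2000, lr=0.01):
    u = rng.standard_normal(n); u /= np.linalg.norm(u)
    for _ in range(steps):
        grad = 3 * np.einsum("ijk,j,k->i", Zs, u, u)  # Euclidean gradient of phi
        u = u - lr * grad
        u /= np.linalg.norm(u)                        # retract to the sphere
    return np.einsum("ijk,i,j,k->", Zs, u, u, u)      # value of phi reached

print([round(descend(), 3) for _ in range(5)])  # five restarts, similar depths
```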
So now $\alpha$ is non-zero and $p$ is not 2; we have to mix these two things. The result that we have — this is a theorem due to the four authors I mentioned — is that, in fact, we can compute the limit of $\frac{1}{n} \log$ of the mean number of critical points of index $k$, and we can do something very precise. So let me try to draw a sphere in dimension $n$, which is not easy. Here is my north pole — the thing I want to find — and this is the equator. Here I will fix an altitude, if you want, a latitude $z$; $z$ is between 0, which corresponds to the equator, and 1, which corresponds to the north pole. And I will look at the parallel at that latitude. And I will look at this.