We can see the slides. So thanks so much, Afonso, and thanks a lot to the organizers for putting together this wonderful online conference. Today I'll be talking about the overlap gap in the context of planted submatrix recovery. This is based on joint work with David Gamarnik, who is at MIT, and Aukosh Jagannath, who is at the University of Waterloo.

The context for this work lies in the accumulating evidence regarding statistical-to-computational gaps in high-dimensional inference problems. We've heard quite a few presentations in this conference on this general theme, so I'll keep my background discussion concise. The summary, essentially, is that over the past decade we have gathered evidence that in modern high-dimensional inference problems there are regimes where inference is information-theoretically possible but computationally intractable, in the sense that the optimal statistical procedure might require exponential time, while any polynomial-time algorithm might require significantly higher signal strength to succeed. There are strong reasons to believe that this phenomenon is quite widespread and appears in problems such as community detection, planted clique, sparse PCA, and so on.

Now, you might naturally wonder how one would go about making sense of this claim, and there have been very different approaches. From a computer science viewpoint, people have tried average-case reductions; we heard Matt talk about this in the last session. From a physics viewpoint, as Antoine explained in the last talk, it is natural to look at message-passing-based algorithms and study the state evolution formalism. One can also look at families of convex relaxation hierarchies and analyze their performance, and there are other approaches as well. Today I'll look at only one very specific problem of this flavor, and instead of these approaches I'll look at the entire likelihood landscape and try to provide some more evidence of computational hardness in terms of the existence of free energy barriers and so on.

Now, as such, I think this is a... Sorry to interrupt. Just to make sure, are you still on the title slide? We still see the title slide. Everything you said was very clear even with just the title slide, but just to make sure. Sorry, do you see my slide transitions? No, we just see the title slide. Oh, OK. Maybe you're not sharing the right window; sometimes this happens with Zoom. Try the preview app if Adobe PDF doesn't work. OK, do you see it now? Yes, now we see slide three. OK, and do you see the transitions? Yes, yes, we do. OK, great. Thank you so much. All right, thank you.

OK, so sorry for the inconvenience and the confusion. Today I'll talk about a very specific instance of this phenomenon, and I'll look at the performance of maximum likelihood. Let me describe the specific setup. One observes an n-by-n symmetric matrix, and most of the entries are centered Gaussians. However, hidden in this matrix there is a principal submatrix where the mean has been elevated. An alternate way to represent this matrix is as a rank-one perturbation of a Gaussian matrix.
So W here is a symmetric Gaussian matrix that has been appropriately normalized. The vector v encodes the support of the planted submatrix, so its entries are 0 and 1. Today we look at settings where the planted submatrix has size n rho by n rho, where rho is a constant independent of n. This is a very stylized setting, but it turns out to be relevant and similar in spirit to problems arising in sparse PCA and biclustering. From the perspective of computational hardness, the interesting regime turns out to be the double asymptotic limit where n tends to infinity and then rho goes to 0, because then the planted submatrix occupies a very small fraction of the ambient matrix.

Given such a setting, there are certain natural statistical questions of interest. The first and obvious one is: can I detect the presence of this submatrix? I'd like to emphasize that in this particular setting detection is easy, because one can simply test based on the sum of all the entries of the matrix. I emphasize this because you might naturally try to associate this problem with things like the low-degree likelihood ratio test and related heuristics; this problem is of a slightly different flavor, in the sense that testing, at least, is easy here. However, once you start thinking about recovery, things become more interesting: you have this submatrix, you don't know where it is, and you would like to recover its support. So the basic questions we'll think about today are: when can we recover this submatrix, and when can we recover it efficiently?

This problem has been studied in the past. To the best of my knowledge, it was initiated in a work by Deshpande and Montanari, who formulated it as a Bayesian problem: the entries of the vector v were Bernoulli, 0 or 1 with probability rho. Their criterion was recovery of the low-rank spike matrix, and they established that an AMP-based algorithm is optimal for rho bigger than some critical constant. This was followed up in a series of works by Lesieur et al., who first discovered that in this double asymptotic limit one expects a computationally hard phase in this problem.

Today I'll take a slightly different perspective and look at the behavior of the maximum likelihood estimator for this problem. If you write down the maximum likelihood in this case, it's not too hard to see that it reduces to the following problem: you optimize this quadratic form subject to these constraints. The performance metric we'll look at is something we call reliable recovery: we ask whether we recover a constant proportion of the support or not. For example, you could take this constant to be 10 percent, or 0.1, and then you want to know whether you recover that fraction of the support of the hidden submatrix. Our first result essentially identifies the information-theoretic threshold required for the MLE to recover the support reliably, and the important point to note is that the MLE requires a signal of order one over square root of rho.
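To make the setup concrete, here is a minimal numpy sketch of the planted submatrix model and of the brute-force MLE on a toy instance. The noise normalization, the parameter values, and the variable names are illustrative assumptions, not taken from the talk or the paper.

```python
# A minimal sketch of the planted submatrix model and the brute-force MLE on a
# tiny instance (parameters and normalization are illustrative assumptions).
import itertools
import numpy as np

rng = np.random.default_rng(0)

n, rho, lam = 16, 0.25, 3.0          # ambient size, support fraction, signal strength
k = int(rho * n)                      # planted support size

# Planted support vector v with exactly k ones.
v = np.zeros(n)
v[rng.choice(n, size=k, replace=False)] = 1.0

# Symmetric Gaussian noise W; the exact normalization here is an assumption.
G = rng.standard_normal((n, n))
W = (G + G.T) / np.sqrt(2.0 * n)

# Rank-one perturbation of a Gaussian matrix.
Y = (lam / n) * np.outer(v, v) + W

# Brute-force MLE: maximize the quadratic form x^T Y x over 0/1 vectors with
# exactly k ones (only feasible at toy sizes like this one).
best_x, best_val = None, -np.inf
for support in itertools.combinations(range(n), k):
    x = np.zeros(n)
    x[list(support)] = 1.0
    val = x @ Y @ x
    if val > best_val:
        best_val, best_x = val, x

# "Reliable recovery": what fraction of the planted support did we find?
overlap_fraction = (best_x @ v) / k
print(f"MLE value = {best_val:.3f}, fraction of support recovered = {overlap_fraction:.2f}")
```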
Now, on the other hand, you can analyze very simple algorithms, like a naive spectral one where you just look at the top eigenvector of the matrix and then round its entries. This is a very simple algorithm, and we can establish that if lambda is much bigger than one over rho, then it also works. Of course, you might say: maybe your analysis is loose, maybe one can do something more clever and achieve reliable support recovery with a feasible algorithm closer to the threshold of one over square root of rho. My main thesis is that I'd like to convince you that this might not be possible, and to explain why we believe it would be hard to achieve the MLE threshold.

So what happens for intermediate signal strengths? To understand this better, we introduce a restricted likelihood function. What it does, essentially, is maximize the likelihood subject to an additional constraint: we constrain the overlap with the truth to be around some constant value q. Recall that v is the planted vector, and we constrain the overlap with v to a given value.

Oh, sorry. Can you see this again? Yes, we can see the slide again. Okay, great, sorry.

One of our main technical contributions is that we can identify a deterministic limit for this restricted likelihood function for every value of q. I won't go into the details of the formula, but it is a deterministic formula that we can write down. One point I'd like to emphasize is that this restricted likelihood function is not something you can compute in practice given the data; it is not an estimator, but rather a proof device for understanding the complexity of the problem.

Given this deterministic formula, we can analyze its behavior as rho goes to zero. Let's see how the problem behaves. In the special case where lambda is zero, there is no signal; the problem behaves like pure noise, typical configurations have overlap rho squared with the truth, and the restricted likelihood is unimodal with a maximum at rho squared. When lambda is very large, reliable recovery happens, in that the global maximum occurs at some constant times rho, and we still expect this unimodal picture to persist; it's just that the mode shifts. The interesting regime is the intermediate signal regime, where we can rigorously establish the existence of non-monotonic behavior. Here the global maximum is still at some constant times rho, but there is another local maximizer just past rho squared; it's not exactly at rho squared, but it's close to it. This picture is very evocative, because it says that if you start a local algorithm at this point and try to improve, you naturally expect to get stuck at this first local optimum, and it will be hard to reach the global maximum.
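Here is a minimal sketch of the restricted likelihood idea on the same kind of toy instance as before: for each overlap value with the planted support, record the best value of the quadratic form among supports with exactly that overlap. This is only an illustration of the proof device; the unimodal versus non-monotonic picture described in the talk is an asymptotic statement (n to infinity, then rho to 0) that a tiny instance will not reproduce, and the raw-count overlap used here is an assumption.

```python
# A minimal sketch of the restricted likelihood as a function of overlap with the
# planted support, computed by brute force on a toy instance (illustration only).
import itertools
import numpy as np

rng = np.random.default_rng(1)
n, rho, lam = 16, 0.25, 2.0
k = int(rho * n)

v = np.zeros(n)
v[rng.choice(n, size=k, replace=False)] = 1.0

G = rng.standard_normal((n, n))
Y = (lam / n) * np.outer(v, v) + (G + G.T) / np.sqrt(2.0 * n)

# restricted[q] = max of x^T Y x over size-k supports sharing exactly q indices with v.
restricted = {q: -np.inf for q in range(k + 1)}
for support in itertools.combinations(range(n), k):
    x = np.zeros(n)
    x[list(support)] = 1.0
    q = int(x @ v)
    restricted[q] = max(restricted[q], x @ Y @ x)

for q, val in restricted.items():
    print(f"overlap {q}/{k}: restricted maximum = {val:.3f}")
```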
We refer to this as a version of the overlap gap, simply because if you look at the set of configurations whose likelihood is above a certain level, their overlaps with the truth lie in two disjoint sets. In some sense this is evocative of the overlap gap property that has been studied in the context of unplanted problems. Formally, as a consequence of this non-monotonic structure, we can establish barriers for local Markov-chain-type algorithms. For example, if you want to maximize the likelihood, a natural algorithm is to run, say, Glauber dynamics for a Gibbs distribution at sufficiently large inverse temperature beta. The basic punchline is that if you run a local Markov chain of this form, starting from a random initialization, it typically requires exponential time to escape the local maximizer and reach the global one.

Maybe just one word on the proof. Our main technical contribution is deriving the deterministic formula for the restricted likelihood function and analyzing it in the regime where rho goes to zero; there we use recent developments in the analysis of Parisi-type formulas for mean-field spin glasses. Our results on the recovery thresholds for the MLE are more straightforward and are based on first and second moment method arguments.

Following our result, there have been subsequent developments. There is a set of results by Barbier and Macris, and by Barbier, Macris, and Rush, who analyze the mutual information in this problem in the regime where rho goes to zero with n. More relevant to our work is a very recent set of results due to Ben Arous, Wein, and Zadik, who analyze this non-monotonic behavior in the regime where the size of the planted submatrix is sublinear in the size of the ambient matrix.

So thanks so much for your attention. Let me stop there. Thank you. Thank you.
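To make the local dynamics discussed above concrete, here is a minimal sketch of a Metropolis-style chain on size-k supports targeting a Gibbs measure proportional to exp(beta * x^T Y x), started from a random initialization. The single-swap proposal, the parameter values, and the stopping rule are illustrative assumptions; this is not the exact dynamics analyzed in the talk.

```python
# A minimal sketch of a local Markov chain on size-k supports: at each step, swap
# one index out of the current support and one index in, and accept with the
# Metropolis rule for the Gibbs measure exp(beta * x^T Y x).
import numpy as np

def local_chain(Y, k, beta=5.0, steps=20_000, seed=0):
    rng = np.random.default_rng(seed)
    n = Y.shape[0]
    # Random initialization: a uniformly random size-k support.
    support = set(rng.choice(n, size=k, replace=False))
    x = np.zeros(n)
    x[list(support)] = 1.0
    energy = x @ Y @ x
    for _ in range(steps):
        i = rng.choice(list(support))                  # index proposed to drop
        j = rng.choice(list(set(range(n)) - support))  # index proposed to add
        x_new = x.copy()
        x_new[i], x_new[j] = 0.0, 1.0
        e_new = x_new @ Y @ x_new
        # Metropolis acceptance at inverse temperature beta.
        if np.log(rng.random()) < beta * (e_new - energy):
            support.remove(i)
            support.add(j)
            x, energy = x_new, e_new
    return x, energy

# Usage on a toy instance like the ones sketched earlier (Y, v, k already defined):
# x_hat, val = local_chain(Y, k)
# print("fraction of support recovered:", (x_hat @ v) / k)
```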