Thanks, Yuri, for the introduction. All right, can everyone see my slides? Yeah, maybe there's a way to increase the size, I don't know if it's possible. I see it's a bit smaller. Oh, OK. Right now it's in full screen, I'm not sure if it's fine. Are you on an iPad? No, I'm on my computer. OK. Sometimes you need to rotate it, but yeah, OK. I guess, Jean, I think we can still read it, so it's OK. OK, cool.

So I'll be presenting joint work with my advisor, Guy Bresler. The plan in this talk is to outline some recent progress on reductions among statistical problems. In particular, I'm going to try to illustrate the types of techniques that arise in these reductions by looking at one reduction in detail, which shows tight lower bounds for robust sparse mean estimation.

Statistical-computational gaps, or algorithmic-information gaps, have appeared in many talks so far in this program. But just to recap: formally, a statistical-computational gap occurs when the sample complexity, or the level of signal, needed by polynomial-time algorithms to solve a problem is much higher than what unbounded-time algorithms need. These gaps typically yield two distinct phase transitions. If the number of samples is below a certain level n_stat, then the problem is information-theoretically impossible. If the number of samples is above n_stat but below another threshold n_comp, then inefficient algorithms begin to work, but no efficient algorithms are known to work. And if the number of samples is above n_comp, then the problem is easy and efficient algorithms begin to work.

The canonical example of a problem with a statistical-computational gap is sparse PCA. In this task there is a classical k to k^2 gap between n_stat and n_comp.

There are several different approaches to providing evidence for statistical-computational gaps. The first is to give rigorous evidence that conjecturally optimal classes of efficient algorithms, such as AMP, the sum-of-squares hierarchy, local search algorithms, and statistical query algorithms, fail below the conjectured computational threshold. The second approach is the traditional complexity-theoretic approach of giving polynomial-time reductions between problems.

One important remark to make here is that, because of certain complexity-theoretic barriers that were rigorously established in the 2000s, it's unlikely that we can base these statistical-computational gaps on NP-hardness. As a result, the reductions approach typically focuses on reducing among statistical problems, rather than trying to show that P != NP, ETH, or other worst-case assumptions imply these statistical-computational gaps.

This reductions approach was initiated by the 2013 COLT paper of Quentin Berthet, who spoke yesterday, and Philippe Rigollet, which showed that the k to k^2 gap in sparse PCA follows from the planted clique conjecture, a well-believed conjecture in average-case complexity. Since then there have been many reductions establishing tight gaps for a number of problems, including further work on sparse PCA, planted dense subgraph and submatrix problems, and a web of reductions among several problems with sparsity. However, the most common criticism of this approach is that most of these problems are similar in structure.
So specifically, if you look at the ones I just mentioned, they're all roughly of the form of a sparse submatrix signal plus an independent noise matrix, which many argue is already similar to the starting assumption of planted clique. So what we'll be focusing on in this talk is our recent web of reductions that breaks out of the sparse submatrix structure that limited previous reductions. The main insight here is that instead of starting with planted clique, we can start with mild modifications of planted clique, and this yields a bunch of new techniques for average-case reductions. In particular, the mild modifications give a little bit of flexibility, which is just enough to make certain technical ideas go through. The plan is to illustrate one of these techniques in detail through our reduction to robust sparse mean estimation. If you can see my cursor, we'll be following this line in the web: from k-partite hypergraph planted clique to k-partite bipartite planted clique (k-BPC) to imbalanced sparse Gaussian mixtures, and then to robust sparse mean estimation. But it's really this last edge that is going to be the focus of the talk.

Before diving into this reduction, I'll first review the classical planted clique problem, as well as the variant of it that will be the starting point for our reduction. In the detection formulation of planted clique, the task is to test between an Erdos-Renyi graph on n vertices and an Erdos-Renyi graph on n vertices with a uniformly-at-random planted k-clique. Even though this problem can be solved in quasi-polynomial time already when k is of order log n, all of the best-known polynomial-time algorithms require that k is at least of order root n. The planted clique conjecture predicts that this is necessary. So here you have a huge statistical-computational gap between log n and root n.

Now I'll describe the variant of the planted clique conjecture that will be the starting point for our reduction to robust sparse mean estimation. First, let's consider a natural bipartite variant of planted clique. Here the task is to test between an m-by-n bipartite graph sampled uniformly at random and the same distribution with a uniformly-at-random k_m-by-k_n complete bipartite subgraph planted in it. The natural analogue of the planted clique conjecture in this bipartite setting is that polynomial-time algorithms fail when k_m and k_n are much smaller than root m and root n, respectively. We now make one further modification to the setup by imposing a partition over the right side of the bipartite graph. This will be the key technical modification that will make some of our reduction techniques go through. So what's the difference here? Specifically, what we're going to do is choose the k_n right vertices of the bipartite planted clique instance randomly, such that there is exactly one per part of a given partition of the right vertex set [n] into k_n equally-sized parts. We remark that this hardness assumption is also implied by another variant of planted clique, a k-partite extension of the planted clique conjecture to hypergraphs. Furthermore, all of these extensions are supported by the failure of low-degree polynomial tests and statistical query algorithms. So there's evidence that these are the right thresholds in these problems, but, oops, sorry, because reduction techniques are our focus, I'm not going to get further into the details of how hard the starting assumptions are and what evidence there is for them.
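To make the setup concrete, here is a minimal sketch, not from the talk, of how one could sample the two hypotheses of the k-BPC problem just described. The function name and parameter names are illustrative, and it assumes k_n divides n.

```python
import numpy as np

def sample_kbpc(m, n, k_m, k_n, planted, rng):
    """Sample the biadjacency matrix of a bipartite planted clique instance
    with the k-partite constraint on the right vertices (k-BPC), as described
    above. Assumes k_n divides n. Parameter names are illustrative."""
    A = rng.integers(0, 2, size=(m, n))              # H0: uniformly random bipartite graph
    if planted:                                      # H1: plant a k_m x k_n complete subgraph
        rows = rng.choice(m, size=k_m, replace=False)
        part_size = n // k_n
        # choose exactly one right clique vertex per part of the partition of [n]
        cols = [p * part_size + rng.integers(0, part_size) for p in range(k_n)]
        A[np.ix_(rows, cols)] = 1
    return A

rng = np.random.default_rng(1)
A0 = sample_kbpc(m=64, n=64, k_m=8, k_n=8, planted=False, rng=rng)   # H0 instance
A1 = sample_kbpc(m=64, n=64, k_m=8, k_n=8, planted=True, rng=rng)    # H1 instance
```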
But the important takeaway right now is that this k-BPC conjecture is our starting point for the reduction. So now I'll introduce the example we'll be looking at for the rest of the talk. In ordinary sparse mean estimation, the task is to estimate a k-sparse vector mu in d dimensions within L2 error gamma, given access to n i.i.d. Gaussian perturbations of mu. This is a gapless problem: efficient algorithms can achieve the optimal sample complexity of k log d over gamma^2. However, if an adversary is allowed to corrupt an epsilon fraction of these samples, then information-theoretically the best estimation error you can hope to achieve is now of order epsilon. And, more importantly, a gap begins to emerge: there's a gap between n_stat and n_comp of this factor of k, very similar-looking rates to the sparse PCA setup. Furthermore, this upper bound can be achieved through convex programming.

The theorem we're going to show is that the k-BPC conjecture implies that estimating within L2 error gamma requires that n_comp is at least k^2 epsilon^2 over gamma^4. If you just plug in the optimal error, gamma of order epsilon, then you get exactly this k^2 over epsilon^2; here we'll be ignoring log factors. More formally, any polynomial-time algorithm for robust sparse mean estimation, or RSME, that outputs an estimate within L2 error gamma with probability at least two thirds requires this higher sample complexity than the information-theoretic barrier.

So what's our proof plan? To prove this theorem, we're going to carry out a polynomial-time reduction in total variation. More specifically, we'll construct a reduction that takes the hypotheses H0 and H1 of k-BPC to samples from two distributions, crucially without knowing which of H0 or H1 the input is distributed according to. Specifically, our reduction will map H0 of k-BPC to n isotropic d-dimensional Gaussian samples, and it will map H1 of k-BPC to within o(1) total variation of n i.i.d. samples from the following mixture. So again, this is a reduction in total variation: we only have to map to within o(1) total variation of each of these two hypotheses.

So what's this mixture shown in point two? This mixture has two important properties. First, the bulk of the n samples from the mixture will be from the first component shown here, and by concentration, less than an epsilon fraction of the samples you'd observe from this mixture will be from the second component. Therefore an RSME adversary can produce this mixture within o(1) total variation. The second point is that if you had access to a black box which could estimate the unperturbed mean vector, which in this case is 2 gamma mu, within L2 error gamma, then you could distinguish between these two hypotheses H0 and H1. So what that would mean is that if we had access to such a reduction, and also simultaneously access to a black box solving RSME, we could apply the reduction to an instance of k-BPC, then apply our estimation algorithm, and that would yield a way to detect between H0 and H1, and hence we would have contradicted the k-BPC conjecture. So now the question becomes: how do we devise a reduction with these properties?
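As a minimal schematic of that last black-box argument, not from the talk: both `reduction` and `rsme_estimator` below are hypothetical black boxes standing in for the reduction described above and for a robust sparse mean estimation solver, and the test assumes mu is normalized so that the planted mean 2*gamma*mu has norm 2*gamma.

```python
import numpy as np

def detect_kbpc(kbpc_instance, reduction, rsme_estimator, gamma):
    """Schematic: combine the reduction with an RSME black box to get a test
    for k-BPC. `reduction` and `rsme_estimator` are hypothetical black boxes."""
    samples = reduction(kbpc_instance)    # n x d array of samples, per the reduction above
    mu_hat = rsme_estimator(samples)      # estimate of the uncorrupted mean, L2 error <= gamma
    # Under H0 the true mean is 0, so ||mu_hat|| <= gamma; under H1 the
    # uncorrupted mean is 2*gamma*mu (mu assumed unit norm), so ||mu_hat|| >= gamma.
    # Thresholding at gamma therefore separates the two hypotheses.
    return "H1" if np.linalg.norm(mu_hat) > gamma else "H0"
```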
So here's the rough plan of our reduction. We're first going to introduce a technique that we call dense Bernoulli rotations. Then we'll apply this technique locally to subvectors of the k-BPC adjacency matrix. And in doing so, there's going to be a choice that we'll have: we'll have the opportunity to choose a parameter matrix called A, and it will turn out that choosing this matrix A is a very brittle task. We'll have to do so carefully according to several precise criteria that I'll outline later, and choosing that matrix will be one of the key components of the proof.

All right, before diving into the details, I'm going to make some simplifying notational conventions for total variation. Whenever two distributions are epsilon-close in total variation, we'll write P is approximately_epsilon Q. And whenever an algorithm maps to within small total variation of Q, we'll use this notation with the arrow. Throughout, I'll implicitly be using the data processing inequality and the triangle inequality, which say that if you apply two steps of an algorithm the TV errors accumulate, and I'll be applying union bounds over product measures.

The first primitive I'll introduce is rejection kernels. This is a key reduction primitive in dense Bernoulli rotations and throughout all of our reductions. It's a general framework for efficient changes of measure in reductions; here we'll just specialize to the simple Gaussian case, which is all we'll need for our reduction. The goal in Gaussian rejection kernels is to simultaneously map the bit 1 to N(mu, 1) and a Bernoulli(1/2) bit to N(0, 1), both within small total variation distance. Here this o(1) term, something like n^(-3), is just small enough that we'll be able to apply union bounds; for now, just think of it as something very small. Also, the mean mu is going to be close to 1-ish, within polylogarithmic factors of a constant.

The main idea is very simple. Let phi_mu be the PDF of N(mu, 1). Then if you see the bit 1, sample from phi_mu, and if you see the input 0, sample from the density 2*phi_0 minus phi_mu. Now this satisfies both properties exactly, but there's a catch: this is impossible. The issue is very simple: it's just that the second thing is not a valid PDF, because it can be negative. But this is not too much of an issue, because we can just truncate the support of this function so that it is a valid PDF. This will capture the bulk of both distributions, and hence we'll get our total variation properties, if mu is at most 1 over log n. So how do we actually sample from this PDF? Well, there's a simple way to implement this procedure with rejection sampling.

So, Matt, sorry, you have about one minute left. Oh, okay, all right, so I'll go quickly.

All right, so I'll just outline this. Dense Bernoulli rotations are a general technique for transforming a vector of n i.i.d. Bernoulli(1/2) bits, with an unknown bit fixed to 1, into an approximate sample from a Gaussian with mean tau A_i, where i is the index of the unknown bit that's been fixed to 1. Here the vectors A_i, the columns of a matrix A, are things that we can choose, as long as A's maximum singular value is at most 1; tau is a scaling factor that's at most 1 over log n. I'm not going to go through the details of how to implement this, but it's basically tricks with Gaussians; a sketch of these two primitives follows below.
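The talk skips the implementation details of these two primitives, so the following is a minimal sketch, not from the talk, of one plausible way to implement them. The rejection-sampling step follows the truncated-density idea described above, while the correlated-Gaussian-noise step standing in for the "tricks with Gaussians" is an assumption; it requires tau <= mu and maximum singular value of A at most 1 so that the added noise covariance is positive semidefinite.

```python
import numpy as np
from scipy.stats import norm

def gaussian_rejection_kernel(bit, mu, rng):
    # Map the bit 1 to an exact sample from N(mu, 1); map a Bernoulli(1/2) bit
    # to (approximately) N(0, 1) by sampling the density proportional to
    # max(2*phi_0 - phi_mu, 0) via rejection sampling with proposal N(0, 1).
    if bit == 1:
        return rng.normal(mu, 1.0)
    while True:
        x = rng.normal(0.0, 1.0)
        if rng.uniform() < max(0.0, 1.0 - norm.pdf(x, loc=mu) / (2.0 * norm.pdf(x))):
            return x

def dense_bernoulli_rotation(v, A, mu, tau, rng):
    # v: a 0/1 vector of i.i.d. Bernoulli(1/2) bits, except that one unknown
    # coordinate i may be fixed to 1. Output: approximately N(tau * A[:, i], I_m)
    # if a coordinate is planted, and approximately N(0, I_m) otherwise.
    # Assumes tau <= mu and maximum singular value of A at most 1, so that the
    # covariance of the Gaussian noise added below is positive semidefinite.
    m, n = A.shape
    x = np.array([gaussian_rejection_kernel(b, mu, rng) for b in v])  # ~ N(mu*e_i, I_n)
    c = tau / mu
    cov = np.eye(m) - (c ** 2) * (A @ A.T)
    return c * (A @ x) + rng.multivariate_normal(np.zeros(m), cov)
```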
The plan is that we're going to apply these Bernoulli rotations to rows within blocks of the adjacency matrix of the k-BPC instance. So this is our k-BPC instance. And what we'll do is take individual rows within blocks corresponding to the parts of the partition, and then apply Bernoulli rotations to them to Gaussianize them. Then what happens is we apply this everywhere, and we end up with this Gaussianized matrix. The key is now to choose these vectors A_1 through A_n.

Basically, what this all comes down to is that we can reduce mapping to robust sparse mean estimation to choosing a matrix A with three properties. We want this A to have zero-sum rows. We want it to contain only two values, roughly a (1 minus epsilon)/2 fraction of which are the first value x. And we want the maximum singular value of this matrix to be bounded. It turns out that if you just work out all the details of this reduction and do the calculations, this exactly yields the conjectured barrier. So if we happen to have a matrix with these three properties, we'd be done.

The trick now is to pick a prime r such that 1 over r is roughly epsilon over 2. Now consider the vector space F_r^t, where r is this prime and t is some fixed integer, and look at all the points in this vector space. Now also look at all affine shifts of hyperplanes in the space, and construct A to be a weighted incidence matrix between these points and these affine shifts of hyperplanes. You can verify that this is a kind of generalization of Hadamard matrices that has all of the properties we want (a small sketch of this construction is included below), and this ends up showing tight lower bounds via our reduction.

So this is just an example of one of our reduction techniques for mapping to problems with very different hidden structures. Overall there are many open problems about reduction techniques, and we hope that there will be many more reductions in these directions in the future. Thank you.
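A minimal sketch, not from the talk, of one possible weighted incidence matrix between affine hyperplanes and points of F_r^t, as described above. The particular choice of the two entry values and of the normalization is an assumption; the exact weighting in the actual reduction may differ. The checks at the end verify zero-sum rows, exactly two distinct entry values, and maximum singular value 1.

```python
import itertools
import numpy as np

def hyperplane_incidence_matrix(r, t):
    """Rows indexed by affine hyperplanes {x : a.x = b} in F_r^t, columns by
    points of F_r^t; entries take the value r-1 on incident points and -1
    otherwise, so every row sums to zero. Rescaled so the top singular value
    is 1. One plausible instantiation of the construction described above."""
    points = list(itertools.product(range(r), repeat=t))
    # One representative a per direction: leading nonzero coordinate equal to 1.
    directions = [a for a in points
                  if any(a) and a[next(i for i, v in enumerate(a) if v)] == 1]
    rows = []
    for a in directions:
        for b in range(r):
            on = np.array([1 if sum(ai * xi for ai, xi in zip(a, x)) % r == b else 0
                           for x in points])
            rows.append((r - 1) * on - (1 - on))     # values r-1 (on) and -1 (off)
    A = np.array(rows, dtype=float)
    return A / r ** ((t + 1) / 2)                    # rescale so max singular value is 1

A = hyperplane_incidence_matrix(r=3, t=2)
print(np.allclose(A.sum(axis=1), 0))                        # zero-sum rows
print(sorted(set(np.round(A.flatten(), 6))))                 # exactly two distinct values
print(np.round(np.linalg.svd(A, compute_uv=False)[0], 6))    # max singular value = 1
```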