I'm going to talk about spectral universality in high-dimensional estimation. I'm currently at Harvard, but I'll be joining the faculty at Madison very soon. One thing I'll say is that my slides are probably going to be messed up, because I should have converted them to PDF before transferring, but I didn't. So let's see.

Let me begin by introducing a canonical estimation problem. A very simple task is to fit a generalized linear model to a data set, or alternatively, to recover an unknown signal from noisy nonlinear measurements. One way of formalizing this problem is that there is an unknown signal vector beta star, a p-dimensional vector that we would like to estimate, and we observe a measurement vector y, which is an n-dimensional vector. We model the relationship between y and beta star via a generalized linear model, which is composed of two steps. First, you transform the signal linearly, and this linear transformation is given by a design matrix X. Depending on which area you work in, you might also call it the feature matrix, the measurement matrix, or the sensing matrix. Then there is a nonlinearity g, which acts entry-wise on the linear transformation of the signal. And you might want some noise in the measurement process, so there's an epsilon there. Throughout this talk, I'll use n to denote the sample size, or the number of measurements, and p to denote the dimension of the signal we would like to estimate.

This talk is going to be about high-dimensional asymptotics, also called precise asymptotics or proportional asymptotics. Here the goal is to study this problem when we let both the number of measurements n and the dimension p go to infinity together, such that their ratio converges to a fixed constant delta. A lot of people work in this regime, and I took a shortcut and just put a reference to a review article. The reason we study these problems in this asymptotic regime is that here we can precisely characterize the error of our estimators, including the constants. In many situations, we can come up with many estimators that all achieve the correct order of magnitude of the error bound. The promise of proportional asymptotics is that by analyzing problems in this regime, you can pin down the exact constants and compare estimators that are otherwise equivalent in terms of the order of magnitude of their error bounds. If you read papers in this area, you will often see figures like this one, where people plot the theorems they prove (the green curves) against empirical simulations (the black dots). The point of such figures is that in this asymptotic you're not getting coarse performance bounds; you're describing the exact performance of the estimator, including the sharp constants.

But one of the issues is that when we do the analysis in this asymptotic regime, we typically make strong assumptions on the design matrix X. Popular assumptions are that the design matrix is Gaussian, has IID entries, or is rotationally invariant. And the simplest assumption you can make is that the design matrix has IID Gaussian entries.
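For concreteness, here is a minimal simulation sketch of this measurement model in the proportional regime. The Gaussian design, the choice g = tanh, the noise level, and the normalizations are all illustrative assumptions, not choices from the talk.

```python
# A minimal sketch of the GLM measurement model in the proportional regime.
# The Gaussian design, g = tanh, and the noise level are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)

p = 500                 # signal dimension
delta = 2.0             # fixed sampling ratio n / p
n = int(delta * p)      # number of measurements

beta_star = rng.normal(size=p)                 # unknown signal
X = rng.normal(size=(n, p)) / np.sqrt(p)       # design matrix (Gaussian here)
eps = 0.1 * rng.normal(size=n)                 # measurement noise

y = np.tanh(X @ beta_star) + eps               # y = g(X beta*) + eps, g entry-wise
```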
So this talk is going to be about how we can do proportional asymptotics without making these strong distributional assumptions. The motivation for this talk comes from signal processing applications, so I'll tell you what design matrices look like there. In compressed sensing, which is the mathematical problem underlying MRI machines, it turns out that the design matrix is constructed by taking a big DFT matrix, a discrete Fourier transform matrix, and randomly picking some of its rows. And in imaging applications like X-ray crystallography, the design matrix has structure: it consists of several blocks, and each block is a Fourier matrix multiplied by a diagonal matrix. I'll talk about these matrices again later in the talk, so it's not important right now how they show up. But what I want to tell you is that these design matrices are very far from Gaussian designs. So this talk is going to be about how we can do proportional asymptotics when our design matrix is non-Gaussian, has strong dependence, and has very little randomness in it.

OK, so the reason there is any hope of proving results for these designs is an empirical phenomenon we call spectral universality (we gave it that name because it didn't have one). To explain this phenomenon, keep in mind the problem of fitting a generalized linear model to your data. If someone gives you a design matrix, you can specify it by its SVD: the U's are the left singular vectors, sigma is a diagonal matrix of the singular values, and the V's are the right singular vectors. What people have often observed is that the statistical properties of inference problems often depend only on the singular values of the design matrix X. The singular vectors themselves are not that important, provided they are not exceptional. And by statistical properties, I just mean the performance of your favorite algorithm for solving this problem. There's a long line of work that has observed this in simulations.

Why is this interesting? Because it gives us a way to simplify our problem. Maybe we care about an actual design X for which we don't know how to do proportional asymptotics. This design might be deterministic; it might have strong dependence between its entries. But if you believe in this spectral universality phenomenon, you can construct a surrogate design that captures the behavior of the design you care about. The surrogate design is constructed by preserving the spectrum of the original design: X and the surrogate share the same singular values, so the sigma (in magenta) is the same for the two designs. But in constructing the surrogate, you sample the left and right singular vectors as uniformly random orthogonal matrices. The reason to do so is that this is the most mathematically convenient distribution for the singular vectors: if the singular vectors don't matter, you might as well assume them to be whatever is easiest to work with.
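Continuing the sketch above, here is one way the surrogate construction might look in code; `ortho_group` samples Haar-distributed (uniformly random) orthogonal matrices.

```python
# Sketch: build a surrogate design that keeps the singular values of X but
# replaces the singular vectors with Haar (uniformly random) orthogonal ones.
from scipy.stats import ortho_group

U, s, Vt = np.linalg.svd(X, full_matrices=False)       # SVD of the actual design

r = len(s)
U_haar = ortho_group.rvs(n, random_state=rng)[:, :r]   # random left singular vectors
V_haar = ortho_group.rvs(p, random_state=rng)[:, :r]   # random right singular vectors

X_surrogate = U_haar @ np.diag(s) @ V_haar.T           # same spectrum, generic vectors
```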
And once you make this assumption, it often turns out to be possible to do high-dimensional asymptotics with the surrogate design, because of the rotational invariance properties of the uniform distribution on the orthogonal group. What is surprising is that even though the theorems are proved for the surrogate design, they often describe the behavior of the actual design you started with. This is what we call spectral universality. I'll give an example of this, but let me pause and see if there are any questions about the basic idea.

The reason I call this a heuristic is this phrase in it: the singular vectors are not important, provided they are not exceptional. So there's a requirement that the singular vectors are generic, but I'm not really telling you what that means. As stated, you don't know when this is going to work and when it is not. That's why I call it a heuristic.

So here's the outline of my talk. I've already told you what the phenomenon is that we want to understand, which is spectral universality. In the second part of the talk, the goal is to give you an example of why this is an important thing to understand and what you can do by exploiting this phenomenon. And the last part of the talk will be about bringing this heuristic principle into a formal mathematical framework.

In the second part, I'll explain an application of this empirically observed phenomenon in the context of a problem called phase retrieval, which we already saw in Marco's talk. The philosophy is that in this problem, the design matrices are quite hard to understand directly. So rather than trying to understand them, we'll construct surrogate designs using the spectral universality heuristic. The hope is that the surrogate design and the design we care about lie in the same universality class; for the purposes of the second part of the talk, we'll settle for checking this in simulations. Then whatever insights we get from analyzing the easier problem, the surrogate designs, should transfer to the designs we really care about. That's the rough overview of the second part.

So let me introduce the phase retrieval problem. It arises in imaging applications where you want to figure out the structure of a molecule. You place a sample of the molecule in front of an X-ray source, and as the X-rays pass through the molecule, they diffract; you then capture a diffraction pattern on your sensor. The diffraction pattern has both magnitude and phase information, but because of physical limitations, you can only measure its magnitude. You can map this problem onto the problem of fitting a generalized linear model: the measurements are the diffraction patterns you observe on your sensor.
The beta star, the unknown signal, represents the structure of the molecule you want to infer, and the physics of the problem is encoded in the design matrix X. Because you only observe the intensity of the diffraction pattern, you lose the phase information: the measurements are the squared absolute values of the linear measurements, where the squared absolute value acts entry-wise on its argument. What we would like to do is construct an estimator that recovers the unknown signal beta star from the things we observe: the measurements and the design matrix, which we know because we understand the physics of the problem. Once we have an estimator, we assess how well it's doing by measuring the cosine similarity between the signal and the estimator. If this is close to 1, we're doing great; if it's close to 0, we're doing pretty badly.

So what does the design matrix look like in this problem? I mentioned it briefly, but let me explain it again and tell you how it shows up. The design matrix consists of several blocks, and each block is a Fourier matrix multiplied by a diagonal matrix. Why does the Fourier matrix show up? Because of the physics of the problem: the diffraction pattern captured on the sensor is the Fourier transform of the electron density function of the signal. And why do the random diagonal matrices show up? One of the proposals for this problem, by Candès et al., is that to capture redundant information about your signal, you repeat the experiment several times, and each time you illuminate the unknown molecule in a different way. These diagonal matrices represent masks used to randomly illuminate the molecule to capture more information. Again, the main parameter for this problem is the sampling ratio n over p: the number of measurements, which is the number of rows in the design matrix, divided by p, the dimension of the unknown signal. Is the setup OK?

There are many different estimators for this problem, and one popular class of estimators reduces the problem to running PCA. The idea is to estimate the unknown signal beta star by computing the leading eigenvector of a reweighted covariance matrix of the design. This M_tau is just the covariance matrix corresponding to your design, except that you have weights constructed by transforming the measurements y_i. So you're weighting the rows of your design matrix by the corresponding measurements, but you don't use the measurements as they are; you transform them first. This transformation is something you should choose to optimize, to get the best possible performance. It's a user-defined tuning parameter, and you should try to figure out a tau that gives the best reconstruction. The nice thing about this estimator is that it's really simple: you're just running PCA on a weighted covariance matrix, so it's really simple to compute.
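Continuing the simulation sketch from earlier, here is a minimal version of this estimator. The specific trimming function `tau_trim` below is just one illustrative choice, not the optimal one discussed later.

```python
# Sketch of the spectral estimator: top eigenvector of the reweighted
# covariance M_tau = (1/n) sum_i tau(y_i) x_i x_i^T.
from scipy.sparse.linalg import eigsh

def spectral_estimate(X, y, tau):
    n, _ = X.shape
    M = (X.T * tau(y)) @ X / n          # weighted covariance matrix
    _, v = eigsh(M, k=1, which='LA')    # eigenvector of the largest eigenvalue
    return v[:, 0]

y_pr = np.abs(X @ beta_star) ** 2                 # phase retrieval measurements
tau_trim = lambda y: y * (y <= 3 * y.mean())      # an illustrative trimming choice
beta_hat = spectral_estimate(X, y_pr, tau_trim)

# squared cosine similarity between the estimate and the truth
cos2 = (beta_hat @ beta_star) ** 2 / (beta_hat @ beta_hat * (beta_star @ beta_star))
```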
And as we'll see, and as we already saw in Marco's talk, the spectral estimator is often optimal in some sense. Even if you want to use a more complicated estimator, like running gradient descent on some loss function for this problem, that loss function is more often than not non-convex, so you could consider initializing your algorithm with the spectral estimator.

In the plot on the right, I'm plotting the performance of the spectral estimator as I change the n over p ratio, the number of measurements relative to the dimension of the signal. I'm measuring performance by the squared cosine similarity between the estimator and the unknown signal. As you can imagine, as you get more and more measurements, the spectral estimator becomes better and better. And I'm using two different choices of tau, the trimming function used to transform the measurements before they enter the spectral estimator. As you can see, the choice of trimming function can change the performance of the spectral estimator quite a bit. So we would like to understand how to come up with good trimming functions for this problem.

Let me quickly tell you about prior work on this problem. These plots are for the masked Fourier matrices, and for those, people have done an order-of-magnitude analysis of how many samples you need for the spectral estimator to work for this very structured matrix; that's the work by Candès et al. But what's been missing is the precise high-dimensional analysis, where we let both n and p go to infinity with their ratio converging to a constant. In that line of work, people have settled for understanding the spectral estimator when the design is IID Gaussian. But IID Gaussian designs don't really capture the masked Fourier designs that matter for this problem, and the key difficulty is that the masked Fourier designs are quite structured and really hard to analyze directly.

So this is exactly the kind of problem where we should try to apply the spectral universality principle from the first part of the talk. Recall that it says only the singular values of your design should matter; the singular vectors shouldn't matter unless they are very exceptional. The actual design we care about is the masked Fourier design: very structured, very little randomness, and we don't know how to analyze it. But it turns out you can write down its SVD. So we'll consider a surrogate design, which I'll call the Haar design. It's constructed by preserving the singular values of the masked Fourier design and replacing its singular vectors with uniformly random orthogonal matrices. You preserve the singular values because the singular values are supposed to be important; the singular vectors are not, so you might as well take them to be random orthogonal matrices, which are easy to work with. The hope is that by making this strong assumption the problem becomes tractable, but because we are preserving the spectrum, we can still expect some universality.
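For concreteness, here is one way a masked Fourier design might be generated, continuing the earlier sketches; the number of masks, the mask distribution, and the normalization are illustrative assumptions.

```python
# Sketch of a masked Fourier design: L blocks, each a DFT matrix times a
# random diagonal "illumination" mask, stacked vertically (so n = L * p).
L = 3
F = np.fft.fft(np.eye(p)) / np.sqrt(p)              # unitary DFT matrix

blocks = [F * rng.choice([1, -1, 1j, -1j], size=p)  # F @ diag(mask) via broadcasting
          for _ in range(L)]
X_masked = np.vstack(blocks)

# The Haar surrogate would keep the singular values of X_masked and resample
# its singular vectors as Haar unitary/orthogonal matrices, as sketched earlier.
```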
So the results proved for the Haar design should describe what's happening for the actual design we care about. Let me tell you about a result we proved. This is work I did during my PhD with my advisor, Arian Maleki, a fellow student, Milad, and a postdoc, Junjie. We characterize the performance of spectral estimators in the proportional regime. As you can imagine, the limiting performance has a complicated formula, so I haven't tried to write it down completely. But what I've highlighted is the main feature: there is a critical threshold, delta critical, below which the accuracy of the spectral estimator converges to 0, so it's not doing anything useful. Above the critical threshold, the accuracy converges to an explicit function, which I'm calling rho; I haven't written down the formula for rho because it's complicated. Again, the result is proved in the high-dimensional limit where both n and p go to infinity with their ratio converging to a constant delta. And recall that the spectral estimator had a tuning parameter tau, the trimming function; the limiting performance depends on tau. Both the critical threshold delta crit and the limiting accuracy above the threshold are explicit functions of the trimming function tau you decided to use.

In the plot on the left, the black dots are again the performance of the spectral estimator on the masked Fourier design, the actual design for the problem we care about, and the green curves are the predictions of this theorem. Even though the theorem was proved for the Haar design, the plot shows that the same result seems to hold even for the masked Fourier design. That's an instance of spectral universality: in our analysis we preserved the singular values of the masked Fourier design, and the result we got by randomizing the singular vectors actually describes what's happening for the design we care about. Maybe I'll say one word about the proof: it involves massaging the problem into a well-studied matrix model in free probability. OK, let me pause for a second and see if there are questions.

So what can you do with this result? Because we understand the performance of spectral estimators for a wide range of trimming functions, you can use the theorem to figure out the optimal choice of trimming function. Just as in the Gaussian case, where, as I alluded to earlier, people first analyzed the problem for Gaussian designs and then used that result to derive the optimal trimming function, it turns out you can do the same for the Haar design. The optimal trimming function is a very explicit function, and I've written it down. In the figure, the two curves on the bottom correspond to the trimming functions I've been showing you, and the curve on top is the optimal trimming function you get by optimizing over tau. As you can imagine, it beats the other two trimming functions; otherwise it wouldn't be optimal. And the way this optimal trimming function is derived is by solving a variational problem.
For any tau, we understand the limiting performance, so we want to solve the variational problem of finding the best tau. It turns out this problem has some nice analytic structure underneath, so you can figure out the optimal tau explicitly. One feature of the optimal trimming function is that when delta is less than 2, even the optimal trimming function doesn't work at all: you can see its accuracy is basically zero. And actually, we know this is not specific to the spectral estimator. When delta is less than 2, nothing is going to work; the problem is information-theoretically impossible. This is why I was suggesting that, in some sense, spectral estimators are optimal. Let me see if there are any questions, and then I'll move on. Yes, that's true. And as you'll see, for certain bad beta stars, we know this universality can break. Right now we were working with real images, and this wasn't happening; but when I state a theorem about this universality, you'll see there's an assumption to counteract the situation you're referring to.

OK. In the last part of the talk, I want to tell you about a result that tries to formalize this empirical phenomenon. Just to recall, what we're trying to understand is that the statistical properties of an inference task often depend only on the singular values of the design matrix; the singular vectors are not important, provided they are not exceptional. The goal is to turn this heuristic into a mathematical principle, so that we can check when it's going to hold and when it's going to fail. And what exactly do we mean by the singular vectors being non-exceptional, or generic? I've worked on this for a while, and I'll tell you about the latest result, obtained with my postdoc advisors, Yue Lu and Subhabrata Sen. I'll also point you to concurrent work by Wang, Zhong, and Fan, which studies basically the same phenomenon.

OK, let me explain the setup for our result. We decided to study this phenomenon in the context of the linear regression problem, which is simpler than the problem I told you about. Here we assume a linear model on the measurements: the measurements are X beta star, where X is again the design matrix and beta star is the unknown signal we'd like to recover, and we corrupt the measurements with IID Gaussian noise. We would like to study the universality properties of regularized least squares estimators. These estimators minimize a cost function with two parts: the first is the sum of squared errors between the actual measurements and the fitted measurements if the unknown signal were beta, and the second is a regularization term, which promotes some a priori structure we believe the unknown signal beta star has. For example, if we believe beta star is sparse, we might choose the lasso penalty as the regularizer. But I should say that the theorem applies only if the regularizer is strongly convex, so it doesn't apply to the lasso as is; if you add a small ridge penalty to the lasso, then it applies, i.e., lasso plus ridge. OK, so is the setup OK?
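To make this estimator concrete, here is a minimal proximal gradient (ISTA-style) sketch for the lasso-plus-ridge case just mentioned, continuing the earlier code; the penalty weights, step size, and iteration count are illustrative.

```python
# Sketch: regularized least squares with an elastic-net penalty
#   minimize 0.5*||y - X b||^2 + lam1*||b||_1 + 0.5*lam2*||b||^2
# (the ridge term makes the regularizer strongly convex, as the theorem needs).
def regularized_ls(X, y, lam1=0.1, lam2=0.05, iters=500):
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 + lam2)    # 1 / Lipschitz constant
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ beta - y) + lam2 * beta      # gradient of the smooth part
        z = beta - step * grad
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam1, 0.0)  # l1 prox
    return beta
```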
So we would like to understand when spectral universality holds for the performance of these regularized least squares estimators. In precisely this setup, the spectral universality phenomenon was observed in the work of Donoho and Tanner and in subsequent work by Monajemi et al. Let me explain what they observed. They looked at three different design matrices: the spikes-and-sines design, the random DCT design, and the Haar design; I'll explain what these are. They were interested in the performance of the lasso estimator for these three designs, so they plotted the mean squared error of the lasso estimator as a function of the sparsity of the underlying signal. What they found was that the three designs behave exactly the same in simulations. And it's not just that the mean squared error was the same for the three designs: if you plotted the empirical distribution of the estimator, the distribution itself seemed to be the same.

So what are these three designs? The first is a completely structured, deterministic design: its first block is the identity matrix and its second block is the discrete cosine transform matrix. The second design is a little bit random: you take a big discrete cosine transform matrix and randomly pick some of its rows. I put an MRI image next to it because that's the design that people doing MRI care about. And the Haar design is a very random design: you first sample a uniformly random orthogonal matrix and then pick some of its rows. What the three designs have in common is that they have the same singular values, so what Donoho and Tanner were seeing was an instance of the spectral universality I told you about. Out of these three designs, there has been work analyzing the Haar design, because it has nice rotational invariance properties; a couple of papers prove results about it. But we don't know how to analyze the spikes-and-sines design or the random DCT design, because they are much less random. So if we could show that spectral universality holds here, whatever results have been proved for the Haar design would automatically transfer to the less random designs.

OK. Let me also quickly mention some prior work on universality. If you consider the noiseless linear regression problem, so there's no noise in the problem, and you look at the lasso estimator, then Donoho and Tanner actually proved some of their universality observations. This is a great result, really hard to beat in terms of the generality of the assumptions on the design. But the catch is that it holds only in the noiseless setting, which is very special. The way they prove it is by exploiting the fact that the lasso estimator is the solution of a linear program, so proving universality boils down to exploiting results from the theory of random polytopes.
But the proof breaks down as soon as you inject some noise into the problem, or if you go beyond the lasso to general regularized estimators, which cannot be computed using linear programming. The reason this isn't the end of the story is that spectral universality is a very general phenomenon, and this result doesn't capture that generality. Then there's a really long line of work, and we heard some talks about it, on Gaussian universality. These results basically say that highly random designs often behave like Gaussian designs; by highly random, I mean designs with IID entries, or with independent rows but possible correlations within each row. The nice thing about these results is that they're very broad: they apply to very general problems. But unfortunately, Gaussian universality is not valid for the very structured designs I'm talking about; these designs just don't behave like Gaussian designs, and you can even see in simulations that they don't follow Gaussian universality. And then there is a different line of work, coming from communication systems and free probability, where people have been trying to understand the eigenvalue distributions of matrices constructed by adding and multiplying matrices with generic eigenvectors. The catch there is that it's not clear how to relate the performance of, say, the lasso estimator to the spectrum of a matrix; that connection is not at all obvious.

OK, so let me tell you about the result we proved. This is an asymptotic result in high dimensions: we let p, the dimension of the signal of interest, go to infinity. I'll first give you the template of the result. The result says: suppose you have two designs, X1 and X2, which are required to lie in the same universality class. The phrase "same universality class" is a placeholder for a set of three conditions that I'll explain in later slides. Suppose the two designs satisfy these conditions, and assume that the unknown signal beta star is drawn IID from a prior; this excludes bad signals that are correlated with your design. Now compute the regularized least squares estimator on each design: beta hat 1 is the estimator computed on design 1, and beta hat 2 is the estimator computed on design 2. What the result claims is that, in high dimensions, the joint distribution of the estimator on design 1 and beta star is the same as the joint distribution of the estimator on design 2 and beta star, where beta star is the unknown signal we were trying to estimate. The sense in which this approximate equality in distribution holds is as follows: take any nice bivariate test function psi, apply it coordinate-wise to the unknown signal and the estimator computed on design 1, and separately to the unknown signal and the estimator computed on design 2, and average across coordinates; then the difference between the two averages tends to zero in high dimensions.
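Schematically, and suppressing the exact regularity conditions on the test function, the claim reads:

```latex
\frac{1}{p}\sum_{j=1}^{p}\psi\big(\beta^{\star}_{j},\,\hat{\beta}_{1,j}\big)
\;-\;
\frac{1}{p}\sum_{j=1}^{p}\psi\big(\beta^{\star}_{j},\,\hat{\beta}_{2,j}\big)
\;\longrightarrow\; 0,
\qquad n, p \to \infty,\; n/p \to \delta .
```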
And really, you should think of psi as your favorite loss function. For example, if you're interested in the MSE, the mean squared error, you take psi to be the square loss, and the result tells you that the mean squared errors under the two designs are equivalent. But there's nothing special about the mean squared error; you could pick other psi's. I won't really have time to tell you much about the proof, but for the experts: at the heart of this result is a universality result for approximate message passing. Once we have that, we can transfer the result to an iterative algorithm that computes the regularized least squares estimator. The iterative algorithm we use is a proximal method, and because proximal methods can compute arbitrarily good approximations to the regularized least squares estimator, if we show that the proximal method behaves universally, so must the regularized least squares estimator. It's an algorithmic approach of the kind we also saw in Marco's talk.

What I want to spend the remaining time on is the three conditions I haven't specified: for two designs to lie in the same universality class, they have to satisfy a set of three conditions. The first condition is quite natural: it just posits that X1 and X2 should have approximately the same singular values, which was one of the prerequisites in the spectral universality heuristic, so it's not super interesting. I'll tell you about the other two conditions: that X1 and X2 have generic singular vectors, which will help clarify what we mean by generic, and the third condition, the sign-invariance condition, which I'll discuss because I think there's a nice open problem there. How am I doing on time? OK.

So let me tell you about the condition that's supposed to capture the requirement that the design has generic singular vectors. First I'll state it mathematically, and then I'll give the intuition behind it. The condition posits that if you take the covariance matrix of the design and raise it to any power k, it should look like a scalar multiple of the identity matrix, where the approximation error is measured in the entry-wise maximum difference between the two matrices (that's what the infinity norm here means). You allow an approximation error of order 1 over root p, where p is the dimension of the signal, times a polylog p factor; it's typically hard to verify the condition with just 1 over root p, so you can think of the polylog as a slack factor that makes the condition easier to verify.

OK, so what's the intuition behind this condition? I like to think about it using a thought experiment designed to interpret what the condition is enforcing. In the thought experiment, the design has the property that if you take the covariance matrix of the design and look at its eigendecomposition, where the lambda i's are the eigenvalues and the u i's are the eigenvectors, then the lambda i's are randomly matched to the corresponding eigenvectors.
So in the thought experiment, the coupling between the eigenvectors and eigenvalues of the covariance matrix of the design is given by a random permutation. For this design, the condition is quite easy to verify: if you look at the k-th power of the covariance matrix, then, provided the eigenvectors are delocalized, simple concentration bounds show that this k-th power concentrates entry-wise around its expectation. And you can compute the expectation exactly by averaging over the random permutation; the expected value is a scalar multiple of the identity, which is exactly what the condition requires. And even though this is a thought experiment, it's not a crazy one, because many of the designs that Donoho and Tanner were looking at fit perfectly into it. When you pick the rows of a DCT matrix randomly, one way to do it is to shuffle the rows by a random permutation and then pick the first few rows; that introduces a random permutation into the problem, and you can write the covariance matrix of this randomly row-subsampled DCT exactly in this form. More generally, even when it's not literally true that the eigenvalues and eigenvectors of the covariance matrix are randomly coupled, what these conditions pick up on is an approximate decoupling of the eigenvalues and the eigenvectors. And one thing we like about these conditions is that we were able to verify them for all the examples where Monajemi et al. had observed universality, and they were quite simple to verify in all of their examples.

OK. So let me tell you about the last condition. Before I state it, let me first say that without this condition, we know universality can fail. In this plot, I'm again looking at the three designs from Donoho and Tanner, where the mean squared error of the lasso estimator was observed to be the same for all three. But what I didn't tell you is that they were sampling the unknown signal from an IID prior that was symmetric about the origin. If you repeat the experiment with a prior that is not symmetric, you start to see a breakdown of universality. This was also observed in the paper by Monajemi et al., and I have a quote from that paper. To bypass these exceptions to universality, we require that the design matrix X is constructed by taking a deterministic matrix J, which can be an arbitrary deterministic matrix, and randomly flipping the signs of its columns. We require this because without it, universality can fail. Now, this condition is not going to be satisfied by those deterministic designs as stated, and indeed those designs sometimes don't exhibit universality; but as long as you randomly sign their columns, they satisfy all the conditions we impose. And what I'd like to emphasize is that this is the only source of randomness in the design matrix, which is much less randomness than was required in prior work.
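As an illustration of both conditions, here is a small self-contained numerical sketch on a randomly row-subsampled DCT design; the power k = 2 and the dimensions are arbitrary choices, and the comparison with 1/sqrt(p) is only meaningful up to the polylog slack.

```python
# Sketch: random column signs (the sign-invariance condition) plus a numerical
# check of the genericity condition ||(X^T X)^k - c_k I||_max <~ polylog(p)/sqrt(p).
import numpy as np
from scipy.fft import dct

rng = np.random.default_rng(0)
p, n = 400, 200
C = dct(np.eye(p), axis=0, norm='ortho')       # orthogonal DCT matrix
X = C[rng.permutation(p)[:n], :]               # randomly subsample n rows
X = X * rng.choice([-1.0, 1.0], size=p)        # randomly sign the columns

def genericity_error(X, k):
    G = np.linalg.matrix_power(X.T @ X, k)     # k-th power of the covariance
    c_k = np.trace(G) / X.shape[1]             # best-matching scalar
    return np.abs(G - c_k * np.eye(X.shape[1])).max()

print(genericity_error(X, k=2), 1 / np.sqrt(p))   # same order, up to polylogs
```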
And in some situations, you can get rid of the random-sign assumption if there's some sign symmetry in the problem. I think a nice open problem here is to identify a deterministic condition that replaces the sign-invariance condition; we think of it as a placeholder for a condition that we're not able to get at right now. OK, I think I'll leave it here. Thank you.