This equation forces us to assume that ε is at least a constant times n^(-1/7). So we cannot push it all the way down to a single coordinate, but at least we can let ε be a negative power of n. And there is no reason for this particular exponent; it can certainly be improved. This restriction is an artifact of the proof. One can relax it at the price of making the proof more involved, but we strove for the easiest case and the clearest result. So how would one approach such a problem? Of course, if the entries are bounded, if the tails are light, or even if the entries have a uniformly bounded fourth moment, this assumption is satisfied. But I wanted to separate the moment-type assumption from the assumption on the entries. The tails enter the discussion in only one way, via this event: that the norm of the matrix is at most M√n. The eigenvectors have to be on the unit sphere, but the sphere may be complex. So how would one prove this? First, let's get rid of the notion of an eigenvector here. This notion is tied to eigenvalues, and if an ensemble is complicated, we have no precise information on the eigenvalues. Also, if the matrix is not normal, the eigenvectors are not orthogonal, which prevents us from using most spectral methods. So let's get rid of the eigenvectors and translate this into a different problem. Suppose that I consider a toy case where v_I is just 0. Then I have the equation (A − λ)v = 0. But this means that I can reduce this equation to only the coordinates in I^c, and it means that the submatrix (A − λ)_{I^c}, obtained by keeping the columns indexed by I^c, has a nontrivial kernel. But what is this matrix? It is a matrix with n rows and (1 − ε)n columns. It is rectangular, and if the matrix is random, it is very improbable that its kernel is nontrivial. We can exploit this fact and reduce the problem to the invertibility of rectangular matrices.
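This reduction can be sanity-checked numerically. Below is a minimal pure-Python sketch (the helper `rank`, the seed, and all sizes are mine, not from the lecture) showing that a random n × (1 − ε)n matrix has full column rank, hence a trivial kernel:

```python
import random

def rank(mat, tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in mat]
    rows, cols = len(m), len(m[0])
    r = 0
    for c in range(cols):
        if r == rows:
            break
        piv = max(range(r, rows), key=lambda i: abs(m[i][c]))
        if abs(m[piv][c]) < tol:
            continue                      # no pivot in this column
        m[r], m[piv] = m[piv], m[r]
        for i in range(r + 1, rows):
            f = m[i][c] / m[r][c]
            for j in range(c, cols):
                m[i][j] -= f * m[r][j]
        r += 1
    return r

random.seed(0)
n, eps = 20, 0.25
cols = int((1 - eps) * n)                 # n rows, (1 - eps)*n columns
A = [[random.gauss(0, 1) for _ in range(cols)] for _ in range(n)]
print(rank(A) == cols)                    # full column rank, so the kernel is trivial
```

With Gaussian entries the kernel is trivial almost surely; the quantitative question, how small the smallest singular value of such a rectangular matrix can be, is exactly what the rest of the argument is about.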
Of course, this is a toy case, but we can relax it. If we instead assume that ||v_I|| ≤ δ, then we again have 0 = (A − λ)v = (A − λ)(v_{I^c} + v_I). This means that ||(A − λ)_{I^c} v_{I^c}||_2 ≤ ||A − λ|| · ||v_I||_2. Here I can use the triangle inequality, and on the event that the norm of A is at most M√n, I can bound this by (M√n + |λ|) times δ, since ||v_I|| was assumed to be at most δ. And λ is a point in the spectrum, so its absolute value does not exceed the norm of the matrix, and we get down to 2M√n · δ. So this is small, and since it is δ-small, I can also approximate λ by a deterministic parameter. So I will assume |λ − μ| ≤ δ, and then we get that ||(A − μ)_{I^c} v_{I^c}|| ≤ 4M√n · δ. Now μ is deterministic. Of course, it is not completely free: it should be close to λ, and λ is a random parameter about whose distribution I know nothing. But we can run a δ-net argument over all possible values of μ. We take the disk of radius M√n, which contains the spectrum, and a δ-net N_δ in this disk, so that any point of the disk can be δ-approximated by a point of the net, and then we take a union bound over the net. The cardinality of N_δ can be bounded by comparing the area of the disk to the area of small disks around each net point: it is at most (C√n/δ)². And this entropy cost, if we recall the lectures of Terry Tao, is negligible. Remember, we want to get a probability bound exponential in εn.
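The δ-net over the disk can be made completely explicit; here is a small sketch (the grid construction, the constant 4, and the sizes M = 2, n = 25 are mine) that builds such a net and spot-checks both the covering property and the (C√n/δ)²-type cardinality bound:

```python
import math, random

def disk_net(R, delta):
    """Grid-based delta-net of the disk of radius R: a square grid of mesh
    delta*sqrt(2) has covering radius delta, so every point of the disk is
    within delta of some net point."""
    step = delta * math.sqrt(2)
    k = int((R + delta) / step) + 1
    return [(i * step, j * step)
            for i in range(-k, k + 1) for j in range(-k, k + 1)
            if math.hypot(i * step, j * step) <= R + delta]

random.seed(1)
R, delta = 2 * math.sqrt(25), 0.5         # disk of radius M*sqrt(n), with M = 2, n = 25
net = disk_net(R, delta)
for _ in range(500):                      # spot-check the covering property
    r, th = R * math.sqrt(random.random()), 2 * math.pi * random.random()
    x, y = r * math.cos(th), r * math.sin(th)
    assert min(math.hypot(x - px, y - py) for px, py in net) <= delta
print(len(net) <= 4 * (R / delta) ** 2)   # cardinality is of order (R/delta)^2
```

The area-comparison argument from the lecture gives the same (R/δ)² order: disks of radius δ/2 around the net points are disjoint and sit inside a disk of radius R + δ.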
If εn is reasonably big, then these two factors are suppressed. Then, after we have done this for a fixed set I, we can take the union bound over all possible sets I and get the following reduction: if for any μ ∈ ℂ with |μ| ≤ M√n and any set I of cardinality εn, the probability that the smallest singular value of (A − μ)_{I^c} is at most δ√n is bounded by some number p₀ < 1, then no-gaps delocalization holds with probability at least 1 − (C√n/δ)² · p₀ · (n choose εn), where (C√n/δ)² is the entropy cost of the net and (n choose εn) is the cost of choosing I. This bound will be very important later. So first, how did the smallest singular value get into the picture? What we had here is that the image of the vector v restricted to I^c falls into a small ball. But if the norm of v_I is small and the norm of the whole vector v is 1, then the norm of v restricted to I^c must be large, at least one half, and this means that the matrix must have a small smallest singular value. So if we manage to show, with a uniform bound, that the smallest singular value is rarely small, we are in good shape. And there are methods to get such a bound. The first one, the most straightforward one, is the ε-net argument described in the lectures of Terry Tao, and we can try to run it here, for the submatrix (A − μ)_{I^c}. This submatrix has n rows and (1 − ε)n columns, so it is rectangular, and we know how to get bounds on the smallest singular value of such matrices. So if we run an ε-net argument, we will get some bound.
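The arithmetic behind this reduction can be spot-checked. A small sketch (the function name, the constant C = 10, and the sample values are mine) computes the logarithm of the union-bound cost that p₀ has to beat:

```python
import math

def log_reduction_cost(n, eps, delta, C=10.0):
    """log of the union-bound cost (C*sqrt(n)/delta)^2 * binom(n, eps*n):
    p0 must be smaller than the reciprocal of this for the reduction to
    give a nontrivial probability."""
    k = round(eps * n)
    return 2 * math.log(C * math.sqrt(n) / delta) + math.log(math.comb(n, k))

n, eps, delta = 1000, 0.1, 0.01
cost = log_reduction_cost(n, eps, delta)
k = round(eps * n)
# binom(n, k) <= (e*n/k)^k = (e/eps)^(eps*n): super-exponential in eps*n
print(math.log(math.comb(n, k)) <= k * (1 + math.log(n / k)))
# so p0 must be exponentially small in eps*n*log(1/eps), not just in eps*n
print(cost > eps * n)
```

The net cost contributes only 2 log(C√n/δ), a logarithmic term; as the lecture says, the binomial term dominates.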
So, let's say: if Ã is an n × m matrix with nice entries, centered, of unit variance, and with finite fourth moment, then the probability that the smallest singular value of Ã is less than c√n is bounded by e^{−c′n}, provided m ≤ βn for a small absolute constant β. This result was basically proved in the lectures of Terry Tao, so let me save the time of proving it again. We can look at it and try to apply it to our situation. It is very close: we only need to take some union bounds, and we know how to take them. So what are the problems here? First, I want to keep the dependence allowed by our assumptions, and this result requires complete independence. Okay, maybe if I am not so ambitious, if I restrict myself to generic-type ensembles and assume complete independence, I can easily finish the proof of the theorem by applying the ε-net argument. But then we hit the second obstacle: this β must be small, while I am striving for a small ε, which means that the set I is small and I^c is almost the whole set {1, …, n}; hence the matrix is almost square. For an almost square matrix, this proposition is wrong. A more elaborate analysis, the decomposition into compressible and incompressible vectors, can still yield some bound on the probability that the smallest singular value is small. The problem is that the bound will come not with probability exponential in n, but with probability exponential in n − m. If n − m is small, in particular if it is εn, we get probability exponential in εn. And we have to beat the union bound, which is super-exponential in εn. The two do not match. But maybe we can do something smarter than taking the union bound over all possible choices of I.
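The mismatch can be seen numerically: a failure probability exponential in n − m = εn loses to the binomial union bound, which is super-exponential in εn. A quick arithmetic sketch (the placeholder constant c = 1 and the sample sizes are mine):

```python
import math

def log_failure_bound(n, eps, c=1.0):
    """log of binom(n, eps*n) * exp(-c*eps*n): the union bound over sets I
    multiplied by a smallest-singular-value bound holding with probability
    1 - exp(-c*(n - m)) = 1 - exp(-c*eps*n)."""
    k = round(eps * n)
    return math.log(math.comb(n, k)) - c * k

# log binom(n, eps*n) is about eps*n*log(e/eps), which beats c*eps*n
# for small eps, so the combined bound is useless (its log stays positive):
for eps in (0.5, 0.1, 0.01):
    print(eps, log_failure_bound(1000, eps) > 0)

# whereas a bound exponential in n (not in n - m) would beat the union bound:
print(math.log(math.comb(1000, 100)) - 1.0 * 1000 < 0)
```

This is exactly the gap the lecture points out: exponential in εn against super-exponential in εn.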
So far we have identified two obstacles to the ε-net argument. The first is insufficient independence. The second is insufficiently high probability: although the ε-net argument, at least the elaborate version with the splitting into compressible and incompressible vectors, yields an exponential probability, we need a super-exponential one. But suppose we could overcome even this, or suppose our problem were more modest: instead of having the bound for all sets I, we want it for a single I. Let's compromise on everything. Even then we would hit the third obstacle. To describe it, let me recall the main feature of the ε-net argument: we play the small ball probability against the cardinality of the net. That means we discretize the unit sphere; if N_τ is a τ-net in the real unit sphere of dimension m, then the cardinality of N_τ is bounded by (C/τ)^m. And if we are lucky, the probability that ||Ãx||_2 is at most, say, δ√n is bounded by (cδ)^n, where Ã is the matrix with n rows and m columns, the exponent m coming from the sphere and the exponent n from the number of rows. Then we multiply these two numbers, and if n is greater than m and δ is of the same order as τ, we win. This works perfectly in the real case. But suppose that our entries are real; even then the eigenvalues will in general be complex. So instead of the real sphere we must consider the complex sphere. And for the complex sphere, the real dimension jumps, and the size of the net jumps from (C/τ)^m to (C/τ)^{2m}. Then we multiply an exponential in 2m against an exponential in n, and if m and n are close, the entropy cost is prohibitively high. We cannot get anything from the ε-net argument.
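The doubling of the entropy exponent can be seen with a toy computation (all constants C, c, τ, δ and the sizes below are illustrative placeholders, not values from the lecture):

```python
import math

def log_net_times_smallball(dim_factor, m, n, tau=0.1, delta=0.1, C=3.0, c=1.0):
    """log of (net cardinality) * (small-ball bound), i.e. of
    (C/tau)^(dim_factor*m) * (c*delta)^n.
    dim_factor = 1 for the real m-sphere, 2 for the complex one,
    whose real dimension is 2m."""
    return dim_factor * m * math.log(C / tau) + n * math.log(c * delta)

n, m = 1000, 600
print(log_net_times_smallball(1, m, n) < 0)   # real sphere: the union bound closes
print(log_net_times_smallball(2, m, n) < 0)   # complex sphere: entropy cost 2m kills it
```

With these numbers the same small-ball bound that comfortably beats a net of size (C/τ)^m is overwhelmed by a net of size (C/τ)^{2m}, which is the third obstacle in a nutshell.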
Of course, there is one case where we don't have to go to the complex sphere, where the real one would be enough: the case of Hermitian matrices. But for Hermitian matrices we don't have independence, so we cannot run the ε-net argument for a different reason. All of this looks like a hopeless case, an exercise in futility. So why did I discuss it here? First of all, it is not absolutely futile. We are going to use this later in the proof, as a piece of the puzzle when we assemble the puzzle together. We will not use it for the whole sphere, and we will not use it for this matrix, but it will come in handy at the critical moment. Second, this frontal attack, which failed, teaches us one thing: if we want to approach no-gaps delocalization via this proposition, if we want a strong bound on the smallest singular value of rectangular matrices with very high, super-exponential probability, we need a very strong small ball bound for such vectors. And we are going to spend the next lecture developing some of these strong bounds, even if we don't know yet how to apply them. Let me stop here. Questions?

Question: I had a quick question. You made the assumption that the imaginary parts were deterministic, and you said later you would take the expectation over that. So how do the imaginary parts come into the bounds? Is it just in terms of the M?

Answer: The only way the imaginary parts come into the bounds is through this event, the event that the norm of the matrix is at most M√n. Otherwise they can be anything.

Any other questions? If not, we'll start again in 10 minutes, and let's thank Mark again.