triangle inequality. However, here we have another caveat. This quantity is called the weak L_p norm, but it is not a norm: it is homogeneous of degree one, yet it does not satisfy the triangle inequality. However, for every weak L_p space with p > 1 there is an equivalent norm — a genuine Banach space norm equivalent to this quantity. Moreover, if I consider p ≥ 2, which we can afford (here p = εn/2), then the constants of this equivalence are independent of p. So I can treat it like a Banach space norm, and then I can write, by the trivial triangle inequality, that the weak L_p norm of the sum of Y_j over j from 1 to (1-ε)n is at most the number of terms times the norm of each Y_j. Each norm is C/(εn), and I multiply by (1-ε)n, so I get C/ε — actually C(1-ε)/ε, but the factor (1-ε) is a small change, and we would not gain anything from it. Translating this back, we get that for any τ > 0 the probability that the sum of Y_j over j from 1 to (1-ε)n is greater than or equal to τ/ε is at most (C/τ)^{εn/2}. Here we used a theorem from functional analysis. It is a standard fact, but not entirely trivial; you can look up this renormalization theorem, for example, in Stein's book on harmonic analysis, where the proof occupies some three pages. If you don't want to take this shortcut, you can argue at the level of the standard L_p spaces; this is a little longer, and I show that estimate in the notes. Before moving on, let's do one cosmetic thing: I don't like this εn/2, so I replace τ by θ². Then θ² sits in the denominator, and the exponent becomes εn. Great. But all of this was done not because we want to estimate distances between vectors and subspaces for their own sake, but because of the negative second moment formula, and at this moment I can write: for any θ > 0, the probability that the sum of s_j(B)^{-2} over j from 1 to (1-ε)n is greater than or equal to θ²/ε is at most (C/θ)^{εn}. (Note the direction of the inequality: we estimated tails, so the triangle inequality bounds the norm of the sum, and that bound translates into the probability that the variable exceeds a given level.) So the complementary event is likely. Let Ω_{θ,B} be the event that the sum of s_j(B)^{-2} over j from 1 to (1-ε)n is at most θ²/ε; we will choose θ in a moment. Now I know that the probability of the complement of Ω_{θ,B} is at most (C/θ)^{εn}.
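Since all of this was said in words, it may help to record the chain of estimates in symbols. This is only a recap of the step just described; here p = εn/2, the Y_j are the summands from the previous part of the argument (in this reading, the inverse squared distances entering the negative second moment formula), and C is an absolute constant that may change from line to line:

\[
\Big\| \sum_{j=1}^{(1-\varepsilon)n} Y_j \Big\|_{L_{p,\infty}}
\;\le\; C \sum_{j=1}^{(1-\varepsilon)n} \| Y_j \|_{L_{p,\infty}}
\;\le\; C\,(1-\varepsilon)n \cdot \frac{C}{\varepsilon n}
\;\le\; \frac{C}{\varepsilon}.
\]

Since \(\mathbb{P}(S \ge t) \le \big(\|S\|_{L_{p,\infty}}/t\big)^{p}\) for a nonnegative random variable S, and since the negative second moment formula identifies the sum of the Y_j with the sum of the s_j(B)^{-2},

\[
\mathbb{P}\Big( \sum_{j=1}^{(1-\varepsilon)n} Y_j \ge \frac{\tau}{\varepsilon} \Big)
\;\le\; \Big( \frac{C}{\tau} \Big)^{\varepsilon n/2},
\qquad \text{and, with } \tau=\theta^{2}, \qquad
\mathbb{P}\Big( \sum_{j=1}^{(1-\varepsilon)n} s_j(B)^{-2} \ge \frac{\theta^{2}}{\varepsilon} \Big)
\;\le\; \Big( \frac{C}{\theta} \Big)^{\varepsilon n}.
\]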
Assume that this event occurs; then, of course, not many singular values can be very small — we can use Markov's inequality, if you wish. The negative second moment formula has now played its role. So take the threshold to be δ√n and look at the cardinality of the set of j for which s_j(B) ≤ δ√n — note that this is no longer a probability. I assume that the event Ω_{θ,B} has occurred, so the sum is small, and then I apply Markov's inequality with respect to the counting measure: this cardinality is the number of all j such that s_j(B)^{-2} ≥ 1/(δ²n), and it is at most δ²n times the sum, that is, at most δ²nθ²/ε. Now, δ is up to our choice. Let's choose δ so that θ² cancels out and one ε cancels out, namely δ = ε/(10θ). Then the cardinality of the set of all j such that s_j(B) ≤ (ε/(10θ))√n — and B, I recall, is the matrix we want, A - λ with the set I removed — is at most εn/100. So this is what we managed to squeeze from our argument. We were not able to bound the minimal singular value, but we managed to obtain a bound on the number of smallish singular values. In general, the smallish singular values have nothing to do with the minimal one; those who have tried, for example, to prove the circular law know that these are two absolutely different problems. So we worked hard, we got some bound, but the bound seems useless, because we really need the smallest singular value. Actually, we are almost done: there is a trick which will allow us to finish the proof quickly. Let's redraw our matrix. We started with A - λ, which was n by n. Then I dropped the set I of cardinality εn, and I got a matrix with n rows and (1-ε)n columns; the difference between the number of rows and the number of columns is εn. Let's cut off εn/2 rows at the bottom and call the upper corner B̂ — this will be our new matrix — and call the bottom piece G. So what are the sizes of these matrices? B̂ is (1 - ε/2)n by (1-ε)n, and G is (ε/2)n by (1-ε)n. Okay, and let's look at the matrix B̂. The difference between the number of rows and the number of columns is εn/2. Whether it is εn or εn/2 doesn't matter — it's a constant factor — so we can apply the whole story we told up to this moment to the matrix B̂, and we get the same bound on the small singular values of this new matrix B̂. But then we have saved this piece, the matrix G. Let's recall the dependencies: the entries of G may depend on other entries of the big matrix, but those entries sit in the part we have thrown away, which means that the entries of G are independent of the entries of B̂. We have bootstrapped ourselves: we have the same estimate for the singular values of the submatrix, and we have an independent matrix at our disposal. This will be enough to finish the proof. Let's see. Take the singular value decomposition of this top corner — that is why I call it B̂: B is the big matrix, and B̂ is the top corner — and denote by E_+ the linear span of the singular vectors u_j of B̂ in C^{(1-ε)n} such that s_j(B̂) is greater than or equal to the threshold we put there, (ε/(10θ))√n. Good. Then the space C^{(1-ε)n} decomposes as the orthogonal sum of E_+ and E_-, where E_- is the subspace spanned by the remaining singular vectors, the ones corresponding to the small singular values. So what can we say about these subspaces? First, we know the number of smallish singular values.
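Here is the counting step once more, in symbols — purely a recap of the computation above, with the choice δ = ε/(10θ) already substituted. On the event Ω_{θ,B},

\[
\#\Big\{ j :\ s_j(B) \le \frac{\varepsilon}{10\theta}\sqrt{n} \Big\}
= \#\Big\{ j :\ s_j(B)^{-2} \ge \frac{100\,\theta^{2}}{\varepsilon^{2} n} \Big\}
\le \frac{\varepsilon^{2} n}{100\,\theta^{2}} \sum_{j=1}^{(1-\varepsilon)n} s_j(B)^{-2}
\le \frac{\varepsilon^{2} n}{100\,\theta^{2}} \cdot \frac{\theta^{2}}{\varepsilon}
= \frac{\varepsilon n}{100}.
\]

The identical computation, run for B̂ on its own likely event Ω_{θ,B̂} (only the constants change, since the row–column gap is now εn/2), is what gives the bound on the dimension of E_- used in the next step.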
So the dimension of E_-, which counts the smallish singular values, is at most εn/100. Second, E_+ is spanned by singular vectors whose singular values are all large, which means that for any z in E_+ the norm of B̂z is at least the smallest singular value on this subspace times the norm of z, that is, at least (ε/(10θ))√n times ||z||_2. Great. So we need to estimate the minimal singular value of B, which is the minimum of ||Bz||_2 over unit vectors z in C^{(1-ε)n}. On the event Ω_θ, which is the likely event, we have estimated this norm from below — by something times √n, which is the kind of bound we need — but not on the whole space, only on a big subspace, and we know nothing about what happens on the orthogonal subspace; the argument so far provides no information about it. But for this we prepared the bottom matrix G. Let's see: G is independent of B̂, and we will use G to get a bound on the orthogonal subspace. G is a short and fat matrix: it has only εn/2 rows and many more columns. So its kernel is huge, but I am not going to consider it on the full space C^{(1-ε)n}; I am considering it on the space E_-, whose dimension is at most εn/100. Effectively, then, I am considering a matrix with εn/2 rows and at most εn/100 columns. This is a tall matrix: it is very well invertible, and its invertibility can be proved by a straightforward epsilon-net argument. So, after all, our attempt to run the epsilon-net argument was not so stupid. We did need it: we cannot apply it on the whole space, but it comes in handy when we want to control small-dimensional subspaces. So for any z in E_-, the norm ||Gz||_2 (if I don't normalize z) is going to be large — of the order of the square root of the number of rows times ||z||_2, that is, about √(εn) times ||z||_2. And this holds with high probability, with probability exponentially close to 1, by the most straightforward epsilon-net argument. Now I have a lower estimate on one subspace and another lower estimate on the orthogonal subspace, and we can finish. From this moment on I don't need any probability. For any z, this time in the whole space C^{(1-ε)n}, the rows of B split into the rows of B̂ and the rows of G, so ||Bz||_2 squared is ||B̂z||_2 squared plus ||Gz||_2 squared. And I will be done by an elementary optimization argument: since E_+ and E_- come from the singular value decomposition of B̂, for any unit vector z either the projection onto E_+ is large, and then the first term is large, or the projection onto E_- is large, and then the second term is large (here one also uses an upper bound on the norm of G). And then you optimize. The only thing left to take care of is the probability, and the probability here is governed by the event Ω_θ. So what do we require from this probability? Let's go back to our translation into the smallest singular value: we have to beat the binomial coefficient n choose εn coming from the union over the sets I. It is exponentially large — let's write it this way, about (e/ε)^{εn} — so we need a failure probability that beats this exponential. The power εn I already have in the bound (C/θ)^{εn}, so I just choose θ to be a sufficiently large constant times 1/ε, and everything falls into place.
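Finally, here is one way to write out the elementary optimization argument just sketched. The splitting parameter α and the operator-norm bound ||G|| ≤ M√n (which holds with probability exponentially close to 1 in this setting) are bookkeeping added here, not notation from the lecture; c is the constant implicit in the epsilon-net bound for G, and t = (ε/(10θ))√n is the threshold defining E_+. For a unit vector z in C^{(1-ε)n}, write z_+ and z_- for its orthogonal projections onto E_+ and E_-:

\[
\|Bz\|_2^{2} = \|\hat{B}z\|_2^{2} + \|Gz\|_2^{2},
\qquad
\|\hat{B}z\|_2 \ge t\,\|z_+\|_2,
\qquad
\|Gz\|_2 \ge c\sqrt{\varepsilon n}\,\|z_-\|_2 - M\sqrt{n}\,\|z_+\|_2 .
\]

If \(\|z_+\|_2 \ge \alpha := c\sqrt{\varepsilon}/(4M)\), the middle bound already gives \(\|Bz\|_2 \ge \alpha t\); otherwise \(\|z_-\|_2 \ge 1/\sqrt{2}\) and the last bound gives \(\|Gz\|_2 \ge c\sqrt{\varepsilon n}/4\). In either case, on the intersection of the likely events,

\[
s_{\min}(B) \;\ge\; \min\Big( \alpha\,t,\ \tfrac{c}{4}\sqrt{\varepsilon n} \Big) \;\ge\; c(\varepsilon)\,\sqrt{n},
\]

a lower bound of the form (a constant depending only on ε) times √n, which is exactly what the reduction to the smallest singular value required.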
And with this we have completely established the no-gaps delocalization. Next time we are going to discuss applications of this result to random graphs. Okay, questions? — For B̂, of course: we repeat the whole story for B̂, and then the subspaces are defined by B̂, so they become independent of G.