So, we are considering the localization of eigenvectors of random matrices, and we will concentrate on one concrete type of localization, namely no-gaps delocalization. Before I formulate it, let me say which classes of matrices we will consider. We were striving for a maximally general formulation, so that the structure of the matrix wouldn't matter. We assume that the entries of the matrix are all independent, except that the entry A_{ij} may depend on A_{ji}. This includes in our class the fully independent matrices, the Hermitian matrices, the skew-Hermitian matrices, etc. The second assumption is that the real part of the matrix is random and the imaginary part is fixed. This looks somewhat artificial, but the aim was to include both the real matrices, where the imaginary part is just zero, and the complex matrices, where the real and imaginary parts are independent: there you can condition on the imaginary part and reduce the question to this model.

We want to prove that any unit eigenvector of the matrix carries non-negligible mass on any reasonably large set: if you take a small ε and any subset of at least εn coordinates, then these coordinates carry a mass which is polynomial in ε. We are going to show that this delocalization event is likely, and there are two different formulations. One: if the entries have a uniformly bounded density, then this event is very likely, in the sense that the probability of failure is exponentially small in εn (here ε can be any positive parameter), plus the probability that the norm of the matrix is large, and that event has very small probability if, say, the fourth moments of the entries are uniformly bounded. So this is a relatively easy statement. If we drop the uniformly bounded density assumption, it turns out that we can prove essentially the same for general entries: they only have to be genuinely random, and the only thing we require is that they are not concentrated in small disks, since otherwise the matrix would be almost deterministic. The conclusion is the same, namely that the delocalization event is likely, provided ε is at least some negative power of n.

One of the main messages of this lecture series is that by using geometry along with probability one can sometimes get more than with probability alone. So we want to eliminate all traces of the fact that we are dealing with an eigenvector and reformulate the problem so that it becomes approachable geometrically. Here I rewrote the definition of the delocalization event, and we proved that delocalization can be reduced to a uniform lower bound on the smallest singular value of the matrix A − λ for any deterministic λ. If we manage to prove such a bound, then to show that the delocalization event is likely we have to multiply this probability by the binomial coefficient responsible for the number of such subsets, and by some discretization term, which is harmless since it is not of the order of the first term.
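Schematically, the event and the union bound look as follows. This is only a sketch of what was just said: the lecture does not fix the exponents, so c, C, and the discretization count N_disc below are placeholders.

```latex
% No-gaps delocalization event (sketch; c and C are placeholder constants):
\mathcal{D}_{\varepsilon} \;=\; \bigl\{\,\text{every eigenvector } v,\ \|v\|_2 = 1,
\ \text{satisfies}\ \|v_I\|_2 \ge (c\varepsilon)^{C}
\ \text{for all } I \subseteq [n],\ |I| \ge \varepsilon n \,\bigr\},

% and the reduction to the smallest singular value of A - \lambda:
\mathbb{P}\bigl(\mathcal{D}_{\varepsilon}^{c}\bigr) \;\le\;
\binom{n}{\lceil \varepsilon n \rceil}\cdot N_{\mathrm{disc}}\cdot p_0,
\qquad
p_0 \;=\; \sup_{\lambda}\ \mathbb{P}\bigl(s_{\min}(A-\lambda)\ \text{is small}\bigr).
```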
So we have to bound the smallest singular value uniformly, and we approached this from a geometric point of view, trying to use the ε-net argument described in the lectures of Tao: write the smallest singular value of the matrix Ã as the minimum of ‖Ãx‖₂ over unit vectors x, discretize the sphere, and estimate this minimum at each point of the net separately.

This ended in complete fiasco: it cannot work for any model of random matrices, and there are many reasons. But one thing to take home from this fiasco is that we need p₀ to be sufficiently small to suppress the binomial coefficient, and by Stirling's formula this binomial coefficient is about exp(εn log(e/ε)), which means it is super-exponential in εn. So if we wish to prove something using this method, we have to find super-exponential bounds for the smallest singular value. That means we should first solve an easier problem: getting super-exponential bounds for the norm of Ãx for a fixed x, where Ã = A − λ.

For this lecture I will step away from random matrices, and we will discuss how to obtain strong small ball probability bounds for random vectors. So let X in R^n be a vector with independent coordinates, let P : R^n → R^n be a projection, and we want to show that the probability that the norm ‖PX‖₂ is small is also small. This turns out to be a very difficult problem if the coordinates of X have a general distribution: what arises is the arithmetic structure of the kernel of the projection P, and handling it is a rather non-trivial task. But there is a particular case where everything is simple, and this is the case we are going to discuss: we will assume that the entries of X have bounded density.

If the density of each X_j does not exceed some number K, then the density of the vector X is a product of densities, so it does not exceed K^n. If P is a coordinate projection, then the marginal is again a product of independent random variables, so the density of PX does not exceed K^d, where d is the rank of P. And it is plausible that the same inequality should hold not only for coordinate projections but for any projection of rank d.
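In symbols, where f denotes the relevant density, here is what we have and what we are aiming for:

```latex
% Bounded densities multiply under independence and coordinate projections:
f_X(x) \;=\; \prod_{j=1}^{n} f_{X_j}(x_j) \;\le\; K^{n},
\qquad
f_{PX}(y) \;\le\; K^{d} \quad\text{for a coordinate projection } P \text{ of rank } d;

% the goal, for an arbitrary orthogonal projection of rank d:
f_{PX}(y) \;\le\; (CK)^{d} \quad\text{for all } y,\ \text{with an absolute constant } C.
```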
Let's consider the simplest possible case, d = 1. A rank-one projection is essentially the inner product with some vector a of norm one: ⟨a, X⟩ = Σ_{j=1}^n a_j X_j. So this is a linear combination of independent random variables; nothing can be easier than that, and if the densities of the variables are uniformly bounded, then obviously the density of the linear combination should be bounded too. I have heard this statement quite a few times, and it is indeed obvious until you ask yourself why it is obvious. Then you start searching the literature, and it turns out it wasn't written anywhere, or almost anywhere: one can find this statement only by combining two theorems, and one has to dig in an unexpected place, not in probability but in geometry.

In one dimension this problem was considered by Rogozin in the mid-70s, and he proved that the density of the linear combination is maximal when the X_j are uniformly distributed on an interval; let me take the interval [−1/2, 1/2], to make the volume of the cube one. If I assemble the coordinates into a vector, then this vector is uniformly distributed in the unit cube. Now what is the maximal density of this linear combination? Translating back, it is the maximal density of the one-dimensional projection, which is the maximal area of a hyperplane section of the unit cube. So you take an n-dimensional unit cube and you search for the maximal volume of a hyperplane section, and there is a theorem of Keith Ball that the maximal section has area not more than √2. This may look surprising: if you draw a cube, then intuitively the maximal section should be the one orthogonal to a main diagonal, and this is not the case. Actually, in any dimension the maximal section is a two-dimensional diagonal multiplied by a coordinate subspace. So combining the theorems of Rogozin and Ball, we can solve the one-dimensional case. However, the theorem of Ball is very delicate and the proof is complicated; it was later simplified significantly by Nazarov and Podkorytov, but it is still quite a non-trivial piece of work.

If we are not shooting for an optimal constant, there is a softer method of obtaining such estimates, and this is what we are going to discuss. We are going to prove the following theorem, which is also joint with Vershynin: let X be a random vector in R^n with independent coordinates such that the density of each coordinate is bounded by K; then for any projection P of rank d, the maximal density of PX is bounded by what we anticipated, (CK)^d, where C is an absolute constant. Very recently, in 2017, Livshyts, Pivovarov, and Paouris found the optimal constant C, using also geometric methods, and it turned out to be the same constant that we saw in Ball's theorem, namely √2. But we are not going to shoot for such precision; an absolute constant will be enough.
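As a quick numerical illustration of the cube-section picture, here is a small Monte Carlo sketch of my own (not from the lecture): for X uniform in [−1/2, 1/2]^n, the density of ⟨a, X⟩ at 0 equals the volume of the central section of the cube orthogonal to a, and a two-dimensional diagonal direction indeed beats the main diagonal.

```python
import numpy as np

# Monte Carlo sketch (illustration only): for X uniform in the unit cube
# [-1/2, 1/2]^n, the density of Y = <a, X> at 0 equals the (n-1)-dimensional
# volume of the central hyperplane section {x : <a, x> = 0} of the cube.
rng = np.random.default_rng(0)
N, h = 2_000_000, 1e-3  # sample size and half-width of the density window


def density_at_zero(a):
    """Estimate f_Y(0) for Y = <a, X> by counting samples with |Y| < h."""
    a = np.asarray(a, dtype=float)
    a /= np.linalg.norm(a)
    X = rng.uniform(-0.5, 0.5, size=(N, a.size))
    Y = X @ a
    return np.mean(np.abs(Y) < h) / (2 * h)


print(density_at_zero([1, 0, 0]))  # coordinate section: area 1
print(density_at_zero([1, 1, 0]))  # 2-d diagonal: sqrt(2) ~ 1.414, the maximum
print(density_at_zero([1, 1, 1]))  # main diagonal: hexagon, 3*sqrt(3)/4 ~ 1.299
```

For n = 3 the main-diagonal section is a regular hexagon of area 3√3/4 ≈ 1.299, while the direction (1, 1, 0)/√2 gives the extremal value √2 ≈ 1.414 from Ball's theorem.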
Before we prove this theorem in full generality, let's consider the one-dimensional case, the simplest possible one. Here the method of proof was proposed by Ball and Nazarov, who used the Fourier transform, but their paper was never published, probably because they discovered that this method had appeared before in the works of Halász; if you read the argument of Halász, for d = 1 it is precisely this. Halász considered discrete random variables, but if you adapt his method to continuous random variables, you get a proof.

So here is what we are going to do. Our random variable Y, the one-dimensional projection, is the sum Y = Σ_j a_j X_j, where ‖a‖₂ = 1, and we want to bound the maximal density of Y. Let's consider two cases. Case one is trivial: suppose one of the coefficients is large, say there exists j such that |a_j| > 1/2. Then I can condition on all the other variables; the conditional density of Y, which is the density of a shifted copy of a_j X_j, is bounded by K/|a_j| ≤ 2K, and after integrating over the other variables the density remains bounded as well.

Case two, the interesting case, is when |a_j| ≤ 1/2 for all j. In this case we will use characteristic functions. First of all, without loss of generality I will assume that K = 1; by scaling each coordinate I can always arrange this. We have to bound the maximal density of Y, but instead we will bound f_Y at zero: our method will be shift-invariant, and if it is shift-invariant, then one point is enough. Let me write

f_Y(0) = (1/2π) ∫_R φ_Y(t) dt,

where φ_Y(t) = E exp(itY) is the characteristic function; this is the Fourier inversion formula. The first question which comes to mind is why we may use Fourier inversion at all: it requires the Fourier transform to be an L¹ function, and in general it is not. But this is not a real problem: if I add to each coordinate an independent copy of a small normal variable, then in the language of characteristic functions I multiply by the characteristic function of a Gaussian, which is again a Gaussian, and this makes everything L¹. So let's assume from the very beginning that the characteristic function is L¹, and then we can use the Fourier inversion formula. This is very convenient, because Y is a combination of independent random variables, which is exactly what the method of characteristic functions was created for:

φ_Y(t) = Π_{j=1}^n E exp(i a_j X_j t) = Π_{j=1}^n φ_{X_j}(a_j t).

Now I am going to estimate the integral of this, and let's use Hölder's inequality. Remember that a is a unit vector, so Σ_j a_j² = 1, and I will use Hölder's inequality with exponents a_j^{−2}:

f_Y(0) ≤ (1/2π) Π_{j=1}^n ( ∫_R |φ_{X_j}(a_j t)|^{a_j^{−2}} dt )^{a_j²},

where 1/2π is a harmless coefficient and φ_{X_j}(a_j t) is the characteristic function of a scaled copy of X_j. So now I am down to estimating each of these integrals separately. If I call the integral inside I_j, then changing the variable and setting p = a_j^{−2}, we get

I_j = p^{1/2} ∫_R |φ_{X_j}(t)|^p dt.

Let's rewrite this using Fubini's theorem, or, as it is sometimes called, the distribution function formula; let me write it on the next blackboard:

I_j = p^{3/2} ∫_0^∞ s^{p−1} · μ{ t : |φ_{X_j}(t)| ≥ s } ds,

where μ is the Lebesgue measure. This is just Fubini's theorem.
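Two auxiliary facts are being used here; let me write them out explicitly as a recap. The first is the generalized Hölder inequality, admissible precisely because Σ_j a_j² = 1; the second is the distribution function formula, whose extra factor p is what turns p^{1/2} into p^{3/2}.

```latex
% Generalized H\"older: if \sum_j 1/p_j = 1, then
\int_{\mathbb{R}} \prod_{j=1}^{n} |f_j(t)|\,dt \;\le\; \prod_{j=1}^{n}
\Bigl(\int_{\mathbb{R}} |f_j(t)|^{p_j}\,dt\Bigr)^{1/p_j},
\qquad\text{applied with } f_j(t) = \varphi_{X_j}(a_j t),\ p_j = a_j^{-2}.

% Distribution function formula (Fubini on the region 0 \le s \le |\varphi(t)|):
\int_{\mathbb{R}} |\varphi(t)|^{p}\,dt \;=\; p \int_{0}^{\infty} s^{\,p-1}\,
\mu\bigl\{\, t : |\varphi(t)| \ge s \,\bigr\}\,ds.
```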
Okay, now let's look at what we have here. The characteristic function cannot be more than one, so I don't really have to integrate up to infinity; I am integrating only up to one. And if we know the behavior of this measure, we will be done very quickly. The central lemma, which allows us to produce the density bound, is the following: for a random variable with density bounded by one,

μ{ t : |φ_X(t)| ≥ s } ≤ 2π/s² for s ∈ (0, 3/4], and ≤ C√(1 − s²) for s ∈ (3/4, 1).

The importance of this lemma is in the second line: as s approaches 1, this measure should approach zero, and we managed to quantify how fast. If we have this lemma, then the rest of the proof is an exercise in calculus. You just split the integral over (0, 1) into the pieces over (0, 3/4] and (3/4, 1) and estimate them separately. In the first one you have the power −2 from the lemma, which you multiply by s^{p−1}, so you integrate s^{p−3}; and remember, we took care that |a_j| ≤ 1/2, so p = a_j^{−2} ≥ 4, the power is greater than 1, and the integral is harmless. On the second interval it is also an exercise in calculus: if you change the variable carefully, you will see that this piece is bounded by an absolute constant as well.
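For completeness, here is one way the two pieces can be combined, written as a sketch with unoptimized absolute constants C, C′, C″; the Beta-function step is one possible way to carry out the change of variable alluded to above.

```latex
% First piece: the lemma gives \mu \le 2\pi/s^2 on (0, 3/4], and p \ge 4:
p^{3/2}\int_{0}^{3/4} s^{p-1}\,\frac{2\pi}{s^{2}}\,ds
 \;=\; 2\pi\,p^{3/2}\,\frac{(3/4)^{p-2}}{p-2} \;\le\; C,
\qquad\text{since } (3/4)^{p} \text{ decays faster than } p^{3/2} \text{ grows.}

% Second piece: \sqrt{1-s^2} \le \sqrt{2}\,\sqrt{1-s} and a Beta integral:
p^{3/2}\int_{3/4}^{1} s^{p-1}\,C'\sqrt{1-s^{2}}\,ds
 \;\le\; \sqrt{2}\,C'\,p^{3/2}\,B\bigl(p,\tfrac{3}{2}\bigr) \;\le\; C'',
\qquad
B\bigl(p,\tfrac{3}{2}\bigr) \;=\; \frac{\Gamma(p)\,\Gamma(3/2)}{\Gamma(p+3/2)}
 \;\asymp\; p^{-3/2}.

% Hence I_j \le C + C'' for every j, and since \sum_j a_j^2 = 1,
f_Y(0) \;\le\; \frac{1}{2\pi}\prod_{j=1}^{n} I_j^{\,a_j^{2}}
 \;\le\; \frac{C + C''}{2\pi},
% which, after undoing the scaling K = 1, gives a density bound of order K.
```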