The overarching topic of the second week is universality in random matrices. Universality is a key concept not just in random matrices but in the whole of probability and mathematical and statistical physics, and in random matrices over the last decade there have been spectacular developments in proving decades-old conjectures on various aspects of universality, so we'll hear from the foremost experts on the topic. Our first lecturer doesn't really need an introduction. He has made fundamental breakthroughs in more domains of mathematics than most graduate students take classes in, and we are fortunate that one of those domains was random matrix theory. Our first lecturer for today is Terry Tao from UCLA, and his title is "Some universality techniques for random matrix ensembles."

Okay, so I'm actually going to cover three topics, which are all loosely connected to universality, though not as much as I was planning to do initially. The three topics of this lecture series will be the following. First, least singular values of random matrices. Second, the circular law for iid matrices; here also we will be working mostly with iid matrices, and I'll tell you what an iid matrix is in just a second. This second topic will use material from the first topic as a key input. And then the third topic, which is not really related, is the Lindeberg exchange strategy, which is one of the key tools we have nowadays to prove universality results, for Wigner matrices in particular but actually also for iid matrices. It isn't the only tool; you can't do everything with this method, but you can do some stuff, and usually you combine it with other methods to get the best results.

Okay, so today, in the first two lectures, I think we'll be focusing on least singular values. You all know the singular value decomposition, hopefully. If you have an n×p matrix M of, say, complex numbers, with p less than or equal to n, then we have a singular value decomposition: M can always be factored as the product of an n×n unitary matrix, times an essentially diagonal matrix (as diagonal as you can make a rectangular matrix: an n×p matrix which is diagonal in the top p×p block and then a big block of zeros), times another unitary matrix. So you can always factor an arbitrary matrix into two unitaries and an essentially diagonal matrix, and the diagonal entries σ₁(M) ≥ σ₂(M) ≥ … ≥ σ_p(M) are non-negative and can be arranged in decreasing order. These are the singular values of the matrix. They're very important, almost as important as the eigenvalues. Of course, if your matrix is normal, the singular values are just the absolute values of the eigenvalues; otherwise the relationship is more complicated. In practice, the most important singular values (they're all important, of course) are usually the largest singular value σ₁(M) and the smallest singular value σ_p(M). The largest singular value of a matrix is also the operator norm, so another way to think about it is as the supremum of ‖Mx‖ over all unit vectors x: the largest singular value is the most that a matrix can dilate a vector. Of course, here I'm always using the ℓ² norm for vectors.
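As a quick sanity check of these definitions, here is a minimal NumPy sketch (not from the lecture; the dimensions and seed are arbitrary choices) verifying that the largest singular value returned by the SVD coincides with the operator norm:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 150

# Random sign (Bernoulli) matrix: each entry is +1 or -1 with probability 1/2.
M = rng.choice([-1.0, 1.0], size=(n, p))

# Singular value decomposition M = U @ diag(s) @ Vh, with s in decreasing order.
U, s, Vh = np.linalg.svd(M)

# The largest singular value is exactly the operator (spectral) norm.
assert np.isclose(s[0], np.linalg.norm(M, ord=2))
print("sigma_1 =", s[0], "  sigma_p =", s[-1])
```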
Okay, so that's the largest singular value. Similarly, the smallest singular value is the infimum of ‖Mx‖ over unit vectors x: it measures the most contractive that your matrix can be. You can also think of it as the reciprocal of the norm of the inverse, if the matrix is invertible: if the matrix is square and invertible, then σ_p(M) = 1/‖M⁻¹‖_op. So the least singular value measures how invertible your matrix is. If you have a square matrix, it is invertible if and only if the least singular value is nonzero. The least singular value is always non-negative, and it is zero exactly when your matrix fails to have full rank, or, in the case of square matrices, when it fails to be invertible.

Okay, so these numbers show up all the time when you're studying other statistics of random matrices, so it is important to understand them for various matrix models. Now, there are many matrix models that we will care about, but for lectures one and two we will care mostly about iid random matrices. These are n×p matrices in which all the entries are random complex variables which are identically and independently distributed: every single entry has the same distribution, and they're all independent. It's customary to normalize them to have mean zero and variance one. In fact, for most of the lectures, just for simplicity, we will focus on the case of Bernoulli random matrices, where the entries are not only iid, but each random variable is just plus or minus one, with a fifty-fifty chance of each. These are also called random sign matrices: you take your n×p matrix and randomly fill in the entries with plus or minus ones, flipping a different coin for each entry, and you get a random matrix ensemble. This is a very typical example of an iid random matrix, the most prototypical of these ensembles.

At the other extreme you can also talk about Gaussian random matrices, where the entries are Gaussian random variables. That's a special case which you can often handle explicitly by other methods. As you may have seen in other lectures already, when you have a Gaussian random matrix, many distributions, such as the joint distribution of the singular values or of the eigenvalues, have very nice explicit formulas, in terms of log-gases or determinantal processes or whatever, and you can compute these things explicitly. But when you leave the Gaussian world and work with these more combinatorial models such as Bernoulli random matrices, then you do not have nice formulas for the distribution of the singular values and so forth. For example, a Bernoulli matrix is a discrete ensemble: there are only finitely many different matrices, so you no longer get a continuous density function; there is no longer a joint PDF of the singular values.
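To make the invertibility statement concrete, here is a small sketch (again not from the lecture; a square Bernoulli matrix with arbitrarily chosen size) checking that the least singular value is the reciprocal of the operator norm of the inverse:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
M = rng.choice([-1.0, 1.0], size=(n, n))  # square random sign matrix

s = np.linalg.svd(M, compute_uv=False)    # singular values, decreasing order
sigma_min = s[-1]

# For an invertible square matrix, sigma_min = 1 / ||M^{-1}||_op.
if sigma_min > 0:
    assert np.isclose(sigma_min, 1.0 / np.linalg.norm(np.linalg.inv(M), ord=2))
print("least singular value:", sigma_min)
```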
What you have is just some discrete probability distribution, which has a rather complicated formula that is almost never usable directly. Okay, so we would like to have tools to understand the behavior of matrices such as Bernoulli random matrices. You can consider many other random models as well, for example Erdős–Rényi graphs, or sparse things; basically these models are almost as general as arbitrary random graphs. I should mention, in case it isn't clear, that these are not Hermitian matrices by any stretch. Even when n and p are equal, we're not imposing any symmetry on these matrices, so they're not symmetric, and that actually complicates the analysis. Well, it doesn't make things simpler for part one, but for part two the fact that you're not Hermitian certainly makes things more complicated, once you start talking about eigenvalues.

All right. So the basic questions, which will be the topic of today, are these: given an n×p matrix M, say a Bernoulli matrix for simplicity, what is the behavior of the largest singular value and of the smallest singular value? The answers to these questions will turn out to be important for part two, but they are also interesting questions in their own right.

Now, of the two, the largest singular value is by far the easier one to deal with, and there are many, many tools to understand it, because it's also the operator norm. There are many things you can do. For example, you could try to understand the largest singular value using the moment method. You can take, for example, the trace of M M*, the matrix times its adjoint, and this is just the sum of the eigenvalues of M M*, that is, the sum of the singular values squared. In particular it bounds the largest singular value squared: σ₁(M)² ≤ tr(M M*). So if you can compute this second-moment-type quantity of your matrix, then you can certainly at least upper-bound the largest singular value. More generally, you can bound any even power of the largest singular value by an appropriate power of M M*: σ₁(M)^{2k} ≤ tr((M M*)^k). And we have lots of tools for understanding these sorts of sums; they are fairly explicit polynomial expressions in the coefficients, and this you can already use. By taking k of order log n you can get some actually quite precise control on the largest singular value by ideas like this, but I won't talk about this.
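A short numerical illustration of the moment method inequalities just stated (my own sketch; the entries here are real, so the adjoint is just the transpose):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 150
M = rng.choice([-1.0, 1.0], size=(n, p))

s = np.linalg.svd(M, compute_uv=False)
G = M @ M.T  # M M* for a real matrix

# trace(M M*) is the sum of the squared singular values.
assert np.isclose(np.trace(G), np.sum(s**2))

# Higher moments: sigma_1 <= trace((M M*)^k)^(1/(2k)), sharpening as k grows.
for k in (1, 2, 5, 10):
    bound = np.trace(np.linalg.matrix_power(G, k)) ** (1.0 / (2 * k))
    print(f"k={k:2d}: moment bound {bound:8.2f}   sigma_1 = {s[0]:.2f}")
```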
Maybe that will be discussed in other lectures, because this sort of method doesn't tell you very much about the least singular value. You would somehow like to take negative moments if you wanted to control the least singular value, and so this method turns out not to be so good for the second question. So we're going to focus on a different method, which is called the epsilon net method. It gives weaker results than the moment method, in that it doesn't give as strong bounds on the largest singular value, but it has the advantage that it can say something about the least singular value, whereas the moment method can't really do much about that end of the spectrum.

Okay, so let's first talk about this method in the context of the largest singular value. The basic theorem we're going to prove here is that, with M an n×p Bernoulli matrix as above, there exists an absolute constant C such that the largest singular value is smaller than C√n with exponentially high probability. What that means is that the probability that σ₁(M) is not less than C√n is actually exponentially small in n. So outside of a really, really tiny event, the operator norm of your random Bernoulli matrix is basically √n.

All right, so this is the first result. The trivial bound, by the way, is n: if all the entries are plus one, then the operator norm would be basically n. But usually you get a square-root saving because of the random signs. If you use the moment method you can show that any C bigger than 2 works; the argument I'll give will not give C as low as 2, I think it gives something like 4 or 8. So it's a little weaker, but up to a constant it gives comparable results.

All right, so to prove this we won't use the moment method; we will just use the supremum definition of the largest singular value, and we will just go ahead and compute this somewhat by brute force. The probability that the operator norm is bigger than C√n is the probability that the sup of ‖Mx‖ over all unit vectors x is bigger than C√n, and that means that there is some unit vector x such that ‖Mx‖ is bigger than C√n. So this is the probability of a union: you take the union over all unit vectors, and for every unit vector there is an event, the event that that unit vector is the bad singular vector, the vector that makes this quantity big. You're taking the union of all these events, and we're trying to show that this probability is exponentially small.

Okay, so the most naive thing you could do here, and it's way too naive, is the union bound. You can say: this is a union, so I can bound it by the sum over all unit vectors x of the probability P(‖Mx‖ > C√n).
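Empirically, the operator norm of a random sign matrix does concentrate at scale √n, and in fact around 2√n for square matrices, consistent with the remark that any C bigger than 2 works. A small Monte Carlo sketch, with sizes and trial counts chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(3)

# Operator norm of an n x n random sign matrix, averaged over a few draws.
for n in (100, 200, 400):
    norms = [np.linalg.norm(rng.choice([-1.0, 1.0], size=(n, n)), ord=2)
             for _ in range(20)]
    print(f"n={n:4d}: mean ||M||_op / sqrt(n) = {np.mean(norms) / np.sqrt(n):.3f}")
```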
Now, the good news is that for each individual x, this probability is actually fairly small. There are Chernoff-type inequalities, and we'll get to those later, which show that each individual one of these probabilities is certainly exponentially small. That sort of has to be the case: if you want the whole thing to be exponentially small, certainly each one of these smaller events must be exponentially small. But the moment you do this, you've already lost the game, because there are uncountably many points on the sphere. You're summing over uncountably many points, and no matter how good a bound you have on each individual term, you lose. In fact, you can't even write this down, because probability is not uncountably additive. So the union bound directly doesn't work.

But the union is also way too wasteful: there is an uncountable number of points on the sphere, so an uncountable number of events here, but many of them overlap each other very tightly. If two points x and y on the sphere are very close together, then the event that ‖Mx‖ is big and the event that ‖My‖ is big should almost coincide, and using the union bound to bound the union of almost coinciding events is very inefficient. So you want to improve this argument by replacing this uncountable union, which you can't bound by the union bound, with a more discrete union over more separated points, so that you don't reuse the same event over and over again. You try to reduce the overlap so that the union bound becomes more efficient.

(In answer to a question about p:) Oh yes, so p. For the largest singular value, p is not going to be very important; p is just anything less than or equal to n. In fact, for this theorem you could assume without loss of generality that p = n, because an n×p matrix can of course be thought of as a minor of an n×n Bernoulli matrix, and if you can bound the operator norm of the bigger matrix, then you certainly bound the operator norm of the smaller matrix. So if you can do this for square matrices, you get it for rectangular matrices for free. p will become more important when we deal with the least singular value, but for now, if you wish, you can think of these as square matrices.
All right, so here is what we're going to do. We have this uncountable unit sphere, and x takes values in the unit sphere of C^p. Actually, because this is a Bernoulli matrix you could restrict to real vectors, but it doesn't really matter. So we have this big sphere, and what we're going to do is discretize it. We pick some small scale ε; for this problem you can take ε to be something like one tenth. And we let Σ be what's called a maximal ε-net in the sphere. So in the sphere you take a set of points Σ which is ε-separated, meaning that all points in Σ are separated by distance at least ε, and maximal means that you can't make it any bigger: there is no other point on the sphere that you can add to this set without destroying the separation property. The consequence of maximality is that every point on the sphere lies within ε of some point of this net. Okay, so that's called an ε-net. These always exist just from the greedy algorithm; you could use Zorn's lemma, but that's really overkill, it's just what the greedy algorithm would do. They're a bit hard to write down explicitly, but you can certainly show that they exist.

All right, so what's so good about these sets? First of all, they're not too big: there is a cardinality bound. You could be more precise than this, but you can bound the cardinality by something which is exponential in n (well, in p, but certainly in n): it is bounded by (C/ε)^n for some absolute constant C. So if ε is one tenth, this would be something like 10^n, or maybe 40^n. So these nets are moderately large; they're exponentially large, but they're not uncountably large, which was the problem we had previously. It's not a great bound, especially when ε gets small, but it is a bound, and it's fairly easy: it follows from what's called the volume packing argument. What you do is the following: around every point of the net you form a little ball of radius ε/2. Because all points of the net are separated by at least ε, the triangle inequality shows that all these balls are disjoint.
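The greedy construction is easy to simulate. Here is a sketch (my own illustration, using a finite random sample of the real sphere as a stand-in for the sphere itself, with arbitrary dimension and scale) that builds a maximal ε-separated set and checks that every sampled point is within ε of the net:

```python
import numpy as np

def greedy_eps_net(points, eps):
    """Greedily keep points that are >= eps away from all points kept so far."""
    net = []
    for x in points:
        if all(np.linalg.norm(x - y) >= eps for y in net):
            net.append(x)
    return np.array(net)

rng = np.random.default_rng(4)
d, eps = 3, 0.5

# Dense random sample of the unit sphere in R^d.
pts = rng.normal(size=(20000, d))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)

net = greedy_eps_net(pts, eps)
print(f"net size: {len(net)}")  # exponential in d, roughly (C/eps)^d

# Maximality: every sampled point lies within eps of some net point.
dists = np.linalg.norm(pts[:, None, :] - net[None, :, :], axis=2).min(axis=1)
print("max distance to net:", dists.max())  # should be < eps
```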
Okay, so you fatten up each of these points into a ball, and all these balls are disjoint. On the other hand, just by the triangle inequality again, all these balls lie inside, for example, the ball of radius 2. That's a bit crude, but they certainly all lie inside the ball of radius 2, because their centers are on the unit sphere and their radius is ε/2. So the balls are disjoint, and therefore, just by counting volume, the number of points in the net is at most the volume of the ball of radius 2 divided by the volume of the ball of radius ε/2. The former is some constant times 2^n, the latter is the same constant times (ε/2)^n, and you just divide; in fact this even gives a precise constant, you can bound the cardinality by (4/ε)^n. This is not the best bound you can get for sphere packing, but that's not the point; it's within a constant of optimal, so up to the choice of this C it is essentially best possible, and because I'm not caring about optimizing this C, it is good enough.

Okay, so on the one hand we have some finiteness, some bound on the size of this net. The second fact is that the uncountable sup can be bounded by a finite sup: as soon as ε is at most one half, the uncountable sup over the sphere is bounded by twice the finite sup over the net,

sup_{‖x‖ = 1} ‖Mx‖ ≤ 2 sup_{y ∈ Σ} ‖My‖.

So we had an uncountable sup, which was causing us trouble in our first attempt to prove this theorem, but I can bound it, up to a factor of 2 which I won't care about because I'm not trying to optimize the C, by this finite sup.

Okay, so why is this true? This is actually quite easy. First of all, the left-hand side is also the operator norm, and this sup is attained somewhere: because the sphere is compact, the operator norm equals ‖Mx‖ for some unit vector x. Now, this unit vector doesn't necessarily lie in the net; if it did, we'd be done, and we wouldn't need the factor of 2. But it lies close to the net: there must exist a point y in the net which is within ε of this point x. So just by the triangle inequality, ‖Mx‖ ≤ ‖My‖ + ‖M(x − y)‖. The first term I can bound by the sup over the net, and the second I can bound by the operator norm times the length of the vector x − y, which is at most ε. So as long as ε is at most one half, this error term ε‖M‖_op can be absorbed into the left-hand side, giving up a factor of two. Okay, so you get this bound.
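Here is a quick numerical check of this factor-2 lemma (my own sketch, reusing the greedy net idea above on a small real example with ε = 1/2):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, eps = 40, 3, 0.5

M = rng.choice([-1.0, 1.0], size=(n, p))

# Greedy eps-separated net drawn from a dense sample of the sphere in R^p.
pts = rng.normal(size=(20000, p))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
net = []
for x in pts:
    if all(np.linalg.norm(x - y) >= eps for y in net):
        net.append(x)

op_norm = np.linalg.norm(M, ord=2)
net_sup = max(np.linalg.norm(M @ y) for y in net)

# The lemma: sup over the sphere <= 2 * sup over the net when eps <= 1/2.
print(f"||M||_op = {op_norm:.2f}, net sup = {net_sup:.2f}, 2 * net sup = {2 * net_sup:.2f}")
assert op_norm <= 2 * net_sup
```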
All right, and so, going back to this easy problem of bounding the largest singular value: this was the uncountable sup, but using this fact, taking ε to be one half (for this problem one half is actually the optimal choice), I can bound

P( sup_{‖x‖=1} ‖Mx‖ > C√n ) ≤ P( sup_{y ∈ Σ} ‖My‖ > (C/2)√n ).

(I realize Σ is a bad choice of notation here, but okay.) Now I can use the union bound, and now I'm not immediately dead, because of the cardinality bound: this is bounded by (4/ε)^n, and with ε = 1/2 I even have an explicit constant, 8^n, an exponential factor, times the sup over all x in Σ of P( ‖Mx‖ > (C/2)√n ).

So you see, we started with a probability with a sup inside, and now we've taken the sup outside using the union bound, but we have to pay this factor, and this kind of factor I'm going to call an entropy cost. It's the price you pay because you don't know in advance which x is going to be the worst one. If you knew there was a fixed x which was always the worst, then you could forget all the other x's and you wouldn't lose anything. But we don't know where the worst x is; we only know it's close to one of the points on this net. So if you want to freeze x and just work with a single x, you have to pay the entropy cost, which is the metric entropy of the set: the number of genuinely distinct points in your parameter space. This is just a price you have to pay, but as long as the bound on each individual event is exponentially better than the entropy cost, you can still win.

Okay, so this is now a much simpler problem: we just need to estimate a single event like this rather than a union, and this can be handled by various techniques. So first of all, I can square it. M is an n×p matrix and x is a p×1 vector. I can think of M as a bunch of rows X₁, …, X_n, and each of these rows is just a random element of the discrete cube {−1, +1}^p; the entries are just random signs. But also, because this is an iid matrix, all these rows are independent, so they are uniformly distributed on the cube and independent. So Mx is an n×1 vector, and its entries are just the dot products X_i · x: this vector is a whole bunch of independent random variables, each of which is the dot product of a random sign vector with the fixed x. So x is deterministic, just a fixed bunch of numbers; you dot it with a whole bunch of independent random vectors. Okay, so squaring and using Pythagoras, this is the same as the probability that Σ_{i=1}^n |X_i · x|² is bigger than C²n/4. So what I've got here is the probability that a sum of a bunch of independent random variables is very big; I need a large deviation estimate.
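The structure of this event is easy to simulate. A sketch (my own, with arbitrary sizes) of the distribution of Σ_i |X_i · x|² for random sign rows and a fixed unit vector x:

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, trials, C = 100, 100, 2000, 4.0

# A fixed (deterministic) unit vector x.
x = rng.normal(size=p)
x /= np.linalg.norm(x)

# sum_i |X_i . x|^2 = ||Mx||^2 has mean n, since each |X_i . x|^2 has
# mean ||x||^2 = 1; exceeding (C/2)^2 * n should therefore be very rare.
sums = np.empty(trials)
for t in range(trials):
    M = rng.choice([-1.0, 1.0], size=(n, p))
    sums[t] = np.sum((M @ x) ** 2)

print("mean of ||Mx||^2:", sums.mean(), " vs n =", n)
print("fraction exceeding (C/2)^2 * n:", np.mean(sums > (C / 2) ** 2 * n))
```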
What is the probability that this sum is very big? Okay, now, in probability we have no shortage of tools to understand sums of independent random variables; at this point you can pull out your Chernoff or your Hoeffding inequality or whatever. Maybe, in the interest of time, since this is covered in the notes, I'll just sketch it. The way you prove Chernoff-type inequalities is that you try to control the probability by some sort of exponential moment: by Markov's inequality, for any t > 0,

P( Σ_{i=1}^n |X_i · x|² > C²n/4 ) ≤ E exp( t Σ_{i=1}^n |X_i · x|² ) / exp( t C²n/4 ),

and the point is that you can actually compute the numerator, because the exponential factors quite nicely, if you work it all out, thanks to all the independence. This holds for any t bigger than zero, and you then choose t to optimize the bound.
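As an illustration of the kind of tail bound being invoked here, the following sketch (mine, not from the lecture) compares the empirical tail of a single dot product X · x against the Hoeffding bound 2 exp(−λ²/2), which applies because x is a unit vector:

```python
import numpy as np

rng = np.random.default_rng(7)
p, trials = 50, 100000

x = rng.normal(size=p)
x /= np.linalg.norm(x)  # unit vector, so sum_j x_j^2 = 1

# Dot products of independent random sign vectors with the fixed x.
dots = rng.choice([-1.0, 1.0], size=(trials, p)) @ x

# Hoeffding: P(|X . x| > lam) <= 2 exp(-lam^2 / 2) when ||x|| = 1.
for lam in (1.0, 2.0, 3.0):
    empirical = np.mean(np.abs(dots) > lam)
    bound = 2 * np.exp(-lam**2 / 2)
    print(f"lam={lam}: empirical {empirical:.5f} <= bound {bound:.5f}")
```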