matrix here, and so if you pay that entropy cost naively, taking a union bound over every possible x, you get a terrible bound. The key point, though, is that this short null vector is actually fairly constrained: it can be determined without knowing the whole matrix. This block, the n by k matrix formed by the k columns in the support of x, has rank k − 1, because k − 1 of those columns are linearly independent. Since row rank equals column rank, somewhere out there there are k rows of this block which span its (k − 1)-dimensional row space. But every one of these rows has to be orthogonal to x, so x is basically the unique normal vector to those k rows: it is determined up to a constant, and I don't care about constants. So, paying another entropy cost of n choose k, we can assume these are in fact the first k rows: the first k rows of the block span a (k − 1)-dimensional space, and those rows already tell you what x is up to a constant.

So this x is not arbitrary; it is determined by the top block of the matrix. But even after fixing this block you still have a huge block of randomness left to use: once x is fixed this way, it is independent of the bottom n − k rows. So you can bound the probability by the entropy cost times the probability that this fixed x is orthogonal to all of the remaining rows, and since those rows are independent of each other, that is just the probability that x is orthogonal to a single row, raised to the power n − k. At this point I use a very crude bound. You are asking for a dot product to vanish, and that dot product is a random walk in which each step adds or subtracts some number with probability 1/2 each; such a walk cannot concentrate on any single value with probability more than 1/2, because at every step you go up half the time and down half the time. This is a very crude bound, but it is all I need here: each remaining row has at most a 50-50 chance of being orthogonal to the x generated by the top rows, and there are n − k such rows. Putting it all together, you bound things by (n choose k) squared, one factor for each entropy cost, with k at most εn, times 2^{-(n − k)}.
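As a small numerical sketch of the two ingredients just used (my own illustration, not part of the lecture): the first part checks empirically that a random ±1 row is orthogonal to a fixed vector with probability well below 1/2, and the second evaluates the combined bound (n choose k)^2 · 2^{-(n − k)} summed over k up to εn. The particular values n = 300 and ε = 0.05 are arbitrary, chosen small enough that the exponential factor beats the binomial entropy.

```python
import numpy as np
from math import comb

rng = np.random.default_rng(0)

# (1) Crude step bound: for a fixed vector x with nonzero entries, a random
#     +-1 row r satisfies P(r . x = 0) <= 1/2.  Here x is a sample +-1 vector.
k, trials = 8, 200_000
x = rng.choice([-1, 1], size=k)
rows = rng.choice([-1, 1], size=(trials, k))
print("empirical P(r.x = 0):", np.mean(rows @ x == 0))   # about 0.27 here, below 1/2

# (2) Entropy cost vs. exponential gain: sum over k <= eps*n of C(n,k)^2 * 2^{-(n-k)}.
#     For eps small enough the exponential factor wins.
n, eps = 300, 0.05
total = sum(comb(n, k) ** 2 * 2.0 ** (-(n - k)) for k in range(1, int(eps * n) + 1))
print("bound for n=300, eps=0.05:", total)                # exponentially small
```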
And this bound is very small: k is at most εn, which is small, and as long as ε is small, say less than 0.1 or so, the binomial coefficients are much smaller than the exponential factor. So that is how you deal with the sparse null vectors. It is a slightly strange trick: you have to decouple the normal vector from some of the random components of your matrix, so that you can start exploiting independence against the rest.

All right, so we can now assume this doesn't happen: we can assume there is no sparse linear relation among the columns, and by the same argument no sparse relation among the rows. The only thing left is to understand the dense relations among these vectors. So now we go back to the singularity probability. We are interested in the probability that these n row vectors have a linear dependence, and because of what we have already done we may assume any such dependence is dense, involving at least εn of the rows; there are no sparse relations, only dense ones.

As I said before, if n vectors have a linear dependence, then one of them has to be a linear combination of the others. But once you know the dependence is dense, in fact most of the rows have to be linear combinations of the others: knowing there are no sparse relations implies that at least εn of the X_i are spanned by the other vectors. I think this even has a name, the Steinitz exchange lemma or something: if there are no sparse relations you can find some k of these vectors, with k at least εn, which are linearly independent and which generate all the others, and then each of those k vectors is itself generated by the remaining vectors.

So a fair proportion of the X's are spanned by all the others, which means that the earlier step of restricting to a single vector, say the last one X_n, and asking whether X_n is spanned by the rest, is not so inefficient, given that so many of the X_i are actually spanned by the others. The way you use this is that you can bound the singularity probability by 1/(εn) times the sum over i of the probability that X_i is spanned by X_1, ..., X_{i−1}, X_{i+1}, ..., X_n. This is just double counting, or if you like linearity of expectation: each time the dense-dependence event occurs it contributes at least εn events of this form, so that sum must be at least εn times the probability we want. By symmetry the summand does not actually depend on i, so this is just (1/ε) times the probability that the last vector lies in the span of the others, which is where we were before. We now carry a factor of 1/ε, but ε is something like 0.1, so that is no big deal. So we want the probability that X_n is spanned by X_1, ..., X_{n−1}. Again, those vectors span some subspace of dimension at most n − 1, so we just choose a normal vector ω to that subspace.
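As a purely illustrative sanity check on this double-counting step (my own sketch, not from the lecture): for small n one can compare the probability that a random ±1 matrix is singular with the probability that its last row lies in the span of the others; the point is that restricting to a single row costs only a bounded factor, not a factor of n. The matrix sizes and trial count below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

def estimates(n, trials=5000):
    """Monte Carlo: P(M singular) vs. P(the last row lies in the span of the others)."""
    singular = last_in_span = 0
    for _ in range(trials):
        M = rng.choice([-1.0, 1.0], size=(n, n))
        rank_full = np.linalg.matrix_rank(M)
        rank_top = np.linalg.matrix_rank(M[:-1])
        singular += int(rank_full < n)
        last_in_span += int(rank_full == rank_top)   # last row did not raise the rank
    return singular / trials, last_in_span / trials

for n in (4, 6, 8):
    p_sing, p_last = estimates(n)
    print(f"n={n}:  P(singular) ~ {p_sing:.4f}   P(X_n in span of the rest) ~ {p_last:.4f}")
```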
Now, if the X_i are not linearly independent there could be multiple choices for this ω, but I don't really care: you may have some flexibility in how to choose ω, but all that matters is that however you choose it, ω depends only on these n − 1 vectors and not on X_n. You can clearly arrange that, since the conditions defining ω do not involve X_n. So you arbitrarily choose such an ω, and the quantity to bound is the probability that the dot product of X_n with ω is zero.

At this point we would again split into sparse and non-sparse cases, but because we have already eliminated sparse dependences along the columns, we know that ω cannot be (εn − 1)-sparse: whatever ω you choose, it cannot have fewer than εn − 1 nonzero entries. Indeed, ω has to be orthogonal to each of the first n − 1 rows, so if ω were that sparse, the first n − 1 rows would all be orthogonal to a vector supported on at most εn − 1 coordinates, which is a linear dependence among the columns of that block involving only εn − 1 of them. This relation is missing the bottom row, but adding the bottom row increases the rank by at most one, so the columns of the complete matrix would have a linear relation of size at most εn, which we have already excluded. So this doesn't happen: ω must be quite dense, with at least εn or so nonzero entries, and then the Erdős bound gives about 1/√(εn); since ε is a constant like one tenth, that is a constant over √n, and this is what produces the O(1/√n) bound.

So the idea is to pin down what the normal vector is using some of the randomness of your matrix, and then use the remaining randomness to extract a bound. It is nontrivial, but it is not a great bound: previously we were getting exponentially strong bounds, and here it is just 1/√n. The main bottleneck is our use of the Erdős–Littlewood–Offord inequality which, to remind you, is the bound that the probability that a random walk concentrates at a single point, such as zero, is at most about one over the square root of the number of nonzero steps. This is the Erdős bound, and in a sense it is sharp: as I said, if there are k nonzero entries and they are all equal, say all equal to one, then you actually attain this bound, so in that sense it is optimal. But in practice you can use more sophisticated estimates when the entries of ω are not all equal. For example, if you have k nonzero entries and they are all distinct, then the 1/√k improves, I think, to about 1/k^{3/2}. And in fact for most ω you can do far better: if you pick n real numbers at random, then generically they have no additive relations between them whatsoever, and the concentration probability should be more like 2^{-n}, exponentially small in n.
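Here is a quick Monte Carlo illustration of two of these regimes (my own sketch, with an arbitrary choice of k and trial count): equal steps essentially attain the Erdős bound of order 1/√k, while distinct steps concentrate far less; the generic case, which is exponentially small, is not visible at Monte Carlo scale.

```python
import numpy as np
from math import comb, sqrt

rng = np.random.default_rng(2)
k, trials = 40, 200_000
signs = rng.choice([-1, 1], size=(trials, k))

a_equal = np.ones(k, dtype=int)          # all steps equal: extremal for the Erdos bound
a_distinct = np.arange(1, k + 1)         # all steps distinct

p_equal = np.mean(signs @ a_equal == 0)
p_distinct = np.mean(signs @ a_distinct == 0)

print("equal steps   :", p_equal, " exact:", comb(k, k // 2) / 2 ** k)  # ~ sqrt(2/(pi*k))
print("distinct steps:", p_distinct)                                    # much smaller, ~ k^(-3/2)
print("Erdos bound   : ~ 1/sqrt(k) =", 1 / sqrt(k))
```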
So it turns out that for most ω you can improve this bound by quite a bit. There are some exceptional ω for which that fails, but you can show they are rare by a more complicated version of these arguments. There is a whole sequence of papers which take this basic argument and make it more and more efficient, and that is how the bound was eventually pushed down to (1/√2 + o(1))^n, basically by much more advanced Littlewood–Offord theory. But I am not going to discuss that any further here, because I am really out of time; that took far more time than I wanted.

What we really want to do is the least singular value bound, and that follows a similar strategy, except that you have to carry extra error terms everywhere. So for the least singular value we try to understand the probability that the least singular value is less than some small quantity; that is the kind of statement we are trying to prove. For the singularity problem we were asking for the vectors X_1 through X_n to have an exact linear dependence; now we are asking instead that Mx is small, not zero but small, for some unit vector x. That is the event we are trying to capture. Another way of saying this is that there is some nontrivial linear combination of the columns which is not zero but which is small. Again, the problem is that you do not know for which x this happens: you are trying to bound the event that for some unit vector x you have a relation of this form, and if you try to take a union bound right away you lose; even if you pass to a net, the entropy is just too high. So you first need to cut the entropy of the x's down to something more reasonable.

In the singularity case we split into sparse and non-sparse cases: if x was very sparse, with only εn nonzero entries, one sort of argument gave a very good bound, and when the dependence was non-sparse we used the Littlewood–Offord-type result instead. The scheme for the least singular value is similar, just more complicated. Instead of dividing into sparse and non-sparse vectors, you divide into compressible and incompressible vectors. Again pick a small ε. We say a unit vector x is compressible if it is within ε of an εn-sparse vector; so it is not literally sparse, as in the singularity case, but since x is a unit vector we allow an error of size ε, and up to that error it is almost sparse. It is incompressible otherwise. It then turns out that you can repeat the previous arguments to get rid of the compressible case fairly easily: the probability that Mx is small for some compressible x can be bounded with a very good bound, in fact an exponentially good one, similarly to what happened in the singularity probability case.
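To make the compressible/incompressible dichotomy concrete, here is a tiny sketch (my own illustration): the distance from a unit vector to the nearest εn-sparse vector is just the norm of its tail after keeping the εn largest coordinates, so a basis vector is trivially compressible while the flat vector (1, ..., 1)/√n stays far from every sparse vector. The values n = 1000 and ε = 0.1 are arbitrary.

```python
import numpy as np

def dist_to_sparse(x, s):
    """L2 distance from x to the nearest vector with at most s nonzero entries:
    keep the s largest-magnitude coordinates, the remaining tail is the error."""
    tail = np.sort(np.abs(x))[:-s]        # everything except the s largest magnitudes
    return np.linalg.norm(tail)

n, eps = 1000, 0.1
s = int(eps * n)

e1 = np.zeros(n); e1[0] = 1.0             # a standard basis vector: as sparse as it gets
flat = np.ones(n) / np.sqrt(n)            # the flat unit vector: spread over all coordinates

print("basis vector : dist to eps*n-sparse =", dist_to_sparse(e1, s))    # 0.0   -> compressible
print("flat vector  : dist to eps*n-sparse =", dist_to_sparse(flat, s))  # ~0.95 -> incompressible
```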
The argument is again very similar. You have this almost-sparse vector; you pay an entropy cost to assume that the sparse part sits at the very beginning, and what that means is that the first k columns, for some k, are already almost linearly dependent, up to a small error. If you do things properly, you can then find a minor of that tall thin matrix which almost determines the rest, and you run essentially the same argument as before. In doing so, by the way, one of the key facts you use is a lower bound on the least singular value of a tall thin random matrix: such a matrix is well conditioned, meaning its largest and smallest singular values are comparable, and that is what we proved in the first lecture, so that result actually gets used at this point.

Anyway, you can bound the compressible case just as before, and you can also bound the incompressible case by pretty much the same method, except that every exact linear dependence has to be replaced by an approximate one, in which some combination of the vectors is not zero but very small. It turns out, and for lack of time I will not do it, that if you set things up properly all the arguments go through. You need a variant of the Littlewood–Offord theorem, because now you are not interested in a random walk being exactly zero but in it being small. Maybe I will just say one thing about that: you can use, for example, a Berry–Esseen-type theorem. The type of bound you want is that if you have a random walk with various steps and you ask for the probability that the walk lands within ε of zero, then that small-ball probability is something linear in ε, provided the step sizes are in some sense incompressible, that is, not all concentrated on a few entries. A bound of this shape serves as the substitute for the Erdős bound, and in fact it is very similar to the bounds we also needed in the first lecture, and that is what eventually gives you results like this.
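Here is a minimal simulation of the small-ball bound just described (my own sketch, with an arbitrary spread-out coefficient vector and arbitrary sizes): for an incompressible unit vector a, the probability that the random walk Σ ε_i a_i lands within ε of zero scales roughly linearly in ε.

```python
import numpy as np

rng = np.random.default_rng(4)
n, trials = 100, 50_000

a = rng.standard_normal(n)
a /= np.linalg.norm(a)                    # a "typical" incompressible unit coefficient vector

signs = rng.choice([-1.0, 1.0], size=(trials, n))
walks = signs @ a                         # the random walks  sum_i eps_i a_i

for eps in (0.2, 0.1, 0.05, 0.025):
    p = np.mean(np.abs(walks) <= eps)
    print(f"eps = {eps:<6}  P(|walk| <= eps) ~ {p:.4f}   (roughly linear in eps)")
```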
Unfortunately I am pretty much out of time, so I will stop here, and on Wednesday we will start on the circular law.

[In response to a question:] Yes, the limiting law of the least singular value is known; I forget the exact formula, but Edelman computed it in the case of real Gaussian matrices, and it is known that for real Bernoulli matrices one gets the same law. That is discussed in the notes, but I did not have time to do it here. Yes, it is a comparison method: if you have a Bernoulli matrix and a Gaussian matrix, it turns out that the least singular value law is universal, and the way you can see that is to look at the inverses of these matrices. The law of the least singular value of the matrix is determined by the law of the largest singular value of its inverse. These inverse matrices are pretty ugly, but one nice thing about them is that they are almost of finite rank. As I said, the spectrum of the original matrices is roughly equally spaced from about 1/√n up to √n, so the bulk of the singular values have about the same size as the operator norm, and the matrix is very far from being low rank. But when you invert, the picture flips: the singular values of the inverse also range between roughly 1/√n and √n, but now the bulk of them sit down at the bottom rather than up at the top, and only very few are of size around √n, so the inverse is almost a finite rank matrix. So these are almost finite rank matrices, and the key point, and by the way this was work of Van Vu and myself, is that if you want to understand the operator norm of an almost finite rank matrix you can just look at a small minor: you randomly pick a small minor, and with high probability that small minor has an operator norm proportional to that of the full matrix. So you can look at small minors of the inverse, those can actually be understood, and you can show that they are universal, that they do not depend on the distribution of the entries. So yes, we do actually have a technique for understanding the least singular value law; that is discussed in the notes, and maybe Nick will talk a little bit about it, I don't know what he has planned after this.
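As a closing illustration (mine, not from the lecture), the following sketch checks the two phenomena just mentioned numerically: the rescaled least singular value √n·s_min has essentially the same statistics for Bernoulli and Gaussian matrices, and only a handful of the singular values of the inverse are comparable to its operator norm, which is the "almost finite rank" picture. The size n = 200, the trial count, and the factor-10 threshold are all arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(5)
n, trials = 200, 100

# (1) Universality: rescaled by sqrt(n), the least singular value looks the same
#     for Bernoulli (+-1) and Gaussian matrices.
smin_bern = [np.linalg.svd(rng.choice([-1.0, 1.0], size=(n, n)), compute_uv=False)[-1]
             for _ in range(trials)]
smin_gauss = [np.linalg.svd(rng.standard_normal((n, n)), compute_uv=False)[-1]
              for _ in range(trials)]
print("median sqrt(n)*s_min, Bernoulli:", np.median(np.sqrt(n) * np.array(smin_bern)))
print("median sqrt(n)*s_min, Gaussian :", np.median(np.sqrt(n) * np.array(smin_gauss)))

# (2) "Almost finite rank" inverse: only a few singular values of M^{-1} are
#     comparable to its operator norm.
s = np.linalg.svd(rng.choice([-1.0, 1.0], size=(n, n)), compute_uv=False)
s_inv = 1.0 / s[::-1]                     # singular values of the inverse, descending
near_top = int(np.sum(s_inv > 0.1 * s_inv[0]))
print(f"singular values of M^-1 within a factor 10 of the largest: {near_top} of {n}")
```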