So, if I write this down now in the following manner — the rows y1 Hermitian, y2 Hermitian, ..., yn Hermitian multiplied into the columns y1, y2, ..., yn — the product is the n by n matrix whose first r diagonal entries are 1 and whose every other entry is 0. What does that tell me about these y's? y1 Hermitian y1 is 1, y2 Hermitian y2 is 1, and so on up to yr Hermitian yr is 1. What about the last n minus r fellows? Their norms are 0. So the claim is that y r+1 = y r+2 = ... = yn = 0, the zero vector of size m. It must be so: if you are to have equality on both sides, then this is forced. No doubts about this. Also, what can you say about y1 through yr mutually? Aren't they going to be orthogonal to each other? Because the cross-coupling terms — y1 Hermitian y2, y1 Hermitian y3, and so on — are all zero. So we also say that y1, y2, ..., yr is a set of orthonormal vectors in C^m. Please ask if it is not clear. Any doubts about why this must be so? Okay.

So, what we have essentially shown is that A V W inverse is the matrix with columns y1, y2, ..., yr followed by a whole bunch of zero columns, because from r+1 onwards every column is 0. From whence I can write A V as follows. Remember what W was: by my definition — with a slight abuse of notation I was calling it a square root of Sigma, but that was a bad idea, so let me not be lazy and write it properly — W is the n by n diagonal matrix whose first r diagonal entries are sigma 1, sigma 2, ..., sigma r and whose remaining diagonal entries are 1. So A V equals the matrix [y1 ... yr 0 ... 0] times the block matrix with diag(sigma 1, ..., sigma r) in the top-left block, the identity in the bottom-right block, and zeros elsewhere. Then, hitting both sides with V Hermitian on the right, I can write A = [y1 ... yr 0 ... 0] times that same block matrix times V Hermitian. No issues so far? Alright.

Now I am going to play a trick. Look: y1 through yr is a set of r vectors, each of which is an m-tuple. If I expand the zero columns to something non-zero, and at the same time make that identity block a zero block, nothing changes. Let me write that down and then you will realize.
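If you want to see this step on a machine, here is a minimal sketch — not part of the lecture; the random matrix, the tolerance, and the variable names are my own assumptions — that builds the columns y1, ..., yn of A V W inverse from the eigendecomposition of A Hermitian A (real case, so Hermitian is just transpose) and checks the two claims just made:

```python
import numpy as np

# Minimal sketch (assumed setup): a random rank-r matrix A, the eigendecomposition
# of A^H A, and the columns y_1, ..., y_n of A V W^{-1}.
rng = np.random.default_rng(0)
m, n, r = 6, 4, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # rank r by construction

lam, V = np.linalg.eigh(A.T @ A)            # eigenvalues come out in ascending order
order = np.argsort(lam)[::-1]               # reorder so sigma_1 >= sigma_2 >= ...
lam, V = lam[order], V[:, order]
sigma = np.sqrt(np.clip(lam, 0.0, None))    # singular values (clip tiny negative round-off)

W = np.diag(np.where(sigma > 1e-10, sigma, 1.0))   # diag(sigma_1, ..., sigma_r, 1, ..., 1)
Y = A @ V @ np.linalg.inv(W)                # columns y_1, ..., y_n

print(np.round(Y.T @ Y, 6))                 # identity in the top-left r x r block, zeros elsewhere
print(np.linalg.norm(Y[:, r:], axis=0))     # the last n - r columns are (numerically) zero
```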
So, let us say y1, y2, ..., yr — let me call this sub-matrix U-hat — and then y-tilde r+1, ..., y-tilde m; we need m columns in total because these are m-tuples. I expand U-hat in a very special manner so that the resulting matrix is square and orthonormal — unitary, in fact, so that the product of its conjugate transpose with itself is the identity. I can do that by Gram–Schmidt: y1 through yr is just a set of r linearly independent vectors in C^m, so I can first expand it to some basis of C^m and then orthonormalize that basis by the Gram–Schmidt procedure. So such an extension definitely exists. But if the product still has to equal A, I have to play a special trick: I get rid of that identity block and put a zero block in its place. Let me pause here for a moment to allow you to absorb this. What is happening? Earlier, the first column of the product was sigma 1 times y1, the second column was sigma 2 times y2, and so on till sigma r times yr; the remaining columns, although you had an identity block there, were not adding up to anything, because they multiplied the zero columns. Now I have put the onus of that zero on the middle matrix and instead added new fellows, so that U-hat augmented by the y-tildes gives me a full square matrix that is orthonormal — unitary, in this case. Then the equality is still true. Is this clear? If you understood this, that is it: we have arrived at our desired result.

(Are these all m-tuples? Is that matrix still square?) The left matrix is m by m, but the middle one is m by n — it is a rectangular matrix now. Until here it was square; now I am making it rectangular. Good observation; that is something that needs a mention. Ultimately what I am doing is taking combinations of these columns, and the rest does not matter, so I am playing around with the dimensions. Originally A was m by n, the middle matrix was n by n and V Hermitian was n by n, so the whole product was m by n. Now the left matrix is m by m, so the middle matrix has to be m by n: the fellow that was square until this point has stopped being square. Thanks for that observation — that is a point I should have mentioned; good that you are paying attention. It is not just that the dimensions stay consistent: the left matrix has now become square, and as a consequence the middle one can no longer be square.

And if this is true, we have one of the greatest results in matrix theory that you will come across: the singular value decomposition. This is exactly what the singular value decomposition looks like. This whole left matrix we now call U; by the very nature of my construction U is unitary, and V is already unitary. So for any matrix of size m by n I will always end up with a legitimate singular value decomposition A = U Sigma V Hermitian, where the entries in the centre — sigma 1 through sigma r — are the singular values. They happen to be the square roots of the eigenvalues of A Hermitian A; just check that.
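Here is the same sketch carried one step further — again my own construction and names, not from the lecture notes — completing y1, ..., yr to an orthonormal basis of R^m, building the rectangular m by n Sigma, and confirming A = U Sigma V Hermitian:

```python
import numpy as np

# Sketch of the completion trick (assumed setup, my own names): extend the first r
# columns to a full orthogonal U, replace the identity block by zeros, check A = U S V^H.
rng = np.random.default_rng(0)
m, n, r = 6, 4, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

lam, V = np.linalg.eigh(A.T @ A)
order = np.argsort(lam)[::-1]
lam, V = lam[order], V[:, order]
sigma = np.sqrt(np.clip(lam, 0.0, None))

U_hat = A @ V[:, :r] / sigma[:r]            # y_1, ..., y_r (orthonormal, as checked above)
Q, _ = np.linalg.qr(np.hstack([U_hat, rng.standard_normal((m, m - r))]))
U = np.hstack([U_hat, Q[:, r:]])            # Gram-Schmidt-style completion to an m x m matrix

S = np.zeros((m, n))                        # the rectangular "Sigma": m x n, no longer square
S[:r, :r] = np.diag(sigma[:r])

print(np.allclose(U.T @ U, np.eye(m)))      # U is orthogonal (unitary in the complex case)
print(np.allclose(A, U @ S @ V.T))          # the singular value decomposition of A
```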
As an exercise, also check that they are the square roots of the non-zero eigenvalues of A A Hermitian — that is, check that the non-zero eigenvalues of A Hermitian A equal those of A A Hermitian. You don't have to go for determinants, you don't have to go for characteristic polynomials; just apply the simple definition of eigenvalues. If A Hermitian A v = lambda v with lambda non-zero, hit both sides with A on the left: A A Hermitian (A v) = lambda (A v), and you again have the eigenvalue–eigenvector equation, now for A A Hermitian with eigenvector A v. Just check that; I have dropped enough of a hint there.

So this decomposition exists for every matrix. The very important thing also to be noted here is that there is an ordering: sigma 1 >= sigma 2 >= ... >= sigma r, with sigma r the so-called smallest non-zero singular value. Why is this interesting? There are fascinating aspects to the singular value decomposition, some of which will be very interesting revelations, and there is a reason why, the moment people go for any proof using matrices, the first thing they assume is "let this be the singular value decomposition of the matrix" and then proceed with the proof. What am I saying? The image of the matrix A is equal to the span of — can you guess? — y1, y2, ..., yr. So the singular value decomposition at once provides you with an orthonormal basis for the image of A. And, more to come: the kernel of A is equal to the span of what? No, not that — remember it is a rectangular matrix, so the kernel must come from which space? It cannot come from C^m; it has to come from C^n. The matrix A acts on n-tuples, not m-tuples, so the kernel comes from the domain, not the co-domain. So I am going to write V as v1, v2, ..., vr, then v r+1, ..., vn — V is n by n. The kernel of A is then the span of v r+1, v r+2, ..., vn. So the last n minus r columns of the matrix V provide you with an orthonormal basis for the kernel of A, and the first r columns of the matrix U provide you with an orthonormal basis for the image of A. We will look at a quick proof of this, but before that I am going to tell you something very interesting, which is an application; if after that we do not have time, we will do the proof of these two statements in the next lecture. The application I am not going to prove, by the way, but hopefully you will find it impressive and interesting.

(A question about V versus V Hermitian.) The way I have described it, the columns of V are the rows of V Hermitian — it is the same thing. And yes, sigma 1 squared, sigma 2 squared, and so on are the eigenvalues of A Hermitian A. (A question about square roots of matrices.) When you take a matrix and want its square root, there are indeed legitimate square roots of positive definite matrices, but here I am only talking about the sigma part, the non-zero part, which is just an r by r diagonal block. That is why I got rid of the square-root notation: I felt it was bad notation — I was being a little lazy — so let us just write it down legitimately, taking the positive square roots: sigma 1 from sigma 1 squared, and so on. For a full square matrix such factorizations also exist: not just A Hermitian A or A transpose A, which is automatically positive semi-definite, but every positive semi-definite matrix can be written in the form H Hermitian H (H transpose H in the real case) for some matrix H.
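Coming back to the exercise and to the two claims about the image and the kernel, here is a quick numerical check — an illustration of my own, with an assumed random matrix — using NumPy's built-in SVD:

```python
import numpy as np

# Quick check (assumed example): the non-zero eigenvalues of A^H A equal those of A A^H,
# the first r columns of U span im(A), and the last n - r columns of V span ker(A).
rng = np.random.default_rng(1)
m, n, r = 5, 3, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # rank r

ev_n = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]    # n eigenvalues of A^H A
ev_m = np.sort(np.linalg.eigvalsh(A @ A.T))[::-1]    # m eigenvalues of A A^H
print(np.allclose(ev_n[:r], ev_m[:r]))               # the non-zero ones coincide

U, s, Vh = np.linalg.svd(A)                          # full SVD: U is m x m, Vh is n x n
print(np.allclose(A @ Vh[r:, :].T, 0))               # A v_j = 0 for j > r: kernel vectors
print(np.allclose(U[:, :r] @ (U[:, :r].T @ A), A))   # every column of A lies in span(u_1..u_r)
```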
Such existence and factorization results are routinely proved in a course on matrix theory, which is not what this course is. We will probably not have time for the proofs of these results today; we will do them quickly in the next lecture and then move on with our original agenda, which is exploring the eigenvalue–eigenvector question. We have been taking a slight detour because I felt this is an interesting application, and I will end with a fascinating application.

Suppose you are transmitting data in the form of a matrix through a channel — an array of numbers. If you have a matrix A of size m by n, with real or complex entries, how many numbers are you transmitting? m times n of them. If m and n are very large, this is a whole different ballgame: if one is in the thousands and the other in the millions, a thousand times a million is a thousand million. With such huge chunks of data being transmitted as matrices, you might not want to do this; you want to transmit very efficiently. You cannot just say "I am going to transmit this sub-matrix"; that is not going to help. Think of an image, for example — that screen right behind you. If I transmit only the pixel information from the bottom-left corner, you would be left with nothing but this pink block, and that is not meaningful. You want to compress this data and yet send the information through the channel meaningfully, so that someone receiving it at the other end can meaningfully reconstruct what you actually transmitted. Remember when MP3 files came in — maybe you are too young; we saw that happen as college students. The MP3 revolution (which also caused a lot of piracy, by the way): earlier you had some twenty-odd songs on a CD and CDs were very costly, and then suddenly MP3 opened up and you had hundreds, maybe five hundred songs on a CD. That is the magic of data compression. So here is the question: what if, instead of m times n numbers, you were required to transmit only about (m + n) times k numbers, where k is much, much smaller than either m or n? Then it is a winner. If you are thinking of m = 5 and n = 2 you may not see the advantage, but take m as 5 million and n as 10 million and you will definitely see it: (m + n) times k is of the order of 15 million times maybe 20 — a few hundred million — whereas m times n is a hugely larger number. So if you can do this, it is a winner.

What you need, essentially, is a close approximation of the actual m by n matrix — one that matches it. Now what do you mean by matching? Again you need some idea of distance, and one sure-shot way of defining distance is through norms. Recall the inner product we described earlier: trace of A Hermitian B is an inner product, and it induces a norm, with the norm squared of A equal to the trace of A Hermitian A. The trace is the sum of the diagonal entries, and I leave it to you to check — a moment's thought will lead to the conclusion — that this is nothing but stacking up the entries of the matrix as a tall vector of size mn by 1 and taking its 2-norm: it is just the sum of the squares of the individual entries for a real matrix, and the sum of the squares of the moduli of those entries for a complex matrix.
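As a small check of that claim — my own example matrix, nothing more — here it is in NumPy:

```python
import numpy as np

# Check (assumed example): trace(A^H A) = sum of |a_ij|^2 = squared 2-norm of the
# entries stacked as a single mn x 1 vector; this is the (squared) Frobenius norm.
rng = np.random.default_rng(2)
A = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))

via_trace = np.trace(A.conj().T @ A).real        # trace(A^H A)
via_sum   = np.sum(np.abs(A) ** 2)               # sum of squared moduli of the entries
via_stack = np.linalg.norm(A.reshape(-1)) ** 2   # stack all entries, take the 2-norm, square

print(np.allclose(via_trace, via_sum), np.allclose(via_sum, via_stack))
print(np.isclose(np.linalg.norm(A, 'fro') ** 2, via_sum))   # NumPy's built-in Frobenius norm
```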
This norm has a name: it is called the Frobenius norm, and it is the norm induced by that inner product.

Now the question that is being posed is this: can I find myself a B that minimizes, over all B such that rank of B is equal to k, the Frobenius norm of A minus B? I am sure you would agree that this is a meaningful thing to minimize: you are trying to find the best rank-k approximation — not the minimum rank; I am fixing a number k which is much, much smaller than either m or n. So A belongs to C^(m by n), and I am trying to find a rank-k approximation of it; this B is the minimizer. And it turns out — this is the best part — that the minimizer is B = sigma 1 y1 v1 Hermitian + sigma 2 y2 v2 Hermitian + ... + sigma k yk vk Hermitian. (I wrote u's there at first — sorry, those should be v's: the y's are the columns of U and the v's are the columns of V.) Even before I go into an intuitive explanation of why this is good — I am not going to prove it, only explain it intuitively — just check the number of actual numbers you need to transmit. What are the sizes of these y's? The y's are columns of U, so they are m-tuples; the v's are n-tuples. How many of the y's are you picking out? k of them. So you have k times m numbers describing all the requisite y's, k times n numbers describing all the v's, and k singular values. The total number of numbers you are transmitting is therefore (m + n + 1) times k. Compared to m + n, the extra 1 is negligible, so this is of the order of (m + n) times k, exactly as we had asked for. So if you are taking a very small-rank approximation of a large matrix, this is how you do it.

And why is this the best? Let me give you a quick intuitive understanding. Think about the domain: a basis for the domain is v1, v2, v3, and so on. Along the v1 direction, what is your gain, your amplification? If you hit a vector of the form gamma v1 with this B, you get gamma sigma 1 times y1, and y1 has unit norm, so the output's norm is amplified by sigma 1. In fact, although I have not proved it here, you can already guess that y1 through yr span the image, and that the sizes of the singular values determine the importance of the corresponding singular vectors in U. So y1 is the principal direction — the most important direction, the one along which this linear map assigns the maximum weight sigma 1, the maximum amplification — and then comes the second most important, and so on. If you have to choose the single most important direction, it is y1, and you give zero weight to the others; if you want the two most important directions, you choose the first two. That is how you come to the conclusion that if you want the k most important directions — the k principal directions, the best k directions — you take the first k, because all other directions subsequent to those, from k+1 through r, carry diminishing weight, diminishing returns.
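To make the counting and these diminishing returns concrete, here is a sketch of my own — it uses NumPy's SVD directly on a random matrix, so treat the numbers as purely illustrative — that forms the rank-k approximation sigma 1 y1 v1 Hermitian + ... + sigma k yk vk Hermitian, counts the numbers you would transmit, and shows the approximation error falling as k grows (the stated equality with the tail singular values is a standard fact, quoted here without proof):

```python
import numpy as np

# Sketch (assumed example): the truncated-SVD rank-k approximation, the transmission
# count (m + n + 1) * k versus m * n, and the error for increasing k.
rng = np.random.default_rng(3)
m, n = 200, 120
A = rng.standard_normal((m, n))

U, s, Vh = np.linalg.svd(A, full_matrices=False)

k = 10
B = U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]        # sigma_1 y_1 v_1^H + ... + sigma_k y_k v_k^H
print(np.linalg.matrix_rank(B))                  # k
print(m * n, (m + n + 1) * k)                    # 24000 numbers versus 3210 numbers

# Diminishing returns: the Frobenius error of the rank-k truncation shrinks as k grows,
# and equals the root-sum-of-squares of the discarded singular values.
for k in (1, 5, 10, 50, 120):
    B_k = U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]
    err = np.linalg.norm(A - B_k, 'fro')
    tail = np.sqrt(np.sum(s[k:] ** 2))
    print(k, round(err, 3), round(tail, 3))      # the two numbers agree for every k
```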
I am not going to prove this again, but it is a very, very fascinating result, and one that gives you a significant advantage in terms of data compression: you need to transmit only that many complex or real numbers, as the case may be, for your friend at the receiver's end to get the best rank-k reconstruction of the original matrix A. Now think about images and other sorts of data where you have to do this smartly, in a spatial manner, where the matrix is spatial data arranged in an array. You cannot afford to give a very faithful description of one part of the picture while the other part is completely blocked — nobody cares about that image. You would rather have a low-resolution image that is low-resolution throughout, but with the entire image before you. So this is the smartest way to do it — the best way to do it — and the singular value decomposition allows you to do it. This is of course by no means the only, or even the most important, application of the singular value decomposition; if you are interested, do read up on the others. There are several fascinating examples, and much of what gets talked about — principal component analysis, for example, and so on — is, at the end of the day, just the singular value decomposition known by some other name. So, in the next lecture we shall first prove this, and then get back on track with any square matrix — not just symmetric matrices — and explore the eigenvalue–eigenvector question and, more importantly, the question of diagonalizability of a square matrix through a set of eigenvectors. Thank you.