Good morning. In this lecture we will be studying the singular value decomposition. This topic embodies a very deep connection between quite a few different topics in the area of linear algebra.

Consider this situation. We have already studied the eigenvalue problem, in which we wanted to decompose a matrix A in this form with U and V equal. All through our study of the eigenvalue problem we faced the question whether a decomposition of this sort exists, and if it exists, how to handle it, and so on. It would always be nice, in the eigenvalue problem, if we could make this Lambda diagonal with U and V orthogonal and so on, and at every step our work was mired in difficulties of several sorts. First, among all matrices we could ask this question only for those which are square; the subset of square matrices constitutes the only matrices for which the question arises to begin with. Then, among square matrices, not all can be diagonalized, so we had a further subset, the diagonalizable matrices, for which this kind of decomposition is possible. Even among diagonalizable matrices we had another subset, the symmetric matrices, for which this decomposition could be effected with an orthogonal V which is the same as U. Even among the symmetric matrices, for which we had the valuable theorem that you can work out an orthogonal diagonalization, the diagonal elements of Lambda could be negative. So even among symmetric matrices we had a further sub-case, the positive semi-definite matrices, for which the lambda_i turn out to be non-negative. That is the best possible situation we could think of, and it is a sub-case of a sub-case of a sub-case of the general class of matrices.

Now we can ask this question: suppose we do not ask for a similarity transformation and we focus only on this form of the decomposition. When we say we do not ask for similarity, we basically want to allow U and V to be different. So we ask: if we do not require U and V to be equal, then which of these properties can we still ask for and get? With just this one relaxation of allowing U and V to be different, different in content as well as in size, we can get a decomposition of this sort which is guaranteed for all matrices irrespective of size and shape, that means even rectangular matrices, with orthogonal U and V matrices and with non-negative diagonal entries in the diagonal middle factor. In that case we do not refer to it as Lambda, because Lambda has already been used for the matrix of eigenvalues; we write it as Sigma. That means that just by allowing U and V to be different we can effect a decomposition of this sort with all the other facets: the decomposition is possible for all matrices, and it is always possible.
The question arises for all matrices, including rectangular ones; that decomposition (we cannot call it a diagonalization) is always possible, with orthogonal U and V which are no longer the same, and with the diagonal entries of this matrix Sigma all non-negative. Such a decomposition is the singular value decomposition, and those diagonal entries are called the singular values of the matrix A.

Underlying this is the very important theorem called the SVD theorem, or singular value decomposition theorem. The theorem says: for any real matrix A of size m by n there exist orthogonal matrices U, which is m by m, and V, which is n by n, such that U transpose A V is a diagonal matrix of size m by n. Now, what is this idea of a diagonal matrix of rectangular size? Its diagonal entries are sigma_1, sigma_2, sigma_3, etc., all non-negative, which you obtain by first forming a square diagonal matrix of size p by p, where p is the lesser of the two dimensions m and n. If you want this diagonal matrix to be of size m by n, then, depending on which of m and n is larger, you append that many extra zero rows or that many extra zero columns. These diagonal entries sigma_1 to sigma_p are called the singular values of the matrix A. A similar result holds for complex matrices; for them the SVD theorem reads: for any complex matrix A belonging to C^(m by n) there exist unitary matrices U and V such that U star A V, where star is the conjugate transpose, is Sigma, which is always real, and so on.

So this theorem gives the basis for decomposing a matrix A in this manner. Now the question arises how to construct U, V and Sigma, the three factors. The way we work out the construction at the same time provides the proof of the SVD theorem, that such factors U, V, Sigma will always exist.

So let us quickly look at the construction. To construct the singular value decomposition, the factors U, Sigma and V, you first say that if we could decompose A in this manner, A as U Sigma V transpose, then its transpose, A transpose, is V Sigma transpose U transpose, and then we can just multiply them. As we multiply, U being orthogonal, U transpose U is the identity, and we have got A transpose A equal to V Sigma transpose Sigma V transpose. Now, Sigma transpose Sigma: we have already discussed that Sigma is a matrix of this shape, in which, if m is less, it has one shape, and if n is less, it has the other shape, with extra zero rows or extra zero columns; one of those zero blocks will be there, not both. So if Sigma is of this shape, then Sigma transpose Sigma is a square matrix in which the diagonal entries are sigma_1 squared, sigma_2 squared, up to sigma_p squared, and then, since this matrix is of size n by n, if n is larger there are additional zero entries in the rest of the diagonal positions, and all the off-diagonal entries are zero. That is the description of Sigma transpose Sigma.

Now, this matrix Sigma transpose Sigma is being called Lambda here, and there is a reason for that. You see, A transpose A is certainly symmetric; not only symmetric, it is positive semi-definite also. You cannot say a priori whether it is positive definite or not, but positive semi-definite it will certainly be.
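To make the theorem concrete, here is a minimal numerical check, assuming NumPy is available; the particular 3-by-5 matrix is an arbitrary illustrative choice, not one taken from the lecture.

```python
import numpy as np

# A small rectangular matrix (3 x 5), chosen arbitrarily for illustration.
A = np.array([[1., 2., 0., 4., 1.],
              [0., 3., 1., 1., 2.],
              [2., 1., 5., 0., 1.]])

# numpy.linalg.svd returns U (m x m), the singular values, and V transpose (n x n)
# when full_matrices=True.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# U and V are orthogonal ...
print(np.allclose(U.T @ U, np.eye(3)), np.allclose(Vt @ Vt.T, np.eye(5)))

# ... and U^T A V is an m x n "diagonal" matrix with the non-negative
# singular values sigma_1 >= sigma_2 >= ... on its diagonal.
Sigma = U.T @ A @ Vt.T
print(np.round(Sigma, 10))
print(s)   # the singular values, all non-negative
```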
Now, if A transpose A is a symmetric matrix, then it certainly has a diagonalization, indeed an orthogonal diagonalization, and this V Lambda V transpose is actually the decomposition that you obtain when you solve the diagonalization problem of a symmetric matrix. That means the V which you want in the singular value decomposition is in fact the matrix of eigenvectors of A transpose A, and this Lambda is then the diagonal matrix of eigenvalues of A transpose A. If so, then we already know how to determine V and Lambda, because we have studied the eigenvalue problem of a symmetric matrix in good detail and we can effect this diagonalization. So, by effecting the diagonalization of a symmetric matrix we determine V and Lambda.

The moment V and Lambda are determined, we can work out Sigma, because the first p diagonal entries of Lambda are nothing but sigma_1 squared, sigma_2 squared, up to sigma_p squared. So from the first p lambdas, which are all non-negative, we take the square root. A positive number has two square roots, one positive and one negative; you collect only the positive ones, which you put as sigma_1, sigma_2, sigma_3, up to sigma_p. All the non-trivial entries of the matrix Sigma are then in our hands, and we append the appropriate number of zero rows or zero columns, depending on the size of A, which is the same as the size of Sigma. So V and Sigma are now in our hands.

Now, remember that A is U Sigma V transpose and V is orthogonal. So we can post-multiply that original definition of the singular value decomposition with V, and then V transpose V becomes the identity; on one side you get U Sigma and on the other side you get A V. In this entire equation, A was originally given, V and Sigma we have determined, and we are left with the problem of determining the matrix U, the columns of the matrix U.

Four situations can arise when we go to determine the columns of the matrix U. In fact, in any particular case only three of them will arise: either the third or the fourth, depending on whether the matrix A has more rows or more columns. The first situation is the one in which you actually have information to determine the columns. If you equate the two sides column by column, then the left side gives you the columns A v_1, A v_2, A v_3, where v_1, v_2, v_3 are columns of the matrix V, and the right side gives you the corresponding columns as u_1 times sigma_1 plus all zeros, then u_2 times sigma_2 plus all zeros, and so on. That means you get column equations of the form A v_k equal to sigma_k u_k for the first r columns, if r is the rank, that is, for the non-zero singular values. Out of these p singular values, some may be zero. For the non-zero singular values, the corresponding column equations give you this kind of equation, and if sigma_k is non-zero then determining the corresponding column of U is easy: you just divide A v_k by sigma_k and you get the column u_k.

The columns developed in this way are bound to be mutually orthogonal, and you can verify that. Suppose two columns u_i and u_j have been developed like this and you want to find u_i transpose u_j. They are not only orthogonal, they are also normalized, that is, each of them is a unit vector as well.
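A sketch of this construction in NumPy follows; the matrix A and the tolerance used to decide which singular values count as non-zero are illustrative choices only, not values from the lecture.

```python
import numpy as np

A = np.array([[3., 1.],
              [1., 3.],
              [1., 1.]])            # m = 3 > n = 2, full column rank here

# Step 1: orthogonal diagonalization of the symmetric matrix A^T A.
lam, V = np.linalg.eigh(A.T @ A)    # eigenvalues ascending, columns of V orthonormal
order = np.argsort(lam)[::-1]       # reorder so the largest comes first
lam, V = lam[order], V[:, order]

# Step 2: singular values are the non-negative square roots of the eigenvalues.
sigma = np.sqrt(np.clip(lam, 0.0, None))

# Step 3: for non-zero sigma_k, the corresponding column of U is A v_k / sigma_k.
U_cols = [A @ V[:, k] / sigma[k] for k in range(len(sigma)) if sigma[k] > 1e-12]
U_r = np.column_stack(U_cols)

# Check: U_r Sigma_r V_r^T reconstructs A when all singular values are non-zero.
A_rebuilt = U_r @ np.diag(sigma[:U_r.shape[1]]) @ V[:, :U_r.shape[1]].T
print(np.allclose(A, A_rebuilt))    # True
```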
For them to be orthonormal, u_i transpose u_j has to be 1 if i and j are the same and 0 if i and j are different. You can see this: when you consider u_i transpose u_j from the expressions by which u_i and u_j were determined, you find that you get v_i transpose A transpose A v_j, and A transpose A is the matrix for which we actually solved the eigenvalue problem. So v_j is its eigenvector corresponding to the eigenvalue lambda_j, that is, sigma_j squared. When you write this out, there is a factor 1 by sigma_i from one side and 1 by sigma_j from the other; we collect the scalars together and are left with v_i transpose A transpose A v_j. Write A transpose A v_j as lambda_j v_j; lambda_j, that is sigma_j squared, is a scalar which we can bring out, and we are then left with v_i transpose v_j. From there you find that if i and j are different, then v_i transpose v_j is 0, because V and Lambda together give the orthogonal diagonalization of A transpose A, which means the columns of V are mutually orthogonal. So if i and j are different, v_i transpose v_j is 0 and you have got the orthogonality of u_i and u_j. On the other hand, if i and j are the same, then v_j transpose v_j is 1 because V is orthogonal, so each column v_j in particular is of unit size; in that case the sigma_j squared cancels with the scalar factors (i is equal to j in this case), and you get 1, which means u_j transpose u_j is 1. That shows the orthonormality of all the columns that we have determined this way. This much for those singular values which are not zero.

For the singular values which are zero, you have got A v_k equal to sigma_k u_k with sigma_k equal to 0; that means you are talking about A v_k equal to 0, and the corresponding u_k is left indeterminate. You cannot determine u_k from this relationship because the coefficient is zero, but being left indeterminate means you are free to choose a suitable u_k. What is a suitable u_k? A unit vector that is orthogonal to all the other columns that we have already determined.

Now, in a case where m is less than n, that is, U has fewer columns and V has more columns, you will get further equations A v_k for k greater than m, for which the right side is 0 and there is no corresponding column of U to determine; so those are simply gone. The fourth case is where m is greater, that is, the matrix A has more rows than columns. In that case, after all these calculations, there will be further columns of U which are left indeterminate. So, just like the second case, in this case also there are additional columns of U left indeterminate, and these additional u vectors are determined so as to make the entire matrix U orthogonal. That means the additional columns corresponding to zero singular values and the additional columns which have no matching singular values are both determined based on the orthogonality requirement of U. So, in one line, you can say we extend the columns of U determined from A v_k divided by sigma_k to an orthonormal basis, and that full set of m vectors gives you the square matrix U.
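For the indeterminate columns, one convenient (though by no means the only) way to complete the orthonormal set is sketched below, assuming NumPy; the helper complete_orthonormal and the example columns are hypothetical, introduced only for illustration.

```python
import numpy as np

def complete_orthonormal(U_r):
    """Extend orthonormal columns U_r (m x r) to a full m x m orthogonal matrix."""
    m, r = U_r.shape
    # QR of [U_r | I] keeps the span of the given columns first and then
    # supplies m - r further orthonormal directions orthogonal to that span.
    Q, _ = np.linalg.qr(np.column_stack([U_r, np.eye(m)]), mode='reduced')
    return np.column_stack([U_r, Q[:, r:]])

U_r = np.array([[0.6, 0.0],
                [0.8, 0.0],
                [0.0, 1.0]])              # two columns already fixed by A v_k / sigma_k
U = complete_orthonormal(U_r)
print(np.allclose(U.T @ U, np.eye(3)))    # True: U is now a full orthogonal matrix
```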
So, in this way, after the three factors of the singular value decomposition have been constructed, you have A equal to U Sigma V transpose with each of the factors in hand. After constructing the singular value decomposition like this, you would like to see what the properties of such a decomposition are.

The first question, after verifying existence, is uniqueness: is it unique? The answer is that it is actually not unique. For example, you can apply several changes to it and the changed U, Sigma, V will still constitute another singular value decomposition of the same matrix. Those changes are listed here, and then you can say that for a given matrix the SVD is unique up to these changes. That means it is actually not unique, but such changes will not disturb the requirements; they will not disturb the fact that the decomposition is still an SVD of the given matrix.

So what are these changes which are possible? First, the same permutation of columns of U, columns of V and diagonal elements of Sigma. That means if you interchange sigma_2 and sigma_5 and at the same time interchange columns u_2 and u_5 and interchange v_2 and v_5, then the resulting U, Sigma and V will still give an SVD. Second, corresponding to equal singular values you have got columns of U and V; among them you can work out an orthogonal reorganization. Suppose sigma_2 and sigma_3 are the same; then you say that this will be my new u_2 and this will be my new u_3, and correspondingly for V also, between v_2 and v_3, you make the same transformation. The resulting U and V matrices, with the same Sigma, will still give you a singular value decomposition which is valid. The particular transformation that we have worked out here is through cos theta, sin theta in one row and minus sin theta, cos theta in the other (note the minus), so that the matrix is an orthogonal matrix. Such orthogonal linear combinations of columns of U and the corresponding columns of V are fine; they will not disturb the singular value decomposition. Third, for zero or non-existent singular values, you can take any arbitrary orthonormal linear combinations among the corresponding columns of U or columns of V, and that will still be all right. So these reorganizations of an already existing SVD can be done and the result will still be an SVD.

Now, if this can be done, then we can do something better than what we have done till now. We have determined sigma_1, sigma_2, sigma_3; if the permutations can be applied appropriately, then we can order them, that is, we can organize the columns of U and V in such a manner that the singular value that comes first is the largest in magnitude, and so on. This is typically done when we work with the singular value decomposition. So the non-zero singular values come at the top in this order, after that the zero singular values come, and after that, of course, additional zero rows or columns may come depending on the rectangular size and shape of the given matrix. Now, here, what is r? r is the rank, and it is a very simple result, which you can immediately establish, that the rank of the given matrix is the same as the rank of Sigma, which is r here.

Other properties: you would have already noticed that the matrix A is of size m by n, which means it maps vectors from R^n to R^m, in which R^n is the domain and R^m is the co-domain.
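A small check of the permutation property, assuming NumPy; the random 4-by-3 matrix and the particular permutation are arbitrary choices for illustration.

```python
import numpy as np

A = np.random.default_rng(0).normal(size=(4, 3))
U, s, Vt = np.linalg.svd(A, full_matrices=False)   # "thin" SVD: U is 4x3, Vt is 3x3

# Apply the same permutation to the singular values, the columns of U and
# the columns of V (i.e. the rows of V transpose): the product is unchanged.
perm = [2, 0, 1]
U2, s2, Vt2 = U[:, perm], s[perm], Vt[perm, :]
print(np.allclose(U2 @ np.diag(s2) @ Vt2, A))      # True: still an SVD of A
```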
Now you can see that V, being an n by n orthogonal matrix, can give a basis which is an orthonormal basis. The columns of V are actually n-dimensional vectors and they are all mutually orthonormal, so the columns of V give us an orthonormal basis for the domain. Similarly, the columns of U give an orthonormal basis for the co-domain, and now we see how these new bases V and U decompose the domain and co-domain into orthogonal subspaces.

So consider the application of A on an arbitrary vector x, with A written as U Sigma V transpose. Now, if you represent the vector x of the domain in this new basis V, then the coordinates of x in this new basis are V inverse x, but since V is orthogonal this is the same as V transpose x. If we call that y, then we have A x equal to U Sigma y. Recognizing that Sigma is a diagonal matrix with sigma_1, sigma_2, up to sigma_p on the diagonal, among which the top r are non-zero, you have Sigma y as sigma_1 y_1, sigma_2 y_2, and so on up to sigma_r y_r, and below that everything else is 0; and U has been broken up and written with its first r columns here and the rest of them there. Now, when you consider this product, you will find that it is sigma_1 y_1 times u_1 plus sigma_2 y_2 times u_2 and so on up to sigma_r y_r times u_r, and after that everything else is 0.

Now see what is happening in this sum: it has non-zero components along only the first r columns of U; the components along u_(r+1), u_(r+2), u_(r+3), etc., are all 0. That means A x has non-zero components along only the first r columns of U. So U has given us an orthonormal basis for the co-domain in which the range, the vectors A x, is contained entirely within the first r columns of U. In other words, U gives an orthonormal basis for the co-domain such that the range is exactly described by the first r members of U, and the rest of them describe the orthogonal complement of the range. So the entire co-domain has been decomposed into two orthogonal subspaces: the first one is the range, which is described by the first r columns of U, corresponding to the non-zero singular values, and the rest of them are directions in the orthogonal complement of the range, which are not in the range.

Similarly, on the domain side, V transpose x is y. The rows of V transpose are v_1 transpose, v_2 transpose, v_3 transpose, and so on, where v_1, v_2, v_3 are columns of V. So the coordinates y_1, y_2, y_3 in y are actually v_1 transpose x, v_2 transpose x, etc.; that is, v_k transpose x is y_k, and the coordinate y_k found like this is the component of x along the unit vector v_k. So the full x is its component along v_1 times the unit vector v_1, plus its component along v_2 times the unit vector v_2, and so on. Now, in this you will find that only those basis vectors among the first r make a contribution to the mapping A x; the rest make no such contribution, because y_(r+1), y_(r+2), etc., are multiplied by zero singular values, as we have already seen: whatever y_(r+1), y_(r+2), etc., may be, the product Sigma y kills their contribution.
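A quick numerical illustration of this point, assuming NumPy; the rank-2 matrix below is constructed artificially so that the effect is visible, and is not an example from the lecture.

```python
import numpy as np

rng = np.random.default_rng(1)
# A rank-2 matrix mapping R^4 -> R^3 (built deliberately rank-deficient).
A = rng.normal(size=(3, 2)) @ rng.normal(size=(2, 4))

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))             # numerical rank
print(r)                               # 2

x = rng.normal(size=4)                 # an arbitrary vector in the domain
coords = U.T @ (A @ x)                 # coordinates of A x in the basis U
print(np.round(coords, 10))            # components beyond the first r are zero
```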
So that means V gives you an orthonormal basis for the domain such that the members v_(r+1) to v_n actually constitute a basis for the null space. You find that on the co-domain side the range is spanned by the columns of U corresponding to the non-zero singular values, and on the domain side the null space is spanned by the other columns of V, that is, the columns of V corresponding to the zero singular values or non-existent singular values. And that is it.

Now, with this understanding in the background, we proceed and find a few more interesting things. In particular, we work out revised definitions of the norm of a matrix and the condition number of a matrix. In the basis V, if we write a vector of the domain in this manner, then it can be written as V c, where V is the matrix with columns v_1, v_2, v_3, up to v_n and c is the vector with those scalar components. Then, from the definition of the norm which we have seen earlier, in chapter 7 of the textbook and in an earlier lecture, the norm of A squared is the maximum over v of the norm of A v squared divided by the norm of v squared. Now, if we insert this description of the general vector v, that is V c, then first of all from the norm definition we get this expression, and in place of the small v we insert V c: for v we have V c and for v transpose we have c transpose V transpose. We have already seen that the diagonalization of A transpose A was carried out with the basis matrix V and the corresponding diagonal matrix Sigma transpose Sigma, so in place of that whole middle expression we can write Sigma transpose Sigma. Now, Sigma transpose Sigma is a diagonal matrix with entries sigma_1 squared, sigma_2 squared, up to sigma_p squared, and then perhaps additional zeros. So this numerator breaks down to basically a weighted sum of the squares of the components of c, and now we ask for its maximum. When will it be maximum? If sigma_1, sigma_2, sigma_3, sigma_4 are not all of the same magnitude, then this will be maximum when c is a vector whose only component is along the direction which gets magnified by the largest amount. Only then do you get the maximum value, and so you get the norm squared in the case where the only non-zero c_k is the one for which sigma_k is maximum, that is, sigma_max. When you put sigma_max there, you get the result: the norm is the largest singular value of the matrix. So this is the new, revised definition of the norm of a matrix.

Now, for a non-singular square matrix we worked out condition numbers earlier. So here again we try to do that: for A inverse we get V Sigma inverse U transpose. Now you notice that, by the same definition, if we try to work out the norm of A inverse, then it will be the largest singular value of A inverse, and the smallest singular value of A, through its reciprocal, gives the largest singular value of A inverse. So you find that the norm of A inverse is 1 by sigma_min of the original matrix A. So the condition number is the norm of A into the norm of A inverse, that is, sigma_max into 1 by sigma_min. That brings us to the revised definitions of the norm and condition number of a matrix: the norm of a matrix is the largest singular value, and the condition number is the ratio of the largest singular value to the least.
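A minimal check of these two formulas, assuming NumPy; the 2-by-3 matrix is an arbitrary example.

```python
import numpy as np

A = np.array([[2., 0., 1.],
              [0., 3., 1.]])           # a rectangular matrix, for illustration

s = np.linalg.svd(A, compute_uv=False) # singular values in decreasing order
norm_A = s[0]                          # norm of A = sigma_max
cond_A = s[0] / s[-1]                  # condition number = sigma_max / sigma_min

print(np.isclose(norm_A, np.linalg.norm(A, 2)))   # matches the spectral norm
print(cond_A, np.linalg.cond(A))                  # matches NumPy's 2-norm condition number
```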
Now note that this revised definition of the condition number can equally cater to rectangular matrices also; the old definition based on the inverse would not be able to do that.

Now note one more important issue. If you can arrange the singular values in decreasing order, as we have been talking about, then with the rank of the matrix as r you can write the decomposition in this partitioned manner, in which U_r is the sub-matrix which has all the columns of U corresponding to non-zero singular values, V_r has the corresponding columns of V, and U bar and V bar constitute the rest of the columns. In that case the matrix A, which is U Sigma V transpose, can be multiplied out in this block form, in which three of the blocks that you get are zero because of the zero blocks of Sigma, and the only non-zero component is U_r Sigma_r V_r transpose. The other components are zero, and this gives you the summation of rank-one terms sigma_k u_k v_k transpose. That means that if you can store the columns of U and V which correspond to non-zero sigma, then those alone, along with the non-zero sigma_k values, are able to reconstruct the matrix A. And that means that for a large matrix with only a few top singular values that are non-zero and significant, you can effect very efficient storage and reconstruction.

So with this background we now go ahead and see what the application and the particular advantage of the singular value decomposition is for solving the linear system of equations A x equal to b, and we again revise the definition of the pseudo-inverse compared to what we did earlier in chapter 7. In the background there is this term called the generalized inverse. For any matrix A, a matrix G can be considered a generalized inverse, or g-inverse, if for a vector b in the range, G b is a solution of A x equal to b. That is, for a consistent right-side vector b, G b gives you the solution; in that way G operates something like an inverse. The pseudo-inverse is actually a special case of the generalized inverse; the pseudo-inverse, or Moore-Penrose inverse, is defined in this manner, and in order to differentiate it from the ordinary inverse we write it with this symbol, A hash. So A hash is U Sigma V transpose, the whole thing hash. Now, wherever an actual inverse is possible, we take the pseudo-inverse to be the same as the actual inverse. So the pseudo-inverse of this product is V transpose hash, Sigma hash, U hash. Now, V transpose and U are orthogonal, so for them actual inverses exist: for V transpose hash we write V transpose inverse, which is V, and similarly U hash is U inverse, which is U transpose. The actual problem is with Sigma hash; this is the one which requires a definition, and it is defined like this. For this structure of Sigma, in which there is a diagonal block of size r by r with the r non-zero singular values and everything else is 0, Sigma hash is defined so that the reciprocals of those diagonal entries which are non-zero appear, and for those diagonal entries which are 0, rather than infinity we put 0. This is very interesting: in place of 1 by 0, which is what should come by the ordinary rule, we actually write 0. So this is how we define the pseudo-inverse or Moore-Penrose inverse. In elaboration, you can write Sigma hash in this manner.
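A sketch of the storage idea mentioned above, assuming NumPy; the 100-by-80 rank-5 matrix and the tolerance are illustrative choices only.

```python
import numpy as np

rng = np.random.default_rng(2)
# A 100 x 80 matrix that is exactly rank 5, so only 5 singular values are non-zero.
A = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 80))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = int(np.sum(s > 1e-8 * s[0]))                   # numerical rank, here 5

# Store only U_r, sigma_1..sigma_r and V_r: (100 + 80 + 1) * 5 numbers
# instead of 100 * 80, and reconstruct A as the sum of r rank-one terms.
A_rebuilt = (U[:, :r] * s[:r]) @ Vt[:r, :]
print(r, np.allclose(A, A_rebuilt))                # 5, True
```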
So, in place of the diagonal entries, you write rho_1 to rho_p, where rho_k is the reciprocal of sigma_k when sigma_k is non-zero; and sometimes, in practical cases, even if sigma_k is merely very small, we consider it as good as 0. So for those cases where sigma_k is 0 or extremely small, we put rho_k as 0; rather than putting 1 by an extremely small number, or 1 by 0, we actually put 0 there. So this is the definition of the pseudo-inverse. Now, sometime at leisure, you should compare this expression and this description of the pseudo-inverse with the special full-rank cases which we worked out in chapter 7 as the right inverse and the left inverse. In those cases where the matrix has full rank, those definitions will appear as special cases of this.

Now, what are the inverse-like properties or qualities of this pseudo-inverse? First, the pseudo-inverse of the pseudo-inverse of the matrix is the original matrix, considering only the actual zero cases being put to 0 here and not the truncations. The second important point, which is inverse-like, is that if A is actually invertible, if it is a square non-singular matrix, then this boils down to the ordinary inverse and A hash b gives the correct unique solution of A x equal to b. On the other hand, if the situation is not so good and A x equal to b is an under-determined but consistent system, that is, the full-rank case with more unknowns and fewer equations, then A hash b selects that solution x star which has the minimum norm out of the infinitely many possible solutions. On the other hand, if the system is inconsistent, then this same A hash b, defined with the same formula, minimizes the least-square error. That is, if the system is inconsistent, there is bound to be some error: in A x equal to b, A x will never be exactly equal to b, and then this same A hash b finds you an x star which gives the minimum error. Now, if that minimum-error solution is also not unique, if there are infinitely many of them, then at the same time it gives you, out of those infinitely many solutions giving the minimum error, the one which has the least size. So all these sensible things the pseudo-inverse does with the help of a single definition.

Now, you should contrast this with the solution which we obtained earlier from Tikhonov regularization. The pseudo-inverse solution is typically used when you want precise values and also for diagnosing a linear system, whether it has any inconsistency or under-determinacy and so on. On the other hand, the Tikhonov solution can be used when the coefficient matrix A changes over a domain and you want continuity of the solution. So the Tikhonov solution is preferable for continuity, but for diagnosis and for precise solutions the pseudo-inverse solution is better; the Tikhonov solution will always involve some error. Now, in the exercises of this chapter in the textbook, there is actually an exercise which asks you to determine the Tikhonov solution and the pseudo-inverse solution and compare them for a matrix A which has one of its components variable.

Now we want to know how this whole thing is accomplished by a single formula. So for that we first note down what the pseudo-inverse solution is: this is the pseudo-inverse of A, and when we multiply it with b we get this sum, where the summation is over k from 1 to r, that is, over all the non-zero singular values.
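The following sketch, assuming NumPy, illustrates these behaviours of A hash b on two tiny systems chosen purely for illustration; numpy.linalg.pinv plays the role of the pseudo-inverse here.

```python
import numpy as np

# Under-determined but consistent: 2 equations, 3 unknowns.
A = np.array([[1., 1., 0.],
              [0., 1., 1.]])
b = np.array([1., 2.])
x_star = np.linalg.pinv(A) @ b        # the minimum-norm solution among infinitely many
print(np.allclose(A @ x_star, b))     # True: it does solve the system

# Inconsistent (over-determined): 3 equations, 2 unknowns, b not in the range.
A2 = np.array([[1., 0.],
               [1., 0.],
               [0., 1.]])
b2 = np.array([1., 2., 3.])
x_ls = np.linalg.pinv(A2) @ b2        # minimizes ||A2 x - b2||; here x_ls = [1.5, 3.0]
print(x_ls)
```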
So for that we get this expression, and when we reduce it we have u_k transpose b, which is a scalar, divided by sigma_k, because rho_k is 1 by sigma_k. If we write it like this, then we will find that the pseudo-inverse solution that we are getting is actually a linear combination of the r basis members v_1 to v_r, the corresponding components being the scalar values written in the parentheses.

Now we want to pose the problem as, first, minimization of the error, and then, if the solutions are infinitely many, further minimization of the size of the solution, and then see whether we get this same solution. So, if we want to minimize the square error, half the norm squared of the error A x minus b, then, as we open this up (we have already encountered this earlier once), the first-order minimality condition is that its gradient with respect to x must be 0. When we do that, we get this, as we got last time; now, in place of A we write U Sigma V transpose, and through a few steps we come to this point. Note that this is a matrix-vector equation, and this is the corresponding scalar equation for each component of that vector equation, for each k from 1 to r, where r is the rank, that is, for the non-zero singular values. So from here you find that v_k transpose x, that is, the component of x along the unit vector v_k, turns out to be u_k transpose b divided by sigma_k; one factor of sigma_k from sigma_k squared goes down into the denominator, and this is what is actually sitting in the pseudo-inverse solution. So in this solution, x star is composed of the vectors v_1, v_2, v_3 up to v_r, in which the component along v_k is this. That means x star is actually giving you this combination of these vectors with these components.

Now, this first-order condition for minimality tells you what the components of the solution along the basis vectors v_1, v_2, v_3 up to v_r should be; what the components along v_(r+1), v_(r+2), v_(r+3) should be is not mentioned. That means those components can be anything and the error is still minimum, because the condition is satisfied. So the general solution for minimum error you can constitute with the components along v_1 to v_r as specified here and any components along the rest of the directions: the components along the first r basis members as prescribed and anything in the rest, which means y is free here, that is, a contribution v_(r+1) times y_1 plus v_(r+2) times y_2 and so on. These y_1, y_2, y_3 can be anything; V bar is the basis for the null space. That you will appreciate, because any null-space member will not change anything in the right-hand side.

So now we say: out of all these infinitely many possible solutions, which one is of least size? So what do we ask for? We ask how to minimize the size of the vector subject to this error being minimum anyway. That means you take the solution from here and minimize it with respect to y, that is, which y to select so as to minimize the size of this vector. So we say minimize the size, that is, the norm of x squared. Now you find that the x star part is a linear combination of v_1 to v_r and the other part is a linear combination of the other basis members, and all the other basis members are orthogonal to the basis members of the first family. That makes the x star part sit in one subspace and the part V bar y sit in another subspace, the two subspaces being orthogonal to each other. So how do you find the norm squared of such a sum?
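A small numerical confirmation of the formula x star equal to the sum of (u_k transpose b divided by sigma_k) times v_k, assuming NumPy; the random 5-by-3 system is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(5, 3))
b = rng.normal(size=5)

U, s, Vt = np.linalg.svd(A, full_matrices=True)
r = int(np.sum(s > 1e-10))

# x* = sum over the non-zero singular values of (u_k^T b / sigma_k) v_k
x_star = sum((U[:, k] @ b / s[k]) * Vt[k, :] for k in range(r))

print(np.allclose(x_star, np.linalg.pinv(A) @ b))   # True: same as A hash b
```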
If the two members are in mutually orthogonal subspaces, then, since they are mutually orthogonal, this will be simply the norm of x star squared plus the norm of V bar y squared. Now, if we then ask which y will make this minimum, where the first part is already fixed and cannot be tampered with and only y can be changed, then y equal to 0 will make the second term 0 and the sum minimum. So that means y equal to 0 gives you the minimum-size vector x of this form which minimizes the error.

Now, how does this whole thing happen, that you get all the optimality conditions in the solution that you construct with the help of the pseudo-inverse? For that, let us investigate the anatomy of this optimization through the SVD. If we use the basis V for the domain and the basis U for the co-domain, then the variables x and b in question (x unknown, b the known right-hand side) are transformed as follows: in the new basis V, the expression of x is y, and in the new basis U for the co-domain, the vector b is represented as c, which is U transpose b. Now, if we write the system of linear equations A x equal to b with A as U Sigma V transpose, then V transpose x is y, and U brought to the other side as U transpose multiplied with b gives U transpose b, which is c. So you basically get the equation in the new bases, V on one side and U on the other, as Sigma y equal to c. This is a completely decoupled system, because if we write out this system of equations, Sigma y equal to c, we will find sigma_1, sigma_2, up to sigma_r on the diagonal, y_1, y_2, up to y_r and below that possibly more variables up to y_n, and on the other side c_1, c_2, up to c_r and below perhaps more entries.

Now, the way the solution has been constructed, you get the useful information only from the first r rows, the first r equations, and they are completely decoupled, because y_1 is simply c_1 by sigma_1, y_2 is simply c_2 by sigma_2, and so on up to y_r, which is c_r by sigma_r. What happens below? Below, you find all zeros on the left, which means the left side of each equation gives 0; the question is what is there in c. If there are corresponding zeros in c, then that will mean that the system is consistent, but that information, 0 equal to 0, is completely unusable; it does not have any information content. On the other hand, if some particular values there are non-zero, that will mean we are talking about 0 equal to something non-zero; that is the conflict, that is the source of inconsistency in the system of equations. So in this situation we find that, for k from 1 to r, what we determine is the only usable component. For k larger than r, that is, below, if c_k is non-zero you will find that you have a purely unresolvable conflict: that is simply the inconsistency decomposed into an orthogonal subspace, which cannot be compensated for by any other component. And c_k equal to 0 will give you completely redundant information; again, the completely redundant information is also collected over an orthogonal subspace which cannot be changed by any other component from outside. So, by setting the appropriate diagonal entries of Sigma hash as 0, the SVD extracts this pure redundancy and inconsistency and rejects them: it rejects the redundancy, it rejects the inconsistency, and gives you that solution which is the best possible achievable.
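A sketch of this diagnosis, assuming NumPy; the rank-deficient 3-by-2 system below is constructed so that one equation is redundant in the coefficients but conflicting in the right-hand side, purely for illustration.

```python
import numpy as np

# The third row of A is the sum of the first two, but the right-hand side
# violates that relation, so the system is inconsistent.
A = np.array([[1., 2.],
              [3., 4.],
              [4., 6.]])
b = np.array([1., 1., 5.])            # a consistent b would need b[2] = b[0] + b[1] = 2

U, s, Vt = np.linalg.svd(A)
c = U.T @ b                           # right-hand side expressed in the basis U
r = int(np.sum(s > 1e-10))

print(s)                              # both singular values are non-zero, so r = 2
print(np.round(c, 6))                 # c[2] corresponds to 0 * y = c[2] != 0:
                                      # the unresolvable conflict, i.e. the inconsistency

y = np.zeros(Vt.shape[0])
y[:r] = c[:r] / s[:r]                 # y_k = c_k / sigma_k for the usable equations
x_star = Vt.T @ y                     # back to the original coordinates
print(np.allclose(x_star, np.linalg.pinv(A) @ b))   # True
```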
At the same time, since the remaining variables were still free, while the usable components gave you values as above, setting all these free variables to 0 minimizes the norm of y; and since the norm of x is the norm of V y, with V orthogonal, multiplication by an orthogonal matrix does not change the norm of the vector. So minimum norm of y will mean minimum norm of x.

Now, the important points to note here are the following. First, the SVD provides you a complete orthogonal decomposition of the domain and co-domain, and it separates functionally distinct subspaces: on the domain side, the null space from the rest; on the co-domain side, the range from the rest. It offers a complete diagnosis of the pathologies of a system of linear equations. And then, the pseudo-inverse solution A hash b gives you the most important solution of a linear system in all cases. Apart from these, what has not been noticed very clearly till now is that, with the existence of the SVD guaranteed, any matrix, real or complex, can be written as U Sigma V star or U Sigma V transpose, and many important mathematical results and many other formulations can be worked out in a very straightforward and direct manner. In many cases in the coming lectures, based on this existence of the SVD alone, you will find that you are able to appreciate the deductions of many results quite easily.

So, in this lecture we have actually connected two important problems, systems of linear equations and eigenvalue problems, together through the singular value decomposition. In the next lecture, which will be the last lecture of our linear algebra module, we consolidate a few important issues based on the abstract fundamental ideas of linear transformations. Thank you.