Last time we were discussing the singular value decomposition. Recall that the singular values of a matrix are sigma_1 through sigma_n, ordered so that sigma_1 is the largest and sigma_n the smallest; they are all real and non-negative, and sigma_1 squared through sigma_n squared are the eigenvalues of A Hermitian A. We also saw the singular value decomposition theorem, which says that if A is an m by n matrix and sigma_1 through sigma_r are the non-zero singular values of A, where r is the rank of A, and if we define D to be the r by r diagonal matrix containing sigma_1 through sigma_r on its diagonal and Sigma to be the m by n matrix with D as its top-left r by r block and zeros everywhere else, then there exists a unitary U of size m by m and a unitary V of size n by n such that U Hermitian A V equals Sigma. This is a very important theorem, and I like it for its generality: it applies to any matrix A, with absolutely no structural assumption, so you can construct a singular value decomposition for any matrix.

We also saw that we can partition the singular value decomposition as A = [U1 U2] [Sigma tilde, 0; 0, 0] [V1 transpose; V2 transpose], where U1 contains the first r columns of U, U2 contains the remaining m minus r columns of U, Sigma tilde is the r by r diagonal matrix containing the non-zero singular values (the same as the D in the statement of the theorem), V1 transpose contains the first r rows of V transpose, and V2 transpose contains the remaining n minus r rows.

We then looked at several properties of the singular value decomposition. The first was that the rank of A equals the number of non-zero singular values. This is true because, multiplying the partitioned factors together, we can write A = U1 Sigma tilde V1 transpose, which we called the economy singular value decomposition. The columns of A are then linear combinations of the columns of U1, with coefficients given by the columns of Sigma tilde V1 transpose; Sigma tilde has linearly independent columns and V1 transpose has linearly independent rows, so there are exactly r linearly independent columns in A. In fact the converse is also true: if the rank of A equals r, then A has r non-zero singular values. This comes from the SVD theorem itself, if you look back at the proof of the theorem.

Other properties: the null space of A is the span of the columns of V2, and V2 is an orthonormal basis for the null space of A. The range space of A is the span of the columns of U1, and U1 is an orthonormal basis for the range space of A. The range space of A transpose is the span of the columns of V1, and the orthogonal complement of the range space of A, which is the same as the null space of A transpose, is the span of the columns of U2. So basically the SVD reveals the four fundamental subspaces associated with any matrix, and it gives you an orthonormal basis for each of them.
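As a quick numerical illustration, here is a minimal NumPy sketch (the matrix A and the tolerance are just assumptions for the example) that computes a full SVD, reads off the rank as the number of non-zero singular values, and extracts orthonormal bases for the four fundamental subspaces from the partitioned factors.

```python
import numpy as np

# Example matrix; any m x n matrix works (this one is an arbitrary choice).
A = np.array([[1., 2., 0., 1.],
              [2., 4., 0., 2.],
              [0., 0., 3., 1.]])
m, n = A.shape

# Full SVD: A = U @ Sigma @ Vh, with U (m x m), Vh (n x n), s the singular values.
U, s, Vh = np.linalg.svd(A, full_matrices=True)

tol = 1e-10                        # threshold for "non-zero" singular value (assumption)
r = int(np.sum(s > tol))           # rank = number of non-zero singular values

U1, U2 = U[:, :r], U[:, r:]        # orthonormal bases for range(A) and null(A^T)
V1, V2 = Vh[:r, :].T, Vh[r:, :].T  # orthonormal bases for range(A^T) and null(A)

# Economy SVD reconstructs A exactly: A = U1 @ diag(sigma_1..sigma_r) @ V1^T
A_econ = U1 @ np.diag(s[:r]) @ V1.T
print("rank:", r, " reconstruction error:", np.linalg.norm(A - A_econ))
print("||A V2|| (should be ~0, V2 spans the null space):", np.linalg.norm(A @ V2))
```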
Also, the spectral norm of A is equal to sigma_1, the largest singular value. And if A is square and full rank, we can easily determine A inverse once we know the singular value decomposition: A inverse is simply V Sigma inverse U Hermitian, and since Sigma is diagonal, inverting it is trivial; you just invert each of the diagonal entries.

Another interesting property of the SVD is that it diagonalizes any system of linear equations. For example, if you are given A x = b, a system of linear equations, then substituting A = U Sigma V transpose we get U Sigma V transpose x = b. Pre-multiplying by U transpose, define c = U transpose b, and partition c so that c1 contains the first r entries of c and c2 the remaining m minus r entries. This is the same as representing the vector b in the basis defined by U: because U is an orthonormal matrix, multiplying by U transpose corresponds to a change of basis, so c is just b represented in the new basis. Similarly, define d = V transpose x and partition it as d1, d2, where d1 contains the first r entries and d2 the remaining n minus r entries; this is the representation of x in the basis V. With these substitutions the system of linear equations reduces to Sigma d = c, that is, [Sigma tilde, 0; 0, 0] [d1; d2] = [c1; c2], where Sigma tilde is r by r, d1 and c1 have r entries, d2 has n minus r entries, and c2 has m minus r entries. This is now a very simple system of linear equations, and in fact we can solve it essentially by inspection.

In particular, if m is greater than n, that is, the number of rows is more than the number of columns, the matrix A is tall, and looking back at the system of linear equations this means you have more equations than variables to solve for. If in addition A is full rank, that is, the rank equals the number of columns n, then r = n, there is no d2, and there is no right block in Sigma: the system is just [Sigma tilde; 0] d = [c1; c2]. Because d only multiplies zeros in the bottom rows, those equations can be satisfied only if c2 = 0. Now c2 = 0 means exactly that U2 transpose b = 0, because c2 is exactly U2 transpose b, and U2 transpose b = 0 means b must belong to the range space of U1, in other words the range space of A; we already saw that the range space of A is the span of U1. That is clear: if you have more equations than unknowns, a solution exists only if b lies in the range space of A.
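Here is a small sketch of that diagonalized view for a tall, full-rank system (the specific A and b are assumptions for illustration): it checks consistency via U2 transpose b and, when consistent, recovers x from d1 = Sigma tilde inverse c1.

```python
import numpy as np

A = np.array([[1., 0.],
              [0., 2.],
              [1., 1.]])                 # tall (m > n), full column rank (assumed example)
x_true = np.array([3., -1.])
b = A @ x_true                           # consistent right-hand side by construction

U, s, Vh = np.linalg.svd(A, full_matrices=True)
r = A.shape[1]                           # full column rank, so r = n
U1, U2 = U[:, :r], U[:, r:]

c1, c2 = U1.T @ b, U2.T @ b              # b expressed in the basis U
print("||c2|| (must be ~0 for a solution to exist):", np.linalg.norm(c2))

d1 = c1 / s                              # Sigma_tilde^{-1} c1, componentwise
x = Vh.T[:, :r] @ d1                     # back to the original basis: x = V1 d1
print("recovered x:", x)
```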
Similarly, if m is less than n, the matrix A is fat: you have more variables than equations. If A is full rank, that is, r = m, then there are no bottom blocks, and the system is [Sigma tilde, 0] [d1; d2] = c; there is no c2. If we define d1 = Sigma tilde inverse c, which is allowed because we have assumed A is full rank so Sigma tilde has all non-zero diagonal entries, then d1 is a solution for the first block, and d2 can be anything, because it only multiplies the zero block. So the solution is not unique, and in general we can write the complete set of solutions as x = V1 Sigma tilde inverse c + V2 d2, where d2 is arbitrary. In this case there is always a solution; in fact there are always infinitely many solutions. If A is not full rank, then both considerations apply: you have to use both conditions, and there may or may not exist solutions.

Another property is that right-multiplying by V rotates A into a matrix with orthogonal columns. Since A = U Sigma V transpose, right-multiplying by V gives A V = U Sigma. The columns of U are orthonormal and Sigma is diagonal, so A V = U Sigma has orthogonal columns, with each column norm equal to one of the singular values of A. Similarly, taking U transpose A: since A = U Sigma V transpose and U transpose U is the identity, what is left is Sigma V transpose, which has orthogonal rows, with the row norms equal to the singular values of A.

Last time somebody asked about the connection between the eigenvalue decomposition and the singular value decomposition, and the explicit connection is exactly this. Consider the matrix A transpose A. Again A = U Sigma V transpose, so A transpose = V Sigma transpose U transpose, and therefore A transpose A = V [Sigma tilde, 0; 0, 0] (since Sigma tilde is diagonal its transpose is itself, but this block matrix is of size n by m rather than m by n) times [Sigma tilde, 0; 0, 0] (of size m by n) times V transpose. Carrying out the multiplication, the product of the two middle blocks is Sigma tilde squared in the top-left r by r corner and zeros everywhere else, of overall size n by n, so A transpose A = V [Sigma tilde squared, 0; 0, 0] V transpose. This is precisely the eigenvalue decomposition of A transpose A, so the eigenvectors of A transpose A, the columns of V, are the right singular vectors of A. This also shows that if A is fat and full rank, then A transpose A has m non-zero eigenvalues and the remaining n minus m eigenvalues are zero. Similarly, writing out A A transpose gives U [Sigma tilde squared, 0; 0, 0] U transpose, now of size m by m with the zero blocks of the appropriate dimensions, so the eigenvectors of A A transpose are the columns of U, which are the left singular vectors of A. And of course the squares of the singular values of A are the eigenvalues of A transpose A and of A A transpose. So basically the price paid for being able to diagonalize an arbitrary matrix is that we need two matrices to diagonalize it instead of one.
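A quick numerical check of this connection (the random matrix is just an assumed example): the eigenvalues of A transpose A should be the squared singular values of A padded with zeros, and A A transpose should have the same non-zero eigenvalues.

```python
import numpy as np

A = np.random.default_rng(0).standard_normal((3, 5))   # fat example matrix (assumption)

U, s, Vh = np.linalg.svd(A, full_matrices=True)
evals, evecs = np.linalg.eigh(A.T @ A)                  # A^T A is symmetric PSD
evals = evals[::-1]                                     # eigh returns ascending order

# m = 3 non-zero eigenvalues equal to sigma_i^2; the remaining n - m = 2 are zero.
print("sigma_i^2:        ", np.round(s**2, 6))
print("eig(A^T A) (desc):", np.round(evals, 6))

# A A^T has the same non-zero eigenvalues, with eigenvectors given by U.
evals_m, _ = np.linalg.eigh(A @ A.T)
print("eig(A A^T) (desc):", np.round(evals_m[::-1], 6))
```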
If A is square and symmetric, we can find an eigenvalue decomposition A = Q Lambda Q transpose, where Lambda is a diagonal matrix containing the eigenvalues and Q is an orthonormal matrix containing the eigenvectors. So A Q = Q Lambda, or A q_i = lambda_i q_i for i = 1 to n; this is the eigenvalue decomposition, and you need only one matrix Q to diagonalize A. But for an arbitrary matrix, which is not necessarily square, we can find a singular value decomposition A = U Sigma V transpose, which means A V = U Sigma, or A v_i = sigma_i u_i. The v_i and u_i are unit-norm vectors, so these look a bit like eigenvector relations, but not really, because u_i is not equal to v_i; in fact they have different dimensions, the v_i are n by 1 while the u_i are m by 1. Similarly, taking the transpose, A transpose = V Sigma transpose U transpose, and right-multiplying by U gives A transpose U = V Sigma transpose, so A transpose u_i = sigma_i v_i. So for the singular value decomposition the relations are A v_i = sigma_i u_i and A transpose u_i = sigma_i v_i. What you should do on your own is compare the eigenvalue decomposition and the singular value decomposition when the matrix is square, symmetric, and positive definite, and also in the indefinite case, where it has some positive and some negative eigenvalues.

There is also a geometric, or ellipsoidal, view of the singular value decomposition. Again take a matrix A of size m by n and look at the set E of vectors y = A x for some x of unit l2 norm. In other words, take any vector x of unit l2 norm, compute A x, plot that point in m-dimensional space, and look at all the points y you can reach by this operation. It turns out that this set of y's describes an ellipse, or in m dimensions a hyperellipsoid, and the singular values of A are the lengths of the semi-axes of this hyperellipsoid. For example, for a two-dimensional ellipse, the length of the semi-major axis is sigma_1 and the length of the semi-minor axis is sigma_2.

Let us see why. Write y = A x = U Sigma V transpose x, and once again define c = U transpose y and d = V transpose x; then the equation reduces to c = Sigma d. Further, if x has unit l2 norm, then because V is an orthonormal matrix, instead of going over all points with the norm of x equal to 1 we can just as well go over all points with the norm of d equal to 1. So we look at the set of c vectors obtained as d ranges over all vectors with norm of d equal to 1. Because c = Sigma d and Sigma is diagonal, the condition norm of d equal to 1 means that the sum over i from 1 to p of (c_i / sigma_i) squared, where p is the number of non-zero singular values of A, is the same as the sum over i from 1 to p of d_i squared, and that equals 1.
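A small numerical sketch of this ellipsoidal picture in two dimensions (the 2 by 2 matrix is an assumed example): the maximum and minimum of the norm of A x over unit vectors x come out as sigma_1 and sigma_2, the semi-axis lengths, attained along the directions u_1 and u_2.

```python
import numpy as np

A = np.array([[3., 1.],
              [0., 2.]])                       # arbitrary 2x2 example

U, s, Vh = np.linalg.svd(A)

theta = np.linspace(0.0, 2.0 * np.pi, 2000)
X = np.vstack([np.cos(theta), np.sin(theta)])  # unit circle: all x with ||x||_2 = 1
Y = A @ X                                      # image of the unit circle under A

norms = np.linalg.norm(Y, axis=0)
print("max ||Ax||:", norms.max(), " vs sigma_1:", s[0])
print("min ||Ax||:", norms.min(), " vs sigma_2:", s[1])
# The semi-axes of the ellipse are sigma_1 u_1 and sigma_2 u_2 (columns of U, scaled).
```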
Now, the set of points c is exactly described by the equation sum over i from 1 to p of (c_i / sigma_i) squared equals 1, and that equation describes an ellipse, or a hyperellipsoid. Going from c back to y only requires multiplying by U, which is an orthonormal matrix, so it is just an orthonormal change of basis. Therefore E is also an ellipse, or hyperellipsoid, when expressed in the basis U. When you write the equation of an ellipse in this elementary form, sigma_i is the length of the i-th semi-axis, so the principal axes of the hyperellipsoid are aligned along the columns u_i and have lengths sigma_i, the singular values of A.

There is one more very interesting and useful theorem associated with the singular value decomposition. Suppose A = U Sigma V transpose. If we expand this product out, we can write A as the sum over i from 1 to r of sigma_i u_i v_i transpose, so the matrix A can be written as the sum of r matrices of rank one: each u_i v_i transpose is a matrix of size m by n with rank one. The question we ask is: given this matrix A of rank r, what is the matrix B of rank k, where k is less than r, that is closest to A in the two-norm sense? The answer is given by the following theorem. Define A_k to be the sum over i from 1 to k of sigma_i u_i v_i transpose; that is, I stop the summation at k instead of going all the way up to r. Then the minimum over all matrices B of rank at most k of the l2 norm of A minus B equals the norm of A minus this particular matrix A_k, which in turn equals sigma_{k+1}. In other words, A_k is the matrix of rank k that is as close to A as possible in the two-norm sense, and it is exactly sigma_{k+1} away from A. Recall that the standing assumption is that sigma_1 >= sigma_2 >= ... >= sigma_r > 0; the singular values are ordered. So in words, the theorem says that the closest rank-k matrix to A in the two-norm sense is A_k, the matrix formed by excluding the contribution of the smallest singular values from the rank-one expansion of A given here. It also means that any m by n matrix of rank k less than r is at least sigma_{k+1} away from A in the two-norm sense: you cannot find a matrix of rank k strictly less than r that is arbitrarily close to A; it will be at least sigma_{k+1} away.

Now, the proof of this theorem is actually very interesting, but before I walk you through it, note what it is saying: the solution to this optimization problem is A_k. That by itself is not obvious; you need to show that when you solve this optimization problem you get A_k as a solution. The way you often prove theorems like this is to first show a lower bound on the quantity: over all matrices B of rank k, what is the smallest value that the norm of A minus B can achieve? Then you show that A_k as defined here actually attains that lower bound, and that establishes the theorem.
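Before the proof, here is a quick numerical check of the statement (the random matrix and the choice k = 2 are assumptions for the example): truncating the rank-one expansion after k terms gives A_k, and the spectral norm of A minus A_k comes out as sigma_{k+1}.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))

U, s, Vh = np.linalg.svd(A, full_matrices=False)
k = 2                                          # target rank (assumption)

# A_k = sum_{i=1}^{k} sigma_i u_i v_i^T, the truncated expansion.
A_k = U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]

print("rank of A_k:", np.linalg.matrix_rank(A_k))
print("||A - A_k||_2:", np.linalg.norm(A - A_k, 2))   # spectral norm
print("sigma_{k+1}:  ", s[k])
```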
This kind of trick can be used when you somehow have a guess of what the answer to the optimization problem is; showing that your guess attains a lower bound is almost always easier than solving the optimization problem directly.

So the proof goes like this. First of all, the rank of A_k as defined here equals k. That is simply because if I compute U transpose A_k V, I get the diagonal matrix containing sigma_1 through sigma_k along the diagonal and zeros elsewhere, and that clearly has rank k. Next, consider the l2 norm of A minus A_k. This norm is invariant to left or right multiplication by a unitary matrix, so I can consider U transpose (A minus A_k) V. Substituting A = U Sigma V transpose and A_k = U Sigma' V transpose, where Sigma' keeps only the first k singular values, you can see that the first k singular values cancel, and what you are left with is a diagonal matrix containing sigma_{k+1} through sigma_r. The largest singular value of that diagonal matrix is sigma_{k+1}, so this norm is exactly sigma_{k+1}. That shows the first part of the theorem: the norm of A minus A_k is sigma_{k+1}.

Now what we need to show is that no other rank-k matrix can make the l2 norm of the difference smaller than sigma_{k+1}; in other words, A_k itself is the matrix closest to A among all rank-k matrices in the l2 norm sense. So suppose B is a matrix with rank equal to k, which by the rank-nullity theorem means the dimension of the null space of B is n minus k. Define W to be the span of v_1 through v_{k+1}, the first k + 1 columns of V, where A = U Sigma V transpose and V = [v_1 v_2 ... v_n]. The dimension of W is k + 1, the dimension of the null space of B is n minus k, and the sum of the two dimensions is more than n. Therefore, by a dimensionality argument, the intersection of the null space of B with W cannot contain only the zero vector; there must be some non-zero vector in it. So pick one such vector x in the intersection of these two subspaces, with unit norm, so that the l2 norm of x is 1. This x belongs to the null space of B and it belongs to W. Since x belongs to W, we can write it as a linear combination of those first k + 1 columns, x = sum over i from 1 to k + 1 of alpha_i v_i, and because the v_i are orthonormal we can directly find the coefficients: alpha_i = v_i Hermitian x for i = 1 to k + 1. Since the l2 norm of x is 1, squaring and adding, all the cross terms drop out because the v_i are orthogonal to each other, and we are left with the sum of |alpha_i| squared, which is the sum over i from 1 to k + 1 of |v_i Hermitian x| squared, equal to 1. On the other hand, since x also belongs to the null space of B, we have B x = 0, which means (A minus B) x = A x. Now write out what A x is by substituting x = sum over i from 1 to k + 1 of alpha_i v_i.
Each alpha_i = v_i Hermitian x is a scalar, so it can be brought out front: A x = sum over i from 1 to k + 1 of (v_i Hermitian x) A v_i. We already saw that A v_i = sigma_i u_i, so A x = sum over i from 1 to k + 1 of (v_i Hermitian x) sigma_i u_i. Now look at the norm squared of (A minus B) x. The u_i are orthonormal, so when I square, all the cross terms drop out and each direct term has u_i Hermitian u_i = 1; therefore the norm squared of (A minus B) x equals the sum over i from 1 to k + 1 of |v_i Hermitian x| squared times sigma_i squared. Further, by the defining property of the spectral norm, the l2 norm of (A minus B) x is less than or equal to the l2 norm of A minus B times the l2 norm of x, and the l2 norm of x is 1, since we chose x to be a unit vector; so it is less than or equal to the l2 norm of A minus B. Putting these together, the square of the l2 norm of A minus B is greater than or equal to the sum over i from 1 to k + 1 of |v_i Hermitian x| squared times sigma_i squared. Now the sigma_i are in decreasing order and the |v_i Hermitian x| squared add up to 1, so if I replace every sigma_i squared by sigma_{k+1} squared, the smallest value here, then sigma_{k+1} squared comes out of the sum and the remaining sum over i from 1 to k + 1 of |v_i Hermitian x| squared equals 1, as we wrote above. So the square of the norm of A minus B is greater than or equal to sigma_{k+1} squared. This means that if you take any other B of rank k, the norm of A minus B is at least sigma_{k+1}, and we found the matrix A_k which is exactly at distance sigma_{k+1} from A, and so that proves the result.
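As an informal sanity check of this lower bound (purely illustrative; the rank-k candidates B are generated as arbitrary products of thin random factors), no rank-k matrix we try gets closer to A than sigma_{k+1}, while A_k attains exactly that distance.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))
U, s, Vh = np.linalg.svd(A, full_matrices=False)
k = 2                                                      # target rank (assumption)

A_k = U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]
print("||A - A_k||_2 =", np.linalg.norm(A - A_k, 2), " sigma_{k+1} =", s[k])

# Try random matrices of rank (at most) k: none should beat sigma_{k+1}.
best = np.inf
for _ in range(1000):
    B = rng.standard_normal((6, k)) @ rng.standard_normal((k, 4))  # rank <= k
    best = min(best, np.linalg.norm(A - B, 2))
print("best random rank-k distance:", best, " which stays >=", s[k])
```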
Any questions so far? A student asked whether we had actually proved the first part, that the matrix B which minimizes this problem is A_k. The point is that the two facts together establish it. The first fact is that the norm of A minus A_k equals sigma_{k+1}, so A_k is a matrix of rank k that sits exactly at distance sigma_{k+1} from A. The second fact is that for any matrix B of rank k, we showed the distance from A is at least sigma_{k+1}. So I can challenge you to find me a matrix of rank k that is less than sigma_{k+1} away from A; you will not be able to, because we proved a lower bound saying that any rank-k matrix is at least sigma_{k+1} away, and we exhibited the matrix A_k that is at distance exactly sigma_{k+1}. Therefore the matrix A_k that we found is a solution to the optimization problem of finding a matrix B of rank k as close as possible to A in the l2 norm sense. Of course there may be other solutions, but this is certainly one solution.

If there are no other questions, I will move on to the next topic, which is generalized inverses of matrices. Another student asked: for fixed A, the norm of A minus B is a convex function of B, so can we say this is the only solution? You need two things to prove uniqueness based on convexity. You need the cost function to be convex, and indeed the l2 norm of A minus B squared is convex in B. However, you also need the constraint set to be a convex set, and you can clearly see that if I look at the space of matrices B of rank k, a convex combination of two such matrices need not have rank k, so this is a complicated, non-convex set (see the small illustration below). Because of that, you cannot use convexity-based arguments to say that this is a unique solution. Okay, understood, thank you. You're welcome.
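To make that last point concrete, here is a tiny illustration (the two rank-one matrices are assumed examples): their convex combination has rank 2, so the set of matrices of rank at most 1 is not convex.

```python
import numpy as np

B1 = np.outer([1., 0.], [1., 0.])     # rank-1 matrix [[1,0],[0,0]]
B2 = np.outer([0., 1.], [0., 1.])     # rank-1 matrix [[0,0],[0,1]]
M = 0.5 * (B1 + B2)                   # convex combination of the two

print(np.linalg.matrix_rank(B1), np.linalg.matrix_rank(B2), np.linalg.matrix_rank(M))
# Prints: 1 1 2 -- the convex combination of two rank-1 matrices has rank 2,
# so the constraint set of rank-k matrices is not convex.
```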