So we saw that we could factor A as U sigma V transpose by finding the eigenvalues and eigenvectors of the smaller of A transpose A or A A transpose. This worked because the extra eigenvectors were multiplied by the zeros of sigma, provided that sigma had more columns than rows. But what if it doesn't?

Suppose A is an m by n matrix with m greater than n, so it has more rows than columns. Then in A equals U sigma V transpose, if we want U and V to stay square, the dimensions force A, an m by n matrix, to be the product of an m by m matrix U, an m by n matrix sigma, and an n by n matrix V transpose. That means U holds the eigenvectors of an m by m matrix, V holds the eigenvectors of an n by n matrix, and sigma is an m by n matrix. And since we're assuming m is greater than n, we'll be lazy and find V along with just the first n columns of U.

What about the rest? In A equals U sigma V transpose, consider the product U sigma. U consists of a number of column vectors, of which we know the first n, plus some remaining column vectors we don't yet know. Sigma, meanwhile, is a diagonal matrix with our values sigma one through sigma n on the diagonal, and we can make all of its remaining rows zero. So while we don't know what those remaining columns of U are, they get multiplied by zeros in sigma, and so they don't matter.

Let's think about this some more, and consider those sigma i's themselves. If we put them in decreasing order, the last sigma i's will be much smaller than the others, so their corresponding eigenvectors get multiplied by small numbers, and maybe we don't need to find those eigenvectors either. So let's be lazy, but productively lazy, and use only the largest eigenvalue and its eigenvector to approximate a 3 by 5 matrix A.

We form A A transpose, and a numerical method gives us its largest eigenvalue and the corresponding eigenvector. The square root of that eigenvalue is sigma one, the first entry of our matrix sigma, and the eigenvector is the first column of our matrix U. Then, from U transpose A equals sigma V transpose, that first column of U also gives us v one transpose, the first row of our matrix V transpose.

Now let's ignore all of the undetermined entries. From U we keep just that single column vector and call it U prime; from sigma we keep just the largest singular value, the square root of the largest eigenvalue, and call it sigma prime; and from V transpose we keep just that single row vector and call it V prime transpose. Multiplying these three matrices together gives us a 3 by 5 approximation of A.

Now, the value of what we did might not be obvious, but suppose we knew that our matrix A had integer entries. If we rounded the entries of U prime sigma prime V prime transpose to the nearest integer, we'd get a matrix that is almost the same as our original matrix A. And this is a truly amazing result, because notice what we needed: the three components of our first column vector u one, the one singular value sigma one, and the five components of v one. Which means that with three plus one plus five, or nine, numbers we were able to mostly reproduce a matrix with three times five, or fifteen, numbers. And this has some enormous practical implications.
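Before scaling this up, here's a minimal numerical sketch of that rank-one procedure in Python with NumPy. The 3 by 5 integer matrix below is made up for illustration (it's not the one from the lecture) and is chosen to be nearly rank one so that the rounded approximation comes out close to the original.

```python
import numpy as np

# A hypothetical 3-by-5 integer matrix, chosen to be nearly rank one.
A = np.array([[2, 1, 3, 1, 2],
              [4, 3, 6, 2, 4],
              [6, 3, 9, 4, 6]], dtype=float)

# Eigen-decomposition of the smaller symmetric matrix A A^T (3 by 3).
eigvals, eigvecs = np.linalg.eigh(A @ A.T)

# The largest eigenvalue gives sigma_1; its eigenvector is the first column of U.
idx = np.argmax(eigvals)
sigma1 = np.sqrt(eigvals[idx])          # first (largest) singular value
u1 = eigvecs[:, idx].reshape(-1, 1)     # U prime: 3-by-1 column vector

# From U^T A = Sigma V^T, the first row of V^T is u_1^T A / sigma_1.
v1T = (u1.T @ A) / sigma1               # V prime transpose: 1-by-5 row vector

# Rank-one approximation U' Sigma' V'^T, rounded to the nearest integer.
A1 = sigma1 * (u1 @ v1T)
print(np.round(A1))                     # should be close to the original A
print(A)
```

The nine stored numbers here are the three entries of u1, the single value sigma1, and the five entries of v1T, exactly the count from the argument above.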
Suppose A is a one thousand by one thousand matrix with one million data values; for example, this could be something like a digital image. We might be able to approximate A using a single column vector with one thousand entries, a single sigma value, and a single row vector with another thousand entries, for a total of two thousand and one data values. Going from one million data values down to two thousand and one is a compression of about ninety nine point eight percent. Now, in practice we actually need a few of the largest eigenvalues and eigenvectors, say the first one hundred column vectors, one hundred sigma values, and one hundred row vectors. But even if we did need that many, this would still allow us to reduce our data size by about eighty percent.
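As a quick check of that arithmetic, here is a short sketch assuming a square one thousand by one thousand matrix and a hypothetical rank of one hundred:

```python
m = n = 1000
original = m * n                      # 1,000,000 data values

# Rank-1 approximation: one column vector, one sigma value, one row vector.
rank1 = m + 1 + n                     # 2,001 values
print(1 - rank1 / original)           # about 0.998, i.e. ~99.8% compression

# Rank-100 approximation: 100 columns, 100 sigma values, 100 rows.
rank100 = 100 * m + 100 + 100 * n     # 200,100 values
print(1 - rank100 / original)         # about 0.80, i.e. ~80% reduction
```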