So, good afternoon. Let's begin. Last time we looked at Gaussian elimination with pivoting and described the process, which involves permutation matrices. Towards the end of that class we started looking at other modifications of the LU decomposition, working our way up to the decomposition we will discuss today, which is known as the Cholesky decomposition. We will also discuss some uses of this decomposition.

Just to recall, in the last class we saw that if a matrix A is symmetric and non-singular, then in the decomposition A = L D M^T we have L = M. In other words, we can find a lower triangular matrix L and a diagonal matrix D such that A = L D L^T. This is the LDL^T decomposition. One caution: do not confuse the entries of D with the eigenvalues of the matrix; in general they are not the eigenvalues, just some diagonal entries. In fact, the eigenvalue decomposition of a matrix is not something you can obtain by a finite, direct procedure of the kind we are describing here, whereas for the LU, LDM^T, and LDL^T decompositions you can write down a polynomial-time algorithm that actually finds the decomposition. To find the eigenvalue decomposition you need the eigenvalues and eigenvectors, for which you typically have to use iterative procedures or other techniques. So D is a diagonal matrix, but its entries are not the eigenvalues of the matrix.

Now we will discuss the Cholesky decomposition. I need one definition which we are going to use very heavily in the coming classes, and that is that of a positive definite matrix. A matrix A in C^(n x n) is positive definite if it is Hermitian and x^H A x > 0 for every nonzero vector x in C^n. We will write this as A ≻ 0, with a slanted "greater than" sign. Another way to define a positive definite matrix is that it is Hermitian and all its eigenvalues are strictly greater than 0, but we will come to that; for the purposes of this discussion the definition above is enough.

So now the first result: if A is symmetric (this is for the real case) and positive definite, which I am going to abbreviate as pd, then there exists a lower triangular matrix G with positive diagonal entries such that A = G G^T. So, instead of an LDL^T decomposition, we have a G G^T decomposition; this is the Cholesky decomposition.
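As an aside, here is a minimal NumPy sketch of the eigenvalue-based characterization of positive definiteness just mentioned; the function name, the test matrix, and the zero tolerance are illustrative choices of mine.

```python
import numpy as np

def is_positive_definite(A, tol=0.0):
    """Return True if A is Hermitian and all its eigenvalues are strictly positive,
    which is equivalent to x^H A x > 0 for every nonzero x."""
    A = np.asarray(A)
    if not np.allclose(A, A.conj().T):   # must be Hermitian (symmetric in the real case)
        return False
    return bool(np.all(np.linalg.eigvalsh(A) > tol))

# B^T B + I is positive definite by construction, so the check should pass.
B = np.random.randn(4, 4)
A = B.T @ B + np.eye(4)
print(is_positive_definite(A))    # True
print(is_positive_definite(-A))   # False
```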
So why is this result true? First, A is positive definite and symmetric, so x^T A x > 0 for every nonzero x in R^n, which in particular means that A is non-singular. Being symmetric and non-singular, by the previous result A admits an LDL^T decomposition, and therefore x^T L D L^T x > 0 for every nonzero x. And since A is positive definite (hence non-singular), the matrix L here is non-singular, in fact full rank. So if we let y = L^T x, then substituting gives y^T D y > 0, and this is true for every y ≠ 0, since L^T is invertible and so every nonzero y arises as L^T x for some nonzero x. Because D is diagonal, this holds if and only if each d_ii > 0: for example, choosing y = e1 pulls out d11 > 0, choosing y = e2 pulls out d22 > 0, and so on, so d_ii > 0 for i = 1 up to n. And since every d_ii > 0, we can define G = L diag(sqrt(d11), sqrt(d22), ..., sqrt(dnn)). Then A = G G^T and G is a lower triangular matrix with positive entries along the diagonal, as desired.

A couple of remarks. Because A is positive definite and symmetric, this Cholesky decomposition, where you only look for a single matrix G such that A = G G^T, can be computed with half the number of flops of the vanilla LU decomposition. The other thing is that, because A is positive definite, it turns out that the computation is stable without pivoting, so you do not need the pivoting step that we discussed in the previous class. MATLAB has a built-in command, chol(A), that directly gives you the Cholesky factor of A.
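The analogous built-in in Python is numpy.linalg.cholesky, which returns the lower triangular factor with positive diagonal. A quick sketch, with an arbitrary positive definite test matrix of my choosing:

```python
import numpy as np

A = np.array([[4.0, 2.0, 2.0],
              [2.0, 5.0, 3.0],
              [2.0, 3.0, 6.0]])      # symmetric positive definite

G = np.linalg.cholesky(A)            # lower triangular with positive diagonal
print(G)
print(np.allclose(G @ G.T, A))       # True: A = G G^T
```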
Here is how you compute the Cholesky decomposition yourself. We will illustrate it for the 3 x 3 case, but you will see that the idea applies to a matrix of any dimension. We are interested in solving for the matrix G with the structure G = [ g11 0 0 ; g21 g22 0 ; g31 g32 g33 ], so that G^T = [ g11 g21 g31 ; 0 g22 g32 ; 0 0 g33 ], and this product G G^T should give me the matrix A with entries a11, a12, a13, and so on. Since A is symmetric, the (1,2) and (2,1) entries are the same, so to avoid confusion I will label the entries above the diagonal a12, a13 and a23. By following a proper order in figuring out these g_ij's, you can compute each of them quite easily.

For example, compare the (1,1) element: the (1,1) element of the product is g11 squared, so g11^2 = a11, and taking the positive square root gives g11 = sqrt(a11). Whether you take the positive or negative square root gives you some flexibility in computing a Cholesky factor, but if you restrict yourself to taking the positive square root each time, then the computations are unique, and so the Cholesky decomposition is unique. Then you look at the first column of the product: the next entry is g21 g11, or more generally g_i1 g11 = a_i1, which implies g_i1 = a_i1 / g11 for i = 2 up to n. Essentially, the fact that A is positive definite implies that a11 is strictly positive: recall x^T A x > 0 for all nonzero x, so taking x = e1 extracts the entry a11, and therefore a11 > 0 and I am not dividing by zero here. So all the elements in the first column of G are now determined.

Similarly, if I look at the second column of the product, the first entry gives g11 g21 = a12, which is the same as what I got over here from the first column, so that equation gives nothing new. But if I take the (2,2) entry, I have g21^2 + g22^2 = a22, and since we already know g21 (we know all the g_i1 for i = 2 to n), we can compute g22 = sqrt(a22 − g21^2). The positive definiteness of A ensures that the quantity under the square root is always positive; these are the kind of things we will see later, but it is indeed a consequence of the fact that A is positive definite. And since g22 is now known, the rest of the second column follows, which in the 3 x 3 case is just g32: the (3,2) entry gives g31 g21 + g32 g22 = a23, so g32 = (a23 − g31 g21)/g22 (and finally g33 = sqrt(a33 − g31^2 − g32^2)).

A student points out that after relabeling the entries using symmetry, the formula above uses a_i1, but a21 and a31 are no longer written explicitly, only a12 and a13. That is right; you can make that small correction and read a_i1 as a_1i, since by symmetry they are the same entry, so you can always determine these g_i1's. So basically, this is how you can efficiently determine the Cholesky decomposition of a matrix A.
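As a sketch of the column-by-column procedure just described, here is one way it might look in NumPy; the function name and the error handling are my own additions.

```python
import numpy as np

def cholesky_lower(A):
    """Compute the lower triangular G with positive diagonal such that A = G G^T,
    one column at a time, in the order described above."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    G = np.zeros((n, n))
    for j in range(n):
        # Diagonal entry: g_jj = sqrt(a_jj - sum_{k<j} g_jk^2), taking the positive root.
        s = A[j, j] - np.dot(G[j, :j], G[j, :j])
        if s <= 0:
            raise ValueError("matrix is not positive definite")
        G[j, j] = np.sqrt(s)
        # Entries below the diagonal: g_ij = (a_ij - sum_{k<j} g_ik g_jk) / g_jj.
        for i in range(j + 1, n):
            G[i, j] = (A[i, j] - np.dot(G[i, :j], G[j, :j])) / G[j, j]
    return G

A = np.array([[4.0, 2.0, 2.0],
              [2.0, 5.0, 3.0],
              [2.0, 3.0, 6.0]])
G = cholesky_lower(A)
print(np.allclose(G @ G.T, A))                 # True
print(np.allclose(G, np.linalg.cholesky(A)))   # agrees with the built-in
```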
Now, once we have A = G G^T, in order to solve A x = b what we do is first solve G z = b for z, where z stands for G^T x. G is lower triangular, so you can solve this efficiently using forward substitution. Then you solve G^T x = z for x; G^T is upper triangular, so you can do backward substitution and find x. So the Cholesky decomposition, like the LU decomposition, is useful for solving a system of linear equations. And as I mentioned, if you take the positive square root at each step, then by the way we have developed it, it is clear that this Cholesky decomposition is unique; let me just make that note here: if we always take the positive square root, the Cholesky decomposition is unique.
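A sketch of these two triangular solves, assuming SciPy is available for its triangular solver (one could equally write the forward and backward substitution loops by hand); the example system is arbitrary.

```python
import numpy as np
from scipy.linalg import solve_triangular

A = np.array([[4.0, 2.0, 2.0],
              [2.0, 5.0, 3.0],
              [2.0, 3.0, 6.0]])
b = np.array([1.0, 2.0, 3.0])

G = np.linalg.cholesky(A)                  # A = G G^T

z = solve_triangular(G, b, lower=True)     # forward substitution: solve G z = b
x = solve_triangular(G.T, z, lower=False)  # backward substitution: solve G^T x = z

print(np.allclose(A @ x, b))               # True
```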
The other thing is that, since A = G G^T, we are writing A as the product of something times its transpose, so this is a matrix analogue of the square root operation. The difference is that the square root of a matrix is not unique. In particular, if I consider G Q, where Q is any orthonormal matrix, then (G Q)(G Q)^T = G Q Q^T G^T = G G^T = A, so G Q is also a valid square root of A. Another square root could be V diag(sqrt(λ1), sqrt(λ2), ..., sqrt(λn)): if I take this times its transpose, it gives me V Λ V^T, where V contains the eigenvectors and Λ is the diagonal matrix containing the eigenvalues λ1 through λn, because the eigenvalue decomposition has the form A V = V Λ, so A = V Λ V^T. So this is also a valid square root, and there are many square roots that are possible. But the Cholesky factor is a unique square root in the following sense: we have taken the positive square root at each step, and what we get is a matrix G that is lower triangular and has positive diagonal entries. If you restrict yourself to square-root matrices with these two properties, lower triangular and positive diagonal entries, then the square root of the matrix is unique. One small point: all I really need above is that Q Q^T is the identity matrix, so Q need not even be a square orthonormal matrix; it can be any matrix with Q Q^T = I, say of size n x k with k ≥ n, and this still works. So a square root of a matrix in this sense need not even be a square matrix, because G Q will then be of size n x k.

Now, another point. Suppose we have a zero-mean vector process X_i in R^n, a random process, and suppose we have m samples of it with m > n, so you have more samples than the dimension of each vector. Then we can form the matrix X, which is a concatenation of all these samples as rows, X = [x1^T ; x2^T ; ... ; xm^T]; this has m rows and n columns, and each row is a sample x_i^T. An estimate of the covariance matrix is then R_hat = (1/m) X^T X. You might have seen this, for example, in the form (1/m) times the sum over i = 1 to m of x_i x_i^T. This is a way to estimate the covariance of the vector process, and it has some properties that you will see in your detection and estimation class next term; for now, just take it as the estimate of the covariance of X. I will call the true covariance matrix R and this estimate, obtained using m samples, R_hat.

Now let X = Q U, where this is the QR decomposition of X. Remember X is m by n, so Q is m by m and U is an upper triangular matrix of size m by n. (I do not want to write the triangular factor as R, because I said the true covariance is R, so I write it as U.) A student asks: if U is not a square matrix, how can it be upper triangular? Upper triangular just means that there is nothing below the main diagonal; it is all zeros below the main diagonal. So U is of size m by n, a tall matrix: the first n rows form a block U0, and the remaining m − n rows are all zeros. U0 is upper triangular with positive entries along the diagonal; we will choose the QR decomposition to ensure that. Then U0^T is the Cholesky factor of m R_hat. Indeed, m R_hat = X^T X = U^T Q^T Q U, and Q^T Q is the identity matrix, so this equals U^T U, which by the block structure of U is U0^T U0. Think of this as G G^T with G = U0^T, since U0 is upper triangular with positive entries along the diagonal, so U0^T plays the role of G. So what this means is that the U0 resulting from the QR decomposition of X is, in fact, the transpose of the Cholesky factor of m R_hat; and since m is just a scalar, (1/sqrt(m)) U0 is the transpose of the Cholesky factor of R_hat itself.
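A numerical sketch of this relationship, using NumPy's reduced QR, which returns only the n x n triangular block called U0 above; NumPy does not guarantee a positive diagonal on that factor, so the sketch flips row signs first. The process here is just synthetic Gaussian data of my choosing.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 1000, 4                            # m samples of an n-dimensional process, m > n

X = rng.standard_normal((m, n))           # each row is one sample x_i^T
Rhat = X.T @ X / m                        # covariance estimate (1/m) X^T X

# Reduced QR: Q is m x n with Q^T Q = I, U0 is the n x n upper triangular block.
Q, U0 = np.linalg.qr(X, mode='reduced')

# Flip row signs so that U0 has a positive diagonal (X = Q U0 is preserved).
S = np.diag(np.sign(np.diag(U0)))
U0 = S @ U0
Q = Q @ S

G = np.linalg.cholesky(X.T @ X)           # Cholesky factor of m * Rhat
print(np.allclose(U0, G.T))                                      # True
print(np.allclose(np.linalg.cholesky(Rhat), U0.T / np.sqrt(m)))  # scaled version
```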
Continuing with these points, the Cholesky decomposition is useful, for example, if you want to generate a vector process with a desired covariance matrix; this is useful for simulation purposes. So suppose we want to generate an x in R^n with a desired covariance matrix Σ. A covariance matrix is by definition symmetric (and positive semi-definite); we will assume here that Σ is symmetric and positive definite. Then, since Σ is symmetric and positive definite, we can find a G such that Σ = G G^T. Next, we start with a random vector w in R^n whose covariance matrix is the identity, E[w w^T] = I. Such vectors are called isotropically distributed, and generating them is easy: you just generate vectors with independent and identically distributed zero-mean entries, and then E[w w^T] = I provided the variance of each entry is equal to one. Then we define x = G w. The expected value of x x^T is E[G w w^T G^T]; G is just a fixed linear operator and expectation is also a linear operator, so I can pull the expectation inside, with G on one side and G^T on the other, giving G E[w w^T] G^T = G G^T = Σ. So we have generated random vectors with the desired covariance matrix Σ. As I said, this is useful in computer simulations for creating vector processes with a desired covariance matrix.

The converse of this is what is called whitening, which is also very useful. Suppose we have a stationary random process x_i in R^n, i = 1, 2, and so on, and we get to observe x_i = s_i + v_i, where s_i is the signal and v_i is the noise, and the noise is colored, meaning E[v_i v_i^T] is not the identity matrix but some other matrix Σ ≠ I. Suppose Σ is known (say you somehow have access to independent noise samples from which you can estimate the noise covariance matrix) and is positive definite. Then there is a G such that Σ = G G^T; let G be such a matrix. What we do is premultiply x_i by G^{-1}, which gives G^{-1} x_i = G^{-1} s_i + G^{-1} v_i. The new noise covariance matrix is then E[G^{-1} v_i v_i^T G^{-T}] = G^{-1} E[v_i v_i^T] G^{-T} = G^{-1} Σ G^{-T} = G^{-1} G G^T G^{-T} = I, where G^{-T} denotes the transpose of G^{-1}. So we have whitened the noise; the resulting noise is white. So basically the Cholesky decomposition is very useful in noise whitening, which is a very important tool in signal processing, and Cholesky in particular is used because it is stable and easy to compute.
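A small simulation sketch of both directions, coloring with G and whitening with G^{-1}; the covariance matrix Σ and the sample count here are arbitrary choices of mine.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 200000

Sigma = np.array([[2.0, 0.8, 0.3],        # desired covariance (symmetric positive definite)
                  [0.8, 1.5, 0.5],
                  [0.3, 0.5, 1.0]])
G = np.linalg.cholesky(Sigma)             # Sigma = G G^T

# Coloring: start from isotropic w (iid zero-mean, unit-variance entries), set x = G w.
W = rng.standard_normal((n, m))           # each column is one sample of w
X = G @ W                                 # each column now has covariance Sigma
print(np.round(X @ X.T / m, 2))           # sample covariance, close to Sigma

# Whitening: premultiply the colored samples by G^{-1}; the covariance becomes identity.
Y = np.linalg.solve(G, X)                 # computes G^{-1} X without forming G^{-1}
print(np.round(Y @ Y.T / m, 2))           # close to the identity matrix
```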
So, just to summarize what we have seen so far in this chapter. We looked at the Jordan canonical form, where the main building block is the Jordan block, which has λ along the diagonal, ones on the first superdiagonal, and zeros everywhere else. We saw that in the Jordan canonical form any A can be written as A = S J S^{-1}, where J is a block diagonal matrix containing Jordan blocks J_{n1}(λ1), ..., J_{nk}(λk) and zeros everywhere else; we call J the Jordan canonical form of A. Here the block sizes satisfy n1 + n2 + ... + nk = n, and λ1 through λk are the eigenvalues of the matrix, not necessarily distinct. Further, if I sum the n_i over all i such that λ_i equals some particular value λ, that gives the algebraic multiplicity of λ; and if I count one for each i from 1 to k with λ_i = λ, that counts the number of blocks in which the eigenvalue λ appears in the Jordan canonical form, which is the geometric multiplicity of λ. From this you can see that the algebraic multiplicity is greater than or equal to the geometric multiplicity: if all the blocks for λ are of size one, then all those n_i equal one and the two are equal. And the matrix J, or A, is diagonalizable if and only if k = n, i.e. all blocks are 1 x 1.

The next thing we saw was how to find the JCF, the Jordan canonical form. There is a recipe we wrote out: the first step is to find all distinct eigenvalues of A; then for each eigenvalue λ_i we calculate the rank of (A − λ_i I)^k for k = 1, 2, and so on, and we study this sequence of ranks; the sequence gives the orders of all the Jordan blocks of A corresponding to the eigenvalue λ_i. We repeat this for each eigenvalue. One very interesting consequence of the Jordan canonical form is that A is similar to A^T for every A. We also saw that the matrix A is convergent if and only if |λ| < 1 for every eigenvalue λ of A, and this is true also for non-diagonalizable matrices. Then we discussed a bit about the minimal polynomial, which is the monic polynomial (that is, leading coefficient equal to 1) of smallest degree that annihilates A, that is, p(A) = 0. This minimal polynomial is unique and divides any other polynomial that also annihilates A. The other thing we saw is that similar matrices have the same minimal polynomial, and that the JCF can be used to find the minimal polynomial, although it may not be the best way to do it: if A has distinct eigenvalues λ1 through λm, then the minimal polynomial is of the form of the product over i = 1 to m of (t − λ_i)^{r_i}, where each r_i is at least 1, and in fact r_i is the order of the largest Jordan block corresponding to λ_i. Of course, A is diagonalizable if and only if r_i = 1 for all i; it comes back to the point that all the Jordan blocks are 1 x 1, and in that case the minimal polynomial is of the form (t − λ1) up to (t − λm).

Then we looked at other factorizations: the LU decomposition, which is nothing but Gaussian elimination; the LU decomposition with pivoting, which is a numerically stable way of computing the LU decomposition; and then the Cholesky decomposition. And we briefly discussed the use of LU in solving linear systems. So this just sort of summarizes what we have seen in this chapter so far, and with this I conclude what I wanted to say about these matrix decompositions.