The last time, we completed our discussion of unitary equivalence and then presented a very important result: Schur's unitary triangularization theorem. It says that given any matrix A (there are no restrictions; A can be any complex-valued n x n matrix) with eigenvalues lambda_1, ..., lambda_n (an n x n matrix always has n eigenvalues, counted with multiplicity), there exists a unitary matrix U such that A is unitarily equivalent to an upper triangular matrix T whose diagonal entries are these n eigenvalues lambda_1, ..., lambda_n. Of course, if A and all its eigenvalues are real, you can show that U can be chosen to be a real orthogonal matrix. The generality of the theorem is what makes it so important: it is applicable with no restrictions on the matrix A. One application of this triangularization theorem that we saw was the Cayley-Hamilton theorem, which says that any matrix A satisfies its own characteristic polynomial; we saw the proof of this theorem, and that is where we stopped the previous time. Today I want to discuss some uses of the Cayley-Hamilton theorem, then some points about the diagonalizability of matrices, and then maybe start the discussion of normal matrices. Okay, so we will begin with the first point, uses of the Cayley-Hamilton theorem. The theorem can be used to express A^k, for k >= n, as a linear combination of lower powers of A; this is an application I suppose most of you have seen in your undergraduate program. That is, for k >= n we can write A^k as a linear combination of I, A, ..., A^(n-1). Let us illustrate this with an example. Suppose A is the 2 x 2 matrix with rows (3, 1) and (-2, 0). Its characteristic polynomial is p_A(t) = (t - 1)(t - 2) = t^2 - 3t + 2, and since the matrix A satisfies its characteristic polynomial we have A^2 - 3A + 2I = 0 (the all-zero 2 x 2 matrix), which in turn implies A^2 = 3A - 2I. So I can write A^2 in terms of A and the identity matrix. For A^3 I just multiply this by A to get 3A^2 - 2A, and substituting for A^2 gives 3(3A - 2I) - 2A = 9A - 6I - 2A = 7A - 6I. Similarly, for A^4 you again multiply by A and substitute for A^2: 7A^2 - 6A = 7(3A - 2I) - 6A = 21A - 14I - 6A = 15A - 14I, and so on. So we can write all the higher powers of A as linear combinations of the lower powers. This is a very interesting observation, and to me it is not obvious why higher and higher powers of A should always be expressible as linear combinations of the first n powers of A, including the zeroth. An n x n matrix is an object living in an n^2-dimensional space, and I can always write an n^2-dimensional vector as a linear combination of n^2 linearly independent vectors sitting in that space.
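To make this concrete, here is a minimal numpy sketch of the example just worked out (numpy and the helper name power_coeffs are my own choices, for illustration only): it checks the Cayley-Hamilton identity directly and then reduces any power A^k to the form c1*A + c0*I using the recurrence A^2 = 3A - 2I.

```python
import numpy as np

# The example matrix from the lecture; p_A(t) = t^2 - 3t + 2.
A = np.array([[3.0, 1.0],
              [-2.0, 0.0]])
I = np.eye(2)

# Cayley-Hamilton check: A^2 - 3A + 2I should be the zero matrix.
print(A @ A - 3 * A + 2 * I)

def power_coeffs(k):
    """Coefficients (c1, c0) with A^k = c1*A + c0*I, via A^2 = 3A - 2I."""
    c1, c0 = 0.0, 1.0                      # A^0 = 0*A + 1*I
    for _ in range(k):
        # A * (c1*A + c0*I) = c1*A^2 + c0*A = c1*(3A - 2I) + c0*A
        c1, c0 = 3 * c1 + c0, -2 * c1
    return c1, c0

for k in (2, 3, 4):
    c1, c0 = power_coeffs(k)
    print(k, (c1, c0))                     # expect (3,-2), (7,-6), (15,-14)
    assert np.allclose(np.linalg.matrix_power(A, k), c1 * A + c0 * I)
```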
So if I had said that I can write A^k as a linear combination of the n^2 matrices I, A, A^2, ..., A^(n^2 - 1), that would not have been surprising. What is surprising here is that you can write A^k as a linear combination of I, A, A^2, ..., A^(n-1) only: you need just these n matrices, and all other powers of A can be written as linear combinations of them. One other small observation: the constant term here, 2, is the determinant of A, and since it is not equal to zero, A is non-singular. This actually allows us to find A^(-1), as follows (see also the sketch below). I take the equation A^2 = 3A - 2I and write 2I = 3A - A^2; factoring A out, I = A(3I - A)/2. Multiplying both sides by A^(-1) gives A^(-1) = (3I - A)/2. Writing this out entrywise, 3I/2 has rows (3/2, 0) and (0, 3/2), and A/2 has rows (3/2, 1/2) and (-1, 0), so A^(-1) has rows (0, -1/2) and (1, 3/2). So the Cayley-Hamilton theorem also allows you to compute A^(-1) when A is non-singular, and in fact you can find expressions for A^(-2), A^(-3), and so on as well; you can try this. All you have to do is multiply A^(-1) = (3I - A)/2 by A^(-1) again: you get A^(-2) = (3/2) A^(-1) - (1/2) I, and since you already have an expression for A^(-1) you can substitute it in to get an expression for A^(-2), and so on. This observation is true for any non-singular matrix, so we can say: for any non-singular A in C^(n x n) there exists a polynomial q(t) of degree at most n - 1 such that A^(-1) = q(A).
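Continuing the same example, here is a short sketch of the inverse computation, again assuming numpy; it checks (3I - A)/2 against numpy.linalg.inv and carries the substitution one step further to A^(-2).

```python
import numpy as np

# Same example: A^2 - 3A + 2I = 0 and det(A) = 2 != 0, so A is non-singular
# and I = A(3I - A)/2 gives A^{-1} = q(A) with q(t) = (3 - t)/2.
A = np.array([[3.0, 1.0],
              [-2.0, 0.0]])
I = np.eye(2)

A_inv = (3 * I - A) / 2
print(A_inv)                                # expect rows (0, -1/2), (1, 3/2)
assert np.allclose(A_inv, np.linalg.inv(A))

# One more substitution step: A^{-2} = (3/2) A^{-1} - (1/2) I.
A_inv2 = 1.5 * A_inv - 0.5 * I
assert np.allclose(A_inv2, np.linalg.inv(A @ A))
```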
Okay, so now we move on to the next point. We know that not all matrices are diagonalizable, but how close can we get? Can we take a matrix that is not diagonalizable and bring it, through a similarity transform or a unitary equivalence, to a matrix which is almost diagonal? There are two ways to answer this, and I will just state the answers. First, there exists a diagonalizable matrix that is arbitrarily close to the given matrix; I will make the sense in which I am saying "arbitrarily close" clear in a minute. Second, any given matrix is similar to an upper triangular matrix with arbitrarily small off-diagonal entries, that is, it is almost diagonalized. These assertions are made precise by two theorems. The first one goes like this. We are given a matrix A in C^(n x n). For any epsilon > 0 there exists a matrix A(epsilon) in C^(n x n), close to A, that has n distinct eigenvalues (and is therefore diagonalizable) and satisfies sum over i, j = 1 to n of |a_ij - a_ij(epsilon)|^2 < epsilon. That is, the differences between corresponding entries of A and A(epsilon), squared and added up over all the entries, total less than epsilon; here a_ij(epsilon) is the (i, j)-th entry of A(epsilon). So it does not matter if a matrix is not diagonalizable: you can find a matrix that is arbitrarily close to it and is also diagonalizable. The proof is actually quite straightforward; a numerical illustration follows below. Since A is n x n, there is a unitary U such that U^H A U = T is upper triangular; that is Schur's theorem. Let E be a diagonal matrix with diagonal entries e_1, ..., e_n, each less than sqrt(epsilon/n) in magnitude, and we choose these e_i such that t_11 + e_1, t_22 + e_2, ..., t_nn + e_n are distinct. Can this always be done? Can you always find e_1, ..., e_n with magnitudes less than sqrt(epsilon/n) such that t_11 + e_1 up to t_nn + e_n are distinct numbers? Of course you can: there are infinitely many numbers with magnitude between 0 and sqrt(epsilon/n), and you can also choose their phase angles, so it is really very easy to pick n numbers making t_11 + e_1, ..., t_nn + e_n all distinct. Then the matrix T + E has distinct eigenvalues, and we have already seen that a matrix with distinct eigenvalues is always diagonalizable. Now consider U(T + E)U^H; I am undoing the Schur transformation here, and this equals A + U E U^H. This unitary similarity preserves eigenvalues, so this matrix has the same distinct eigenvalues as T + E, which implies it is diagonalizable. That tells us what to choose as A(epsilon): let A(epsilon) = A + U E U^H, so that A - A(epsilon) = -U E U^H. We have already seen that A(epsilon) is diagonalizable; we just need to show that the squared Frobenius norm of A - A(epsilon) is less than epsilon, which will satisfy the last requirement of the theorem. Now sum over i, j = 1 to n of |a_ij - a_ij(epsilon)|^2 is exactly the squared Frobenius norm of A - A(epsilon), and the Frobenius norm is invariant under unitary equivalence, so this equals the squared Frobenius norm of E. Since E is diagonal, I just add up over the diagonal entries: sum over i of |e_i|^2. Each |e_i| is less than sqrt(epsilon/n), so |e_i|^2 is less than epsilon/n, and adding up n of these gives less than n times epsilon/n, which equals epsilon. So A(epsilon) satisfies the requirements of the theorem.
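Here is a rough numerical sketch of this construction, assuming scipy is available for the Schur decomposition. The starting matrix (a 2 x 2 Jordan block, which is not diagonalizable) and the particular choice of the e_i are mine, for illustration only; for a general matrix the e_i must be chosen with the actual diagonal of T in view, as in the proof.

```python
import numpy as np
from scipy.linalg import schur

# A non-diagonalizable starting point: a 2x2 Jordan block with eigenvalue 2.
A = np.array([[2.0, 1.0],
              [0.0, 2.0]])
eps = 1e-6
n = A.shape[0]

# Schur's theorem: A = U T U^H with T upper triangular.
T, U = schur(A, output='complex')

# Diagonal E with distinct entries, each of magnitude below sqrt(eps/n).
# Here the diagonal of T is (2, 2), so distinct e_i suffice.
e = (np.arange(1, n + 1) / (2.0 * n)) * np.sqrt(eps / n)
E = np.diag(e)

A_eps = U @ (T + E) @ U.conj().T            # = A + U E U^H

print(np.linalg.eigvals(A_eps))             # n distinct eigenvalues near 2
print(np.linalg.norm(A - A_eps, 'fro')**2)  # squared Frobenius distance < eps
```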
The other theorem goes like this. Again A is an n x n matrix; then for every epsilon > 0 there exists a non-singular S_epsilon in C^(n x n) such that S_epsilon^(-1) A S_epsilon = T_epsilon is upper triangular with |t_ij(epsilon)| < epsilon for 1 <= i < j <= n. Of course, you cannot restrict the diagonal entries t_ii to be small, because they are the eigenvalues of A and those may not be small, but the off-diagonal entries can be made arbitrarily small. So the difference between the two theorems is this: in the first, instead of trying to diagonalize A itself, we found a nearby matrix A(epsilon) that is diagonalizable; in the second, we bring A itself to an upper triangular form with arbitrarily small off-diagonal entries, getting it closer and closer to diagonal. I won't go over the proof of this second theorem; it has details that would take a long time to complete, so please see the text. The last point in this discussion is one more theorem, another extension of Schur's theorem, which is useful for the Jordan canonical form that we will discuss a bit later. Again A is an n x n matrix, now with distinct eigenvalues lambda_1 through lambda_k; it has k distinct eigenvalues, where k can be less than n and at most equal to n, with algebraic multiplicities n_1 through n_k respectively (the algebraic multiplicity of an eigenvalue is the number of times it occurs as a zero of the characteristic polynomial). Then A is similar to the block diagonal matrix with diagonal blocks T_1, ..., T_k and zeros everywhere else, where each T_i is an n_i x n_i upper triangular matrix with all diagonal entries equal to lambda_i; a small numerical sketch of this form follows below. Again I won't show the proof here, but it basically first uses Schur's theorem to get an upper triangular form, and then a series of carefully chosen non-unitary similarity transforms that produce the zeros in the off-diagonal blocks without changing the diagonal entries or the upper triangular structure. This is good; we will use it later when we discuss the Jordan canonical form. A question that came up: is this matrix with blocks T_1 to T_k diagonal? No, it is block diagonal: each block T_i is upper triangular, not necessarily diagonal. Keep in mind that this is a result which applies to any A of size n x n, so A need not be diagonalizable for it to hold. If A is diagonalizable it is possible that the T_i all end up diagonal, but even if A is not diagonalizable, it is still similar to this kind of block upper triangular matrix: a concatenation of upper triangular blocks along the diagonal, where each T_i is upper triangular with diagonal entries equal to the corresponding eigenvalue lambda_i. Okay, so next we will discuss normal matrices.
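A small numerical sketch of this block form; the numbers (lambda_1 = 2 with n_1 = 2, lambda_2 = 5 with n_2 = 1, and the similarity S) are hypothetical, chosen only for illustration.

```python
import numpy as np

# Block diagonal T with upper triangular blocks: T_1 is 2x2 with diagonal
# entries lambda_1 = 2 (not itself diagonal, so this models a
# non-diagonalizable A), and T_2 is 1x1 with entry lambda_2 = 5.
T1 = np.array([[2.0, 1.0],
               [0.0, 2.0]])
T2 = np.array([[5.0]])
T = np.block([[T1, np.zeros((2, 1))],
              [np.zeros((1, 2)), T2]])

# Any A = S T S^{-1}, for invertible S, is similar to T and so has eigenvalue
# 2 with algebraic multiplicity 2 and eigenvalue 5 with multiplicity 1.
S = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
A = S @ T @ np.linalg.inv(S)
print(np.round(np.linalg.eigvals(A), 4))    # expect 2.0 (twice) and 5.0
```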