Hi, today we will start a new topic: methods for computing eigenvalues and eigenvectors. In this lecture we will derive a method called the power method. Before getting into the method, let us quickly recall what is meant by the eigenvalues and eigenvectors of a matrix. We are given an n × n matrix A, of course with real entries. The eigenvalues of A are defined as the roots of the equation det(λI − A) = 0. Note that det(λI − A) is a polynomial in λ of degree n, and it is called the characteristic polynomial of the matrix A.

When n = 2, the characteristic polynomial is a quadratic polynomial and we know how to get its roots. When n = 3, with a little difficulty we can still compute the roots of the characteristic polynomial. For n = 4 there is still a closed-form formula, though it is quite unwieldy, and by the Abel-Ruffini theorem there is no general formula in radicals for the roots of polynomials of degree 5 or higher. Therefore one has to go for some numerical approximation in order to get the eigenvalues of a matrix.

One obvious way of approximating the eigenvalues of a matrix is to first write the characteristic polynomial explicitly in λ and then use one of the nonlinear solvers which we will introduce in the next chapter. That is, polynomial equations are nonlinear equations, and we have iterative methods to obtain approximate roots of nonlinear equations; we can use one of those methods to approximate the roots of a polynomial equation as well. But this is not a good approach, for two reasons. One reason is that obtaining the explicit form of the polynomial is itself a difficult task, especially when the dimension of the matrix is very large. The second is that polynomial roots are very sensitive to errors in the coefficients: if you make a small error in the coefficients of the polynomial, the corresponding roots may be drastically different.

Let us look at an example to see how dangerous it is to deal with polynomials when we tend to make errors. Consider the function f(x) = (x − 1)(x − 2)⋯(x − 10). You can see that it is a polynomial of degree 10 with roots 1, 2, 3, …, 10. Now, we make a slight perturbation, of 1 percent, in the coefficient of the x^10 term of f(x), and we call the new perturbed polynomial F(x). One can see that the real roots of the equation F(x) = 0 all lie in the interval [1, 3.5], approximately. You can verify this graphically. In the graph we have plotted both polynomials: f(x) as a blue solid line and F(x) as a red solid line. The points where a graph intersects the x-axis are the roots of that polynomial. You can see, at least roughly, what the roots of F(x) = 0 are, and that after around 3.4 the graph of F(x) tends to increase and never comes back to intersect the x-axis. That shows F(x) = 0 has its real roots between 1 and around 3.4, whereas f(x) has roots 1, 2, 3, …, 10. So what is the difference between f(x) and F(x)? There is only a 1 percent error in the coefficient of x^10; otherwise these two polynomials are the same.
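To see this for yourself, here is a small numerical check one can run; this is a sketch in Python with NumPy, not part of the lecture, with the 1 percent perturbation applied to the leading coefficient as described above.

```python
# Sensitivity of polynomial roots to a 1 percent coefficient perturbation.
import numpy as np

# f(x) = (x - 1)(x - 2)...(x - 10), stored as coefficients, highest degree first.
coeffs = np.poly(np.arange(1, 11))

# F(x): same polynomial with the x^10 coefficient perturbed by 1 percent.
perturbed = coeffs.copy()
perturbed[0] *= 1.01

print("roots of f:", np.sort(np.roots(coeffs).real))

roots_F = np.roots(perturbed)
real_roots_F = np.sort(roots_F[np.abs(roots_F.imag) < 1e-8].real)
print("real roots of F:", real_roots_F)
# Most roots of F become complex; the surviving real roots lie roughly in [1, 3.5].
```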
So a small perturbation in any of the coefficients of a polynomial can make the roots entirely different from the original ones. Constructing the polynomial explicitly from the expression det(λI − A) = 0, especially on a computer, will tend to introduce some rounding error, and that in turn may give you a polynomial in λ with entirely different roots from the eigenvalues of A that we actually wanted. Therefore it is not a good idea, at least computationally, to solve the polynomial equation in order to get the eigenvalues of a matrix. An alternative idea is what we are going to learn in this class, and that is the power method. Let us see how to obtain eigenvalues without solving the characteristic polynomial.

The power method is used to obtain one specific eigenvalue of a given matrix A, called the dominant eigenvalue, and it also approximates a corresponding eigenvector. First let us understand what is meant by the dominant eigenvalue. You are given an n × n matrix A; let μ_1, μ_2, …, μ_n be its eigenvalues, repeated according to their algebraic multiplicities. What you do first is rearrange them so that, after renaming, the eigenvalues satisfy |λ_1| ≥ |λ_2| ≥ ⋯ ≥ |λ_n|. For instance, suppose μ_1 = 2, μ_2 = −5 and μ_3 = 1; then you name λ_1 = −5, λ_2 = 2 and λ_3 = 1. This is just a rearrangement, nothing new, and we call λ_1 a dominant eigenvalue of the matrix A.

Let us see a few examples. Take the matrix A with rows (1, 0, 0), (0, −2, 1) and (0, 0, −1). You can see that its eigenvalues are 1, −1 and −2. Therefore you take λ_1 = −2, λ_2 as either −1 or 1, and λ_3 as the remaining one, and you can see that λ_1 = −2 is the dominant eigenvalue of this matrix; you can also see that the dominant eigenvalue of A is unique. Now let us see another example. Here the eigenvalues of the matrix B are 1, −2 and 2. You can see that although B has distinct eigenvalues, its dominant eigenvalue is not unique, because both −2 and 2 qualify as dominant eigenvalues.

Now let us see the conditions under which we can apply the power method. The power method demands that the dominant eigenvalue be unique, meaning |λ_1| must be strictly greater than |λ_2|, with the remaining inequalities as usual. If you recall, the matrix A above satisfies this condition, whereas the matrix B does not, even though it has distinct eigenvalues; that is the interesting point here. The next condition is that the corresponding eigenvectors, which we will denote by v_1, v_2, …, v_n, are real and form a basis for R^n, that is, all these vectors are linearly independent. This is also a condition under which you can apply the power method. Once you assume that the eigenvectors form a basis, any given vector v in R^n can be written as a linear combination of the eigenvectors; that is what we mean by saying that this set of eigenvectors forms a basis.
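The two examples can be checked numerically as well. This is a sketch; the entries of the matrix B are not reproduced in the transcript, so a diagonal matrix with the same eigenvalues 1, −2, 2 stands in for it, and the helper name `dominant` is my own.

```python
# Checking which matrices have a unique dominant eigenvalue.
import numpy as np

def dominant(eigs, tol=1e-12):
    """Return the dominant eigenvalue if it is unique, otherwise None."""
    mu = sorted(eigs, key=abs, reverse=True)
    return mu[0] if abs(mu[0]) - abs(mu[1]) > tol else None

A = np.array([[1., 0., 0.],
              [0., -2., 1.],
              [0., 0., -1.]])
B = np.diag([1., -2., 2.])   # stand-in with the same eigenvalues as the lecture's B

print(dominant(np.linalg.eigvals(A)))  # -2.0: unique dominant eigenvalue
print(dominant(np.linalg.eigvals(B)))  # None: |-2| = |2|, dominant not unique
```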
Now, once you have this representation of a given vector, you pre-multiply the equation by A on both sides, and note that all the v_i are eigenvectors, so A v_i = λ_i v_i. Using this in the equation gives A v = c_1 λ_1 v_1 + c_2 λ_2 v_2 + ⋯ + c_n λ_n v_n. Next, take λ_1 outside and write this as A v = λ_1 (c_1 v_1 + c_2 (λ_2/λ_1) v_2 + ⋯ + c_n (λ_n/λ_1) v_n). Now pre-multiply by A once more: again each A v_i gives λ_i v_i, and pulling one more λ_1 outside gives λ_1 squared in front, so the first term is c_1 v_1 and the second term is c_2 (λ_2/λ_1)^2 v_2, because you already had one λ_2, you get one more λ_2, and one more λ_1 appears in the denominator from pulling λ_1 outside; and so on up to c_n (λ_n/λ_1)^2 v_n. You keep doing this, pre-multiplying by A and each time pulling one λ_1 outside. At the kth stage, having pulled λ_1 out k times, you have A^k v = λ_1^k (c_1 v_1 + c_2 (λ_2/λ_1)^k v_2 + ⋯ + c_n (λ_n/λ_1)^k v_n).

Now recall that we assumed the dominant eigenvalue of A is unique; therefore |λ_j/λ_1| < 1 for every j ≥ 2. Use this in the expression we obtained by pre-multiplying v by A k times: each term after the first involves a ratio of modulus less than 1 raised to the power k, so as k tends to infinity these terms go to 0. What we are left with is A^k v ≈ λ_1^k c_1 v_1. Therefore the limit, as k tends to infinity, of A^k v / λ_1^k is c_1 v_1, a scalar multiple of the eigenvector v_1 corresponding to λ_1. That shows the sequence A^k v / λ_1^k converges to an eigenvector of λ_1.

Remember, v here was any vector in R^n; we imposed no condition on it. The one thing to observe is that in order to get an eigenvector for λ_1, the vector v must be such that in its representation v = c_1 v_1 + ⋯ + c_n v_n the scalar c_1 is nonzero. Why? Because otherwise the limit is just the zero vector, which is not an eigenvector of λ_1. So in order to have a nonzero limit we need c_1 ≠ 0; that is very important. So you can see that we have obtained a sequence which may converge to an eigenvector of the dominant eigenvalue λ_1; that is what we have understood from this simple calculation. Now let us try to construct a sequence that converges to the dominant eigenvalue λ_1 itself.
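For reference, the whole derivation can be summarized in symbols; this simply restates the computation above in the lecture's own notation.

```latex
% The derivation above, in symbols.
\[
  v = c_1 v_1 + c_2 v_2 + \cdots + c_n v_n , \qquad A v_j = \lambda_j v_j ,
\]
\[
  A^k v = \lambda_1^k \left( c_1 v_1
        + c_2 \left(\tfrac{\lambda_2}{\lambda_1}\right)^{\!k} v_2
        + \cdots
        + c_n \left(\tfrac{\lambda_n}{\lambda_1}\right)^{\!k} v_n \right),
\]
\[
  \left|\tfrac{\lambda_j}{\lambda_1}\right| < 1 \ (j \geq 2)
  \quad\Longrightarrow\quad
  \lim_{k \to \infty} \frac{A^k v}{\lambda_1^k} = c_1 v_1 .
\]
```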
How are we going to do this? Look at the expression again: as k tends to infinity, all the terms (λ_j/λ_1)^k go to 0 and only the first term survives. Now, A^k v is a vector, and so is the right-hand side. Pick one component, say the ith component of A^k v, chosen so that the ith component of c_1 v_1 is nonzero. Take the same ith component of A^(k+1) v as well, and form the ratio of the two. The numerator is λ_1^(k+1) times the ith component of c_1 v_1, plus terms that go to 0 as k tends to infinity; the denominator is λ_1^k times the very same component of c_1 v_1, plus terms that likewise go to 0. In the limit the common factor, the ith component of c_1 v_1, cancels, λ_1^k cancels against λ_1^(k+1) except for a single factor, and you are left with exactly λ_1. That is how we get a sequence approximating the dominant eigenvalue of the matrix A.

So what have we done so far? We started with some arbitrary vector v, wrote it as v = c_1 v_1 + c_2 v_2 + ⋯ + c_n v_n, requiring of course that the first scalar c_1 be nonzero. This is just a construction, not yet a method: when we go for the power method, we do not know the eigenvalues or the eigenvectors, which is precisely why we need an approximation procedure. The condition c_1 ≠ 0 is what we have to impose theoretically; in practice we will not know it, so we just start with some arbitrary vector and go ahead with the method. Once this vector is chosen, you keep multiplying it by A, and every time you pull λ_1 out on the right-hand side. In this way we have constructed two sequences: one that converges to the dominant eigenvalue, and another that converges to a corresponding eigenvector of the dominant eigenvalue, if at all these sequences converge. If they converge, they converge to what we actually want; whether they converge is the question, which we will address later.

Let us now write this method systematically so that it can be implemented on a computer. Let me explain the computational procedure first, and then we will write the algorithm in a nicely systematic way. To compute the dominant eigenvalue and a corresponding eigenvector using the power method, first choose an arbitrary vector x_0. Theoretically x_0 should satisfy certain conditions, which we will list later; one condition we have already seen is that in its representation in terms of the eigenvectors, the first coefficient c_1 must be nonzero. That is what is expected, but practically we do not know it.
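Here is a bare-bones illustration of the ratio idea; the 2 × 2 matrix and the starting vector below are my own arbitrary choices for the sketch, not the lecture's example.

```python
# Watching the ratio of ith components of A^{k+1} v and A^k v approach lambda_1.
import numpy as np

A = np.array([[2., 1.],
              [1., 2.]])   # eigenvalues 3 and 1, so lambda_1 = 3
v = np.array([1., 0.])     # arbitrary start; its coefficient c_1 is nonzero

for k in range(8):
    w = A @ v
    print(k, w[0] / v[0])  # ratio of the first components; tends to 3
    v = w
```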
In that sense the practical implementation is a blind implementation: you just take an arbitrary vector x_0. Then you find the vector y_1 = A x_0; remember, the only idea in the power method is to keep pre-multiplying by A the vector we have chosen, and we are doing the same thing here in a rather algorithmic way. Next you define a scalar μ_1, which is the component of y_1 at which the maximum norm is achieved, taken with its sign. Suppose y_1 = (−1, 5, 0); then μ_1 = 5. But suppose y_1 = (2, −5, 1); then μ_1 = −5, not +5. When locating the maximum you take absolute values, but you pick only the index, and μ_1 is the actual component at that index, not its absolute value. This is something students often get wrong: you should not take the modulus of the component. Now define the vector x_1 = y_1 / μ_1. This completes one typical iteration.

Once you have x_1, you find y_2 = A x_1, and again find the component of y_2 at which the maximum norm is achieved. To make this choice precise, we take the minimum index at which the maximum is achieved. For instance, if y_2 = (−1, 2, −2), the maximum norm is achieved at both the second and the third coordinate, so we choose the second, that is, μ_2 = 2. So μ_2 is the component of y_2 with minimum index at which the maximum norm is achieved. Once you have μ_2 you compute x_2 = y_2 / μ_2, and you keep going: from x_2 you find y_3, μ_3, x_3, and so on. In that way you have an iterative sequence.

Let us state this iteration a little more formally. You are given an arbitrarily chosen x_0. For each k we define two sequences, a sequence of real numbers μ_k and a sequence of vectors x_k, by: y_{k+1} = A x_k; μ_{k+1} = the component of y_{k+1} with minimum index at which the maximum norm is achieved; and x_{k+1} = y_{k+1} / μ_{k+1}. So you start with x_0, plug it in to get y_1; from y_1 you get μ_1; from μ_1 you get x_1; then from x_1 you get y_2, then μ_2, then x_2, and so on. That is a very clear algorithm, and it is called the power method. The outcome of the power method is a pair of sequences: a sequence of real numbers and a sequence of vectors.

Let us take an example. Consider this matrix A. Just for information, its eigenvalues are as given; you can see that A has the dominant eigenvalue 3, which is its unique dominant eigenvalue. For information we will also note the eigenvectors we are considering: v_1 = (1, 0, 2), which corresponds to λ_1; v_2 corresponds to λ_2; and v_3, taken as shown, is an eigenvector of λ_3.
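Written as code, one iteration is exactly the three steps above. The sketch below follows the rules of the lecture (the signed component, the minimum index); the stopping test on successive values of μ is my own addition, not part of the lecture's description.

```python
# The power method as just described (the stopping rule is an addition).
import numpy as np

def power_method(A, x0, maxit=100, tol=1e-10):
    """Approximate the dominant eigenvalue and a corresponding eigenvector."""
    x = np.asarray(x0, dtype=float)
    mu = None
    for _ in range(maxit):
        y = A @ x                   # y_{k+1} = A x_k
        i = np.argmax(np.abs(y))    # smallest index attaining the max norm
        mu_new = y[i]               # the signed component, NOT its modulus
        x = y / mu_new              # x_{k+1} = y_{k+1} / mu_{k+1}
        if mu is not None and abs(mu_new - mu) < tol:
            break
        mu = mu_new
    return mu_new, x
```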
Now, this spectral information is just for reference; we do not need it in order to work with the power method. To run the method we only need the initial guess; as far as implementation is concerned, we simply take x_0 as an arbitrary vector. Once you have it, you calculate y_1 = A x_0, which happens to be the vector shown. From y_1 you find the maximum of the absolute values of its coordinates; that is achieved at the third coordinate, and therefore μ_1 is taken as 7.25. Once you have μ_1 you compute x_1 = y_1 / μ_1, which is the vector shown.

Now let us go to the second iteration. You take the x_1 computed in iteration 1, plug it in, and get y_2 = A x_1. Again you see which coordinate achieves the maximum norm; it is again the third coordinate, so μ_2 is taken as that value, and then x_2 follows. Just observe how the sequence μ_k is going, and also how the sequence x_k is going. In the first iteration μ was 7.25; in the second iteration it jumped to something pretty close to 3. We expect μ_k to converge to λ_1, and the sequence x_k to converge to a scalar multiple of v_1. The next iteration gives μ_3 a little closer to 3, so μ is going very well, and x_3 comes out as shown; you can observe that x_3 is getting closer and closer to (1/2) v_1. Going ahead, μ_4 is pretty close to 3 and x_4 is also pretty close to (1/2) v_1, and similarly x_5. You can go on computing x_6, x_7, x_8, x_9, x_10 and so on, but I have stopped my computation at iteration 10. There μ is pretty close to the dominant eigenvalue of the matrix A, and x_10 is pretty close to (1/2) v_1. That is what we observe here.

Now the question is: when does the power method converge? If it converges, we know that the sequence μ_k converges to the dominant eigenvalue and the sequence x_k converges to a scalar multiple of v_1. We can also ask what that scalar multiple is; here it happens to be 1/2. To understand that scalar multiple, and the convergence in general, we have to study the convergence theorem of the power method, which we will do in the next class. Thank you for your attention.
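The lecture's matrix A appears only on the slides, so the transcript alone does not let us reproduce the numbers above. As a hedged stand-in, the sketch below builds a matrix with the same dominant pair λ_1 = 3, v_1 = (1, 0, 2) and feeds it to the `power_method` routine sketched earlier; with this normalization the iterates approach (1/2) v_1, just as observed in the lecture.

```python
# Running the routine above on a stand-in matrix with dominant pair
# lambda_1 = 3, v_1 = (1, 0, 2) (the lecture's actual matrix is on the slides).
import numpy as np

P = np.array([[1., 1., 0.],
              [0., 1., 1.],
              [2., 0., 1.]])        # first column is v_1 = (1, 0, 2)
D = np.diag([3., 1., -1.])          # eigenvalues; 3 is dominant and unique
A = P @ D @ np.linalg.inv(P)

mu, x = power_method(A, [1., 1., 1.])
print(mu)   # close to 3
print(x)    # close to (0.5, 0, 1) = (1/2) v_1
```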