Last time we looked at the diagonal dominance theorem, and we defined the Gershgorin disks, which are basically circles centered at each of the diagonal entries, with radius equal to the sum of the magnitudes of all the off-diagonal terms in the same row. And we saw this very interesting result called the Gershgorin disk theorem, which says that all the eigenvalues of A lie in the union of these n Gershgorin disks. Further, if a union of k of these n disks forms a connected region that is disjoint from the remaining n − k disks, then there are exactly k eigenvalues in that connected region. At the end of the last class we also saw a theorem about the continuity of eigenvalues.

So today we will see some consequences of the Gershgorin disk theorem, and we will also start talking about the condition number associated with eigenvalues. Remember that we have seen that the condition number plays an important role in the sensitivity of solutions to linear systems of equations. We will see that, similarly, a condition-number-like quantity shows up when you look at the sensitivity of the eigenvalue problem to perturbations in the matrix.

Now recall that the eigenvalues of A lie in the union of the Gershgorin disks, which I will denote G(A). Since A and A^T have the same eigenvalues, the eigenvalues of A also lie in G(A^T), which has the same definition of the Gershgorin disks but with respect to A^T: it is the union over j going from 1 to n of the disks |z − a_jj| ≤ Σ_{i≠j} |a_ij|, where the radius is now the sum of the magnitudes of the off-diagonal entries in the jth column of A. Because the eigenvalues must lie in both of these sets, they must all lie in the intersection G(A) ∩ G(A^T). So this can give us a tighter region within which to locate the eigenvalues of A.

In particular, note that the largest-modulus eigenvalue of A must also lie in this set, and these are just circles |z − a_ii| ≤ r_i, so we can actually find the farthest point from the origin in each circle. The disk D_i is centered at the complex number a_ii and has radius r_i = Σ_{j≠i} |a_ij|. To reach the farthest point of D_i from the origin, I go out a distance |a_ii| and then a further distance equal to the radius, so the farthest point has modulus |a_ii| + Σ_{j≠i} |a_ij|, which is simply Σ_{j=1}^{n} |a_ij|, the ith absolute row sum. So the largest-modulus eigenvalue of A must be at most this distance from the origin. Since the eigenvalues lie in the union of the Gershgorin disks, the largest eigenvalue of A in modulus is upper bounded by the largest absolute row sum; applying the same argument to G(A^T), it is also upper bounded by the largest absolute column sum. From that we get the following corollary: the spectral radius ρ(A) is less than or equal to the minimum of the largest column sum and the largest row sum, ρ(A) ≤ min(|||A|||_1, |||A|||_∞). You've already seen this; we've seen it in this form. Actually I should write it with three bars.
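As a quick sanity check, here is a minimal numpy sketch (my own illustration, not part of the lecture) that computes the row and column Gershgorin disks for an arbitrary test matrix and verifies both the inclusion in G(A) ∩ G(A^T) and the spectral radius corollary:

```python
import numpy as np

def gershgorin_disks(A):
    """Return (center, radius) pairs for the row-wise Gershgorin disks of A."""
    centers = np.diag(A)
    radii = np.abs(A).sum(axis=1) - np.abs(centers)
    return list(zip(centers, radii))

A = np.array([[4.0, 1.0, 0.5],
              [1.0, -2.0, 0.3],
              [0.2, 0.1, 7.0]])

row_disks = gershgorin_disks(A)      # disks of G(A)
col_disks = gershgorin_disks(A.T)    # disks of G(A^T)

# Every eigenvalue lies in some row disk AND in some column disk.
for lam in np.linalg.eigvals(A):
    assert any(abs(lam - c) <= r + 1e-12 for c, r in row_disks)
    assert any(abs(lam - c) <= r + 1e-12 for c, r in col_disks)

# Corollary: rho(A) <= min(largest absolute row sum, largest absolute column sum).
rho = max(abs(np.linalg.eigvals(A)))
bound = min(np.abs(A).sum(axis=1).max(), np.abs(A).sum(axis=0).max())
print(rho, "<=", bound)
```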
We've already seen this, but this is a more geometric view of the same result: the spectral radius of A is less than or equal to the minimum of the largest row sum norm and the largest column sum norm.

Now another remark: if S is an invertible matrix, then S⁻¹AS has the same eigenvalues as A, so we can apply the Gershgorin disk theorem to S⁻¹AS and try to choose S cleverly to get sharper bounds. Here's an example. Suppose I have the matrix A = [[1, 1], [0, 2]]. There are two Gershgorin disks: the first is centered at 1 and has radius 1, and the second is centered at 2 and has radius 0. The first disk is shown in red here, centered at 1 with radius 1, and the second disk is shown in black here, centered at 2 with radius 0. In fact for this matrix we can read off the eigenvalues, 1 and 2, and of course they lie in the union of these Gershgorin disks.

Now suppose we use S = diag(p_1, p_2) with p_1 and p_2 being positive numbers. Then S⁻¹AS = diag(1/p_1, 1/p_2) · A · diag(p_1, p_2). If I work out the product AS first, I get [[p_1, p_2], [0, 2p_2]], and multiplying that by diag(1/p_1, 1/p_2), the p_1 and 1/p_1 cancel, so the diagonal entries remain 1 and 2, but the off-diagonal term becomes p_2/p_1. So the second Gershgorin disk stays centered at 2 with radius 0, but the first disk is now centered at 1 with radius p_2/p_1, and p_1, p_2 can be chosen such that p_2/p_1 is positive but arbitrarily small. So you can actually locate the eigenvalues much more accurately: you know that one is in some tiny neighborhood around 1 and the other is in a tiny neighborhood around 2. Of course, in this case the matrix is upper triangular, so it's kind of trivial; you already know that the eigenvalues are 1 and 2, so you don't have to locate them approximately. But for more complex matrices this can be a useful trick; see the sketch below.

So let's generalize this. Suppose S is the diagonal matrix with p_1, ..., p_n along the diagonal, all p_i positive. If you work out the (i, j)th element of S⁻¹AS, it is p_j a_ij / p_i. Now we apply the Gershgorin theorem to this matrix S⁻¹AS. If we do that, we get the following corollary. Let A be an n × n matrix and p_1, ..., p_n positive numbers. The diagonal entries of S⁻¹AS are unchanged, so the centers of the disks remain unchanged, and the eigenvalues of A lie in the union over i of the disks |z − a_ii| ≤ (1/p_i) Σ_{j≠i} p_j |a_ij| (when summing over j, the factor 1/p_i does not depend on j, so I can bring it out of the summation); this is G(S⁻¹AS) in my notation above. They also lie in the column version, the union over j from 1 to n of |z − a_jj| ≤ p_j Σ_{i≠j} (1/p_i) |a_ij| (here p_j does not depend on i, so it can be pulled out). So this can give us somewhat sharper bounds on the location of the eigenvalues.
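Here is a minimal numpy sketch of that 2 × 2 example (my own illustration), showing how the scaling S = diag(p_1, p_2) shrinks the first disk without changing the eigenvalues:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 2.0]])

def disks(M):
    c = np.diag(M)
    r = np.abs(M).sum(axis=1) - np.abs(c)
    return list(zip(c, r))

print(disks(A))   # disk 1: center 1, radius 1; disk 2: center 2, radius 0

# Shrink the first disk with S = diag(p1, p2), choosing p2/p1 small.
p = np.array([1.0, 1e-3])            # p2/p1 = 1e-3
S, S_inv = np.diag(p), np.diag(1.0 / p)
B = S_inv @ A @ S                    # same eigenvalues as A

print(disks(B))   # first disk now has radius p2/p1 = 1e-3: eigenvalues
                  # pinned to tiny neighborhoods of 1 and 2
```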
And specifically I can think about optimizing p_1, ..., p_n so that these bounds are as tight as possible. So essentially, all the eigenvalues lie in the intersection over all such choices of this scaling matrix: the intersection over D ∈ 𝒟 of G(D⁻¹AD), where 𝒟 is the set of diagonal matrices with positive diagonal entries. That is basically corollary 6: applied to the spectral radius, it says that ρ(A) is at most the minimum over positive p_1, ..., p_n of the maximum row sum norm, max_i (1/p_i) Σ_j p_j |a_ij| (the sum here is across columns), and likewise of the maximum column sum norm, max_j p_j Σ_i (1/p_i) |a_ij| (the sum across rows); you are free to minimize over p_1 to p_n, and the minimum value is still an upper bound on ρ(A). It turns out that this upper bound is actually tight for any matrix with positive entries; there's a proof in the text that you can look at.

So far we've been discussing matrices that are arbitrary, not necessarily Hermitian. But suppose A is Hermitian. Then the eigenvalues of A are real valued, so we can specialize the Gershgorin theorem to say that the eigenvalues belong to the intersection of the real line with the union of Gershgorin disks. If I take these Gershgorin disks, which could be located anywhere in the complex plane, and intersect their union with the real line, I just get line segments; so this is a finite union of intervals. You can similarly write out tighter bounds when the matrix has additional structure, like skew-Hermitian, unitary, or orthogonal. Now, we also looked at a diagonal positive S to improve the inclusion regions of the eigenvalues; it is possible to get tighter bounds by considering more general S, but we won't look at that in this course.

Now one related question is: can you do better than the Gershgorin disk theorem, or is that the tightest uncertainty region within which the eigenvalues of the matrix A are guaranteed to lie? The answer is that you cannot do better. This is also in the text, but it can be shown that if z is a complex number on the boundary of G(A), then you can find a matrix B which matches A in the diagonal entries, matches A in magnitude in the off-diagonal entries, and has z as an eigenvalue. So basically the point is that in Gershgorin's theorem we are only using the main diagonal entries and the absolute values of the off-diagonal entries, and given only that information, this is indeed the tightest bound you can get on the region within which the eigenvalues lie. If you want a tighter bound than the Gershgorin disk theorem, you will need to take into account the phase angles or the signs of the off-diagonal entries.

Now we move on to a different topic, which is the condition of eigenvalues. This was actually the topic that initiated this whole discussion of the location and perturbation of eigenvalues; this whole chapter is about that. Now I come back to the example we discussed in the first class we had on this chapter, where we looked at a matrix and said that its eigenvalues are very sensitive to small changes in the matrix: we added 10⁻² to one entry and found that the eigenvalues became ±100 and ±100i.
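The exact matrix from that class is not reproduced here, but a standard example of the same phenomenon, sketched below with a hypothetical 4 × 4 Jordan block (not the lecture's matrix), shows how a perturbation of size ε in a single entry can move a repeated zero eigenvalue by ε^(1/4):

```python
import numpy as np

# Hypothetical stand-in for the example from class: a 4x4 Jordan block
# whose only eigenvalue is 0, with algebraic multiplicity 4.
n = 4
J = np.diag(np.ones(n - 1), k=1)

eps = 1e-8
E = np.zeros((n, n))
E[-1, 0] = eps                     # perturb a single corner entry by eps

# The perturbed eigenvalues satisfy lambda**4 = eps, so they have modulus
# eps**(1/4) = 1e-2: a 10^6-fold amplification of the perturbation size.
print(np.linalg.eigvals(J + E))
```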
So the eigenvalues are very sensitive to small changes in the matrix. Here there is only one eigenvalue, zero, and this matrix is not a well conditioned matrix with respect to its eigenvalues, in the sense that a small perturbation to the matrix can lead to a large perturbation in the eigenvalues. By and large, this is the definition we will use for an eigenvalue being well conditioned or ill conditioned: if you apply a perturbation of size ε, measured in some norm, to the matrix A, then the perturbation in the eigenvalue should also be of the order of ε. If that happens, we say that λ is a well conditioned eigenvalue; otherwise we say that it is an ill conditioned eigenvalue. So generally what happens is that the matrix A could be well conditioned with respect to some of its eigenvalues and ill conditioned with respect to the other eigenvalues.

Student: Sir, suppose I apply this ε perturbation to the diagonal entries; then it is not necessary for the eigenvalue to be ill conditioned, right? What I am essentially trying to say is that it also depends on where we apply the perturbation. So how can we conclude it is well conditioned in general? Shouldn't it be a function of the position also?

Instructor: Yes, it depends on that. So what we will be doing is looking at perturbations by a matrix E: what you are given is the perturbed matrix A + E, and an adversary is allowed to choose whichever entries in E they want to perturb. Your eigenvalue should remain stable no matter which entries the adversary perturbs, as long as the overall magnitude of the perturbation is less than some ε. That's the kind of guarantee we will look for; I'll explain that further as we go along, and there is a small illustration of the position dependence below.
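The student's point, that it matters where the perturbation is applied, is easy to see numerically. In this sketch (my own illustration, using the same hypothetical Jordan block as above), a perturbation of size ε on the diagonal moves the eigenvalues by exactly ε, while the same-sized perturbation in one corner moves them by ε^(1/4); the adversarial definition above takes the worst case over all positions:

```python
import numpy as np

n, eps = 4, 1e-8
J = np.diag(np.ones(n - 1), k=1)   # Jordan block, eigenvalue 0 repeated

# Same perturbation size, two different positions.
E_diag = eps * np.eye(n)           # perturb the diagonal entries
E_corner = np.zeros((n, n))
E_corner[-1, 0] = eps              # perturb the bottom-left corner

print(np.linalg.eigvals(J + E_diag))    # eigenvalues move to exactly eps = 1e-8
print(np.linalg.eigvals(J + E_corner))  # eigenvalues move by eps**(1/4) = 1e-2
```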
So now, in particular, suppose you start with a matrix D which is diagonal, with diagonal entries λ_1, ..., λ_n, let E be some perturbation matrix, and consider D + E. By the Gershgorin theorem, the eigenvalues of D + E lie in the union over i from 1 to n of the disks centered at λ_i + E_ii (the diagonal entries become λ_i + E_ii, so those become the new centers) with radius Σ_{j≠i} |E_ij|; that union is where the eigenvalues of the perturbed matrix D + E are guaranteed to be located. Now what I can do is add back E_ii and use the triangle inequality to show that these disks are actually contained in the disks centered at λ_i with radius |E_ii| + Σ_{j≠i} |E_ij| = Σ_{j=1}^{n} |E_ij|. So what that means is that if λ̂ is an eigenvalue of D + E, then, substituting λ̂ for z, the inequality |λ̂ − λ_i| ≤ Σ_j |E_ij| must hold for at least one i. So there is some eigenvalue λ_i of D such that |λ̂ − λ_i| is at most the maximum of all these radii, which is the infinity norm ‖E‖_∞, the maximum absolute row sum of E. So as long as I am only allowed to perturb D by a matrix E whose infinity norm is bounded, the perturbations in the eigenvalues are also bounded by the same quantity. In other words, what this shows is that the eigenvalues of diagonal matrices are well conditioned (there's a quick numerical check of this below).

Unfortunately this argument does not extend to the non-diagonal case, but we can say more in two important special cases: the first being when A is diagonalizable, and the second being when λ is a simple eigenvalue of A, that is, an eigenvalue whose algebraic multiplicity equals one. We'll start with the second case: λ is a simple eigenvalue, and we look at the condition of that specific eigenvalue.

So let λ be a simple eigenvalue of A, which means that it has algebraic multiplicity equal to one, and let x be a right eigenvector and y a left eigenvector corresponding to λ, both being unit norm; that means Ax = λx and y^H A = λy^H. Define s(λ) = |y^H x|, the magnitude of the inner product between the left eigenvector and the right eigenvector. Then we define the condition of the eigenvalue λ to be 1/s(λ). Because λ is a simple eigenvalue of A, this s(λ) is unique. By the Cauchy-Schwarz inequality, s(λ) is at most 1, and it is also possible to show that s(λ) is not equal to zero; so it is a number strictly greater than zero and less than or equal to one.

Now let P be any matrix whose spectral norm equals one. Recall that the spectral norm ‖P‖_2 is the square root of the largest eigenvalue of P^H P, or equivalently the maximum of ‖Px‖_2 over all unit ℓ2-norm vectors x. Define A(t) = A + tP. In other words, we are perturbing the matrix A by a unit-spectral-norm matrix multiplied by some coefficient t, and you should think of t as a small number: if t is small enough, you are really applying a small perturbation to the matrix A.
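Before moving on, here is the promised quick numpy check of the diagonal case (my own illustration): every eigenvalue of D + E lies within ‖E‖_∞ of some eigenvalue of D.

```python
import numpy as np

rng = np.random.default_rng(0)

D = np.diag([1.0, 4.0, -2.0, 7.0])
E = 1e-3 * rng.standard_normal((4, 4))     # arbitrary perturbation matrix

inf_norm_E = np.abs(E).sum(axis=1).max()   # ||E||_inf: max absolute row sum

for lam_hat in np.linalg.eigvals(D + E):
    # Some eigenvalue lambda_i of D is within ||E||_inf of lam_hat.
    assert np.min(np.abs(lam_hat - np.diag(D))) <= inf_norm_E
```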
So what am I doing here? I'm trying to show you why we consider 1/s(λ) to be the condition of the eigenvalue λ. In other words, when you perturb the matrix like this, A(t) = A + tP, the perturbation in the eigenvalue λ will be of the order of 1/s(λ) times the perturbation t that you apply. That's what we will show now.

Suppose λ(t) and x(t), both differentiable with respect to t in a neighborhood of zero, are an eigenvalue and eigenvector of A(t); that is, A(t)x(t) = λ(t)x(t). Note that at t = 0 I get λ(0) = λ and x(0) = x, where λ and x are as defined above: λ is the simple eigenvalue and x is its corresponding right eigenvector. Let a dash denote the derivative, so λ'(t) = dλ(t)/dt, x'(t) = dx(t)/dt, and A'(t) = dA(t)/dt. Then we have the following proposition: |λ'(0)| ≤ 1/s(λ). What this means is that a small perturbation of order ε in A leads to a change in the eigenvalue λ of order at most ε/s(λ); that is what |λ'(0)| being at most 1/s(λ) means.

So how do you show this? We start by differentiating the equation A(t)x(t) = λ(t)x(t) with respect to t and then setting t = 0. Differentiating with the product rule, I have A'(t)x(t) + A(t)x'(t) = λ'(t)x(t) + λ(t)x'(t). Now substitute t = 0. Remember that A(t) = A + tP, so A'(0) is just P; also A(0) = A, x(0) = x, and λ(0) = λ. Substituting all that, I have Px + Ax'(0) = λ'(0)x + λx'(0). Now we premultiply by y^H. On the left I get y^H P x plus y^H A x'(0); but y^H A = λy^H, so that second term is λ y^H x'(0). On the right-hand side I have λ'(0) y^H x plus λ y^H x'(0), which exactly cancels the λ y^H x'(0) coming from the left-hand side. So these two cancel, and what I'm left with is λ'(0) y^H x = y^H P x. Taking the modulus on both sides, |λ'(0)| · |y^H x| = |y^H P x|, and using the idea of compatible norms and submultiplicativity, |y^H P x| is at most ‖y‖_2 ‖P‖_2 ‖x‖_2. We started out by assuming that all three of these are equal to one, and so |λ'(0)| ≤ 1/|y^H x| = 1/s(λ). Even if ‖P‖_2 is not equal to one, we would just have a ‖P‖_2 sitting in the numerator; the result looks more elegant if you assume ‖P‖_2 = 1 and write it as 1/s(λ).
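To make the proposition concrete, here is a small numpy sketch (my own illustration, with an arbitrary random test matrix) that estimates |λ'(0)| by a finite difference and checks it against the 1/s(λ) bound:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))

# Right and left unit eigenvectors for the eigenvalue of largest modulus.
w, V = np.linalg.eig(A)
k = np.argmax(np.abs(w))
lam = w[k]
x = V[:, k] / np.linalg.norm(V[:, k])
wl, U = np.linalg.eig(A.conj().T)          # eigenvectors of A^H = left eigenvectors of A
kl = np.argmin(np.abs(wl - np.conj(lam)))  # match the same eigenvalue
y = U[:, kl] / np.linalg.norm(U[:, kl])

s = abs(y.conj() @ x)                      # s(lambda) = |y^H x|

# Perturb by a unit-spectral-norm P and estimate |lambda'(0)| numerically.
P = rng.standard_normal((4, 4))
P /= np.linalg.norm(P, 2)                  # spectral norm 1
t = 1e-7
w_t = np.linalg.eigvals(A + t * P)
lam_t = w_t[np.argmin(np.abs(w_t - lam))]  # track the same eigenvalue

deriv = abs(lam_t - lam) / t               # finite-difference estimate of |lambda'(0)|
print(deriv, "<=", 1.0 / s)                # proposition: |lambda'(0)| <= 1/s(lambda)
```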
So what this means is that if s(λ), which is |y^H x|, the inner product between the left eigenvector and the right eigenvector, is close to one, then λ is a well conditioned eigenvalue; if it is close to zero, then it is an ill conditioned eigenvalue. In the latter case it actually means that A is close to a matrix in which λ is a repeated eigenvalue. When you have repeated eigenvalues, it is possible, not necessary but possible, that small perturbations produce large changes in the eigenvalues; note that the result above does not apply to that case, since it assumes λ is simple.
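As a final illustration of that last remark (again my own sketch, with a hypothetical 2 × 2 matrix): for the nearly defective matrix [[1, 1], [δ², 1]], whose eigenvalues 1 ± δ nearly coincide as δ → 0, one can check that s(λ) ≈ 2δ shrinks and the condition 1/s(λ) blows up.

```python
import numpy as np

def s_lambda(A, which=0):
    """s(lambda) = |y^H x| for unit right/left eigenvectors of one eigenvalue of A."""
    w, V = np.linalg.eig(A)
    lam = w[which]
    x = V[:, which] / np.linalg.norm(V[:, which])
    wl, U = np.linalg.eig(A.conj().T)               # left eigenvectors of A
    y = U[:, np.argmin(np.abs(wl - np.conj(lam)))]  # match the same eigenvalue
    y /= np.linalg.norm(y)
    return abs(y.conj() @ x)

for delta in [1e-1, 1e-3, 1e-6]:
    A = np.array([[1.0, 1.0],
                  [delta**2, 1.0]])   # eigenvalues 1 +/- delta
    s = s_lambda(A)
    print(f"delta={delta:g}  s(lambda)={s:.2e}  condition 1/s={1/s:.2e}")
```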