Okay, so good afternoon, let's begin. Just to recap: last time we saw some further consequences of the Gershgorin disk theorem, and then we started discussing eigenvalue perturbation, namely how the eigenvalues of a matrix change when the matrix is perturbed.

The starting point is that we are given an $n \times n$ matrix $A$ with eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$, and $A$ is perturbed by a matrix $E$ to give $A + E$. Suppose we compute the eigenvalues $\hat\lambda_1, \ldots, \hat\lambda_n$ of $A + E$, or in particular some eigenvalue $\hat\lambda$. What can we say about how close $\hat\lambda$ will be to one of the eigenvalues of $A$? That is, can we say that $|\hat\lambda - \lambda_i|$ is small for some $i$?

There are various ways to answer this question. First of all, if $A$ happens to be a diagonal matrix, then we can simply use the Gershgorin disk theorem directly to say that the difference between $\hat\lambda$ and one of the eigenvalues is at most the $\ell_\infty$ norm of $E$; this is what we saw last time.

The second case is when $\lambda$ is a simple eigenvalue of $A$, meaning it has algebraic multiplicity equal to 1. If we denote by $x$ a right eigenvector corresponding to $\lambda$, that is, $Ax = \lambda x$, and by $y$ a left eigenvector, that is, $y^H A = \lambda y^H$, and if $E$ is a perturbation matrix with spectral norm equal to 1, then, writing $A(t) = A + tE$, we can study how sensitive the eigenvalue is by looking at the magnitude of the derivative of the eigenvalue at $t = 0$. We showed that this is at most $1/s(\lambda)$, where $s(\lambda)$ is defined to be $|y^H x|$, and we called $s(\lambda)$ the condition number, or the condition, of the eigenvalue $\lambda$. So this is for simple eigenvalues, that is, eigenvalues of $A$ with algebraic multiplicity 1.

Then we looked at the case where $A$ is a diagonalizable matrix and $E$ can be anything. That means $A$ can be written as $S \Lambda S^{-1}$ for some invertible matrix $S$ and diagonal matrix $\Lambda$. We showed that $|\hat\lambda - \lambda_i| \le \kappa(S)\,\|E\|$ for some $i$, where the norm is any matrix norm for which the norm of a diagonal matrix is its maximum-magnitude diagonal entry, and $\kappa(S)$ is the condition number of $S$ with respect to that norm. Of course the condition number satisfies $\kappa(S) \ge 1$, with equality when $S$ is unitary. So if the matrix is unitarily diagonalizable, then $\kappa(S) = 1$ and we get $|\hat\lambda - \lambda_i| \le \|E\|$ itself. Normal matrices are unitarily diagonalizable, so as a consequence, if $A$ is a normal matrix and $E$ is arbitrary, then $|\hat\lambda - \lambda_i| \le \|E\|_2$, the spectral norm of $E$, for some eigenvalue $\lambda_i$ of $A$.

The next case: suppose $A$ and $E$ are both Hermitian matrices. If we denote by $\lambda_1 \le \cdots \le \lambda_n$ the ordered eigenvalues of $A$ and by $\hat\lambda_1 \le \cdots \le \hat\lambda_n$ the ordered eigenvalues of $A + E$, then we can bound $\hat\lambda_k - \lambda_k$ — taking the $k$-th smallest eigenvalue of $A + E$ and subtracting the $k$-th smallest eigenvalue of $A$ — and that is at least $\lambda_1(E)$ and at most $\lambda_n(E)$. Further, the magnitude of this difference is at most the spectral radius of $E$. Okay, so this is where we stopped last time.
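As an aside (not part of the lecture), here is a minimal NumPy sketch that checks these Weyl-type inequalities for the Hermitian case numerically; the matrices and seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6

# Random Hermitian A and Hermitian perturbation E
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (A + A.conj().T) / 2
E = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
E = (E + E.conj().T) / 2

lam = np.linalg.eigvalsh(A)          # eigenvalues in increasing order
lam_hat = np.linalg.eigvalsh(A + E)
lam_E = np.linalg.eigvalsh(E)

diff = lam_hat - lam
# Weyl's inequalities: lambda_1(E) <= lam_hat_k - lam_k <= lambda_n(E)
assert np.all(diff >= lam_E[0] - 1e-10) and np.all(diff <= lam_E[-1] + 1e-10)
# and hence |lam_hat_k - lam_k| is at most the spectral radius of E
assert np.max(np.abs(diff)) <= np.max(np.abs(lam_E)) + 1e-10
```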
Now we consider one more case, where $A$ and $A + E$ are both normal matrices. What that means is that we can write $A = U \Lambda U^H$ and $A + E = V \hat\Lambda V^H$, where $U$ and $V$ are both unitary matrices, because normal matrices can be unitarily diagonalized.

Now let us look at the squared Frobenius norm $\|E\|_F^2$. That is the squared Frobenius norm of $(A + E) - A$, which I can write as $\|V \hat\Lambda V^H - U \Lambda U^H\|_F^2$, just substituting for $A + E$ and for $A$. What I can do is pull out a $U$ on the left and a $U^H$ on the right, and since left or right multiplication by a unitary matrix does not change the Frobenius norm of a matrix, I can get rid of the $U$ and $U^H$ and write this as $\|Z \hat\Lambda Z^H - \Lambda\|_F^2$, where $Z = U^H V$ is a unitary matrix.

Now, we know this squared Frobenius norm can be written as a trace: $\|B\|_F^2 = \operatorname{tr}(B B^H)$. So I just write that here: this is $\operatorname{tr}\big[(Z \hat\Lambda Z^H - \Lambda)(Z \hat\Lambda^H Z^H - \Lambda^H)\big]$, where the Hermitian of $Z \hat\Lambda Z^H$ is $Z \hat\Lambda^H Z^H$ since $(Z^H)^H$ is just $Z$. If I expand this out, the first term times the first term is just $\|Z \hat\Lambda Z^H\|_F^2$, the second times the second is just $\|\Lambda\|_F^2$, and the cross terms give $-2 \operatorname{Re} \operatorname{tr}(Z \hat\Lambda Z^H \Lambda^H)$, the inner product between the two.
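As a quick sanity check (again, not from the lecture), here is a sketch of this identity in NumPy; `random_normal_matrix` is a hypothetical helper that builds a normal matrix as unitary-times-diagonal-times-unitary-inverse, with the unitary factor coming from a QR decomposition.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5

def random_normal_matrix(rng, n):
    # Normal matrix U diag(d) U^H with U unitary (from QR of a Gaussian matrix)
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
    d = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    return Q @ np.diag(d) @ Q.conj().T, Q, d

A, U, lam = random_normal_matrix(rng, n)
ApE, V, lam_hat = random_normal_matrix(rng, n)   # plays the role of A + E
E = ApE - A

Z = U.conj().T @ V
Lam, Lam_hat = np.diag(lam), np.diag(lam_hat)

# ||E||_F^2 = ||Lam_hat||_F^2 + ||Lam||_F^2 - 2 Re tr(Z Lam_hat Z^H Lam^H)
lhs = np.linalg.norm(E, 'fro')**2
rhs = (np.linalg.norm(Lam_hat, 'fro')**2 + np.linalg.norm(Lam, 'fro')**2
       - 2 * np.real(np.trace(Z @ Lam_hat @ Z.conj().T @ Lam.conj().T)))
assert np.isclose(lhs, rhs)
```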
Now again, the first term involves multiplication by a unitary matrix, so I can get rid of the $Z$ on the left and the $Z^H$ on the right and write the whole thing as $\|\hat\Lambda\|_F^2 + \|\Lambda\|_F^2 - 2 \operatorname{Re} \operatorname{tr}(Z \hat\Lambda Z^H \Lambda^H)$.

So what are we trying to do here? We are trying to relate $\|E\|_F^2$ to the eigenvalues of $A$ and $A + E$, so that we can ultimately bound the eigenvalue differences by $\|E\|_F^2$. For example, you can already see that the first squared Frobenius norm is of a diagonal matrix, so it's just the sum of the squared magnitudes of the diagonal entries, and the same goes for the second term. I'll come to the trace term in a second, but the point is that on the right-hand side I have terms that depend only on the eigenvalues of $A$ and $A + E$. If I can find an expression for the trace term that also depends only on the eigenvalues of $A$ and $A + E$, then I will have a bound connecting the eigenvalues of $A$ and $A + E$ with the squared Frobenius norm of the error matrix, or perturbation matrix. So that's the final goal: find some expression for this quantity that depends only on the $\hat\lambda_i$ and the $\lambda_i$.

Now, whatever this trace quantity is, if I replace it with something bigger, I only make the whole expression smaller. So I can say the expression is at least $\|\hat\Lambda\|_F^2 + \|\Lambda\|_F^2 - g^*$, where I define $g^*$ to be the maximum this trace term can attain over all possible unitary matrices: $g^* = \max_W 2 \operatorname{Re} \operatorname{tr}(W \hat\Lambda W^H \Lambda^H)$ over all unitary matrices $W$.

Now if I simply expand this out, taking into account the fact that $\Lambda$ and $\Lambda^H$ are both diagonal matrices, I can write $2 \operatorname{Re} \operatorname{tr}(W \hat\Lambda W^H \Lambda^H) = \sum_{i,j=1}^{n} |w_{ij}|^2 \cdot 2 \operatorname{Re}(\lambda_i^* \hat\lambda_j)$. Is there a question? Okay, so it looks like this.

Now suppose we define a matrix $C$ with entries $c_{ij} = |w_{ij}|^2$, the coefficient of each term here. What is this matrix $C$? It has non-negative entries, and if you take the sum of the entries along any given row, or along any given column, they all add up to 1, because $W$ is a unitary matrix. Such a matrix is called a doubly stochastic matrix: a matrix with non-negative entries where every row adds up to 1 and every column adds up to 1.

So what we see is that whenever $W$ is unitary, this matrix $C$ will be doubly stochastic. However, the converse need not be true, in the sense that if I take a $C$ which is doubly stochastic, there may not exist a unitary $W$ such that $|w_{ij}|^2 = c_{ij}$. So if I replace the maximization over unitary $W$ with a maximization over all doubly stochastic matrices $C$, I am potentially expanding the space over which I'm doing the optimization. Hence $g^*$ is further upper bounded by the maximum of $\sum_{i,j} 2\, c_{ij} \operatorname{Re}(\lambda_i^* \hat\lambda_j)$ over all possible doubly stochastic matrices, with $c_{ij}$ in place of $|w_{ij}|^2$.

Now, this objective function is linear in the entries of the matrix $C$, and a linear function is both a convex and a concave function of these variables. As a consequence, this amounts to maximizing a convex function over the space of doubly stochastic matrices.
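As an aside (not from the lecture), the "unitary implies doubly stochastic" direction is easy to check numerically; here is a minimal sketch with an arbitrary random unitary:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4

# A random unitary from the QR decomposition of a complex Gaussian matrix
W, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))

C = np.abs(W)**2  # c_ij = |w_ij|^2

assert np.all(C >= 0)
assert np.allclose(C.sum(axis=1), 1.0)  # every row sums to 1
assert np.allclose(C.sum(axis=0), 1.0)  # every column sums to 1
```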
So in order to solve this optimization problem, we need to know something about the constraint space: what is the space of doubly stochastic matrices? Let's define that precisely. Let DS be the set of all doubly stochastic matrices. It's a small exercise to show that DS is a convex set: take any two doubly stochastic matrices, and any convex combination of them is again a doubly stochastic matrix. You can show this very easily, so the set DS is indeed convex.

Now we use a very fundamental result from optimization: the maximum of a convex function over a convex set is always attained at what is called an extreme point of the convex set. Similarly, if you want to minimize a concave function over a convex set, the minimum will occur at an extreme point. Just to give you the idea: suppose I have a convex function, and my convex set on the real line is an interval. If I want to maximize the convex function over this interval, the maximum will occur at an extreme point — it will be at one endpoint or the other, depending on which one gives the higher value. The minimum could be at some point inside, but the maximum will always be at an extreme point of the convex set.

So then, what is an extreme point? It's like the endpoints of an interval, but the generalization is as follows. If $D$ is a convex set in some vector space, we say a point $x \in D$ is an extreme point of $D$ if there are no $\lambda_1, \lambda_2 > 0$ with $\lambda_1 + \lambda_2 = 1$ and distinct $x_1, x_2 \in D$ such that $\lambda_1 x_1 + \lambda_2 x_2 = x$. In other words, you cannot find two other points of $D$ whose convex combination gives you the point $x$. You can see this from the interval as well: I cannot write either endpoint as a convex combination of two points internal to the interval, whereas if I take a point in the interior, I can write it as a convex combination of the two endpoints, so it is not an extreme point. For example, in two dimensions, if I have a polygon, the corners of the polygon are all extreme points; if I have a disk, the entire boundary circle consists of extreme points.

So the solution to this optimization problem will be at one of the extreme points of this convex set. All we need to do is identify the extreme points, substitute them in, and pick the highest value we can get. So we need to understand what the extreme points of this particular convex set are. They are given by a theorem known as Birkhoff's theorem, or the Birkhoff–von Neumann theorem. This is a theorem I will not be proving in class — it would digress too much from what we want to do, plus we don't have time to prove it here — but basically what it says is that the set DS is the convex hull (don't worry if you don't know what a convex hull means) of the set of $n \times n$ permutation matrices.
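As an aside (not part of the lecture), because the objective is linear, this maximization over DS is exactly an assignment problem, and SciPy's `linear_sum_assignment` solves it directly. A minimal sketch, with arbitrary made-up eigenvalues:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(3)
n = 5

lam = rng.standard_normal(n) + 1j * rng.standard_normal(n)      # eigenvalues of A
lam_hat = rng.standard_normal(n) + 1j * rng.standard_normal(n)  # eigenvalues of A + E

# Objective coefficients: maximize sum_ij c_ij * 2*Re(conj(lam_i) * lam_hat_j)
G = 2 * np.real(np.conj(lam)[:, None] * lam_hat[None, :])

# The linear objective attains its maximum over DS at a permutation matrix,
# so it reduces to an assignment problem (negate to maximize).
row, col = linear_sum_assignment(-G)
best = G[row, col].sum()

# Any doubly stochastic C gives a value no larger than the permutation optimum.
P_list = [np.eye(n)[rng.permutation(n)] for _ in range(3)]
w = rng.dirichlet(np.ones(3))
C = sum(wi * Pi for wi, Pi in zip(w, P_list))  # convex combination of permutations
assert np.sum(C * G) <= best + 1e-10
```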
And the extreme points of DS are precisely these permutation matrices — that's the punch line. So basically, if $C$ is a doubly stochastic matrix, then there exists an $N < \infty$, non-negative weights $\alpha_1, \ldots, \alpha_N$ adding up to 1, and permutation matrices $P_1, \ldots, P_N$, such that $C = \alpha_1 P_1 + \cdots + \alpha_N P_N$. Any doubly stochastic matrix can be written as a convex combination of permutation matrices, and these permutation matrices are essentially the vertices of the convex set DS.

In fact, some side notes here. There is another famous theorem, Carathéodory's theorem, which says that you can choose $N$ to be at most $n^2 - 2n + 2$, that is, $(n-1)^2 + 1$. So while there exist $n!$ permutation matrices of size $n \times n$, we don't need all $n!$ of them to decompose a given doubly stochastic matrix; we can make do with at most $n^2 - 2n + 2$. Further, this decomposition is not unique; there are various ways to decompose. If I go back to the polygon example: if I take a point in the interior, I can obviously write it as a convex combination of the extreme points in multiple ways — for example, I can take one set of three corners and find a convex combination that gets me there, or potentially another set of three corners, and so on. There's no unique way to do it, but in this picture you can see that any point can be reached by a convex combination of three extreme points. So what happens when $n = 2$? Then $n^2 - 2n + 2 = 4 - 4 + 2 = 2$, so two points are enough — that's what Carathéodory's theorem says. Of course, this polygon is not the convex set corresponding to permutation matrices, but if you look at $2 \times 2$ doubly stochastic matrices, you can indeed write any one of them as a convex combination of at most two permutation matrices.

So then, what is a permutation matrix? It's basically a square binary matrix with exactly one entry equal to 1 in every row, exactly one entry equal to 1 in every column, and zeros everywhere else. We've also seen these matrices previously: we know that $PA$ permutes the rows of $A$ and $AP$ permutes the columns of $A$.

Okay, so now if we use Birkhoff's theorem, then there exists a permutation matrix $P$ that solves this optimization problem, because the solution to the problem is an extreme point, and the extreme points are all permutation matrices. For that permutation matrix, we go back to the earlier way of writing this expression and write it as $2 \operatorname{Re} \operatorname{tr}(P \hat\Lambda P^T \Lambda^H)$ — since $P$ consists of zeros and ones, you don't need a Hermitian transpose there. Now $P \hat\Lambda$ permutes the rows of $\hat\Lambda$ and $\hat\Lambda P^T$ permutes its columns, so $P \hat\Lambda P^T$ is again a diagonal matrix, with the diagonal entries permuted. If $P e_i = e_{\sigma(i)}$, then $\sigma$ represents the permutation: the index $i$ is getting mapped to the index $\sigma(i)$. That is what this permutation matrix is doing. So there are two different ways of writing a permutation: one is as a matrix, and the other is to specify the permutation function $\sigma$, which maps the indices $\{1, \ldots, n\}$ bijectively onto $\{1, \ldots, n\}$.
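As a side illustration (not from the lecture), here is a greedy sketch of the Birkhoff–von Neumann decomposition itself; `birkhoff_decompose` is a hypothetical helper, and the max-product matching via `-log` costs is one standard way to find a permutation supported on the positive entries (which Birkhoff's theorem guarantees exists).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def birkhoff_decompose(C, tol=1e-12):
    """Greedy Birkhoff--von Neumann decomposition of a doubly stochastic C.

    Returns weights alpha_k and permutations sigma_k (as index arrays) with
    C ~= sum_k alpha_k * P_{sigma_k}.  A sketch, not an optimized routine.
    """
    C = C.copy()
    weights, perms = [], []
    while C.max() > tol:
        # Permutation supported on positive entries of C (max-product matching)
        row, col = linear_sum_assignment(-np.log(np.maximum(C, tol)))
        alpha = C[row, col].min()
        weights.append(alpha)
        perms.append(col)
        C[row, col] -= alpha   # peel off alpha * P and repeat
    return weights, perms

# Usage: decompose a known convex combination of permutation matrices
rng = np.random.default_rng(4)
n = 4
P_list = [np.eye(n)[rng.permutation(n)] for _ in range(3)]
w = rng.dirichlet(np.ones(3))
C = sum(wi * Pi for wi, Pi in zip(w, P_list))

alphas, sigmas = birkhoff_decompose(C)
C_rec = sum(a * np.eye(n)[s] for a, s in zip(alphas, sigmas))
assert np.allclose(C_rec, C)
```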
Okay, so if we do this and substitute into the expression, it simplifies to $2 \sum_{i=1}^{n} \operatorname{Re}(\lambda_i^* \hat\lambda_{\sigma(i)})$: what we are left with is just products of $\lambda_i^*$ with permuted versions of $\hat\lambda$. Thus, what we have shown is that $\|E\|_F^2$ is lower bounded by $\sum_{i=1}^{n} |\hat\lambda_{\sigma(i)}|^2 + \sum_{i=1}^{n} |\lambda_i|^2 - 2 \sum_{i=1}^{n} \operatorname{Re}(\hat\lambda_{\sigma(i)} \lambda_i^*)$. (The first sum was $\sum_i |\hat\lambda_i|^2$ earlier, but since $\sigma$ is just a permutation, all the entries get included if I sum over $\sigma(i)$ instead of $i$, so this is the same.) And this expression is nothing but $\sum_{i=1}^{n} |\hat\lambda_{\sigma(i)} - \lambda_i|^2$.

Now this is beautiful, because I have now shown that this quantity — the sum of the squared differences between the eigenvalues of $A$ and the eigenvalues of $A + E$, the latter written in some other order $\sigma(i)$ — is at most the squared Frobenius norm of $E$. What we have shown is what is known as the Hoffman–Wielandt theorem: if $A$ and $E$ are $n \times n$ matrices with $A$ and $A + E$ both normal, and $\lambda_1, \ldots, \lambda_n$ and $\hat\lambda_1, \ldots, \hat\lambda_n$ are the eigenvalues of $A$ and $A + E$ respectively, in some order, then there exists a permutation $\sigma$ of the integers $1, \ldots, n$ such that $\sum_{i=1}^{n} |\hat\lambda_{\sigma(i)} - \lambda_i|^2 \le \|E\|_F^2$.

Basically, what this theorem shows is a strong, global stability of the set of eigenvalues of a normal matrix: strong because the bound depends only on $\|E\|_F^2$, and global because it doesn't matter which $E$ you pick — even if you choose it adversarially, this sum is at most $\|E\|_F^2$. So this is one more result we have about the perturbation of the eigenvalues of a matrix.
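As a quick numerical check (not from the lecture), here is a sketch verifying the Hoffman–Wielandt bound. The optimal matching $\sigma$ is found with SciPy's `linear_sum_assignment`, and circulant matrices are used because they, and sums of them, are always normal.

```python
import numpy as np
from scipy.linalg import circulant
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(5)
n = 6

# Circulant matrices are normal, and a sum of circulants is circulant,
# so A and A + E are both normal here.
A = circulant(rng.standard_normal(n))
E = circulant(0.1 * rng.standard_normal(n))

lam = np.linalg.eigvals(A)
lam_hat = np.linalg.eigvals(A + E)

# Best matching sigma: minimize sum_i |lam_hat_{sigma(i)} - lam_i|^2
cost = np.abs(lam_hat[None, :] - lam[:, None])**2
row, col = linear_sum_assignment(cost)

assert cost[row, col].sum() <= np.linalg.norm(E, 'fro')**2 + 1e-10
```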
Okay, so what if $A$ is not diagonalizable? We've seen that we can find a formula for the perturbation of algebraically simple eigenvalues. And we can also do one other thing: we can approximate the matrix $A$ arbitrarily closely by a diagonalizable matrix, and then everything we said about diagonalizable matrices applies, and we can then say something about how the eigenvalues of $A$ will get perturbed.

What we mean by approximating arbitrarily closely by a diagonalizable matrix is something we've already seen before, so I'll just restate it to recall: if $A$ is an $n \times n$ matrix and $\|\cdot\|$ is any norm, then given $\epsilon > 0$, there exists a matrix $A_1 \in \mathbb{C}^{n \times n}$ such that $A_1$ has $n$ distinct eigenvalues and $\|A - A_1\| \le \epsilon$. As a corollary, the set of diagonalizable matrices is dense in $\mathbb{C}^{n \times n}$: given any matrix $A \in \mathbb{C}^{n \times n}$, I can find a diagonalizable matrix arbitrarily close to it. This is useful because one useful way of approaching perturbation-related problems is to first solve for diagonal matrices, then solve for diagonalizable matrices, and then use this approximation and some limiting process to say something about what happens in the non-diagonalizable case.
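As a final aside (not from the lecture), the classic example of this approximation is perturbing the diagonal of a Jordan block; a minimal sketch, with the perturbation size $\epsilon$ an arbitrary choice:

```python
import numpy as np

n, eps = 4, 1e-2   # eps kept modest so floating point resolves the eigenvalues

# A Jordan block: not diagonalizable, single eigenvalue 0 of multiplicity n
A = np.diag(np.ones(n - 1), k=1)

# Perturb the diagonal with distinct tiny values: A1 is upper triangular with
# n distinct diagonal entries, hence n distinct eigenvalues -> diagonalizable
A1 = A + np.diag(eps * np.arange(1, n + 1))

assert np.linalg.norm(A - A1, 2) <= n * eps + 1e-12  # A1 is within n*eps of A
print(np.linalg.eigvals(A1))  # n distinct eigenvalues near eps, 2*eps, ...
```

(The modest $\epsilon$ matters: defective eigenvalues like those of a Jordan block are themselves very ill-conditioned, so a much smaller perturbation would be swamped by the eigensolver's own rounding — which is, fittingly, exactly the sensitivity phenomenon this lecture is about.)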