Another very important use of linear algebra is in solving systems of linear equations. Suppose we are trying to solve the system Ax = b; let us consider a square system for now (the more general case will be discussed later). Due to errors, we end up solving a nearby system, say (A + E)x̂ = b. You can think of it as your knowledge of A being a little noisy: you have A + E in hand, you solve (A + E)x̂ = b, and you obtain a solution x̂. The question is: what can we say about the error x − x̂? Once again, if the entries of E are small enough that the spectral radius of A⁻¹E is less than 1, we can follow an approach quite similar to what we just did. Since x = A⁻¹b and x̂ = (A + E)⁻¹b, we can write

x − x̂ = A⁻¹b − (A + E)⁻¹b = [A⁻¹ − (A + E)⁻¹] b.

This matrix difference has exactly the same form as what we saw when computing the errors in inverses, so we can use exactly the same expansion:

x − x̂ = Σ_{k=1}^∞ (−1)^{k+1} (A⁻¹E)^k A⁻¹ b = Σ_{k=1}^∞ (−1)^{k+1} (A⁻¹E)^k x,

where the last step uses A⁻¹b = x. So apart from the extra factor of b, which we do not need to track separately, this is the same series as before. To proceed further, I want to bound, for example, ‖x − x̂‖ in terms of ‖x‖, so I need a way of connecting the norm of the vector x − x̂ to the norm of the vector x, even though there is multiplication by matrices in between. For that we need another notion, the notion of compatible norms. Here is the definition.
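As a quick numerical sanity check of this series (a sketch with a made-up 3×3 system, not an example from the lecture), we can compare the partial sums of Σ_{k≥1} (−1)^{k+1}(A⁻¹E)^k x against the actual error x − x̂:

```python
import numpy as np

# Made-up well-conditioned system (assumed example): A x = b is the true
# system, (A + E) x_hat = b is the perturbed one we actually solve.
rng = np.random.default_rng(0)
A = np.eye(3) + 0.1 * rng.standard_normal((3, 3))
E = 1e-3 * rng.standard_normal((3, 3))
b = rng.standard_normal(3)

x = np.linalg.solve(A, b)          # true solution
x_hat = np.linalg.solve(A + E, b)  # perturbed solution

M = np.linalg.solve(A, E)          # M = A^{-1} E
assert max(abs(np.linalg.eigvals(M))) < 1  # spectral-radius hypothesis

# Partial sums of sum_{k=1}^{K} (-1)^(k+1) (A^{-1} E)^k x
series = np.zeros(3)
term = x.copy()
for k in range(1, 30):
    term = M @ term                # now equals (A^{-1} E)^k x
    series += (-1) ** (k + 1) * term

assert np.allclose(series, x - x_hat)  # the series reproduces the error
```

Since ρ(A⁻¹E) is tiny here, the series converges geometrically and a handful of terms already matches x − x̂ to machine precision.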
A vector norm on ℂⁿ is said to be compatible with a matrix norm on ℂ^{n×n} if, for every A ∈ ℂ^{n×n} and every x ∈ ℂⁿ, ‖Ax‖ ≤ ‖A‖ ‖x‖, where Ax is a vector. This is a sub-multiplicativity-like property mixing matrix and vector norms: earlier we had a matrix product bounded by the matrix norm of A times the matrix norm of B, but now the norm of a vector Ax is bounded by the matrix norm of A times the vector norm of x, and this should hold for all x ∈ ℂⁿ. So the question is: are there such things as compatible norms? It turns out that any induced norm is compatible with the vector norm that induced it; this is a small exercise you can show. But we have the following more general theorem: if ‖·‖ is a matrix norm on ℂ^{n×n}, then there is some vector norm that is compatible with it. How do you show this? This is what I consider to be a rather clever proof. The very first step is to define the vector norm in terms of the matrix norm, and we will show that this particular definition is compatible with the given matrix norm; without guessing this first step, it is a little hard to show the result. Namely, append n − 1 zero columns to the vector x to get an n × n matrix [x 0 ⋯ 0], and define ‖x‖ to be the matrix norm of that matrix. Whether this really is a vector norm is something you should verify: it inherits the properties of the matrix norm, and all you need to do is show that positivity, homogeneity, and the triangle inequality follow from the corresponding properties of the matrix norm.
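For concreteness (an assumed example, not from the lecture), the induced 1-, 2-, and ∞-norms are each compatible with the vector norm that induced them, which we can spot-check numerically:

```python
import numpy as np

# Spot-check of compatibility ||A x||_p <= ||A||_p ||x||_p for the induced
# p-norms, on made-up data. numpy's matrix norms with ord in {1, 2, inf}
# are exactly the induced (operator) norms.
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
x = rng.standard_normal(4)

for p in (1, 2, np.inf):
    lhs = np.linalg.norm(A @ x, p)                      # vector norm of A x
    rhs = np.linalg.norm(A, p) * np.linalg.norm(x, p)   # ||A|| * ||x||
    assert lhs <= rhs + 1e-12  # small slack for floating point
```

This checks the inequality for one random pair (A, x); the actual exercise is to prove it holds for all A and x directly from the definition of an induced norm.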
Then consider ‖Ax‖, the norm of the vector Ax under this definition. This equals the matrix norm of the matrix whose first column is Ax and whose other columns are 0; because those columns are 0, I can pull out the A and write this as the norm of A times [x 0 ⋯ 0]. The next steps are immediate: use sub-multiplicativity of the matrix norm to bound this by ‖A‖ times the norm of the matrix whose first column is x and whose other columns are 0, which by definition is ‖A‖ ‖x‖. So there is always a vector norm compatible with a given matrix norm. What I said earlier, that if the matrix norm is an induced norm then the vector norm that induced it is compatible with it, is something else I will leave for you to show. Based on this, let us continue from where we were: we had

x − x̂ = Σ_{k=1}^∞ (−1)^{k+1} (A⁻¹E)^k x,

using A⁻¹b = x. Now take norms on both sides. The compatible norm satisfies the sub-multiplicativity-type property (a sub-multiplicativity of a matrix norm with a vector norm), and then I further use sub-multiplicativity of the matrix norm to simplify ‖(A⁻¹E)^k‖ ≤ ‖A⁻¹E‖^k; since |(−1)^{k+1}| = 1, the signs drop out, giving

‖x − x̂‖ ≤ Σ_{k=1}^∞ ‖A⁻¹E‖^k ‖x‖ = [‖A⁻¹E‖ / (1 − ‖A⁻¹E‖)] ‖x‖,

provided ‖A⁻¹E‖ < 1, so that the geometric series converges.
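The construction in the proof can also be spot-checked. Below is a small sketch using the Frobenius norm as the sub-multiplicative matrix norm (an assumed choice for illustration): embed x as the first column of an n × n zero matrix, define the vector norm as the matrix norm of that embedding, and verify compatibility:

```python
import numpy as np

def vec_norm(x):
    """Vector norm from the proof: matrix norm of [x 0 ... 0]."""
    M = np.zeros((x.size, x.size))
    M[:, 0] = x                       # first column is x, rest are zero
    return np.linalg.norm(M, 'fro')   # the chosen matrix norm (assumed: Frobenius)

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))
x = rng.standard_normal(5)

# Compatibility ||A x|| <= ||A|| ||x|| for the constructed vector norm
assert vec_norm(A @ x) <= np.linalg.norm(A, 'fro') * vec_norm(x) + 1e-12
```

(For the Frobenius norm this constructed vector norm reduces to the Euclidean norm of x, since only the first column is nonzero; the point of the proof is that the same construction works for any matrix norm.)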
Taking this ‖x‖ to the other side, we have ‖x − x̂‖ / ‖x‖ ≤ ‖A⁻¹E‖ / (1 − ‖A⁻¹E‖). We can then use exactly the same argument as in the earlier A⁻¹ computation, namely ‖A⁻¹E‖ ≤ ‖A⁻¹‖‖E‖ = κ(A) ‖E‖/‖A‖, to further bound this as

‖x − x̂‖ / ‖x‖ ≤ [κ(A) / (1 − κ(A) ‖E‖/‖A‖)] · ‖E‖/‖A‖,

which is valid if ‖A⁻¹‖ ‖E‖ < 1. The other point is that this requires compatible norms: you cannot use an arbitrary norm on the left and expect it to be bounded by this quantity with some other norm on the right; the norms on the left- and right-hand sides must be compatible. So far we solved (A + E)x̂ = b, which represents not having exact knowledge of A; but there could also be errors in measuring the right-hand side b. If there is an error in b as well, write the perturbed system as (A + E)x̂ = b + e for some error vector e. Using the same procedure as before,

‖x − x̂‖ / ‖x‖ ≤ [κ(A) / (1 − κ(A) ‖E‖/‖A‖)] · ‖E‖/‖A‖ + [κ(A) / (1 − κ(A) ‖E‖/‖A‖)] · ‖e‖/‖b‖.

Note that κ(A) appears in both terms, that this is true if ‖A⁻¹‖ ‖E‖ < 1 (we made no new assumptions), and that the matrix norm must be compatible with the vector norm. The punch line is that the left-hand side is the relative error in the solution, and it is bounded by the sum of two terms: the relative error ‖E‖/‖A‖ in the matrix A, scaled by the factor involving κ(A), plus the relative error ‖e‖/‖b‖ in the right-hand side, again multiplied by the same factor involving κ(A), the condition number of A.
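Here is a hedged numerical check of this combined bound (with made-up A, E, b, e, and the 2-norm, which is induced and hence compatible with the vector 2-norm):

```python
import numpy as np

# Made-up well-conditioned system with small errors in both A and b.
rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4)) + 4 * np.eye(4)
b = rng.standard_normal(4)
E = 1e-4 * rng.standard_normal((4, 4))   # error in A
e = 1e-4 * rng.standard_normal(4)        # error in b

x = np.linalg.solve(A, b)                # true solution of A x = b
x_hat = np.linalg.solve(A + E, b + e)    # solution of (A+E) x_hat = b+e

kappa = np.linalg.cond(A, 2)             # condition number in the 2-norm
rel_E = np.linalg.norm(E, 2) / np.linalg.norm(A, 2)
rel_e = np.linalg.norm(e) / np.linalg.norm(b)
assert kappa * rel_E < 1                 # hypothesis of the bound

bound = kappa / (1 - kappa * rel_E) * (rel_E + rel_e)
rel_err = np.linalg.norm(x - x_hat) / np.linalg.norm(x)
assert rel_err <= bound                  # the relative-error bound holds
```

For this well-conditioned A the bound is loose but safe; the interesting regime, discussed below, is when κ(A) is large.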
For the sake of completeness, we will just write that this is the relative error due to the error in A plus the relative error due to the error in b. Notice that these bounds do not directly involve the exact solution you found: they depend on the actual errors incurred, or at least the norms of those errors, but they have no direct dependence on x̂ itself. There is a slightly different viewpoint we can consider. Basically, we wanted to solve Ax = b, but we computed some x̂ that is only an approximation to x, i.e. Ax̂ ≠ b: we solved some nearby system, or used some procedure, and got hold of an x̂. Define the residual r := b − Ax̂; this is a nonzero quantity unless x̂ exactly solves Ax = b. We then ask: what can we say about how close x̂ is to x by looking at r? If I compute A⁻¹r, I get A⁻¹(b − Ax̂) = A⁻¹b − A⁻¹Ax̂ = x − x̂, since A⁻¹b = x. So the error in x is related to the residual r through pre-multiplication by the matrix A⁻¹, and hence ‖A⁻¹r‖ = ‖x − x̂‖. Moreover, if the matrix norm is compatible with the vector norm, then ‖b‖ = ‖Ax‖ ≤ ‖A‖ ‖x‖, which means that the quantity ‖A‖ ‖x‖ / ‖b‖ is greater than or equal to 1 whenever b ≠ 0.
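Both identities are easy to check numerically (made-up data again, with 2-norms, which form a compatible pair):

```python
import numpy as np

# Check that A^{-1} r = x - x_hat for the residual r = b - A x_hat,
# and that ||A|| ||x|| / ||b|| >= 1 when b != 0.
rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4)) + 3 * np.eye(4)
b = rng.standard_normal(4)

x = np.linalg.solve(A, b)                   # true solution
x_hat = x + 1e-5 * rng.standard_normal(4)   # some approximate solution
r = b - A @ x_hat                           # residual

assert np.allclose(np.linalg.solve(A, r), x - x_hat)  # A^{-1} r = x - x_hat
assert np.linalg.norm(A, 2) * np.linalg.norm(x) / np.linalg.norm(b) >= 1
```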
So as long as you are not solving a homogeneous system of linear equations, i.e. b ≠ 0, we have this bound, and because of it,

‖x − x̂‖ = ‖A⁻¹r‖ ≤ ‖A⁻¹‖ ‖r‖ ≤ (‖A‖ ‖x‖ / ‖b‖) ‖A⁻¹‖ ‖r‖ = κ(A) (‖r‖ / ‖b‖) ‖x‖.

(A question from the class: shouldn't ‖x − x̂‖ be equal to ‖A⁻¹r‖? Correct, they are equal; I wrote the equality above, but just for the sake of writing the whole chain as inequalities I loosen it a bit to ≤.) So if b ≠ 0, then the computed solution x̂, which satisfies Ax̂ = b − r where r is the residual, and the true solution x, which satisfies Ax = b, satisfy

‖x − x̂‖ / ‖x‖ ≤ κ(A) ‖r‖ / ‖b‖.

The only thing to keep in mind is that here we did not use any assumption like ‖A⁻¹E‖ < 1; this is a different formulation with no error matrix E, and all we used is that the matrix norm used to compute κ(A) is compatible with the vector norm. But again it brings out the fact that well-conditioned matrices are good matrices: the amplification of the relative residual ‖r‖/‖b‖ is going to be small. If the matrix is ill-conditioned, however, the relative error in x̂ can be much larger than the relative residual, so you may think that you have solved the system very accurately because Ax̂ is very close to b, but the
error in x̂ itself could be much larger than that if the matrix is poorly conditioned. There is just one more way to analyze the sensitivity of linear systems; let me see if I can cover that today, or at least start it and give you an indication of how it works. We consider a perturbed system: let ε > 0 be some small number, and let

(A + ε A_δ) x(ε) = b + ε b_δ,

where A_δ and b_δ are some fixed matrix and vector. That is, suppose the A in our hands is A + ε A_δ and the b in our hands is b + ε b_δ; corresponding to this system of linear equations we compute a solution, which we call x(ε). We can now look at how close x(ε) is to x as ε becomes smaller and smaller, that is, as we perturb the matrix by a smaller coefficient times A_δ and perturb the observations b by a smaller and smaller coefficient times b_δ. We want to know whether very small perturbations can lead to large errors in x, i.e. whether x(ε) − x can still be a big quantity. Since ε is a scalar appearing throughout, we can differentiate the equation with respect to ε. Differentiating a vector with respect to a scalar is the same as differentiating each entry of the vector with respect to that scalar, and applying the product rule on the left-hand side gives (A + ε A_δ) ẋ(ε) + A_δ x(ε), where ẋ(ε) is the derivative of the solution as a function of ε. On the right-hand side we get b_δ: for example, the i-th component is b_i + ε (b_δ)_i, and differentiating with respect to ε leaves (b_δ)_i, so the whole vector on the right-hand side is just b_δ. Therefore

(A + ε A_δ) ẋ(ε) + A_δ x(ε) = b_δ,

and setting ε = 0, using x(0) = x (the true solution),

ẋ(0) = A⁻¹ (b_δ − A_δ x).

We can use this derivative to expand x(ε) in a Taylor series around 0:

x(ε) = x(0) + ε ẋ(0) + O(ε²),

so that x(ε) − x(0) = x(ε) − x = ε A⁻¹ (b_δ − A_δ x) + O(ε²). Now take norms. The O(ε²) term is a vector whose entries all scale down as ε², and the norm of such a vector is also O(ε²), so by the triangle inequality

‖x(ε) − x‖ ≤ ε ‖A⁻¹ (b_δ − A_δ x)‖ + O(ε²).

Again consider compatible norms, so that ‖A⁻¹ (b_δ − A_δ x)‖ ≤ ‖A⁻¹‖ ‖b_δ − A_δ x‖; then use the triangle inequality (taking the norm inside the difference turns the minus into a plus, since |−1| = 1) and compatibility once more on A_δ x:

‖x(ε) − x‖ ≤ ε ‖A⁻¹‖ (‖b_δ‖ + ‖A_δ‖ ‖x‖) + O(ε²).

As usual, I can multiply and divide by ‖A‖, and divide both sides by ‖x‖ (neither of which changes the ε-dependence):

‖x(ε) − x‖ / ‖x‖ ≤ ε κ(A) [‖b_δ‖ / (‖A‖ ‖x‖) + ‖A_δ‖ / ‖A‖] + O(ε²).

Finally, use the compatible-norm fact one more time: Ax = b implies ‖b‖ ≤ ‖A‖ ‖x‖, so ‖A‖ ‖x‖ is the bigger number, and replacing the denominator with ‖b‖ only increases the quantity, leaving the bound unaffected:

‖x(ε) − x‖ / ‖x‖ ≤ ε κ(A) [‖b_δ‖ / ‖b‖ + ‖A_δ‖ / ‖A‖] + O(ε²).

This is nice: ‖b_δ‖/‖b‖ is the relative error in b, ‖A_δ‖/‖A‖ is the relative error in A, and κ(A) is the condition number, so the bound shows how the relative error in the solution depends on ε times the condition number times the sum of the relative errors in b and in A. We will stop here for today and continue on Wednesday.
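To close, here is a numerical sketch of this perturbation analysis (A, b, A_δ, b_δ below are made-up): we check that (x(ε) − x)/ε approaches ẋ(0) = A⁻¹(b_δ − A_δ x), and that the first-order relative-error bound holds for small ε:

```python
import numpy as np

# Made-up perturbed system (A + eps*A_d) x(eps) = b + eps*b_d.
rng = np.random.default_rng(5)
n = 4
A = rng.standard_normal((n, n)) + 3 * np.eye(n)
b = rng.standard_normal(n)
A_d = rng.standard_normal((n, n))
b_d = rng.standard_normal(n)

x = np.linalg.solve(A, b)                    # unperturbed solution x(0)
xdot0 = np.linalg.solve(A, b_d - A_d @ x)    # derivative xdot(0) = A^{-1}(b_d - A_d x)

eps = 1e-6
x_eps = np.linalg.solve(A + eps * A_d, b + eps * b_d)

# Finite-difference check of the derivative: (x(eps) - x)/eps ~ xdot(0) + O(eps)
assert np.allclose((x_eps - x) / eps, xdot0, atol=1e-4)

# First-order bound with 2-norms (a compatible pair); the tiny slack
# accounts for the O(eps^2) remainder.
kappa = np.linalg.cond(A, 2)
bound = eps * kappa * (np.linalg.norm(b_d) / np.linalg.norm(b)
                       + np.linalg.norm(A_d, 2) / np.linalg.norm(A, 2))
rel_err = np.linalg.norm(x_eps - x) / np.linalg.norm(x)
assert rel_err <= bound + 1e-9
```

Shrinking eps further makes the finite-difference ratio match ẋ(0) ever more closely, which is exactly the first-order picture derived above.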