So, in the last class we were looking at matrix conditioning: how to classify whether a system of linear algebraic equations is a well posed problem or an ill posed problem. This boils down to looking at the matrix A, and we will come to that. I was just giving you a motivation for looking at this problem. I showed you a simple system in which reordering the calculations can change the results, and another system, inherently ill conditioned, in which a small error on the right hand side changes the solution drastically. So the idea is to analyze errors of this type. We are looking at solutions of Ax = b, and as I told you, when you solve a problem using a computer you invariably never solve this original problem. You always end up solving a perturbed problem (A + ΔA)(x + Δx) = b + Δb. We can never solve the original problem, except for some very, very simple systems. Let me give you a simple example: a small 2 x 2 system whose coefficients involve π and e. I cannot represent π exactly, and I cannot represent e exactly. For this particular problem you can actually find the exact solution analytically: you compute the determinant, then the cofactor (adjoint) matrix, and the inverse is the adjoint divided by the determinant; multiplying the inverse by the right hand side gives the true solution x1, x2. But that solution, and even the original problem, can never be represented exactly in a computer, because π is not a rational number, and even if you write down a rational approximation, errors will creep in because of the finite precision used in computing. So the analytical expression is what I would call the true solution; what you get in the computer is an approximate solution of this problem.
And now the real worry is: how bad is the approximation? When can computing go wrong? I will give you an example now which is a little more involved; it looks like a very simple matrix, and I have given it in the notes. The reason for giving all these examples is that unless the motivation is clear, the raw theory does not make much sense. This is my A matrix:

A = [ 10   7   8   7
       7   5   6   5
       8   6  10   9
       7   5   9  10 ]

I choose x = (1, 1, 1, 1), and with this choice I get b = (32, 23, 33, 31). So this is the A matrix, this is x, and this is the right hand side. The problem is posed so that if I ask you to solve for x, you should get exactly (1, 1, 1, 1). Now what happens if I perturb this matrix a little bit? To A I add a perturbation ΔA:

ΔA = [  0      0     0.1   0.2
        0.08   0.04  0     0
        0     -0.02 -0.11  0
       -0.01  -0.01  0    -0.02 ]

Instead of solving with A, I am going to solve with A + ΔA, keeping the right hand side b the same. I do not know in advance what x will be. Notice that ΔA contains very small perturbations compared to the elements 10, 7, 8 of A: the perturbations are of the order 0.01 to 0.2. You might imagine that these kinds of small errors would not change the solution drastically; you would expect x + Δx to differ from x only a little. That does not happen here. If you solve the perturbed problem, the solution turns out to be x + Δx = (-81, 137, -34, 22). With such a small perturbation in the A matrix, my x changes from (1, 1, 1, 1) to this vector.
A tiny error in the A matrix can cause the solution to change so drastically that you cannot even recognize it. There is something fundamentally wrong with that matrix: you make a small error in the representation, and the solution is substantially different; it does not even resemble the original x. Just compare: x was (1, 1, 1, 1), and x + Δx is (-81, 137, -34, 22). These two are significantly different vectors. Now let me do something different. Instead of perturbing A, I will solve A(x + Δx) = b + Δb; that is, I introduce a slight perturbation on the right hand side while the A matrix is represented exactly. The perturbed right hand side is b + Δb = (31.99, 23.01, 32.99, 31.02). Look at the original b vector: it was (32, 23, 33, 31). I have perturbed the first entry by -0.01, the second by +0.01, the third by -0.01 and the fourth by +0.02. Very, very small perturbations on the right hand side. How does x change? With this perturbation, x + Δx becomes (0.12, 2.46, 0.62, 1.23). Look at this solution: where is (1, 1, 1, 1) and where is this? With a slight perturbation on the right hand side, or a slight perturbation in the A matrix, the solution changes so drastically that there seems to be something funny about this matrix. You change something a little bit, you make one small error, and your calculations go everywhere. You would expect a small error committed in A or in b to result in a small error in the solution x. That is not happening: this particular matrix blows up even a tiny error in the representation.
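The lecture works these two perturbation experiments in MATLAB; here is an equivalent NumPy sketch. The matrix and perturbations are the ones quoted above (this is Wilson's classic ill-conditioned example, which the lecture's numbers match):

```python
import numpy as np

# Wilson's classic ill-conditioned 4x4 example, as used in the lecture.
A = np.array([[10., 7., 8., 7.],
              [ 7., 5., 6., 5.],
              [ 8., 6., 10., 9.],
              [ 7., 5., 9., 10.]])
b = np.array([32., 23., 33., 31.])

x = np.linalg.solve(A, b)           # exact solution is (1, 1, 1, 1)

# Experiment 1: perturb A only (entries of order 0.01 to 0.2)
dA = np.array([[ 0.  ,  0.  ,  0.1 ,  0.2 ],
               [ 0.08,  0.04,  0.  ,  0.  ],
               [ 0.  , -0.02, -0.11,  0.  ],
               [-0.01, -0.01,  0.  , -0.02]])
x_dA = np.linalg.solve(A + dA, b)   # jumps to (-81, 137, -34, 22)

# Experiment 2: perturb b only, by at most 0.02 per entry
db = np.array([-0.01, 0.01, -0.01, 0.02])
x_db = np.linalg.solve(A, b + db)   # jumps to (0.12, 2.46, 0.62, 1.23)

print(np.round(x, 2), np.round(x_dA, 2), np.round(x_db, 2))
```

Running this reproduces both drastic jumps in the solution with tiny input perturbations.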
This is the background and the motivation. When you solve partial differential equations or boundary value problems, you get large matrices, depending upon how you discretize, how you create grids, the number of collocation points, or whatever method you use, least squares or anything else. You have a large number of points, and you make small errors there. If those matrices are ill conditioned, you can get answers which are wrong even when you have a very good program. This has nothing to do with how well your program is written; even with the best numerical package, you can end up getting absurd answers. So now I want to come up with an analytical tool by which I can say which matrix is good and which matrix is bad, and that is the theme of this particular lecture. I think with these numerical examples you at least have the motivation for looking at this problem. I am going to work through two special cases of the derivation, because they give insight. It is possible to do a more general derivation, but I think the specific derivations give more insight. So let me look at the first case. I want to solve Ax = b; call this equation (1). I was able to represent A perfectly; there was no problem there. But some error was committed on the right hand side: b was represented wrongly, and that is why the solution became x + Δx. So x is the true solution, A is the true A matrix, b is the true b vector, and what you ended up solving in the computer is A(x + Δx) = b + Δb; call this equation (2). Expanding (2), Ax + A Δx = b + Δb, and if I subtract (1) from (2), I get A Δx = Δb. Let us assume for the time being that, of course, when you are solving Ax = b, A is invertible.
If A is not invertible, you are solving a problem which either does not have a solution or has multiple solutions; it does not have a unique solution. Let us assume we have a problem where the solution is unique, that is, A is invertible. Mind you, I am not talking about a singular system; do not confuse singularity with ill conditioning. A singular system may not have a solution, or it may have multiple solutions, depending upon whether b belongs to the column space of A, what the null space of A is, and so on. That is a different story; this is not singularity, this is something different. So now Δx = A⁻¹ Δb, and using the properties of the induced matrix norm, ||Δx|| = ||A⁻¹ Δb|| <= ||A⁻¹|| ||Δb||. This means the change in the solution due to a change in the right hand side satisfies ||Δx|| / ||Δb|| <= ||A⁻¹||. But in general Δx and Δb could be very, very small numbers, and a ratio of absolute quantities does not always help you quantify things. We need to talk about the relative change: I would like to relate ||Δx|| / ||x|| to ||Δb|| / ||b||, the percentage error with respect to the original solution. So this inequality is not sufficient; I need something more. So I go back to the first equation, Ax = b. Taking norms, ||b|| = ||Ax|| <= ||A|| ||x||. This is the fundamental inequality which comes from the definition of the induced matrix norm.
So this particular inequality tells me that ||b|| / ||x|| is always bounded by ||A||; equivalently, 1 / ||x|| <= ||A|| / ||b||. Now I am going to combine this with the earlier inequality. Both sides are positive numbers, ratios of norms. Let us call ||Δx|| <= ||A⁻¹|| ||Δb|| result (3) and ||b|| <= ||A|| ||x|| result (4). If I combine (3) and (4), multiplying the left hand sides together and the right hand sides together, two positive numbers each less than two other positive numbers, then after a little rearrangement I get this fundamental inequality:

||Δx|| / ||x|| <= ||A|| ||A⁻¹|| * ||Δb|| / ||b||

Is everyone with me on this? I am looking at a sensitivity: the relative change in the solution due to a relative change in the right hand side is bounded by this number. What does it mean? It means the maximum possible ratio of the relative change in x to the relative change in b is ||A|| ||A⁻¹||. What do you see on the right hand side? Neither x appears nor b appears; only the matrix A appears. So if you have a slight error in the representation of b, what is the maximum possible fractional error in the solution? ||Δx|| / ||x|| is something like a fractional error, and it is bounded fundamentally by the product of two quantities, ||A|| and ||A⁻¹||. This product ||A|| ||A⁻¹|| is called the condition number of the matrix. How do you evaluate it?
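As a sanity check on this inequality, here is a small NumPy computation (my own illustrative sketch, reusing the 4 x 4 example from earlier in the lecture). The actual relative change in x never exceeds cond(A) times the relative change in b, and for this particular perturbation the bound is in fact fairly tight:

```python
import numpy as np

# Check ||dx||/||x|| <= cond(A) * ||db||/||b|| on the lecture's 4x4 example.
A = np.array([[10., 7., 8., 7.],
              [ 7., 5., 6., 5.],
              [ 8., 6., 10., 9.],
              [ 7., 5., 9., 10.]])
b = np.array([32., 23., 33., 31.])
db = np.array([-0.01, 0.01, -0.01, 0.02])

x  = np.linalg.solve(A, b)
dx = np.linalg.solve(A, b + db) - x

lhs   = np.linalg.norm(dx) / np.linalg.norm(x)            # actual relative change
bound = np.linalg.cond(A) * np.linalg.norm(db) / np.linalg.norm(b)
print(lhs, bound)   # lhs is large for so small a db, but stays below the bound
```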
You could use the 1-norm, 2-norm or infinity norm, whatever is convenient, and compute this quantity with it. Computing it with the 1-norm or infinity norm has a practical problem: it requires the computation of A⁻¹, and many times A⁻¹ is not easy to compute. The 2-norm happens to be convenient, and I will show you the way of computing the condition number using the 2-norm; that is what is used very often. But it does not mean you cannot use the other norms; you can of course use the other definitions. So this condition number, in some sense, gives a bound on the amplification of the error in the solution due to an error committed on the right hand side. This error Δb could be committed for a variety of reasons: it could be because of representation, or it could arise while doing some computations. I will show you that in another context the same number appears again when you perturb the matrix, so there seems to be something fundamental about it. Now let me analyze the other case. I want to solve Ax = b; this is my equation (1). But I end up solving (A + ΔA)(x + Δx) = b; A is represented with an error, while the right hand side is exact. If I expand this, it becomes Ax + A Δx + ΔA x + ΔA Δx = b; call this equation (5). The idea here is to give you the spirit of what is happening. As I said, it is possible to do a general derivation with (A + ΔA)(x + Δx) = b + Δb; I am avoiding that and just looking at the two special cases.
So if I subtract equation (1) from equation (5), b on both sides disappears, and Ax cancels with Ax. What remains is A Δx = -ΔA (x + Δx); this is the perturbation equation. All I have done is expand and subtract. This implies Δx = -A⁻¹ ΔA (x + Δx), and taking norms and using the fundamental inequality of induced matrix norms twice, ||Δx|| <= ||A⁻¹|| ||ΔA|| ||x + Δx||. I am going to rewrite this as

||Δx|| / ||x + Δx|| <= ||A⁻¹|| ||ΔA||

Is everyone with me on this? The left hand side is something like the relative change in the solution, except that here the denominator is ||x + Δx|| rather than ||x||. Now I am going to play a trick on the right hand side: I multiply and divide by ||A||, writing the right hand side as ||A|| ||A⁻¹|| * ||ΔA|| / ||A||. Just check this. Then, singling out that product, I can rearrange the inequality as

||Δx|| / ||x + Δx|| <= ||A|| ||A⁻¹|| * ||ΔA|| / ||A||

The relative change in the solution due to a relative change in the A matrix is again bounded by ||A|| ||A⁻¹||. The same condition number reappears when you analyze a system in which A is represented slightly erroneously.
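This second bound can be checked numerically as well; a sketch of my own, again on the lecture's 4 x 4 example with its ΔA perturbation, using 2-norms throughout:

```python
import numpy as np

# Check ||dx|| / ||x + dx|| <= cond(A) * ||dA|| / ||A|| on the 4x4 example.
A = np.array([[10., 7., 8., 7.],
              [ 7., 5., 6., 5.],
              [ 8., 6., 10., 9.],
              [ 7., 5., 9., 10.]])
dA = np.array([[ 0.  ,  0.  ,  0.1 ,  0.2 ],
               [ 0.08,  0.04,  0.  ,  0.  ],
               [ 0.  , -0.02, -0.11,  0.  ],
               [-0.01, -0.01,  0.  , -0.02]])
b = np.array([32., 23., 33., 31.])

x      = np.linalg.solve(A, b)
x_pert = np.linalg.solve(A + dA, b)
dx     = x_pert - x

lhs   = np.linalg.norm(dx) / np.linalg.norm(x_pert)
bound = np.linalg.cond(A) * np.linalg.norm(dA, 2) / np.linalg.norm(A, 2)
print(lhs, bound)   # lhs is close to 1 here, and well below the bound
```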
The general case, where there are errors in both A and b, becomes a little more complex; you can look it up in textbooks on numerical analysis. What is important is that this condition number seems to play the key role in how sensitive your solution is to errors. What are we looking at here? The fractional change in the solution due to a fractional change in the A matrix in one case, and due to a fractional change in the b vector in the other. While deriving the first result we assumed that A is perfectly represented and b is in error; in the second, that b is perfectly represented and A is in error. So you get insight into what the key factor really is. This analysis based on norms suggests that this quantity is fundamental, and that it determines how well or ill conditioned a particular system is. To give you a thumb rule: what counts as large? Think of it a little like Reynolds number regimes. If the condition number is between 1 and 100, the matrix is well conditioned. Between 100 and 1000 is a gray area; you do not know. Beyond 1000 you can expect trouble, and at 10^5 or 10^6, deep trouble. Nevertheless, MATLAB is a wonderful piece of software; it can give you reasonably accurate solutions even for condition numbers of the order of 10^5 or 10^6. But MATLAB starts breaking down if you give it a matrix whose condition number is very, very large. Now, how do you get more insight into the condition number? We will use the 2-norm to compute it, so I want an estimate of this quantity using 2-norms. We have seen earlier that the 2-norm of a matrix is given by the square root of the largest magnitude eigenvalue of AᵀA.
So this we have done earlier: ||A||₂² = max over i of λi(AᵀA). AᵀA is always a positive definite (or positive semidefinite) matrix, so all its eigenvalues are greater than or equal to zero; with non negative eigenvalues you can always find a maximum. Let us say we have numbered the eigenvalues so that λn is the largest magnitude eigenvalue of AᵀA. Now, it is very easy to compute the 2-norm of A⁻¹ without having to compute A⁻¹ itself, which is very nice. You use this fundamental relationship: for a non singular matrix, is there a relationship between its eigenvalues and the eigenvalues of its inverse? If B is an invertible matrix, it has no zero eigenvalue. (Let us use B here to avoid confusing it with the A of Ax = b.) Suppose Bv = λv, where λ is an eigenvalue and v the corresponding eigenvector. It is very easy to show what happens: multiply both sides by B⁻¹ to get v = λ B⁻¹ v, that is, B⁻¹ v = (1/λ) v. So if λ is an eigenvalue of B, then 1/λ is an eigenvalue of B⁻¹, and the eigenvectors are the same; the eigenvectors do not change. There is one more relationship I need to use. Do you agree that A⁻¹ A = I? Take the transpose of both sides: (A⁻¹ A)ᵀ = Iᵀ, which gives Aᵀ (A⁻¹)ᵀ = I. Is everyone with me on this? Identity equals identity; that is all I am writing.
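Both properties are easy to verify numerically. Here is a small NumPy check of my own, with a randomly generated positive definite B standing in for a generic invertible matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
B = B @ B.T + 4.0 * np.eye(4)      # symmetric positive definite, hence invertible

lam, V = np.linalg.eigh(B)                       # eigenvalues (ascending) and eigenvectors
lam_inv = np.linalg.eigvalsh(np.linalg.inv(B))   # eigenvalues of B^{-1}

# eigenvalues of B^{-1} are the reciprocals of those of B ...
assert np.allclose(np.sort(lam_inv), np.sort(1.0 / lam))
# ... and the eigenvectors are unchanged: B^{-1} v = (1/lambda) v
for j in range(4):
    v = V[:, j]
    assert np.allclose(np.linalg.inv(B) @ v, v / lam[j])

# transpose and inverse can be interchanged: (B^{-1})^T = (B^T)^{-1}
assert np.allclose(np.linalg.inv(B).T, np.linalg.inv(B.T))
print("both properties verified")
```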
Then I expand using the rule for the transpose of a product; you know this rule: (CD)ᵀ = Dᵀ Cᵀ. So from Aᵀ (A⁻¹)ᵀ = I it follows that (A⁻¹)ᵀ = (Aᵀ)⁻¹: the inverse and the transpose can be interchanged. Or, to keep the notation general, for any square invertible matrix B you can show that (B⁻¹)ᵀ = (Bᵀ)⁻¹. These are the two properties I am going to use; I am leaving a few steps for you to derive rather than doing everything on the board. Now, ||A⁻¹||₂² = max over i of λi((A⁻¹)ᵀ A⁻¹). See what we had before: the 2-norm of A comes from the largest eigenvalue of AᵀA; in this case we are talking about A⁻¹, so the relevant matrix is (A⁻¹)ᵀ A⁻¹. Using the two properties above, (A⁻¹)ᵀ A⁻¹ = (A Aᵀ)⁻¹, and for invertible A the eigenvalues of A Aᵀ are the same as those of AᵀA, which are all positive since AᵀA is positive definite. So what is the relationship between the eigenvalues of AᵀA and those of its inverse? One upon. If λ1 is the smallest eigenvalue of AᵀA, then 1/λ1 is the largest eigenvalue of the inverse. Is everyone with me on this? We have numbered the eigenvalues of AᵀA: the largest one we are calling λn, the smallest one we are calling λ1. You have some doubt?
λ1 is the smallest, λn is the largest; we look at the eigenvalues and choose the numbering that way. Now, if λ1 is the smallest eigenvalue of AᵀA, then 1/λ1 is the largest eigenvalue of its inverse; that follows from the first relationship. So now I combine these two things: ||A||₂² ||A⁻¹||₂² = λn / λ1, that is, ||A||₂ ||A⁻¹||₂ = sqrt(λn / λ1). The square roots of the eigenvalues of AᵀA are also called the singular values of A. So the condition number of the matrix is simply the square root of the ratio of the largest eigenvalue of AᵀA to the smallest eigenvalue of AᵀA. Computing the largest and smallest eigenvalues of a positive definite matrix is not that difficult. Why positive definite? Because AᵀA is a positive definite matrix. If you give MATLAB a matrix A and type cond(A), it will give you the condition number of the matrix; it actually computes exactly this ratio. I will call this quantity c₂(A), the condition number based on the 2-norm of the matrix. Likewise I can define a condition number based on the infinity norm or the 1-norm, although in that case I would have to compute A⁻¹ explicitly. Of course, we can do that for a simple matrix to get insight, though for a large matrix it might not be so suitable. So I can define c∞(A) using the infinity norm; depending upon which norm you choose, you can define the condition number. Now, let me complete a story which I began a long time back, where I never told you the exact reason.
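In code this is a one-line check. The sketch below (NumPy rather than MATLAB's cond, my own illustration) computes sqrt(λn/λ1) from the eigenvalues of AᵀA for the 4 x 4 example used earlier and compares it against the library's built-in 2-norm condition number:

```python
import numpy as np

A = np.array([[10., 7., 8., 7.],
              [ 7., 5., 6., 5.],
              [ 8., 6., 10., 9.],
              [ 7., 5., 9., 10.]])

lam = np.linalg.eigvalsh(A.T @ A)    # eigenvalues of A^T A, in ascending order
c2 = np.sqrt(lam[-1] / lam[0])       # sqrt(lambda_n / lambda_1)

assert np.isclose(c2, np.linalg.cond(A, 2))
print(c2)                            # about 3e3 for this matrix
```

So a condition number around 3000 for this innocuous-looking matrix, which is exactly why the tiny perturbations earlier destroyed the solution.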
I kept saying that polynomial approximations give rise to difficulties: beyond fourth or fifth order polynomials, you get ill conditioned matrices. It may have appeared disconnected then, but I want to close the loop now and explain why polynomial approximations above a certain order create problems. I am going to analyze this using the condition number. Recall the piece of the puzzle I had put up earlier: when you do polynomial approximations, you get a matrix called the Hilbert matrix H. In MATLAB there is actually a command called hilb: hilb(3) gives you the 3 x 3 Hilbert matrix, hilb(4) gives the 4 x 4 one, and you can just go on creating Hilbert matrices and looking at their condition numbers. The condition numbers of Hilbert matrices are notoriously bad. What does that mean? It means that if you make a slight error in the representation of the numbers, the resulting error ratios can be huge. Remember, the condition number tells you the worst case error; it is not that it will happen in every particular case, but when it happens, it can be very bad. I have shown you examples: perturb the matrix slightly and the solution goes out of the box; (1, 1, 1, 1) goes to (-81, 137, ...), not to some small neighborhood of (1, 1, 1, 1). The solution can be completely different with a small error. Let me take the 3 x 3 Hilbert matrix, which has a very nice structure:

H3 = [ 1    1/2  1/3
       1/2  1/3  1/4
       1/3  1/4  1/5 ]

I kept telling you that a second order or cubic polynomial is OK, but fourth, fifth, sixth, seventh order polynomials become bad to solve. Why? Because you get a situation Hθ = u, with u known on the right hand side and H the Hilbert matrix.
θ are the parameters to be estimated, the polynomial coefficients; H is the Hilbert matrix and u is the right hand side. Whatever the right hand side is, what matters is how well conditioned the H matrix is, because estimating the coefficients of a polynomial is just Ax = b in another form. Now, for this simple matrix, which I will call H3 because it is 3 x 3, you can show that its 1-norm equals its infinity norm: ||H3||₁ = ||H3||∞ = 11/6. You can compute the inverse of this simple 3 x 3 matrix exactly, even by hand, and show that ||H3⁻¹||₁ = ||H3⁻¹||∞ = 408. So c₁(H3) = c∞(H3) = (11/6) x 408 = 748. The calculations are not bad: the condition number is 748, so the worst case amplification is of the order of a thousand. Not so bad. Now let us see what happens if you want to fit a sixth order polynomial. In that case you get the 6 x 6 Hilbert matrix H6, whose first row is 1, 1/2, 1/3, 1/4, 1/5, 1/6, whose second row starts 1/2, 1/3 and runs to 1/7, and so on down to the last row, and you solve H6 θ = u. You can show that c₁(H6), the condition number based on the 1-norm, which in this particular case again equals the condition number based on the infinity norm, is of the order of 10^7. Just look at that: the condition number is so bad that whatever you do, you will not get reliable solutions for a sixth order polynomial. With a condition number of 10^7, a small error can get amplified enormously in certain directions. What are those directions? They are related to the eigenvectors of AᵀA.
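You can watch this blow-up happen numerically. The sketch below (my own NumPy version of the MATLAB hilb/cond experiment suggested in the lecture) builds Hilbert matrices of increasing order and prints their condition numbers:

```python
import numpy as np

def hilb(n):
    # Hilbert matrix: H[i, j] = 1 / (i + j + 1) with 0-based indices
    i, j = np.indices((n, n))
    return 1.0 / (i + j + 1)

# 1-norm condition number of the 3x3 Hilbert matrix: (11/6) * 408 = 748
c1_h3 = np.linalg.cond(hilb(3), 1)
print(c1_h3)

# the 2-norm condition number explodes as the order grows
for n in (3, 4, 5, 6, 8, 10):
    print(n, np.linalg.cond(hilb(n)))
```

By n = 6 the condition number has already passed 10^7, and it keeps growing by orders of magnitude with each increase in n.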
In the direction of the eigenvector of AᵀA corresponding to the maximum magnitude eigenvalue λn, your error gets amplified worst; how badly depends upon how your b is aligned. As you increase the polynomial order, you get Hilbert matrices of higher and higher order, and if the condition number reaches 10^12, a small error committed can create havoc. Let me illustrate how different things can be even for H3; then you can judge what will happen for a sixth or seventh order polynomial, why we do not get good results, why we need cubic splines and piecewise polynomial interpolation. Why do we really go for those? Let me complete this one example. If I multiply H3 by (1, 1, 1), I get the right hand side (11/6, 13/12, 47/60). Now, instead of solving this problem, I am going to round off: I solve (H + ΔH)(x + Δx) = b + Δb, where every entry of the matrix and the right hand side is truncated to a two or three digit approximation, which we very often do in calculations; 1/3 is represented as 0.333, for instance, and the right hand side becomes (1.83, 1.08, 0.783). You might see nothing wrong with that. My exact solution was (1, 1, 1). How much does the solution change? With these tiny rounding errors in every number, the solution of the rounded system comes out noticeably different from (1, 1, 1). Just imagine: a tiny truncation error in every entry while fitting a third order polynomial, for a matrix whose condition number is only about 700, not very bad, and it still gives me a clearly different solution. You see why the condition number is so important when you want to study matrix computations?
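Here is that rounding experiment in NumPy form (a sketch of my own; the rounded entries follow the two-to-three digit truncation quoted in the lecture):

```python
import numpy as np

H = np.array([[1.0, 1/2, 1/3],
              [1/2, 1/3, 1/4],
              [1/3, 1/4, 1/5]])
b = H @ np.ones(3)          # exact right hand side (11/6, 13/12, 47/60)

# every entry truncated to about three significant digits, as in the lecture
H_r = np.array([[1.0,   0.5,   0.333],
                [0.5,   0.333, 0.25 ],
                [0.333, 0.25,  0.2  ]])
b_r = np.array([1.83, 1.08, 0.783])

x_r = np.linalg.solve(H_r, b_r)
print(x_r)                  # noticeably far from the true solution (1, 1, 1)
```

Even though no rounded entry differs from the true one by more than about 0.0005, the computed solution drifts visibly away from (1, 1, 1).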
Which solution is the correct solution now? The correct solution is (1, 1, 1); what you get from the rounded system is different. Just imagine what will happen if the condition number is 10^4, 10^5 or 10^6. So the solution that MATLAB, or rather any software, I should not single out MATLAB, gives you for a matrix with a high condition number is likely to be garbage, and you should know this. When is the garbage your mistake, and when is it inherent? If a matrix is well conditioned and you are getting garbage, you have made a mistake in programming. If a matrix is ill conditioned and you are getting garbage, it is not that the software is wrong or the program is wrong; it is an inherent problem. Just see: with a condition number of only about 700, a small change in the right hand side or a very small change in the A matrix gives a drastically different solution. Even for a third order polynomial you have this situation; that is why we do not try to fit high order polynomials. We will continue this story, and that is nearly the end of this series of lectures on Ax = b; a little bit remains, which we will complete in the next lecture.