So, we have been looking at methods of numerical optimization to solve linear algebraic equations, and what we derived in the last class was the gradient method, or the Cauchy method. The gradient method does an iterative search in the negative of the gradient direction. Let me summarize the algorithm. We are trying to solve A x = b, and we are assuming that A is symmetric, A = A^T, and positive definite. In applied mathematics or linear algebra you often write A > 0 to say that A is positive definite; that does not mean all elements of A are greater than 0. When you write A > 0, it means A is symmetric positive definite. We also know what to do if A is not symmetric positive definite: we pre-multiply both sides by A^T and work with the modified problem. Then we formulated an objective function, phi(x) = (1/2) x^T A x - x^T b, and we want to minimize phi with respect to x.

So what was this gradient method? We find the gradient: dou phi by dou x = A x - b. We want to move in the negative of the gradient, so we take g_k = -(A x_k - b); this is the negative gradient direction. How much do we move each time? We solve a one-dimensional optimization problem and move to x_{k+1} = x_k + lambda_k g_k, and we found that lambda_k is nothing but (g_k^T g_k) / (g_k^T A g_k). So we make an optimal move every time in the negative gradient direction; lambda_k times g_k is the move that we make each iteration. This is the classical gradient method, or the Cauchy method. The iterations are stopped when x_k is not changing, that is, when the new iterations are not adding much information. We pre-specify an epsilon, something like 10^-8 or 10^-10, a very, very small number, and we stop when the relative change in x as the new iterates are generated becomes insignificant.

Notice that at every iteration you need to compute A x_k - b, which is actually the residual: when you solve the problem exactly, g_k becomes 0, and the gradient being zero is the necessary condition for optimality for this particular objective function. So finding the optimum is equivalent to meeting the necessary condition for optimality, and you can equally well terminate based on g_k: if the gradient vector is very, very small, you stop. The problem with this method is that it works very well when you are far away from the optimum, but when you come close to the optimum it tends to become very slow; the moves you make near the optimum are quite small. So the trouble is that we need to do something to accelerate the convergence when you come closer to the optimum.
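Just to make this recipe concrete, here is a minimal sketch of the gradient (Cauchy) iteration for A x = b in Python with NumPy; the function name, the 2 x 2 test matrix, the tolerance and the iteration limit are my own illustrative assumptions, not something specified in the lecture.

```python
import numpy as np

def cauchy_gradient_solve(A, b, x0, eps=1e-10, max_iter=10000):
    """Steepest-descent (Cauchy) iterations for A x = b with A symmetric positive definite."""
    x = x0.astype(float)
    for _ in range(max_iter):
        g = -(A @ x - b)              # negative gradient, i.e. the residual b - A x
        denom = g @ (A @ g)
        if denom == 0.0:              # residual is zero: we are at the solution
            break
        lam = (g @ g) / denom         # optimal step length along g
        x_new = x + lam * g
        if np.linalg.norm(x_new - x) <= eps * (np.linalg.norm(x) + eps):
            x = x_new                 # relative change in x is insignificant: stop
            break
        x = x_new
    return x

# illustrative 2 x 2 example (assumed values, not from the lecture)
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = cauchy_gradient_solve(A, b, np.zeros(2))
print(x, A @ x - b)                   # residual should be close to zero
```

On a small, well-conditioned system like this the residual drops quickly; the slow creep near the optimum shows up on problems where the eigenvalues of A are spread far apart.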
Now, what I am saying is not just true for this particular problem; it is true in general. The gradient method is one of the very basic methods for doing numerical search or numerical optimization, but the problem with the gradient method is that it works very well when you are away from the optimum, and when you come very close to the optimum it becomes very slow. So convergence to the optimum is slow and we need to do something about it. There is a variant called the conjugate gradient method, in which we move along A-conjugate directions. Given a matrix A, we call a set of directions S_0, S_1, S_2, ... A-conjugate if this condition is satisfied: take two directions, the new direction S_k and the previous direction S_{k-1}, not any two, and require that their inner product weighted by the A matrix is zero, that is, S_k^T A S_{k-1} = 0. What you can show is that if you move along the A-conjugate directions instead of the plain gradient direction, you reach the optimum very fast. (If you have the notes, this is equation (129), with a small correction.)

So how do you generate these A-conjugate directions? We are going to use the formula S_k = beta_k S_{k-1} + g_k. So it is not that I am not going to use the gradient direction; it is a conjugate gradient algorithm. I use the current gradient vector, but to it I add a correction which is based on the previous direction. I kick off the algorithm with S_0 = 0, which means S_1, the first move that I am going to make, is g_1, the first gradient. Say I start my iterations at 1 and take S_0 to be 0; then the first direction is the gradient direction. How do I choose beta_k? That will become clear soon. What is S_2? S_2 = beta_2 S_1 + g_2, where g_2 is the gradient at point 2; rather, the negative of the gradient at point 2, because g here is always the negative of the gradient. We are minimizing, so we move in the negative gradient direction: g_1 is nothing but -(A x_1 - b), and so on. So you can see that S_2 = beta_2 g_1 + g_2; I am taking a linear combination of the two gradient directions, not just the current negative gradient direction, and I am going to do this at every iteration.

Now I am going to invoke this A-conjugacy. What I do is take S_k^T A S_{k-1}, which is (beta_k S_{k-1} + g_k)^T A S_{k-1}. These are A-conjugate directions, so what should this be? It should be 0. So what do I get from here? 0 = beta_k (S_{k-1}^T A S_{k-1}) + g_k^T A S_{k-1}, which gives beta_k = -(g_k^T A S_{k-1}) / (S_{k-1}^T A S_{k-1}).
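As a quick numerical sanity check of this formula, the short sketch below (with a made-up A, S_{k-1} and g_k) computes beta_k exactly this way and verifies that the resulting S_k is A-conjugate to S_{k-1}.

```python
import numpy as np

A = np.array([[4.0, 1.0], [1.0, 3.0]])   # symmetric positive definite (illustrative)
S_prev = np.array([1.0, 0.5])            # previous direction S_{k-1} (made up)
g_k = np.array([0.2, -1.0])              # current negative gradient (made up)

# beta_k = -(g_k^T A S_{k-1}) / (S_{k-1}^T A S_{k-1})
beta_k = -(g_k @ (A @ S_prev)) / (S_prev @ (A @ S_prev))
S_k = beta_k * S_prev + g_k

print(S_k @ (A @ S_prev))                # A-conjugacy check: should be ~0 up to round-off
```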
See, g_k is the negative of the current gradient, so g_k is known to me, and S_{k-1} is the previous direction, so using these I can find out beta_k. That gives me the formula for beta_k, that is, how much of the previous direction should be mixed in so that the directions come out A-conjugate. And if I write g_k = -(A x_k - b), that completes the formula. So I have g_k given by this, this is how you kick off your iterations, this is how you compute beta_k, and this is how you combine them; I can put the two together and that is my iteration scheme. Remember, beta_k is a scalar, a ratio of two scalars; S_{k-1} is the previous direction vector and g_k is the current negative gradient direction.

Now see what beta_k does. With a little bit of algebraic jugglery, which I am skipping here (the derivation is given in the notes, you can just go over it), you can show that S_k is a linear combination of all the previous negative gradient directions g_0, g_1, ..., g_k. Well, it needs a little bit of index jugglery: I will call the starting point 0 and set S_{-1} = 0, so that S_0 becomes g_0, which is the same as -(A x_0 - b). Whether you start from k = 1 or k = 0 is just a matter of how you write the algorithm; to be consistent with the notation here, g_0 is the negative of the initial gradient direction. So in this method you find the current direction in which to move using a linear combination of all the previous gradient directions; you do not use only the current gradient direction.

Now, this is the direction in which you want to move. How much do you want to move? You have to use the lambda business again. After we have decided S_k, the new point is x_{k+1} = x_k + lambda_k S_k; S_k is only the direction, just remember that. Then lambda_k is the minimizer over lambda of phi(x_k + lambda S_k), and this one-dimensional problem can be solved very easily: lambda_k turns out to be the scalar (S_k^T g_k) / (S_k^T A S_k). How much to move in this direction is given by this particular scalar.

Now, what I can show is that, theoretically, I can reach the optimum of phi in only n steps. I have started with this objective function, (1/2) x^T A x - x^T b, where A is an n x n symmetric positive definite matrix, and you can show that, starting from an arbitrary initial condition, the solution is reached in only n steps. Of course, what happens is that there are numerical errors when you compute on a computer and you cannot practically reach it in n steps, but on paper you should converge to the solution in only n steps. It is a very powerful method. Just look at a matrix which is 1000 x 1000: you are going to reach very close to the solution in 1000 steps.
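Putting the whole scheme together, here is a minimal sketch of the conjugate gradient iteration exactly as it has been set up here: S_{-1} = 0, g_k = -(A x_k - b), beta_k = -(g_k^T A S_{k-1}) / (S_{k-1}^T A S_{k-1}), S_k = beta_k S_{k-1} + g_k and lambda_k = (S_k^T g_k) / (S_k^T A S_k). The 3 x 3 test matrix, the tolerance and the variable names are my own assumptions.

```python
import numpy as np

def conjugate_gradient_solve(A, b, x0, eps=1e-12, max_iter=None):
    """Conjugate gradient iterations as set up in the lecture (A symmetric positive definite)."""
    n = len(b)
    if max_iter is None:
        max_iter = n                      # in exact arithmetic n steps suffice
    x = x0.astype(float)
    S_prev = np.zeros(n)                  # S_{-1} = 0, so the first direction is just g_0
    for _ in range(max_iter):
        g = -(A @ x - b)                  # negative gradient = residual
        if np.linalg.norm(g) <= eps:
            break
        d = S_prev @ (A @ S_prev)
        beta = 0.0 if d == 0.0 else -(g @ (A @ S_prev)) / d
        S = beta * S_prev + g             # new A-conjugate search direction
        lam = (S @ g) / (S @ (A @ S))     # optimal step length from the 1-D minimization
        x = x + lam * S
        S_prev = S
    return x

# illustrative 3 x 3 example (assumed values)
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
x = conjugate_gradient_solve(A, b, np.zeros(3))
print(x, np.linalg.norm(A @ x - b))
```

For a symmetric positive definite 3 x 3 system this should terminate in at most 3 iterations up to round-off, which is exactly the n-step property just claimed.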
Each step only requires a few matrix-vector multiplications, and you converge to the solution very fast. This is actually one of the popular methods for solving linear algebraic equations for large-scale systems: very quickly you can get a reasonably accurate solution. (A question from the class: can the lambda_k computation cause a problem, because of the g_k^T A g_k type term in the denominator?) No, I do not think that will happen in the conjugate gradient method. In the gradient method you can land in a problem because of that; you do not get 0 by 0, but you can get a small number divided by a small number, and that can sometimes cause a problem. But the bigger difficulty with the gradient method is not that; the difficulty is that it becomes very slow as you come close to the optimum, and the conjugate gradient method does not become slow. It homes in on the solution in n steps, at least theoretically. So it is a much better method.

So we move on to the next thing now. We have talked about different methods of solving A x = b; let me just take a brief review of what we have done till now. We started by looking at direct methods. We had at least a taste of what are called sparse matrix methods, where we try to exploit the number of zeros present in the matrix and come up with a way of solving the system very quickly, in a smaller number of multiplications and divisions. Then we moved on to iterative methods for solving A x = b, which were probably completely new to you, because you have been looking at Gaussian elimination, Gauss-Jordan and other direct methods. Iterative methods are useful particularly when, say in Newton-Raphson, you are solving a sequence of A x = b problems: it is not that important that you get the exact true solution at the intermediate iterates, and you can very quickly get an approximate solution, very close to the true solution, using iterative methods. So we had a look at the Jacobi and Gauss-Seidel methods. We also looked at why and how they converge, how to make them converge and so on; we related it to eigenvalues, and said that if the eigenvalues of S^-1 T are inside the unit circle, convergence to the solution is guaranteed, and that this is a necessary and sufficient condition. We also derived some sufficient conditions, and looked at how you can transform the problem to guarantee convergence and so on. Then we looked, very quickly, at gradient-based, optimization-based methods: the gradient method and the conjugate gradient method. If you do a course on optimization, you will see that the gradient method and the conjugate gradient method form the basis of many, many variants; there are more powerful variants, but these are two basic methods used in numerical optimization.

The next thing that I want to move to is what is called a condition number. What I want to talk about now is a very, very important aspect of numerical computing: how well conditioned or how ill conditioned a system of linear algebraic equations is. One of the reasons for the problems that occur in numerical computing is finite precision arithmetic. What is finite precision arithmetic? It means we have a finite number of bits to represent a number, and then we truncate.
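To get a feel for what this truncation does, here is a toy sketch that mimics a 3-significant-digit machine by rounding after every operation; the particular numbers are my own, chosen only to show that the small neglected pieces can pile up over many operations.

```python
from math import floor, log10

def round_sig(x, sig=3):
    """Round x to `sig` significant digits: a toy model of a 3-digit precision machine."""
    if x == 0.0:
        return 0.0
    return round(x, -int(floor(log10(abs(x)))) + (sig - 1))

# adding a small number to a larger one on a 3-digit machine: part of it is simply lost
print(3.14 + 0.00231)                    # what we want
print(round_sig(3.14 + 0.00231))         # what a 3-digit machine keeps: 3.14

# repeat the operation many times and the neglected pieces build up
total_exact, total_machine = 0.0, 0.0
for _ in range(100000):
    total_exact += 0.00231
    total_machine = round_sig(total_machine + 0.00231)
print(total_exact, total_machine)        # the 3-digit sum stalls far below the true sum
```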
So, suppose the number goes beyond this representation, or there is a number which cannot be represented exactly using a finite number of bits; we just take the nearest approximation and work with it. Just to give you an idea, suppose you have a computer with 3-digit precision (let us say 3 digits rather than 3 bits) and I want to do this addition. A 3-digit precision computer cannot actually look at the trailing part of the smaller number; it will ignore that part, it will only take 0.002 of it, and so the 3-digit computer will commit an error of 3.1 x 10^-4 in this addition. Now of course we have floating-point numbers with 16-digit precision and so on, but there is still something being neglected, and even though each time it is a very, very small number, you are doing millions of operations, right? With a large matrix you are doing so many thousands of multiplications and divisions, and each time you neglect a small number. Those small neglected numbers can cascade, grow, build up and create a problem. Now, this happens only for those cases where the matrix is problematic; when the matrix is not problematic, the errors have a natural tendency to die down, things work out and you get reasonable solutions.

So what you have to remember is that when you are solving A x = b using a computer, you actually never end up solving this exact equation. You end up solving something like (A + delta A)(x + delta x) = b + delta b. I want to solve A x = b, but I end up solving this perturbed problem, because I cannot represent A accurately inside my computer and I cannot represent b accurately inside my computer; and if A and b are not accurately represented, I cannot get the exact solution to my problem. So what I want is some idea of when this computed solution will be far away from what I really want to get. What are the factors that influence how the solution behaves? First of all, the right-hand side b is something specified by the person using this set of equations, by the user, let us say; so b is not in our hands. What is fixed about the system of equations is the matrix A. So how the solution behaves actually depends on how well conditioned or how ill conditioned the A matrix is; our main culprit when things go wrong is the A matrix. How do we recognize whether an A matrix is good or bad? If the A matrix is bad, then to some extent good programs like MATLAB can recover and give you a reasonable solution, but beyond a point even MATLAB fails, and it is not because there is something wrong with MATLAB; it is because there is a problem with the matrix. The matrix is ill conditioned. In your own iterations you may sometimes have received a message that the matrix is ill conditioned and the results may not be reliable. So MATLAB does its job: it tells you that something is going wrong here. You should know what that warning means; you should understand and analyze it. So let me give you a motivation for what is going to happen next, that is, condition numbers. I will need one more lecture to cover this business of condition numbers, but I am going to give the motivation today.
Why do I want to look at the condition number? First the motivation, then we will look at the mathematical details. This example I have taken from Strang; it is a very nice, simple example. Using a very simple 2 x 2 example, Strang illustrates the point that sometimes the calculations go wrong because you have not done the proper ordering of the calculations, and if you reorder the calculations things can be okay; but sometimes the matrix is bad, and whatever you do there will be a problem.

So system 1 that he considers is this very simple system: 0.0001 x1 + x2 = 1 and x1 + x2 = 2; the coefficient matrix is [0.0001 1; 1 1]. Now, how will you do Gaussian elimination? If you do it without thinking about the problem, you will say 0.0001 is a non-zero number, take it as the pivot and proceed with making the system upper triangular. So your next step would be 0.0001 x1 + x2 = 1 and -9999 x2 = -9998. This gives you x2 = 9998/9999 = 0.9999..., and then you will back-substitute and get x1. Let us say you have a computer with finite-digit precision, so 0.9999 gets rounded off to 1. If x2 is rounded off to 1, what is the solution for x1? From the first equation, x1 will be 0. This is because of the rounding-off error.

Now what is the correct way of doing the calculations here? Well, I will do maximal pivoting. You know what maximal pivoting is: in Gaussian elimination you rearrange the rows so that the pivot element is the largest available. So if I do maximal pivoting, the system becomes x1 + x2 = 2 and 0.0001 x1 + x2 = 1; the matrix is [1 1; 0.0001 1] and the right-hand side is [2; 1]. Same system, I have only moved the larger pivot to the top. Now do Gaussian elimination: I get x1 + x2 = 2 and 0.9999 x2 = 0.9998, so x2 rounds off to 1, and if x2 = 1 then x1 = 1. You see the problem here? Same system of equations; I only reordered the calculations. Once I got the solution (0, 1); here, with the same rounding off, I got (1, 1), which is the correct solution. What mattered was how I ordered my calculations. This particular matrix is not an ill-conditioned matrix, it is a well-conditioned matrix; why it is well conditioned we will come to a little later, but right now just take it for granted. What went wrong was that we did the wrong ordering of the calculations, and we got a solution which is not acceptable; of course we did not recognize it at the time, we thought everything was logical, we did the rounding off and got a solution. But when you correct the order of the calculations, you recover the solution (1, 1).
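Here is a small sketch of this first experiment, following Strang's example as described above: it runs Gaussian elimination on system 1 with every intermediate result rounded to 3 significant digits (the same toy rounding model as before), once with the small pivot on top and once with the rows swapped. The helper names are my own.

```python
from math import floor, log10

def round_sig(x, sig=3):
    """Round to `sig` significant digits: a toy model of a 3-digit precision machine."""
    if x == 0.0:
        return 0.0
    return round(x, -int(floor(log10(abs(x)))) + (sig - 1))

def solve_2x2(A, b):
    """Gaussian elimination on a 2x2 system, rounding every intermediate result to 3 digits."""
    m = round_sig(A[1][0] / A[0][0])                    # elimination multiplier
    a22 = round_sig(A[1][1] - m * A[0][1])
    b2 = round_sig(b[1] - m * b[0])
    x2 = round_sig(b2 / a22)
    x1 = round_sig((b[0] - A[0][1] * x2) / A[0][0])     # back substitution
    return [x1, x2]

# System 1: 0.0001*x1 + x2 = 1,  x1 + x2 = 2   (true solution is approximately (1, 1))
A = [[0.0001, 1.0], [1.0, 1.0]]
b = [1.0, 2.0]
print(solve_2x2(A, b))                                  # small pivot on top: gives (0, 1)
print(solve_2x2([A[1], A[0]], [b[1], b[0]]))            # rows swapped (maximal pivoting): gives (1, 1)
```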
Now let me take another system. System 2 is this: x1 + x2 = 2 and x1 + 1.0001 x2 = 2; the matrix is [1 1; 1 1.0001] and the right-hand side is [2; 2]. Apply the first step of Gaussian elimination: subtract the first row from the second, and the second equation becomes 0.0001 x2 = 0. So what is the solution? x2 = 0 and x1 = 2. Now I am going to say that there was a slight error in representing the right-hand side. You would say, how does it matter? It is a slight error of 10^-4 on the right-hand side. So instead of representing 2 and 2, let us say I ended up typing 2 and 2.0001. You would expect: these are two very, very close vectors, the error is only of the order of 10^-4. Just see what happens. If the second element of the right-hand side is by mistake represented as 2.0001, then after elimination the second equation becomes 0.0001 x2 = 0.0001, so x2 will be 1, because 0.0001 divided by 0.0001 is 1, and then x1 will be 1. Or the other way round: suppose it was 1.9999; then x2 will be -1 and x1 will be 3. So a slight error on the right-hand side, just 10^-4, when we are working with numbers like 1, 1, 1, 2 and all that; such a small error is causing a drastic change in the solution. This will not happen for system 1: once you reorder the calculations, if you perturb the right-hand side slightly, it is not going to change the solution too much. Here, if you have a right-hand side which is slightly perturbed, you have a problem; the solution changes drastically, from (2, 0) to (1, 1). That is not a small change.

What we will show a little later is that for this problem, whatever you try to do, rearranging the calculations, pivoting (well, here there is no scope for pivoting, both candidate pivots are 1, so there is no question of maximal pivoting), nothing is going to change the situation: a slight error in representing the right-hand side gives you a drastically different solution. So this problem is ill conditioned; no reordering is going to help you. The first problem is not ill conditioned; there it was only a matter of reordering the calculations to get the correct solution. Now I want a measure that will separate these two situations, where I can say: this A matrix is ill conditioned, whatever I do I will get into trouble; and this A matrix is well conditioned, so if my solutions are absurd it is because I have made an error. That is the case for the first system but not for the second; in the second case, if the solutions are absurd, it is because of an inherent limitation of the matrix itself. We will continue in the next lecture.
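And here is a short sketch of this second experiment in ordinary double precision: the same 10^-4 perturbation of the right-hand side of system 2 flips the solution from about (2, 0) to (1, 1). The call to NumPy's cond is only a preview of the condition-number measure that is defined in the next lecture.

```python
import numpy as np

A = np.array([[1.0, 1.0], [1.0, 1.0001]])   # system 2 from the lecture (ill conditioned)
b = np.array([2.0, 2.0])
b_perturbed = np.array([2.0, 2.0001])       # error of 1e-4 in the right-hand side

print(np.linalg.solve(A, b))                # ~ [2, 0]
print(np.linalg.solve(A, b_perturbed))      # ~ [1, 1]: a tiny change in b, a drastic change in x
print(np.linalg.cond(A))                    # large condition number (~4e4) signals the problem
```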