In the last class we discussed the unconstrained optimization problem using numerical techniques. First we discussed what a descent method is, and then we wrote the algorithm for the steepest descent method: if you move from one point to another along a descent direction, the function value decreases, and we established the necessary condition for a direction to be a descent direction of the function. We saw that the steepest descent method converges very slowly; its rate of convergence is linear, that is, of order 1. To improve the rate of convergence we then considered the conjugate gradient method, which combines the previous descent direction with the present descent direction; using these two pieces of information the rate of convergence improves, but the order of convergence is still less than 2. Then we discussed Newton's method for solving the unconstrained optimization problem. If you recollect, we approximated the function $f(x)$ in a neighbourhood of $x = x^k$, the value of $x$ at the $k$th iteration, by a Taylor series expansion, keeping terms only up to the second-order partial derivatives:

$$f(x) \approx f(x^k) + \nabla f(x^k)^T (x - x^k) + \tfrac{1}{2}\,(x - x^k)^T\, \nabla^2 f(x^k)\,(x - x^k),$$

where $\Delta x = x - x^k$ is the perturbation of $x$ about the $k$th point. If you look at this expression, the three terms we have kept are nothing but a quadratic polynomial; one can write the function in the form

$$q(x) = c + v^T \Delta x + \tfrac{1}{2}\, \Delta x^T P\, \Delta x,$$

where $c = f(x^k)$ is a constant because the function value at the $k$th iteration is known, $v = \nabla f(x^k)$ is also known, $\Delta x = x - x^k$ contains the unknown $x$ (while $x^k$ is known), and $P = \nabla^2 f(x^k)$ is the Hessian matrix, which is symmetric. So the function $f(x)$ can be approximated by the quadratic function $q(x)$, because the third- and higher-order terms of the Taylor series expansion have been neglected. Now $q(x)$ is to be minimized with respect to $x$, a vector whose elements are $x_1, x_2, \dots, x_n$. The necessary condition we derived earlier for the quadratic function $q$ to be minimized is $\nabla q(x) = 0$. So I am differentiating this function with respect to $x$: the first term is a constant, and the derivative of a constant is 0.
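Before going further, here is a minimal sketch of building this local quadratic model in code, assuming NumPy and user-supplied callables `f`, `grad` and `hess` (these names are my own, not from the lecture):

```python
import numpy as np

def quadratic_model(f, grad, hess, xk):
    """Return q(x) = f(x^k) + v^T dx + 0.5 dx^T P dx with dx = x - x^k."""
    c = f(xk)        # constant term, the known function value at x^k
    v = grad(xk)     # known gradient vector at x^k
    P = hess(xk)     # symmetric Hessian matrix at x^k

    def q(x):
        dx = x - xk  # perturbation Delta x about the k-th point
        return c + v @ dx + 0.5 * dx @ P @ dx

    return q
```

Minimizing $q$ instead of $f$ is what turns each optimization step into a linear-algebra problem, as the derivation below shows.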
Now we differentiate the remaining terms with respect to $x$, using the results shown earlier: if $f(x) = b^T x$, then the derivative of $f(x)$ with respect to $x$ is nothing but $b$; here $b = \nabla f(x^k)$, so the second term contributes $\nabla f(x^k)$. For the quadratic term the factor of one half cancels the two, giving $P\,\Delta x$. So after differentiating, the necessary condition becomes

$$\nabla f(x^k) + \nabla^2 f(x^k)\,\Delta x = 0,$$

that is, the gradient of the function at $x = x^k$ plus the Hessian at $x^k$ times $\Delta x$ equals 0. If you see what we are doing, the gradient of $q$ is assigned to 0 and we are finding the root of the resulting equation. Therefore

$$x - x^k = -\left[\nabla^2 f(x^k)\right]^{-1}\nabla f(x^k),$$

where the Hessian is a square matrix of dimension $n \times n$, $n$ being the number of decision variables involved in the objective or cost function $f(x)$. So, moving from the $k$th point to the next point, the update can be written as

$$x^{k+1} = x^k - \left[\nabla^2 f(x^k)\right]^{-1}\nabla f(x^k), \qquad (1)$$

and at every iteration $x$ is updated in this way with the knowledge of the Hessian and the gradient at the $k$th point. Look at this expression: it is very similar to the Newton-Raphson method for finding the roots of a polynomial or any equation, which in the single-variable case reads $x^{k+1} = x^k - f(x^k)/f'(x^k)$; here we are finding the roots of the gradient of $q$. One can write the update in a more general form,

$$x^{k+1} = x^k + \lambda_k d_k, \qquad \lambda_k > 0,$$

and this is what is called Newton's method, where $d_k$ is the descent direction

$$d_k = -\left[\nabla^2 f(x^k)\right]^{-1}\nabla f(x^k).$$

It can easily be proved that this $d_k$ is a descent direction. What is the condition that the function value decreases when you move from one point to another? If you recollect, the condition we derived is that $\nabla f(x^k)^T d_k$ must be less than 0.
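Before checking that condition, here is a minimal sketch of one Newton step under the same assumptions as before (NumPy, hypothetical `grad` and `hess` callables); solving the linear system is preferable to forming the explicit inverse:

```python
import numpy as np

def newton_step(grad, hess, xk):
    """One Newton update: x_{k+1} = x_k - [Hessian(x_k)]^{-1} grad(x_k)."""
    g = grad(xk)                    # gradient at the current iterate
    H = hess(xk)                    # Hessian at the current iterate
    d = -np.linalg.solve(H, g)      # Newton direction, no explicit inverse
    return xk + d                   # full step, i.e. lambda_k = 1
```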
If this condition is satisfied, the function value will decrease when you move from the $k$th point to the $(k+1)$th point. So let us check it for the $d_k$ we have just defined. Substituting the expression for $d_k$,

$$\nabla f(x^k)^T d_k = -\,\nabla f(x^k)^T \left[\nabla^2 f(x^k)\right]^{-1}\nabla f(x^k).$$

Now see, this is a quadratic form, and it will be negative only when the Hessian matrix is positive definite. So the condition $\nabla f(x^k)^T d_k < 0$ holds provided the Hessian of the function at the $k$th point is positive definite, $\nabla^2 f(x^k) > 0$; if a matrix $P$ is positive definite, its inverse is also positive definite. That is the condition implied here. If this condition fails, we are not moving in a descent direction, and we are not approaching the optimum value of the function, which in turn indicates that convergence is not guaranteed unless the Hessian of the function at every iteration is positive definite. So what are the drawbacks of this algorithm? In Newton's method the update is $x^{k+1} = x^k + \lambda_k d_k$ at each iteration, and it moves in the descent direction of the function, approaching the minimum, provided the Hessian at each iteration is positive definite; that is one requirement. Another thing: if you see how $x^{k+1}$ is updated, you need the inverse of the Hessian matrix, and since the decision variable $x$ has dimension $n$, an $n \times n$ matrix inversion has to be carried out at every iteration. So that is one drawback, and the other drawback is that the Hessian, the matrix of second partial derivatives of the function, must be positive definite. These two are the drawbacks of this algorithm. Now suppose the Hessian of the function at some iteration does not satisfy this condition; how do we overcome this problem?
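In practice the positive-definiteness requirement is commonly tested by attempting a Cholesky factorization; this short check is my own illustration, not something prescribed in the lecture:

```python
import numpy as np

def is_positive_definite(H):
    """True if the symmetric matrix H is positive definite.

    Cholesky factorization succeeds exactly when H is positive definite,
    so a failed factorization warns that the Newton direction may not be
    a descent direction.
    """
    try:
        np.linalg.cholesky(H)
        return True
    except np.linalg.LinAlgError:
        return False
```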
So the next question is how to overcome the case where the second derivative of the function at some iteration, the Hessian $\nabla^2 f(x^k)$, is not positive definite. One remedy is a modification of Newton's method for finding the minimum of the unconstrained optimization problem; you can call it the modified Newton's method. Let us define a new matrix of the same dimension as the Hessian, $n \times n$, where $n$ is the number of decision variables $x_1, x_2, \dots, x_n$:

$$M_k = \mu_k I + \nabla^2 f(x^k),$$

that is, I am adding a diagonal matrix to the Hessian, where $\mu_k$ is a real quantity greater than 0, kept as small as possible while still making $M_k$ positive definite. Why do we require positive definiteness? Because whether we are moving in a descent direction of the function is decided by the condition that the gradient of the function multiplied by $d_k$ must be less than 0, which in turn requires the Hessian to be positive definite. Suppose that fails; then you add this diagonal matrix. In other words, even if the Hessian is negative definite, multiplying on both sides by any vector $z$ gives a negative quantity $z^T \nabla^2 f(x^k)\, z$, but the added term contributes the positive quantity $\mu_k z^T z$, and the sum of the negative and positive parts can make $M_k$ a positive definite matrix. That is the idea: this makes $M_k > 0$. Then, in place of the Hessian inverse, the descent direction is

$$d_k = -\,M_k^{-1}\,\nabla f(x^k),$$

and this ensures the necessary condition for a descent direction when we move from one point to another, because

$$\nabla f(x^k)^T d_k = -\,\nabla f(x^k)^T M_k^{-1}\,\nabla f(x^k) < 0,$$

which is negative provided $M_k$, and hence $M_k^{-1}$, is positive definite. So if the Hessian fails to be positive definite, we add this small positive quantity along the diagonal, make $M_k$ positive definite, and avoid the problem.
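A hedged sketch of this modification follows; the strategy of growing $\mu_k$ by doubling until the Cholesky factorization succeeds is my own assumption, since the lecture only requires that $\mu_k$ make $M_k$ positive definite:

```python
import numpy as np

def modified_newton_direction(grad, hess, xk, mu0=1e-4):
    """Descent direction d_k = -(mu_k I + H)^{-1} grad f(x^k), with mu_k
    increased until the shifted matrix M_k is positive definite."""
    g = grad(xk)
    H = hess(xk)
    n = len(xk)
    mu = 0.0
    while True:
        M = H + mu * np.eye(n)        # M_k = mu_k I + Hessian
        try:
            np.linalg.cholesky(M)     # succeeds only if M_k is positive definite
            break
        except np.linalg.LinAlgError:
            mu = mu0 if mu == 0.0 else 2.0 * mu   # grow mu_k and retry
    d = -np.linalg.solve(M, g)        # solve M_k d = -grad instead of inverting
    return d, mu
```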
If the second derivative of the function is negative definite, the algorithm will not converge. So now we can write the algorithmic steps of Newton's method for solving the unconstrained minimization problem, in the same way as we wrote them earlier for the steepest descent method and for the conjugate gradient method. Our problem, if you recollect, is: minimize $f(x)$.

Step 1: Choose an initial guess $x^0$ and let the iteration start at $k = 0$; choose $\varepsilon_1$, $\varepsilon_2$ and $\varepsilon_3$, all positive real and very, very small quantities. These are nothing but the tolerances for the stopping criteria of the algorithm, for stopping the iterative process.

Step 2: Once $x^0$ is known, I can immediately find the gradient of the function. At the $k$th iteration, find or determine the gradient $\nabla f(x^k)$, because this information is needed when finding the descent direction of the function.

Step 3: Compute the Hessian matrix of the function at $x = x^k$, the current value of the decision variables, and check it. If the Hessian $\nabla^2 f(x^k)$ is positive definite, then update the descent direction as

$$d_k = -\left[\nabla^2 f(x^k)\right]^{-1}\nabla f(x^k);$$

else, if it does not satisfy this condition, try

$$d_k = -\left[\mu_k I + \nabla^2 f(x^k)\right]^{-1}\nabla f(x^k) = -\,M_k^{-1}\,\nabla f(x^k),$$

where the bracketed matrix is the $M_k$ defined earlier. We need this because if the Hessian is not positive definite and we proceed anyway, the algorithm will not converge; by adding the positive quantity $\mu_k I$ and updating $d_k$ in this way, we have shown that the descent condition is satisfied when $M_k$ is positive definite.

Step 4: Once $d_k$ is known, find the optimal step size $\lambda_k = \lambda_k^*$; we already know how to do that, if you recollect. Since $d_k$ and $x^k$ are known, $x^{k+1} = x^k + \lambda_k d_k$ depends only on $\lambda_k$.
So, in the function you substitute $x = x^{k+1} = x^k + \lambda_k d_k$; the function then becomes a function of $\lambda_k$ alone, a single-variable function. Now the question is what the choice of $\lambda_k$ should be so that the function value decreases as much as possible along the descent direction; that is the point. If you differentiate it with respect to $\lambda_k$ and set the derivative to 0, you get the optimal value $\lambda_k^*$, as we have done earlier. The choice is as follows: if the Hessian of the function at the $k$th iteration is positive definite, then

$$\lambda_k^* = -\,\frac{\nabla f(x^k)^T\, d_k}{d_k^T\,\nabla^2 f(x^k)\, d_k};$$

else, if that is not true, only the denominator changes:

$$\lambda_k^* = -\,\frac{\nabla f(x^k)^T\, d_k}{d_k^T\left[\mu_k I + \nabla^2 f(x^k)\right] d_k}.$$

This is how you obtain the optimal step size $\lambda_k^*$. Once you have it, you can immediately find the updated value of the decision variables.

Step 5: Compute $x^{k+1} = x^k + \lambda_k^* d_k$ using the step size just obtained.

Step 6: Check against the tolerances whether the iteration should be stopped or not; this has to be checked after each iteration. Compute

$$\Delta f = f(x^k) - f(x^{k+1}), \qquad \Delta x = x^{k+1} - x^k,$$

where $\Delta f$, the difference of the function values at two successive iterations, is a scalar quantity, while $\Delta x$ is a vector of dimension $n \times 1$, the difference of the decision variable values at the $k$th and $(k+1)$th iterations. Now: (1) if $|\Delta f| < \varepsilon_1$, stop; it indicates that the function value is not changing at all. (2) For $\Delta x$ I cannot write the modulus as for the scalar $\Delta f$, since it is a vector; so either take the Euclidean norm or use the scalar quantity $\Delta x^T \Delta x$, and if $\Delta x^T \Delta x < \varepsilon_2$, stop; this indicates that the decision variable values are not changing. (3) A third criterion one can use involves the gradient: compute $\nabla f(x^{k+1})^T\,\nabla f(x^{k+1})$. What does this indicate? The gradient of the function is a vector: its first element is the slope of $f$ with respect to $x_1$ keeping all the other variables fixed, the next is the derivative with respect to $x_2$ keeping $x_1, x_3, \dots$ fixed, and so on; so this product is a row vector multiplied by a column vector, a scalar.
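Putting the steps together, here is a compact sketch of the whole modified Newton loop under the assumptions used above (NumPy; `f`, `grad`, `hess` are hypothetical user-supplied callables, and the step size is the exact one derived from the quadratic model):

```python
import numpy as np

def modified_newton(f, grad, hess, x0, eps1=1e-8, eps2=1e-12, eps3=1e-8,
                    mu0=1e-4, max_iter=100):
    """Modified Newton's method with the three stopping criteria above."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    for k in range(max_iter):
        g = grad(x)
        H = hess(x)
        # Step 3: make sure M_k = mu_k I + H is positive definite
        mu = 0.0
        while True:
            M = H + mu * np.eye(n)
            try:
                np.linalg.cholesky(M)
                break
            except np.linalg.LinAlgError:
                mu = mu0 if mu == 0.0 else 2.0 * mu
        d = -np.linalg.solve(M, g)              # descent direction d_k
        lam = -(g @ d) / (d @ M @ d)            # Step 4: optimal step size
        x_new = x + lam * d                     # Step 5: update the iterate
        # Step 6: the three stopping criteria
        df = f(x) - f(x_new)
        dx = x_new - x
        g_new = grad(x_new)
        if abs(df) < eps1 or dx @ dx < eps2 or g_new @ g_new < eps3:
            return x_new
        x = x_new
    return x
```

For a strictly convex quadratic objective this loop reaches the minimizer in a single step, since the quadratic model is then exact and $\lambda_k^* = 1$.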
If this gradient quantity $\nabla f(x^{k+1})^T\,\nabla f(x^{k+1})$ is less than $\varepsilon_3$, that is, if it is very small, it means we have approached the minimum value of the function, where the slopes, the gradients, are almost 0; it indicates that we have reached convergence. So these are the algorithmic steps of the modified, or conventional, Newton's method. Next: I told you there is a difficulty if the Hessian matrix of the function is not positive definite, and we then modified that matrix by adding a scalar quantity $\mu_k$ multiplied by an identity matrix of the same size as the Hessian. That is one way of doing it; there is another way, and we have already mentioned the drawbacks of Newton's method. One drawback is that the Hessian, the second derivative of the function, must be positive definite: if it is positive definite we are moving in a descent direction of the function; if it is not, we are moving away from the descent direction, and the algorithm will diverge whenever this condition is not satisfied. Another disadvantage of Newton's method is that at every iteration we have to take the inverse of the Hessian matrix $\nabla^2 f(x^k)$, and the computational burden involved is large if the number of variables is large; it is time consuming to determine the search direction because of the dimension of the matrix being inverted. So next we will take up what is called the quasi-Newton method. Again, this is a Newton-like method, similar to Newton's method. The quasi-Newton method takes advantage of the steepest descent information and also of the information used in Newton's method, where, as we have seen, the Hessian appears and its inversion has to be done again and again; you can say this method has the desirable features of both steepest descent and Newton's method. But here you will see that it does not take the inverse of the Hessian directly; in place of taking the inversion, it builds the inverse up iteratively, to avoid inverting the Hessian. So what is this method? A natural extension of Newton's method is to replace the inverse of the Hessian at each iteration by some matrix, and, as I told you, that matrix should be positive definite. Say the matrix is $S_k$, of dimension $n \times n$, the same dimension as the Hessian; I am replacing the Hessian inverse by the matrix $S_k$, the subscript $k$ denoting the $k$th iteration, and this matrix should be positive definite in order to move in descent directions.
So what we are doing here is this: the value of the decision variables at the $(k+1)$th iteration is obtained from the value at the $k$th iteration as

$$x^{k+1} = x^k + \lambda_k d_k,$$

where $\lambda_k$ is the step size along the descent direction, whose optimal value we have already shown how to find, and where the Hessian inverse of Newton's method is now replaced by $S_k$, so that the search direction is

$$d_k = -\,S_k\,\nabla f(x^k),$$

with $S_k$ a positive definite matrix standing in place of $\left[\nabla^2 f(x^k)\right]^{-1}$. So long as $S_k$ is a positive definite matrix, the condition for a descent direction of the function is satisfied. As before, $\lambda_k$ can be determined by minimizing $f(x^k + \lambda_k d_k)$ with respect to $\lambda_k$; we have repeatedly discussed what the optimal choice is so that the function value is minimized at that iteration. Now $S_k$ is known at the current iteration, but at the $(k+1)$th iteration we would again need the Hessian inverse at $x = x^{k+1}$, and that inversion is exactly what we want to avoid. So $S_k$ has to be updated to $S_{k+1}$; the next question is how to update $S_k$ at every iteration. I will write the update first and then show the proof:

$$S_{k+1} = S_k + \frac{\left(\delta_k - S_k\gamma_k\right)\left(\delta_k - S_k\gamma_k\right)^T}{\gamma_k^T\left(\delta_k - S_k\gamma_k\right)},$$

where $\delta_k$ and $\gamma_k$ will be defined in a moment. Mind the dimensions: $S_k$ is $n \times n$ because it replaces the Hessian inverse; $\delta_k - S_k\gamma_k$ is a column vector with $n$ rows and one column; its transpose is a row vector with one row and $n$ columns; and the product of the two, which is added to $S_k$, must therefore be a matrix of dimension $n \times n$.
So, to repeat clearly what has been written: the numerator $(\delta_k - S_k\gamma_k)(\delta_k - S_k\gamma_k)^T$ is an $n \times n$ matrix, and the whole thing is divided by $\gamma_k^T(\delta_k - S_k\gamma_k)$, a row vector of dimension $1 \times n$ multiplied by a column vector of dimension $n \times 1$, which must be a scalar quantity, because we cannot divide a matrix by another matrix. Now let me write what $\delta_k$ is, and then the dimensions will be clear: $\delta_k$ is the difference in the decision variables at two successive iterations,

$$\delta_k = x^{k+1} - x^k,$$

whose dimension, as you know, is $n \times 1$; and $\gamma_k$ is the difference in the gradient values at two successive iterations,

$$\gamma_k = \nabla f(x^{k+1}) - \nabla f(x^k),$$

whose dimension is also $n \times 1$. If you now check the dimensions, you will see that the correction term is indeed a matrix. This is how $S_k$ is updated. If you recollect, the quasi-Newton method is nothing but Newton's method in which the inverse of the Hessian is replaced by the matrix $S_k$; multiplied by the gradient, and with the minus sign in front, this gives the descent direction $d_k = -S_k\nabla f(x^k)$, just as $d_k = -\left[\nabla^2 f(x^k)\right]^{-1}\nabla f(x^k)$ was the descent direction before. Once $S_k$ is known, it is updated for the next iteration using the expression above. Now, the question is how we got this relationship. I will derive the basic steps here; for further study you may take the reference Mathematical Programming: Theory and Algorithms, M. Minoux, John Wiley and Sons, 1986. So next are the basic steps of the derivation, showing how this update expression is written when the Hessian inverse is replaced by $S_k$.
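Before the derivation, here is a minimal sketch of the complete quasi-Newton iteration as described so far; it uses the rank-one update of $S_k$ given above (the symmetric rank-one form), with NumPy and hypothetical `f` and `grad` callables. The backtracking line search and the small-denominator safeguard are my own assumptions, since the lecture prescribes an exact minimization over $\lambda_k$:

```python
import numpy as np

def quasi_newton_rank_one(f, grad, x0, tol=1e-8, max_iter=200):
    """Quasi-Newton method with the rank-one update of the inverse-Hessian
    approximation S_k (a sketch under the assumptions stated above)."""
    x = np.asarray(x0, dtype=float)
    S = np.eye(x.size)                      # initial guess for the Hessian inverse
    g = grad(x)
    for k in range(max_iter):
        if g @ g < tol:                     # gradient nearly zero: converged
            break
        d = -S @ g                          # search direction d_k = -S_k grad f(x^k)
        lam = 1.0                           # backtracking line search (assumption)
        while f(x + lam * d) > f(x) + 1e-4 * lam * (g @ d) and lam > 1e-12:
            lam *= 0.5
        x_new = x + lam * d
        g_new = grad(x_new)
        delta = x_new - x                   # delta_k = x^{k+1} - x^k
        gamma = g_new - g                   # gamma_k = grad f(x^{k+1}) - grad f(x^k)
        u = delta - S @ gamma
        denom = gamma @ u                   # scalar gamma_k^T (delta_k - S_k gamma_k)
        if abs(denom) > 1e-12:              # skip the update if the denominator is tiny
            S = S + np.outer(u, u) / denom  # rank-one correction to S_k
        x, g = x_new, g_new
    return x
```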
The derivation starts from the following observation: the difference in the values of a function at two successive points carries information about its first derivative, and in the multivariable case the difference of the function values at two successive points carries information about the gradient of the function. Similarly, I can say that the difference in the gradient of a function at two successive points carries information about the second derivative, that is, about the Hessian matrix. In short: the difference of the gradients of a function at two successive points contains information about the second-order derivative of the function between the two points, in the same way as the difference in the function values at two consecutive points carries information about the first derivative. For an $n$-variable function we can write

$$\nabla^2 f(x^{k+1})\left(x^{k+1} - x^k\right) \approx \nabla f(x^{k+1}) - \nabla f(x^k),$$

so the difference of the gradients at two successive points carries the information of the second derivative of the function; in the multivariable case this difference in gradients is the Hessian matrix multiplied by the difference vector. In the single-variable case you can easily understand this, because you can divide the difference in derivatives by the difference in $x$, since $x$ is a single quantity; in the multivariable case I cannot divide by a vector, so the relation is written in this form. That is the basic step. Now you can see that, rearranging,

$$x^{k+1} - x^k = \left[\nabla^2 f(x^{k+1})\right]^{-1}\left(\nabla f(x^{k+1}) - \nabla f(x^k)\right),$$

that is,

$$x^{k+1} = x^k + \left[\nabla^2 f(x^{k+1})\right]^{-1}\left(\nabla f(x^{k+1}) - \nabla f(x^k)\right).$$

The inverse of the Hessian at the $(k+1)$th iteration is the matrix I have denoted by $S_{k+1}$, just as the inverse at the $k$th iteration was replaced by $S_k$. So today I will stop here; the continuation of this lecture will be in the next class. Thank you.