In the last class we discussed the unconstrained optimization problem using numerical techniques. We saw that a numerical technique is essentially an iterative process, and the objective of the iterative optimization process is to find the minimum value of the cost function. To recap what we did in the last class: suppose this is the function we have to minimize, and at the kth iteration we are at some point, call it x^(k), with function value f(x^(k)). From the kth point we have to move in such a direction that the function value at the (k+1)th iteration will be less than the function value at the kth iteration. We found the condition on the direction of movement from the kth iterate to the (k+1)th iterate, and the direction itself is nothing but the negative of the gradient of the function at that point, d^(k) = -∇f(x^(k)).

We have also seen that if we have a function f of n variables x_1, x_2, ..., x_n that is to be minimized, we generate the iterates x^(0), x^(1), x^(2), ... through the expression

x^(k+1) = x^(k) + λ_k d^(k),

where d^(k) is the direction in which we move from the kth iteration to the (k+1)th iteration and λ_k is a scalar step size with λ_k > 0. When the value of λ_k is pre-determined, the algorithm that moves along such a search direction is simply called the gradient method. But λ_k can also be optimized, and this optimization can be done through any numerical minimization technique, because when we substitute x = x^(k+1) into the function, the whole function becomes a function of λ_k alone, a single variable. So we can find the optimized value λ_k = λ_k* such that the function value is minimized, and with the choice λ_k = λ_k* we get the function value as small as possible at that iteration.

We also derived how to find the optimal step size when we move from the kth iteration to the (k+1)th iteration:

λ_k* = -∇f(x^(k))^T d^(k) / (d^(k)^T ∇²f(x^(k)) d^(k)),

that is, the gradient of the function transposed into d^(k), divided by the quadratic form d^(k)^T ∇²f(x^(k)) d^(k), in which ∇²f(x^(k)) is the matrix of second partial derivatives of the function. Note the minus sign in this choice of λ_k, and note that λ_k > 0. Look at the numerator: ∇f(x^(k))^T d^(k) is nothing but the condition on the direction in which to move so that the function value decreases from the kth iteration to the (k+1)th iteration, namely ∇f(x^(k))^T d^(k) < 0. Less than 0 means negative, and a negative quantity preceded by the minus sign becomes positive.
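To make this recap concrete, here is a minimal sketch in Python of a single gradient step with the optimal step size. The quadratic test function (the matrix Q and vector b), the starting point, and the variable names are illustrative assumptions, not something fixed by the lecture.

```python
import numpy as np

# A minimal sketch of one gradient step with the optimal step size,
# using an assumed quadratic test function f(x) = 0.5 x^T Q x + b^T x,
# whose gradient is Qx + b and whose Hessian is the constant matrix Q.
Q = np.array([[4.0, 1.0],
              [1.0, 3.0]])      # symmetric positive definite Hessian
b = np.array([-1.0, -2.0])

def f(x):
    return 0.5 * x @ Q @ x + b @ x

def grad(x):
    return Q @ x + b

x_k = np.array([2.0, 2.0])      # current iterate x^(k)
g_k = grad(x_k)
d_k = -g_k                      # steepest descent direction d^(k) = -grad f

assert g_k @ d_k < 0            # descent condition: grad^T d < 0

# Optimal step size: lambda* = -(grad^T d) / (d^T Hessian d)
lam_star = -(g_k @ d_k) / (d_k @ Q @ d_k)
x_next = x_k + lam_star * d_k

print(lam_star, f(x_k), f(x_next))   # f decreases: f(x^(k+1)) < f(x^(k))
```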
So λ_k is positive. And this value was obtained by differentiating: setting the derivative of f(x^(k) + λ_k d^(k)) with respect to λ_k to zero gives the λ_k value above. And if you take the second derivative of f(x^(k) + λ_k d^(k)) with respect to λ_k, then, since the function along the line is a function of λ_k only, differentiating once more leaves only one part, the quantity d^(k)^T ∇²f(x^(k)) d^(k), and this quantity is a positive definite quadratic form. Because it is positive, the function value has indeed been made as small as possible with that choice of λ_k.

So this is what we have discussed so far. Now, as I told you, when the step size λ_k = λ_k* is optimized at each iteration, the method is called the steepest descent method, or the steepest gradient method. So λ_k has to be optimized so that the function value, which along the line is a function of λ_k only, will be as small as possible.

Let us see the algorithmic steps for the steepest descent method (a sketch in code is given after the stopping criteria below). First we have to pick the starting point of our iterations, the initial guess. Step 1: choose a starting point x^(0); the superscript indicates the iteration number, so x^(0) is the value of x at the 0th, starting, point, which you choose. Let k = 0, and let ε_1, ε_2 and ε_3 be the stopping criteria of the algorithm. That is step 1. Step 2: in general, at the kth step, determine the gradient of the objective function (cost function) f(x), where x is an n × 1 vector of variables. Our job, you see, is the minimization of that objective function: minimize f(x) using the steepest descent method. Descent means that when you move from one iteration to the next, the function value should decrease from the just-previous iteration; the function values move downward. So first you calculate the gradient of this function at the kth iterate, ∇f(x^(k)); and second you calculate the descent direction, the direction in which you have to move so that the function value decreases from its just-previous value:

d^(k) = -∇f(x^(k)).

This is called the search direction.
That is what we find in step 2. Then step 3: once you know d^(k), you can easily find the value of the decision variables at the (k+1)th iteration,

x^(k+1) = x^(k) + λ_k d^(k).

In that expression, d^(k) we have calculated, x^(k) we know (the value of the decision variables at the initial or kth iteration), and as for λ_k: if λ_k is pre-determined, this is the gradient descent method (the gradient method, in short); in the steepest descent method you have to optimize the value of λ_k. In other words, you have to optimize the step size, the scalar quantity λ_k. In what sense do you minimize over the step size? In the sense that at some value λ_k = λ_k* the function value will be as small as possible in that iteration. We showed in the last class how to find the optimal step size: if you recollect, we derived it by substituting x = x^(k+1), where x^(k+1) = x^(k) + λ_k d^(k), which gives

λ_k* = -∇f(x^(k))^T d^(k) / (d^(k)^T ∇²f(x^(k)) d^(k)),

that is, the gradient of the function at the kth iterate transposed into d^(k), divided by d^(k) transposed into the Hessian matrix of the function at the kth iterate into d^(k). This is the optimal step size, which gives the function value as small as possible at that iteration.

Step 4: once you know this, x^(k+1) = x^(k) + λ_k* d^(k). Note that at each iteration the value of λ_k will change, since we are optimizing the function value, which is a function of λ_k only. Here d^(k) is known, λ_k* was just calculated in step 3, and the value of the decision variables at the kth iteration is also known, so you now know the value of the decision variables at the (k+1)th iteration. In this way you repeat the process. But when do you stop this iterative process? Step 5: calculate Δf (mind it, I am writing Δf), the change in function value,

Δf = f(x^(k+1)) - f(x^(k)),

the function value at the (k+1)th iteration minus the function value at the kth iteration. Also calculate Δx, the change in the decision variables,

Δx = x^(k+1) - x^(k).

Δf is a scalar quantity, as you know, and Δx is a vector of dimension n × 1, because our decision variables are the n variables x_1, x_2, ..., x_n. Then we can stop as follows: since Δf is a scalar quantity we take its absolute value, and if |Δf| = |f(x^(k+1)) - f(x^(k))| is less than ε_1, a pre-assigned positive value which is very small, we stop.
If it is less than this, stop: it indicates that the function value is no longer decreasing. There may be another criterion. Δx is a vector of dimension n × 1; it is nothing but the difference between the decision variable values at the (k+1)th iteration and at the kth iteration. If you form Δx^T Δx, this is a positive quantity, so you need not take an absolute value; in short you can write it as the Euclidean norm squared, ‖Δx‖². If this is less than ε_2 (again, ε_1, ε_2 and ε_3 were considered at the beginning, in step 1; they are pre-assigned values which are very small), then stop: this indicates that the decision variables (design variables) are not changing much.

But note that there may be some function whose value changes very slowly even though it has not reached its optimum; if you stop on the function-value criterion alone, you will land up with a wrong answer. There may likewise be a situation where Δx changes very little from one point to another while the change in the function value is still large. So if you stop on one criterion alone there is a problem, and commonly both criteria are taken into account together to stop the iterative process.

Another criterion is the convergence criterion: take the gradient of the function at the (k+1)th iteration and form ∇f(x^(k+1))^T ∇f(x^(k+1)). The gradient of a function is a column vector and its transpose is a row vector, so this product is a scalar quantity, and that scalar is positive, just like the norm of a vector. If this quantity is less than ε_3, where ε_3 > 0 but very small, then stop: it means our algorithm has converged. In all these cases, as I mentioned, ε_1, ε_2 and ε_3 are positive quantities, greater than 0 and very small. You have to check these criteria at each iteration. Once you have checked them, the iteration counter is updated, k+1 replacing k, and then you go back to step 2: the decision variable value x^(k+1) at the (k+1)th iteration is now known, so you come back to step 2 and proceed through steps 2 to 5 until these criteria are satisfied. That is the iterative process; that is the algorithm. A complete sketch of these steps in code is given below.
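Collecting steps 1 to 5, the following is a minimal runnable sketch of the steepest descent algorithm as described above. The quadratic test function, the starting point, and the tolerance values are illustrative assumptions.

```python
import numpy as np

# A minimal sketch of steps 1-5 of the steepest descent method.
# The quadratic test function (Q, b), the starting point, and the
# tolerance values are illustrative assumptions.
Q = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([-1.0, -2.0])

f    = lambda x: 0.5 * x @ Q @ x + b @ x
grad = lambda x: Q @ x + b
hess = lambda x: Q                     # Hessian is constant for a quadratic

# Step 1: starting point and stopping tolerances
x = np.array([2.0, 2.0])
eps1 = eps2 = eps3 = 1e-8

for k in range(100):
    g = grad(x)                        # Step 2: gradient at x^(k)
    d = -g                             # descent (search) direction
    lam = -(g @ d) / (d @ hess(x) @ d) # Steps 3-4: optimal step size
    x_new = x + lam * d                # x^(k+1) = x^(k) + lambda* d^(k)

    # Step 5: stopping criteria (function value, variables, gradient)
    df = f(x_new) - f(x)
    dx = x_new - x
    g_new = grad(x_new)
    x = x_new                          # k <- k+1, go back to step 2
    if abs(df) < eps1 or dx @ dx < eps2 or g_new @ g_new < eps3:
        break

print(k, x, f(x))
```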
Now some remarks on the steepest descent method. One: the steepest descent method has the descent property. Why? Because we have selected the search direction in such a way that the descent condition is satisfied. What condition have we shown? If you recollect, it is that ∇f(x^(k))^T d^(k) must be less than 0, and it is with that condition that we selected d^(k). That is why the descent property is satisfied. The descent property means that the value of the function at the kth iteration is greater than the function value at the (k+1)th iteration; that is, the function value at the (k+1)th iteration is less than the function value at the kth iteration. So the descent property is satisfied. Two: convergence is guaranteed. Three, regarding the convergence rate: the order of convergence is 1. So convergence is quite slow for this method, though it is much better than the plain gradient method.

Next we will look at another technique which is much faster than the steepest descent method, called the conjugate gradient method. We have noticed that the crucial factor is the selection of the search direction, and the main question is how to select the search direction so that the function value at the (k+1)th iteration will be less than the function value at the kth iteration, and moreover so that the function value becomes as small as possible at each iteration. The conjugate gradient method is just the steepest descent method with a minor modification, and that modification improves the performance of the algorithm drastically; it is only a slight change to the steepest descent algorithm. You can say the conjugate gradient method is a small modification to the steepest descent method with an enormous effect on the performance. In what sense is the effect enormous? The rate of convergence of the conjugate gradient method is much faster than that of the steepest descent method. What we do here is take some history into account, the search direction of the previous iteration, when finding the present search direction. Taking this into account, we can write the algorithm straight away, and I will explain through the algorithm what the conjugate gradient method is; its rate of convergence is faster than steepest descent for this one reason, that we take the history of the previous search direction while computing the present search direction.

Algorithmic steps for the conjugate gradient method. Our problem is: minimize f(x), where x is an n × 1 vector. Why do we always say minimize? Because we move in a descent direction from the kth iteration to the (k+1)th iteration, in such a way that the function value at the (k+1)th iteration is less than the function value at the kth iteration. So it is a minimization problem, and if we keep moving this way we will reach the minimum value of the function.
This idea can also be applied to finding the maximum value of a function. How? A maximization problem can be reformulated by multiplying the cost function by -1: after multiplying by -1, maximization of f(x) can be thought of as minimization of -f(x). So we can do that, but throughout we work with minimization, because we are looking for a search direction that is a descent direction. So this f we have to minimize.

The first step is the same as earlier. Step 1: choose a starting point x^(0), let k = 0, and assign ε_1, ε_2 and ε_3 as the stopping criteria, the same as in the steepest descent method. In step 2 you will see what we do differently; I told you there is a small modification to the steepest descent method, and here it is. Step 2: at the kth step or iteration, determine the gradient of the function at the kth iterate, ∇f(x^(k)); that is, substitute the design variable values at the kth iteration into the gradient. Step 3: compute the new conjugate direction as

d^(k) = -∇f(x^(k)) + β_k d^(k-1).

Look: the first term by itself is nothing but our steepest descent method, but we are adding another term that carries the history of the search direction of the previous iteration (we will use a different notation here, β_k rather than λ_k). Here d^(k-1) is the search direction of the previous iteration, and β_k is a constant with β_k > 0, so the product β_k d^(k-1) is nothing but the scaled search direction of the previous iteration. So naturally I am taking some information from the previous search direction and adding it to the present search direction, and this d^(k) improves the rate of convergence of the algorithm. Here β_k is given by

β_k = ∇f(x^(k))^T ∇f(x^(k)) / (∇f(x^(k-1))^T ∇f(x^(k-1))),

a scalar quantity: you find the squared norm of the gradient at the kth iteration and divide it by the squared norm of the gradient at the previous, (k-1)th, iteration. The numerator is positive and the denominator is positive, so the result is a positive quantity: β_k > 0. Now, looking at this expression, what is the guarantee that if you move in the direction d^(k) you are moving in a descent direction? That is, what is the guarantee that the function value at the (k+1)th iteration will be less than the function value at the kth iteration?
You can check this by multiplying both sides by the gradient transposed. What is the condition for a descent direction? If you recollect, the condition is that ∇f(x^(k))^T d^(k) must be less than 0; this is the condition under which the function value at the (k+1)th iteration will always be less than the function value at the kth iteration. So multiply both sides of the direction update by ∇f(x^(k))^T:

∇f(x^(k))^T d^(k) = -∇f(x^(k))^T ∇f(x^(k)) + β_k ∇f(x^(k))^T d^(k-1).

The first quantity, ∇f(x^(k))^T ∇f(x^(k)), is positive and preceded by a minus sign, so that term is negative. But what about the second term? I just mentioned that β_k > 0; and when the previous step size was chosen optimally, the line minimization makes the new gradient orthogonal to the old search direction, ∇f(x^(k))^T d^(k-1) = 0, so the second term contributes nothing. Hence the whole quantity ∇f(x^(k))^T d^(k) is less than 0, and we are moving in a descent direction. So in step 3 we have found the descent direction: multiply both sides by ∇f(x^(k))^T and the whole expression is negative, which means we are moving in a descent direction.

Step 4: now you can comfortably calculate the value of the decision variables at the (k+1)th iteration, x^(k+1) = x^(k) + λ_k d^(k), and at this step we have to find the optimal choice of step size so that the function value will be as small as possible at that iteration. We know how to select λ_k = λ_k*: it is determined by minimizing f(x^(k+1)) = f(x^(k) + λ_k d^(k)), which, once we substitute the value of x^(k+1), is a function of λ_k only. This implies, as we derived earlier,

λ_k* = -∇f(x^(k))^T d^(k) / (d^(k)^T ∇²f(x^(k)) d^(k)),

where ∇²f(x^(k)) is the Hessian matrix, the second partial derivatives of the function at the kth iterate. So at that step you have to calculate the optimal step size. Once you have calculated it, you know d^(k) and you know x^(k), and that gives you x^(k+1): with this value we move from the kth iteration to the (k+1)th iteration, and the function value is as small as possible for that choice.

Next are our stopping criteria, and they are the same as we discussed earlier for the steepest descent method. Define Δf = f(x^(k+1)) - f(x^(k)) and Δx = x^(k+1) - x^(k), and then check the criteria. One criterion: since Δf, the change in function value, is a scalar quantity, if |Δf| < ε_1, a positive and very small pre-assigned tolerance, maybe 10^-6 or 10^-4, then stop; you know the meaning of this one: the function value is not changing from one iteration to the next. If Δx^T Δx, where Δx is n × 1, is less than ε_2, stop: this indicates that the decision (design) variables are not changing. Similarly, form the gradient at the (k+1)th iteration times its transpose, ∇f(x^(k+1))^T ∇f(x^(k+1)); the gradient of the function is a vector, and this is that vector's squared length. If it is less than ε_3, a very small positive quantity, then stop: it has converged. It indicates that the gradient is essentially 0, which means we have reached the optimum value of the function at that iteration. A sketch of these conjugate gradient steps in code follows.
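Here is a minimal sketch of these conjugate gradient steps; this choice of β_k is known as the Fletcher-Reeves formula. The quadratic test function, starting point, and tolerance are again illustrative assumptions, with an exact line search as in step 4.

```python
import numpy as np

# A minimal sketch of the conjugate gradient steps described above
# (Fletcher-Reeves beta_k, exact line search). The quadratic test
# function (Q, b), starting point, and tolerance are assumptions.
Q = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([-1.0, -2.0])

f    = lambda x: 0.5 * x @ Q @ x + b @ x
grad = lambda x: Q @ x + b

x = np.array([2.0, 2.0])          # step 1: starting point
eps = 1e-10
g = grad(x)
d = -g                            # first direction: plain steepest descent

for k in range(50):
    lam = -(g @ d) / (d @ Q @ d)  # step 4: optimal step size
    x = x + lam * d               # x^(k+1) = x^(k) + lambda* d^(k)
    g_new = grad(x)
    if g_new @ g_new < eps:       # step 5: gradient nearly zero -> stop
        break
    beta = (g_new @ g_new) / (g @ g)   # beta_k = ||grad_k||^2 / ||grad_(k-1)||^2
    d = -g_new + beta * d         # step 3: new conjugate direction
    g = g_new

print(k, x, f(x))
```

For a quadratic in n variables, conjugate gradient terminates in at most n iterations in exact arithmetic, which is the enormous improvement in convergence rate referred to above.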
So that is our algorithm. What you have to do now is update your iteration, check these criteria at every iteration, and go back to step 2: in this conjugate gradient method too, you come back and repeat steps 2, 3, 4, 5 and so on. The difference from the steepest descent method is step 3: at the present iteration we take the gradient-based search direction and, in addition, we take into account the scaled search direction of the previous iteration, β_k d^(k-1). This d^(k) still ensures that if you move this way and reach the (k+1)th iterate, the function value decreases from the function value at the kth iteration; as I told you, multiply both sides by ∇f^T and the resulting expression is less than 0, which is our necessary condition for a descent direction.

Next is Newton's method. As you know from earlier, if you have a scalar function f(x) of a single variable and you are asked to find the roots of f(x) = 0, you can do it analytically or by an iterative process, and one such iterative process is Newton's method. What do we do? We take some initial guess, and then we find the improved value of the variable as

x^(k+1) = x^(k) - f(x^(k)) / f'(x^(k)),

the previous iterate minus the function value at x^(k) divided by the derivative at x^(k); as I told you, this is the single-variable case. You repeat this, and when it has converged (you can apply the same kind of convergence criteria), the converged value of x is the solution, the root of the function. That we know, and the same concept we can apply here to find the minimum value of the function f(x); a small sketch of the root-finding iteration is given below.
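As a quick illustration of the root-finding iteration, here is a minimal sketch; the test function f(x) = x² - 2 (whose positive root is √2) and the initial guess are illustrative assumptions.

```python
# A minimal sketch of Newton's method for the single-variable
# root-finding problem f(x) = 0; the test function and the
# initial guess are illustrative assumptions.
f      = lambda x: x**2 - 2.0
fprime = lambda x: 2.0 * x

x = 1.0                     # initial guess
for k in range(20):
    x_new = x - f(x) / fprime(x)   # x^(k+1) = x^(k) - f(x^(k)) / f'(x^(k))
    if abs(x_new - x) < 1e-12:     # converged: iterates no longer changing
        x = x_new
        break
    x = x_new

print(k, x)                 # converges to sqrt(2) ~ 1.41421356...
```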
So now our job is: minimize f(x), where x is a vector with n components; f is a function of x_1, x_2, ..., x_n. This function we have to minimize, and the whole concept of finding the roots of a function we will use here to minimize this objective (cost) function using Newton's method. Let us assume that the function f(x), with x an n × 1 vector, is twice differentiable; that means the second derivatives of this function can be computed. (This assumption is also needed for the other algorithms we have discussed here.) Now what do we do? Let us approximate the function f(x) in a neighborhood of x = x^(k); that means, around the design (decision) variable values at the kth iteration, we want to approximate the function value. We can do that by truncating the Taylor series of the function. So I approximate the function value in the neighborhood of x^(k), writing x - x^(k) = Δx; just as in the Taylor series, if you are at x^(k), a neighboring point lies at a displacement Δx on one side or the other. Truncating the series after the second-order term, I can write

f(x) ≈ f(x^(k)) + ∇f(x^(k))^T (x - x^(k)) + (1/2!) (x - x^(k))^T ∇²f(x^(k)) (x - x^(k)).

Now see: this is a quadratic function. The last term is Δx^T times a matrix times Δx, where the matrix is the Hessian, a symmetric matrix; it is something like x^T P x. The middle term is a known vector (a row vector, once transposed) times x, and f(x^(k)) is a scalar; all these quantities are scalars. Altogether this is a quadratic form in x, because x^(k) you already know. The middle term is something like b^T x; since it is a scalar quantity I can take the transpose on both sides, so (x - x^(k))^T ∇f(x^(k)) is the same thing. A small numerical check of this approximation is given below.
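To illustrate, here is a small sketch that builds this quadratic model q(x) around a point for an assumed non-quadratic test function and checks that it agrees with f near that point; the function, the expansion point, and the displacement are illustrative assumptions.

```python
import numpy as np

# A small sketch of the truncated Taylor (quadratic) model
#   q(x) = f(x_k) + grad(x_k)^T dx + 0.5 dx^T H(x_k) dx,  dx = x - x_k,
# for an assumed test function; x_k is an arbitrary expansion point.
f    = lambda x: x[0]**4 + x[0]*x[1] + (1 + x[1])**2
grad = lambda x: np.array([4*x[0]**3 + x[1],
                           x[0] + 2*(1 + x[1])])
hess = lambda x: np.array([[12*x[0]**2, 1.0],
                           [1.0,        2.0]])

x_k = np.array([0.7, -0.5])

def q(x):
    dx = x - x_k
    return f(x_k) + grad(x_k) @ dx + 0.5 * dx @ hess(x_k) @ dx

# Near x_k the model agrees with f up to third order in ||dx||
x_near = x_k + np.array([0.01, -0.02])
print(f(x_near), q(x_near))   # nearly equal
```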
So in this quadratic form the gradient ∇f(x^(k)) plays the role of the known constant vector b, and f(x^(k)) is a known scalar constant. So this is a quadratic form. If I now denote this whole approximation by q(x), I can write

q(x) = f(x^(k)) + ∇f(x^(k))^T (x - x^(k)) + ½ (x - x^(k))^T ∇²f(x^(k)) (x - x^(k)).

This is a quadratic function, and it is a function of x only, because I know x^(k), the kth-iteration value of the decision variables. So I can minimize this function, and minimizing this function means I am approximately minimizing f(x). For the minimization of this function, the first (necessary) condition is that you set its gradient to 0. Once you do that, from there you will find our variable here, Δx or x, and from that you will find the value of the decision variables at the (k+1)th iteration. I will discuss this portion in detail in the next class. Thank you.