Let us recap the optimality condition for a function of two variables. We have a function f of two variables, and (x1*, x2*) is the optimal point. Around this optimal point we give a perturbation: x1 is perturbed by epsilon_1 and x2 by epsilon_2, and the perturbation is very small. If you do the Taylor series expansion about the optimal point, we get the function value at the optimal point, the first order terms, the second order terms, and all the remaining higher order terms gathered into a remainder R; this R is sufficiently small compared to the preceding terms when the perturbation stays very close to (x1*, x2*).

If you bring f(x1*, x2*) to the left hand side, the left hand side is the change in function value when we perturb about the optimum, and that change is nothing but the gradient of the function multiplied by the perturbation vector, plus the second order terms. Now look at the first order term: it is a scalar, but we cannot say whether it will be positive or negative, because epsilon_1 and epsilon_2 can each be positive or negative. So this term is assigned to 0, and that is the necessary condition. The remaining second order terms can easily be written in quadratic form: one over 2 factorial times epsilon transpose H epsilon, where epsilon is the vector with components epsilon_1 and epsilon_2, and H is the Hessian matrix, the second derivative of the function, evaluated at x1 = x1* and x2 = x2*. In other words, H is the gradient of the gradient of f. The remainder R can be neglected.

Since we have assumed (x1*, x2*) is the optimal point, the sign of this quadratic term decides the nature of the optimum. If we take H = del^2 f(x) evaluated at x1 = x1* and x2 = x2*, then everything depends on this matrix H: if H is a positive definite matrix, then (1/2!) epsilon transpose H epsilon > 0, which means f(x1* + epsilon_1, x2* + epsilon_2) - f(x1*, x2*) > 0 for every small perturbation, provided H > 0 in the sense of being positive definite, where H is the Hessian matrix, the derivative of the gradient, evaluated at x1 = x1* and x2 = x2*.
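The board work just described can be written compactly; here is a LaTeX restatement of the expansion, with epsilon = (epsilon_1, epsilon_2) transpose:

```latex
f(x_1^* + \epsilon_1,\, x_2^* + \epsilon_2) - f(x_1^*, x_2^*)
  = \nabla f(\mathbf{x}^*)^{\mathsf T} \boldsymbol{\epsilon}
  + \frac{1}{2!}\, \boldsymbol{\epsilon}^{\mathsf T} H \, \boldsymbol{\epsilon} + R,
\qquad
H = \nabla^2 f(\mathbf{x}) \big|_{x_1 = x_1^*,\; x_2 = x_2^*}
```

Because the components of epsilon can take either sign, the first order term can only vanish at an optimum (the necessary condition), and the sign of the quadratic term, fixed by the definiteness of H, then decides whether x* is a minimum or a maximum.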
If this difference is positive, it indicates that the function has reached its minimum value at that point: f(x1* + epsilon_1, x2* + epsilon_2) - f(x1*, x2*) > 0 provided H is a positive definite matrix. Similarly, if f(x1* + epsilon_1, x2* + epsilon_2) - f(x1*, x2*) < 0, which holds provided H is a negative definite matrix, we have the maximum value of the function at x1 = x1* and x2 = x2*. So the first condition is for a minimum of f(x) and the second for a maximum.

Now we can restate the problem as a theorem on the necessary and sufficient conditions for a local minimum or maximum. Necessary condition: assign the gradient of the function to zero, del f / del x = 0, where x is now in general of dimension n x 1 with components x1, x2, ..., xn. Sufficient condition: form the Hessian matrix H = del^2 f / del x^2 evaluated at x = x*. If this matrix is greater than 0, meaning positive definite, then the function has its minimum value at x = x*. Similarly, if H is less than 0, meaning negative definite, this implies the function value attained at x = x* is the maximum value. These are the necessary and sufficient conditions for a function of n variables, x of dimension n x 1.

Let us take one simple example quickly. Suppose you are asked to find the optimum value of a function, whether it is a minimum or a maximum. The function is of two variables x1 and x2: f(x) = 2 x1^2 + 4 x1 x2 + 4 x2^2 - 4 x1 + 2 x2 + 16. The necessary condition, according to the theorem we proved, is that the gradient of f(x) is zero: the vector with components del f / del x1 and del f / del x2 equals 0. Differentiating f with respect to x1 gives 4 x1 + 4 x2 - 4 = 0, and differentiating f with respect to x2 gives 4 x1 + 8 x2 + 2 = 0. Solving this set of two algebraic equations, we get the stationary point x1* = 2.5, x2* = -1.5. Now we have to see whether the Hessian matrix at this point is positive definite, negative definite, positive semidefinite, or negative semidefinite. So let us find the value of the Hessian matrix.
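Since the gradient of this quadratic is linear in x, the stationary point can be checked numerically by solving a 2 x 2 linear system; a minimal sketch, with the matrix A and vector b read off from the two gradient equations above:

```python
import numpy as np

# grad f = A x + b for f(x) = 2*x1^2 + 4*x1*x2 + 4*x2^2 - 4*x1 + 2*x2 + 16
A = np.array([[4.0, 4.0],
              [4.0, 8.0]])
b = np.array([-4.0, 2.0])

# Necessary condition grad f = 0  =>  A x = -b
x_star = np.linalg.solve(A, -b)
print(x_star)   # [ 2.5 -1.5]
```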
So the necessary condition gives us the stationary point; at this point the function may have a minimum or a maximum, and the Hessian may be positive definite, negative definite, positive semidefinite, or negative semidefinite. We therefore have to check further with the sufficient condition. The Hessian matrix is the gradient of the gradient: differentiate the gradient of the function once again with respect to x, and evaluate the result at x = x*. Its entries are del^2 f / del x1^2, del^2 f / del x1 del x2, and so on; we have already written this so many times that, without explaining in detail, I write the expression for the Hessian and evaluate it at x1 = x1* = 2.5 and x2 = x2* = -1.5. We have already differentiated once when forming the gradient; differentiate the first component again with respect to x1 and then x2, and likewise the second component. All these second derivatives are constants here, so the matrix comes out as H = [[4, 4], [4, 8]]. (A numerical check of the whole example appears after this paragraph.)

Now check whether this is a positive definite matrix. Since the diagonal elements are all positive, we can proceed with the test for positive definiteness using leading principal minors. The leading principal minor of order 1 is 4, which is greater than 0. The leading principal minor of order 2 is the determinant of the matrix itself: det [[4, 4], [4, 8]] = 32 - 16 = 16, which is greater than 0. Therefore H is positive definite, which implies the function attains its minimum at x1 = x1* = 2.5 and x2 = x2* = -1.5, since the Hessian is a positive definite matrix. Our conclusion at this moment: f(x) has a minimum at x = x* = (2.5, -1.5), and if you put this stationary point into the expression for f, the value you get is 9.5; please check it.

So we now know how to find the local optimum, meaning the local minimum or local maximum, of a multivariable function of n variables. Before we proceed further, consider a quadratic form, say x transpose P x, in n variables; the matrix P then immediately has dimension n x n, and the quadratic form itself is a scalar function of the n variables x1, x2, ..., xn.
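Here is a minimal numerical verification of the example just completed; the eigenvalue test is used alongside the leading-principal-minor test, both being standard checks for positive definiteness:

```python
import numpy as np

def f(x):
    x1, x2 = x
    return 2*x1**2 + 4*x1*x2 + 4*x2**2 - 4*x1 + 2*x2 + 16

x_star = np.array([2.5, -1.5])
H = np.array([[4.0, 4.0],
              [4.0, 8.0]])

print(H[0, 0])                # 4.0, leading principal minor of order 1
print(np.linalg.det(H))       # 16.0, leading principal minor of order 2
print(np.linalg.eigvalsh(H))  # both eigenvalues positive -> H positive definite
print(f(x_star))              # 9.5, the minimum value of f
```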
Now differentiate this scalar function with respect to the vector x; let us denote the scalar function by f(x). The derivative of a scalar function with respect to a vector x is, as you know, the vector with components del f / del x1, del f / del x2, ..., del f / del xn. You can easily verify the result by writing out the detailed expression: expand the quadratic form as a polynomial in x1, x2, ..., taking P as an n x n matrix with elements p11, p12, ..., p1n in the first row, and in the second row writing p12 in place of p21, because we assume P is a symmetric matrix; only then does the result hold. Differentiate element by element, and the result is twice P x, a vector of dimension n, assuming P is a symmetric matrix. So for a quadratic form with symmetric P, differentiating with respect to x gives 2 P x.

Next is the derivative of a linear function, f(x) = a1 x1 + a2 x2 + ... + an xn. This is called a linear function; there may also be a constant term c, and when the constant term is present we call it an affine function, which will come in detail later. For the time being take the linear case. Differentiate this with respect to x: f(x) is a scalar function, but a linear one, and you are finding its gradient with respect to the vector x, element by element as before, so the result has dimension n. If you look at the expression, I can write it as a row vector multiplied by a column vector: (a1, a2, ..., an) times (x1, x2, ..., xn) transpose, which I can write as a transpose x, where x is n x 1 and a = (a1, a2, ..., an), also of dimension n x 1. So in short, if you have a linear function expressed as a transpose x, the inner product of two vectors, and you differentiate with respect to x, the result is not a transpose but a; you can easily verify this.

Keeping these two results in mind, we can proceed to the unconstrained optimization problem. In the beginning of the course, I think in the first lecture, we discussed what is meant by constrained and unconstrained optimization: here an objective function is given, it is not subject to any conditions, and there are no side constraints either. This is called an unconstrained optimization problem.
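Both identities, grad of x^T P x equal to 2 P x for symmetric P and grad of a^T x equal to a, are easy to spot-check numerically; a minimal sketch using a central finite-difference gradient (the helper num_grad and the random test data are illustrative choices, not from the lecture):

```python
import numpy as np

def num_grad(f, x, h=1e-6):
    """Central finite-difference approximation of the gradient of f at x."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

rng = np.random.default_rng(0)
n = 4
P = rng.standard_normal((n, n))
P = (P + P.T) / 2                      # symmetrize, as the identity requires
a = rng.standard_normal(n)
x = rng.standard_normal(n)

print(np.allclose(num_grad(lambda v: v @ P @ v, x), 2 * P @ x, atol=1e-4))  # True
print(np.allclose(num_grad(lambda v: a @ v, x), a, atol=1e-4))              # True
```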
We are now going to discuss the unconstrained optimization problem, but using numerical techniques, and these numerical techniques are iterative processes. The idea of the iterative process is this: you guess some value of x and find the function value f(x); our problem is minimization of the function. So you take an initial value of the variable x, call it x superscript 0, and at each subsequent iteration you get an improved value of x, where x has n variables. If the function value is improving, meaning for a minimization problem the function value is decreasing, then iteration by iteration we are slowly approaching the minimum value of the function. So the objective of the iterative optimization process is to reach the minimum value of the cost function or objective function f(x) in n variables.

A word on notation: the superscript k in x superscript k indicates the value of the decision variable at the kth iteration, not a function value; when you write f(x^k), it indicates the function value at the kth iteration, obtained by putting the kth value of x into the function expression. Suppose that at the kth iteration we have not reached the minimum of the function. Then at the (k+1)th iteration the function value should decrease; if the value at each iteration is decreasing from the previous iteration's value, it means we are approaching our minimum point. Mathematically, f(x^{k+1}) < f(x^k).

Now, what is the (k+1)th iterate? It is the value from the kth iteration plus some perturbation about the kth value of the decision variable. Let us call that perturbation delta x = lambda_k d^k. So from x^k we add the perturbation delta x to get x^{k+1}, and if the function value at that point is less than f(x^k), it means we are approaching the minimum point: each iteration should decrease the value.
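Written out as it would appear on the board, the update rule and the descent requirement are:

```latex
\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} + \Delta \mathbf{x}
                   = \mathbf{x}^{(k)} + \lambda_k \, \mathbf{d}^{(k)},
\qquad \lambda_k > 0,
\qquad f\!\left(\mathbf{x}^{(k+1)}\right) < f\!\left(\mathbf{x}^{(k)}\right).
```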
Here lambda_k is a scalar quantity whose value is greater than 0; since we are adding something to the decision variable x^k and lambda_k is a scalar, d^k must be a vector. That d^k is called the search direction vector: from the kth iterate x^k we are to move in such a direction that the function value decreases at the (k+1)th iteration, and that direction is denoted by d^k.

Since delta x = lambda_k d^k is a small perturbation from the decision variable value at the kth iteration, do the Taylor series expansion up to first order about that point: f(x^k + lambda_k d^k) is approximately f(x^k) plus the gradient transpose of f, evaluated at x = x^k because I am expanding around the kth iterate, times lambda_k d^k. The higher order terms I have neglected, considering that delta x is very, very small. If this value is less than f(x^k), we are moving in the right direction, so that the function value decreases from iteration k to iteration k+1. Bring f(x^k) to the other side: f(x^k) cancels, and we need the gradient of f at x = x^k, transposed, times lambda_k d^k to be less than 0. This is a scalar quantity: the gradient transpose is a row vector, lambda_k is a scalar whose value is positive, and d^k is a column vector. You have to select d^k in such a way that the product is negative; since lambda_k is a positive quantity, it does not affect the sign. Therefore we can write the condition as grad f(x^k)^T d^k < 0, a negative quantity.

There are many different choices of d^k that may satisfy this. One obvious choice that makes the product negative is to select d^k = - grad f(x^k). With this choice, the product becomes the gradient transpose times the gradient, preceded by a minus sign: - grad f(x^k)^T grad f(x^k). This is the squared norm of a vector, and a norm, being the distance of the vector, is a positive quantity; preceded by the minus sign, the product is always less than 0.
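Summarizing the derivation in one line:

```latex
f\!\left(\mathbf{x}^{(k)} + \lambda_k \mathbf{d}^{(k)}\right) - f\!\left(\mathbf{x}^{(k)}\right)
  \approx \lambda_k \, \nabla f\!\left(\mathbf{x}^{(k)}\right)^{\mathsf T} \mathbf{d}^{(k)} < 0,
\qquad
\mathbf{d}^{(k)} = -\nabla f\!\left(\mathbf{x}^{(k)}\right)
\;\Rightarrow\;
\nabla f^{\mathsf T} \mathbf{d}^{(k)} = -\bigl\lVert \nabla f\!\left(\mathbf{x}^{(k)}\right) \bigr\rVert^{2} \le 0 .
```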
Why is that? If y is a vector of dimension n x 1, then y transpose y is a scalar quantity, and this scalar quantity is always greater than 0 for any nonzero y. If you multiply it out component-wise, it is nothing but y1^2 + y2^2 + ... + yn^2, where the elements of y are y1, y2, ..., yn. Physically this is nothing but the squared distance of the vector from the origin. Mathematically people write it with the 2-norm: the norm of y is the square root of y transpose y, so y transpose y is the distance squared. This quantity is always positive, and preceded by a minus sign it is always negative.

So if this condition is satisfied, in other words if you take the search direction d^k with the reverse sign of the gradient of the function, then the function value will decrease from the kth iteration to the (k+1)th iteration, and that is the way you have to choose the direction of the search vector. Our choice of search direction is d^k = - grad f(x^k), the gradient found at the kth iteration with negative sign. This is the direction in which you have to move to go from the kth point x^k to the (k+1)th point so that the function value decreases when you move, and you can move this way because the function is known, so its gradient is known, and you take it with reversed sign.

So, keeping in mind how to select the search vector or direction vector, we can proceed to how to solve an optimization problem in n variables by this iterative process. Our next topic is the steepest descent method, and it is based on exactly this concept. Suppose we have a function f(x), where f is a function of the n variables x1, x2, ..., xn, which is to be minimized. We generate a sequence of points x^0, x^1, x^2, ..., x^k using the following expression. Look at this expression and recall the basic concept: our problem is minimization, so if we are at the kth point, we move to the (k+1)th point in such a direction that the difference between the function value at the (k+1)th point and the function value at the kth point is negative. Keeping this in mind, I can write the (k+1)th iterate as x^k plus a move from the kth point in the direction d^k, and I know what d^k is: minus the gradient of the function.
So x^{k+1} = x^k + lambda_k d^k: you have the point x^k, you move in the direction d^k, which was the gradient of the function f(x) with negative sign, and you get x^{k+1}. Then at this new point the function value will be less than the function value at the kth iteration. In this expression lambda_k is greater than 0, a positive quantity, and if you choose the direction vector in such a way that the descent condition is true for all values of k, with lambda_k > 0, then ultimately we will reach the minimum value of the function. This method is called, or known as, the gradient method, because the whole method moves in a direction in which the function value decreases from one point to another: the gradient method with predetermined step size lambda_k. For each iteration lambda_k is predetermined, and it is kept constant throughout the iterative process. That is the gradient method.

Now, the steepest descent method is just an extension of the gradient method. In the gradient method we have fixed lambda_k, which is nothing but the step size: we move in the d^k direction, but by a fixed step. In the steepest descent method the step size lambda_k is optimized at each iteration, k = 0, k = 1, k = 2, and so on; in each iteration the step size is optimized, and then it is called the steepest descent method. The difference between the two is only this: the basic update expression and the search direction are the same, but in the gradient method lambda_k is a predetermined step size taking the same value at each iteration, whereas in the steepest descent method the step size is optimized from one iteration to the next and used to find the value of the decision variable at the (k+1)th instant.

The gradient method is very simple, no doubt, but its convergence is very slow, although it is easy to implement. The steepest descent method is also simple, but a little computational burden is added when we find the optimal step size lambda_k; in return, the convergence is faster, because at each iteration we find the optimal step size that makes the function value as small as possible. When you put x^{k+1} = x^k + lambda_k d^k into the function expression, it becomes a function of the single variable lambda_k only, because x^k is known to us and d^k is known to us. And if it is a function of a single variable, we know how to find the minimum value of a function of lambda_k alone.
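A minimal sketch of the fixed-step gradient method on the example quadratic from earlier (the initial guess, step size, and iteration count are illustrative choices, not values from the lecture):

```python
import numpy as np

def grad_f(x):
    """Gradient of f(x) = 2*x1^2 + 4*x1*x2 + 4*x2^2 - 4*x1 + 2*x2 + 16."""
    x1, x2 = x
    return np.array([4*x1 + 4*x2 - 4, 4*x1 + 8*x2 + 2])

x = np.array([0.0, 0.0])    # initial guess x^0
lam = 0.05                  # predetermined step size, constant for every k

for k in range(200):
    d = -grad_f(x)          # search direction: reverse sign of the gradient
    x = x + lam * d         # x^{k+1} = x^k + lambda_k * d^k

print(x)                    # close to [ 2.5 -1.5], the known minimizer
```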
That way the convergence is faster than in the gradient method. You can see that if you are at the kth point and you make a slight move in the proper search direction, there is a lot of improvement in the function value: the function value decreases. So our final conclusion is this: a vector d^k that satisfies the condition grad f(x^k)^T d^k < 0 is called a direction of descent for the cost function. In short, if this condition is satisfied, then when you move from x^k to x^{k+1}, the function value at the (k+1)th instant will decrease from the function value at the kth iteration. This is the important condition.

Now the question arises: how to find the optimal step size, lambda_k = lambda_k*, that is, the optimal step size when we go from the kth point to the (k+1)th point and update the decision variable. Let us see. Expand f(x^k + lambda_k d^k) by the Taylor series: it is approximately f(x^k), plus the gradient transpose at x^k times lambda_k d^k (this lambda_k d^k is simply the perturbation delta x), plus one half times (lambda_k d^k) transpose, times the Hessian matrix, the differentiation of the gradient of the function with respect to x once again, evaluated at the kth point, times (lambda_k d^k); and I have neglected the higher order terms.

Now look at this expression: f(x^k) is known to us, the gradient at x^k is known, d^k is known, and the Hessian at x^k is known; the whole thing is known except the unknown, which is lambda_k. So if you put the (k+1)th iterate into the function expression, the objective function or cost function is a function of lambda_k only. What should be the choice of lambda_k so that the function value is minimum? Since it is a function of a single variable, differentiate with respect to lambda_k: set d f(x^k + lambda_k d^k) / d lambda_k = 0. This is the necessary condition; solve it and you will get the value of lambda_k.
If you differentiate, the first term f(x^k) is a known constant, so it gives 0. The second term is linear in lambda_k: the gradient transpose is a row vector, d^k is a column vector, and their product is a scalar quantity, so differentiating gives grad f(x^k)^T d^k as the first term. In the third term lambda_k is a scalar appearing twice, so lambda_k squared appears; differentiating half lambda_k squared with respect to lambda_k brings down a 2, so the term gives lambda_k times d^k transpose, times the Hessian matrix, the second partial derivatives of the function at x^k, times d^k. So the necessary condition is grad f(x^k)^T d^k + lambda_k d^k^T grad^2 f(x^k) d^k = 0. Now find lambda_k: both terms are scalar quantities, mind it, just like x^T P x is a scalar, so we can take one term to the other side and divide: lambda_k* = - grad f(x^k)^T d^k divided by d^k^T grad^2 f(x^k) d^k.

Notice that the numerator, grad f(x^k)^T d^k, is exactly the product in the condition for a descent direction: it is negative when the function value decreases from the kth point to the (k+1)th point. Minus a negative is positive, so lambda_k is a positive quantity, as required. This is the choice of lambda_k for which the function value decreases the most: if you take some other value of lambda_k, the function value may still decrease, but if you select this one, the function value decreases as much as possible. This is the optimum step size. We can then check whether the function value is indeed optimal: evaluate d^2 f(x^k + lambda_k d^k) / d lambda_k^2 at lambda_k = lambda_k*; if this scalar quantity is greater than 0, the function value has been decreased as much as possible for this choice of lambda_k.

So we will stop here today; next time we will continue with this, write the algorithm, and see how to implement it on a digital computer. Thank you.
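A minimal sketch of steepest descent with this optimal step size on the example quadratic (for d^k = -grad f, the formula lambda_k* = -grad f^T d / (d^T H d) reduces to the squared norm of the gradient over grad f^T H grad f; the initial guess and iteration count are illustrative):

```python
import numpy as np

H = np.array([[4.0, 4.0],     # Hessian of the example quadratic (constant)
              [4.0, 8.0]])
b = np.array([-4.0, 2.0])

def grad_f(x):
    return H @ x + b          # gradient of 2*x1^2 + 4*x1*x2 + 4*x2^2 - 4*x1 + 2*x2 + 16

x = np.array([0.0, 0.0])
for k in range(10):
    g = grad_f(x)
    d = -g                            # steepest descent direction
    lam = -(g @ d) / (d @ H @ d)      # optimal step: -grad^T d / (d^T H d)
    x = x + lam * d                   # x^{k+1} = x^k + lambda_k* d^k

print(x)    # converges toward [ 2.5 -1.5] in a handful of iterations
```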