So, let us look at what is happening here geometrically — what is being claimed by this boxed equation. To understand this, we first need a basic idea of the role of the derivative, or the gradient, of a function: what does it tell you about the function? For a function of n variables, instead of plotting the function value against the n independent variables on separate axes, I can look at the contours of the function. So I can plot a function like this; suppose these are the contours of f0. What does a contour mean? A contour is the locus of points x where f0 is constant: all points x such that f0(x) = c for some constant c. As I vary c I keep getting different contours, and these contours will obviously not intersect, because they correspond to different values of c. You may or may not get concentric contours like this; that depends on the shape of the function, and I am taking this shape just for simplicity. Now, what is the role of the derivative, or the gradient? If I take a point x here, the gradient is a vector of the same dimension as x, so I can draw it with its origin shifted to the point x itself: this is the vector ∇f0(x), drawn starting at x. Likewise, at another point, I can draw the gradient at that point in the same way.
Now, the gradient is a vector pointing in a certain direction when its origin is shifted to x — what does that direction tell me about the function? The first thing is that the gradient, provided the function is differentiable, is always normal to the contour. Why is that? A contour consists of points where the function value is constant, so as you travel along the contour, the rate of change of the function must vanish. In R^n, the rate of change along a direction is the gradient dotted with that direction, so the gradient must be orthogonal to the direction in which you are travelling while you stay on the contour: the gradient is always orthogonal to the contour. But which direction does it point in? At any point there are two orthogonal directions, one pointing one way and one the other. This is the second fact about the gradient: it always points in the direction of increase of the function. To see this in the contour plot, suppose the outer contour has c = 100 and the inner contour has c = 110; then the gradient points inwards, towards the direction in which the contour values keep increasing. If this were reversed — if the inner value were 90 rather than 110, so the function values decreased as you moved to inner and inner contours — then the gradient would point outwards, in the opposite
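The two facts above can be checked numerically on a hypothetical example of my own choosing (not from the lecture): f0(x, y) = x² + y², whose contours are circles centred at the origin.

```python
import numpy as np

# Hypothetical example (not from the lecture): f0(x, y) = x^2 + y^2,
# whose contours f0 = c are circles centred at the origin.
def grad_f0(p):
    # Analytic gradient of f0 at the point p.
    return 2.0 * p

p = np.array([3.0, 4.0])           # a point on the contour f0 = 25
tangent = np.array([-p[1], p[0]])  # direction tangent to the circle at p

g = grad_f0(p)

# 1) The gradient is orthogonal to the contour's tangent direction.
print(np.dot(g, tangent))          # 0.0

# 2) The gradient points toward increasing function values: a small
#    step along g raises f0, a step along -g lowers it.
f = lambda q: q @ q
eps = 1e-3
print(f(p + eps * g) > f(p))       # True
print(f(p - eps * g) < f(p))       # True
```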
direction. Whether it points inwards or outwards depends on what is happening to the function value; you cannot tell that just from the shape of the contour. So the gradient always points in the direction of increase of the function. Now let us go back to the boxed condition. What is it saying? It says that the gradient of f0 at x* is a linear combination of the gradients of the constraints. What would that mean? The direction in which f0 is increasing at x* — that is, the direction of its gradient — is itself a linear combination of the directions of the gradients of the constraints. And what else? You are satisfying the constraints, yes — you are on the surface. You are on the surface; then what does that say? Let me write it this way. Once we have linear independence of the constraint gradients, the tangent plane to the feasible region — the common intersection of all the constraint surfaces — at x* is given by all directions d such that the derivative at x* times d vanishes, or equivalently, in terms of the gradients, ∇fi(x*)^T d = 0 for all i = 1, …, m. If you take those directions d that are orthogonal to each individual gradient, then d is orthogonal to the whole subspace spanned by these gradients, precisely because it is orthogonal to each of them.
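The boxed condition and the tangent plane can both be verified on a small hypothetical problem of my own (not one worked in the lecture): maximize f0(x, y) = x + y subject to f1(x, y) = x² + y² = 1, whose maximizer is x* = (1/√2, 1/√2) with multiplier λ* = 1/√2.

```python
import numpy as np

# Hypothetical example (mine, not from the lecture): maximize
# f0(x, y) = x + y subject to f1(x, y) = x^2 + y^2 = 1.
# The maximizer is x* = (1/sqrt(2), 1/sqrt(2)) with lambda* = 1/sqrt(2).
x_star = np.array([1.0, 1.0]) / np.sqrt(2.0)
lam_star = 1.0 / np.sqrt(2.0)

grad_f0 = np.array([1.0, 1.0])   # gradient of the objective (constant)
grad_f1 = 2.0 * x_star           # gradient of the constraint at x*

# Boxed condition: grad f0(x*) = lambda* * grad f1(x*).
print(np.allclose(grad_f0, lam_star * grad_f1))   # True

# Tangent plane at x*: all d with grad_f1^T d = 0.
d = np.array([-1.0, 1.0]) / np.sqrt(2.0)          # one tangent direction
print(np.isclose(grad_f1 @ d, 0.0))               # True
# grad f0 lies in the span of grad f1, hence is orthogonal to the tangent:
print(np.isclose(grad_f0 @ d, 0.0))               # True
```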
Those d's are your tangent directions. Now, if ∇f0(x*) had a component along one of those tangent directions, what would that mean? It would mean you could increase the function while gently moving along the tangent to your surfaces — there would be scope to stay on the surface and at the same time increase the function further. You have all these surfaces intersecting; their tangent space is orthogonal to the subspace spanned by the constraint gradients, and your objective gradient lies in that subspace, because it is a linear combination of the constraint gradients. So the direction you need to move in to stay on the surface is orthogonal to the direction you need to move in to increase the function. This is the compromise that gets reached here. If you are at a local maximum of this problem, then it must be true that you cannot both continue to be on the surface and increase the objective value further — because if you could increase the objective while staying on the constraint, then the gradient, which points in the direction of objective increase, would have some nonzero component along the tangent space in which you need to move to remain on the surface. So if you had to remain on this surface, you would need to travel along its tangent, but your gradient is pushing you in a direction orthogonal to that.
So the gradient of the objective is a linear combination of the gradients of the constraints, whereas the tangent space is orthogonal to the subspace spanned by those constraint gradients. Is this clear? This is the condition being expressed here: these are two orthogonal directions. If the gradient of the objective had even a slight component onto this tangent subspace, there would be scope for you to travel further in that direction, and you would potentially get a better objective. So this condition simply expresses that the very fact that you are at a local maximum means this is the very least that must be true. Is this clear? That is the geometric interpretation of what is happening here. Now let me ask a related question: what if, instead of the maximization problem written here, I had a minimization problem — how would this change? I could just replace f0 by −f0; minimizing f0 corresponds to maximizing −f0, and all the other conditions are not perturbed by this. What I would get is a negative sign sitting next to the gradient of f0 here, but that negative sign can be absorbed into the definition of λ* itself, because this condition simply says there exists some λ*. The condition would still look the same; only the λ's would have reversed sign. There exists some other λ* — that is all you could say. So this condition would not change even if I had a minimization problem.
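Continuing the same hypothetical circle example from before (mine, not the lecture's), the sign absorption is easy to see: the minimizer of x + y on the unit circle satisfies the identical condition, just with a negative multiplier.

```python
import numpy as np

# Hypothetical example (mine): on the unit circle x^2 + y^2 = 1, the
# MINIMIZER of f0(x, y) = x + y is x* = (-1/sqrt(2), -1/sqrt(2)).
x_min = -np.array([1.0, 1.0]) / np.sqrt(2.0)

grad_f0 = np.array([1.0, 1.0])
grad_f1 = 2.0 * x_min            # = (-sqrt(2), -sqrt(2))

# The same condition grad f0 = lambda* grad f1 still holds, but the
# multiplier has absorbed a negative sign: lambda* = -1/sqrt(2).
lam_min = -1.0 / np.sqrt(2.0)
print(np.allclose(grad_f0, lam_min * grad_f1))    # True
```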
Now, that gives you a clue about what is going on here: this is actually a very weak condition — it works for a local maximum as well as a local minimum. The very fact that you have found a λ* and an x* such that these equations are satisfied does not mean you have actually reached a local maximum; you could just as well be at a local minimum. Geometrically, the same compromise I just mentioned applies: the gradient should not have a component along the tangent space, and if the gradient has no such component, then neither does the negative gradient — so you can move neither in the direction of increase nor in the direction of decrease. The same argument works for minimization. So this condition — that there exists such a λ* satisfying these equations — holds both at a local maximum and at a local minimum. To decide further whether you are at a local minimum or maximum, you need additional information. It is not enough that the gradient has no component along the tangent space; what also matters is whether the function is increasing or decreasing as you approach that point along feasible curves — something analogous to the second-derivative condition of scalar optimization has to be invoked. Let me state that now. Suppose x* is a local maximum of your optimization problem, which is maximizing f0(x) subject to fi(x) = αi for all i, and suppose the gradients of the constraints at x* are linearly independent. Now let λ* be such that those equations are satisfied.
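One concrete way to see the weakness of the first-order condition, again on my hypothetical circle example: solving the stationarity system 1 − 2λx = 0, 1 − 2λy = 0 together with x² + y² = 1 yields two (x, λ) pairs — the maximizer and the minimizer — and the equations alone cannot tell them apart.

```python
import numpy as np

# Hypothetical example (mine): maximize f0 = x + y s.t. x^2 + y^2 = 1.
# Stationarity (1 - 2*lam*x = 0, 1 - 2*lam*y = 0) plus the constraint
# has TWO solutions -- the first-order equations alone cannot tell the
# local maximum from the local minimum.
candidates = [
    (np.array([ 1.0,  1.0]) / np.sqrt(2.0),  1.0 / np.sqrt(2.0)),  # max
    (np.array([-1.0, -1.0]) / np.sqrt(2.0), -1.0 / np.sqrt(2.0)),  # min
]
for x, lam in candidates:
    grad_f0 = np.array([1.0, 1.0])
    grad_f1 = 2.0 * x
    ok = np.allclose(grad_f0, lam * grad_f1) and np.isclose(x @ x, 1.0)
    print(ok)    # True for both the maximizer and the minimizer
```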
Let λ* be such that this condition is satisfied, and define C(x*, λ*) as the set of all directions d such that ∇fi(x*)^T d = 0 for all i — this is the tangent space I was referring to, the directions orthogonal to all of these gradients. Then, if x* is a local maximum and f0, f1, …, fm are twice continuously differentiable, the following must hold: w^T ∇²x L(x*, λ*) w ≤ 0 for all w in C(x*, λ*). Let me explain this notation. L here is a function formed by taking the objective together with a linear combination of the constraints — note there is a small correction here, the signs should be negative: L(x, λ) = f0(x) − λ1 f1(x) − λ2 f2(x) − … − λm fm(x). This function L has a name: it is called the Lagrangian function. This is your second-order condition: if you have found a point that satisfies the previous condition — it satisfies the constraints and it satisfies the boxed equation — and it is a local maximum, and your functions are twice continuously differentiable, then this must also hold. What is it? It is looking at the Hessian of the Lagrangian. The Lagrangian is a function of both x and λ; you look at it only in the x space and calculate the Hessian of the Lagrangian with respect to x alone.
Assuming λ is fixed, you think of L as a function of x alone, calculate its Hessian with respect to x, and evaluate that Hessian at (x*, λ*). Then you look at this quantity: w^T times the Hessian of the Lagrangian in x times w, which has to be less than or equal to 0 for all w that lie in this tangent space. This means the function should have a certain curvature along the tangent space defined by the constraints. The Hessian of the Lagrangian tells you how the Lagrangian curves in the x space, but it is necessary only that this quadratic form be ≤ 0 along the directions in the tangent space of the constraints. Is this clear? This is one way of further pruning the set of points that satisfy the boxed equations. When you get an optimization problem and try to solve it, you look for an x* and a λ* that satisfy the constraints and the boxed equation; but that could still get you to a local minimum or a local maximum. If you are looking for a local maximum, you then further check whether your point satisfies this condition — and to check it, you need your functions to be twice continuously differentiable. So this is a necessary condition. The sufficient condition — meaning that if it holds, you can be sure you are at a local maximum — is the same statement with a strict inequality: w^T ∇²x L(x*, λ*) w < 0 for all w in C(x*, λ*) with w ≠ 0.
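On my running hypothetical circle example, the second-order check is a one-liner, because the Hessian of the Lagrangian works out to −2λI (the objective x + y is linear, so only the constraint contributes curvature). The sign of the quadratic form along the tangent direction then separates the two stationary points.

```python
import numpy as np

# Hypothetical example (mine): maximize f0 = x + y s.t. f1 = x^2 + y^2 = 1.
# Lagrangian: L(x, lam) = f0(x) - lam * f1(x); its Hessian in x is
# Hess_x L = -2 * lam * I (f0 is linear, so it contributes nothing).
def hess_L(lam):
    return -2.0 * lam * np.eye(2)

# Candidate from the first-order conditions: the maximizer.
x_star = np.array([1.0, 1.0]) / np.sqrt(2.0)
lam_star = 1.0 / np.sqrt(2.0)

# Tangent space C(x*, lam*): directions w with grad_f1(x*)^T w = 0.
w = np.array([-1.0, 1.0]) / np.sqrt(2.0)
print(np.isclose(2.0 * x_star @ w, 0.0))          # True: w is tangent

# Sufficient condition for a local maximum: w^T Hess_x L w < 0.
print(w @ hess_L(lam_star) @ w < 0)               # True -> local maximum

# At the other stationary point, lam = -1/sqrt(2), the quadratic form is
# positive along the tangent, so that point is a local minimum instead.
print(w @ hess_L(-lam_star) @ w > 0)              # True
```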
What this means is that the Hessian should be negative definite along a certain subspace only — not necessarily along the full space, just along that tangent subspace. That is the sufficient condition: if w^T ∇²x L(x*, λ*) w < 0 for all nonzero w in C(x*, λ*), then x* is a local maximum.