So, let us continue from where we left off in the previous class. We were discussing a problem with an ellipse in two dimensions, so this is a problem in R2. Let the horizontal axis be denoted by x and the vertical axis by y. We were looking at points lying on the ellipse itself, the black curve shown here, such as a point (x*, y*) on the ellipse, and we wanted to find the rectangle whose corner points lie on the ellipse and which has the maximum area. For simplicity we took the rectangle to be aligned with the coordinate axes.

So, the mathematical problem we posed was this. We have f0(x, y), the area of a rectangle whose corner point is (x, y), which is given by 4xy, and (x, y) should lie on the ellipse, that is, (x, y) should satisfy the equation f1(x, y) = α, where f1 is the ellipse function x²/a² + y²/b². So any (x, y) that lies on the ellipse satisfies x²/a² + y²/b² = α. Over all such points we wanted to find the (x, y) that maximizes the area, so our objective was to maximize f0(x, y) subject to the requirement that f1(x, y) = α.

Now, assuming (x*, y*) is a local maximum, we said that it is necessary that this equation holds:

f0,x(x*, y*) − f0,y(x*, y*) [f1,y(x*, y*)]⁻¹ f1,x(x*, y*) = 0.

We made the assumption that y* ≠ 0, so consequently f1,y(x*, y*) ≠ 0, and it is legal to divide by f1,y(x*, y*). We ended the previous class at the step where we derived this condition.

Now this condition can be interpreted in another way, and I will take off from there, but before we do that let us evaluate what the solution turns out to be. If (x*, y*) is a local maximum then this condition must hold, and solving this equation you get two solutions: (x*, y*) = (√(α/2)·a, √(α/2)·b) and (x*, y*) = (−√(α/2)·a, −√(α/2)·b). For both of them the area is the same; the optimal area, let us call it m(α), the optimal value, turns out to be 2αab. You get these two solutions, and you can check that both of them are local maxima. What they correspond to is a point (x*, y*) here and its reflection on the other side, the point (−x*, −y*); either of them can be taken as the solution.
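To make this concrete, here is a minimal sketch of this computation in SymPy; the use of SymPy and the variable names are my additions for illustration, not part of the lecture.

```python
import sympy as sp

x, y, a, b, alpha = sp.symbols('x y a b alpha', positive=True)

f0 = 4 * x * y                      # area of the axis-aligned rectangle
f1 = x**2 / a**2 + y**2 / b**2      # ellipse function; constraint is f1 = alpha

# The necessary condition derived in class:
# f0_x - f0_y * f1_y**(-1) * f1_x = 0, together with feasibility f1 = alpha.
stationarity = sp.diff(f0, x) - sp.diff(f0, y) / sp.diff(f1, y) * sp.diff(f1, x)
sols = sp.solve([stationarity, sp.Eq(f1, alpha)], [x, y], dict=True)

for s in sols:
    print(s, 'area =', sp.simplify(f0.subs(s)))
# Declaring the symbols positive picks out the solution with x*, y* > 0:
# x* = a*sqrt(alpha/2), y* = b*sqrt(alpha/2), with area 2*alpha*a*b.
```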
So now let us write this condition, this red equation here, in a slightly different way. Let us introduce some notation: let λ* denote the quantity f0,y(x*, y*) [f1,y(x*, y*)]⁻¹. In that case I can write the red equation in the following form. Dropping the argument (x*, y*) for simplicity, I can simply say that f0,x = λ* f1,x; that is the blue equation I have written here. And from the definition of λ* itself, taking [f1,y(x*, y*)]⁻¹ to the other side, I get f0,y = λ* f1,y. In other words, we can put these two together and write: the gradient of f0 at (x*, y*) equals the gradient of f1 at (x*, y*) times λ*,

∇f0(x*, y*) = λ* ∇f1(x*, y*).

So the earlier red equation, which is complicated looking, can be simplified into this form if we introduce this other quantity λ*. You can see what this condition is doing: it says that the gradient of the objective evaluated at (x*, y*) is just a scalar multiple, λ*, of the gradient of the constraint evaluated at (x*, y*). In particular, the gradients are collinear; one is a scalar multiple of the other. This condition is what we will generalize now. The λ* here has a name: it is called a Lagrange multiplier. So instead of writing a complicated equation like the earlier red one, we will introduce this additional variable called the Lagrange multiplier and write another equation in terms of x*, y* and the Lagrange multiplier. That is what we will do now.

But before I get to doing this in more generality, I want you to carefully understand what exactly we accomplished when we solved this particular problem in this way; that will also give you motivation for why we should be considering the Lagrange multiplier. So, how did we go about solving this problem? We have a necessary condition for a point to be a solution of an optimization problem over an open set; when you are maximizing or minimizing a function over an open set, we know how to address that problem. We took this particular problem, which was not over an open set, and addressed it in some way. What were the steps we followed? The first step was that we eliminated the y variable, we eliminated one of the variables. Why were we able to do this? We were able to do this thanks to the implicit function theorem, which then gave us an optimization over x alone, an optimization over an open set. Now, when we applied the implicit function theorem, remember what we used: we used that around this point (x*, y*), we were always on the surface of this particular ellipse.
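Here is a small numerical check of the collinearity condition above, with hypothetical values a = 3, b = 2, α = 1 chosen by me just for illustration:

```python
import numpy as np

a, b, alpha = 3.0, 2.0, 1.0
xs, ys = a * np.sqrt(alpha / 2), b * np.sqrt(alpha / 2)   # the local maximum

grad_f0 = np.array([4 * ys, 4 * xs])                  # gradient of f0 = 4xy
grad_f1 = np.array([2 * xs / a**2, 2 * ys / b**2])    # gradient of the ellipse

lam_star = grad_f0[1] / grad_f1[1]    # lambda* = f0_y / f1_y, as defined above
print(lam_star)                                    # equals 2*a*b = 12 here
print(np.allclose(grad_f0, lam_star * grad_f1))    # True: gradients collinear
```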
So, could we have applied the implicit function theorem if we did not have this particular feasible region, but rather a feasible region that looked like this: the ellipse together with its interior, both the shell of the ellipse and the inside? If you have a problem like this, where you have both the shell and the interior, it is not possible to say that you will be able to eliminate one variable in terms of the other. You can do it provided you are constrained to be on that surface, because the surface gives you this additional equation which lets you solve for one variable in terms of the other. But if you can be both on the surface as well as inside, there is no definite equation that you can invoke.

So, the kind of constraints for which we can do all the steps we followed are those where we are optimizing over surfaces. The surfaces take the form h(x) = 0, or in this case f1(x, y) = α. So this works for optimization over equality constraints. When you are optimizing a differentiable function with only equality constraints, you are effectively optimizing over a surface, and the equations that define that surface let you eliminate one variable in terms of the others. If you had both the surface and its interior, the shell of the watermelon as well as the pulp inside, then you cannot invoke this; you cannot do this elimination. So the elimination of y requires this. We also used the same fact later: after writing everything in terms of x, we took the derivative of the constraint with respect to x (you can go back and check) and set it equal to 0. That too is because we are always on the surface, so that again requires that you have only an equality constraint.

So what we have done here is this: if we can replicate all these steps for general problems with equality constraints, we should be able to get a general purpose result for how to solve that sort of optimization. At the last step, we solved for (x*, y*). Some of you may be wondering how I got this. Well, in this case we could solve for (x*, y*) in closed form, it is not very hard, but in general you would have to do this numerically. The point is that all the effort in optimization is to start with a problem written in this sort of decision-making form, where you have to maximize some goal subject to some constraints, and reduce it to something that looks like this: a bunch of equations that need to be satisfied, for which you then have some ready machinery.
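For completeness, here is a sketch of the elimination step itself on a computer, again in SymPy and again my own illustration rather than part of the lecture: solve the constraint for y on the upper half of the ellipse and optimize over x alone.

```python
import sympy as sp

x, a, b, alpha = sp.symbols('x a b alpha', positive=True)

# Implicit function theorem step: on the upper half of the ellipse we can
# solve the constraint for y, valid as long as y > 0 (our assumption y* != 0).
y_of_x = b * sp.sqrt(alpha - x**2 / a**2)
area = 4 * x * y_of_x                 # objective with y eliminated

# Now it is an unconstrained problem over an open interval in x.
crit = sp.solve(sp.diff(area, x), x)
print(crit)                                           # x* = a*sqrt(alpha/2)
print([sp.simplify(area.subs(x, c)) for c in crit])   # area = 2*alpha*a*b
```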
So now I will state for you the general theorem for when you have an optimization problem with equality constraints. Rather than prove it, I will just give you intuition for how it relates to the previous example that we studied. Let f0, f1, ..., fm be continuously differentiable functions, and let x* be a local optimal solution of this problem; I am writing this in the maximization form just because the earlier problem was in maximization form. So, f0(x) is what you need to maximize over x, subject to the requirement that all these equalities hold: fi(x) = αi for all i from 1 to m. There are m different equality constraints, f1(x) = α1, f2(x) = α2, and so on, and all of them should hold. We are optimizing over the common region in which all of them are satisfied; over that region we are trying to find the x that maximizes f0(x).

Now the theorem says the following. Look at the derivatives fi,x(x*) for i = 1 to m (not 0 to m), that is, the derivatives corresponding to the constraints, evaluated at x*. Can someone tell me what length vectors these are, if x is in Rn? Since we are optimizing in Rn, each fi,x(x*) is a row vector of length n, and there are m of these row vectors. Suppose that these derivatives, evaluated at x*, are linearly independent. Then there exists a vector λ*, written as a column vector λ* = (λ1*, ..., λm*), a vector in Rm, with one component for each of the m constraints in the problem, such that

f0,x(x*) = λ1* f1,x(x*) + λ2* f2,x(x*) + ... + λm* fm,x(x*).

That is, the derivative of the objective evaluated at x* can be written as this combination of the constraint derivatives.

So you can see what we have done here. We started with an abstract decision problem, maximize this function subject to these requirements, and we have said that if x* is a local solution then these equations must be satisfied. How many equations are there here? This is one vector equation, but how many components? n of them, because these are all derivatives, n length vectors; so there are effectively n scalar equations here. How many unknowns do we have? x* is an unknown, which we want to find, and in addition λ* is also an unknown. x* has length n, so there are n unknowns in x*, and m unknowns in λ* because λ* is of length m. So there are m + n unknowns and only n equations from here; you need some more equations, and where are they coming from? There are m additional equations, which come from the fact that x* must be feasible for the optimization problem: fi(x*) = αi for all i, which gives you an additional m equations. All of these put together help you solve the optimization problem: you have these n equations from the derivative condition and these m equations from feasibility.
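As an illustration of this recipe, here is a sketch that assembles and solves these equations for a small toy instance; the instance (n = 3, m = 2) is my own example, not from the lecture.

```python
import sympy as sp

# Toy instance: maximize x1*x2*x3 subject to x1 + x2 + x3 = 6 and x3 = 2.
x1, x2, x3, l1, l2 = sp.symbols('x1 x2 x3 l1 l2', real=True)
xvars = [x1, x2, x3]

f0 = x1 * x2 * x3
g = [x1 + x2 + x3 - 6, x3 - 2]        # each g_i is f_i(x) - alpha_i

# n scalar equations from the derivative condition:
# f0_x = l1 * f1_x + l2 * f2_x, component by component.
stationarity = [sp.diff(f0, v) - l1 * sp.diff(g[0], v) - l2 * sp.diff(g[1], v)
                for v in xvars]

# ... plus m feasibility equations g_i = 0: n + m equations, n + m unknowns.
print(sp.solve(stationarity + g, xvars + [l1, l2], dict=True))
# Candidate point x* = (2, 2, 2) with multipliers lambda* = (4, 0).
```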
So, these are n + m equations in n + m unknowns. What we have done, then, is taken this kind of abstract problem and said that it is necessary that x* satisfies a bunch of equations. But they need to be written in terms of additional variables, not x* alone; because constraints are now involved, you need to introduce a few additional variables, and you need exactly one additional variable per constraint.

Yes, the question here is: what if the number of constraints is very large, what if m is very large? See, if m is very large, then you are on several surfaces at once. In fact, the very fact that you have to satisfy m of these equations can end up constraining x so much that you will probably get just one feasible point. If m becomes n, for example, that by itself can determine which point you are on, what your feasible region is. That can happen, but usually that is a case of a poorly formulated problem, because you have constrained it so much that there is nothing to search over; there is effectively only one alternative for you. So that sort of problem can occur, but it is not a very interesting thing to study, because it is probably not well formulated to begin with.

The λi* here are again called Lagrange multipliers. Now, while we are at this, let us also understand a bit about the role of all the assumptions here, and see how this is backward compatible with the example we have seen so far. The key assumptions I made here are, first, that all these functions, the objective and the constraints, are continuously differentiable, and second, that the derivatives of the constraints, evaluated at x*, are linearly independent. That is akin to our previous assumption that y* was non-zero; remember, y* was assumed to be non-zero because we said we do not need to look at such points. What y* ≠ 0 ensured was that f1,y(x*, y*) ≠ 0. The analogous thing we need in this more general setup is that the constraint derivatives, put together, are linearly independent. Geometrically, what that ensures is that when you are optimizing over the intersection of all these surfaces, the linear independence of the constraint derivatives lets you define, in terms of the derivatives, the tangent surface to all of these surfaces. This will become more and more evident as you go later into the course: the tangent surface gets defined, and once you have the tangent surface defined, you can write out conditions like this.
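And here is a quick way to check this linear independence hypothesis numerically for the same toy instance as above (again my own illustration): stack the constraint derivatives as rows of an m × n matrix and verify that it has full row rank.

```python
import numpy as np

# Constraint derivatives of the toy problem above, stacked as rows of an
# m x n matrix (the Jacobian of the constraints) at x* = (2, 2, 2).
J = np.array([[1.0, 1.0, 1.0],    # derivative of x1 + x2 + x3 - 6
              [0.0, 0.0, 1.0]])   # derivative of x3 - 2

m = J.shape[0]
print(np.linalg.matrix_rank(J) == m)   # True: full row rank, hypothesis holds
```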