We now return to the topic we started with: minimizing a function over an open set. So we want to minimize a function f over a feasible region S, where S is an open set, and I am going to assume that f is differentiable. Let me write the theorem. Let x* in S be an optimal solution of the optimization problem: minimize f(x) subject to x in S. Let f be differentiable and let S, a subset of R^n, be open. Then we must have that the derivative of f with respect to x, evaluated at x*, is equal to 0. That is the conclusion. Let me make sure you understand what this theorem is saying. Consider this sort of optimization problem and suppose there is an optimal solution; the theorem is not asserting that an optimal solution exists, it says that if one exists, and f is differentiable and the feasible region is open, then the derivative of f evaluated at x* must equal 0. Now let us prove this. Since x* lies in S and S is open, what do we know? If x* is a point in S and S is open, then there is a ball around x* that lies completely in S. So the picture to have in mind is: imagine this is your set S, this is your point x*, and there is a ball around it that lies completely in S. That means there exists a radius r > 0 such that x belongs to S whenever the distance between x and x* is less than r. Now, what is so interesting about this ball? There is a ball around the point x* which lies completely in S. So what? What is so great about the ball?
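To make the statement concrete, here is a minimal numeric sketch; the function f(x) = x^2 and the sampling grid are my own illustrative choices, not from the lecture. Minimizing over the open interval S = (-1, 1), the minimizer is x* = 0 and the derivative vanishes there, exactly as the theorem requires.

```python
# Illustrative sketch of the first-order condition on an open set.
# f(x) = x^2 on S = (-1, 1); the minimizer is x* = 0 with f'(x*) = 0.

def f(x):
    return x * x

def f_prime(x):
    return 2.0 * x

# Sample points strictly inside the open interval (-1, 1).
candidates = [-1 + 2 * k / 10000 for k in range(1, 10000)]
x_star = min(candidates, key=f)

print(x_star, f_prime(x_star))  # both 0: the derivative vanishes at the minimizer
```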
There are many interesting things about a ball, but the one of relevance here is this: a ball of any positive radius is essentially all of R^n shrunk down to a smaller radius. What that means is, take x* and any other point y in R^n, and look at the ray that starts from x* and goes towards y. Without changing its direction, I can shrink this ray all the way down until it lies completely inside the ball. So every direction in which I can move from x* in R^n is also present inside the ball itself: the ray still starts from x*, but now it points towards a point inside the ball. Formally: for every vector h in R^n, I can find some eta > 0 such that the vector x* + delta h belongs to S for all delta with 0 <= delta <= eta. So take any direction h; by scaling it with a factor delta, the point x* + delta h lies in the ball, and therefore in S. For every vector h there exists such an eta. Now, we know that x* is an optimal solution. What does that mean? It means f(x*) <= f(x) for all x in S. But then, in particular, I can take a point like x* + delta h, which also lies in S.
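The scaling step can be checked numerically. In this sketch (the point, radius, and direction are my own illustrative choices) we take eta = r / (2 ||h||), so every x* + delta h with 0 <= delta <= eta stays strictly inside the ball of radius r around x*:

```python
import math

# Any direction h, scaled by eta = r / (2 * ||h||), stays inside the ball
# of radius r around x_star (point, radius and direction are illustrative).
x_star = [0.5, -0.2]
r = 0.1
h = [3.0, 4.0]

norm_h = math.sqrt(sum(hi * hi for hi in h))   # ||h|| = 5
eta = r / (2 * norm_h)

distances = []
for delta in [0.0, eta / 2, eta]:
    x = [xs + delta * hi for xs, hi in zip(x_star, h)]
    distances.append(math.sqrt(sum((xi - xs) ** 2 for xi, xs in zip(x, x_star))))

print(distances)  # every distance is at most r/2, strictly inside the ball
```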
In particular, f(x*) <= f(x* + delta h) whenever 0 <= delta <= eta. Remember, delta is the factor by which we have shrunk the direction h. Now we are going to make delta smaller and smaller. If delta is very small, then x* + delta h is very close to x* itself, and then we are in the regime of Taylor's theorem. So what does Taylor's theorem tell us? It tells us that f(x* + delta h) equals f(x*) plus the derivative of f with respect to x evaluated at x*, times (x* + delta h) - x*, which is just delta h, plus a residual that goes to 0. Now recall what kind of residual it was. There was a slightly bad choice of notation there, because in the statement of Taylor's theorem I used h for the residual function as well, but I hope you will still be able to follow. The residual is such that it goes to 0 as y goes to a; y - a also goes to 0, but because the residual goes to 0 fast enough, even after dividing by y - a it still goes to 0. So another way of writing Taylor's theorem is to say: f(x* + delta h) = f(x*) + (derivative at x*) times delta h + small o of delta. What do I mean by small o of delta? It is just a generic notation: small o of delta stands for any quantity theta(delta) such that the limit of theta(delta) divided by delta is 0 as delta goes to 0.
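The small-o behaviour is easy to see numerically. In this sketch (the function and expansion point are my own illustration), the residual theta(delta) = f(x* + delta h) - f(x*) - f'(x*) delta h satisfies theta(delta)/delta -> 0:

```python
# Residual of the first-order Taylor expansion of f(x) = x^3 at x_star = 1
# in direction h = 1: theta(delta) = 3*delta^2 + delta^3, so theta/delta -> 0.

def f(x):
    return x ** 3

def f_prime(x):
    return 3.0 * x * x

x_star, h = 1.0, 1.0
ratios = []
for delta in [1e-1, 1e-2, 1e-3, 1e-4]:
    theta = f(x_star + delta * h) - f(x_star) - f_prime(x_star) * delta * h
    ratios.append(theta / delta)

print(ratios)  # the ratio theta(delta)/delta shrinks as delta shrinks
```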
So "small o of delta" means that not only does the quantity itself go to 0, but even after dividing by delta it goes to 0. This is a shorthand we use in optimization: we do not really need to know what the exact quantity is, so long as we know at what rate it goes to 0. A quantity theta(delta) can go to 0 while theta(delta)/delta does not; but here theta(delta) itself goes to 0, and upon dividing by delta it still goes to 0. That is the significance of the notation. So Taylor's theorem is telling us that f(x* + delta h) = f(x*) + (derivative at x*) times delta h + o(delta), where o(delta)/delta goes to 0 as delta goes to 0. Since that is the case, and since we know f(x*) <= f(x* + delta h), we get that the difference f(x* + delta h) - f(x*) is greater than or equal to 0, and this holds for all delta in the interval from 0 to eta. So take delta > 0; that allows me to divide throughout by delta, giving (derivative at x*) times h, plus o(delta)/delta, greater than or equal to 0. Now I can let delta go to 0, and that gives me that the derivative of f with respect to x evaluated at x*, times h, is greater than or equal to 0. Remember, this derivative is a row vector and h is a column vector, so their product is a scalar, and that scalar is greater than or equal to 0. Now, what did we want to show? If you go back to the claim of the theorem, it was saying that the derivative is actually equal to 0, right?
But what I have got so far is that this inner product, the derivative at x* times h, is greater than or equal to 0 for every vector h in R^n, because I started off with h as an arbitrary vector. So since this must hold for every vector h, what does that mean? In particular, I can take h to be the negative of this derivative itself, the negative gradient of f at x*. That gives me that the negative of the norm squared of the derivative at x* is greater than or equal to 0. But if the negative of the norm squared is greater than or equal to 0, that just means the norm squared is less than or equal to 0; and since it cannot be strictly less than 0, it has to be equal to 0. And when the norm is 0, the vector itself is 0. That proves the theorem. So now let us understand the consequences of this, and also the limitations. The main consequence: remember that when you are searching for a minimum of a function over an open set, there are potentially infinitely many possible choices. What this theorem says is that instead of looking over all of them, you can limit your search to those points at which the derivative equals 0. Why? Because this is a necessary condition: any minimum must satisfy it. So searching over the full feasible region has been reduced to searching over the solutions x* of this equation. Now, what kind of equations are these? How many variables and how many equations?
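Here is a small sketch of that search-space reduction; the function f(x) = x^4 - 2x^2 is my own illustrative choice. Any minimizer over the open set R must solve f'(x) = 4x^3 - 4x = 0, i.e. x in {-1, 0, 1}, so we only compare f at those three candidates instead of searching all of R:

```python
# Search-space reduction: instead of searching all of R, compare f only at
# the stationary points, i.e. the solutions of f'(x) = 4x^3 - 4x = 0.

def f(x):
    return x ** 4 - 2 * x ** 2

stationary_points = [-1.0, 0.0, 1.0]   # solutions of f'(x) = 0
best = min(stationary_points, key=f)

print(best, f(best))  # a global minimizer: x = -1 (x = 1 ties), f = -1
```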
x* itself is in R^n, and the equation sets the derivative of f with respect to x to 0, so there are n equations in n variables. The difficulty usually is that these equations are not linear, so you need numerical methods to find their solutions; but it is still better than having to search the entire space. This is how optimization works: you first find a mathematical result which constrains your search, and then you come up with an algorithm to search in that restricted region. Without such a result you would be searching everywhere, and that is too expensive. So this was one consequence; now some cautionary remarks. Remember, the theorem said that if there is an optimal solution, then this condition must hold. Many different cases and sub-cases arise because of this kind of statement, so let us go through all of them. The two things to track are: existence of an optimal solution, and the number of solutions to the equation "derivative equals 0"; and then the consequences. Let us take this case by case. First case: suppose there is an optimal solution. Then we know it must satisfy "derivative equals 0". These are non-linear equations, and we know they must be satisfied by the optimal solution; but there may also be other points that solve them. Suppose, however, that there is only one solution to these non-linear equations. What does that mean? Every optimal solution must satisfy this set of equations, and the set of equations has only one solution.
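Since the equations are generally non-linear, a root-finder is needed in practice. A minimal one-dimensional sketch (my own illustrative function, not the lecture's): Newton's method applied to g(x) = f'(x) for f(x) = x^4 - 2x^2, so g(x) = 4x^3 - 4x and g'(x) = 12x^2 - 4.

```python
# Newton's method on g(x) = f'(x): solve the stationarity equation numerically.

def g(x):          # the equation to solve: f'(x) = 4x^3 - 4x = 0
    return 4 * x ** 3 - 4 * x

def g_prime(x):    # its derivative, f''(x) = 12x^2 - 4
    return 12 * x ** 2 - 4

x = 2.0            # starting guess
for _ in range(50):
    x = x - g(x) / g_prime(x)

print(x)           # converges to the stationary point x = 1
```

Which stationary point Newton's method finds depends on the starting guess; a different guess can land on x = -1 or x = 0.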
Then it has to be that that solution is the optimal one. So the x* that solves this set of equations is the unique optimal solution. Second case: suppose there is an optimal solution to the optimization problem, but when you look at these non-linear equations you find more than one point satisfying them. If more than one point x* solves the equations, what does that mean? By itself, nothing: any of those points could be the optimal solution, and the theorem alone cannot tell you which. Third case: suppose there is no optimal solution, and the equations have no solution either. Then of course nothing can be said. Going further, fourth case: suppose there is no optimal solution, but there is a unique point solving the system of equations. What can one conclude? Again, nothing. Remember, the theorem started with the premise that an optimal solution exists; if there is none, the theorem tells you nothing. And finally, fifth case: no optimal solution, and more than one solution to the equations. That also says nothing. Let me illustrate these with some examples. First, case 1: there is an optimal solution and a unique solution to the equation. I will take S to be the open interval from -1 to 1, and draw a function with a unique point where the derivative becomes 0. Then it has to be that this point is also the optimal solution. This is the happy case. The second case: once again on the open interval from -1 to 1, I have a function that looks like this.
Now if you look at the derivative of this function, it becomes 0 here and it also becomes 0 here; there is more than one point where the derivative vanishes. So we need to dig deeper in order to conclude what the optimal solution is; that is a separate matter, and the theorem does not help us conclude anything here. What does case 3 look like? No optimal solution, and the derivative vanishes nowhere. Keeping the domain as the open interval from -1 to 1, just take a line. There is no optimal solution: the infimum is not attained in this set. It would have been attained at -1 if -1 were part of the domain, but the domain is open, so the infimum is not attained; and the shape of the function is such that its derivative does not vanish anywhere. Now case 4. Case 4 is actually very tricky, because here the equation "derivative equals 0" has a unique solution, yet no optimal solution exists. What does that look like? Just invert the figure from case 1, turning the minimum into a maximum. The infimum is not attained, but there is a point where the derivative vanishes. Case 4 is often responsible for many errors: before checking that a solution exists, people jump into solving "derivative equals 0" and then come up with answers that are not actually optimal solutions of the problem. And case 5 would look something like this, again on the open interval from -1 to 1: there are now multiple points where the derivative is vanishing, but there is no optimal solution.
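The tricky case 4 can be sketched numerically; the function f(x) = -x^2 on (-1, 1) is my own illustrative choice. Its derivative f'(x) = -2x vanishes only at x = 0, yet x = 0 is a maximizer, and the infimum -1 is approached near the endpoints but never attained:

```python
# Case 4 sketch: f(x) = -x^2 on the open interval (-1, 1).
# The unique stationary point x = 0 is a maximizer, not a minimizer,
# and the infimum -1 is never attained on the open interval.

def f(x):
    return -x * x

stationary = 0.0          # unique solution of f'(x) = -2x = 0
near_edge = 0.999         # still strictly inside (-1, 1)

print(f(stationary), f(near_edge))  # 0.0 vs about -0.998: x = 0 is no minimizer
```

This is exactly the error described above: solving "derivative equals 0" without first checking that an optimal solution exists.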