Today's lecture is the last lecture. In this lecture I am not supposed to tell you something new, or something very difficult or advanced. As is the tradition of NPTEL lectures, I am supposed to summarize what has been done, give you an overview of the subject in some sense, and give an idea of what you can probably do with it. It is also important that I give you a few more homeworks or a little bit of examples if possible, and tell you about what has not been done, what I wanted to do, because as you know the number of lectures is fixed. It is also important to know that what I have done here may not be the most important thing needed by each and every one of you listening to this lecture, because there are a lot of things in optimization and every practitioner needs some part of it. I really wanted to tell you something about direct search methods. These are heuristic methods which take a point; for example, in what is called compass search, you look at a point x_0. It does not matter whether or not you have gradient information; you can do it without gradient information. Suppose this point is not the optimum. Then you check a few neighbouring points: you move east, west, north and south by some step length and see at which point the function value has decreased, and you take that point. If none of these moves decreases the function value, then you reduce the radius of the movement and try again. This is one of the direct search methods. These methods appeared in the mid 60s, but they did not have a convergence proof.
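The east-west-north-south procedure just described can be sketched in a few lines. This is a minimal illustration of compass search, not code from the lecture; the test function, starting point, and halving rule for the step are my own illustrative choices.

```python
# A minimal sketch of compass (direct) search in 2-D: try moves east, west,
# north, south; accept an improving move, otherwise halve the step radius.
# Function, starting point, and step rule are illustrative assumptions.

def compass_search(f, x0, step=1.0, tol=1e-6, max_iter=10_000):
    """Derivative-free search over the four compass directions."""
    x = list(x0)
    fx = f(x)
    directions = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # E, W, N, S
    for _ in range(max_iter):
        improved = False
        for dx, dy in directions:
            cand = [x[0] + step * dx, x[1] + step * dy]
            fc = f(cand)
            if fc < fx:          # accept the first improving move
                x, fx = cand, fc
                improved = True
                break
        if not improved:
            step *= 0.5          # no improvement: reduce the radius
            if step < tol:
                break
    return x, fx

# Illustrative example: f(x, y) = (x - 1)^2 + (y + 2)^2, minimum at (1, -2).
xmin, fmin = compass_search(lambda p: (p[0] - 1) ** 2 + (p[1] + 2) ** 2,
                            [5.0, 5.0])
```

Note that no gradient is ever computed, which is exactly why such methods are attractive when derivative information is unavailable.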
So they lost their clout with mathematicians and were almost forgotten, though they were later revived when it was shown that if you have some differentiability information for this sort of problem, you can actually develop a convergence theory. I really wanted to do something with that in this course, but the last lecture is not the time to do such things. Recollect that in this course we started with unconstrained optimization. Of course, unconstrained optimization is a mirror through which all of optimization can be viewed. The fundamental rule here is to find an x_0 with ∇f(x_0) = 0 and then check whether ∇²f(x_0) is positive definite. If you can find an x_0 which satisfies both conditions, then x_0 is a minimum. That is exactly what we learned. Of course, we learned through various algorithms how to find not exactly the minimum, but some good approximation to it. This is because, except for toy examples which we give in books, like f(x) = x², in actual problems, or even slightly complicated problems, you cannot find the exact solution to an optimization problem. A fact which I am stressing in this course is that what you can find is some sort of approximate solution, which you may like or may not like; that is up to you. Theoretically x_0 is the minimum, but to find x_0 we use algorithms, and algorithms give only approximations. This is the fundamental thing you must remember: algorithms give approximations; an exact solution is not easy, or rather impossible, to obtain in most cases. This is the fact you need to remember if you want to advance yourself in the subject of optimization. Now, the algorithms that we did were largely steepest descent; we spoke about Newton's method, and we spoke about the conjugate gradient method.
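The optimality check just described, gradient zero plus positive definite Hessian, can be carried out numerically with finite differences. A small sketch, using an illustrative two-variable function of my own choosing, f(x, y) = x² + 3y², whose minimum is at the origin:

```python
# Checking the first- and second-order optimality conditions numerically
# (gradient = 0, Hessian positive definite) for an illustrative function.

def f(x, y):
    return x * x + 3 * y * y

def grad(x, y, h=1e-6):
    """Central-difference gradient."""
    gx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    gy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return gx, gy

def hessian(x, y, h=1e-4):
    """Central-difference second derivatives (2 x 2 symmetric Hessian)."""
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return fxx, fxy, fyy

gx, gy = grad(0.0, 0.0)
fxx, fxy, fyy = hessian(0.0, 0.0)
# A symmetric 2x2 matrix is positive definite iff its leading principal
# minors are positive: fxx > 0 and fxx*fyy - fxy^2 > 0.
is_pd = fxx > 0 and fxx * fyy - fxy * fxy > 0
```

For this function the gradient vanishes at the origin and the Hessian is diag(2, 6), so both minors are positive and the test confirms a minimum.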
I want you to recollect that these two, steepest descent and Newton's method, are line search methods. That means the iteration can be written as x_{k+1} = x_k + α_k d_k, where d_k is a direction of descent, and you know what a direction of descent is. The conjugate gradient method is slightly different: there we are talking about conjugate directions, so it is built upon conjugate directions. There is another class of non-line-search algorithms, called trust region algorithms, which we have not done. It is another important class and, at this moment, a very important part of optimization research. I would refer you to the book Numerical Optimization by Jorge Nocedal and Stephen Wright, published by Springer, which also has an Indian edition. There you can see a very good study of the fundamental issues in trust region algorithms. We also did another important variation of Newton's method, much more useful for solving non-convex problems: the quasi-Newton methods. Of course, when we do line search methods we must always keep in mind the descent behaviour we are trying to enforce at every step. There is an interesting analogy between trust region algorithms and quasi-Newton algorithms: the trust region subproblem is itself a constrained optimization problem, while the quasi-Newton method also needs constrained optimization to understand and derive its updates; steepest descent, Newton's method and conjugate gradients do not need this. Once unconstrained optimization was done, we came to the heart of the matter. We came to study the constrained optimization problems, where we studied the Fritz John conditions and the Karush-Kuhn-Tucker conditions. I would like to refresh your memory on a very important point: the existence of a normal multiplier for a Fritz John system is the same as saying that the Karush-Kuhn-Tucker conditions hold. So the existence of a normal multiplier tells you that the Karush-Kuhn-Tucker conditions hold.
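The line search iteration x_{k+1} = x_k + α_k d_k can be made concrete with the simplest choice d_k = -∇f(x_k) and a backtracking (Armijo) step. This is my own illustrative sketch, not the lecture's code; the quadratic test function and the constants beta and c are assumptions.

```python
# Steepest descent with backtracking (Armijo) line search:
#   x_{k+1} = x_k + alpha_k * d_k,  d_k = -grad f(x_k).
# Test function and parameters are illustrative choices.

def f(x):
    return (x[0] - 2) ** 2 + 10 * (x[1] + 1) ** 2   # minimum at (2, -1)

def grad(x):
    return [2 * (x[0] - 2), 20 * (x[1] + 1)]

def steepest_descent(x, iters=300, beta=0.5, c=0.3):
    for _ in range(iters):
        g = grad(x)
        d = [-g[0], -g[1]]                 # descent direction: g . d < 0
        alpha = 1.0
        # Backtrack until the Armijo sufficient-decrease condition holds.
        while f([x[0] + alpha * d[0], x[1] + alpha * d[1]]) > \
                f(x) + c * alpha * (g[0] * d[0] + g[1] * d[1]):
            alpha *= beta
        x = [x[0] + alpha * d[0], x[1] + alpha * d[1]]
    return x

x_min = steepest_descent([0.0, 0.0])
```

The sufficient-decrease test is what makes this a line search method in the technical sense: the step length α_k is chosen along the fixed direction d_k, in contrast to trust region methods, which choose the direction and length together.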
So there were constraint qualifications like the linear independence constraint qualification, LICQ, or the Mangasarian-Fromovitz constraint qualification, which is called MFCQ. These two always guarantee that if the qualification is satisfied at the solution point, then there cannot be any Fritz John multiplier which is abnormal: all the Fritz John multipliers will have λ_0 strictly bigger than 0. Let me just remind you of the Karush-Kuhn-Tucker conditions. Take one equality constraint h_1(x) = 0 and one inequality constraint g_1(x) ≤ 0, and write down the Fritz John condition. Observe that the second-to-last condition is a very important one: λ_1 g_1(x*) = 0. This is called the complementary slackness condition. Now, if LICQ or MFCQ holds then, as we have said, there cannot be any Fritz John multiplier whose λ_0 is not strictly bigger than 0. Actually, in the Fritz John condition what we have is that (λ_0, λ_1, μ_1) ≠ (0, 0, 0). So λ_0 is either strictly greater than 0 or equal to 0. If λ_0 is strictly greater than 0, we can always rescale the multipliers by λ_0, that is, divide through by λ_0, and get λ_0 equal to 1. MFCQ is the weakest condition which guarantees that every Fritz John multiplier has λ_0 strictly bigger than 0. I just want to repeat: if MFCQ fails, then there is always a set of multipliers which satisfies the Fritz John system with λ_0 = 0, that is, there exists an abnormal multiplier.
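For the record, here is the Fritz John system for the one-equality, one-inequality case just discussed, written out cleanly (notation mine, matching the λ_0, λ_1, μ_1 of the lecture):

```latex
% Problem: minimize f(x) subject to g_1(x) \le 0, \; h_1(x) = 0.
% Fritz John: there exist (\lambda_0, \lambda_1, \mu_1) \ne (0,0,0),
% with \mu_1 \in \mathbb{R} free in sign, such that
\begin{align*}
  \lambda_0 \nabla f(x^\ast) + \lambda_1 \nabla g_1(x^\ast)
      + \mu_1 \nabla h_1(x^\ast) &= 0, \\
  \lambda_1\, g_1(x^\ast) &= 0 \quad \text{(complementary slackness)}, \\
  \lambda_0 \ge 0, \quad \lambda_1 &\ge 0.
\end{align*}
% KKT is the same system with \lambda_0 = 1; whenever \lambda_0 > 0 we can
% rescale to \lambda_0 = 1, and LICQ or MFCQ guarantees \lambda_0 > 0.
```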
So a very, very important central thing in your learning is this: if MFCQ fails, there exists an abnormal Fritz John multiplier. When MFCQ fails, an abnormal multiplier is guaranteed. All the conditions that you will see in books, like the Abadie constraint qualification or the Guignard constraint qualification, or other approaches developed in recent times, do only one thing. They say: my MFCQ has failed, so my problem has an abnormal multiplier; but does there also exist a set of multipliers which is normal, that is, with λ_0 strictly bigger than 0? The answer, surprisingly, turns out to be yes. These conditions guarantee that there will be at least one multiplier set with λ_0 strictly greater than 0, so the KKT conditions will be satisfied. That is why these conditions, which are weaker than MFCQ, are called constraint qualifications: if the constraints qualify, that is, satisfy those conditions, then the KKT conditions are guaranteed. A very central issue: for a linear programming problem there always exists a Karush-Kuhn-Tucker point, without any constraint qualification. This is a very important result: for an LP we can always guarantee the existence of a normal Fritz John multiplier. The interesting part is that if I take the objective to be any other differentiable function f(x), non-convex also, then as long as the constraints are linear you can still write down the Karush-Kuhn-Tucker conditions without any use of a constraint qualification. If I actually have linear or affine constraints, then the existence of at least one normal Fritz John multiplier is guaranteed. This is a central, very important result.
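A tiny illustration (my own standard example, not from the lecture) of how failure of MFCQ produces abnormal multipliers:

```latex
% Example: \min_{x \in \mathbb{R}} \; x \quad \text{s.t.} \quad x^2 \le 0.
% The feasible set is \{0\}, so x^\ast = 0 is the minimum.
% \nabla g(x^\ast) = 2x^\ast = 0, so MFCQ fails at x^\ast.
%
% Fritz John: \lambda_0 \cdot 1 + \lambda_1 \cdot 2x^\ast = 0
%   \implies \lambda_0 = 0.
% Every Fritz John multiplier (\lambda_0, \lambda_1) = (0, \lambda_1),
% \lambda_1 > 0, is abnormal, and the KKT system
%   1 + \lambda_1 \cdot 0 = 0
% has no solution: KKT fails even though x^\ast is the minimum.
```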
The interior point methods, not Karmarkar's method but those which came after it, especially through the work of Renegar, use Newton's method for solving this linear programming problem, and that comes out of this very fact: for an LP the Karush-Kuhn-Tucker conditions always hold, that is, there is always one set of multipliers which is normal. So the KKT system is always available, and we can solve it very cleverly using Newton's method. We do not discuss that here, but it is discussed in some detail in my other NPTEL course on convex optimization. This is a very central fact which cannot be ignored; in the Karush-Kuhn-Tucker theory this is possibly the central fact, I would say. Now, we have also done a lot of examples, and another book that I want to show you, from the Springer Texts in Statistics series, is Optimization by Kenneth Lange. It is available in an Indian edition, but its usefulness is not only the Indian edition: this book is written by a statistician who uses optimization in his work, so you can see a lot of good examples from statistics, and that is really a source of fun even for the optimizer; it is really great fun to read. It also has a very different way of looking at Karush-Kuhn-Tucker theory, so one should read it. Let me just give you a type of problem that is useful in statistics, for which maybe you should be able to find the solution x. Say I have to minimize f(x), and observe that this is only defined if x_i ≠ 0, rather than x_i strictly bigger than 0, together with an upper half-space constraint. It is called the minimum of the linear reciprocal function; x_i cannot be equal to 0, and once x_i ≠ 0 this is a differentiable function.
So now find the KKT point and check if it is a minimum. Here is an example where you have to use the Lagrangian theory. If you look at this problem, I remind you again: once you have a KKT point, how do you check that it is a minimum? You can always use the second order conditions; just remember the second order conditions that we have discussed, check them here and see what you get out of it, or see whether you can argue in some other way that this is actually a minimum, which would be much more fun. There are many things that we have studied. For example, we studied the penalty approach, we have spoken about exact penalty, and we have spoken about the projected gradient method and the projected subgradient method. We have also studied Lagrangian duality in quite some detail. Now I will give you as homework one problem for which you will see that studying the Lagrangian dual is actually very helpful. These are called problems which can be decomposed, or decomposable problems. So let me write down the structure of such a problem, and you will see that studying the Lagrangian dual of such a problem is more useful, or simpler for computation, than the actual problem. Here you have to minimize a function f(x) which splits as a sum of block functions: the variable x is partitioned into blocks x_1, x_2, …, x_K, with x_k ∈ R^{n_k} and n_1 + … + n_K = N; basically I am partitioning the vector into several blocks, and each block here is itself a vector. Once this is known, let me write down the inequality constraints in the same block-separable way. This is a problem in a decomposed form. Now, once you have this in a decomposed form, what is your Lagrangian? Maybe I can put the right-hand sides b_i equal to 0 for the moment.
So this is where you will see the real usefulness of Lagrangian duality. What would the Lagrangian be here? Because both the objective and the constraints are sums over the blocks, the Lagrangian itself splits into a sum of terms, one for each block x_k. Now, if you want to find θ(λ), it can be written as a sum of K separate minimizations, one in each block variable. You see, I now have to solve some small, simpler optimization problems, and if I solve one of these the structure of the others is clear. So I have to solve very simple optimization problems, each at a much lower dimension than N, and each is an unconstrained problem: for a fixed λ I can actually solve these problems much more easily. Here you see the use of looking at the Lagrangian dual: computation of θ(λ) is quite simple. I would give you as homework: what would happen if I put a b_i on the right-hand side here instead of 0? How would the writing change? You will have the additional term −λᵀb: you proceed the same way and then subtract λᵀb from the whole expression. But let me give you a problem in decomposed form which is especially used in the study of power supply in electrical engineering, where the problem comes out of the study of power demand: how can we satisfy the demand for power at minimal cost? Say there are n power plants, and x_j is the amount of power the j-th power plant produces. What should the output of the j-th plant be so that the demand is met? If the demand is b, then the total output of all the n power plants should be at least b; that is the meaning of the demand being satisfied. Suppose f_j(x_j) is the cost of generating electricity at the j-th power plant; usually the cost is a quadratic cost, and b is the total demand.
So if x_j is the output of the j-th plant, the sum of the f_j(x_j) is the total cost that you have to incur, and here it is strictly convex. Suppose I say that x_j is free, that each plant can produce any amount; actually x_j should have a bound, but let us keep it theoretical. Then this problem is to minimize the summation of f_j(x_j), and if b is the total demand, the constraint is that the summation of x_j for j = 1 to n is at least b; that is what it should be. So it is again a problem in decomposed form, for which you can actually use Lagrangian duality to get something. It is very important to note that optimization is a very, very vast field. What we have done is not the broad gamut of optimization; it is just a minuscule part of it. What we have done is called continuous optimization. We have not spoken about variables that could be discrete: for example, you could minimize an objective function f(x) where each x_i takes only the value 0 or 1, say a simple linear programming objective arising in switching problems. These sorts of problems are called integer programming problems, and we have not discussed them here at all. There x no longer ranges over R^n; rather, x lies in the n-fold Cartesian product of {0, 1} with itself, which is customarily written {0, 1}^n. It is a sort of hypercube: if you are in 2-D, then {0, 1} × {0, 1} consists of the four corner points, a grid basically. These are the four points over which you need to compute the function values and pick out the minimum. If it is only four points like this, that is n = 2, then it is very simple: just calculate the four values, and the machine will give you the answer in the blink of an eye.
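The power-dispatch problem above can be worked out through its Lagrangian dual exactly as described: for a fixed multiplier, θ(λ) separates into one tiny problem per plant. A sketch with made-up plant data, taking quadratic costs c_j x_j² + d_j x_j and x_j unrestricted as in the lecture (the numbers c, d, B are illustrative assumptions):

```python
# Lagrangian dual decomposition for:
#   minimize sum_j (c_j x_j^2 + d_j x_j)  subject to  sum_j x_j >= B.
# Plant data below are made-up illustrative numbers.

c = [1.0, 0.5, 2.0]    # quadratic cost coefficients (c_j > 0)
d = [3.0, 1.0, 2.0]    # linear cost coefficients
B = 10.0               # total demand

# For fixed lam >= 0, theta(lam) separates: each plant solves
#   minimize c_j x^2 + (d_j - lam) x,  solved by x = (lam - d_j)/(2 c_j).
def plant_output(lam, j):
    return (lam - d[j]) / (2 * c[j])

# Maximizing theta is a one-dimensional concave problem; setting
#   theta'(lam) = B - sum_j (lam - d_j)/(2 c_j) = 0
# gives lam in closed form.
lam_star = (B + sum(d[j] / (2 * c[j]) for j in range(3))) / \
           sum(1 / (2 * c[j]) for j in range(3))

x_star = [plant_output(lam_star, j) for j in range(3)]
total = sum(x_star)    # at the optimum the demand is met exactly
```

Notice that each plant's subproblem is solved in closed form at dimension one, which is precisely the computational advantage of the decomposed form: the only coupling between plants is through the single scalar multiplier λ.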
The problem is: what happens when n is very, very large? When n is very large, things can be very difficult; you cannot make a total enumeration. Because you cannot make a total enumeration, you must have clever ways to know which among this huge but finite number of points actually gives the minimum. That gives rise to a subject called integer programming, or combinatorial optimization, because it uses methods from combinatorics, and its approach is completely different from the approach we have taken in this course, the approach of continuous optimization. I want to assure you at this stage that we are going to fulfil a part of this in this course: apart from the lectures that I have given, there will be 2 special lectures added to this course, delivered by Professor Vishnu Narayanan of the Industrial Engineering and Operations Research department of IIT Bombay. You will see that the approach taken for such problems, with discrete or integer variables, is absolutely different from the approaches we have taken here. There the approaches depend on combinatorics and graph theory, while here the approach largely depends on analysis. There is also another important class of optimization problems, currently a very hot area of research, called polynomial optimization. Of course it depends on analysis, but it also depends heavily on algebra and algebraic geometry. So these are certain exciting things that are coming in. Another thing we have not spoken about here is what happens when the data has noise, because in most engineering problems certain noise comes into the problem automatically during the collection of data; nobody can stop that.
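For tiny n, the total enumeration over {0, 1}^n that was just ruled out for large n is trivial to write down. A sketch with an illustrative linear objective of my own choosing (the coefficients are not from the lecture); the point of the 2^n in the loop is exactly why this breaks down for large n:

```python
# Total enumeration over the 0/1 hypercube; feasible only for tiny n,
# since the loop visits 2^n points. Objective coefficients are made up.
from itertools import product

cost = [3, -1, 2, -4]                         # illustrative c_j

best_x, best_val = None, float("inf")
for x in product((0, 1), repeat=len(cost)):   # all 2^n candidate points
    val = sum(cj * xj for cj, xj in zip(cost, x))
    if val < best_val:
        best_x, best_val = x, val
```

Here the minimizer simply sets x_j = 1 wherever c_j < 0; for n = 40 the same loop would already need about 10^12 evaluations, which is why integer programming needs cleverer machinery.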
So once that noise comes in, you need to know from your empirical experience what sort of distribution the noise might follow, and once that is known you can develop certain optimization techniques to handle the problem with that noise. Those sorts of things are called stochastic optimization, and there is a part of it called probabilistic programming, which we have not done here. So as we finish this course, we see that we have done a huge gamut of stuff on the foundations of optimization, but largely on continuous optimization; we have not spoken about discrete optimization, of which you will hear in the two special lectures. But I would also like to point out that a lot of problems of discrete optimization have a corresponding problem in the continuous optimization setup which is some sort of approximation: we can relax, or in some way approximate, the discrete problem by a continuous optimization problem and immediately estimate the solution. For example, in this particular case of the four points, we can relax the problem by minimizing over the convex hull of these four points. So I can do the same thing: I relax the problem and minimize over that convex hull. In general, though, it is very, very difficult to compute the convex hull of the feasible points on a machine; if that could be done, integer programming would be much easier. Because it cannot be done, there have to be different ways, and that is a whole exciting subject which we are not going to get into. So here we are ending this course. I would like to thank you for patiently listening. I hope I have not made mistakes, which would be unintentional; if you find any as you go through the course, please write to me at my email.
So that they can be corrected, and a corrected update can be posted on the website of this course, which will be maintained by NPTEL at IIT Madras. Thank you once again. I hope some of you who have listened to this course, or some of you who need optimization in your own problems, will make a deeper study of optimization and understand the subject better. Thank you very much.