Welcome. Today is our last lecture on optimization theory. In this lecture I will cover the last important topic in the basic theory of constrained optimization, that is, duality. Then we will discuss a general overview and classification of methods of constrained optimization, and if time permits we will have a brief discussion on two specific types of constrained optimization problems: linear optimization and quadratic optimization problems. First, for the discussion on duality, let us consider the general nonlinear optimization problem in which we want to minimize the function f subject to the constraints g(x) less than or equal to zero and h(x) equal to zero. Suppose we get a solution of this problem, and that solution is x star, with corresponding Lagrange multipliers lambda star and mu star. Note that these Lagrange multipliers, which we called lambda and mu earlier, we are now naming lambda star and mu star because, in the discussion on duality, lambda and mu will be treated as variables in their own right; we need the notation lambda and mu for those variables, and therefore the specific values of those variables at the solution point we are going to call lambda star and mu star. Now, if x star is a feasible point which is the solution of this constrained optimization problem, then we know that the Lagrangian of the problem at the solution can be written as f of x plus lambda star transpose h of x plus mu star transpose g of x. The first order necessary condition for this function to be stationary is grad L equal to zero, that is, grad f plus grad h times lambda star plus grad g times mu star equal to zero, which we have seen earlier in the KKT conditions. Apart from that, we also have the optimality conditions that the entries of mu star should all be non-negative, and zero corresponding to the inactive inequality constraints; this much we have seen. That is the first order condition. Now note that if we get this as the solution, then we can try to vary these multipliers around their values lambda star and mu star at the solution point. The general values of these multipliers we treat as variables lambda and mu, with their specific values at the solution point being lambda star and mu star. The Lagrangian in terms of a general x, a general lambda and a general mu is then constructed in the same way, and it turns out to be a function of x, lambda and mu, that is, of the original variables of the problem and the Lagrange multipliers taken as variables. If we define it like that, then for some given values of lambda and mu we can consider this as the Lagrangian and minimize this function with respect to x. The minimum point in that case will depend upon the lambda and mu values that we give: if we end up giving the correct values of lambda and mu, that is, lambda star and mu star, then we will get the correct x star, which is the solution of the original optimization problem.
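For reference, the problem and the first order conditions just stated can be written compactly as follows; this simply restates the above in symbols, with the sign convention used in this lecture:

```latex
\min_x f(x) \quad \text{s.t.} \quad g(x) \le 0, \;\; h(x) = 0,
\qquad L(x,\lambda,\mu) = f(x) + \lambda^\top h(x) + \mu^\top g(x),

\nabla_x L = \nabla f(x^*) + \nabla h(x^*)\,\lambda^* + \nabla g(x^*)\,\mu^* = 0,
\qquad \mu^* \ge 0,
\qquad \mu_i^*\, g_i(x^*) = 0 \;\; \text{(complementarity)}.
```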
On the other hand, if we give somewhat different values of lambda and mu, then at the end of the minimization of this function we will get a point which is not the correct x star. Therefore, we can call that minimizer, for general values of lambda and mu, not necessarily lambda star and mu star, the x star that we obtain as a result of specifying those lambda, mu values, and correspondingly we get the function value at that point. So, for every prescribed set of values lambda and mu, we get some minimum point of this function and the corresponding minimum value. This minimum value can then be taken as a function of lambda and mu, and this minimum with respect to the x variables, viewed as a function of lambda and mu, is called the dual function. So we can define this new function, phi of lambda and mu, as the minimum of the Lagrangian function with respect to x. Note that since the minimization is carried out with respect to x, the resulting minimum value is no longer a function of x; it is a function of lambda and mu only. So these variables lambda and mu turn out to be the variables of this particular function, and this function is called the dual function. Now, the original function was a function of x, and this new function is a function of the Lagrange multipliers. These are called the dual variables; the original variables x are called the primal variables. Therefore, the original problem is sometimes called the primal problem, which has been stated here, and from the dual function we will define a dual problem. We know that the optimality conditions of the original problem, the primal problem, are that the gradient of the Lagrangian equals zero and that the entries in mu are non-negative. Now consider the same Lagrangian function, but this time its maximization with respect to lambda and mu, with mu taken non-negative: maximize the Lagrangian with respect to lambda and mu subject to mu greater than or equal to 0. This problem gives us something interesting. When we try to maximize this function with respect to lambda and mu for a given x, suppose the given x is such that one of the inequality constraint functions, g i, is positive there. Then what value of the corresponding mu i should we take to maximize this function? We can take it as large as we please: we can go on increasing the mu i corresponding to the g i which is positive, which means that on the upward side this function of lambda and mu is unbounded and the maximum does not exist. That is, if some g i is positive, then we can give an enormously large positive value to the corresponding mu i and make this function as large as we want.
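To make the dual function concrete, here is a small worked example; it is not from the lecture, just a one-variable problem chosen for illustration:

```latex
\text{Primal: } \min_x \; x^2 \quad \text{s.t.} \quad h(x) = x - 1 = 0,
\qquad L(x,\lambda) = x^2 + \lambda (x-1),

\frac{\partial L}{\partial x} = 2x + \lambda = 0 \;\Rightarrow\; x(\lambda) = -\tfrac{\lambda}{2},
\qquad \phi(\lambda) = L(x(\lambda),\lambda) = -\tfrac{\lambda^2}{4} - \lambda,

\phi'(\lambda) = -\tfrac{\lambda}{2} - 1 = 0 \;\Rightarrow\; \lambda^* = -2,
\qquad \phi(\lambda^*) = 1 = f(x^*) \text{ with } x^* = x(\lambda^*) = 1 .
```

So maximizing the dual function recovers the correct multiplier, and with it the correct x star and the common optimal value.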
So there will be no maximum value; the function will be unbounded. That means, for a maximum of this function to exist, it is necessary that no g i is positive. So we get one requirement for the existence of the maximum. Now see that if a g i is negative, then giving the corresponding mu i a positive value will reduce the function value, which we do not want, since this is a maximization problem. So for every g i which is negative, we would like to keep mu i at zero, because if mu i is positive and g i is negative, then the corresponding product mu i g i will be negative, and we are maximizing this function; such a negative contribution we would like to reduce as much as possible. That is, compared to minus 5, minus 4 will be considered better; compared to that, minus 2 will be considered better; and so on, and best will be zero. If g i is negative, then above zero the product cannot go, because mu i is non-negative. So for every g i which is negative we need mu i equal to zero, and that settles well with the complementarity condition in the KKT conditions: corresponding to inactive inequality constraints the mu should be zero, while for those g i which are active at that x, that is, those g i which evaluate to zero, mu i can be positive. But as a requirement we need all the g i to be non-positive: at a value of x at which any of them turns out to be positive, this maximization problem will fail; the maximum will not exist. So for the maximum to exist it is a necessary condition that g is less than or equal to zero. Now let us come to the equality constraints. At that value of x, if a particular h j is positive, then note that on the lambdas there is no restriction; lambdas can be positive as well as negative. So if a particular h j is positive, then the corresponding lambda j we can make positive and give it as large a value as we want, and the product lambda j h j will be enormously large. That means on the upper side this function will be unbounded and we will not find any maximum; we can go on increasing it as much as we want. Similarly, if a particular h j turns out to be negative, then the corresponding lambda j we can make negative, since there is no sign restriction: minus 10,000, minus 20,000, minus 40,000 we can go on giving to lambda j, making the corresponding product lambda j h j as large as we want, and again the function will be unbounded. So there will be no maximum. For the maximum of this function to exist, it is therefore also necessary that each h j is neither positive nor negative, that is, the h j should all be zero. Now note something interesting here. Based on the Lagrangian, when we wanted to minimize it with respect to x, we got the function which we called the dual function, and one of the minimum conditions was that the gradient of the Lagrangian with respect to x vanishes. So while minimizing with respect to x we got the reduced function, a function of lambda and mu only, with mu greater than or equal to zero, defined at points where the gradient with respect to x vanishes.
Now, when we took the same function and tried to maximize it with respect to lambda and mu, we got these conditions, which are the feasibility requirements of the primal problem. So the duality is here: the definition of the dual function requires a minimization with respect to the primal variables, so by definition the dual problem is meaningful only at those points where the Lagrangian is minimized over x, that is, where the primal optimality condition holds. And here you find that the maximization over lambda and mu succeeds exactly at those points where the primal problem is feasible. If the h j are all zero, then their contribution vanishes, and if the g i are all non-positive and the mu corresponding to negative g i are zero, then that contribution also vanishes. So at the solution point you will be left with only f(x), together with these conditions, that is, the feasibility of the primal problem. That shows that when you minimize the Lagrangian you get the dual: the optimality of the primal problem is connected with the definition of the dual problem. Similarly, optimality, now in the maximization sense, of the dual problem is connected with the feasibility of the primal problem. This is the idea of duality. Now, if you consider lambda and mu as variables, define this dual function of the Lagrange multipliers, and try to maximize it with respect to lambda and mu, then under suitable conditions you should get the same lambda star and mu star with which we started, and you should get the maximum value of phi, which is the same as the Lagrangian value at (x star, lambda star, mu star) and also the same as the optimal function value f of x star. Similarly, if you take the Lagrangian, first maximize it to get back the original primal problem, and then conduct the minimization of that, then also you reach the same point. With this much background, let me summarize the overall results of local duality, or convex duality, without getting into complicated proofs; this is just for an overview. If we assume local convexity, that is, near the solution point the function is convex in the x variables, the primal variables, then the dual function is defined in this manner, and for the definition of the dual problem you will need the optimality conditions of the primal. At the same time, apart from the gradient condition, corresponding to the inequality constraints of the primal problem you will find non-negativity constraints on the variables mu in the dual problem: whatever is the optimality condition on the mu's will appear in the dual problem as a non-negativity constraint on the dual variables mu. If you work out in detail the first order condition for a maximum of the dual, taking the gradient of phi with respect to lambda and mu and setting that equal to zero, you will find that it gives exactly these conditions, which are equivalent to the feasibility of the primal problem.
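The outcome of this maximization argument can be summarized in one line (a standard identity, restated here for clarity):

```latex
\max_{\lambda,\;\mu \ge 0} L(x,\lambda,\mu) =
\begin{cases}
f(x), & \text{if } h(x) = 0 \text{ and } g(x) \le 0 \;\;(x \text{ primal-feasible}),\\
+\infty, & \text{otherwise,}
\end{cases}
```

so the inner maximization reproduces the primal objective exactly on the feasible set.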
So you will find that the first order necessary conditions for dual optimality turn out to be equivalent to the feasibility of the primal problem, and just as near the solution point the primal function is convex, on the other side the dual function is concave: its second derivative will be a negative definite matrix when suitable convexity conditions are satisfied. Even if those conditions are not satisfied, what will still hold is that the maximum of the dual function is less than or equal to the Lagrangian value at (x star, lambda star, mu star), which in turn is less than or equal to the minimum value of the primal function. In the case of convex problems, you will find that the inequalities are replaced by equalities. And what is the characteristic of this Lagrangian function? The Lagrangian function has a saddle point, and that is at (x star, lambda star, mu star), because in the x variables that is a minimum point. That means, in the complete space of x, lambda, mu, if you try to visualize the shape of the Lagrangian function in the x subspace, then at x star you will find that it has a minimum, like this. On the other hand, in the lambda-mu subspace it will have a maximum at that point, like this. So when you give a value to lambda and mu, you are assigning a value in the lambda-mu subspace; you are saying that we are going to take a cut here, or here. Suppose you give this value of lambda, mu; then, at that value, in the x subspace you will get this curve, in which this is a minimum, and so on; and among such minima, if you then try to find the maximum, you will get this point, and that is the solution. So the solution point is a local minimum in the x subspace and a local maximum in the lambda-mu subspace. If you give a slightly different lambda, mu, then instead of this curve you will get another curve, and so on; those points will not be feasible for the original problem, the primal problem, but the corresponding minimum you will get here. Similarly, if you first freeze the x variables, then you will get not this curve but something like this curve, in which this will be the maximum. So the locus of all the maxima of the dual problem will be this, and similarly the locus of all the minima of the primal problem will be this. Out of the locus of all the maxima of the dual problem, if you minimize, you get this point; similarly, out of the locus of all the minima of the primal problem for different lambda, mu, if you maximize, that is, maximizing the dual, you get this same point. So you reach the same point from all directions. Now, this duality has an advantage in the sense that there may be problems in which the primal problem is difficult to solve, but if we recast the problem into the dual variables, then many times the dual problem turns out to be simpler to solve; in that case we try to solve the dual problem and thereby develop the solution of the primal problem. Some optimization methods, some algorithms, are based on this duality. Now, with this much theoretical background of constrained optimization theory, let us quickly have an overview of the types of methods that we use for solving constrained problems.
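Stated compactly, the weak duality relation and the saddle point property just described read (standard results, restated in symbols):

```latex
\max_{\lambda,\,\mu \ge 0} \phi(\lambda,\mu) \;\le\; L(x^*,\lambda^*,\mu^*) \;\le\; \min_{x\ \text{feasible}} f(x),
\qquad
L(x^*,\lambda,\mu) \;\le\; L(x^*,\lambda^*,\mu^*) \;\le\; L(x,\lambda^*,\mu^*),
```

with the inequalities in the first chain becoming equalities for convex problems.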
Typically, for a problem of n variables with m active constraints, you can classify the different non-linear optimization algorithms into several classes depending upon the dimension of the space in which they conduct the search. The simplest is the family of penalty methods, in which the search is conducted in the same space in which it would be conducted if the problem were unconstrained, and the constraints are included in the search process through a penalty term. So what we do in a penalty method is that, rather than trying to minimize the original function f(x), we try to minimize a penalized function f(x) plus c times P(x), where P(x) is a well designed penalty function which is zero or insignificant at those points where the constraints are satisfied and becomes positive at those points where the constraints are violated; the greater the constraint violation, the higher the value of P(x), with a large number c sitting as the penalty parameter. Now, how does this work? In the normal search process of any unconstrained method, this function will tend to have large values at those points where the constraint violation is more. Therefore, any optimization method, by the very nature of its working, will avoid those zones where constraints are violated. For example, a very often used penalty function is one half times the norm of h(x) squared plus one half times the norm of max(0, g(x)) squared; that means if g(x) is negative then it is not penalized, zero is taken, and if g(x) is positive, that means it is violated, then the corresponding violation gets penalized. Whatever the amount of violation, that is, the values of h and the positive values of g, accordingly the penalty will vary. Now, it works this way: if the value of the penalty parameter is extremely small, then the constraints will not have much effect. On the other hand, if we give a very large value to the penalty parameter c, then constraint satisfaction will take such a prominent role that the original function will be lost in it, and the contours of the penalized function become extremely narrow and distorted because of the large penalty value, which makes the minimization ill-conditioned. These are the typical difficulties with penalty methods, and that is why, to handle them, we typically apply a penalty method in several stages: for example, in the first round we can put c equal to 0, and then we will get the unconstrained minimum of the function; then we give, say, c equal to 1, then the constraint functions will exert some amount of effect and the minimum point will possibly shift slightly; and then we give c equal to 10, c equal to 100, and so on. (A small code sketch of this staged scheme is given below.)
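Here is a minimal sketch of this staged penalty scheme in Python, using SciPy's unconstrained BFGS minimizer; the test problem (minimize x1 squared plus x2 squared subject to x1 + x2 = 1 and x1 non-negative) and the schedule of c values are made-up illustrations, not the lecture's own example:

```python
import numpy as np
from scipy.optimize import minimize

f = lambda x: x[0]**2 + x[1]**2        # original objective
h = lambda x: x[0] + x[1] - 1.0        # equality constraint h(x) = 0
g = lambda x: -x[0]                    # inequality constraint g(x) <= 0

def penalized(x, c):
    # P(x) = 0.5*||h(x)||^2 + 0.5*||max(0, g(x))||^2, as in the lecture
    P = 0.5 * h(x)**2 + 0.5 * max(0.0, g(x))**2
    return f(x) + c * P

x = np.zeros(2)                        # c = 0 stage: unconstrained view
for c in [0.0, 1.0, 10.0, 100.0, 1e4, 1e8]:
    # each stage starts from the point reached at the previous stage
    res = minimize(penalized, x, args=(c,), method="BFGS")
    x = res.x
print(x)                               # approaches the constrained minimum (0.5, 0.5)
```

As c grows, the unconstrained minimizer of the penalized function is dragged stepwise onto the feasible set, exactly the behaviour described above.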
If the constraints are active (of course, equality constraints are always active) and the unconstrained minimum point is not feasible, then as we go on increasing c as 1, 10, 100, 1000, 10,000 and so on, the constraint violation will have more and more of a cost, and therefore the minimum point will go on shifting stepwise; by the time we take very high values of c, for example 10 to the power 8 or 10 to the power 9, the constraints will be satisfied properly. At every stage of this process we take the point reached previously as the starting point for the current minimization. So this is one way of handling the constraints within the setup of an unconstrained optimization methodology itself, and this search is made in R n, the same space of the primal variables, the original x variables of the problem. Now, there are some methods which operate only on the feasible space, and they are called the primal methods. They do not give any chance to constraint violation: they start from a point which is feasible, and at every step they continue within the feasible space itself. That way, if there are m active constraints, then the dimension of the space in which they operate is n minus m, because they operate on the tangent plane of the active constraints, and for inequality constraints they work in the cone of feasible directions. These methods have one advantage over penalty methods: in the case of penalty methods, upon a premature termination the result is in no way useful, because it may be a point which is not yet feasible. Primal methods have the advantage that even if there is a premature termination, that is, even before convergence, the point reached, even if not optimal, is still a feasible point and perhaps a reasonably good solution to the original practical problem. There are quite a few primal methods; one example is the gradient projection method. Another family of optimization methods considers the Lagrange multipliers as the fundamental variables affecting the nature of the function in the design space. The idea is that if you can get hold of the correct values of the Lagrange multipliers at the solution point, then the rest of the job is easy. That helps you particularly if the number of Lagrange multipliers is much smaller than the number of variables, or if recasting the problem in terms of the Lagrange multipliers gives certain advantages, in the sense that the shape of the function turns out to be much simpler, or some such thing. In such methods we consider the dual function, the way we just discussed: we transform the original problem into the space of the Lagrange multipliers, define the dual problem, and make an attempt to solve the dual problem; as we solve the dual problem, on the way we also develop the knowledge of the solution of the primal problem. One very good example of this family of methods is the augmented Lagrangian methods. There is yet another class of methods which operates on the entire space of x, lambda, mu together, that is, the primal variables and the dual variables all together, and that is why they operate in a space of dimension n plus m. These constitute the family of methods called Lagrange methods. In these, we take the equations from the KKT conditions directly and try to solve those equations.
We try to solve those equations, together with the corresponding inequalities from the KKT conditions, directly through descent steps, thereby converging to the minimum of the problem. One example of this family of methods is the famous algorithm called sequential quadratic programming. Now, this much is on the general theory of constrained optimization; in the rest of this lecture we will consider two particular types of optimization problems, which are linear programming problems and quadratic programming problems, the LP problem and the QP problem. You must already be conversant with the linear programming problem and the famous simplex method to solve it. So here we will not go into the details of linear programming and the simplex method, except that we will make a quick overview of them and then have a look at the general perspective of a linear programming problem in terms of all the theoretical aspects that we discussed in the context of a general non-linear problem. As you know, a typical standard form of an LP problem is this: minimize f(x) equal to c transpose x, a linear function, subject to a number of linear constraints A x equal to b, with non-negative variables x and non-negative b. Now, if the original problem does not appear in this form, then we conduct a little pre-processing to cast the problem into the standard form. For example, if the original problem is to maximize, then we minimize the negative of the objective. Similarly, if there is a variable which can take positive as well as negative values, then we introduce two variables in its place: a variable x which can be positive as well as negative we can put as x p minus x q and say that both x p and x q should be non-negative; their difference can then be anything. So variables of unrestricted sign we replace by pairs of variables, each of them non-negative. If there are inequality constraints, then we use slack or surplus variables to turn them into equality constraints. If there is a right-hand side value which is negative, not satisfying this requirement, then we multiply that constraint by minus 1, which means that this multiplication by minus 1 has to be done prior to introducing the slack or surplus variables. These pre-processing steps we conduct in order to put the problem into the standard form. (A small code sketch of this pre-processing is given below.) Now, why do we do all this? To get into that, you need to visualize the geometry of an LP problem. For example, for a linear programming problem, if the domain is infinite, then the question arises whether a minimum exists. If the domain is completely open, completely unconstrained, then there is no question of a minimum existing, because corresponding to any component of c which is negative, if we go on giving a larger and larger value to the corresponding x, then we can drive the function towards minus infinity, which means on the lower side it is unbounded. So we are not talking about an infinite domain of that kind. If the domain is closed from one side and open from the other side, then the question again arises whether a minimum exists. If the function decreases in the direction towards the closed side, then the minimum will exist. On the other hand, if there is any opening in a direction in which the function decreases, then the function will have no minimum in the domain: you can go on in that direction and indefinitely reduce the function value.
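As a minimal sketch of the pre-processing just described, here is a Python example with SciPy's linprog as the LP solver; the problem and all its numbers are made up purely for illustration:

```python
# Made-up problem: maximize x1 + 2*x2
# subject to x1 + x2 <= 4 and x1 - x2 >= -2,
# with x1 unrestricted in sign and x2 >= 0.
import numpy as np
from scipy.optimize import linprog

# Step 1: maximize f  ->  minimize -f.
# Step 2: split the sign-unrestricted x1 as x1 = xp - xq with xp, xq >= 0,
#         so the variable vector becomes z = [xp, xq, x2].
c = np.array([-1.0, 1.0, -2.0])        # coefficients of -f in terms of z

# Step 3: x1 - x2 >= -2 has a negative right-hand side; multiply by -1
#         (giving -x1 + x2 <= 2) before slack variables are introduced.
A_ub = np.array([[1.0, -1.0, 1.0],     # x1 + x2 <= 4
                 [-1.0, 1.0, 1.0]])    # -x1 + x2 <= 2
b_ub = np.array([4.0, 2.0])

# linprog adds the slack variables internally; all z-variables are >= 0,
# matching the standard form's non-negativity.
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 3)
x1 = res.x[0] - res.x[1]               # recover the original free variable
x2 = res.x[2]
print(x1, x2, -res.fun)                # optimum: x1 = 1, x2 = 3, max f = 7
```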
On the other hand, if the domain is finite, it is a convex polytope; it has to be a convex polytope because of the nature of the constraints, which are all linear. So if it is a finite polytope, closed from all sides, then the existence of the minimum point is guaranteed, because you cannot go on indefinitely in any direction. Now, what does that mean? It means the minimum will exist, but it will exist only on the boundary: a linear programming problem cannot have an interior minimum point, because the derivatives are constant. So now consider this situation, that we have got a domain in which we are trying to solve a linear programming problem. If we start anywhere at a feasible point and work out the negative gradient, suppose this is the direction in which the negative gradient points and we are in the interior of the domain. In that direction we would go on moving till we hit a boundary; suppose this board is that boundary. At this point we look at the gradient again. As I have indicated, you will notice that the negative gradient here has a component which is tangential to the board. Since the board is a boundary, we cannot go beyond it, but on the board we can move. So we work out that tangential direction and go on moving till we hit the top of the board, which is another boundary. Then we go on moving along that, because that way also we find a descent component, and we find that we hit yet another boundary. Note that in this three dimensional space we had to hit three boundaries in order to reach a vertex: first, going like this, we hit the board; then we took the component of the negative gradient along the board and started moving along it; we hit the top of the board, the second constraint; and then we took again a tangential step, started moving this way, and arrived at a vertex. At a vertex of the convex polytope three faces meet, and that closed our direction completely. In n-dimensional space, in the same way, n boundaries have to intersect at a vertex to finally stop our movement. Therefore, we find that for solving linear programming problems, rather than travelling all the way like this, we could have said at the beginning that we will operate only with vertices. Operating with vertices alone is a sufficient strategy. And if we work with vertices only, then, since we are introducing slack and surplus variables, consider what happens: till we hit a constraint, for example this inequality constraint, where the domain is on this side of the board, the corresponding slack or surplus variable s, with a x plus s equal to b, was non-zero. That variable started decreasing as we moved like this, and it became zero at this point. So we find that the slack and surplus variables naturally take the value zero at the boundaries, and on the feasible side they are positive.
Now, if we are going to introduce additional slack and surplus variables, which are zero at the boundary and non-zero in the interior, then it would be convenient if our original variables also behaved like that: positive in the interior of the domain, zero on the boundary, and negative only in the infeasible zone. That is why, to treat all variables on a par, it becomes easy in the book-keeping sense to keep all variables non-negative. In that way the original set of variables becomes a subset of the complete set of variables, we put the non-negativity condition on all of them, and then we can consider all the variables together. Now, in the simplex method, at every step we keep a set of basic variables, equal in number to the linear constraints that we have, and the remaining variables are taken as the non-basic variables. We consider only vertices to begin with, and at every vertex quite a few constraints are active; for those constraints which are active, the corresponding slack variables will be zero. So the non-basic variables will have zero values, and, with the basic columns reduced to an identity, the basic variables will have the same values as on the right-hand side. At every step we try to replace the current vertex in favour of another vertex which is better, that is, where the function value is better. At a vertex we consider all the edges along which we could move, because from one vertex to another we move along an edge. Out of the edges meeting at that vertex, we select one. The moment we select an edge, one of the constraint boundaries is left, which means one of the zero-valued variables is becoming non-zero: one non-basic, zero-valued variable will now get a non-zero value. Then we go along that edge and stop wherever another constraint boundary cuts it, that is, at another vertex. At the beginning of the edge we have the current vertex; at the end of that same edge we have another vertex, where another constraint becomes active. So the corresponding variable, whether an introduced slack variable or an original variable of the problem, becomes zero; that means a basic variable now becomes non-basic. So one non-basic variable becomes basic and one basic variable becomes non-basic; this is the idea. At every iteration, from the current vertex we select a non-basic variable to enter the basis; that means we select the constraint which is going to become inactive. If no edge qualifies, that is, if leaving none of the currently active constraint boundaries gives an advantage, then that means we have converged: the current vertex is optimal. On the other hand, if several edges qualify, along which there is an advantage, then we choose the edge, the direction, along which the advantage is maximum, that is, the fastest rate of descent. In that way we select one non-basic variable to enter the basis. At the same time, we see how far we can go along that edge, that is, where we hit the first boundary; correspondingly, one currently non-zero, basic variable will become zero, because we hit that boundary. So we select a basic variable to leave the basis and get included in the non-basic list. (A minimal code sketch of these pivoting steps is given below.)
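Here is a minimal sketch of these pivoting steps in Python. It assumes a starting basic feasible solution is supplied, ignores degeneracy and Phase I, and uses the simplest tie-breaking; it is an illustration of the bookkeeping described above, not the lecture's own code:

```python
import numpy as np

def simplex(c, A, b, basis):
    """Minimal tableau simplex for: min c @ x  s.t.  A x = b, x >= 0.
    `basis` indexes an initial feasible basis; non-degeneracy assumed."""
    m, n = A.shape
    basis = list(basis)
    T = np.hstack([A.astype(float), b.astype(float).reshape(-1, 1)])

    def pivot(i, j):
        # Elementary row operations: make column j the i-th identity column.
        T[i] /= T[i, j]
        for k in range(m):
            if k != i:
                T[k] -= T[k, j] * T[i]

    for i, j in enumerate(basis):        # reduce initial basic columns
        pivot(i, j)
    while True:
        red = c - c[basis] @ T[:, :n]    # reduced costs along each edge
        j = int(np.argmin(red))          # entering variable: fastest descent
        if red[j] >= -1e-10:
            break                        # no improving edge: vertex optimal
        col = T[:, j]
        if np.all(col <= 1e-10):
            raise ValueError("unbounded: open edge along a descent direction")
        ratios = np.where(col > 1e-10, T[:, n] / col, np.inf)
        i = int(np.argmin(ratios))       # ratio test: first boundary hit
        basis[i] = j                     # entering replaces leaving variable
        pivot(i, j)
    x = np.zeros(n)
    x[basis] = T[:, n]                   # non-basic variables stay at zero
    return x, c @ x

# Example: min -x1 - x2  s.t.  x1 + 2*x2 <= 4,  3*x1 + x2 <= 6,  x >= 0,
# in standard form with slacks x3, x4 forming the initial basis:
x, val = simplex(np.array([-1.0, -1.0, 0.0, 0.0]),
                 np.array([[1.0, 2.0, 1.0, 0.0],
                           [3.0, 1.0, 0.0, 1.0]]),
                 np.array([4.0, 6.0]), [2, 3])
print(x, val)                            # x1 = 1.6, x2 = 1.2, value -2.8
```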
So, based on the first constraint becoming active along that edge, we choose the leaving variable. If no constraint becomes active in that direction, that means in that direction the domain is open and we can go an infinite distance: no constraint lies ahead along a descent direction, and the function is unbounded. After these two selections, if both result in useful choices, then we basically conduct another round of elementary row operations to transfer one variable from one side to the other and vice versa, so that the basis remains square, and the corresponding elementary row operations reduce it to an identity. This goes on till one of those two terminations happens. So this is the typical way the simplex method operates.

Now, let us have a quick look at the general perspective of an LP problem. In this we will not insist on the non-negativity restriction, because at present we are basically looking at the theoretical aspects rather than trying to solve the problem through an algorithm. So, for example, suppose the LP problem is to minimize this function. Here I have partitioned the entire set of primal variables into two sets, x and y: the vector x contains those variables which are unrestricted in sign, and the vector y contains those variables which have an original non-negativity restriction, which is like a constraint. These are equality constraints and these are inequality constraints. Now, in the style of a general constrained optimization problem, if you try to work out the Lagrangian of this problem, it will include the original variables, Lagrange multipliers lambda corresponding to the equality constraints, Lagrange multipliers mu corresponding to the inequality constraints, and another set of Lagrange multipliers nu corresponding to the non-negativity conditions on y, which are also inequality constraints. If you want to write those in the standard form, you write them as minus y less than or equal to zero. Similarly, the inequality constraints will be written as a 2 1 x plus a 2 2 y minus b 2 less than or equal to zero, and the equality constraints as a 1 1 x plus a 1 2 y minus b 1 equal to zero. So the Lagrange multipliers lambda, mu, nu will enter into the Lagrangian function, which is f plus lambda transpose times the equality constraint functions plus mu transpose times the inequality constraint functions plus nu transpose times minus y. So you get this expression for the Lagrangian. The first order conditions for minimality you get as the derivative of this with respect to x equal to zero, which is this; and if you consider the derivative with respect to y, then you get a term from here, a term from here and a term from here, which gives this equality, that is, nu equals these terms. Apart from that, you will find that the nu's should be non-negative and the mu's are also non-negative. And if you substitute back, you get this as the optimal function value, from which it is easy to see that the sensitivity to the values b 1 and b 2 is given by the lambda's and mu's. If you now try to construct the dual out of it, you will find the dual to be this, which is the optimal function value in terms of lambda and mu. This is the dual, and what are the constraints? The constraints of the dual problem you get from here, that is this, and from here, that is this, together with mu greater than or equal to zero.
So these are the constraints of the dual, which are the optimality conditions of the primal problem. This shows you the symmetry between the primal and dual problems. Now, what is a quadratic programming problem? A quadratic objective function and linear constraints define what is called a quadratic programming problem. Why is that special? Because if you try to write the KKT conditions, which include the derivative of the objective function, and the objective function is quadratic, then its derivative will be a linear function, and the constraints are already linear. So the equations that you get out of the KKT conditions are all linear: when you write the first order necessary conditions, the KKT conditions, whatever equations they involve will all be linear equations, and therefore the Lagrange methods, which try to solve the KKT conditions directly, are the natural choice for a quadratic programming problem. A very simple example shows that with equality constraints only, a quadratic programming problem is very easy to solve. If you have this as the objective function and these as the constraints, only equality constraints, then the KKT conditions directly give you this, which is a system of linear equations, and in one step, without any iteration, you can solve it and get x star and lambda star; a small code sketch of this one-step solve is given at the end of the lecture. If the quadratic programming problem has a solution, then this immediately gives you that solution, and for that, of course, what you require is the positive definiteness of Q, and so on. This is if you have only equality constraints. If you have inequality constraints also, then the process becomes iterative, and you can consider an active set method, in which you keep track of the active constraints from iteration to iteration, or you can consider a slack variable strategy, which gives you a linear complementarity problem, and you can solve that problem, which has a methodology of its own. We will not get into the details of these methods, except to point out that the active set and slack variable strategies turn out to be quite competitive for a quadratic programming problem, which sits somewhere between a linear programming problem and a general non-linear optimization problem. For a linear programming problem we typically adhere to the slack variable strategy only; for generally highly non-linear problems we typically take the active set strategy; for a quadratic programming problem both are competitive. If you follow these slides or the textbook, you will find quite a few examples of quadratic programming problems, and through some of the exercises you can try to become at home with general non-linear problems. So quadratic programming problems open the gates and give you some of the seeds for the general methods of non-linear optimization which we have been talking about; some of these methods have their roots in the typical quadratic problem.
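As a closing illustration of that one-step solve, here is a minimal sketch in Python; the particular Q, c, A, b are made up for the example, with Q positive definite:

```python
# Equality-constrained QP: minimize 0.5*x'Qx + c'x  subject to  A x = b.
# The KKT conditions form one linear system:  [Q  A'] [x  ]   [-c]
#                                             [A  0 ] [lam] = [ b]
import numpy as np

Q = np.array([[2.0, 0.0],
              [0.0, 2.0]])          # quadratic term
c = np.array([-2.0, -4.0])          # linear term
A = np.array([[1.0, 1.0]])          # one equality constraint: x1 + x2 = 1
b = np.array([1.0])

n, m = Q.shape[0], A.shape[0]
K = np.block([[Q, A.T],
              [A, np.zeros((m, m))]])
rhs = np.concatenate([-c, b])
sol = np.linalg.solve(K, rhs)       # one linear solve, no iteration
x_star, lam_star = sol[:n], sol[n:]
print(x_star, lam_star)             # x* = (0, 1), lambda* = 2
```

Thank you.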