Welcome everyone. In the previous lecture we were talking about penalty methods. Penalty methods are a way of converting a constrained optimization problem into an unconstrained one by penalizing constraint violation. We introduced an additional function called the penalty function, which was required to be continuous, non-negative, zero everywhere on the feasible region of the problem, and strictly positive outside the feasible region. Using this penalty function, we recast a constrained optimization problem as an unconstrained one by dropping the constraints and adding a penalty term to the objective: the new objective is the original objective plus a penalty parameter times the penalty function. One of the things we found was that if you compute the sequence of unconstrained minimizers while letting the penalty parameter go to infinity, then any limit point of this sequence is a global minimum of the constrained optimization problem. This is an extremely powerful technique, because it gives us a way of reducing essentially any constrained optimization problem to an unconstrained one. However, there are a few drawbacks, which I will now discuss. One thing we find with this sort of method is that the iterates approach feasibility rather slowly: in order to become nearly feasible, the penalty parameter needs to be extremely large. Let us see this in one particular penalty method, which uses what is called a norm penalty, or quadratic penalty.
So, consider this optimization problem: minimize f(x) subject to the constraint h(x) = 0. Since this is an equality constraint, one possible choice of penalty function is simply to penalize h by taking p(x) = ||h(x)||^2. We then consider the objective f(x) + c ||h(x)||^2, and the idea is to let c increase to infinity, minimizing this penalized problem at each iteration. Now, where is the catch? The catch is captured by the following theorem. Take c = c_k with c_k increasing to infinity, and let x_k be obtained by minimizing f(x) + c_k ||h(x)||^2. Let x* be a limit point of the sequence x_k. Then, if x* is infeasible, it is a stationary point of ||h(x)||^2. So if the limit point is infeasible, you end up minimizing not f, nor f plus the penalty, but effectively just the penalty term: you land on a stationary point of ||h(x)||^2. I should mention here that h could even be a vector of constraints, h_j(x) = 0 for j = 1, ..., p; in that case the penalty term for the jth constraint is |h_j(x)|^2, and the full penalty is the summation of these terms from j = 1 to p, which is exactly ||h(x)||^2 for the constraint vector h.
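To make the quadratic-penalty scheme concrete, here is a minimal sketch in Python. The toy problem is my own illustration, not from the lecture: minimize x1^2 + x2^2 subject to x1 + x2 - 1 = 0, whose exact solution is (0.5, 0.5).

```python
import numpy as np
from scipy.optimize import minimize

# Assumed toy problem (not from the lecture):
#   minimize f(x) = x1^2 + x2^2   subject to   h(x) = x1 + x2 - 1 = 0
# Exact solution: x* = (0.5, 0.5).
f = lambda x: x[0]**2 + x[1]**2
h = lambda x: x[0] + x[1] - 1.0

x = np.zeros(2)
for c in [1.0, 10.0, 100.0, 1000.0]:
    # Minimize the penalized objective f(x) + c * h(x)^2 for this c.
    res = minimize(lambda x: f(x) + c * h(x)**2, x)
    x = res.x  # warm-start the next, more heavily penalized, subproblem
    print(f"c={c:>7}: x={x}, h(x)={h(x):.6f}")
```

Warm-starting each subproblem from the previous minimizer is the usual practice, since the subproblems become increasingly ill-conditioned as c grows. Notice in the output that h(x) shrinks toward 0 but never reaches it at any finite c, which is exactly the slow approach to feasibility discussed above.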
So, I will just adjust the statement to allow for a vector of constraints. On the other hand, if x* is feasible and the constraint gradients at x* are linearly independent, then x* is a KKT point of the constrained optimization problem. Moreover, for such points we have the following: for any subsequence indexed by K such that x_k tends to x* as k runs over K, the limit of c_k h_j(x_k) equals theta_j, where theta_j is the Lagrange multiplier corresponding to the constraint h_j(x) = 0 in the KKT conditions at x*. What does this mean? Effectively, it says that if you take the sequence x_k obtained by solving the penalized problems and look at any limit point x*, there are two possibilities. One is that the limit point is infeasible for the original problem, in which case it turns out to be a stationary point of the penalty function, and that may have no relation whatsoever to the solution of the original problem. On the other hand, if the limit point is feasible and the constraint gradients are linearly independent (for instance, if LICQ holds), then you end up at a KKT point of the constrained optimization problem; moreover, with this penalty function you have the additional property that c_k h_j(x_k) approaches the Lagrange multiplier. This is a particular property of the quadratic penalty we are considering.
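The infeasible branch of the theorem can be seen numerically. Here is a sketch on an assumed toy problem of my own with an empty feasible set, so every limit point is necessarily infeasible:

```python
from scipy.optimize import minimize_scalar

# Assumed toy problem with an EMPTY feasible set (my illustration):
#   minimize f(x) = x   subject to   h(x) = x^2 + 1 = 0   (no real solution).
f = lambda x: x
h = lambda x: x**2 + 1.0

for c in [1.0, 100.0, 10000.0]:
    # The penalized objective x + c*(x^2+1)^2 is strictly convex,
    # so minimize_scalar finds its unique minimizer.
    res = minimize_scalar(lambda x: f(x) + c * h(x)**2)
    print(f"c={c:>8}: x_c={res.x:.6f}")

# The minimizers approach x = 0, which is the stationary point of h(x)^2:
# d/dx (x^2+1)^2 = 4x(x^2+1) = 0 exactly at x = 0. This point has nothing
# to do with minimizing f; we have simply ended up minimizing the penalty.
```

Setting the derivative 1 + 4c x (x^2 + 1) to zero shows the minimizer behaves like -1/(4c), so it converges to the stationary point x = 0 of the penalty as c grows, just as the theorem predicts for an infeasible limit point.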
So, this limiting value is the optimal Lagrange multiplier for the KKT conditions at x*, and this holds for each of the p equality constraints, for every j from 1 to p. Now, what does this mean? It means that for k large, c_k h_j(x_k) is not going to 0 but is instead approaching a constant, namely the Lagrange multiplier. In other words, h_j(x_k) itself behaves like a constant divided by c_k. This is somewhat undesirable, because it means h_j is never actually going to become exactly 0, which is what you would need: h_j has to be exactly 0 for feasibility. So you are never going to have h_j exactly 0 unless c_k itself becomes infinite, which means you really need c_k to blow up to infinity. If you terminate the algorithm at any finite iteration, then although in the limit you would reach the solution, you may not actually be feasible; in fact, in general you will not be feasible, and feasibility will be off by a certain amount. Now, there are two ways of remedying this; I will talk about one particular approach first, and then we will go to another. One approach is to change the penalty function itself. The reason this is happening is that our current penalty function is quite smooth: the quadratic penalty increases only gradually.
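On the assumed toy problem min x1^2 + x2^2 subject to x1 + x2 - 1 = 0 (my illustration, with KKT multiplier theta = -1), the penalized minimizer has a closed form, so both claims can be checked directly. One caveat on conventions: with the penalty written as c*||h||^2 (no factor 1/2), stationarity reads grad f + 2c h grad h = 0, so it is 2 c_k h_j(x_k) that converges to theta_j; under the common (c/2)||h||^2 convention the limit is c_k h_j(x_k) itself, as in the theorem.

```python
# Assumed toy problem: min x1^2 + x2^2  s.t.  h(x) = x1 + x2 - 1 = 0.
# Setting the gradient of f(x) + c*h(x)^2 to zero and using symmetry gives
# the closed-form minimizer x1 = x2 = c/(1+2c), hence h = -1/(1+2c).
# The KKT multiplier of the constrained problem is theta = -1.
for c in [10.0, 100.0, 1000.0, 10000.0]:
    hval = -1.0 / (1.0 + 2.0 * c)
    # h shrinks only like 1/c, while the scaled violation tends to theta.
    print(f"c={c:>8}: h={hval:.6f}, 2*c*h={2 * c * hval:.6f}")
```

The output shows exactly the undesirable behavior described above: the constraint violation decays like a constant over c_k, so it never vanishes at any finite penalty.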
If you look at how the quadratic penalty behaves as a function of t, with p(t) = t^2, it increases only gradually towards infinity. It is not doing a sharp penalization near the feasible region, and as a result the method tends to be a little ambivalent about points close to the feasible region, which may be feasible or infeasible; the penalty there is simply not sharp enough. One way of dealing with this is therefore to put in that sort of sharp penalty, and that is what I will talk about now. A sharp penalty effectively means a penalty that rises dramatically as you move from inside the feasible region to outside it. This usually means losing differentiability of the penalty function: you can retain continuity, but the function will no longer be differentiable; you cannot expect the kind of smooth increase that the quadratic penalty has. So you are looking for a penalty function that is not smooth, which is called a nonsmooth penalty function. One simple example of a nonsmooth penalty is the modulus, or absolute value, function: take the penalty on h_j to be simply |h_j(x)| instead of the square of this quantity. Unlike the quadratic penalty from before, this absolute-value penalty has a sharp V shape at the feasible region.
Now, being linear, the absolute value eventually does not penalize as severely as the quadratic, which penalizes much more dramatically for large values. But for small values of t, the penalty from the absolute value, also called the L1 penalty, is larger: for |t| < 1, |t| is actually greater than t^2. This is the property we want to exploit. So the problem we then solve is the following: define q(x; c_k) = f(x) + c_k times the summation of |h_j(x)| for j ranging from 1 to p, and at each step minimize this q to obtain x_k. The theorem we can get from this is the following. Suppose x* is a strict local minimum of the constrained optimization problem at which the KKT conditions are satisfied with Lagrange multipliers theta_j, j = 1, ..., p. Then x* is a local minimizer of q(x; c) for all c > c*, where c* is the maximum of the absolute values of these Lagrange multipliers. So x* turns out to be a local minimizer of the penalized problem for all c greater than c*.
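Here is a small numerical check of this exact-penalty behavior, again on my assumed toy problem min x1^2 + x2^2 subject to x1 + x2 - 1 = 0, whose solution is (0.5, 0.5) with multiplier theta = -1, so the theorem's threshold is c* = |theta| = 1:

```python
import numpy as np
from scipy.optimize import minimize

# Assumed toy problem (my illustration, not from the lecture):
#   minimize f(x) = x1^2 + x2^2   s.t.   h(x) = x1 + x2 - 1 = 0,
# with solution x* = (0.5, 0.5), multiplier theta = -1, threshold c* = 1.
f = lambda x: x[0]**2 + x[1]**2
h = lambda x: x[0] + x[1] - 1.0

def l1_penalized_min(c):
    # Nelder-Mead is derivative-free, so it tolerates the kink in |h|.
    q = lambda x: f(x) + c * abs(h(x))
    return minimize(q, np.zeros(2), method="Nelder-Mead",
                    options={"xatol": 1e-10, "fatol": 1e-10}).x

x_small = l1_penalized_min(0.5)  # c < c*: minimizer remains infeasible
x_large = l1_penalized_min(2.0)  # c > c*: minimizer is exactly x*
print("c=0.5:", x_small, "h =", h(x_small))
print("c=2.0:", x_large, "h =", h(x_large))
```

For c = 0.5 < c*, the unconstrained minimizer sits at (0.25, 0.25) with h = -0.5, strictly infeasible; for c = 2 > c*, the minimizer is the exact constrained solution (0.5, 0.5) at a finite penalty, which the quadratic penalty could never achieve.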
What does this mean? It means that if you can set your penalty parameter to be larger than c*, where c* is simply the largest of the Lagrange multipliers in absolute value, then the solution of the constrained problem is also a minimizer of the penalized problem. In other words, it suffices to set the penalty parameter larger than c*, and that solves the problem. This is actually very powerful, because it lets us reach the solution of the constrained problem through an unconstrained problem with a finite penalty parameter value, without having to deal with infinities anywhere. The only catch is that the actual minimization of this penalized function, the one with the absolute values of the h_j in the objective, becomes a little problematic, because you now have a nondifferentiable objective: f plus the sum of |h_j(x)| is not necessarily a differentiable function. This is the catch, but it is the price you have to pay for getting a strong result of this kind. So this is one of the ways in which you can use a finite value of the penalty parameter and still use the penalty method to reach the solution of a constrained optimization problem.