So, where does weak duality come from? Let us call this problem P, and let x* be an optimal solution of P. Now look at L(x*, λ, θ). This is equal to f(x*) plus a sum over i from 1 to m of λ_i g_i(x*) for the inequality constraints, plus a sum over j from 1 to p of θ_j h_j(x*) for the equality constraints. What can one observe here? Look at the equality-constraint terms first. x* is an optimal solution of P, so at the very least it has to be feasible, which means h_j(x*) = 0 for every j; those terms are all 0. What about g_i(x*)? Again by feasibility of x*, each g_i(x*) is less than or equal to 0. Now, I have not yet said anything about λ_i itself; I have not told you that λ_i should be greater than or equal to 0 or anything like that. But we did put λ_i ≥ 0 in the optimization when we were defining the dual problem, so let us go ahead with that: suppose we restrict λ_i ≥ 0 for all i = 1, ..., m. What do we get then? Each term λ_i g_i(x*) is a nonnegative multiplier times a nonpositive quantity, so the whole sum is at most 0, and therefore L(x*, λ, θ) ≤ f(x*): the equality terms have disappeared, and the inequality terms can only pull the value down. So, let us now use this inequality.
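As a compact restatement of the step just described (a sketch in the lecture's notation, with g_i the inequality constraints and h_j the equality constraints):

```latex
L(x^*, \lambda, \theta)
  = f(x^*) + \sum_{i=1}^{m} \lambda_i\, g_i(x^*) + \sum_{j=1}^{p} \theta_j\, h_j(x^*)
  = f(x^*) + \sum_{i=1}^{m} \underbrace{\lambda_i\, g_i(x^*)}_{\le\, 0}
  \;\le\; f(x^*),
```

where the last two steps use h_j(x*) = 0, g_i(x*) ≤ 0, and λ_i ≥ 0.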
Now, this is the value of my Lagrangian at x*. Remember that D(λ, θ) is the infimum of the Lagrangian over all x. If it is the infimum of the Lagrangian, it follows that D(λ, θ) ≤ L(x*, λ, θ), and this holds for all λ and for all θ; I do not even need to restrict the sign here. Now, what does this mean? The quantity L(x*, λ, θ) was itself less than or equal to f(x*), and in getting to that inequality we needed λ ≥ 0 and θ in R^p. So, whenever λ ≥ 0, and for any vector θ, we have D(λ, θ) ≤ f(x*). What does this mean? Look back again at the dual optimization: it is a maximization of D(λ, θ) over precisely the λ and θ mentioned here. It means, then, that the maximum of D(λ, θ) over λ ≥ 0 and arbitrary θ also has value at most f(x*). And what is this f(x*)? Remember, x* was an optimal solution of P, so f(x*) is in fact the optimal value of P. What is the quantity on the left-hand side? Let us call this maximization problem D; the left-hand side is simply the optimal value of D. What have we got, then? We have got that the optimal value of the dual is less than or equal to the optimal value of the primal.
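The whole argument compresses into one chain of inequalities (writing p* for the optimal value of P and d* for the optimal value of the dual; these two symbols are just shorthand introduced here):

```latex
D(\lambda, \theta) = \inf_{x} L(x, \lambda, \theta)
  \;\le\; L(x^*, \lambda, \theta)
  \;\le\; f(x^*) = p^*
  \quad \text{for all } \lambda \ge 0,\ \theta \in \mathbb{R}^p,
\qquad \text{hence} \qquad
d^* = \sup_{\lambda \ge 0,\ \theta} D(\lambda, \theta) \;\le\; p^*.
```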
So, this statement is nothing but the statement of weak duality. To summarize: you can take any optimization problem like this and write its Lagrangian; the Lagrangian is formed by taking the objective plus a linear combination of the constraints. For the constraints that are inequality constraints, you restrict the Lagrange multipliers to be greater than or equal to 0; for the constraints that are equality constraints, you do not need any such restriction. Then you look at the least possible value of the Lagrangian over the entire space and define that as D(λ, θ), and then you maximize D(λ, θ) over the Lagrange multipliers the way you have restricted them: λ ≥ 0 and θ can be anything. This maximization is what we call the dual problem, and the original is the primal problem. What we get is weak duality: the optimal value of the dual problem is always less than or equal to the optimal value of the primal. Now I will show you that what you earlier calculated as the dual of a linear program in fact appears as a special case of this; that is not very hard to see, so let us just go through it. Consider a linear program in standard form: minimize c^T x subject to Ax = b and x ≥ 0. I am going to create the Lagrangian L(x, λ, θ) of this. Let us be careful here: λ corresponds to the inequality constraints, so λ is going to multiply my x, and θ corresponds to the equality constraints, so θ is going to multiply Ax = b. And since we want it in the form we had for the optimization problem on the previous page, I will write the equality constraint as Ax − b = 0.
So, my Lagrangian is therefore L(x, λ, θ) = c^T x − λ^T x + θ^T(Ax − b). One point of care: the problem on the previous page was written with less-than-or-equal-type constraints, whereas here x ≥ 0 is a greater-than-or-equal-type constraint. I can effectively multiply both sides by −1, which flips the direction of the inequality and gives −x ≤ 0; equivalently, I can compensate for this in the definition of the Lagrangian itself, which is why λ^T x enters with a minus sign. Now, let me write the dual function D(λ, θ), which is the infimum of the Lagrangian over the entire space. If you look at the Lagrangian as a function of x, for each fixed λ and θ it is clearly a linear function of x, and we are taking the infimum of this linear function over the entire space. If you minimize a linear function without any constraints, the optimal value is going to be −∞, except in the case when the coefficients of the linear function are actually 0; if the coefficients are 0, the linear function evaluates to just a constant. So, to evaluate this more clearly, let us gather together the coefficients of x and write L(x, λ, θ) = (c + A^T θ − λ)^T x − θ^T b.
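Collecting the coefficients of x as just described, the Lagrangian of the standard-form LP reads:

```latex
L(x, \lambda, \theta)
  = c^{\top}x - \lambda^{\top}x + \theta^{\top}(Ax - b)
  = \bigl(c + A^{\top}\theta - \lambda\bigr)^{\top} x \;-\; \theta^{\top} b.
```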
So, if I take the infimum of the Lagrangian, that tells me that D(λ, θ) is equal to one of two things: it is equal to −θ^T b if c + A^T θ − λ is exactly 0, and otherwise it is −∞, because whenever that coefficient vector is not 0 you can choose a suitable x to drive the value down to −∞. So it is equal to the real value −θ^T b so long as θ and λ satisfy this equation, and otherwise it is −∞. Now, if I look to maximize D(λ, θ), remembering that I need to do this over λ ≥ 0 and over all θ, where will the maximum be attained? Well, my maximum cannot be −∞, obviously, since it is a maximum; so it is going to be attained over the region of pairs (λ, θ) that satisfy the equation. Effectively, then, the dual is: maximize −θ^T b subject to c + A^T θ − λ = 0, λ ≥ 0, and any θ. Now, if you play around with this a little bit, what do you realize? My λ does not appear in the objective at all; since λ ≥ 0 and it enters the constraint as −λ, λ is simply appearing here as a slack variable. So, effectively, this constraint can be rewritten, and the problem becomes: maximize −θ^T b subject to c + A^T θ ≥ 0, with θ unrestricted.
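The slack-variable observation can be written as an equivalence between the two formulations (λ does not appear in the objective, so eliminating it changes nothing):

```latex
\max_{\lambda \ge 0,\ \theta}\; -\theta^{\top} b
\ \ \text{s.t.}\ \ c + A^{\top}\theta - \lambda = 0
\qquad\Longleftrightarrow\qquad
\max_{\theta}\; -\theta^{\top} b
\ \ \text{s.t.}\ \ c + A^{\top}\theta \ge 0,
```

since setting λ = c + A^T θ yields a valid multiplier exactly when c + A^T θ ≥ 0.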
So, my λ plays no role; it can be removed by observing that the two conditions together are effectively saying c + A^T θ = λ and λ ≥ 0, which is the same as c + A^T θ ≥ 0. I can simplify even further and write this more neatly: maximize −θ^T b subject to c ≥ −A^T θ, maximizing over θ. Now notice something we can do. Since θ has no sign constraints, I can do a change of variables: let y = −θ. Remember, I am not multiplying anything by a negative sign; I am just changing notation, expressing −θ as y and substituting. What I then have is the familiar form of the dual: maximize b^T y subject to A^T y ≤ c. This is nothing but the familiar dual problem of this particular linear program. So, this was our primal and this is what we had learned as the dual: if you work with the Lagrangian and follow the routine that I mentioned on the previous slide, you get back the dual.
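As a quick numerical sanity check (the data A, b, c below is a made-up example, not from the lecture), one can verify on a tiny standard-form LP that every dual-feasible point gives a lower bound on the primal optimal value, exactly as weak duality predicts:

```python
# Weak duality check on a tiny standard-form LP (hypothetical example data):
#   primal:  minimize c^T x  subject to  A x = b, x >= 0
#   dual:    maximize b^T y  subject to  A^T y <= c   (as derived above)

c = [1.0, 2.0]
A = [[1.0, 1.0]]   # one equality constraint: x1 + x2 = 1
b = [1.0]

# x* = (1, 0) is feasible (x* >= 0, A x* = b) and in fact optimal here.
x_star = [1.0, 0.0]
primal_value = sum(ci * xi for ci, xi in zip(c, x_star))  # c^T x* = 1.0

def dual_feasible(y):
    """Check A^T y <= c componentwise."""
    return all(
        sum(A[i][j] * y[i] for i in range(len(y))) <= c[j] + 1e-12
        for j in range(len(c))
    )

# Every dual-feasible y must satisfy b^T y <= c^T x*  (weak duality).
for y0 in (-2.0, 0.0, 0.5, 1.0):
    y = [y0]
    if dual_feasible(y):
        dual_value = sum(bi * yi for bi, yi in zip(b, y))
        assert dual_value <= primal_value + 1e-12
        print(f"y = {y0:5.2f}: b^T y = {dual_value:5.2f} <= {primal_value}")
```

At y = 1 the dual value actually meets the primal value; for linear programs that is the strong duality we had seen earlier, while the lecture's argument only guarantees the inequality.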
So, what this entire procedure does — you define the Lagrangian, you then define the dual function by taking the infimum of the Lagrangian over the whole space, and then you maximize the dual function subject to the constraints on the Lagrange multipliers — is this: if you do all of that for a linear program, it gives you back the dual that we had defined earlier. So, this way of defining the dual is basically a generalization of the duality formulation of linear programming to problems that are potentially non-linear, and this is going to be our vehicle for talking about duality for convex optimization. I should also tell you that there is another connection here; let me mention that. The word "dual" is one that has been used by multiple people in different senses. There is another dual which is called the conjugate dual, or the Fenchel dual, thanks to Werner Fenchel. So, what is this dual? The conjugate dual of a function f is denoted f*, and f*(y) is defined as the supremum over all x of y^T x − f(x). What you do is subtract f from a linear function whose slope is the parameter you control; the slope here is y. You subtract f from this linear function and look at the maximum value you can get with that slope — the maximum departure of the linear function from f — and you study that as a function of the slope. That quantity is what is called the conjugate function. And just as your D was always concave, this f* is always convex.
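A small numerical illustration of the conjugate may help (the choice f(x) = 0.5 x^2 is a hypothetical example; its conjugate happens to be known in closed form, f*(y) = 0.5 y^2, with the supremum attained at x = y):

```python
# Approximate the Fenchel conjugate f*(y) = sup_x (y*x - f(x)) by grid search,
# using the example f(x) = 0.5 * x**2, whose exact conjugate is 0.5 * y**2.

def f(x):
    return 0.5 * x * x

def conjugate(f, y, lo=-10.0, hi=10.0, steps=20001):
    """Approximate f*(y) = sup_x (y*x - f(x)) over a grid on [lo, hi]."""
    best = float("-inf")
    for k in range(steps):
        x = lo + (hi - lo) * k / (steps - 1)
        best = max(best, y * x - f(x))
    return best

for y in (-2.0, 0.0, 1.0, 3.0):
    approx = conjugate(f, y)
    exact = 0.5 * y * y          # closed-form conjugate of 0.5 x^2
    assert abs(approx - exact) < 1e-4
    print(f"f*({y:4.1f}) ~ {approx:.4f}   (exact {exact:.4f})")
```

The convexity of f* claimed above holds whether or not f itself is convex, since f* is a pointwise supremum of functions that are affine in y.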
So, now what is the connection between D and f*? Well, there is a connection in the following sense: the quantity you are taking the supremum of has a resemblance to the Lagrangian, so f* is closely related to a certain type of optimization problem. That problem is basically the one where you are minimizing f subject to, say, x ≥ 0; in that case, the kind of quantity you would encounter in the dual function ends up becoming exactly something like this. But this is just for you to know: it is another notion of the dual, and you should not confuse it with the Lagrange dual. What we will be working with is the Lagrange dual.
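To make the resemblance concrete (a sketch for the specific problem just mentioned, minimizing f(x) subject to x ≥ 0, i.e. −x ≤ 0 with multiplier λ ≥ 0), the Lagrange dual function there is exactly a negated conjugate:

```latex
D(\lambda) = \inf_{x} \bigl( f(x) - \lambda^{\top} x \bigr)
  = -\sup_{x} \bigl( \lambda^{\top} x - f(x) \bigr)
  = -f^{*}(\lambda).
```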