Let us talk a bit more about the dual function. What is the dual function, and what is the geometry of the dual function and of the Lagrangian? The Lagrangian makes its appearance in problems that are unconstrained, and we can convert problems that are actually constrained into unconstrained ones in the following way. Take the problem I mentioned before: minimize f(x) over the variables x, subject to g_i(x) ≤ 0 for all i from 1 to m and h_j(x) = 0 for all j from 1 to p. I defined the Lagrangian of this problem as

L(x, λ, θ) = f(x) + Σ_{i=1}^{m} λ_i g_i(x) + Σ_{j=1}^{p} θ_j h_j(x).

Now suppose I decided I want to express this constrained optimization problem as an unconstrained one. How can I do that? To do that, let me introduce a function, call it I_+. What does I_+(t) do? For t ≥ 0 it equals 0, and for t strictly less than 0 it equals −∞. How does I_+ look as a function of t? For t strictly less than 0 it sits at −∞, and from 0 onwards it takes the value 0. So that is my function I_+. Let me define another, related-looking function I_0: I_0(t) = 0 if t = 0, and −∞ otherwise. So how does this function look?
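To fix notation, here is a minimal sketch of the Lagrangian defined above. The toy instance (f(x) = x², a single constraint g_1(x) = 1 − x ≤ 0, no equality constraints) is my own illustration, not from the lecture:

```python
def lagrangian(f, gs, hs, x, lam, theta):
    """L(x, lam, theta) = f(x) + sum_i lam[i]*g_i(x) + sum_j theta[j]*h_j(x)."""
    return (f(x)
            + sum(l * g(x) for l, g in zip(lam, gs))
            + sum(t * h(x) for t, h in zip(theta, hs)))

# Toy instance: minimize f(x) = x^2 subject to g_1(x) = 1 - x <= 0 (i.e. x >= 1).
f = lambda x: x * x
gs = [lambda x: 1.0 - x]
hs = []            # no equality constraints in this toy problem

# At a feasible point (g_1(x) <= 0) and with lam >= 0, the constraint term is
# non-positive, so the Lagrangian lies at or below f there:
assert lagrangian(f, gs, hs, x=2.0, lam=[0.5], theta=[]) <= f(2.0)
```

For instance, with λ = [0.5] and x = 2 the value is 2² + 0.5·(1 − 2) = 3.5, which indeed lies below f(2) = 4.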
This function looks like this: at 0 it is 0, and everywhere else its value is −∞. Now these are obviously very ill-behaved functions: they take the value −∞, so there is a huge discontinuity and a non-differentiability here. But in terms of these functions, if I include infinite values in my calculations, I can express this constrained problem as an unconstrained problem. How do I do that? Let us call this problem (P). Before proceeding, let me change the definitions slightly, because that will be convenient for us: instead of −∞, let me use +∞, and let me also flip the ranges. So define I_+(t) as follows: whenever t ≤ 0 it equals 0, and when t becomes positive it shoots to +∞. And define I_0(t) = 0 when t = 0, and whenever t ≠ 0 it again shoots to +∞. What do these look like? I_+(t) is 0 for t ≤ 0, and for t > 0 it shoots to +∞; I_0(t) has no problem at 0, where it equals 0, and whenever t ≠ 0 it shoots to +∞.

Now, using these two functions I_+ and I_0, I can express my optimization problem (P) in the following way. I can write it as: minimize over x

f(x) + Σ_{i=1}^{m} I_+(g_i(x)) + Σ_{j=1}^{p} I_0(h_j(x)).

What does this do? Look at the definition of I_+: whenever g_i(x) ≤ 0, the term I_+(g_i(x)) equals 0; in particular, when all the g_i(x) are ≤ 0, the entire first summation equals 0. Similarly, whenever h_j(x) = 0, the term I_0(h_j(x)) equals 0. In short, if I take any x that is feasible for (P), then each of these terms, the I_+ terms as well as the I_0 terms, ends up being exactly 0. So on the feasible region, this new non-differentiable, infinity-valued function that I have defined is actually nothing but f(x). And outside the feasible region of (P), what is this function? If you are outside the feasible region, then at least one of the I_+ or I_0 terms is going to be non-zero, and when it is non-zero, what value does it take? Remember, these functions take the value +∞. So once you are outside the feasible region, this expression takes the value +∞. So what does this mean? It means that this function, let me write
it like this: it equals f(x) for all x feasible for (P), and +∞ otherwise. Which means that if you minimize this function, even though you are minimizing it in an unconstrained way over all of R^n, what you are effectively doing is minimizing f over the feasible region of (P), which is nothing but solving (P) itself. That is actually an incredible simplification, because you do not really need to care about the geometry of the constraints any more. But it is also very deceptive, because all of that geometry has been absorbed into these complicated I_0 and I_+ functions, and to analyze them you would need to understand the geometry of the g_i's and h_j's in the first place.

Now, what has this got to do with the Lagrangian? If you look at the Lagrangian and at the function we are minimizing here, there is clearly a close resemblance: each is f plus a summation involving the inequality constraints plus a summation involving the equality constraints. So what is the connection between the two? You can think of it this way: in the Lagrangian, in place of I_+ and I_0, I have put in some new functions that are actually linear. In place of I_+(g_i(x)) I have put λ_i g_i(x), and likewise in place of I_0(h_j(x)) I have put θ_j h_j(x).

Now what is that actually doing? Come back to the figure. Remember that I_+ has this form: for t ≤ 0 it takes the value 0, and for t > 0 it shoots up to +∞. If I want a linear lower approximation to I_+(t), what sort of function can I choose? It must have a non-negative slope: if I took a line with negative slope, then for some negative value of t it would go above 0, above the graph of I_+, and it would not be a lower approximation anymore. Moreover, among lines with non-negative slope, there is no use having a negative intercept: I get a better lower approximation by making the line pass through the origin; in short, the intercept should be 0. So a linear lower approximation to I_+ takes the form λ_i t, where λ_i ≥ 0.

Likewise, look now at I_0: it takes the value +∞ everywhere except at 0, where it is 0. If you want a linear lower approximation to this, you can take a line with any slope, positive or negative; so long as its intercept on the y-axis is at or below 0, there is no problem, because everywhere else the value is +∞ and you will be fine. But again, if you want a tighter approximation, why even bother with an intercept? You might as well take a line that passes exactly through the origin, which means the function should be of the form θ_j t, where θ_j is any real number.

Now these are both linear lower approximations, which means that pointwise, for every t, they take a value less than or equal to the corresponding I_+ or I_0. So whatever is inside this minimization can always be lower-bounded by replacing I_+ and I_0 with their respective linear approximations, and therefore the optimal value of (P) is always greater than or equal to

inf over x ∈ R^n of [ f(x) + Σ_{i=1}^{m} λ_i g_i(x) + Σ_{j=1}^{p} θ_j h_j(x) ],

and what is this? You are effectively taking the infimum of the Lagrangian, and this is equal to your dual function. The "greater than or equal to" here required λ ≥ 0, while any θ is allowed. So for all λ ≥ 0 and for any choice of θ, the optimal value of (P) is greater than or equal to this quantity, which is nothing but the dual function. So what is the dual function effectively doing? It is taking a linear approximation to these I_0 and I_+ functions, and the dual function is the optimal value of that linear approximation. In other words, what the Lagrangian is doing is taking a linear approximation to I_+ and I_0, and the infimum of the Lagrangian is the minimum value of this linear approximation; that is what we are calling the dual function. And why is it a function? It is a function of the slopes that you choose for making the linear approximation.
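This can be checked concretely on a toy instance of my own (not the lecturer's): with f(x) = x² and the single constraint g(x) = 1 − x ≤ 0, the Lagrangian is L(x, λ) = x² + λ(1 − x); minimizing over x (the minimizer is x = λ/2) gives the dual function g(λ) = λ − λ²/4 in closed form, and weak duality can be verified numerically:

```python
def dual(lam):
    # g(lam) = inf over x of [x^2 + lam*(1 - x)]; setting the derivative
    # 2x - lam = 0 gives x = lam/2, hence g(lam) = lam - lam**2/4.
    return lam - lam ** 2 / 4.0

p_star = 1.0  # primal optimum of: minimize x^2 subject to x >= 1 (attained at x = 1)

# Weak duality: every lam >= 0 yields a lower bound on p_star.
for lam in [0.0, 0.5, 1.0, 2.0, 5.0]:
    assert dual(lam) <= p_star
```

Note that the bound can be loose (dual(5.0) = −1.25) or tight (dual(2.0) = 1.0 = p*), depending on the slope chosen.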
So, as a function of the slopes: remember, we said there was no need for an intercept, which is why we took the intercept to be 0 and got linear functions passing through the origin, but we said nothing about the slope. The slope is still up for grabs, still to be decided. So, as a function of the slope, the gap between the actual problem and its linear under-approximation can still be fine-tuned. In any case, the linear under-approximation is captured by the dual function, which gives it to you as a function of the slopes. So maximizing the dual function, which is your dual problem, is basically asking: what is the best linear under-approximation? In this class of approximations, what is the best you can do? So the sequence is this: you write your actual problem in the indicator form, you create a family of linear approximations using this logic, you look at the infimum of each of those linear approximations, and then you ask what is the best you can do amongst the entire family, the largest value, the tightest lower bound that you can get using a linear approximation. That is what the dual problem is doing. Now, if you think about it this way, it is actually nothing short of a miracle that in the case of linear programming the primal and dual actually end up being equal.
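The search over slopes described above can be sketched for the same toy instance (again my own example): maximizing the closed-form dual g(λ) = λ − λ²/4 over λ ≥ 0 by a crude grid search recovers the primal optimum p* = 1, so for this instance the tightest lower bound closes the gap entirely:

```python
# Dual function of: minimize x^2 subject to x >= 1, obtained by minimizing the
# Lagrangian x^2 + lam*(1 - x) over x in closed form.
dual = lambda lam: lam - lam ** 2 / 4.0

# Dual problem: maximize dual(lam) over lam >= 0 (crude grid search over slopes).
best_lam = max((i / 100.0 for i in range(401)), key=dual)
best_val = dual(best_lam)

assert abs(best_val - 1.0) < 1e-9   # tightest lower bound equals p* = 1
assert abs(best_lam - 2.0) < 1e-9   # achieved at slope lam = 2
```

A grid search is used only to keep the sketch dependency-free; any one-dimensional maximizer would do, since g(λ) is concave.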
That is, the optimal values of the primal and the dual end up being equal. What I mean by that is: see how grossly inaccurate this entire linear approximation is. What you wanted to approximate was a function that is 0 here and shoots to +∞ after that; likewise, you wanted to approximate a function that is +∞ everywhere and 0 at a single point; and you are approximating them by extremely benign linear functions. Then you say: amongst this class of approximations, which one gives me the best possible value? That is what you are solving by solving the dual problem. And it is actually really incredible that you can in fact get back the same value as the primal; it means there is no gap between the problem involving the I_+ and I_0 functions and the value found by looking for the best linear approximation. In the case of linear programming, that is exactly what we get: the linear programming duality theorem taught us that whenever there is a solution to the primal, there is a solution to the dual, and their values are equal, and that is what we are finding here. So this sets the stage now for convex optimization duality, and we will do that in the next lecture.
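To see the LP statement concretely, here is a deliberately tiny example of my own (not from the lecture). For the LP "minimize x subject to x ≥ 3", the Lagrangian is L(x, λ) = x + λ(3 − x) = 3λ + (1 − λ)x, whose infimum over x ∈ R is −∞ unless λ = 1; the only finite dual value is 3, which equals the primal optimum, so the gap is exactly zero:

```python
def lp_dual(lam):
    # L(x, lam) = x + lam*(3 - x) = 3*lam + (1 - lam)*x.
    # The infimum over x in R is -inf unless the coefficient of x
    # vanishes (lam = 1), in which case it is 3*lam.
    return 3.0 * lam if lam == 1.0 else float("-inf")

primal_opt = 3.0            # minimize x subject to x >= 3, attained at x = 3
dual_opt = lp_dual(1.0)     # the single finite value of the dual function

assert dual_opt == primal_opt   # strong LP duality: no gap
```

This also illustrates a typical feature of LP duals: the dual function is −∞ for most slopes, and the dual problem restricts attention to the slopes where it is finite.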