Welcome, everyone. We were talking about linear under-approximations to the Lagrangian function; you can go back to the previous lecture to see what that is. What I will talk about today is duality in the convex optimization world, and the cornerstone of that is going to be a constraint qualification that we have mentioned before, namely Slater's constraint qualification. To begin, let us state our optimization problem. We are going to minimize a function f(x) subject to g_i(x) <= 0 for all i from 1 to m and h_j(x) = 0 for all j from 1 to p. I have not yet brought in convexity, because I first want to discuss a little of the geometry involved in this problem. Our approach is that we will not look at this problem only in the space of the x's. The x's are what we call primal variables: these are the variables in which your optimization problem has been defined. The dual variables are the variables that we introduce corresponding to the constraints of the optimization problem. Instead of studying the geometry of the problem only in the primal variables, we will look at the problem jointly in the primal and dual space.
Now, when you look at the problem jointly in the primal and dual space, a new type of geometry emerges, and that geometry is at the heart of duality. One of the main lessons you should take away from this analysis is that the correct space in which to view an optimization problem is the joint primal-dual space: neither the primal alone nor the dual alone. You will appreciate this much more deeply once we look at algorithms that involve both primal and dual variables. So let us go into some specifics. Define a set, let us call it G, written in terms of three variables u, v and t. G is the set of all (u, v, t) such that there exists an x for which f(x) <= t, g_i(x) <= u_i for all i from 1 to m, and h_j(x) = v_j for all j from 1 to p. So this is my set G. One thing you will notice immediately is that our original optimization problem was written with right-hand sides equal to 0: the constraints are g_i(x) <= 0 and h_j(x) = 0. In the set G, on the other hand, we allow g_i(x) <= u_i for some u_i and h_j(x) = v_j for some v_j. The reason we do this is that we want the constraints to be elastic in a certain sense, and we want to see what values the objective function can take as the constraints vary. This gives us an object that spans the joint values of the objective and the constraints that can possibly be attained.
Effectively, in this space we can look at a new kind of object, and because we have moved to this space, we do not need to bother ourselves with the actual x anymore; you will see what I mean by that. So although the right-hand sides in the problem are 0, we are defining a new object G consisting of 3-tuples (u, v, t). What are the dimensions here? u must lie in R^m, v must lie in R^p, and t is a scalar, so G is a subset of R^(m+p+1). What is this set G? It is those values of u, v and t that can be simultaneously attained by some x, meaning there exists an x for which f(x) <= t, g_i(x) <= u_i and h_j(x) = v_j. Actually, "simultaneously attained" is not exactly right: it is enough that some x achieves values lying at or below t and u. You just need an x for which f(x) <= t, it does not have to be exactly equal, and similarly g_i(x) has to be just less than or equal to u_i. For the equality constraints, however, we want equality to hold: h_j(x) should be exactly equal to v_j. Now, this is my primal problem, call it P. What is the optimal value of the primal in terms of the set G? In fact, using G we can express the optimal value not just of this particular problem P but of a whole family of problems in which the right-hand sides are not necessarily zero but any u_i and v_j, and we will study how this optimal value varies as u_i and v_j vary.
So the optimal value of P is the infimum, the least value, of t such that (0, 0, t) belongs to G. How do I get that? When I put u = 0 and v = 0, I am looking at those x for which g_i(x) <= 0 and h_j(x) = 0, that is, the feasible x, and then I am seeking the least value of t that is possible. What would that least value be? It would be the optimal value of P itself: if x* is an optimal solution of P, I get t = f(x*). So the optimal value of P can be expressed in terms of this set: it is the least value of t such that (0, 0, t) belongs to G. Now, can one express the dual function in terms of this set G? That can also be done. Look at d(lambda, theta): this is equal to the infimum of (lambda, theta, 1)^T (u, v, t) over all (u, v, t) that belong to G. One thing I should mention is that this holds for lambda >= 0. Let us see how this comes about. When I take (u, v, t) in G, I am taking any point for which the relations defining G are satisfied, and the expression (lambda, theta, 1)^T (u, v, t) is just lambda^T u + theta^T v + t. We minimize this over all (u, v, t) that belong to G. What does it become as I vary (u, v, t) over G? Let us look at t first.
So t here is free, and I need the least value, so what value of t would be picked out as I vary (u, v, t)? It would pick out the value where t equals f(x) for some x. As I vary (u, v, t) in G, what I am effectively varying is x, and for a fixed x the least value of t that I can get is t = f(x). v is always set equal to h(x): each v_j is equal to h_j(x), so theta^T v is just theta^T h(x). And because lambda >= 0, the least value of lambda^T u is obtained by making each u_i as small as allowed, which is u_i = g_i(x). As a result, for lambda >= 0, this infimum is nothing but the infimum over x of the Lagrangian, f(x) + lambda^T g(x) + theta^T h(x). So what have we got? The optimal value of P can be expressed in terms of the set G, and the dual function itself can be expressed in terms of G. And what is the form of that expression? You look at linear functions on the entire (u, v, t) space with a certain slope, (lambda, theta, 1), and you ask for the least value of such a linear function over the set G. That least value, as (u, v, t) ranges over G, is your dual function, and you get it as a function of the slope. Now take any set, let me draw some set here, and suppose you look at linear functions with a certain slope and ask for the least value of such a function over the entire set.
Where would you get the least possible value? You can plot the contours of this linear function, and you would get the least possible value when a contour forms a supporting hyperplane to the set. So embedded here is also a definition of a supporting hyperplane. That means the dual function is effectively the intercept of a certain supporting hyperplane. You can think of it this way: look at the set of (u, v, t) such that (lambda, theta, 1)^T (u, v, t) >= d(lambda, theta). This is a half-space whose boundary is a hyperplane with slope (lambda, theta, 1). By the way the intercept d(lambda, theta) has been defined, this hyperplane has the entire set G on one side of it, and if the infimum is attained in G, then it actually touches G as well. So whenever the infimum is attained, it is a supporting hyperplane to the set G. In terms of the set G, then, the dual function asks for the intercept of a supporting hyperplane to that set. What is the optimal value of the dual problem? It is the maximum intercept over all such supporting hyperplanes to the set G.
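As a rough numerical illustration of these two formulas (a sketch, not something worked in the lecture), take the hypothetical one-dimensional problem of minimizing f(x) = x^2 subject to g(x) = 1 - x <= 0, with no equality constraints. The problem, the grid and the function names are all assumptions chosen for illustration. The points (g(x), f(x)) trace the part of the boundary of G that matters, so both the primal value and the intercepts can be read off from them.

```python
import numpy as np

# Hypothetical toy problem (an assumption, not from the lecture):
#   minimize f(x) = x^2  subject to  g(x) = 1 - x <= 0
xs = np.linspace(-3.0, 3.0, 6001)   # grid of primal points x
u = 1.0 - xs                        # u = g(x)
t = xs ** 2                         # t = f(x)

def primal_value():
    # Least t such that (0, t) lies in G: some x has g(x) <= 0 and f(x) <= t.
    return t[u <= 0.0].min()

def dual_value(lam):
    # Least value of lam*u + t over G for lam >= 0.
    # The minimum is reached on the points (g(x), f(x)).
    return (lam * u + t).min()

# Scan slopes (lam, 1) and keep the largest intercept.
lams = np.linspace(0.0, 5.0, 501)
intercepts = np.array([dual_value(lam) for lam in lams])

p_star = primal_value()
d_star = intercepts.max()
best_lam = lams[intercepts.argmax()]
print(p_star, d_star, best_lam)
```

On this convex example the largest intercept and the primal value agree up to the grid resolution, and no intercept exceeds the primal value.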
So now, what does this effectively mean if you want strong duality to hold? Let me draw a figure that should make things a little clearer. Suppose this is my set G. For simplicity I am going to skip the v variable and focus only on the u and t variables, so this axis is my u variable and this is my t variable. In this figure, where is the optimal value of P? Let us introduce a notation for this optimal value and call it p*. p* is the least value of t such that the point (0, 0, t), or, since we are skipping the v axis altogether, the point (0, t), lies in G. All of these points here along the t axis are points of the form (0, t) that lie in G, and the least value of t such that (0, t) lies in G is this one. So this is your p*. Now, what is a supporting hyperplane here? I am looking for hyperplanes with slope of the form (lambda, 1), so a hyperplane like this, and what we are asking is: what is the intercept of this particular hyperplane?
So how do I find this intercept? I put u = 0 again and look for the intercept on the t axis. Because my t coefficient is just 1, that gives me the value of the dual function: d(lambda) is equal to the height of this intercept. I am skipping theta here for simplicity. Now let us look at this in a little more detail. What sort of set is G? Actually, the set G that I have drawn is not really representative of the kind of set we have defined. If you look at the way G has been defined, if any u is feasible, then any larger u is also feasible. So it cannot be that the set ends here: if I get a point u here, then any larger u in this direction is also feasible, and the set should carry on in this direction. Likewise it should carry on in the t direction as well, because if any t is feasible for the set, then a larger t is also feasible for the same values of u and v. So, in short, the set looks something like this. The way I have drawn it makes it look like a very simple convex set, but convexity needs a proof, and it holds when the problem is convex, so hold on until then. Now, what is the claim of strong duality?
So if you look at this figure, the green height that I have drawn is the optimal value of my primal problem. The intercept that I have marked here is the value of the dual function for this hyperplane, and what you are seeing is weak duality in play: my intercept is always going to be less than or equal to the green height, because a supporting hyperplane cannot enter into this region, the set G. When would strong duality hold? Strong duality would hold if, instead of taking this red supporting hyperplane, I can find a supporting hyperplane like this one, a supporting hyperplane that passes through the point (0, 0, p*). Its intercept would then be equal to p*, and strong duality would hold. So the claim of strong duality basically says that for the set G I should be able to find a supporting hyperplane at the point (0, 0, p*). On the face of it, this looks like a simple claim: take any point on the boundary and draw a supporting hyperplane at that point. But it is not that simple. Firstly, you need the set itself to be convex; all right, we will take that. But more importantly, remember that this is not just any supporting hyperplane. All of this logic worked for supporting hyperplanes with slopes of a particular form: lambda >= 0, and the coefficient along the t axis equal to unity. This sort of supporting hyperplane is what is called a non-vertical supporting hyperplane.
Why is it non-vertical? Because it has a non-zero coefficient along the t axis. A vertical supporting hyperplane would be one that is parallel to the t axis. We are looking for a supporting hyperplane that is non-vertical, meaning it should not be parallel to the t axis: the t coefficient should not be 0. That is what we mean by a non-vertical supporting hyperplane, and that is what makes the problem a little more subtle than simply drawing a supporting hyperplane. We need the set G to have a non-vertical supporting hyperplane at the point (0, 0, p*); if that is the case, then strong duality holds. What that means, for instance, is that the set G cannot take this sort of shape, where at p* the boundary just shoots off like this. For this sort of set, the supporting hyperplane at p* would be exactly parallel to the t axis, and you cannot get a non-vertical supporting hyperplane there. So that is what is at the heart of this particular theorem. To repeat: we are focusing on this particular region. The length of the green segment is the optimal value of the primal, and the little intercept here is the value of the dual function, which is the intercept of a supporting hyperplane to this set.
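A standard textbook-style instance of this bad shape, offered as a hedged sketch rather than as the example drawn in the lecture: minimize f(x) = x subject to g(x) = x^2 <= 0. The only feasible point is x = 0, so p* = 0, and in the (u, t) plane the lower boundary of G is t = -sqrt(u), which leaves the point (0, p*) vertically. A short calculation gives the intercept of the hyperplane with slope (lambda, 1) as -1/(4 lambda) for lambda > 0. The problem, grid and names below are assumptions made for illustration.

```python
import numpy as np

# Hypothetical example (an assumption, not from the lecture):
#   minimize f(x) = x  subject to  g(x) = x^2 <= 0
xs = np.arange(-20000, 20001) * 1e-4   # grid on [-2, 2] that contains x = 0 exactly
u = xs ** 2                            # u = g(x)
t = xs                                 # t = f(x)

p_star = t[u <= 0.0].min()             # only x = 0 is feasible on this grid

def dual_value(lam):
    # Intercept on the t axis of the supporting hyperplane with slope (lam, 1).
    return (lam * u + t).min()

lams = np.array([1.0, 10.0, 100.0])
intercepts = np.array([dual_value(lam) for lam in lams])
closed_form = -1.0 / (4.0 * lams)      # from minimizing x + lam * x^2 over x
print(p_star, intercepts)
```

The intercepts creep up toward p* = 0 as lambda grows, but no single hyperplane with slope (lambda, 1) reaches it, which is the numerical signature of the only supporting hyperplane at (0, p*) being vertical.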
Now, when we optimize the dual function, what we are doing is searching over all such supporting hyperplanes and looking for the largest value of the intercept. By the geometry of the problem, by virtue of being a supporting hyperplane, the intercept will always be less than or equal to the green segment. One way to get equality, that is, an intercept exactly equal to p*, is to be able to draw a supporting hyperplane of this form at (0, 0, p*). And what is this form? The slope in the u direction, lambda, should be greater than or equal to 0, and the coefficient in the t direction should be positive. This sort of hyperplane is what is called a non-vertical supporting hyperplane, so what we are looking for, for strong duality, is a non-vertical supporting hyperplane at the point (0, 0, p*). Now let us get on to the proof of this. The existence of a non-vertical supporting hyperplane is where we will actually need the properties of convexity. The earlier argument, where we had the green segment and compared it with the intercept, does not need convexity at all: once you have a supporting hyperplane of this form, its intercept will always lie at or below p*. Convexity will come up subsequently.
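To see the two halves of this remark side by side, here is a small sketch with a made-up non-convex problem: x is restricted to the two values 0 and 1, with f(x) = 1 - 2x and g(x) = 2x - 1. All numbers and names are assumptions chosen for illustration. The set G is then the union of two upward-closed quadrants with corners (-1, 1) and (1, -1), which is not convex.

```python
import numpy as np

# Hypothetical non-convex problem (an assumption, not from the lecture):
#   x in {0, 1},  minimize f(x) = 1 - 2x  subject to  g(x) = 2x - 1 <= 0
xs = np.array([0.0, 1.0])
u = 2.0 * xs - 1.0             # g(x): -1 at x = 0, +1 at x = 1
t = 1.0 - 2.0 * xs             # f(x): +1 at x = 0, -1 at x = 1

p_star = t[u <= 0.0].min()     # only x = 0 is feasible

def dual_value(lam):
    # Intercept of the supporting hyperplane with slope (lam, 1), lam >= 0.
    return (lam * u + t).min()

lams = np.linspace(0.0, 3.0, 301)
intercepts = np.array([dual_value(lam) for lam in lams])
d_star = intercepts.max()      # largest intercept found on this grid of slopes
gap = p_star - d_star
print(p_star, d_star, gap)
```

Every intercept still lies at or below p*, so the comparison argument goes through without convexity, but the largest intercept stops short of p* because no hyperplane of this form supports G at (0, p*).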