So, we can say a little more about what these Lagrange multipliers are actually telling us, and for that let us go back to our example. You can calculate what λ* was in this particular case, and what you will notice is that λ* satisfies the following: if I take the optimal value m(α) and look at the partial derivative of m(α) with respect to α, that is exactly λ*; in short, λ* = ∂m(α)/∂α = 2ab. What is the meaning of this? It means that λ* tells me how much the optimal value, not the objective function itself, would change if I change α. Remember, m(α) was the area of the largest rectangle; it is a function of α, and α is the right-hand side of the constraint, which tells you how big your ellipse is: if I scale α, the ellipse grows or shrinks. So, if I change α slightly, λ* tells me how much the area of the largest rectangle changes; λ* is the derivative of the optimal value with respect to α. Here, then, is the interpretation and importance of Lagrange multipliers: they tell us how sensitive the optimal value is to changes in the constraint. Think of the constraint as a resource. Suppose I tell you that α controls the size of a plot of land of this elliptical shape.
So, α is a way of measuring the size of that plot of land. If I change α a little, that is, if I go for a slightly bigger plot, how much bigger a rectangle could I accommodate in terms of area? The answer is: for a change Δα in α, the change in the area of the largest rectangle is λ*·Δα. So, Lagrange multipliers tell us about what is called sensitivity: for a small change in the constraints, what is the change in the optimal value? You can also do this one constraint at a time; you do not need to change all the constraints together. If you change only one constraint by a slight amount, you are changing just that particular component of α, and the corresponding component of λ* gives the sensitivity. In the general case, the derivative of the optimal value with respect to α is always λ* transpose: ∇_α m(α) = (λ*)ᵀ. One thing to note here: because we are talking of equality constraints, because we are on surfaces, a bigger value of α does not necessarily mean you have more resource. It just happened that, for this ellipse, a larger α means a bigger ellipse and a smaller α a smaller one. In general, as you change α, your surface can change in strange ways. It does not necessarily mean you are optimizing over a bigger region, or that the previous region is enclosed in the new one; we are talking of surfaces here, and as α changes, the shape of the surface or contour on which you are operating changes.
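This sensitivity interpretation can be checked numerically. Below is a minimal sketch, assuming the inscribed-rectangle example has the form "maximize the area 4xy subject to x²/a² + y²/b² = α" (the exact constants a = 3, b = 2 are illustrative, not from the lecture). A finite-difference estimate of ∂m/∂α should come out close to λ* = 2ab.

```python
import numpy as np

# Illustrative semi-axes; the ellipse constraint is x^2/a^2 + y^2/b^2 = alpha.
a, b = 3.0, 2.0

def m(alpha, n=200_000):
    """Optimal value m(alpha): area of the largest inscribed rectangle.

    Parametrize the ellipse as x = a*sqrt(alpha)*cos(t), y = b*sqrt(alpha)*sin(t)
    and maximize the rectangle area 4*x*y over a fine grid of t.
    """
    t = np.linspace(0.0, np.pi / 2, n)
    x = a * np.sqrt(alpha) * np.cos(t)
    y = b * np.sqrt(alpha) * np.sin(t)
    return np.max(4 * x * y)

lam_star = 2 * a * b                # the multiplier at the optimum, per the lecture
d_alpha = 1e-4                      # small change in the "resource" alpha
sensitivity = (m(1.0 + d_alpha) - m(1.0)) / d_alpha

print(sensitivity, lam_star)        # both approximately 2ab = 12
```

The finite-difference slope of the optimal value matches λ*, which is exactly the statement ∂m/∂α = λ*.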
So, it is possible that by increasing α your optimal value decreases, or that by decreasing α it increases; all of that is encapsulated in the sign of λ. The sign of λ also tells you which constraints are more binding than others, and in which direction you should change a constraint, whether decrease or increase it, in order to get a better objective. Last time we did the problem of least squares solutions of over-determined systems of equations, where not all equations could be satisfied at once; we did this in the context of machine learning and also in the context of maximum likelihood estimation. Since all the equations could not be satisfied together by a linear relation, we minimized the sum of the squares of the residuals, and that became an unconstrained optimization problem. Today I will look at a slightly different problem. Suppose you have a matrix A, and this matrix is a fat matrix; what I mean by this is that it has fewer rows than columns. That is the nature of the matrix A, and let us assume it is full row rank. Now, suppose I ask you for a solution of A x = b, where b is some vector; so A is in R^(m×n) and b is in R^m, and we are in the regime where m is less than n, in fact typically m much less than n. Can you solve for x, and how many solutions do we have? The number of unknowns here is n, the number of columns of A, and the number of equations is m, the number of rows of A. You have fewer rows than columns, fewer equations than variables, so of course you can solve this.
In fact, you will get not one but infinitely many solutions. Why infinitely many? Because a fat matrix like this always has a nontrivial null space. The null space of A is the set of z such that A z = 0, and it is an entire subspace of R^n. So, suppose I have one solution: let x̂ be such that A x̂ = b, and take any z in the null space of A. Then what can I say? x̂ + z is also a solution. So, if I have one solution, and of course there is at least one, I can generate infinitely many more by just picking points from that subspace. In this sort of situation, a common problem that is posed is to find a solution that has a certain structure. Structure can mean many different things: sparsity, closeness to something else, or having the least norm. In this case, let us look at the least norm problem. The problem is: amongst all solutions x of the system of equations A x = b, find the one with the least norm, that is, minimize xᵀx, which is the same as the norm of x squared. So, we have this problem. Now, if A is in R^(m×n), how many variables are we optimizing over? You have n variables, since x is in R^n: n scalar variables. And how many constraints? m of them: I have written it as one matrix equation, but it is basically m individual scalar constraints. So, n variables, m constraints; this is an optimization problem of finding the least norm solution of the linear system. Let us solve it. Remember the function L that I introduced. Look at the gradient of L with respect to x; what would that be?
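The claim that every x̂ + z with z in the null space is again a solution can be checked in a few lines. A minimal sketch, assuming a small random fat matrix (the sizes m = 3, n = 6 are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 6                      # fat matrix: fewer equations than unknowns
A = rng.standard_normal((m, n))  # a random A is full row rank with probability 1
b = rng.standard_normal(m)

# One particular solution x_hat of A x = b.
x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)

# A basis for the null space of A via the SVD; its dimension is n - m = 3 here.
_, _, Vt = np.linalg.svd(A)
null_basis = Vt[m:].T            # each column z satisfies A z = 0

# Any x_hat + z with z in the null space solves A x = b as well.
z = null_basis @ rng.standard_normal(n - m)
print(np.allclose(A @ (x_hat + z), b))   # True
```

Each random combination of the null-space basis gives a new solution, which is why there are infinitely many.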
That would be the gradient of f_0(x) minus λ_1 times the gradient of f_1(x), minus λ_2 times the gradient of f_2(x), and so on, down to λ_m times the gradient of f_m(x). Now go back to the boxed equation: can you write that equation in terms of this function L? If I just take the transpose, it amounts to saying that the gradient of the Lagrangian with respect to x, evaluated at (x*, λ*), should equal 0. So, the right boxed equation is simply saying that the gradient of the Lagrangian must be 0; that is a succinct way of writing it. In addition, of course, you need to satisfy the other boxed equations, the constraints themselves. So, let us use that notation here and write the Lagrangian for our problem. I have my objective xᵀx, and then minus a linear combination of the constraints. Actually, I made a slight error earlier; let me correct it by absorbing the α's into the definitions of the functions, so that f_1(x) − α_1 = 0 is my first constraint, and the Lagrangian carries λ_1 times (f_1(x) − α_1) through λ_m times (f_m(x) − α_m). This does not affect anything: after I take the partial derivative with respect to x, the α's go away anyway, so the condition is unchanged. So, for my problem I can write the Lagrangian as xᵀx minus λᵀ(Ax − b), where λ is the Lagrange multiplier vector. You can verify that this is the same as taking λ_1 times the first scalar constraint and so on, where A can be expressed with rows a_1ᵀ, …, a_mᵀ.
So, if the rows of A are a_1ᵀ, a_2ᵀ, …, a_mᵀ, then I can write the Lagrangian in this way: xᵀx minus λᵀ(Ax − b), where λ is just any vector in R^m, one multiplier per constraint. What I have to solve is my previous boxed equation, the gradient of the Lagrangian equal to 0, and in addition I need to make sure I am feasible, which is the other boxed equation: Ax = b. These are the equations I need to solve; is this clear? So, let us calculate the gradient of the Lagrangian with respect to x. What would that be? It is 2x − Aᵀλ; you can check this. I need to set this equal to 0, which gives me x = Aᵀλ / 2. Now, I also need to satisfy Ax = b, so I can substitute for this x, and that gives me A Aᵀ λ / 2 = b. Now, remember A was a fat matrix, so Aᵀ is a thin matrix, and A I have assumed is full row rank, so A Aᵀ is invertible. Consequently I can take it to the other side, and I get λ* = 2 (A Aᵀ)⁻¹ b. With λ* known, I can put it back into x = Aᵀλ / 2 to get my x*: the factors of 2 cancel, and x* = Aᵀ (A Aᵀ)⁻¹ b. So, this is your least norm solution; the least norm solution of this optimization problem is x* = Aᵀ (A Aᵀ)⁻¹ b.
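The closed form derived above can be verified directly. A minimal sketch, assuming a small random full-row-rank A (the sizes m = 3, n = 6 are illustrative); it checks that x* = Aᵀ(AAᵀ)⁻¹b is feasible and agrees with the minimum-norm solution NumPy's pseudoinverse produces:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 6
A = rng.standard_normal((m, n))  # fat, full row rank (almost surely)
b = rng.standard_normal(m)

# From the Lagrangian conditions:
#   lambda* = 2 (A A^T)^{-1} b,   x* = A^T lambda* / 2 = A^T (A A^T)^{-1} b.
lam_star = 2 * np.linalg.solve(A @ A.T, b)
x_star = A.T @ lam_star / 2

print(np.allclose(A @ x_star, b))                    # feasible: True
print(np.allclose(x_star, np.linalg.pinv(A) @ b))    # minimum-norm: True
```

Note that `np.linalg.solve(A @ A.T, b)` is used instead of forming the explicit inverse, which is the numerically preferred way to apply (AAᵀ)⁻¹.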
So, to recap what we did: we had our optimization problem, we wrote out the Lagrangian function, we took the gradient of the Lagrangian with respect to just the x variable and set it equal to 0, and then we also had to satisfy our constraint. These were the two equations we needed to satisfy; putting them together, we found that there is actually a unique solution, so it has to be the solution of the problem. Is this clear? So, we can end here; we will continue next time.