Alright, so what we'll do now is consider a second technique for optimizing a function subject to a constraint. We've already considered the first, more algebraic approach of using the constraint to eliminate one variable. We did that on this example here, and just to give away the final answer, when we found the minimum of this function subject to this particular constraint, the solution was x equals 5 and y equals 1. The goal now is to get there via a different approach. So let's say we want to find the maximum or minimum of some function. Maybe it's a two-dimensional function that depends on x and y; maybe it depends on more variables than that. But to keep things simple for right now, we'll do the example with just two variables, x and y, and we want to do that with some constraint. The constraint is also a function of x and y (so x plus y, that's a function of x and y), but that function has to have some particular value. So our constraint takes the form: some different function of x and y, g of x and y, has to be equal to some constant c. The trick here is, instead of thinking about the original function, let's make up some new function, which I'll call f prime. That's going to be the original function f of x and y, but I'm going to add to it c minus the constraint function: the value of the constraint minus the constraint function, c minus g of x and y. If the constraint is obeyed, that is, any time x and y take on values that make this constraint true, then g of x and y is equal to c and this difference is zero. So all I've done is add zero to the function; I haven't changed the function at all, as long as I'm obeying the constraint. In fact, I can add that zero not just once: I could stick a two in front of this quantity, or a five, or any constant that I want. So I'm going to stick the constant lambda in front of that quantity in brackets. 
So again, if I'm obeying the constraint, I've just added zero to the function some arbitrary number of times. This method is called the method of Lagrange multipliers, and this lambda is the thing that's called the Lagrange multiplier. As long as I'm obeying the constraint, I haven't perturbed the function at all, so the minimum of the original function subject to the constraint is also going to be a minimum of this new function. That means I can look for the places where the partial derivatives of this new function f prime are equal to zero; that will find the places where f prime is at a minimum or a maximum. And even if we don't know what f and g are for a generic problem, we can write down the derivative of f prime: it's the derivative of f, plus lambda times the derivative of the quantity in brackets, and since c is a constant, the only surviving piece in there is the derivative of g with a negative sign. That's why I left this as a negative sign. So the derivative of f prime with respect to x looks like the partial derivative of f with respect to x, minus lambda times the partial derivative of g with respect to x: lambda dg dx. And if I'm at a minimum of this new function f prime, then that whole expression must be equal to zero, because the derivative of f prime has to be equal to zero. Likewise for the derivative with respect to y: the derivative of f with respect to y, minus lambda times the derivative of g with respect to y, has to be equal to zero. Both of these things are true when I'm at the minimum or the maximum of this function. So we have a new set of expressions. At the constrained minimum we're not necessarily at a minimum of the original function f, but we will be at a minimum of this function f prime. So df dx minus lambda dg dx must be zero, and the similar expression with partial derivatives with respect to y must also be zero. This technique, using these two equations, is called the method of Lagrange multipliers, and it'll make more sense perhaps after we work an example. So let's go back to this example. 
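Those two stationarity conditions can be checked numerically. The sketch below is not the lecture's example: it uses a made-up function f = x squared plus y squared with the constraint x + y = 2, whose constrained minimum sits at x = y = 1 with lambda = 2, and verifies by finite differences that both partial derivatives of f prime vanish there.

```python
# Numeric sketch of the construction above: f'(x, y) = f(x, y) + lam * (c - g(x, y)).
# The function f = x^2 + y^2 and constraint x + y = 2 are a made-up illustration
# (not the lecture's example); the constrained minimum is at x = y = 1.

def f(x, y):
    return x**2 + y**2

def g(x, y):
    return x + y

c = 2.0
lam = 2.0  # df/dx = 2x and dg/dx = 1, so 2x - lam = 0 at x = 1 gives lam = 2

def f_prime(x, y):
    return f(x, y) + lam * (c - g(x, y))

def partial(fn, x, y, wrt, h=1e-6):
    """Central finite-difference partial derivative of fn at (x, y)."""
    if wrt == "x":
        return (fn(x + h, y) - fn(x - h, y)) / (2 * h)
    return (fn(x, y + h) - fn(x, y - h)) / (2 * h)

# Both partials of f' vanish at the constrained minimum (1, 1):
print(partial(f_prime, 1.0, 1.0, "x"))  # ~ 0
print(partial(f_prime, 1.0, 1.0, "y"))  # ~ 0
```

The finite-difference check is just a stand-in for the board work: at the constrained optimum, the gradient of f prime is zero even though the gradient of f by itself is not.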
So we minimize this parabolic function, x squared plus 2y squared minus 6x plus 5, subject to this particular constraint. I've written down the constraint: our constraint function is x plus y, that's g. If we solve this problem with Lagrange multipliers, we just need to use these two expressions. We know what f is and we know what g is, so I'll just plug into these equations. I'll go ahead and write down what I'm doing: df dx minus lambda dg dx. df dx is 2x minus 6. From that I need to subtract lambda times dg dx; the constraint function g of x and y is this quantity x plus y, and its derivative with respect to x just gives me 1, so I subtract lambda times 1. That must be equal to 0 at the minimum. Likewise for y: df dy minus lambda dg dy. The derivative of my original function with respect to y just gives me 4y, and the derivative of the constraint function with respect to y again gives me 1, so I subtract lambda times 1. That must be equal to 0 when I'm at the minimum. Now, these equations can be solved, once we notice one additional thing. At first it looks like they can't be solved: we need to know the value of x, we need to know the value of y, and we need to know the value of lambda. Right now we don't know any of those, but we only have two equations for those three unknowns. To make that clear, let's just rearrange these equations. The first equation is 2x minus 6 minus lambda equals 0, so I can rewrite that as 2x minus 6 is equal to lambda; I've just brought the lambda over to the other side of the equation. Likewise, the second equation is 4y minus lambda equals 0, so 4y is equal to lambda if I rearrange it. But these two lambdas are both the same lambda, right? So lambda is equal to 2x minus 6, and it's equal to 4y; in fact, it's equal to both of those. So those two things have to be equal to each other: 2x minus 6 is equal to 4y. 
And that's as far as I can get with what I've written on the board from here to here: 2x minus 6 is equal to 4y. But there are lots of ways that can be true, lots of different values of x and y I can pick that make it true. The next step is to remember that we still have the constraint equation: x plus y has to add up to 6, so y still has to be 6 minus x. If I use the constraint equation in here, I learn that 2x minus 6 is equal to 4y, which is equal to 4 times 6 minus x, which is 24 minus 4x. If I bring all the x's to the left, I've got 6x, and if I bring all the constants to the right, 24 plus 6 is 30. So I find that x equals 5, and the constraint equation tells me that y must be 1 when I am at the constrained minimum of this function. So we get exactly the same answer, 5 comma 1, as we obtained using the more familiar algebraic approach. For this particular example, it may not seem like I've saved very much effort by using Lagrange multipliers; really, this is just an illustration to get us comfortable with the method. But it's certainly useful to point out that we get the exact same result by Lagrange multipliers as we did with the more familiar approach. Where this will become useful is in problems where the standard method is a little bit tricky. So as a first step, let's take the method of Lagrange multipliers and apply it back to our coin flip problem. What we want to do in this problem is maximize the entropy. The entropy is minus k times the sum of p log p, and I'll go ahead and maximize s divided by k, so I don't have to worry about the value of k. So I want to maximize minus p heads log p heads minus p tails log p tails. But I don't want to maximize that function all by itself; I want to maximize it subject to a constraint. 
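The algebra above is short enough to replay in a few lines of code. This sketch carries out the same substitution (2x minus 6 equals 4 times 6 minus x) and then spot-checks that, along the constraint line, the function really is smallest at the point we found.

```python
# Sketch verifying the worked example: minimize f = x^2 + 2y^2 - 6x + 5
# subject to x + y = 6, using the two Lagrange equations derived above.

def f(x, y):
    return x**2 + 2*y**2 - 6*x + 5

# Lagrange conditions: 2x - 6 = lam and 4y = lam, plus the constraint y = 6 - x.
# Substituting: 2x - 6 = 4*(6 - x)  =>  6x = 30  =>  x = 5.
x = 30 / 6
y = 6 - x
lam = 2*x - 6

print(x, y, lam)  # 5.0 1.0 4.0

# Spot-check: among a few points on the constraint line, f is smallest at x = 5.
values = [f(t, 6 - t) for t in [3.0, 4.0, 5.0, 6.0, 7.0]]
print(min(values) == f(x, y))  # True
```

Note that the multiplier comes out to lambda = 4 here, but, as with the coin-flip problem later, its value is never actually needed to locate the minimum.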
I want to guarantee that whatever my probabilities are, heads and tails have to add up to 100%. I don't want to accept solutions like 37% heads, 37% tails that don't add up to 100%. I want to find the maximum entropy, the most random I can make this coin flip, while requiring that heads and tails add up to 100%. So this s over k, that's the function I want to maximize, and p heads plus p tails equals 1, that's the constraint. The solution to this problem comes when the derivative of f with respect to p heads, minus lambda times the derivative of the constraint function with respect to p heads, is equal to 0. That's what the method of Lagrange multipliers tells us to do. There's a very similar expression for p tails, but let's go ahead and do the one for heads. So I take the derivative of this expression, minus p heads log p heads minus p tails log p tails, with respect to p heads. By the product rule, the derivative of minus p heads log p heads with respect to p heads gives us minus log p heads minus 1, and the p tails part goes away when I take the heads derivative. From that I subtract lambda times the derivative of the constraint function with respect to p heads; the derivative of p heads with respect to p heads is just 1, so I've got minus lambda times 1. That whole thing has to be equal to 0. What this expression tells us is that log of p heads is equal to minus lambda minus 1: essentially, take minus log p heads and move it to the other side of the equals sign, so log of p heads with a positive sign is minus lambda minus 1. If I do the exact same thing for p tails, the derivatives work out exactly the same, so I'll just write down the result: log of p tails is also equal to minus lambda minus 1. We can convert those by solving for p heads: if log of p heads is minus lambda minus 1, then p heads is e to the minus lambda minus 1, and likewise for p tails. So what do we do now? 
We could solve for the value of lambda using the constraint equation, perhaps, but we don't actually need to. Another name for this method that's sometimes used, instead of the method of Lagrange multipliers, is Lagrange's method of undetermined multipliers, because it turns out we don't actually care what the value of lambda is. We're only interested in p heads and p tails. So p heads is equal to e to the minus lambda minus 1, whatever value lambda has, and p tails is equal to e to the minus lambda minus 1, whatever value lambda has; we're not interested in the value of lambda itself. But we do know that p heads and p tails must be equal to each other, according to these expressions: whatever lambda is, e to the minus lambda minus 1 is the same here as it is here. So that means p heads is equal to p tails. And if I use the constraint equation, p heads and p tails must add up to 1. That means p heads and p tails are not only the same as each other, but because they add up to 1, they must each have the value of one half. So this is the correct answer to the original question: how do I make a coin flip as random as possible, as high entropy as possible? Not by making it 100% heads and 0% tails, and not 75-25, but by making it 50-50. An even, fair coin flip is as random as we can get it. And that gives us the correct answer this time, because we correctly used the constraint equation to guarantee that the probabilities added up to 1. So what we'll do next, now that we understand how to use Lagrange multipliers, is use them to solve for the probabilities that maximize the entropy more generally, not just for flipping coins, but for an arbitrary problem.
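The coin-flip result can be checked the same way. This sketch recovers the undetermined multiplier after the fact from p heads equals one half, confirms that both stationarity conditions hold there, and checks that the fair coin has higher entropy than a few skewed splits.

```python
import math

# Sketch of the coin-flip result: the stationarity conditions give
# log p_heads = log p_tails = -lam - 1, so both probabilities equal
# e^(-lam - 1), and the constraint p_heads + p_tails = 1 forces 1/2 each.

p_heads = p_tails = 0.5
lam = -math.log(p_heads) - 1  # undetermined multiplier, recovered after the fact

# Both Lagrange equations, -log p - 1 - lam = 0, hold at p = 1/2:
print(-math.log(p_heads) - 1 - lam)  # 0.0
print(-math.log(p_tails) - 1 - lam)  # 0.0

def entropy(p):
    """S/k = -p log p - (1-p) log(1-p) along the constraint p_heads + p_tails = 1."""
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

# The fair coin beats any skewed split:
print(all(entropy(0.5) > entropy(p) for p in [0.1, 0.25, 0.75, 0.9]))  # True
```

The maximum value itself, entropy(0.5) = log 2, is the familiar entropy of one fair coin flip.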