Okay, so we're getting closer to our intermediate goal of being able to maximize the entropy and find out which macrostate of a system is most likely. But before we can do that, we need to talk about optimizing not just single-variable functions, but functions of more than one variable: multivariate functions.

To make sure we understand what we're talking about, let me sketch a picture of a two-dimensional function, some function that depends on two variables, x and y. I can draw x, I can draw y, and then I'll attempt to draw some function that depends on both x and y and has a particular position where it's at a maximum. The goal is to understand how to find the maximum, or indeed the minimum, of a function of more than one variable.

For a multivariate function, we can calculate more than one type of derivative: we can calculate partial derivatives. So we've made the jump now from calculus one to calculus three in talking about multivariate functions. For this function, I can calculate the derivative in the x direction and the derivative in the y direction. The derivative in the x direction, df/dx, tells me how much f changes as I change x, when I move purely in the x direction. And, a little trickier to draw, if I make a change only in the y direction, then df/dy tells me about the slope when I move only in the y direction.

So I can calculate the slope in any direction. But the important thing about the maximum is that when I'm sitting at the top of the hill right here, it doesn't matter which direction I look in: the top of the hill is flat in every direction. So the maximum or the minimum of the function obeys the requirement that df/dx, the derivative of the function in the x direction, is zero at the maximum, and likewise df/dy, the derivative when I move in the y direction, is zero: it's also flat. In every direction, regardless of whether I'm moving in x or in y, the derivative has to be zero.

For a more complicated function that depends on more than just two variables, maybe three or maybe a hundred, the requirement for being at an optimum, an extremum, is that the partial derivative with respect to every one of those variables has to be zero. So that's the prescription for finding the maximum or minimum of a multivariate function: take the partial derivative with respect to each variable and set them all equal to zero.
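Written compactly (an editorial addition in notation, not something from the lecture board), the prescription is:

```latex
% Extremum condition for a function of n variables f(x_1, \dots, x_n):
% at a maximum or minimum, every partial derivative vanishes.
\frac{\partial f}{\partial x_i} = 0 \qquad \text{for each } i = 1, 2, \dots, n
```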
For example, let's suppose our function is f(x, y) = x^2 + 2y^2 - 6x + 5. It's not the same as the function I sketched, but it's a parabolic function that's fairly easy to take derivatives of, and since it's concave up, this function will have a minimum. The minimum will be where df/dx and df/dy are both equal to zero. So I have to take the partial derivative with respect to x, pretending the y's are constant: I get 2x from the x^2 term, and the -6x term gives me -6. For the partial derivative with respect to y, the 2y^2 term becomes 4y, and all the other terms go away when I take the y derivative. So those are the partial derivatives with respect to x and y.

If I'm at the minimum, then each of those expressions has to be equal to zero, and they are not terribly difficult to solve. 2x - 6 = 0 is only true when 2x = 6, that is, when x = 3; that's the value of x at the minimum. And 4y = 0 only when y = 0. So the function goes through a minimum when x = 3 and y = 0. Now we have done an example calculating the minimum of a multivariate function.
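If you want to check that result by computer, here is a minimal sketch (using sympy; an editorial addition, not part of the lecture):

```python
# Reproduce the worked example: find where both partial derivatives
# of f(x, y) = x**2 + 2*y**2 - 6*x + 5 vanish simultaneously.
import sympy as sp

x, y = sp.symbols("x y")
f = x**2 + 2*y**2 - 6*x + 5

fx = sp.diff(f, x)  # 2*x - 6
fy = sp.diff(f, y)  # 4*y

# Set both partials to zero and solve the resulting system
critical = sp.solve([fx, fy], [x, y])
print(critical)          # {x: 3, y: 0}
print(f.subs(critical))  # -4, the value of f at the minimum
```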
So it almost seems as if we're ready to tackle a real problem. Let's see how that works. Remember two things. First, I know how to calculate entropy: if I know the probabilities, the entropy is minus the sum of the p ln p's, multiplied by a constant, Boltzmann's constant, if we need a numerical value.

Second, let's suppose my example is flipping a coin. We already know the answer here: when I flip a coin, there's a 50% probability it comes up heads and a 50% probability it comes up tails. I could plug in 0.5 for heads and 0.5 for tails and calculate the entropy of that coin flip. But let's ask the question a slightly different way: what probabilities of heads and tails would make that coin flip as random as possible? Would it be 100% heads and 0% tails, or 50% heads and 50% tails? Which of all the possible probabilities gives us the most entropy, the most disorder, the most randomness?

So we've got some probability of heads, p_H, and some probability of tails, p_T, and I would like to know which values of p_H and p_T maximize the entropy. Actually, let me maximize the entropy per coin flip, so I need a bar on top of my entropy, because I'm calculating the entropy per occasion. The entropy per flip looks like S-bar = -k (p_H ln p_H + p_T ln p_T). That is a two-dimensional function: a function of p_H and p_T. So I can take the derivatives of that function with respect to the two variables, p_H and p_T, and set them both equal to zero. I would like to know: if I take the derivative with respect to p_H and set it equal to zero, what does that tell me about p_H, and likewise for p_T?

The derivative of p ln p is very similar to a derivative we did in a previous lecture. The product rule tells me it's 1 times ln p, plus p times the derivative of ln p, which is 1/p. The first piece gives me ln p_H, and the p times 1/p piece gives me the 1. Altogether, dS-bar/dp_H = -k (ln p_H + 1). And a very similar expression holds for the probability of tails: dS-bar/dp_T = -k (ln p_T + 1).

Those are the two equations I have to solve for p_H and p_T in order to find what maximizes the entropy. Rearranging the first one: I can divide through by k to get rid of it on both sides, leaving -ln p_H - 1 = 0. So ln p_H has to be -1, or, after I undo the natural log by exponentiating, p_H = e^(-1), about 0.37. Mathematically this is the same as the example we considered when we did optimization of a one-variable function. So we find that p_H is 0.37. The math is exactly the same for the second equation: setting 0 = -ln p_T - 1 also leads to p_T = 1/e, or 0.37.

So what this example has told us is that what makes the entropy as large as possible, what makes my coin flip as disordered and unpredictable as possible, is a 37% chance of getting heads and a 37% chance of getting tails. Not 50-50 like we expected. In fact, something seems to have gone badly wrong here, and what went wrong is that we've been assuming all along that heads and tails have to add up to 100%. My only two outcomes are heads and tails: either I get heads or I get tails, I can't get anything else. So mathematically the probabilities of heads and tails must add to 100%, but that's not what happened in this problem, because we never required it. The problem we've solved is not actually the one we're interested in: these values do in fact maximize the entropy, but they are not the maximum of the entropy if we require that heads and tails add to 100%. So we're not quite ready to maximize the entropy just yet. First we have to consider how to do this maximization with a constraint, the constraint that heads and tails add to 100%, and that's what we'll do next.
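As a quick numerical cross-check of today's unconstrained result, here is a minimal sketch (using scipy; an editorial addition, not part of the lecture). With no constraint tying p_H + p_T to 1, the optimizer lands at 1/e for each variable, exactly as the derivation found:

```python
# Maximize S(pH, pT) = -(pH*ln(pH) + pT*ln(pT)), with k set to 1 and
# NO requirement that pH + pT = 1, by minimizing its negative.
import numpy as np
from scipy.optimize import minimize

def neg_entropy(p):
    pH, pT = p
    return pH * np.log(pH) + pT * np.log(pT)  # the negative of the entropy

# Keep the search inside (0, 1) in each variable; start at an arbitrary guess
result = minimize(neg_entropy, x0=[0.2, 0.8], bounds=[(1e-9, 1), (1e-9, 1)])
print(result.x)        # ~ [0.3679, 0.3679], i.e. 1/e for both, not 0.5
print(result.x.sum())  # ~ 0.736: the probabilities don't add to 1,
                       # which is the missing constraint discussed above
```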