So we're finally ready to make some predictions about real chemical systems by asking which macrostate is the most likely one out of all the ones we could consider. The question we're really asking is: which particular combination of molecules in a particular collection of states is the most likely one? It's going to be the one with the highest multiplicity, the one with the largest number of ways of making it happen, as we've seen when we talked about lattice problems, for example. But we have one slight problem, which is that the multiplicity is not a terribly convenient quantity to work with. That's why we went to the trouble of defining this quantity we're calling the entropy, which is the log of the multiplicity. Whenever the multiplicity is large, the log of the multiplicity will also be large. So asking which macrostate has the highest multiplicity is equivalent to asking which macrostate has the highest entropy, the extensive entropy, which is equivalent to asking which macrostate has the highest entropy per molecule or per mole, the molar entropy. And while we're at it, that's also the same as asking which one has the highest entropy when we divide it by a constant, Boltzmann's constant or any constant we wish. So we can maximize any one of these properties, whichever one's most convenient, and it turns out the most convenient one for us to maximize is the probabilistic definition of the entropy. If I sum up P log P, probability times the log of probability, for every different state the system can exist in, every different conformation, every different energy level, whatever the states are that we're talking about, then that's the thing I want to maximize: negative the sum of these P log P's. But we've learned that we don't actually just want to maximize the entropy; probabilities have a special criterion they have to obey.
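To make the first point concrete, here is a small Python sketch, not from the lecture, showing why maximizing the multiplicity W and maximizing its log pick out the same macrostate. The lattice numbers are purely illustrative.

```python
import math

# Lattice-style example: the multiplicity of placing k particles on M
# sites is the binomial coefficient C(M, k).  (Illustrative numbers.)
def multiplicity(M, k):
    return math.comb(M, k)

W_a = multiplicity(100, 10)   # one macrostate
W_b = multiplicity(100, 50)   # another, with far more ways to make it

# Because ln is monotonically increasing, the macrostate with the larger
# multiplicity W also has the larger entropy S = ln W (in units of k_B),
# so maximizing either quantity selects the same macrostate.
print(W_b > W_a, math.log(W_b) > math.log(W_a))
```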
If this is the function we want to maximize, F equals minus the sum of the P log P's, there's a constraint, which is that those probabilities have to add up to one. So we want to maximize this function subject to this constraint. We've done this for the specific case of just two outcomes, heads and tails for example, but we're in a position now where we can do it more generally, where there might be ten different states the molecule can occupy, or ten different products that can form in a reaction. To maximize this function subject to this constraint, we can use Lagrange multipliers. We're looking for the conditions where the derivative of this expression with respect to one of the probabilities, let's use P sub j, that is, the derivative of F with respect to Pj minus a Lagrange multiplier lambda times dG dPj, is equal to zero, and we want that to hold not just for P1 or P2, but for all of them: j can be one or two or three or any of the possible states the molecule can be in. So we just need to take these derivatives. I'll write this out: the derivative with respect to P sub j of negative the sum of all the P log P's, P i log P i summed over all the i's, that's the dF dPj part, and I want to subtract lambda times the derivative with respect to P sub j of the constraint, which is just the sum of all the probabilities. That's the thing we want to be equal to zero. Let's stop and make sure we consider how to do this, because if you haven't seen it very many times before, taking the derivative of summation notation might look a little intimidating. But really, all this means, and I'll write this out in a separate step for one of these j's, let's say j equals three, is: the derivative with respect to P3 of this sum is, for example, d dP3 of P1 plus P2 plus P3 plus P4.
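As a quick sanity check on the derivative used in this step, here is a short Python sketch (not part of the lecture) that compares a numerical finite-difference derivative of F = -Σ P_i ln P_i against the analytic result -ln P_j - 1:

```python
import math

def F(p):
    """Entropy-like objective: F = -sum(p_i ln p_i)."""
    return -sum(pi * math.log(pi) for pi in p)

# Arbitrary probabilities; they need not be the optimal ones for this check.
p = [0.2, 0.3, 0.5]
j, h = 1, 1e-6

# Central finite-difference partial derivative dF/dP_j, others held fixed.
p_plus = p.copy();  p_plus[j] += h
p_minus = p.copy(); p_minus[j] -= h
numeric = (F(p_plus) - F(p_minus)) / (2 * h)

analytic = -math.log(p[j]) - 1   # the derivative derived in the lecture
print(abs(numeric - analytic) < 1e-6)
```

The derivative of the constraint G = Σ P_i with respect to any P_j is just 1, as the lecture works out next.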
The sum just means add up all the different probabilities: P1, P2, P3, P4 and so on. In the derivative with respect to P3 of that sum, none of those terms matter except the P3 term, and the derivative of P3 with respect to P3 is just one. So this derivative is actually quite simple in form: the derivative of the sum with respect to P sub j is just one. Whichever Pj we pick, it appears in that sum exactly one time with no coefficients. When you take the derivative of summation notation, you're just plucking out the terms that have a Pj in them and differentiating only those terms. Likewise for the first term, the derivative with respect to Pj: the sum has a P1 log P1 term, a P2 log P2 term, and so on, and only the term with Pj in it matters. That's going to give us minus log Pj minus one, the same derivative we've taken a number of times now. The product rule tells us that the derivative of P log P is log P times the derivative of P, which gives the log term, plus P times the derivative of log P, which gives the one, both with a negative sign out front here. So that's after taking the derivatives. Those things must be equal to zero if we're at a maximum, and again, for every j equals one, two, three, and so on. For every one of these states, negative the log of its probability, minus one, minus lambda, has to add up to zero. Rearranging that equation tells us the log of Pj must equal minus lambda minus one. Undoing the log, Pj is e to the minus lambda minus one. This is looking quite similar to the problem we did with the coin flip, the difference now being that we don't have just two options, we have lots of possible options for what each molecule can be. So far so good. Each one of these individual states, number one, or two, or three, and so on, has probability equal to e to the minus lambda minus one. Lambda is the Lagrange multiplier, and we don't need to know its value.
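Collecting the spoken steps above into one line, the stationarity condition for each state j reads:

```latex
\frac{\partial}{\partial P_j}\Big(-\sum_i P_i \ln P_i\Big)
  \;-\; \lambda\,\frac{\partial}{\partial P_j}\Big(\sum_i P_i\Big)
  \;=\; -\ln P_j - 1 - \lambda \;=\; 0
\quad\Longrightarrow\quad
P_j = e^{-\lambda - 1}.
```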
Because the constraint tells us that when I add up all of the P sub i's, P1, P2, P3, and so on, I should get one. The probabilities have to add up to one. So, adding up this value for P1, this value for P2, this value for P3, I'm going to get e to the minus lambda minus one for each of these terms. If I have, let's say, big N different possibilities, state number one, two, three, all the way up to state N, then N different times I'm going to add up e to the minus lambda minus one. So e to the minus lambda minus one, added N times, has to be equal to one, and what that tells us is that e to the minus lambda minus one is equal to one over N. We could continue and solve for the value of lambda, but remember, we don't actually care what the value of lambda is. Knowing that e to the minus lambda minus one equals one over N is enough, since each of these probabilities is equal to e to the minus lambda minus one: P sub one is equal to this, P sub two is equal to this, P sub three is equal to this. All the probabilities have to be one over N, and when I add them up, one over N, N different times, that gives me 100%. So this is the answer to the question: how do I maximize the entropy? What is the most likely outcome if I have a bunch of different states? The entropy is maximized when every state has the same probability, N states each with probability one over N. For example, for the coin flip, if I have two possibilities, heads or tails, the maximum entropy comes when heads happens half the time and tails happens half the time. If I were to roll a six-sided die, what maximizes the entropy is if one out of six of my rolls comes up one, one out of six comes up two, one out of six comes up three, and so on. Six different outcomes, probability of each one is one over six.
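For the six-sided die, we can spot-check this conclusion numerically. This sketch (illustrative, not from the lecture) compares the entropy of the uniform distribution against many randomly generated probability vectors and confirms none beats it:

```python
import math
import random

def entropy(p):
    """S = -sum(p_i ln p_i), in units of k_B."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

N = 6
uniform = [1 / N] * N
S_uniform = entropy(uniform)   # equals ln 6 for the uniform case

# Compare against randomly chosen probability vectors that sum to one.
random.seed(0)
for _ in range(1000):
    w = [random.random() for _ in range(N)]
    total = sum(w)
    p = [wi / total for wi in w]
    assert entropy(p) <= S_uniform + 1e-12

print(round(S_uniform, 4), round(math.log(N), 4))
```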
So we've done a fair amount of work to derive something that seems pretty obvious, but we've now guaranteed that this is in fact what maximizes the entropy: the most likely outcome is that half my coin flips will be heads and half will be tails. But notice that it doesn't seem to answer some of the more interesting questions. We know, for example, that butane molecules, one of the examples we've been using, at room temperature have a 68% chance of being in the anti conformation, 16% gauche plus, and 16% gauche minus. That is not one-third, one-third, one-third for the three different possibilities. Why isn't it more likely that the butane molecules will have one-third probability of being in each conformation? It seems from this math as if that would maximize the entropy, but that's not what happens in the real world. So we're not yet able to predict what's going to happen for interesting chemical systems like butane, only for relatively boring things like coins and dice. And the reason is that we've only included the probability constraint, the guarantee that the probabilities have to add up to 100%. The thing that makes butane more likely to be in the anti conformation than the gauche conformations is something to do with its energy, and we haven't included any constraints about the energy. So the next step we'll take is to understand how to predict what's going to happen when there are constraints on the energy of the system.
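We can see the mismatch numerically. This small sketch, using the butane populations quoted in the lecture, shows that the observed distribution has lower entropy than the uniform one, so entropy maximization with only the sum-to-one constraint cannot be the whole story:

```python
import math

def entropy(p):
    """S = -sum(p_i ln p_i), in units of k_B."""
    return -sum(pi * math.log(pi) for pi in p)

# Room-temperature butane conformer populations from the lecture:
butane = [0.68, 0.16, 0.16]      # anti, gauche+, gauche-
uniform = [1/3, 1/3, 1/3]

S_butane = entropy(butane)
S_uniform = entropy(uniform)     # ln 3, the three-state maximum

# The uniform distribution has strictly higher entropy, so something
# else (an energy constraint) must be shaping the real populations.
print(S_butane < S_uniform)
```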