So let's consider a system I can put in many states, like a die. Well, that gets a bit complicated, because a die can only be in one state at a time. So to make things slightly easier, I'm going to consider many systems and think about how many ways I can organize them. That too is difficult because, again, I don't have anything to start from. So I will start with a small analogy. Assume that I have six matches and I would like to know how many ways I can organize them so that I have four on this side and two on that side. That is not entirely trivial. Let's forget about the four and two first and ask: how many ways can I organize six matches if the order matters? Well, first I can pick any of the six, then any of the remaining five, then any of four, three, two, and one. So that's going to be 6 factorial, written 6!. But that was just the total number of ways I could order them, right? How many ways could I order the four here? Those orderings all correspond to the same state. It's the same reasoning there: that would be 4!, and on that side I would have 2!. So if I'm only interested in the number of ways I can divide them this way, it is 6!/(4! · 2!) = 15 (the first sketch below checks this number).

And I'm going to do exactly that now, but for N systems. So let's assume that I have states here: state 1, state 2, all the way to state j. The probability of each state is its weight: w_1, w_2, ..., w_j. And if I have a total of (uppercase) N systems, the number of systems I have in state 1 is going to be n_1 = N · w_1, and so on. So then I'm going to have n_1 systems there, n_2 systems there, and n_j systems there. We're just defining things, but defining things is important if you want to solve something.

That means that the number of ways (again, we're going to use P, because P is the probability I'm after) is proportional to the number of ways I can organize this arrangement of states. And based on what I just said for the matches, I'm going to argue that this is N!, the total number of ways I can order all those systems, divided by n_1!, n_2!, and so on, all the way to n_j!:

P ∝ N! / (n_1! · n_2! · ⋯ · n_j!)

The factorial function is a complicated one. It's not easy to work with, and I can't really do anything with this immediately. But remember that we're talking about large numbers in general in statistical mechanics, right? And for large arguments there is something called Stirling's formula that we can use to approximate the factorial. To save space, I'll write that up here, reusing n: for large n,

n! ≈ (n/e)^n,

where e is the base of the natural logarithm. And then I can introduce that here. First, the numerator becomes (N/e)^N, but instead of N in the exponent I should be able to write the sum of all the lowercase n's, which is going to save me one step: n_1 + n_2 + ⋯ + n_j. And then I'll just write out the lowercase n's in the denominator: (n_1/e)^{n_1} · (n_2/e)^{n_2} · ⋯ · (n_j/e)^{n_j}. I know it doesn't look simple, but it is.
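As a quick sanity check of that counting argument, here is a minimal Python sketch that compares 6!/(4! · 2!) against a brute-force enumeration of the arrangements:

```python
from math import factorial
from itertools import combinations

# Analytic count: 6! / (4! * 2!) ways to split six matches into a
# group of four and a group of two.
analytic = factorial(6) // (factorial(4) * factorial(2))

# Brute force: enumerate every choice of which 4 of the 6 matches
# end up on the left side.
brute_force = sum(1 for _ in combinations(range(6), 4))

print(analytic, brute_force)  # both print 15
```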
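And to see why Stirling's formula is safe to use here, a small numerical check compares ln(n!) with the estimate ln((n/e)^n) = n ln n − n; the relative error shrinks as n grows, and in statistical mechanics n is astronomically large:

```python
from math import factorial, log

# Compare ln(n!) with Stirling's estimate n*ln(n) - n.
for n in (10, 100, 1000):
    exact = log(factorial(n))
    stirling = n * log(n) - n
    print(n, round(exact, 1), round(stirling, 1),
          f"relative error = {(exact - stirling) / exact:.4f}")
```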
Do you see the trick here? In the numerator we have e^{n_1} · e^{n_2} and so on, since e^N = e^{n_1 + n_2 + ⋯ + n_j}, and in the denominator I also have e^{n_1}, e^{n_2}, and so on. So I can simply strike those out; they cancel each other. It's still not the world's most beautiful expression, but let's see where we get, continuing on the equals sign. This corresponds to

(N/n_1)^{n_1} · (N/n_2)^{n_2} · ⋯ · (N/n_j)^{n_j}.

There are some things I can use if I go back to my definitions of the weights. N/n_1 is the inverse of the first weight, so I'm going to write it as 1/w_1. And instead of n_1 in the exponent, remember that n_1 was the total number of systems multiplied by the weight: N · w_1. I know it doesn't look easier, but it will be. The same goes for every factor: 1/w_2 raised to the power N · w_2, all the way up to 1/w_j raised to the power N · w_j.

And now it's payoff time, because now I'm going to assume one system. Again, it was just the derivation that was easier when I considered N systems. With one system, N = 1, and I get a much nicer expression:

P ∝ (1/w_1)^{w_1} · (1/w_2)^{w_2} · ⋯ · (1/w_j)^{w_j}.

It still doesn't look particularly beautiful or simple. But it wasn't the probability I was after here. Rather, let's do the usual derivation of entropy and see what the entropy looks like. Entropy is S = k ln Ω, or k ln P in this case. If I just use the logarithm laws here, this simplifies to something very nice. The product becomes a sum, right? And do you notice how all the w's are in the denominator? I can flip each one around, one over it, and just add a minus sign in front of the logarithm. So this is going to be minus k times the sum over all i of ln(w_i^{w_i}). And then I'll use the logarithm laws one final time and put the exponent in front:

S = −k Σ_i w_i ln w_i.

So the entropy of this system corresponds to a sum over all the states of the weight of each state multiplied by the logarithm of the weight of that state, with a minus sign in front. In other words, we recover the entropy as S = k ln Ω, with ln Ω here equal to minus that sum over the weights.

This is pretty cool, because it corresponds to the definition of entropy that Claude Shannon introduced in information theory in 1948. So there is a very close connection between entropy the way we interpret it in physics and statistical mechanics on the one hand, and entropy in information theory and data science on the other. It's exactly the same type of entropy, the same order versus disorder, that we're talking about.

And armed with this, we can in principle calculate the entropy. The only problem is that to calculate it, I need the sum over all the states in the system, and I need to know the weight of each state, that is, how much of its time the system spends in that state. For any even semi-realistic simplified model, the only way we can typically do that is with a computer simulation that samples all these states and finds out what the weights are.
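Once the weights are known, the final formula is trivial to evaluate. Here is a minimal Python sketch (the helper name and the example distributions are my own, with k set to 1):

```python
from math import log

def gibbs_shannon_entropy(weights, k=1.0):
    """S = -k * sum_i w_i * ln(w_i) for normalized weights w_i."""
    # States with w_i = 0 contribute nothing, since w*ln(w) -> 0 as w -> 0.
    return -k * sum(w * log(w) for w in weights if w > 0)

# A uniform distribution over 4 states gives the maximum entropy, k*ln(4).
print(gibbs_shannon_entropy([0.25, 0.25, 0.25, 0.25]))  # 1.386... = ln(4)

# A sharply peaked distribution (the system almost always sits in one
# state) gives a much lower entropy: less disorder.
print(gibbs_shannon_entropy([0.97, 0.01, 0.01, 0.01]))  # 0.167...
```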
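To illustrate that last point about simulations, here is a toy sketch of the idea (the three-state system and its weights are invented purely for demonstration): sample states, estimate each weight as the fraction of time spent in that state, and plug the estimates into the formula.

```python
import random
from collections import Counter
from math import log

# Invented three-state system; in a real simulation these weights would
# be unknown and would emerge from the dynamics.
true_weights = {"A": 0.6, "B": 0.3, "C": 0.1}

# "Simulate" by sampling 100,000 visits according to the true weights.
visits = random.choices(list(true_weights),
                        weights=list(true_weights.values()), k=100_000)

# Estimate w_i as the fraction of time spent in state i.
counts = Counter(visits)
estimated = {s: c / len(visits) for s, c in counts.items()}

# Entropy from the estimated weights (k = 1); the true value is about 0.898.
S = -sum(w * log(w) for w in estimated.values())
print(estimated, S)
```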