Dynamics. Thank you. Okay, so today will be a blackboard presentation, because I think it is better for this kind of material. You have written notes that you can download from the website, which are the ones I am going to follow, but yesterday was an introduction to the topic and today we are going to introduce mathematical concepts, which work better on the blackboard. So this is lesson 2, which is information theory: basic aspects and basic concepts of information theory. Information theory is essentially the theory developed by Claude Shannon in 1948. He wrote his famous work, published in two parts, and the title is actually "A Mathematical Theory of Communication". So it is a theory of communication channels, but he introduced the idea of information, and there is some confusion with the concept of information that I want to address.

The setup is this: you have a random variable, let us say a discrete random variable, and this random variable has a distribution, let us call it p_X(x), the probability that my random variable takes the value x. Usually we reserve X for continuous variables, but here I will use X anyway. In mathematics we also indicate the random variable by a subindex, because the argument x is just a dummy variable that you can replace by a number, or by y, or by x squared, or by whatever; the subindex tells you that this is the probability distribution of the variable X, and p_X(x) is the probability that X is equal to the small x, so you can replace the small x by five, six, or whatever. In physics we often omit the subindex, because the letter itself tells you which variable you are talking about.

So, if you have this situation, a random variable with this distribution, the Shannon entropy, or Shannon uncertainty, is defined as H(X); we will also write H(p_X), sometimes indicating the dependence on the random variable and sometimes on the distribution. It is defined as H(X) = -sum_x p(x) log p(x), and probably you have seen this formula: it is the same formula as the Gibbs entropy in statistical mechanics. I prefer to call this uncertainty rather than entropy, because entropy is a little bit confusing: people can confuse this Shannon entropy with the thermodynamic entropy. We will see that one of the main goals of the thermodynamics of information is to prove that the Shannon entropy is equal, or can be equal in some situations, to the thermodynamic entropy, but do not take this for granted; it is a common mistake in thermodynamics and information theory to simply take this as the thermodynamic entropy. Actually, I think Shannon's original idea was to call this uncertainty, and Shannon himself told a story about it; at the end of his life he said it was a lie, but Shannon was a character and had a sense of humor. The story is that when he wrote the paper he asked von Neumann: what should I call this magnitude? And von Neumann told him:
Call it entropy, for two reasons: first, because Gibbs already used this formula for the entropy, and second, because nobody knows what entropy really is, so you will win any argument. And he is right that nobody really knows what thermodynamic entropy is; we know how to use it, but it is a controversial concept. So let us call it entropy.

The sum runs over all the values of x where p(x) is different from zero. If p(x) is zero the term is ill defined, but the limit of epsilon log epsilon when epsilon goes to zero is zero, so you can safely extend the definition; it is continuous in p, in the sense that outcomes with probabilities tending to zero contribute nothing.

I usually write the definition with a generic logarithm, in any base a, and I also multiply by a constant K, so my definition of entropy is H(X) = -K sum_x p(x) log_a p(x). This allows different choices of units. Without the constant K the entropy is dimensionless, which is the original Shannon definition, but usually we attach units to it. If K = 1 and a = 2, which is the original choice, we call the result the entropy in bits; the bit is the unit. If the logarithm is the natural, Napierian logarithm, base e, we say the entropy is expressed in nats. And if K is the Boltzmann constant k_B and the logarithm is the natural one, then the unit is the unit of k_B, which is energy divided by temperature, joules per kelvin. When K is the Boltzmann constant we will use the letter S instead of H. I am writing a book on this and I spent about a week thinking about the best notation; at the beginning I had two letters, H for the Shannon entropy and S for the Shannon entropy expressed in units of k_B, and after a week of thinking I decided to use just S, because it is the same quantity expressed in different units.
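To make the unit choices concrete, here is a minimal sketch in Python of the one-bit uncertainty of a fair coin expressed in the three unit systems just mentioned; the numerical value of k_B is the standard CODATA constant, and everything else follows from the definition above.

```python
import math

k_B = 1.380649e-23  # Boltzmann constant in J/K

# Uncertainty of a fair coin (p = 1/2, 1/2) in the three unit systems.
H_bits = 1.0                    # K = 1, logarithm base 2
H_nats = H_bits * math.log(2)   # K = 1, natural logarithm: ln 2 ≈ 0.693 nats
H_JK = k_B * math.log(2)        # K = k_B, natural logarithm: ≈ 9.57e-24 J/K

print(H_bits, H_nats, H_JK)
```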
So we have three types of units to express uncertainty: bits, nats, and joules per kelvin. Bits are very convenient because then the Shannon uncertainty has a very simple interpretation, and this is what you have to keep in mind: H(X), expressed in bits, is the number of bits that you need, on average, to describe X. For instance, if X is left or right, like yesterday in the Szilard engine, or if we toss a coin and X is the outcome, with probability one half heads and one half tails, then p(x) = 1/2 if x is left and 1/2 if x is right, and the uncertainty is H = -(1/2) log_2(1/2) - (1/2) log_2(1/2) = 1 bit. So any binary random variable that takes its two values with the same probability one half is a random bit, and if instead of left and right you label the values zero and one, a bit in Shannon's theory is the uncertainty of a random binary digit that takes the value zero with probability one half and the value one with probability one half. We will see that this is general: H(X) is the average number of bits needed to describe X. If X is binary and unbiased I need just one bit, but you can have more complicated situations. I also like to illustrate this as the number of yes/no questions you need to guess X: if you are playing one of those games where you think of a character, a person, and I have to ask only yes/no questions to figure out who the person is, then H of that random variable, the person, is the average number of questions I have to ask; and the number of questions is the number of bits, because each yes/no answer is one bit. In the notes you have a proof of this, but I will skip it here because I want to do the general case later.

You can also define this entropy in quantum mechanics. We are not going to work much with quantum mechanics, but there the equivalent of a probability distribution is the density matrix rho, and the entropy of a density matrix is S(rho) = -Tr(rho log rho). This is known as the von Neumann entropy, or quantum entropy. The trace plays the same role in quantum mechanics as the sum in classical systems: instead of a sum of p log p you have a trace, and in the basis where the density matrix is diagonal the Shannon entropy and the von Neumann entropy are the same thing.
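As a small numerical companion to the coin example and to the remark about the eigenbasis, here is a sketch in Python; the probability values and the density matrix are purely illustrative.

```python
import numpy as np

def shannon_entropy_bits(p):
    """H = -sum p log2 p, skipping zero-probability outcomes."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

print(shannon_entropy_bits([0.5, 0.5]))    # 1.0 bit: a random bit (coin, left/right)
print(shannon_entropy_bits([1.0, 0.0]))    # 0.0 bits: no uncertainty
print(shannon_entropy_bits([0.25] * 4))    # 2.0 bits: two yes/no questions on average

# von Neumann entropy: the same formula applied to the eigenvalues of rho,
# since the trace becomes an ordinary sum in the diagonalizing basis.
rho = np.array([[0.5, 0.0],
                [0.0, 0.5]])
print(shannon_entropy_bits(np.linalg.eigvalsh(rho)))   # 1.0 bit again
```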
The von Neumann entropy is more general, but now let us think about the following problem: what is the relationship between Shannon entropy and thermodynamic entropy? This is the most subtle issue in this course, and it is something that you have to prove. There is an approach by Jaynes, used even in textbooks of statistical mechanics, to derive the ensembles of statistical mechanics, which is to start with the Shannon entropy and maximize it under some constraints. Maybe in some of your universities you have used this approach. It is okay, but you are implicitly assuming a meaning for the Shannon entropy that it is not clear can be applied to physics. I do not like this approach: first, because it does not connect statistical mechanics with mechanics, which I think is essential to understand statistical mechanics; and second, because it implicitly assumes properties of the Shannon entropy, namely that it must be maximized in some situations, which is not clear at all. It is more of an epistemological approach, and I think it has some problems. So, and this is very important, we cannot take for granted that the Shannon entropy is equal to the thermodynamic entropy; we have to prove it.

In equilibrium it is easy to prove. For a physical system in equilibrium, and this is important, with a discrete set of states, the equilibrium probability of a state i is rho_i = exp(-beta E_i)/Z, where E_i is the energy of state i and Z is the partition function. If you calculate the Shannon entropy, let us now use the letter S, with K equal to the Boltzmann constant, S = -k_B sum_i rho_i ln rho_i, then the logarithm of rho_i is ln rho_i = -beta E_i - ln Z, and remember that beta is 1/(k_B T). The term with beta E_i gives sum_i rho_i E_i divided by T, which is the average energy divided by T, and the term with ln Z gives k_B ln Z. In statistical mechanics the equilibrium free energy is F = -k_B T ln Z (we will later define something called the non-equilibrium free energy, but this is the equilibrium one), so k_B ln Z = -F/T, and putting everything together S = <E>/T - F/T.
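A quick numerical check of this identity, a sketch with a made-up four-level spectrum and units where k_B = 1; both the spectrum and the temperature are arbitrary choices for illustration.

```python
import numpy as np

k_B = 1.0                             # units where k_B = 1 (illustrative choice)
T = 0.7
beta = 1.0 / (k_B * T)
E = np.array([0.0, 1.0, 2.5, 4.0])    # made-up discrete energy levels

Z = np.sum(np.exp(-beta * E))         # partition function
rho = np.exp(-beta * E) / Z           # canonical distribution

S_shannon = -k_B * np.sum(rho * np.log(rho))   # Shannon entropy with K = k_B
E_avg = np.sum(rho * E)                        # average energy
F = -k_B * T * np.log(Z)                       # equilibrium free energy

print(S_shannon, (E_avg - F) / T)     # the two numbers coincide
```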
So S = (<E> - F)/T, or equivalently F = <E> - TS, which is consistent with the definition of entropy in thermodynamics: you can define the entropy from the free energy or the free energy from the entropy, and in any case you can reproduce the mathematical structure of thermodynamics in the canonical ensemble using the Shannon entropy as the definition of entropy. So in equilibrium the Shannon entropy is equal to the thermodynamic entropy. But not in general; in particular, not out of equilibrium.

One can ask: is it always true that I can use the Shannon entropy as the thermodynamic entropy? There is a lot of discussion about that. For instance, the Shannon entropy has a problem in the following situation. Take the phase space of a physical system; each point is the set of positions and momenta (q, p) of all the particles, for instance of a gas. This is how we describe the microstate of the system in mechanics, and you have a probability distribution rho(q, p; t) that evolves in time under the Hamiltonian dynamics. Then it is easy to prove that the Shannon entropy of this rho is constant; I do not know if you have done this exercise, but it is a typical exercise in statistical mechanics. It is always constant, it does not depend on the process; it is a theorem that is easy to prove. One of you objects that rho itself is not constant; that is right, rho is not constant, rho satisfies the Liouville equation. The simplest case is this: if rho is, for instance, a Gaussian centered on some microstate, that microstate has a crazy dynamics and the probability distribution will evolve in a very complicated way; but if you compute the Shannon entropy of rho, it is constant. The proof is not so difficult: you take the time derivative of the entropy, use the Liouville equation for the time derivative of rho, and work with the properties of the Liouvillian, and you find that the entropy is constant; I do not want to go through it here. This is the main objection that people use to say that the Shannon entropy is not a thermodynamic entropy: in an irreversible process the entropy should increase, and here it never increases. How do people fix this? What you say is that, even though the entropy of rho is constant, if you start with a rho concentrated around some microstate, the evolution becomes very complicated, and then there is some kind of coarse-graining: what a macroscopic observer sees is not rho in all its details but a smoothed rho, and when you smooth, or coarse-grain, rho, it is easy to prove that the Shannon entropy increases. This is how people make the Shannon entropy and the thermodynamic entropy compatible. A student asks: so when you do not have a time-dependent Hamiltonian, rho is a constant of the motion, right?
When you do not have a time-dependent Hamiltonian, the energy is constant, but rho in general is not: rho depends on the initial condition. Rho is a constant of the motion only if it is a function of the Hamiltonian, which is the case in equilibrium, but not out of equilibrium; out of equilibrium rho can be whatever probability density, otherwise we would not have non-equilibrium statistical mechanics at all. What Liouville's theorem tells you is about volumes in phase space: the volume in phase space is conserved. If you start with a bunch of points in some region and let them evolve, they evolve in a very complicated way, but the volume of the evolved set is equal to the volume of the initial set; that is Liouville's theorem. The set itself changes in time and gets deformed, and the probability distribution, if it is for instance uniform on the initial set, remains uniform on the evolving set, but it does evolve. In quantum language you would say rho is constant only if it commutes with H, not merely because H is time-independent. And yes, that is the case of a system in equilibrium; here, in the canonical ensemble, the system is in equilibrium with a thermal bath, but you can extend this: you can prove that the Shannon entropy is equal to the thermodynamic entropy for any equilibrium state, not only the canonical but also the microcanonical and the grand canonical. Another question was whether it makes a difference if the distribution is Gaussian: if you take a Gaussian initial condition it does not even remain Gaussian; a Gaussian remains Gaussian only if the evolution is linear, and a generic Hamiltonian evolution is not. Okay, these are the typical arguments in the discussion of whether the Shannon entropy can be identified with the thermodynamic entropy, but I will have another one. First, a student asks me to repeat the argument: why is the evolution of the entropy different if you take the exact distribution in phase space or if you coarse-grain it?
What I said is the following; this was a side remark and I did not want to go into the details, because you need the Liouville equation and Liouville's theorem, but the idea is this. No matter how you start, the initial rho can be a delta on a microstate, a Gaussian around a microstate, or uniform on some region and zero elsewhere; under the Hamiltonian evolution this rho develops a very complicated form, but its Shannon entropy is constant. This is a theorem and it is easy to prove, and it holds even for a time-dependent Hamiltonian. So the uncertainty of the exact rho is constant, but rho becomes so complicated that any coarse-graining, any smoothing of the function, increases the Shannon entropy. The proof that coarse-graining increases the entropy, which is what you are asking for, I will give later on, but let me give you an intuitive explanation. The typical evolution of a set under Hamiltonian dynamics is to create a lot of filaments, a very complex structure. For a uniform distribution the Shannon entropy is essentially the logarithm of the volume of its support, and by Liouville's theorem this volume is conserved, so the entropy is constant. But a macroscopic observer cannot distinguish all those filaments; what the macroscopic observer sees is a kind of blurred set, and the volume of that blurred set is always bigger. This is the intuitive idea, and you can make it precise not only for volumes but also for probability distributions.

I am happy with this discussion, because one of the goals of the course is to make you reflect on the differences between Shannon entropy and thermodynamic entropy. For me this is a good argument that the Shannon entropy by itself cannot be a measure of thermodynamic entropy unless you prescribe some degradation of the information contained in rho. But there is also a very simple argument. Suppose you have a potential, say a Brownian particle in an optical tweezer, a harmonic oscillator; the equilibrium distribution is rho(x) proportional to exp(-beta m omega^2 x^2 / 2). Now I take the particle, equilibrated with exactly this distribution, and I suddenly move my optical trap. It is clear that the new situation is out of equilibrium: the particle's distribution is now very far from the equilibrium distribution of the displaced trap, and from this situation I can extract energy, work, whatever I like. And yet the Shannon entropy of the distribution is the same here and there. So it is clear that the Shannon entropy by itself is not something that tells you how far a system is from equilibrium; you need a combination of rho and the Hamiltonian.
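Coming back to the coarse-graining claim a moment ago, here is a toy discrete sketch in Python, not the phase-space proof, of the statement that smoothing a distribution can only increase its Shannon entropy; the block size and the random fine-grained distribution are arbitrary choices, and the inequality follows from the concavity of -p log p.

```python
import numpy as np

def shannon_nats(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

rng = np.random.default_rng(0)
p_fine = rng.random(64)
p_fine /= p_fine.sum()          # a complicated fine-grained distribution on 64 cells

# Coarse-graining: replace the probability of each cell by the average over its
# block of 8 neighbouring cells (the observer cannot resolve the filaments).
p_coarse = p_fine.reshape(8, 8).mean(axis=1).repeat(8)

print(shannon_nats(p_fine), shannon_nats(p_coarse))   # the second is never smaller
```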
Here rho is the distribution and the Hamiltonian is the potential, and the Shannon entropy alone is not enough to tell you what the thermodynamic state of the system is. We will come back to this example: we will use the Shannon entropy as a thermodynamic entropy, but combined with the Hamiltonian, and we will see that what makes sense for systems out of equilibrium is actually the free energy, not the entropy alone; I am just announcing what we will see tomorrow.

There was a question about the von Neumann entropy. The von Neumann entropy is indeed more general than the Shannon entropy, but for quantum systems it plays exactly the same role, and it has the same problem: for example, the von Neumann entropy is constant under Hamiltonian, unitary evolution. So everything we have said about the Shannon entropy of classical systems also applies to the von Neumann entropy. Some people working on stochastic thermodynamics in and out of equilibrium relate it to the entropy production, and we are going to see that kind of relation, but the equivalence is between the von Neumann and the Shannon entropy, not between the von Neumann and the Boltzmann entropy; there is no Boltzmann entropy for quantum systems. We will see this tomorrow, when we do stochastic thermodynamics and apply all of this to thermodynamics. So far, the only thing I want is to make you think: maybe you have studied before that the Shannon entropy simply is the thermodynamic entropy, and I want to tell you that in general this is not true.

Another question: the argument was that the entropy is constant under a Hamiltonian evolution but should increase, and the answer was that the observer sees some volume without its precise details, so it looks bigger; physically that is fine, but mathematically, why should the observer see a bigger volume? You are right that there is no purely mathematical formulation of that step. It is a kind of epistemological argument: you have to include in the theory a degradation of information due to the macroscopic nature of the observer, or something like that, and this is to some extent subjective, or arbitrary. There was also a question about whether the Shannon entropy still plays a role in the optical-trap example.
Yes, in this case it could play a role, but what I tried to say is that when you equilibrate the Brownian particle in the optical tweezer and suddenly move the trap, the entropy of rho is the same, yet clearly the system has departed from equilibrium, and the Shannon entropy cannot reflect this. When the system is out of equilibrium you can do a lot of things, you can extract work from it; it is a completely different situation from the equilibrium one, and even so the Shannon entropy cannot reflect the difference. Okay. All this discussion is very interesting, but it is really about the interplay between information theory and thermodynamics; now let me focus on information theory, because lesson 2 is information theory.

We have introduced the Shannon entropy, and as I said, the best way to picture it is as the number of bits you need to describe something; this is what you have to keep in mind. Let us introduce the second concept, which is called mutual information. The mutual information is defined for two variables, and it is defined as

I(X;Y) = sum_{x,y} p(x,y) log [ p(x,y) / ( p(x) p(y) ) ],

where p(x,y) is the joint distribution and p(x), p(y) are the marginals, the probability of finding x and the probability of finding y; I will keep a generic logarithm. You can write the mutual information in different ways. The joint distribution divided by a marginal is a conditional distribution, p(x,y) = p(x|y) p(y) = p(y|x) p(x), so for instance, combining these, you can also write

I(X;Y) = sum_{x,y} p(x,y) log [ p(x|y) / p(x) ],

and you can use these two expressions to write the mutual information in terms of Shannon entropies. Take the first form and expand the logarithm: the term with log p(x,y) gives minus the entropy of the joint distribution, -H(X,Y); the term with log p(x) does not depend on y, so when I sum over y I get the marginal, and sum_x p(x) log p(x) is -H(X); the same happens with log p(y), which gives -H(Y). Since these last two terms enter with a minus sign, the result is

I(X;Y) = H(X) + H(Y) - H(X,Y).

If instead you take the second form, the term with log p(x) gives H(X) as before, and what remains is the quantity

H(X|Y) = -sum_{x,y} p(x,y) log p(x|y),

which is called the conditional entropy, so the mutual information can also be written as I(X;Y) = H(X) - H(X|Y). And of course the definition is symmetric under the exchange of X and Y, so we can also write I(X;Y) = H(Y) - H(Y|X). Why this definition? If you think of the first formula: H(X) is the number of bits you need to describe X by itself, H(Y) is the number of bits you need to describe Y by itself, and H(X,Y) is the number of bits you need to describe the pair. So the mutual information is the number of bits that you save, when you describe the pair, by taking the correlations into account.
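To see that the equivalent expressions agree numerically, here is a small sketch with a made-up joint distribution of two binary variables; the numbers are arbitrary.

```python
import numpy as np

def H(p):
    """Shannon entropy in bits of a (possibly joint) distribution."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

p_xy = np.array([[0.40, 0.10],     # made-up joint distribution p(x, y)
                 [0.15, 0.35]])
p_x = p_xy.sum(axis=1)             # marginal of X
p_y = p_xy.sum(axis=0)             # marginal of Y

I_def = np.sum(p_xy * np.log2(p_xy / np.outer(p_x, p_y)))   # definition
I_ent = H(p_x) + H(p_y) - H(p_xy)                           # H(X) + H(Y) - H(X,Y)
H_x_given_y = H(p_xy) - H(p_y)                              # conditional entropy H(X|Y)
I_cond = H(p_x) - H_x_given_y                               # H(X) - H(X|Y)

print(I_def, I_ent, I_cond)        # the three expressions coincide
```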
The mutual information is positive, by the way, so you need fewer bits to describe the pair than to describe X and Y independently. So the mutual information is a measure of the correlation between X and Y. You can see this in the first definition: if X and Y are independent, the joint distribution factorizes and the mutual information is zero. So I = 0 if X and Y are independent; this is the first important property. But the second formula is even more important. Suppose you have a random variable X and you ask a question about X to somebody who knows X. Say X is a number between one and a hundred, and I ask: is the number odd or even? You answer, but you can lie. Yesterday, in the game of yes/no questions, you always told the truth, and then the number of questions I need is H. But suppose you can lie. Then by how much is my uncertainty reduced? The mutual information tells you how much the uncertainty is reduced when you ask a question about X; I will show you this now. So you have X and you ask a question, and the answer to the question is Y; Y can be yes or no, or it can be anything. It is exactly the same if you have a physical system and you make a measurement, and Y is the outcome of the measurement. So you have a random variable, you inquire about it, and you get an answer Y; but this answer, or the outcome of the measurement, can be subject to errors. What you have is a conditional probability p(y|x), the probability that the answer is y when the system is in state x. If the measurement is error-free, or if you never lie when I ask the question, then p(y|x) is deterministic, one or zero: if I ask whether the number is odd or even and the number is seven, the answer "odd" has probability one. But suppose there is some probability of lying, which is the same as an error in a measurement. What is my uncertainty after the question? After the question I have to update my probability; this is Bayesian statistics. The updated distribution, incorporating the information from the answer or the outcome of the measurement, is given by Bayes' formula, which I think you know: the conditional probability is the joint probability divided by the probability of the condition,

p(x|y) = p(x,y) / p(y) = p(y|x) p(x) / p(y),

where p(y|x) is what characterizes your measurement apparatus. This is my new probability distribution. What is its uncertainty? It is H(X|Y=y) = -sum_x p(x|y) log p(x|y), and it depends on the outcome y.
Of course, some outcomes are more informative than others: if you think of a number from zero to ten and I ask "is the number ten?", a "yes" gives me a lot of information, while a "no" gives me very little. So the information depends on the outcome. But usually I am interested in averages, so I take the average of this uncertainty over all possible answers or outcomes, with weight p(y). This is called the conditional uncertainty, or conditional entropy:

H(X|Y) = sum_y p(y) H(X|Y=y) = -sum_{x,y} p(x,y) log p(x|y).

So the conditional entropy is the average number of bits that I need to describe X once I have incorporated the information from the measurement, once I have updated my probability distribution. And now this formula has a very nice interpretation; let me rewrite it as

H(X|Y) = H(X) - I(X;Y).

You see now the deep meaning of the mutual information. I have a random variable, a situation with some randomness: the state of a system, the number you have thought of, whatever. It has some uncertainty, H(X). Now I inquire about X: I make an experiment, I ask a question, and I obtain Y, the answer to my question or the outcome of my measurement. This information decreases the uncertainty, and that is exactly what the formula expresses: the conditional entropy H(X|Y), the uncertainty after the measurement, equals the uncertainty before the measurement minus the mutual information. So the mutual information is the reduction of the uncertainty when I ask a question or make a measurement. This is the main meaning of mutual information. Some people call H "information"; some people think the Shannon information is -sum p log p. No: that is the uncertainty. Information is what you get; information is the reduction of the uncertainty, and the true measure of information is I, the mutual information, the information that Y provides about X. And it turns out to be symmetric, as we have just seen, so it is also the information that X provides about Y. You can prove that I is always non-negative; I will not prove it, it is a simple proof using any of these formulas. So whenever you ask a question, you really do get a reduction of the uncertainty on average, unless the person answering is a complete liar who just invents the answer every time, or the measurement error is so big that Y is completely independent of the real state of the system: then the reduction is zero. So if Y and X are independent, I = 0, and Y does not provide any information about X.
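As a toy numerical sketch of this: a binary question answered truthfully except with an assumed probability eps of lying, with the eps value chosen only for illustration.

```python
import numpy as np

def H(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# X: the true yes/no answer (e.g. "the number is even"), each value with probability 1/2.
p_x = np.array([0.5, 0.5])

# Assumed error model: the reply Y repeats the truth with probability 1 - eps
# and lies with probability eps.
eps = 0.1
p_y_given_x = np.array([[1 - eps, eps],
                        [eps, 1 - eps]])     # rows: x, columns: y

p_xy = p_x[:, None] * p_y_given_x            # joint p(x, y)
p_y = p_xy.sum(axis=0)                       # distribution of the reply

# Bayes update for each outcome, then average the remaining uncertainty.
H_cond = sum(p_y[y] * H(p_xy[:, y] / p_y[y]) for y in range(2))

I = H(p_x) - H_cond                          # reduction of uncertainty
print(H(p_x), H_cond, I)                     # 1 bit, ≈ 0.47 bits, ≈ 0.53 bits
```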
The other extreme is an error-free measurement. If the measurement is error-free, you can get the mutual information from the other expression, I(X;Y) = H(Y) - H(Y|X). If the answer is determined by X and there is no error, then p(y|x) is one or zero; Y is a property of X, for instance, if X is a number, Y could be "it is odd or even", or "it is bigger than one hundred", whatever. Then the conditional uncertainty of Y given X is zero: if I know the state of the system, I know the answer; there is no uncertainty about the outcome in an error-free measurement, because the state of the system determines the outcome. So in an error-free measurement the mutual information is just the uncertainty, the entropy, of the answer or the outcome: I(X;Y) = H(Y). This is why many people confuse mutual information with Shannon entropy, or consider the Shannon entropy to be information. For some purposes, for instance for storing information, the Shannon entropy can be considered a measure of information, but what really measures information is the mutual information: it is I, not H. So whenever you read about "Shannon information" and so on, remember: H is a measure of uncertainty; information is the reduction of uncertainty, which is I, the mutual information. Yes, about the restore-to-zero process: there you have a random variable, the state of the system, the macrostate, which can be zero or one, so the Shannon entropy is one bit; then you reset it to zero and there is no uncertainty, zero bits. So there you have a decrease of entropy, but not because you have measured; it is because you have done something to the system. It is not this scenario, it is another scenario. Let me move to the last part of the class, but first I want you to understand very well the concept of mutual information as the reduction of uncertainty due to a measurement. This is going to be super important for the Maxwell demon, because if we are able to identify the Shannon entropy with the thermodynamic entropy, then a measurement will also decrease the thermodynamic entropy of a system, and the decrease is given by the mutual information. So this is the first interpretation of mutual information: the decrease of uncertainty in a measurement.
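A tiny sketch of the error-free case; the setup (X uniform on one to eight, Y the exact answer to "is X even?") is just an example.

```python
import numpy as np

def H(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# X uniform on {1, ..., 8}; Y answers "is X even?" without any error.
p_xy = np.zeros((8, 2))
for i, x in enumerate(range(1, 9)):
    p_xy[i, int(x % 2 == 0)] = 1 / 8       # p(y|x) is one or zero

p_x = p_xy.sum(axis=1)
p_y = p_xy.sum(axis=0)

I = H(p_x) + H(p_y) - H(p_xy)
print(I, H(p_y))                           # both are 1 bit: I = H(Y) when error-free
```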
The second interpretation, which is also important, is that the mutual information is a measure of correlations. We can rewrite the identity above as H(X,Y) = H(X) + H(Y) - I(X;Y). Suppose you have a system composed of two subsystems, X and Y; this will be important later, when X is the system and Y is the demon. One could think that the entropy is additive, as we sometimes say, that the entropy of the whole system is the entropy of subsystem X plus the entropy of subsystem Y. But no: you have to subtract the mutual information. So the mutual information, which is a measure of correlations because it is zero when the variables are independent, is also what connects the entropy of the global system with the entropies of the subsystems. Correlations between two systems always decrease the total uncertainty: if I somehow destroy the correlations, the entropy of the global system increases, and if I create correlations, the entropy of the global system decreases.

What do I mean by correlation? Correlation is a way of saying that X and Y are not independent. You can measure correlations using the usual statistical correlation, <xy> - <x><y>, which is what we call correlation in statistics, but that is only one way of quantifying it; the mutual information is another way of quantifying the generic fact that one variable depends on the other. There was a question about whether one can compare these decreases of entropy directly. We will see that the mutual information is related to work: to create correlations between two systems you need a work which is kT times the mutual information, so in that sense you can compare. But you can have the same mutual information in two different pairs of systems and everything else different, so I do not think it makes sense to compare them directly. In communication theory the mutual information has a very important role, because it tells you how much information you can transmit through a channel, say by Wi-Fi or by a cable, and there you can compare two channels and say that one is more efficient. Another question: is X something abstract? No, X is the state of the system; the framework is general, but not abstract. X can be a microstate, a mesostate, even a macrostate, whatever you like, but it is a random variable describing the state of the system. And Y is the outcome of a measurement that I perform on the system: there is a system, there is an apparatus, and the apparatus gives you a number, which is Y. Maybe this is also important: one can talk about the information capacity of a system, which is its Shannon entropy, but then, what is information?
Information is something that you receive, and because you receive it, you know more about something. In this case you receive Y, you observe Y, and this observation decreases the uncertainty about the state of the system, which is X. This is the idea. Can this be generalized to several variables, several agents? It is not so straightforward to generalize the mutual information to several variables; I am sure there are papers on that, but we will not need such a general case. There are generalizations to many outcomes, not only binary outcomes, but not to many agents; maybe at the end of the course we can discuss that.

Okay, I want to finish here, because we have to finish a little earlier, with the last concept. The last concept is the relative entropy, or Kullback-Leibler divergence. Now you have a single random variable, but you do not know its probability distribution: the distribution could be P(x) or it could be Q(x), and you do not know which. This is a typical problem in statistics: you have data, realizations of X obtained from experiments. The typical example: X is a number between two and twelve. You go to another room, and I give you two dice, and also a lottery drum with balls labeled from two to twelve; you choose one of the two mechanisms and start giving me numbers. Of course the two distributions are different: for the sum of two dice the distribution is triangular, peaked at seven, while for the lottery it is uniform, every number from two to twelve has the same probability. So you give me numbers, drawn either from P or from Q, and I have to figure out which method you used. This is a typical problem in statistics: you have several models, or one model that depends on a parameter, you have different guesses for the origin of your data, and you have to decide which one it is. The relative entropy, also called the Kullback-Leibler divergence, tells you how difficult it is to solve this problem, and it is defined as

D(P||Q) = sum_x P(x) log [ P(x) / Q(x) ].

You can see that this is zero if P and Q are equal, and bigger than zero otherwise, so it is a measure of how different the two probability distributions are. Some people call it a distance, but it is not really a distance, because it is not even symmetric: D(P||Q) is different from D(Q||P), and you have an exercise to understand why it is not symmetric. The Kullback-Leibler divergence tells you a lot about how distinguishable the two distributions P and Q are. And as with the entropy, you can use natural logarithms or base two, and measure it in nats or in bits.
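Here is a quick sketch computing both divergences, in bits, for exactly this example: the sum of two fair dice against the uniform lottery on two to twelve.

```python
import numpy as np

def D(p, q):
    """Relative entropy D(P||Q) in bits; assumes q > 0 wherever p > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

counts = np.array([1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1])   # ways to get 2, 3, ..., 12
p_dice = counts / 36.0                                  # triangular, peaked at 7
q_lottery = np.full(11, 1 / 11)                         # uniform lottery

print(D(p_dice, q_lottery))   # ≈ 0.19 bits
print(D(q_lottery, p_dice))   # ≈ 0.22 bits: not symmetric
```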
There are more properties in the notes, but the main one is that the relative entropy answers precisely this question about distinguishing one probability distribution from another. This is called Stein's lemma, and it says the following. You go to the other room, choose the dice or the lottery, extract N numbers and give them to me. With the data I can use a maximum-likelihood algorithm, or simply, if N is large, build the histogram: if the histogram is clearly not uniform, the data reveal the origin of the numbers. But how many data do I need, or, equivalently, what is the error probability? Consider the probability of the following error: guessing P when Q is the real distribution. Asymptotically this probability behaves like 2^(-N D), two to the minus the number of data times the relative entropy. So if the relative entropy is very small you need a lot of data, and this is why the relative entropy is a measure of how easy it is to distinguish Q from P. And actually this explains why it is asymmetric: maybe it is easy for me to dress up as, I do not want to be politically incorrect, let us say a lion, so that you could confuse me with a lion, but it is harder for a lion to look like me. So maybe it is easier for P to look like Q than for Q to look like P, and this is why it is asymmetric; you have an exercise to prove it. What I have said is not very rigorous; in the books on information theory you have the precise statement, because you have to bound things carefully, and of course the result is independent of the method, you can use maximum likelihood or other methods. This is just a rough formulation of the theorem: the real distribution is Q, you give me data, my algorithm tells me that the answer is P, and that is the probability of making this error in my guess.

A question: does this have anything to do with the mutual information? In principle no; this is just about how distinguishable two distributions are. Although, of course, the mutual information is the relative entropy between the joint distribution p(x,y) and the product p(x) p(y): the mutual information measures how different the joint distribution is from the product of the marginals. But note the difference: in the mutual information we have two random variables with a joint probability distribution; here we have a single random variable with two possible candidate distributions, which is a completely different situation. In physics we will show that the mutual information has energetic consequences, whereas ordinary statistical correlations are just an empirical quantity; the mutual information has much more physical meaning than the usual correlation of statistics. We will see that it behaves like a free energy, that it is directly related to the work you need to create correlations.
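Using the 2^(-N D) scaling just quoted together with the numbers from the dice example, here is a rough sketch of how many draws would be needed; it is only as rough an estimate as the statement itself, asymptotic and up to sub-exponential factors.

```python
import math

D_bits = 0.19      # D(P_dice || Q_lottery) from the previous sketch, in bits
target = 1e-3      # desired probability of wrongly guessing "dice"

N = math.log2(1 / target) / D_bits
print(N)           # ≈ 52 draws
```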
One more question, about the symmetry of the mutual information: yes, the two expressions are equal, and this is not trivial. The definition is trivially symmetric, but if you think of the mutual information as the reduction of the uncertainty about X once you know Y, that is, the information that Y provides about X, it is surprising that it is equal to the information that X provides about Y. So it is not trivial, but it is true. Okay, so you have a pair of exercises: one on the relative entropy, to understand why it is not symmetric, and the second one is to calculate the mutual information in a Szilard engine when there is a probability of error in the measurement, that is, the measurement apparatus tells you that the particle is on the right but it is actually on the left. So you can combine everything. Tomorrow we will see thermodynamics, and on Thursday we will see how information theory and thermodynamics combine to explain the Maxwell demon, the Szilard engine, and so on. Okay.