Can you hear me now in the chat online? Thank you. Yes, we can hear you now. Thank you. Okay, so the activity for today is this: write down whether you have been using entropy in your research or in your projects, and if yes, how. Once you have it ready, you can stick your note on the board. Okay, and if there are contributions in the chat, I will read them. Okay, that's also possible. You can also mention your field of study.

This one is the same as yesterday. Yes, I know that one. And the other one was about an assumption on the process; it needed computing the mutual information between two sources. I only recently came across this. Oh, there is a paper on this. Yes, I know that paper, that's pretty nice. And this one is using the maximum entropy principle in their systems. Okay. Yes. Great.

So, to mention my own applications: if you look at my publications, more than half of them involve entropy in some way, so that's why I'm giving this lecture now. Statistical physics, information theory, econophysics, data analysis, image processing: it has many, many applications. I see you know these things. Good.

So, sorry, I don't know if you went through this statement in the information theory course, but when Shannon was thinking about how to name his newly discovered measure of missing information, John von Neumann replied to him: you should call it entropy, for two reasons. In the first place, your uncertainty function has been studied in statistical mechanics under that name, so it already has a name. In the second place, and more important, nobody really knows what entropy is, so in a debate you will always have the advantage. I think that is also the main theme of this talk: to see whether it has some meaning, and whether at the end we know a little bit more about entropy.

So, the maximum entropy principle was introduced by Jaynes. Interestingly, he was the first one thinking about a connection between statistical mechanics and information theory. This is the famous paper from 1957, where he argues that the maximum entropy principle, as an inference principle, a principle of statistical estimation, gives the correct answer in the sense that it is consistent with statistical mechanics: by maximizing the entropy subject to some constraints you recover the results you know from thermodynamics, so you also get the second law and everything. He was the first to really recognize that there is information content in thermodynamics, and that the two entropies agree up to the Boltzmann constant: the information entropy is measured in bits, the thermodynamic entropy in joules per kelvin, and the Boltzmann constant is simply the constant that joins them; it is not some complicated function of one in terms of the other, just a constant separating the two.

So, the maximum entropy principle as we know it from courses on statistical mechanics: we have a set of constraints, and these constraints are linear, that is, arithmetic means.
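To fix notation, a minimal statement of that problem in the Shannon case, with a single energy constraint standing in for whatever linear constraints one actually has, would be:

```latex
\max_{p}\; S[p] = -\sum_i p_i \ln p_i
\qquad \text{subject to} \qquad
\sum_i p_i = 1, \quad \sum_i p_i\,\varepsilon_i = U ,
```

whose maximizer is the familiar exponential form p_i proportional to exp(-beta epsilon_i), with beta the multiplier of the energy constraint. The generalized entropies discussed later replace the functional S[p] but keep this structure.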
Then you can formulate it as a theorem, or just as a principle: given the set of constraints, the best estimate of the underlying probability distribution is the one that maximizes the entropy functional subject to those constraints. So it is a constrained optimization, and for that we use the Lagrange functional, the method of Lagrange multipliers. Of course, you can use the maximum entropy principle not only with linear constraints but with other types of constraints as well; it is just that then it is much harder to give the constraints a thermodynamic meaning, because typically the energy of the system is an arithmetic mean. You can do it, but the thermodynamic interpretation of the entropy becomes much harder.

If I take this general approach with the method of Lagrange multipliers, I maximize the functional S minus alpha, sorry for the double minus, where alpha is one Lagrange multiplier times the normalization constraint, because we are on the probability simplex, minus a set of multipliers lambda_k times the remaining constraints. It is easy to take the partial derivative, and I get dS/dp_i minus alpha minus the sum over k of lambda_k times epsilon_i^(k) equal to zero. You see that the probability distribution is now contained only in the partial derivative dS/dp_i; the rest, thanks to the linearity of the constraints, does not contain it. In the case where this derivative is an invertible function of p_i, I can invert it, and then we see a general connection to the structure of thermodynamics: the Lagrange functional has a thermodynamic interpretation, namely the entropy minus beta times the energy, if the constraint is the energy. We know that beta plays the role of the inverse temperature, so this is called the Massieu function, or minus beta times the free energy. It means that the Lagrange functional does not only serve as a tool; it has a physical meaning. Provided, of course, that the inversion can be done, but we will mostly study cases where it can.

Now I will go through the examples from yesterday, because yesterday we calculated the entropies, but now the question is what the related distribution is; you will see that different entropies give you different distributions, with very different results. These are the three cases that are notoriously well known. The Maxwell-Boltzmann entropy gives you the Boltzmann distribution, the exponential of minus energy over temperature. Bose-Einstein gives you one over (exponential minus one), and Fermi-Dirac gives you one over (exponential plus one). Here one has to be very careful: the only difference is this plus or minus one, but it is a major difference. If we look at the resulting distributions, this is in terms of n, the average occupation number, and on the axis is a shifted energy, basically the energy minus the chemical potential, a kind of effective energy. I'll try to move it here. No, it doesn't matter. You see that while Fermi-Dirac is bounded, the Maxwell-Boltzmann (Boltzmann) distribution and also Bose-Einstein grow without bound, but in slightly different ways.
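As a quick numerical illustration of that plus-or-minus-one difference (a sketch, not the lecturer's code; the grid and values are arbitrary), the three occupation-number curves as functions of x = (E - mu)/kT:

```python
import numpy as np

# Average occupation number <n> as a function of x = (E - mu) / (k T)
# for the three statistics discussed above.
def maxwell_boltzmann(x):
    return np.exp(-x)

def bose_einstein(x):
    return 1.0 / (np.exp(x) - 1.0)   # diverges as x -> 0+

def fermi_dirac(x):
    return 1.0 / (np.exp(x) + 1.0)   # bounded above by 1

x = np.linspace(0.1, 4.0, 5)
for f in (maxwell_boltzmann, bose_einstein, fermi_dirac):
    print(f"{f.__name__:18s}", np.round(f(x), 3))
```

For large x all three curves collapse onto exp(-x); near x = 0 the Bose-Einstein occupation diverges while the Fermi-Dirac one stays below one.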
Sorry, can you repeat what that quantity is? So, this is the average number of particles, the average number of particles occupying a given energy state, because here we have discrete energy levels: in Fermi-Dirac a state can hold zero or one particle, in the other cases it can hold 1, 2, 3, 4, 5, 6, and so on. And n is the average number of particles in a certain state? In a certain state, yes. And what you see on the axis is the energy over the temperature, E over kT. Okay.

And now to something more interesting: the structure-forming systems. This is the entropy from yesterday; just remember that it is written in terms of the curly p's, the probabilities whose normalization is not that the sum of p_i equals one, but that the sum over j of j times p_ij equals one. Then we have the energy, and we can calculate the maximum entropy distribution, which looks roughly like the Boltzmann distribution: there is a prefactor, n to the (j - 1) over j factorial, and then the exponential of minus alpha times j minus beta epsilon_ij. The difference from the normal Boltzmann distribution is that there you would have only alpha, without any j; here the term alpha times j differs for different j, which makes the normalization condition look like this: the sum over j of Z_j times e to the minus alpha j equals one, where Z_j is a partial partition function. If you think about it, this is a polynomial equation in e to the minus alpha, because of the e to the minus alpha j. The order of the polynomial is the size of the largest possible molecule, which means that if your largest molecule is bigger than about four, there is no analytic formula for the solution, because there is no analytic formula for the roots of polynomial equations of order five and higher. This is probably one of the reasons why Boltzmann did not push further in this direction: at that time, without a computer, he could not solve the normalization constraint. You can of course do it numerically, but it is one small disadvantage.

From this we can calculate the average number of molecules, because p_ij is just the relative number of molecules in the given state. And what you find is that the free energy, which is again basically U minus TS, can be written in terms of minus alpha over beta. For the Boltzmann entropy the free energy is exactly minus alpha over beta; here there is an additional term, minus M over beta. So you see that the usual approach, take the partition function and calculate all quantities from its logarithm, works differently here: normally alpha would simply be the logarithm of the partition function, but here you cannot just write it down, you have to solve for it. That is the difference. In some cases the partition function might be quite difficult to calculate, and then it may be easier to compute the partial partition functions, or some other quantities, and use those to get the final result. Are there any questions on this? Okay.

Then I go to the case of Tsallis entropy. The reason it was introduced is that its maximum entropy distribution is the q-exponential I showed yesterday: basically (1 + (1 - q)x) to the power 1/(1 - q). So it is a kind of power law; it interpolates between an exponential and power-law tails, which are found in many data sets. So one motivation is that this is something that can be used for fitting data.
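For reference, the q-exponential mentioned here, together with its ordinary-exponential limit and its power-law tail, reads:

```latex
e_q(x) \;=\; \bigl[\,1 + (1-q)\,x\,\bigr]_{+}^{\frac{1}{1-q}},
\qquad
\lim_{q \to 1} e_q(x) \;=\; e^{x},
\qquad
e_q(-\beta\varepsilon) \;\sim\; \varepsilon^{-\frac{1}{q-1}}
\quad (\varepsilon \to \infty,\ q > 1),
```

so for q > 1 the tail is a power law with exponent 1/(q - 1), which is what makes it useful for fitting heavy-tailed data.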
Here one small thing is necessary to mention, also regarding the partition function. We know that in the Boltzmann case you can write the distribution either as e to the minus alpha minus beta epsilon_i, or as e to the minus beta epsilon_i divided by the partition function; it doesn't matter. The reason you can do that is that it is an exponential: the exponential of a sum is the product of exponentials, so you can pull out the alpha and make it a prefactor. Here that is not possible in general: the distribution p*_i, the q-exponential of minus (alpha plus beta epsilon_i), is not generally equal to the q-exponential of minus beta epsilon_i over some kind of partition function. These are two different distributions; if you really put in the numbers, you see that they differ. However, in this case you can derive an interesting identity: the q-exponential of x plus y can be written as a product of q-exponentials, but then one of the arguments must be rescaled, by the factor 1 + (1 - q)x. So you can rewrite the q-exponential of minus (alpha plus beta epsilon_i) as a product of q-exponentials, but then the beta, which would be the inverse temperature, changes, and you get something like a renormalization of the temperature. So now the question is what the real temperature is, the beta tilde or the beta, that is, which of them plays the role of one over kT. This is still a little bit unclear. Sometimes people call it the self-referential temperature, but it is maybe a good sign of the fact that the temperature is not always simply related to the Lagrange multiplier; the relation can in certain cases be more complicated. And this must always be shown anew: for each type of entropy, the relation between the thermodynamic potentials, or thermodynamic variables, and the Lagrange multipliers has to be worked out.

Okay, and now we go to something a bit more advanced, because the other examples were path-dependent processes. Basically, we now ask: what is the most probable histogram of a process that has length n and some parameters theta? Here k is the histogram and P(k | theta) is the probability of observing that histogram. The most probable histogram is the one that maximizes this over all possible histograms, the arg max of P(k | theta). Very often, and we saw this also yesterday, this probability can be rewritten as a product of two terms: the multiplicity of the histogram, and the probability of a microstate that belongs to that histogram. By taking the logarithm you then see that the log of P(k | theta) is the logarithm of W(k), which is our entropy, plus the logarithm of G(k | theta), the probability part given the parameters. I think that in the lecture on information theory you came across the term cross entropy; this is really the structure of the relative entropy, the divergence. For the Kullback-Leibler divergence, you know that you can write it as the entropy and the cross entropy, with the appropriate signs.
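Written out in the standard Shannon / Kullback-Leibler case, that split is:

```latex
D_{\mathrm{KL}}(p \,\|\, q)
\;=\; \sum_i p_i \ln\frac{p_i}{q_i}
\;=\; \underbrace{\Bigl(-\sum_i p_i \ln q_i\Bigr)}_{\text{cross entropy}}
\;-\; \underbrace{\Bigl(-\sum_i p_i \ln p_i\Bigr)}_{\text{entropy } S(p)} .
```

Minimizing the divergence at a fixed cross-entropy term is then the same as maximizing the entropy, which is how the constraint enters.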
This is very often, though not always, the case also for the generalized entropies. We will see that in our cases we can find such a nice structure, and then the log W really plays the role of the entropy, while the cross-entropy part, the log G term, plays the role of the constraint: it contains the prior probability distribution, and we will see that it naturally gives us the constraint.

So, if I take the ordinary cross entropy, Shannon entropy and the Kullback-Leibler picture, then the cross entropy is minus the sum of p_i log q_i, where the q_i are the prior probabilities. And I can say: let us assume that these prior probabilities depend exponentially on the energy, so I plug in the Boltzmann distribution as the prior. Then what you see is that this cross entropy becomes beta times the average energy plus the logarithm of the partition function, which is minus beta times the free energy. So I get this nice little relation between the Kullback-Leibler entropy and cross entropy. Of course, for path-dependent processes the constraints might not be of this form, because the prior probability of observing the histogram might be different, and this is what we will see in a minute.

Now the case of the multiplicity of a trajectory histogram: these are the SSR, sample-space-reducing, processes. We showed yesterday what the entropy is in this case; it is a more entangled expression. If you consider that after each run we drive the ball back to a random state with probability q_i, then we can think about the probability of a histogram in the following way. Once we jump, after the first step the only available space is the set of states below: as shown on the slide, at the first step all the lower states, one to nine, are available; if we jump to five, then the subspace available to us is just one to four. If you think a little about what the probability of observing a histogram is, then in the probability of a particular sequence each visit to a state that is not the ground state contributes a factor involving q_i and the partial sum of the q's of the states below the given state; and only when we hit the ground state do we get the renormalization factor, since the process restarts and we can throw again. Thinking this through, one sees that the probability of observing the histogram k is essentially a product over states of (q_i over Q_{i-1}) raised to the power k_i. Then, by taking the logarithm and using the usual approximations, and here you do not even need Stirling's formula because there are no factorials left, you can show that it looks like the term we have seen before, plus a term containing the log of this partial sum of the probabilities. And if we again assume that the prior distribution is the Boltzmann distribution, that is, the probability of starting at a state depends on the energy, the height of the stair, then we get the following relation: the cross entropy is beta times the average energy plus beta times a quantity called F here, which is an average of the logarithm of the partial partition function, where the sum runs only from one up to the given state.
Basically you take all these steps, one to one, one to two, one to three, one to four, one to five, because these are all the partial partition functions that are available from a state that is not the ground state. Then, and I don't have the slide here, I don't know why, if you apply the maximum entropy principle, take the entropy and maximize it with this one constraint, what you get is that p*_i is little q_i over big Q_i. And if you plug in the form of the q_i, what you get is that p_i is proportional to one over i.

But we didn't say that the q_i are the Boltzmann distribution. Yes; in that case it corresponds to very high temperature, the limit of beta going to zero. If that is not the case, then of course you get that p_i goes as e to the minus beta epsilon_i divided by the partial sum over j from 1 to i - 1 of the corresponding Boltzmann factors. Interestingly, if you do it for a large number of states, this does not make much difference; it almost always looks very similar to Zipf's law.

Doesn't the denominator sum to one? No, it does not sum to one, because it runs only up to i - 1. It is not the full partition function, it is the partial partition function, and it grows with each state. That is why, and you can show this quite easily, if the distribution of the energy levels epsilon is not too wild, the resulting distribution is effectively very close to this slope; there is a kind of universality class here, because of the slow driving. Unless the spacing between the energy levels grows, say, exponentially, you very often get a distribution that is very close to Zipf's law, which on a log-log scale looks like a straight line. This is seen very often, and it might be the explanation for many of these power-law distributions: the power law is obtained because of this strong asymmetry between driving and relaxation. Sorry, I didn't add that slide to the deck; I will add it after the talk so that you can see it.
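To illustrate, here is a minimal simulation sketch of the sample-space-reducing process just described, with uniform jumps and restarts to the top state (the state count and number of steps are arbitrary choices, not from the lecture); the visit frequencies come out close to p(i) proportional to 1/i:

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)

N = 10_000        # number of states (arbitrary)
steps = 200_000   # number of recorded visits (arbitrary)
visits = Counter()

state = N
for _ in range(steps):
    visits[state] += 1
    if state == 1:
        state = N                       # driving: restart at the top
    else:
        state = rng.integers(1, state)  # relaxation: uniform jump to 1..state-1

# Zipf-like visit frequencies, approximately proportional to 1/i:
for i in (1, 2, 5, 10, 100, 1000):
    print(i, round(visits[i] / steps, 5))
```

Plotting visits[i]/steps against i on a log-log scale gives roughly a straight line of slope -1, the Zipf behaviour discussed above; making the jump probabilities energy-dependent instead of uniform corresponds to the finite-temperature case from the discussion.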
The Pólya urn is also kind of interesting, and I will not go into the details because it is rather technical. We see that for the Pólya urn there is this log of p_i, and then you can find, in a similar way, what the cross entropy is. And then we can say, because this is kind of complicated, let us look at the limit where the number of draws goes to infinity. Then we have this minus log p_i, and then again a similar term, but it is the opposite of before: previously we had p_i log q_i, now we have q_i log p_i, plus log q_i. And if you calculate the maximum entropy distribution, it looks like p_i equals a normalization constant times q_i to the minus gamma. This is kind of interesting, because if you think about it, this condition cannot in general be satisfied in the interior, so there are three scenarios: either the optimal distribution is the prior distribution, or it is the so-called winner-takes-all case, because when you do the maximization you find that the partial derivative of the Lagrange functional cannot be zero, which means it must be a corner solution. A corner solution means each p_i is either one or zero, and only one of them can be one while the rest are zero. So in the end it really is winner-takes-all: depending on the initial ratio, the initial frequencies of the ball colours in the urn, by drawing and drawing you effectively add more and more balls of the same colour, until almost all the balls are of that colour. Interestingly enough, in that case even the maximum entropy distribution does not give us an interior point of the probability simplex. That is very interesting here. So those were the examples.

Now I just want to say something briefly about the principle itself. What we have seen is that the maximum entropy principle can effectively be viewed as a special case of the principle of minimum relative entropy: we minimize the Kullback-Leibler divergence with respect to some prior distribution, and this is how we can encode the constraints in our case; if the divergence can be written as a sum of entropy and cross entropy, we recover the previous picture. In information theory, and I think you have discussed this already a bit, you can really see it as minimization of the divergence. Here, of course, the average values and the thermodynamic potentials appear, but in general you simply have priors, coming from models or measurements, and posteriors from a parametric family, and you do a parametric optimization. The advantage is that the relative entropy is defined for both discrete and continuous distributions. You know that when you define a continuous entropy, replacing the sum by an integral, you can get into trouble: the continuous entropy can be negative, or otherwise not so well behaved. The reason is that you can see the entropy as a special case of the divergence in which the prior is taken to be the uniform distribution; and since a uniform distribution does not exist in the continuous case, if you have a continuum of states from minus infinity to infinity, you need to choose some other prior distribution. It is also nice because it connects information theory, thermodynamics, and geometry: from there, it can be used to calculate thermodynamic lengths and similar things.

Now I want to talk a little bit about a generalization of the maximum entropy, or minimum divergence, principle. If you have a process where you do not only have states but whole trajectories, you can define something called the caliber, which is the trajectory version of the entropy: you take the probability of observing the trajectory. Maybe interesting for you: you can write the entropy production in terms of this caliber; it is a divergence, the divergence between the probability of the forward trajectory and that of the backward trajectory. I hope you have seen this in the introduction to stochastic thermodynamics; it is quite useful, and you see that there is this connection. I recommend the paper "Principles of maximum entropy and maximum caliber in statistical physics"; it is a Reviews of Modern Physics article, and you can find many applications there. The good thing is that with the caliber you can have a much richer structure of constraints.
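In the usual stochastic-thermodynamics notation (one common convention, not the lecturer's slides), the two path-space objects mentioned here, the caliber as a trajectory entropy and the entropy production as a divergence between forward and time-reversed path probabilities, can be written as:

```latex
\mathcal{C}[P] \;=\; -\sum_{\omega} P[\omega]\,\ln P[\omega],
\qquad
\langle \Delta S_{\mathrm{tot}} \rangle
\;=\; \sum_{\omega} P[\omega]\,\ln \frac{P[\omega]}{\tilde P[\tilde{\omega}]}
\;=\; D_{\mathrm{KL}}\bigl(P \,\|\, \tilde P\bigr) \;\ge\; 0 ,
```

where omega runs over trajectories and the tilde denotes the time-reversed trajectory under the time-reversed protocol; the non-negativity of the divergence is the second-law-like statement referred to above.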
As an example of such a constraint, you can say: this is my average number of transitions from state i to state j, and I ask what the trajectory distribution is. What you can easily show is that the process maximizing the caliber under such constraints is Markovian, because the joint probability can be written as a product of transition probabilities from one step to the next. So the Markov process maximizes the caliber given these constraints, and that is a nice thing to see.

So you measure a bit of the trajectory, the information about the number of transitions? Yes, exactly: you collect the transition rates and you want to infer the probability of observing trajectories. Because just by counting, it is not clear whether the system is Markovian or whether it could be the result of a more complicated non-Markovian process with memory. And if you use the principle of maximum caliber and simply say, these are my constraints, you find that the process that maximizes the caliber is the Markov process. Okay, so the only information we have is the number of transitions? Yes. And then you recover that the trajectory distribution is Markovian, so the joint probability is a product of the probabilities of these two-point transitions. Okay.

Then, and I do not want to go into too many details, there are other extremal principles in thermodynamics: Prigogine's principle of minimum entropy production, the principle of maximum entropy production for living systems, and so on. They are somewhat controversial. They have their Wikipedia pages, and they also have their pages in this assignment project, so maybe it is interesting for you to go through them and read about it.

The last thing I wanted to mention is that the maximum entropy principle can be seen as an inference tool, because in physics it really consists of two steps. The first step is to calculate the distribution, and this is purely a method of statistical inference. The second is to plug the distribution back into the entropy, calculate the actual value of the entropy, and use it to compute the free energy and all these things. It is really this second step that connects us with thermodynamics. And what one finds is that for each distribution obtained by this maximization procedure there can exist a whole class of entropies and constraints that lead to the same distribution, but generally to different thermodynamics; and then of course the interpretation of the Lagrange multipliers in terms of thermodynamic quantities can be different. That is kind of interesting. Maybe the simplest example: since the maximum entropy principle maximizes the entropy, if I take an increasing function of that entropy, it gives me the same distribution, right? The maximum of a function and the maximum of an increasing function of that function are attained at the same point, so nothing changes there. But if you do the calculation, the Lagrange parameters do change. What happens is that it changes the Lagrange parameters, but not their ratio, so there is, I would say, a kind of calibration invariance.
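A sketch of why the multipliers rescale while their ratio does not, for any monotonically increasing function g applied to the entropy, with the usual normalization and energy constraints:

```latex
\frac{\partial}{\partial p_i}\Bigl[\,g\bigl(S[p]\bigr)
  \;-\; \alpha \sum_j p_j \;-\; \beta \sum_j p_j \varepsilon_j \Bigr] \;=\; 0
\quad\Longrightarrow\quad
\frac{\partial S}{\partial p_i}
  \;=\; \frac{\alpha}{g'(S^*)} \;+\; \frac{\beta}{g'(S^*)}\,\varepsilon_i ,
```

so the same distribution solves the problem for S itself with both multipliers divided by the common factor g'(S*), leaving alpha over beta unchanged. The Rényi entropy is such a function of the Tsallis entropy, S_R = (1/(1 - q)) ln[1 + (1 - q) S_T], for which g'(S_T) = 1 over the sum of p_i to the q, which is the sort of factor by which the two sets of multipliers differ.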
But the important thing is that each of these changes leads to different thermodynamics, because if you really want to plug the distribution back into the entropy, the different functions of the entropy will in general give different numbers. This is, for example, the case for the relation between the Lagrange multipliers of the Tsallis entropy and of the Rényi entropy. Both of them give us the q-exponentials, but their Lagrange parameters are different. We can calculate it, and you see that this factor is one over a quantity we saw on a previous slide, when I mentioned the self-referential temperature: there was this q-exponential of something raised to the power one minus q, and this is exactly that factor. So this factor comes from the fact that we maximize the Tsallis entropy and not the Rényi entropy: since the Rényi entropy is additive, it is extensive, and then the temperature is intensive; the Tsallis entropy is non-extensive, which means the temperature is non-intensive, it really changes with the size of the system, so that the energy can be extensive again. And then, okay, the other example does not matter. That's all; I think you are tired enough for today. If you have any questions you can ask now; otherwise, that's it.

Yes, because the Rényi entropy can be written in terms of the Tsallis entropy, as one over (1 minus q) times the logarithm of a linear function of the Tsallis entropy. And although they give the same distribution, the q-exponential, the Lagrange multipliers are different. So by plugging the same distribution back into the two entropies you get different relations between the Lagrange multipliers and, for example, the temperature or the free energy. Here the free energy is the same, because the ratio alpha over beta remains the same, but the temperature changes. Something else?

Sorry, could you give the reference for that paper, the one where they use the number of transitions and find that the process maximizing the caliber is Markovian? Isn't that the Reviews of Modern Physics paper by Steve Pressé and colleagues? Yes, that is the paper; it is quite long, but it is the one I recommend reading. Okay.

Maybe one more question: why should the probability distribution that describes the system maximize the entropy at all? It is a postulate, an axiom; tomorrow we go to the axiomatic approach. It is also related to the second law of thermodynamics: the entropy production must be positive, so you can relate it to the fact that the relative entropy with respect to the stationary distribution can only decrease, and in the long-time limit you end up with the distribution that maximizes the entropy. That is, vaguely, the connection; of course it is not so simple and does not work in all cases, but that is the general idea. Also, you were discussing the H-theorem, which is basically this relative entropy: a Markov process, as it relaxes, typically goes towards the distribution that maximizes the entropy. Okay.

No further comments? Then I think that's it; enjoy the evening. Thank you.