It should be OK. So, these first three lectures: how are we for time? OK. I'm going to try to put a break in at about 9:45 this morning, something like that. Then we can resume at around 10:00 and finish at about 10:40, something like that, at the reception. OK, I'll try for that. These two lectures today are pretty much parts of the same unit. They go together; they're very, very closely related. They're not separate topics like they were yesterday and the day before. Has anybody here already had stochastic thermodynamics? OK. If you have had some of it, then a lot of this will look like review at the beginning. The stuff at the end I'm pretty sure will be new to you, but the first part you probably will have seen before. So anyway, normal operating procedure: interrupt. I should actually put in a rule that not only do you get graded on homeworks, but the more interruptions you make with intelligent questions, the better your grade, something like that, to try to force people to speak. Anyway, about everything we've been doing up till now: when you read physics textbooks, one of the things that's most cool is that you get to use all these symbols. You can write down all these cool things, and people look over your shoulder and ask, boy, what do all those symbols mean? Among the symbols you use to describe events in spacetime, one is x. But x is kind of common. There's also this one called t, which is time. Guess what? We haven't seen t yet in this course. In stochastic thermodynamics so far, unfortunately, the word "thermodynamics" should really be "thermostatics": there's nothing dynamic in it, and in what we've presented so far there has not been a t. Well, I like the letter t, and the letter t is all around us. Everything is dynamic. So that's what stochastic thermodynamics is all about: dynamic, evolving, off-equilibrium systems. That's what everything, arguably, interesting in the world is about.
And so that's what I'll be starting to get into now. [A pause for technical difficulties with the screen sharing.] OK, now it's changing. All right, so, for simplicity I'm only going to be considering systems whose state space is finite. Not just countable, but finite. Some examples from physics of such systems: my spin systems, beloved of people who give out Nobel Prizes, for example six months or so ago. You've got a countable number of spins. The spins can be up or down; ferro-, antiferro-, para-magnetic, all these other great Greek prefixes. Those are systems that have countable, finite state spaces. In fact, you can take the thermodynamic limit, and you're still countable. A quantum dot is a very, very simple system, normally a two-state system; if you have multiple quantum dots, of course, then it's more than two states. Also, with some caveats, if you're careful about it, a coarse-grained version of a classical state space. The normal phase space, with its six dimensions of position and momentum: you can coarse-grain it. You can, for example, take the gas in this room. It's got a phase space. It's actually an uncountably infinite state space, of course, but you can coarse-grain it into a finite number of states, if you're careful about it; there are some subtleties that are, for example, discussed in this paper here. Everything that I'll be talking about will still apply because, and here's the crucial thing, you have a Markovian dynamics over that state space. OK, now, this is a very important point.
Physicists, and I was originally trained in physics, that was my PhD, so I can include myself in that category: physicists, compared to other disciplines, were trained how to think well, but compared to just about all other disciplines, they are extremely narrow in their view of reality. So in particular, those who came up with stochastic thermodynamics have a toolbox. They have uncovered things which are applicable to vastly many more situations than the ones they focus on. If you read all about stochastic thermodynamics, you'll see these, frankly, very boring things about colloids and pulling on RNA strings and things like this. And frankly, who cares? I mean, people who get very excited about that, I don't want to be near them at a party. They're boring people. There's everything else, and in particular, what is starting to be appreciated by the physics community is that stochastic thermodynamics applies to any system that evolves according to Markovian dynamics. That is vastly many more systems than just colloids and RNA strands that you're stretching and things like that. Here are some examples. You don't need any energy flux. You don't need physics. You don't need thermodynamics for stochastic thermodynamics to apply. I've got projects going now where we are applying the theorems of stochastic thermodynamics to the dynamics of opinions over social networks and to the dynamics of gene regulatory networks. I am going to corrupt Professor Greeley so that we examine the stochastic thermodynamics of ecosystems: any kind of system that you can actually describe by a Markov chain. And you've seen those since you were, I don't know, a young teenager. Almost every theorem that I will present today applies. So there's a huge amount to be done.
Many, many PhD theses, many, many new fields and conferences, all about trying to figure out what the implications of these theorems of stochastic thermodynamics are in these other domains, and, even more challenging and more important, how to modify, extend, and expand stochastic thermodynamics to apply to these other domains. Another one, of course, that I'm reminded of looking at Alessandro, is neurobiology. Many systems in the head can be viewed, at least to a certain approximation, as evolving under a Markovian dynamics. These kinds of results apply. All right, very good. It is so difficult to talk loud through this thing. Anyway, first some comments to bring everybody up to speed on discrete-time Markov chains. Most of this course will be continuous time, but discrete time has many, many more intuitive aspects to it. So we can make delta t be some small constant. And a discrete-time Markov chain, if we allow it to be time-inhomogeneous, so that the actual update equation is changing with time, we can write like this. OK, it's just a linear dynamics. It's the lowest-order possible dynamics you can have. That's why it is so ubiquitous. The only restriction, and I'm going to just take this off and be done with it, it's hard to breathe, excuse me. The only restriction is that we want to actually conserve probability. So that implies some constraints. It means that for any probability distribution here, what comes out has got to be a probability distribution. In particular, if this one's a delta function, any delta function, this must be normalized: the sum over all x of this must be 1, for any delta function input. So what this tells you straight away is that the entries of the stochastic matrix must be non-negative and that the entries in each column must sum to 1. So I assume this is review for most people.
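To make those column conventions concrete, here is a minimal numerical sketch, assuming NumPy is available; the matrix entries are made up purely for illustration.

```python
import numpy as np

# A 3-state discrete-time Markov chain. Column x' of G is the
# conditional distribution P(x_{t+1} = x | x_t = x'); the numbers
# here are made up purely for illustration.
G = np.array([
    [0.90, 0.20, 0.10],
    [0.05, 0.70, 0.30],
    [0.05, 0.10, 0.60],
])

# The two constraints from the lecture:
assert np.all(G >= 0)                   # non-negative entries
assert np.allclose(G.sum(axis=0), 1.0)  # each column sums to 1

# One update step, p_{t+1} = G p_t, starting from a delta function:
p = np.array([1.0, 0.0, 0.0])
p_next = G @ p
assert np.isclose(p_next.sum(), 1.0)    # probability is conserved
```

Feeding in each delta function in turn just reads off the columns of G, which is exactly why the column-sum condition is equivalent to conserving probability.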
OK, so the matrix can have a couple of very nice properties. One, it can be what's called irreducible, which basically means you can get from any one state to any other state through sufficiently many transitions. And it can be aperiodic, not just cycling around; this on the slide is the formal definition of aperiodic, and there are lots of textbooks you can go through, or Wikipedia and so on, to find out about it. If all that's true, and these are the kinds of assumptions we normally make to keep life simple for us, then there is a unique fixed point of this matrix G. I'm sorry, I should also say that this is all for the case where G is not varying in time. So this is the stationary state, also called the fixed point pi. It's going to be an equilibrium in many of the scenarios that we are going to be considering. Basically, the dynamics goes away when you are at pi. OK, so yes, we're talking about G independent of t. Some other equations to bear in mind: if you go forward n points in time, n iterations, you just take G to the nth power. If G varies in time, life becomes more complicated. The important thing to note here, which will make life particularly complicated when we get to continuous time, is that these different Gs at different times are matrices. If those matrices don't commute with one another, then you must be very, very careful to maintain their ordering when you take that product of matrices. If you mix them around, you will get the wrong result. OK, so as I say, we are now going to be focusing on continuous time. The idea, loosely speaking, is to take what we were just seeing and extend it to the case where t is not integer valued but is instead a real number. So again, you can view this as the simplest possible dynamics.
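As a sketch of the fixed-point claim (same illustrative, made-up matrix as before, assuming NumPy): pi is the eigenvector of G with eigenvalue 1, and for an irreducible, aperiodic chain the iterates G^n p0 converge to it from any start.

```python
import numpy as np

G = np.array([
    [0.90, 0.20, 0.10],
    [0.05, 0.70, 0.30],
    [0.05, 0.10, 0.60],
])

# n steps of a time-independent chain is just G to the nth power:
p0 = np.array([1.0, 0.0, 0.0])
p50 = np.linalg.matrix_power(G, 50) @ p0

# The fixed point pi satisfies G pi = pi: the eigenvector of G with
# eigenvalue 1, normalized so it is a probability distribution.
evals, evecs = np.linalg.eig(G)
pi = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
pi = pi / pi.sum()

assert np.allclose(G @ pi, pi)          # stationary: dynamics goes away
assert np.allclose(p50, pi, atol=1e-5)  # iterates converge to pi
```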
In this case, it's a differential rather than a difference equation, and the equation for the probability distribution over the states x is linear. You can't get any simpler than linear, unless you simply say that p of x is unchanging in time. So it's the lowest-order equation, basically. I mean, you can derive all this by assuming you've got a stochastic process and filtrations and all kinds of complicated stuff. Or you can just use a simple differential equation, because we're in a finite state space, and p sub x is just a vector. In this case, the matrix is now called the rate matrix. It can change in time, just like before, where we were allowing the stochastic matrix to change in time. Again, we want to ensure that it conserves probability. In this case, though, that does not mean that the columns have to sum to 1. It means they have to sum to 0. Very important point. That's because we're now looking at a derivative. What's over here is not the next probability distribution at t plus an infinitesimal delta t. Rather, it's the change in the probability distribution from one moment to the next. So if you do a sum over x of one p minus another p, that's got to be 0, even though each p by itself has to sum to 1. That's why the column entries of the rate matrix sum to 0, whereas in discrete time they sum to 1. OK? Is all this clear to people? Too slow, too fast, just about right? Goldilocks. Got to love the Goldilocks. OK. We don't need to worry about aperiodicity when we're talking about continuous time. That basically comes automatically, intuitively, because everything is mixing. Let's see how to phrase it. Aperiodicity in discrete time has to do, intuitively speaking, with whether your Markov chain is just going around in a circle. It's more complicated than that, but that's the basic idea.
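The column-sums-to-zero condition can be checked the same way; a minimal sketch with a made-up rate matrix, assuming NumPy:

```python
import numpy as np

# A rate matrix for a 3-state continuous-time Markov chain (made-up
# numbers). Off-diagonal K[x, x'] >= 0 is the rate of jumps x' -> x;
# each diagonal entry is minus the sum of the other entries in its
# column, so every column sums to 0.
K = np.array([
    [-0.5,  0.3,  0.2],
    [ 0.4, -0.7,  0.1],
    [ 0.1,  0.4, -0.3],
])
assert np.allclose(K.sum(axis=0), 0.0)

# Master equation dp/dt = K p. An explicit Euler step conserves
# normalization because sum_x (K p)_x = 0:
p = np.array([0.5, 0.25, 0.25])
dt = 1e-3
p = p + dt * (K @ p)
assert np.isclose(p.sum(), 1.0)
```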
When you make the time intervals infinitesimally small, all those circles get completely mushed up with one another, and there's no sense in which you could say in continuous time, unless it's a delta-function probability distribution, that periodicity is maintained. Going to continuous time essentially takes care of that requirement for free. We still need to say, though, that it's irreducible, which basically means, again, that you can get from any one state to any other. And then, again, we have the conclusion that you've got a unique fixed point. In this case, though, it's a fixed point of that differential equation rather than of the update equation in discrete time. All right, if the rate matrix is independent of time, then we can take this equation, and just like any other linear differential equation, it's got a solution of this form. If K is just a fixed matrix, then p of t is just equal to e to the tK times p of 0. This should be contrasted to this equation here from the case of discrete time. We were taking the power of the matrix in discrete time, when the matrix was giving the update function; when it's instead giving the differential equation, we take the exponential. So there's that kind of parallel. Now, by the way, does everybody here know what the matrix exponential is? OK, good. For anybody who doesn't: you can just plug the matrix into the Taylor expansion of the exponential and push it all the way through. That's one way to do it. So anyway, here is the proof, quote, unquote, that that equation actually is a solution to the differential equation. OK, now, if K depends on t, remember, in discrete time, if the stochastic matrices did not commute, we had to pay attention to their order.
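Here is a sketch of exactly that route to the matrix exponential, truncating the Taylor series numerically (same made-up rate matrix as before, assuming NumPy):

```python
import numpy as np

def expm_taylor(M, terms=60):
    """Matrix exponential via its Taylor series, sum_n M^n / n!."""
    out = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for n in range(1, terms):
        term = term @ M / n   # build M^n / n! incrementally
        out = out + term
    return out

# Same illustrative rate matrix as before (made-up numbers):
K = np.array([
    [-0.5,  0.3,  0.2],
    [ 0.4, -0.7,  0.1],
    [ 0.1,  0.4, -0.3],
])

# For time-independent K, the master equation has the solution
# p(t) = e^{tK} p(0), the continuous-time analogue of G^n p(0).
p0 = np.array([1.0, 0.0, 0.0])
t = 2.0
pt = expm_taylor(t * K) @ p0

assert np.isclose(pt.sum(), 1.0)   # probability is conserved
assert np.all(pt >= 0)             # and it stays a distribution
```

(A plain truncated Taylor series is fine at these small matrix norms; production libraries use more careful algorithms for the matrix exponential.)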
It's the same thing in continuous time, but it's much, much more of a pain in the ass. You end up having to use what are called ordered exponentials. We would want to be able to say something like this, perhaps. Because, given this equation, when K is independent of t, notice that this exponent is just the integral from 0 to t of K. So you might think, oh, well, presumably, if K can vary with t, I can just use that exact same expression, just with the integral from 0 to t of K of t, something like this. Then, when you exponentiate it, remember, our rule for exponentiation is to plug it into the Taylor series expansion of the exponential; now we're taking the integral of that rate matrix, so we would get something like that. It's a reasonable first attempt at a solution. It doesn't work. Things are not so simple. Here's a little factoid that might help you appreciate that. If two matrices a and b do not commute, then in general e to the a times e to the b does not equal e to the a plus b. And so this should give you a lot of pause. It should really make you concerned that this might not work. Because an integral is just a whole bunch of sums, and remember, we need to maintain the ordering of the rate matrices. And this is saying that you can't even do it for a sum of 2; here, we're talking about an infinite sum. So it should be pretty plausible to you that you would be in trouble if you were to try to just plug this in and get it to work when the matrices at different times do not commute. In particular, because of this, if you just look at the n equals 2 term in this Taylor series expansion, you will get this claim here, and it's very, very problematic if the matrices don't commute. So the actual solution, you can look it up, is going to have to be pretty complicated.
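Both points can be checked numerically. A sketch with made-up two-state rate matrices, assuming NumPy: non-commuting exponentials do not add, and the correct propagator for a time-varying K is the ordered product of short-time propagators.

```python
import numpy as np

def expm_taylor(M, terms=80):
    """Matrix exponential via its Taylor series, sum_n M^n / n!."""
    out = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for n in range(1, terms):
        term = term @ M / n
        out = out + term
    return out

# Two made-up 2-state rate matrices that do not commute:
A = np.array([[-1.0, 0.5], [1.0, -0.5]])
B = np.array([[-0.2, 0.8], [0.2, -0.8]])
assert not np.allclose(A @ B, B @ A)

# The factoid: e^A e^B != e^{A+B} when [A, B] != 0.
assert not np.allclose(expm_taylor(A) @ expm_taylor(B), expm_taylor(A + B))

# Ordered exponential for a K(t) that switches from A to B at t = 1/2:
# the ordered product of short-time propagators, latest time on the left.
N = 2000
dt = 1.0 / N
eA, eB = expm_taylor(dt * A), expm_taylor(dt * B)
prop = np.eye(2)
for n in range(N):
    prop = (eA if n * dt < 0.5 else eB) @ prop   # ordering matters

# For this piecewise-constant K(t) it equals e^{B/2} e^{A/2},
# NOT the naive e^{(A+B)/2}:
assert np.allclose(prop, expm_taylor(0.5 * B) @ expm_taylor(0.5 * A), atol=1e-6)
assert not np.allclose(prop, expm_taylor(0.5 * (A + B)), atol=1e-6)
```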
It arises a lot in quantum mechanics, maybe in your second course in quantum mechanics, in the sum-over-paths formulation: what are called ordered exponentials. The ordered exponential of the rate matrix up to a time t can be expressed as this particular kind of limit, where you're just making sure that you have e to the a times e to the b rather than e to the a plus b. Fortunately, we aren't going to need to use that in this course. But it is important to realize that it's there under the hood and that there are circumstances in which you have no choice but to bite the proverbial bullet. Have people come across ordered exponentials before? Some yes, some no. OK, good. So that was a little bit of an aside into Markov chains of various sorts. OK, now we're going to get a little bit closer to stochastic thermodynamics. Computation is all about how information changes. This is another one of the sociology-of-science pointers I keep giving. As I said, it's not just physicists who are extremely narrow. Everybody is; there's a famous phrase, give a man a hammer and everything looks like a nail. Information theory, which Gilger reviewed on Monday, is fantastically powerful stuff. But it is only about getting a set of bits, exactly as it is, from over there to over here. So for example, this is really striking in the way that people look in particular at neurobiological systems. Many, many people think they're being very sophisticated, and they are very, very smart people, when they analyze the brain in terms of information theory. Oh look, it's very, very good at getting this precise information from, say, the retina to the visual cortex. It's very, very good as a communication channel. Well, news flash: if all we were concerned about was getting information from one side of the brain to the other, we'd be a lot better off if we just scooped out all this gooey, gray, pink stuff and put in a bunch of fiber optics.
Because then we would have a hell of a lot more accurate information; if that were our only goal, we'd be done. And we would have much better brains than we humans have, going around with this jelly in our heads. Computation is about information transformation. It's not about information transmission. And that is what is important about what's going on in the brain. Those are the kinds of things that we're trying to get at in this course, in starting up this field, in essence. There's a bunch of conferences going on. And we don't even know what it means to talk about information transformation in any kind of meaningful sense. OK? So, all kinds of caveats. We know what the answer is not. It's not information theory, not solely information theory, but we don't know what the answer is. So hopefully the things that I'll be presenting in this course get a little bit closer to what it is. Anyway, computer science is one of the fields that does try to talk about computation, about information transformation. And so that's what we'll get to later on. But notice what we want to do is understand the stochastic thermodynamics, the statistical physics, of information transformation. Now, yeah, it's all well and good to sit up on the lectern and cast aspersions at all these dumb people using information theory to analyze things and so on. But nonetheless, information theory matters: Shannon entropy is thermodynamic entropy. It is central to physics. We want to figure out the physics of information transformation. So let's first, for grins and giggles, start playing with how the Shannon entropy, the thermodynamic entropy, changes in time, say if we're evolving under a continuous-time Markov chain. Which need not have anything to do with physics, as I was emphasizing, but it's nonetheless a natural thing to start to analyze if you are interested in the physics of information transformation.
Well, believe it or not, nobody, well, I'm sure people actually first calculated this a century or so ago, but it only became standard in the physics community with the work on stochastic thermodynamics in the past couple of decades. An obvious question: I've got a Markov chain, the simplest possible dynamics. I happen to be interested in entropy because I'm a weird person. How does entropy change under a Markov chain? That wasn't the kind of standard calculation you would see in the textbooks. It turns out to be a very, very interesting one. Let's try to go through the steps. Just do it. A little bit of simple algebra. There's the entropy, and entropy dot, dS/dt. Well, let's see. If you plug in over here, first of all, why am I allowed to write this first equation there? Why can I take the time derivative of only the first p? What we're computing is the time derivative of minus the sum of p log p (of the trace of rho log rho, if you want to be quantum mechanical). But here, I've only got a derivative on the first p. How come? Because p is normalized. When you look at the other chain-rule term, you get the derivative of log of p, which is 1 over p, times p dot, all times p; the p and the 1 over p cancel, and the sum over x of p dot is the time derivative of 1, which is 0. OK? Then, in the next line, we know what p dot is. We're doing a Markov chain, so it's K times p. We can divide by this p of x prime there. How come I can do that? Yep, it's the same kind of thing; we're using normalization, essentially, because we're going to be summing over the x components in the log of the p of x prime. We can then expand this. Did I remember the factor of one half? I did remember the factor of one half. This is just relabeling variables and then adding the same term to itself; that's where the one half comes from. Then we can expand this even further into this particular expression there.
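Written out, the verbal steps above amount to the following sketch (notation reconstructed from the description: K is the rate matrix, natural log, k_B = 1; the sign convention for the flow term follows the "flow plus production" phrasing used here):

```latex
\begin{align*}
\dot S &= -\frac{d}{dt}\sum_x p_x \ln p_x
        = -\sum_x \dot p_x \ln p_x
        \quad \text{(the other chain-rule term is } \textstyle\sum_x \dot p_x = 0 \text{)}\\
       &= -\sum_{x,x'} K_{x x'}\, p_{x'} \ln p_x
        = \sum_{x,x'} K_{x x'}\, p_{x'} \ln \frac{p_{x'}}{p_x}
        \quad \text{(added term vanishes: columns of $K$ sum to $0$)}\\
       &= \frac12 \sum_{x,x'} \bigl(K_{x x'} p_{x'} - K_{x' x} p_x\bigr)
          \ln \frac{p_{x'}}{p_x}
        \quad \text{(relabel $x \leftrightarrow x'$ and average the two copies)}\\
       &= \dot S_i + \dot S_e,
\end{align*}
where
\begin{align*}
\dot S_i &= \frac12 \sum_{x,x'} \bigl(K_{x x'} p_{x'} - K_{x' x} p_x\bigr)
            \ln \frac{K_{x x'}\, p_{x'}}{K_{x' x}\, p_x} \;\ge\; 0,
\qquad
\dot S_e = -\frac12 \sum_{x,x'} \bigl(K_{x x'} p_{x'} - K_{x' x} p_x\bigr)
            \ln \frac{K_{x x'}}{K_{x' x}} .
\end{align*}
```

The last step just splits the log: ln(p'/p) = ln(Kp'/K'p) minus ln(K/K'), which is exactly the "p's inside the log" versus "only K's inside the log" distinction described next.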
Messy junk. Why the hell have I done this, besides the fact that, oh, maybe I've got a couple of minutes to fill up in the lecture? Here's the reason; there it is again. We got to this point. Notice this first term has got a K and a p inside the log. The second one has only K's. That's the difference between the two terms. OK? Keep that in mind; hold that thought. Here it is again: that expression, Kp over Kp, and K over K. And in both of these, the difference between the numerator and the denominator is that you're flipping around whether it's x x-prime or x-prime x. So the difference between the first and the second term is whether there are p's inside the log, and in both terms the difference between the numerator and the denominator is the x and x-prime ordering; just a mnemonic-ish kind of thing. We take the first one here, and that's called S-dot-i, the irreversible entropy production. The second one, where there are no p's, is called the entropy flow. OK? Right now those names are just names. All that we've done is say that the time derivative of entropy can be written as entropy flow plus entropy production. OK? Ignore that v index down there; it should not be on these slides, my apologies. OK, good. The entropy production term, this one here: you can actually prove that it is non-negative. This is where life starts to get interesting. The proof is actually quite simple. Notice that for any positive a and b, a is greater than b if and only if log of a over b is greater than 0. That's true for any such a and b, so every term in this sum, which has the form a minus b times log of a over b, is non-negative, and hence so is the sum. OK? Entropy production is non-negative. What you have just seen, what you've just done, is derive the second law of thermodynamics. It's not a phenomenological thing, Clausius and Kelvin and steam engines or anything like that.
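A quick numerical check of both claims, with a made-up rate matrix and distribution, assuming NumPy (natural log, k_B = 1):

```python
import numpy as np

# Check dS/dt = (entropy production) + (entropy flow) and EP >= 0
# for an arbitrary illustrative rate matrix and distribution.
K = np.array([
    [-0.5,  0.3,  0.2],
    [ 0.4, -0.7,  0.1],
    [ 0.1,  0.4, -0.3],
])
p = np.array([0.5, 0.2, 0.3])

EP = EF = 0.0
n = len(p)
for x in range(n):
    for xp in range(n):
        if x == xp:
            continue
        J = K[x, xp] * p[xp] - K[xp, x] * p[x]   # net probability flux
        # production term: K's AND p's inside the log
        EP += 0.5 * J * np.log((K[x, xp] * p[xp]) / (K[xp, x] * p[x]))
        # flow term: only K's inside the log
        EF -= 0.5 * J * np.log(K[x, xp] / K[xp, x])

# dS/dt computed directly from S = -sum_x p_x ln p_x:
pdot = K @ p
dS = -np.sum(pdot * np.log(p))

assert EP >= 0.0                  # second law: each term is (a-b)ln(a/b)
assert np.isclose(dS, EP + EF)    # the decomposition is exact
```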
But if anybody comes to you with a perpetual motion machine proposal, what you can now do is say: OK, which step of this algebra do you think you're getting around? Now, I'm being a little bit glib. This is the second law of thermodynamics if your system is evolving according to a Markov chain, and so on and so forth. But in all those situations, and this could be the dynamics of a set of opinions on a social network, this could be the genes being expressed in a gene regulatory network, in all of them the entropy production will be non-negative if we are modeling them as a Markov chain. OK? So we have a question: could you please say again how you identify the two terms as entropy flow and entropy production? At this point, that's just a definition. To actually relate them to physics, I should be a little bit careful; when I say the second law of thermodynamics, I'll be getting to that in a little bit. Generically, if you are coupled to an infinite heat bath, and if you assume time-reversal invariance, what's called microreversibility, then the entropy flow term is actually going to be the heat exchange between the system and the heat bath. So what we're going to have then is, yes, if we look at this right here, it's hard to see from the side, but the S-dot-e term is going to be the heat exchanged between the system and the heat bath, and the other term is non-negative. So what this is saying is that the rate of change of entropy of the system is greater than or equal to the heat exchanged with the heat bath; it will all get divided by temperature and so on. That's the form of the second law of thermodynamics that you will find in your first-year textbooks. Question? Oh, how do we conclude that? Let's see. Um, this remote is weak. I tell you, we probably need a new battery in this thing. Oh, here we go. OK.
So, notice the first line: if A is greater than or equal to B, then log of A over B is the log of something that's greater than or equal to 1, so it's greater than or equal to 0, equaling 0 when A equals B. And conversely, if A is less than B, and all those quantities have to be non-negative because they come from a rate matrix, then you're instead taking the log of something less than 1, which is negative. So A minus B times the log of the ratio is either a negative times a negative or a positive times a positive. There are no other cases. So the end result is that every single term in there, no matter what A and B are, is non-negative. Does that work? OK, very good. So you are easier to please than my remote here is. Hold on a second; I have a question from Zoom, from Francesco. The question is: since you mentioned it, do we have a formulation of the entropy production rate for microscopic systems that are not Markovian? There are certain classes of them. Yeah, very interesting question, and that will be the topic of future lectures. We'll be getting to it. Basically, there is, well, I think I just broke this damn thing, there is what's called the inclusive Hamiltonian formulation of stochastic thermodynamics, where you actually assume a finite bath. The system evolves in an arbitrary non-Markovian manner. So it's actually more sophisticated, not only in the sense that it allows for non-Markovian dynamics, but in that it does not assume an infinite bath. And for a lot of the theorems of stochastic thermodynamics, there are transformed versions that apply in those kinds of situations. So the answer is yes.
And as it will turn out, hopefully I'll be able to present this next week, that's actually a very natural, clean way of formulating the stochastic thermodynamics of computational systems. As a heads up, if the interlocutor, the questioner, the person on the other end, has come across any quantum thermodynamics, any of what are called Kraus operators, partial-trace quantum operations, which I will be talking about Friday, I guess, that is basically a scenario where you've got non-Markovian dynamics. So the answer is yes, you can do that too. Oh, and also to give a plug: there's a paper I wrote with a collaborator, Jan Korbel, K-O-R-B-E-L, published in New Journal of Physics, I think about a year or so ago, in which we consider an arbitrary non-Markovian system and see how the laws of stochastic thermodynamics change. So they might want to look that up. In the case of arbitrary non-Markovian dynamics, life gets very complicated. So that's probably more than the questioner wanted to know. This is unfortunate; I seem to have broken the laser, transforming it from operating very weakly to not operating at all. OK, anyway, moving right along. How do I get back the screen? Like so. OK, so are we all good? All questions answered? OK, let's see. So let's now be very daring. Let's integrate over time. This is a differential equation, and we know that the entropy production term is non-negative. [More technical difficulties with the screen sharing.] OK, excellent. Sorry for all of the technical difficulties. This is where we left our intrepid band of explorers. Now we're going to be very daring: we're going to integrate this over time. And so what we get is this formula here, the integral over time.
Let's say we have a dynamics that sends some distribution p0 of x to p1 of x. Sorry, I've changed whether the subscript or the argument is position or time. But in any case, we have a dynamics that does this. We know that the entropy production is non-negative at every moment in time. So we can write down this formula. The change in the Shannon entropy of the system between p0 and p1, that is this term here, negative delta S. This one right there, that's the entropy flow, which, as we will see later on, can be identified with the heat flow, or more generally with flows to other reservoirs. And then there's this thing, delta sigma, which cannot be negative. OK. So, a simple example of this. Let's say the system is attached to a single heat bath at temperature T. I haven't really derived it yet, but that will be important for saying that this term right here is basically going to be the heat flow divided by that temperature T. Let's say there are two possible states, and let's say the initial distribution, this one here, is uniform: one half, one half over those two states. Let's say the process implements bit erasure. So the ending distribution here is going to be a delta function. You're taking two possible states with uniform probability and you're mapping them to a final state which is guaranteed to be, say, zero, no matter what you started with. That's the nature of this conditional probability right here. Well, that's what you call bit erasure: you've got a computational system whose bit can start as zero or one, uniformly probable, and you end up with it in a definite state. So a lower bound on the entropy flow, a lower bound on the heat flow out to the heat bath: that lower bound is attained if the entropy production is zero, if delta sigma is zero, and so we see that the entropy flow is lower bounded by the drop in entropy. That's what this formula right here is telling us.
It's lower bounded by the drop in entropy. And in this particular case, if you start at p0, what's the entropy of this particular p0? Log of two. What's the entropy of p1? Zero. Log of two minus zero is log of two. Rolf Landauer's conclusion from 1961. This is a formal way of deriving it. If you read Landauer's paper: he and Charlie Bennett and all these other people were doing their work when no stochastic thermodynamics toolbox was around. They only had equilibrium statistical physics. So if you read his papers, you will find essentially no equations, nothing like what I just presented. There are a lot of words, and frankly some of the words even contradict one another. So, fabulous intuition, but it wasn't really fully legitimate as a derivation. What you have just seen is legitimate as a derivation of what's called the Landauer bound: that if you are erasing a bit in a computer, it's going to cost you at least kT log two, if and only if there are two possible states and your initial distribution over them is uniform. Notice that if your initial distribution is not uniform, this is not going to be kT log two. You will be able to read many, many papers by very many bright people who just say bit erasure costs kT log two, full stop, paying no attention to what the initial distribution is. If you ever start reading such a paper, you do not need to finish reading it. So anyway, here is some much more careful analysis of these kinds of things: Juan Parrondo et al., 2015, and Takahiro Sagawa. I cannot sing the praises of this paper too much. It's called something like "logical irreversibility is not the same as thermodynamic irreversibility," which is true, and which is something that many people, especially in computer science, simply aren't sensitized to.
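As a sanity check on the numbers in this argument, here is a minimal sketch, assuming NumPy; the room temperature of 300 K is an assumed value for illustration.

```python
import numpy as np

# Landauer's bound from the integrated inequality: the entropy flow
# out of the system is at least the drop in Shannon entropy, so the
# heat dumped into the bath is at least k_B T times that drop.
kB = 1.380649e-23   # Boltzmann's constant, J/K
T = 300.0           # an assumed room temperature, in K

def shannon(p):
    """Shannon entropy in nats, ignoring zero-probability states."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Uniform two-state bit erased to a delta function: the drop is ln 2.
drop_uniform = shannon([0.5, 0.5]) - shannon([1.0, 0.0])
assert np.isclose(drop_uniform, np.log(2))
min_heat = kB * T * drop_uniform   # minimal heat to the bath, joules

# A biased initial bit costs strictly less than kT ln 2 to erase:
drop_biased = shannon([0.9, 0.1]) - shannon([1.0, 0.0])
assert drop_biased < np.log(2)
```

At 300 K the minimal heat comes out to a few times 10^-21 joules, the "teeny tiny" kT log two scale discussed in the lecture.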
They like to take this very simple bumper-sticker motto that if you're logically reversible then you're thermodynamically reversible. In fact, the two things have nothing to do with one another. Anyway, this traces back to some work by Hasegawa in 2010, and I reviewed and extended some of these results in an article in the journal Entropy in 2015. Okay, so, let's see. So the lower bound on entropy production is obtained only at equilibrium. A question: how can the system implement a bit eraser with an equilibrium protocol? Is it possible? Okay, very good question. I would direct the questioner to this paper right here. Hasegawa actually deserves the credit and doesn't get enough of it, in my estimation. People use what are called quasi-static relaxations. The idea is that you have an infinite amount of time to perform your bit erasure. You have no bounds, no constraints on what you can do to implement it, and you have an infinite amount of time. So it's just like when you compress a piston in your introductory thermodynamics text: you're doing it arbitrarily slowly, so during the entire process you're not generating any entropy production. The key is that at each moment in time you are, to first order, at equilibrium, exactly as the questioner is implying. If you do that, you can implement your bit erasure and get zero entropy production. If you are always at equilibrium, which in general requires an infinite amount of time and complete control over the Hamiltonian, with no constraints on your system whatsoever, then you can actually achieve the bound, but not otherwise. Okay? I thank the questioner, because this touches on a very important point. kT log two, with Boltzmann's constant in there, is a really small thing. It's teeny tiny. kT log two is like nothing.
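The quasi-static argument can be made concrete with a toy model (my own construction, not from the lecture): erase a bit by slowly raising the energy of one of two states while the system re-equilibrates with the bath after each small step. The total work approaches kT ln 2 from above as the number of steps grows, so the excess over the Landauer bound, which is the dissipated work, vanishes only in the infinite-time limit:

```python
import math

def quasistatic_erasure_work(steps, e_max=30.0):
    """Work (in units of kT) to erase a bit by raising the energy of
    state 1 from 0 to e_max in `steps` increments, letting the system
    re-equilibrate with the bath after each increment. As steps grows,
    the work approaches ln 2 from above (the Landauer bound)."""
    work = 0.0
    e = 0.0
    de = e_max / steps
    for _ in range(steps):
        # Boltzmann occupancy of state 1 at the current energy (kT = 1).
        p1 = math.exp(-e) / (1.0 + math.exp(-e))
        work += p1 * de  # work of raising the level while it is occupied
        e += de          # then the system relaxes to equilibrium at the new e

    return work

for n in (10, 100, 10_000):
    print(n, quasistatic_erasure_work(n))  # decreases toward ln 2, about 0.693
```

The faster the protocol (fewer steps), the more work is dissipated beyond kT ln 2; only in the quasi-static limit does the entropy production go to zero, which is the answer given to the questioner above.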
Your brain: there are all kinds of reasons to believe that natural selection did everything it possibly could to optimize the energetics in your brain. And there are all kinds of reasons to believe that natural selection did all it could to optimize the energetics of things like protein translation in cells, and so on and so forth. These are all processes that actually require many orders of magnitude more than kT. What's going on? If we can get down to kT log two, and natural selection says it's crucially important for fitness to be able to reduce the energetic costs, to reduce the amount of heat flow, because heat is ultimately an energetic cost, what's going on? How are these two perceptions consistent with one another? Here's the key, and it's the questioner's question; I need to say something a little bit more evocative than that. The answer of how you get entropy production to equal zero is just like that piston, where you have an infinite amount of time to very, very slowly compress it. Piston compression is essentially bit erasure. That is not what's going on in the brain, because in the brain there are massive constraints on the way that it can work. It's got these spiking neurons connected to one another in this vastly messy system. It's got to do its computations fast. If a tiger starts walking in that door, I don't have time to adiabatically, slowly compress a piston to figure out that I've got to go out that door. I've got to do it much, much faster. And I've got to be processing all these other things at the same time. And most important of all, I've got extremely powerful constraints and restrictions on what I can do. I can't just push a piston closed and slowly push the piston the other way.
That's assuming a particular form of the Hamiltonian that I have complete control over. That's not the case when you've got something like a ribosome that is translating nucleotide sequences into amino acid sequences. It's not true in the brain. Natural selection could not find a way to exploit processes that get anywhere close to kT log two. All of the true thermodynamic costs in our world arise from the constraints, which are causing the EP to be many, many orders of magnitude greater than the Landauer bound. Here is another set of PhD theses. Essentially nothing is known right now about the relationship between constraints and minimal entropy production. If I can only do things with a chemical network, if I can only do things with spiking neurons and the way they actually operate in terms of ion pumps and so on, if that is the only set of tools I have, then that is a set of constraints. What is the minimal entropy production that will be generated if I perform a particular map from a p0 to a p1, given that I have to use those constraints and I've got to do it in finite time? A very easy question to state. Nothing is known about it right now, and the answer to that question is what's driving the thermodynamics of every interesting system. Okay, off the soapbox. It's 9.55; I'm running long, but what else is new? Let me just try to finish up what's supposed to be this first lecture with a little bit on what are called the fluctuation theorems. And by the way, to emphasize again: so far I've been using the language of physics and thermodynamics, heat flow and so on and so forth, to describe the theorems. But that proof of the second law that I gave does not rely at all on there being energies, on it being a thermodynamic system. Again, it applies to an ecosystem or to a gene regulatory network just as well. Okay, here are some more theorems that also apply very, very generally.
These are what are called the fluctuation theorems. You may have heard of the Jarzynski equality, Crooks' theorem, and so on and so forth. Those are all examples of fluctuation theorems. This is one of the, I think, more profound results to have come out of stochastic thermodynamics. Up at the scale on which we all live, if somebody were to take a movie of what's going on in this room and then show it to somebody else, run forward or backward, and ask that person whether the movie is going forward or backward, they would always be able to tell. That's the second law at our level. If you go down to the microscopic level, famously, the microscopic laws of physics are time symmetric. You've got billiard balls in elastic collisions bouncing back and forth off one another. Show a movie: you can't tell if it's going forward or backward. How to reconcile these? That question traces all the way back to Boltzmann. Modern thinking is that ultimately it has to do with cosmology and the Big Bang and so on. But here's the thing. If up at our scale, at the macroscopic thermodynamic-limit scale of things, you can always tell which direction the movie's going, and down at the micro scale you can never tell which way the movie's going, there must be a middle, a mesoscale, where it's not really clear: it might be going forward, it might not. Sometimes you can tell, sometimes not. It just stands to reason. Or more generally, the question is: what is going on at the middle scale? Can one say something, and it is sometimes phrased this way in the titles of papers, about the length of the arrow of time based upon the scale of your system? The answer is yes, and that is done by the fluctuation theorems. So, just to remind everybody, we've got a time-varying master equation. The time derivative of the entropy is equal to the EF plus the EP.
Notice that everything here concerns expectations over the trajectories of the system. Entropy, for example, is the expectation of negative log of p. And the heat flow is also going to be an expectation over all trajectories of the dynamics of the system and the heat bath. This is actually arguably one of the first major insights of stochastic thermodynamics: you can define all of the quantities of thermodynamics at the trajectory level, not just at the expectation level. So for example, there's what's sometimes called stochastic entropy, which I think is a horrible name; it should be called trajectory-level entropy. But basically, let's call it the trajectory-level entropy. If the system is in state i at time t, it's just negative log of p sub i of t. Its expectation is the Shannon entropy, but this is perfectly well-defined for a single state. And moreover, p sub i of t can vary with t. So I can have a dynamic process where the distribution is changing: my system is jumping, say, from one state i to another state j to another state k, and moreover the underlying distribution over all those states is changing with time. That way I can trace the trajectory-level entropy along the trajectory as it evolves. The average of that, the expectation over all those trajectories, evaluated at the beginning of a process and at the end, gives you the difference of the Shannon entropies. But I can look at the level of an individual trajectory. Okay? So, a theorem. I'm not going to prove it here; there are many, many ways to prove these results, and they apply in an extraordinarily broad range of scenarios. These are called integral fluctuation theorems. This right here, which should actually be a capital sigma, is the entropy production of the system. This expectation is over all trajectories. And this is an equality; it's not an inequality like the second law.
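As a small illustration of the trajectory-level quantity just defined (a sketch of mine, not from the lecture), the stochastic entropy of a single state i under distribution p is -ln p_i, and averaging it over states sampled from p recovers the ensemble-level Shannon entropy:

```python
import math
import random

def trajectory_entropy(p, i):
    """Trajectory-level ('stochastic') entropy -ln p_i for a system
    found in state i under distribution p. Defined for a single state,
    not only for the ensemble."""
    return -math.log(p[i])

p = [0.7, 0.2, 0.1]

# Exact check: the expectation of the trajectory-level entropy over
# states is the Shannon entropy of p.
shannon = -sum(pi * math.log(pi) for pi in p)
expected_s = sum(p[i] * trajectory_entropy(p, i) for i in range(len(p)))

# Monte Carlo check: sample states from p and average.
random.seed(0)
samples = random.choices(range(len(p)), weights=p, k=100_000)
estimate = sum(trajectory_entropy(p, i) for i in samples) / len(samples)

print(shannon, expected_s, estimate)  # the three agree (the last only approximately)
```

The point of the definition is that `trajectory_entropy(p, i)` is meaningful for one state on one trajectory, even though you still need the underlying distribution p to compute it.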
It's true in almost every CTMC; it's amazing in how wide a range of situations it holds. If we take this and apply Jensen's inequality (we've got an expectation of a convex function), what you get is that the expected value over trajectories of the EP, that sigma there, is non-negative, which is the second law. But the fluctuation theorem is more powerful than that. What it allows is a non-zero probability that, along any single trajectory, the EP is less than zero. So here is the answer to that paradox which has been around since Boltzmann. Here is the entropy production along a particular trajectory. Eraser? No eraser; I guess I'll use my hand. Here's the EP along a particular trajectory, and here is the probability of that trajectory. This is the world that we live in: it's a delta function, or more generally there's almost zero probability down here. As I say, the integral fluctuation theorem says, modulo things like kT (or, if I write it out without bra-ket notation, it's the average over trajectories of e to the minus the entropy production along that particular trajectory; I haven't actually defined that for you, I'm just talking in reasonable terms right now), that this right here is equal to one exactly. One of the papers that I've promoted very, very hard, by Esposito and Van den Broeck from 2015, with the word "ensemble" in the title, actually derives this. Up at the thermodynamic level, the way we achieve that equality is by having a non-negative EP along essentially every trajectory you can ever see. That's why you will always be able to tell whether the movie's going backward or forward. Okay, let's see. A color chart, what a concept.
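Here is a toy numerical check of the integral fluctuation theorem (my own illustration; the Gaussian assumption on the EP distribution is mine, chosen because a Gaussian that satisfies the IFT must have variance equal to twice its mean):

```python
import math
import random

# Toy check of the integral fluctuation theorem <exp(-sigma)> = 1.
# Assume (for illustration only) that the trajectory-level entropy
# production sigma is Gaussian. Since <exp(-sigma)> = exp(-mean + var/2)
# for a Gaussian, the IFT forces var = 2 * mean.
random.seed(1)
mean_sigma = 2.0                       # average EP, in units of k
std_sigma = math.sqrt(2 * mean_sigma)  # variance fixed by the IFT

n = 200_000
sigmas = [random.gauss(mean_sigma, std_sigma) for _ in range(n)]

ift = sum(math.exp(-s) for s in sigmas) / n
frac_negative = sum(s < 0 for s in sigmas) / n

print(ift)            # close to 1, as the IFT demands
print(frac_negative)  # a non-zero fraction of "backward" trajectories
```

Note the two facts side by side: the average EP is positive (the second law), yet a finite fraction of individual trajectories have negative EP, which is exactly the mesoscale picture being drawn on the board.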
When we go down to the microscopic scale, we can't tell if we're going backward or forward; we get this. At the mesoscopic scale, you get something more like this. So at the mesoscopic scale there's always some probability that the movie will go backwards. The second law still holds in expectation: the expectation value of this distribution is always going to be positive, and it's always going to obey the integral fluctuation theorem. But as we go from the micro scale to the macro scale, the actual shape of the distribution changes. This has all been experimentally verified. This is where those boring experiments involving colloids and pulling on RNA and so on arise. That is the resolution of this deep paradox that's been around for a century and a half, okay? So, David, I have a general question. When you consider the entropy, the Shannon entropy, this is a statistical quantity. It's an ensemble quantity that you define over an ensemble. Ah, but right here it's not. Yeah, this is a single point, a single trajectory. Well, yes, but it depends on the probability of state i at time t, which is essentially also a statistical quantity, an ensemble quantity. So can you interpret this as an information cost, a coding cost? That's an interesting point. I'll be talking about this, I guess, on Monday: Turing machines, algorithmic information theory, Kolmogorov complexity. What motivated a lot of the work on Kolmogorov complexity, work by Chaitin and Solomonoff and Levin and so on and so forth, was the following. So far we've got a quantification of the uncertainty, or to view it another way, the complexity, in a probability distribution. They wanted to come up with such a measure for a singleton, for a single particular object, in their case a computational string.
You could view this somewhat as another kind of solution to that issue, in that it's a measure of the uncertainty, essentially, along a particular trajectory. You still need an underlying probability distribution, but you're not necessarily taking the average of your quantity under that distribution. You've been able to define it down at the level of a singleton. So basically the insight was that the Shannon entropy is just the average of a quantity. When the AEP was presented earlier, for example, it was derived from the law of large numbers by simply saying that the negative log of the empirical likelihood converges, under IID sampling, to its average, and the average of that log is just the sum over states of p times the log of p. So it's essentially the same kind of argument. Okay, there is a question in the chat, yes. Is there an equivalent interpretation at the path level of the Landauer principle? Is there an equivalent interpretation at what level? Of the Landauer principle, at the path level. Oh, well, as I mentioned before, the Landauer bound can only be achieved if you've got a quasi-static process that takes an infinite amount of time. And if you're doing something like that, then for every single one of the trajectories, remember the initial distribution was uniform, so the term, the stochastic entropy written right there, negative log of p sub i at time zero, has probability one half no matter which of the two states you're in. So this right here is just going to be log two no matter which of the two states you're in at the beginning of your trajectory. By the end of the trajectory, where we have the delta function with probability one, and in that particular case, well, really it should be at time infinity, this right here is going to be zero.
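The law-of-large-numbers argument just mentioned can be checked numerically (a sketch of mine, with an arbitrary three-symbol distribution chosen for illustration): the per-symbol negative log-likelihood of an IID sample converges to the Shannon entropy of the source, which is the AEP in its simplest form:

```python
import math
import random

# AEP-style check: for IID samples x_1..x_n from p,
# -(1/n) * log p(x_1, ..., x_n) converges to the Shannon entropy of p.
random.seed(2)
p = [0.6, 0.3, 0.1]
shannon = -sum(pi * math.log(pi) for pi in p)

n = 100_000
xs = random.choices(range(len(p)), weights=p, k=n)
per_symbol_nll = -sum(math.log(p[x]) for x in xs) / n

print(shannon, per_symbol_nll)  # nearly equal for large n
```

So the ensemble quantity (Shannon entropy) emerges as the average of a singleton-level quantity (negative log-probability), just as the trajectory-level entropy averages to the ensemble entropy.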
So the stochastic entropy along a trajectory, the trajectory-level entropy, in the case of Landauer's bound will start at log two and end at zero for every single trajectory. And if you're always in equilibrium, they'll all transition from log two down to zero. It depends on what's called your quench and on the quasi-static process; it depends on the details whether there's actually more than one possible value of the trajectory-level entropy at any particular moment in time while you're achieving Landauer's bound. But at the beginning and at the end, all trajectories have the same value. All right, I think I'm almost done with the first lecture, only running half an hour behind. Let's see: the movie runs backward, as we said. And that's it, yep. So let's all take a five-minute break and then reconvene. As always, I'm running a little bit long, so we'll see how far we can get in the second half of today's lecture.