So originally, the lecture I just gave was going to be the end of one day, and this was going to be the beginning of the next day. So I am now, in effect, reviewing what we went over "yesterday," and I'll go through it quite quickly. Again, we're looking at the simplest possible differential equation concerning the evolution of a probability distribution in time. Examples are the dynamics of a Turing machine, or a digital gate in a circuit; these examples, compared to the ones I gave an hour ago, are getting much, much closer to computer science. Also ligand receptors in a cell membrane sensing an external medium, organelles in a cell interacting with one another, and so on and so forth. Again, recall that we decomposed the time derivative of the Shannon entropy into an entropy flow rate and an entropy production rate. And physically, I claimed (I didn't actually derive it) that the entropy flow rate is what arises, in thermodynamic scenarios, from a system of interest (SOI) exchanging energy with one or more baths, and that this implicitly assumes each bath is infinite, with a separation of time scales, as I mentioned. Tomorrow, actually, Goje will be presenting the version of stochastic thermodynamics where you've got a finite bath, so that this doesn't apply and you don't even have a Markov chain. But in any case, when you do have this kind of situation, which is a standard assumption in equilibrium statistical physics, namely an infinite bath with what's called separation of time scales, what this means, loosely speaking, is that the bath is always thermalized: the heat bath is always itself at thermal equilibrium, and it gets back to that equilibrium much, much faster than the evolution of your system of interest. The end result is that if the bath is always at thermal equilibrium, it has no information in it about its previous interactions with the system of interest. That is what allows the evolution of the state of the system of interest to be modeled as a Markovian process. If there were not a separation of time scales, if the bath were going through its dynamics at the same speed as the system of interest, then even if the bath were infinite it wouldn't matter: the state of the bath would carry some information about its earlier interactions with the system of interest. So the current state of the system of interest would tell you something about the state of the bath, the state of the bath would tell you something about the previous state of the system of interest, and therefore the current state of the system of interest would actually reflect earlier states of the system of interest, which violates the Markovian-dynamics assumption. So that's a very hand-wavy intuition for why separation of time scales is necessary for us to be able to assume Markovian dynamics when we take this kind of thermodynamic view of things.
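To make this review concrete, here is a minimal numerical sketch, not something from the lecture itself, of the master-equation picture and the decomposition just described; the rate-matrix convention, the function names, and the two-state numbers are my own choices.

```python
import numpy as np

def entropy_rates(K, p):
    """K[i, j] = rate of jumping from state j to state i (i != j).
    Returns (entropy production rate, entropy flow rate)."""
    n = len(p)
    ep = ef = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                ep += K[i, j] * p[j] * np.log((K[i, j] * p[j]) / (K[j, i] * p[i]))
                ef += K[i, j] * p[j] * np.log(K[j, i] / K[i, j])
    return ep, ef

# Two-state example with made-up rates and a made-up distribution
K = np.array([[0.0, 2.0],
              [1.0, 0.0]])
p = np.array([0.7, 0.3])
pdot = np.array([K[0, 1] * p[1] - K[1, 0] * p[0],
                 K[1, 0] * p[0] - K[0, 1] * p[1]])
dS_dt = -np.sum(pdot * np.log(p))   # time derivative of the Shannon entropy
ep, ef = entropy_rates(K, p)
print(dS_dt, ep + ef)               # these two numbers agree; ep is always non-negative
```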
Alright. As we went through last time, and we went through the proof of it, the entropy production rate is non-negative, and I argued, argued rather than proved, that that is basically the second law of thermodynamics, at least for these kinds of situations where you have Markovian dynamics. We then integrate over time, with the distribution going from P0 to P1, and this gives you Landauer's bound: the total entropy flow can be written as the difference between the entropy production and the change in entropy. One subtlety to note, and now we're starting to get into some new things: Delta S only depends on the initial and final distributions. Anything is allowed in between the two. That is not true for either the heat flow or the entropy production. So for example, in bit erasure with a uniform initial distribution, the drop in entropy is log 2, which gives the familiar kT log 2 for the minimal heat flow. What actually happens with the heat flow and the entropy production depends completely on the way that you implement that bit erasure: that's the difference between a quasi-static evolution when you are erasing a bit versus something that's horribly messy and inefficient with all kinds of turbulence. So now I'm going to generalize this. In lots of physical scenarios, your system of interest is not actually connected to a single bath. It can be connected to multiple particle reservoirs; a heat engine, at a minimum, is connected to two baths, a hot one and a cold one, and it's the difference between those two temperatures that actually makes it a heat engine. But in everything I was doing before, it wasn't clear where that would come in, the fact that you can have multiple reservoirs, some hot and some cold. Here is how to do it. What we do is we index all the reservoirs, all the baths, by this v; v is just the symbol that I'm using here. And we decompose the global rate matrix into a sum over all the reservoirs of reservoir-specific rate matrices. The intuition is going to be that each reservoir wants to push the system toward a particular state that's appropriate for that reservoir. So for example, a heat bath at temperature T is going to want to push the system to equilibrium, a Boltzmann distribution for that particular temperature T. If you have another bath connected to your system at a temperature T prime, it's going to want to push the system to a Boltzmann distribution for temperature T prime. It's the sum of those two pushes, so to speak, that gives you the overall rate matrix governing the system. The system has got two masters in that particular case. And that is reflected here in the fact that we have a sum over reservoirs. Okay, good. So physically, what is the entropy flow? I've said a bunch of times that it is the heat flows, but let's see if we can decompose that and put a little bit more flesh on that statement. And by the way, I am trying to rush; I'm not going to be able to make it in just 10 minutes, but hopefully it'll be not too much longer than that. So anyway, here's the question: what is the entropy flow? Here is where physics comes in. We're going to assume that the underlying microdynamics is time-reversal invariant. This is what is called local detailed balance, local because it's reservoir-specific.
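As a sketch of this "two masters" picture (again my own illustration rather than anything on the slides): each reservoir gets its own rate matrix obeying detailed balance at its own temperature, and the full dynamics is their sum. The energies, rates, and temperatures below are made-up numbers.

```python
import numpy as np

E = np.array([0.0, 1.0])  # made-up energy levels

def reservoir_rate_matrix(E, beta, gamma=1.0):
    """Off-diagonal K[i, j] = rate j -> i, chosen so that this reservoir's
    matrix obeys detailed balance with respect to its own Boltzmann
    distribution: K[i, j] * exp(-beta*E[j]) == K[j, i] * exp(-beta*E[i])."""
    n = len(E)
    K = np.array([[0.0 if i == j else gamma * np.exp(-beta * (E[i] - E[j]) / 2.0)
                   for j in range(n)] for i in range(n)])
    K -= np.diag(K.sum(axis=0))  # columns sum to zero, so probability is conserved
    return K

K_hot = reservoir_rate_matrix(E, beta=0.5)    # hot bath
K_cold = reservoir_rate_matrix(E, beta=2.0)   # cold bath
K_total = K_hot + K_cold                       # the system serves "two masters"
# Each piece on its own would relax the system to its own Boltzmann distribution;
# their sum generally drives it to a nonequilibrium stationary state in between.
```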
If you have heard the phrase detailed balance before: this is the formula for the equilibrium state of a system if there were just a single reservoir, and it obeys detailed balance. It's basically saying that at that equilibrium, the net flow of probability between a state j and a state i is zero; there is no net probability going in either direction. Local detailed balance says that that's got to be true for each reservoir separately. Each reservoir separately connects physically to the system through a particular interface that doesn't know anything about the other interfaces. So just that one particular reservoir on its own would send you to equilibrium for its particular temperature and so on and so forth; it obeys detailed balance, and therefore all of them separately obey detailed balance. Does that make sense to people? So as I said, the stationary state for each of the separate reservoirs is just a Boltzmann distribution; here I've extended it to allow for chemical potentials. And then what you see when you plug this in is that the entropy flow term for this particular case ends up being this expression: for each reservoir, one over the temperature of that reservoir times (if we ignore the chemical potential terms) the sum over states of the energy of each state times the rate of change of the probability of that state. So the heat flow itself is the change in probability, that K times p term, weighted by the energy of each state and summed over states. If I write the total energy of the system, the expected energy, as the sum over states i of p_i(t) times E_i(t), then its time derivative is going to have two terms. This one is the heat flow, because this right here, p_i dot, remember, is that K^(v) p term for that particular reservoir, and then we sum over all the reservoirs. So the entropy flow term comes from the first term, where you hold the energy levels fixed and change the probability distribution. Physically what's going on is you're exchanging energy with that particular reservoir. That exchange of energy changes the state you are in; it's not changing the energy levels of the states, it's changing the state that you're in. And that's what entropy flow is; that's what it means, and that's what local detailed balance is all about. This other term right here: what you're considering is, for a fixed probability distribution, what is the consequence if the energy levels of each state are changing? Can anybody guess what this corresponds to in a standard physics system, where you're not changing the probability distribution, but you're changing the energy levels of the various states? This is work. This is what you get if you have an external work reservoir, and there are all kinds of subtleties as to how you would actually model a work reservoir; typically it's an infinite system as well, but I'll skip the details. Basically the work reservoir accounts for this second term in the change of the expected energy. All the other reservoirs involve exchanges of energy or particles or things like that, and those exchanges fluctuate. The work term, by contrast, is very often taken as a deterministic change: no fluctuations, no exchanges. It's like changing a magnetic field applied to the system or something like that.
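Writing out the decomposition just described, in my own notation (a sketch consistent with what was said, not a transcription of the slide): with the expected energy written as the sum over states of p_i(t) E_i(t),

```latex
% First-law split of the expected energy (my notation, sign conventions may differ):
\frac{d\langle E\rangle}{dt}
  \;=\; \underbrace{\sum_i \dot p_i \, E_i}_{\text{heat } \dot Q:\ \dot p_i = \sum_v [K^{(v)} p]_i,\ \text{levels fixed}}
  \;+\; \underbrace{\sum_i p_i \, \dot E_i}_{\text{work } \dot W:\ \text{levels change, distribution fixed}}
% and the entropy flow rate is the temperature-normalized, reservoir-resolved heat term:
\dot S_{\mathrm{flow}} \;=\; \sum_v \beta_v \sum_i [K^{(v)} p]_i \, E_i
```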
So, assuming we're not down at the level of quantum field theory, there's nothing being exchanged under those particular circumstances. Okay, notice also that the heat flow term is a sum over reservoirs, each with its own beta. So if you have multiple reservoirs at different temperatures, then this is going to be the sum over those reservoirs of the temperature-normalized heat flow terms. Okay, let's see. So now let's return to the lower bound, which was just Landauer's bound, in the case of a single reservoir at temperature T and just two states and so on. But now we have multiple reservoirs, and a single kT is not even defined. As soon as you're talking about multiple reservoirs, we can't even say something like "bit erasure costs kT log 2"; there's no single T, there are multiple reservoirs. Let's kick it up: allow an arbitrary number of states, an arbitrary initial distribution, arbitrary dynamics p of ending state given starting state, and let's assume that local detailed balance does hold. Then the entropy flow is, as we just saw, the temperature-normalized heat flow into the reservoirs; that's what this thing is. The EP is non-negative, so we get the generalized Landauer's bound: the temperature-normalized heat flow into the reservoirs is greater than or equal to the drop in entropy. You can't have any kT's around, because you don't even have a single T. And you'll see that phrase, generalized Landauer's bound, often in the literature referring to this result. Okay: dependence of EP on the initial distribution. Hold on, let me think about this for a moment. There's actually a lot of non-trivial material here, and I think it's important material as well. Tomorrow, if this is okay with you, I'll start by going over the dependence of EP on the initial distribution. The basic idea behind the rest of all this material is that everything in stochastic thermodynamics, conventionally, everything that I've presented to you so far, considers the following scenario, at a high level. We know the initial distribution of the state of the system; we want to analyze, to investigate, what happens if we change the dynamics of the process that then operates on that initial distribution. So for example, can we achieve Landauer's bound with bit erasure, and there are similar questions with Maxwell's demon and so on. Or I've got some distribution over possible states of a ligand receptor in a cell; depending on how the ligand-binding process actually works, you will get different kinds of stochastic thermodynamics. But in many situations, it's not the initial distribution that's fixed while you, the scientist, get to vary the process. Very often it's the process that's fixed, while the initial distribution varies. So for example, if I'm building my computer, it's got a whole bunch of digital gates in it; a whole bunch is actually a really big number, by the way. Those gates, all the AND gates in this computer, are actually all running the same thermodynamic process. They all came out of a fab somewhere in Taiwan and they are all, for all intents and purposes, identical physical systems. But the initial distribution over the inputs to all of those AND gates is going to be very, very different from one gate to another, every time that this computer is run. They're all in very, very different positions, so to speak, within the configuration of the entire information flow through the computer.
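In symbols, again in my own notation rather than the slide's: write Q_v for the total heat that flows into reservoir v over the course of the process, beta_v = 1/(k T_v), sigma for the total entropy production, and Delta S = S(P1) - S(P0). Then

```latex
\sigma \;=\; \Delta S \;+\; \sum_v \beta_v Q_v \;\ge\; 0
\qquad\Longrightarrow\qquad
\sum_v \beta_v Q_v \;\ge\; S(P_0) - S(P_1)
% The temperature-normalized heat flow into the reservoirs is bounded below by the
% drop in Shannon entropy; with a single reservoir this reduces to Landauer's bound.
```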
So the question for this computer really is: if I have the same fixed physical process, what are the consequences of varying the initial distribution? Rather than: if I've got a fixed initial distribution, what are the consequences of varying the process, which is where physics has mostly been focused. This is also true a lot in biology. I could, so to speak, design a paramecium so that it's running a particular process which is optimized for some initial distribution of states of its environment. But if I then take that paramecium and drop it in a different pond, it's now in a different environment, the same paramecium, or an offspring of that paramecium, or what have you, but it's now got a different initial distribution of nutrients and so on in its environment that it must operate on. What are the consequences when you have a fixed system and change the distribution? Okay, just as an analogy, I think it might be helpful; I also didn't think about this before. When we gave the earlier lectures, for example in information theory, one of the things that we discussed was the channel capacity-cost problem and the rate distortion problem. And if you remember, the questions that we asked included setups like: you fix the channel, right? The channel is the same, the inherent noise in the channel is not changing, but what you're trying to do in the channel capacity-cost problem is to find the source distribution p of x that maximizes the mutual information across the channel, which gives the channel capacity. So these are actually really, really close problems; two days ago we discussed this in the realm of information theory, and now we are taking that same kind of problem and extending it to physical systems where we can use stochastic thermodynamics. Okay, so they're also closely related. Throughout computer science and information theory, all of these questions are about fundamental bounds on resource costs. And to give you a very, very quick teaser of what the end result will be: every physical process has an initial distribution which results in minimal EP for that process. In general, that's not going to sit at an edge of the simplex; it's going to be something with full support. Let's call it Q0. So you give me a physical system, an AND gate, a paramecium, what have you; there is some initial distribution over its states which will result in that process generating minimal entropy production. But let's say I instead put it in a situation where it's got an initial distribution P0 rather than Q0. Building on what Gulda was just saying: recall that in information theory, relative entropy, the Kullback-Leibler divergence, is all about the extra expected code length if you use a code that's optimized for a distribution Q0 but use it with a distribution P0. Here is a similar thing, but we're not looking at code length. We're saying that Q0 is the distribution that results in minimal EP, but you instead use P0. We evolve both of them through time through the exact same process: P0 evolves through this process to P1, and Q0 evolves to Q1. Then what happens is that, by using P0 rather than Q0, there is an extra term in your entropy production: the drop in KL divergence, D(P0 || Q0) minus D(P1 || Q1). Notice that this is completely independent of the details of the process; all of those details are buried in the map taking P0 and Q0 to P1 and Q1. Nothing else matters.
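Here is a small numerical sketch of this mismatch-cost statement (my own toy code, anticipating the AND-gate numbers that come up in a moment); the function names and the particular choices of Q0 and P0 are just for illustration.

```python
import numpy as np

def kl(p, q):
    """KL divergence D(p || q) in nats."""
    return float(np.sum(p * np.log(p / q)))

# AND gate over the four inputs (00, 01, 10, 11); the output is 1 only for input 11.
def and_gate(p_in):
    return np.array([p_in[0] + p_in[1] + p_in[2],   # P(output = 0)
                     p_in[3]])                       # P(output = 1)

Q0 = np.array([1/3, 1/3, 1/4, 1/12])   # input distribution the gate was "designed" for
P0 = np.array([1/4, 1/4, 1/4, 1/4])    # input distribution it actually sees
Q1, P1 = and_gate(Q0), and_gate(P0)

mismatch = kl(P0, Q0) - kl(P1, Q1)     # extra entropy production, in nats
print(mismatch)                         # non-negative, by the data processing inequality
```

For these toy numbers the extra entropy production comes out to roughly 0.007 nats; the point is only that it is non-negative and depends on nothing but the two initial distributions and the gate's input-output map.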
This, by what's called the data processing inequality for Kullback-Leibler divergence, is always greater than or equal to zero. So I take an AND gate that's got four possible inputs, and it was made in the fab such that it results in minimal entropy production for some particular distribution over its inputs, say one-third, one-third, one-quarter, and one-twelfth. But then I actually put it in a place in my computer where it gets an initial distribution over its input states that's different from that, say one-quarter, one-quarter, one-quarter, one-quarter. That's going to result in extra heat. That AND gate is going to get hotter by an amount that's just given by this drop in KL divergence. The details don't matter: it's an AND gate, so we know what the map from P0 to P1 is; you're doing an AND operation. So however that AND gate is implemented, it's going to incur this much extra heat. So that's a teaser, so to speak, for tomorrow. I'll try to actually derive it; it's not too hard to do. Please. So again, just continuing with this kind of analogy. We know that, for example, in information theory, we emphasized that you have an object that you want to encode, and the entropy is a quantifier of the number of bits that you need to encode that object, right? But we also said that if you assume the underlying distribution of that object is not P but Q, which is the wrong, incorrect distribution, you need to pay a mismatch cost of a KL divergence. But that was one term of a KL divergence. Now, as we have on the slide, we have two terms of a KL divergence. When you encode an object, what you're doing is rather static: you're considering just one mismatch term, one KL divergence term, at one time. But what we have here is more than just encoding for communication; you're running a process, you're computing, so you have two terms of KL divergence. So I think this also strengthens the analogy, but it also underlines the difference between just encoding for communication and running a process for computation. Yeah, and building on this very good point: remember this morning I said that computation is information transformation, not just information transmission, exactly along the lines of the point that Gilger was just making, which we saw in her review. That single KL term is the kind of cost you get for information transmission when you happen to be off in the way that you've designed your channel. This difference of two KL terms is the somewhat analogous cost, an entropy production rather than an expected code length, that you run into when you're actually doing transformation. Okay, so let's end it there. Coffee's upstairs, and we can never get too much coffee. Thank you, everybody.