Second day of the spring college. I think we are ready to start. David, the floor is yours. Okay, thank you very much. So again, I'd like to emphasize to everybody: I'm learning while you're learning, and to help both of us be learning, ask questions all the time. Give me feedback: you're going too fast, you're putting me to sleep, you're going too slowly, everything like that. All information is good information; if you're a good Bayesian, you can just ignore data in the worst case. So, anyway. Okay. In retrospect, actually, I'm not sure; maybe today's lecture should have gone before yesterday's, they sort of go together. Yesterday Gioja reviewed information theory, so she presented you a bunch of mathematical tools. But in a very real sense there's still the question: why do we care about information theory? Why do we even care about probability theory, for that matter, if we're doing physics? Schrödinger's equation... well, the Born rule has got probability in it. But actually, if you adhere to the Church of Everettian many worlds, of which I am a card-carrying member by the way, then there's not even really probability in the Born rule in quantum mechanics. There's no probability in general relativity. What the hell is this probability crap that we're forced to learn, along with all these other things, when we are undergraduates, and so on and so forth? Why even probabilities in the first place? So I'm going to start by actually giving a set of arguments.
These are still controversial in the literature, but there's a bunch of arguments that say that, according to very, very simple desiderata, you actually have no choice but to use probability theory to quantify uncertainty. I will then give other kinds of first-principles arguments that say: if you just want a single real number that's going to capture and quantify the amount of uncertainty you have, which we just determined is expressed in terms of probability theory, what do you want to use? And those arguments say: entropy. And then I'm finally going to talk a little bit about Bayesian statistics, and say, okay, we've decided we want to use probability theory, we've decided we want to quantify things with uncertainty; how is it that we should actually say "here is my amount of uncertainty about the system, here is my best guess for the actual state of the system"? And by doing that, with no discussion at all about heat baths or any of that other kind of weird stuff, nothing about ergodic theory (which, by the way, between you and me, is not very pleasant stuff), without any of that, derive all of statistical physics. And that's basically Ed Jaynes' 1957 paper. It's a beautiful paper. Two papers I've indirectly referenced: first, Everett's many-worlds paper.
It was actually his PhD thesis. It's something like 15 pages long in Physical Review, very easy to read, and it basically solves a whole bunch of problems that people at the time weren't even really sure how to think about. That's one, and then the other one is the 1957 paper by Ed Jaynes. Okay, so on we go. In many physical scenarios, arguably in every single physical scenario (it's kind of strange not to be looking up at my screen; you can get risky and try this), we don't actually know all the details. We know all the relevant laws of physics, or at least we think we do; we're cocky physicists, we know everything, we know the laws of physics. But if I give you a particular physical situation in front of you, you will not know all of the parameters to infinite precision. That's almost axiomatically true; or at least you can say, at a minimum, that's very, very often true. So we are uncertain, in particular, about the initial state of any system. We know its dynamical laws, but we're uncertain about its initial state. So, well, what do we do? We are physics grad students. Da-da-da-da-da, the only thing that we know how to do in life is physics. It seems we've done nothing else for the past six, seven, eight years, and we'll do nothing else, to be quite honest, for the next many, many years. So that's the only thing we know how to do. We don't have a life, you know; we're not human beings, we're physicists, and all we know how to do is physics. We know the laws of physics, but we don't know the initial states to plug into our brilliant brains to calculate the future. What do we do? We're stuck. We're physicists, we've got to be able to calculate, but how do we calculate?
Okay, so what we need to be able to do is figure out how to do physics when you have what's called uncertainty. All right, so the first thing we've got to do is figure out how to quantify that uncertainty. One natural thing to think is that we have a function (oops, that's not right; sorry, it's the first time using this pointer) from all the possible events, all the possible values of the parameters and so on and so forth, to the real numbers. What do we want that function to be? This is a way of trying to mathematize this question of how we deal with uncertainty. And now, as I'm going to be demonstrating, or sketching, there are many different sets of desiderata that all say: use the axioms of probability theory, Kolmogorov's axioms of probability theory. If people don't know what those axioms are, if they don't know what a sigma algebra is and things like this, all these fancy, very mathy, cool-sounding words: just Wikipedia it. They are very, very simple concepts. It's several desiderata about intersections, unions, real-valued functions and so on, and they basically are derived as a way to deal with uncertainty, and I'm going to sketch out a little bit of that. These are arguments that actually go back to Ramsey, Frank Ramsey of Ramsey numbers; I believe he was a British don, a very, very smart guy. But in any case, one of the things he did (it's almost like Fermat's last theorem, a throwaway comment he made; I'm not sure if it was in a book or not) turned out to be a very subtle thing to prove, and it's still argued about today. But perhaps the canonical starting point for these arguments is de Finetti's Dutch book argument. So first: what is a Dutch book? This is a strange kind of a setup. A Dutch book is a set of bets, odds and stakes, between an agent and a bookie; between you and me. I'm the bookie.
You're the agent, about some possible events, for example the state of a system. And a Dutch book, such a set of odds and stakes, means that I, the bookie, make money off of you. It's not that I'm going to make money from you under expectation or something like that, because that would already be assuming the axioms of probability theory. It's rather: no matter how the world turns out, I make money off of you. I forget why that's called a Dutch book per se, but in any case, this is something that I very much want to have and that you want to avoid at all costs. Okay, so the way the argument is going to go is: if you, ahead of time, assign probabilities, odds, to the series of possible events which violate any of the axioms of probability theory, I can design a Dutch book against you, such that, if you're truthfully saying these are what your odds are, so you accept a gamble based upon this Dutch book, I make money and you lose money. So, if you don't want to become even more impoverished than all grad students already are, you must use probability theory. Okay. I'm not going to actually prove it; the proof is a little tedious, but I'll go through some illustrations of it. And it's an if and only if. Conversely, if you actually present probabilities that are consistent with the laws of probability theory, if you present odds on the possible events that are consistent with the laws of probability, then there is no Dutch book that I, the bookie, can make against you. So it's an if and only if. All right, so here is an example. As I say, I'm not going to go through the details; it's a lot of just testing different cases and showing how to do things, but this is an example of a Dutch book. In this particular case the possible events are: which horse
wins the race; that's what we're betting on. So this is like a physical system with four states, you can think of it that way, but for us it's a horse winning a race. There are going to be odds that you state ahead of time, and I'm going to take those odds, and they're going to violate probability theory, in this particular case by summing to greater than one. Since you are giving me those odds, I'm going to design a Dutch book. Since you believe those odds (that's kind of an assumption, one of the things philosophers argue about), you will take this series of bets that I offer you, and the end result is going to be that I make money no matter which horse actually wins, because you're violating this particular axiom of probability theory about summing to one. So, again, maybe I can just do it this way. Let's say that these are the odds that you place on the horses: even odds on horse one, in other words the probability you assign to horse one winning is 0.5; you think the probability of horse two winning is a quarter; and so on and so forth. You are violating the axioms of probability theory because you're saying that the sum, over all events, of the probability of the events is greater than one. It is difficult to talk through these things, but what we're going to do is: I, the bookie, am going to design a set of bets; you will accept them all; and the end result is going to be that, because you accept all of these bets, and you have to put in some money for each bet,
I give you money back depending on what happens. So, for example, in this one right here, the bet price (we just lost the... okay) the bet price is a hundred: if horse one wins, then I give you back that hundred plus a hundred; otherwise, I just keep the hundred. Here, because of the way the odds are, you give me fifty; but because it's 3-to-1, those being the odds here, if that horse, horse two, wins, I give you back the fifty plus 150. And you would accept all of these, given that you say these are the odds. So, for example, right here, you will be giving me up front, because you're making these four bets, a total of 210. I guess these should be euros; I couldn't find euros in the fonts, whatever. So you give me 210 euros, and here are the four possible events: if horse one wins, you get back 200 euros, the hundred that you gave for that particular bet plus the hundred in winnings; if horse two wins, it's again 200; and likewise 200, 200. So no matter what happens, you end up getting back 200, but you gave me a total of 210, because there were four bets. And so, in this way, no matter what reality is, because of the odds that you agreed to, the odds that you told me were true, because they violate this particular axiom of probability theory, I make money off of you no matter what happens.
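The arithmetic of that Dutch book is easy to check in a few lines of Python. The first two odds below follow the numbers in the example; the last two are made-up values chosen so the stakes total 210 (all that matters is that the stated probabilities sum past one):

```python
# A Dutch book against an agent whose stated probabilities sum to more than 1.
# Horse-win probabilities as quoted by the agent (first two from the lecture,
# last two invented for illustration); they sum to 1.05 > 1.
p = [0.50, 0.25, 0.20, 0.10]
payout = 200                               # every bet returns 200 total if its horse wins
stakes = [payout * pi for pi in p]         # stakes the agent accepts at her own odds

total_in = sum(stakes)                     # the agent hands over 210 up front
profits = [total_in - payout for _ in p]   # bookie's profit for each possible winner
print(profits)                             # the bookie nets 10 no matter which horse wins
```

Note that the profit per outcome is `total_in - payout`, which doesn't depend on which horse wins: that independence is exactly what makes it a Dutch book rather than a bet that merely wins on average.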
So this is not making money under expectation; this is making money no matter what the state of the world is. Okay. So we can elaborate this kind of reasoning, and simplify it a little bit. Here's the reasoning for the case of a single possible event E that can be true or false. Remember the way that it goes: you, the agent, give me q times s ahead of time, where q is the odds and s is the stake. If E is true, then I give you back s, so your net payoff is s minus qs, which accounts for the amount that you gave me; and if it's false, then the payoff to you, the agent, is minus qs, because you receive nothing if it's false. In this particular case, if we walk it through, the expected payoff to you, if the probability of E actually is q, is zero. That is why you would accept the gamble: if those were actually the odds, you'd accept this because you think it's fair. And we can throw little epsilons in, so that we can establish that you would actually like this bet, because you think your average payoff is going to be greater than zero. Okay: it's a fair bet if the probability of E actually were q, so you would accept it if q were the probability that you believe. Therefore, going back the other way, you would also offer it to me: you would offer me the bet at the particular p(E) that you think is actually true. Okay. So let's say that in this particular case the axiom of probability theory you're going to be violating is non-negativity: you assign p(E) less than zero. So it's not a matter of adding up to one in this case; there's an axiom of probability theory saying that every event has a non-negative probability. In that particular case, rather than you buying bets from me, I will buy bets from you, bets that you would offer because you think those stakes are legitimate. And if we walk it through in the table here, the payoff to me in this case is going to be s minus p(E) times s
if it's true, and minus p(E) times s if it's false. Because, if you look at the table right here: you believe p(E), so this is what would be true for any q; that's the way the book goes; you believe that actually q equals p(E). So in this particular case the payoff to me (I'm the bookie now, because we flipped around who's the bookie and who's the agent) would be minus p(E) times s if it's false. And if you look at this, because p(E) is less than zero, I actually end up making money no matter what happens, because you did this weird thing of saying the probability of an event is less than zero. This particular example, of what happens if you violate an axiom of probability theory, is less intuitive than the example where everything sums up to greater than one, but it basically follows from the fact that, at fair odds, this is a book that you, the agent, would be willing to either sell or buy. So to deal with this case, where you're offering a book with a probability less than zero, we flip around who buys and who sells; using this right here, we set up the rules for the payoffs, and therefore I win again. Okay. So: are these kinds of arguments, de Finetti's Dutch book argument, is the kind of reasoning here clear? Okay, good. Let's see, which one is this... yep, so this is the other probability; the bookie wins. Okay, so this is just going through what I pretty much said. You can continue this for all the axioms of probability theory. There are other things called Cox's axioms; there are many different ways of establishing this kind of reasoning. They say that we are uncertain about the state of the system and, in the case of Dutch book arguments, if our goal is to not lose money (Cox's axioms are different; there are different kinds of axioms people have come up with), we must actually use Kolmogorov's axiomatization of probability theory. Okay. So, what are some of the implications of those axioms of probability theory? We just established: yes, use probability theory. Now
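The flipped case, where the quoted probability is negative and the bookie buys the bet at the agent's own price, is a one-liner to verify. (The particular numbers q = -0.1 and s = 100 are made up; any negative q and positive stake s works.)

```python
# Dutch book against an agent who quotes p(E) = q < 0: the bookie buys the
# bet at the agent's own price.  The bookie pays q*s up front (negative, so
# the bookie actually receives money) and collects s if E turns out true.
q = -0.1     # the agent's illegal "probability" for event E
s = 100      # the stake

payoff_if_true = s - q * s    # bookie collects s and also kept the negative price
payoff_if_false = -q * s      # bookie simply keeps the negative price
print(payoff_if_true, payoff_if_false)   # both positive: 110.0 and 10.0
```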
I'm going to do something that's rather... oh, I don't know what the word is; I certainly don't know what the word is this early in the morning. I'll take what's actually just a standard definition and make it more grandiose by calling it a theorem. It's called Bayes' theorem, and all it is is a definition. This right here is the derivation of it: the conditional probability of a given b is, by definition, the joint probability of a and b divided by the probability of b. Using this definition of conditional probability again, that's going to be p(a) times p(b|a) over p(b). But, yes, we have a question on Zoom. From Michele: is there an example, like the one before on horses, where p is less than zero, where p is negative? Yes, you can set up the exact same kind of reasoning. I tried to illustrate that there are various ways of formalizing the precise betting scenario by going from four horses to simply true/false, but you can run through the same kind of argument. In this particular case, the Scholarpedia page is actually better than the Wikipedia page. And just to answer from a larger sociological context: what is argued about with Dutch book arguments these days, by philosophers, are questions like, well, why is it that an agent would have to do this? I can imagine an agent who, yes, believes certain odds, but just doesn't want to do anything.
I might not have any money to be doing any gambling at all. Those are the kinds of arguments people make these days. But yes, you can do it in terms of horses as well. Okay, okay. All right, so: Bayes' theorem. If you start with what's called the posterior, p(a|b), and you know this right here, which is called the prior, p(a), then you can flip things around and get p(b|a). The crucial thing to notice here is that p(b) can be calculated just by normalization: if I sum up the right-hand side over all a, I will get p(b). So, as long as I know this particular expression here for all a and b, I know p(a|b). So this is a way that you can derive p(a|b) from p(b|a) and the prior p(a), and this is called Bayes' theorem. And a final fact: the person who really did most of the work with it is Laplace. It should be called Laplace's theorem. He did some of the first calculations concerning planetary motion and so on; he used Bayes' theorem throughout all those calculations and really introduced a lot of very powerful, important tools for using Bayesian statistics. Just a little bit of history: the devil is in the details, and the details are in the prior. At the end of the 19th century or so, people started to come up with really stupid priors. Without getting into the formalization of how you can tell priors are stupid or smart: it's garbage in, garbage out, so they got very, very stupid predictions using Bayes' theorem. It fell into disrepute, and that is what actually led to what's called sampling theory statistics: Fisher, who right now is somebody whose name you're not supposed to talk about because he was a racist pig.
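The normalization trick, getting p(b) just by summing p(b|a) p(a) over all a, is worth seeing concretely. A minimal sketch, with made-up numbers:

```python
# Bayes' theorem with p(b) computed purely by normalization:
#   p(a|b) = p(b|a) p(a) / sum_a' p(b|a') p(a')
prior = {"a1": 0.7, "a2": 0.3}        # p(a): made-up prior over two events
likelihood = {"a1": 0.2, "a2": 0.9}   # p(b|a) for the particular b we observed

unnormalized = {a: likelihood[a] * prior[a] for a in prior}
p_b = sum(unnormalized.values())      # the normalizer IS p(b)  (about 0.41 here)
posterior = {a: w / p_b for a, w in unnormalized.items()}
print(posterior)                      # sums to 1 by construction
```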
But anyway: Fisher, and many others. Hypothesis testing, p-hacking, all of that traces back to Fisher. There were still huge wars about Bayesian versus non-Bayesian throughout the 20th century. By this point I think people are pretty much just tired of it, and they use whatever works, which is whatever their personal preference is. So anyway, that's a little bit of the background to the history of Bayes' theorem. Let's see what some of the implications of Bayes' theorem are. Let's plug in for these generic a's and b's. Let's say there's some particular event, like the state of a system, and you have some data concerning the state of the system. In this case, as I mentioned, this is called the posterior probability; it's proportional to the prior times this, which is called the likelihood. Notice that this right here is describing your observational apparatus; that's your experimental apparatus. It's saying: if this event actually did happen, the probability that my experiment would spit out the data I got is given by this conditional right here. This is your prior probability of what the event could have been, and this is what you're actually interested in: according to all the rules of probability theory that we just talked about, that's the probability of the event given the data. All right. So, given Bayes' theorem, let's say you want to actually come up with just a single prediction. I don't want to know the full probability over events;
I've actually got to, let's say, predict the specific single event that I think happened, given my data. There are a certain number of important concepts here, but one of them is what I will refer to as the MAP event, the maximum a posteriori. This is the rule, if you want to call it that, saying: simply predict that the event that happened is the one whose posterior probability, given your data, is the largest. That's called the MAP one, and basically statistical physics is based upon MAP reasoning. Now, an important point to bear in mind about all this, before getting to statistical physics, is that all of this remains true even when we're talking about probabilities of probabilities. These are called hyper-probabilities, hyper-priors, things like that. So in particular, here's an example. Suppose you have a Gaussian; that's your probability distribution. But you can actually have more than one Gaussian: you're not certain about what its mean is. So if you are uncertain about its mean, you can have a probability distribution over the means of the Gaussian. That would be a probability over probabilities; it would be a hyper-probability. So once you actually specify a hyper-prior, a prior over the means, we can then give you the posterior probability of a particular mean, and we can then just integrate it through. So if we want to actually predict one particular sample from that Gaussian, we can get it by just integrating over the prior, if, in this particular case, we had no actual data, no samples of the Gaussian. Okay, all right. Now, the MAP kind of makes sense, but it itself is not really motivated by any kind of desiderata. So is there some sort of desiderata that we could use that actually does end up with a very well-founded rule for turning any probability distribution into a prediction of what single event happened? And the answer is yes.
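As a quick aside, the Gaussian hyper-prior example can be sketched numerically. Suppose x given mu is N(mu, sigma squared), and the hyper-prior over the mean is mu ~ N(m0, tau squared); all the numbers below are made up. Integrating the mean out, with no data, gives the predictive distribution N(m0, sigma squared plus tau squared), which a quick Monte Carlo confirms:

```python
import random

random.seed(0)
m0, tau, sigma = 1.0, 2.0, 0.5   # hyper-prior mean/width, and the Gaussian's own width
n = 200_000

samples = []
for _ in range(n):
    mu = random.gauss(m0, tau)                # draw a mean from the hyper-prior
    samples.append(random.gauss(mu, sigma))   # then draw a datum from that Gaussian

mean = sum(samples) / n
var = sum((x - mean) ** 2 for x in samples) / n
print(mean, var)   # close to m0 = 1.0 and sigma**2 + tau**2 = 4.25
```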
There are various ways of doing this. One of them is due to a fellow called Savage. He has a bunch of axioms that basically say that you need to add something to your probability distribution to uniquely specify what's called the Bayes-optimal prediction you should make based on that probability distribution, and that something is called the loss function, which, by the way, plays a role very similar to the distortion function that Gioja was talking about yesterday. The loss function basically says that if you predict x' but the truth is actually x, then you end up paying the loss; it's a map from pairs (x, x') to the real numbers. You want to minimize loss. So, according to Savage's axioms, more precisely, what you want to do is choose the x' that minimizes your expected loss. This is called Bayes optimal, and you have the posterior expected loss when these are all posterior probabilities, and so on. There are lots of very important consequences of this, sort of rules of thumb that you know already. One of them: let's say that you have quadratic loss, so if you predict x' and the truth is x, then what you must pay is the square difference between x and x'. If you've got any distribution P over x, your optimal guess in this case is actually the average. In general, bear in mind that if it's not a quadratic loss (which is kind of a funny thing to accept: it means that if I double the amount I'm off by, I have to pay four times as much, and I guess there are some physical scenarios where that's true, but there are certainly many where it's not), if that is not your loss function, in general you should not predict the average. So, for example, let's say your loss function is the absolute value of the difference between x and x', which in some sense makes more sense; you would think that it's reasonable in more situations. Does anybody have a prediction, based upon in essence sociology, of what might be the Bayes
optimal prediction based on a probability distribution, if your loss function is just the absolute value? Can you predict what the answer is for what you should predict? Sorry? Yeah: the median. And the median actually has many more stability properties than the mean does in many situations. If you have long-tailed distributions and things like this, it's famous that the median is going to be less sensitive to very rare events than the mean is, and in part that comes, ultimately, from the choice of the loss function. Similarly, if anybody here has done things like machine learning: there are what are called regularization penalties in many neural net training algorithms. One of them, from way back in the day, called weight decay, is basically a quadratic loss on the weights of your network. If you instead impose an L1 loss, where your penalty is given by the magnitude of the weights, then, because of what are called shrinkage priors, you end up with a sparse code, and it tends to perform much, much better. So all of these things are wrapped up together; all of machine learning is down that door, coming out of these kinds of issues. And of course there are other situations where you wouldn't want absolute value. Very often you would want a Kronecker delta: if I don't get the exact prediction correct, then I'm screwed, but if I get it correct, I'm fine; it doesn't matter how far off I am, being off is all the same. Then, can anybody guess what the optimal prediction is for a Kronecker delta loss function, to minimize the expected loss? Let's go back... where to go... yes, we want to minimize this right here, where the loss function is delta of x comma x'. Bang, give the man a free coffee, or whatever else it is that you most desire; for me at this time of day it's free coffee, but in any case my loss function is very high for any other kind of liquid. But yes, in this particular case
it's going to be the MAP; that's what comes out. Okay, so, in any case, here is a really fun example. So, David, in the end, the choice of a loss function, is it essentially like the choice of a prior? No... well, they are related. Yes, if you think about the regularization example... yes, I was conflating things; you're correct. Properly speaking, I was cheating a little bit. What's called regularization is setting the Bayesian prior; that does not necessarily involve a loss function per se, but they have relations; the two of them affect one another. So I actually conflated two concepts there. What I was actually describing when I was talking about neural nets is what Dr. Marcelli, I guess Professor Marcelli, is going after here: in that particular case, the things I was describing are actually the priors, if you adopt a Bayesian perspective. Non-Bayesian people in the Fisher school would call them the regularization penalties. It wasn't exactly the same thing as a loss function, but the prior that you choose is related to the loss function in certain ways. So yeah, very good point; thank you for emphasizing that. Yep, I was sloppy, my bad. Um, but so here's a kind of a cool example. This goes back to Laplace, actually. Um, people here, how many days... shall I pick anybody? You've already had your cup of coffee. Should I just pick on people? Somebody else come up with a number, and there's no way I'm going to tell you you're wrong, for how many times you have seen the sun rise in the morning. I don't know your age, so you just tell me... everybody is so shy. Let's just say, I don't know, 300. Let's say that you're 20 years old or 30 years old or something like that. Whatever.
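The three rules of thumb from the loss-function discussion (quadratic loss gives the posterior mean, absolute-value loss the median, Kronecker-delta loss the MAP) can be checked by brute force. A sketch on a made-up, deliberately skewed distribution, with guesses restricted to the support for simplicity:

```python
# Bayes-optimal predictions under three loss functions, found by brute force
# over candidate guesses drawn from the support {0..4}.
xs = [0, 1, 2, 3, 4]
p = [0.30, 0.25, 0.15, 0.15, 0.15]   # made-up skewed distribution (mean 1.6)

def best_guess(loss):
    # pick the guess g minimizing the expected loss  sum_x p(x) L(x, g)
    return min(xs, key=lambda g: sum(pi * loss(x, g) for x, pi in zip(xs, p)))

print(best_guess(lambda x, g: (x - g) ** 2))        # 2: support point nearest the mean
print(best_guess(lambda x, g: abs(x - g)))          # 1: the median
print(best_guess(lambda x, g: 0 if x == g else 1))  # 0: the MAP (the mode)
```

The distribution was chosen so the three answers all differ, which makes the point that the "best" single prediction depends entirely on the loss you commit to.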
Let's say you've seen it come up 5,000 times. Make it a thousand, because actually very often you sleep in; so you've seen the sun come up a thousand times. You've done the experiment a thousand times. So let's say that the sun rising every day is a biased coin, a random event: it either comes up or it doesn't come up some day. It would be really unfortunate if it didn't come up, but a priori, your only direct evidence, your data, is having seen it rise a thousand times in a thousand experiments. If you toss a fair coin twice and both times it comes up heads, you're not going to conclude that it always comes up heads; you're going to conclude that there's some distribution over heads. And that's true no matter how many times you see it come up. There's this other body of work, called no free lunch theorems, that says that's actually true about everything you do in life, and there are lots of consequences, but we don't need to go into that. So let's just talk about the sun rising, and we want to ask the question: what is your best prediction, your Bayes-optimal prediction, for whether the sun will rise again tomorrow? Okay, we can do that.
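In fact, the whole calculation fits in a few lines. With a uniform prior over the coin's bias z and quadratic loss, the Bayes-optimal prediction is the posterior mean of z, and a crude numerical integration over z reproduces Laplace's answer, (heads + 1) / (tosses + 2):

```python
# Posterior mean of a coin's bias z under a uniform prior, after seeing
# k heads in n tosses -- computed numerically by midpoint-rule integration.
def posterior_mean(k, n, steps=100_000):
    dz = 1.0 / steps
    zs = [(i + 0.5) * dz for i in range(steps)]
    w = [z**k * (1 - z)**(n - k) for z in zs]   # likelihood times flat prior
    return sum(z * wi for z, wi in zip(zs, w)) / sum(w)

# A thousand sunrises out of a thousand: (1000 + 1) / (1000 + 2)
print(posterior_mean(1000, 1000))   # ~ 0.999002, i.e. 1001/1002
```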
So we've got a coin with some probability of heads, and in our particular case the data is that we have tossed it, say, a thousand times, and all thousand times it came up heads. Let's say we're using quadratic loss. If every single sample of this coin is independent, if they're IID, if it's not a Markov process, and so on and so forth, we can just plug it into all of this. What we get is what's called Laplace's law of succession, which says that your Bayes-optimal prediction for whether the sun comes up tomorrow is the number of heads you saw plus one, divided by the total number of experiments plus two. So, as you would expect, it's less than one; in this particular case we're talking about a thousand and one over a thousand and two. Okay. In general, the way this goes is: if your coin can have a total of r possible values rather than just two, that r shows up in these numbers in place of the two, and so on and so forth. We can also use this kind of reasoning to do other things. For example, let's say I've got some data, and I don't want to know something like the probability of a heads value; let's say I want to know some characteristic of the distribution that I think was sampled repeatedly to give me the data. Yes? Oh, no, no particular reason; go ask Laplace. I'm also assuming not only quadratic loss but also a uniform prior. This is a prior over priors, a hyper-prior, and we're saying that, a priori, the probability that my coin has bias z, where z is a number between zero and one, is uniform over all possible z's. So, with that prior and this particular data, I can get a posterior; I can normalize it; and if I happen to use quadratic loss, this is what comes out. If I don't use quadratic loss, if I use absolute value loss, I'll get a different result: I would get the median, and offhand
I don't know what the median is, and so on and so forth, depending on the loss function. Well, for loss functions that you and I, just talking off the cuff, would consider reasonable: yes. But if you were to try to formalize what the space of possible loss functions is, and how we say which loss functions are reasonable, you'd have a more difficult time. But when we're just sitting around (I'm drinking way too much of this coffee), yeah, we'll agree that for any reasonable loss function, in this particular case, you would end up with a number very, very close to one. You're correct. Okay. But let's examine this case again: it's a binary coin. We're free to consider functions of the probability distribution. For example, yesterday Gioja was talking about entropy, which we'll get to in a second. So, take the exact same data; what I want to know now is: what do I think the entropy of the distribution is that gave me this data? So I'm not asking to make a single prediction; I'm asking to know what the entropy is. And to do that
I'm asking to know what the entropy is and to do that I Hate chalk by the way, but in any case What you would do is you would write down that the expected Entropy given the data up to proportionality constants and so on That's going to be integral over all possible probability distributions Since we have a finite number of events in this case we have two Notice that this probability distribution is just a real vector in this case It's a vector in R2 that happens to live in a simplex It always has to live on again simplex so we're going to have this and then we're going to have the posterior and We were just saying things like we can have it be a uniform prior over P in which case This would be proportional to probability of P of D given P Which would equal the product over all of our M experiments of the deep Datum so this will give you the posterior expected estimate of your entropy given your finite amount of data You can buy similar reasoning you could actually get variances in the entropy given the data and so on and so forth and There's a very very subtle issue. I'm here is how do you want to choose a prior in general? I Was actually involved with Well, actually the very first paper that wrote down this equation But I'm some of the more sophisticated recent work is by a fellow called pool P o o le he's at Princeton actually in Cognitive science or something like that, but I'm as based upon what's called Chinese restaurant prior So very very funky current stuff But that's the form how to set the priors This also becomes a lot of subtleties. 
This is very much on the side, but there are a lot of subtleties in what you want to do. For example, say your data concern variables with two indices and you want to know the mutual information between the associated values, so I want to know the posterior expected mutual information given data. In that particular case it actually depends on which formula you use for mutual information. Yesterday, for example, we saw mutual information written as a relative entropy, the KL divergence; that's one way to write it, and you can just plug it into these definitions. But there are other ways of writing it, which in our world are all the same: if I actually know my distributions, they're all equal. But if you try to estimate it from data, you're going to get a different answer depending on which of these formulas you use. So which one should you use? And here, let's make it even more interesting: let's say you don't even know how many values there are, so you only have a probability distribution over some of the actual values; there are hidden values. What do you do then? It turns out there are ways to deal with all of that. It's not actually Bayes-optimal, but it is another use of Bayes there. [Question:] Is this related to the fact that the empirical entropy, computed from the empirical distribution, is a biased estimator of the entropy? Okay, so bias and variance, those are sampling-theory statistics. I'd be careful here: in the very, very mainstream Bayesian reasoning I've presented here, there is nothing like bias and variance. You don't care about bias.
You care about minimizing expected loss. It actually turns out you can make a midway framework that lies between sampling-theory statistics, which is where bias and variance live, and full Bayesian analysis; you can in essence have a Bayesian version of bias and variance, and there are lots of rich results there, but this doesn't care at all about bias. In particular, an unbiased estimate would do things like use the frequency counts: if there happened to be some bins, some values, that I never saw, I would assign them zero probability and evaluate the entropy based upon that. That is clearly a little bit pathological. If you do a Bayesian analysis, you're never going to conclude that the probability of any one particular event is exactly zero with certainty; you'll be allowing for the slop. So in general you will get different answers for this kind of calculation than you would if you did it with a frequentist unbiased estimator, worrying about the higher-order terms in the bias as well, which is what people normally do. [Question:] So I was wondering whether this problem is related to the fact that S of p is not a linear functional? It's really much more a matter of the fact that you can write this out in different ways. We could work it through: this right here is going to be the entropy of p, and this is going to be the cross entropy, and when we do the cross entropy for this kind of calculation, because this is not a delta function, you will get something different from what you'd get if you just looked at the conditional distribution. That's the kind of thing. So it's not so much a matter of frequentist or not; it's just the fact that we can write this in various different ways.
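The sampling-theory point raised in the question is easy to see numerically. This is my own toy simulation, showing that the plug-in (empirical-frequency) entropy estimator just discussed averages below the true entropy.

```python
# The plug-in entropy estimator is biased downward: bins never seen get
# probability zero, and Jensen's inequality pushes the average estimate
# below the true entropy. Distribution and sample sizes are my own toys.
import numpy as np

rng = np.random.default_rng(0)
p_true = np.array([0.5, 0.25, 0.125, 0.125])
H_true = -np.sum(p_true * np.log(p_true))

def plugin_entropy(counts):
    freq = counts / counts.sum()
    nz = freq[freq > 0]          # unseen values are assigned probability 0
    return -np.sum(nz * np.log(nz))

n, trials = 20, 5000
ests = []
for _ in range(trials):
    sample = rng.choice(len(p_true), size=n, p=p_true)
    ests.append(plugin_entropy(np.bincount(sample, minlength=len(p_true))))

print(H_true, np.mean(ests))   # the average estimate sits below H_true
```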
And so, for example, in quantum thermodynamics the relative entropy has to be represented in different ways, because you don't know what it means to have a relative Kullback-Leibler divergence in the standard form; you have to write it differently from the formula that was presented yesterday, because you can't cleanly talk about one probability divided by another inside a logarithm. So instead people break it up and use another, equivalent formula for relative entropy, and that's what you have to do in quantum thermo. That's another example where you have to use a different formula, and therefore in quantum thermo, if you try to do the exact same thing of estimating the von Neumann relative entropy given data, the answer would depend upon which definition of relative entropy you used. Such is life, and that's actually part of what we learn by formalizing all these kinds of issues. Okay, so, moving right along. Is this pace too slow, too fast, what do people think? About right, Goldilocks? Okay, cool. If it's too fast, there is a way to slow this down: by asking questions. Notice what I was saying yesterday: the people who know things, they're the ones asking questions, and they're the ones impressing those who matter, the professors. You guys want to learn how to be a professor? You're going to have to learn how to be a rude asshole who asks questions all the time, because that's the only way you're going to learn. There's some kind of good bumper sticker in there: if you want to learn, you've got to be an asshole. Anyway. Okay, so let's say we've just determined that you have to use probability distributions. There are also going to be many situations, related to what we were just talking about, where I say: fine.
I've got a probability distribution; I came up with it by Bayes-optimal reasoning, or I was just given it by God or some such. How do I actually quantify the amount of uncertainty in that probability distribution? Here's one way to answer that question. We've got a probability distribution over all these events x1, x2, and so on; these are going to be successive samples of some random variable X, so this is the first sample, the second sample, and so on. We want a function that quantifies our uncertainty. In this case we're looking for a function we'll call the surprise, where the surprise in a sequence x1, x2 is the surprise in x1 plus the surprise in x2. So I'm not going to start by directly quantifying uncertainty; I'm going to instead use an intuitive concept of surprise. This is an argument that traces back to Jaynes, actually. Let's say that our zeroth axiom is this: the total surprise in a sequence of two events, x1 followed by x2, given that you know what the underlying distribution p is, should be additive; it is the surprise I saw in the first one plus the surprise I saw in the second one. I also require that this surprise function be decreasing as a function of its argument: the greater the probability of an event, the less my surprise at actually seeing that event. Okay, if you put these together, what you will find is that your surprise function, modulo some extra arbitrary constants, is given by a logarithm.
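The two desiderata can be checked directly for s(p) = -log p. This is only an illustration that the logarithm satisfies the axioms, not the uniqueness proof the lecture alludes to.

```python
# Check the two desiderata for the surprise function s(p) = -log p:
# additivity over independent events, and monotone decrease in p.
import math

def surprise(p):
    return -math.log(p)

p1, p2 = 0.2, 0.5
# additivity: total surprise at two independent events is the sum
assert math.isclose(surprise(p1 * p2), surprise(p1) + surprise(p2))
# monotonicity: more probable events are less surprising
assert surprise(0.9) < surprise(0.1)

print(surprise(p1 * p2))
```

Changing the base of the logarithm just rescales the constant, which is exactly the arbitrariness the argument leaves open.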
You can change the base of the logarithm, but that's what it is, uniquely. Okay, so now that we know what the surprise of a single event drawn from a probability distribution is, I can very reasonably do the wink-and-a-nod expected-loss move and look at the mean: the expected value of the surprise. What is the expected surprise? Bang: entropy. Sorry, let me back up a second over here. We just showed that the proper way to quantify the uncertainty in a distribution is its entropy, because you just take the average of the surprise according to p(x), and what you get is negative the integral of p(x) log p(x). So that's the amount of uncertainty in a distribution. Okay, now step two, going through this a little bit more slowly, apologies. Using that, say I want to actually predict what my distribution is. We're quantifying the uncertainty in a distribution by its entropy. Now I've got data, something along these lines, and I want to be able to predict my distribution here, so I need a prior over distributions. Remember we had an example of that, with the Gaussian and its hyperprior, for the case of real-valued underlying variables. But now let's consider just the case where we've got a finite number of possible values of our random variable, so that the distribution is just a finite-dimensional real vector, and we want a prior probability over that vector, so that we can say something like: what is my posterior Bayes-optimal prediction for the distribution that generated some data, based
upon that data. Okay, this would be the kind of prior that goes into here. And here's another, somewhat extreme, axiomatic piece of reasoning. We just said that the uncertainty in a distribution is given by its entropy. So one natural choice, and I'm not going to try to go further right now than that, is to say that my prior over distributions is one that is basically an exponential of the entropy. So the more uncertain a distribution is, the more likely it is: I'm saying that a flat distribution is more likely than a very, very peaky, specific distribution. The reason being that my expected surprise at samples from a flat distribution would be less than my expected surprise at samples from a peaky distribution. Okay, not completely kosher in the broad-brush way I've presented the argument, but right now I just want to give reasonable arguments for why one might adopt this thing called the entropic prior, because the entropic prior arguably is fundamental to everything else we're going to be doing in this course, and to everything you've ever done in statistical physics. Let's see how. Okay, say, as another example in addition to Gaussians and sunrises in the morning and so forth, that you come across a system that has n possible states and an energy spectrum. We're now coming full circle, to say: how do we do physics when we have uncertainty about things? So here's a situation where we'd love to be doing physics. We've got this particular kind of situation, and let's say our data...
This is a very strange kind of data: let's say that you actually know the expected energy of the system. What we're trying to predict is the probability distribution over these possible states, and here is the data that we have: I tell you that, whatever that probability distribution is, it has this expected energy. So that takes the form of a delta function. And to give you a heads-up, knowing the expected energy is, the way it's very often interpreted, something like knowing the temperature. So concretely, this is saying: let's say that you know the temperature of the gas in this room, the air in this room, exactly, to infinite precision. Okay, so that's your likelihood function. Recall Bayes' theorem: we've got a prior, the entropic prior, so Bayes' theorem says that the probability of any distribution conditioned on your data (that should be a D, right there) is proportional to the delta function times the exponential of the entropy. Let's try to avoid completely the question of what loss function to specify, and just assume that the mode of a distribution is a good thing to go after, or alternatively choose a loss function that's a delta function. Then any of those kinds of arguments would say: given some data, in this case knowing the average energy of the system, what I should predict is the distribution that is consistent with my data and has maximal posterior probability, the MAP distribution, given that information. Okay, look again (apologies, that's a D); this is what I want to find: the p that maximizes it. So can anybody tell me what that answer is? What is the p that's going to maximize this? It doesn't matter what alpha is.
Yep, because we are only going to be looking at those distributions... actually, you jumped two steps, so that's two cups of coffee for you. We know that we're only interested in those distributions that have the given expected energy, and this is just a dot-product constraint, if you'll notice. Of all the distributions with that expected energy, we want to find the one that maximizes the exponential of the entropy, and that's the same as the one that maximizes the entropy. So what we have right here is a Lagrange-multiplier problem: we want to find the distribution that maximizes entropy subject to the constraint on the average value of the energy under that distribution. Okay: the Boltzmann distribution, the canonical ensemble. Nothing was said about heat baths or anything like that. [Question:] Sorry, David. So this assumption of the entropic prior, it's essentially saying that the distribution over entropies is peaked around a particular value, which you choose by choosing gamma? Yeah, but notice that here the gamma doesn't matter once we go to the MAP. So MaxEnt, in the sense that it only looks at the peak and does not consider, for example, a quadratic loss or anything like that, MaxEnt is non-Bayesian. People don't like to advertise that. Ed Jaynes: some of the things he's most known for are championing Bayesian statistics and championing maximum entropy, and they actually conflict with one another. Jaynes never actually talked about this in terms of an entropic prior; he never tried to have a Bayesian formulation. There is work to be done.
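The Lagrange-multiplier problem just posed can be checked numerically. This sketch uses my own toy energy spectrum and target expected energy: it maximizes entropy subject to the fixed mean energy and compares the result with the Boltzmann form exp(-beta E)/Z.

```python
# MaxEnt with a mean-energy constraint should recover the Boltzmann
# distribution. Energies and the target <E> are hypothetical toys.
import numpy as np
from scipy.optimize import minimize, brentq

E = np.array([0.0, 1.0, 2.0, 3.0])   # toy energy spectrum
E_target = 1.0                        # the "data": known expected energy

def neg_entropy(p):
    p = np.clip(p, 1e-12, None)
    return np.sum(p * np.log(p))

cons = [{"type": "eq", "fun": lambda p: p.sum() - 1.0},
        {"type": "eq", "fun": lambda p: p @ E - E_target}]
res = minimize(neg_entropy, np.full(4, 0.25), constraints=cons,
               bounds=[(1e-12, 1.0)] * 4)

# Independently fit beta so the Boltzmann distribution has mean E_target.
def mean_energy(beta):
    w = np.exp(-beta * E)
    return (w @ E) / w.sum()

beta = brentq(lambda b: mean_energy(b) - E_target, -10.0, 10.0)
boltz = np.exp(-beta * E)
boltz /= boltz.sum()

print(res.x)
print(boltz)   # the constrained optimum closely matches the Boltzmann form
```

The Lagrange multiplier conjugate to the energy constraint is exactly the beta that the root-finder recovers here.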
Nobody has done this. Students, if you want a PhD thesis, there's work to be done on saying: what happens if I adopt an entropic prior but do a proper Bayes-optimal analysis? How does that modify things? For example, we just derived the Boltzmann distribution. Let's say that we have a very, very small system, not many degrees of freedom, the opposite of the thermodynamic limit, and let's say that rather than doing MaxEnt, you instead want to do a Bayes-optimal prediction. How does your prediction for the physics change in small systems? If you've got a quadratic loss function, or any other loss function (you, the experimentalist, you, the engineer, ultimately tell me what the loss function is), how does that actually move me away from a Boltzmann distribution for small systems? Nobody's done that. By the way, a quick aside, again kind of like career advice, concerning Ed Jaynes. When he was trying to publish his first articles on Bayesian statistics back in the 50s and 60s, that was an era in which the editors of the major statistics journals were all anti-Bayesians; they were all sampling-theory statisticians, and this was in the middle of those wars. So he was what's called desk-rejected with all of his papers, and then, very famously, or at least I think it should be more famous, he was given a rejection letter which said: we are not going to consider your submission for publication; in fact, we're never going to consider any submission you ever send us again. And he took that and put it on a plaque, and it was up on his wall. Okay, so anyway. Scientists are humans too, and that's got some unpleasant consequences. So anyway, I think that answers your question.
We can do similar things. As we just said, this gives us the canonical ensemble, derived with, in essence, nothing about physics in it. Similar reasoning: suppose what I know is not just the expected energy but also (and this should be kind of an obvious leading question) the expected number of particles, so the state x of my system also gives me an integer value, the number of particles, and I know both of those expectations. What do you think we get down here, for the MaxEnt answer? Come on, come on, somebody, don't be shy. You get the grand canonical ensemble. All of standard equilibrium statistical physics comes, in this sense, straight out of information theory. You don't see this in the textbooks, and it's a lot cleaner than what you see in the textbooks. Gilder will be presenting some of what you do see in the textbooks, but there's a very interesting dynamic here. What's going on in this particular line of reasoning is that we have derived equilibrium statistical physics from information theory, or from arguments that can be woven very, very tightly into the foundations of information theory. It turns out that if you instead adopt a bath-based, traditional point of view of statistical physics, some of the kind of stuff that Gilder will be presenting, in its modern form of stochastic thermodynamics, you actually derive information theory. So here is information theory, from which you are deriving statistical physics; if you instead adopt a bath-based formulation, you're going to derive information theory, in the sense that many of the quantities you are forced to calculate are things like mutual informations and relative entropies, all the normal quantities of information theory, arising from the normal bath-based formulas. So you can actually go in either direction. From information theory you get the statistical-physics picture, which is consistent with the infinite baths; or, starting with the infinite baths, and so
on, you're going to be led to calculate all the quantities of information theory. Okay, so yes, the grand canonical ensemble. Okay, just to summarize this first part of today. If you have uncertainty, and we always have uncertainty, then there are many different arguments that say you should be using probability distributions to quantify that uncertainty. If, given a probability distribution, you then want to predict a single element, Savage's axioms say you do that with a loss function; this is called the Bayes-optimal prediction. Very often the Bayes-optimal prediction is well approximated by the MAP, and if you've got a delta-function loss function, they will actually be the same thing. We can do all this for the case where the elements we've got distributions over are themselves distributions. Certainly that's a straightforward thing to do when we're in finite spaces, because then a distribution is just a vector of real numbers, so we can apply all of this to try to infer distributions themselves. If your data is a set of delta functions about dot products of p(x), and you use the entropic prior, what comes out is MaxEnt and all of conventional statistical physics. By the way, in addition to asking about going away from the MAP, one could ask what happens to statistical physics when we go beyond the entropic prior. Something else: here's a question for you. I don't know where the thermostat is, but if we were to go look at the thermostat right now, it probably only reads to a single digit of precision past the decimal point; all the rest is uncertain. In other words, the likelihood function is not a delta function; you could say it's a Gaussian or something like that. So you actually have uncertainty about the temperature of the system. What happens to all the analysis when you have uncertainty about the temperature?
Especially because any average of a set of Boltzmann distributions is not a Boltzmann distribution; it's not an exponential. So what is the actual functional form that comes out when you build into your analysis the fact that you're a little bit unsure about the temperature? Nobody's actually worked on that. Moreover, and this is actually a paper that I'm working on with a colleague right now: when you try to take that into account in stochastic thermodynamics, say we've got an off-equilibrium system where we've got a little bit of uncertainty about the temperature, a little bit of uncertainty about the energy spectrum, maybe even a little bit of uncertainty about the number of baths, and so on, what happens to the stochastic thermodynamics? Basically, all hell breaks loose; a whole bunch of things break. But physicists just pretend that you know the parameters, that you only have uncertainty about the state, that you know all the parameters exactly. That's the standard operating assumption in physics, even though it is never true. Okay. So I think I am done at that point. Yep, the blank slide says it's time to be quiet. Question? [Question:] All these results can be obtained also using maximum likelihood. So why do we have to use the Bayesian framework, which I think is more difficult, since we have to impose a prior? Because, as de Finetti said, you must use probability theory, and Savage then said that according to these criteria we want the Bayes-optimal guess. So whether it's difficult or not, that's not really for us to decide; that's God, or goddess, or whoever is sitting up there in the sky. They decide how difficult they want life to be for grad students. This is the thing they decided on the fifth day, while doing all the other stuff: we want life to be somewhat challenging. And so, yeah, it's Bayesian statistics.
It's not a choice that you get to make. If you want to follow a strategy in which I could Dutch-book you and make money off of you no matter what happens, feel free to violate the axioms of probability theory. If you don't want that, then we're forced to use Bayesian statistics. In particular, if you wanted to derive the canonical ensemble, you can't do that using maximum likelihood; you need the entropic prior, otherwise you're not going to get it. If you were to say that I've got a delta function about the energy in this room, I'm not even sure how you could use that alone to predict a distribution. You would just be saying that any distribution consistent with that temperature has the same likelihood; because of the nature of your data, there would not be any handle, using just the likelihood alone, to distinguish among them. Does that answer your question? That was very flip and glib and easy for me to say standing up here. All right. Okay. Anything else, other questions? Okay, so maybe we should take just a couple of minutes' break, and then we're going to want to end at twenty to, to give enough time for coffee. So probably we're not going to go all the way through the next lecture, which is the more standard textbook way of deriving statistical physics, but I'm sure we'll get through a whole bunch of it. Okay, so there's five minutes for everybody. [New speaker:] Okay, twenty minutes, okay. So this is not going to be a big problem for us, I think, to go through this in twenty minutes, because we already assume that people have some background in statistical physics and thermodynamics, the equilibrium, conventional kind. Okay, so let's go.
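Before moving on to the textbook treatment, the earlier claim that an average of Boltzmann distributions is not itself a Boltzmann distribution is easy to check numerically: for a true Boltzmann distribution, log p is linear in E, and for a mixture it is not. The toy spectrum is my own.

```python
# Averaging Boltzmann distributions over two temperatures does not give
# a Boltzmann distribution: log(mix) vs E is no longer a straight line.
import numpy as np

E = np.linspace(0.0, 5.0, 6)   # toy energy levels

def boltzmann(beta):
    w = np.exp(-beta * E)
    return w / w.sum()

# equal-weight average of two nearby temperatures
mix = 0.5 * boltzmann(1.0) + 0.5 * boltzmann(2.0)

# local slope of log p vs E; constant for an exponential, drifting here
slopes = np.diff(np.log(mix)) / np.diff(E)
print(slopes)   # the slope drifts with E: the mixture is not exponential
```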
We're just going to quickly talk about good old statistical physics, the microcanonical and canonical ensembles, and I'll just remind you that the fundamental postulate that actually allows us to do conventional statistical physics is the ergodicity assumption, and so on. And after this I'm going to provide some pointers toward computer science, and try to emphasize the differences between this conventional approach, conventional thermodynamics, and then stochastic thermodynamics. Okay. So what we want to do in equilibrium statistical mechanics is to understand: okay, we have N atoms in a box of volume V, for example; it's a very simple setup, and it's placed in an arbitrary external potential, any kind of atoms. And you would like to understand the behavior of this kind of system. How can you formalize the behavior of these particles? So one thing that you can do is write down all the generalized coordinates, ranging over all the positions and momenta and so forth, and try to solve Newton's equations of motion. But is it feasible to do that? We want to describe and solve for the behavior of the atoms, and it's not feasible to do that. We know that because there are far too many particles, but there are also some other issues, for example chaotic motion: the time evolution depends with ever-increasing sensitivity on the initial conditions, so that even if you have some idea about the initial state, you cannot actually predict the future behavior, right?
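The sensitivity-to-initial-conditions point can be illustrated with a standard chaotic map, the logistic map at r = 4; this is my own example, not one from the lecture.

```python
# Sensitive dependence on initial conditions: two trajectories of the
# logistic map x -> 4 x (1 - x), started 1e-10 apart, diverge to an
# order-one separation within a few dozen steps.
x, y = 0.3, 0.3 + 1e-10   # two almost-identical initial conditions
gap = 0.0
for _ in range(60):
    x = 4 * x * (1 - x)
    y = 4 * y * (1 - y)
    gap = max(gap, abs(x - y))

print(gap)   # the 1e-10 initial difference has grown to order one
```

The separation grows roughly like exp(lambda t) with Lyapunov exponent lambda = ln 2 for this map, which is why a ten-decimal-place error in the initial state is amplified to order one in about 30 steps.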
By the way, this can happen independent of molecular chaos; it also happens, for example, if you have interesting walls on this volume-V box: you can have chaotic behavior even for one particle. This is something that's studied in quantum chaos. And the third argument is maybe the strongest: we are interested in the typical behavior of these atoms. We are trying to average; we are trying to get at the mean behavior, like pressure calculations and so on. So what are we using here? Statistical averages. But how do we find statistical averages? This poses the question of how to define some probability distribution, just as in David's lecture: you have some uncertainty, because we don't know, for example, all the momenta and positions, the trajectory in phase space, and you need to use some probability distribution that actually captures what you don't know about the system's behavior. So, stating this verbally in one line before we jump to how we formalize it: you can either take a time average over certain degrees of freedom, where you take the limit of little t to infinity, or you can take an ensemble average over a collection of copies. It turns out these two are equal in ergodic systems: the system self-averages, and this is the thing that actually allows you to do statistical physics. If you don't satisfy this, you can't do statistical physics. So yeah, the question was that we need to solve for the behavior of the atoms, and we are going to use ensembles for that. So we're talking about ensembles, but how do we actually define ensembles?
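The self-averaging statement just made (long-time average equals ensemble average for ergodic dynamics) can be illustrated with a toy ergodic Markov chain; everything here is my own construction, far simpler than Hamiltonian dynamics.

```python
# Ergodicity in a toy setting: for an ergodic two-state Markov chain, the
# time average of an observable along one long trajectory matches the
# ensemble average under the stationary distribution.
import numpy as np

rng = np.random.default_rng(1)
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])      # ergodic two-state transition matrix
f = np.array([0.0, 1.0])        # observable: indicator of state 1

# stationary distribution solves pi P = pi; for this P it is (2/3, 1/3)
pi = np.array([2 / 3, 1 / 3])
ensemble_avg = pi @ f

# time average along a single long trajectory
state, total, T = 0, 0.0, 200_000
for _ in range(T):
    total += f[state]
    state = rng.choice(2, p=P[state])

print(ensemble_avg, total / T)   # the two averages agree closely
```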
At this point come the constraints that we impose on the system, and these constraints are actually going to lead us to the probability distributions that we use to characterize the system. For example, take this N-atoms-in-a-box setup in the classical scenario (we could just as easily go to the quantum one). We can write down the positions and momenta, and all of these points construct the phase space; if you're not a mathematician worrying about rigid bodies and so on, it's just R^{6N}. And an ensemble is simply a probability density function. (I don't know if you see my cursor. Okay, you don't see it; I see it, but it's not moving. Anyway.) It's a probability density function satisfying this normalization condition, which makes sense: you're basically summing over all the points in the phase space. And this is something that represents, again connecting to the previous lecture, what we do not know about the system, whereas the system itself and its time evolution are encoded in its trajectory, which evolves with time in the phase space. So to that we add the ergodicity assumption, which we just expressed mathematically. This is one thing that I'm emphasizing here, even though it's really trivial for many participants, I guess, because this ensemble approach is how we do statistical physics; but starting from tomorrow we are going to see that in stochastic thermodynamics
we can invoke ensembles, yes, but we can also ascribe quantities like entropy to trajectory-level descriptions as well, and this is something that wasn't considered before the advent of stochastic thermodynamics. This is something you really should know. Okay. So, to make things more specific, we can ask how we describe an isolated system. An isolated system is at constant energy and not connected to anything; it's like a whole universe by itself, and there are no interactions. So one thing that we should do is express the fact that the Hamiltonian is a constant of motion: the energy is constant in time. So how do we then deduce the probability distribution that describes the system? We know that our imposition is that the energy is constant; depending on that, how do we write the probability distribution that defines this ensemble? It's a Dirac delta function, right? So this is the simplest example. To be more precise, of course, if you want to be physically or mathematically careful, we are talking in terms of energy hypersurfaces, and we are saying that if you have N particles in a fixed volume V and the system is kept at constant energy E, there is a uniform probability distribution describing the system, which resides on the hypersurface of this energy shell. Okay, those were the main concepts. So one thing that we did in our undergrad classes is to consider a partition of this whole system: this was the complete system, and now we divide it into two. We know that the total energy is constant, right?
Because energy is an extensive quantity, we can write it as a sum over the partition (sorry, I changed notation; it's no longer E1 and E2): the total energy is the sum of the energies of the components. And the other thing we did was to express the phase space of the whole system as a Cartesian product of the phase spaces of its components, omega_A times omega_B. Based on that, since we like sums rather than products, and for many other motivations that we won't go into, this is how we write the Boltzmann entropy, which is itself an extensive quantity. So one of the things that we can do with this kind of quantity is just take derivatives and recover the conventional thermodynamic quantities. Just checking: everyone knows this, I think, but okay. For example, this derivative gives us the inverse temperature, and so on; we know that S is encoded in terms of V and E, these macroscopic variables, and if you take other derivatives, for example with respect to V, you recover the pressure, and so forth. Okay, that's great. And these are the first and second laws of thermodynamics: if you consider this expression on the right-hand side and take the derivatives, you see that, going the opposite way, you can recover the expressions of the microcanonical ensemble. So that's good. But the microcanonical ensemble is sort of boring, because nothing is really isolated. We actually want to investigate closed systems that are kept in thermal equilibrium. How do we keep them in thermal equilibrium?
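Before the canonical case, the microcanonical bookkeeping above can be checked in a toy model of N independent two-level units, where the number of microstates at energy E (in units where each excitation costs 1) is a binomial coefficient; this example is my own, not from the slides.

```python
# Microcanonical toy model: N two-level units, E of them excited, so
# Omega(E) = C(N, E). Check that S = ln Omega is extensive and that
# dS/dE plays the role of the inverse temperature beta.
import numpy as np
from math import lgamma

def log_omega(N, E):
    # log of the binomial coefficient C(N, E) via log-gamma
    return lgamma(N + 1) - lgamma(E + 1) - lgamma(N - E + 1)

N, E = 1000, 200
S = log_omega(N, E)

# extensivity: doubling the system at fixed energy density doubles S,
# up to subextensive (log-sized) corrections
print(log_omega(2 * N, 2 * E) / S)

# finite-difference inverse temperature beta = dS/dE, vs the analytic
# value log((N - E) / E) for this model
beta = log_omega(N, E + 1) - log_omega(N, E)
print(beta, np.log((N - E) / E))
```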
We imagine a universe partitioned into a system of interest and a bath, a reservoir — and this is the important point — the bath is an infinite, idealized system. It dictates the temperature of the system of interest. Now we ask: what is the equilibrium probability distribution that characterizes this system? Here we go: it's the Boltzmann distribution. As David pointed out, there are many different ways to derive the Boltzmann distribution, but one of the most attractive, I think, is the optimization problem considered in David's lecture: just use Lagrange multipliers, and even without physical considerations you can find it. One of the central objects of equilibrium statistical physics is the partition function, which has this form. The h^{3N} is something I added to emphasize the unit cell of phase space, but it's not essential. Building on this, you can take the logarithm of Z, take derivatives of ln Z, and again recover the thermodynamic quantities. For example, one thing you can do is compute the mean internal energy. We know this, so I'm passing over it. How am I doing on time — ten minutes in, probably? Okay, that's great. You gave me twenty minutes, that's why I'm rushing. Let me just stop for thirty seconds to make sure you know this. You know the subject — perfect. If you want to look things up, two books: Landau and Lifshitz, or Sethna, and you're done.
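To make the "differentiate ln Z" step concrete, here is a small numerical sketch (my own illustrative example with made-up energy levels, k_B = 1): it computes the mean internal energy directly from the Boltzmann weights and again from ⟨E⟩ = −∂ ln Z/∂β, and checks that the two agree.

```python
import numpy as np

def boltzmann(energies, beta):
    """Boltzmann distribution p_i = exp(-beta * E_i) / Z for discrete levels."""
    weights = np.exp(-beta * np.asarray(energies, dtype=float))
    Z = weights.sum()                      # partition function
    return weights / Z, Z

E = [0.0, 1.0, 2.0]                        # toy three-level system (k_B = 1)
beta = 1.0

p, Z = boltzmann(E, beta)
mean_E = np.dot(p, E)                      # <E> = sum_i p_i E_i

# Same quantity from <E> = -d ln Z / d beta, by central difference:
h = 1e-6
_, Zp = boltzmann(E, beta + h)
_, Zm = boltzmann(E, beta - h)
mean_E_from_Z = -(np.log(Zp) - np.log(Zm)) / (2 * h)

print(np.isclose(mean_E, mean_E_from_Z))   # the two routes agree -> True
```

The same pattern works for any finite level set; higher moments follow from higher derivatives of ln Z in the same way.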
I mean, yeah, that's good. But if we want to consider a truly thermodynamically open system, we need to allow not only energy exchange but also matter exchange — we want our system to also exchange particles. How do we do that? We constrain the mean particle number and introduce its conjugate, the chemical potential, which lets us investigate the system while it's still at equilibrium, and we write down the grand canonical ensemble probability distribution in this form, which now also involves the chemical potential. Now, the question: what if we perturb the system? What happens then? In equilibrium statistical physics — I think it doesn't hurt to say this — the notion of time doesn't really exist: you're always averaging, and your averages are time independent. But things start to become interesting and rich if you consider an external agent that is perturbing the system, taking it out of equilibrium and introducing time dependence. What happens then? We ask questions along the lines of: how does the system relax back to equilibrium — or, if it doesn't relax to equilibrium, what does it do? So, up until the 1990s — and I think it's safe to say even into the 21st century —
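For reference, the grand canonical distribution mentioned here has the standard form (β = 1/k_B T, μ the chemical potential, Ξ the grand partition function):

```latex
p(x, N) \;=\; \frac{e^{-\beta\,\left(H_N(x) - \mu N\right)}}{\Xi},
\qquad
\Xi \;=\; \sum_{N=0}^{\infty} \frac{1}{h^{3N}\, N!} \int d x\; e^{-\beta\,\left(H_N(x) - \mu N\right)} .
```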
Yeah, I don't know, maybe until around 2000, it was the case that non-equilibrium thermodynamics tried to tackle this kind of question, and it actually developed really impressive tools — but these tools mostly worked for small perturbations, under the roof of something called linear response theory. You could derive thermodynamic expressions that consistently describe what happens to these systems if you apply small deviations, small external fields. But beyond the linear response regime, non-equilibrium thermodynamics really didn't have much interesting to say: if I perturb a system and take it arbitrarily far from equilibrium, there was no framework to describe it. But then came stochastic thermodynamics. What stochastic thermodynamics does is use statistical mechanics — okay, it uses statistical mechanics to provide a rigorous foundation for non-equilibrium physics, applicable to systems arbitrarily far from equilibrium. The dynamics are still governed by equilibrium reservoirs and so on, but now the whole framework of equilibrium thermodynamics can be recovered from stochastic thermodynamics, and so can non-equilibrium thermodynamics.
All of this is captured by stochastic thermodynamics. Of course, it's not some holy, complete body of knowledge, but still, it does a lot. So, just a few points. Tomorrow David is going to introduce how we formalize stochastic thermodynamics — we've all talked about it, but there is a mathematical basis to it. Before going into that, I just wanted to provide an intuition for stochastic thermodynamics. We mostly use it to understand mesoscopic systems and their time evolution; you can take mesoscopic systems to be defined, roughly, by the relevant energy scale ΔE being around k_B T — then stochastic thermodynamics is probably applicable. Of course, these systems are dynamical systems, and they need a dynamical description of how they evolve in time. There are two things you can do. One: you provide a stochastic description. This is conventional stochastic thermodynamics, which uses, for example, continuous-time Markov chains — jump processes, all the things we describe with master equations. If you don't know how to write down a master equation, maybe look it up before tomorrow's lecture; we're going to start from there. The other thing you can do is take a deterministic description. What you're doing then resembles what we did for, say, the microcanonical ensemble and the other ensembles: you take a universe and partition it into a system and a reservoir. One interesting thing is that, I think, 90% of the work in stochastic thermodynamics — at least for classical systems — uses the stochastic description, which makes sense, because you're considering mesoscopic systems exposed to fluctuations. But in this course — and I think this is rare — we're also going to
talk about this deterministic description. In option (a), the first one, we consider infinite baths: they dictate the stochastic dynamics of the system. In option (b), you can also consider finite baths. In this deterministic, finite-bath formulation that we're going to discuss, the bath states are actually important too, in a sense. I also want to emphasize how we're going to derive stochastic thermodynamics: it's formulated with conventional statistical mechanics ingredients. We're going to use probability distributions, but they're going to have different meanings now, and again we're going to use reservoirs — infinite or finite, mostly infinite. One thing that is beautiful is that these thermodynamic descriptions won't explicitly involve the dynamics. They won't be complicated, messy things; they'll be really clean, and they'll have interesting connections to what we just talked about, like the Shannon entropy — the Shannon entropy is the entropy expression we use in stochastic thermodynamics, and tomorrow, I think, we'll understand better why it makes sense to use it. So, three more points — three major contributions of stochastic thermodynamics. I want to emphasize these before going into the mathematics tomorrow, because that's going to be quite a run. First, we can finally define thermodynamic quantities in a consistent way for systems that are stochastic and fluctuating — something we weren't able to do before. Second — and I think this is really interesting — since Boltzmann's time we've had this complication about the foundations of thermodynamic irreversibility, the arrow-of-time paradox, which also goes by the name of Loschmidt's paradox. The finite-bath formalism actually provides some understanding of irreversibility.
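Since tomorrow's lecture will start from master equations, here is a minimal sketch (my own illustrative example, not from the lecture) of a two-state continuous-time Markov chain whose jump rates obey detailed balance with a bath at temperature T, so that the master equation relaxes to the Boltzmann distribution:

```python
import numpy as np

# Two states with energies E coupled to a bath at temperature T (k_B = 1).
E = np.array([0.0, 1.0])                   # state energies
T = 1.0
k01 = 1.0                                  # jump rate 0 -> 1
k10 = k01 * np.exp((E[1] - E[0]) / T)      # rate 1 -> 0, fixed by detailed balance

# Rate matrix W for dp/dt = W p; columns sum to zero (probability conserved).
W = np.array([[-k01,  k10],
              [ k01, -k10]])

# Integrate the master equation with simple Euler steps.
p = np.array([1.0, 0.0])                   # start fully in state 0
dt = 1e-3
for _ in range(20000):
    p = p + dt * (W @ p)

# The long-time distribution matches the Boltzmann distribution.
p_eq = np.exp(-E / T)
p_eq /= p_eq.sum()
print(np.allclose(p, p_eq, atol=1e-3))     # -> True
```

The design choice here is the detailed-balance condition k10/k01 = exp((E_1 − E_0)/T): it ties the ratio of rates to the Boltzmann factor, which is exactly what makes the stationary state of the dynamics coincide with the equilibrium ensemble.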
Okay, and the arrow of time — I think this is really important. I'm saying this not only as myself but also as a TA, because we're going to touch on how we actually recover the stochastic description from the deterministic description in stochastic thermodynamics. The third point is that information is physical: we're going to learn what the thermodynamic cost is of performing computational operations, beyond bit erasure. This point is important because I think this is one of the rare courses given on stochastic thermodynamics and computation. Finally, one more thing, just to raise your appetite for stochastic thermodynamics — it will take a few more minutes, and then I'm done, apart from an announcement. After this introduction to stochastic thermodynamics tomorrow, we're going to be talking about thermodynamic uncertainty relations and so on, which basically provide bounds on the precision of physical observables in terms of the entropy production, or the dissipation. These kinds of bounds exist in stochastic thermodynamics, and they're applicable in a wide range of scenarios. They have limits, and we must be aware of those limits — they don't apply to every kind of interesting system — but still, they work quite well. One thing I want to emphasize — I think this is a really interesting point, on the computation side — is that these universal energetic bounds point in the direction of asking what we can do physically, energetically. There are energetic constraints on what is physically allowable for any kind of phenomenon you can consider from the stochastic thermodynamics perspective. One really relevant question is actually posed by computer science,
in computational complexity theory, and it also touches the foundations of mathematics: what do we mean when we say a problem is solvable? If I give you a computational setup, can you solve that problem, and how efficiently can you solve it? Since, I think, the 1960s or '70s, computer scientists and physicists have tried to combine these two questions — what is physically allowable or efficient, and what is computationally allowable or efficient — by using statistical physics, good old statistical physics. If you know about spin-glass systems and so on, that's the kind of work being referenced. They tried to come up with statistical-physics measures to quantify computational complexity. But almost all of these approaches involve, for example, mappings to Ising models, and they are rather static: they don't actually treat computing, or problem solving, as a dynamical process; they model everything as static. So, from a pedagogical perspective, throughout this whole course we need to keep in mind what it means to physically compute, and whether we can relate the physical trade-offs of stochastic thermodynamics — the speed of a computation or of a thermodynamic process, its noise level, and whether the computational system is tailored to perform a given computational task — to the abstract limits of computer science that are investigated in detail in computational complexity theory, in terms of descriptive complexity or resource complexity, such as memory and time constraints. So that's everything for the overview.
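Before the logistics, one formula for reference: a common form of the thermodynamic uncertainty relation mentioned above. For a steady-state current J accumulated over a time interval, with total entropy production Σ over that interval,

```latex
\frac{\operatorname{Var}(J)}{\langle J \rangle^{2}} \;\ge\; \frac{2 k_B}{\Sigma},
```

that is, higher precision (smaller relative fluctuations) requires more dissipation.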
I'm going to post the homework today. It's going to include about five questions. One of them is about information theory — you're going to work with Markov chains and entropy rates, because I think that's the topic most related to stochastic thermodynamics and continuous-time Markov chains — and the other four cover the topics that will be introduced this week. I'm not sure it would be super meaningful to ask for your solutions by Friday, so I think you can send them to me until — if that's also okay with you — maybe Sunday or so. On this topic, one more thing: if you want any kind of recitation hours or something like that — this goes for people on Zoom too — if you want me to solve problems on the board, or using these kinds of slides, I can do that, and I think I should do that. Also, this is something we discussed with Mateo: depending on how this course evolves, and on how much you like stochastic thermodynamics and computation, I think it would be great — as they do at Les Houches and similar summer schools in the traditional sense — for us, as a group, to come up with lecture notes, to provide a perspective, and to align with CTP on this. That's another really pedagogical thing: we could also talk about, for example, open research problems, and I think our mentors and professors here can guide us through that. So, yeah, I'm going to send a questionnaire to Slack about recitations and discussion hours, so we can specify some available times. And that's all from me. Okay, thank you. Okay — we are free; coffee is upstairs.