I think, yes. Good morning, everybody. Welcome back. We start the second day with the second lecture from Werner Krauth. So good morning, everybody, and thank you for coming back, even though I ended in great confusion yesterday, I hope. For yesterday, I just want to do two minutes of discussion of further reading. You understand that what I am trying to explain to you, in a simple and sometimes slightly joking way, is very serious material, and I don't think it is possible to master this material without reading a book from time to time. So I want to give you some hints on what to read, which also gives us a recap of what we discussed yesterday. One very important point, which I hope you will teach to your own students, children, grandchildren and so on, is the power of statistics: even a finite number of samples gives you definite information on the physical or mathematical quantities that you want to compute. This is discussed in Section 1.3.4 of the little book that I wrote, and it is also very well discussed in the book by Larry Wasserman. I checked yesterday again: it is really accessible for everybody. I mean, there are some books that should be kept away because nobody can read them, but this one you can actually read, and it even has it like this in the text: "Warning: there is much confusion about how to interpret confidence intervals." And I checked that he doesn't put "warning, warning" on every page; in the whole book of 450 pages there are three or four warnings, and one of them is this. So the power of statistics is really not well understood. The second point, which we did not really discuss because it is really too complicated, is discussed for example in Wasserman, Section 1.3.1: what is a probability?
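As an aside, the "power of statistics" point (a finite number of samples still pins down the quantity, with a confidence interval) can be made concrete in a few lines. This sketch is not from the lecture; it redoes yesterday's direct-sampling estimate of π and attaches a naive one-sigma error bar to it.

```python
import random

def estimate_pi(n_trials, seed=42):
    """Direct sampling: throw n_trials pebbles into [-1, 1] x [-1, 1]
    and count the hits inside the unit circle."""
    rng = random.Random(seed)
    n_hits = sum(1 for _ in range(n_trials)
                 if rng.uniform(-1.0, 1.0) ** 2 + rng.uniform(-1.0, 1.0) ** 2 < 1.0)
    p_hat = n_hits / n_trials                      # estimates pi / 4
    # standard error of a Bernoulli mean; 4 * p_hat estimates pi
    sigma = 4.0 * (p_hat * (1.0 - p_hat) / n_trials) ** 0.5
    return 4.0 * p_hat, sigma

pi_hat, err = estimate_pi(100_000)
print(f"pi ~ {pi_hat:.3f} +/- {err:.3f}")
```

The error bar shrinks like 1/sqrt(N): quadrupling the number of pebbles halves the uncertainty, which is exactly the "definite information from finitely many samples" being advertised.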
There was a question after the talk, and I invite everybody to ask questions during the talk, after the talk, during lunch, after lunch, during the evening, during the night, and so on: what is a probability? The answer that I gave, in private only, yesterday is the following, and I just cite again the book by Wasserman: there are many interpretations of what the probability P(A) of an event A is; the two common interpretations are frequencies and degrees of belief. So yesterday we discussed the interpretation of probabilities as frequencies: it was the number of hits divided by the number of trials. But I could have equally well discussed the interpretation of a probability as a degree of belief that something is true. This may sound funny to you; it sounded funny to me a long time ago, but now it doesn't anymore. I put this also into Section 1.3.4 of this book, and there is a computer program explaining explicitly what it is; we did discuss it over dinner last night. So what I could have said is: I think that the quantity π, the one that makes πr² equal to the area of the disk, follows a uniform distribution; that's my belief.
It is a uniform distribution between zero and four: a completely different interpretation. But this different interpretation can then be refined through the fact that I did an experiment with 4,000 pebbles (by the way, "pebbles" means little stones, for those who are not native speakers; I did look it up in a dictionary). So I could have played the game with four thousand stones, refined my initial belief that π could be anything between zero and four, and obtained a probability distribution for π. This is the second interpretation, and there is a really nice discussion of it in this book. It is a continuation of what we discussed yesterday in a completely different language, but using the same underlying axioms of probability theory. So the theory is the same; it is just the interpretation that is different, and this different interpretation is Bayesian statistics. Then there was a really interesting question yesterday about the Chebyshev inequality: it is so old, it must be really bad. That is not true. This inequality that we discussed is the best inequality that you can find; it is a sharp inequality, if, of course, you suppose nothing about your distribution. Now you can do different things: for example, if you know that your distribution lives between a lower value and an upper value, then you should not use the Chebyshev inequality. There is a series of other inequalities, which you can look up in the same book, in Section 4.1; they go, for example, by the names of Hoeffding's inequality or Mill's inequality, and everybody should have read about them. This book is also available online, so there is no excuse for not reading it. So now, what I will discuss today.
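The belief-refinement just described (a flat prior for π on [0, 4], updated by pebble counts) can be sketched in a few lines. The hit count below is made up for illustration, not the lecture's data; the only modelling assumption is that each pebble hits the disk with probability π/4.

```python
import math

# Degree-of-belief reading: start from a flat prior for pi on [0, 4],
# then update with pebble-game data (each pebble hits the circle
# with probability pi / 4).
n_trials, n_hits = 4000, 3156        # illustrative counts, not real data

grid = [4.0 * (k + 0.5) / 1000 for k in range(1000)]   # candidate values of pi
log_post = [n_hits * math.log(p / 4.0)
            + (n_trials - n_hits) * math.log(1.0 - p / 4.0)
            for p in grid]
m = max(log_post)                    # subtract the max for numerical safety
post = [math.exp(lp - m) for lp in log_post]
norm = sum(post)
post = [w / norm for w in post]      # normalized posterior on the grid

mean = sum(p * w for p, w in zip(grid, post))
print(f"posterior mean for pi: {mean:.3f}")
```

The initially flat belief collapses, after 4,000 throws, into a narrow distribution around 4 × (hits/trials); the same data as in the frequency picture, read in the degree-of-belief language.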
I will continue with the story from yesterday. We had direct sampling, when we are much bigger than the square that we want to sample and have direct access to the probability distribution, by just throwing a pebble, that is, by taking two random numbers between minus one and one. Then we had the other case, when the persons playing the game are much smaller than the playing field, and this gives Markov-chain sampling. We arrived at the really strange pattern of pebbles, of stones, and I want to discuss why this also gives a correct sampling of a probability distribution π(x, y) which is constant for x and y between minus one and one. Direct sampling was invented by the French naturalist Buffon in 1733 and then rediscovered by Ulam in the 1940s, and Markov-chain sampling is due to Metropolis et al. in 1953. So now I have two parts that I want to discuss. The first part will occupy us until the break at 9:55, as sharp as it was yesterday, and then from 10:05 I will go into the discussion of mixing times and correlation times. Now, this system here is of course a terribly complicated system, much too complicated for us, so we will simplify it further for the moment, and we do a simplification of this model. What do we do if we want to simplify? We discretize. So instead of this kind of perspective drawing, which is also a big drain on my brain, I don't draw it like that; I draw it like this. So now you can understand that we may just as well say we have a pebble, or a person, running on these points on a little square.
I call this the three-by-three pebble game, and you understand that the basic layout is the same as here: if the person is at one site at time t, he or she can go up, down, left and right. So now, take this configuration here; let's call it a. It can be reached by little stone throws either from the same configuration a, or from a configuration that I may call b, or from a configuration that I may call c. Because I don't allow (in fact, I don't understand) periodic boundary conditions, the configurations a, b and c are the only configurations that allow me to reach a. So, for example, I could say: at time t+1 I am at a, and at time t I was at a, or I was at b, or I was at c. Now let us write the fundamental equation, or one fundamental equation. As I told you yesterday, the idea of Markov-chain sampling is that you start off with a probability distribution which is not the uniform distribution in the square; in fact, at t = 1, or t = 0, just because we are mean people, we start off in the upper right corner. A little question, just for waking everybody up: why don't we start
at a random position in the square? Well, exactly: the answer was that if we start at a random position, we are back to what we did yesterday, and we can go home, or we can go to the cafeteria and have a beautiful Trieste coffee. So we start up there, and we have probability distributions that depend on time. What we can write is, of course, that the probability to be at a at time t+1 is composed of the probability at time t to be at a, times the probability to go from a to a, plus the probability at time t to be at b, times the probability to go from b to a, plus the probability at time t to be at c, times the probability to go from c to a:

π^{t+1}(a) = π^t(a) p(a→a) + π^t(b) p(b→a) + π^t(c) p(c→a).

Yes? No, no: a is just some regular configuration. At time 255 you may again visit the point a, and I could have chosen another position. In two seconds I'll have a general formula. So, for some general configuration c, I can write

π^{t+1}(c) = Σ_{c'} π^t(c') p(c'→c),

where the sum runs over the neighbours of c and over c itself. Okay, so this is a general equation.
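The update rule π^{t+1}(c) = Σ_{c'} π^t(c') p(c'→c) can be iterated directly on the nine sites. A minimal sketch, assuming the quarter-probability moves with rejections at the border that the lecture introduces a little later: starting from the mean corner start, the distribution flows to the uniform one.

```python
# Transition probabilities of the 3x3 pebble game: from each site, propose
# up/down/left/right with probability 1/4 each; moves off the board are
# rejected, i.e. the pebble stays put.
moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
sites = [(x, y) for x in range(3) for y in range(3)]
p = {(a, b): 0.0 for a in sites for b in sites}
for (x, y) in sites:
    for dx, dy in moves:
        nx, ny = x + dx, y + dy
        target = (nx, ny) if (nx, ny) in sites else (x, y)  # rejection
        p[((x, y), target)] += 0.25

# Iterate pi^{t+1}(c) = sum_{c'} pi^t(c') p(c' -> c), starting in a corner.
pi_t = {s: 0.0 for s in sites}
pi_t[(2, 2)] = 1.0
for t in range(100):
    pi_t = {c: sum(pi_t[cp] * p[(cp, c)] for cp in sites) for c in sites}

print({s: round(w, 4) for s, w in pi_t.items()})
```

After a hundred steps every site carries weight 1/9 to many digits, which is exactly the t → ∞ limit discussed next.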
That was just an illustration. So now let us suppose that there is a t → ∞ limit; let us suppose that it exists (I will discuss in two minutes the three conditions for it to exist). If it exists, then the probability distribution at t+1 should be the same as the probability distribution at t. So if I wait a long time, I may reach a limit where this distribution is the same as that one; this means that π^t(a), π^t(b), π^t(c) and so on, for t going to infinity, is the same as π^{t+1}(a), π^{t+1}(b), π^{t+1}(c) and so on. And then I arrive at a beautiful equation, which is

π(a) = π(a) p(a→a) + π(b) p(b→a) + π(c) p(c→a),

or, more generally, just answering your remark from above,

π(c) = Σ_{c'} π(c') p(c'→c).

This is a fundamental equation. It is called the global balance condition. What it says is that the probability of any configuration c must be equal to the sum, over the configurations c' that can access c (its neighbours and c itself), of the probability of c' times the probability to go from c' to c. This condition is one of three necessary conditions for the t → ∞ limit to exist, such that this limit is given by the probability distribution π(a), π(b), π(c), ... on the nine sites. Let me give you one more interpretation. This π(c'), this is the input: this is the distribution that we want to sample. In our case, we want to have a uniform distribution on the nine sites of this three-by-three pebble game. Right, this is what we want to have: a uniform distribution,
and in order to have this uniform distribution, π(c) is simply a constant for all configurations c. But more generally, this is the distribution we want to sample; and to sample, as discussed yesterday, simply means to obtain examples of the distribution. This is where the name "sample" comes from. Later on, this will be the Boltzmann distribution in our statistical-physics context; or, since we are in a three-week session where you will also do a lot of quantum physics (yesterday there was already a question about what happens in quantum systems), in quantum systems this will be the diagonal density matrix. You see how we can move in five minutes from people playing, and completely nice objects, to things that we may not completely understand, like the diagonal density matrix. So this is the physics part of what we are discussing here. As you could read up there, I am giving lectures on Monte Carlo algorithms, and the Monte Carlo algorithm is this thing here: the set of transition probabilities p(c'→c). The Monte Carlo algorithm has to satisfy the necessary global balance condition that I wrote here; I will give you more information very soon. Yes, speak up, young man, you have the floor. Thank you, thank you very much: I'll pay you a coffee at nine o'clock. So why is this a condition? Well, I call it a condition because it is a condition on the Monte Carlo algorithm to converge, so that the distribution π^t converges to π for t going to infinity. It is a condition on the algorithms that I propose, and that I can check; we will have a few algorithms later on, and there will be conditions. This is a condition
that the algorithm has to satisfy. There are two other conditions; this is the complicated one, and the two others I will write down immediately. Condition number two is called irreducibility, and condition number three is called aperiodicity. These are really kind of baby conditions that are very, very easy to satisfy. Well, I'm really sorry, but you know, these are very big blackboards, but they are a little low; but you can stand, so I'll say it to you in private. So: condition number two is irreducibility, and condition number three is aperiodicity; at 9:55, when the two of us go to have a coffee, we will discuss them. So now let me write this condition; let me spend a little time on it, it is so important. It is so important that in the little book I wrote, I didn't even give it an equation number, because ten years ago I didn't think it was so important: I thought that the equation that will come a little later was the important one. But now I think that this one is really important. So I multiply this with one; if I multiply it with one, it remains a valid condition. We derived it all together, so we all know that this is a necessary condition. And then I write "one" as the equation

Σ_{c''} p(c→c'') = 1.

What this means is: I have a Monte Carlo algorithm that has to satisfy certain conditions, three conditions, among them the global balance condition; and in this Monte Carlo algorithm, if I am at configuration c at time t, I definitely have to go somewhere. I can either stay at c, or I go to the right, left, upper or lower neighbour, whatever neighbours I have. So I have to go somewhere.
I cannot go into thin air. So this equation is clearly true, and now I can plug it in here; and maybe, in the interest of sparing me the embarrassment of writing an equation which is very long, you can see that this means that π(c) times the sum of the p(c→c'') must be equal to the right-hand side. So let me give an interpretation: this quantity π(c') p(c'→c) is called the flow from c' to c. You understand: I have a certain probability to be at c', and a certain probability to move from c' to c, so this is the probability to be there, times the probability to go from there to here. This is the flow; in the t → ∞ limit, it is the time-averaged flow from c' to c. Then, if I plug this equation in here, what the global balance condition tells me is that, for all c, the flow out of c must be equal to the flow into c. Let me write it in formulas:

Σ_{c''} π(c) p(c→c'') = Σ_{c'} π(c') p(c'→c).

The left-hand side is the sum of the flows from c to c'', the flow out of c; the right-hand side is the sum of the flows from c' to c, the flow into c. So this is also the global balance condition; it is just a rewriting of the original formula.
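A quick numerical sanity check of the flow statement, again on the three-by-three pebble game. The sketch assumes quarter-probability moves with rejections at the border and the uniform π = 1/9; both quantities in the balance equation can then be evaluated explicitly.

```python
# Global balance as a statement about flows: for the 3x3 pebble game with
# uniform pi = 1/9, the total flow out of each configuration equals the
# total flow into it.
moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
sites = [(x, y) for x in range(3) for y in range(3)]
p = {(a, b): 0.0 for a in sites for b in sites}
for (x, y) in sites:
    for dx, dy in moves:
        nx, ny = x + dx, y + dy
        target = (nx, ny) if (nx, ny) in sites else (x, y)  # rejection
        p[((x, y), target)] += 0.25

pi = 1.0 / 9.0
for c in sites:
    flow_out = sum(pi * p[(c, cpp)] for cpp in sites)
    flow_in = sum(pi * p[(cp, c)] for cp in sites)
    assert abs(flow_out - flow_in) < 1e-12

print("flow out of c equals flow into c for all nine sites")
```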
So now, I was discussing a lot on your right side; so that nobody is envious, I come to your left side. Okay, so let us discuss the detailed balance condition, and let me go back to the little example of a, b and c. If I write out the global balance condition as before, I have that

π(a) p(a→a) + π(a) p(a→b) + π(a) p(a→c)

must be equal (necessary condition, global balance condition) to

π(a) p(a→a) + π(b) p(b→a) + π(c) p(c→a).

So this is the necessary condition in the original example. And now you see that I have two terms which are the same, π(a) p(a→a) on both sides, so I can take them away. Now, the time-honoured solution of the global balance condition is that, in order to equate π(a) p(a→b) + π(a) p(a→c) with the right-hand side, I simply take this one equal to this one, and this one equal to this one, term by term. And this gives me the detailed balance condition:

π(a) p(a→b) = π(b) p(b→a).

So this is the detailed balance condition, and it is a sufficient condition for the convergence; or rather, it is one of the sufficient conditions, together with the irreducibility condition and the aperiodicity condition that I will discuss in two minutes, for the convergence of π^t towards π. So now, again, I ask two questions. What is π(a) in our example, in our example of the pebble game on the three-by-three lattice? Speak up, young men, young women, loud: what is the value of π(a)? You know, I don't hear anymore. Huh, one half? Thank you; it is equal to one ninth, because we want it to be one ninth. Okay, we want it to be one ninth. So this also holds for b.
You understand: if I put in "et cetera" (etc. means "and so on"), you can replace a and b by c, d, e, f and so on; π is one ninth for every configuration. So this, again, is the physics part of our problem. And this means that the algorithm must satisfy p(a→b) = p(b→a): this is the condition, if we use the detailed balance condition with π(a) = π(b). Speak up. Thanks again; I see that I will have to pay another coffee. So this detailed balance is the condition that is traditionally imposed. It is imposed by 99 percent, 99.9 percent of the people working in this field, because it is really easy to come up with a solution; I'll do it in two minutes. I personally have abandoned it for the last ten years: I never use it, I think it's old stuff. We will discuss tomorrow, in great detail, how we can come up with (in fact, I will actually show you) a number of algorithms that satisfy the global balance condition, which we have to satisfy, but that violate the detailed balance condition. Huh? No, no: they will still be Markovian algorithms; I am speaking about Markov-chain algorithms. I think I will give you 13 algorithms in 55 minutes; you'll all run out to have coffee. But they will not satisfy detailed balance; we'll discuss this in detail tomorrow. Okay, but for the time being, using the principle of first things first, we check what this condition gives. So, if we satisfy the detailed balance condition, we have to satisfy this thing here, p(a→b) = p(b→a), and now the easiest solution is simply to give each of the moves a probability of one quarter.
We say: if I am at a, the probability to go to b is one quarter, and the probability to go to c is one quarter; if I am at c, the probability to go from c to a is one quarter; if I am at b, the probability to go from b to a is one quarter; all of them being one quarter. Just let me finish: what is the conclusion of this? If all of them are one quarter, so that the probability to go from one position to each of its neighbours is one quarter, the inescapable conclusion is that if I am in the corner, I have to build piles. Right? You understand: here, with probability one quarter, I go to the left; with probability one quarter, I go down; with probability one quarter, well, I would like to go outside, but I build a pile; and with probability one quarter again, I build a pile. This building of piles of pebbles is nothing but the action needed to satisfy the detailed balance condition. So now let me discuss, in two minutes, the other conditions. Okay, yes, thank you. All right, thank you; I will not pay you another coffee, because I am not responsible for your health, and it may be unhealthy if I pay you too many coffees. So there are two other conditions, and now the question is: what are the other conditions, such that the t = ∞ limit actually exists? Let us discuss the first one in the simple example. Yes, yes: so now I am giving you the answer.
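Before the counter-example, here is the rule just stated, run as an actual simulation (a sketch, with the quarter-probability moves and border rejections as described). The rejected moves pile up pebbles on the current site, and precisely this pile-building keeps the visit frequencies uniform at 1/9.

```python
import random

# Run the 3x3 pebble game as a simulation: propose one of the four
# directions with probability 1/4; if the move leaves the board, reject
# it and count the current site again (this is the "pile" in the corner).
rng = random.Random(1)
moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
visits = {(x, y): 0 for x in range(3) for y in range(3)}
x, y = 2, 2                      # start in the upper right corner
n_steps = 900_000
for _ in range(n_steps):
    dx, dy = rng.choice(moves)
    if 0 <= x + dx <= 2 and 0 <= y + dy <= 2:
        x, y = x + dx, y + dy    # accepted move
    # else: rejected; stay and pile up another pebble at (x, y)
    visits[(x, y)] += 1

for site, n in sorted(visits.items()):
    print(site, n / n_steps)     # each frequency is close to 1/9
```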
I am answering your question by giving an example where you satisfy the global balance condition, even the detailed balance condition, but where the limit is evidently not unique. I could have an algorithm where, instead of going with probability one quarter in each direction, I go with probability one half to the right and with probability one half to the left. What this would mean is: if I start here, I go left and right, I may build up piles, but I always stay in this row here. If I start here, I stay here; if I start there, I stay there. So this is clearly a situation where the t = ∞ state depends on the initial configuration that I have chosen. In order to avoid this unhappy situation, the Markov chain must be irreducible. So I have the global balance condition, which I have watered down just for the moment; this is the number one, inescapable condition. And then there is the irreducibility condition: I may have a local Monte Carlo algorithm, where I go only to the neighbours, but there must be a time t, or a time difference Δt, at which I can go from any configuration to any other configuration. I write this as: the probability to go from a to another configuration b, in a finite number of steps (we will go into more detail later), must be larger than zero, for all a and b. This is the irreducibility condition. And then there is another condition, which is called the aperiodicity condition. But you see, this is really easy to implement: you just have to be careful to move in all directions, and to be careful that you can reach any configuration. This is something that takes us five minutes. Coming up with good algorithms that satisfy the global balance condition, or the detailed balance condition, takes us years and decades of life; satisfying the irreducibility condition takes us five minutes.
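The broken, left-right-only algorithm just described can be run directly. This sketch shows the reducibility: the row coordinate never changes, so each row is a closed class and the long-time behaviour depends on where the chain starts.

```python
import random

# A reducible variant of the pebble game: move only left or right
# (probability 1/2 each, with rejections at the border), never up or
# down.  Each row of the 3x3 board is then closed, and there is no
# unique t -> infinity limit.
def run_row_chain(y_start, n_steps=100_000, seed=2):
    rng = random.Random(seed)
    x, y = 0, y_start
    rows_visited = set()
    for _ in range(n_steps):
        dx = rng.choice([-1, 1])
        if 0 <= x + dx <= 2:
            x += dx              # horizontal move, possibly rejected
        rows_visited.add(y)      # y never changes
    return rows_visited

print(run_row_chain(0), run_row_chain(2))   # start-dependent limits
```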
It's really easy. And the third condition is the aperiodicity condition. The aperiodicity condition means the following; you see it here: if I am at configuration d at time t (excuse me, I am looking in this direction now), I cannot be at configuration d at time t+1, because in this algorithm I move with probability one quarter to the right, up, left and down. Yes? No, this is not the point: I don't have to be able to stay at the configuration. What I am telling you is the following; I am just explaining the aperiodicity condition. If I am at d at time t, I can be at d at time t+2: you understand, I can go here and go back. But I can also go here, have a rejection, build a little pile, and go back, so I can return at t+3, and at t+4, et cetera. So if I am at d at time t, then I can come back at times t+2, t+3, t+4 and so on, and the greatest common divisor of the return times 2, 3, 4, 5, ... is one. Aperiodicity means that the greatest common divisor of these Δt is equal to one. In common language, it means I shouldn't have cycles: I should be able to go from a configuration back to itself at any sufficiently large time. Let me just finish: this is called the aperiodicity condition, and you can look it up in a beautiful book which is, let's say, 95 or 98 percent less easy to read than the book by Wasserman (I mean, it is really more difficult), but where you can read the mathematical proofs that show that, under conditions one, two and three, the t → ∞ limit is unique, and we converge exactly to the probability distribution π. No, no: the aperiodicity condition is even easier to show in our example, because there it is clear that I can come back at times t+2, t+3 and so on. Let us say that the easiest example of a Monte Carlo algorithm that is periodic is one with only two sites, where the probability to go between them is equal to one. It is just like a flashlight: at time t I am here, at t+1 I am there, at t+2 here, at t+3 there, and so on and so on. This is the easiest example; it has a greatest common divisor of two. We will discuss the transfer matrix in two minutes, and it will have two eigenvalues, plus one and minus one, so it will be periodic. So, under these three conditions, the Markov chain is certain to converge to the distribution: the distribution π^t, for any starting distribution π^0, will converge towards π. So now let me discuss, down here in the corner, the Metropolis algorithm. The Metropolis algorithm generalizes, or gives a precise recipe for finding, an algorithm p that satisfies the detailed balance condition. Using this notation, you see that π(a) times the probability p(a→b) to go from a to b
is called the flow from a to b, and under the detailed balance condition it must be equal to the flow from b to a. The Metropolis algorithm (1953; a beautiful paper, a little buggy, but a beautiful paper), in a formulation you may not have seen before, is that the flow from a to b should be the minimum of π(a) and π(b):

F(a→b) = min(π(a), π(b)).

Let me just finish the sentence; I'll come back to you in two seconds. But let us first check: does this algorithm satisfy the detailed balance condition? And the answer is yes, because the minimum of π(a) and π(b) is the same as the minimum of π(b) and π(a). It is evidently symmetric in a going to b or b going to a. Second question: can't we translate it into a language that we understand? Because this is a little complicated. So let us translate it. We said that F(a→b) is equal to π(a) times p(a→b), so we can solve for p(a→b); because all the π are positive, it is

p(a→b) = min(1, π(b) / π(a)).

All those, or many of those, of you who have already programmed the Metropolis algorithm may recognize it in this formula, whereas in the other formula it is a little bit obscure. Is there anybody who has already programmed the Metropolis algorithm? Yeah, so many of you. So I would suspect that many of you have implemented it without maybe checking why it is correct. Right? It's like this. This is life, right?
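Here is the acceptance rule p(a→b) = min(1, π(b)/π(a)) at work, in a sketch on a made-up target: five sites on a line with unnormalized weights (only ratios of π enter, so no normalization is needed). Nearest-neighbour proposals are symmetric, and rejected proposals count the current site again.

```python
import random

# Metropolis acceptance p(a -> b) = min(1, pi(b) / pi(a)) on a toy chain:
# five sites on a line with unequal, unnormalized weights pi.
weights = [1.0, 2.0, 3.0, 2.0, 1.0]          # unnormalized target pi
rng = random.Random(3)
site = 0
counts = [0] * 5
n_steps = 1_000_000
for _ in range(n_steps):
    prop = site + rng.choice([-1, 1])        # symmetric proposal
    if 0 <= prop < 5 and rng.random() < min(1.0, weights[prop] / weights[site]):
        site = prop                          # accept the move
    counts[site] += 1                        # rejections count the old site

total = sum(weights)
print([round(c / n_steps, 3) for c in counts])   # close to weights / total
```

The visit frequencies converge to the weights divided by their sum, i.e. the chain samples the non-uniform π even though it only ever evaluates ratios π(b)/π(a).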
We have to advance. So let me show you that this is the easiest proof of why the Metropolis algorithm is correct. It avoids some complications that are, for example, in the book I wrote: I really apologize, at the time I didn't understand how to derive it easily, and I have a complicated table of four lines to actually check it. But the reason why it works is simply that the flow from a to b is the minimum of π(a) and π(b). So now you have a question. Well, thank you; I think I will have to give out a few coffees at eleven. So, let's say the rejection probability: the rejection is the hallmark, the trademark, of the Metropolis algorithm. You know, what's written on the Coca-Cola bottle is "Coca-Cola", and the rejection is the trademark of the Metropolis algorithm; it comes up everywhere. The algorithms that I will show you tomorrow, which satisfy the global balance condition, have no rejections anymore. And exactly as you say, it is kind of strange why you have these rejections, and what they are good for. Well, you understand, what they are good for is this: if you didn't have the rejections, then the particle, the pebble, would move all over space; it would go to China. The proposal probability of doing the random walk with one quarter, one quarter, one quarter, one quarter is fundamentally not adapted to a system with boundaries, and the rejection corrects your proposal probability to the physical system that you want to simulate, that means, to the boundaries. But there are, of course, much better ways; I think there are better ways, and in the development of Monte Carlo algorithms we have to overcome these rejections, because they break the flow of the simulation. Anyway, this is what I will discuss in detail tomorrow.
I completely agree with you that the rejections are the basic flaw of the Metropolis algorithm. Does this answer your question? Right, this is the basic flaw, and you pointed it out correctly. So let us check that our original algorithm satisfies the detailed balance condition; we just have to go from the discrete version to the continuous version of the algorithm. So let me take a magnifying glass; let me just magnify in blue. What simply happens is: if I am at this position and I stand here, then the probability to throw the pebble over there is the same as, if I stand there, the probability to throw it back here. This is the real condition. We discussed yesterday what the probability distribution of my throws can be: it does not have to be rotationally invariant. It could be that I throw in each direction with the same probability; it could also be that I throw only in the directions +x, -x, +y and -y. The only condition that I have to satisfy is that the probability to throw from here to there must be the same as the probability to throw from there to here. Later today, if I have time (I think I will have time), I break this ideal of symmetric proposals, from a to b and from b to a, and this will be what is called the Metropolis-Hastings algorithm, which we'll discuss at the end of today. So now I think it is a good moment to wrap up and to have ten minutes of break. Let me just say, so that everybody understands: the necessary condition is the global balance condition.
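The continuous pebble game with a symmetric throw distribution can be sketched as follows. Assumed here: uniform throws in a small square [-δ, δ]² around the current position (one of the symmetric choices mentioned above), and rejection whenever the throw leaves the table [-1, 1]²; the fraction of pebbles inside the unit circle again estimates π/4.

```python
import random

# Continuous pebble game on the square [-1, 1]^2: throw to a uniformly
# chosen displacement in [-delta, delta]^2 (a symmetric proposal); if the
# throw lands outside the square, reject it and stay put.
rng = random.Random(4)
x, y, delta = 1.0, 1.0, 0.3      # mean starting point: the corner
n_steps, n_hits = 1_000_000, 0
for _ in range(n_steps):
    nx = x + rng.uniform(-delta, delta)
    ny = y + rng.uniform(-delta, delta)
    if abs(nx) <= 1.0 and abs(ny) <= 1.0:
        x, y = nx, ny            # accepted throw
    # else: rejection; the pebble stays (and piles up) where it is
    if x * x + y * y < 1.0:
        n_hits += 1

print(f"pi ~ {4.0 * n_hits / n_steps:.3f}")
```

Because consecutive positions are correlated, many more throws are needed here than in direct sampling for the same accuracy; that is exactly the mixing-time discussion announced for after the break.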
It is a little bit strange; in physics you would say there is a problem with units, as if you said that a velocity must be equal to a mass. But there is no problem, because these are pure numbers. What it says is this: pi(C') P(C' -> C) is the flow from C' to C, and if I sum over all C', including C itself,

sum over C' of pi(C') P(C' -> C) = pi(C),

so the total flow into the configuration C must be equal to its statistical weight, the probability pi(C). This is the global balance condition. As I said, it has been overlooked for many decades, but it is coming back very strongly, and it is one of the three conditions, together with irreducibility and aperiodicity. Irreducibility means that starting at time t, at some later time I must be able to be at any other position. Aperiodicity means that the greatest common divisor of the return times, which is called the period of a site C, must be equal to one; I gave an example where the return times were two, three, four, five and so on, so the greatest common divisor is one. Then we discussed the detailed balance condition. Global balance says that the flow into a configuration C equals its weight pi(C); since the total flow out of C is also pi(C), this is the same as saying that the flow into C must be equal to the flow out of C. This is simplified in the detailed balance condition, which means that the flow from C to C' is equal to the flow from C' to C. And with this, I think we are all set to have ten minutes of break, where you can go out, or you can come here and have further discussions. Thanks for your attention. So, this was the detailed balance condition.
Okay: the detailed balance condition is a sufficient condition; it implies global balance. It is like this: there are many ways to satisfy the global balance condition, and one way to satisfy it is through detailed balance. Now, there are also many ways to satisfy the detailed balance condition, and one of them is the Metropolis algorithm. So the Metropolis algorithm implies detailed balance, and detailed balance implies global balance. I think I have to write this down for other people. The logic of what I was discussing was: global balance; then the equal-probability case, which I discussed with nearest neighbors; and this is satisfied by the Metropolis algorithm as a special case; and so on. So this was the logic. Yes, okay, so let us continue with this part for today. There were a few really interesting questions, and one of them was about the logic of what I was discussing.
Let me just recap this. We have the global balance condition, which is a necessary, unavoidable condition for the Markov chain to converge from pi_t to pi. What we then did was simplify it to the detailed balance condition; the detailed balance condition implies the global balance condition. This simply means that we do not have to satisfy detailed balance: there are other ways to satisfy global balance which are not detailed balance, and I personally think that this is where the future of Monte Carlo algorithms lies; we will discuss some of them tomorrow. So there was the detailed balance condition, and I gave you an example of how to satisfy it: the Metropolis algorithm. But again, there are many other ways to satisfy the detailed balance condition. One of them, which we will discuss a little later today, is the Metropolis-Hastings algorithm, which is a little bit different.
So one other way is Metropolis-Hastings, and yet another is the heat-bath algorithm, which also satisfies the detailed balance condition. And even though in our example we only had nearest-neighbor moves, this nearest-neighbor condition is of course not necessary. What I am discussing here is, more generally, the convergence of Markov chains on a graph: we have a discrete system with sites, and if sites a and b are connected, there is an edge on the graph. So this was very general for the convergence of discrete systems on a graph, and then we saw that it is really easy to generalize it to continuous systems. So this was the logic of what I was discussing. Let me just repeat that the global balance condition means that the flow into a configuration must be equal to the flow out of it, or, as we wrote down here, that the flow into the configuration must be equal to the statistical weight of the configuration. So now we have three really simple conditions for our Markov chain to converge towards the probability distribution pi. Pi may be the Boltzmann distribution, or the weight appearing in the partition function of the standard model; it can be anything. And it seems we can go home, because we know the detailed balance condition and the Metropolis algorithm, and we can solve any problem in physics with this very general algorithm; there would be no room and no employment for anybody not using the Metropolis algorithm, because it can solve any problem in the world. But this cannot be completely true, and the reason is what we have to discuss now: on what time scale does all this happen?
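The heat-bath algorithm is only mentioned in passing here, so this is just a hedged one-spin illustration (the setup and names are my own, not from the lecture): for an Ising spin in a local field, the new spin value is drawn directly from its conditional Boltzmann distribution, which satisfies detailed balance with no explicit accept/reject step.

```python
import math
import random

# Heat-bath update for a single Ising spin s = +1/-1 in a local field h
# at inverse temperature beta (illustrative sketch): draw the new spin
# from its conditional Boltzmann distribution, pi(+1) proportional to
# exp(beta*h), independently of the old spin value.
def heat_bath_spin(beta, h):
    p_up = 1.0 / (1.0 + math.exp(-2.0 * beta * h))
    return +1 if random.random() < p_up else -1

random.seed(1)
beta, h = 1.0, 0.5
n = 200_000
m = sum(heat_bath_spin(beta, h) for _ in range(n)) / n
print(m, math.tanh(beta * h))   # sampled magnetization vs exact tanh(beta*h)
```

Because the new value is sampled from the exact conditional distribution, the move is rejection-free, which is the characteristic difference from Metropolis.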
On what times do we converge from pi_t to pi? This will happen, but when? This is the more general subject of what is called mixing times and correlation times, and it will also lead me to the discussion of the transfer matrix. I call it the transfer matrix, but in this book and many other books it is called the transition matrix. Okay, so let me discuss mixing times. This is a subject that many of you will not have heard about, but it is the real, critical time that is used to quantify the convergence of a Markov chain towards pi. It is clear that at t = 0, in our little three-by-three game, pi_{t=0} was one at the starting site and zero everywhere else. If we start in the upper right corner, then at t = 1 the probability of the corner will be one half, each of its two neighbors will have one quarter, and the six other sites will have zero. And we have already proven that in the limit t -> infinity the Markov chain converges to the stationary probability distribution. Now it is interesting to quantify the difference between pi_t and pi (which is the same as pi_{t -> infinity}), and the way this is done is with a quantity called the total variation distance (TVD):

|| pi_t - pi ||_TV = (1/2) sum over all sites a of | pi_t(a) - pi(a) |.

This is the definition. I realize that there are not many people, in physics at least, discussing mixing times or this total variation distance. You can look it up in the book by Levin
and Peres, which is called Markov Chains and Mixing Times, appropriate to the subject I am discussing here, and also in a really nice article by the famous statistician Persi Diaconis, "The Mathematics of Mixing Things Up". That is where the word "mixing" comes from, and I will show it right now. This total variation distance can be expressed in another way: it is the maximum over all subsets A of the nine sites of pi_t(A) - pi(A). What I mean by this equation is that you can take any set of sites. Let me give numbers to the sites, one, two, three, up to nine, and then take all kinds of ensembles of sites. For example, I take site number nine: at t = 0 it has probability one, while in pi it has one ninth, so the distance must be larger than eight ninths at t = 0. So, as an example, I take as subset A the set made up of site nine alone; then at t = 0 the difference between the two probabilities is 1 - 1/9 = 8/9.
So this is eight ninths. Or I can take the complementary subset A' = {1, 2, ..., 8}, and lo and behold, it is 1/9 - 0, eight times over, so it is also eight ninths. The reason there is a factor one half in the first definition is exactly this: I can always partition the configurations into the set where pi_t is larger than pi and the set where pi_t is smaller than pi, and the two parts contribute equally. Now, there is a convergence theorem that I do not want to prove; you can find it in the two references I mentioned. It shows that the total variation distance behaves like a small constant to the power t, so it goes to zero for t going to infinity. In our case it starts at eight ninths at t = 0 and becomes exponentially smaller as time continues. And now, what is called the mixing time, t_mix, which depends on a parameter epsilon, is the time at which the total variation distance first becomes equal to, or smaller than, epsilon. Excuse me.
No, it is equal to epsilon there, and before that it is greater. In fact the distance at t = 0 is exactly equal to eight ninths. I said it is the maximum over all subsets of the elements one to nine, and I just showed two examples, the subset {9} and the subset {1, ..., 8}, for which this distance was eight ninths. So the total variation distance must be at least eight ninths, and in fact it is equal to eight ninths. At later times we can compute it as well, so let us do t = 1. As I said, you partition the configurations: you take the set of all configurations whose probability at time t is larger than in the t -> infinity limit, and if you sum up the differences, you get the total variation distance. At t = 1, the three sites with larger probability than at infinite time are the corner and its two neighbors, so the distance at t = 1 is (1/2 + 1/4 + 1/4) - 3/9 = 1 - 1/3 = 2/3. Please check whether I am correct: in one step we have gone from eight ninths to two thirds. No, no, that earlier value was at t = 0. At t = 0 it is eight ninths; we just computed that at t = 1 it is six ninths; and then it continues and goes exponentially to zero, reaching zero only at t = infinity. Excuse me, that was not clear.
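These two values can be checked by propagating the full probability vector one Monte Carlo step at a time (a sketch in my own code, using the one-step move probabilities of the pebble game):

```python
import numpy as np

# Sketch (my own code): evolve the probability vector of the 3x3 pebble
# game step by step and compute the total variation distance
# TVD(t) = (1/2) * sum_a |pi_t(a) - pi(a)| from the corner start.
sites = [(x, y) for y in range(3) for x in range(3)]
index = {s: i for i, s in enumerate(sites)}
P = np.zeros((9, 9))
for (x, y), a in index.items():
    for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
        b = index.get((x + dx, y + dy), a)   # rejected moves stay at a
        P[a, b] += 0.25

pi = np.full(9, 1 / 9)
pi_t = np.zeros(9)
pi_t[index[(2, 2)]] = 1.0                    # start in the upper right corner
tvds = []
for t in range(6):
    tvds.append(0.5 * np.abs(pi_t - pi).sum())
    pi_t = pi_t @ P                          # one Monte Carlo time step

print(tvds)   # starts at 8/9, then 6/9, then decays exponentially
```

The first two entries reproduce the values computed by hand, 8/9 and 6/9, and the sequence then decreases monotonically towards zero.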
Excuse me, I wasn't here. Yes: the c is a constant, and it should be smaller than one; it is a constant smaller than one. So I am saying that the total variation distance, the difference between the two distributions, starts at eight ninths, then six ninths, and then it decays. Okay, so now I define a level. This is the total variation distance: it starts, in our case, at eight ninths (more generally it starts at a value at or below one), and as time continues it goes exponentially to zero. Then I define a value of epsilon, which is arbitrary but must be smaller than one half; people usually use one quarter. And then they say: the moment at which the distance has reached one quarter for the first time is the mixing time for epsilon equal to one quarter. Yes, where the value of epsilon is reached for the first time; that is what I want to say. So this is the mixing time, and it is an important concept. It also shows that for any set of configurations, the difference between its total probability at time t and in the infinite-time limit is at most epsilon; this is what is written here. So this, as I said, is the mixing time. Now, the mixing time as I have written it depends on pi_0; that is the second point.
We may be able to choose a starting point of our Markov chain that is more convenient than the upper right corner; for example, starting in the middle may buy us some time. Normally the mixing time is defined as a function of pi_0, but it can also be defined as the time at which the total variation distance reaches epsilon for the worst possible initial condition. So mixing times can be defined as a function of the starting configuration, or for the worst initial configuration. Okay, after having done this, let us go into the more canonical way of discussing the convergence of Markov chains. The more canonical quantity is the correlation time. So this was mixing times; now we do the correlation time, and in order to write it down we have to discuss the transfer matrix, because it is fundamental. Yes; this was invented by people I know, and I haven't asked them why they called it the mixing time. They say it is just a definition, but it really is the fundamental time on which the initial probability distribution pi_{t=0} mixes into pi; that is what it describes. The idea is that you start your Markov chain calculation from a distribution which is not the equilibrium distribution pi. This is the point, because what I am not doing is watering down this concept. There was a remark just before: the initial point of your simulation is not a random point taken from pi. We started in the upper right corner precisely because we did not know how to sample pi.
So this really describes how the starting configuration goes into pi; I don't know what more to say. But then there is another concept, which is called the transfer matrix. Let me call it P: it is simply the matrix of all the probabilities to move from site a to site b. Let me write it down with our example. Element (1,1) of the transfer matrix is the probability to move from site 1 to site 1. If I am at the corner site 1, two of the four proposed moves are rejected, so the probability to stay at site 1 is two quarters, that is, one half. One quarter is the probability to move from 1 to 2; zero is the probability to go from 1 to 3, because we did not allow periodic boundary conditions; one quarter is the probability to go from 1 to 4; and the others are zero. Let me do it from site 2: the probability from 2 to 1 is one quarter, from 2 to 2 one quarter, from 2 to 3 one quarter, from 2 to 5 one quarter, and from 2 to 4 and all the rest it is zero. This matrix here encapsulates all the information on the Markov-chain algorithm, and it is a matrix with non-negative elements. Now it is clear that pi_{t+1} is given by the transfer matrix applied to pi_t. For example, if I apply it to the starting distribution.
No, no, let us stay in our example: applied to the corner start it gives one half, one quarter, one quarter, and so on; that is one example. And if I apply the transfer matrix to pi, then I get pi again. Let us check this: pi is (1/9, 1/9, ..., 1/9), and applying P to it gives (1/9, 1/9, ..., 1/9) back. So we see explicitly that multiplying pi by the transfer matrix gives pi, which means that the t -> infinity solution, pi = (1/9, ..., 1/9) in our example, is in fact an eigenvector of the transfer matrix P with eigenvalue equal to 1. And under the three conditions we discussed before the break, the global balance condition, irreducibility, and aperiodicity, pi is the only eigenvector of P with eigenvalue of absolute value equal to 1; all the other eigenvalues are smaller than 1 in absolute value. Now, it is clear that pi_{t+2} = P^2 pi_t: if pi_{t+1} = P pi_t, then applying P to pi_{t+1} gives pi_{t+2} = P^2 pi_t. And under the irreducibility and aperiodicity conditions there is a value of t such that P^t is a matrix with all positive entries. Let me discuss this. We said before that we must have a finite probability to go from any site to any site, maybe not in one time step, but in two time steps, or three, and so on. The transfer matrix element P(a,b) is non-zero for all pairs of sites that are connected in one time step; P(1,2), for example, is non-zero because I can go in one step from site 1 to site 2. And if I
compute P^2, P^3, P^4 and so on, the element (a,b) is non-zero for all pairs of sites that I can connect in that number of steps, and under these two conditions there is a time t such that the matrix P^t has all positive entries. Now let us discuss convergence times. Yes, I think so: in the limit of t going to infinity, the matrix P^t itself converges to the matrix in which you have probability one ninth to go from anywhere to anywhere else; thanks for the question, I had not thought about it. How can there be negative entries? There are none: P must be a non-negative matrix, because its entries are probabilities, and if you square it and so on, it stays non-negative. Under the detailed balance condition the eigenvalues are all real; if you only use the global balance condition, the eigenvalues can be complex, and this creates a lot of problems. This is one of the many reasons why algorithms that satisfy global balance but not detailed balance have not been studied a lot; but this will change very much. Yes, that is what I am saying about pi. All the elements of P^t become non-zero, which means that in a certain number of steps I can go from any site to any other site. The main property that we should concentrate on, before discussing other things, is that there is one eigenvector of this transfer matrix with eigenvalue one. It is the only eigenvalue of absolute value one, and the corresponding stationary solution is pi; in our case all its entries are 1/9. No, this is what we imposed; in a more general case pi will be the Boltzmann distribution. Yes, but we can have
configurations that are inaccessible; they will have an energy equal to infinity, and this is not a problem. Okay, let me just conclude this part, to make really clear what we are discussing. This transfer matrix, in our example, has nine eigenvectors with nine eigenvalues. Lambda_1 = 1 comes with the eigenvector (1/9, ..., 1/9), and I can tell you that the second-largest eigenvalue is lambda_2 = 0.75, whose eigenvector has positive and negative entries; and then there is a bunch of other eigenvalues, 0.5 and so on. So these are lambda_1, lambda_2, ..., with eigenvectors v_1, v_2, and so on. What I now do is take the initial distribution pi_{t=0} and expand it:

pi_{t=0} = alpha_1 v_1 + alpha_2 v_2 + ... + alpha_9 v_9.

Now pi at time t is the transfer matrix applied t times to this expansion. The first term has eigenvalue one, so it contributes alpha_1 (lambda_1)^t v_1 = alpha_1 v_1, because 1^t = 1. The second term contributes alpha_2 (0.75)^t v_2, a positive number smaller than one raised to the power t, et cetera. It means that

pi_t = (1/9, ..., 1/9) + constant x (0.75)^t v_2 + ...

These are the corrections at finite time.
They behave like (0.75)^t, where 0.75 is the second eigenvalue. This can of course be written as e^{t log 0.75}; the logarithm of 0.75 is negative, and, as you can check for yourself, this gives e^{-t/3.476}. So what this means, and now we have seen it explicitly, is that there are corrections to the infinite-time solution pi = pi_{t -> infinity}, and these corrections disappear, decay, exponentially, as e^{-t/tau}; tau is the correlation time, which here is 3.476. What I am explaining here is of course a very general property: Markov-chain Monte Carlo algorithms always converge exponentially. And if you have exponential convergence, it means you have a scale, and this scale is given by the correlation time. What does the correlation time tell you? Well, it tells you on which time scale you lose your correlations. So now I am expecting a question, and the question is: what is the relation between the mixing time and the correlation time? And I am very happy to arrive exactly at this point right now.
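The spectrum quoted here can be checked numerically (a sketch in my own code): the transfer matrix of the three-by-three pebble game is symmetric, its largest eigenvalue is 1, its second-largest is 0.75, and the correlation time is -1/log(0.75), about 3.476.

```python
import numpy as np

# Sketch (my own code): diagonalize the 9x9 transfer matrix of the 3x3
# pebble game. For the uniform pi the matrix is symmetric, so all
# eigenvalues are real; the second-largest one sets the correlation time.
sites = [(x, y) for y in range(3) for x in range(3)]
index = {s: i for i, s in enumerate(sites)}
P = np.zeros((9, 9))
for (x, y), a in index.items():
    for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
        b = index.get((x + dx, y + dy), a)   # rejected moves stay at a
        P[a, b] += 0.25

lam = np.sort(np.linalg.eigvalsh(P))[::-1]   # eigenvalues, descending
tau = -1.0 / np.log(lam[1])                  # correlation time
print(lam[0], lam[1], tau)                   # 1.0, 0.75, ~3.476
```

The unique eigenvalue 1 belongs to the uniform eigenvector, and every other mode decays at least as fast as (0.75)^t.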
So, is the mixing time the same as the correlation time? The answer is no. The correlation time is a property of the dynamics of your system when you are already in equilibrium. If you are already at t going to infinity, the next configuration will be correlated with where you were before; but after a certain time you lose your correlation. So let me put the take-home message here. The take-home message is: there are mixing times, and there is the correlation time. The mixing time is the really relevant quantity: it is how long it takes you, starting from the worst initial condition, to get to equilibrium, that is, to get one sample of the distribution at infinite time. So t_mix is how long it takes, from the worst initial configuration, to reach one sample of pi. For example (since we are among people working on quantum systems), you may do a simulation without knowing that your system has a quantum phase transition into some insulating phase or something like that. This is what you want to study, but the initial configuration is not the state that will actually be the final state.
So the time to get from a bad initial configuration to equilibrium is the mixing time; the correlation time is the time to get from one sample of pi to the next independent sample of pi. Generally, the mixing time is larger than the correlation time. But let me tell you, there is a general theorem: t_mix, which depends on epsilon, is smaller than the correlation time multiplied by the logarithm of 1/(epsilon pi_min). There is some connection here that you can look up in Levin and Peres. The logarithm of 1/epsilon is not a big problem, since epsilon can be one quarter; but pi_min is the smallest element of pi on the discrete system, the configuration with the smallest statistical weight, the smallest Boltzmann weight, and 1/pi_min can itself be an exponentially large quantity. So the mixing time can be much, much larger than the correlation time. But more generally, the good news at least is that you always have exponential convergence, so there is a time scale t_corr; unfortunately, for many of the applications that we are interested in, t_corr and t_mix usually cannot be computed rigorously. So what happens is the following:
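This bound can be illustrated numerically for the three-by-three game (my own sketch; note that the rigorous statement in Levin and Peres uses the relaxation time 1/(1 - lambda_2), which is close to, and larger than, the correlation time -1/log(lambda_2)). With epsilon = 1/4 and pi_min = 1/9, the measured mixing time from a corner start stays below the bound.

```python
import numpy as np

# Sketch (my own code): measure t_mix(1/4) of the 3x3 pebble game from a
# corner start and compare with the Levin-Peres-type bound
# t_mix(eps) <= t_rel * log(1/(eps * pi_min)), with t_rel = 1/(1 - lambda_2).
sites = [(x, y) for y in range(3) for x in range(3)]
index = {s: i for i, s in enumerate(sites)}
P = np.zeros((9, 9))
for (x, y), a in index.items():
    for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
        b = index.get((x + dx, y + dy), a)   # rejected moves stay at a
        P[a, b] += 0.25

pi = np.full(9, 1 / 9)
pi_t = np.zeros(9)
pi_t[index[(2, 2)]] = 1.0                    # a corner start
eps, t_mix = 0.25, 0
while 0.5 * np.abs(pi_t - pi).sum() > eps:   # evolve until TVD <= eps
    pi_t = pi_t @ P
    t_mix += 1

t_rel = 1.0 / (1.0 - 0.75)                   # relaxation time, lambda_2 = 0.75
bound = t_rel * np.log(1.0 / (eps * pi.min()))
print(t_mix, bound)
```

In larger systems pi_min can be exponentially small, which is exactly why the mixing time can dwarf the correlation time, as said above.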
Let me draw a picture. Your simulation time goes like this: you start off with the initial state, and it takes you a time t_mix to reach equilibrium and to get one sample; for example, say you do this in the Ising model, to reach one critical configuration. Then you hop from one configuration to the next on the scale of t_corr. This initial time is usually considerably larger than the time to go from one configuration to the next. The reason why I am explaining this is that many people are completely unaware that there are two different times: one time to get from the initial configuration to equilibrium, and another time that describes the dynamics in equilibrium, with its exponential decay. Yes? Excuse me, say it loudly. Is there anything we can do? Well, of course, we have an algorithm, and we spend a lot of time; I told you that these times cannot really be computed rigorously in most cases. We have the algorithm that we have, usually, and then it is our analysis. We just have to be aware that the time to reach one sample of the equilibrium distribution is the critical time; we have to have an algorithm that takes us into equilibrium. Then of course there are many more things that I would have to explain to you. In any case, in our Monte Carlo calculation, as we discussed yesterday, you only need a finite number of samples, even if you have a very big system, and for all intents and purposes it is the same whether you run independent calculations or one calculation that you continue. Anyway, I just wanted to explain the mixing times and the correlation time. We will discuss tomorrow that even if
the space of our configurations increases exponentially, the mixing times have an algebraic behavior in the system size. Maybe let me stop there, in the interest of the next speaker. Okay, thanks for your attention.