Okay. So as I said before, what I will teach you is exactly what Roderich Moessner told me to teach, so if you're not happy afterwards with what you learned from me, it's not my fault. What I want to do in four lectures is give an introduction. But introduction doesn't mean you can go out, leave, and come back later; introduction means getting to the core of the subject: Monte Carlo algorithms, stochastic processes, and the convergence conditions. I will tell all of this to you as Roderich instructed me to do. Then I found out that the second lecture is on the same day, and that's one thing: if you have only a half-hour break between them, you have to prepare both of them before you start the first one. The second lecture will be on hard disks: from classical mechanics to statistical mechanics. Again in a very simple setting; this is also one of my research subjects, and although it is very, very exciting, I will not go into the physics of the research itself. I'll make the connection between molecular dynamics and statistical mechanics. Then, two days from now, the third lecture will be either on sampling and integration, I will tell you all of this, or a little bit also on path integrals. And I'll finish with a discussion of classical spin models: Ising models, XY models, spin glasses, and especially the cluster algorithms in their various forms. The motto, as I already told you, is what even the watchmakers tell you about breaking the rules.
You have to master them first. So I think it is quite useful, you will see it is quite useful, to really learn the basics of any subject: of course things like second quantization or the Schrödinger equation, but also, once we are numerical physicists, the basics of our own trade. There are very many people who gained a lot of experience during their master's thesis, their PhD thesis, or even later; they've coded a lot, but when it comes to really explaining what, for example, the detailed balance condition really means, do we actually need it, what is it really, they get a little bit feverish, and this is what I want to be a doctor for. I was also helped in preparing for today because we set up a massive open online course. I don't know whether anybody here followed the course, no? It is on the subject of the book that Roderich mentioned; it has run for two editions now and attracted 30,000 students the first year and 20,000 the second year, from all over the world, and this helped us get very many programs into shape. With the experience from teaching this massive open online course, we now have forty-line programs to do molecular dynamics, seven-line programs, you'll see all of this, and maybe I'll even show you our forty-line program for doing path-integral Monte Carlo calculations without interactions, but apart from that it's just perfect. So let me do this very simple introduction. As for what Monte Carlo is about, I think you've had several lectures already where Monte Carlo is used. I also think that many of you use it in daily life, or rather nightly life, because as programmers we need a little bit of quiet as we work.
So we wait until everybody else has gone home, and then we come in. There is this very interesting relationship here. I have just an example of integration in two dimensions, with a weight that is one inside and zero outside. Instead of computing complicated integrals, we draw samples, and this aspect of sampling, which is different from doing a Riemann or Lebesgue integration, is the real essence of much of Monte Carlo calculations, and also of much of molecular dynamics: doing complicated integrations by sampling. We'll spend about one hour, or more than an hour, on this, and the example I will use, although you know this already, is a very simple discrete case: what I call the three-by-three pebble game. You have sites 0, 1, 2, 3, 4, 5, 6, 7, 8, and the goal for the moment is to be on each site with the same probability. So we have a dynamic process that goes from time zero to infinity, and you want the frequency, the probability, to be on each site to be a constant. This probability I call pi_k, where k in this case goes from 0 to 8. Now, as you know, there are two sampling strategies. The first strategy was invented by Buffon in 1777; it has been reinvented many, many times since, but this is probably the first one. It is simply this: at each time step you take a uniform random integer between 0 and 8, and this is the position at which you are. So at the first time step you may be on 3, the second time on 8, then 7, then you may be on 13. Is that possible?
No. So you may be on 3, 8, 5, 1, and so on, completely uncorrelated, and this is what is called direct sampling. This is the ideal situation; we would be very happy if we could do direct sampling in many other cases, and there are some cases where we can, but the more common approach is the one invented by Metropolis in 1953, you've of course heard about it: the Markov-chain sampling method. Let's just go over what it actually means. The Markov-chain sampling method consists in starting somewhere, because it is used precisely when you cannot do direct sampling: when you are unable to sample the distribution pi_k directly. (pi_k is a distribution, and producing examples drawn from this distribution is what is called sampling.) If you cannot do direct sampling, you start somewhere, for example at site 8 at time zero, and then you go on and on and on. So what is the condition that we have to satisfy? Let's say we have configurations A, B, and C, and for simplicity let's suppose that from one time step to the next we can only move by one step to the left, to the right, up, or down. This is very easy. Let's say we always start in the upper right corner, and now we want to do a simulation that puts us on every site with the same probability. There are two conditions that we can write. The first condition is: if we are at position A, we have to go somewhere. The reason why I tell you all this simple stuff is so that we don't get this weak feeling in the knees any longer.
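Buffon's direct-sampling strategy for the pebble game fits in a few lines of Python. This is a minimal sketch, not one of the lecture's actual programs, and the function name is my own:

```python
import random

# Direct sampling of the 3x3 pebble game: every draw is an
# independent, uniformly distributed site in {0, ..., 8}.
def direct_sample(n_steps, seed=0):
    rng = random.Random(seed)
    return [rng.randint(0, 8) for _ in range(n_steps)]

positions = direct_sample(10)
# successive positions are completely uncorrelated
```

Each sample is drawn from pi_k exactly, with no memory of the previous one; this is what Markov-chain sampling gives up.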
After having spent so many years working with this stuff, you should really understand it. So, if we are at A, we must go to some other place: it may be C, it may be B, or it may be A again. The first condition is that these probabilities add up to one: p(A→A) + p(A→B) + p(A→C) = 1. We understand that, right? Either we stay in place or we go, but something must happen, even if that something is nothing, and that is the probability to go from A to A. The second condition concerns the probability to be at position A at the next time step: it is the probability to be at B and to move to A, plus the probability to be at C and to move to A, plus the probability to be at A and to stay at A. So the condition is pi(A) = pi(B) p(B→A) + pi(C) p(C→A) + pi(A) p(A→A). Now, here I left a little space, and this little space can be filled with a one: if you take the one from the first condition up here and put it down here, you get the global balance condition, which we have to satisfy, and if we satisfy it, then we are guaranteed, more or less, we will see, that we will be at position A with probability pi(A) in the long-time limit. You simply insert the one into the equation: the term pi(A) p(A→A) appears on both sides, so we can cancel it, and what remains is pi(A) [p(A→B) + p(A→C)] = pi(B) p(B→A) + pi(C) p(C→A). The left-hand side is, in some sense, the probability flow out of A, and the right-hand side is the probability flow into A, and these must be the same: that's called the global balance condition.
Now, 99.9% or even more of calculations immediately go to the famous detailed balance condition. (I'll do a dual career here: half of the time I'll be at this board and half of the time over there, so don't be jealous, I'll come over later.) The detailed balance condition says that each term on the left should equal the corresponding term on the right, and this gives us pi(A) p(A→B) = pi(B) p(B→A), and so on for every pair. Until a few weeks ago this was used in 99.9% of calculations; next year, I think, everybody will have completely forgotten the detailed balance condition and will use the global balance condition instead. Is everybody still following? So what is pi(A) in our pebble game? A constant, right? It's one ninth, but one ninth is complicated to remember, so let's say it's a constant: we said we wanted to be on each site with the same probability. In our special case we then have to move from A to B with the same probability as from B to A, and from A to C with the same probability as from C to A, which is of course very easy. The detailed balance condition, which we will overcome soon, means here that p(A→B) = p(B→A), and so on for every pair. It could be zero: if it's zero going this way, it must be zero going the other way. Or it could be something very simple: it could be one quarter. One quarter to go down, one quarter in each direction, and this, of course, implies rejections.
So each of these little arrows must carry probability one quarter, and if we move with probability one quarter to the left and one quarter down, then what do we do with the rest at a boundary site? Well, we have to stay on the site, here in the corner with a probability equal to one half. Okay, you're following me? This simple solution arises because we don't have periodic boundary conditions. We are not that evolved; we are very traditional in France, we never move, we haven't invented periodic boundary conditions yet. So the pebble can only stay where it is, and we have no choice but to reject the move with probability one half. Okay, so this is very simple, and here you have the first Python program. I'm happy to present the first of my Python programs; I heard that everybody, or at least many people, use Python here. I was introduced to it by one of next week's lecturers, Chris Laumann. I was a Fortran guy for 30, 40 years; I had programmed in Fortran since 1977 or so, even longer if you count the assembler before it, and I stayed with Fortran 77 until 2008, until I met some people. I was really good at it: I could do everything in my head, it was just fabulous. The problem was that I could never teach my tricks to anybody else. So then things changed. Now I can explain everything, but I have a code that runs so slowly that I never get results. That's the situation. So, what I have here, if you remember, is a fantastic little trick. When I was still in Fortran, I numbered the sites starting at one; now, in Python, I start with zero: 0, 1, 2, 3, 4, 5, 6, 7, 8.
So I set up a little neighbor list: the neighbors of site 0 are 1, 3, 0, and 0. Is that okay? That means that if I cannot go, I stay. And the neighbors of site 1 are 2, 4, 0, and 1. Then I run this for a certain amount of time, and I hope that as I run, I will reach equilibrium. Here I have the second program: the first was the basic pebble program, and this is the multi-run version. Here I run the same program 25,000 times and then check on which site I am, and the output of this fantastic program is here. At time t = 0, I told you, I always start up there in the upper corner. Do you still remember why I don't start with a random site? Why don't I initialize with a random integer between 0 and 8? Wouldn't that be much easier? Exactly, thank you. Right? We have already moved from the 18th century to the 20th century: in general, I don't know how to sample a random site. Here, of course, we are simplifying, but in general I cannot do it. I cannot produce a random configuration of QCD on the lattice; I have no idea how to sample a random configuration of the Ising model. I just cannot do it. So I start somewhere, and that somewhere cannot be random; here it's always site 8. So after one time step, with probability one half I'm still here, with probability one quarter I'm here, and one quarter here; I did this by averaging over the 25,000 runs. Continue, continue, continue, and the distribution becomes completely flat, equal to one ninth in the limit of t going to infinity.
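The multi-run pebble program just described can be sketched as follows. This is my reconstruction, not the lecture's actual code: the neighbor table assumes the site layout 0, 1, 2 on the bottom row through 6, 7, 8 on the top, and the function name is mine:

```python
import random

# Neighbor table for the 3x3 pebble game: neighbor[site] lists the sites
# reached by moving right, up, left, down; a move off the edge maps back
# to the same site, so the pebble stays put (a rejection).
neighbor = [[1, 3, 0, 0], [2, 4, 0, 1], [2, 5, 1, 2],
            [4, 6, 3, 0], [5, 7, 3, 1], [5, 8, 4, 2],
            [7, 6, 6, 3], [8, 7, 6, 4], [8, 8, 7, 5]]

def run_pebble(t_max, n_runs=25000, seed=0):
    """Histogram of the pebble's site at time t_max, averaged over runs."""
    rng = random.Random(seed)
    counts = [0] * 9
    for _ in range(n_runs):
        site = 8                      # always start in the upper-right corner
        for _ in range(t_max):
            site = neighbor[site][rng.randint(0, 3)]
        counts[site] += 1
    return [c / n_runs for c in counts]

hist = run_pebble(20)
# as t_max grows, every entry approaches 1/9, about 0.111
```

Running it for a few values of `t_max` reproduces the spreading from the corner towards the flat distribution.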
Now, this may be much too easy for everybody, but anyway: here I only went up to t = 10 because, you know, this is a program in Python, so it doesn't run that fast. Now let's do this calculation analytically. We can set up what is called the transfer matrix: the matrix of algorithmic probabilities. And let's remember for next time: p(A→B), this is the algorithm; pi(A), this is the physics. Later this will be e^(-beta E(A)), the Boltzmann weight, or, if I'm allowed to do it, the diagonal density matrix rho(A, A; beta). Now if I take only the algorithmic part, the rule that moves me from A to B, I can write it into a matrix. For example, if I'm on site 0, then I have a probability of one half of remaining at 0, or I could go to site 1, which is the right neighbor, or to site 3, which is the upper neighbor. I can write down this matrix, but only for simple cases: nobody is able to write down the transfer matrix for QCD, for bosons, for fermions, nobody is able to do this. But in this simple case we can, and the lessons we learn here we can use later. Okay, so this is a matrix; it has quarters and halves, and its entries are always non-negative. And now what we can do is start with our initial vector: the vector at time zero is pi^(0) = (0, 0, 0, 0, 0, 0, 0, 0, 1). Well, there may be a few zeros missing here, but this is the vector that we have.
Then the probability vector pi^(1), the probability distribution at time 1, exactly what I showed here, is the multiplication of this transfer matrix with this vector. So this is the initial state; I multiply the matrix with the vector, and I get the probability distribution at time 1, then time 2, time 3, and so on. And then I simply have to do an eigenvalue analysis of this matrix. Fabulously, you'll see that if I take the vector (1/9, 1/9, ..., 1/9) and multiply it with the matrix, what is the outcome? For the first component: one half times one ninth plus one quarter times one ninth plus one quarter times one ninth, which gives again one ninth. So you see that (1/9, ..., 1/9) is an eigenvector of this matrix with eigenvalue equal to one, okay? Everything is fine. And all the other eigenvalues are smaller; for example, the second eigenvalue is 0.75. Now I can multiply the matrix with the vector many, many times, and after i iterations I get something proportional to the first eigenvector, plus a correction proportional to the second eigenvalue, 0.75, to the power of i, and so on. So you see that in the long-time limit only the eigenvector with eigenvalue one will survive; the others will be suppressed, because 0.75 to the power of i goes to zero.
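The transfer-matrix picture can be checked numerically in a short sketch. This is my reconstruction, built from the same neighbor-table layout as the pebble program above, not the lecture's own transfer-matrix code:

```python
# Transfer matrix of the 3x3 pebble game: P[a][b] is the probability
# p(a -> b); rejected boundary moves accumulate on the diagonal.
neighbor = [[1, 3, 0, 0], [2, 4, 0, 1], [2, 5, 1, 2],
            [4, 6, 3, 0], [5, 7, 3, 1], [5, 8, 4, 2],
            [7, 6, 6, 3], [8, 7, 6, 4], [8, 8, 7, 5]]
P = [[0.0] * 9 for _ in range(9)]
for a in range(9):
    for b in neighbor[a]:
        P[a][b] += 0.25            # each of the four moves has probability 1/4

def step(pi):
    """One time step: pi_{t+1}(b) = sum_a pi_t(a) p(a -> b)."""
    return [sum(pi[a] * P[a][b] for a in range(9)) for b in range(9)]

# The uniform vector is an eigenvector with eigenvalue 1 ...
uniform = [1.0 / 9.0] * 9          # step(uniform) reproduces it

# ... and starting from the corner, iteration converges towards it.
pi = [0.0] * 8 + [1.0]             # delta distribution on site 8
for t in range(30):
    pi = step(pi)                  # corrections shrink like 0.75**t
```

Each row of `P` sums to one (global balance in matrix form), which is exactly why eigenvalue one exists by construction.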
And so we are very happy: we have obtained the result that in the limit of t, or i iterations, going to infinity, we recover the stationary eigenvector, okay? As for this transfer-matrix program: I have a website, and on my website you almost have all the programs already. Wait a little bit: after the end of the second lecture today I'll maybe have a coffee and then put all the programs up, so you can play with them; of course, there's nothing dramatic to see. So now let me come to a first important point. For any problem that we want to treat, we always have this setup. Not a nine-by-nine transfer matrix; the transfer matrix can even be a continuous object, this formalism carries over to the continuum, but you always have one eigenvalue equal to one, by construction. This is the construction that we had in the global balance condition, because the global balance condition tells us that the sum of the probabilities will always be one. And the second point is that there will be a second eigenvalue, a little smaller than one, whose contribution decays like 0.75, or whatever it is, to the power of i, and this gives an exponential decay of the corrections to the equal-probability law that we have in this example. You see it here: this is the stationary solution.
Then we have this correction, sometimes minus, sometimes plus, something complicated, times 0.75 to the power of i. It decays like exp(i log 0.75); in other words, the corrections decay like exp(-t/tau) with tau = -1/log(0.75). So this means that any Markov-chain algorithm has a time scale given by this exponential decay, and this is what makes us sit together in this room, because if we didn't have this, it would not be very interesting to do our calculations. Here you see what I computed: the probability to be at site 0 as a function of the iteration time. Of course, initially, at time 0, it is 0, then it is still 0, then still 0, then it becomes something, and finally it converges to one ninth, about 0.111. I'm computing the corrections to this value, and you see a perfect decay proportional to 0.75 to the power of i. So the first point is: any Markov chain has one eigenvalue that is equal to one, and this is the stationary solution. Then there must be a gap; even in a continuum system there is a gap, here it is the difference between 0.75 and 1, and this gap makes the convergence exponential.
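The exponential law can be verified directly by iterating the transfer matrix and measuring the deviation at site 0. Again a sketch under the same neighbor-table assumption as before, not the lecture's code:

```python
# Check: the deviation of pi_0(t) from 1/9 shrinks by a factor of the
# second eigenvalue, 0.75, at every time step.
neighbor = [[1, 3, 0, 0], [2, 4, 0, 1], [2, 5, 1, 2],
            [4, 6, 3, 0], [5, 7, 3, 1], [5, 8, 4, 2],
            [7, 6, 6, 3], [8, 7, 6, 4], [8, 8, 7, 5]]
P = [[0.0] * 9 for _ in range(9)]
for a in range(9):
    for b in neighbor[a]:
        P[a][b] += 0.25

pi = [0.0] * 8 + [1.0]                     # start on the corner site 8
dev = []
for t in range(30):
    pi = [sum(pi[a] * P[a][b] for a in range(9)) for b in range(9)]
    dev.append(abs(pi[0] - 1.0 / 9.0))     # correction at site 0

# Once the sub-leading eigenvalues (0.5, 0.25, ...) have died out,
# successive deviations differ by almost exactly 0.75.
ratios = [dev[t + 1] / dev[t] for t in range(20, 29)]
```

The late-time ratios sit on 0.75, which is the "perfect decay" seen in the plotted data.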
So that was the first point. The second point is that this exponential convergence is on the level of the probability distribution. This is what I want to try to show here: the random process itself converges. At time zero it is completely non-uniform, then it becomes a little bit more uniform, and more uniform, and it is the probability distribution pi(a) as a function of t that converges towards equilibrium. Now, many of you have already done calculations, I think, and usually you look at autocorrelation functions of some observable in order to find out how fast you converge. Has anybody ever looked at an autocorrelation function of a Markov chain? One, two, three, four, ... twelve, thirteen, okay. So usually what your advisor tells you is: here's an algorithm, look at how fast the magnetization converges towards its stationary state; you compute an autocorrelation function and try to find out how fast it decays. That is nice, and we will have more discussion on it, but the real convergence of the Markov chain is on the level of the probability distribution. So maybe you have asked yourself this question, or maybe you know it already: my advisor asked me to compute the autocorrelation function, the time scale, of the magnetization, but couldn't I do the same for the energy or for some other observable, and wouldn't it be larger or smaller? What the argument I'm showing you tells you is that there is a maximum time scale on which observables can decay, and this maximum time scale is given by the gap to the second eigenvalue. Very important to remember. You understand what I'm trying to tell you? If you don't understand, just stop me, because you may not be the only one. Yes? (Question: so any observable will decay, will converge, faster than this?) Faster or at the same scale. I mean, who hasn't done this: on a semi-log scale you plot the autocorrelation function of the magnetization against time, and you fit an exponential; maybe it has two exponentials or something, but you try to fit an exponential. What I'm telling you is that this may not be the slowest one, but there is such a thing as a slowest variable that decays towards equilibrium, and this slowest scale is given, more or less, by the second eigenvalue of the transfer matrix. Of course, in complicated calculations we usually don't know this transfer matrix, but it is there, okay? Very important. (Question about continuous systems.) Yes, let's do it this way: let's discretize our system. In the continuum this is mathematically more complicated, I agree it's not obvious, but for our purposes we can always cut a finite system into little boxes, and then we are back to the discrete case. Now, the question you may be asking is how this gap evolves as the system gets larger. That is of course another ball game, which we'll discuss later, but the calculations that we concentrate on, the three-by-three pebble game, or the five-by-five pebble game, or a system of 750,000 hard spheres in a box (later today I'll show you finite systems, the real things that we work on on our computers before we do the finite-size scaling).
They always have a gap and they always converge exponentially. So there is an upper time scale, called the convergence time; we may have problems finding it, but it exists. That's what I wanted to show. Now, the second point, very important: the Markov chain, in order to converge, to do what we want, has to satisfy the global balance condition, or the detailed balance condition that we just looked at. That is the first condition, and there are only two other conditions the Markov chain has to satisfy; then you are guaranteed that at the end of the day, once it has converged, you will get the solution that you want. The second condition is called irreducibility of the Markov chain. Irreducibility can be explained like this: take our three-by-three pebble game and a second three-by-three pebble game, a red one and a blue one. If we haven't connected the two, then of course we see that if we start the pebble here (I have a nice animation, but I don't think I have to show it to you), the pebble will continue here and converge towards the equal distribution in this sector, and if we start it over there, it will always stay up there. This Markov chain is called reducible, because it can simply be cut into two pieces. The second condition ensures that the Markov chain converges to a limit, as time goes to infinity, that is independent of the initial state; that is what we need, and that is irreducibility. This is of course very easy to fix here: we simply add a little epsilon probability so that the blue pebble can hop over to the red sector. We are free to do this, because we are working on the algorithm, not changing the physics.
So we have to have an algorithm that allows us to go from the blue sector to the red sector: the Markov chain must be irreducible. If it is reducible, then the t-going-to-infinity behavior may depend on the initial configuration, and we may have multiple eigenvectors with eigenvalue one. This is of course a very simple example. So this is the second of the three conditions, and the final condition is aperiodicity. (Question.) What I've tried to write here is the transfer matrix of the disconnected problem: an algorithm where the probabilities are one quarter to move in each direction, but only inside the blue sector and only inside the red sector. The transfer matrix is then of size 18 by 18, and since all the off-diagonal blocks are zero, it is the direct sum of two independent transfer matrices, which of course is not what we can use. This matrix has one eigenvector with eigenvalue one of the form (0, ..., 0, 1/9, ..., 1/9) and another of the form (1/9, ..., 1/9, 0, ..., 0), and the outcome of the calculation depends on where you started. This is what you don't want, and such a chain is called reducible. The third and final point, aperiodicity, can be explained in a simple case. Imagine you want to be on each of two sites with probability one half, and you set up an algorithm where with probability one you move from here to there, and with probability one you move from there back to here.
So you hop, hop, hop, hop, hop, and this is called a periodic Markov chain. This periodic Markov chain is described by the transfer matrix: if you are on site 0 you go to site 1, if you are on site 1 you go to site 0, and it has two eigenvalues of modulus one, because minus one to the power of i still has modulus one. This is also something that you don't want: with a periodic Markov chain, you again run into problems. But now let me sum it up. Markov chains, which are more or less the same thing as Monte Carlo algorithms, are so important because you only have to satisfy three conditions, global balance, aperiodicity, and irreducibility, and then you know that you will converge towards the equilibrium solution pi. Okay, so very easy. So again: if we satisfy these three conditions, we have independence of initial conditions in the t-going-to-infinity limit. We have exponential convergence, which means we have a time scale tau, and if we simulate a few multiples of this time scale, we are effectively in equilibrium, with no corrections. Third point: the convergence time scale is related to the gap of the transfer matrix; it's not exactly the same thing, but it is related to this gap. The convergence is on the level of the probability distribution, and the little sentence that I wrote here reflects the discussion we had: it concerns the autocorrelation functions of what can be called slow variables. There are no variables slower than the slow variables, but of course we have to find them in order to analyze the equilibration times of our algorithms.
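The two-site periodic chain described above is easy to demonstrate in a minimal sketch:

```python
# Periodic two-site chain: from site 0 you must move to site 1 and
# vice versa. The transfer matrix [[0, 1], [1, 0]] has eigenvalues
# +1 and -1, both of modulus one, so the distribution never converges.
pi = [1.0, 0.0]                    # start with certainty on site 0
history = [tuple(pi)]
for t in range(5):
    pi = [pi[1], pi[0]]            # one deterministic hop: multiply by [[0,1],[1,0]]
    history.append(tuple(pi))
# history oscillates forever: (1,0), (0,1), (1,0), (0,1), ...
```

Adding any nonzero probability of staying put breaks the periodicity and restores exponential convergence to (1/2, 1/2).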
This is of course an enormous problem, because the autocorrelation functions, even when we can compute them, don't have to be straight exponentials, and there can be many, many complications. All right, so this is the basics of Markov chains. The only problem, let's put it this way, the only problem is: what is this time scale? It would be really fantastic if we had theorems, if we knew what this time scale is. In this example it was about 3.47 in the three-by-three pebble game. The only problem in our field of research is that we don't know this time scale; that is the only thing. We have to guess it, we have to be good physicists. Usually all we know is that in the long run the corrections will go like e^(-t/tau), and the only problem is: what is tau? Unfortunately, although mathematics has been very helpful in cleaning up all of this, finding tau is an enormous problem. I've worked on many problems in the past; one thing I will tell you about later, for example, is hard spheres, a very simple problem, and for 50 years everybody was completely mistaken about the value of tau: they thought it was orders of magnitude, hundreds of thousands of times, smaller than what it actually is. This impossibility of actually knowing the value of tau means that Monte Carlo is something between an art and a science: we work on it, we apply it, and it must really be both. Okay, so much for the general comments. Let's continue now with the Metropolis algorithm. The Metropolis algorithm supposes
that we satisfy the detailed balance condition, pi(a) p(a -> b) = pi(b) p(b -> a). In the pebble game, did you notice, we solved it just by looking at our system: this transition probability equals one quarter, that one equals one quarter, and so detailed balance holds. But in the more general case, if for example you want to visit the sites 0 through 8 with different probabilities, exactly these values pi(0), ..., pi(8), then, since 1953 and the original paper that invented the Markov-chain Monte Carlo method, one uses the rule p_accept(a -> b) = min(1, pi(b)/pi(a)). It can be proven very easily by simply filling out a little table: either pi(a) is larger than pi(b), or pi(b) is larger than pi(a), and in both cases you find that the flow pi(a) p(a -> b) equals the flow pi(b) p(b -> a). So this is really old stuff; it has been used by countless people. Let me see, who here has already programmed the Metropolis algorithm? Yes, many of you. It has been programmed since 1953, as I said, and it is of course not a unique rule, but it is one rule that can be used. To make the discussion complete, let us go into what is called the Metropolis-Hastings algorithm. The Metropolis-Hastings algorithm says that the probability to go from a to b is equal to an a priori probability to consider the move from a to b, multiplied by the probability to actually accept the move, to actually go there: p(a -> b) = A(a -> b) p_acc(a -> b). For example, when I was on site number four, I said I go with probability one quarter in each of the four directions; but more generally I have to do it in two steps: I must first consider the move from a to b, and then accept it.
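Before generalizing, here is a minimal sketch of the plain 1953 Metropolis rule on the three-by-three pebble game. The nonuniform weights pi below are made up purely for illustration; the rest follows the rule min(1, pi(b)/pi(a)) with the symmetric one-quarter proposals just described.

```python
import random

# Made-up, nonuniform weights on the 3x3 pebble game (sites 0..8, row-major)
pi = [3.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0]

def metropolis_step(site, rng=random):
    """Propose one of four directions with probability 1/4 each; proposals
    off the board are rejected, keeping the a priori probability symmetric.
    Accept the proposed site b with probability min(1, pi[b] / pi[a])."""
    x, y = site % 3, site // 3
    dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
    if not (0 <= x + dx < 3 and 0 <= y + dy < 3):
        return site                      # proposal off the board: reject
    b = (y + dy) * 3 + (x + dx)
    if rng.random() < min(1.0, pi[b] / pi[site]):
        return b                         # accept
    return site                          # reject, stay put

# In a long run, the visit frequencies approach pi / sum(pi).
random.seed(0)
counts = [0] * 9
site = 8
for _ in range(200_000):
    site = metropolis_step(site)
    counts[site] += 1
```

A long run like this is the standard empirical check that the chain samples pi; the convergence-time question discussed above is exactly how long "long" has to be.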
Let us do this on the blackboard. Detailed balance reads pi(a) p(a -> b) = pi(b) p(b -> a). Now we insert the two-step rule: pi(a) A(a -> b) p_acc(a -> b) = pi(b) A(b -> a) p_acc(b -> a). Rearranging gives p_acc(a -> b) / p_acc(b -> a) = [pi(b) A(b -> a)] / [pi(a) A(a -> b)]. So we are back to what we had already, and this can be satisfied with the Metropolis-Hastings acceptance probability p_acc(a -> b) = min(1, [pi(b) A(b -> a)] / [pi(a) A(a -> b)]). For example, what happens if the a priori probability from a to b equals the one from b to a? Before, all the a priori probabilities were either one quarter or zero (we never stepped off the table, and we did not go from site three to the outside), so they were all the same; and if they are all the same, then, check for yourself, we are back to the good old Metropolis algorithm, min(1, pi(b)/pi(a)). But the point of the a priori probabilities in the Metropolis-Hastings algorithm is that we can invent any algorithm that we want.
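The little blackboard rearrangement can be checked numerically. This is a sketch with made-up numbers for pi and for the a priori probabilities A; the point is only that the Metropolis-Hastings acceptance restores detailed balance even for deliberately asymmetric proposals.

```python
def hastings_acceptance(pi_a, pi_b, a_ab, a_ba):
    """Metropolis-Hastings: accept a proposed move a -> b with
    probability min(1, pi(b) A(b->a) / (pi(a) A(a->b)))."""
    return min(1.0, (pi_b * a_ba) / (pi_a * a_ab))

# Made-up two-configuration example with asymmetric proposals.
pi = {"a": 0.3, "b": 0.7}
A = {("a", "b"): 0.25, ("b", "a"): 0.10}

# flow(a->b) = pi(a) A(a->b) p_acc(a->b), and likewise from b to a.
flow_ab = pi["a"] * A[("a", "b")] * hastings_acceptance(
    pi["a"], pi["b"], A[("a", "b")], A[("b", "a")])
flow_ba = pi["b"] * A[("b", "a")] * hastings_acceptance(
    pi["b"], pi["a"], A[("b", "a")], A[("a", "b")])
```

The two flows come out equal, which is exactly the detailed balance condition filled out for this pair of configurations.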
For instance, maybe we want to propose a move in this direction with probability two sevenths, and other moves with other probabilities: you can write down any proposal matrix you want. Suppose your Markov chain, your physical system, has a hard time reaching the upper right corner of configuration space. You can push it in that direction by increasing the a priori probability to propose moves that way, and you always recover a correct algorithm by accepting those moves with the Metropolis-Hastings probability. We will do an example next time; I will show you how to use this for the Ising model, and for other cases where you actually bias the proposal probabilities of the algorithm. All right, so this algorithm has been in use for 60 years, but it is not the only one; there is of course the heat-bath algorithm, and many other variants. You have an enormous choice of algorithms: you only have to satisfy the detailed balance condition, or the global balance condition. For example, here is one variant that we will discuss a little later. Suppose that the probability pi(a) to be in a configuration a comes from a physical system with pair interactions; suppose you have a bunch of particles, or, two days from now, a bunch of spins on a lattice. For a long time, what one did in a simulation was to take one spin, or one particle, try to move it, and compute the change of energy for this move; the change of energy gives you the change in the probability weight, and
then you compare the new energy to the old energy, and you accept or reject with the corresponding weight. Let me write it here: min(1, pi(b)/pi(a)); if we use pi(b) = e^(-beta E_b), this becomes min(1, e^(-beta (E_b - E_a))). So you propose a change of a spin, or a displacement of your particle from here to there, you compute the new energy and the old energy, and then you accept or reject. This has been done countless times. But there are many other ways to do it. Suppose, for example, that the probability factorizes, as it usually does: if the energy is a sum over pair energies E_ij, then e^(-beta E) is a product over pairs i, j of e^(-beta E_ij). If you have this product, you can also look at each factor of the factorized probability separately and accept each one with its own probability. This gives new, very exciting and very useful algorithms. So there are enormous possibilities. Now let us look at the detailed and the global balance conditions. The detailed balance condition I already told you about; it leads us to this e^(-beta Delta E) rule. Now let us look at the global balance condition that I wrote down. It involves pi(a) times the probability to move from a to c, and this product is the flow from a to c: the probability to be at the site, or configuration, a, times the probability to move from a to c. Summing such terms over all targets gives the flow out of configuration a.
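The factorized idea just mentioned can be sketched as a "factorized Metropolis filter": instead of one acceptance decision on the total energy change, each pair factor decides independently with min(1, e^(-beta dE_ij)), and the move goes through only if every factor accepts. The pair energy changes below are made-up inputs; how you compute them depends on your model.

```python
import math
import random

def factorized_accept(pair_energy_changes, beta, rng=random):
    """Factorized Metropolis filter: each pair factor accepts
    independently with probability min(1, exp(-beta * dE_ij));
    the move goes through only if every single factor accepts."""
    for de in pair_energy_changes:
        if rng.random() >= min(1.0, math.exp(-beta * de)):
            return False      # one factor vetoes the whole move
    return True
```

Note that this rule is more rejecting than plain Metropolis on the summed energy, but it still satisfies detailed balance factor by factor; filters of this kind are the building block of event-chain style methods.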
The real condition for a Markov chain, for a Monte Carlo algorithm, is that the flow out of a must equal the flow into a; the flow into a may come from b, or from c, and so on. The detailed balance condition, by contrast, says that the flow from a to b must equal the flow from b to a. Let me give a little picture. You have configurations a, b, c, d, e, and so on, and in the detailed balance condition you always look at two configurations, for example two configurations connected by a small move of a spin or of a particle, and you arrange things so that the flow from a to c equals the flow from c to a, the flow from e to a equals the flow from a to e. But usually this is more than what you need. What you actually need is the global balance condition. Say each of these arrows stands for one unit of flow; then global balance simply means that the flow out, here for example one, two, three, four units, must equal the flow in, also one, two, three, four units: the flow into the configuration equals the flow out of the configuration. We will discuss algorithms later on that satisfy only the global balance condition. We will even discuss an algorithm that satisfies what is called maximal global balance: if there is a positive probability to go from a to c, then the probability to go back from c to a is zero. Such an algorithm installs a global flow; the arrows only go in one direction. Here also you satisfy global balance, because you have three units of flow that go out and three
units of flow that come in: the flow in equals the flow out, the global balance condition is satisfied, and you will still converge towards the equilibrium distribution. So it is a very common misconception to think that because we are doing equilibrium calculations we have to use the detailed balance condition; we only have to satisfy global balance. Yes, you will see, you also need boundary conditions and aperiodicity. For example, you can build algorithms, I will show one two days from now, where the spins only turn clockwise: every accepted move turns a spin clockwise, and the periodic boundary conditions take care of the fact that after turning by two pi you are back where you started. But the energy, the mean energy, all the observables are exactly the same as with any other algorithm; you just get there, usually, faster. So it is not the detailed balance condition that is fundamental. Many people are presently working on global-balance algorithms, and I think the detailed balance condition, even the Metropolis algorithm, will become less and less necessary. All of this was about setting up algorithms that reach, in the limit of time going to infinity, the stationary probability pi(a). So now let us set up a very simple global-balance algorithm for our three-by-three pebble game. Let me do it here on the blackboard: sites zero, one, two, three, four, five, six, seven, eight. We can say, for example, that the probability to go from site zero to site one is one, the probability from site one to site two is one, and so on. So now this is an algorithm.
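One way to realize such a deterministic global-balance chain for the uniform pi = 1/9 is a fixed cycle through all nine sites: every site then has exactly one unit of flow in and one unit of flow out. The particular "snake" ordering below is my choice for illustration, one of many. (By itself this deterministic chain is not aperiodic; the fix for that comes up shortly.)

```python
# A deterministic "snake" cycle through the 3x3 board (one ordering of many):
# every site has exactly one unit of flow in and one unit of flow out,
# so global balance holds for the uniform distribution pi = 1/9.
cycle = [0, 1, 2, 5, 4, 3, 6, 7, 8]
succ = {cycle[i]: cycle[(i + 1) % 9] for i in range(9)}

counts = [0] * 9
site = 8
for _ in range(9 * 1000):          # a whole number of full cycles
    site = succ[site]
    counts[site] += 1
```

After any whole number of cycles, every site has been visited exactly the same number of times, which is the uniform distribution realized with pure flow and zero diffusion.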
To prove that an algorithm is valid, you only have to check the balance condition configuration by configuration. For example: if you are on site five, you go with probability one to site four; if you are on site four, you go with probability one to site three. So you have the global balance condition on site four, on site seven, on site two, and so on; I have checked almost all the sites, so we are almost done. All right, let's move to the next point. What I can do now is set up another chain. This was the white one, and now I do something that is called lifting. I have two choices: I could use periodic boundary conditions, but that would be too easy. Let's do something fancier. We add a second sector: if we are in the white sector we move forward like this, but if we have a rejection, then we move from the white sector to the blue, or, if it looks better, the red sector, and in the red sector we go backward. So this is a super interesting Markov chain: if we are on six and we are white, we move to seven, then to eight; at eight, from white we go to red, but we stay on site eight; then we move back to seven, to six, and so on. This is a simple example of what is called a lifting algorithm; we will do some other examples. Do you see immediately what the interest is? Do you understand the algorithm? Is it a little strange? [Question from the audience.] Yes, the color changes only at eight, and then of course you continue. So what I am doing is that I carry one
color, which is called a lifting variable. I start white at site zero and move, move, move; at some time I am on eight, then I change, I am red, and I continue red, red, red. This sounds really strange, but now let me add something more. Let me say that the color change does not only happen at the boundary: on every site, let me add a small color-change probability, say one ninth. So now, if I am white, with probability almost one I go from five to four, but with the very small probability one ninth I simply change direction: instead of going this way, I go the other way. We can prove very easily that for all of these configurations, with the color index included, the flow in equals the flow out. For example, what is the flow out of configuration (eight, white)? It is equal to one, and on the next time step I am on configuration (eight, red); from (eight, red) I move with probability almost one to (seven, red), and so on. The flow in equals the flow out. In half an hour we will program an algorithm like this ourselves, and we will have completely forgotten everything about, what was it called again, Metropolis. What is the point here? Markov chains as we used to use them, with detailed balance, are diffusive algorithms, and the algorithm we started with, with its one-quarter probabilities, is the prime example of a diffusive algorithm: you go with the same probability in as out. This is what defines diffusion.
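Here is a sketch of this lifted chain, with the nine sites flattened onto a line 0..8 (the snake path straightened out). The color, stored as a direction +1 or -1, is the lifting variable; the small color-change probability 1/9 is the resampling just described, and its precise value here is simply taken from the blackboard example. The stationary distribution over sites is still the uniform 1/9.

```python
import random

def lifted_step(site, color, rng=random):
    """One step of the lifted walk on sites 0..8.  color = +1 ('white')
    or -1 ('red') is the lifting variable.  With small probability 1/9
    the color flips on the spot; otherwise the walker moves one step in
    its current direction, reversing the color when it would leave the
    board (the rejection that sends white to red at site eight)."""
    if rng.random() < 1.0 / 9.0:
        return site, -color          # rare color change, stay put
    if 0 <= site + color <= 8:
        return site + color, color   # ballistic move in the current direction
    return site, -color              # bounce at the boundary: reverse color

random.seed(1)
counts = [0] * 9
site, color = 8, 1
for _ in range(180_000):
    site, color = lifted_step(site, color)
    counts[site] += 1
```

Between color changes the walk is ballistic, not diffusive: it sweeps through all nine sites in about nine steps rather than nine squared.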
And what is the problem with diffusion? Diffusion is slow, because a random walk moves away a distance k only in about k squared time steps, so you converge very slowly, and we always thought that Markov chains were inherently diffusive. Now, what you see with maximal global balance, or with global balance conditions in general, is that you can introduce flow into your problem. Here we have a lot of flow that always goes to the right, just like on a highway. It is not very spectacular in this small two-dimensional example, but you visit all the sites in nine steps, and not in nine squared steps. [Question from the audience.] Yes, of course, we also have to satisfy aperiodicity, and aperiodicity is usually installed by adding a small probability to stay where you are; any mathematician who ever proves a theorem here will give you a fifty percent probability to stay put, and then aperiodicity is guaranteed. So instead of going with one quarter in each direction, you go with one eighth in each direction and with probability one half you stay where you are; in our lifted case we simply install the little probability to change color. Okay, so now I come to the last point, which is understanding exponential convergence. I find I am five minutes faster than I thought I would be, but anyway, let me go through this. We have understood that a Markov chain converges towards equilibrium exponentially fast. This seems to be good news and bad news at the same time.
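The "lazy" fix for aperiodicity just mentioned is, for the plain pebble game, a one-line change of the move probabilities: stay put with probability one half, otherwise propose each direction with probability one eighth. A minimal sketch:

```python
import random

def lazy_step(site, rng=random):
    """Aperiodic ('lazy') pebble move: with probability 1/2 stay put,
    otherwise propose one of the four directions with probability 1/8
    each, rejecting proposals that leave the 3x3 board."""
    if rng.random() < 0.5:
        return site                  # staying put guarantees aperiodicity
    x, y = site % 3, site // 3
    dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
    if 0 <= x + dx < 3 and 0 <= y + dy < 3:
        return (y + dy) * 3 + (x + dx)
    return site
```

The stationary distribution is unchanged by the laziness; only the possibility of a periodic cycle is destroyed.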
I want to show you in these last few minutes that it is not such bad news. We have exponential convergence: the corrections to the stationary probability pi(k) decay like e^(-t/tau), where tau is related to the second eigenvalue of the transfer matrix. This is good news, because it says that for t much larger than tau the corrections are essentially zero; e^(-10) or e^(-100) is tiny. But it seems like bad news, because what you want is to sample pi(k), and that is apparently only reached as t goes to infinity. The last point I want to make is that this bad-news part is a misconception, and this was found out in what I would call the most important article on Markov chains written since 1953: the paper by Propp and Wilson in 1996. They actually looked at this problem. You understand the situation: you are doing Monte Carlo calculations, and you share your office with somebody who does exact solutions of one-dimensional systems. He tells you: well, I cannot treat systems as complicated as the ones you simulate, but at least I have an exact solution, whereas you reach the solution only in the limit of t going to infinity, and that limit is longer than the time scale of your thesis. Now I tell you what you have to think to yourself (you need not say it to him): the exponential convergence you see here is not incompatible with perfect sampling. Perfect sampling means sampling your distribution as if it came from a direct-sampling algorithm. Remember direct sampling: in our three-by-three pebble game we simply picked
one of the nine sites with probability one ninth. In our Markov-chain algorithm, the transfer matrix seems to tell us that the exact probabilities, one ninth on each site, are reached only in the limit of infinite time, so it seems that what you are doing is less interesting than the exact solution. But this is not true, because it can be shown, I can even explain it to you, that each individual realization of the Markov chain reaches total independence from the initial condition at a finite time. Let me go back to the very beginning, to the algorithm pebble_basic.py, which starts from a given site and does t_max steps of the simulation; here t_max was four, but it could be a hundred or a thousand. The Markov chain goes on and on: from site eight to site seven, to site four, to site five, and so on. Initially, of course, it keeps a memory of the initial configuration, which was configuration number eight up here; after one step you can only be on one of a few sites, so there is a bias towards the initial state. But then there is a finite time after which the chain has completely forgotten where it came from, and after this finite time the simulation done with the Markov-chain algorithm is just as good as direct sampling; I would even say just as good as an exact calculation. When this time comes depends on the random numbers that you draw: sometimes it comes very quickly, and sometimes the moment where you completely lose the memory of the initial state, which was eight, comes later. This is what I really want to make sure you understand.
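For reference, here is a reconstruction along the lines of the pebble_basic.py just mentioned; this is my sketch, not the course's exact file. It starts at site 8 and records the trajectory for t_max steps.

```python
import random

def pebble_chain(t_max, start=8, rng=random):
    """Basic 3x3 pebble game: from the current site, propose one of the
    four directions with probability 1/4 each; proposals that leave the
    board are rejected (the pebble stays put).  Returns the trajectory."""
    site, trajectory = start, [start]
    for _ in range(t_max):
        x, y = site % 3, site // 3
        dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        if 0 <= x + dx < 3 and 0 <= y + dy < 3:
            site = (y + dy) * 3 + (x + dx)
        trajectory.append(site)
    return trajectory

random.seed(0)
traj = pebble_chain(4)    # t_max = 4, as in the lecture example
```

Early entries of the trajectory are visibly biased towards the starting site; the whole point of the discussion above is that this bias disappears completely, for each individual run, after a finite random time.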
Suppose that after 30 iterations you still have a correlation of 0.0001 with the initial state. This does not mean that every simulation you do carries a bias of 0.0001. It means that one out of ten thousand simulations, one out of ten thousand sequences of random numbers, still has a memory of the initial state, and the others have completely forgotten where they came from. The exponential decrease is an average: you average over the Markov chains that have completely forgotten where they came from and over those that still remember, and the number of those that still remember decreases and decreases, exponentially. This was a complete change of paradigm in our understanding of Markov chains, and it is due to Propp and Wilson. It is a revolution, still more in mathematics than in physics, but a revolution in our understanding of Markov chains. So let me say it again: the decay of the correlation function does not mean that every individual realization of a Markov chain remembers where it came from; there is always a time at which it forgets. And this is a very powerful idea, which today even lets us attack the problem of what tau is, the problem I mentioned before, because you can compute where this loss of memory happens. So now I am at the end of what I wanted to tell you. Maybe we take the break 10 minutes early and come back 10 minutes later, because I have a lot to say afterwards. This independence from initial conditions is a little difficult to explain, which is why I did not want to leave it to the very end of the session. The trick you have to use is to say that it is not the convergence that is
important, not the convergence that you have to compute; it is the loss of memory, the independence from the initial condition, that you have to focus on. And you have to set up your three-by-three simulation not by saying: I am somewhere, now let me move with probability one quarter to the right, one quarter to the left, one quarter down, one quarter up, and then move again, and again. You have to set up what is called a random map: you say, at this time step, whoever is on this site moves down, whoever is on that site moves down, whoever is here moves right, and so on; one move is drawn per site, not per chain. This lets you understand very quickly that at some point you simply cannot tell anymore where you came from. So this is what I wanted to say; let's take questions and then make a break. [Question from the audience.] No, no, completely wrong, don't listen! Did you hear this? This is heresy. Of course, you are a great specialist, but let me explain. What we were concentrating on here was the sampling problem. Let me state it: the probability distribution is pi(0), pi(1), ..., pi(8), and the sampling problem is to pick one random sample from this distribution, a random number between zero and eight, each with probability one ninth. This is the problem we concentrated on. Then there is the question of how to get many samples to do your statistical averages, and so on, but that is completely separate.
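The random-map idea can be sketched directly. This is a toy version of the mechanism behind Propp and Wilson's construction, not their full coupling-from-the-past algorithm: at each time step, draw one move per site and apply that same map to chains started from all nine sites; once they have all merged, no trace of the initial condition remains.

```python
import random

def random_map(rng):
    """Draw one move per site: for each site, one of the four directions,
    rejected (stay put) if it would leave the 3x3 board."""
    moves = {}
    for site in range(9):
        x, y = site % 3, site // 3
        dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        if 0 <= x + dx < 3 and 0 <= y + dy < 3:
            moves[site] = (y + dy) * 3 + (x + dx)
        else:
            moves[site] = site
    return moves

def steps_to_coalescence(rng):
    """Run chains from all nine starting sites under the SAME random maps
    and count the steps until they have merged into a single state."""
    states = set(range(9))               # one chain per starting site
    t = 0
    while len(states) > 1:
        m = random_map(rng)
        states = {m[s] for s in states}  # the same map applied to every chain
        t += 1
    return t

rng = random.Random(2)
t_coalesce = steps_to_coalescence(rng)
```

After coalescence, the surviving state is by construction independent of which of the nine initial conditions you started from; that finite, random coalescence time is the "loss of memory" discussed above.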
What I was trying to explain to you is that as soon as you have one random sample, you have solved the problem that is really difficult. Then you need five or ten independent samples, you do the average, and you get the average energy. It is like for an experimentalist, and maybe many of you are experimentalists: if you have made seven samples of a superconductor and each of them had an upper critical temperature of 340 Kelvin, then you write your paper and you send it off. What I mean is: once we have seen two or three samples of an enormously complex system, we already have our result. You don't need all the samples, the infinite number; the problem is to get one sample. But this one sample has to be obtained from the stationary distribution. The stationary distribution here is one ninth on each of the sites zero through eight; this is what we called pi at time infinity. The initial distribution was one on site number eight and zero on all the other sites; this was pi at time zero. And, although I reacted more strongly against what you said than I actually believe I should have, the point is this: as time goes on, on the time scale set by e^(-t/tau), you go from the initial distribution to the stationary distribution, and the sampling has to be done on the stationary one, not on the initial one; you actually have to forget all the memory of your initial state. [Question: what about the burn-in?] What do you mean, the burn-in? Believe me, like Anders, we really make our living with Monte Carlo calculations. I
really make my living with this, and I never even think of a burn-in time. Burn-in time is when you think you have seen enough; yes, of course, it refers to the convergence time, and sometimes that is called the burn-in time, but it is a completely dangerous concept. It is dangerous because it makes you believe: well, after 600,000 sweeps I must surely be in equilibrium, who wouldn't be? And this produces all the wrong papers. I tell you, we work on problems that had been unsolved for decades because people thought there was something like a burn-in time, and that if they waited, say, a month, it must surely have been longer than the correlation time; and this can be completely wrong. This is the reason why Monte Carlo calculations, as I said, are not purely a science but partly an art: we do not know this time. If we had a theorem, if some mathematician could tell us that the correlation time must be smaller than one year, we would be out of business immediately, because one person would sit down, run a four-year calculation, and it would be done. The reason there are so many wrong results in our community is that we have not understood how long these correlation times can be; they can be decades. Later today I will tell you about an algorithm that we invented: its convergence time is about one year, but it runs 100 times faster than what everybody else was using; we ran it for three years on 200 workstations and wrote a fantastic paper, while everybody who assumed a short burn-in would have gotten it wrong. So I am completely allergic to this term. And the second point: some people take some variable, the energy or something completely local, fit its exponential decay, and say: well, the energy decays on a time scale of one day, this is probably the burn-in time. But I am telling you, and I am just now writing a paper on this very subject, our colleagues just didn't get it: there are some variables that are really slow, orders of magnitude slower than you think, and if you miss them, you write a wrong paper, you have a wrong result. So: I did not talk about burn-in times, and don't take this lightly, because it happens that you do a Monte Carlo calculation and somebody else does the experiment, or finds the exact solution; you really are supervised. [Question from the audience.] Yes, of course there are many things you cannot do, and then you have to be smart about it. But, and now I am not making a general comment, you can understand the concepts on simple systems, and these concepts have been taken extremely far: for the Ising model there are now algorithms that prove that you have converged. For the Ising model at any temperature, you simply have a proof of convergence, and then you get one sample; then you turn off your computer, turn it on again, get another sample; you get ten of them, and you have ten independent samples of the equilibrium Boltzmann distribution. So anyway, thank you for the question; you are making a massive attack here, but this is a real subject. I mean, think of the number of calculations out there, and you are just
starting, you are just starting; just be aware that programming something and running it five times longer than anybody else does may simply not be enough. And again, I will not have time to really discuss it, but Monte Carlo methods are the most powerful methods in science, except perhaps for experiment. For any model, you have to satisfy three conditions: global balance (yes, global balance, even though he is laughing), aperiodicity, and irreducibility; then you know that you converge, and you have a general algorithm. So they are the most powerful methods, and the only problem, let me say it once more, is: what is tau? Everything else has been taken care of. Of course you have to think very, very hard; if you are a mathematician you stop here, because you do not have a theorem for this time. Let's not call it burn-in time; it is called either the mixing time or the correlation time. You have to think very hard about what you believe is the slowest variable in your system; in cases that we have treated, we thought very hard about some particular variable. Then you have another problem: you have to fit the autocorrelation function. You have an autocorrelation function, you do not directly get tau; you try to fit it here, then you run a little longer and fit it there. And remember, it is not the raw correlation function but the correlation function minus the mean value, the connected one, because the variable itself, the energy say, converges to some mean value. So if you are really in a tough situation, try to find a slow variable.
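The point about subtracting the mean can be made concrete. This sketch estimates the connected autocorrelation function on a synthetic series whose time scale is known by construction (an AR(1) process with tau = 10, my choice for the test); for a real simulation you would feed in your measured observable instead, and the fit would be the hard part.

```python
import math
import random

def autocorrelation(series, t):
    """Connected autocorrelation at lag t: the correlation function
    minus the mean value, normalized so that lag 0 gives 1."""
    n = len(series)
    mean = sum(series) / n
    c0 = sum((x - mean) ** 2 for x in series) / n
    ct = sum((series[i] - mean) * (series[i + t] - mean)
             for i in range(n - t)) / (n - t)
    return ct / c0

# Synthetic check: an AR(1) series with a built-in time scale tau = 10,
# so that the true autocorrelation at lag t is exp(-t / 10).
random.seed(3)
a = math.exp(-1.0 / 10.0)
x, series = 0.0, []
for _ in range(200_000):
    x = a * x + random.gauss(0.0, 1.0)
    series.append(x)

tau_fit = -5.0 / math.log(autocorrelation(series, 5))
```

Forgetting the mean subtraction here would mix the (nonzero) average of the observable into the estimate and can make a fast-looking decay hide a slow variable, which is exactly the trap described above.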
winning situation if you have found a slow variable and you know its mean value. For example, in a problem we can discuss over dinner, we solved a very old problem where we knew that the mean value had to be zero, so we could actually check, and we saw that our correlation time was much, much longer than what everybody else had assumed; we were just completely apart. But this is what you have to do.

You also have to compare to exact solutions: you have to find some other method that gives the exact answer. For example, if you do a quantum Monte Carlo calculation, then up to four by four by four you can do an exact enumeration, and then you compare. And at some point, unfortunately, in our field, you have to take all your courage together, and you write the paper, and you try not to be wrong. That is the real problem: you know you have exponential convergence, you know the corrections, you know everything, but what is tau? This is why there is so much push for more advanced methods that actually get at tau. It has been like this all over the field. If you're interested in the history of physics, you can see all the confusion about the Ising model; you see papers by Michael Fisher that were completely wrong, until the good algorithms came out. Or the big discussion about the XY model: is it Kosterlitz-Thouless in two dimensions, or is it a continuous transition? Or the melting problem. You have hundreds of such problems.

So let me give a better answer: try to find a much faster algorithm than anybody else is using. If you have an algorithm, even for a four-by-four system, that is a hundred times, a thousand times faster than anybody else's,
then you can go and run it for a year, and then you are in much better shape. This is what happened in the XY model, this is what happened with hard disks, this is what happened in the Hubbard model; it happens everywhere. Everybody tries to argue about some burn-in or some correlation time, and this is the big problem in Monte Carlo calculations, the real big problem. Otherwise it would be too easy: you would write down any model and have the solution, and all the rest of physics would just disappear. But this is the real problem.

So I was trying to give you a little bit of an indication of where these problems are, and thanks for the discussion, because you brought up even more of them; I see that many of you are practitioners and actually have these problems. So the best bet is to do it as Ulli Wolff did for the XY model: find an algorithm that is a thousand times faster than what anybody else is using, and then you clean up the mess. It's just incredible. Okay, all right, thanks for your attention.
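The autocorrelation-function fit described above, with the mean subtracted before correlating, can be sketched in a few lines of Python. This is a minimal illustration, not one of the course programs; the AR(1) toy series and the naive "stop at the first sign change" cutoff are my own choices for the example:

```python
import numpy as np

def autocorr(samples, max_lag):
    """Normalized autocorrelation C(t)/C(0) of the MEAN-SUBTRACTED series."""
    x = np.asarray(samples, dtype=float)
    x = x - x.mean()                      # subtract the mean first, as discussed
    n = len(x)
    c = np.array([np.dot(x[:n - t], x[t:]) / (n - t) for t in range(max_lag)])
    return c / c[0]

def integrated_tau(samples, max_lag=200):
    """Integrated autocorrelation time: tau = 1/2 + sum_{t>=1} C(t)/C(0)."""
    c = autocorr(samples, max_lag)
    neg = np.where(c < 0.0)[0]            # naive cutoff at the first sign change
    cut = neg[0] if len(neg) else max_lag
    return 0.5 + c[1:cut].sum()

# Correlated toy data: an AR(1) process x_t = phi * x_{t-1} + noise, whose
# exact integrated tau is (1 + phi) / (2 * (1 - phi)) = 9.5 for phi = 0.9.
rng = np.random.default_rng(0)
phi, n = 0.9, 100_000
noise = rng.standard_normal(n)
x = np.empty(n)
x[0] = 0.0
for t in range(1, n):
    x[t] = phi * x[t - 1] + noise[t]

print(integrated_tau(x))                  # should come out close to 9.5
```

The cutoff is exactly the delicate point from the lecture: where you truncate the fit or the sum changes your tau, which is why a slow variable with a known mean value is such a luxury.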
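The exact-enumeration check mentioned above works like this: on a system small enough to enumerate every configuration, compute the observable exactly and compare it with the Monte Carlo estimate. A minimal sketch, using a classical 2x2 Ising model with free boundaries as the toy system (my own choice, much smaller than the four-by-four-by-four quantum case from the lecture):

```python
import itertools
import math
import random

L = 2                                     # 2 x 2 lattice, free boundaries: 4 bonds
BONDS = [((0, 0), (0, 1)), ((1, 0), (1, 1)),
         ((0, 0), (1, 0)), ((0, 1), (1, 1))]

def energy(spins):
    """Ising energy E = -sum over bonds of s_i * s_j."""
    return -sum(spins[a] * spins[b] for a, b in BONDS)

def exact_mean_energy(beta):
    """Exact <E> by enumerating all 2^(L*L) = 16 configurations."""
    Z, E_sum = 0.0, 0.0
    for s in itertools.product([-1, 1], repeat=L * L):
        spins = {(i, j): s[i * L + j] for i in range(L) for j in range(L)}
        w = math.exp(-beta * energy(spins))
        Z += w
        E_sum += w * energy(spins)
    return E_sum / Z

def metropolis_mean_energy(beta, steps=200_000, seed=1):
    """<E> from single-spin-flip Metropolis, to be compared with the exact value."""
    rng = random.Random(seed)
    spins = {(i, j): 1 for i in range(L) for j in range(L)}
    total = 0.0
    for _ in range(steps):
        site = (rng.randrange(L), rng.randrange(L))
        dE = energy({**spins, site: -spins[site]}) - energy(spins)
        if dE <= 0 or rng.random() < math.exp(-beta * dE):
            spins[site] *= -1             # accept the flip
        total += energy(spins)
    return total / steps

beta = 0.5
print(exact_mean_energy(beta), metropolis_mean_energy(beta))
```

On such a tiny system the two numbers agree to well within the statistical error, and any disagreement beyond that points to a bug in the sampling, which is the whole point of the check.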
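To give a flavor of what such a much faster algorithm looks like: here is a minimal sketch of a Wolff-type cluster move for the two-dimensional Ising model. Wolff's construction covers O(n) spin models generally; the Ising case is just the simplest instance, and the lattice size, temperature, and flat-array layout here are my own illustrative choices:

```python
import math
import random

def wolff_step(spins, L, beta, rng):
    """One Wolff cluster flip on an L x L periodic Ising lattice (flat list)."""
    p_add = 1.0 - math.exp(-2.0 * beta)   # probability to add an aligned neighbor
    seed = rng.randrange(L * L)
    cluster, stack = {seed}, [seed]
    while stack:                           # grow the cluster from the seed spin
        site = stack.pop()
        i, j = divmod(site, L)
        for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            nb = (ni % L) * L + (nj % L)   # periodic boundary conditions
            if (nb not in cluster and spins[nb] == spins[site]
                    and rng.random() < p_add):
                cluster.add(nb)
                stack.append(nb)
    for site in cluster:                   # flip the whole cluster at once
        spins[site] *= -1
    return len(cluster)

L, beta = 8, 0.6                           # below T_c (beta_c ~ 0.4407): ordered phase
rng = random.Random(2)
spins = [1] * (L * L)
sizes = [wolff_step(spins, L, beta, rng) for _ in range(1000)]
print(sum(sizes) / len(sizes))             # mean cluster size
```

Because whole correlated clusters flip in one move, the correlation time near the transition collapses compared with single-spin flips, and that is exactly the "thousand times faster" lever from the lecture.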