So today I'm going to continue the discussion of cutoff, and then switch topics to rate of escape and relate that to mixing as well. Here is one very simple example of cutoff, but it already illustrates an interesting phenomenon. Suppose we have a line segment and we move to the right with probability p and to the left with probability 1 - p. This chain could be periodic, so we always look at the lazy version: with probability one half we stay in place, with probability p/2 we move to the right, and with probability (1 - p)/2 we move to the left. We take p bigger than one half, so there is a bias to the right. When we work with the lazy version there is always a factor of 2 that is sometimes implicit: the hitting time from the left endpoint to the right endpoint is n/β for the non-lazy walk, where β = 2p - 1 is the drift, and for the lazy version you have to double that, so the actual time the lazy chain takes to mix is 2n/β. In this case almost all the stationary measure is concentrated on the right: the stationary measure grows by a factor p/(1 - p) with each step from left to right, so if I take the k rightmost vertices, the remaining stationary measure is exponentially small in k. From that one deduces that the mixing time is equivalent to the hitting time from the left to the right, and that is classical to estimate using, say, the central limit theorem; you have to take into account the reflections at the two endpoints, but you can still deduce that the mixing time is n/β, again times 2 because of the laziness.
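As a quick sanity check (my own sketch, not from the lecture), here is a small simulation of this lazy biased walk; the values of n and p are arbitrary choices. The average hitting time of the right endpoint, started from the left endpoint, should be close to 2n/β:

```python
import random

def lazy_hit_time(n, p, rng):
    """Steps for the lazy biased walk to go from site 0 to site n-1.
    Each step: stay with prob. 1/2; otherwise step right with prob. p,
    left with prob. 1-p (reflecting at the left endpoint)."""
    x, t = 0, 0
    while x < n - 1:
        t += 1
        if rng.random() < 0.5:
            continue                      # lazy step
        if rng.random() < p:
            x += 1
        elif x > 0:
            x -= 1                        # left move, reflected at 0
    return t

rng = random.Random(0)
n, p = 200, 0.75
beta = 2 * p - 1                          # drift of the non-lazy walk
avg = sum(lazy_hit_time(n, p, rng) for _ in range(300)) / 300
print(avg, 2 * n / beta)                  # empirical mean vs. 2n / beta
```

With these parameters the empirical mean lands within a few percent of 2n/β = 800, in line with the doubling caused by laziness.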
So here is an occasion for one nice exercise: show that the mixing time in L², call it t_mix^{L²}(ε), is asymptotic to c·n for a constant c which is strictly bigger than 2/β. In other words, while the mixing time here is determined by the drift, so it's 2n/β, mixing in L² takes strictly longer than mixing in total variation. Note that there is cutoff here both in L¹ and in L²: in L¹, t_mix(ε) for total variation is asymptotic to 2n/β as n tends to infinity for any ε, so the first-order asymptotics don't depend on ε; and the L² mixing time also exhibits cutoff, it's asymptotic to a constant times n, but that constant is strictly bigger. This very simple example already illustrates a feature of L², which is most often the most convenient distance to analyze: this example is very simple in L¹, and in richer settings L² is very easy to work with, but you have to be aware that in bounded-degree settings like this one, L² mixing tends to happen later than L¹ mixing; you saw striking examples of this in Eyal's last talk, I believe. Okay, with that comment: I was going to skip the discussion of trees, since I left it as an exercise, but let me state it again. Take a binary tree with n nodes, so n is about 2^k where k is the depth, in fact twice that minus one, and the question is to show there is no cutoff. One way to show there is no cutoff is to show that t_mix(ε) behaves like some constant depending on ε times n; but let's not worry about the ε-dependence: show that t_mix(1/4) is of order n, and show that the relaxation time is also of order n. This is again for lazy simple random walk on a finite binary tree of depth k.
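The L²-versus-total-variation gap in the first exercise can be illustrated numerically (my own sketch, not from the lecture): build the lazy biased walk on a segment, evolve the exact distribution started from the left endpoint, and record when the total variation and L² distances first drop below 1/4. The sizes n, p and the threshold 1/4 are arbitrary choices.

```python
import numpy as np

n, p = 60, 0.75
P = np.zeros((n, n))
for x in range(n):
    P[x, x] += 0.5                        # laziness
    if x < n - 1: P[x, x + 1] += p / 2
    else:         P[x, x] += p / 2        # reflect on the right
    if x > 0:     P[x, x - 1] += (1 - p) / 2
    else:         P[x, x] += (1 - p) / 2  # reflect on the left

r = p / (1 - p)
pi = r ** np.arange(n); pi /= pi.sum()    # stationary: pi(x) ~ (p/(1-p))^x

mu = np.zeros(n); mu[0] = 1.0             # start at the left end
t_tv = t_l2 = None
for t in range(1, 2000):
    mu = mu @ P
    tv = 0.5 * np.abs(mu - pi).sum()
    l2 = np.sqrt(((mu / pi - 1) ** 2 * pi).sum())
    if t_tv is None and tv < 0.25: t_tv = t
    if t_l2 is None and l2 < 0.25: t_l2 = t
print(t_tv, t_l2)                         # L2 mixing happens strictly later
```

The total variation threshold is crossed near 2n/β, while the L² threshold comes noticeably later, as the exercise predicts.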
All right: so the relaxation time is comparable to the number of vertices, the mixing time is also comparable to the number of vertices, and by the necessary condition for cutoff (mixing time must be much larger than relaxation time) you deduce that this chain has no cutoff, not even pre-cutoff. Okay. People often confuse trees with expanders, because the infinite tree is in a sense an expander, but finite trees are very different from expanders: while many expanders have cutoff, regular finite trees do not. So one problem that is still open is to characterize cutoff in a way that lets us determine it without very precise calculation. We see in this example that just by estimating the mixing time and the relaxation time up to constants we can determine that there is no cutoff; so to rule out cutoff we can do rather crude estimates where we lose constants. But to prove cutoff we don't yet have a general method that allows crude estimates, saying the mixing time is of this order and the relaxation time is of that order, and deducing from that alone that there necessarily is cutoff. For positive results we need something more, and that's a problem that consumed us for some years. One result in this direction, which I will say more about in the last lecture, is joint with Riddhipratim Basu and Jonathan Hermon, and it relates cutoff in the sense of mixing to cutoff in the sense of hitting large sets. Analogously to the mixing time, which measures how close we get in total variation to the stationary distribution, we can look at hitting times of large sets. Formally, hit_{1/2}(ε) is the first time t such that, for any starting point and any target set A of stationary measure at least one half, the probability that we still haven't hit A (the tail of the hitting time) is less than ε. You see it's very close to the notion of mixing time; the difference is that instead of asking about distributions, we're asking about tails of hitting times.
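Going back to the tree exercise for a moment, the claim that the relaxation time grows linearly in the number of vertices can be checked numerically (my own sketch, not from the lecture) by computing the spectral gap of lazy simple random walk on complete binary trees of increasing depth:

```python
import numpy as np

def lazy_srw_trel(k):
    """Relaxation time of lazy simple random walk on the complete
    binary tree of depth k, which has n = 2**(k+1) - 1 vertices."""
    n = 2 ** (k + 1) - 1
    A = np.zeros((n, n))
    for v in range(1, n):                 # vertex v has parent (v - 1) // 2
        A[v, (v - 1) // 2] = A[(v - 1) // 2, v] = 1.0
    deg = A.sum(axis=1)
    P = 0.5 * np.eye(n) + 0.5 * A / deg[:, None]   # lazy SRW kernel
    # conjugate by sqrt(pi) (pi proportional to degree) to get a
    # symmetric matrix with the same spectrum
    D = np.sqrt(deg)
    S = P * D[:, None] / D[None, :]
    lam = np.sort(np.linalg.eigvalsh(S))
    return n, 1.0 / (1.0 - lam[-2])       # t_rel = 1 / (1 - lambda_2)

res = {k: lazy_srw_trel(k) for k in (4, 5, 6, 7)}
for k, (n, trel) in res.items():
    print(k, n, round(trel / n, 3))       # the ratio t_rel / n stays roughly flat
```

The ratio t_rel/n settles near a constant as the depth grows, consistent with t_rel being of order n.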
As noted here, the point of using this definition is that it allowed us to prove that in the class of trees, the conjectured condition, that t_mix over t_rel goes to infinity (the always-necessary condition), is actually sufficient for cutoff. An earlier result in this direction is joint with Jian Ding and Eyal Lubetzky: we showed that for birth-and-death chains the condition is sufficient, so the ratio goes to infinity if and only if there is cutoff. The same thing was proved earlier for separation distance by Diaconis and Saloff-Coste, so the historical story goes the other way from the order I wrote it: first, Diaconis and Saloff-Coste considered not the total variation distance but the separation distance that I described last time, and showed that in that distance the conjecture holds, i.e. the ratio going to infinity is equivalent to cutoff; then, in the paper with Jian Ding and Eyal, we showed the analogous statement for total variation. Both of these results are for birth-and-death chains: those are just Markov chains that move on a path, at any point they can go left, go right, or stay in place, but it's not simple random walk, the transition probabilities to the left, to the right, and for staying in place are arbitrary. In all these cases, to avoid periodicity, let's assume laziness. So those were the first two results, and in particular they show that for birth-and-death chains, cutoff in separation is equivalent to cutoff in total variation. Then, in the later work with Basu and Hermon, we showed that the same thing holds for trees. By trees I now mean not just simple random walk on trees but any Markov chain where the underlying graph of allowed transitions is a tree, and it doesn't have to be a regular tree: you have an arbitrary finite tree, and the only assumption is that the transitions are along the edges of the tree. Any chain like that is reversible.
So again, in this class there are no counterexamples: mixing time over relaxation time going to infinity implies cutoff. One can somewhat generalize this: both for birth-and-death chains and for trees, if instead of moving to neighbors you allow some bounded jumps, then you can still get the same result, under some regularity assumption; I won't discuss that generalization in detail, but it can be pushed from nearest-neighbor to bounded jumps. Okay, so the technique of the proof relies on this notion of cutoff for hitting times: there is hitting cutoff for a family of lazy reversible chains if and only if the difference hit_{1/2}(ε) - hit_{1/2}(1 - ε) is little-o of, say, hit_{1/2}(1/4). This is an analog of the definition of cutoff, but now for hitting times, and the basic result is that it's enough to verify this modified type of cutoff; in fact, it's equivalent to cutoff for mixing, which is what we usually care about. The reason this is helpful (it looks like we just replaced one definition by another) is that in some cases we can more easily identify the sets that are hardest to hit, as opposed to the sets that take the longest to mix, and trees are one such case. I won't give you the details of the proof, but I just want to illustrate what the sets that are hard to hit in a tree look like. Given an arbitrary tree, there is always a notion of a central vertex. What does it mean for a vertex v to be central? It means that if I take the tree T and remove the vertex v, together with the edges adjacent to it, then what remains is of course a union of trees, the connected components, whose number corresponds to the degree of v, and all of these components should have stationary measure less than half the total stationary measure, which is one.
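Here is a small sketch of this definition (my own, not from the lecture): given a finite tree and a probability measure π on its vertices, find all central vertices by checking the π-mass of every component left after removing a vertex.

```python
from collections import defaultdict

def central_vertices(edges, pi):
    """All central vertices of a finite tree: v is central if every
    component of T - v has pi-measure at most 1/2.
    edges: list of (u, v) pairs; pi: dict vertex -> measure (sums to 1)."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v); adj[v].append(u)

    def comp_mass(root, banned):
        # pi-mass of the component containing `root` in T - banned (DFS)
        stack, seen, mass = [root], {root, banned}, 0.0
        while stack:
            w = stack.pop(); mass += pi[w]
            for x in adj[w]:
                if x not in seen:
                    seen.add(x); stack.append(x)
        return mass

    return [v for v in pi
            if all(comp_mass(u, v) <= 0.5 for u in adj[v])]

# path on 4 vertices, uniform measure: two central vertices
edges = [(0, 1), (1, 2), (2, 3)]
pi = {v: 0.25 for v in range(4)}
print(central_vertices(edges, pi))        # -> [1, 2]
```

The path example shows the "at most two" phenomenon mentioned next: the two middle vertices of an even path are both central.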
In other words, we want that when we remove v, no piece of the tree is larger than half of it. So here is one kind of easy, fun exercise, if you haven't seen it: show that for any finite tree there exists a central vertex. Here π for simple random walk is of course just proportional to the degrees, but in general we're talking about a tree with arbitrary transition probabilities, with π the stationary distribution of that chain; in fact, in that generality, it's true for any probability measure on the tree that there exists a central vertex. Can there be more than one? Can there be more than two? No: at most two. If you have a tree like this one on the board, then both of these adjacent vertices are central. So a central vertex always exists, and there are at most two; that's the exercise. Now, morally, what drives these proofs is the fact that the sets that are hardest to hit are the sets that hide behind the central vertex. For every central vertex v and every starting point x, I can look at the set of points that hide behind v; it could consist of several components, so for instance if this is x, then these two components are hidden from x, separated from x by the central vertex. This set, which includes the central vertex v together with the components not containing x, forms a set of measure at least one half, and sets of this type are the hardest to hit in a tree. That requires proof, but because we can identify the sets that are hardest to hit, we can analyze trees better than other graphs. Questions? Okay, so I want to say something about product chains, which are an important class. If I have n different chains (often we take them to be copies of the same chain, like we do in the hypercube), then we can define several natural chains on the product space.
One such chain is: choose a coordinate at random, and then move in that coordinate. Formally, if the original transition kernels are P_i for the i-th chain, then each lifts to the product space: P̃_i is the transition kernel that moves the i-th coordinate according to P_i and keeps the other coordinates in place, and the chain is just the average of the P̃_i. What this means is: we have n coordinates, each one a chain in its own right; we choose one of them uniformly (of course you could use other weights) and move in that chain. This is a type of chain that's often used, and here is one result about it. There are many versions of this result; I think the earliest is due to Diaconis and Saloff-Coste '96. The version I state here is actually written for the continuous-time version of the chain, as I'll explain, and there are very nice versions in the work of Lubetzky and Allan Sly, and also in the work of Lacoin that I'll come to. The version I'm stating is a bit different because it's the one that's easiest to prove: instead of the discrete-time chain, it involves the closely related continuous-time chain where, rather than choosing a random coordinate at each step, we wait an exponential time with mean one and then choose one of the coordinates. That's equivalent to giving an exponential clock of mean n to each coordinate and having them all move independently. Continuous time is in some respects a little easier than discrete time because it creates more independence, but the type of result is the same; I'm stating the continuous one because it's the easiest to prove. The basic statement is a generalization of the hypercube result: remember, on the hypercube we had a mixing time of (1/2) n log n, with cutoff.
In general, if you take a fixed chain and take the n-th power of that chain, the product always has cutoff, and we can pinpoint where: the cutoff time is (1/(2γ)) n log n, so instead of n log n over 2 there is an extra factor 1/γ, where γ is the spectral gap of the underlying chain in one coordinate. For the hypercube the underlying chain was very simple: a two-state chain where in one step you reach the uniform distribution, because it's lazy. That chain has eigenvalues just one and zero, so spectral gap one, and in that case you get (1/2) n log n. In the generalization you still get the (1/2) n log n, but with this factor 1/γ, which comes from the relaxation time of the individual coordinates. So you see that when I take a large product, the mixing time of the product is not determined by the mixing time of the individual coordinates, but rather by their relaxation time. This comes back to the distinction between classical Markov chain analysis and the more modern one. In the classical analysis one took a fixed chain, maybe with a hundred states, maybe with a million, and drove time to infinity, and then the asymptotics really depended on the highest nontrivial eigenvalue. In the more modern theory (modern here goes back to 1980) we think of a growing sequence of chains, and we're just interested in driving the total variation distance down to some fixed ε, not really close to zero, and then the mixing time is not determined just by the highest eigenvalue. However, when we take a large product, in order to make the total variation distance small on the product you really have to drive it very small on each coordinate; hence the modern theory on the large product reduces to the classical theory on each coordinate. That's why the determining factor is the relaxation time in each coordinate rather than the mixing time in each coordinate.
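The role of the coordinate spectral gap can be seen directly in the discrete-time version: the lifted kernels P̃_i commute, so the eigenvalues of the averaged chain are averages of coordinate eigenvalues, and the gap of the product is exactly γ/n. A small numerical check, using a toy 3-state symmetric chain of my own choosing (not from the lecture):

```python
import numpy as np

# a small lazy reversible coordinate chain (symmetric, so pi is uniform)
P = np.array([[0.50, 0.30, 0.20],
              [0.30, 0.50, 0.20],
              [0.20, 0.20, 0.60]])

n = 3                                     # number of coordinates
I = np.eye(3)

def lift(i):
    # P acting on coordinate i of the product space, identity elsewhere
    out = np.array([[1.0]])
    for j in range(n):
        out = np.kron(out, P if j == i else I)
    return out

Ptilde = sum(lift(i) for i in range(n)) / n   # pick a coordinate, move it

gap1 = 1 - np.sort(np.linalg.eigvalsh(P))[-2]       # coordinate gap gamma
gapn = 1 - np.sort(np.linalg.eigvalsh(Ptilde))[-2]  # gap of the product chain
print(gap1 / n, gapn)                     # the product gap equals gamma / n
```

This is the mechanism behind the (1/(2γ)) n log n cutoff time: the slow mode of the product is the slow mode of a single coordinate, diluted by the 1/n chance of updating it.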
Okay, and as for this theorem: again, in the Markov chain book you can find the proof, so I'm not going to give it here in detail, but I wanted to explain where it comes from. [Question: here the n chains run independently?] Yes, run independently, with these exponential clocks, exactly. Now, the way I stated it, I wanted to keep the correspondence with the discrete-time chain, so each coordinate has a clock of mean n. Often, when we go to continuous time, we also speed up, making each coordinate move at rate one, and then you lose a factor of n everywhere in the mixing times; just be aware that when people talk about continuous time they often do this rescaling, which I haven't done here. Now, a very nice and very general theorem of Chen and Saloff-Coste says that a discrete-time lazy chain and the corresponding continuous-time chain have cutoff together, so you can pass from one to the other and prove whichever version is more convenient in an application. One direction, or I should say one bound in this direction, is very easy: given an upper bound for the lazy chain, it's easy to obtain a continuous-time upper bound, because the continuous-time chain can be seen as an average of the lazy chain. The other direction is nontrivial; it's a kind of Tauberian theorem. An Abelian theorem is one that involves averaging, and is usually easier: the passage from the lazy or discrete-time chain to the continuous-time chain is an Abelian statement, very easy, and everybody knew it before that paper. The nontrivial part is to go the other way, from an upper bound on the mixing of the continuous-time chain to the lazy chain, which is an analog of a Tauberian theorem, a de-Poissonization, I should say. For reversible chains we can go further: in work with Jonathan Hermon, we proved a conjecture of Aldous.
The conjecture says that for discrete-time reversible chains, instead of the usual laziness device to get rid of periodicity, you can just do one lazy step at the beginning: rather than looking at the mixing time of the lazy chain, or of the continuous-time chain, you take one lazy step at the start and then all the remaining steps are steps of the original discrete-time chain. That one lazy step is enough to kill all possible periodicity, and you get cutoff for this revised, one-step-lazy chain if and only if you have cutoff for the continuous-time and lazy chains. This is only true for reversible chains, because for reversible chains the only real periodicity you can get has period two; in non-reversible chains, of course, you can get other periods, and one lazy step is not going to fix that. Okay, one other thing I'll comment on, regarding the proofs for the continuous-time product chains: there is another metric I haven't discussed which is extremely useful, the Hellinger distance. It's written here, and in our Markov chain book and many other sources you can find more discussion of it. The key property of the Hellinger distance is that it behaves very well on products, much better than total variation, and it can be bounded above and below in terms of the total variation distance. So I'm going to skip the details of this proof and just tell you one more thing about product chains. Until now I discussed the easy case where I take a fixed chain and raise it to the n-th power; that always has cutoff. Another natural thing to do is to take a sequence of chains that is changing, and for the n-th one take its n-th power. For a while I believed that all such examples should have cutoff: you take the n-th power of a chain, but the chain is also allowed to vary with n. For instance, here is one problem I mentioned before; let me officially state it as an exercise, in slightly different notation.
Look at Z_{k(n)}, the cycle on k = k(n) nodes, where k(n) is at least two but otherwise an arbitrary number. Show that lazy simple random walk on (Z_{k(n)})^n has cutoff. I make no assumptions on the sequence k(n) except that it's always at least two, so it could fluctuate arbitrarily, and yet there is always cutoff. The location of the cutoff, the formula for the mixing time of the n-th chain, depends very much on k(n), but there is always cutoff, meaning t_mix^{(n)}(1 - ε) / t_mix^{(n)}(ε) goes to one. So this is an example where the chains change with n, we take the n-th power, and there is always cutoff; but it's not the most general example, and what Lacoin showed is that, in general, the best you can say is that any product of n chains has pre-cutoff with factor two. This is true for any choice of ε (though it's maybe best to think of ε close to one). He first proved it for separation, and then it holds for total variation as well, via the Hellinger distance. The example showing that the factor two is sharp, that it cannot be improved, is actually a variant of the Aldous counterexample; I'm not going to give the details of that. One thing that I still believe is that if the underlying chains you raise to the n-th power have sufficient entropy, then there should be cutoff; there are some sufficient conditions, still not articulated, that would give cutoff more generally than in this exercise. So there should be a theorem, but there isn't yet a sufficiently general one. This I wouldn't call an exercise, more a problem: characterize when, if you take some graph G_n and raise it to the n-th power in the same sense, the result has cutoff.
It always has pre-cutoff, no matter how the G_n change, as long as you're taking lazy chains or continuous-time chains; but when does it have cutoff? In many examples, but again not in all. Okay, so as for the Lacoin example, maybe I'm not going to give it in detail right now; instead I want to tell you about one more tool for proving cutoff, especially useful for separation distance but usable for total variation as well: the notion of a strong stationary time. Given a Markov chain, we know what a stopping time is, and here we're going to discuss stopping times that can depend on additional randomness. An important example to remember: look at the hypercube. We had the lazy hypercube with n coordinates, and at each step we chose a coordinate at random. I can ask for the stopping time: when have I chosen all the coordinates? That's the coupon collector time. Notice that this is a stopping time, but it's not measurable with respect to the random walk on the hypercube itself: if I chose coordinate three and then didn't flip it, because it was a lazy step, or I chose coordinate five and then didn't flip it, there is no evidence in the movement of the chain, if I only keep track of where the particle is, of whether I chose coordinate three or five; yet in the definition of my stopping time I am looking at which coordinates I chose. So formally it's a stopping time in an enlarged filtration, in which we also keep track of the coordinates we chose; the key property that keeps it a stopping time is that it doesn't look into the future of this enlarged filtration. That stopping time is an example of a strong stationary time. What does that mean? First of all, at the time when all coordinates have been chosen, the distribution on the hypercube is completely uniform; but that's even true if I tell you how long it took.
How long does it take? It's a coupon collector time, so it tends to be about n log n. But suppose I was lucky, say the first 10 coordinates I chose were all different (a very unlikely event): even on that event, once I have chosen all the coordinates, the distribution is completely uniform, whether it took less time or more. That's why it's called a strong stationary time, in contrast to a plain stationary time: a stationary time is one where, at that time, the distribution is stationary, but it need no longer be stationary if I also condition on how long it took. Let me show you an example of a stationary time that is not strong. Choose some node z distributed according to π, a random node with the stationary distribution, and let τ be the hitting time of z, the minimal t such that X_t = z. Obviously X_τ equals z, so obviously it has distribution π. But say I know the starting point, X_0 is here, and I chose this z: if I'm told that τ is two, then given that additional information the distribution is very non-stationary; I know I'm very near the starting point. So there's a distinction between stationary times and strong stationary times, and both can be used for mixing. There's a very interesting fact that stationary times still bound the mixing time for lazy chains, which is not at all obvious: it is a result of Aldous from the early 80s that, for lazy chains or continuous-time chains, t_mix is bounded by a constant times the expectation of τ; to be precise, take the maximum over the starting point of the expectation of this stationary time, then the mixing time is bounded by a constant times that, assuming laziness. That's a nontrivial theorem.
There's a much easier theorem, which is at least as useful: strong stationary times bound the mixing time. It's useful because many natural stationary times are in fact strong stationary times; we saw one example for the hypercube and we'll see some more. So here is a little argument that bounds the separation distance, the quantity 1 - P^t(x,y)/π(y), in terms of a strong stationary time. One characterization of a strong stationary time is that the distribution of X_τ is stationary and independent of τ. From this there is a one-line deduction: the probability that τ ≤ t and X_t = y decomposes as the product of P(τ ≤ t) and the probability that X_t = y under this conditioning, which is π(y); so P(τ ≤ t, X_t = y) = P(τ ≤ t) π(y). (This is actually an equivalent definition of a strong stationary time.) Now, to bound the separation: by definition P^t(x,y) = P(X_t = y), and we bound this probability from below by adding the condition τ ≤ t; the decomposition then gives P^t(x,y) ≥ P(τ ≤ t) π(y), so 1 - P^t(x,y)/π(y) ≤ 1 - P(τ ≤ t) = P(τ > t). In other words, once we have a strong stationary time, we can bound one minus this ratio by the tail of the strong stationary time. Notice this holds for every target state y; the initial state x is fixed here, so when I write P(τ > t), this of course denotes the probability starting from x. We discussed that on the hypercube the mixing time in total variation is (1/2) n log n; the mixing time in separation is actually n log n, and this argument, with that stopping time, gives a sharp bound in that case.
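For the lazy hypercube this bound is actually an equality, and that can be checked exactly in a small dimension (my own sketch, not from the lecture): write the lazy walk as "refresh a uniformly chosen coordinate to a uniform bit", compute P^t(x,·) by matrix powers, and compare the separation from the all-ones start with the coupon-collector tail obtained by inclusion-exclusion.

```python
import itertools, math
import numpy as np

n = 3                                     # hypercube dimension, kept small
states = list(itertools.product([0, 1], repeat=n))
idx = {s: i for i, s in enumerate(states)}

# lazy walk = pick a uniform coordinate and set it to a uniform bit
P = np.zeros((2 ** n, 2 ** n))
for s in states:
    for i in range(n):
        for b in (0, 1):
            s2 = list(s); s2[i] = b
            P[idx[s], idx[tuple(s2)]] += 1 / (2 * n)

x = idx[(1,) * n]                         # start at all ones
pi_y = 1 / 2 ** n                         # uniform stationary measure
Pt = np.eye(2 ** n)
for t in range(1, 31):
    Pt = Pt @ P
    sep = max(1 - Pt[x, y] / pi_y for y in range(2 ** n))
    # coupon-collector tail P(tau > t), by inclusion-exclusion
    tail = sum((-1) ** (k + 1) * math.comb(n, k) * (1 - k / n) ** t
               for k in range(1, n + 1))
    assert abs(sep - tail) < 1e-12        # equality at every t
print("separation equals the coupon-collector tail for t = 1..30")
```

The equality, rather than mere inequality, is exactly the halting-state phenomenon discussed next: the all-zeros state cannot be reached from all ones before every coordinate has been refreshed.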
Here we always have an inequality, but in some important cases there is equality, namely when adding the condition τ ≤ t doesn't change the probability, and this is when y is a halting state. Maybe I'll first give this definition: s_x(t) is the maximum over y of the quantity 1 - P^t(x,y)/π(y), and the bound we saw two slides ago says that s_x(t) is always at most P(τ > t). But there are important cases where you have equality, and that's when there is a halting state. Given a stopping time τ and a fixed starting point, a halting state is a state with the property that if the chain hits it, the stopping time must already have occurred: I run the chain with this τ, and a halting state is one I cannot hit strictly before τ. Looking at our canonical example, the hypercube: suppose we start at all ones, and τ is, as I said, the coupon collector time, when we have touched all coordinates; then the all-zeros state is a halting state, because there is no way to reach all zeros without touching every coordinate. Once I have a halting state, the inequality in the previous argument becomes an equality, because X_t = y forces τ ≤ t; so the step which was an inequality is actually an equality, and the general bound, once we have a halting state, holds with equality. This gives a somewhat surprising conclusion: a strong stationary time with a halting state is optimal, meaning its tail is smaller than the tail of any other strong stationary time. Indeed, for any strong stationary time we have the inequality, and if the strong stationary time has a halting state, the inequality becomes an equality, so strong stationary times with a halting state are always optimal. [Question: don't you take the maximum over the starting state?] Yes, you can take the maximum,
and then you would also take the maximum on the right-hand side; but first we can look at optimality with respect to a fixed starting state, and that's what I'm talking about here: s_x(t) is the maximum over y. [Question: if the inequality is strict for some y, why is there equality for the maximum?] The point is this: the right-hand side is P(τ > t), so for every y the quantity 1 - P^t(x,y)/π(y) is at most P(τ > t). If y is a halting state, you have equality for that particular y, but then that y realizes the maximum, because its value is at least the value for every other y. [Question: sorry, I missed the definition of the halting state.] Yes, I only said it in words and didn't write it. A halting state depends on two things, a stopping time τ and an initial point x, so it can change according to the initial state. The definition: y is a halting state for τ and x if, starting from x, with probability one τ ≤ τ_y, where τ_y is the first time the chain reaches y, the hitting time of y. We start at x and require that τ_y is, with probability one, greater than or equal to τ. Again, in the hypercube example started at all ones, y = all zeros is a halting state for the coupon collector stopping time and the initial state all ones. Let me give one famous example of cutoff which is proved using strong stationary times. This goes back to Aldous and Diaconis in the 80s: top-to-random insertion. This chain is not reversible, so some of our tools disappear, and it's not lazy; yet it's very easy to prove cutoff for it, knowing just some basics about coupon collecting, or about sums of geometrics. What is the chain?
We have n cards in a stack. Take the top card, remove it from the deck, and insert it in a uniform spot: after removing the top card we have n - 1 cards, which define n spaces, since we can put the card at the very bottom, at the very top, or between any two cards, and we place it uniformly in one of those n spaces. That is called top-to-random insertion, and it has cutoff at time n log n with constant one; we'll justify that shortly. For the upper bound one uses a strong stationary time, and here it is. As you keep running the chain, the original bottom card moves up, and if you think about it, the cards under the original bottom card are in uniformly random order among themselves. It might take a while before anything goes under the bottom card; the time until the original bottom card rises by one is geometric with parameter 1/n. Once it's no longer the bottom card, it's easier to get cards under it, because there are two spaces below it, so the next rise is geometric with parameter 2/n. As for the two cards that end up below it: when the second card arrives there, it is equally likely to go into each of the two spaces under the original bottom card, so once there are two cards under the original bottom card, their relative order is uniform. You can continue by induction: when there are k cards under the original bottom card, all k! orderings of those k cards are equally likely; this is very easy to check. So when the original bottom card reaches the top position, all (n - 1)! permutations of the other cards are equally likely; then I take the original bottom card, which is now at the top, remove it and insert it in a uniform spot, and now everything is uniform. So it's a stationary time.
strong stationary time, because even if I tell you how long the process took, it is still true that the distribution is exactly uniform. Its mean, and even its distribution, are easy to understand, because again the time for the bottom card to rise by one is geometric with parameter 1/n: each step there is a chance 1/n of putting a card under it. So let me write this down. Define tau to be the time for the original bottom card to reach the top, plus one, because after that I want to make one more move. Then tau is distributed as a sum of independent geometrics: first Geom(1/n), then, once the card is at height two, Geom(2/n), and so on. So the expectation of tau is the sum of these expectations,

E[tau] = 1 + n/1 + n/2 + ... + n/(n-1),

which is asymptotic to n log n. You can also check, for instance by Chebyshev or by more refined methods, that tau is concentrated around n log n: the variance of tau is the sum of the variances, which is of order n^2, so tau varies by about n around n log n. This means that if I wait slightly more than n log n steps, then with very high probability tau has already happened, and hence we have mixed. Although there is no official coupon collecting here, observe that tau has exactly the distribution of the coupon collector time: the time to collect the first coupon is 1, the next is Geom((n-1)/n), and so on; these are the same geometric summands, just summed in the opposite order.

So this is a strong stationary time. Is it optimal, or can you come up with a better one? I'll answer this question, but the next one I expect you to answer. It is not optimal: there is a
better strong stationary time. Instead of waiting for the bottom card to reach the top and then making one more step, look at the second-from-bottom card, the card that started in the second-from-bottom position: wait for it to reach the top and then do one more step. Of course this will happen strictly earlier. So define a new stopping time tau tilde, the first time the original second-from-bottom card reaches the top, plus one. This is strictly smaller than tau, but it is still a strong stationary time, because it is also true for this card that the cards under it are in uniform random order: initially the only card under it is the original bottom card, but the next card to arrive below it is equally likely to come in above or below the original bottom card, so the fact that these cards are internally in uniform random order remains true in this case as well. So tau tilde is better.

Can you think of a strong stationary time that is even better than this? Note that tau tilde does allow the original bottom card to reach the top, as it must: for the distribution to be mixed, the chance that the original bottom card ends up at the very top should be exactly 1/n, no larger. And it is 1/n: just before the last step, when the original second-from-bottom card has reached the top, the distribution of the cards under it is uniform, so with probability 1/(n-1) the original bottom card sits right near the top; then we just need the current top card to go to one of the positions under it, which is another factor of (n-1)/n, and the product is exactly 1/n, as it should be. So this is a strong stationary time, but I'm asking: is it optimal? It is, because there is a halting state for it:
fix any state in which the original second-from-bottom card is below the original bottom card. That's right; we really have bright students here, thank you. So let me repeat this. The two cards, the original bottom card and the original second-from-bottom card, rise together, retaining their relative order, until the original second-from-bottom card is removed and reinserted, that is, until time tau tilde. So any state you can name in which the original second-from-bottom card is below the original bottom card is a halting state for tau tilde. This means we don't have to keep searching for some other cleverly defined strong stationary time: we're done, this one is optimal, and it is optimal simultaneously for all t, meaning it optimizes the tail probabilities for all times together. It is not obvious in advance that there is a strong stationary time which is optimal for all t at the same time. One thing I should comment on: there is a theorem of Aldous and Diaconis which says that even when there is no halting state, and sometimes there is none, there always exists an optimal strong stationary time, one where the equality is achieved. Equality here holds whenever you have a halting state; but sometimes there is no halting state, so maybe I can give another exercise: find a chain such that every strong stationary time has no halting state. To be precise (yes, good question): fix the starting state x, and show that for this fixed x, every strong stationary time has no halting state. I think there is such a chain with three or four states, so you don't have to look for very big chains. In many examples, however, there is a halting state, and then it is very useful.

I could talk more about cutoff, but let me stop here, because I want to start telling you something about the other announced topic of this course, which is rates of
escape. A lot of what I'm talking about in this part of the course you can find in Chapter 13 of this book with Russell Lyons, Probability on Trees and Networks, which was just finished this year after some 25 years of work. It is available both from Cambridge University Press and online: on Russell Lyons's web page you can find the electronic version. This is actually his copy, so you can go look at it in his office if you're staying around, and I know Jean-François has a copy as well. I wish I could bring you all copies, but it's 700 pages; the plane would not take off.

So here is a little proposition about mixing which involves a rate of escape. Consider lazy simple random walk (actually lazy or not, since I'm proving a lower bound) on a connected graph G with n nodes. Then for any epsilon < 1/2, if n is large enough,

t_mix(epsilon) >= diam(G)^2 / (C log n)

for some constant C. Contrast this with the trivial bound, which says that t_mix(epsilon) >= diam(G)/2. The trivial bound follows because we can take two nodes x and y that achieve the diameter. When I talk about the diameter I mean graph distance: the distance from x to y is the number of edges in a shortest path between them, and the diameter D is the largest distance in the graph. So suppose the graph distance in G between x and y equals D, the diameter, and take t < D/2. Then the distribution at time t started from x is supported on the vertices closer to x than to y, and the one started from y on the vertices closer to y than to x, so their total variation distance is exactly one
for all times t < D/2. This means the d-bar metric satisfies d-bar(t) = 1; but d(t) >= (1/2) d-bar(t), where d(t) is the maximal total variation distance to stationarity and d-bar(t) is the maximal distance between two starting states. So the total variation distance to pi is at least 1/2, which gives the trivial bound. The trivial bound is sometimes sharp: for an expander the diameter is of order log n and the mixing time is of order log n as well. But if we're talking about bounded-degree graphs, which are a lot of the graphs we care about, then the diameter is at least of order log n, so the new bound is at least as good as the trivial one, and often it is much better: in many graphs the diameter is much, much bigger than log n. The proposition also hints at the importance of diffusive estimates for random walks much more general than the random walks in R^d whose diffusive behavior we are familiar with: the fact that the time to reach distance r is often about r^2 is much more general than we are used to, and this is one example of that.

The way we'll prove this is from a very general and famous estimate, which should be even more famous, the Varopoulos-Carne bound; it has many applications and I'll show you several of them. It is called the Varopoulos-Carne long-range estimate, or long-range bound, and it is from 1985. This is not a joint paper: there was a first paper by Varopoulos that proved a slightly weaker version of the bound, and then, in the same year and motivated by it, an almost optimal version by Keith Carne; these are two separate papers. What they proved is the following. Take a reversible chain, written (pi, P), where pi is a stationary measure and P is the transition matrix, which can be finite or countable, and let's make it irreducible, although this assumption can
be dispensed with. Again, I remind you that reversible means pi(x) P(x,y) = pi(y) P(y,x); I'm going to write P(x,y) sometimes with a small p and sometimes with a big P, and it means the same thing, the transition probability from x to y. The inequality bounds the t-step transition probability as follows: there is a less important factor in front, the square root of a ratio of stationary probabilities, and then the important factor, a Gaussian tail:

P^t(x,y) <= sqrt(pi(y)/pi(x)) * P(|S_t| >= d(x,y)) <= 2 sqrt(pi(y)/pi(x)) * exp(-d(x,y)^2 / (2t)).

The right-hand side is clear: you see a Gaussian tail, and note that this is not in Z^d, this is a completely general setting. And what is S_t? It is simple random walk on the integers. We have this completely general reversible chain, and we can bound its individual transition probabilities at time t by the probability that simple random walk on the integers, the friendliest object there is, has gone to distance at least d(x,y). So what is d(x,y)? Given a reversible chain we always have a graph, where we connect x and y if we can go in one step from x to y. Reversibility means that this graph is undirected: if we can go from x to y, we can go from y to x. That is the graph we work with, and d(x,y) is just the graph distance in this graph of allowed transitions. An equivalent description of a reversible Markov chain: start with a graph, put weights on the edges, and walk with probabilities proportional to the weights; every reversible Markov chain can be described this way, and this always yields a reversible Markov chain. Any questions about this bound? Yes. In fact I'm not going to go into that, but you're encouraged to look at the original Varopoulos paper, because what Varopoulos did is take a graph, take every edge of the graph, and
replaced it by a pipe; this way he obtained a manifold, and he proved that if the pipes are thin enough, random walk on the graph is well approximated by Brownian motion on this manifold. Then he used a continuous version of such inequalities on the manifold to deduce the graph inequality. (In continuous time there are actually more variants of this that I won't go into.) Of course, in doing these transitions he lost something due to approximation errors; what Carne realized is the correct discrete analogue of this continuous argument, and that is the proof I'll show you later.

Now, the second inequality here is just the standard Bernstein or Chernoff bound for simple random walk: the Gaussian upper bound for its tail, an estimate for how far simple random walk on the integers can go. You can look this up in Wikipedia or wherever you like; either you know it, or it's an exercise to verify it by looking at the moment generating function of simple random walk. Of course this part is not new to these 1985 papers; it is some 80 years older.

So this is the inequality proved in those papers. We'll see a proof of it, but let me first tell you how it implies the mixing-time bound I wrote down; the derivation is very easy. Start with the same picture. This application is for finite graphs, since I'm talking about the diameter. Find x and y in the graph that realize the diameter, so the distance between x and y is D, the diameter. Look at the set A of all points z such that the distance from x to z is at most D/2, and observe that for every z in A, the distance from y to z is at least D over
2, because otherwise you'd get a contradiction to the triangle inequality: if the distance from y to z were strictly less than D/2, then the distance from x to y would be at most d(x,z) + d(z,y) < D/2 + D/2 = D, while d(x,y) = D. So for every z in A, what can we say about P^t(y,z)? We use the bound. Here I should emphasize that this works for the lazy or the non-lazy simple random walk on the graph; pi is the same for both, given by the degrees. The largest possible value of the ratio sqrt(pi(z)/pi(y)) is sqrt(n), because every vertex has degree at most n - 1 and at least 1. Here I should also add the word connected: I didn't say the graph is connected, but my chains are irreducible, so the graphs are connected; otherwise the mixing time doesn't even make sense, it would be infinite. So sqrt(pi(z)/pi(y)) is at most sqrt(n), and then we have the factor exp(-d(y,z)^2/(2t)) with d(y,z) >= D/2, which gives exp(-D^2/(8t)). Now plug in t = diam(G)^2/(16 log n); to be precise you have to take integer parts, which I'm not writing down, so add them as needed. With this choice of t the D^2 cancels, the exponential becomes exp(-2 log n) = n^{-2}, and overall each term is at most 2 sqrt(n) * n^{-2} = 2 n^{-3/2} (I omitted the factor of two before; there is a two). In other words, summing over z, the transition probability from y into A is less than 2/sqrt(n), because the size of A is at most n. On the other hand, the same argument gives that the transition probability from x into the complement of A is also less than 2/sqrt(n), because anything in the complement of A has distance greater than D
over 2 from x, so exactly the same argument gives this bound. Now look at d-bar(t): it is at least the total variation distance between P^t(x, .) and P^t(y, .), which is at least P^t(x, A) - P^t(y, A) >= (1 - 2/sqrt(n)) - 2/sqrt(n) = 1 - 4/sqrt(n), just by these inequalities. So d-bar(t) is close to one provided n is large, and d(t) >= (1/2) d-bar(t) >= 1/2 - 2/sqrt(n). So for any epsilon strictly less than 1/2, if n is large enough, we get the claimed inequality.

We'll start the next class with a proof of the Varopoulos-Carne inequality. The shortest proof, which is the original Carne proof, is a surprising use of Chebyshev polynomials; the nice thing is that although it uses Chebyshev polynomials, you don't need to know any theorems about them, because the definition alone does the work. It is quite a remarkable proof. I would say that in some thirty-something years of doing mathematics, this is my favorite theorem: not my theorem, but the theorem I like the most, because it is so useful (I'll show you several applications) and the proof is so beautiful. You will see it in the first 10 or 15 minutes of the next lecture, and then I'll show you one more application, to random walks on groups. This application was already described in Varopoulos's original paper: a random walk on a group (think of an infinite group; there is a version for finite groups as well) has positive speed if and only if the entropy grows linearly, and the nontrivial direction of that follows from this inequality. So we'll continue tomorrow.

Question: you said before that you could bound the mixing time by the expected value of a stationary time, right? Yes, and it also works if it is almost stationary, provided the chain is either lazy or
continuous-time.
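The stopping times for top-to-random insertion discussed above can be checked numerically. With tau = 1 + Geom(1/n) + ... + Geom((n-1)/n) and tau tilde = 1 + Geom(2/n) + ... + Geom((n-1)/n), as described in the lecture, the expectations are explicit sums. The sketch below (my own illustration, not part of the lecture) confirms that E[tau] is close to n log n and that tau tilde saves exactly n steps on average, since the difference is just the k = 1 term, n/1 = n.

```python
import math

def expected_tau(n):
    # E[tau] = 1 + n/1 + n/2 + ... + n/(n-1): wait for the original
    # bottom card to reach the top (a sum of Geom(k/n) waiting times,
    # k = 1..n-1), then make one more insertion
    return 1 + sum(n / k for k in range(1, n))

def expected_tau_tilde(n):
    # same sum, but the second-from-bottom card starts with two
    # insertion gaps below it, so the k = 1 term is absent
    return 1 + sum(n / k for k in range(2, n))

n = 52
print(expected_tau(n))                          # roughly n log n (about 236 here)
print(expected_tau(n) - expected_tau_tilde(n))  # n, up to float rounding
```

The improvement from tau to tau tilde is exactly the expected time n for the first card to go under the original bottom card, which is a lower-order saving compared to the n log n main term.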
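The Varopoulos-Carne bound itself can be verified by brute force on a small example. The sketch below is my own check, on an assumed toy chain: lazy simple random walk on a path of 6 vertices, which is reversible with pi proportional to degree and with graph distance d(x,y) = |x - y|. It computes the matrix powers P^t and confirms that P^t(x,y) never exceeds 2 sqrt(pi(y)/pi(x)) exp(-d(x,y)^2/(2t)).

```python
import math

def lazy_srw_path(n):
    # transition matrix of lazy simple random walk on the path 0, ..., n-1:
    # stay with probability 1/2, otherwise move to a uniform neighbor
    P = [[0.0] * n for _ in range(n)]
    for x in range(n):
        P[x][x] = 0.5
        nbrs = [y for y in (x - 1, x + 1) if 0 <= y < n]
        for y in nbrs:
            P[x][y] += 0.5 / len(nbrs)
    return P

def mat_mul(A, B):
    m = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(m)]
            for i in range(m)]

def max_carne_ratio(n, tmax):
    # largest value of P^t(x,y) / (2 sqrt(pi(y)/pi(x)) e^{-d(x,y)^2/(2t)})
    # over all x, y and 1 <= t <= tmax; Varopoulos-Carne says it is below 1
    P = lazy_srw_path(n)
    deg = [1 if x in (0, n - 1) else 2 for x in range(n)]
    pi = [d / sum(deg) for d in deg]   # stationary measure ~ degree
    worst, Pt = 0.0, P
    for t in range(1, tmax + 1):
        for x in range(n):
            for y in range(n):
                bound = 2 * math.sqrt(pi[y] / pi[x]) * math.exp(-(x - y) ** 2 / (2 * t))
                worst = max(worst, Pt[x][y] / bound)
        Pt = mat_mul(Pt, P)
    return worst

print(max_carne_ratio(6, 30))   # stays below 1
```

The worst ratio here is comfortably below 1, reflecting the slack both in the factor 2 and in replacing the random-walk tail by its Gaussian bound.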
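Finally, the Gaussian tail estimate for simple random walk quoted in the lecture, P(|S_t| >= d) <= 2 exp(-d^2/(2t)), follows from the moment generating function (it is the Hoeffding/Chernoff bound). Since S_t = 2*Binomial(t, 1/2) - t, the tail can be computed exactly for small t; this sketch (again my own check, not from the lecture) compares the exact tail with the bound.

```python
import math
from math import comb

def srw_two_sided_tail(t, d):
    # exact P(|S_t| >= d) for simple random walk: S_t = 2*Bin(t, 1/2) - t,
    # and by symmetry the two-sided tail is twice the upper tail (d >= 1)
    upper = sum(comb(t, k) for k in range(t + 1) if 2 * k - t >= d) / 2 ** t
    return 2 * upper

def max_tail_ratio(tmax):
    # largest P(|S_t| >= d) / (2 e^{-d^2/(2t)}) over 1 <= d <= t <= tmax
    worst = 0.0
    for t in range(1, tmax + 1):
        for d in range(1, t + 1):
            exact = srw_two_sided_tail(t, d)
            bound = 2 * math.exp(-d * d / (2 * t))
            worst = max(worst, exact / bound)
    return worst

print(max_tail_ratio(40))   # never exceeds 1: the Gaussian bound holds
```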