OK, good morning. Is the microphone working? OK, so I want to start today with the topic of the rate of escape, and specifically the Carne-Varopoulos bound. Let me remind you what it says. We have a reversible Markov chain, and the bound says that p^t(x,y) <= 2 sqrt(pi(y)/pi(x)) e^{-d(x,y)^2/(2t)}; in fact I don't need the 2 if I write the intermediate bound, with the probability that |S_t| is at least the distance from x to y, where S_t is simple random walk on Z:

p^t(x,y) <= sqrt(pi(y)/pi(x)) * P(|S_t| >= d(x,y)) <= 2 sqrt(pi(y)/pi(x)) * e^{-d(x,y)^2/(2t)}.

We're going to focus on the first inequality, since the second is a standard Chernoff bound that you get just from looking at the exponential moment, the moment generating function, of S_t. In the proof, to simplify notation, I'll focus on the key case. We'll make two reductions. First: although the bound applies to both finite and countably infinite chains, it's easy to reduce to the finite case. For simple random walk to be defined we want the graph to be locally finite; in general the statement is not just about simple random walk, and this assumption can be removed, but we're going to assume that the graph G of allowed transitions is locally finite, so every vertex has finite degree. The degrees may be unbounded, but each one is finite. This can be removed, but almost all examples we care about are locally finite. If we assume that, then within distance t from x there are only finitely many vertices, so it doesn't matter whether we work with the whole infinite graph or just with the finite graph which is the t-neighborhood of x: vertices further away play no role in going from x to y in t steps. So we can reduce to a finite state space, and then P is just a finite matrix. Second reduction: assume for now that pi is uniform on the state space; we're going to generalize later, but this lets us focus on the key points. Of course, for simple random walk on a regular graph pi is uniform, and for symmetric walks on Cayley graphs pi is also uniform, so those are some of the key examples. And then the bound has a simpler form: the square-root factor disappears. So we're going to need some basic facts about Chebyshev polynomials. I want to emphasize how this proof is both elementary and a bit mysterious, so I'm going to prove everything we need about Chebyshev polynomials, which is very basic. Chebyshev polynomials arise from looking at the cosine function, and a good starting point is the product identity

2 cos(k*theta) cos(theta) = cos((k+1)*theta) + cos((k-1)*theta)

(there was a 2 missing somewhere). You are all, or almost all, closer to high school than I am, OK, maybe with one exception, so you should remember this better than me. Now, the way we're going to use it: the Chebyshev polynomial is defined by cos(k*theta) = q_k(cos(theta)). Usually they're denoted t_k or q_k; I'll write q_k. This identity determines the polynomials q_k. In particular q_0 is identically 1, and q_1(z) = z. I'm going to use z for cos(theta); z will be real, not complex, it just stands for cos(theta). Then q_2(z) = 2z^2 - 1, and in general, for any positive k, q_k(z) is a polynomial of degree exactly k.
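Not part of the lecture, but here is a minimal numerical sketch of the two facts just stated, the recurrence coming from the product identity and the defining relation q_k(cos(theta)) = cos(k*theta), using numpy's poly1d for the polynomial arithmetic:

```python
import numpy as np

def chebyshev(k):
    """q_k via the recurrence q_{k+1}(z) = 2z q_k(z) - q_{k-1}(z)."""
    q_prev, q = np.poly1d([1.0]), np.poly1d([1.0, 0.0])  # q_0 = 1, q_1 = z
    if k == 0:
        return q_prev
    for _ in range(k - 1):
        q_prev, q = q, np.poly1d([2.0, 0.0]) * q - q_prev
    return q

theta = np.linspace(0, np.pi, 7)
for k in range(6):
    # Defining identity: q_k(cos theta) = cos(k theta).
    assert np.allclose(chebyshev(k)(np.cos(theta)), np.cos(k * theta))
    # Degree is exactly k; leading coefficient is 2^(k-1) for k >= 1.
    print(k, chebyshev(k).coeffs)
```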
The case q_2(z) = 2z^2 - 1 you already obtain from this identity, and in general the identity lets you verify the claim by induction: if z = cos(theta), then cos((k+1)*theta) is q_{k+1}(z), and the identity says q_{k+1}(z) = 2z*q_k(z) - q_{k-1}(z). By induction q_{k-1} is a polynomial of degree k-1 and q_k is a polynomial of degree k; you multiply q_k by 2z and subtract the other one, so by induction we see the fact. You also see that the leading coefficient is a power of 2; we don't care about that, but we do care a lot that the degree is exactly k, which we see from this identity. Now, the other fact we need about Chebyshev polynomials is a consequence of the binomial theorem. Take z = cos(theta) again, and let's write out z^t, where t is a large integer corresponding to the time we care about:

z^t = ((e^{i*theta} + e^{-i*theta})/2)^t.

This, of course, we can expand using the binomial expansion, and instead of writing binomial coefficients I want to write it directly in terms of simple random walk on Z started at 0:

z^t = sum over k from -t to t of P(S_t = k) e^{i*k*theta}.

You can get this from the binomial expansion, but also very pictorially: write the t factors one after another and multiply them out. In the product you have to choose, each time, e^{+i*theta} or e^{-i*theta}, and these choices correspond to the steps of the walk: t factors, t steps. If you chose some pluses and some minuses, you add them up and get a factor e^{i*k*theta}, and the number of choices that lead to e^{i*k*theta} is exactly the number of ways to make S_t = k; the normalization 2^t exactly matches the probability 1/2 per step of the walk. So you can see it by hand, or just use the binomial expansion. Now take real parts of both sides. z is itself real, so the left side doesn't change, and the right side becomes the sum over k of P(S_t = k) cos(k*theta). Because cosine is an even function, it's convenient to define q_{-k} for negative indices simply as q_k; this is just for convenience, and it's still consistent with q_k(cos(theta)) = cos(k*theta), which I want to hold for negative k as well. So we have this identity, which we can now write in terms of z, eliminating theta:

z^t = sum over k of P(S_t = k) q_k(z).

This is an identity of polynomials: z^t, a pure power of degree t, equals a sum of polynomials of degree |k| with coefficients coming from the random walk. Now, if we have an identity of polynomials, we can plug any matrix into it, because powers of a matrix commute with each other. So with P the transition matrix of the walk, the finite matrix, this implies

P^t = sum over k of P(S_t = k) q_k(P).

This is now an identity of two matrices. Again, this proof is supposed to be transparent, so if something is confusing, please stop me.
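Also as a supplement: a short numerical check of both the scalar identity and the matrix identity, on the 5-cycle, which is a symmetric chain, so pi is uniform. The helper names srw_dist and q_of_matrix are mine:

```python
import numpy as np
from math import comb

def srw_dist(t):
    """P(S_t = k) for simple random walk on Z started at 0 (k = -t..t)."""
    return {k: comb(t, (t + k) // 2) / 2**t for k in range(-t, t + 1, 2)}

def q_of_matrix(k, P):
    """q_k(P) via the recurrence q_{j+1} = 2 P q_j - q_{j-1}; q_{-k} = q_k."""
    k = abs(k)
    Q_prev, Q = np.eye(len(P)), P.copy()
    if k == 0:
        return Q_prev
    for _ in range(k - 1):
        Q_prev, Q = Q, 2 * P @ Q - Q_prev
    return Q

t = 8
# Scalar identity z^t = sum_k P(S_t = k) q_k(z), checked on [-1, 1],
# where q_k(z) = cos(k arccos z).
zs = np.linspace(-1, 1, 11)
rhs = sum(p * np.cos(abs(k) * np.arccos(zs)) for k, p in srw_dist(t).items())
assert np.allclose(zs**t, rhs)

# Matrix identity P^t = sum_k P(S_t = k) q_k(P), on the 5-cycle.
P = (np.roll(np.eye(5), 1, axis=1) + np.roll(np.eye(5), -1, axis=1)) / 2
rhs = sum(p * q_of_matrix(k, P) for k, p in srw_dist(t).items())
assert np.allclose(np.linalg.matrix_power(P, t), rhs)
```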
So this is an identity of two matrices, and let's look at a particular entry. Which entry do we care about? The entry at (x,y): x is the starting point, y is some target. q_k(P) is a matrix, and we take the (x,y) entry; if two matrices are equal, their (x,y) entries are equal. Now remember that q_k(P) is a polynomial of degree |k| in P. Suppose x and y are at distance 10 and we look at k = 7: what can you say about q_k(P)(x,y)? It's 0, right? Obviously, for k less than the distance from x to y, P^k(x,y) = 0, and all lower powers are 0 as well; q_k(P) is just a combination of P^k and lower powers, so q_k(P)(x,y) is also 0 for |k| less than the distance. So in the sum we only need the k with |k| at least d(x,y); there are the positive and the negative values, and we could take a factor of 2 and sum over one sign, but I'll write it with both:

p^t(x,y) = sum over k with |k| >= d(x,y) of P(S_t = k) q_k(P)(x,y).

OK, so now we want to say something about these terms. What can you tell me about the eigenvalues of P? In what interval do they sit? P is a Markov transition matrix, so every eigenvalue lambda is in [-1,1]. And what are the eigenvalues of q_k(P)? They have the form q_k(lambda), where lambda is an eigenvalue of P. Here we're in a very simple situation: remember, we assumed pi is uniform, and when I stated the theorem yesterday I assumed the chain is reversible. Together this means P is symmetric, P(x,y) = P(y,x). The generalization is that P is self-adjoint with respect to pi, a self-adjoint operator on L^2(pi); but since pi is uniform, P is just a symmetric matrix, so it can be diagonalized: in the right basis, P is unitarily equivalent to a diagonal matrix. And if you work in that basis and apply a polynomial like q_k to a diagonal matrix, what happens? You just get q_k(lambda_1), q_k(lambda_2), and so on, on the diagonal. You can do all of this in the basis of eigenvectors: the eigenvectors stay the same, and the eigenvalues are mapped by the polynomial. In general spectral theory this is sometimes called functional calculus; here we're working with matrices, and everything is very simple and elementary. So those are the eigenvalues of q_k(P). What can you tell me about where they sit? lambda is in [-1,1], and q_k is a polynomial; where does it map [-1,1]? I want to bound how big q_k(lambda) can be. Into [-1,1]. But if you just write down these polynomials, that's not obvious from the formula; for 2z^2 - 1 it's still obvious, but if I write out q_10, some big polynomial, it's not obvious. If you go back to the definition: if lambda is in [-1,1], then it is cos(theta) for some theta, and q_k(cos(theta)) is the cosine of something else, so it's certainly in [-1,1]. So we're using only two things about the Chebyshev polynomials, really, or three. One: they have degree exactly k. Two: the binomial identity we have here.
And three: q_k maps [-1,1] into [-1,1]. So every eigenvalue q_k(lambda) is in [-1,1], and because q_k(P) is a symmetric matrix, the largest eigenvalue in absolute value is the norm of the matrix; so this matrix has norm at most 1. Now, if I look at any entry of the matrix q_k(P), that's just multiplying the matrix on the left and right by two unit vectors: the Dirac mass at x and the Dirac mass at y, both unit vectors in little l^2, with no weights for now. So this number is always at most 1 in absolute value, because I took a matrix of norm at most 1 and multiplied it on the left and right by two vectors of norm 1. So now we're done: p^t(x,y) is a nonnegative number, so I can put absolute values inside, and

p^t(x,y) <= sum over |k| >= d(x,y) of P(S_t = k) |q_k(P)(x,y)| <= sum over |k| >= d(x,y) of P(S_t = k).

And if you look, that last sum is exactly the probability that |S_t| is at least the distance: if you write it out, it's exactly the sum we have written there. So that's the whole proof. Questions? Yes, what about general pi? Very little changes. q_k(P) is still a norm-1 operator, so that argument still works, but now everything happens in L^2(pi): P is a self-adjoint operator in the scalar product with weights pi, and its norm is still at most 1, because for a self-adjoint operator the norm is still given by the largest eigenvalue in absolute value. But now look at the Dirac mass at x: if you compute its norm, the norm is now in L^2(pi), so the norm squared is pi(x), because of the weight. So if you want to understand what q_k(P)(x,y) is: before, we simply paired delta_x with q_k(P) delta_y, where delta_x is the vector that is 1 at x and 0 everywhere else; but now, because the scalar product carries the weight pi, the correct identity is

pi(x) * q_k(P)(x,y) = <delta_x, q_k(P) delta_y>_pi.

We can still use Cauchy-Schwarz to bound the right side by the norm of delta_x, times the norm of the operator, which is at most 1, times the norm of delta_y; all these norms are in L^2(pi), so the product is sqrt(pi(x) pi(y)). Dividing both sides by pi(x) gives exactly the factor sqrt(pi(y)/pi(x)). That's the whole change. So now we have a proof of this, and hence the diameter-squared-over-log-n lower bound for the mixing time.
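As a supplement, here is a small numerical check of the bound just proved, in its general form p^t(x,y) <= sqrt(pi(y)/pi(x)) * P(|S_t| >= d(x,y)), on a path graph, where pi is proportional to the degree and genuinely non-uniform:

```python
import numpy as np
from math import comb

def srw_tail(t, r):
    """P(|S_t| >= r) for simple random walk on Z started at 0."""
    return sum(comb(t, (t + k) // 2) / 2**t
               for k in range(-t, t + 1, 2) if abs(k) >= r)

# SRW on the path 0-1-...-7; the stationary pi is proportional to degree.
n = 8
A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)  # adjacency
deg = A.sum(axis=1)
P = A / deg[:, None]
pi = deg / deg.sum()

for t in range(1, 12):
    Pt = np.linalg.matrix_power(P, t)
    for x in range(n):
        for y in range(n):
            # On a path, the graph distance is just |x - y|.
            bound = np.sqrt(pi[y] / pi[x]) * srw_tail(t, abs(x - y))
            assert Pt[x, y] <= bound + 1e-12
```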
Here's one more exercise. Take G_0, an expander on n nodes; to be concrete, say a 3-regular expander, or a fixed-degree expander. Now subdivide each edge n times, I mean into n edges; you can think of it as subdividing or stretching. The resulting graph G has about n^2 nodes, because G_0 had n nodes and every edge got subdivided into n pieces. Check that for this G, indeed, the mixing time is, up to an absolute constant, diam(G)^2 / log n. You don't have to check the lower bound, because we proved that it holds for all graphs. And I should say, to avoid periodicity, by T_mix(G) I really mean the mixing time of the lazy simple random walk on G. So we have a lower bound; for expanders the mixing time is of order log n, and you can think about how much the mixing is slowed down by the subdivision, and you'll find that it matches. So there is some verification to do, but this kind of bound is sharp. And that's a little surprising: if you don't think carefully about examples, you might guess that if a graph has polynomial growth, like this one, then the mixing time should be at least quadratic in the radius, or the diameter. But this example shows the mixing time can be a little faster than the diameter squared, though not a lot faster: at most by a factor of log n. So, any more questions? Now I want to illustrate another application of Carne-Varopoulos in a different direction, to random walks on groups. I'm going to present a classical argument going back to Kaimanovich, Vershik, and Varopoulos; and since Anna Erschler is here, I should mention that there are much more sophisticated arguments of this type that she and others have developed. So, the application to groups: we will be interested in simple random walk on a Cayley graph of a group G, where G is finitely generated by some finite set of generators S which is symmetric, so with every generator we also have its inverse. Remember that in the Cayley graph, x is a neighbor of y if and only if x is in yS. This is the right Cayley graph, and we're going to do the right random walk; you just have to make some choice. We connect two elements if one is obtained from the other by multiplying by a generator, and then simple random walk just takes a group element and multiplies it by a random generator. Everything can be done much more generally, but let's stick to the simplest example of simple random walk. Then there are many interesting equivalences; the one I want to focus on involves speed and entropy. So let me recall what entropy is. For any finitely supported measure mu, the entropy is

H(mu) = sum over x of mu(x) log(1/mu(x)),

where the sum is over all x with positive mass. That's the entropy of a measure mu. And if X is a random variable, we will sometimes write H(X) to mean the entropy of the distribution, the law, of X; this is just shorthand. We also need the entropy of a pair of random variables: H(X,Y) is just defined as the entropy of the joint distribution of X and Y, and I'll write it in this abbreviated form. And we can define conditional entropy: H(X | Y) is not a random variable, it is actually already averaged. It is the number obtained by summing over y the probability that Y = y, times the entropy of the conditional distribution of X given Y = y.
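A minimal sketch of these definitions in code (my example, not the lecturer's), checking on a toy joint law the chain rule H(X | Y) = H(X,Y) - H(Y) that comes next:

```python
import numpy as np

def entropy(mu):
    """H(mu) = sum_x mu(x) log(1/mu(x)), over x with mu(x) > 0."""
    p = np.array([v for v in mu.values() if v > 0])
    return float(-(p * np.log(p)).sum())

# Y is a fair bit; given Y = 0, X = 0 surely; given Y = 1, X is a fair bit.
pxy = {(0, 0): 0.5, (0, 1): 0.25, (1, 1): 0.25}
py = {0: 0.5, 1: 0.5}
h_cond = entropy(pxy) - entropy(py)  # chain rule: this is H(X | Y)
print(h_cond, 0.5 * np.log(2))       # both equal (1/2) log 2
```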
And one thing which is easy to verify from the formula: this is the same as the entropy of the pair minus the entropy of Y,

H(X | Y) = H(X,Y) - H(Y).

Among the basic facts about entropy: in particular, this difference is always non-negative, because it's an average of the non-negative quantities H(X | Y = y). OK, so one way to measure the growth of a random walk on a group is to ask how far it is from the starting point. Another is to ask how spread out its distribution is, and that will be done using entropy. Maybe one more fact I want to point out: if mu is supported on n points, then H(mu) is at most log n, and of course you have equality if mu is the uniform distribution. This is because

H(mu) - log n = sum over x of mu(x) log(1/(n mu(x))) <= sum over x of mu(x) (1/(n mu(x)) - 1) = sum over x of (1/n - mu(x)) = 1 - 1 = 0,

using that mu is a probability distribution and the fact that log t <= t - 1. OK, so one way to measure the spread of a random walk is to look at the entropy of X_n. One more remark we need: if X_n is the random walk on the group, I want to bound the entropy of X_{n+m} from above. It is at most the entropy of the pair, which I can write as

H(X_{n+m}) <= H(X_n, X_{n+m}) = H(X_n) + H(X_{n+m} | X_n).

Now, if I condition on where the walk is at time n, here we want to use the group structure: if I condition on X_n and look at where X_{n+m} is, well, X_{n+m} is just obtained from X_n by multiplying by an independent copy of X_m, just because it's a random walk on the group. So that conditional entropy is the same as the entropy of this independent copy, H(X_m): if I know where I am at time n, the uncertainty about where I'll be at time n+m is just the uncertainty of X_m. You easily see this formally from the identity. So we get subadditivity of entropy, H(X_{n+m}) <= H(X_n) + H(X_m), and this means that the following limit exists:

h = limit of H(X_n)/n.

This is a non-negative subadditive sequence, and it's an elementary exercise (Fekete's lemma) that subadditivity implies the limit exists. This h is known as the asymptotic entropy, or the Avez entropy, of the random walk. [Where does the walk start?] Sorry, I didn't specify: the walk X_n is defined by starting from the identity. So X_n can be written as the identity, which of course I could omit, times a product g_1 g_2 ... g_n, where in our case the g_i are i.i.d. uniform on the generating set S. So that's one important quantity, the asymptotic entropy. Another important quantity, which is the most natural one for probabilists, is the rate of escape. I'm going to write |X_n| for the distance from the identity to X_n; that's the notation we'll use. Then the asymptotic speed is defined as the limit of E|X_n|/n, and again this limit exists because E|X_{m+n}| <= E|X_m| + E|X_n|: just write X_{m+n} as a product, the same way as before, and use the triangle inequality.
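To make both quantities concrete, here is a sketch (my example, not from the lecture) computing H(X_n)/n and E|X_n|/n for simple random walk on the free group F_2, a case where both limits are positive. It uses two symmetry facts about F_2 with its standard 4 generators: |X_n| is a birth-death chain (up with probability 3/4, down with probability 1/4, reflected at 0), and X_n is uniform on each sphere, the sphere of radius k >= 1 having 4 * 3^(k-1) elements:

```python
import numpy as np

n = 2000
dist = np.zeros(n + 2)   # dist[k] = P(|X_m| = k) after m steps
dist[0] = 1.0
for _ in range(n):
    new = np.zeros_like(dist)
    new[1] = dist[0]                 # from the identity, distance goes to 1
    new[2:] += 0.75 * dist[1:-1]     # from k >= 1, up with probability 3/4
    new[:-1] += 0.25 * dist[1:]      # from k >= 1, down with probability 1/4
    dist = new

k = np.arange(n + 2)
# log of the sphere sizes: 1 element at k = 0, else 4 * 3^(k-1).
log_sphere = np.where(k >= 1, np.log(4) + (k - 1) * np.log(3), 0.0)
mask = dist > 0
# H(X_n) = sum_k P(|X_n| = k) * (log N_k - log P(|X_n| = k)),
# since X_n is uniform on each sphere of size N_k.
H = float(np.sum(dist[mask] * (log_sphere[mask] - np.log(dist[mask]))))
print("E|X_n|/n ~", np.sum(k * dist) / n)  # speed, tends to 1/2
print("H(X_n)/n ~", H / n)                 # entropy, tends to (1/2) log 3
```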
So there is this limit l, the asymptotic speed. And here is the theorem we want to state: the speed is positive if and only if the asymptotic entropy is positive. We're going to get more quantitative. One direction here is immediate for Cayley graphs: if the speed is 0, certainly the entropy will be 0. Let's write D_n for |X_n|. Then H(X_n) can be written as H(X_n, D_n); all I need is the inequality, but in fact these are equal, since D_n is a function of X_n. So

H(X_n) = H(D_n) + H(X_n | D_n).

Now, the distance after n steps is at most n, so there are at most n+1 possibilities for D_n, and H(D_n) <= log(n+1). And what is H(X_n | D_n)? It's the sum over k of P(D_n = k) times the entropy of X_n given that the distance is k. At distance k, how many vertices do we have? At most |S|^k, so that conditional entropy is at most log(|S|^k) = k log|S|. So when we take H(X_n)/n, we can pull the factor log|S| out, and what's left is the sum over k of P(D_n = k) times k, which is just the expected distance:

H(X_n)/n <= log(n+1)/n + log|S| * E[D_n]/n.

This immediately gives the trivial direction: if the limit of the left side is positive, then of course the speed is positive. The other direction is intuitively not as clear, because there you assume you are zooming off at positive speed; but why couldn't you still be concentrated near some location, on some sub-exponential set? It says: if you're moving fast, then you must be spread out. That's not obvious at all, but it's an easy consequence of Carne-Varopoulos. So in the other direction we want an inequality bounding the entropy from below. Let's write the entropy, starting again from the identity:

H(X_n) = sum over x of P(X_n = x) log(1/P(X_n = x)).

Now we want to apply Carne-Varopoulos here, and the key is to apply it to only one of the two factors: we keep P(X_n = x) as is, but inside the logarithm we use the Carne-Varopoulos upper bound on the probability, which gives a lower bound on the log. And what is the lower bound? We get the distance squared from e to x, divided by 2n, minus log 2. This is just using p^n(e,x) <= 2 e^{-d(e,x)^2/(2n)}: take reciprocals and then logarithms. Everyone agree? OK. So what we got is

H(X_n) >= E[|X_n|^2]/(2n) - log 2 >= (E|X_n|)^2/(2n) - log 2,

where the second step is Cauchy-Schwarz. Now divide by n:

H(X_n)/n >= (E|X_n|/n)^2 / 2 - (log 2)/n.

So now it's clear: take limits in these inequalities, and you get that the asymptotic entropy is at least the asymptotic speed squared over 2, h >= l^2/2 (earlier I was omitting the factor of 2). So this proves the other inequality.
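In symbols, the two directions just proved can be summarized as follows (my summary, with h the asymptotic entropy, \ell the speed, and |S| the number of generators):

```latex
H(X_n) \le \log(n+1) + \mathbb{E}|X_n|\,\log|S|
  \quad\Longrightarrow\quad h \le \ell\,\log|S| ,
\qquad
H(X_n) \ge \frac{\mathbb{E}\bigl[|X_n|^2\bigr]}{2n} - \log 2
        \ge \frac{(\mathbb{E}|X_n|)^2}{2n} - \log 2
  \quad\Longrightarrow\quad h \ge \frac{\ell^2}{2} .
```

In particular h > 0 if and only if the speed is positive.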
So if the walk moves fast, it has to spread out. Any questions or comments on this? One other consequence of Carne-Varopoulos that's worth pointing out: if a graph G has polynomial growth, meaning that the ball in the standard graph metric satisfies |B(x,r)| <= C r^b for all r, then the walk can escape at most diffusively, up to a logarithm. We get it directly from the inequality we have here: multiplying it by 2n gives

E[|X_n|^2] <= 2n (H(X_n) + log 2),

and since X_n lives in a ball of polynomial volume, H(X_n) is of order log n, so the right side has a linear term and a term of order n log n. So you see, the expected distance squared in a graph of polynomial growth grows at most like n log n. Now, this is as stated if G is regular; it still works, with different constants, in the general case of a simple graph with no multiple edges: if you plug in the full Carne-Varopoulos bound, you just change the constants. So even without regularity, it's still true that the expected distance squared grows at most like n log n. In particular it's true for any graph of polynomial growth, and in particular for any subgraph of Z^d, because that has polynomial growth. Just a little story about that. In '85, Harry Kesten proved, by a different method, the martingale method, a variant of the Carne-Varopoulos inequality, in fact almost the same as what Varopoulos had proved, but with a different proof. So he also obtained such an inequality, and he asked, for graphs of polynomial growth, specifically subgraphs of Z^d: do you really need this log? It's a very natural question, and I must say that every year or two someone asks me it, so luckily I'm prepared with the answer. If you have a subgraph of Z^d, can you go faster on the subgraph than in Z^d? Because you know that in Z^d the random walk is diffusive: the expected distance squared grows at most linearly in time. Can it grow faster in a subgraph? Usually passing to a subgraph slows things down. But the answer is that it can grow faster, and this type of inequality is sharp. The example was provided in '88 by Barlow and Perkins. It's a similar idea to the stretched expander I showed you before, but that one doesn't fit into Z^d; here you basically take a tree and stretch it more and more as you move out, and they showed you can embed it even in Z^2. So even though Z^2 is a recurrent graph, so any subgraph is recurrent, you can still find a subtree inside Z^2 where the random walk moves faster than on Z^2. In the sense of expectation, that is; again, it's recurrent, it returns to the starting point many times, but if you look at the expected distance, or expected distance squared, it can grow faster. OK, any questions or comments? Yes. [For the speed, can we just remove the expectation in the definition?] That's right: for the speed, there is a subadditive ergodic theorem that says that if you take the limit of |X_n|/n without expectation, it almost surely exists and equals the same l. And there's a similar thing for entropy: entropy can be seen as the expectation of the information, so if you take log(1/p^n(e, X_n)), the probability to go in n steps from the identity to X_n, the entropy is just the expectation of this quantity, and there's an almost sure limit: this converges to the same asymptotic entropy h.
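To spell out the n log n computation from the corollary above (my notation; B(e,n) is the ball of radius n around the identity, and |B(e,r)| <= C r^b is the polynomial growth assumption):

```latex
\mathbb{E}\bigl[|X_n|^2\bigr]
  \;\le\; 2n\bigl(H(X_n) + \log 2\bigr)
  \;\le\; 2n\bigl(\log|B(e,n)| + \log 2\bigr)
  \;\le\; 2n\bigl(b\log n + \log C + \log 2\bigr)
  \;=\; O(n\log n),
```

using that X_n is supported on B(e,n), so H(X_n) <= log|B(e,n)|.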
I should say that the equivalence we just saw, between positive asymptotic entropy and positive speed, extends much further. It's also equivalent to having non-constant bounded harmonic functions on the group, and also equivalent to the random walk on the group having a non-trivial tail. So there are four natural properties which are all equivalent: positive speed; positive entropy; what's called a non-trivial Poisson boundary, which is the existence of harmonic functions that are bounded and non-constant; and finally, a non-trivial tail: if we look at the sigma-fields generated by the walk from time n+1 onwards and intersect over n, that tail sigma-field is non-trivial. All four of these are equivalent. This was understood in the 80s, and you can find one exposition of it in my book with Russ, other expositions in many papers of Kaimanovich that discussed this topic, and, again, extensions in works of Anna and others. [Anna: It might be good to mention that this equivalence is true for any random walk, not only for simple random walk.] Thank you. [Anna: And as for the implication you explained, that positive speed implies positive entropy: it's true not just for finitely supported walks, but for any walk with finite first moment, any L^1 walk. This is a result of Karlsson and Ledrappier, and their proof doesn't use Carne-Varopoulos at all, because if the support is not finite, of course, you can't have such a bound.] That's right, that's excellent. And one direction of this I was planning to give as an exercise; the other direction is deeper. So, thank you, Anna. As Anna is saying, the equivalence of h positive and speed positive extends to random walks on groups with long-range jumps, as long as the expectation of one step is finite: we were discussing walks on a Cayley graph where you step to a neighbor, but you can allow jumps far away, as long as the expected distance of a jump is finite. Then, first of all, both of these limits exist, and you still have the equivalence. This is, in general, too hard for an exercise, but there is an exposition of it. This equivalence was first proved in this generality by Karlsson and Ledrappier, and the proof was later simplified; I'm now forgetting the author of the simplified proof, I'll remind myself. We do have an exposition of the simplified proof of this general equivalence in the book with Russ. But I want to give you the easy part as an exercise. So what is the exercise? Part one: show that a finite expected step length, E|X_1| finite, implies that H(X_1) is finite. So if the expected distance you travel in one step is finite, then the entropy of one step is also finite. Part two, more generally: what we obtained in the Cayley graph case was that the entropy is bounded above by a constant multiple of the speed; show the same thing in general, that h is at most some constant, which I'm going to allow to depend on the walk, times the asymptotic speed. Note the different nature of the upper and lower bounds: the upper bound is linear, entropy is at most linear in the speed, while the lower bound here is quadratic. [The constant is just the growth of your group, right?] Yes, but I'm leaving that out: part of the exercise is to identify the constant.
So this direction is doable. The other direction, extending the lower bound, again requires different ideas, since the Carne-Varopoulos bound can no longer be used; that is the different argument of Karlsson and Ledrappier. What I'm giving as an exercise is just the easier direction. OK, how are we doing on time? So now I want to talk a little bit about lower bounds on speed, since one of the topics of the course is the rate of escape; we talked here about upper bounds, so what about lower bounds? Lower bounds on the rate of escape, random walks on groups, again finite and infinite. Suppose G is a d-regular Cayley graph, so d is the degree, the number of generators. If G is infinite, then E[|X_n|^2] is at least n/d. If G is finite, then E[|X_n|^2] is at least n/(2d) for n up to the relaxation time. The constant, the half, can be improved, but I don't care. Note that in the finite case the inequality, expected distance squared growing at least linearly, obviously cannot hold forever as it does in the infinite case: the left side stays bounded by the diameter squared of the group while the right side grows. The claim is that it's true until the relaxation time; remember, the relaxation time is 1 over the spectral gap of the random walk. So one open question: in the finite case, can you push this further, beyond the relaxation time? Here are some questions. Does it hold until the mixing time? And I don't care about the constant 2d: can you prove something like this until the mixing time? Or, how long could it possibly hold, maybe until time diameter squared? I find this a very interesting question. It would have great consequences if we understood it, but it's also appealing philosophically: you're walking in a Cayley graph, and maybe you're getting a report of how far you are from the starting point, but you don't really know whether the Cayley graph is finite or infinite. If you walk for long enough, eventually you'll see the walls closing in on you, and you'll feel claustrophobic, and you won't be able to keep escaping. But how long is that time? Certainly by the time your distance is of the order of the diameter, you have to feel it. But will you feel it much earlier? How far can you push that lower bound? The arguments we know only push until the relaxation time, which is a little paradoxical, because if you take a graph like an expander, where the relaxation time is just order 1, this inequality says nothing; but actually on an expander you move quite fast until you reach the diameter. So this inequality is kind of complementary to the ones coming from expansion. And it's great to have Anna here, because these inequalities are actually related to an insight she had about using embeddings to bound speed; I'll say a little more about that. So let's look at groups G of exponential growth. A nice example to think about is the lamplighter, say the lamplighter over Z^d. Let me remind you about the lamplighter group. We have a base graph; you saw the cycle as the base in my lecture, and you heard about it in Charles' lecture as well, but you could have any base. For instance, if the base is Z^d, you have at every site of Z^d a lamp that can be on or off, one or zero, with only finitely many lamps on, and you have a marker, a walker.
And what the marker can do is flip the lamp where it stands, or move to some neighboring node. These groups all have exponential growth; that's very easy to see, because in n steps you can walk to distance n/2 and turn on any subset of n/2 lamps along the way. So these are good examples. Now, for all groups of exponential growth, Varopoulos proved a nice bound: the probability to go in n steps from the identity to any x, or back to the identity, decays at least as fast as C e^{-c n^{1/3}}. And this is sharp for x = e on the lamplighter over Z. Quick explanation of why it's sharp. If I'm doing the lamplighter walk on Z and I want to return: what is the identity of the lamplighter group? The marker is at the origin and all the lamps are off. So I start there and I want to return there in n steps. What's the most efficient way to do it? The most efficient way is to have the marker walk inside some fixed interval of length of order r, say [-r, r], and only turn on lamps there; and then at the end, have it turn off all the lamps and come back to the identity. So what bound does that give on p_n(e,e)? What is the chance of staying within an interval of length r? For time r^2 it comes for free, with constant probability; if I want to do it for time n, which is larger than r^2, it will cost me e^{-c n/r^2}, because I divide time into n/r^2 rounds of length r^2, and in each round I have a constant probability. So that's the probability that the marker stays in the interval; and as it walks, it turns on some lamps. At the end, we want it to turn off all the lamps there and go back to the identity, and this we can easily get at a cost which is exponential in r: I do the confined walk for, say, n - 5r steps, and then the last 5r steps are devoted to carefully walking, finding all the lamps that are on, turning them off, and walking back to the origin. If you just force the walk to do that in 5r steps, it can be done, and the cost is exponential in r, because I'm just specifying what I want the walk to do for 5r steps. Is this clear as a lower bound? Again, if something is unclear, stop me. So: the walk stays in the interval, each round has length r^2, with constant probability per round and n/r^2 rounds, and then the last order-r steps return to the identity. If you look at this and want to maximize it, you balance the two terms, so you choose r of order n^{1/3}, and you get e^{-c' n^{1/3}} with a different constant. So this shows such a lower bound on the lamplighter group, and Varopoulos' theorem tells you that for groups of exponential growth this is sharp.
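In symbols, the lower-bound strategy just described is the following optimization (my notation; c_1, c_2 are whatever constants the two estimates produce):

```latex
p_n(e,e) \;\ge\;
  \underbrace{e^{-c_1\, n/r^2}}_{\text{marker confined to } [-r,\,r]}
  \cdot
  \underbrace{e^{-c_2\, r}}_{\text{turn lamps off, return in } O(r) \text{ steps}},
\qquad
\min_{r>0}\Bigl(c_1\tfrac{n}{r^2} + c_2\, r\Bigr) \asymp n^{1/3}
\quad\text{at } r \asymp n^{1/3},
```

so p_n(e,e) >= e^{-c n^{1/3}}, matching Varopoulos' upper bound for exponential growth.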
What you can easily deduce from this heat kernel bound is that E|X_n| is at least some constant times n^{1/3}. Because at time n, each point in the ball of radius a tiny constant times n^{1/3} has very small probability: the probability that X_n is in that ball is at most e^{-c n^{1/3}} times the volume of the ball around the identity of radius alpha n^{1/3}, and if you make alpha small enough, this ball grows slower than that exponential factor. So this probability is small, and it's an upper bound for the probability of being in the ball; so X_n is outside the ball with high probability. So, using heat kernel bounds, this is kind of the best you can get. But using embedding ideas, you get something better. Here I expressed things for the second moment, but one can actually get it for the first moment: E|X_n| is at least a constant times root n. Here one has to go against Cauchy-Schwarz, so some extra argument is needed. And we still don't know a way to get this from heat-kernel-type bounds, from transition probability bounds, because this bound doesn't depend on where x is compared to the identity; if we wanted to improve it via heat kernels, we would need what are called better off-diagonal bounds, and those are not generally available. So, questions? Yes. [This bound is sharp for amenable groups, but not for non-amenable groups?] Right. For non-amenable groups you always have positive speed, so you zoom off at positive speed; the questions I'm focusing on now are interesting for amenable groups. But one strange thing, which I didn't say, so thanks for the question: this inequality for the infinite case is true for G amenable. For G non-amenable we have the stronger inequality, E[|X_n|^2] at least a constant times n^2, but only for large enough n. So it's actually open whether this inequality for infinite groups holds for all groups at all times. It's a very strange situation: we know it's true for amenable groups, and even in the bigger class of non-Kazhdan groups, but we don't know it for general groups at all times. If the group is non-amenable and you just wait long enough, the expected distance squared will grow quadratically; but that doesn't cover small times. So what's true in a general group is that the bound holds for n large enough; for amenable groups it's actually true for all n, where d is the degree, that is, the size of the generating set, and I'm talking about the special case of simple random walk. [If the group is non-amenable but without Kazhdan's property, do you have the same?] Yes, the same thing I said extends to non-Kazhdan groups as well, for all times; I'm not defining property (T) now. These kinds of proofs are based on an embedding idea pointed out by Anna, based on Mok's theorem, and a more direct proof was given later. [Anna: It's not really my idea, I should say. It's very kind of you.] Well, I think it was your idea; not the embedding theorem itself, but using it to derive speed. [Anna: Oh, really? I explained this once to Bálint...] Bálint, yes. [Anna: ...and he explained how to use it to extract the square root.] Right, right. So: Anna first observed how Mok's embedding theorem gives a lower bound on the expected distance squared, which is what I wrote up there, and then Bálint noted how you can use a martingale fact to get the lower bound on the expected distance as well; so Bálint should be mentioned here too. And then there is a paper with James Lee that gives more direct proofs, and an exposition of this story is also in the book with Russ, though not in the most general case. [And your question is whether for any group you have exactly the same inequality?]
[Yes, my question is: the same bound, n divided by d?] Yes, exactly, with no loss in constants. [For all n?] Whether this inequality holds for all n, for all Cayley graphs, we don't know. We do know it, of course, for n large enough, and we do know it for all n if the group is non-Kazhdan; so only Kazhdan groups, like SL_3(Z), remain unknown for this. OK, so, all right. So I do want to show you some proofs of this. I will start now and go a little bit into the exercise session, and then we'll switch to actual exercises; the exercise session will be later this afternoon. But let me start explaining the embedding ideas behind this proof. So here is a lemma. Take G, which is finite or infinite, and let f be a real function in l^2(G): if G is finite, I just mean a real-valued function on G; if G is infinite, we assume f is in l^2 with respect to counting measure, just the sum of squares, no weights. Define the Dirichlet form of f with respect to the n-step Markov chain:

D_n(f) = (1/2) sum over x,y of p_n(x,y) (f(x) - f(y))^2.

If you expand the square, you get the classical representation

D_n(f) = <(I - P^n) f, f>,

the inner product being in l^2(G). Indeed, expanding gives f(x)^2 + f(y)^2 - 2 f(x) f(y); summing the terms with f(x)^2 gives <f,f>, and summing the terms with f(y)^2 gives <f,f> again, because we are in a symmetric walk, so summing over x or over y gives the same thing; that's how you get <f,f>. And in the last term the 2 cancels with the half, and summing f(x) f(y) against p_n(x,y) gives exactly <P^n f, f>. So that's the representation of the Dirichlet form. And then the lemma is that the expected distance squared from the identity always satisfies

E[d(e, X_n)^2] >= (1/d) * D_n(f) / D_1(f).

Yes, this is for random walk on a group; G is not just a general graph here, this is all in the setting of a d-regular Cayley graph. And in the last minutes I want to explain how, in the finite case, this gives the inequality we want. If G is finite, take f to be an eigenfunction of P corresponding to the second eigenvalue, so lambda = lambda_2; this will be the right function to plug in. So what is D_n(f) in this case? Let's take f normalized in l^2, why not; then D_n(f) = 1 - lambda^n, so the bound is

(1/d) * (1 - lambda^n) / (1 - lambda).

The lemma holds for any choice of f as long as it's not constant, so that the denominator is not 0; I should have said that f is not constant, in other words, we're assuming D_1(f) is not 0. Now we can write this as (1/d) times the geometric sum of lambda^j over j < n. In the finite case we assume that n is at most t_rel, which is 1/(1 - lambda); this means lambda >= 1 - 1/n, so the sum is at least the sum over j < n of (1 - 1/n)^j. And now we're completely elementary, though I won't give the sharpest arithmetic calculation.
But (1 - 1/n)^j is at least 1 - j/n, so summing over j < n gives at least n - (n-1)/2, which is bigger than n/2, and the bound is at least n/(2d). You can get the constants a little better, but it's good enough. OK, so, Alex: at the beginning of the exercise session I'm going to steal some of it to explain just the proof of this lemma, and then we'll have a complete proof, at least for the finite case, of the inequality; and then we'll discuss exercises. Thank you. Any questions or comments? [In the infinite case the bound is better, n/d; don't you get the same bound, at least for some initial stretch of time, in the finite case?] You mean, you're worried about the 2. You can improve the 2, but I don't actually know how to improve it all the way to 1; you can replace the half by 1 - 1/e, in fact. [Even for shorter time frames?] The shorter the time, the better the constant, but I can't actually get it to 1. But in this connection, since you asked, I do want to mention one more exercise. Show that for G a Cayley graph, the relaxation time is at most a constant, let me write a specific constant, 5, you can do better, times the diameter squared times the degree:

t_rel <= 5 * diam(G)^2 * d.

So again, this is a diffusive upper bound: the relaxation time is at most that. This is true for any Cayley graph, and more generally it's true for any transitive graph, and it's very classical, I think first proved by Babai in '91, so I should give credit for this type of bound; many other proofs exist, so we really know it's true. But it's an open question whether the same holds for t_mix instead of t_rel. And again, I don't care if you replace the 5 by a million: a constant times diameter squared times degree. It's a conjecture of mine that this is always an upper bound for the mixing time in any Cayley graph, or any transitive graph. It's open for the mixing time; it's classical for the relaxation time. OK, thank you.
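To close, a quick numerical sanity check (my sketch, not part of the lecture) of the finite-case bound E[|X_n|^2] >= n/(2d) for n up to the relaxation time, on the cycle Z_m, which is a 2-regular Cayley graph:

```python
import numpy as np

m = 60
lam2 = np.cos(2 * np.pi / m)   # second eigenvalue of SRW on the m-cycle
t_rel = 1 / (1 - lam2)         # relaxation time, about m^2 / (2 pi^2)
dist2 = np.array([min(k, m - k) ** 2 for k in range(m)])  # squared distance

mu = np.zeros(m)
mu[0] = 1.0                    # walk started at the identity
for n in range(1, int(t_rel) + 1):
    mu = (np.roll(mu, 1) + np.roll(mu, -1)) / 2  # one step of the walk
    assert dist2 @ mu >= n / 4                   # E|X_n|^2 >= n/(2d), d = 2
```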