So this is the third and last lecture, and in today's talk I'd like to cover two items. One is the item we began last time: the second proof of the cutoff result for random regular graphs, the one that uses spectral analysis. That's also the one that carries over to settings where we don't have randomness. Remember the diagram we had at the beginning: first a proof that relied heavily on the randomness of the graph and on regularity; then a second proof that no longer relies on randomness, but rather on the spectral properties that we understand very well for the random regular graph, while still using regularity very crucially. In the second part of today's talk, I'll address what happens when you don't have a regular graph. Say it's random, but now half of the degrees are 3 and half are 4; then things are different. Also, because this is the last lecture and last time we went very slowly and carefully, today I'll try to give you the outline of the arguments instead of spelling everything out. Feel free to come along in the break if you want the details of various statements. As I promised yesterday, we said a few things very quickly towards the end of the lecture, and I want to state them a little more formally. So here's a proposition, for p = 2. I'll remind you first what it is we're after. This picture above (I'm going to dim the lights when we get to it very soon) is a Ramanujan graph, and you might not remember what that is. A Ramanujan graph is a graph where the bound you have on all the non-trivial eigenvalues is essentially the best one possible. You take your adjacency matrix, before you divide it by d and turn it into the transition kernel of simple random walk.
Just take your adjacency matrix, and you want to confine all the non-trivial eigenvalues, in magnitude, into the smallest possible interval around 0 that you can. It turns out that 2√(d−1) is the magic number; you can't do any better than that. If you have any sequence of d-regular graphs with size going to infinity, then the second eigenvalue satisfies |λ| ≥ 2√(d−1) minus a constant over the diameter. This is what Charles was referring to as the Alon–Boppana theorem, and he discussed that theorem in a more general context yesterday and in this course this week. So 2√(d−1) is the best possible. And if it so happens that all the eigenvalues, except the first one, are at most this number in magnitude, really at most this number, not plus some epsilon or an o(1) error, then you call the graph a Ramanujan graph. That's all. If you'd like to think of it differently, it's a d-regular graph where the absolute spectral gap (you also take the most negative eigenvalue into account) is the best possible one. As we saw from Charles' talks, G(n,d), the random d-regular graph, is almost Ramanujan: it's weakly Ramanujan, with second eigenvalue 2√(d−1) plus a little-o(1) error. There are precise and actually fascinating conjectures for the distribution of the second largest eigenvalue, and in particular it is believed that with probability uniformly bounded away from 0 and 1, maybe two thirds, a random d-regular graph will be Ramanujan. But that is far beyond the present technology's ability to prove. There are various constructions, randomized and non-randomized, of Ramanujan graphs. And in order to deal with this object, as I mentioned yesterday, that's what I was getting to here: you have a random graph, and suppose I tell you, well, forget about the random graph.
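To make the definition concrete, here is a small numerical sketch (my own illustration, not from the lecture; the helper name `is_ramanujan` is mine): checking the Ramanujan condition, that every non-trivial eigenvalue of the adjacency matrix has modulus at most 2√(d−1), on the complete graph K_{d+1}, which is d-regular, non-bipartite, and Ramanujan since its non-trivial eigenvalues all equal −1.

```python
import numpy as np

def is_ramanujan(A):
    """Check whether a d-regular graph with adjacency matrix A is
    Ramanujan: every non-trivial eigenvalue has modulus <= 2*sqrt(d-1)."""
    d = int(A.sum(axis=1)[0])                       # degree (assumes regularity)
    eig = np.sort(np.abs(np.linalg.eigvalsh(A)))[::-1]
    # eig[0] == d is the trivial Perron eigenvalue; test all the rest.
    return bool(np.all(eig[1:] <= 2 * np.sqrt(d - 1) + 1e-9))

# K_{d+1}: d-regular, non-trivial spectrum {-1}, hence Ramanujan.
d = 3
A = np.ones((d + 1, d + 1)) - np.eye(d + 1)
print(is_ramanujan(A))  # True
```

A disconnected graph (two disjoint copies of K_4, say) fails the test, since its second eigenvalue is d itself, far above 2√(d−1).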
You just want to use the fact that you have some information about the spectrum, and you want to exploit that in order to control the L1 mixing time. The first thing one would try is to say: I don't know about L1 mixing, but L2 is tightly related to the eigenvalues; if you know the eigenfunctions as well, it's an equality. So let's see what we can do with the eigenvalues, and maybe L2 will bound L1 for us. With some luck it's a tight bound, one we can match with a lower bound on L1, and then we're done. This is the way many proofs of L1 cutoff were actually obtained: using precise, complete representations of all the eigenvalues and eigenfunctions, going through L2 to bound L1, and providing a matching lower bound by some elementary argument. So for p = 2, what can we say? I'll write the definition of the distance according to the Lp norm. Starting from some point x, you measure the relative density of P^t(x,·) with respect to π; π is the stationary distribution, P is as usual the transition kernel; and you measure it in Lp(π), maybe written in small font here. So this is the distance in Lp at time t. As usual, the Lp mixing time to within epsilon is the minimum t such that d_p(t) is at most epsilon, where d_p(t), without any x, is the worst case over starting positions. This is the standard definition. If you plug in p = 1, the integration over π cancels the π in the denominator, and you get just the L1 distance, or equivalently twice the total variation distance. So L1 is essentially total variation, the case p = 1, and we're going to use p = 2 to bound the p = 1 case, ideally. So here's the proposition, for every epsilon larger than 0, on G, which is G(n,d).
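The Lp distance and mixing time just defined can be computed by brute force on a small chain. A sketch (the helper names `lp_distance` and `t_mix` are mine; I use a lazy walk on a cycle to avoid periodicity): for p = 1 this recovers twice the total-variation distance, and since the Lp(π) norms increase with p, the mixing times are monotone in p.

```python
import numpy as np

def lp_distance(P, pi, x, t, p):
    """d_p(x,t) = || P^t(x,.)/pi - 1 ||_{L^p(pi)}, for finite p."""
    row = np.linalg.matrix_power(P, t)[x]
    return (pi @ np.abs(row / pi - 1) ** p) ** (1 / p)

def t_mix(P, pi, p, eps):
    """Smallest t with max_x d_p(x,t) <= eps."""
    t = 0
    while max(lp_distance(P, pi, x, t, p) for x in range(len(pi))) > eps:
        t += 1
    return t

# Lazy simple random walk on a 6-cycle.
n = 6
P = np.zeros((n, n))
for i in range(n):
    P[i, i] = 0.5
    P[i, (i + 1) % n] = 0.25
    P[i, (i - 1) % n] = 0.25
pi = np.full(n, 1 / n)
# Mixing times are monotone in p (Jensen):
print(t_mix(P, pi, 1, 0.25) <= t_mix(P, pi, 2, 0.25))  # True
```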
Simple random walk with high probability has, for every epsilon larger than 0, t_mix in L2 of epsilon equal to (1/2 + o(1)) log_{1/ρ} n, where ρ is the magic number from above: the bound 2√(d−1) on the eigenvalues, divided by d, the normalization that makes the adjacency matrix into the transition kernel of simple random walk. And what I mentioned very quickly yesterday is that the lower bound here applies to any d-regular graph. In general, for any d-regular graph and any p between 2 and infinity, t_mix in Lp of epsilon will always be at least ((p−1)/p) log_{1/ρ} n minus some O(log log n). That's the deterministic statement. And the proposition tells us that these random regular graphs, or Ramanujan graphs in general, because this "on G(n,d)" is a bit of a red herring: all the proposition actually uses is the spectrum, so it applies to any Ramanujan graph, or to a weakly Ramanujan graph if you accept less control over the little-o, which we don't care about for now. So Ramanujan graphs are actually the fastest possible for simple random walk in any Lp. Here I wrote p at least 2, going all the way to infinity. And we know, at least from the past two classes, that G(n,d) was also the fastest one for L1; so in fact this is true for any p from 1 to infinity. These are funny objects. OK, the proof: the upper bound is immediate, and the proof of the lower bound is very short. It will be good to get us into gear before we discuss how the actual proof for Ramanujan graphs at p = 1 works, so I'm going to do it very quickly. Here's the proof.
The lower bound essentially exploits the same old connection between the covering tree and G, something we've seen earlier in this course and also in Charles' talks, and it goes as follows. I'm going to do it just for p = 2, for any d-regular graph. For every x in your graph, you want to understand Σ_y P^t(x,y)², and the reason is the following. What is ‖P^t(x,·)/π − 1‖²_{L²(π)}? Let me write it for a general distribution μ: for every probability distribution μ, the squared L²(π) norm of the relative density μ/π is Σ_x π(x)(μ(x)/π(x))² minus 1, because when we open up the square, the cross term again makes the π vanish: we get a −2 plus 1, which is a −1. Fancy, right? When I write μ, replace it by P^t(x,·). So with this simple observation, we see that Σ_y P^t(x,y)² is the quantity we care about, and we also know our π: it's just the uniform distribution. So what is Σ_y P^t(x,y)², where y is now the target vertex of the walk? That is nothing but P^{2t}(x,x). These are the wonders of reversibility: it just measures the probability of returning to the origin in 2t steps. We are in G, still not on the tree. Now we can lower bound it by Q^{2t}(o,o), where Q is the random walk on the tree, using the exact same connection. So it is just a lower bound.
It is a lower bound because many vertices of the tree project to the same vertex of G, so we really don't need to return to the tree's root in order to revisit the same vertex of G; but certainly, if we return to the tree's root, then we do return to the origin in G. So Q is the transition kernel of the walk on the d-regular tree T_d; we used this notation in the course. And what is Q^{2t}(o,o)? (My o may look like a 0, but this is an o.) It's nothing but the probability, starting from 0, that S_{2t} = 0, where S is the appropriate one-dimensional biased random walk reflected at 0. Now we are just forgetting that this is a tree: the distance from the root is a biased walk that goes to the right with probability (d−1)/d and to the left with probability 1/d, reflected at 0 because we don't go negative. So S_t is nothing but the height of X_t. And this is known; I'm pointing out that there is a precise formula because it's nice that these things are known to this level of precision. This is an equality, and asymptotically, for large t, it behaves like ρ^{2t} t^{−3/2} up to constants. What do we care about? The t^{−3/2} and the ρ^{2t}. Going back to why we needed this, it means that ‖P^t(x,·)/π − 1‖²_{L²(π)}, plus 1, is at least of order ρ^{2t} n / t^{3/2}. And the n comes from the ratio with π that we said we'd postpone: here we did the computation without the division by π, and dividing by π = 1/n gives the factor of n.
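The return probability of the walk on the tree, i.e. of the reflected biased walk, is easy to compute exactly by dynamic programming, and one can watch the ρ^{2t} t^{−3/2} decay numerically. A sketch (my own illustration; the helper `return_prob` is mine, and the polynomial correction is only illustrated, not proved, by the printout):

```python
import numpy as np

def return_prob(d, t):
    """P(height of simple random walk on the d-regular tree is 0 at time t),
    via the biased walk on {0,1,2,...}: up w.p. (d-1)/d, down w.p. 1/d,
    forced up from 0."""
    T = 2 * t + 1
    prob = np.zeros(T)
    prob[0] = 1.0
    for _ in range(t):
        new = np.zeros(T)
        new[1] += prob[0]                      # from the root, forced up
        new[:-1] += prob[1:] / d               # step toward the root
        new[2:] += prob[1:-1] * (d - 1) / d    # step away from the root
        prob = new
    return prob[0]

d = 3
rho = 2 * np.sqrt(d - 1) / d
# q_{2t}(o,o) * t^{3/2} / rho^{2t} should stabilize for large t:
for t in (8, 16, 32):
    print(t, return_prob(d, 2 * t) * t ** 1.5 / rho ** (2 * t))
```

For d = 3 and t = 2 one can check by hand: the only returning paths are up-down-up-down (probability 1·(1/3)·1·(1/3) = 1/9) and up-up-down-down (1·(2/3)·(1/3)·(1/3) = 2/27), total 5/27, which the DP reproduces.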
So now we see that in order to actually kill this n and bring our L2(π) distance down to, I don't know, a half, or close to 0, t must be at least (1/2) log_{1/ρ} n; that kills the n. But we also have the t^{3/2} in the denominator, which we can kill with an extra c log log n. So that is the lower bound; it's trivial. For the upper bound (and by the way, one of my purposes in doing this exercise is to remind everyone of the spectral expansion of the L2 distance, which we are going to use in the proof for Ramanujan graphs, so those of you who find this super elementary, bear with me slightly longer), all you need is to remind yourself that we can write the squared L2 distance as a sum over i running from 2 all the way to n of f_i(x)² times (λ_i/d)^{2t}. This is something we wrote on the board last time; it was an equality for this distance. Now, looking at it like this, if you know that all the |λ_i/d| are at most ρ, and our assumption for Ramanujan was exactly that, then you get something that looks like n times ρ^{2t}, which exactly matches the lower bound up to the log log correction. That's why the Ramanujan graphs achieve this bound. OK, one last thing about this formula, the fact that it equals n times... oh, I forgot the n. Well, maybe I'll give that as an exercise; I think I'll give it as an exercise.
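The spectral expansion just recalled is an exact identity, and it is worth verifying once numerically. A sketch (my own check, not from the lecture), on the 3-regular graph K_4, with the eigenfunctions normalized in L²(π); since π is uniform, they are √n times the usual orthonormal eigenvectors:

```python
import numpy as np

# Verify: || P^t(x,.)/pi - 1 ||_{L2(pi)}^2 = sum_{i>=2} f_i(x)^2 (lambda_i)^{2t},
# where lambda_i are the eigenvalues of P = A/d and f_i are L2(pi)-orthonormal.
d, n = 3, 4
A = np.ones((n, n)) - np.eye(n)        # K_4
P = A / d
pi = np.full(n, 1 / n)

lam, phi = np.linalg.eigh(P)           # columns orthonormal in plain l2
order = np.argsort(lam)[::-1]
lam, phi = lam[order], phi[:, order]   # lam[0] = 1 is the trivial eigenvalue
f = np.sqrt(n) * phi                   # orthonormal in L2(pi), pi uniform

x, t = 0, 3
lhs = pi @ (np.linalg.matrix_power(P, t)[x] / pi - 1) ** 2
rhs = sum(f[x, i] ** 2 * lam[i] ** (2 * t) for i in range(1, n))
print(abs(lhs - rhs) < 1e-12)  # True
```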
There is an additional way to see this lower bound, related to what Charles was talking about yesterday. If you don't want to look at the tree and want to immediately obtain a lower bound matching this ρ^{2t} times n, you can argue as follows. How did our upper bound work? It said: all your eigenvalues are at most ρ in magnitude, and ρ^{2t} is enough to kill an n if you raise it to the power t = (1/2) log_{1/ρ} n. That was the upper bound. For the lower bound, without the tree, you can use Serre's theorem, which tells you that not only must you have an eigenvalue close to 2√(d−1); there must be a linear number of such eigenvalues. And if there's a linear number of such eigenvalues, then you really do have to kill an n at that rate. [A question from the audience about the L2 computation: in the expression with μ, shouldn't there be a minus 1, if you want the distance to be close to 0?] Yes: you're looking for the whole expression to be close to 0 after the minus 1; the 1 there is essentially the π term. There's no minus 1 on the second line, but you can add one if you want; you take t to make the term that multiplies n small. That's why I wrote it in a faded font: I want to focus on the term that multiplies n. This is really simple; I'll be happy to walk through it with you in the break.
OK, another question, about where you need to go. When you talk about total variation mixing, you start at distance 1 and need to go to 0. When you talk about L2 mixing, you start at √n, and you need to go where? You could say you need to go to 1, but you could also say 10, and it would be the same, because we know there is cutoff in these situations: there's a theorem by Chen and Saloff-Coste that tells you that Yuval's condition, the product criterion, captures cutoff for any Lp with p larger than 1. However, we don't know where that cutoff is; but if we know there's cutoff, then the choice of constant really doesn't matter, so choose 10 if that makes you happy. So the take-home message from this was: naive spectral analysis does not work. This quantity, our L2 mixing time for G(n,d), we have precisely; we can do the same for any Lp larger than 2, and between 1 and 2 you could also calculate it. But these all miss the bound we are after for L1. What is that bound? I'll remind you: this was the formula we got for G(n,d). If you think that Ramanujan graphs in general should behave like the random regular graph, then this is the bound you're after, and it looks quite different. One of the exercises, a simple calculus exercise, is to show that for any d ≥ 3, the leading-order constant in front of the log n here, in the L2 mixing time, is strictly bigger than the L1 one, as it has to be. So as a method, it breaks down. Let's see what we can do. This is a picture for Ramanujan graphs, a simulation showing simple random walk in L1 as opposed to L2. You see that the blue curve is L1, and it mixes strictly before L2 does.
And this is the picture of the generalization of the proposition I wrote down, for any p. For any p, it turns out that these Ramanujan graphs, and also the random regular graphs, which behave the same way, attain the fastest possible mixing time, and it goes like this. And this last picture is a simulation of the Lp distance on one family of Ramanujan graphs and how it decays. As you can see, you have L1, then much farther L2, and so on; all of them take longer and longer to mix, with L-infinity on the far right. So this is the theorem we want to show: exactly what we had on G(n,d) also applies on a Ramanujan graph. But we also know that spectral analysis, the naive approach of just writing it down, isn't good enough, and we know that our understanding of a general non-bipartite Ramanujan graph is limited. So we cannot do the tricks we did for G(n,d), where, I remind you, the bottom line was that you needed to estimate the size of the cut between every two typical balls of volume √n: you wanted to know that they have the right number of edges between them, essentially Poisson with a fixed mean. If you knew that, you were in business; otherwise, you have nothing. OK, so the solution in this case (let's keep this on the board for a minute) is simply to work with non-backtracking walks instead. In the proof for the random regular graph, we moved to non-backtracking random walks even though we could have carried out the same proof for simple random walk, which we did in the first paper. Here it turns out that the move to non-backtracking random walks is imperative, for the sake of the spectral analysis. So let me remind you that A is the adjacency matrix of G.
And B is the adjacency matrix of the non-backtracking walk, let's just say the non-backtracking operator, which acts on directed edges, while A acts on the vertices. If you recall Charles' talk from yesterday, there is a tight connection between the eigenvalues λ_1, ..., λ_n of simple random walk, or of A in this case, and the eigenvalues θ_i of B; there are dn of them, since dn (d the degree, n the number of vertices) is the number of directed edges. Let me jump for a second to this picture: you see these two circles, two types of eigenvalue distributions. Now the eigenvalues are no longer real, unfortunately, because this matrix is no longer normal, but you can still plot the complex eigenvalues, and they sit over there in the complex plane. It turns out that for Ramanujan graphs they all have modulus exactly √(d−1). Exactly. I mean, there are trivial ones at ±1; let's ignore them, just as simple random walk has the trivial eigenvalue d. For Ramanujan graphs, all the eigenvalues of A other than the one at d sit in the interval between ±2√(d−1), and it turns out that through this relation Charles mentioned, all your complex eigenvalues θ_i get transported to have modulus exactly √(d−1) in the complex plane. So these are your thetas: they sit on the circle, and you have the ±1, and you also have the d−1, which comes from that d. So this is what the picture looks like. Why is that helpful? Let's go back to the theorem above. It's helpful for the following reason: imagine that B were a normal operator, and we'd like to do exactly the same trick as before, which is to write down the L2 distance explicitly.
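The non-backtracking operator is easy to build explicitly, and the circular spectrum can be seen on a small Ramanujan example. A sketch (my own illustration; the helper `nb_operator` is mine), using K_4, where the moduli of the eigenvalues of B are exactly the Perron value d−1 = 2, the trivial value 1, and √(d−1) = √2 for everything else:

```python
import numpy as np
from itertools import product

def nb_operator(A):
    """Non-backtracking operator: B[(x,y),(y,z)] = 1 iff xy, yz are edges
    and z != x; rows/columns indexed by directed edges."""
    n = len(A)
    edges = [(x, y) for x, y in product(range(n), repeat=2) if A[x, y]]
    idx = {e: k for k, e in enumerate(edges)}
    B = np.zeros((len(edges), len(edges)))
    for (x, y) in edges:
        for z in range(n):
            if A[y, z] and z != x:
                B[idx[(x, y)], idx[(y, z)]] = 1
    return B

d, n = 3, 4
A = np.ones((n, n)) - np.eye(n)        # K_4: 3-regular, Ramanujan
B = nb_operator(A)                     # 12 x 12: one row per directed edge
theta = np.linalg.eigvals(B)
# Every modulus is d-1 (Perron), 1 (trivial), or sqrt(d-1) (the circle):
on_circle = all(min(abs(m - r) for r in (1.0, np.sqrt(d - 1), float(d - 1))) < 1e-6
                for m in np.abs(theta))
print(on_circle)  # True
```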
What would we get? Well, we would get something like a sum over i from 2 up to dn, with some contribution from the eigenfunctions, which I'm going to ignore. Technically, you don't need to raise your eyebrows that much: it's actually easy to eliminate the eigenfunctions if, for instance, you care about the average starting position instead of the worst one. I mean, if you have Σ_{i≥2} f_i(x)² λ_i^{2t}, with f_i the eigenfunction evaluated at your starting position x, and instead of this you take (1/n) Σ_x, then for every x you have the mixing time from x. This is a quenched property: we first draw the random environment, the graph; now each point has a mixing time, and probably they are all about the same. The graph isn't transitive; if the graph were fixed and transitive, then mixing from every x would be identical. Otherwise, you can still say what it means to mix from an average starting position. If you do that, you can interchange these two sums, and then what do you get? You get Σ_x f_i(x)², which by Parseval is 1. This becomes a 1, and you are left with exactly what you need. So this is a cheap trick to bypass a lack of understanding of the eigenfunctions. But it turns out that in this problem, even without this, we can get past the eigenfunctions; this is not the main issue; we can somehow bypass the fact that we don't understand them. So I'm going to ignore them. Now, if we ignore them, what are we left with? Something that looks like Σ_i (θ_i/(d−1))^{2t} for the non-backtracking random walk.
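The Parseval averaging trick just described is itself a one-line identity worth checking: averaging the squared L2 distance over the starting point makes the eigenfunctions drop out entirely. A sketch (my own verification, on a lazy cycle rather than a random graph, which changes nothing in the identity):

```python
import numpy as np

# Parseval trick: (1/n) sum_x d_2(x,t)^2 = sum_{i>=2} lambda_i^{2t},
# with no eigenfunctions on the right-hand side.
n, t = 5, 4
P = np.zeros((n, n))
for i in range(n):
    P[i, i] = 0.5
    P[i, (i + 1) % n] = 0.25
    P[i, (i - 1) % n] = 0.25
pi = np.full(n, 1 / n)

Pt = np.linalg.matrix_power(P, t)
avg = np.mean([pi @ (Pt[x] / pi - 1) ** 2 for x in range(n)])

lam = np.sort(np.linalg.eigvalsh(P))[::-1]   # lam[0] = 1 is trivial
print(abs(avg - np.sum(lam[1:] ** (2 * t))) < 1e-12)  # True
```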
And now I'm committing a serious crime here by treating B as a normal operator. That is a huge difference, but let's just see whether this gives the right intuition or not. B was just a 0,1 matrix, the way we defined it: B[(x,y),(y,z)] = 1 when xy and yz are edges and z ≠ x, and 0 otherwise. How many ways do I have to continue a directed edge in a d-regular graph? d−1. So d−1 is the first eigenvalue, the trivial one; that's why this point is here. This is the Perron eigenvalue. Now what does this look like? If you believe me that all the other eigenvalues magically line up on the circle of radius √(d−1), then what we have here is n (or n−1, if you want me to write n−1) times (√(d−1)/(d−1))^{2t}, which is n(d−1)^{−t}, and that allows us to kill this n at time log_{d−1} n. Here n is really dn, the number of directed edges, but it doesn't matter. So we see the gain in moving to the non-backtracking walk, of which the simple random walk is simply a marginal, just the endpoint, for instance by the exact same covering-tree argument. Remember that the argument relating simple random walk to non-backtracking random walk was fine for any graph G; it didn't need to be random. So it always suffices to bound the mixing of the non-backtracking random walk. And I'll remind you that that argument had a conclusion: suppose you can show me that there is cutoff for the non-backtracking random walk at some time l. Cutoff at time l for the non-backtracking walk implied an upper bound, just an upper bound, on mixing of simple random walk at time (d/(d−2))·l, plus a root-l error term. That was the reduction.
So if we know that we can put in log_{d−1}(dn) as an upper bound, then we get our d/(d−2) upper bound for the theorem, which is exactly what we're after. And the lower bound is the same: it was also fine for any deterministic graph. It relied only on how long it takes to actually see all the vertices of any graph of maximum degree d; it doesn't even have to be regular. You can't see all the vertices before that time. [Audience: but here it's the L2 mixing time for the non-backtracking walk, and that's only an upper bound on the L1.] Good question. What you have is that the non-backtracking random walk has the same lower bound, from covering, from seeing all the vertices. So what this calculation (which is of course erroneous, because we pretended B is normal) would say is: here is the L2 mixing point in time, this log_{d−1} n (or of dn, it doesn't really matter) for non-backtracking, and here is the L1 lower bound for non-backtracking. So actually the non-backtracking walk has the same cutoff location for both L1 and L2, as opposed to simple random walk, where they are shifted apart: you move to the non-backtracking random walk, and they glue to one another. For the sake of simple random walk, you just needed the upper bound; but your question raises a point I didn't make, which is that this would also show not only cutoff for simple random walk at the rescaled time, but also cutoff for the non-backtracking random walk at the faster time, with L1 and L2 at the same location. OK, but this operator is not normal; it turns out one can bypass that technicality, and we'll see that in a second. But before we do so, let's just see what one can achieve once you know, for instance, that the L1 mixing time is at time log_{d−1} n.
Once you know, for instance, that the non-backtracking walk has cutoff at this t: so if we do a non-backtracking random walk on a Ramanujan graph, and we believe this argument can be corrected, it should have cutoff at this location. What does that mean? This is exactly the time it takes us to see all the directed edges, and essentially at that time we mix. So that must mean (and this is the worst-case starting position) that almost all vertices are at that distance from us. We know that almost all vertices are at least that far: if a constant fraction were farther, the total variation distance would stay bounded away from 0, because that would be a distinguishing statistic. Here is where we use the fact that L1 really has a physical interpretation, as opposed to an analytical or spectral one: from L1 we can read off distinguishing statistics. So it tells us various things, such as: almost all vertices are at distance log_{d−1} n from the origin. And you can say more things; I won't go into all of them. The diameter one, for instance: you can say the diameter is at most 2 log_{d−1} n, up to lower-order corrections. How do you see that? From one vertex, you see almost all vertices at distance 1 times log_{d−1} n; so from any two vertices, it suffices that each sees more than half at distance log_{d−1} n, and there has to be a path between them of length 2 log_{d−1} n. So this is trivial. And actually, you can say more. For instance, using the fact that this is a non-backtracking walk, I can say: I don't want you to just reach z (or most z's) by a path; I want you to reach z by a non-backtracking path that starts with a specific edge. And that seems like, OK, what's the big difference?
But these Ramanujan graphs have a huge girth; some of the constructions have girth (2/3) log_{d−1} n, for instance. So if I force you to leave through a specific edge, and suppose the shortest path went through one of the other neighbors of w, then in order to actually get to w, because you can't turn around, you have to somehow complete a cycle, and the cycle will cost you a macroscopic amount of time. It's a huge amount. But it turns out that there's such a wealth of paths that you can always get from any direction you want to any direction you want to reach z from. Anyway, there are various other statements; in the paper we list various more complicated corollaries. OK, so the key to the proof. We actually started this proof after seeing Charles' paper from 2015, the one he talked about yesterday. There he analyzes the non-backtracking random walks, in order to show that the graph is weakly Ramanujan, in a way that really seemed close to what we need; we need to do exactly the opposite, use the fact that it's weakly Ramanujan to somehow count the non-backtracking walks. And then it seemed like, well, this thing has to give you the proper control. This slide summarizes what we said, that naive spectral analysis fails, and this picture shows it: you see how L1 and L2 glue on the non-backtracking side and don't for the simple random walk. So for simple random walk, L1 and L2 are different; for non-backtracking, they glue to one another. OK, so two quick things. First, the reason that all the eigenvalues lie on this disk at radius √(d−1): that's an old fact, a result of Hashimoto from 1989, using the Ihara zeta function. Charles also referred to it as Bass's formula; Bass's formula also implies it, and applies to graphs that are not necessarily regular.
So he wrote the Ihara–Bass zeta formula, and it tells you that whenever you have an eigenvalue λ of A, it gives rise to two eigenvalues, θ_1 and θ_2, of B. Now B has many more states, dn as opposed to the 2n I just counted, but all the remaining (d−2)n are trivial eigenvalues: these ±1's with various degrees of freedom, and this is something I'm going to dedicate one of today's exercises to. OK. One little calculation that Charles did yesterday, a little quickly, and I want to do it again because I think it's important to bear in mind: how you can actually see that every λ gives rise to such θ's. So suppose Af = λf, and suppose θ solves this equation, θ² − λθ + (d−1) = 0; it's a root of this quadratic. Then I want to convince you that you can actually write down an eigenfunction of B that matches θ. Also, by the way, look at this formula; let's write it very quickly. The solution of the quadratic is θ = λ/2 ± (1/2)√(λ² − 4(d−1)). Notice that |λ| < 2√(d−1) is exactly when this discriminant becomes negative. Then we have θ = λ/2 ± i√(d−1 − λ²/4), and we end up with a θ and its conjugate θ̄. And in that situation, when |λ| < 2√(d−1), what is the modulus? |θ|² is (λ/2)² plus (d−1 − λ²/4); the (λ/2)² cancels, and we're left with d−1. That's for the square, so the modulus is the root of that.
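The computation of the modulus can be checked mechanically: the product of the two roots of θ² − λθ + (d−1) = 0 is d−1, so whenever they are complex conjugates, each has modulus exactly √(d−1). A sketch (my own check; the helper `thetas` is mine):

```python
import cmath

def thetas(lam, d):
    """The two roots of theta^2 - lam*theta + (d-1) = 0."""
    disc = cmath.sqrt(lam ** 2 - 4 * (d - 1))
    return (lam + disc) / 2, (lam - disc) / 2

d = 3
# For |lam| < 2*sqrt(d-1) the roots are conjugates with |theta|^2 = d-1:
for lam in (0.5, -1.0, 2.0):
    t1, t2 = thetas(lam, d)
    print(round(abs(t1) ** 2, 6), round(abs(t2) ** 2, 6))  # 2.0 2.0
```

Plugging in the trivial eigenvalue λ = d instead gives the two real roots d−1 and 1, matching the Perron point of B.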
So if you believe that the thetas solving this quadratic are all the non-trivial eigenvalues of B, then you immediately see that being Ramanujan implies that all of them lie on this circle of radius root(d-1) in C. Does that mean that B has at most 2n distinct eigenvalues? No — it has at most 2n + 3: at most 2n distinct non-trivial ones, plus the 1, minus 1, and d-1 that you can immediately write down. Actually slightly fewer than that, because d-1 and one of the copies of 1 come from plugging in lambda = d: if you plug lambda = d into the quadratic, you get theta = d-1 and theta = 1. The rest, the n-1 non-trivial eigenvalues of A, give rise to the non-trivial eigenvalues on the circle. So how do you see that this quadratic is the right thing to look at? All that you need to do is define the following. For a directed edge (x,y), set g(x,y) = theta·f(y) - f(x). So I'm defining an eigenfunction explicitly; this is it. Now I'll verify very quickly that this is an eigenfunction. What is (Bg)(x,y)? I'm summing over z such that yz is an edge — a plain undirected edge, it doesn't matter — with z not equal to x; this is where the operator B sends xy, namely to the edges yz with z ≠ x. And to each of those I apply g: g(y,z) = theta·f(z) - f(y). So what is that? Summing over the choices of z, what we have is exactly theta times (Af(y) - f(x)), minus (d-1)·f(y). Because look: f(y) gets counted once for every z, and there are exactly d-1 z's, since the graph is regular — that's the out-degree of this B operator, its Perron eigenvalue. So the (d-1)·f(y) term is just from the regularity of B. And for the other term: if we had counted all the z's, including x, then summing theta·f(z) would be exactly like applying the adjacency operator, giving theta·Af(y).
Because it would say: look at all the neighbors of y and apply f to them. But we left one out, namely x — hence the minus theta·f(x). And now we use the fact that Af = lambda·f, and we have (theta·lambda - (d-1))·f(y) minus theta·f(x); I rearranged a little, pulling the theta·f(x) out. And now I use the solution of the quadratic to say that this is nothing more than theta·g(x,y). This is exactly the point where you plug in that theta·lambda - (d-1) equals theta squared, by the quadratic. So we have theta²·f(y) - theta·f(x), which is exactly theta times the g that we started with. Charles did this very quickly on the board yesterday, and now we see exactly where these eigenvalues come from. The problem that we still need to deal with is that this operator is not normal. So even though we have exactly the right kind of control over the thetas, the operator doesn't let us work with it the way we want to. That was the bit that required some delicate treatment in the paper: we needed to decompose the operator and show that, even though it's not normal, it is unitarily similar to a block-diagonal matrix whose blocks are only 2-by-2. So even though the blocks are not of size 1 — it's not a diagonal matrix — they are just 2-by-2, and they look like this. The theta_2 and theta_2-prime are, in the case of Ramanujan graphs, a value and its complex conjugate, just like the solutions of this quadratic that we saw; otherwise, in the non-Ramanujan case, you have two real values, marked here as theta_2 and theta_2-prime.
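The eigenfunction computation above can be checked numerically. Here is a hedged sketch on a toy example of my choosing (the complete graph K5, which is 4-regular): build the non-backtracking operator B on directed edges, lift an eigenvector f of A to g(x,y) = theta·f(y) - f(x), and verify Bg = theta·g.

```python
import numpy as np

# Toy test graph: the complete graph K5, which is 4-regular
n, d = 5, 4
A = np.ones((n, n)) - np.eye(n)

# Directed edges and the non-backtracking operator B on them:
# B sends (x,y) to every (y,z) with z != x
edges = [(x, y) for x in range(n) for y in range(n) if A[x, y]]
idx = {e: i for i, e in enumerate(edges)}
B = np.zeros((len(edges), len(edges)))
for (x, y) in edges:
    for z in range(n):
        if A[y, z] and z != x:
            B[idx[(x, y)], idx[(y, z)]] = 1

# Take a non-trivial eigenpair (lam, f) of A and a root theta of
# theta^2 - lam*theta + (d-1) = 0
lams, vecs = np.linalg.eigh(A)
lam, f = lams[0], vecs[:, 0]            # lam = -1 for K5, below 2*sqrt(d-1)
theta = (lam + np.sqrt(complex(lam**2 - 4 * (d - 1)))) / 2

# The lifted function g(x,y) = theta*f(y) - f(x) satisfies B g = theta g
g = np.array([theta * f[y] - f[x] for (x, y) in edges])
assert np.allclose(B @ g, theta * g)
```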
But they still come from being the two roots of the same quadratic, coming from the same lambda. So these can have large multiplicities, and that's what made the proof more delicate. Specifically, in the Ramanujan construction that I showed pictures of, eigenvalue multiplicities are as high as n to the 1/3. If all the eigenvalues were distinct, there would be easier ways to work around the fact that you can only nearly diagonalize; they are not, but you can still carry it through. So with this formula, all that remains — there's still the business of the eigenfunctions, which I again want to ignore. The only thing one needs to notice now is the following: take a 2-by-2 matrix with theta and theta-prime on the diagonal, 0 below, and alpha above, just like we have over there, and raise it to the power t. In the usual spectral decomposition you have your lambdas, you raise the matrix to the power t, the lambdas become lambda to the t, and then you square when you apply Parseval. Now we have a 2-by-2 matrix, but it's still easy to see what's going on. On the diagonal we get theta to the t and theta-prime to the t, with the 0 below, and in the corner something I'll call gamma, or gamma_t. Gamma_t is just the well-known sum: alpha times the sum over j of theta to the j times theta-prime to the (t-1-j). That's all. So this gamma_t is at most alpha times t times |theta| to the (t-1). So instead of just sustaining, as above, something like |theta| to the 2t — above we had theta to the t and then we squared it — there's an extra term that comes in from this off-diagonal entry. And this extra term actually becomes dominant, because it has an extra t in it. So what?
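Here is a quick numerical illustration of my own for this 2-by-2 block computation: raising the block to the power t reproduces the off-diagonal sum gamma_t = alpha · Σ_j theta^j theta'^(t-1-j), which is bounded by alpha·t·|theta|^(t-1) when both thetas have modulus root(d-1).

```python
import numpy as np

d, alpha, t = 3, 1.5, 20
lam = 1.0                                        # some |lam| < 2*sqrt(d-1)
theta = lam / 2 + 1j * np.sqrt(d - 1 - lam**2 / 4)   # |theta| = sqrt(d-1)
# 2x2 block: theta and its conjugate on the diagonal, alpha in the corner
M = np.array([[theta, alpha], [0, np.conj(theta)]])

Mt = np.linalg.matrix_power(M, t)
gamma_t = alpha * sum(theta**j * np.conj(theta)**(t - 1 - j) for j in range(t))

# The off-diagonal entry of M^t is exactly gamma_t...
assert np.isclose(Mt[0, 1], gamma_t)
# ...and it is at most alpha * t * |theta|^(t-1): the extra factor of t
assert abs(gamma_t) <= alpha * t * (d - 1) ** ((t - 1) / 2) + 1e-9
```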
So instead of getting a constant times (d-1) to the minus t, which is what we had before, we get an extra t squared: constant times t² times (d-1) to the minus t. The t² comes from the gamma_t squared. So we need to take t to be the same thing plus a log log to kill this t², and that's it. [Question: You get this by taking the trace of the power of the operator?] You could — well, there are the eigenfunctions to get rid of, but yes. OK, so this essentially concludes this proof. Now, in the little time that remains, I want to give you a very rough overview of why we needed yet a third proof of cutoff for G(n,d), because two sounds like enough. OK, this is the calculation that we just did. The point of having the third proof is to treat non-regular random graphs. If the graphs are non-regular, that completely destroys the spectral proof, and it also seriously damages the original one. But non-regular random graphs come with fascinating features, one of which I highlighted in this slide. It turns out that the key to understanding the mixing time on a graph that is, say, not 3-regular, but where half the degrees are 3 and half are 4, is the following. Already from the exploration process that we saw, this BFS working with the configuration model, we saw that we develop the neighborhood of a vertex, and it is essentially like a d-regular tree. In the new situation we can do exactly the same thing, and now the neighborhood of a vertex will no longer be d-regular. Every time, we just sample the degree from one of the remaining half-edges: a uniform half-edge comes along, and it brings all its friends with it. So this gives a Galton-Watson tree.
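The log log correction can be seen with a little arithmetic sketch (my own numbers, not from the lecture): at the naive time t0 = log base (d-1) of n, the error term n·t²·(d-1)^(-t) is still large because of the t² factor, but adding a few multiples of log log n kills it.

```python
import numpy as np

d, n = 3, 10**8
log_b = lambda x: np.log(x) / np.log(d - 1)   # log base d-1

t0 = log_b(n)                    # naive cutoff candidate: n*(d-1)^(-t0) = 1
s = 3 * log_b(np.log(n))         # O(log log n) extra steps

err = lambda t: n * t**2 * (d - 1) ** (-t)    # the L2 error term from gamma_t^2

# At t0 the t^2 factor still dominates; after the log log correction it's gone
assert err(t0) > 1
assert err(t0 + s) < 1
```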
In the case of G(n,p) it would be a Poisson Galton-Watson tree; here it's a Galton-Watson tree with some other offspring distribution Z, which, in the case of, let's say, half degree 3 and half degree 4, is not really half 2 and half 3. It is the size-biased version of this variable: if a vertex has many half-edges, you are more likely to choose it, and it then brings all of its friends as children. This is well known. So, as I was saying, the thing that plays a role is the entropy of a random walk on the Galton-Watson tree. The degree sequence describes, in the obvious way, via this size-biased transition, an offspring distribution that I will call Z. And now we forget about the random graph; we just have Z, and we picture this infinite Galton-Watson tree. And we recall that when we had this cover-map argument and everything was d-regular, what was the cutoff time for, say, the non-backtracking random walk? We wanted to reach level log base (d-1) of n — the level where we would see n vertices — and that marked the onset of mixing; simple random walk was just a time delay of this. Alternatively, I could say this is the time at which the distribution of the non-backtracking random walk becomes uniform — let's look at its entropy. At first you have a point mass. Then it is uniform on d-1 points, then uniform on (d-1)² points, and so on. So the entropy grows at exactly this rate: the walk is just uniform over the size of the layer, which is (d-1) to the k at distance k. So this is the growth rate of the entropy, and you wait for the distribution to become completely uniform; the time you need to wait is exactly that. This is for the non-backtracking random walk, and you could say the same for the simple random walk.
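To make the size-biasing concrete for the half-3, half-4 example, here is a small sketch of my own: a uniform half-edge lands on a vertex with probability proportional to its degree, so the offspring law Z is the size-biased degree minus one — giving P(Z=2) = 3/7 and P(Z=3) = 4/7, not half-half.

```python
import numpy as np

# Degree distribution: half the vertices have degree 3, half degree 4
degrees = np.array([3, 4])
probs = np.array([0.5, 0.5])

# A uniform half-edge hits a vertex with probability proportional to degree;
# that vertex then contributes (degree - 1) children to the exploration
size_biased = degrees * probs / np.dot(degrees, probs)
offspring = degrees - 1

# P(Z = 2) = 3/7, P(Z = 3) = 4/7 -- not 1/2, 1/2
assert np.allclose(size_biased, [3 / 7, 4 / 7])

# Entropy growth rate of the non-backtracking walk on this tree: E[log Z]
h_nbw = np.dot(size_biased, np.log(offspring))
assert np.isclose(h_nbw, (3 / 7) * np.log(2) + (4 / 7) * np.log(3))
```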
It has a delay, but it is essentially the same thing, by exactly this correspondence between non-backtracking random walk and simple random walk: simple random walk, once you know the distance process on the cover tree, is nothing more than a non-backtracking random walk. So it turns out that this entropy is the thing to look at when you no longer have a regular setup, but instead just some offspring variable Z. So I write this h_X on the top left. And suppose, for instance, you were not listening the entire week and all of a sudden you look at this slide: you consider a random walk on a Galton-Watson tree, and you can look at the distribution SRW_t — the distribution of the simple random walk after t steps on this tree — and compute its entropy, just as it was defined in the previous class. The fact that this limit exists and is a non-random constant almost surely requires proof, but let's take a leap of faith here: the fact that these constants are well defined and non-random is known. And I'm asking: what can you say about the relation between these three constants? One of the inequalities is trivial. Actually, I can tell you already: we know the ratio between the top two. But we don't know the other two inequalities in this generality. So for instance — I'll say at the end what we do know — in this generality, if Z can take the value 1 with positive probability, we have a conjecture, but we don't know it. And it's a very basic, seemingly naive question; it's just a growth rate of entropy. [Question: Oh, that's loop-erased?] By loop-erased I just mean: this is a tree, so understanding the loop-erased walk is very easy. What you have on the left is just simple random walk: it goes up and then comes down somewhere else. The only loops on a tree are the trivial ones.
So the loop-erased walk is just the trace of the walk with all the trivial back-and-forth excursions removed, whereas the non-backtracking walk is simply not allowed to go back and forth. So the non-backtracking walk traces out a ray, but a very simple ray: you know exactly that if you currently have k children, you go to each of them with probability 1/k. The loop-erased walk also looks like a ray, but it follows the harmonic measure over the boundary: it moves the way the simple random walk would, and then you erase all the excursions. These excursions actually have an exponentially decaying tail, but the fact that the walk can backtrack makes this object far more complicated than just the non-backtracking random walk. OK, so here's what I was saying. Formally, you see that h_X, this growth rate of entropy, is the key parameter for simple random walk on a graph with a given degree sequence. So you have a random graph with some degree sequence; you define a random variable Z that has the size-biased distribution I mentioned. And now the growth of entropy on the corresponding Galton-Watson tree dictates when mixing should occur, both for simple and for non-backtracking random walk. For the non-backtracking random walk, you can write this h_Y explicitly, because you know exactly how the entropy grows: the probability to visit a given vertex is nothing more than a product. If the numbers of children along the way are Z_1, Z_2, Z_3, and so on, then the probability to visit this vertex is just 1/Z_1 times 1/Z_2 and so on, because at each step you choose exactly one of the children with equal probability. And this is a Galton-Watson tree, so the Z_i's are independent. So if I want to understand how this probability behaves, I take a log.
And now I'm just summing log Z_i's that are i.i.d., and I have a central limit theorem. So the growth rate of this quantity is nothing more than the expectation of log Z; that's the rate at which entropy grows for the non-backtracking random walk. And I should say that Nikolai allowed me to go faster. [Question about leaves: ah, here Z was at least 1.] If the tree can have leaves, then you need to define the non-backtracking walk to go up when it is stuck, maybe, and then continue — whenever you are stuck, you go to your parent and then continue. It should still be true, but then even the proof is more delicate. But you could still say that it stays something like this, and ask what the relation between the corresponding parameters is. OK, anyway, so the expectation of log Z will be this h_Y. Anna Ben-Hamou and Justin Salez had the proof of the second statement using a different argument, and if you open their paper, you'll see the expectation of log Z. They have an argument that also works through this exposing of a tree from the left and a tree from the right, but then things become delicate, and they use an exchangeable-pairs technique. It's very nice; it's going to appear soon, or just appeared, in the Annals of Probability. For the simple random walk case: the fact that the walk can go back — you'd think it wants to go forward, but actually it turns out this was all part of some detour, and then it goes back and chooses another path — and the fact that we do not really understand the support of random walk on a Galton-Watson tree (there is understanding, but it is limited) meant that an entirely new approach had to be drafted, at least in our paper, for simple random walk. We also did non-backtracking, with trees from the left and right. OK, what I'll do is show you one quick thing.
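The law-of-large-numbers step above is easy to simulate; here is a sketch of my own, using the half-3, half-4 size-biased law from before: minus the log of the probability of following a fixed non-backtracking ray to level k is a sum of i.i.d. log Z_i's, so divided by k it concentrates around E[log Z].

```python
import numpy as np

rng = np.random.default_rng(0)
# Size-biased offspring law for half degree-3, half degree-4
vals, p = np.array([2, 3]), np.array([3 / 7, 4 / 7])

k, trials = 2000, 200
# -log P(non-backtracking walk follows a given ray to level k) = sum of log Z_i
sums = np.log(rng.choice(vals, size=(trials, k), p=p)).sum(axis=1)
rate = sums.mean() / k

# The rate concentrates around E[log Z] = (3/7) log 2 + (4/7) log 3
assert abs(rate - np.dot(p, np.log(vals))) < 0.01
```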
Because we are running out of time: the picture on the top shows you the distribution of simple random walk on, say, five or six levels of a Galton-Watson tree whose offspring distribution is half-half, either one or three. The fantastic idea of plotting the histogram above the tree is Nicolas's; it appeared in a paper of Nicolas and Jean-François. [I stole the idea from the book. — Oh, such a good idea. It comes from the Bible.] OK. And this picture is to show you this quantity d — the fact that W_t over t converges almost surely to this quantity, where W_t is this log. This is an equivalent formulation of the statement in terms of entropy, and it goes back to the famous '95 paper of Russ Lyons, Robin Pemantle and Yuval Peres. And we, for instance, in this paper, had to get a quantitative rate of convergence: replacing this arrow of almost-sure convergence, we needed to show that the variance has order t. So not just that the expectation is d·t and that you converge almost surely, but that you have a variance of order t. OK, so I want to finish with two things. Let me hide this and put this picture here. I want to divide the next five minutes into two parts. In the first, I will tell you what it is that we actually did in order to handle this case, and why and how — and that will be very easy, because I'm going to give such a high-level sketch that it will only take a minute. The second is to understand the phenomenon captured by these pictures, which is kind of a take-home message about the regular as opposed to the non-regular world. The idea: we already discussed this exposing of a tree from the left and a tree from the right.
The idea here is to say that when you run a simple random walk on a Galton-Watson tree, this dimension mentioned on the previous slide, which captures the rate at which entropy grows, tells you the following. The boundary of the tree grows like the branching number: level k has about (E Z) to the k vertices. But what does this entropy growth rate actually mean? We phrase it in terms of the loop-erased random walk — simple random walk with all the trivial loops erased — looked at when it reaches distance k. It means that the simple random walk at distance k is confined to an exponentially small segment of level k: instead of growing like (E Z) to the k, its support grows like d to the k, with d strictly less than E Z. So it still grows exponentially, but with a smaller base, which makes it exponentially small in terms of that level. So what does that mean, and how do you make a proof out of it that lets you understand when mixing occurs? First of all, it accounts for this picture here. Because how did we discuss what the diameter, or the typical distance in the graph, was? We said: expose the tree until you see all the vertices. That was one of the exercises — compute the typical distance between vertices, and then also do it with Exp(1) edge weights. Essentially you do a breadth-first search and wait until this tree grows to where you see n vertices as the leaves; that gives you the typical distance in the random graph. Now, since E Z is bigger than the growth rate of the segment that the simple random walk is confined to, we can think of it like this: the tree grows very fast, but the simple random walk has its own little tree, growing at a slower rate. So this vertex x is going to see all the vertices in the graph, let's say here — that's where you have n vertices.
And this dictates the typical distance. So when Durrett had his picture of the distance between the origin and the walk, the picture would still look like that, exactly as in Durrett's paper with Nathanaël Berestycki. And you could say: well, as soon as the distance stabilizes, that's when I have mixing. This is a very natural marker; it's what we started the first class with, and it's what started this conjecture of Durrett's from '07. If you look just at this marginal of the distance, you'll see that it stabilizes exactly at log base E[Z] of n, and that part is easy. However, at that point, the simple random walk is still confined to exponentially fewer vertices — n to some power strictly less than 1. And we have a formula for that power, but we don't really know how to evaluate it: I can't tell you immediately what it is even for a distribution on two points; you have to run some numerical scheme to approximate it. Anyway, the walk will be confined here, and in order for the entropy to actually reach log n, so that you are uniform over all the vertices, you have to wait longer, until this support becomes big enough. And that is why, in the non-regular case, the blue curve — the mixing-time one — starts at 1 and has cutoff, but macroscopically later, whereas in the regular case both happen at exactly the same time. OK. So to understand this one-minute overview, all you need to think about is this: in order to do that, we want to explore the tree, just as we did before, but the proof says to explore only the parts of the graph that the walk is likely to visit. Already while exposing the tree, you can see: ah, this part of the tree already looks like the walk is not going to visit there, so I'll stop the exploration here.
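The gap between the two times can be illustrated with a small numerical sketch of my own, using the non-backtracking entropy rate E[log Z] as a stand-in for the entropic rate: by Jensen's inequality E[log Z] < log E[Z] strictly whenever Z is non-constant, so the entropic time log n / E[log Z] is macroscopically later than the distance-stabilization time log n / log E[Z].

```python
import numpy as np

# Size-biased offspring law for the half-3 / half-4 example
vals, p = np.array([2, 3]), np.array([3 / 7, 4 / 7])

growth = np.log(np.dot(p, vals))   # log E[Z]: growth rate of the tree
h_nbw = np.dot(p, np.log(vals))    # E[log Z]: entropy rate of the NBW

# Jensen: strict inequality since Z is non-constant
assert h_nbw < growth

n = 10**6
dist_time = np.log(n) / growth     # when the typical graph distance stabilizes
mix_time = np.log(n) / h_nbw       # entropic time: onset of mixing
# Mixing occurs macroscopically (by a constant factor) later
assert mix_time > 1.01 * dist_time
```

In the regular case Z is constant, Jensen is an equality, and the two times coincide — which is exactly the contrast between the two pictures.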
Because if you just explore by BFS and wait until you mix, you'll run out of vertices and the tree approximation is going to die; it's very bad. We keep wanting to have lots of spare vertices so that the tree approximation stays accurate. So you are careful to explore only the really heavy part of the distribution, and you can do that, exploring as you go along. Then, as soon as you know that towards the end your entropy is large enough, you know that the walk is spread over a large subset of the vertices. Maybe the walk escaped this tree — that's a bad event — but if it stayed confined to the popular part of where it should be, then most vertices there don't carry a large weight. And there's an endgame argument that uses the spectral gap, which we won't get to; that's what I meant by a hybrid. We have a combinatorial part that understands the distribution of the walk on the graph and bounds the maximum weight of a vertex. And then we say: OK, at this point, we punt. And the punt is to use the spectrum, to say that this is an expander: if the maximum weight of a vertex is, say, at most n^epsilon / n — so we are off by a factor of n^epsilon from the stationary distribution — then the time it takes to kill this via the spectral gap is the log of that factor, which is like epsilon·log n, for an epsilon that is little-o of 1. And then we are more careful, to get the right window: the factor is actually e to the root log n, and when you take a log, you get the root log n window that you want. But these are details. OK, so I think with that, I've said everything I wanted to say. I'll just show you the answers to these inequalities that I said we don't fully know. So in this paper, which we posted on Monday or on Saturday of this week, we know the left inequality is true if Z is at least 2. If you allow some probability for Z = 1, it should also be true.
We don't know it. And we don't know the right inequality at all, but it should also be true. Here nu is the speed of the random walk, and this is the Hausdorff dimension — the scary formula from the previous slide. So I think I'm done. In the exercise session, I'll start by solving the distance problem with Exp(1) edge weights, and then I'll give you handouts of problems to solve outside, like Charles' idea — it was very nice — and I'll walk around among you. So we'll start with 10 minutes or so going through that one exercise, and then we'll do the handouts outside. Thank you.