Good morning. I'll try to do as I'm told and not accelerate the pace, and you can help me by asking questions whenever something seems unclear. I'll start with the last thing that we finished with yesterday. To remind you, we're trying to understand simple random walk on a random d-regular graph, and the first step was to reduce this to the non-backtracking random walk by a series of simple observations. This is what we ended with, and it still applies to any G; we do not yet use the fact that G is a random regular graph. For any d-regular graph on n vertices we just used the cover map. It was a simple argument: look at the time L at which the non-backtracking random walk is already within distance epsilon of its stationary distribution. Recall that the non-backtracking random walk is a Markov chain on edges, so its stationary distribution is uniform on the d·n directed edges — a directed edge records where you are and where you came from, so there are d·n states. If you choose this L, and pi is the usual uniform distribution on the vertices, then the total variation distance of the simple random walk from stationarity is bounded as follows: it is at most the probability that the walk on the cover tree — the analogue of the random walk, but on the infinite d-regular tree started at the root o — did not make it to height L by time t, plus epsilon, the mixing cost of the corresponding non-backtracking random walk on G. Okay?
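To make the chain concrete, here is a minimal sketch of one step of this walk in Python; the adjacency-list setup and the function name are mine, and I assume a simple d-regular graph with d at least 3, so there are always d − 1 allowed moves.

```python
import random

def nbrw_step(adj, edge, rng):
    """One step of the non-backtracking random walk: from the directed
    edge (u, v), move to (v, w) with w uniform among the neighbours of v
    other than u.  Assumes a simple d-regular graph with d >= 3."""
    u, v = edge
    return (v, rng.choice([w for w in adj[v] if w != u]))

# toy example (my choice): K4 is 3-regular
adj = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}
rng = random.Random(2)
trace = [(0, 1)]  # start at the directed edge (0, 1)
for _ in range(50):
    trace.append(nbrw_step(adj, trace[-1], rng))
```

By construction each new state extends the previous one (its source is the old target) and never reverses the edge it just traversed.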
And you're the one who asked me at the end of class; I finished with this formula, and it was actually missing this little o(1) — thank you for the comment. The o(1) comes from the CLT. So now we use the fact that we chose t, on that display, to be d/(d − 2) times L — this accounts for the speed of simple random walk as opposed to non-backtracking random walk, the speed on the tree — plus s√L, which accounted for replacing the probability of reaching that height by this time, and so on. Okay? So this is just an estimate for simple random walk on a d-regular tree, which by symmetry is nothing more than a one-dimensional biased random walk. That's what we ended with. What does that mean? It means that if we have a corresponding lower bound that gives essentially the same value, this will complete our reduction to the non-backtracking random walk, and here is the lower bound, which I'm going to give as an exercise; it is very simple. Exercise: for every epsilon and s > 0, the liminf of the total variation distance at time t − s√(log_{d−1} n) is at least 1 − epsilon minus the corresponding probability for the tree walk. (I wrote liminf, but we won't need anything finer.) So this exercise tells you that what we have here is essentially best possible. Again, you will not need to use the fact that this is a random graph, and to prove it, just think about the time it takes to escape a ball of a given size. Here L is again log_{d−1}(εn/d), and c_d is the same constant we had before. So what is the conclusion from these two parts?
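The one-dimensional reduction is easy to see numerically: by symmetry, the distance from the root of simple random walk on the infinite d-regular tree is a biased walk on the nonnegative integers, with speed (d − 2)/d — which is where the d/(d − 2) rescaling of L comes from. A minimal sketch (the function name and parameters are mine):

```python
import random

def tree_distance_walk(d, steps, rng):
    """Distance from the root for simple random walk on the infinite
    d-regular tree: a biased walk on {0, 1, 2, ...} stepping +1 with
    probability (d-1)/d and -1 with probability 1/d, reflecting at 0."""
    dist = 0
    for _ in range(steps):
        if dist == 0 or rng.random() < (d - 1) / d:
            dist += 1
        else:
            dist -= 1
    return dist

rng = random.Random(0)
d, steps, trials = 3, 2000, 300
avg = sum(tree_distance_walk(d, steps, rng) for _ in range(trials)) / trials
speed = avg / steps  # should concentrate near (d - 2) / d = 1/3 for d = 3
```

The CLT fluctuations of this walk around its mean, of order √L, are exactly the s√L window in the formula above.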
The conclusion is that if we have cutoff for the non-backtracking random walk at exactly log_{d−1} n plus something of lower order than √(log n) — suppose this happens; this now depends on your sequence of graphs — then we are done. So far, everything we said gives a way to obtain an upper bound on the simple random walk: if you choose L appropriately, you get essentially the same bound for the rescaled time, rescaled just by the speed of simple random walk compared to non-backtracking random walk. And this is a universal lower bound. A question from the audience: for the upper bound, do you use the fact that the graphs have very few short cycles? No. All you're saying is that you're coupling exactly. In a sense, if the graph has many cycles, then the non-backtracking random walk on G is going to take longer to actually mix, because you'll get stuck in those cycles. All we're saying here is that if the non-backtracking walk on G itself mixes by time L, then rescale by this speed factor and you move to simple random walk, as an upper bound. So the conclusion is that if you manage to do this — you happen to have cutoff at log_{d−1} n plus some lower-order term — then we'll be able to multiply this by d/(d − 2) and add something that overtakes the smaller-order term. And this something that we add, the s√L, the normal fluctuations as you walk up and down the tree, is really there, and we have a matching lower bound for it. So the result will be an understanding of the exact distance from stationarity at time (d/(d − 2)) log_{d−1} n plus a constant times √L. Okay?
This is the theorem that we mentioned and stated last time, so this will imply the main result. The question is how we do this, and this is where the first proof kicks in. So here's a sketch of proof 1. Basically, what we'd want to do is the following. We have our G, and to remind you, we want to show cutoff for the non-backtracking random walk at this time. We actually only care about the upper bound, because the lower bound is again trivial — trivial for the same reason as the lower bound before. Take a time that is log_{d−1} n minus something, and think about how many possible states you can see: look at the range of the walk started at some directed edge e, and you'll see that you have not had enough time to see a linear number of edges. So if you multiply the time by 1 − epsilon — and similarly, if you just subtract a large enough constant — you will be left with a smaller and smaller fraction of the states that you can visit. So the lower bound is, as usual, trivial. For the upper bound, what we'd like to say is essentially that when you run a non-backtracking random walk from an edge e, then at some designated time t, which is this time, you will be at every target directed edge f with probability about 1/(dn), which is the stationary distribution, the uniform one. The proof is going to be really elementary, and the way we're going to do it is to look at the neighborhood of e. So this is e; it is some directed edge, some (x, y). It goes like this, then like this, and like this.
What we're going to do is count the number of paths that go from e all the way until they hit some target edge f. I'm going to draw it in a funny way, and the way I drew it actually indicates the proof idea. From a very high-level perspective, we're going to expose the neighborhood of e. Remember that the result we aim for is a quenched result: if you take a random graph G, then with probability tending to one as n tends to infinity, this graph is such that the mixing time of the non-backtracking random walk is at most this plus some lower-order term. That is the result we aim for — a quenched result. But in order to get it, we'll take the directed edge e and walk in the annealed setting, where we expose the graph as we go along, and we'll argue that the structure of the graph is good enough for every source edge e with high probability — let's say, except with probability at most n^{−1.1}. Now we only have d·n, a linear number, of source edges, so a union bound says that all the source edges are good enough. And good enough means that if you do this walk from e, the target of the walk will be essentially uniformly distributed over the directed edges. Does this sound understandable? Okay. So this is the approach.
Let's get to it a little more concretely. How many people here have worked with the configuration model? Pretty good. (What about Nicola — will he watch if I speed up? Oh, he's there. Okay, so I can accelerate just a little bit; I got permission.) We're going to expose the neighborhood of e according to the configuration model, which means we are matching every half-edge (u, i), with u in V and i an index from 1 to d — every vertex has d half-edges — to a uniform other half-edge (v, j). We just put a uniform perfect matching on all these half-edges and then collapse everything into vertices. You can work in the model where this is a multigraph, if you want to prove something about the multigraph, or you can work in the model where you condition on the graph being simple, which is the uniform distribution over simple graphs; both are handled by exactly the same method, because you can always expose the matching as you go along, and this is rejection sampling. We could say that the first time we see an edge that creates a double edge or a self-loop, we abort, and otherwise we go on — which is another way of saying you can expose these matchings as you go along and, at each step, either assume they do not create multiple edges or self-loops, or work in the multigraph setting. It's up to you. So this is the configuration model, and here's a note that is going to be useful later on: we have d·n half-edges. It's a simple, trivial fact, but I want to write it down so that you notice it later.
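Sampling from the configuration model is exactly a uniform perfect matching of the d·n half-edges; a minimal sketch (the helper name is mine, and I return the multigraph version — rejection sampling until the result is simple would give the uniform simple d-regular graph):

```python
import random

def configuration_model(n, d, rng):
    """Sample a d-regular multigraph on n vertices: pair the d*n
    half-edges (v, i) by a uniformly random perfect matching.
    The result may contain self-loops and multiple edges."""
    half_edges = [(v, i) for v in range(n) for i in range(d)]
    rng.shuffle(half_edges)  # consecutive pairs of a uniform shuffle
                             # form a uniform perfect matching
    return [(half_edges[2 * k][0], half_edges[2 * k + 1][0])
            for k in range(n * d // 2)]

rng = random.Random(1)
edges = configuration_model(100, 3, rng)  # needs n*d even
```

Collapsing the matched half-edges into vertex pairs is what the last line does; every vertex ends up with degree exactly d (self-loops counted twice).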
The number of edges is dn/2, but when we match, we match to a half-edge, and it so happens that the number of half-edges is exactly the number of directed edges, which I mark by E with an arrow — and that's the state space of our non-backtracking random walk. So if I match to a uniform half-edge and write a 1/(dn), that is going to be exactly the 1/(dn) I'm shooting for: the uniform distribution over directed edges, which is the stationary distribution of our non-backtracking random walk on G. Okay. So y_t is, as before, the non-backtracking random walk on G, and we start it at e: we choose some vertex x and some half-edge, which I'll mark like this. I'm going to identify directed edges with half-edges, either the outgoing one or the incoming one; I'm trusting you to understand this, but you can ask me if it's ambiguous. And we start at the fixed e. Here's one final piece of notation. If you have a graph H, the tree excess of H is the number of edges minus the number of vertices plus the number of connected components, which is the minimum number of edges you need to delete from H in order to make it into a tree. So if you have just one cycle, it's one edge, and so on. Now, here's a little lemma; it is very simple. Lemma 1, which I'll write in two parts. This is going to show the roughly half of you who have not worked with the configuration model how one proves these things, and then we can do the rest at a quicker speed, if I'm allowed. So let's choose r to be anything up to (1/5) log_{d−1} n.
Actually, in the first few applications of this lemma we'll only need to think of r as being about log log n — except at one point, where we'll want it to be logarithmic — but this will work for anything in that range. Then with high probability, the following holds. First, consider the tree excess of the ball of radius r around a directed edge e. What do I mean by this? We have our edge e, and we look at all the possible edges you can get to by doing a non-backtracking random walk of this length. This defines a graph; if you forget the directions of the edges, you get an undirected graph, and you can ask what its tree excess is, without worrying about directed versus undirected cycles. It will just be undirected, and you're asking how many edges actually went backward, so to speak — it will actually be sideways when you do a BFS: non-tree edges, edges that did not go to a new vertex. This is captured by tx. So part (a): this tree excess is at most one, for every directed edge, with high probability. And part (b): the number of directed edges e for which tx(B_r(e)) is not zero — by which I mean positive — is at most √n. So this is what we're going to show. Is this clear? The proof is going to be very simple, and I'm writing it down just for the benefit of those of you who do not have prior experience working with the configuration model. The way you do it is just a breadth-first search. We have this edge e, and we're going to expose the matches of these half-edges. First of all, e is just one half-edge; it's matched to some (v, j).
And that (v, j) is associated with all of (v, 1) through (v, d) except (v, j) itself — so d − 1 further half-edges. If I know that this is e, say (x, 1), and it goes to some v, then I have d − 1 directed edges, everyone except the one I just matched, going forward. And I'm now going to reveal the identities of the matches of these. Again, each one is a uniform sample over all the remaining half-edges that are still alive. Then I have a bigger set here, and I'm going to go one by one and reveal the matches of this level, and so on. So this I'll call, let's say, the ball of radius one; this is the boundary of the ball of radius one; then I have the boundary of the ball of radius two, and so on. So when you are doing this, what could possibly go wrong? Every time I'm finding the matching half-edge for my current half-edge — this half-edge is some (z, 3), let's say — if it involves a new vertex z that I've not seen before, then I'm kind of happy: this z comes associated with d − 1 new half-edges and the tree structure carries on. But what happens if I've already witnessed this z before? This can happen if, for instance, when I find the match of this guy, it suddenly matches to one of the half-edges that I have already seen, one that is part of the level that is still alive. That's the only thing that could go wrong: I either go to a new vertex, or I go to the level that I'm currently exposing.
And this would mean that the half-edge I'm currently matching and the one it matched to are problematic. So this is the bad event, and I want to claim that it creates one such non-tree edge: if I eliminate this edge, I'm back to the tree structure. Now, we could certainly have such situations. For any fixed length, the number of cycles of that length in this model is going to be Poisson with an appropriate fixed parameter that depends on d and that length. So there are going to be short cycles around various directed edges — there are going to be triangles and so on — and these are catastrophic for trying to prove cutoff unless you stay far away from them, as we'll see later on, because everything here depends on the constants: if you start right next to one of these short cycles, it means that your balls grow with the wrong constant. That early-on effect of starting next to a cycle — okay, we'll see this later on — is why we're doing this lemma. But the point is, I want to say that the number of such bad edges, even though we can't avoid them, is going to be very small: namely, there will not be two of them within distance even up to (1/5) log_{d−1} n, which is a huge distance — the maximal distance in the graph is about log_{d−1} n. So how do we see this? Here's a simple observation capturing what we just said. The number of non-tree edges — those which we count with this quantity tx — that we reveal when exposing the k-th level of the BFS is stochastically dominated by a binomial random variable. So every time that you expose a half-edge, what are your chances of being mistaken?
Well, there are at most boundary-many dangerous half-edges that you don't want to touch — the size of the current boundary, as opposed to all the remaining half-edges. And how many chances do you have to mess it up? Again, that many half-edges are being matched. So this is stochastically dominated by a binomial whose number of trials is the boundary size, and whose success probability is the same number divided by the total number of half-edges. Now, there are slightly fewer half-edges remaining, because you already took some away; but how many did we take away? We grow at most at speed d − 1, and even if everything is perfect, (d − 1)^r is still sublinear — what we omitted is of order n^{1/5}, a smaller order. So it's stochastically dominated by this, for the k-th level, which means that the entire quantity tx(B_r) is stochastically dominated by a binomial whose number of trials is (d − 1)^{r+1} — the k-th boundary is at most (d − 1)^k, as we just said, and we sum this over k from 1 to r; this is an upper bound — and for the probability I'll take the largest one, we don't really care: (d − 1)^r divided by dn, with my little (1 − o(1)) correction here in a small font. And that's about it. So what is the probability that this thing is at least 2?
Up to constants, it's just the number of trials choose 2, times the probability squared: ((d − 1)^r / dn)^2. If you do the calculation, this is (d − 1)^{4r} over n², which is at most n^{4/5} divided by n², so we get n^{−6/5}. This is o(1/n), so we beat a union bound over the directed edges and conclude that the tree excess is at most 1 everywhere. And similarly, if you just want to know how many edges have tx(B_r(e)) positive: this is at most dn — a union bound over these edges — times the product of the two factors, which up to constants is (d − 1)^{2r}, at most n^{2/5}, which is less than √n. So the expectation is n^{2/5}, and we said "with high probability," so we increased it a little bit and used Markov. Okay, so this was a complete proof, obviously. Yes — a question: yesterday's exercise, asking to show something with high probability — should it also be done with the configuration model? Yes. And actually, you know, there's this famous name for it in the Israeli army: when people want to go somewhere and someone asks whether to carry a kit bag with them, and the answer is "sure, why not" — it's called a kit bag question. But as long as you ask about the exercise: this reminds me that I wanted to add an item to it. So this was the exercise.
We wanted to show that the typical distance between u and v in G(n, d) — typical meaning u and v are fixed in advance — is log_{d−1} n plus O_P(1) with high probability. I asked for something weaker than this, but this is what's true, and the proof is not more difficult; it's exactly the same proof. In any case, what you can also do — and this is a cute extra little item — is the following: instead of thinking about G(n, d) with its usual metric, put i.i.d. exponential(1) weights on the edges of G(n, d). So now you have a random metric — again the shortest-path metric, no negative weights — and I want to ask what the distance between two fixed points is now. As a hint, the proof also proceeds along similar lines. Actually, both proofs come from — well, I just erased it, but the two triangles that were drawn here, developing the neighborhood of e and developing the neighborhood of f — this is the way to prove both. They are very similar; only here there's a little tweak because of the exponentials, which makes it cute. So this is for Friday; I would have gotten to it sooner or later today. So Lemma 1a, which was this — let's pull it up — is done, and Lemma 1b, which I'll try to put here, is similar; it's not much more difficult. And it shows us why it really isn't a restriction to assume — I mean, there are these edges that have tree excess one.
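Before moving on, the statistic in Lemma 1a is easy to check numerically. A minimal sketch — the sampler and function names are mine, and for simplicity I take the ball around a vertex rather than around a directed edge, which does not change the phenomenon:

```python
import random
from collections import deque

def ball_tree_excess(adj, edges, v0, r):
    """Tree excess (#edges - #vertices + 1) of the ball of radius r
    around v0: BFS to depth r, then count the edges of the ball, i.e.
    those with at least one endpoint strictly inside radius r."""
    dist = {v0: 0}
    q = deque([v0])
    while q:
        u = q.popleft()
        if dist[u] < r:
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    q.append(w)
    ne = sum(1 for u, w in edges
             if u in dist and w in dist and min(dist[u], dist[w]) < r)
    return ne - len(dist) + 1  # the ball is connected by construction

# configuration-model multigraph via a uniform matching of half-edges
rng = random.Random(3)
n, d, r = 2000, 3, 3
hs = [v for v in range(n) for _ in range(d)]
rng.shuffle(hs)
edges = [(hs[2 * k], hs[2 * k + 1]) for k in range(n * d // 2)]
adj = {v: [] for v in range(n)}
for u, w in edges:
    adj[u].append(w)
    adj[w].append(u)

# fraction of sampled centers whose radius-r ball has tree excess <= 1
frac_treelike = sum(ball_tree_excess(adj, edges, v, r) <= 1
                    for v in range(0, n, 10)) / (n // 10)
```

With r of this order, essentially every ball should have excess 0 or 1, matching part (a) of the lemma.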
And I mentioned very quickly in passing that a triangle very close to us is going to be very disturbing, so ideally we'd want to start our random walk from an edge that has none of these close by. Lemma 1b says that we pay essentially nothing in order to get to such an edge. So take R and R′ positive, and let me call the graph H now and not G — this is a deterministic statement. H is any d-regular graph, d ≥ 3 (but I'll stop saying that), and e is a directed edge of H such that tx(B_{R+R′}(e)) is at most one. Then the probability, starting from e, that tx(B_{R′}(y_R)) is positive is at most something that decays exponentially in R. That took a long time to write; let's pull it apart. What are we saying here? Something very simple that just takes a lot of notation. If you start in a graph H at e, and you know that up to distance R + R′ — if you stretch out that far — there is at most one edge that fails to follow the nice tree structure, an excess edge, when you look at this as an undirected graph. Forget the fact that as you expose the edges you think of them as directed; they are really undirected edges — these are just half-edges. The walk has states that remember where it came from, but the edges are undirected. So look at this undirected graph and count how many edges it has. It is connected by definition, because we are doing a BFS.
So if it has k vertices, then it has k − 1 edges if it's a tree, and we are assuming that it has at most k edges; this is the deterministic assumption. In that scenario, we're saying: start from e, do a non-backtracking random walk of length R, and now look at the ball of radius R′ around you — you are still inside that R + R′ ball — then you will see a perfect tree around that point. That's what we're saying, except with probability exponentially small in R. (Yes — as the front row points out, that is also what the statement says. Thank you.) So the proof is again very simple. First, if it happens that tx(B_{R+R′}(e)) equals zero, then we are done: the whole thing was a tree to begin with, and then you are confined to a tree, so you are certainly fine. The other case is that it equals one, so we have just one non-tree edge to worry about. What does that look like?
So here's a cycle, call it C; it has some length, I don't know what the length is. This cycle is induced by the excess edge — I'll take the smallest cycle that this edge creates. So I have such a cycle, this is this case, and on this cycle I am hanging trees: remember that the graph is d-regular, so on every vertex of C I hang d − 2 such trees, and the trees are perfect trees. That is the structure of this ball around e. Now, as I'm doing the random walk, I define tau to be the minimum t such that the distance — in vertices, say — between C and the walk (remember y_t = (y_t^1, y_t^2) is a directed edge with a source and a target) is bigger than it was in the previous step. So: when do I step away from C for the first time? This could be, for instance, that I'm on one of the hanging trees and I just went down instead of up, or that I'm on C and stepped away from it for the first time. The simple observation is that y_tau, y_{tau+1}, all the way to y_R, all satisfy tx(B_{R′}(y_t)) = 0 — all of these later points have the property we are interested in, which is that the ball of radius R′ around them is a tree. And why is that? Remember tau is the first time that we made a single step away from C. The point is that as soon as we step away from C, we can't go back — this is a non-backtracking random walk — so we keep increasing our distance from C from the first time that we actually took one step away from it:
deterministically, the distance is going to increase by one in every subsequent step, and as we know we are walking on trees here, so we get the required properties. (A question: you mean that the ball of radius R′ does not touch the cycle? Yes — this picture is the entire R + R′ neighborhood; this is everything you could possibly see if you take R steps and then take a ball around you, so if the cycle is not within your R′ neighborhood, you have no chance of having a non-tree edge. Is it because the excess is exactly one? Yes — in this case the excess is exactly one.) Okay, so now it's just a question of the probability that tau actually happens by time R, because we just want to evaluate at t = R, and if tau is at most R we are very happy. Is this clear? The walk does R steps, and we want one of these steps to move away from C; then we'll know that we are good. So what's this probability? Let's look at time t and at the second coordinate of the walk — and this we'll do quickly.
First of all, if it is not on the cycle — completely off the cycle, on one of the hanging trees — then the distance from C increases with probability (d − 2)/(d − 1) and decreases with probability 1/(d − 1): this is a non-backtracking walk, we have d − 1 options, and we can go away from C with the first probability or toward it with the second. Then, if it is on the cycle itself, it can be the case that the first coordinate is also on the cycle, so we look like this — this is y_t^1, y_t^2, with these trees hanging off. In this situation our distance is zero, and the probability that we stay on the cycle — the probability that y_{t+1}^2 is on C, given y_t — is exactly 1/(d − 1): we have just one edge continuing along C. (This picture shows one hanging tree because it's the d = 3 picture, but in general there are d − 2 trees hanging at each point.) So there are d − 2 ways to come down and just one way to stay on the cycle, giving probability 1/(d − 1) to stay. Finally, if we are on C but the source is not on C — we are visiting C for the first time — this can happen at most once, because after that we either stay on C, or tau occurs the first time we leave it. So, at most once, at the first visit to C, we have two choices of walking along C and remaining at distance 0, so probability 2/(d − 1) to stay on C, and the rest take us away. So altogether, what does
that mean? It means that the probability that tau is bigger than r is at most, so every time I pay, so, so this is, this is, this event, okay, I'm trying to, to not do this step, okay, this would increase my distance and then tau would happen, so I have to follow this guy, so this is 1 over d minus 1, here again I have to follow this guy, it's 1 over d minus 1 and here it's 2 over d minus 1, so every time I pay, I pay this, this 1 over d minus 1, okay, and once at most I pay a 2 over d minus 1, so it's 2 d minus 1 to the minus r and that's what we wanted to prove, this is what we wrote, yes, okay, so I think by now we are happy because what did this tell us? It told us that we can, so this lemma 1b was a deterministic statement but remember that our g is such that every e can be taken to, to have this b of a, of let's say up to one-fifth log n is at most 1, okay, so we can sacrifice some small capital R, we'll make it log log, and we'll know that the probability that at the end of these log log n steps that will not have a radius of what remains, which is about one-fifth log to be a perfect tree, this probability is little or one, okay, and any little of one would do, we don't really care which probability it is, okay, but we'll need to choose log log, the actual reason to choose a window of log log comes in a little later, any questions? Okay, so now comes the last lemma and we'll see why we are doing this, so this is lemma 2 and lemma 2, I'm going to give you as an exercise to complete the details because it is very similar to lemma 1a, but I'm going to state it clearly and I'm going to just draw a picture to say how this works, so let's define the following, I'm looking at l, which is slightly more than a half log base d minus 1 of n, why is that crucial? 
So, for two reasons. First of all, most of you will feel that understanding the random graph is fairly easy up to a distance at which the ball has about root n vertices: before that — this is the birthday paradox — you will not really see the difference between the graph and a tree, but as soon as the ball becomes larger than root n, things become messier and messier. Here we go beyond root n: we take n to the 4/7, and for the proof any exponent larger than one half would work; we would just tweak the parameters accordingly. Was that a question? Okay, so we choose l like this, we choose r to be log log n, and we let S_r be the set of directed edges e such that the ball of radius r around e is a tree. Two statements then hold with high probability. (1) For every e in S_r and every k at most l, the size of level k of this ball satisfies |∂B_k(e)| = (1 - O(1/log n)) (d-1)^k. So up to a 1 - O(1/log n) multiplicative factor we have a perfect tree, and this goes all the way to level l — past the point where the ball has root n vertices. (2) For the second statement, take another target edge f. Just as we exposed the neighborhood starting from e, I want to do the same for f, but here I would like to think of the walk as pointing towards f rather than away from it — I will mark this with a little reversed arrow over f. Everything is exactly the same, just symmetric: for the exploration we only ever expose the undirected graph, and it makes no difference for the non-backtracking random walk whether you traverse edges from source to endpoint or from endpoint to source. These are walks that end at f instead of walks that start at f. With that convention, the second item says that the intersection of the balls satisfies |B_k(e) ∩ B_k(f)| ≤ n^(-1/7) (d-1)^k, for the same range of k at most l.

So first, why is this useful? It is useful to know that if you start from an edge whose log log n ball is a tree — and we already know from Lemma 1a that we can reach such an edge with very high probability just by walking log log n steps — then it is not just that the first log log n levels are exactly tree-like, but that after those levels you have essentially the right number of vertices in every level, all the way to distance l. And why is it so important to have the right number of vertices in every level? Here is a little picture: here is e, and here is what happens if there is, say, a triangle right next to e. This picture demonstrates why starting right next to a cycle is problematic. Our goal is to argue that if we run the walk from e to a given distance, then it is essentially distributed evenly over all vertices at that distance. Our way of doing that is to say: instead of walking the full log base d-1 of n and then claiming you are uniform, let us walk halfway, one half log base d-1 of n, and observe that the ball looks like a perfect tree on about root n vertices. We know this because, up to a negligible factor, what we have up to that distance really is a tree, and when we walk down a d-regular tree we are by definition uniform on each level — it is completely symmetric. In every level there may be a negligible part involved in some defect, but we essentially never visit that part, precisely because it is negligible. So after walking to distance one half log base d-1 of n, we are uniform over a set of about root n vertices. Now pick your favourite target edge f and look at all the incoming walks: again, at distance one half log base d-1 of n from f, these are uniformly distributed over the root n points at that distance from f. So what is the probability that a non-backtracking walk from e, run for this distance plus that distance, ends exactly at f? It is just the probability that I walk down here, cross over to one of those points, and then walk down there. If I know that I have exactly the right number of points here and exactly the right number of points there, I can deduce that I end at f with the right probability. But for this it is imperative that the counts are right up to a 1 - o(1) factor: if I had half the expected number of points — which is what happens in the triangle scenario, where on the very first step I encounter the triangle — the count would be off by a constant factor. Encountering a non-tree edge in any of the first O(1) levels gives a smaller error the deeper it occurs, but still a constant one, and that would destroy my cutoff proof.
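The "right number of vertices in every level" point can be illustrated with a small sketch: sample a configuration-model d-regular multigraph (a standard stand-in for G(n, d); parameters here are illustrative) and compare BFS level sizes around a typical vertex with the tree prediction d(d-1)^(k-1):

```python
import random

def random_regular_multigraph(n, d, seed):
    # Configuration model: shuffle the n*d half-edges and pair them up.
    # (May contain self-loops / multi-edges; fine for this illustration.)
    rng = random.Random(seed)
    stubs = [v for v in range(n) for _ in range(d)]
    rng.shuffle(stubs)
    adj = [[] for _ in range(n)]
    for i in range(0, n * d, 2):
        u, v = stubs[i], stubs[i + 1]
        adj[u].append(v)
        adj[v].append(u)
    return adj

def level_sizes(adj, root, depth):
    # BFS: size of level k of the ball around root, for k = 0..depth.
    seen = {root}
    frontier = [root]
    sizes = [1]
    for _ in range(depth):
        nxt = []
        for u in frontier:
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    nxt.append(w)
        sizes.append(len(nxt))
        frontier = nxt
    return sizes

n, d, depth = 50_000, 3, 7
adj = random_regular_multigraph(n, d, seed=1)
# pick a root with d distinct neighbours (avoids a degenerate level 1)
root = next(v for v in range(n) if len(set(adj[v])) == d and v not in adj[v])
sizes = level_sizes(adj, root, depth)
for k in range(1, depth + 1):
    ratio = sizes[k] / (d * (d - 1) ** (k - 1))  # tree prediction
    assert 0.5 < ratio <= 1.0                    # expect ratio near 1
```

The upper bound ratio <= 1 is deterministic (each frontier vertex contributes at most d-1 new vertices); the observed ratios stay close to 1 even as the ball passes root n vertices, which is exactly the content of statement (1).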
So if you push the first non-tree edge out to distance log log n, then you can absorb its error, and that is what is done in this lemma — this is why we assume the first non-tree edge appears only after distance log log n. Okay, so let us finish the proof. To conclude: choose t to be one half log base d-1 of n plus a small log log term. The lemma then tells you that, with high probability, for every e and f in S_r, the probability, starting from e, that Y_{2t} = f is at least (1 - o(1)) times 1/(dn). What do I mean here by "with high probability, for every"? You expose the graph; once the graph is exposed, the event that e and f belong to S_r becomes measurable, but you still have the randomness of the walk, and that is the probability I am writing. Here is the proof — let us keep the statement here and put the proof over there. Expose the left ball and the right ball: what you would like to say is that if this radius is t, then each side has about (d-1)^t boundary edges, up to a small error. I start with two random directed edges e and f — to prove this lemma I do not want to condition on the whole graph yet. I first expose the log log n ball around each of them; this makes the event that they lie in S_r measurable, but I have only seen the log log n balls. If one of them is not in this set, I need do nothing. If both are, I continue the exploration of both balls together up to distance t, and eliminate the parts that are common, according to the little intersection lemma above. So what I am left with is essentially (d-1)^t boundary edges here and (d-1)^t there — and I could have gone beyond one half, all the way to 4/7 minus some log log. Now, what is the probability that the walk starting from e ends at f? Yes, a question. Is that part two of the lemma there? Yes. And e and f were also chosen before you generate the graph? This is a with-high-probability statement: with probability 1 - o(1), the two statements hold for all e and f. What if e equals f, or e is very close to f? Excellent question. This is for every e and f such that the distance between e and f is bigger than — what do we need — presumably 2r: you start with the two balls at distance at least r apart, so they are guaranteed disjoint, and then you expose them together; and when exposing them together, finding an edge that is bad for one ball is essentially the same as finding an edge that either returns to your own ball or visits the other ball, which makes the analysis the same as before. Thank you for the question. But near the centers, when the radius grows up to l, the balls will overlap a lot? That is fine: they may overlap, but at radius r they start out disjoint, and all we claim is that in every level the intersection is negligible compared to the size of that level. Okay, right?
n to the 4/7 is big — eventually these balls are polynomially large. That is fine: all we are saying is that the intersection is still negligible compared to the actual size of the balls. Okay, where were we? Ah, yes: for the probability that a non-backtracking walk from e visits f, all I need is a lower bound, because if I show that almost all targets f enjoy this lower bound, then I am covering essentially all of the mass; the f's not counted in S_r form a negligible set. So a lower bound is all we need, and for a lower bound we just need to walk down to one of these boundary edges, cross over, and then walk all the way down there. So what is the probability that a half-edge here is matched to a half-edge there? Take one half-edge on the left: the probability that it is matched to one of the half-edges on the right boundary is the size of the right boundary, (d-1)^t, divided by dn. And I have about (d-1)^t choices of a left boundary edge to reach. So the expected number of matched edges in this cut is (d-1)^{2t} divided by dn. Now, if I had chosen t to be exactly one half log base d-1 of n, I would get a Poisson number of such edges in the cut, and with fixed probability I could use one of them, say. That is not good enough: a Poisson number of edges in the cut is not concentrated, and that is the source of the complication when you want to do things at precision of order one — it can be done, but you end up resorting to Poissonization arguments.
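Writing out the expected-count computation just described (my rendering in display form):

```latex
\mathbb{E}\bigl[\#\{\text{matched pairs between } \partial B_t(e) \text{ and } \partial B_t(f)\}\bigr]
 \;=\; \underbrace{(d-1)^{t}}_{\text{choices in }\partial B_t(e)}
 \cdot \underbrace{\frac{(d-1)^{t}}{dn}}_{\substack{\text{prob.\ a given half-edge}\\ \text{matches into }\partial B_t(f)}}
 \;=\; \frac{(d-1)^{2t}}{dn}.
```

At t = (1/2) log_{d-1} n exactly, this expectation is 1/d = O(1): the Poisson-sized, non-concentrated count just mentioned.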
But here we have the luxury of not doing things super accurately for the non-backtracking random walk: we can afford a log log n error, which gets washed away anyway once we introduce the simple random walk's normal fluctuations of order square root of log n. So if I add this log log term, the expected number of edges in the cut becomes of order log squared n — log squared, because it is d-1 raised to this exponent. And log squared n is very well concentrated, which will allow us to take a union bound over e and f. That's it; now let me just write it down — I started already. Proof: let Q_e = B_t(e) and Q_f = B_t(f), with the reversed convention for f. The probability, starting from e, that Y_{2t} = f is at least (d-1)^{-2t} — the probability of choosing one particular path — times M_{ef}, where M_{ef} is the number of pairs (e', f') such that e' is in the boundary of Q_e, f' is in the boundary of Q_f, and e' is matched to f'. This is essentially a binomial random variable. Let me say something slightly inaccurate and then fix it: M_{ef} is essentially stochastically dominated by a binomial — you have about |∂Q_e| attempts, each succeeding with probability about |∂Q_f|/(dn). One then has to account for the fact that this is sampling without replacement rather than with replacement, but the whole random variable is stochastically dominated from above (I will write this in a small font) by the idealized binomial without any errors. And, you know, the mean is log squared n.
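To see the order-log-squared-n mean concretely, here is a quick numerical check of the arithmetic (illustrative n and d, natural logarithms throughout, and t includes the log log window):

```python
import math

n, d = 10**6, 3
# t = (1/2) log_{d-1} n + log_{d-1} log n  -- the log log window
t = 0.5 * math.log(n, d - 1) + math.log(math.log(n), d - 1)
mean_cut = (d - 1) ** (2 * t) / (d * n)   # E[# matched edges in the cut]
# (d-1)^{2t} = n (log n)^2, so the mean is (log n)^2 / d
assert abs(mean_cut - math.log(n) ** 2 / d) < 1e-9 * mean_cut
```

With the window, the count is a sum of many weakly dependent indicators with mean of order log squared n, which is what makes the union bound over the roughly (dn)^2 pairs (e, f) go through.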
So the probability that M_{ef} deviates by, say, n to the one quarter is at most about e to the minus n to the one quarter, which vanishes fast enough to allow a union bound over everything — I am putting this here, but you can ignore it on a first pass. More precisely, M_{ef} = (1 ± ε) (d-1)^{2t}/(dn) with probability 1 - e^{-c ε² log² n}; this is just concentration for the binomial, because (d-1)^{2t} is n log² n, so the mean has order log² n, and the bound survives the union bound over e and f. So the proof is concluded: we have shown that this probability is at least (1 - ε)/(dn), and this implies that the total variation distance at time 2t + r is at most ε + o(1). The r steps are what we invest to get from our initial edge e to an edge whose radius-r ball is a tree, which succeeds with high probability — that is the o(1). Once we do that, I argue that from this new edge I can reach all but a small number of target edges with the right probability; the small number I cannot reach are those whose radius-r ball is not a tree, and we saw from Lemma 1a that there are at most about root n such edges, as opposed to the linear number of nice edges. Maybe I will write this down. For the last step I used the following: suppose e is nice, the case to which we reduced. To compute the total variation distance, I sum over f the quantity pi(f) minus my probability of being at f, maximum with zero — that is the total variation distance. With high probability there are at most dn minus |S_r| bad f's — those whose B_r is not a tree (excellent, thank you; it is nice to get such questions) — for which I pay at most 1/(dn) each and ignore the walk's probability completely. All I am trying to spell out is why we only needed a lower bound, and why that lower bound only needed to be valid for most targets rather than all. S_r is the good set, of size essentially dn minus root n — this is what Lemma 1a told us — so the bad part contributes o(1). For the rest we have at most |S_r| times ε/(dn), because each good f contributes at most 1/(dn) minus (1 - ε)/(dn) = ε/(dn), and with high probability this applies to all of them — that is why I wrote "with high probability" in the first part. Altogether this is at most ε + o(1). So I have spelled everything out, essentially, except for the little lemma that is an exercise. Now, in the seven minutes that remain, let us first breathe a little and look back at what we did — ah, there is another board, and only now I find out, and it is one of the nice big ones; this is really a great board. The proof we just gave was elementary: we essentially counted the paths from e to f — constructed them and gave a lower bound on their number — and showed that you walk along a tree here, walk along a tree there, and have exactly the right number of connections between the two trees. To carry this out, you needed control over the cuts between balls of size about root n situated around typical points of your graph — something you have in a random graph, but really do not have in arbitrary constructions of expanders.
One has control over linear expansion — over the cut between a set and its complement when the set has linear size — but not around mesoscopic sets, so this entire proof approach becomes problematic. Our next goal is to prove the exact same theorem for a deterministic d-regular graph G about which we know only that the maximum over i ≥ 2 of |λ_i|, the nontrivial eigenvalues of its adjacency matrix, is at most 2 root(d-1), perhaps allowing an extra o(1) term. We know that G(n, d) satisfies this property, and we would like to forget everything else we know about G(n, d), remember only this spectral information, and infer from it the exact same cutoff phenomenon. This is the approach that will let us prove the statement for those Cayley graphs, the Lubotzky–Phillips–Sarnak graphs we mentioned in passing yesterday — Ramanujan graphs, where our understanding of the geometry is really limited. There we do not even know the diameter: that is a famous question of Peter Sarnak, who conjectures it is four thirds log base d-1 of n; the best known bound is two log base d-1 of n, with various proofs, and even showing two minus epsilon would be striking. So our understanding of the geometry is limited, but we can still carry out a different approach that yields an analog of the main theorem — and that theorem already gives a lot of geometric information. As a simple corollary: if you know that you mix by time log base d-1 of n, which is essentially the shortest possible, then in that time you see almost all the points, so for every starting vertex, the typical distance from that vertex to almost all other points is necessarily log base d-1 of n. Proving that directly would otherwise require an argument, and here you can just read it off. So how are we going to do this? In a nutshell: when you have spectral information, what is the first thing you want to do? You have these eigenvalues, so you write down the L2 distance, which you have all seen defined. Since we have just two minutes, everything I write now will be sketchy, and I will repeat it in more detail at the beginning of the next class; this is just for intuition. What you would usually do is write the L2 distance squared at time t as the sum over i from 2 to n of f_i(x)² times (λ_i/d)^{2t}: here f_i(x) is the i-th eigenfunction evaluated at x, the starting point of your simple random walk — that factor does not depend on t — and then you pay the eigenvalue, normalized by d, raised to the power 2t. The L2 distance is an upper bound on the L1 (total variation) distance, so the first attempt is: I have spectral information, I want to bound the mixing time; this is an equality — exactly the L2 distance — so let us calculate it and hope that it gives mixing at time log base d-1 of n times d/(d-2), which is the lower bound that was valid for any graph. If you do this calculation, here is the exercise we will end with: for every d-regular graph G and every ε > 0, the ε-mixing time in L2 is at least one half log base 1/ρ of n, where ρ is that maximum on the left divided by d — essentially 2 root(d-1) over d in the case we care about. And you will note that for every d at least 3, one half log base 1/ρ of n is bigger than (1 + η_d) times d/(d-2) log base d-1 of n, for some η_d > 0. So this exercise tells you that if you just plug in the spectral information and try to bound the mixing time — and d/(d-2) log base d-1 of n is the bound we are after, the truth for G(n, d) — then the resulting bound is always off by a constant factor. Always. So we have no chance of simply taking the eigenvalues and computing the L2 distance. In fact this L2 mixing time is not merely a lower bound: for G(n, d), one half log base 1/ρ of n is the actual L2 mixing time — it is also an upper bound, and if you are ambitious you can try to prove the matching inequality; it is also not difficult, so maybe you should shoot for that one. So the L2 bound is tight, and there is no hope of using it alone to get the bound we want. But tomorrow we will see that moving to the non-backtracking random walk, and using only the spectral information on G, is enough — and then we will apply the exact same reduction from simple to non-backtracking walks that we did today. Thank you, that's it. A question: at the end, you are comparing the L2 mixing time — what is the relation to the L1?
Well — I should say that this is an inequality that is always true: the L2 distance is always at least the L1 distance. And the L2 lower bound above is attained with equality for G(n, d), and likewise for Ramanujan graphs; for those graphs the L2 mixing happens macroscopically after the L1 mixing, so yes, that is correct. Actually, you can write down all the Lp norms: for each p there is a universal lower bound of this form, depending also on p, and that universal lower bound is attained exactly by G(n, d) — so the random graph, and likewise Ramanujan graphs, are the fastest mixing among all d-regular graphs with respect to each of the Lp norms. But the mixing times increase with p, all the way up to L infinity.
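The comparison in the exercise — that the spectral L2 bound always exceeds d/(d-2) times log base d-1 of n — can be sanity-checked numerically with the Ramanujan value ρ = 2 root(d-1)/d (a sketch with a few illustrative values of d; natural logarithms throughout):

```python
import math

def l2_lower_bound(n, d):
    # (1/2) log_{1/rho} n, with rho = 2 sqrt(d-1)/d (the Ramanujan value)
    rho = 2 * math.sqrt(d - 1) / d
    return 0.5 * math.log(n) / math.log(1 / rho)

def cutoff_time(n, d):
    # d/(d-2) * log_{d-1} n, the true total-variation mixing time for G(n, d)
    return (d / (d - 2)) * math.log(n) / math.log(d - 1)

n = 10**9
for d in (3, 4, 5, 10, 100):
    # the L2 bound overshoots the cutoff location by a constant factor
    assert l2_lower_bound(n, d) > cutoff_time(n, d)
```

Both quantities are linear in log n, so the gap is a constant factor depending only on d — which is exactly why plugging the eigenvalues directly into the L2 distance cannot recover the cutoff constant d/(d-2).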