Right. Good afternoon. So we are back with our stochastic block model. Remember, this is a random graph model with an underlying block structure, and our aim is to reconstruct this block structure from the observation of the graph; that's depicted in the picture here. We left off yesterday at the point where I was telling you about the classical methods one would want to use. In general these are spectral methods, which process the adjacency matrix of the graph, extract the eigenvectors associated with the largest eigenvalues, and then cluster according to their entries.

So why does that fail in the sparse setting we are considering here? Let's first see that it fails because the eigenvectors associated with large eigenvalues tend to be localized: their entries may live on a small support, and this tells you nothing about the cluster structure. So let's get a feel for this, and see first that there do exist large eigenvalues whose corresponding eigenvectors are localized and will not be helpful for clustering. The simplest argument I know of is to first look at the case without block structure, that is, the Erdős–Rényi random graph, and to look for structures in this graph that are isolated stars: nodes with a collection of neighbors, where the center and its neighbors are not connected to the rest of the graph, and the neighbors are not connected among themselves. So for a node i, I may ask whether it is the center of a star with d branches.

Here is something you can take as an exercise. Consider an Erdős–Rényi graph with n nodes and p = λ/n as the probability of an edge being present, λ being of order one. Now take d to be given by some small constant c times log n / log log n, if I'm not mistaken, with c less than one. Then you will have many such isolated stars in your graph. If you want to show it by yourself, you can first prove that the expected number of stars of that kind goes to infinity as n goes to infinity, and that the variance of the number of such stars is negligible compared to the square of its expectation. With those two things, you can call upon the so-called second moment method: using Chebyshev's inequality, you show that with high probability you have many such objects. And this is the case for that particular d. So we have such objects in the graph.

If you now think of the adjacency matrix, each such star contributes the Perron–Frobenius eigenvalue of the corresponding star graph, and this, as an easy computation shows, is the square root of the number of branches. Recall: if you have a square matrix with non-negative entries, Perron–Frobenius theory tells you, modulo some irreducibility assumption, that it admits a positive eigenvalue which is the eigenvalue with the largest modulus. In our case you can work it out: the adjacency matrix of a star graph has, as its largest-modulus eigenvalue, that is its Perron–Frobenius eigenvalue, the square root of the number of branches.
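To make this concrete, here is a minimal numerical check, my own illustration rather than anything from the lecture: it builds the adjacency matrix of a star with d branches and verifies that the largest-modulus eigenvalue equals √d. The helper name `star_adjacency` is mine.

```python
# Minimal sketch: the Perron-Frobenius eigenvalue of a star with d branches is sqrt(d).
import numpy as np

def star_adjacency(d):
    """Adjacency matrix of the star K_{1,d}: node 0 is the center, nodes 1..d the leaves."""
    A = np.zeros((d + 1, d + 1))
    A[0, 1:] = 1.0  # center connected to every leaf
    A[1:, 0] = 1.0  # and symmetrically back
    return A

for d in (4, 9, 16, 25):
    lam = np.max(np.abs(np.linalg.eigvalsh(star_adjacency(d))))
    print(f"d={d}: largest |eigenvalue| = {lam:.6f}, sqrt(d) = {np.sqrt(d):.6f}")
```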
By this argument, your original graph has many eigenvalues of order √(c log n / log log n). These are large eigenvalues: they go to infinity with n. And the corresponding eigenvector is supported by the vertex set of the star, so it is a localized eigenvector; its support tells you nothing about the block structure, which in our stochastic block model consists of macroscopic blocks of order n nodes. That's the exercise in the Erdős–Rényi setting, but exactly the same kind of argument goes through when you consider instead the stochastic block model in the sparse regime we consider: again we have large eigenvalues whose eigenvectors are localized and not useful for clustering. Conversely, you can also ask whether there are other eigenvalues whose eigenvectors are non-localized, and it turns out that if the eigenvalue is large, the eigenvector is localized, in some sense. So there is really strong evidence that you cannot do anything with either the Laplacian matrix of the graph or its adjacency matrix. This is why we need what colleagues have called spectral redemption: we need to redeem spectral methods in order to do clustering in that setting.

Before I tell you what they meant by spectral redemption, let me first tell you about an earlier conjecture made for this stochastic block model. If you think again of the relationship between this random graph model and the tree model we were describing first, you can use belief propagation as a way to try and estimate spins, or blocks, in the graph. This is the equation I'm writing at the top of the slide. You take your graph and consider each oriented edge, say a pair i→j. You want to compute, somehow, the conditional distribution of the spin at node i given the observation of the neighborhood at distance d. If you knew the conditional distribution of the spin at a neighbor of i given information far away, then you could bring together such information from many neighbors and propagate it along the edges. That's what this equation does, and I think I've written it in the right direction: to get the belief propagation message from i to j, I combine the messages that i receives from each of its other neighbors. So you can construct an algorithm which is initialized, for each oriented edge, with some random distribution on the collection of blocks, and which iteratively updates those messages using belief propagation.

The conjecture made in 2011 by Aurélien Decelle et al. is the following. If you are above the Kesten–Stigum threshold, that is to say the average degree α of your random graph times the square of the second eigenvalue of the stochastic matrix P is above one, then initializing BP with random weights and iterating it sufficiently many times, it converges to a fixed point, and this fixed point carries meaningful information about the block structure. To this day, this is not proven; numerically, it seems to be true.
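To fix ideas, here is a schematic sketch of such a BP iteration, with all the hedges: the names `bp_iterate`, `adj`, `P`, `nu` are mine, P is taken to be the q×q row-stochastic matrix with stationary distribution ν, and I use the plain tree-broadcast update, for which the uninformative point ψ = ν is indeed a fixed point; a complete sparse-SBM derivation would also carry an external-field term that I drop here.

```python
# Schematic BP sketch under the assumptions stated above (not the lecturer's code).
import numpy as np

def bp_iterate(adj, P, nu, n_iter=30, seed=0):
    """adj: dict node -> list of neighbors; P: q x q row-stochastic matrix,
    nu: its stationary distribution. One message (a distribution over the q
    blocks) per oriented edge, updated as
      psi_{i->j}(s) proportional to nu(s) * prod_{k in N(i), k != j} sum_t P[s,t] psi_{k->i}(t) / nu(t)
    """
    rng = np.random.default_rng(seed)
    q = len(nu)
    msgs = {(i, j): rng.dirichlet(np.ones(q)) for i in adj for j in adj[i]}
    for _ in range(n_iter):
        new = {}
        for (i, j) in msgs:
            log_psi = np.log(nu)
            for k in adj[i]:
                if k != j:  # combine messages flowing into i from all neighbors but j
                    # coordinate s of P @ (psi / nu) is sum_t P[s,t] psi(t) / nu(t)
                    log_psi += np.log(P @ (msgs[(k, i)] / nu) + 1e-300)
            psi = np.exp(log_psi - log_psi.max())
            new[(i, j)] = psi / psi.sum()
        msgs = new
    return msgs
```

One can check on paper that if all incoming messages equal ν, then P @ (ν/ν) is the all-ones vector and the update returns ν, the uninformative fixed point that the linearization below is built around.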
There is numerical evidence for this conjecture, but it is still open, because it is very challenging to analyze the fine properties of belief propagation in sparse regimes like that. So, motivated by it, a subset of the team behind the Decelle et al. paper, joined at that time by other people, Elchanan Mossel among them, decided to cook up a spectral method based on a linearization of belief propagation.

So recall, we had messages that consist of distributions over the q blocks: ψ_{i→j} is a distribution. We know that belief propagation admits as a fixed point the stationary distribution ν of our stochastic matrix P. So why not linearize belief propagation: assume that the message coordinate ψ_{i→j}(s) is a small perturbation of the stationary probability ν(s). If you parametrize things in that way, you can take the belief propagation equations, linearize them, that is, retain only the first-order coefficients, and you get equations in the perturbations ε_{i→j}(s), for each spin value s. And the linearized belief propagation has an interesting structure: your vector of perturbations ε is updated by being multiplied by a matrix. The dimension of this object is the number of oriented edges times the number of blocks: for each edge we can pick two orientations, and for each oriented edge we get q coordinates, one per block. So this vector is updated by multiplication by a matrix that is the tensor product of a matrix indexed by the oriented edges with a matrix indexed by the blocks, the latter being the stochastic matrix P, the one that depends on the blocks.

So let's now look a little bit more at the matrix in this equation that depends only on the graph structure and not on the parameters of the model, the matrix B, which is also known as the non-backtracking matrix of the graph. By construction, its dimension is the number of oriented edges, so twice the number of edges in the graph. In greater detail, and you can figure this out by yourselves by looking at the BP equations, the BP messages flow in a particular way: you get a message from one edge to another if the two of them constitute a two-hop path, but there is no flow from an edge back to itself, hence the name non-backtracking. So the entry of this matrix B for a pair of oriented edges e, f will be one if and only if e, f constitute an oriented two-hop path in the graph, and zero otherwise; in particular you get zero if e feeds into f but f is just the reverse of e.

This is a matrix that has been studied in the past, in particular by people in number theory; we will see some of the earlier results on this matrix later on. It is convenient, for instance, for counting specific combinatorial structures on a graph. Say you want to count oriented paths in the graph starting from one edge and going to another edge, with some number t of edges in between, and you want to forbid backtracks. If you want to count ordinary paths in a graph, you can use the adjacency matrix, raise it to some power, and you get a counter of paths. If you want to forbid backtracks, let me first say what a backtrack is.
A path i₁, i₂, i₃, ... backtracks if i₃ = i₁, and this is what I want to forbid. So if I want to count the number of non-backtracking paths of a given length from one oriented edge to another, I just need to raise this matrix B to the power of the length I am interested in.

All right. So the spectral redemption is the following claim: where the usual spectral methods fail in these sparse models, spectral methods based on the matrix B associated with the graph will succeed, provided we are above the Kesten–Stigum threshold. That's what I want to discuss in some detail now: the main results we do have about the spectral structure of this matrix for our stochastic block models, and the consequences for spectral clustering based on the spectrum of the non-backtracking matrix. It will take a bit of notation.

But maybe first: why is it a good idea, why does it help us, to remove backtracking paths? Think of those star structures I was alluding to earlier. If you have a star with d branches in your graph, that is, a high-degree node i, then you will have many paths that start and end at node i, but they have to backtrack, and they can backtrack any number of times. High-degree nodes induce large eigenvalues in the adjacency matrix precisely because they reflect the existence of many paths that do backtrack and that start and end at that node. Somehow, the backtracking paths are the reason why you see large eigenvalues with corresponding localized eigenvectors in the adjacency matrix, and so we get some help by removing backtracks.

Yes? Yes, in random graphs like the ones we consider, you do have short cycles, but very few of them; the number of triangles, say, is of order one. So if you have a node that sits in a triangle, you will have non-backtracking paths that can circle this triangle. But since we have so few triangles, this is not something that matters: a generic node in this graph will not be part of short cycles, the shortest cycles it belongs to will be of length of order log n, and so non-backtracking paths of length smaller than some constant times log n will not contain cycles.

Okay, some notation. Remember we had these two parameters for our model: α, the average degree, and P, the stochastic matrix; and we called M the mean progeny matrix, which is just αP. It turns out that the spectrum of this matrix M is going to be important in our analysis of the spectrum of the non-backtracking matrix B. We call λᵢ(M) the eigenvalues of M, ordered by decreasing absolute value. The Perron–Frobenius eigenvalue of this mean progeny matrix is α, the average degree, and the subsequent eigenvalues λ₂(M), etc., have smaller absolute value. So we will be able to relate the eigenvalues of B to the eigenvalues of M, and we will also be able to relate the eigenvectors of M to the eigenvectors of B. But here we need to do something, because the eigenvectors of M are q-dimensional, q being the number of blocks, whereas B has size twice the number of edges. So we need to lift the eigenvectors of M in order to get candidate eigenvectors for B. The lifting operation is what is described in the second paragraph.
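Before the lift, let me make the matrix B itself concrete. Here is a minimal hedged construction, my own illustration: `edges` is assumed to be the undirected edge list of a simple graph, `non_backtracking` is my name for the helper, and a toy check on a 4-cycle illustrates the path-counting property just mentioned.

```python
# Sketch: build the non-backtracking matrix B, indexed by the 2|E| oriented edges,
# with B[e, f] = 1 iff e = (i, j) and f = (j, k) with k != i (no U-turns allowed).
import numpy as np

def non_backtracking(edges):
    oriented = [(i, j) for (i, j) in edges] + [(j, i) for (i, j) in edges]
    idx = {e: r for r, e in enumerate(oriented)}
    B = np.zeros((len(oriented), len(oriented)))
    for (i, j) in oriented:
        for (j2, k) in oriented:
            if j2 == j and k != i:  # e = (i,j) feeds into f = (j,k); reversal excluded
                B[idx[(i, j)], idx[(j2, k)]] = 1.0
    return B, oriented, idx

# Toy check on a 4-cycle: the only non-backtracking continuation keeps circling,
# and entry (e, f) of B^t counts non-backtracking paths of t steps from e to f.
B, oriented, idx = non_backtracking([(0, 1), (1, 2), (2, 3), (3, 0)])
assert B[idx[(0, 1)], idx[(1, 0)]] == 0.0  # immediate reversal is forbidden
assert np.linalg.matrix_power(B, 4)[idx[(0, 1)], idx[(0, 1)]] == 1.0  # once around the cycle
```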
It goes as follows. I pick a q-dimensional eigenvector xᵢ of M, associated with some eigenvalue λᵢ(M), and for each oriented edge u→v of the graph I assign a coordinate of the lift by setting Yᵢ(e) = xᵢ(σ(v)): I assign to each oriented edge the coordinate of the q-dimensional eigenvector that corresponds to the spin of the endpoint of the edge. Okay, so I get my lift. And this is not good enough: it won't be close to an eigenvector of B. But if I apply B a sufficiently large number of times to this lift, then I will get something that is close to an eigenvector. So that's what I do. I take a parameter ℓ that is log n times some tiny constant, and eventually I have a candidate eigenvector for B, denoted Zᵢ, which is B^ℓ times the lift Yᵢ.

All right, so now I'm ready to read the statement. Remember the Kesten–Stigum condition: this was stated as the second-largest eigenvalue of M having modulus squared larger than λ₁(M), which is α; so the Kesten–Stigum condition says λ₂(M)² > λ₁(M). Let me define r₀ to be the largest index i at which the analogous condition λᵢ(M)² > λ₁(M) is met. It will certainly be met for i = 1; above the Kesten–Stigum threshold it will also be met for i = 2; maybe it will be met up to i = 5 and then stop being met; r₀ is the largest index for which it holds.

The statement is then as follows. For the eigenvalues above the Kesten–Stigum threshold, that is, for indices i no larger than r₀, the corresponding eigenvalue λᵢ(B), the i-th largest in modulus, converges in probability, as my graph becomes large, to λᵢ(M). So the spectra of the two matrices coincide on those meaningful indices. We know as well that there is an eigenvector of B to which the vector obtained by the lifting procedure I described is asymptotically parallel: my construction indeed produces candidate eigenvectors that are asymptotically aligned with the actual eigenvectors of B. So that's when I am above the Kesten–Stigum threshold. For indices i larger than r₀, I can only say that the corresponding eigenvalue of the non-backtracking matrix B has modulus at most √α plus some little-o of one. That's the description that we get, and we actually get more information on the eigenvectors, but I'm not going to detail that.

And so, perhaps a picture. These are asymptotics that kick in reasonably quickly: if you simulate random graphs from this stochastic block model with a few hundred nodes, in this case with two symmetric blocks, you see those asymptotics kicking in. For instance, here we are above the Kesten–Stigum threshold with q = 2, so we have two eigenvalues for the matrix M, and each of them induces a nearby eigenvalue in the spectrum of B, outside the bulk of B, where we call the bulk of B the collection of eigenvalues within a disk of radius √α. So we see this asymptotic behavior kicking in rather quickly.

Okay, so let me right away state the corollary about clustering, so we can work a bit more. The matrix P? Yes, yes, it is q by q, so the eigenvectors are of size q.
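Before the corollary, here is a quick simulation in the spirit of that picture, a hedged sketch of my own rather than the lecture's code: sample a symmetric two-block model above the Kesten–Stigum threshold, lift the second eigenvector of M as just described, apply B of order log n times, and check that a Rayleigh quotient lands near λ₂(M). It reuses the `non_backtracking` helper sketched earlier; the parameters n, a, b are illustrative.

```python
# Simulation sketch under the stated assumptions; reuses non_backtracking(...).
import numpy as np

rng = np.random.default_rng(1)
n, a, b = 400, 8.0, 2.0                # within- / between-block edge intensities
sigma = rng.integers(0, 2, size=n)     # hidden block label of each node
alpha = (a + b) / 2                    # lambda_1(M), the average degree
lam2 = (a - b) / 2                     # lambda_2(M); KS holds here: lam2**2 = 9 > 5 = alpha

# edge i~j present with probability a/n (same block) or b/n (different blocks)
edges = [(i, j) for i in range(n) for j in range(i + 1, n)
         if rng.random() < (a if sigma[i] == sigma[j] else b) / n]
B, oriented, _ = non_backtracking(edges)

x2 = np.array([1.0, -1.0])             # eigenvector of M = [[a,b],[b,a]]/2 for lam2
Y = np.array([x2[sigma[v]] for (u, v) in oriented])  # lift: Y(u->v) = x2(sigma(v))
Z = Y.copy()
for _ in range(int(np.log(n))):        # ell of order log n applications of B
    Z = B @ Z
print("Rayleigh quotient:", (Z @ (B @ Z)) / (Z @ Z), "  lambda_2(M):", lam2)
```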
So, once more: if I want to produce a candidate eigenvector for the B matrix, which has size twice the number of edges, that is, the number of oriented edges, I do the lifting procedure. I first create a vector of size twice the number of edges which, for an oriented edge e = u→v, has as its entry the coordinate of the q-dimensional eigenvector of M that corresponds to the spin of the endpoint v of the edge. And the point is that the eigenvectors of B will in fact be correlated with the spins of the nodes, because the eigenvectors happen to be close to vectors constructed from the spins of the nodes. No, no, the spins are parameters of the model, so you cannot use them directly. But the coordinate that shows up at an edge in the eigenvector will be the coordinate corresponding to the spin of the endpoint of that edge; so the eigenvectors of B somehow reflect the block structure.

So, the precise statement that we get for block reconstruction is the following. I assume you are above the Kesten–Stigum threshold, and take the eigenvector ξ₂ of B associated with the second-largest eigenvalue. We start with this vector, which is indexed by oriented edges, and bring it back to a vector of size n: for a node u, I construct the quantity φ(u), which is the sum, over the neighbors v of u, of ξ₂(v→u), the entries on the edges pointing into u. We want entries per node of magnitude of order one, so we normalize this vector so that its Euclidean norm is exactly √n.

The rigorous statement that we could prove holds in the case where the measure ν is uniform. We then do a partition of the nodes based on this vector, and the thing we can prove concerns a random partition. What you would want to do in practice is take these entries and put a node on one side if its entry is positive and on the other side if it is negative, and it seems that this does produce a meaningful clustering in practice. But the version for which we can prove something is a randomized one, where we put nodes into two boxes at random, the probability of a node u ending up in the plus box being one half plus 1/(2K) times the quantity φ(u). We want this to be a probability, so if it would go above one or below zero we just say nothing about this node; but if we pick the parameter K large enough, there is only a vanishing fraction of nodes for which this happens. And this clustering into two boxes achieves positive overlap, according to the definition we have. This we can prove by using fine properties of the eigenvectors of the matrix B, which I'm not going to detail.
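Here is a hedged sketch of that randomized partition, continuing the simulation above, so `B`, `oriented`, `sigma`, `n` are assumed to be in scope from the previous snippet. The value of the parameter K is an arbitrary illustrative choice, clipping stands in for the "say nothing about this node" rule of the statement, and the final line is a simple overlap proxy for two balanced blocks, up to a global label swap.

```python
# Sketch of the randomized clustering; assumes B, oriented, sigma, n as above.
import numpy as np

vals, vecs = np.linalg.eig(B)
xi2 = np.real(vecs[:, np.argsort(-np.abs(vals))[1]])  # eigenvector for 2nd largest |eigenvalue|

phi = np.zeros(n)
for r, (v, u) in enumerate(oriented):    # phi(u) = sum of xi2 over edges v -> u
    phi[u] += xi2[r]
phi *= np.sqrt(n) / np.linalg.norm(phi)  # normalize so that ||phi|| = sqrt(n)

K = 4.0
p_plus = np.clip(0.5 + phi / (2 * K), 0.0, 1.0)  # clip instead of abstaining
labels = (np.random.default_rng(2).random(n) < p_plus).astype(int)

agree = np.mean(labels == sigma)
print("overlap proxy (up to label swap):", 2 * max(agree, 1 - agree) - 1)
```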
Are there any questions about the statements? "Does this construction break the symmetry that the adjacency matrix has?" Yes, it is non-symmetric, definitely, and in fact you can see that on the plot here: the spectrum is not real, it is definitely a complex spectrum, and the bulk does occupy this disk of radius √α. "But earlier, the adjacency matrix was symmetric?" So: the non-backtracking matrix is something you can construct for any graph, and we have been discussing only non-oriented graphs, for which the adjacency matrix is symmetric, as is the Laplacian matrix; those classical matrices are symmetric and have a real spectrum. However, the non-backtracking matrix that you construct from any non-oriented graph is generically non-symmetric: if one edge e feeds into another edge f, you do not expect the entry for f feeding into e to be one as well, which is what a symmetric matrix would require. So it is a non-symmetric construction from a non-oriented graph whose adjacency matrix is symmetric. Thank you.

All right. So the reference on spectral redemption is a 2013 paper by Krzakala et al. The theorem I was describing is something we proved together with Marc Lelarge and Charles Bordenave. The proofs have become simpler and simpler; perhaps the nicest version now is the one by Bordenave and Coste, that could be in 2020, but it is still not easy, definitely still quite a hard read. And some other simplifications were brought in works with my PhD student Ludovic Stephan; in particular, the randomized clustering here is something that we developed with Ludovic.

So, perhaps to sum up where we stand on this problem of clustering the graphs from the stochastic block model into blocks with a significant overlap. We now have a result, the whole redemption story, saying that when you are above the Kesten–Stigum threshold, you can reconstruct in polynomial time. The plot here is meant to illustrate the phase diagram as we know it. In this case, the x-axis is the number of blocks q, in the symmetric situation where you need just two more parameters to specify your model: in your P matrix you put one coefficient on the diagonal and another coefficient everywhere else off the diagonal. The y-axis is the coefficient on the diagonal of the P matrix. And we have a linear region here in this diagram which depicts the above-Kesten–Stigum regime, where it is feasible to cluster the nodes in a meaningful way in polynomial time. We also have some bounds on the region where we know it is not going to be feasible, even with unlimited computational power: these are the scenarios where the associated tree reconstruction problem is not feasible. As I was saying yesterday, if the tree reconstruction problem is infeasible, then the block reconstruction problem in the graph is infeasible as well. So we have a region where we know no algorithm will produce a meaningful clustering. The interesting thing is that there is a region in between where we also know that a brute-force algorithm can succeed even though we are below the Kesten–Stigum threshold. And the consensus is the following: experts in this area believe that the threshold for polynomial-time reconstruction is the Kesten–Stigum threshold, and that below it you need super-polynomial time in order to cluster the nodes meaningfully.

Okay, so let me just finish with some intuition for the results; I won't say much more than the intuition about the proof arguments for this theorem. The intuition for why we can produce plausible candidate eigenvectors for this non-backtracking matrix does come from the analysis of the tree problem we started from,
in particular from the martingale convergence property I was describing when establishing what we called census reconstruction, using that particular construction exhibiting a martingale, and so on and so forth. So here is how the intuition works. If you recall, we produced a candidate eigenvector Zᵢ = B^ℓ Yᵢ, where the lift Yᵢ, at an oriented edge e = u→v, takes precisely the value xᵢ(σ(v)): the q-dimensional eigenvector xᵢ taken at the value of the spin of node v.

So now think of what this means if, in the vicinity of that edge, my graph has a tree topology when I go no further than ℓ hops away. You can see that the entry Zᵢ(e) is precisely a sum along the non-backtracking paths that go ℓ hops away from this edge: over all the nodes downstream at distance ℓ, say nodes w, I sum xᵢ(σ(w)). And this is exactly the martingale quantity I was constructing yesterday, except that there I was normalizing in order to have a martingale: our martingale was of this same form, except for a factor (α λᵢ(P))^(-ℓ), which is λᵢ(M)^(-ℓ). And now I can recall what we had: a martingale convergence theorem which tells us that, in the tree setting, as the parameter ℓ becomes large, the normalized quantity approaches a martingale limit. Since the neighborhood of a node in the graph is close in distribution to the tree I was considering, I can hope, and this is only a plausibility argument, that the quantity Zᵢ(e) produced here is reasonably close to λᵢ(M)^ℓ times the martingale limit, call it Z∞(e).

If I believe that, then the fact that the martingale converges says that if I do this construction with B^ℓ or with B^(ℓ+1), I get the same thing, except for a change in the normalization factor. And this is another way of saying that my vector is approximately an eigenvector. So the martingale convergence theorem tells me we get approximate eigenvectors, and it also gives an expression for the eigenvalue. This is the intuition, and we can make this intuition work, but it is very technical, so this is why I don't want to say much more than that. Yes? Yes, let's do that. Let's start sharing.
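To close the loop, here is the chain of approximations of this last argument written out, my own hedged transcription rather than a formula from the slides; σ denotes the spin assignment, xᵢ the eigenvector of M, and Z∞ the martingale limit.

```latex
% Schematic summary of the intuition (transcriber's rendering, hedged).
% On a locally tree-like neighborhood of the oriented edge e,
\[
  Z_i(e) \;=\; \bigl(B^{\ell} Y_i\bigr)(e)
        \;=\; \sum_{w \,:\, \mathrm{dist}(e,\, w) = \ell} x_i\bigl(\sigma(w)\bigr),
\]
% and the martingale convergence theorem for the broadcast tree gives
\[
  \bigl(\alpha \,\lambda_i(P)\bigr)^{-\ell} \bigl(B^{\ell} Y_i\bigr)(e)
  \;\xrightarrow[\ell \to \infty]{}\; Z_\infty(e),
  \qquad \alpha \,\lambda_i(P) = \lambda_i(M),
\]
% so that, for large ell, applying B once more essentially rescales the vector:
\[
  B\,\bigl(B^{\ell} Y_i\bigr) \;\approx\; \lambda_i(M)\, B^{\ell} Y_i,
\]
% i.e. B^ell Y_i is an approximate eigenvector of B with eigenvalue lambda_i(M).
```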