It's an honor and a pleasure to be invited to this prestigious place. I'll be talking about inference problems, and especially tree reconstruction and community detection. So let me say a few words about the kinds of inference problems we have in mind in this area.

Take the first one, community detection. Typically the task consists in processing a graph which represents relationships between entities, and you want to cluster these entities into groups, such that nodes within a group are statistically similar. The picture at the top is the graph of citations between bloggers in the US, some Republican, some Democrat. Just by looking at the graph, you can recover who is a Democrat and who is a Republican, and that is what this picture illustrates. So community detection is a very generic task, and it will be the focus of this talk.

A second problem, where we are likewise trying to retrieve some structure hidden in a large data set, is graph alignment. I give this example just to show that there are many problems beyond community detection. Graph alignment is the following task: given two graphs, find a mapping of the vertices of one graph to the vertices of the other graph, so that you have, more or less, a graph isomorphism. Many tasks can be phrased as a graph alignment problem. For instance, you may have a graph representing the interactions of users of an online social network, with the identities revealed, and another graph representing the interactions of users of another online social network for which the identities are not revealed. If you can find the isomorphism, you can de-anonymize the second graph. That is one example, but there are many more.

There are plenty of algorithms for all of these tasks, but it is hard to say which one is good, what their relative merits are, what the difficulty of the task at hand is. So the philosophy of the work I am going to describe is: let's look at large random instances of these inference problems. By analyzing those large random instances, we get a handle on the hardness of the task. Is it difficult? What kinds of algorithms will succeed? We will be able to prove theorems about the feasibility of the task, and we may also develop new algorithms in the process of analyzing the performance of existing ones.

OK, so I'll be focusing on community detection; you can forget about graph alignment for now. What I want to convey is this kind of picture. You have a large instance of a community detection problem, and you are trying to recover blocks underlying the data set. By varying the parameters of your probabilistic model, the generative model of the task, you may have at least three situations; three are depicted here. I'll go over the meaning of the axes later, so don't bother trying to figure out what they mean for now. One outcome is that the signal is just too weak: you cannot recover meaningful blocks from the observation of this large graph, the signal-to-noise ratio is too weak, and the task is impossible. Another is the situation where it is not only possible, but available polynomial-time algorithms can be shown to succeed at extracting the information: that is the easy region, the green triangle on the left. And then in between, you have this very puzzling region where you do have signal.
You know you can extract it in exponential time, but you don't know whether a polynomial-time algorithm exists that will succeed at extracting the signal. You can take a point in this diagram and move up, increasing the signal-to-noise ratio, crossing boundaries. The red line is an information-theoretic threshold: that is where useful information appears in your data set. And when you cross the blue line, that is a computational threshold: that is when the problem becomes easy, whereas it was feasible but hard before that.

OK, so that is what I want to convey for the particular problem of community detection. But before doing that, I'll go over the tree reconstruction problem, which is interesting in its own right, and which also paves the way for understanding this community detection problem.

Here is the outline. I'll describe two phase transitions that occur in the problem of tree reconstruction. Then I'll move to community detection and relate it to the tree reconstruction problem: talk about the three phases, show that a hard phase indeed exists, and show that above the so-called Kesten-Stigum threshold we can indeed recover the hidden information in polynomial time. The last bit I want to cover is a link between the study of this community detection problem and random matrix theory.

So let's start with tree reconstruction. I'm missing a picture on that slide, so I'll start using the chalk on the board. Here is the setup. We have a genealogical tree: an ancestor, the root, at the top of the tree, and then we have children. The root has a trait, which could be the color of its eyes, whether blue or brown. These traits are passed from father to son, or mother to daughter if you prefer, in a probabilistic manner. We look at the tree down to a depth d, and each node i carries a spin, or trait, σ_i. The rule for transmission of traits in this model is probabilistic: we have a stochastic matrix P, which gives the probability that σ_i = t given that σ_{parent(i)} = s. We assume this is an irreducible matrix, and we denote by ν its stationary distribution, so ν P = ν. I'm almost done specifying the model; the last things to mention are that the root has a spin distributed according to the stationary measure, and that these propagations are done independently of one another.

So we have a description of the process, and the task, informally, is the following. A bit of notation: T_d is the tree down to depth d; I also denote by V_d its vertices, E_d its edges, and L_d the set of depth-d nodes. The question will be: can one infer non-trivially the root spin from observation of the tree down to level d and the traits of the depth-d descendants? It is as if today we looked at the colors of our eyes, we knew the genealogical tree all the way back to Adam and Eve, and we wanted to figure out whether Adam had blue eyes or brown eyes, say.

Is the number of children fixed? Right now I have not said what the tree is; I will be assuming it is a Galton-Watson branching tree. That will be my hypothesis throughout.

So, formally, what is the definition of reconstructibility? We say that reconstructibility holds if the mutual information between the root spin σ_r that we are interested in and the information G_d, which is the tree down to level d together with the spins at depth d, does not vanish as d goes to infinity.
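(To make the model concrete, here is a minimal simulation sketch of this broadcast process. It is not from the talk: the Poisson(α) offspring distribution anticipates a choice introduced a bit later, and the binary symmetric kernel P is just an illustrative assumption.)

```python
import numpy as np

rng = np.random.default_rng(0)

def broadcast_tree(P, nu, alpha, depth, rng):
    """Broadcast process on a Galton-Watson tree with Poisson(alpha)
    offspring: the root spin is drawn from nu, and each child copies
    its parent's spin through the stochastic matrix P.
    Returns the array of spins at each level, from root to depth."""
    q = len(nu)
    levels = [np.array([rng.choice(q, p=nu)])]      # level 0: the root
    for _ in range(depth):
        children = []
        for s in levels[-1]:
            k = rng.poisson(alpha)                   # number of children
            children.extend(rng.choice(q, size=k, p=P[s]))
        levels.append(np.array(children, dtype=int))
    return levels

# Binary symmetric channel: copy the parent's spin, flip it with prob. eps.
eps, alpha = 0.1, 3.0
P = np.array([[1 - eps, eps], [eps, 1 - eps]])
nu = np.array([0.5, 0.5])
levels = broadcast_tree(P, nu, alpha, depth=8, rng=rng)
print("root spin:", levels[0][0])
print("census at depth 8:", np.bincount(levels[-1], minlength=2))
```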
I assume I don't need to recall what mutual information is to this audience, but let me write it anyway; don't hesitate to interrupt if convenient. I(X; Y), the mutual information between two random variables X and Y, is the entropy of X minus the conditional entropy of X given Y. The entropy is H(X) = Σ_x P(X = x) log(1 / P(X = x)), and the conditional entropy is the same with a sum over pairs: H(X | Y) = Σ_{x,y} P(X = x, Y = y) log(1 / P(X = x | Y = y)). The mutual information has the nice property that the variables are independent if and only if it is 0. So this reconstructibility notion captures the fact that we have information correlated with the root spin arbitrarily far down the tree. That is the first task we'll be interested in.

Then we may ask about a harder task, so-called census reconstruction, where I am given not the tree down to depth d, but only the spins of the nodes at depth d; and even less than that: since I don't know the tree, I can only count how many individuals at generation d have such or such trait. This is what we call the census. If we do a census of the population today, we count how many have brown eyes and how many have blue eyes, and we get these counts. Can we, from these counts, infer non-trivially the color of the eyes of Adam or Eve? That is census reconstruction. Formally, census reconstructibility holds if the mutual information between the root spin σ_r and the census at generation d does not vanish as d goes to infinity.

And indeed, we need to specify what the tree is; for us it will be a Galton-Watson branching tree, and the critical parameter will be the average number of children per individual, α. So the parameters of the model are P, the stochastic matrix, and α, the branching number of the tree if you like. Of particular interest will be the case where the number of children of a given node follows a Poisson distribution with parameter α.

Do you assume that the matrix P, the stochastic matrix, is known, or is it unknown? I assume it's known, yes. So we know the parameters of the model, and we try to answer these two questions: when is the tree reconstruction problem feasible, and when is census reconstruction feasible?

OK. Perhaps one thing to mention is that we can introduce the notation ν_{r,s,d} for the probability that σ_r = s given G_d. I did not mention it before, but I will use q for the number of possible traits of each individual, so the spin values run over the integers from 1 to q. Then the mutual information I(σ_r; G_d) is given by the expectation of Σ_s ν_{r,s,d} log(ν_{r,s,d} / ν_s). That is a convenient property, and from it we can see at once when the task is going to be impossible: this goes to 0 as d goes to infinity if and only if the distribution ν_{r,·,d} converges in probability to the deterministic distribution ν.

One more thing to note is that ν_{r,s,d} must converge almost surely to a limit, because it is the conditional expectation of a random variable with respect to some information structure. And actually, we can enlarge this information structure by also revealing the spin values below level d, at levels d+1 and so on, together with the tree below that.
The conditional independence properties we have in this model make this work, so maybe I can write it down. Let H_d be the information contained in the full tree together with the spins σ_{L_{d+j}} for all j ≥ 0. Because of the conditional independence of H_d and σ_r given G_d, ν_{r,s,d} is also the probability that σ_r = s given H_d. And H_d decreases with d, so this must converge almost surely to a limiting distribution: the probability that σ_r = s given what I might call H_∞, the limiting information structure infinitely far down the tree.

Perhaps another remark is that the limits in these definitions must exist, and that follows from a basic inequality of information theory. For instance, σ_r and X_{d+1}, the census at depth d+1, are independent when I condition on X_d: that is a Markov property of the Galton-Watson propagation model. This implies, and this is the basic data processing inequality of information theory, that the information common to X_{d+1} and σ_r must be less than the information common to X_d and σ_r. So we actually have decreasing sequences, and the limits exist; there is no problem with that.

The first result I want to present now is a characterization of when census reconstructibility is feasible and when it is not. It is covered, in greater generality than I will state, in a paper by Elchanan Mossel and Yuval Peres from 2003. To state the result, we need the spectrum of the stochastic matrix P. Since P is stochastic, its largest eigenvalue, ordering them by absolute value, is 1. Let λ2 be the second largest eigenvalue in modulus. The result is as follows: we have census reconstructibility if α λ2², the average number of children times the square of this second eigenvalue, is strictly larger than 1, and we do not have census reconstructibility if it is strictly less than 1. In the equality case, they show for certain types of trees that it is not reconstructible; so one would want to say it fails if α λ2² ≤ 1, but that is not shown in every situation.

I'd like to explain to some extent the arguments in the proof. I'll be much more sketchy further in the talk, but for this one I think it is important, because it will allow us to understand why we keep talking about this Kesten-Stigum threshold and where it comes from; we will really see it appear in trying to prove these results. Is there an eraser somewhere? Oh, there are plenty here. OK, that's a big one.

Is there no such theorem about reconstructibility, I mean the non-census version? I'll talk about it, yes; I'll quote theorems about reconstructibility afterwards, and I will give one implication in one direction. But first I want to convey what this threshold is and how it characterizes census reconstructibility.
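(As an aside, the Kesten-Stigum condition is easy to evaluate numerically. Here is a small sketch, again with the illustrative binary symmetric kernel; this example is my own, not from the talk.)

```python
import numpy as np

def kesten_stigum(P, alpha):
    """Check the Kesten-Stigum condition alpha * lambda_2^2 > 1, where
    lambda_2 is the second-largest eigenvalue of P in absolute value."""
    eigvals = np.linalg.eigvals(P)
    lam2 = sorted(np.abs(eigvals), reverse=True)[1]
    return lam2, alpha * lam2 ** 2 > 1

eps = 0.1
P = np.array([[1 - eps, eps], [eps, 1 - eps]])   # lambda_2 = 1 - 2*eps
for alpha in (1.2, 2.0, 3.0):
    lam2, above = kesten_stigum(P, alpha)
    print(f"alpha={alpha}: alpha*lambda2^2={alpha * lam2**2:.2f}, above KS: {above}")
```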
So the proof goes as follows for the feasibility case. Assume α λ2² > 1. First, take x ∈ R^q, an eigenvector of P associated with this eigenvalue λ2. From the census we can form a random variable Z_d, which is by definition the sum, over spins i at level d, of the entry of this eigenvector x evaluated at the spin of that node. This is a function of the census, because you can also write it as Σ_s x_s X_s(d), where X_s(d) is the number of depth-d individuals with spin s. And we show that this already contains enough information to do non-trivial reconstruction.

How do we do that? We use martingale convergence arguments. To do so, we introduce a proper information structure, a filtration, and the one that is adequate for us is F_d, the information present in the tree down to level d together with all the spins of nodes down to level d, that is, σ_{V_d}, where V_d is the set of nodes of the tree to level d. We will show that Z_d, properly normalized, is a martingale for this filtration, and that it converges to a limit that still carries information about the spin at the root.

So why is it a martingale? We look at the expectation of Z_d conditional on F_{d-1}. Oh, I forgot to normalize: I need a factor (λ2 α)^{-d} here if I really want a martingale; so Z_d is (λ2 α)^{-d} times the sum above. I want to make Z_{d-1} appear, so I write the conditional expectation as a sum over the nodes at level d−1, moving the expectation inside: (λ2 α)^{-d} Σ_{i ∈ L_{d-1}} E[ Σ_{j : parent(j) = i} x(σ_j) | F_{d-1} ]. That is the first part of the calculation. Now, there are α children on average, and conditional on the spin σ_i of their parent, a child's spin is t with probability P_{σ_i, t}. So this reads (λ2 α)^{-d} Σ_{i ∈ L_{d-1}} Σ_t α P_{σ_i, t} x_t. Now we leverage the eigenvector property: Σ_t P_{σ_i, t} x_t is precisely λ2 x_{σ_i}. So I indeed get (λ2 α)^{-(d-1)} Σ_{i ∈ L_{d-1}} x(σ_i), and that is precisely Z_{d-1}. So that is the part that does not depend on the assumption: it is always a martingale, whether we are above or below the Kesten-Stigum threshold.

Now let's leverage the Kesten-Stigum assumption, by showing that the martingale has a bounded second moment. If it has a bounded second moment, it is uniformly integrable, and then a theorem tells us that it converges almost surely, and in L¹, to a limiting random variable. The martingale convergence result is: if sup_d E[Z_d²] is finite, then there exists Z_∞ such that Z_d converges to it almost surely and in L¹. That will suffice for us to prove that Z_d carries information about the root spin for arbitrarily large d. Maybe I'll prove the second moment bound first, and then go back to how we extract information based on this property.

OK, so we look at the variance now, and that is where we see the Kesten-Stigum condition appear. The variance of Z_d can be written as the variance of E[Z_d | F_{d-1}] plus the expectation of the conditional variance of Z_d given F_{d-1}. By the martingale property, the first term is just the variance of Z_{d-1}. (You said d; it is d−1 for the first term. Yes, you're right, sorry.) For the second term, if we look at the Poisson situation, we will have a sum over i ∈ L_{d-1} of terms that are, up to some constant, variances of Poisson random variables with bounded means. So maybe I should not detail this too much.
Let me just say that each node at level d−1 contributes order 1 to the conditional variance: morally, each of them has a number of offspring of type t that is Poisson with parameter α P_{σ_i, t}, and this gets multiplied by x_t; taking the supremum over the entries x_t, everything can be bounded uniformly over i, and we have one O(1) contribution per node at level d−1. Except that I forgot again my (λ2 α)^{-2d} that pops out when I take the variance.

All right, so now we can leverage the assumption, because the expectation of Σ_{i ∈ L_{d-1}} 1 is precisely α^{d-1}. So the variance of Z_d is at most the variance of Z_{d-1} plus, up to a constant, a term (λ2² α)^{-d}: the (λ2 α)^{-2d} factor against the α^{d-1} nodes leaves exactly that. Summing, Var(Z_d) is less than a constant times Σ_k (λ2² α)^{-k}, and this is finite precisely because α λ2² > 1. So the Kesten-Stigum condition implies that we have a uniformly integrable martingale.

Now let's look at the expectation of this martingale conditional on the root spin: it is precisely x_{σ_r}. One thing we need to remark now is that since x is an eigenvector associated with an eigenvalue λ2 distinct from the trivial eigenvalue 1 of this stochastic matrix, the eigenvector cannot be constant, because the all-ones vector is the eigenvector associated with the eigenvalue 1. So x is not constant, and hence the mean of the limiting random variable, conditional on the spin at the root, genuinely depends on that spin; we cannot have independence. I think I can skip detailing that, but this is really what drives reconstructibility proper.

For the other direction, I'll just mention a theorem by Kesten and Stigum that is instrumental in showing that we cannot hope to achieve reconstruction based on the census if we are below the threshold. The theorem says the following: if α λ2² < 1, then taking X_s(d), the number of individuals at depth d with spin s, subtracting its expected value and applying the appropriate normalization, the resulting vector over the whole range of q types converges in distribution, as d goes to infinity, to a Gaussian random variable with a covariance matrix which is independent of σ_r. That is, conditionally on σ_r, the limiting distribution is independent of the root spin value. That is the crux of the proof of non-reconstructibility; you need further steps to deduce the property, but this is really the heart of it, and it was done in 1966 by Kesten and Stigum.

OK. This part I wanted to give in some detail; I'll be much more sketchy from now on.
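(Before moving on, here is a quick numerical sanity check of this martingale, reusing the binary symmetric setup of the earlier sketches; again my own illustration. Above the Kesten-Stigum threshold, E[Z_d | σ_r] equals x_{σ_r} at every depth, so the empirical averages should sit near ±1.)

```python
import numpy as np

rng = np.random.default_rng(1)
eps, alpha, depth = 0.1, 3.0, 8
lam2 = 1 - 2 * eps           # second eigenvalue of the binary symmetric P
x = np.array([1.0, -1.0])    # eigenvector: P @ x = lam2 * x

def z_statistic(root, rng):
    """One broadcast run from a fixed root spin; returns the normalized
    census statistic Z_d = (alpha*lam2)^(-d) * sum_{i at depth d} x[spin_i]."""
    level = np.array([root])
    for _ in range(depth):
        counts = rng.poisson(alpha, size=level.size)   # offspring numbers
        parents = np.repeat(level, counts)             # parent spin per child
        flips = rng.random(parents.size) < eps         # channel noise
        level = np.where(flips, 1 - parents, parents)
        if level.size == 0:                            # extinction
            return 0.0
    return x[level].sum() / (alpha * lam2) ** depth

for root in (0, 1):
    zs = [z_statistic(root, rng) for _ in range(2000)]
    print(f"root={root}: mean Z_d = {np.mean(zs):+.3f} (martingale mean {x[root]:+.1f})")
```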
So that was all about census reconstructibility and the Kesten-Stigum threshold. Now, what about reconstructibility proper?

Well, the information about the root that is captured by the spins at level d and the tree between the root and level d can be summarized in the conditional distribution of the root spin given that information. It turns out that there is an algorithm that allows us to recursively determine this distribution: the so-called belief propagation algorithm, which has been discovered several times and is usually attributed to Judea Pearl in 1982. Basically, we want to determine the distribution of the spin at a particular node given its descendants at level d. So let's draw a tree like this: we have node i here, it has some descendants at level d that I call L_{i,d}, and here is the whole of level d. By applying the conditional independence property plus Bayes' formula, you can really, in a couple of lines, establish a recursive formula for these conditional distributions of a node's spin knowing its descendants at generation d. This is also called the sum-product algorithm: the distribution for node i is given by a product over its children of sums over each child's spin, weighted by that child's distribution conditional on its own descendants at generation d, with an appropriate normalization which involves the matrix P as well as the stationary distribution ν of P. That is something you can propagate upwards, towards the root, and you need to initialize it properly by taking the Dirac mass at the true value of each spin that you do observe at generation d.

So that is an algorithm, and if you compute it, it gives you all the information there is to know about the quantity of interest, which is the spin at the root. But besides being an algorithm, this is also an analytical tool, in particular because from it you can find a recursion for the law η_d of this conditional distribution of the root spin given the information at level d, conditionally on the root spin taking a particular value t. This is a simple fact: you can condition on the value t of the spin at the root, determine its number of children, determine the spins of its children; then, conditionally on that, the messages that come back from level d through the BP recursion, if you go down to level d+1, will correspond exactly to messages built from the information at distance d from each child. So, in other words, for this η_d we have a recursion, called density evolution, which is as follows.
η_{d+1} is characterized as the law of the output of the BP recursion when the inputs plugged into that recursion are distributed according to η_d. OK, so it is in this sense that belief propagation is not just an algorithm but also an analytical tool: once we are equipped with this density evolution equation, we can deduce things about the feasibility or not of the reconstruction task.

Something I've already mentioned is that non-reconstructibility amounts to this conditional law of the root spin converging to the Dirac mass at the unconditional distribution. And something can be deduced from this density evolution equation: if there is a unique fixed point of the density evolution equation, it has to be the trivial one. Since the conditional distribution of the root spin given the information at level d has a limit, and this limit must be a fixed point, if there is only one fixed point we know for sure that reconstructibility is doomed. In a paper from 2006, Mézard and Montanari claimed this, but also showed a converse: at least when the spin distribution at equilibrium is uniform, non-uniqueness implies reconstructibility. So if you have non-uniqueness, you can craft signals from the spins at level d, propagate them upwards, and you will get something non-trivial, carrying non-vanishing information about the spin of the root.

A further use of this density evolution equation was made by Allan Sly in 2009, who used it to pin down regimes where tree reconstruction succeeds or fails. And from this kind of result we can deduce that if tree reconstruction fails, then block reconstruction in the associated community detection problem fails as well.

But is the converse true as well? It seems that since locally your graph has the same structure as the tree, if you can reconstruct on the tree then you should be able to reconstruct on the graph. Well, on the tree you get to see the spins at a faraway distance, and in the graph you don't; you are stuck at generation log n. It is not a matter of cycles: if you can reconstruct on the tree, that means that looking at the shape of the tree and at the spins at depth d, you have some information about the root; and since locally, if you explore not too far, say up to distance of order log n, the graph looks like a tree, you might think that if tree reconstruction works, you should be able to tell something about the spins in the graph. Not necessarily, not necessarily. In the graph you don't observe any spins: that is the difference. The tree problem gives you the spins at the bottom; in the community detection problem you want to cluster nodes, but you just see the graph, no labels anywhere. It is tempting to say that if tree reconstruction holds then one can surely reconstruct on the graph, but I think not; we have the experts in the room to give us more insight into that.

OK, so we have a sufficient condition for impossibility of the task: we can just look at tree reconstruction, and if it fails, community detection fails as well.
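(Since the talk only sketches the sum-product recursion on the board, here is a minimal upward-pass implementation, under the same illustrative binary symmetric assumptions as before; the tree encoding, a children dictionary with the root labelled 0, is of course my own.)

```python
import numpy as np

def bp_root_posterior(children, leaf_spins, P, nu):
    """Sum-product (belief propagation) upward pass on a fixed tree.
    children: dict node -> list of children; node 0 is the root.
    leaf_spins: dict leaf -> observed spin at the deepest generation.
    Returns the posterior distribution of the root spin given the leaves."""
    q = len(nu)

    def message(v):
        # message(v)[s] is proportional to P(observed leaves below v | sigma_v = s)
        if v in leaf_spins:                 # observed node: Dirac mass
            m = np.zeros(q)
            m[leaf_spins[v]] = 1.0
            return m
        m = np.ones(q)
        for c in children.get(v, []):
            m *= P @ message(c)             # sum out the child's spin
        return m / m.sum()                  # normalize for stability

    post = nu * message(0)                  # Bayes: prior times likelihood
    return post / post.sum()

# Root 0 with children 1 and 2; node 1 has leaves 3,4; node 2 has leaf 5.
eps = 0.1
P = np.array([[1 - eps, eps], [eps, 1 - eps]])
nu = np.array([0.5, 0.5])
children = {0: [1, 2], 1: [3, 4], 2: [5]}
print(bp_root_posterior(children, {3: 0, 4: 0, 5: 1}, P, nu))
```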
Let's see now that there is indeed a region where we have reconstructibility while still being below the Kesten-Stigum threshold. That is an argument made by Banks et al. in a paper from 2016, for the symmetric stochastic block model, which is, if you like, the model corresponding to the symmetric Potts model. The way they parametrize it is to take, as the q-by-q connectivity matrix, two parameters: c_in on the diagonal and c_out off the diagonal. It is fairly straightforward to determine the Kesten-Stigum condition for this: the mean number of neighbors is (c_in + (q−1) c_out)/q, and the exact form of the condition is simple algebra; never mind the details. What they show is that for this model, for q larger than or equal to 4, we can have reconstruction in the stronger of the two senses I've proposed, strictly positive overlap, while being below the Kesten-Stigum threshold.

Let me try to explain briefly the idea of the argument. You can consider σ̂, an assignment of labels to nodes that creates a balanced partition, meaning the number of nodes i with label s is n/q for every s. Then we can consider good partitions: these are balanced and, moreover, they put roughly the correct number of edges crossing the boundaries between blocks and the correct number internal to the blocks. So I can write m_in(σ̂) for the number of edges internal to the blocks of σ̂, and m_out(σ̂) for the number of edges crossing between blocks.

OK, so maybe it's a good time to take a five-minute break. ... How does it work now? Do you remember where we were? Yes. So we are in this stochastic block model world: we have a graph, and we want to show that there is a hard phase where you can non-trivially reconstruct the underlying blocks while still being below this Kesten-Stigum threshold. The Kesten-Stigum threshold, as we will see, is a good candidate for a computational threshold, and the point here is to show that you can do something non-trivial even below it, so that the white region in the first diagram I showed is not empty. Banks et al. consider the symmetric stochastic block model and prove that this white region indeed exists. The gist of their argument is as follows. You take a candidate partition of the nodes into blocks of equal size, a balanced partition, and you insist that the number of edges crossing the blocks you drew, and the number of edges internal to those blocks, be close to what they would be on average under the true partition; if you know the parameters of the model, you can compute those expected numbers. So a good partition is one for which these counts do not deviate too much from what you would expect under the true partition, and they have precise conditions which enforce exactly that.

Given these conditions, you can take exponential time, go over all balanced partitions, check whether the conditions hold, and stop as soon as one satisfies the criterion. What they show is that, for some parameter ranges, any balanced partition that satisfies these conditions must have an overlap with the truth that is lower-bounded away from zero.
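(As an aside, since the exact Kesten-Stigum condition in this c_in/c_out parametrization was waved away above, here is the standard form of it as a one-liner; the specific parameter values are my own illustration.)

```python
def ks_condition_sbm(c_in, c_out, q):
    """Kesten-Stigum condition for the symmetric q-block SBM.
    The mean progeny matrix has top eigenvalue alpha = (c_in + (q-1)c_out)/q
    (the mean degree) and second eigenvalue (c_in - c_out)/q; the condition
    reads lambda_2^2 > lambda_1."""
    lam1 = (c_in + (q - 1) * c_out) / q
    lam2 = (c_in - c_out) / q
    return lam2 ** 2 > lam1

# q = 2, mean degree 3, increasing separation between the blocks:
for c_in, c_out in [(4.0, 2.0), (5.0, 1.0), (5.5, 0.5)]:
    print(f"c_in={c_in}, c_out={c_out}: above KS = {ks_condition_sbm(c_in, c_out, 2)}")
```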
The way they show that is by applying the first moment method: they look at the expectation of the number of balanced partitions that satisfy those conditions and yet have an overlap below a particular value, and by playing with combinatorics they show this expectation must go to zero as n goes to infinity. So that is the idea behind their argument: using the first moment method and the proper notion of a good partition, they reduce the problem to a combinatorial one. So the white region exists, which is something that was expected from Sly's 2009 result on the tree, namely that there is an intermediate region.

Can we have a diagram where we superimpose the tree regions with the regions we know for graphs? OK, let me try. So the axes were q and c_in, say, and the picture is for sparse graphs; we don't know everything for sure. Here is the KS line: above it we know the problem is easy; that is what I am going to tell you next, that for graphs you can succeed in polynomial time; and census reconstruction on the tree holds in this same region. Here, below, is the tree reconstruction threshold; the two transitions become distinct from q equal to 4 on, that is where they start diverging. So that is the tree diagram, and for graphs you would superimpose something like this: feasible but hard for graphs in a narrower white range, and impossible below. But we don't expect the tree and graph boundaries to be equal to one another.

All right, so let's move on to what happens above the Kesten-Stigum threshold; the point is to see what can be done there. There is a very inspiring conjecture made in a 2011 paper by Lenka Zdeborová and co-authors, Decelle et al., saying: if you are in this region, you can consider belief propagation, which, taking inspiration from the tree situation, is the best thing there is; you just need to initialize it correctly. Their conjecture was that if you initialize BP with random distributions and let it run, then it converges, and the limiting distributions of belief propagation are such that you have non-vanishing mutual information between the true spins and the fixed points you converge to. And numerically, it just seems to be true. Note the structure here: if you initialize with the uniform distribution in the symmetric case, you stay at the uniform distribution and nothing happens; if you perturb slightly, then some symmetry breaking takes place. Reconstruction is up to a global permutation of the labels, so one permutation wins, and you end up with a fixed point that tends to predict correctly according to this permuted mapping of the labels. That is their prediction, and it is really consistent with the numerical evaluations, but it is still standing, I mean, still open: it is very hard to characterize sharply the dynamics of belief propagation in those models. So it is strongly believed, but not proved, let's say.

And here is a picture trying to convey some intuition for what is going on. Take Q, the posterior distribution of the spin vector given the graph I observe, and let x be a candidate overlap value.
I might consider some kind of large deviations functional of this overlap: say, −(1/n) log Q(spin assignments whose overlap is x). The intuition for these transitions could then be as follows. If you are in the non-reconstructible phase, the large deviations functional looks like this: the bulk of the posterior distribution is on uncorrelated spin assignments, so you don't have information, and there is nothing you can do. In the hard phase, the posterior distribution has a non-vanishing mass on correlated spin assignments, so there is useful signal: if you could sample from the posterior distribution, you would get something useful. But the large deviations functional has this shape, and belief propagation is trying to move downhill along that curve; so if the zero-overlap assignments are stable under those dynamics, you get stuck, and we don't know how to initialize BP any better. So the hard phase is where you get stuck even though there is information in the posterior distribution. And in the easy phase, the trivial fixed point is unstable: you perturb it slightly and you flow to a good set of configurations. But, as I said, this is not proven yet.

So it led Lenka and co-authors to try and propose something else. But before I move to that, let me say that the next idea that comes to mind is to take off-the-shelf methods, and spectral methods are very popular in this context. What that might mean is: take the adjacency matrix, extract the eigenvectors associated with the leading eigenvalues, and then try to use the coordinates of the corresponding eigenvectors at a given node as information about the underlying spin. It turns out this fails in those random graph models when the average degree is of order one, because the spectrum tends to flatten while extending away from the bulk, and you get large eigenvalues that tend to be triggered by high-degree nodes. In sparse random graphs the degrees tend to be distributed as Poisson random variables, and when the mean is of order one you have fluctuations, no concentration around the mean: you typically have degrees of order log n over log log n, and these induce eigenvalues that are like the square root of that. The corresponding eigenvector is a bit like the eigenvector of a star graph centered at that point: a high value at this vertex, small values at the neighboring sites, and even smaller further away. And these are not correlated with the communities.

Hence the proposal by Lenka et al., for which they have a very nice name: they call it spectral redemption, because classical spectral methods need to be salvaged. They propose to stick to BP but linearize it: you take the BP iteration and linearize it around the trivial fixed point. You can write your messages passed from node i to node j as the trivial fixed point distribution ν times one plus some perturbation, and you just do an expansion. You then find a very nice structure for the linearized BP iteration: a tensor product between a matrix indexed by the oriented edges of the graph (BP works on oriented edges, passing distributions along each oriented edge) and a matrix of size q by q, which is precisely the matrix P that you have in the tree reconstruction problem.
So what is this matrix indexed by oriented edges? It is the non-backtracking matrix B, which puts a one on the entry indexed by (u→v, x→y) if u→v feeds into x→y, that is, v equals x, but x→y does not backtrack on u→v, that is, y is not equal to u. OK, so that is the picture here. Their spectral redemption conjecture was: if we cannot prove that BP works, then maybe we can prove that a spectral method based on the non-backtracking matrix does work. One more word about this non-backtracking matrix: it is asymmetric, so we are no longer in the realm of Hermitian matrices, and it is a bit nastier to analyze. And it encodes, in a very succinct manner, the non-backtracking walks of the graph G: if you take the k-th power of B and look at the entry (e, f), what you find is the number of non-backtracking paths of k+1 edges starting with edge e and ending with edge f.

All right, so what I want to describe next is a result we had with Charles Bordenave and Marc Lelarge, where we indeed confirmed the prediction of this spectral redemption: above the Kesten-Stigum threshold, you can do correlated reconstruction in this stochastic block model in polynomial time, by processing the eigenvectors of this non-backtracking matrix. The slide is a bit heavy to parse, but let's try. What it says, in a nutshell, is the following. Consider what I call the mean progeny matrix M, which is α times P; I think I introduced it on an earlier slide, but I might not have mentioned it. M has q eigenvalues λ_i(M), and some of them will be found almost identically in the spectrum of the non-backtracking matrix, while the others will be lost in a bulk. Those that stand out, present essentially unchanged in the spectrum of B, are precisely those that satisfy the Kesten-Stigum condition, that is, those for which λ_i(M)² is larger than α, which is just λ1(M). So r0 is the largest index for which an eigenvalue λ_i(M) satisfies the Kesten-Stigum condition: the eigenvalues up to index r0 are present in the spectrum of B, whereas all the other eigenvalues of B are confined to a bulk, which is a disk of radius √α.

We also have results on the corresponding eigenvectors, which I will not discuss too much: when an eigenvalue stands out of the bulk and is simple, we can characterize the corresponding eigenvector as being nearly parallel to an eigenvector that we can construct explicitly. This is what we leverage to prove that a spectral method works: take the matrix B, extract the eigenvalues and the corresponding eigenvectors, and pick the right one. Such an eigenvector is indexed by oriented edges, so we may project it down to a vector indexed by nodes; that is what this projection does: for node u, I sum up the entries over the edges pointing out of u, and then I get my variable for node u, and this carries some information about the underlying spin. So that is how one shows that the spectrum of B allows correlated reconstruction in polynomial time. I don't think I want to parse the precise definition of the candidate eigenvector, unless someone insists.
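(To make the pipeline concrete, here is a compact sketch of the whole method on a two-block SBM: sample a graph, build B densely, which is fine at toy sizes although in practice one would use sparse representations or the n-by-n reduction mentioned at the end of the talk, take the eigenvector of the second real eigenvalue, and project it down to nodes. All sizes and parameters are illustrative.)

```python
import numpy as np

rng = np.random.default_rng(2)

def sbm(n, c_in, c_out, rng):
    """Sample a symmetric 2-block SBM; returns adjacency matrix and labels."""
    labels = rng.integers(0, 2, size=n)
    probs = np.where(labels[:, None] == labels[None, :], c_in / n, c_out / n)
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    return (upper | upper.T), labels

def nb_matrix(A):
    """Non-backtracking matrix: B[(u,v),(x,y)] = 1 iff v == x and y != u."""
    edges = [(int(u), int(v)) for u, v in zip(*np.nonzero(A))]
    idx = {e: k for k, e in enumerate(edges)}
    B = np.zeros((len(edges), len(edges)))
    for (u, v), k in idx.items():
        for y in np.nonzero(A[v])[0]:
            if y != u:
                B[k, idx[(v, int(y))]] = 1.0
    return B, edges

A, labels = sbm(400, 5.0, 1.0, rng)     # above KS: ((5-1)/2)^2 = 4 > 3
B, edges = nb_matrix(A)
vals, vecs = np.linalg.eig(B)
order = np.argsort(-vals.real)          # Perron eigenvalue first, then the outlier
xi = vecs[:, order[1]].real             # eigenvector of the second real eigenvalue
sigma = np.zeros(A.shape[0])
for k, (u, v) in enumerate(edges):
    sigma[u] += xi[k]                   # project edge entries down to node u
acc = max(np.mean((sigma > 0) == labels), np.mean((sigma <= 0) == labels))
print("fraction of correctly classified nodes:", acc)
```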
All right. So the output of this theorem is that you have a good way to construct this σ̂, a good choice of σ̂? Yes: you take B, extract the spectrum, pick an eigenvector, project it down to a vector in R^n, and then you get a scalar for each node, and this scalar is positively correlated with, carries information about, the spin.

But how do you choose the eigenvalue, how do you choose the index i? If we are still in the game where we know the parameters, so we are given the matrix M, that is P and α, then we can just look them up and say: there is an eigenvalue satisfying the Kesten-Stigum condition, so I know I should be able to find a corresponding eigenvalue in the spectrum of B. But maybe I'll show you this picture, which is perhaps the better answer. Here is an instance of this stochastic block model, and this is the spectrum of the non-backtracking matrix, drawn in the complex plane because the matrix is not symmetric. You have the Perron-Frobenius eigenvalue, which is real and positive, and you may draw a circle of radius the square root of this Perron-Frobenius eigenvalue: that is what you would call the bulk of the spectrum. If you find eigenvalues that are close to the real axis and that lie in between the Perron-Frobenius eigenvalue and the bulk, these are the right ones for your reconstruction. So that is something you can do from a purely data-driven perspective; and if you have the model parameters, you can even predict that you will find them there.

If you have more than two eigenvalues outside the circle, outside the bulk, can you choose any of them, or do you choose the first or the largest one? If you are just in the game of showing that I can extract signal, not insisting on catching as much signal as possible, then take any of them. Certainly you could do better by taking all of those outside the bulk; we don't quite know how we should best combine them if we wanted to, but you need only one to prove that reconstruction is possible.

Maybe I should just say a few words about this eigenvector construction after all. Start by taking y in R^E, indexed by the oriented edges, such that y_{u→v} = x_i(σ_u), where x_i is the eigenvector of M associated with the eigenvalue λ_i(M). Now if I look at (B^T)^l y on edge u→v, for some power l, then, my neighborhood being tree-like, I will eventually have a summation of x_i(σ_j) over the nodes j at distance l. And so, invoking the coupling of the graph neighborhood with a tree, I can relate this to the martingales I was using for census reconstruction: on the tree, after the proper renormalization by (α λ_i)^{-l}, this converges as l grows to a limiting random variable Z_∞(u→v).
Now you can get a feel for why this is a good candidate eigenvector: the entries of (B^T)^l y behave like (α λ_i)^l times the limiting values, and if I replace them by those limits, applying B^T once more gives again something like (α λ_i)^{l+1} Z_∞(u→v). So, approximately, the martingale convergence theorem gives me a kind of eigenvalue-eigenvector equation, where the eigenvalue is α λ_i. Here I have been using the notation from census reconstruction, so that is α λ_i(P), which is just λ_i(M). So that is one part of the intuition, but you need to work quite a bit to really show that the eigenvectors can be constructed, and in fact this alone does not suffice: the construction of the candidate eigenvectors is a bit more complex, we need to apply (B^T)^l but then also B^l, for technical reasons. So we have the result that in polynomial time we can achieve reconstruction above the Kesten-Stigum threshold.

One thing I wanted to tell you about is the relationship between those results on the non-backtracking spectrum and results in the random matrix theory literature. There is a line of work about characterizing the spectra of deformed large random matrices. The canonical model of a random matrix is the Wigner random matrix: a Hermitian matrix whose entries are independent up to the symmetry, and Gaussian. So W_ij is a Gaussian with variance σ²/n for i < j, and on the diagonal you double the variance, let's say. That is the prototypical random matrix, and it has been studied a lot since the 1950s: Wigner showed that for fixed σ, as n goes to infinity, the spectral measure, the empirical distribution of the eigenvalues, converges to the semicircle law, whose density is given here.

Baik, Ben Arous and Péché characterized the impact on the spectrum of such a Wigner matrix of adding to it a low-rank matrix, independent of the Wigner matrix W; hence the name BBP, the Baik-Ben Arous-Péché phase transition. Specifically, consider an n-by-n symmetric matrix P_n with fixed rank q and fixed spectrum, the quantities scaled to be of order one. Similarly to the notation I was using when discussing the Kesten-Stigum threshold, I'll introduce r0, the largest i such that the square of the eigenvalue λ_i(P_n) exceeds the variance parameter σ² of the Wigner matrix. Here is a version I took from Benaych-Georges and Nadakuditi, but this is really the Baik-Ben Arous-Péché transition: if i is at most r0, so λ_i(P_n) satisfies what you might call the BBP criterion, then the i-th largest eigenvalue of W plus the perturbation P_n is outside the bulk, so it is larger in absolute value than 2σ, and it is precisely given by λ_i(P_n) + σ²/λ_i(P_n); whereas for i larger than r0, the eigenvalue is lost in the bulk and you cannot see it. And in fact there is a very strong parallel between this situation and the one we have in the non-backtracking spectrum of stochastic block models, so I'll try to convey that now.
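(Before the dictionary between the two settings, here is a quick numerical illustration of the BBP formula in the rank-one case, my own example: with θ > σ, the top eigenvalue of the deformed matrix detaches from the bulk edge 2σ and sits at θ + σ²/θ.)

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma, theta = 2000, 1.0, 2.0        # theta > sigma: above the BBP threshold

# Wigner matrix: symmetric Gaussian, variance sigma^2/n off the diagonal.
G = rng.normal(scale=sigma / np.sqrt(n), size=(n, n))
W = (G + G.T) / np.sqrt(2)

# Rank-one deformation with eigenvalue theta.
u = rng.normal(size=n)
u /= np.linalg.norm(u)
top = np.linalg.eigvalsh(W + theta * np.outer(u, u))[-1]

print(f"largest eigenvalue         : {top:.3f}")
print(f"BBP: theta + sigma^2/theta : {theta + sigma ** 2 / theta:.3f}")
print(f"bulk edge 2*sigma          : {2 * sigma:.3f}")
```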
So first, let's try to read off from the stochastic block model the variance parameter that plays the role of the Wigner σ². Recall that we view the adjacency matrix as a signal matrix with block structure, which is the expectation of the matrix conditional on the spins, plus a noise matrix added to it. σ² in BBP is the sum of the variances of the entries along a row or column. In our case we would sum, over v, terms R_{σ_u σ_v}/n times (1 − R_{σ_u σ_v}/n), because this is the variance of a Bernoulli random variable; ignoring the lower-order terms, we get the sum over types t of R_{σ_u t}/q, and if you check the model parameters, this just gives the parameter α. Another way to see this is that in those random graph models the node degrees are asymptotically distributed like a Poisson random variable, which has variance α; the variance of the sum of the entries along a row is asymptotically the sum of the variances, so it just works out.

Then the signal matrix corresponds to the block matrix I was writing down, and its spectrum is asymptotic to the spectrum of the mean progeny matrix M that we saw before. So on the one hand the Kesten-Stigum condition says an eigenvalue λ_i(M) will be seen in the non-backtracking spectrum if its square is larger than α, and on the other hand BBP says an eigenvalue λ_i of the deformation matrix will be visible if its square is larger than σ². By the identification σ² = α, we find that this is one and the same criterion for being visible in the spectrum of a suitable matrix.

So let's try to push this parallel a bit further, because we have the same criterion for eigenvalues being visible, but in one case we see them in the non-backtracking spectrum and in the other case we see them directly in the spectrum of the matrix itself; the difference being that in the SBM we look at sparse situations, sparse random graphs, whereas with Wigner matrices we are in a different, non-sparse regime. For that I will quote a formula, the so-called Ihara-Bass formula, which relates the determinant of I − zB to the determinant of an n-by-n matrix involving the adjacency matrix A of the graph and the diagonal matrix D carrying the node degrees: det(I − zB) = (1 − z²)^{|E|−n} det(I − zA + z²(D − I)). This has been known for decades now, and it already implies that an eigenvalue λ of B which is not 1, −1 or 0 is characterized as follows: λ is such an eigenvalue if and only if det(λ² I − λA + D − I) = 0.

Now we can process that a bit. If we assume concentration of the degrees in our random graph models, letting the parameter α go to infinity sufficiently fast that the relative deviations of the degrees from their means vanish, then any eigenvalue λ in the spectrum of the non-backtracking matrix B that is small compared to α is necessarily such that λ + α/λ, up to a tiny perturbation, is in the spectrum of the adjacency matrix. The way to parse that is: if you let your average degree increase, moving from sparse models to non-sparse ones, then you expect the eigenvalues of the non-backtracking matrix to show up in the spectrum of the adjacency matrix modulo the same kind of transformation, λ ↦ λ + α/λ, that is present in the Baik-Ben Arous-Péché transition. So that makes the parallel even stronger.
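(The Ihara-Bass identity is easy to check numerically; here is a small sketch on a random graph, reusing the dense non-backtracking construction from the earlier code. Toy sizes, my own illustration.)

```python
import numpy as np

rng = np.random.default_rng(4)

# A small random simple graph.
n = 8
A = np.triu((rng.random((n, n)) < 0.4).astype(float), k=1)
A = A + A.T
D = np.diag(A.sum(axis=1))
m = int(A.sum()) // 2                        # number of unoriented edges

# Non-backtracking matrix on the 2m oriented edges (same definition as before).
edges = [(int(u), int(v)) for u, v in zip(*np.nonzero(A))]
idx = {e: k for k, e in enumerate(edges)}
B = np.zeros((2 * m, 2 * m))
for (u, v), k in idx.items():
    for y in np.nonzero(A[v])[0]:
        if y != u:
            B[k, idx[(v, int(y))]] = 1.0

# Ihara-Bass: det(I - zB) = (1 - z^2)^(m - n) * det(I - zA + z^2 (D - I)).
z = 0.3
lhs = np.linalg.det(np.eye(2 * m) - z * B)
rhs = (1 - z ** 2) ** (m - n) * np.linalg.det(np.eye(n) - z * A + z ** 2 * (D - np.eye(n)))
print(lhs, rhs)                              # should agree up to numerical error
```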
And perhaps another point to take from this calculation is that all the hassle of constructing the non-backtracking matrix and extracting its spectrum is needed if you are after sharp thresholds in the sparse case, whereas if your models are dense enough, you may just go with the adjacency matrix and stick with classical spectral methods.

I was planning to conclude around now; if it's too early I can say more things, but if not, let's conclude. There are many exciting developments that I have not covered. For instance, I was just talking about denser models, going beyond the sparse case; in that case there are very powerful tools that have been developed, in particular by Andrea: the so-called approximate message passing set of techniques, which is really the tool of choice now for analyzing non-sparse models. If we wanted a version of Baik-Ben Arous-Péché in non-sparse inference settings, this would probably be a very promising approach. I have talked a lot about the Kesten-Stigum transition, and then about the information-theoretic transition for reconstruction; there are finer phenomena you can look at, and if you are interested you can go and check a very recent paper by Lenka with Ricci-Tersenghi and Semerjian, where they distinguish cases where belief propagation is non-trivial and yet not optimal; so there are even finer categorizations that you can draw. One thing I find very interesting is a better understanding of the hard phase: to me it is really not understood as of now. The landscape of BP, for instance: the magnitude of the basins of attraction, what the fixed points are, and so on. I think there is a very rich geometry here that is worth understanding, and we can also try to understand such energy landscapes for other dynamics, not just BP. That is a very exciting direction, perhaps a way to try to understand algorithmic hardness in those problems. More generally, I would say that statistical physics provides us with a very rich perspective on this computational complexity problem: by looking at those large instances we see phenomena that we would not see otherwise; we start distinguishing between different phases, identifying cases where a very simple algorithm works and situations where it starts to fail. All of this is really a contribution of this perspective on computational complexity. And with this I will stop. Thank you.

So, are there any questions? Maybe I'm wrong, but when you are sparse, you take your non-backtracking matrix and you look at powers of it of some order; can you still do that in polynomial time? No, no: we look at powers only to characterize the eigenvectors in the analysis; we prove that the eigenvector carries signal that way, but you don't need to take powers to run the method. I think that is the appeal of this spectral method: you just do the spectral analysis and pick the eigenvector, you don't need to power the matrix. We had looked earlier at other spectral methods where, instead of considering the non-backtracking matrix, you construct matrices counting self-avoiding paths, and we have done some work more recently with Ludovic on counting graph distances. These take a bit longer to construct;
they have other advantages, but the beauty of the non-backtracking matrix is that you use it off the shelf: you extract the spectrum, pick the eigenvectors, and you have an algorithm.

Are you able to describe precisely the eigenvectors of the outliers? Yes. What is the result? It is as follows: the eigenvectors are indexed by oriented edges, so you pick an edge u→v, with spins σ_u and σ_v attached to its endpoints; conditional on these two spins, the entry of the eigenvector is distributed as the limit of a martingale that we can construct on the associated tree model, with an initialization dictated by those spins. This is something you can write down. So it is deterministic in the end? It is not deterministic: there are fluctuations, but we know the means exactly, and we can also compute the variances. These laws are typically complicated: martingale limits have distributions that are not easy to describe, but the mean and the variance we can compute easily. So it is as if you had, conditionally on the spins, independent samples of martingale limits with the proper mean and variance. Is there at least a simple description of the distribution? I don't know; I mean, we can write a fixed-point equation for the characteristic function, for instance, that is something we can do, and if we are interested in estimating quantities we can do that numerically, but I don't think we can say much more in general. Normally, in the limit of large degrees, you can say more; for instance, the Gaussian limit theorem from the beginning of the talk arises in that kind of regime. Yes: as the sparsity lessens and the degrees get higher and higher, you have averaging phenomena, central limit behavior starts to kick in, and you have more explicit characterizations, more and more deterministic, more and more concentrated. But when we stick to a signal-to-noise ratio of order one, the second eigenvalue squared divided by the first eigenvalue equal to 1.2, say, you cannot hope to be perfect at reconstruction; this ratio is really the right notion of signal-to-noise ratio in this sparse reconstruction problem. If it becomes very large, then indeed you can hope to be asymptotically exact.

Any other question? I have a question concerning the characterization of this phase transition: in statistical mechanics, in general, you don't just want to know where the transition is, you are also interested in what happens when you come close to it. Do people have a heuristic for that behavior? I don't have an idea. For instance, for the reconstructibility transition, we know from very recent work exactly where it is in some cases, but in general we are not sure where it is; I think there is a conjecture for where it should be, which has been established rigorously in a 2017 paper for disassortative symmetric stochastic block models. But I would not venture further. From this universality, statistical physics point of view, these are phase transitions similar to those in mean-field systems in physics, but people have not looked so much into the critical behaviors. Not many; so maybe
these are not the most interesting questions in this context. Well, if the transition is between, let's say, a fast algorithm and a slow algorithm, maybe you would like to know, when you approach the transition from the fast side, how fast it stays; I mean, if the convergence is polynomial, for instance, how does the polynomial rate change as you approach the transition? In dense systems we have really simple equations that tell you about the convergence time of the algorithm; it is just a scalar fixed-point equation, so there you could analyze all of this.

I have another question, about the spectrum of the matrix B. If you are given the graph, do you have to compute the spectrum exactly, by some numerical algorithm, and then look at the first outliers to extract the statistical information? My question is: do you need to compute it by some standard algorithm, or can you use this (B^T)^l B^l method to have at least an approximation? I guess yes, I guess you could get a good starting point from this; maybe you would need to use Lanczos iterations, but you could initialize them in a smart manner, so you would not have to compute the full spectrum, because that can be a lot of work. Another remark about computing the spectrum: you know of this correspondence between the spectrum of B and the Ihara-Bass formula; in the paper on spectral redemption they also leverage that to show that you can really work with n-by-n matrices, so you can cut down the complexity.

No more questions? So let's thank the speaker.