Okay, shall we? So, as I was saying, the result I stated on census reconstructability holds if you are above this Kesten-Stigum threshold, and here is a kind of converse. Assume a specific law for the number of children in your branching process, say Poisson(α) for the offspring distribution, and assume you are below the Kesten-Stigum threshold, so α |λ₂(P)|² < 1. Then the mutual information between the spin at the root and the census vector vanishes. I stated this for Poisson offspring distributions, but it holds more generally, for quite a large family of offspring distributions.

So why is it referred to as the Kesten-Stigum threshold? Because it derives from results established by Harry Kesten and his collaborator Stigum in the 60s, in a paper entitled "Additional limit theorems for indecomposable multidimensional Galton-Watson processes". What they were looking at was multidimensional branching processes, which is really what we have here: we are looking at a slightly more general family of multi-type branching processes. Think of a genealogical tree with q species, each species corresponding to a trait; so we are already dealing with a multidimensional branching process.

There is a kind of law of large numbers for those multidimensional branching processes: if the process survives, the census vector tends to blow up exponentially fast. If you normalize the number of individuals with trait s at generation D by α^{-D}, the proportions tend to ν_s, the stationary probability of trait s under your stochastic matrix P.

Kesten and Stigum looked at deviations from this law-of-large-numbers behavior: take your census vector, subtract the value predicted by the law of large numbers, and scale down by a standard deviation parameter, which at generation D turns out to be α^{D/2}. It turns out that there is no universal behavior for the limit of these rescaled random vectors: there is a dichotomy, and it is predicted exactly by the Kesten-Stigum threshold, by whether α λ₂² is above or below one. When it is below one, the distribution of the rescaled vector converges to a multidimensional Gaussian that does not depend on the spin at the root node. So consider the branching process conditioned on the value of the spin at the root: that gives a limit in distribution, and below the Kesten-Stigum threshold this limit does not depend on the spin at the root. Okay, so there is a limiting distribution that is Gaussian and that does not depend on the spin at the root node.
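To make this concrete, here is a minimal simulation sketch of the multi-type branching process and of the Kesten-Stigum quantity α λ₂². The Poisson(α) offspring law is from the lecture; the two-type matrix P and all numerical values are illustrative choices, not the lecturer's:

```python
import numpy as np

rng = np.random.default_rng(0)

def census_vector(root_spin, alpha, P, depth):
    """Census (count of individuals of each spin) at the given depth of a
    multi-type branching process: each individual has Poisson(alpha)
    children, and a child of a type-s parent gets spin t w.p. P[s, t]."""
    q = P.shape[0]
    counts = np.zeros(q, dtype=np.int64)
    counts[root_spin] = 1
    for _ in range(depth):
        new = np.zeros(q, dtype=np.int64)
        for s in range(q):
            # Total offspring of all type-s parents is Poisson(alpha * counts[s]);
            # by Poisson thinning, draw the total and split it according to P[s].
            k = rng.poisson(alpha * counts[s])
            if k > 0:
                new += rng.multinomial(k, P[s])
        counts = new
    return counts

# Symmetric two-type chain: lambda_2(P) = 1 - 2*eps, stationary nu = (1/2, 1/2).
eps, alpha, depth = 0.1, 3.0, 10
P = np.array([[1 - eps, eps], [eps, 1 - eps]])
lam2 = 1 - 2 * eps
print("alpha * lambda_2^2 =", alpha * lam2**2)   # > 1 here: above Kesten-Stigum

x = census_vector(0, alpha, P, depth)
print("type proportions:", x / max(x.sum(), 1))  # law of large numbers: -> nu
# Kesten-Stigum-type rescaling of the deviation from the LLN prediction:
print("rescaled deviation:", (x - x.sum() * 0.5) / alpha ** (depth / 2))
```

Above the threshold, the deviation at this scale retains a root-dependent component (it blows up along the second eigenvector of P); below it, rerunning with either root spin gives the same Gaussian-looking limit.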
We can strengthen that quite a bit. Convergence in distribution to a limiting distribution is equivalent, according to the theory of weak convergence in probability, to convergence in the Lévy-Prokhorov metric. And there is a theorem due to Strassen which tells you that if two sequences of random variables converge in distribution to the same limit, then you can construct a coupling of them such that with high probability they are close.

So we can translate this convergence-in-distribution result into a coupling result: if I condition on the value of the spin at the root, whether it is τ or τ', and look at my census vectors at depth D conditioned on τ or on τ', I can build a coupling such that the probability that they differ by more than ε α^{D/2} goes to zero, for any ε. And I can bootstrap on that: taking this coupling at generation D, from the coupled census vectors I construct the census vectors at generation D+1 in a coupled manner. Here the proof is simpler if you assume a Poisson offspring distribution, because conditionally on the census at generation D, the number of individuals of a given spin at generation D+1 is a Poisson random variable, and for distinct spins these numbers are independent; those are nice properties of the Poisson distribution. This allows you to show a stronger result at generation D+1, namely a coupling of the census vectors at generation D+1 under which, with high probability, the two are equal, not just close. Such a coupling establishes that the total variation distance between the two distributions goes to zero. I guess most of you have heard about the total variation distance between random variables; in any case, I will give a pointer to lecture notes if you want to fill in the details I am going over quickly.

And that is what gives you the result, because a simple derivation shows that, in this particular case, the mutual information between the spin at the root and the census vector is upper bounded by a constant times the sum, over distinct values τ, τ' of the root spin, of the total variation distances between the distributions of the corresponding census vectors at generation D. So if these go to zero, the mutual information goes to zero, and that is how we get the converse to the Kesten-Stigum result. Okay, so we have a good understanding, if you like, of the onset of census reconstructability: it is predicted by the Kesten-Stigum threshold.
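As a quick numerical illustration of this statement, the following sketch estimates by brute force the variation distance between the census distributions conditioned on the two possible root spins, in a regime below the threshold. The plug-in histogram estimator is upward-biased for finite samples, and every parameter here is an illustrative choice, so treat it as a sanity check only:

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(1)

def census(root_spin, alpha, P, depth):
    """One sample of the census vector at `depth` (Poisson(alpha) offspring)."""
    q = P.shape[0]
    counts = np.zeros(q, dtype=np.int64)
    counts[root_spin] = 1
    for _ in range(depth):
        new = np.zeros(q, dtype=np.int64)
        for s in range(q):
            k = rng.poisson(alpha * counts[s])
            if k > 0:
                new += rng.multinomial(k, P[s])
        counts = new
    return tuple(counts)

def tv_estimate(alpha, P, depth, n_samples=20000):
    """Empirical total variation distance between the laws of the census
    at `depth`, conditioned on root spin 0 versus root spin 1."""
    c0 = Counter(census(0, alpha, P, depth) for _ in range(n_samples))
    c1 = Counter(census(1, alpha, P, depth) for _ in range(n_samples))
    return 0.5 * sum(abs(c0[k] - c1[k]) for k in set(c0) | set(c1)) / n_samples

eps, alpha = 0.35, 2.0        # alpha * (1 - 2*eps)^2 = 0.18 < 1: below threshold
P = np.array([[1 - eps, eps], [eps, 1 - eps]])
for d in range(1, 5):
    print(d, tv_estimate(alpha, P, d))   # should drift down as d grows
```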
So let me now move back to the first question I was asking, which was tree reconstruction: I give myself a lot more information, namely the whole tree, and I know precisely which node at generation D gets which spin, not just a summary statistic. The mutual information between the spin at the root and the tree together with the spins at generation D is determined by the conditional distribution of the root spin given the spins at generation D. Let me denote this conditional distribution by ν̂; the mutual information is a function of this ν̂. So how do we compute this conditional distribution in practice? Well, we have a very nice conditional independence property in this model: when we know the spin at one node, the spins downstream and the spins upstream are independent. In other words, this is a Markov random field on a tree, and in such Markov random fields conditional distributions can be computed efficiently using the so-called belief propagation algorithm. So we can follow that route.

In order to determine the conditional distribution of the spin at the root given the spins at layer D, we move up the tree: for each node in between, I compute the conditional distribution of that node's spin given its descendants at layer D. That is the belief propagation algorithm. You can check for yourself that, because of the conditional independence properties, the conditional distribution of the spin at node i given its descendants within layer D satisfies a recursion. Maybe a picture is in order here: I have the root r here, I have node i here, and below i the spins σ_j at layer D; those are the values I condition on. I can compute these quantities recursively using the belief propagation equation.

One remark that is important: the equation admits a trivial fixed point. If I plug the stationary distribution ν into the equation, it is a fixed point. That is not what I do when I am given the spins at layer D: there I plug in the true values, so I inject Dirac masses at the true values as distributions, iterate, and get distributions. These will be random distributions, because they are functions of the spins below. But that is belief propagation.
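To make the upward pass concrete, here is a minimal belief-propagation sketch computing the conditional distribution of the root spin given observed spins on the boundary of a toy tree; the tree, the observed values and the parameters are all made up for illustration:

```python
import numpy as np

def bp_root_posterior(children, leaf_obs, P, nu, root=0):
    """Upward BP pass on a tree. children[i] lists the children of node i,
    leaf_obs maps each observed boundary node to its spin value. Returns
    the conditional distribution of the root spin given the observations."""
    q = len(nu)

    def likelihood(i):
        # L_i(s) = P(observed spins below i | spin of i equals s)
        if i in leaf_obs:
            L = np.zeros(q)
            L[leaf_obs[i]] = 1.0   # Dirac mass at the observed value
            return L
        L = np.ones(q)
        for j in children.get(i, []):
            L *= P @ likelihood(j)   # sum_t P[s, t] * L_j(t)
        return L

    post = nu * likelihood(root)
    return post / post.sum()

# Toy tree: root 0 with children 1 and 2; node 1 has children 3 and 4.
children = {0: [1, 2], 1: [3, 4]}
leaf_obs = {2: 0, 3: 1, 4: 1}          # spins observed on the boundary
eps = 0.2
P = np.array([[1 - eps, eps], [eps, 1 - eps]])
nu = np.array([0.5, 0.5])
print(bp_root_posterior(children, leaf_obs, P, nu))
```

Replacing the boundary Dirac masses by uninformative stationary messages reproduces the trivial fixed point: every node's conditional distribution then comes out equal to ν.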
Belief propagation turns out to be a powerful analytic tool for understanding the onset of reconstructability, in the following way. Knowing that we can use the belief propagation equation to compute the conditional distributions, we can determine iteratively the law of this conditional distribution at layer D. It is a random distribution, so it admits a law: a probability distribution over distributions on a set of size q, and using belief propagation we can determine precisely what this law is. I write Q^τ_D for the law of the conditional distribution of the root spin, given that its true value is τ and that I observe the spins at depth D. A direct consequence of belief propagation is the so-called density evolution equation, which characterizes how Q^τ_D evolves as I increase D. I am not spending too much time on it, but this is really something you can figure out by thinking about it, without calculations, just as a consequence of belief propagation.

It admits an unconditional version: you can ask for the law of the conditional distribution of the root spin without conditioning on the root's value; this unconditional law is an average over τ of the conditional laws Q^τ_D, and you can similarly deduce an evolution equation for this distribution of a distribution as D grows, which I summarize formally as Q̂_{D+1} = Ψ(Q̂_D). As I said, belief propagation admits the stationary distribution ν as a fixed point: if you plug in the Dirac mass at ν for Q̂_D, you get the Dirac mass at ν as the output. That is the trivial fixed point. And it turns out that, working a little more, we can get a characterization of whether there is reconstructability or not. The necessary and sufficient condition is as follows: there is reconstructability if and only if this fixed point equation admits a non-trivial fixed point besides the trivial one.

There is one direction which is easy. Which one? Well, you remember we had these conditional distributions ν̂; we know they converge, by the backwards martingale convergence theorem, as I get less and less information because the spins I observe are further down the tree. This is a decay of information, so I have a limit, and the law of the limit must be a fixed point of the recursive equation. So if there is only one fixed point, the Dirac mass at ν, then the limit is the Dirac mass at ν, which means that the conditional distribution of the root spin converges weakly to ν; and this is the same as non-reconstructability. So uniqueness of the fixed point implies non-reconstructability. You need to work a bit more in the other direction: if you have a fixed point that is non-trivial, you can massage it a bit and construct a statistic that gives you non-vanishing information about the root. It is not automatic: you forget about belief propagation, you take this non-trivial fixed point, and you use it to construct a statistic that you propagate upwards, much as I constructed a statistic based on the eigenvector for census reconstruction; this gives you something correlated with the spin at the root. So in a sense you do not use vanilla belief propagation, you use something slightly different, and that works out nicely. The idea of that argument was given in a paper by Marc Mézard and Andrea Montanari in 2006; they looked at a specific setup, but the idea generalizes nicely to more general branching processes, non-uniform stationary distributions, and so on.

[Question:] I guess that in this tree reconstruction problem, if you run BP, you get as much information as there is? [Answer:] Yes, you have no loss of information, and this other line of argument tells you there is information you can access by doing this computation. So BP does carry the information in the tree model; things are different when we move to non-tree models.

Okay. So, are these two problems, reconstructability and census reconstructability, equivalent? No, they are not. There is a paper by Allan Sly from 2011 where he looks at the model with q spin values on regular trees, so the number of children is the same constant integer for every node, and he shows that for q greater than or equal to four you can find parameters where you have reconstruction but not census reconstruction. A very tough paper: he used Mathematica to produce strings of formulas, and it is not really human-readable, but that is the first statement of this kind. And for those in the room who are proficient in the cavity method, I guess this density evolution equation is familiar to you: it is central to the cavity method in statistical physics, and it is also very important in the theory of error-correcting codes.
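Since the density evolution recursion keeps coming back, here is a population-dynamics sketch of its conditional version for this broadcast model: each law Q^τ_D is represented by a finite population of posterior vectors, and one step regenerates the populations through the BP equation. The implementation choices and all parameters are illustrative, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def de_step(pops, P, nu, alpha, pop_size=2000):
    """One density evolution step: pops[tau] is a population of posterior
    vectors representing Q^tau_D; returns populations for Q^tau_{D+1}."""
    q = len(nu)
    new_pops = []
    for tau in range(q):
        samples = np.empty((pop_size, q))
        for i in range(pop_size):
            k = rng.poisson(alpha)          # number of children of the root
            logw = np.log(nu)               # posterior ∝ nu(s) * product over children
            for _ in range(k):
                t = rng.choice(q, p=P[tau])                    # child's true spin
                p_child = pops[t][rng.integers(len(pops[t]))]  # posterior at child
                logw += np.log(P @ (p_child / nu))             # BP update
            w = np.exp(logw - logw.max())
            samples[i] = w / w.sum()
        new_pops.append(samples)
    return new_pops

eps, alpha = 0.1, 3.0                       # alpha * (1 - 2*eps)^2 = 1.92 > 1
P = np.array([[1 - eps, eps], [eps, 1 - eps]])
nu = np.array([0.5, 0.5])
pops = [np.tile(np.eye(2)[tau], (2000, 1)) for tau in range(2)]  # D = 0: Dirac masses
for d in range(8):
    pops = de_step(pops, P, nu, alpha)
    print(d + 1, pops[0][:, 0].mean())      # mean posterior mass on the true spin
```

Above the Kesten-Stigum threshold this mean stays bounded away from 1/2, signalling a non-trivial fixed point; below it, it drifts to 1/2, that is, toward the Dirac mass at ν.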
Okay, so with this I will move to the graph models on which I want to do community detection and reconstruction, that is, graph clustering; there is a strong tie with the models we have been seeing. Something I was mentioning during the break: these tree models and these reconstruction questions were first looked at by experts in genetics; they had problems of that kind, and they were really the ones who proposed these questions. So there is a literature there that is more in line with what we are going to look at now.

We are going to look at graph clustering, but not arbitrary graph clustering: we look at graphs that come from a generative model, the stochastic block model, which is a well-known and popular model of random graphs. Here is how it is constructed. You have n vertices and you assign to each of them a spin at random; call ν the distribution from which we sample the spins, and assume there are q spin values, so we keep the same notation as in the previous problem. Having assigned spins to the nodes, we decide whether to put edges between distinct nodes, and we do so at random: for a pair of nodes i and j with spins σ_i = s and σ_j = t, I put an edge between them with a probability that scales like 1/n, with numerator α P_{s,t}, where P_{s,t} is the corresponding entry of the stochastic matrix P and α is a parameter characterizing the density of edges in my graph, and with an extra division by ν_t; so the edge probability is α P_{s,t} / (n ν_t).

The way to think of it is to look at the adjacency matrix of this graph, an n-by-n matrix of zeros and ones. If we condition on the spin values of the nodes, and stack the nodes according to their spins, we get a block structure for the conditional expectation of the adjacency matrix given the spins: for spins s and t the edge probability is the same whatever the nodes, so we have homogeneous blocks coming from this generative model. And it is convenient, because when we ask whether we can cluster nodes according to their statistical properties, we have a planted model where the clusters we are looking for are part of the model; that is what we want to retrieve, the spin values. So the way to look at this is: we have a graph characterized by its adjacency matrix, and this adjacency matrix is a block-structured matrix plus noise; we want to find our way through the noise to the block structure. In a sense this is very close to things you have heard throughout the week, random matrix theory, low-rank deformations of matrices, and so on, except that here we have a rather specific noise model, because the entries are zeros and ones.
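Here is a minimal sampler for the SBM exactly as parameterized above; the symmetric two-block instance at the end is an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_sbm(n, alpha, P, nu):
    """Spins i.i.d. from nu; edge between i and j (spins s, t) present
    independently with probability alpha * P[s, t] / (n * nu[t])."""
    q = len(nu)
    spins = rng.choice(q, size=n, p=nu)
    A = np.zeros((n, n), dtype=np.int8)
    for i in range(n):
        for j in range(i + 1, n):
            s, t = spins[i], spins[j]
            if rng.random() < alpha * P[s, t] / (n * nu[t]):
                A[i, j] = A[j, i] = 1
    return A, spins

eps = 0.1
P = np.array([[1 - eps, eps], [eps, 1 - eps]])
nu = np.array([0.5, 0.5])
A, spins = sample_sbm(1000, 3.0, P, nu)
print("average degree:", A.sum() / 1000)     # close to alpha = 3, as computed below
```

Note that the reversibility relation ν_s P_{s,t} = ν_t P_{t,s}, stated below, is exactly what makes this edge probability symmetric in i and j.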
A couple more comments about this generative model, the SBM as I parameterized it. Look at the average degrees of nodes conditional on their spins; here I condition on the spins of everyone in the graph, so σ_[n] denotes the vector of all spins. What is the degree of node i? Distinguish according to the spins of the other nodes: there are roughly ν_s n nodes with spin s, and the probability of connecting node i to such a node is α P_{σ_i,s} / (n ν_s), by definition of my model; it is on purpose that I parameterized it this way. Summing over s, I get something that is approximately α. It is only approximate because "close to ν_s n nodes with spin s" is a law-of-large-numbers statement and there are fluctuations; that is why it is not an equality. So I am parameterizing things to make my life miserable: it is very hard to find the clusters. It is hardest, let us say, when the average degrees conditional on the spins are all the same, because then I cannot hope to distinguish clusters just by looking at the numbers of neighbors. So I am deliberately looking at difficult scenarios.

One other word about this model. A matrix that will be extremely important is the so-called mean progeny matrix. It is a q-by-q matrix whose entry (s,t), for two spin values s and t, is just α times the corresponding entry of the stochastic matrix P. Something that was on the previous slide and that I forgot to say: we assume the reversibility relation ν_s P_{s,t} = ν_t P_{t,s}. This is necessary if I want the probability of putting an edge between nodes with spins s and t to be symmetric in s and t. Why "mean progeny"? As we will see, this model is closely tied to the multi-type branching tree model we were describing previously, and the mean progeny matrix has as its entry (s,t) the average number of children of type t that an individual of type s gets; in the random graph, it is the average number of spin-t neighbors of a spin-s node. So that is the terminology.

Okay, so let us define our objectives in this framework. The first definition is that of overlap. The overlap between a vector of estimated spins, σ̂_i being the estimate of the spin at node i constructed from the observed graph, and the true spin values is a rescaled and recentered count of the labels I get right. I could count how many spins I get right and divide by n, which gives the fraction I get right, and then subtract max_s ν_s. That way, if I say everyone has spin one, or everyone has spin ten, I get zero at best; the subtraction ensures that the overlap becomes positive only when I do something non-trivial. There is one more subtlety: the names of the blocks, the spin labels, have absolutely no meaning in this model, so we need to take care of this indeterminacy; we cannot hope to estimate the names of the blocks. So we maximize our measure of the fraction of correctly predicted spins over permutations of the spin names: we take the maximum over permutations π in the symmetric group on q elements, and count node i as a success if σ̂_i = π(σ_i). That is the overlap.

From this I get a first definition: block reconstruction is feasible if I can produce estimates σ̂_i from the observed graph such that, with high probability, the corresponding overlap is bounded away from zero; that is, there is some ε such that the probability of achieving overlap above ε goes to one as n goes to infinity. And there is the same definition with the qualifier "polynomial time": block reconstruction is polynomial-time feasible if there are estimates σ̂_i achieving non-vanishing overlap that can be computed in polynomial time.
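A direct sketch of this overlap, with the brute-force maximum over label permutations (fine for small q; for large q one would use a matching algorithm instead):

```python
import numpy as np
from itertools import permutations

def overlap(sigma_hat, sigma, q, nu):
    """Fraction of correctly labeled nodes, maximized over relabelings of
    the q blocks, recentered by max_s nu[s] so that any constant guess
    scores at most zero."""
    best = max(
        np.mean(np.array([pi[s] for s in sigma_hat]) == sigma)
        for pi in permutations(range(q))
    )
    return best - max(nu)

sigma     = np.array([0, 0, 1, 1, 1, 0])
sigma_hat = np.array([1, 1, 0, 0, 0, 0])   # right clusters, swapped block names
print(overlap(sigma_hat, sigma, q=2, nu=[0.5, 0.5]))   # 5/6 - 1/2 = 1/3
```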
All right, so that is one notion of success for reconstruction in this graph model, of success of clustering. There is a weaker notion that I state for the record, because it is closer to the performance objectives I was describing in the tree reconstruction model: rather than aiming at non-vanishing overlap, one can characterize success as achieving non-vanishing mutual information; that is what the notion of weak reconstruction tries to capture. I will not dwell on it; we will mostly talk about achieving non-vanishing overlap. The situation as I understand it now is this: if you have weak reconstruction and the distribution ν is uniform, then you also have strong reconstruction, you can achieve positive overlap; and it is always the case that positive overlap implies weak reconstruction. What I do not know is whether, for non-uniform ν, weak reconstruction implies strong reconstruction. We will not worry too much about that, because we will quite often specialize to uniform ν, where the distinction is irrelevant.

[Question:] Is weak reconstruction related to the detectability threshold, in the sense of being able to distinguish between an Erdős-Rényi graph and an SBM-generated graph? [Answer:] I am not sure. I think it might be the same as strong reconstruction, actually, for the reasons I was giving: if the distribution ν is uniform, it is the same. In theory it could be weaker, but we do not know, and if I had to bet, I would bet it is the same; I have no proof.

All right, so let us see how the two models are related: the stochastic block model with our problem of block reconstruction, or community detection, on the one hand, and the tree models I was describing previously on the other. There is a result about the local structure of this random graph. Pick a node i in the random graph; it has a spin σ_i; then look at its neighbors, the neighbors of those neighbors, and so on: look at the ball of radius D around i in the graph distance, and attach to each node its spin. It turns out that you get a tree with nodes decorated by spins, and that if D is not too large, for instance D a small constant times log n for a graph with n nodes, then this decorated tree is, in distribution, according to the total variation distance, very close to the law of the multi-type branching tree we were considering before, with Poisson(α) offspring distribution, α being my parameter for the average number of neighbors, and with the very same matrix P for the propagation mechanism of the spins. This is a well-known result in the theory of random graphs: neighborhoods of nodes have this branching structure provided you do not look too far from the node. With high probability you get a tree in this non-dense regime where the average degree is of order one, and for the stochastic block model the tree you get is a multi-type branching tree, up to vanishing errors in total variation distance. You can really prove it by induction, constructing the neighborhood as you go: you ask how many neighbors you get with each particular spin.
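Numerically this local tree picture is easy to probe: the sketch below samples an SBM, computes the spin census on the sphere of radius D around a node, and compares it with the prediction from the mean progeny matrix M = αP of the branching tree. A single ball is of course noisy, and all parameters are again illustrative:

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(0)

# Sample an SBM (same parameterization as before, symmetric two-block case).
n, alpha, eps, q = 2000, 3.0, 0.1, 2
P = np.array([[1 - eps, eps], [eps, 1 - eps]])
nu = np.array([0.5, 0.5])
spins = rng.choice(q, size=n, p=nu)
edge_prob = alpha * P[np.ix_(spins, spins)] / (n * nu[spins][None, :])
A = np.triu(rng.random((n, n)) < edge_prob, 1)
A = A | A.T

def sphere_census(A, spins, root, depth, q):
    """Spin census on the sphere of radius `depth` around `root` (BFS)."""
    dist = {root: 0}
    queue = deque([root])
    census = np.zeros(q, dtype=int)
    while queue:
        v = queue.popleft()
        if dist[v] == depth:
            census[spins[v]] += 1
            continue
        for u in np.flatnonzero(A[v]):
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return census

D = 3
M = alpha * P                                # mean progeny matrix
print("sphere census: ", sphere_census(A, spins, 0, D, q))
print("tree prediction:", np.linalg.matrix_power(M, D)[spins[0]])
```

Averaging the census over many roots with the same spin tightens the agreement with the e_s M^D prediction.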
In this model that number is a binomial random variable; but the binomial and Poisson distributions are close in total variation distance, so you can bound the error you make, assume you have coupled the binomial and Poisson draws successfully so far, and iterate. In this way you show that with probability close to one there is a coupling that succeeds, and that is how you prove it.

Okay, so the local structure is the same; that is a good sign that there is a link between the two models. Let me state another result about the structure of these stochastic block models, due to Mossel, Neeman and Sly, who studied stochastic block models and their reconstruction problems quite thoroughly. They established a kind of approximate conditional independence property in these random graphs. We know that in the tree model, if we look at a node, view it as the root, and condition on the spins at some distance D, then whatever happens beyond is conditionally independent. There is something very similar here. In the random graph model, look at a node i and its neighborhood; draw the set of nodes at distance D+1 from the center node i; and let W be whatever is left beyond. The result says that the conditional distribution of the spin at i given the full graph, together with the spins at distance D+1 and beyond, is very close to the conditional distribution of the spin at i given the graph topology within distance D and the spins at distance D+1. So it is the graph version, if you like, of the conditional independence property that holds exactly on trees; here it holds asymptotically.

This has a first consequence, providing a strong link between the two questions of tree reconstruction on the one hand and block reconstruction in the graph on the other hand. Putting together the two results I stated, the approximate conditional independence plus the fact that the distribution of a neighborhood is close to that of the multi-type branching tree, a couple of steps give: if tree reconstruction does not hold, then weak reconstruction in the graph model cannot hold. The point is that I have no information about the spin at i when I observe the whole graph and am given, on top of that, the spin values at some large distance D of order log n; the more rigorous argument, which considers two nodes and shows there is asymptotically no usable correlation, is sketched on the slide, and I am hand-waving a bit here. But that is one strong tie between the two models: no tree reconstruction implies no weak block reconstruction, and hence no strong block reconstruction.
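On the binomial-versus-Poisson step in that coupling argument, here is a quick numerical check of the total variation distance; Le Cam's inequality bounds it by λ²/n, so it vanishes as n grows. The use of scipy and the parameters are illustrative choices:

```python
import numpy as np
from scipy.stats import binom, poisson

def tv_binom_poisson(n, lam, kmax=80):
    """Total variation distance between Binomial(n, lam/n) and Poisson(lam),
    truncated at kmax; the two tail terms bound the truncated mass (for
    lam = 3 the mass beyond 80 is negligible anyway)."""
    k = np.arange(kmax)
    b = binom.pmf(k, n, lam / n)
    p = poisson.pmf(k, lam)
    return 0.5 * (np.abs(b - p).sum() + (1 - b.sum()) + (1 - p.sum()))

for n in [10, 100, 1000, 10000]:
    print(n, tv_binom_poisson(n, 3.0))   # decays roughly like 1/n
```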
Okay, so you might ask whether this is a sharp condition for non-reconstruction in the graph model. To the best of my knowledge it is not completely settled. It is known that there are cases where you have tree reconstruction and yet no graph reconstruction, so graph reconstruction is a stronger requirement than tree reconstruction; an example of that kind was proven in a paper by Coja-Oghlan, Krzakala, Perkins and Zdeborová in 2017. They have a conjecture for the correct condition for graph reconstruction, but I will not try to convey it here; that would take me out of my comfort zone. And Marc will be telling you about information-theoretic reconstruction problems that really parallel what I am describing here, so you will get to see some of that in his lectures.

Okay, so let us now discuss algorithmic approaches for clustering our graphs. The basic vanilla method for clustering graphs is a spectral method: you take the adjacency matrix of the graph, or the Laplacian matrix, you do a spectral analysis, and you pick a few eigenvectors. If you are working with the adjacency matrix, what you typically do is pick the eigenvectors associated with the eigenvalues of largest modulus.

[Question:] I had a question about the first part of your lecture, the problem of reconstructing the spin at the root given information at layer D. You made a distinction according to whether the mutual information is zero or larger than zero, which separates reconstructible problems from non-reconstructible ones. First, in the theorems you proved, can you also access the value of the limiting mutual information when it is non-zero, or can you only prove whether it is zero or non-zero? And in general, is the actual value of the mutual information important in any aspect of the theory? Whether it equals one or a hundred in some units, or, normalized by its upper bound, is 0.001 or 0.999, does that make any difference to any aspect of the problem?

[Answer:] I guess it must have some consequence: if you are serious about estimating the spin value at the root, the properties of your estimator will obviously be affected by this limiting quantity, say if it is non-zero but close to zero. The people who were serious about that in the tree model were the geneticists who started all this, but I am not aware of work that really tries to quantify the limiting value and derive useful consequences from it. As for the theorems: the proof I described for the census algorithm does not provide any bounds, it is existential rather than quantitative, so it is not good in that respect. The argument about the fixed points of the density evolution equation should provide quantitative estimates of this limiting mutual information; I think you could really work it out, at least numerically. But again, I do not know of anyone who has seriously pushed that further. It is a very good point.

[Comment:] The mutual information has actually been computed, the exact value, for different settings of the SBM. [Answer:] In the SBM case, yes; the results I know of are in a paper by Emmanuel Abbe and Andrea Montanari, where they look at degrees that grow, so they can call upon Gaussian approximations: they end up looking at channels with additive Gaussian noise, like Marc was describing. In this limit of dense graphs, from the mutual information point of view, the SBM is exactly the model that Marc will be discussing.
[Comment continues:] But in the sparse case it is much harder. In the disassortative case, where nodes tend to be more connected when they are in different groups, there is a proof of what the mutual information is; that is in the paper with Perkins, yes, exactly, through cavity arguments. But in the assortative case, year after year, people have found that the formula is essentially the same, yet there is always a regime of signal-to-noise ratio, the parameter measuring the difference between intra-group and inter-group connectivities, where a gap remains that no one has been able to close at the moment. There is a conjectured formula that is claimed to be exact, but no proof. [Second comment:] I would have said so until six months ago, but now I would bet that the formula is actually wrong, at least in a certain regime; there is something really special happening in that specific gap. [Speaker:] Okay, so it is not so clear in this assortative case, even at the level of the mutual information, which is usually easier to access than the overlap. I have tried this myself for a very long time, and I lost a lot of time on it. Any more questions? Okay then, thank you everyone.