Last but not least. Okay, so the graph isomorphism problem is believed not to be polynomial-time solvable in general, but it is also believed not to be NP-complete, so not one of the hardest problems in NP. We are not going to focus on it, but it provides context for our problem, graph alignment, which is a relaxed version: you are given two graphs and you would like to see if they are nearly isomorphic. What does that mean? Can you find a labeling of the nodes such that most edges get carried to edges and most non-edges get carried to non-edges? One way to formalize this is to minimize the number of disagreements between the adjacency of the starting graph and the adjacency of the receiving graph: a one or a zero should be carried to a one or a zero, you count the number of disagreements, and you minimize this number over permutations of the node labels. So you can formalize this as an optimization problem over permutations; you either minimize or maximize depending on how you set it up. You can view it as maximizing the trace of a product of four matrices, trace(A Π A' Π^T). The first matrix A is the adjacency matrix of the first graph; Π is the permutation matrix associated with a permutation in the symmetric group on n elements; A' is the adjacency matrix of the second graph; and Π^T is the transpose of that permutation matrix, which is the permutation matrix of the inverse permutation. You maximize this over permutation matrices, and that gives you the best alignment: it maximizes the number of agreements, equivalently minimizes the number of disagreements, between your two graphs under this relabeling. Okay, so that is what we want to solve.

It has many applications, and I will go over them quickly. One application that has been considered quite a lot is the de-anonymization of social network data. You have one social network where you know the identities of the individuals and one where the data is anonymized; if you can align the two graphs, then you can carry the identities from the non-anonymous data to the anonymized data, and so you have de-anonymized it. It is a breach of privacy if you can succeed at aligning the graphs. There are other applications where you carry information across: the first example carried identities to an anonymous data set, but you can also carry annotations from, say, a mesh of the cortical surface of a brain. You may have a reference mesh that has been carefully annotated, so you know this part of the mesh corresponds to that brain area, Broca's area or whatnot. If you then have a new mesh of a patient's brain, you can align the two meshes and carry the annotations from one to the other. There are many such applications; I will not dwell on them.
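To make the objective concrete, here is a minimal Python sketch (numpy assumed; the function names trace_objective, disagreements and brute_force_align are mine, not from the talk) that evaluates the trace objective for a given relabeling and, on tiny graphs only, maximizes it by brute force over all n! permutations.

```python
import itertools
import numpy as np

def trace_objective(A, A2, perm):
    """Number of common edges after relabeling graph 2 by `perm`:
    (1/2) * trace(A @ P @ A2 @ P.T), with P the permutation matrix of `perm`."""
    P = np.eye(len(perm))[perm]            # row i of P is the indicator of perm[i]
    return 0.5 * np.trace(A @ P @ A2 @ P.T)

def disagreements(A, A2, perm):
    """Number of pairs {i, j} that are an edge in one graph but not the other."""
    A2_relabeled = A2[np.ix_(perm, perm)]
    return int(np.sum(A != A2_relabeled) // 2)

def brute_force_align(A, A2):
    """Maximize the trace objective over all n! permutations (tiny n only)."""
    n = A.shape[0]
    return max(itertools.permutations(range(n)),
               key=lambda p: trace_objective(A, A2, list(p)))
```

Maximizing the trace and minimizing the disagreement count are equivalent here, since the total number of edges in each graph is fixed once the graphs are given.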
Now let me jump right into the setting we consider for analysis. We look at a synthetic generative model for correlated graphs in which we want to solve this alignment problem. We love Erdős–Rényi graphs, right, we have heard about them already, you even get to see the faces of Erdős and Rényi, all is good. So we are going to consider correlated Erdős–Rényi graphs. How does that work? You take the same node set for the two graphs, the vertices numbered from one to n. You construct a master graph in which each edge is present with probability p/s. Here p is the edge probability you want to target in the end, and s is a correlation parameter. Now you do two sub-samplings of this master graph: you keep each edge with probability s and remove it with probability 1 - s. You see that when you do one such sub-sampling, you get an Erdős–Rényi graph with parameter p for the probability of presence of an edge. You do that twice, independently. But if s is large, the presence of edges in the two graphs is correlated, so s captures this correlation. All right. You do not get to see these two graphs as constructed from this sub-sampling procedure: the names of the nodes in the second graph are shuffled, and the game is to undo this shuffling, to find the permutation that realigns the graphs. So we know that the ground truth is a random permutation that is part of our generative model.

Okay, so a lot of work has been done recently on this problem, and the focus of the first works was to recover exactly the unknown permutation given the two observed graphs. In that setting we know a lot. We know, for instance, the exact condition on the two parameters of the model: the probability p of an edge being present in each graph, and the correlation parameter s. We know from the work of Cullina and Kiyavash that it is information-theoretically possible to recover the unknown permutation from the observed graphs, with high probability, if and only if the product nps is at least log n plus a term that grows to infinity with n. If this condition is met you can find the permutation by brute force; if it is not met you have no means to recover it. People have then looked at the computational phase transition: under what conditions on the parameters does there exist a polynomial-time algorithm that recovers the unknown permutation? Results have kept arriving over the last two years. The best known to date is that there is a polynomial-time algorithm that succeeds at recovering the unknown permutation exactly if the average degree of the graph, that is the product n times p, is at least some power of log n, and if the correlation parameter s is not too small, namely larger than some constant. That is the latest and strongest result, due to Cheng Mao, Mark Rudelson and a third co-author. What we get from the first point in particular is that there is no way to recover the permutation if the average degree of the graph is not logarithmic in n: you need np of order log n to have any chance of recovering the full permutation.

We will focus on a harder regime, the sparse regime, where the graphs have average degree of order one. So there is no way to recover the full permutation, and we go for a less ambitious objective: we try to recover a part of the permutation. We will be happy if we produce an estimated permutation that agrees with the unknown permutation on a non-vanishing fraction of the nodes. That is what we might call the overlap between the true permutation and the one we produce, and we want to produce permutations from the observed graphs with non-vanishing overlap.
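As a concrete illustration of this generative model, here is a short Python sketch (numpy assumed; the function name and return convention are mine) that samples the master graph with edge probability p/s, sub-samples it twice with retention probability s, and shuffles the labels of the second graph with a hidden permutation.

```python
import numpy as np

def correlated_erdos_renyi(n, p, s, rng=None):
    """Sample correlated G(n, p) graphs: master graph with edge prob p/s,
    then two independent sub-samplings keeping each edge with prob s.
    Returns (A1, A2_shuffled, sigma) with sigma the hidden permutation."""
    if rng is None:
        rng = np.random.default_rng()
    assert 0 < s <= 1 and p <= s                 # need p/s <= 1
    iu = np.triu_indices(n, k=1)                 # unordered node pairs
    m = len(iu[0])
    master = rng.random(m) < p / s               # edges of the master graph
    keep1 = master & (rng.random(m) < s)         # first sub-sampling  -> G1
    keep2 = master & (rng.random(m) < s)         # second sub-sampling -> G2

    def to_adj(keep):
        A = np.zeros((n, n), dtype=int)
        A[iu[0][keep], iu[1][keep]] = 1
        return A + A.T

    A1, A2 = to_adj(keep1), to_adj(keep2)
    sigma = rng.permutation(n)                   # hidden relabeling of graph 2
    A2_shuffled = A2[np.ix_(sigma, sigma)]       # what the observer actually sees
    return A1, A2_shuffled, sigma
```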
So I will first discuss results about the information-theoretic feasibility or infeasibility of this task, and then I will move on to polynomial-time feasibility. Okay, so an impossibility result first; this is something we did last year with Marc Lelarge and our jointly supervised PhD student Luca Ganassali. If the product λs is less than one, then we can show that it is information-theoretically impossible to even partially align the graphs. So maybe I will tell you a little bit about how we get to such a result. There is one concept that is very important in studying this alignment problem: the intersection graph. Remember, we start with a master graph with its edges, and then we down-sample. Say I keep a given edge {i, j} in G1, and I also keep that edge in the other sub-sampling, in G2; for each such pair of nodes for which both copies of the edge have been kept, I say this is an edge of the intersection graph. So the intersection graph is just the collection of pairs of nodes for which the down-sampling has preserved the edge in both operations. And by construction, we know exactly what this intersection graph is. We start with an Erdős–Rényi graph with parameter p/s; we down-sample once to keep one set of edges, so we go to G(n, p); and then we down-sample a second time to decide whether the edge is also kept in the other sub-sampling, so we eventually get an Erdős–Rényi graph with modified parameter p times s for the probability of an edge being kept in both.

I realize I have not said what λ is: I will take p = λ/n, so λ is of order one, and it is the average degree of both our graphs. So when I say it is impossible to partially align the two graphs when λs is less than one, note that the edge probability of the intersection graph is λs/n, so λs is the average degree of the intersection graph. Okay, so the value one for the average degree of an Erdős–Rényi graph is a critical value. It marks a certain number of phase transitions, which were actually the ones studied initially by Erdős and Rényi back in '59 and '60, so that was really the onset of the theory of random graphs. The first result in the theory of random graphs is that when the average degree of an Erdős–Rényi graph is below one, all connected components of the graph have at most a constant times log n nodes, whereas if the average degree is above one, then there is a giant component containing a non-vanishing fraction of the nodes, of order n. We actually know more about this phase transition: below it, the Erdős–Rényi graph has no giant component, and most of its nodes lie in trees; the connected components are mostly small trees. There is a vanishing fraction of nodes in components that are not trees, but most components are trees. And we can leverage this to show that it is indeed impossible, when λs is less than one, to partially align the two graphs.
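To see the role of the intersection graph numerically, here is a small sketch mirroring the sampler above (networkx assumed for component sizes; the parameter values are just an example) that keeps only the pairs retained by both sub-samplings and reports the size of the largest connected component, which stays of order log n when λs < 1 and becomes of order n when λs > 1.

```python
import numpy as np
import networkx as nx

def intersection_graph_demo(n=2000, lam=3.0, s=0.3, seed=0):
    """The intersection graph keeps a pair {i, j} only if both sub-samplings
    kept the master edge; it is Erdos-Renyi with edge probability p*s = lam*s/n."""
    rng = np.random.default_rng(seed)
    p = lam / n
    iu = np.triu_indices(n, k=1)
    m = len(iu[0])
    master = rng.random(m) < p / s
    keep1 = master & (rng.random(m) < s)
    keep2 = master & (rng.random(m) < s)
    inter = keep1 & keep2                        # edges present in both sub-samples

    G = nx.Graph()
    G.add_nodes_from(range(n))
    G.add_edges_from(zip(iu[0][inter], iu[1][inter]))
    largest = max(len(c) for c in nx.connected_components(G))
    # lam * s = 0.9 < 1 here: largest component should be of order log n;
    # try lam = 5.0 (lam * s = 1.5) to see a giant component emerge instead.
    print(f"lambda*s = {lam * s:.2f}, largest component = {largest}")
```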
So there is a construction; I will be very sketchy, but let me try to give the gist of it. What we do is produce our two correlated graphs in several stages. First, we produce a candidate intersection graph. Okay, so we know, λs being less than one, that it consists mostly of tiny trees, trees with at most of order log n nodes in them. Given such a random graph, we pick permutations of the node labels that are forced to preserve this intersection graph structure. So for instance, if I have an edge {i, j} and an edge {i', j'}, I could take σ(i) = i', σ(i') = i, and so on and so forth. And if I choose permutations that shuffle labels in such a way that these tree components get preserved, I have not messed up my structure for the intersection graph. Moreover, with techniques of so-called Poisson approximation, I can show that I can do these constructions, pick a finite number of such permutations σ, and then fill in the remaining edges, and with non-vanishing probability, for each of these permutations, I will not have created extra edges in the intersection graph. I could create an extra edge, for instance, as follows: say the intersection graph has an edge {i, j} and an edge {i', j'}, I have a node k with σ(k) = k, and I permute the other nodes; if originally I had an edge from i to k in the first graph and an edge from i' to k in the second graph, then after the shuffling I get an extra edge in the intersection graph. But this happens with probability of order 1/n for each such configuration, so overall the number of new edges I create is of order one, and I can use this Poisson approximation technique to show that with non-vanishing probability there are no such extra edges.

So why is that interesting? It is interesting because, when that happens, the distribution of the pair of graphs I produce after shuffling by the σ's is the same as before shuffling. So now I can say: nature follows this construction and decides afterwards which version it chooses. Say I have done p such constructions of pairs of correlated graphs; these look like the same graphs that you get to observe, but there are p choices underlying them, with p different alignments, and nature chooses one of these p alignments. You do not know which, and the alignments have vanishing overlap between them. So in that case you are doomed: you will try to make a guess, but you have a probability of only about 1/p of guessing the right one, and if you miss the right one you will be off on most of the vertices. This is how we establish the impossibility of alignment, even partial alignment, below the critical threshold λs = 1 for the average degree of the intersection graph. Okay. We do a bit more with this kind of reasoning, but let's not dwell on that.

So what about feasibility in an information-theoretic sense? What we believe to be true is that this is the right threshold, and that when λs is larger than one we should be able to align the graphs, perhaps in exponential time; but right now we cannot prove that. What is known for sure today is that if the product λs is larger than four, not one, then you can indeed align the graphs in exponential time. The sharpest result is due to Yihong Wu, Jiaming Xu and Sophie Yu, who have produced a number of results on this problem; this is a recent paper from last year. We had obtained a worse constant for this condition in a paper done a whole year before.
So let me just give you some of the ideas behind the feasibility result. This is going to be reminiscent of what I tried to explain about the existence of the hard phase for graph clustering: we will essentially use what I call the first moment method. We define a permutation to be good if, after relabeling the nodes of the second graph according to this permutation and counting the number of joint occurrences of edges in the two graphs, that is the number of edges of the corresponding intersection graph, this count is larger than n times some constant β. We know the number of edges of the true intersection graph rather precisely: it is basically one half of n times the average degree of the intersection graph, so nλs/2. So we know that is the target, and we know it is achievable by some permutation: if we had the right permutation we would achieve it up to fluctuation terms. And so we will say a permutation is good if the count is at least n times a β that is slightly less than λs/2.

The rest of the argument parallels the argument I sketched for the existence of the hard phase in graph clustering. I want to show that any permutation that is good in this sense achieves a non-vanishing overlap. So we follow the first moment method: we compute the expected number of good permutations that have an overlap below some value γ. This is something we can make explicit, somehow. It is still a lot of work once you have gone that far. So what is the remaining task? For a given permutation, you want to control the probability that it is good while its overlap is at most γ: you fix a permutation whose overlap is at most γ, you fill in the edges, and you want to compute the probability that this permutation turns out to be good. That is the way the calculation is done. Maybe I should not dwell too much on this, but it is a delicate thing to do. For a given permutation with bounded overlap, you need to show that the corresponding number of common edges is with high probability below nβ. You would like to use a Chernoff-style bound, but in this summation over edges you do not have a sum of independent terms. However, if you carefully group the terms of the sum, you can end up with pieces that are independent. The construction is delicate, and I should not spend too much time on it, but that is how you get to the feasibility result. So this really is a delicate evaluation, and I guess this is why, right now, we get the condition λs larger than four; I think that if we were a bit more careful in this line of argument, maybe we would get the right condition, which is λs strictly larger than one. Okay. So that is a conjecture, and we are not too far from the exact boundary for information-theoretic feasibility.
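In symbols (my notation, not the speaker's), writing ov(π, π*) for the number of nodes on which π agrees with the true permutation π*, the first moment computation has the following shape:

```latex
\mathbb{E}\,\#\{\pi \,:\, \mathrm{ov}(\pi,\pi^\star)\le \gamma n,\ \pi\ \text{good}\}
 \;=\; \sum_{\pi\,:\,\mathrm{ov}(\pi,\pi^\star)\le \gamma n}
 \mathbb{P}\Big(\textstyle\sum_{i<j} A_{ij}\,A'_{\pi(i)\pi(j)} \,\ge\, n\beta\Big)
 \;\longrightarrow\; 0,
 \qquad \beta \ \text{slightly below}\ \tfrac{\lambda s}{2},
```

so that with high probability every good permutation, and in particular the true one, has overlap above γn.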
So I will move on to polynomial-time feasibility, unless you have questions on what I have told you so far. Maybe I'm just a bit tired, but if I want to show feasibility results, I want to show that there are good permutations, right?

Yes, well, you know there are good permutations because you know that the true permutation you are looking for is going to be good with high probability. What you want to show is that no good permutation produces a poor overlap. The way you do it is: you sum over permutations with poor overlap, and you show that the probability that any such permutation is good is tiny. And you must show that it is so tiny that after summing over all such permutations with poor overlap you still get something tiny. That is the way the first moment argument goes.

Right, so let's now move to polynomial-time partial alignment of graphs in this model. What is going on here? You should see on the screen, in very light blue, a region; maybe you see it better on that screen. That is the conjectured information-theoretic feasibility region. On the axes we have λ, the average degree of the graphs we get to observe, and s, the correlation parameter, between zero and one. So λs less than one is the impossible region; things start at the curve λs = 1, and we believe that the information-theoretic limit is given by this curve. We will now start carving a polynomial-time feasibility region out of this feasibility region. So first, this result: it tells us that there is a triangle, it exists, it has non-empty interior, it is not big, but it exists, and I will improve on that as I proceed. We have one polynomial-time algorithm, which I will describe in a minute, which succeeds at polynomial-time alignment in this small triangle of the phase space. This is the so-called neighborhood tree matching algorithm, and this is again something I did with Luca, the PhD student I already mentioned.

Okay, so let me describe this algorithm a little. The first thing I need in order to describe it is the notion of a tree matching weight. The tree matching weight, for us, is the following: given two rooted trees and a candidate depth d, the tree matching weight of the two rooted trees is the maximum number of leaves at depth d of a tree that I can embed as a rooted subtree in both of them. So I find a tree that is a subtree of each of them and I count its leaves at depth d; the maximum number of such leaves is for me the tree matching weight between the two. That is what I will use in this algorithm. The first remark is that you can compute it recursively, so it is amenable to reasonable computations. If I have the two trees and I want to compute this weight, I look at all matchings of the children of the two roots. For any such matching, I sum the tree matching weights to depth d - 1 of the corresponding pairs of rooted subtrees, and that gives me a candidate value; if I maximize over the matchings of the children of the roots, I get my matching weight. So this is amenable to a recursive computation.
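Here is a minimal Python sketch of that recursion (my own representation: a rooted tree is a nested list of children; scipy's linear_sum_assignment is assumed for the maximum-weight matching of children; this is an illustration, not the authors' implementation).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def tree_matching_weight(t1, t2, d):
    """Maximum number of depth-d leaves of a tree that embeds as a rooted
    subtree of both t1 and t2 (trees given as nested lists of children)."""
    if d == 0:
        return 1                  # the root itself is the single depth-0 leaf
    if not t1 or not t2:
        return 0                  # one root has no children: the embedding stops
    # weight of pairing child c1 of t1 with child c2 of t2, to depth d - 1
    W = np.array([[tree_matching_weight(c1, c2, d - 1) for c2 in t2]
                  for c1 in t1])
    rows, cols = linear_sum_assignment(W, maximize=True)   # best matching of children
    return int(W[rows, cols].sum())

# tiny example: two rooted trees of depth 2
t1 = [[[], []], [[]]]   # root with a degree-2 child and a degree-1 child
t2 = [[[]], [[], []]]
print(tree_matching_weight(t1, t2, 2))   # prints 3
```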
Okay, so what do I do with that in order to produce an algorithm? Here is the general idea. I consider any node i in the first graph G1 and any node u in the second graph G2, and I look at their neighborhoods up to some distance d. If these neighborhoods are trees, and in general the neighborhoods in sparse Erdős–Rényi graphs are tree-like, so I know this will happen for most pairs of nodes, I compute the tree matching weight of the two neighborhoods, and I take a large matching weight as a sign that these nodes are good candidates to be matched. And so I would want to produce a matching that consists of pairs whose neighborhoods have large tree matching weights. But this is not quite sufficient. Intuitively, you know that if you have a correct match, the neighborhoods are correlated in a way that you understand quite well, so you will have a large matching weight. But consider the master graph before shuffling, and the two sub-samplings: if I take a node i in G1 and a node u in G2 that both point to a common node, and there is a large matching weight between the two neighborhoods of that common node, then i and u will inherit a large matching weight even though they are not a correct match. So this is not sufficient; I need to filter out false positives.

Okay, so there is an easy way to do that which we developed; we call it the dangling tree trick. Instead of declaring that node i and node u are candidates for a match if their neighborhoods have a large tree matching weight, I ask that for each of the two nodes I can identify two neighbors, j and j' for i, and v and v' for u, such that the tree dangling from j in G1 and the tree dangling from v in G2 have a large matching weight, and similarly for j' and v'. Thanks to that, I can work around these false positives. So I have more or less the same philosophy for the algorithm, but I need to introduce this to get rid of false positives, and that is what we do. If you do that you reduce the probability of false positives, and if you want to reduce it even more you could look at the dangling trees attached to the nodes at distance two; you could go further in that direction. If you just take this algorithm, you need a couple of technical lemmas to show that things work, but if you use three dangling trees rather than two, the proofs get much simpler. So that is what we follow in the end: in our proof strategy we use three dangling trees for each node, and we need three high weights for three tree matches in order to qualify a pair (i, u) as a candidate match.

Yes. Yes, yes — let me repeat the question: won't you miss out on many good matches if the degree is low? Indeed, there are correct matches involving nodes that are even isolated, and for those the matching weights are going to be zero, so you will never get them. But since we are after a partial alignment, we would be glad if we matched, say, 10% of the nodes correctly. Indeed, we will never get 100% in this regime; for isolated nodes in particular there is no way to tell who they are, and in Erdős–Rényi graphs of this density you get a sizable fraction of nodes of degree zero, so you are doomed for those.

Okay, so the details of the algorithm are as follows. The depth you use in the computation of the matching weights is logarithmic in the number of nodes; that is what works. And the threshold you pick for the weights that I am going to describe is exponential in this depth. Okay.
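As a rough sketch of how the dangling tree trick could be coded (again my own structure, not the paper's implementation; the hypothetical tree_matching_weight from the sketch above is passed in as weight_fn), for a pair (i, u) one looks for two high-weight dangling pairs using distinct neighbors on both sides.

```python
def dangling_tree(adj, root, parent, depth):
    """Subtree of `adj` (a dict: node -> list of neighbors) hanging from `root`
    when the edge to `parent` is removed, truncated at `depth`, returned as a
    nested list of children. Assumes the explored neighborhood is tree-like."""
    def build(v, prev, d):
        if d == 0:
            return []
        return [build(w, v, d - 1) for w in adj[v] if w != prev]
    return build(root, parent, depth)

def is_candidate_pair(adj1, adj2, i, u, depth, threshold, weight_fn):
    """Dangling-tree test: (i, u) is a candidate match if two distinct neighbors
    j, j' of i can be paired with two distinct neighbors v, v' of u so that both
    pairs of dangling trees have matching weight at least `threshold`."""
    trees1 = {j: dangling_tree(adj1, j, i, depth) for j in adj1[i]}
    trees2 = {v: dangling_tree(adj2, v, u, depth) for v in adj2[u]}
    hits = [(j, v) for j, t1 in trees1.items() for v, t2 in trees2.items()
            if weight_fn(t1, t2, depth) >= threshold]
    # require two high-weight pairs with distinct neighbors on both sides
    return any(j != j2 and v != v2 for j, v in hits for j2, v2 in hits)
```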
And so, two nodes i and u such that you have two dangling pairs with a high matching weight are added to a set of potential matches. Let me give you a little bit of intuition for why this works. What happens is that there is a non-vanishing fraction of nodes in the graph G1 for which the correct match is identified by this procedure, so we get a sizable fraction of the correct matches. And the number of nodes in G1 that get matched to an incorrect node is a vanishing fraction of the nodes. This is what we can prove, and so, massaging this list of potential matches, we can produce a permutation that we know achieves a non-vanishing overlap.

So, a few words about how we get to that result. The first thing we need to work on is understanding the matching weight when the two nodes are very far apart in the graph. One thing I mentioned previously is that the neighborhood of a node in an Erdős–Rényi graph is close to a branching process, more specifically a Galton–Watson branching process in which the number of children of each individual is Poisson distributed with mean λ. So if you try to match two nodes that are very far apart in the original master graph, the neighborhood in G1 looks like a Galton–Watson branching process, the neighborhood in G2 of the other node again looks like a Galton–Watson branching process, and they are essentially independent. Okay, so we need to understand how large a weight is produced when we consider two independent Galton–Watson branching trees. That is the first thing we do. By induction on the depth, you can get probabilistic bounds on these matching weights when you have two independent branching trees. And we can show that, with high probability, the matching weight to depth d of two independent Galton–Watson branching processes whose average number of children lies between one and some not-too-large value λ0 grows like γ^d for some constant γ.
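To get a feel for this last point, here is a small sketch (my own, under the same nested-list tree representation, with weight_fn standing for the hypothetical tree_matching_weight above) that samples pairs of independent Poisson(λ) Galton–Watson trees to a small depth and records their matching weights; comparing these to the weights of correctly matched pairs is what motivates the exponential-in-depth threshold.

```python
import numpy as np

def galton_watson_tree(lam, depth, rng):
    """Poisson(lam) Galton-Watson tree truncated at `depth`,
    as a nested list of children (same representation as above)."""
    if depth == 0:
        return []
    return [galton_watson_tree(lam, depth - 1, rng)
            for _ in range(rng.poisson(lam))]

def independent_weight_samples(lam, depth, weight_fn, n_samples=100, seed=0):
    """Empirical matching weights of pairs of *independent* GW trees; the analysis
    says these grow only like some constant to the power `depth`."""
    rng = np.random.default_rng(seed)
    return [weight_fn(galton_watson_tree(lam, depth, rng),
                      galton_watson_tree(lam, depth, rng), depth)
            for _ in range(n_samples)]
```

The naive recursion is expensive, so such a simulation is only practical for small depths; it is meant as an illustration of the quantity being bounded, not of the actual proof.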