Our next speaker is Gabriele Sicuro from King's College London, who is going to tell us about the planted matching problem.

Okay, so thanks a lot for the invitation, and thanks for having me here. This is joint work with Guilhem Semerjian and Lenka Zdeborová. It is work on an inference problem, as you will see, defined on graphs, and I will try to introduce the problem by first giving some examples, so you can see how it works. The first example is the problem of particle tracking. Suppose you have, for example, a gas of particles; you take a snapshot of these particles at time t, and then you take another snapshot at time t + Δt, and you wonder what is the best guess you can make to match the positions recorded at time t with the positions recorded at time t + Δt, given some probabilistic model that you have of the motion of these particles. For example, the particles are diffusing, and you would like to guess which particle went where. This means that you are essentially solving a problem on a graph: you can imagine that some nodes represent the initial positions and some nodes represent the final positions, and you are searching for a k-factor — in particular a one-factor, an assignment. The idea is that you want to find the assignment that is the maximum-likelihood one among all possible assignments, according to your probabilistic model. That was just one example. The second example is DNA sequencing. I will of course not go into the details, because I am not an expert on DNA sequencing, but one thing I can tell you is that what people essentially do is read some DNA, extract some pieces of it, and obtain some measure of the likelihood that two pieces of DNA are contiguous. What they want to do is reconstruct the true DNA sequence, again trying to maximize some likelihood —
the overall likelihood that the final sequence is the correct one. For example, in the work of Bagaria, Ding, Tse, Wu, and Xu, what they were considering was a circular DNA molecule. So in the end you have a problem on a graph in which the nodes represent the pieces of DNA, the links represent the fact that two pieces can be contiguous or not, each link carries a likelihood, and you are searching for a hidden Hamiltonian cycle in this graph. In both of these problems, what you are searching for is essentially a planted k-factor. So the problem is a problem of inference on a random weighted graph: you have some nodes, and the edges between these nodes carry some likelihood of the fact that the nodes are contiguous, in the case of DNA sequencing, or that they are successive positions, in the diffusion example, and you are searching for a hidden structure that is a planted k-factor, that is, a spanning k-regular subgraph. k = 1 means, for example, that you are searching for a matching; k = 2 means that you are searching for a collection of cycles, and so the Hamiltonian cycle problem is a special case of this second case. In both cases, what you would like to solve is a planted problem in which there is a ground truth — again, for example, the correct assignment — but you have noisy data, and in the Bayes-optimal setting what you would like to do is to study the posterior of this model.
So, in principle, if you try to do that, the typical scenario is that you can be in different phases depending on how strong your signal is — you know this better than me. A priori you could expect that there is a regime in which it is information-theoretically impossible to recover the signal, and a regime in which it is in principle information-theoretically possible to recover it; and then you can try to distinguish the cases in which you can do it computationally in an easy way — in polynomial time — or not. This is just the general scenario, if you want. Of course, to make some analytical progress on this type of problem, we decided to come up with a model that we know in detail, so that we could essentially do the math on it. The model that we introduced is actually a very simple toy model. The idea is that we construct a graph: we consider an ensemble of random graphs in which each element of the ensemble has a planted k-factor hidden in it by construction, we add some noise on top of this k-factor, and then we try to recover the hidden signal somehow from the complete structure that we have constructed. The toy model works in this way. We have n vertices, and we assume n to be very big — we work in the thermodynamic limit — and we construct a k-regular graph on these n vertices. So, for example, in the case k = 1, we just pair them.
We have a collection of pairs, a collection of dimers; for k = 2 we have a collection of cycles. We assign a weight w to each edge, extracted from some distribution p(w). This will be our signal: this planted structure with some weights on it. Now we add the noise. What we do to hide our signal is the following: given a non-connected pair — a pair that was not connected in the hidden structure — we add an edge between the two endpoints with some probability c/n, with c decided a priori, and we assign a new weight w, extracted from a distribution q(w), to each of these edges. So essentially we are superimposing a weighted Erdős–Rényi graph on our structure, and the final graph has average coordination k + c. The problem we now ask ourselves is: can we recover the planted subgraph, assuming that we are given some information about how the final graph has been constructed? We know the weights, we know the topology, we know the distributions p and q, and we know c and k — so essentially we know everything about the steps that were used to construct the graph — and the question is the following.
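Everything up to this question can be made concrete in a few lines of code. Here is a minimal sketch of an instance generator for k = 1 (a planted perfect matching plus Erdős–Rényi noise); the function names and the parameter values are mine, and the Exp(λ)/Uniform[0, c] pair is just the illustrative example used later in the talk.

```python
import random

def planted_matching_instance(n, c, p_sample, q_sample, rng):
    """Generate a planted-matching instance on n vertices (n even).

    Returns (edges, planted): a dict edge -> weight, and the set of
    planted edges.  Planted weights are drawn from p, noise weights
    from q, and each non-planted pair is added with probability c / n.
    """
    vertices = list(range(n))
    rng.shuffle(vertices)
    # Planted signal: a uniformly random perfect matching (k = 1).
    planted = {tuple(sorted((vertices[2 * i], vertices[2 * i + 1])))
               for i in range(n // 2)}
    edges = {e: p_sample(rng) for e in planted}
    # Noise: superimpose a weighted Erdos-Renyi graph G(n, c/n).
    for i in range(n):
        for j in range(i + 1, n):
            if (i, j) not in planted and rng.random() < c / n:
                edges[(i, j)] = q_sample(rng)
    return edges, planted

rng = random.Random(0)
n, c, lam = 1000, 3.0, 5.0   # illustrative sizes and rates only
edges, planted = planted_matching_instance(
    n, c,
    p_sample=lambda r: r.expovariate(lam),   # planted weights ~ Exp(lam)
    q_sample=lambda r: r.uniform(0.0, c),    # noise weights ~ U[0, c]
    rng=rng)
```

As a sanity check, the resulting graph has about n(k + c)/2 edges, i.e. average coordination k + c, as stated above.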
What fraction of the planted configuration can we reconstruct? One way to start is to parametrize the problem by putting a binary variable m_ij on each edge (ij), and saying that in the solution we hope that the m_ij are equal to one on the correct edges and zero otherwise. So a solution is parametrized by a matrix m. It is not difficult to see that the posterior takes the following form: there is an indicator function that concentrates the posterior on the "good" matrices m — those that actually represent a k-factor of our graph — and there is a prefactor e^{-E(m)}, where E(m) is the sum over the edges of minus the log-likelihood ratios, E(m) = -Σ_{(ij)} m_ij log[p(w_ij)/q(w_ij)]. So you can imagine that this is going to play the role of a cost function: the posterior weighs the log-likelihood of each edge that we select for our k-factor — the exponent is just the sum of the log-likelihood ratios of the edges that we have taken — and, because of the overall minus sign, the maximum-likelihood estimator is the argmin of this function, where the argmin is taken over the good matrices m that represent k-factors. So the problem immediately becomes an optimization problem with respect to this function. For example, for k = 1 — and I will stick to k = 1 for simplicity — this becomes a perfect assignment problem in which the weights are random, because the w were generated randomly, and so the ω, the log-likelihood ratios, are random too. But there are two kinds of randomness: the ω have one distribution on the planted edges and another on the non-planted edges. So, to compute the maximum-likelihood estimator, what we can do is to write down some equations.
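For k = 1 on a bipartite instance (as in the particle-tracking example), the argmin over matchings is a min-cost assignment, so any Hungarian-algorithm solver computes the maximum-likelihood estimator. A small sketch with made-up costs, using scipy; the matrix `omega` below stands in for the negative log-likelihood ratios, and its values are chosen by hand so that the planted assignment is the identity.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# omega[i, j] = -log(p(w_ij) / q(w_ij)): negative on edges that look
# "planted" (likely under p), positive on edges that look like noise.
omega = np.full((4, 4), 1.5)      # noise-like edges: positive cost
np.fill_diagonal(omega, -2.0)     # planted-like edges: negative cost

rows, cols = linear_sum_assignment(omega)   # minimizes sum of omega
matching = [(int(r), int(c)) for r, c in zip(rows, cols)]
print(matching)   # -> [(0, 0), (1, 1), (2, 2), (3, 3)]
```

On sparse instances one would forbid the absent edges with a large cost; the point here is only that the ML estimator is an assignment-problem argmin.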
I will not go into the details of how these equations are derived — this is very standard, and actually quite old: it was done in the 80s for the general assignment problem, and there are proofs that this gives you the optimum in polynomial time. In the end, what you do is essentially estimate some auxiliary variables that I will call, using physics jargon, cavity fields. There are two cavity fields for each edge, which in a sense contain the information about whether you should take that edge or not, coming from one endpoint and from the other: for each edge (ij) there is the cavity field h_{i→j}, going in the direction i to j, and the cavity field h_{j→i}, going in the direction j to i. The rule is that, once you have computed these cavity fields, you take the edge if the sum of the two cavity fields is larger than minus the log-likelihood on that edge. You can derive the equations that these cavity fields satisfy — message-passing equations. Everything here is very standard, so we are not interested in how they are derived: they are equations that you solve self-consistently on the graph until you reach convergence, and then you have the fields h on all your edges; with these h, you pick the edges of your maximum-likelihood estimator using the rule above. And of course the average error in the end is how many edges you got wrong: you just count the number of edges that you misclassified as planted or non-planted. Up to now everything is pretty straightforward, and simple in a way. So we can run some experiments to see how things work. For example, here we are using two distributions: the uniform distribution between zero and c on the bad edges, if you want, and the exponential distribution with rate λ on the good ones. Let us see what happens. I fix c, the average coordination of the Erdős–Rényi graph that we superimpose on the structure.
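Schematically, the message-passing scheme just described looks as follows. The update rule and the sign conventions below are one standard min-sum choice for the assignment problem, reconstructed from the description in the talk rather than copied from the paper, and the instance is a toy one.

```python
from collections import defaultdict

def min_sum_matching(edges, n_iter=20):
    """edges: dict {(i, j): omega_ij} with i < j.  Returns kept edges.

    Cavity update (one common convention):
        h[i -> j] = min over neighbours k != j of (omega_ik - h[k -> i]),
    and edge (i, j) is kept when h[i->j] + h[j->i] > omega_ij,
    omega being minus the log-likelihood ratio.
    """
    neighbours = defaultdict(list)
    for (i, j) in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    w = lambda a, b: edges[(min(a, b), max(a, b))]
    h = {(a, b): 0.0 for (i, j) in edges for (a, b) in [(i, j), (j, i)]}
    for _ in range(n_iter):  # synchronous updates until (near) convergence
        h = {(i, j): min((w(i, k) - h[(k, i)]
                          for k in neighbours[i] if k != j),
                         default=float('inf'))   # leaf: edge is forced
             for (i, j) in h}
    return {(i, j) for (i, j) in edges
            if w(i, j) < h[(i, j)] + h[(j, i)]}

# Tiny instance: planted edges (0,1), (2,3) with favourable cost -3,
# noise edges (0,2), (1,3) with cost +2.
toy = {(0, 1): -3.0, (2, 3): -3.0, (0, 2): 2.0, (1, 3): 2.0}
print(sorted(min_sum_matching(toy)))   # -> [(0, 1), (2, 3)]
```

On this instance the fields on the planted directions grow positive and the others grow negative, so the decision rule stabilizes on the planted matching — the same behaviour, in miniature, as the divergence discussed below.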
Here λ is a sort of signal-to-noise measure, a signal-to-noise ratio, in the sense that if λ is very big, the planted weights are concentrated near zero, well separated from the bulk of the interval [0, c], so you have a strong signal. So, this is what we observed with these distributions. What we see is that there is no impossible phase — you always get some signal — and you have a transition between partial and full recovery. For example, for a given c, there is an interval in which you have a finite error, that is, partial recovery, and then at some point the error goes to zero. It goes to zero quite smoothly — very smoothly, actually: we were able to show that, with these two distributions, the error goes to zero exponentially fast, so there is an essential singularity that makes the error vanish, at λ = 4 if you send c to infinity; in this limit we were able to show this analytically. And the surprising thing was that, yes, we have full recovery in this phase, for λ larger than four, but what we also have is that the fields h diverge. So the quantities that your belief-propagation algorithm is computing blow up precisely in the phase in which you are able to recover the signal. What happens is that you stop the algorithm anyway, and you find that, with these increasing and diverging values, you construct your estimator, and it is the exact one. So we were trying to understand what is going on, and the way we did it was to study the belief-propagation equations — which essentially give you these auxiliary quantities — in a probabilistic sense. So we said: okay, let us take the equations that the algorithm is running, and let us study them in probability.
So you introduce two random variables, ĥ and h — as in this slide — that represent how the cavity fields behave on the good edges and on the bad edges, because, of course, what a good edge and what a bad edge see around themselves are statistically different. A good edge — a planted edge — is receiving messages from neighbouring edges that are all non-planted, all edges that are not in the solution, not in the actual ground truth. If instead you are on an edge that is not in the ground truth, you have some signal coming from other edges that are not in the ground truth, plus a special one coming from the one neighbour that is in the ground truth. So you have to distinguish these two objects statistically. Our problem therefore switched from writing down the algorithm — which was actually rather simple, I mean trivial, given the literature — to studying the statistics, the probabilistic properties, of these two variables, and in particular the fixed-point distributions of these equations, which are now equations in probability.
These are equations in distribution, equalities in distribution: they are equations for the probability distributions of h and ĥ. In principle, if you know the distributions of h and ĥ, you can compute, for example, quantities like the average error. You can say: my average error is related to the probability that, looking at a good edge — an edge in the ground truth — the two incoming messages sum to something smaller than minus the log-likelihood on that edge, because the criterion for taking the edge is that the opposite inequality holds, so in that case you make a mistake; plus the probability that an edge that is not in the ground truth passes the criterion, that is, that the signal entering that edge — the sum of the two messages — is larger than minus the log-likelihood, so that you wrongly take it. And you see immediately that there is always one special solution: the solution with all the fields infinite. This corresponds to full recovery, because if you plug ĥ = +∞ (and, correspondingly, h = −∞) into the two equations that describe the probabilistic evolution of the fields on the graph, the equations are satisfied, and if you plug these values into the expression for the error, you get zero. Okay. So what happens at the transition? For reasons of time I will not go into much detail. This solution is a weird solution — a solution with all the fields infinite — and what happens is that in the regime in which you have partial recovery, the fields are finite and have an asymptotic distribution: you take these equations in probability and you search for a fixed point in probability — you make a flow of probability distributions, if you want, and you search for the distribution that satisfies
these equations. In practice, you can run a dynamics in which you start from any distribution — say a Gaussian — and you try to reach the fixed point. In the partial-recovery phase you do get a fixed point, a fixed point on finite values: a perfectly nicely shaped distribution that we do not know analytically but that we can compute numerically. Instead, when you approach the full-recovery phase, and inside the full-recovery phase, what happens is that every time you update your distribution in the search for the fixed point, a drift appears and moves the distribution towards fields at infinity. It essentially becomes a front-propagation phenomenon, in which your distribution just propagates forward towards plus infinity with some velocity v. The interesting thing is that we were able to estimate this drift velocity, because it is deeply related to processes that in physics — well, in probability — are known as branching random walks, which have been studied since the 70s, and for which there is a criterion: we were able to extract a criterion to quantify this velocity — actually, a lower bound for this velocity. These are just numerical results that show you the drift happening in the full-recovery phase. Here the fronts stop at different points depending on the resolution, if you want, of the algorithm — there is a finite-size effect that makes the drift stop at some point, but the finer your resolution, the later it stops. So, the criterion that we found is this one.
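The distributional recursion can be followed numerically by population dynamics. The code below is my reconstruction of the k = 1 recursion for the exponential/uniform example, under the assumption that a planted endpoint receives messages only from Poisson(c) noise neighbours, while a noise endpoint additionally sees one planted neighbour; fields are clipped to a window [−cap, cap], so the drift of the full-recovery phase shows up as the populations piling up at the cutoffs.

```python
import numpy as np

def population_dynamics(lam, c, pop=1000, n_iter=50, cap=50.0, seed=1):
    """Population dynamics for the k = 1 cavity-field distributions.

    h     : fields sent along non-planted (noise) edges
    h_hat : fields sent along the planted edge
    Costs are omega(w) = lam * w - log(lam * c), i.e. -log(p/q) with
    p = Exp(lam) on planted edges and q = U[0, c] on noise edges.
    """
    rng = np.random.default_rng(seed)
    h = np.zeros(pop)
    h_hat = np.zeros(pop)
    omega = lambda w: lam * w - np.log(lam * c)
    for _ in range(n_iter):
        new_h = np.empty(pop)
        new_hat = np.empty(pop)
        for a in range(pop):
            z = rng.poisson(c)  # number of incoming noise messages
            incoming = (omega(rng.uniform(0.0, c, z))
                        - h[rng.integers(pop, size=z)])
            best_noise = incoming.min() if z > 0 else cap
            # Toward the planted edge, only noise neighbours contribute:
            new_hat[a] = best_noise
            # Toward a noise edge, the planted neighbour contributes too:
            planted_term = (omega(rng.exponential(1.0 / lam))
                            - h_hat[rng.integers(pop)])
            new_h[a] = min(best_noise, planted_term)
        h = np.clip(new_h, -cap, cap)
        h_hat = np.clip(new_hat, -cap, cap)
    return h, h_hat

# lam = 6 > 4: full-recovery phase, so the h_hat population should
# drift upward (pile up at +cap) and the h population downward.
h, h_hat = population_dynamics(lam=6.0, c=3.0)
```

In the partial-recovery phase (small λ) the same dynamics settles on a stationary distribution over finite values, which is the fixed point mentioned above.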
The criterion is a formula that, of course, does not say much by itself, but it depends on the relevant quantities of the problem: it depends on k; it depends on c, the average coordination of the Erdős–Rényi graph, if you want, that we superimpose on our signal; and it depends on ω̂ and ω, two random variables distributed as the log-likelihoods on the planted and on the non-planted edges, respectively. The criterion is that this velocity — this is a lower bound for the velocity of the front of our distribution — is positive: if it is positive, there is a drift, the fields go to plus infinity, and this means that we recover the signal perfectly. In particular, v = 0 is the marginal case, in which you expect no drift, and from the condition v = 0, manipulating exactly this expression, you find a criterion involving again k and c and the two distributions that you used for the good and the bad edges. It is this expression here: the square root of kc times B(p̂, p), where B(p̂, p) is what people call the Bhattacharyya distance. It is not a distance in the metric sense; it is a measure of the similarity of two distributions. We conjecture that v = 0 is the actual transition criterion, and this is the type of phase diagram that we found. This one is actually in rescaled variables, so it holds for any k-factor: there are λ and c on the two axes, and there is a region — the blue region — in which you have partial recovery.
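Concretely, writing B(p, q) = ∫ sqrt(p(w) q(w)) dw for the Bhattacharyya coefficient, my reading of the stated criterion is that full recovery corresponds to sqrt(kc)·B < 1. For the exponential/uniform pair used in these plots this can be checked numerically, and it reproduces the λ = 4 threshold quoted earlier for k = 1 and large c.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

def bhattacharyya(lam, c):
    """B(p, q) = integral of sqrt(p * q) over the common support [0, c],
    with p = Exp(lam) and q = U[0, c]."""
    integrand = lambda w: np.sqrt(lam * np.exp(-lam * w) / c)
    val, _ = quad(integrand, 0.0, c)
    return val

def criterion(lam, c, k=1):
    # Conjectured transition at sqrt(k * c) * B = 1 (full recovery below).
    return np.sqrt(k * c) * bhattacharyya(lam, c)

# Closed form: sqrt(c) * B = (2 / sqrt(lam)) * (1 - exp(-lam * c / 2)),
# so for c -> infinity the threshold solves 2 / sqrt(lam) = 1: lam = 4.
lam_star = brentq(lambda lam: criterion(lam, 50.0) - 1.0, 1.0, 10.0)
print(round(lam_star, 3))
```

For large c the exponential correction is negligible, so `lam_star` sits at 4 to numerical precision, consistent with the c → ∞ result stated above.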
So there you have a finite error; in the remaining region — the red and the white, and I will comment on why there are two colours if there is time — you perfectly recover your signal. The blue line that is the frontier between these two regions is given by the condition v = 0, and the points are numerical simulations, which fall more or less on the boundary. The plot is with the same distributions that I used before in the example. Mathematicians were interested in this as well, so there are some proofs of what I have said, because what we did was at the level of theoretical-physics rigour, as people say — let us say it was sort of rigorous, but not proven. In 2021, Ding, Wu, Xu, and Yang were actually able to prove that our condition is correct for the one-factor, that is, for the case of the matching. There was a previous work, which was actually very inspiring to us, by Moharrami, Moore, and Xu, about a specific case in which the two distributions were essentially the exponential one and the uniform one, but in the dense limit. And for the case k = 2, the case of cycles, there is a proof — from last year, I believe — but on the complete graph only, and they obtained a criterion for the threshold that is compatible with the criterion written above, as you can see if you let c go to n and take the logarithm. So there is some indication that this criterion that we found — which should hold for any sparse graph and any k — is correct. Okay, so, just a few conclusions.
We obtained this general bound for the recovery transition in the planted k-factor problem, including sparse topologies in the discussion as well, and we found that the transition is continuous — in particular, with the weight distributions that I showed you, the transition is of infinite order. The phenomenology behind this transition is this kind of front propagation: the flow of the distribution towards the fixed point acquires a drift at the transition, and we characterized the transition in terms of this drift. Of course, future work is to prove that this bound is tight; as I said, there are proofs for special cases. We have not fully investigated how the order of the observed transition depends on the distributions involved in the general case; our argument is that the fixed point starts to drift, so we are making the assumption that the transition is second order — the argument follows from that idea. It would also be interesting to see whether a similar phenomenology appears in other problems. And everything I said in this talk was about the maximum-likelihood estimator; you can of course work with other estimators, like the Bayes-optimal estimator, in which you just pick the edges whose marginal probability is larger than one half. These are the references, and thank you for your attention.

[Chair] We have time for some questions. Yes?

[Question] Thank you very much for this very clear talk. I just have a question: how did you choose the two distributions, p and q?

[Answer] In the analysis that we made, we only assumed that they were essentially absolutely continuous distributions.
So they could be essentially anything. Of course, there are special cases in which "anything" makes the problem trivial again: for example, if the supports of the two distributions are disjoint, it is obvious how to recognize what is planted, because the weights are distributed differently. But the analysis was done with general distributions. For the plots that I showed, there is a caveat that you have to take into account: when you go from the sparse case to the dense case, you have to rescale the distribution of the non-planted edges suitably, because if the good, planted edges are of order one and all the others are of order one as well, it is essentially impossible to recover the signal. So what we did here is to scale: you see, this distribution goes from zero to c, and c is very big, so only a few non-planted edges are of order one and the others are of order c; essentially the competition between the planted edges and the rest involves only those few order-one non-planted edges. But aside from this, the assumptions were very generic. You could swap the two distributions, yes, and you could do the same.

[Chair] Any other questions? Yes?

[Question] I think this continues the previous question a bit, but in the first two examples you gave, do you know what the distributions p and q are? For the particle tracking, I guess it would be a Boltzmann weight or something.

[Answer] Honestly, I don't know how things work in the DNA case. But for the particle one, what people did — there are papers that followed up on this work, trying to take the geometry into account — is to assume, for example, a diffusion model in which the probability is related to the diffusion kernel; Brownian diffusion is governed by a kernel
I mean Brownian diffusion is governed by kernel This is Gaussian and so in the end what you end up with when you take the log likelihood Is just an optimization problem which You are the weights the log likelihood are distances for example square because the square comes from coming from the From the exponent of the Gaussian And it's more complicated than what I discussed because of course there are a clean correlation on the point No, so the distances what I've said is that ah, we put random weights on every edge In that case, there is correlation Simply because they are in the clean space and then distances have to satisfy Constraint Sorry, maybe I missed What happens in the in the pruning region of your face diagram? I you didn't miss. I just didn't say so, um, yeah, what I Didn't mention is that there is a There is a region in your space of parameters in which you can Recover your signal just by topological arguments For example, if your graph, uh, if your graph is Erdogan, you have a final probability that some of your edges Some of your nodes are coordination one And you know that there is a plant structure and for example, it's a matching. So this means that that edge Belongs to the plant structure because you have no other choice. No And so you can take off that edge And uh, you and repeat this operation until you don't have any leaves And you perform a pruning out of it And you would cut off all the leaves of your graph until you reach a core that you cannot prune anymore And another pruning effect is related to the fact that you see the distribution these two distribution Have a support that do not overlap. 
I mean they are not Like there is a regime like weight larger than c are easily identified as planted no because they belong to the second distribution and you can Again, take them off The right region comes from this simple procedure of Taking off edges Just looking at them and at the topology or the way that they have On the basis of these two criteria Okay, let's thank again Gabriele recording start