Thank you for the invitation, it's a pleasure to be here. Let me set up the framework I will use for the whole talk. I study hard optimization problems on random ensembles, where I have some tuning parameter, for example the mean degree of the random graph in the problems that I will study. The hardness of the typical problem produced under these random assumptions may become very large eventually, as large as the hardest problems in real life, and at the same time these ensembles exhibit phase transitions. So the idea is very natural: try to connect the complexity of algorithms with the phase transitions, and this is why we study the random problems I just defined.

Let me also explain why I don't consider exhaustive algorithms, and why I insist on being fast. I work with sparse random models; suppose I have a million variables. Then the algorithm must be linear in time, so that it finishes on human timescales. And the reason to concentrate on sparse models is that, locally, such a model looks like a tree, and this is what makes the problem analyzable.

Among the many possible algorithms, I will concentrate on two classes: belief propagation, which is an algorithm for computing marginal probabilities on locally tree-like graphs, and Monte Carlo algorithms, like simulated annealing. The question is how these algorithms behave when you bring them close to realistic, hard problems. The difficulty is that such algorithms are complicated to analyze; but on random problems we can, in principle, study them, and what we find is phase transitions. The hardness comes from a complex energy landscape with an exponential number of metastable states; think of random K-SAT. And the standard tool of statistical physics to compute the free energy of these models is the replica method.
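To make the setting concrete, here is a minimal sketch, not from the talk, of how one samples from such a sparse random ensemble; the problem chosen (random K-SAT), the parameters N, alpha, K and the function name are illustrative assumptions.

```python
import random

def random_ksat(N, alpha, K=3, seed=0):
    """Sample a random K-SAT instance: M = alpha*N clauses, each over
    K distinct variables with random negations. alpha = M/N is the
    tuning parameter (the constraint-per-variable ratio)."""
    rng = random.Random(seed)
    M = int(alpha * N)
    clauses = []
    for _ in range(M):
        vars_ = rng.sample(range(N), K)                # K distinct variables
        signs = [rng.choice((-1, 1)) for _ in vars_]   # random negations
        clauses.append(list(zip(vars_, signs)))
    return clauses

# With a million variables, only linear-time algorithms are practical.
instance = random_ksat(N=10**5, alpha=4.0)
```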
With the replica method we can compute the free energy of these random models, and this is the route we follow. So let me move to random constraint satisfaction problems. You have N variables and alpha N constraints, and you ask whether an assignment satisfying all the constraints exists. The classical examples are random K-SAT, random graph q-coloring, and so on. Again, the key parameter is the ratio of constraints per variable, which is nothing but the degree of the random graph. And if you consider the interesting cases, you always find a sharp threshold between a phase where a solution exists and a phase where a solution does not exist.

The most interesting fact is that, actually, if you run an algorithm, a linear-time algorithm, a polynomial-time algorithm, you are not able to reach this threshold. You typically stop before. So there is this gap, this algorithmic gap, which was also discussed in the talk by Afonso, that essentially doesn't allow you to reach optimality: you stop before. And you want to understand why. So we worked hard to understand this in the last, say, 20 years, and let me put here the final picture.

Suppose that you take this problem and you look at the space of solutions, that is, the zero-energy configurations. Before reaching the SAT/UNSAT transition, you actually have a full set of phase transitions. The first one, which is maybe the most important, is the dynamical phase transition. What happens there? The space of solutions shatters into an exponentially large number of clusters. At this point there is no singularity in the free energy: if you look at the free energy, and this was my comment in the talk two days ago, nothing changes. Nevertheless, timescales diverge at this threshold, and so sampling in this region becomes exponentially hard in N. Then if you continue, you have more phase transitions. For example, at the condensation phase transition you really have a free-energy singularity, so you can detect it thermodynamically. But even later you have more phase transitions, related to the color of the clusters in this picture. Red clusters contain frozen variables, variables that inside a cluster of solutions must take a unique value. This makes it very hard for algorithms to find solutions in red and brown clusters; this is a conjecture. The rigidity transition is the point where frozen clusters become thermodynamically dominant, and the freezing transition is where unfrozen clusters disappear, even the subdominant ones.

So we have all these phase transitions, and the natural question is: is algorithmic behavior related to these phase transitions? Unfortunately, I have to be honest: no. If you try to claim something, OK, we made some progress, but it is less than expected, I have to admit. For example, suppose that you consider algorithms based on belief propagation. Belief propagation provides exact marginals up to the condensation transition. So you would say: OK, I run an algorithm and it returns the right marginals.
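As an illustration of the marginal computation just mentioned, here is a minimal sketch of belief propagation for random graph q-coloring, one of the models above. The update rule is the standard zero-temperature coloring one; the damping, iteration count, data layout and function name are our own choices.

```python
import numpy as np

def bp_coloring(adj, q, iters=200, damping=0.5, seed=0):
    """Minimal BP for q-coloring. `adj` is a symmetric adjacency list;
    msgs[(i, j)] is the distribution over colors that node i sends to
    neighbor j. Assumes the sparse, colorable regime."""
    rng = np.random.default_rng(seed)
    msgs = {(i, j): rng.random(q) for i in range(len(adj)) for j in adj[i]}
    for key in msgs:
        msgs[key] /= msgs[key].sum()
    for _ in range(iters):
        new = {}
        for (i, j) in msgs:
            # m(c) ~ product over the other neighbors k of
            # the probability that k does NOT take color c
            m = np.ones(q)
            for k in adj[i]:
                if k != j:
                    m *= 1.0 - msgs[(k, i)]
            m /= m.sum()
            new[(i, j)] = damping * msgs[(i, j)] + (1.0 - damping) * m
        msgs = new
    # single-site marginals (beliefs) at the fixed point
    beliefs = []
    for i in range(len(adj)):
        b = np.ones(q)
        for k in adj[i]:
            b *= 1.0 - msgs[(k, i)]
        beliefs.append(b / b.sum())
    return beliefs
```

Here beliefs[i][c] estimates the probability that node i takes color c in a uniformly random proper coloring, which is exactly the kind of marginal the talk refers to.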
Now I have to find a solution with these right marginals. Is that right? No, because if you are in this region, when you start fixing variables according to the right marginals, the problem becomes more complex. So this critical line essentially turns, and you eventually hit a phase transition just by fixing variables. In this case we were able to solve this BP-based algorithm, with Guilhem, a few years ago, and we actually found that the behavior of this algorithm is indeed related to a phase transition. So this is good news. But the phase transition is before the dynamical one, because both the dynamical and the condensation lines turn back. So if you want to find a solution without hitting any phase transition, you can do it, but only before the dynamical phase transition. This is a little bit disappointing, because I would like, at least, to sample solutions uniformly up to the dynamical phase transition, where timescales diverge. If there is no divergence of timescales, why should it be difficult to sample solutions uniformly?

Moreover, you can move to smart heuristic algorithms: algorithms based on message passing, survey-inspired decimation, backtracking survey propagation, or even Monte Carlo based algorithms that do not sample uniformly but just find one solution in the smartest way, or biased random walks, like the focused Metropolis search that was discussed by Roberto yesterday. You discover that these algorithms stop somewhere, but all well beyond the dynamical threshold. And where exactly? We don't know. We have to admit that, after many years, we have found all these phase transitions in the models, but there is no tight connection with the behavior of smart algorithms.

Nevertheless, we got an intuition, a key observation. If you look at the behavior of smart algorithms, they don't sample solutions uniformly. They prefer, by far, a subset of solutions. There are subsets of solutions which are much more accessible, which have a larger basin of attraction. So studying the uniform measure over solutions is rather useless, because that is not the measure seen by the algorithms while they look for solutions. For example, they never find frozen solutions. And if you assume that they never find frozen solutions, all brown clusters must be cut away. So at the moment the best conjecture for the ultimate algorithmic threshold is this point here: assuming that any algorithm can find only blue, unfrozen clusters, when the blue clusters disappear there is no algorithm that can find solutions.

After this key observation, one possibility could be: OK, let's study another model, not the model where the measure is uniform over all solutions, but one where it is biased, biased towards the solutions that are actually found by algorithms. This idea is very similar to the one in Riccardo's talk about the robust ensemble. So we tried recently, let me say, a first attempt at this idea of biasing the measure. Let's see if, by biasing the measure, we can understand a little bit better what algorithms do. So we took, with Louise and Guilhem, random hypergraph bicoloring, also known as not-all-equal K-SAT. Essentially, you take a random hypergraph; in this case every hyperedge connects five variables. You want to color the variables with two colors such that no hyperedge is monochromatic. Very simple. This model has been studied in detail, the whole phase diagram is in this work; in particular, we know the values of the dynamical and the SAT/UNSAT thresholds.
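For concreteness, a sketch of the model just described, under the stated parameters (5-uniform hyperedges, two colors, no hyperedge monochromatic); the function names are ours.

```python
import random

def random_hypergraph(N, alpha, K=5, seed=0):
    """Random K-uniform hypergraph with M = alpha*N hyperedges."""
    rng = random.Random(seed)
    return [rng.sample(range(N), K) for _ in range(int(alpha * N))]

def is_bicoloring(edges, colors):
    """Valid bicoloring: every hyperedge sees both colors,
    i.e. the not-all-equal constraint holds on every hyperedge."""
    return all(len({colors[v] for v in e}) == 2 for e in edges)
```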
Now, you run simulated annealing, and you realize that it works beyond alpha_d, until more or less 9.6. OK, fine. But even more important, you realize that simulated annealing is not reaching all solutions in the same way. It reaches, more probably, solutions where, obviously, this first kind of local coloring is forbidden, there is no such coloring in the solutions found, but also this second kind of coloring happens less often than expected if you were sampling solutions uniformly. Why? Because this is a critical interaction: it is potentially freezing this variable, because the green variable is the only one that satisfies the interaction, so the green variable cannot change. It is potentially frozen because of this interaction. And what the algorithm actually does is produce solutions where the first kind is more abundant than the second.

So what we did: we just studied another model where these clauses have weight 1, while those clauses have weight 1 minus epsilon. You can do the statistical mechanics computation again, but now you have a phase diagram in the (alpha, epsilon) plane. Epsilon equal to 0 is the original model, so this is the dynamical threshold of the original model. And you see that, changing epsilon, the dynamical threshold moves. And where does it go? Well, more or less around 9.6. I don't want to say that this is a full explanation, but it is a step towards getting closer to what algorithms actually do. The algorithms search for those solutions, so you should count those solutions, not the solutions never reached by the algorithm, because it would be quite useless to count those. So, in some sense, if you find the right measure, the hope is that the algorithmic threshold is still related to some thermodynamic threshold, but in the right measure, not the uniform measure, which is not the one used by the algorithms.

OK, so let me do a first short summary of this connection between algorithms and random constraint satisfaction problems. This part of the talk is shorter, because essentially we don't have great results, just some attempts. We found many phase transitions in these random constraint satisfaction problems, which are hard optimization problems where we can do statistical mechanics. We got several hints on the origin of computational complexity: dynamical phase transitions, long-range correlations, frozen variables, so a lot of reasons to become hard. Nevertheless, we have to be honest and say that the exact connection between this hardness and the algorithmic complexity is still missing, apart from the few cases that we have been able to solve. And especially for smart algorithms we have to invent something else: we have to reweight, this is my idea, the space of solutions, so as to mimic what algorithms see. The algorithms see something different from the uniform or Gibbs measure that we are used to computing. So the road is still very long. And meanwhile, if you want to solve a hard optimization problem, use parallel tempering. At the moment, to the best of my experience, it is really, by far, the best general-purpose algorithm for solving this kind of problem. For example, this is a work we have done recently with Maria Chiara on the largest independent set in the regular random graph, which is a very hard problem. Finding the largest independent set, in physics language, is finding the densest packing of a hard-core model.
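Since parallel tempering is recommended here as the general-purpose workhorse, a bare-bones sketch of the scheme follows, assuming a user-supplied `energy` function and `propose` move; the temperature grid, sweep structure and all names are illustrative, not the implementation used in the work with Maria Chiara.

```python
import math, random

def parallel_tempering(energy, propose, x0, betas, sweeps, seed=0):
    """Generic parallel tempering: one Metropolis chain per inverse
    temperature in `betas` (sorted from hottest, smallest beta, to
    coldest), plus replica-swap moves between adjacent temperatures."""
    rng = random.Random(seed)
    xs = [x0() for _ in betas]          # one configuration per temperature
    Es = [energy(x) for x in xs]
    for _ in range(sweeps):
        for r, beta in enumerate(betas):          # Metropolis step per replica
            x_new = propose(xs[r], rng)
            E_new = energy(x_new)
            if E_new <= Es[r] or rng.random() < math.exp(-beta * (E_new - Es[r])):
                xs[r], Es[r] = x_new, E_new
        for r in range(len(betas) - 1):           # swap adjacent replicas
            d = (betas[r + 1] - betas[r]) * (Es[r + 1] - Es[r])
            if d >= 0 or rng.random() < math.exp(d):
                xs[r], xs[r + 1] = xs[r + 1], xs[r]
                Es[r], Es[r + 1] = Es[r + 1], Es[r]
    return xs[-1], Es[-1]   # configuration at the lowest temperature
```

The same skeleton applies to any Hamiltonian, independent set included; only `energy` and `propose` change, which is what makes the method so general-purpose.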
So you put particles on a random graph and you try to fit as many particles as possible under the condition that no two particles are nearest neighbors. For d equal to 20 the problem is already very hard, and parallel tempering approaches the theoretical maximum value. But especially for d equal to 100, which is an extremely hard problem, close to the REM, the random energy model, so really very hard, Monte Carlo stops around the dynamical phase transition, but parallel tempering is able to go beyond the dynamical phase transition, which here is really very sharp. So it is pretty impressive, and we have no theory for parallel tempering. It is one of those algorithms where the idea of why it works is, more or less, clear, but what the limits of parallel tempering should be, we have no idea at all. Okay, so this is very open.

Let me pass to the second part of the talk and move to inference problems. Again, I want to be very specific, because I don't have general recipes. I will concentrate on Bayesian inference, which is a particularly simple inference problem: you start from a signal X, created from the prior, and according to a model you generate data Y; then you pass to the student the prior, the model, the data and the noise level. So you pass a lot of information, and I agree that this is a particularly simple inference setting, but again, we have to start from simple situations to understand, eventually, what happens in more complex ones. Now, using the Bayes formula, the student computes the posterior, and then you have to sample the posterior in order to infer whatever you want. In particular, if you manage to compute marginal probabilities, you can compute, for example, the estimator that minimizes the mean square error, or the estimator that maximizes the mean overlap, whatever you want, depending on what you are asked to do. So essentially you have to compute marginal probabilities; but, as we know, this is as hard as computing the partition function, so in general this problem is very hard.

So, again, we move to random sparse Bayesian inference. We put ourselves in a setting where we can use an approximation, namely the Bethe approximation, to compute exactly the partition function, exactly the free energy. Consider that in this class of random sparse Bayesian inference problems there are many interesting examples. For example, the stochastic block model, which is the standard model for community detection, is in this class. Also planted random graph coloring, which is the problem we will use in the rest of the talk to illustrate the behavior of Monte Carlo algorithms on this kind of problem.

So, why do we like this random sparse Bayesian inference setting? Because, thanks to Bayes optimality, we have an equality between the noise in the data and the temperature that we use to do inference. Thanks to this matching, the condition which is called the Nishimori condition in statistical physics, we know that the replica-symmetric free energy is correct. So we are putting ourselves in a setting where the free energy can be computed exactly. But this is not the whole story, as we will see. Nevertheless, we are in a situation where belief propagation returns the right marginals. So we are putting ourselves in a pretty lucky situation, but there are still interesting questions.
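In formulas, the Bayes-optimal pipeline just described is the standard one; writing x for the signal and y for the data:

```latex
P(\mathbf{x}\mid \mathbf{y}) \;=\; \frac{P(\mathbf{y}\mid \mathbf{x})\,P(\mathbf{x})}{P(\mathbf{y})},
\qquad
\hat{x}^{\mathrm{MMSE}}_i \;=\; \mathbb{E}\!\left[x_i \mid \mathbf{y}\right]
\;=\; \sum_{x_i} x_i\, P(x_i\mid \mathbf{y}),
\qquad
\hat{x}^{\mathrm{MO}}_i \;=\; \operatorname*{argmax}_{x_i}\, P(x_i\mid \mathbf{y}),
```

where the P(x_i | y) are exactly the single-site marginals of the posterior; the MMSE estimator minimizes the mean square error and the MO estimator maximizes the mean overlap, which is why computing marginals, as hard in general as computing the partition function, is the central task.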
So, once we use belief propagation to do signal recovery, essentially to infer the signal, mainly two situations can happen, which were discovered in the fundamental paper by Aurélien, Florent, Lenka and Cris Moore. You increase the signal-to-noise ratio, which is, again, the mean degree of the graph, because now the graph is providing information: the denser the graph, the more information you are providing to the student. So you increase the signal-to-noise ratio and you measure some accuracy, the accuracy that you can reach optimally, by sampling exactly the Bayes posterior probability, or by running BP, which is a linear-time algorithm, so what you can actually achieve. In the first situation the phase transition is continuous, and so you always achieve optimality: the IT threshold coincides with the algorithmic threshold. And there is another situation, more interesting for our purposes, where you go from a phase where essentially there is no information to detect, even being optimal you cannot detect the signal, so it is impossible, to a phase where, if you had an infinite amount of time and computational power, you could find the signal, but if you run BP, you find nothing.

Why is this? Essentially because BP has to solve the BP equations in some way, and it does so recursively: you start from the non-informative fixed point, you perturb it a little bit, and you look for another fixed point. But as long as the non-informative fixed point is stable, essentially, you stay there. So this value, c_KS, is nothing but the point where the paramagnetic, uninformative fixed point becomes locally unstable. As long as it is stable, BP has no chance of finding the signal. Until here the paramagnetic fixed point is stable; here it becomes unstable, so BP goes away from the uninformative fixed point, and where does it go? Well, the only other fixed point is the one that provides you the signal, okay? And so you can detect the signal optimally. So this is the typical scenario. Actually, in a recent paper with Lenka and Guilhem we found that in sparse inference problems there can be more complicated situations than these two, but for the purpose of this talk it is enough to have this picture in mind.

Let me move now to an even more pertinent and easy-to-understand inference problem. Suppose that you produce a noiseless inference problem. If the problem is noiseless, you have to do inference at zero temperature: no noise, no temperature. So in some sense a noiseless inference problem is equivalent to what has been called, in a very nice paper by Florent and Lenka, quiet planting in random constraint satisfaction problems. You can prove that the inference problem is equivalent to the following: you take a random constraint satisfaction problem and you add one solution, the red dot here, okay? You force that solution to exist always. So essentially you build a random constraint satisfaction problem such that the red point is always a solution. What happens is that the phase diagram I showed you before changes a little bit. How does it change? Essentially, this curve is the total entropy of the random problem and this curve is the entropy of the planted cluster.
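A minimal sketch of the planting construction described above, for planted random graph coloring: edges are added only if they are compatible with a pre-drawn signal, so the planted coloring is a solution by construction. Names and edge-sampling details are illustrative.

```python
import random

def planted_coloring(N, c, q, seed=0):
    """Planted q-coloring: first draw the signal (a random coloring),
    then add M = c*N/2 edges only between unequally colored nodes,
    so the planted coloring satisfies every constraint. c is the
    mean degree, playing the role of the signal-to-noise ratio."""
    rng = random.Random(seed)
    signal = [rng.randrange(q) for _ in range(N)]
    edges = set()
    while len(edges) < c * N // 2:
        i, j = rng.randrange(N), rng.randrange(N)
        if i != j and signal[i] != signal[j]:
            edges.add((min(i, j), max(i, j)))
    return signal, sorted(edges)
```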
As long as the entropy of the planted cluster is smaller than the total entropy, who cares: the planted cluster is subdominant. But beyond this point, which is the condensation point, the planted cluster becomes dominant. So in principle, if you were able to find the ground state beyond this point, you would detect the signal. Before the condensation, no way: even if you find a ground state, with high probability it is not the signal, and so it is impossible to detect the signal. Beyond the condensation you can detect it. Nevertheless, let's try to connect this phase diagram to the actual algorithmic behavior; now we are in a nice situation, where we can connect phase transitions to algorithmic behavior. And what is the connection? Until this point, which I now call the information-theoretic point, there is no way of finding the signal, because it is subdominant: even if you find the ground state, it will not be the planted solution. In this region, belief propagation is still locally stable at the uninformative fixed point, so no way: it will provide you no information. At this point, belief propagation goes away from the uninformative fixed point and you can find the planted solution, okay? So this is the phase diagram of belief propagation. Fine, this is very well known.

But now let's ask what happens with Monte Carlo, because belief propagation is not very robust. I would like to understand where Monte Carlo methods can do as well as belief propagation, which here is an exact sampler of the posterior. On this side, more or less, it's obvious: here it is impossible, so it is impossible also for Monte Carlo; even if you find a ground state, with high probability it will not be the signal. What happens here I will show in the next slide: even when the planted state is essentially the only ground state, finding it is very hard. This is related to glassy physics, and the rest of the talk will try to convince you that, nevertheless, Monte Carlo can actually do as well as BP, so it can be optimal in that sense.

So, first of all, why is it hard to find this ground state, which seems to be the only ground state? Well, we have known this for many years. Actually, this is the same plot that Roberto showed yesterday; it is from a work I did when I was a postdoc here, so it was a nice time. Suppose that you take this ferromagnetic p-spin model; don't care about the details of the model, what matters is that it has a crystal, which is the planted state. (The energy here is not scaled the usual way: the crystal energy would be zero in constraint satisfaction problem notation.) If you take the crystal and you raise the temperature, eventually the crystal melts into the liquid. But if you now take the liquid and you cool it down, you don't undergo the phase transition to the crystal, which is here, at the information-theoretic phase transition: below this line the crystal has a lower free energy, so if you were able to sample the posterior exactly, you would find the signal. But you are not able to do that with local algorithms. Where do local algorithms go? They go somewhere here. And what are these? Glassy states, okay? Glassy states which are not seen by belief propagation, because belief propagation works under the replica-symmetric assumption; but in practice, heuristic algorithms do feel the glassy states.
So even if the planted state is the unique ground state, so that finding the ground state means finding the signal, in practice you don't find the ground state: you get stuck in the glassy states, okay? And then you can trace these transitions. The important ones are the dynamical phase transition, where you get stuck in the glassy states, and the IT threshold drawn here.

Okay, so let's put this information together with the zero-temperature phase diagram of belief propagation. This was the previous impossible/hard/easy phase diagram of belief propagation, and now I add the temperature. From the IT threshold I have this green line; from the dynamical threshold I have the dynamical temperature; but I also have another important piece of information, which is the stability of the paramagnetic fixed point, which I can also draw as a function of temperature. So, in some sense, if I cool down from here, when I reach the red curve the paramagnet, the liquid, becomes unstable. If it is unstable, you have to go somewhere else. Where do you go? Here there are still no glassy states, the glassy states form at the blue line. So looking at this phase diagram you can make some prediction about the behavior of, for example, simulated annealing. If you cool the system here, you find an instability away from the liquid, and you can only go to the planted state, because glassy states are not present yet. But if you cool down here, you find the glassy states first, and you get trapped there, okay? So according to this, I should observe some change around degree 18 in the behavior of simulated annealing.

So let's run the algorithm. This is simulated annealing trying to minimize the energy: if I reach zero energy, I have found the planted state; if I end at a finite energy, I am stuck in glassy states. And this is exactly what happens. For connectivity 20 and 19 I have a clear transition to the planted state. At 18 I have half of the samples getting stuck in glassy states and half of the samples going down to the planted state. At 17 I always get stuck in glassy states. So you see that 18 is very close to the algorithmic threshold for simulated annealing on this planted random graph coloring model, okay?

So you see that here, essentially, there is a game between three players. There is the paramagnetic state, which is uninformative from the inference point of view, which you want to leave; you want to go to the planted state, which is informative about the signal; but there is a third player, always trying to catch you, which is the glassy state, okay? If you go to the glassy states before finding the planted state, then you are stuck. It's like starting from the top of a hill: if you go down the wrong side, you are stuck, you cannot go back. This is different from what Lenka presented in her talk, because there, thanks to the two-body interaction, you can go from the glassy states down to the signal. But that was because that model has a two-body interaction that always provides a little bit of force in the direction of the signal, even when you are in the glassy states. In general this is not true. So this model is more difficult: once you get trapped in the glassy states, you are stuck, okay? You cannot find the signal anymore. So our analytical prediction for the behavior of simulated annealing is that this will be the algorithmic threshold.
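A sketch of the simulated annealing experiment described here, runnable on the output of the planted_coloring sketch above; the energy is the number of monochromatic edges, and the geometric cooling schedule and parameters are illustrative assumptions, not the protocol actually used in the talk.

```python
import math, random

def simulated_annealing(edges, N, q, T0=2.0, Tmin=0.05, cooling=0.999, seed=0):
    """Simulated annealing for q-coloring: energy = number of
    monochromatic edges. Reaching E = 0 means a proper coloring
    (possibly the planted state) was found; a finite final energy
    means the run got stuck in glassy states."""
    rng = random.Random(seed)
    colors = [rng.randrange(q) for _ in range(N)]
    nbrs = [[] for _ in range(N)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    E = sum(colors[i] == colors[j] for i, j in edges)
    T = T0
    while T > Tmin and E > 0:
        for _ in range(N):                      # one Monte Carlo sweep
            v = rng.randrange(N)
            new = rng.randrange(q)
            dE = sum(colors[u] == new for u in nbrs[v]) \
               - sum(colors[u] == colors[v] for u in nbrs[v])
            if dE <= 0 or rng.random() < math.exp(-dE / T):
                colors[v], E = new, E + dE
        T *= cooling                            # geometric cooling schedule
    return colors, E
```

The experiment in the talk amounts to repeating such runs at mean degree c = 17, 18, 19, 20 and recording how often the final energy reaches zero.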
So it's not the Kesten-Stigum point that was conjectured before: the threshold is actually beyond it, because you have to cross the red line before crossing the blue line. If you hit the glassy states first, you get stuck and you don't find the ground truth. Okay? So this was a little bit disappointing, because we would like to use Monte Carlo methods.

[Question] So this T_KS means an equilibrium transition between the paramagnetic phase and the crystal phase? Yes. [Question] And this transition is first order, I think? It is first order, yes. Essentially, T_KS is the spinodal of the liquid. It's like when you have a first-order phase transition and the state of higher free energy becomes locally unstable: once it is locally unstable, you have to go to the state of lower free energy. [Question] But usually, when there is a first-order transition, it's very difficult to jump from one state to the other. Indeed, but at T_KS something like this happens: for T larger than T_KS you have this situation, and you are here; when you go below T_KS, you are in this other situation, so spontaneously you flow to the signal. So T_KS is where the local instability appears, so you can flow; there is no jump. The system here is very large, 10 to the 5: we are not sampling timescales exponential in N that would allow you to jump over a barrier; we can only flow down, okay? So this is the meaning of T_KS. [Comment] But that's in the universe of Landau theory; if you look at real materials, say ferromagnets, usually first-order transitions are granular and proceed by many small jumps, not via a single maximum that goes away and becomes a saddle point. In other words, adding realism to these problems may spoil such a nice scenario. Here you can check by yourself that it is done in a single jump, because essentially it is an avalanche: once you start the avalanche, all the other variables prefer to go in that direction, so it's like an avalanche that brings you to the other state. [Comment] Except that, for c equal to 18, half of the time the avalanche happened and half of the time it did not. Exactly, because you are very close to the critical point. So we are now in the mixed phase of talk and questions. [Question] Well, I have a question, but on a different topic, probably on the previous part. Ah, then you can postpone it; I just have a few slides more and then we can discuss.

Okay, so let's try to close this gap between the threshold of BP and the threshold of simulated annealing. And before that, let me show you something even more interesting. When you run this kind of experiment, you realize that there are huge finite-size effects. This is c equal to 17. In principle, simulated annealing should not work, because if you compute in the thermodynamic limit, the dynamical transition towards the glassy states takes place before the T_KS transition towards the planted state; so in principle it should not find the planted state. Indeed, if you run very large sizes, 10 to the 5, you get stuck in the glassy states. But if you run smaller sizes, 10 to the 4 or 10 to the 3, all the runs jump to the planted state. Okay, so you say: these are finite-size effects, I have to live with them. No, you can do better: you can even explain them analytically. Why? Because when you compute T_d and T_KS, you have to solve the BP equations, and you can solve the BP equations on the infinite tree; in that case you get the thermodynamic limit, one number, the number valid in the thermodynamic limit.
But you can also run BP on a given graph, and then the result depends on the size of the graph. This was very unexpected, because usually, with these analytic tools, if you run them on a large enough graph you get something very close to the thermodynamic limit. Not in this case: we found huge finite-size effects even in the analytic method. So you run BP on the graph; for example, the full curves are T_KS measured on some thousands of different graphs, okay? You see they have huge finite-size effects, and the same for T_d, although for T_d the finite-size effects are smaller. What happens is that, if you take the max between the two for each graph, you realize that for 10 to the 5 the max is always T_d, in accordance with the simulated annealing behavior; but for the smaller sizes the max is always T_KS, and that instability happens first. So we have an analytic prediction of why smaller sizes jump to the solution instead of getting trapped in the glassy states: on those sizes, one instability happens before the other because of finite-size effects. And so we have an analytic way of controlling and understanding finite-size effects in these problems.

Okay, so let me go to the last point: let's try to close the gap between BP and Monte Carlo methods. We use replicated simulated annealing, which is very easy: you just take y copies, and the energy is the sum of the energies of the y copies plus a coupling term, so that you favor a little bit configurations where the y replicas are close by, okay? The reason Riccardo and coworkers introduced it is that they were interested in counting regions of phase space with larger entropy: you want to find larger valleys, and so it's better to carry around a big ball instead of a single point. And in our case it works very well, essentially because the planted state has a larger entropy. So look what happens for the usual very large size and very slow cooling rate, for all the runs at c equal to 17: y equal to 1 is the original model, and it gets stuck. But already for y equal to 2, and even better for larger values of y, you get a first-order transition to the planted state. And moreover, what is nice is that this is our analytic prediction: we do have an analytic prediction for this kind of model that seems to beat standard simulated annealing.

Where does the analytic prediction come from? Well, in principle the exact solution is very costly, because for each variable you have y replicas, so if a variable has q states, the super-variable has q to the y states. So it is very costly to solve exactly. So we do something a little bit simpler: we just run BP on the replicated graph. You take the random graph, you replicate it y times, you put the extra connections between the replicas, and then you run BP over that graph. The problem is that there are small loops: if this is an original edge and this is the same edge in another replica, then there are connections between the replicas, and so you have short loops that, in principle, prevent belief propagation from providing the right answer. And indeed this is true at finite y: it provides a wrong number. Nevertheless, if you take y large, this coupling becomes very weak, and so the effect of the loops becomes irrelevant. And in practice, our prediction for the threshold of replicated simulated annealing comes from the large-y computation of BP on this replicated graph.
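Schematically, the replicated energy has the form below. Note that the precise coupling term is a design choice (a pairwise attraction between replicas is written here, while the robust ensemble of Riccardo and coworkers couples replicas through a center configuration), so this is an illustration rather than the exact Hamiltonian used:

```latex
H_y(\mathbf{x}^1,\dots,\mathbf{x}^y)
\;=\; \sum_{a=1}^{y} E(\mathbf{x}^a)
\;-\; \gamma \sum_{a<b} \sum_{i=1}^{N} \delta_{x_i^a,\, x_i^b},
\qquad \gamma > 0 ,
```

so that the y replicas gain energy proportional to gamma for every site where they agree, biasing the annealing towards wide, high-entropy valleys such as the planted state.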
Even more interesting: when you put y larger than one, the glass transition changes nature, it becomes continuous. So it seems that coupling the replicas not only allows you to find the ground state more easily: even the kind of glassy phase you have is different. And we know that optimizing in a spin-glass phase with a continuous transition is much easier than optimizing in a random first-order spin-glass phase. And a final comment: if you do this computation on the infinite tree, you get badly wrong numbers. Why? Essentially because the messages passing along the same edge in different replicas are correlated; if you go to the infinite tree, you wash away that correlation and you get really bad numbers. So you are not allowed to go to the infinite tree. This is another example showing that running belief propagation on a given graph is much wiser than doing the computation on the infinite tree.

Okay, last slide. I just show you the numbers to convince you that the ultimate threshold for replicated simulated annealing is very close to the BP one. This was the old phase diagram, and now we are working at much higher temperature: really, coupling the replicas, the phase diagram changes a lot. T_d and T_KS are now very close, and so you have to plot their difference to convince yourself that the difference goes to zero close to the same threshold as BP, okay? It is a tiny difference, but we are doing an analytic computation, so you can compute things with four digits of precision, and so you can do it.

Okay, so let me conclude, more or less on time. Even if we concentrated on a very specific inference problem, the nice news is that it seems we have an analytic prediction for the behavior of simulated annealing and of replicated simulated annealing, at least for large y, which was highly non-trivial. By the way, it is mandatory to consider glassy states, okay? If you don't take glassy states into account, you are not able to make any prediction about algorithms trying to solve hard inference problems. And usually these states are not seen by approximate message-passing methods like belief propagation, because these are replica-symmetric approximations; but actually in some cases we do see them in BP, not always. We are able to study finite-size effects even analytically, via the behavior of BP on finite graphs. And finally, the fact that the glass transition changes nature puts, in my opinion, some hope on the possibility of entering the hard phase with Monte Carlo methods. At the moment we brought Monte Carlo methods to the limit of the hard phase running in linear time. What happens if I run a Monte Carlo method for a time which scales with a larger power of N, say N to the 1.5, or N squared? Can I enter the hard phase, or do I really need exponential time to find the signal inside the hard phase? So I leave you with this open question, and I thank you for your attention. Thank you very much.