Can you start sharing? Do you hear me? Yes. Great. Please start sharing. Do you see my screen? No, we see you but not the screen. How about now? Yes, now we see it. Okay, just a second. Professor Mohseni will talk about non-local Monte Carlo moves with algorithmic quantum and thermal fluctuations. Just a second, I'll start. Recording in progress. Okay, please.

Good afternoon. I would like to thank the organizers for inviting me to present this talk, and I apologize for not being able to attend in person. I'm going to present a new class of quantum and classical Monte Carlo techniques that rely on algorithmic, or inhomogeneous, quantum and thermal fluctuations.

There is a long history of physics-inspired heuristics that have been applied to optimization and sampling. It starts from the original Metropolis-Hastings algorithm, simulated annealing, the Hopfield model and Boltzmann machine learning, and various techniques for tackling optimization and sampling over continuous variables like Hamiltonian Monte Carlo, all the way to parallel tempering, survey propagation, and influences from quantum mechanics such as quantum annealing. There are new insights from non-equilibrium and magnetic physics that I think are going to be important for developing a new class of quantum-inspired and hybrid quantum-classical optimization and sampling algorithms, which I'm going to discuss today.

The main problem with all of these techniques is that, in general, the mixing time of local, equilibrium Markov chain Monte Carlo sampling is exponential in the worst case. This has been an open problem for decades, and people basically assume it is just a fact we have to live with. I'm going to challenge that notion today: if you are willing to violate either the locality or the equilibrium assumption, you may be able to create shortcuts in the dynamics of the Markov chain Monte Carlo and arrive at steady states much faster.

So I'm going to present results we have obtained recently on a class of quantum-inspired algorithms. There are two techniques introduced in this work. Basically, we have an adaptive, gradient-free strategy that can learn key instance-wise geometrical features of the cost function and use that information on the fly to create cluster moves in configuration space, thereby avoiding the seemingly inevitable exploration-versus-exploitation trade-off.

So what is the intuition behind our algorithm? I guess in this audience you are all familiar with the cartoon of a one-dimensional objective function over a solution space with a rugged landscape. You can use various techniques that exploit thermal fluctuations, such as simulated annealing or parallel tempering, to get to fairly good-quality solutions, but for really hard problems you usually get stuck in some local minimum. The idea of using quantum fluctuations to tunnel through this configuration space was introduced by Nishimori, and later on, within the adiabatic quantum computing community, Eddie Farhi, Daniel Lidar, and others pushed this line of research. I'm going to discuss here the possibility that we can do this classically, or even build hybrid quantum-classical algorithms. I should mention that in this work I consider energy functions that involve not only quadratic interaction terms among the variables but also higher-order ones.
The variables here are binary, but the general algorithm can be applied to a broad class of discrete problems, and the cost function can involve K-local interactions. The inputs are the real parameters h_i and J_ij (and higher-order couplings), and we want to find a bit string that minimizes the cost function. I'm going to mostly present a cartoon of the algorithm, but I have some equations for people who want a bit more detail.

So let's start. What is the new step? Our algorithm begins by running an off-the-shelf solver for a while to get to a local minimum, say this red dot here. Do you see my cursor on the screen? Yes. Great. So the idea is that you get some high-quality state, but the ground state may still be far away. Usually, when this happens, people push the panic button: we have a problem, we are not making progress in the low-temperature replica, for example in parallel tempering, so let's boost the temperature. As you boost the temperature, the energy landscape gets flatter and flatter, so you obtain exploration by losing all the information you already had in the replica. In the extreme case of infinite temperature you can fly everywhere, but you have no information about the configuration landscape anymore, or very little.

That is not what we are going to do here. Once we get localized in a basin of attraction, for example at this red dot, instead of desperately trying to escape with some one-size-fits-all technique like increasing the temperature everywhere, we want to understand what is wrong: why did we get stuck here? In order to do that, we actually make the problem more localized. We add a term to the Hamiltonian proportional to a global parameter lambda. At lambda equal to zero you have the original problem, the full non-convex problem. At lambda going to infinity, you are pinned to the bit string you obtained, given by s-star here. (I changed the index from s to r to indicate that this is the localized replica.) The point is that there is a sufficiently small lambda at which the system becomes basically convex: the solution of this surrogate Hamiltonian lies in this blue region, where we have effectively convexified the problem, but lambda is not so small that we escape the basin of attraction, because we want to extract information about the frozen variables, or backbones, that are pinning us here. We also have a site-dependent rescaling epsilon_i; we want to make sure that, for example in scale-free networks, for heavy-weight variables that interact with many nodes, we do not reduce lambda so much that we escape the minimum. So, in contrast to the usual approach, we want to stay localized here. Why? Because we want to extract information about what is going on. We do that here with loopy belief propagation, but any efficient approximate inference algorithm would do.
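For reference, here is a plausible way to write down the cost function and the localized surrogate replica just described; the exact form of the site-dependent penalty below is my paraphrase of what was said, not taken verbatim from the slides.

```latex
% K-local cost function over spin variables s_i \in \{-1,+1\}
E(\mathbf{s}) = \sum_i h_i s_i + \sum_{i<j} J_{ij}\, s_i s_j + \cdots
              + \sum_{i_1<\cdots<i_K} J_{i_1\cdots i_K}\, s_{i_1}\cdots s_{i_K}

% Localized surrogate replica around the seed (local minimum) s^{*}:
H_{\lambda}(\mathbf{r}) = E(\mathbf{r}) + \lambda \sum_i \epsilon_i \left(1 - r_i\, s_i^{*}\right)

% \lambda \to 0: original non-convex problem;  \lambda \to \infty: pinned to s^{*};
% \epsilon_i: site-dependent rescaling (e.g. larger for heavily connected variables).
```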
So here, the loopy belief propagation message is basically the field, or belief, that site j receives from i and its neighbors. It is given by the local magnetic field, including the linear term we added, plus the messages coming from the variables k in the neighborhood of i, excluding j itself, because we want to see the influence on j. Those messages are in turn given by the couplings J_ij and the local fields, so you have to solve this pair of self-consistent equations.

We actually do an adiabatic loopy belief propagation. "Loopy" means the following: belief propagation is exact on trees, and on graphs with a local tree-like structure, such as a Bethe lattice, it performs fairly well. You can apply it to general graphs, but there it can perform very poorly; it does well as long as lambda is sufficiently large. So we start from a lambda that is very large; that becomes the boundary condition, the initialization, of these equations, and then we decrease it each time we reach a fixed point of the loopy belief propagation. We keep doing this until it breaks down. The advantage is that the breakdown gives you a signal that you have left the basin of attraction, so we know that any information after that point is unreliable. We go back to the last lambda at which it converged and we calculate the local marginals and higher-order quantities such as two-point or K-local correlation functions.

Why do we do that? We believe this carries very valuable information. It tells you why things are not working: it tells you which subset of variables is frozen, because those variables will have very large local magnetizations and will be strongly correlated with each other. Then we do some thresholding: if the two-point correlation functions are above a certain threshold, we consider the variables to belong to the same cluster. We call these clusters backbones, and I will tell you why. In this cartoon, consider these six variables that are frozen; they have very large local magnetizations, and these are the red variables. The other ones, within this basin of attraction, are floppy: they can take any value and you are still there, you are not moving out of the basin of attraction.

One thing about the language: when I say backbones here, we actually mean surrogate backbones. The term backbone is used in the spin-glass community and in graph theory for a set of variables that take the same value in all (degenerate) solutions. What we mean by a surrogate backbone is the set of variables that take almost the same value over all high-quality solutions in a given basin of attraction. So these are the backbones of a given basin of attraction, not of the global solution. Note also that this calculation is done on the fly and instance-wise: you calculate the backbone of a given instance, on the fly, when you are stuck in a basin of attraction.

So one thing you might think of doing, given that a single-variable flip, as in simulated annealing or any local Monte Carlo technique, almost always gets rejected because the energy penalty is too large, is to move all of these frozen variables at once.
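As an aside, here is a minimal sketch, not the authors' implementation, of the backbone-extraction step just described, restricted for simplicity to a pairwise Ising surrogate. The penalty form, the lambda schedule, the inverse temperature, and the thresholds are illustrative assumptions on my part.

```python
import numpy as np

def loopy_bp(h, J, edges, beta, max_iters=500, tol=1e-6):
    """Loopy belief propagation in cavity-field form for P(s) ~ exp(-beta*E),
    E(s) = sum_i h[i]*s_i + sum_{(i,j)} J[(i,j)]*s_i*s_j, with s_i in {-1,+1}.
    Returns (converged, magnetizations m_i, pair correlations <s_i s_j> on the edges)."""
    n = len(h)
    nbrs = {i: [] for i in range(n)}
    for (i, j) in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    coup = lambda a, b: J[(a, b)] if (a, b) in J else J[(b, a)]
    msg = {(i, j): 0.0 for i in range(n) for j in nbrs[i]}     # cavity field i -> j

    def cavity(i, exclude):
        # effective field on i, excluding the message that would come back from `exclude`
        return -h[i] + sum(
            np.arctanh(np.tanh(-beta * coup(k, i)) * np.tanh(beta * msg[(k, i)])) / beta
            for k in nbrs[i] if k != exclude)

    for _ in range(max_iters):
        delta = 0.0
        for (i, j) in msg:
            new = cavity(i, j)
            delta = max(delta, abs(new - msg[(i, j)]))
            msg[(i, j)] = new
        if delta < tol:
            break
    else:
        return False, None, None                               # no fixed point: BP broke down

    m = np.array([np.tanh(beta * cavity(i, None)) for i in range(n)])
    corr = {}
    for (i, j) in edges:                                       # pair marginal from the two cavity fields
        Z = sij = 0.0
        for a in (-1, 1):
            for b in (-1, 1):
                w = np.exp(beta * (-coup(i, j) * a * b + msg[(i, j)] * a + msg[(j, i)] * b))
                Z += w
                sij += w * a * b
        corr[(i, j)] = sij / Z
    return True, m, corr

def extract_backbone(h, J, edges, s_star, eps, beta=2.0,
                     lambdas=np.geomspace(5.0, 0.01, 40),
                     mag_thresh=0.9, corr_thresh=0.8):
    """'Adiabatic' lambda sweep: the penalty lambda*eps_i*(1 - s_i*s*_i) acts as an extra
    local field pinning the replica to the seed s_star; decrease lambda from a large value
    until BP stops converging, then threshold the last converged marginals."""
    last = None
    for lam in lambdas:
        h_lam = [h[i] - lam * eps[i] * s_star[i] for i in range(len(h))]
        ok, m, corr = loopy_bp(h_lam, J, edges, beta)
        if not ok:
            break                              # left the basin of attraction: keep last fixed point
        last = (m, corr)
    if last is None:
        return set()
    m, corr = last
    frozen = {i for i in range(len(h)) if abs(m[i]) > mag_thresh}
    backbone = set()
    for (i, j), c in corr.items():             # glue strongly correlated frozen variables together
        if abs(c) > corr_thresh and i in frozen and j in frozen:
            backbone |= {i, j}
    return backbone
```

A real implementation would of course handle the K-local (clause-type) interactions of 4-SAT directly; this pairwise sketch only illustrates the mechanism of annealing lambda down and reading off the frozen cluster at the last converged point.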
Moving all of them at once is a kind of quantum-inspired move. You can do this, but most of the time it is not going to end well, because you will land at a much higher energy. You will definitely be at a very different point in configuration space, so you are making a genuinely non-local move, but it might take a lot of work to settle into the new basin of attraction, and it is not clear it even contains better minima in general (if you have a Z2 symmetry this works really well). What we ended up doing is using this mechanism in conjunction with something else, and this is the second key idea in this work: if there is a subset of variables pinning us in that basin of attraction, why not increase the temperature over just that subset? The fact that they are frozen means they are effectively operating at different time and energy scales; they are exponentially slower. This is related to the size of droplet excitations in low-dimensional spin glasses, and it is known that you are bound by the dynamics of these backbones. So we are simply fast-forwarding the dynamics of this subset only. Once you do that, you are essentially flattening the energy barrier separating you from other local minima that might be better. Then you can restart the algorithm: okay, now I'm stuck in a new basin of attraction, so you run the algorithm again, you find new backbones, you increase the temperature over those backbones, and hopefully after a while you reach the ground state or much higher-quality states at a given approximation ratio.

So, a high-level view of the algorithm. You have a parallel tempering baseline algorithm with different replicas at different temperatures; the red one is the highest temperature and you have a stack of them. At each fixed temperature you run MCMC sampling and find a seed solution. Using that seed, you add the penalty term to the Hamiltonian to build the local surrogate problem, and you do efficient approximate inference. Here we use loopy belief propagation because it has, by construction, linear scaling, and it also gives you a signal whenever you have left the basin of attraction, which we really need in order to gather meaningful information about the backbones. Then you build the backbones from the computed higher-order marginals. Once you have them, you boost the temperature inside the backbone while fixing all the non-backbone variables; you sample that for a while, then you fix the backbone and sample the rest. You can iterate this, doing inhomogeneous sampling. Inhomogeneous means the temperature is not uniform everywhere; for simplicity you can consider two temperatures, a higher one inside the backbone and a lower one outside, but the schedule can be more complicated, and I will discuss examples later. And you still do the standard replica exchange between the different temperatures. Our numerics show, and it is easy to see, that this is most valuable below the glass transition; above it you just don't need this fancy technique. So this works well for the low-temperature replicas.
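Here is a minimal sketch of the inhomogeneous, two-temperature sweep just described, again for a pairwise energy function and with illustrative temperatures; the real schedules and the replica-exchange layer are of course more involved.

```python
import numpy as np

def inhomogeneous_sweep(s, h, J_nbrs, beta_site, frozen, rng):
    """One Metropolis sweep for E = sum_i h_i s_i + sum_(i,j) J_ij s_i s_j with a
    site-dependent inverse temperature beta_site[i]; sites in `frozen` are not touched."""
    for i in rng.permutation(len(s)):
        if i in frozen:
            continue
        # energy change of flipping spin i
        dE = -2.0 * s[i] * (h[i] + sum(Jij * s[j] for j, Jij in J_nbrs[i]))
        if dE <= 0 or rng.random() < np.exp(-beta_site[i] * dE):
            s[i] = -s[i]
    return s

def backbone_cycle(s, h, J_nbrs, backbone, beta_cold=3.0, beta_hot=0.3, sweeps=100, seed=0):
    """Boost the temperature only on the backbone while the rest is frozen, then swap roles."""
    rng = np.random.default_rng(seed)
    n = len(s)
    beta = np.full(n, beta_cold)
    if backbone:
        beta[list(backbone)] = beta_hot        # flatten the barrier along the backbone only
    non_backbone = set(range(n)) - set(backbone)
    for _ in range(sweeps):                    # 1) resample the backbone, non-backbone frozen
        inhomogeneous_sweep(s, h, J_nbrs, beta, frozen=non_backbone, rng=rng)
    beta[:] = beta_cold
    for _ in range(sweeps):                    # 2) freeze the backbone, relax everything else
        inhomogeneous_sweep(s, h, J_nbrs, beta, frozen=set(backbone), rng=rng)
    return s
```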
So let's go to some of the benchmarking results. We studied quite a few different sets of problems. We actually started with Chimera graphs, motivated by the D-Wave architecture, and the method did really well on those quasi-2D problems, but our goal was to go beyond that, because for 2D systems there are already isoenergetic cluster moves (ICM, or Houdayer moves) that can grow clusters. What we ended up studying are the hardest problems possible, and as you know, random K-SAT is the testbed of computational complexity. So we studied random 4-SAT problems, as well as random QIP and some industrial quadratic assignment problems. I'm going to focus here on the random 4-SAT problems, which have been well studied for decades, so the limitations of both generic and dedicated solvers are well known and you can tell whether you are doing something non-trivial or not.

The first thing we did was benchmark against WalkSAT, a generic stochastic solver for K-SAT, on 100 random instances of 4-SAT near the computational phase transition. The problem size was 5000 variables, and the clause-to-variable ratio was close to 10, which is pretty deep into the rigidity phase; that means each variable appears in roughly 10 clauses, and typical 4-SAT instances there are very hard and exhibit a first-order phase transition. We indexed the instances by the so-called complexity, computed from survey propagation, which estimates the number of solution clusters; the fewer clusters, the harder the instance. We observed almost an order of magnitude improvement over WalkSAT: the worst performance of our solver over 50 repetitions was better than the best WalkSAT run. Actually, the data shown here is for four repetitions, but we repeated it many more times to check the worst-case behavior, and we always stay at a single-digit number of clause violations. The vertical axis, on a logarithmic scale, is the number of violations; with on the order of 50,000 clauses, single-digit violations correspond to roughly 10^-4 in the approximation ratio.

Here is the comparison with survey propagation, the dedicated solver for K-SAT developed by Parisi, Mézard, and coworkers, which is known to be the best solver for these problems; there is another variant, backtracking survey propagation, which we also benchmarked. In both cases we obtain orders of magnitude improvement for the top-10 worst instances, and with just four repetitions of this non-equilibrium Monte Carlo, for 95% of instances it already does better, at around 10^9 sweeps.

Comparing with local stochastic algorithms, we benchmarked adaptive parallel tempering. This adaptive parallel tempering is really optimized; we have tried it on various problems, for example quadratic assignment problems with on the order of thousands of binary variables, solved in a fraction of a second. But here you can see it is struggling on these 4-SAT problems. The advantage of the non-local strategy over the local one grows as you increase the number of sweeps; for example, at 10^9 sweeps we solve 50% more of the instances to within a 10^-4 approximation ratio.

So, to summarize this part of the talk: we developed a technique that can learn geometrical features of a cost function in an instance-wise fashion and perform non-local cluster updates.
We observed order-of-magnitude improvements versus both generic and specialized solvers for approximating random K-SAT problems near the computational phase transition.

I want to emphasize what happens as you increase the clause-to-variable ratio of a K-SAT problem. When you are under-constrained, with many fewer clauses than variables, you basically have a single, connected set of solutions; this is a cartoon of the solution space; in that regime the formula is easily satisfied and the problem is basically convex. As you add more clauses you hit the clustering phase transition: you have many different clusters of solutions, exponentially many basins of attraction, but it is still easy to find one of them. Add more clauses and you reach the condensation phase transition: there are fewer clusters, but it is still easy because these clusters are, in a sense, liquid. Add more clauses and you reach the rigidity phase transition, and deep in this regime, as you add still more clauses before the SAT/UNSAT phase transition, you encounter what is known as the overlap gap property, where the distances between the basins of attraction become larger than their widths. When the overlap gap property holds, there are results, recently proved, showing that any local strategy that is insensitive to the input cannot perform well; that class includes simulated annealing and quantum algorithms like QAOA. These algorithms exhibit concentration, they do not see the instance-wise geometrical features, and they cannot penetrate this regime. This includes survey propagation as well, on the deterministic side: it has linear scaling, but it fails to penetrate here. We have evidence that we can now penetrate into this regime and solve some of these instances. Of course, that does not mean you can solve the worst cases: there is always an entropic barrier beyond this. Even if you can identify all the backbones, there are just too many directions; even if you flatten the energy landscape you still have to find the right direction, and in a high-dimensional system there can be exponentially many.

So what does this mean for the quantum computing community, and quantum annealing in particular? Does it mean it is harder to observe quantum speedup? In a way, yes, if you benchmark against a standalone quantum solver. But if you create hybrid quantum-classical solvers, I actually believe you can do better than any individual quantum or classical technique. To illustrate this, look again at the same 1D plot of objective function versus configuration space. A simple local sampler can find these basins of attraction, the purple regions here. As you improve the classical technique with these non-local moves, you end up in new basins of attraction that are inaccessible to the off-the-shelf solver. But some of them might be behind barriers that are better traversed with quantum fluctuations, and the interplay of these non-local classical and quantum moves increases your ergodicity in the configuration space. How do you do it in the overall scheme of the algorithm? As before, you have a baseline algorithm, you run an MCMC sampler at each temperature, you find seed solutions, you build the surrogate Hamiltonian, and you do approximate inference to grow the backbone.
Now, instead of applying inhomogeneous thermal fluctuations, consider backbones that are very hard and seem to have no internal structure. You can of course do this hierarchically; inside each backbone there could be another backbone, but some of them have no structure, and there quantum fluctuations can give you better diffusion: basically, you could see a quadratic speedup, using the classical result as a head start. So you reach basins of attraction that are inaccessible otherwise. But this also does something interesting: it acts as subgraph selection for the quantum solver. First, notice that it can reduce the size of the problem: a backbone could be an order of magnitude smaller in the number of variables. It also reduces the connectivity of the graph, so the embedding becomes a much easier task; effectively it could be a 2D or even 1D backbone, which could be easier to implement on a NISQ device. So this could also serve as a subgraph selection technique when your problem input is just too big.

So what kind of quantum techniques can be used here? I'm going to highlight one general technique that we developed over the past few years, with Marek Rams, Adolfo del Campo, and others, using inhomogeneous quantum annealing protocols, and we have recent work showing how these techniques can lead to sampling a diverse set of solutions. The notion of diversity is important: it is not only about getting one good solution, you want different solutions of different flavors, at large distances from the solutions you already have. We have the Parisi order parameter, which characterizes the overlap of two solutions over n variables, but that is a 1D projection of what is happening. At low temperatures it is rather featureless, and it cannot distinguish whether you have two dominant clusters or ten or twenty; it is insensitive to that kind of information if the droplet excitations have similar volumes. If you consider a ground state, say of a 2D system, drawn as a bit string with white and black denoting spin up and down, you can see that for a single excitation there are droplets of variables you can flip to reach degenerate single-excitation states; this is with an approximation ratio of 0.001. They are very different in flavor, but they give you similar overlap values q. So we need to go beyond this level to see the characteristics of a solver; if you want to benchmark, this could be a good indicator, a new way of distinguishing the performance of solvers.

So we introduced a diversity measure over the estimated set of low-energy states at a given approximation ratio; assume you can obtain such a set. The diversity measure is based on a distance between two solutions. We use the Hamming distance, actually a connected Hamming distance: a Hamming distance computed over differing variables that are connected to each other. This is inspired by droplet excitations in low-dimensional spin glasses, and it is also related to the notion of backbones I introduced earlier. We say that if this distance is larger than some normalized distance threshold, say n/4 or n/8, it becomes increasingly unlikely that the two solutions belong to the same basin of attraction.
Of course this is not always the case; there can be connections between far-away basins of attraction when they are linked by purely local moves, but that becomes unlikely as you increase this threshold R. So if you build a graph in which each vertex is a high-quality solution and you keep only the edges corresponding to distances smaller than Rn, the cardinality of the maximum independent set of that graph tells you how diverse the solutions are. You can then introduce a diversity ratio, similar to an approximation ratio: the diversity of a particular solver divided by the diversity of all solvers combined. If you only need an approximate value, you can use greedy algorithms to estimate the diversity of a solver. This is in contrast to the approximation ratio, which gives you no information about how different the obtained solutions are: there might be a single, very deep basin of attraction with large support that your solver always finds, while another solver finds many different basins; the approximation ratio alone cannot tell you that.

We tested this by benchmarking homogeneous versus inhomogeneous schedules. Why inhomogeneous? The way you do it is that the quantum or thermal fluctuations are turned on at certain variables first and have a space-time profile: they propagate from one point, which could be one end of a linear chain, for example, or from the middle, and spread to all variables according to a schedule function that is both time- and space-dependent. There is an argument based on causality that if you do this, the variables that have not yet gone through the phase transition are influenced by the ones that already have; those act as local field biases on them, so it is unlikely that they end up violating them, and you basically suppress the generation of topological defects. This is inspired by studies of the Kibble-Zurek mechanism for continuous phase transitions, by Marek Rams and by Zurek himself. Recently we developed a multi-front version of this: instead of having a single critical front, you start critical fronts at several different positions and they merge at some boundaries. Of course, this can only be effective if you have some insight; you need a hunch about what the droplet excitations are. For 2D we actually developed an approximate tensor network contraction, based on MPS-MPO (you could also use corner transfer matrix techniques), to efficiently obtain the boundaries of these droplets. For a 2D system the tensor network gives you more than that, but we use only the information about the boundaries; we seed the critical fronts in the middle of these droplets and then run inhomogeneous quantum annealing. We used quantum Monte Carlo to do the simulations, on two cases, a 30-by-30 and a 40-by-40 lattice, and we looked at how much work is needed to reach a good diversity ratio (between 0 and 1). The vertical axis is time-to-diversity over several orders of magnitude; above about 10^9 we consider it a timeout.
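Here is a minimal sketch of the diversity measure described above, using the plain (rather than connected) Hamming distance and a greedy estimate of the maximum independent set; both simplifications, as well as the default threshold R = 0.25 (i.e. n/4), are my assumptions for illustration.

```python
import numpy as np

def hamming(a, b):
    """Normalized Hamming distance between two +/-1 configurations."""
    return float(np.mean(np.asarray(a) != np.asarray(b)))

def parisi_overlap(a, b):
    """q_ab = (1/N) * sum_i s_i^a s_i^b, the 1D projection argued above to be insufficient."""
    return float(np.mean(np.asarray(a) * np.asarray(b)))

def diversity(solutions, R=0.25):
    """Greedy lower bound on the maximum independent set of the 'same-basin' graph:
    an edge joins two solutions closer than R*n, so the kept (independent) vertices are
    mutually distant, presumably lying in distinct basins of attraction."""
    chosen = []
    for i, s in enumerate(solutions):
        if all(hamming(s, solutions[j]) >= R for j in chosen):
            chosen.append(i)
    return len(chosen)

def diversity_ratio(solver_solutions, reference_solutions, R=0.25):
    """Analogue of an approximation ratio: diversity found by one solver relative to a
    reference pool (e.g. the union of solutions from all solvers)."""
    return diversity(solver_solutions, R) / max(1, diversity(reference_solutions, R))
```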
You can see that the performance increased by 30 to 40% in diversity, for both the median and the 80th percentile; the inhomogeneous schedule times out much later and can go from 50% to 75%, or from 50% to close to 90%, of the full diversity compared to the baseline homogeneous schedule. We did some experiments on this with Alex, Hossein, and Muhammad; you can look at that paper. I'm not sure whether there is a poster on it, but Marek gives a talk on Wednesday at 11, I believe, on this diversity measure technique; if you would like more details, please attend that talk.

So basically, with the combination of quantum and classical fluctuations we can reach more of the otherwise inaccessible basins of attraction in the frozen regime than we could with either classical or quantum fluctuations alone. And once we can do that, the ultimate product would be to offer it, for example, on a cloud-based system like Google Cloud Platform: you have a parallel tempering baseline algorithm, and in the low-temperature replicas you can invoke various non-local moves, carried out either by a classical processor, which could be a dedicated processor or an Ising machine, or by a quantum processor, all combining these new ideas for non-local moves. I think such systems are going to be the best discrete optimization and sampling systems developed in the next few years. Thanks a lot for your attention.

Thank you, Masoud. Questions? I think there was also a question in the chat; let's start from that. From the chat, Catherine was asking: are these 4-SAT instances generated at a hard critical point in the clause-to-variable ratio? Yes, they are generated deep in the frozen regime, based on the estimates from survey propagation. They are very close to the computational phase transition, but we wanted to stay just inside the SAT regime, so at least some of them would be satisfiable in the thermodynamic limit. In principle, below the SAT/UNSAT phase transition, which is around 9.98, all of them should be satisfiable if the instances were large enough, but because of finite-size effects some of them are not actually satisfiable, even though they are in the SAT regime, very close to the transition.

Thank you for the talk. I have two questions. The first one is: is there any reason why the non-equilibrium Markov chain that you build within the replicas should converge? Because I feel like detailed balance should break down there. Yes, that is a very good question. We actually have a follow-up work in which we show that although this does not satisfy detailed balance, which is the reason we call it non-equilibrium Monte Carlo, it does arrive at a steady state, and I can give you an idea why. Notice that the way we do it, we are not making a crazy non-local move where we flip all the bits simultaneously; we apply a boosted thermal fluctuation within a backbone. If you freeze the non-backbone variables, that is just standard MCMC at high temperature over a subgraph, and it satisfies detailed balance. Once you get to a better local minimum of the backbone, you freeze that and you sample the rest with standard MCMC outside, and that also satisfies detailed balance. So you can think of it this way: whenever you freeze the backbone and just sample the rest, you are doing simple sampling within a basin of attraction, which is basically convex if your estimate of the backbone is conservative enough, that is, if that lambda is sufficiently large, and detailed balance is satisfied there.
And once you freeze the non-backbone variables, you are again satisfying detailed balance while sampling a subgraph. So the combination works, and we have a construction showing this more rigorously in a follow-up work. There is a steady state, and empirically this was robust across the board for all the instances we tried. But I think you got the idea: you satisfy detailed balance during the non-local move, and you also satisfy detailed balance when you are just doing exploitation within a basin of attraction.

I see, thanks. My second question is: when you compare adaptive parallel tempering with non-equilibrium Monte Carlo, you look at the number of Monte Carlo sweeps, but it seems to me that you are not, first of all, considering the time it takes to reach the local solution before starting non-equilibrium Monte Carlo, and then you don't consider the time it takes to build your localized surrogate problem and to grow the backbone, right? If you include those times, and give that extra time to adaptive parallel tempering, do you think it might catch up with NMC? No, we do include those times. It is a good question: you are saying there is a computational overhead even if the algorithmic scaling is favorable, since belief propagation has linear scaling in the number of parameters involved; it still takes some work. In our experience it was a few percent, between 5 and 10 percent overhead for the LBP. And if you give that extra time to adaptive parallel tempering when it is hitting an exponential barrier, it doesn't matter: with 5 percent more time you obtain absolutely no better solutions. In parallel tempering, when things get really stuck, you can increase the number of sweeps by an order of magnitude and it is not going to change much. So if you are really obtaining information about the backbone, it is worth every penny to find it and use this technique. In our experience the overhead was not a big deal.

Okay, thank you. And thank you for the nice and interesting presentation. As for the non-local update: the proposed NMC algorithm requires additional computational time, so I was wondering, is there any benchmark of the time-to-solution compared with naive simulated annealing or naive parallel tempering methods? Are you asking about calculating lower bounds on these things? These are heuristics; that is very hard. We don't have any general results even to compare parallel tempering versus simulated annealing, or quantum annealing versus simulated annealing; generally we do not have performance guarantees for the worst case, and I don't think that is actually the right way to think about it. My general objection, to people who think it is really important to prove worst-case performance, is that whenever you can do that, it usually implies that your algorithm has a brute-force nature; otherwise how would you know the worst-case performance? In branch-and-bound techniques you basically have to look exhaustively at every branch to know what the worst-case performance is.
Here we cannot do that; it is a heuristic, and so you need to benchmark it.

Hey Masoud, very nice talk. I have a question and a comment. The question: it seems like you are kind of driving a stake through the heart of quantum annealing, by proposing that tunneling is something we can mimic classically. We don't really need quantum tunneling; in fact quantum tunneling takes time exponential in the width of the barrier and so on, and we can just cleverly circumvent that using your very nice classical move. Is that the right way to think about it? In many cases yes, but not in all cases. As part of my talk I was saying that I think the combination of quantum and classical can do better, and I can give you an example of why. Imagine you find a backbone that has no particular structure, say a very low-dimensional backbone. Classically, once you increase the temperature over it, you are going to be much faster than standard parallel tempering. But that doesn't mean classical fluctuations are the best thing to do, because over an unstructured backbone, quantum diffusion is quadratically faster than classical diffusion; Santoro actually has a result showing this quadratic separation between quantum and classical stochastic diffusion. So you can sample the backbone much faster quantum mechanically, at least in the absence of any other underlying structure. What we are saying here is that there are underlying hidden structures that people don't see once you reach this so-called spin-glass phase transition, and one might be able to compute and exploit them to get out of the spin-glass phase, but it doesn't follow that getting out of it is always best done classically. The core point is that once you find the subgraphs that are the computational bottleneck, a quantum advantage could still be observed over those subgraphs.

Okay, then the comment. We had a paper last year on benchmarking 3-XORSAT for many different types of heuristic algorithms, including quantum annealing, but also state-of-the-art classical approaches. The winner in that benchmark was Parisi et al.'s SAT on GPU; I don't know if you're familiar with it, but they have an excellent, highly specialized algorithm running on clusters. I can send you the reference, but I'd be very interested to see the performance of your new algorithm on that benchmark, the SAT on GPU; the Fujitsu Digital Annealer was also very close, and those were the two best algorithms there. So, as I mentioned, I believe this holds against any algorithm that is local, and that includes all the classes of parallel tempering that we know. We have a very efficient implementation, originally developed by Sergei Isakov, which we improved by 40%, with adaptive strategies that make it much faster; so among probabilistic algorithms we benchmarked against the best in class. And for the deterministic ones, we worked with Parisi's team, Federico Ricci-Tersenghi, Raffaele, and others, who are actually co-authors on this, and they ran various versions of survey propagation and backtracking survey propagation. For the particular algorithm you used in your benchmark, we can talk about it and see whether we can benchmark against those problems.
But generally, I'm saying that any strategy that is local will not be able to penetrate this so-called overlap gap, in problems that have the overlap gap property and a fragmented solution space. In that fragmented solution space, things are believed to be exponentially hard if you are local and insensitive to the input. The point is that this algorithm is sensitive to the input. I really want to emphasize this: I think the future of optimization requires sensitivity to the input, meaning that for each instance you should do something different. Any technique based on ensemble averaging, any technique that exhibits concentration, is just not clever enough, not sophisticated enough, because this shortcut, this information about the geometry of the energy landscape, is going to average out to nothing if you do disorder averaging.

Other questions? Thanks for the talk. I actually have a very naive question that I have been wondering about since the beginning of the talk: what exactly are these backbones, and how do they relate to the fragmentation? Sorry, how do they relate to what? The fragmentation. Oh, the fragmentation of the solution space. So basically, you can think of the backbones as a set of variables that gives you the address of a basin of attraction. The address means that if you fix those variables to their values, up or down, that partial bit string gives you the location of a basin of attraction, and the remaining floppy variables are the degrees of freedom that give you the various different points within that basin. A rough analogy: no matter how you label things, the first variable being 0 or 1 splits the configuration space in half, so it gives you an address telling you whether you are in the first or the second half of the configuration space; the first 10 spins give you a finer address. But in general, finding the clusters of variables that conspire together to trap you in a basin of attraction is an extremely difficult thing. I should emphasize that we did this work on unweighted random 4-SAT: in the description of the problem itself there are no weights, there is no information at all from which to build a cluster. The information in this fragmented, shattered solution space lies in the disjointness, the gaps between the solutions, which the diversity measure I introduced is meant to capture; that gives you an idea of what the features are. Of course, finding the exact backbones of all these basins of attraction is a #P-hard problem: you would not only have to solve the problem, you would have to find all the good-quality solutions, and that is just not possible in polynomial time. But we get an idea of what is going on, a cartoon or a shadow of the backbone, efficiently, with something that is linear in time, and that is good enough to create the shortcut in the configuration space. That is the idea. I don't know if I have answered the question: the backbones are the subsets of variables that give you the address of a particular shattered cluster.

So there is an intuition that says: if you sample widely over a bunch of local minima, then the backbone of those local minima is the good part, the part you are supposed to keep, because it is what they all have in common that makes them local minima.
But here you are sampling from a single local minimum, and the backbone is the bad part that is keeping you stuck there. So I guess sometimes backbones are good and sometimes backbones are bad; how would one tell the difference? Okay, so there are two things. As I said, these are surrogate backbones of various high-quality solutions, the local minima that you find. The backbone in the traditional, standard sense is the set of variables that are consistent across all solutions. If you find that, you are basically done with the rest of the problem; it creates a backdoor for a dedicated sub-solver. This is called finding the backdoors of a problem: by fixing those variables you truncate the decision tree in a branch-and-bound type technique, you are in the right branch, and whatever you do afterwards is going to be good. But finding the particular backbone that corresponds to the ground states is very hard. What I'm saying here is that the other backbones correspond to other basins of attraction, and those, if you discover them, are the ones conspiring against you to keep you local; once you resample those backbones you might end up in the correct basin of attraction where the degenerate ground states are. So here I define backbones more generally, for any deep valley in the configuration space; one of them may contain the degenerate ground states, but any of them within a target approximation ratio counts as a solution. If you look beyond exact ground states, you have multiple basins of attraction, far apart in Hamming distance, that satisfy your target. So it is not that one backbone is good; you have to find the ones that are deep enough that you penetrate to the target approximation ratio.

Okay, any other questions? One more. When I studied percolation theory, I came across the concept of fusing these kinds of clusters so that you arrive at the global minimum; that concept is quite clear to me, but I don't really understand why you are fragmenting the clusters into smaller clusters that have their own local minima. You don't need to do that. There are various ways of using this information. You can consider a two-tier approach: the first tier of this algorithm just discovers whether variables belong to a backbone or not. But you should think of this as an algorithm that is recursive and also online: as you are running the standard MCMC, it starts building a model of what the configurational energy landscape looks like; you do approximate inference on the fly to extract that information, and as you go you improve your knowledge of these backbones. You can use this in different ways, for example to dig deeper into one backbone, since there may be more backbones nested inside it at other levels, but you don't have to. It depends on how complex your problem is and what your target approximation ratio is. It might be that if you don't do it, certain target approximation ratios are inaccessible, because although you found a backbone, the backbone has structure of its own. I should tell you that the sizes of the backbones can be really big.
For 4-SAT with 5000 variables, we find frozen backbones on the order of 4000 variables, and we show, using the whitening procedure, that these really are frozen variables of that size. We have a generalization of the whitening procedure for low-energy states (it was originally introduced only for exact solutions), and it never gives you a false signal, so it tells you whenever you truly have a frozen variable. And we showed that adaptive parallel tempering cannot reach any of these even in thousands of repetitions, so it is not a matter of luck; you cannot just get lucky and find these states that are so far away, with such large backbones. Now, a backbone of 4000 variables is still almost the same size as the original problem, so it could itself be hard, but that is basically your new sub-problem: you can target it and see whether there are other backbones inside it. Here, though, the results I showed use only one level; we are not digging deeper into each of the backbones, and we still got this performance from the first tier of the algorithm. Is that clear? Okay.

So I think there have been a few questions; unless there's anything else... Okay, one last minor question, a clarification. When we talk about solving problems better than other algorithms, in this talk we mean MAX-SAT-type problems, right? Because specifically for SAT, when a problem is satisfiable and we want a solution, there are other algorithms, like conflict-driven clause learning, which may be better than the algorithms you are comparing against. Actually, we benchmarked against that, so thanks for asking this question. We benchmarked against CDCL-type algorithms; conflict-driven clause learning techniques are fancier branching solvers based on the DPLL algorithm. Basically, they walk the branches of the decision tree, and whenever there is a conflict they mark it and never visit that branch again, so you have a fairly efficient way of solving certain SAT problems. But not in this regime; this is the regime where they have exponential running time. We tried the top algorithms from the SAT-solving competitions, like MiniSat and a few other techniques, and they had such bad performance that after months of running they couldn't even terminate on these instances. They were that hard in this regime, very close to the SAT/UNSAT transition for random 4-SAT; there is no signal for them, they have no clue where to go, it is exponential. And I can tell you why: those algorithms perform well when they can prune each branch quickly, so they see a lot of branches but never have to go to exponential depth in any of them. At this clause-to-variable ratio you simply don't get a conflict until you have gone very deep, so they cannot prune; they basically have to visit each branch and go very deep into it, and by construction that is the only way to deal with 4-SAT at this clause-to-variable ratio. You can look at Christopher Moore's book, which explains very nicely why, when the formula is frozen, these branching techniques have no clue what to do.

So, close to the SAT/UNSAT phase transition, you are saying that on both sides of that transition the algorithm you are presenting is better for both?
Both, that is: the problems which are satisfiable, where we want to find a solution, and the problems which are close to the phase transition but unsatisfiable, where we want to solve MAX-SAT, right? Yes. Of course you can do MAX-SAT at any clause-to-variable ratio if the instances are hard enough; what we find is that finding the ground state close to the SAT/UNSAT transition is hard even though, in the thermodynamic limit, those instances are satisfiable; at any finite size they may not be satisfiable. We believe that close to the phase transition, on either side, it will perform better. Strictly speaking, I think any algorithm that could do as well or better would have to be similar in spirit: it should have sensitivity to the input. If you don't have sensitivity to the input, if you are not instance-wise, you are not in this game. I think we need to develop these classes of algorithms. I'm not claiming this is the ultimate solver; I'm just saying it is better than any local strategy that performs similarly across all inputs and is not adaptive to them. Okay.

Okay, I think it's time to wrap up, so let's thank the speaker again for the talk.