Great. Do you all see my slides? Great. So this is joint work with Andrea Montanari and Mark Sellke, who's a fantastic graduate student in mathematics at Stanford. We'll be talking about optimization of mean-field spin glasses. The setting here is very similar to the talk from two days ago by Eliran Subag, and we'll be interested in optimizing these objects called spin-glass Hamiltonians. So let me define them. Here we have a Hamiltonian, which is a Gaussian process. This is the definition: it's centered, and it has a covariance that, when evaluated at two points sigma and sigma prime, depends only on the inner product between those two points. And the Gaussian process is indexed by the hypercube, plus or minus one to the n. Just to give you two examples to keep in mind through the rest of the talk: you can look at the SK Hamiltonian, or the two-spin model, where you have this explicit expression, which is just a quadratic in sigma_i, sigma_j. So it's a two-body interaction. In this case you can compute the covariance, and you'll find that the function c that dictates the covariance is just x squared, or x squared over two with the proper normalization. The second example I want you to keep in mind is the three-spin model, which is almost the same thing except you have a three-body interaction. These J_ij's, by the way, are all independent Gaussian random variables. And here c is x cubed. So the question is: I want to maximize this Hamiltonian over the hypercube, that is, find the ground state energy or the ground state configuration approximately. Okay, so what do I mean by optimize efficiently? Depending on the description you choose for the Hamiltonian, this may or may not make sense, so let me put it on a more formal footing.
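The two examples above can be sketched numerically. This is a minimal sketch under one choice of normalization (conventions vary across references): for the two-spin model H(sigma) = (1/sqrt(n)) sum_{i<j} J_ij sigma_i sigma_j, a direct computation gives covariance n c(R) - 1/2 with c(x) = x^2/2, where R is the normalized overlap; for the three-spin model H(sigma) = (1/n) sum_{i,j,k} J_ijk sigma_i sigma_j sigma_k, the covariance is exactly n c(R) with c(x) = x^3.

```python
import numpy as np

def cov_two_spin(s, t):
    """Exact covariance of the two-spin Hamiltonian at configurations s, t."""
    n = len(s)
    r = s @ t / n
    # sum_{i<j} (s_i t_i)(s_j t_j) = ((n r)^2 - n) / 2, then divide by n
    return ((n * r) ** 2 - n) / (2 * n)

def cov_three_spin(s, t):
    """Exact covariance of the three-spin Hamiltonian at s, t."""
    n = len(s)
    r = s @ t / n
    # (1/n^2) * (sum_i s_i t_i)^3 = n * r^3
    return (n * r) ** 3 / n ** 2

n = 8
sigma = np.ones(n)
sigma_p = np.ones(n)
sigma_p[:2] = -1                      # overlap R = (n - 4)/n = 0.5
R = sigma @ sigma_p / n

print(cov_two_spin(sigma, sigma_p), n * R ** 2 / 2)   # differ only by the diagonal term 1/2
print(cov_three_spin(sigma, sigma_p), n * R ** 3)     # exact match: covariance is n * c(R)
```

So the covariance indeed depends on the two configurations only through their overlap, which is the defining property of these Gaussian processes.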
So here I'm considering an algorithm that takes as input gradients from some oracle. You query a point, the oracle computes the gradient of the Hamiltonian at that point and returns it to you, and you want to use those queries to form some valid configuration sigma whose value is at least one minus epsilon times the optimum, with high probability. And you want the number of queries, the length of the conversation with the oracle, to be polynomial in n and one over epsilon. Now, in the worst case this is already NP-hard: you can show that already for the quadratic case, the two-spin model, where the J_ij's are not Gaussian but are chosen in a certain adversarial way, this problem is extremely hard, in the sense that even achieving a very poor approximation, one that goes to zero at a logarithmic rate, is NP-hard. So in a worst-case scenario this problem is not feasible, but for a random scenario like this one we may expect to do a little better. So what is known about this problem? First, you can look at convex relaxations; the general theme is to try to replace a non-convex problem with a convex one. Consider the pure k-spin spherical model, which has a k-body interaction, a monomial of degree k, and relax the constraint that says you're on the hypercube to being on the sphere: I just constrain the norm of the vector sigma to be square root of n. You can consider the level-k sum-of-squares relaxation and try to analyze its behavior, or at least the value it achieves. It turns out that it achieves a value larger than the optimum by a factor that diverges polynomially in n. So that's no good. Second, you can look at Langevin or Glauber dynamics.
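The oracle model just described can be illustrated with a toy interface (hypothetical names; the real algorithm only ever touches J through such gradient queries). For the two-spin Hamiltonian H(x) = x^T J x / sqrt(n), extended to all of R^n as a polynomial, the gradient is (J + J^T) x / sqrt(n), which we can sanity-check against finite differences (exact for a quadratic, up to rounding):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
J = rng.standard_normal((n, n))

def hamiltonian(x):
    # Two-spin Hamiltonian, extended off the hypercube as a polynomial.
    return x @ J @ x / np.sqrt(n)

def gradient_oracle(x):
    # What the oracle returns for a query at x.
    return (J + J.T) @ x / np.sqrt(n)

# Check at a hypercube point via central differences.
sigma = np.sign(rng.standard_normal(n))
eps = 1e-6
fd = np.array([(hamiltonian(sigma + eps * e) - hamiltonian(sigma - eps * e)) / (2 * eps)
               for e in np.eye(n)])
err = np.max(np.abs(fd - gradient_oracle(sigma)))
```

An efficient algorithm in this model is then one that makes polynomially many such queries before outputting its configuration.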
So there's a long, rich literature, but in the glass phase these dynamics exhibit slow mixing and get stuck at what's called the threshold energy. Thirdly, there is another approach, presented by Eliran Subag two days ago. I call these dual algorithms, for a reason that will become clear. For the spherical model we've seen an algorithm by Subag, and then Andrea Montanari extended this approach to the Ising case for the two-spin model. You can show that both achieve an approximate global optimum under a condition called no overlap gap. We've seen from Subhabrata Sen's talk what the overlap gap means, and if you assume there is no overlap gap, that actually implies the existence of an algorithm. This theme has also been investigated by many authors, coauthors of David Gamarnik, who show impossibility results when this condition is not satisfied. Okay, so let me dive a little into the technicalities of the problem. If you want to optimize something, you'd better know what you're looking for: what is the ground state energy you're trying to achieve? It is given by what's called the Parisi formula, here the zero-temperature Parisi formula. So let's define it. You first consider a space U of functions gamma that are non-decreasing. Then, in the definition, you consider a nonlinear PDE, called the Parisi PDE, which takes any gamma from this space, plugs it in, and is solved backwards from a terminal condition. Once you have the solution phi, you plug it into this functional P of gamma, which takes the value of phi at (0, 0) and then subtracts a term linear in gamma. Okay.
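For the record, here is the zero-temperature Parisi formula as I understand it from the Auffinger-Chen line of work, writing xi for the covariance function called c above (regularity conditions on the class U omitted):

```latex
% Parisi PDE: for gamma in U (non-negative, non-decreasing on [0,1)),
% solve backwards in time from the terminal condition:
\partial_t \Phi_\gamma(t,x)
  + \frac{\xi''(t)}{2}\Big(\partial_x^2 \Phi_\gamma(t,x)
  + \gamma(t)\,\big(\partial_x \Phi_\gamma(t,x)\big)^2\Big) = 0,
  \qquad (t,x) \in [0,1)\times\mathbb{R},
\qquad \Phi_\gamma(1,x) = |x| .

% Parisi functional: the value at the origin minus a term linear in gamma:
\mathsf{P}(\gamma) \;=\; \Phi_\gamma(0,0)
  \;-\; \frac{1}{2}\int_0^1 t\,\xi''(t)\,\gamma(t)\,dt .
```

The "linear term" the speaker subtracts is the integral on the right; note that gamma enters the PDE nonlinearly but the functional only subtracts it linearly.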
Now, there's a theorem telling you that the maximum, normalized by n, converges to the infimum of the Parisi functional over this class. So this is the variational problem you get in the limit. The first thing you'd think on seeing this kind of formula is: how are these two things related? One is a maximum, the other an infimum; they look like they come from two different planets, right? We'll try to shed some light on this question. Okay. Behind this proof, behind the whole machinery that leads to it, is the ultrametric structure. If you look at the clusters of near-optima in the hypercube, they are represented by the leaves of an infinitely branching tree. That's the tree you're looking at here: it has a root, and the leaves of the tree are the clusters of near-optima, or the near-optima themselves. The ancestral states, the intermediate nodes in the tree, correspond to points in the solid cube. I'm indexing the levels by time t here, and at any level, if you take a point at level t, the two-norm of that point is about square root of t times n. Lastly, if you take two points at the leaves of this tree, their Euclidean distance is reflected by the tree distance between the nodes: take this point and this point, and their Euclidean distance is given by going up the tree and coming back down again. Okay. So the main algorithmic idea is to exploit this ultrametric structure. We're going to algorithmically start at the root and navigate a random path down this tree until we reach a leaf, at which point we get an approximate ground state. And there's an equivalent geometric picture for this approach.
You start from the origin of R^n and run a kind of diffusion that mimics this tree behavior, and at the end you hope to reach one of the extremal points of the cube. Okay. So the main result is the following. There exists an algorithm that outputs a feasible solution in C(epsilon) iterations such that, when you evaluate the Hamiltonian at that point, you get at least one minus epsilon times the infimum of P(gamma) over a slightly larger class that I call L instead of U. Recall that the ground state energy is given by the infimum over the class U; here I have a class L that is slightly larger, and in particular it contains non-monotone functions: the functions in L do not have to be non-decreasing, which was a requirement in the definition of U. Okay. This algorithm is of course optimal whenever the two infima are equal: in particular, if the infimum over L is achieved at a non-decreasing function, and obviously if the original ground state problem, optimized over U, is achieved at a strictly increasing function, then the algorithm is optimal. This condition is called no overlap gap, or full, continuous, replica symmetry breaking. It also turns out that this algorithm is best possible among the class of IAMP algorithms. So there's an optimality guarantee aspect to this: when you look at all IAMP algorithms, which I'll define in a second, the best value achieved by any of them is given by this variational problem. Okay. So this is an algorithmic threshold for optimization in spin glasses. You can try to discretize these variational problems and solve them numerically.
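Schematically, my reconstruction of the statement on the slide reads as follows (the inequality between the two values is automatic, since L contains U):

```latex
% Ground state energy vs. algorithmic value:
\mathsf{GSE} \;=\; \lim_{n\to\infty}\frac{1}{n}\max_{\sigma\in\{\pm1\}^n} H_n(\sigma)
  \;=\; \inf_{\gamma\in\mathcal{U}} \mathsf{P}(\gamma),
\qquad
\mathsf{ALG} \;:=\; \inf_{\gamma\in\mathcal{L}} \mathsf{P}(\gamma),
\qquad \mathcal{U}\subset\mathcal{L}
\;\Longrightarrow\; \mathsf{ALG} \le \mathsf{GSE}.

% Guarantee: for every eps > 0, with high probability,
\frac{1}{n}\,H_n(\sigma^{\mathrm{alg}})
  \;\ge\; (1-\varepsilon)\,\mathsf{ALG}.

% No overlap gap: the infimum over U is attained at a strictly increasing gamma,
% in which case ALG = GSE and the algorithm is asymptotically optimal.
```

So the whole question of optimality reduces to whether the minimizer of the Parisi functional is monotone.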
If you look at the two-spin model and solve either one, the extended variational formula or the ground state one, you'll find that the optimal gamma is strictly increasing, meaning that the non-decreasing constraint you put in the original problem is not active: whichever one you solve, you always find an increasing function. And there is a famous conjecture in mathematical physics that there is no overlap gap for the SK model. That would mean the ground state energy is exactly the energy achieved by our algorithm, which is given approximately by this value. This is in stark contrast with the three-spin model. If you look at the ground state problem for this model, optimizing over non-decreasing functions, the function you get looks like this: it's constant up to a certain point and then starts to diverge. On the other hand, when you remove the monotonicity constraint, the optimizer turns out to be very far from monotone: the optimal function goes down and then up again. If you evaluate these things, the ground state energy is about 0.81, and the algorithmic value you achieve is about 0.80. So it's very close, but strictly smaller. The second thing you can evaluate here is the threshold energy, which is conjecturally the energy achieved by Glauber or Langevin dynamics, at least for short times. This has been evaluated by Rizzo in another paper, and it's about 0.788. So you see that the energy achieved by our algorithm is strictly larger than the threshold energy, which is an interesting finding. These numerics are preliminary, by the way; we still have to work on making them better. But it's a good observation to make here. Okay, so let me try to explain a little how the algorithm is derived.
You can start from a stochastic formulation of the Parisi formula. There is a natural SDE associated with this problem, which describes the evolution of the cavity field, and the cavity magnetization is given by this formula: the first derivative of phi evaluated at X_t. What are these two quantities? If you're familiar with the SK model, you can look at the replica symmetric regime, where the overlap concentrates at a value q. It turns out that there X_t is just a normal with mean zero and variance q, and the magnetization is just the hyperbolic tangent of this field. That's what you'd find in the replica symmetric case; in the full replica symmetry breaking case, this is the more general form of that. The main algorithmic idea here is to discretize this SDE, including the driving noise: you see there's a Brownian motion, and the key point is that the randomness forming this Brownian motion will come from the J_ij's, in the right way. By the right way, I mean we're going to use an approximate message passing algorithm. So let me tell you what the algorithm is. First of all, you discretize time: time evolves in discrete steps of size delta, and you start from the origin. I'll have to choose functions f_l, which I'll come up with later; they depend on the entire history up to the current point. You evaluate them, and they give you the point I call m^l. Once you have m^l, you compute the next iterate using this iteration: a multiplication by J, and then you subtract a linear combination of the previous m's, with coefficients that are explicit, given by a closed-form formula.
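The iteration structure just described can be sketched as follows. This is purely schematic: the update function, the initialization, and the correction coefficients below are placeholders, not the closed-form ones from the actual derivation, so the output illustrates the shape of the algorithm, not its guarantee.

```python
import numpy as np

rng = np.random.default_rng(1)
n, L = 200, 50
delta = 1.0 / L                       # discrete time step

# Symmetric Gaussian coupling matrix playing the role of the J_ij's.
J = rng.standard_normal((n, n))
J = (J + J.T) / np.sqrt(2 * n)

def f(history):
    # Placeholder nonlinearity, a bounded function of the whole history;
    # the real f_l mimics the discretized cavity-field SDE.
    return np.tanh(sum(history))

# Placeholder first iterate standing in for the first Brownian increment.
z_history = [np.sqrt(delta) * rng.standard_normal(n)]
m_history = []
for l in range(L):
    m = f(z_history)                  # m^l from the entire history
    m_history.append(m)
    # Linear combination of previous m's with placeholder coefficients
    # (the real ones are explicit, with a closed-form formula).
    correction = sum(delta * m_prev for m_prev in m_history)
    z_history.append(J @ m - correction)   # multiply by J, subtract the linear term

sigma = np.sign(m_history[-1])        # round to the hypercube at the end
```

The essential features are visible even in this toy version: the only randomness used is J itself, each step multiplies by J once, and the memory term is linear in the past iterates.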
With this in hand, if you consider the behavior of this algorithm in the limit as n goes to infinity, then for any fixed l this tuple converges to a centered Gaussian vector with an explicit covariance, given by inner products of the f's evaluated at the different times. Okay. So now I basically have a Gaussian process whose covariance I can tune: I can choose these f_l's to get whatever covariance I want. So let me choose them in this way, which basically mimics the SDE, the continuous-time dynamics I showed you before. Okay. Here I have a function v, which doesn't have to be the one given by the Parisi formula, and here I have a function u; I have to search over these two functions to describe the whole class of algorithms I have. [Moderator: I'm sorry to interrupt, you have one minute, maybe two.] Thank you, I'll wrap up soon. For this algorithm to work, it turns out that a necessary condition is that the function u appearing in the algorithm satisfies this condition. This is because if you assume the algorithm works up to time l and want to show it works up to time l plus one, the induction argument actually requires this condition to hold. Let me not dwell on that. You can then compute the energy achieved by this algorithm, which is just this integral; this is a derivation we can do. So that's one necessary condition, and you have to add another one because we're in the Ising case, namely that the magnetization at time one has to be between minus one and one. That constraint is specific to the Ising case. Okay. So I can maximize this energy over all my choices of functions, subject to these two constraints. It turns out I can slightly relax that.
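The Gaussian limit being invoked is the standard state evolution for such iterations (in the tradition of Bolthausen and Bayati-Montanari); schematically, for the two-spin case:

```latex
% For fixed l, as n -> infinity, the coordinates of (z^1, ..., z^l) converge to a
% centered Gaussian vector (Z_1, ..., Z_l) whose covariance is determined
% recursively by inner products of the update functions:
\Sigma_{s,t}
  \;=\; \lim_{n\to\infty}\frac{1}{n}\,\langle m^{s-1},\, m^{t-1}\rangle
  \;=\; \mathbb{E}\big[f_{s-1}(Z_1,\dots,Z_{s-1})\,
                      f_{t-1}(Z_1,\dots,Z_{t-1})\big].
```

(For general mixed models the inner product gets composed with the derivative of the covariance function, but the principle is the same: choosing the f's lets you design the limiting covariance, and hence emulate the discretized SDE.)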
I can just optimize over all stochastic processes satisfying these constraints, and I don't have to worry about the v here: I can absorb the X_t into u. Okay. I'll call that value E star. It turns out you can relax this further. The first constraint is the most difficult thing to deal with, so I can put it in the objective with a Lagrange multiplier, and that gives me a relaxation. This relaxation has a Lagrange multiplier nu, which I've suggestively written in a way that involves this function gamma. It turns out that this relaxation is exactly the Parisi formula. So you get that the energy achieved by your algorithm is at most the infimum over all gammas of this functional, which is exactly P of gamma. And if you optimize over a large enough function space, the bound is tight: you get exactly the energy achieved by your algorithm. It turns out the optimal control is given by exactly what you'd expect, namely the formulas I described for the effective field and the effective magnetization. So the main message here is a duality: on one hand the Parisi formula, on the other hand this stochastic control problem, and these two things are actually equal. They are two different but exactly equivalent definitions of the same number. This is what the algorithm does: it solves the control problem, and the dual formula, where you get an infimum instead of a supremum, comes from taking the dual. Okay, so let me conclude. Interesting directions: first, a physical interpretation of this extended Parisi variational principle. We don't have that yet; it's just what is achieved by the algorithm, and we don't know what it means in terms of the actual Gibbs measure. We would also like to explore this duality property further. Excuse me.
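One concrete piece of this duality can be checked directly with Ito's formula; this is my sketch in the notation above, under the usual regularity assumptions on the Parisi PDE solution:

```latex
% SDE for the effective cavity field driven by a standard Brownian motion B:
dX_t \;=\; \xi''(t)\,\gamma(t)\,\partial_x \Phi_\gamma(t, X_t)\,dt
        \;+\; \sqrt{\xi''(t)}\;dB_t, \qquad X_0 = 0.

% Magnetization m_t = \partial_x \Phi_\gamma(t, X_t). Differentiating the
% Parisi PDE in x and applying Ito's formula, the drift terms cancel exactly:
dm_t \;=\; \sqrt{\xi''(t)}\;\partial_x^2 \Phi_\gamma(t, X_t)\,dB_t .

% So m_t is a martingale; since \Phi_\gamma(1,x) = |x|, the terminal value is
% m_1 = \mathrm{sign}(X_1) \in [-1, 1], matching the Ising constraint above.
```

This is the sense in which the effective field and effective magnetization are the natural optimal control: the PDE is exactly what makes the magnetization drift-free, and the terminal condition is what produces a feasible Ising configuration.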
Other models would be the perceptron and related models, and so on. Sampling would be an interesting direction, and we can also try to extend these ideas to sparse models. Thank you, that's all. All right, thanks.