What I want to do in this second part of the lecture is to add some more tricks and techniques to your bag of tricks. We are going to do a replica analysis of this problem. The goal is again to find the minimum loss, but using a different set of techniques, which will eventually lead to the same result, on average, for a larger class of losses; I will also mention a couple of problems that we have with this derivation. The idea for computing the minimum is to set up a statistical mechanics problem with a fictitious temperature. We imagine that we have a parameter $\beta > 0$, the inverse temperature of the problem, and we define a Gibbs–Boltzmann partition function
$$Z = \int_{\|x\|^2 = N} dx\; e^{-\beta H(x)},$$
where the vector $x$ lives on the sphere of radius $\sqrt{N}$, we integrate over the Boltzmann weight, and $H$ is our loss function. We define $E_{\min}$ as the minimal value of $H(x)$ subject to the constraint that $x$ lives on the sphere, and we assume this minimum is achieved at some vector of solutions $x_{\min}$. What is the reason we introduce this object? Well, it is because in the limit of large $\beta$ we can apply the Laplace approximation method to this integral. We know that as $\beta \to \infty$ the integral is dominated by the neighborhood of $x_{\min}$ and behaves as $e^{-\beta E_{\min}}$, which gives us a prescription for evaluating $E_{\min}$: the minimal loss is
$$E_{\min} = -\lim_{\beta\to\infty} \frac{1}{\beta} \log Z.$$
So this is the recipe for computing the minimum of a function of several variables. Of course, in the presence of randomness this $Z$ is a random function, so we need to compute a suitable statistic of $Z$ — in particular of $\log Z$, which is a random variable. We are going to study two objects. First, the average of $E_{\min}$, which will connect us to the previous result that we obtained using random matrix techniques. But this problem is also very rich, because we can estimate the full distribution of $E_{\min}$, not just its average value — at least in a large deviation setting, for large $N$ — and we will proceed by gradual steps. Step number one, which is what we are going to discuss now, is to compute the average value of $E_{\min}$. For that, of course, we need the average of $\log Z$, where $Z$ is this integral. This is already the source of all our problems, because we are unable to compute the average of $\log Z$ directly. So to compute it we use a heuristic method that was developed over the years, called the replica trick. This method is non-rigorous — I mean, it is based on an exact result, but there is a catch. The idea is that the average of $\log Z$ can be written as
$$\mathbb{E}[\log Z] = \lim_{n\to 0} \frac{1}{n} \log \mathbb{E}[Z^n].$$
This is an exact identity. How do you prove it? Well, you expand $Z^n = e^{n\log Z}$ for small $n$, which gives $1 + n\log Z$; you take the average, which becomes $1 + n\,\mathbb{E}[\log Z]$; and then you have the log of one plus something small, and to first order in $n$ this is just the coefficient of $n$.
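To make the Laplace prescription concrete, here is a minimal numerical sketch — my own toy illustration, not part of the lecture — for a finite collection of random "energies": $-\frac{1}{\beta}\log Z$ converges to the minimum as $\beta$ grows.

```python
import numpy as np

# Toy check of the zero-temperature / Laplace prescription:
# for Z(beta) = sum_x exp(-beta * H(x)) over finitely many states,
# -log(Z) / beta -> min_x H(x) as beta -> infinity.
rng = np.random.default_rng(0)
H = rng.normal(size=1000)                    # random "energies", one per state
for beta in [1.0, 10.0, 100.0, 1000.0]:
    # log-sum-exp, shifted by the minimum for numerical stability
    logZ = -beta * H.min() + np.log(np.exp(-beta * (H - H.min())).sum())
    print(beta, -logZ / beta, H.min())       # second column approaches the third
```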
Okay. Well, most of you will not need this type of derivation, but it was just for completeness. So this is an exact identity; it is called the replica identity because of the way we are going to use it: by assuming first that $n$ is an integer. When you assume that $n$ is an integer, what you are computing is essentially the average of a replicated product of your integral — and the power of an integral is just a larger integral. That is the reason why we can compute this type of object. Of course, the mathematical subtleties arise because we will then need to take the limit $n \to 0$, so we need the behavior in the vicinity of zero, whereas we will have a result only on the integers, far away from zero. Our hope is that by closing our eyes everything will work out. And I am sorry to report — this is a very difficult moment for me — that this will probably not be the case in this situation, so we really need mathematicians; I said it. There are a lot of mathematical subtleties concerning this trick, but let's proceed as if they did not exist. We have our object here: the average of $E_{\min}$ becomes
$$\mathbb{E}[E_{\min}] = -\lim_{\beta\to\infty}\frac{1}{\beta}\,\lim_{n\to0}\frac{1}{n}\log\mathbb{E}[Z^n].$$
That is the formula we are going to use. Now we can proceed for a broader class of loss functions, and then specialize to our own loss function. Imagine that our broader class of losses can be written in the form
$$H(x) = \frac{1}{2}\sum_{k=1}^M V_k(x)^2,$$
obviously restricted to $\|x\|^2 = N$. The properties we require of the $V_k$ are: we assume they are Gaussian distributed and independent; they are centered, so $\mathbb{E}[V_k(x)] = 0$; and they have a peculiar covariance structure,
$$\mathbb{E}\big[V_k(x^a)\,V_\ell(x^b)\big] = \delta_{k\ell}\, f\!\Big(\frac{x^a\cdot x^b}{N}\Big),$$
so the covariance is diagonal in $k,\ell$, with coefficient given by a function $f$ of the dot product of $x^a$ and $x^b$ divided by $N$. In our case — the Procrustes problem — the function $f$ is just linear: $f(u) = \sigma^2 + u$. The derivation is included in the handout, in equation 18: if you plug into $V_k(x)$ the linear-system structure that we started from, you can compute the covariance and prove that it has exactly this form, with a linear $f$. But in general we can carry out this procedure for a much broader class of loss functions, which is more convenient because we kill more birds with one stone. So what do we have to compute? We have our object
$$Z = \int_{\|x\|^2=N} dx\; e^{-\frac{\beta}{2}\sum_{k=1}^M V_k(x)^2},$$
that is, $e^{-\beta}$ times the loss function, written in this generalized form. The first thing to do, to massage this object before we raise it to the power $n$, is to use the so-called Hubbard–Stratonovich identity, which is a fancy name for a Gaussian integral:
$$\int du\; e^{-\frac{u^2}{2} - i\sqrt{\beta}\,u\,y} = \sqrt{2\pi}\; e^{-\frac{\beta}{2} y^2}.$$
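As a quick sanity check of the Hubbard–Stratonovich identity (my own numerical aside, not from the lecture), one can integrate the left-hand side directly; the imaginary part of the integrand is odd in $u$ and drops out, so only the cosine survives:

```python
import numpy as np

# Numerical check of the Hubbard-Stratonovich identity:
#   int du exp(-u^2/2 - i*sqrt(beta)*u*y) = sqrt(2*pi) * exp(-beta*y^2/2)
beta, y = 2.0, 0.7
u = np.linspace(-10.0, 10.0, 200_001)
integrand = np.exp(-u**2 / 2) * np.cos(np.sqrt(beta) * u * y)
lhs = integrand.sum() * (u[1] - u[0])        # simple Riemann sum
rhs = np.sqrt(2 * np.pi) * np.exp(-beta * y**2 / 2)
print(lhs, rhs)                              # the two numbers agree
```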
This identity is used every time you have a term raised to the power two in the exponent, to lower that power from two to one: you have something squared, and you rewrite it as something raised to the power one, but the price to pay is one extra integration over the auxiliary variable $u$. To do that here, of course, we need to use the Hubbard–Stratonovich identity multiple times, once for each of the $M$ terms. What we get is an integral over a vector $u \in \mathbb{R}^M$,
$$Z = \int \frac{du}{(2\pi)^{M/2}}\; e^{-\frac{1}{2} u^\top u} \int_{\|x\|^2=N} dx\; e^{-i\sqrt{\beta}\, \sum_{k=1}^M u_k\, V_k(x)},$$
which is the quadratic, multidimensional version of the identity above. Using this trick we have linearized the dependence on the $V_k$ in the exponent, at the expense of introducing a number of extra integrations over $u$. So now we have to raise this $Z$ to the power $n$ and take the average over the disorder, which is in the $V_k$'s. We need $\mathbb{E}[Z^n]$, where we assume that $n$ is an integer, so we replicate the integral $n$ times. The $du$ integral becomes replicated: we get vectors $u^a$ with a replica index $a$ running from $1$ to $n$, and the constant $(2\pi)^{M/2}$ raised to the power $n$:
$$\mathbb{E}[Z^n] = \int \prod_{a=1}^n \frac{du^a}{(2\pi)^{M/2}}\; e^{-\frac{1}{2}\sum_{a=1}^n u^a\cdot u^a} \int \prod_{a=1}^n dx^a\; \prod_{k=1}^M \mathbb{E}\Big[e^{-i\sqrt{\beta}\, \sum_{a=1}^n u_k^a\, V_k(x^a)}\Big].$$
Here $u^a \cdot u^a$ is the dot product, essentially $(u^a)^\top u^a$ if you want, and we replicate the $x$ integral as well — I am bringing the product over $k$ downstairs for convenience. In this step I have of course exploited the fact that the $V_k$ are all independent, so I can move the average inside the product over $k$. Good. So now we have the replicated partition function, and the next thing to do is to perform the average over the $V_k$'s, using the condition — which I am just about to erase, sorry, I have no choice — that the $V_k$ are independent and Gaussian with that particular covariance structure. By the notation $u_k^a$ I mean the $k$-th entry of the replicated vector $u^a$; that is the notation I am using. What we need to compute here is the average of the exponential of a Gaussian number, because the $V_k$ are Gaussian and this exponent is a sum of Gaussian variables. Essentially, we need the average of $e^Z$ where $Z$ is Gaussian with a particular covariance structure, and I am going to use the result
$$\mathbb{E}_{Z\sim\mathcal{N}(\mu,\sigma^2)}\big[e^{Z}\big] = e^{\mu + \frac{\sigma^2}{2}},$$
which I put in equation 12 of the handout.
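A one-line Monte Carlo check of this Gaussian identity (again my own aside, referencing the handout's equation 12):

```python
import numpy as np

# Monte Carlo check of the Gaussian identity (eq. 12 of the handout):
# for Z ~ N(mu, sigma^2), E[exp(Z)] = exp(mu + sigma^2 / 2).
rng = np.random.default_rng(1)
mu, sigma = -0.3, 0.8
z = rng.normal(mu, sigma, size=10_000_000)
print(np.exp(z).mean(), np.exp(mu + sigma**2 / 2))   # the two numbers agree
```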
So what is $Z$ in our case? It is this exponent here,
$$Z = -i\sqrt{\beta}\,\sum_a u_k^a\, V_k(x^a).$$
What is the average of $Z$? By linearity it is $-i\sqrt{\beta}\,\sum_a u_k^a\, \mathbb{E}[V_k(x^a)]$, which is of course zero, because we assumed the $V_k$ have mean zero. And what is $Z^2$? Well, $(-i)$ times $(-i)$ is $i^2 = -1$, so we get
$$\mathbb{E}[Z^2] = -\beta \sum_{a,b} u_k^a u_k^b\; \mathbb{E}\big[V_k(x^a)\, V_k(x^b)\big] = -\beta \sum_{a,b} u_k^a u_k^b\, f\!\Big(\frac{x^a\cdot x^b}{N}\Big),$$
which is precisely the covariance structure we assumed from the beginning: diagonal, with a prefactor that is this function $f$ of the dot product. So, continuing from here, the result of the average is nothing but $e^{\mu + \sigma^2/2}$ with $\mu = 0$:
$$\prod_{k=1}^M \exp\Big(-\frac{\beta}{2} \sum_{a,b} u_k^a u_k^b\, f\!\Big(\frac{x^a\cdot x^b}{N}\Big)\Big).$$
I am using just the $\sigma^2$ computed there; the factor of one half comes from the identity. Now we can perform the integrals over $u$. So let me erase here again. Good. We have
$$\mathbb{E}[Z^n] = \int \prod_{a} dx^a \int \frac{du^1}{(2\pi)^{M/2}}\cdots\frac{du^n}{(2\pi)^{M/2}}\; e^{-\frac{1}{2}\sum_a u^a\cdot u^a}\; \prod_{k=1}^M e^{-\frac{\beta}{2}\sum_{a,b} u_k^a u_k^b f(x^a\cdot x^b/N)},$$
and we can put together the quadratic term and the term we just produced to reconstruct, for each $k$, a Gaussian integral in $n$ dimensions. How do we do that? I can write the exponent as $-\frac{1}{2}$ times the row vector of the $u$'s, times a matrix, times the column vector of the $u$'s. What is the matrix sandwiched between the row vector and the column vector? From the pure quadratic term, the matrix in between is just the identity matrix in $n$ dimensions. And from the other term, which connects components of different replicas, type $a$ and type $b$, we get an entry $\beta f(x^a\cdot x^b/N)$, because of the $-\beta/2$ prefactor. So for each $k$ the exponent is
$$-\frac{1}{2}\, u_k^\top \big(\mathbb{1}_n + \beta F\big)\, u_k, \qquad F_{ab} = f\!\Big(\frac{x^a\cdot x^b}{N}\Big), \qquad u_k = (u_k^1,\dots,u_k^n).$$
Do you agree with this rewriting? I am just rewriting all the terms that depend on $u$ in this exponential form — vector of $u$'s, matrix, vector of $u$'s; it is just a rewriting of the integral that we had. This rewriting is very nice, because the integral becomes a multivariate Gaussian integral in $n$ dimensions. The result of this multidimensional integral, which is also included in the handout, is the determinant of the matrix raised to a power set by the number of copies: the $u$ integrals give
$$\det\big(\mathbb{1}_n + \beta F\big)^{-M/2},$$
the determinant of the identity matrix in $n$ dimensions plus $\beta$ times our covariance function of $x^a\cdot x^b/N$, all raised to the power $-M/2$.
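Here is a small importance-sampling check of that multivariate Gaussian formula (my own sketch; the matrix `A` plays the role of $\mathbb{1}_n + \beta F$ and is just a random positive-definite stand-in):

```python
import numpy as np

# Check of the Gaussian integral used above:
#   int d^n u / (2*pi)^(n/2) * exp(-u^T A u / 2) = det(A)^(-1/2),  A > 0.
# Sample u ~ N(0, 1_n); then the integral equals E[exp(-u^T (A - 1_n) u / 2)].
rng = np.random.default_rng(2)
n = 3
B = rng.normal(size=(n, n))
A = np.eye(n) + 0.5 * B @ B.T            # a positive-definite "1 + beta*F"-like matrix
u = rng.normal(size=(2_000_000, n))
quad = np.einsum('ij,jk,ik->i', u, A - np.eye(n), u)   # per-sample quadratic form
print(np.mean(np.exp(-0.5 * quad)), np.linalg.det(A) ** -0.5)   # agree
```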
And of course, in this case the constants are fine, because I included them in the measures. Any questions about this?

[Student] What is the difference between the function $f$ and the loss function?

That's a great observation — it looks like another quintessentially bad choice of notation, but no, in this case it is correct. This $f$ is the generic function that represents the covariance between the $V_k$'s. For our specific problem, $f(u) = \sigma^2 + u$, but we can do the derivation for a generic covariance $f$; we are going to specialize $f$ to this value only at the end of the calculation. Is this what you were asking? Okay. Any other question?

[Student] This one?

This one is a matrix of size $n \times n$, indexed by $a$ and $b$. So it is a matrix, and we sandwich it between the two vectors of size $n$, the column vector and the row vector. I don't know a way of writing it more clearly than that. Okay.

Next, we do a change of variables, because you see that everything here now depends on a particular combination: the dot products between replicas, $x^a$ and $x^b$. So it is natural to introduce a matrix $Q$ of size $n\times n$ with entries
$$Q_{ab} = \frac{1}{N}\, x^a\cdot x^b.$$
This is a matrix about whose diagonal we can say something easily: $Q_{aa} = \frac{1}{N}\|x^a\|^2 = 1$, because this is a problem constrained to the sphere, so $Q$ has ones on the diagonal. That is the space over which we are integrating. We use an identity that I reported in the handout, in equation 19. The identity is as follows: if you integrate over vectors $x^1,\dots,x^n$ a function of the $n \times n$ matrix $X^\top X$, where $X$ is the matrix whose columns are the vectors $x^a$ — that's the left-hand side — this is equal to a constant that is known explicitly and depends on $N$ (the size of the vectors) and $n$ (the number of vectors you are integrating over), times an integral over a certain domain of matrices $Q$:
$$\int_{\|x^a\|^2=N} \prod_{a=1}^n dx^a\; \Phi\big(X^\top X\big) = C_{N,n} \int_{Q\succeq 0,\; Q_{aa}=1} dQ\; \big(\det Q\big)^{\frac{N-n-1}{2}}\, \Phi(NQ),$$
where the domain of $Q$ is the set of non-negative definite matrices with ones on the diagonal, and of course the $x$ integrals run over $\|x^a\|^2 = N$. Essentially, this theorem establishes the Jacobian of the change of variables that we are after, and this Jacobian is just a determinant raised to an exact power, with a prefactor that is known exactly. It is an identity, a theorem, so we can just apply it to our setting. Forgetting constants, the replicated partition function $\mathbb{E}[Z^n]$ is now precisely in the setting of the left-hand side: an integration over $n$ vectors of size $N$.
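Just to see the overlap matrix concretely (a toy illustration of my own): drawing a few vectors uniformly on the sphere $\|x\|^2 = N$ and forming $Q$ gives ones on the diagonal by construction, and small off-diagonal overlaps for independent vectors.

```python
import numpy as np

# Overlap matrix Q_ab = x^a . x^b / N for random vectors on the sphere ||x||^2 = N.
rng = np.random.default_rng(3)
N, n = 1000, 4
X = rng.normal(size=(N, n))
X *= np.sqrt(N) / np.linalg.norm(X, axis=0)   # rescale columns onto the sphere
Q = X.T @ X / N
print(np.diag(Q))   # all ones, by the spherical constraint
print(Q)            # off-diagonals are O(1/sqrt(N)) for independent vectors
```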
So that's precisely the left-hand side of the identity, and we can rewrite this object, modulo constants that we know anyway, as an integration over non-negative definite matrices with ones on the diagonal, against the Jacobian $(\det Q)^{(N-n-1)/2}$. Let me rewrite everything in the form
$$\mathbb{E}[Z^n] \propto \int_{Q \succeq 0,\; Q_{aa}=1} dQ\; e^{-\frac{N}{2}\, \phi_N(Q)},$$
where all the terms that have a capital $N$ in front — including the leading $\det(Q)^{N/2}$ from the Jacobian — are collected into $\phi_N$. So what is this function $\phi_N(Q)$? Well, you will have log-determinant-type terms:
$$\phi_N(Q) = -\log\det Q \;+\; \alpha\, \log\det\big(\mathbb{1}_n + \beta\, \hat f(Q)\big) + o(1),$$
where the $-\log\det Q$ comes from the Jacobian term. Remember that $\alpha = M/N > 1$, and we assume that we take the limit $N \to \infty$ in such a way that $M/N$ stays fixed. Here $\hat f(Q)$ is essentially the function $f$ evaluated entrywise on the matrix $Q$, in the sense that the element of $\hat f(Q)$ in position $ab$ is
$$\big(\hat f(Q)\big)_{ab} = f(Q_{ab}).$$
Okay, so with the choice $f(u) = \sigma^2 + u$ we are back to our initial Procrustes problem, but this expression is valid in general for this type of loss function, expressed as a sum of squares: you could pick a polynomial $f$ of higher order, for example, and the derivation would carry over in the same way. The next step of this derivation will be to evaluate this matrix integral for large $N$ using a saddle-point, or steepest-descent, type of argument: we will be looking for an extremum of this function $\phi_N$ over a suitable set of matrices. So we will take the limit $N \to \infty$ first; then we will have to take the replica limit $n \to 0$; and then the limit $\beta \to \infty$ — with all these operations performed in whatever order suits us best, forgetting any type of mathematical rigor. In the end we will hopefully land on a nice result. So we have three limits to take: $N \to \infty$, $n \to 0$, $\beta \to \infty$. Well, I think I can stop here.

[Student question about whether the result is exact.]

Yeah, this result is not exact in any sense of the word "exact"; under any choice of the word "exact", this result will not be exact. But we can give a new meaning to the word "exact", and that's what we're going to explore. In general, if you follow this prescription, in the end you get something that is usually instructive. But if you then want to put it on a mathematically rigorous footing, that's a much harder problem. Usually this gives a good heuristic for the solution, which the people who do rigorous work can take as a starting point — because knowing what you need to prove at the beginning of your work is better than not knowing what you need to prove. Exactly: after you've done this calculation, you need to start again in a completely different way and try to prove that the result is rigorous. So what is the source of non-rigor? Many things — we are interchanging limits freely.
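Collecting the steps of the program in one display (just a summary of what was derived above, with constants dropped as in the lecture):
$$
\mathbb{E}[E_{\min}] \;=\; -\lim_{\beta\to\infty}\frac{1}{\beta}\,\lim_{n\to 0}\frac{1}{n}\,\log \mathbb{E}[Z^n],
\qquad
\log \mathbb{E}[Z^n] \;\simeq\; -\frac{N}{2}\,\underset{Q\succeq 0,\; Q_{aa}=1}{\operatorname{extr}}\; \phi_N(Q) \quad (N\to\infty),
$$
$$
\phi_N(Q) \;=\; -\log\det Q \;+\; \alpha\,\log\det\!\big(\mathbb{1}_n + \beta\, \hat f(Q)\big),
\qquad \big(\hat f(Q)\big)_{ab} = f(Q_{ab}),
$$
with $\alpha = M/N$ fixed and, for our specific problem, $f(u) = \sigma^2 + u$.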
So we are taking the limits in a different order than was initially conceived. And there is also the fact that the replica identity is correct, as a calculus identity, only if you manage to extract the analytic continuation of your $\mathbb{E}[Z^n]$ in the vicinity of $n = 0$ — whereas what we are doing is computing $\mathbb{E}[Z^n]$ for integer $n$: $1, 2, 3, \dots$. And unfortunately, in this type of problem, the distance between one and zero is enormous. It's really enormous.

[Student] I have a question about the very beginning of the computation. I've seen that the partition function you've written is basically a spherical integral.

Yes.

[Student] Okay, the integrand is $e^{-\beta}$ times a function which is at most quadratic in the vector $x$. Is that right?

The loss function of the Procrustes problem is quadratic in $V_k(x)$.

[Student] Yes, and if you take $V(x) = Ax - y$, as in our case, then I guess that such a spherical integral, with an exponent that is at most quadratic, can be computed in a different way, for sure. If you convert the spherical constraint into a Gaussian integral — that you can do — it should give an analogous result to what you have, without the need for replicas. Does it work?

So, I am not one hundred percent convinced by this argument, because of the positivity constraint on the $A^\top A$ matrix that appears when you expand the quadratic form. I think that will get in the way of your argument, but we can discuss.

[Student] Okay, I see. When you turn the spherical constraint into a Gaussian integral, you get a minus squared-norm of $x$, and that should keep everything convergent, provided the auxiliary Lagrange parameter you introduce to enforce the constraint is bigger or smaller than some threshold, depending on the sign conventions you use.

Okay, so you will have a minus, and that might truncate the range of your integral — which, yes, for sure will create an error-function type of integral in large dimension. And that, I think, is the source of the problem: it will not lead to a convergent integral over the full range. So you cannot exchange the integral enforcing the constraint with the integral over $x$. That's my intuition. So I'm not one hundred percent convinced you can avoid replicas and do an exact calculation here — but we can discuss. So, relax now.
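One last numerical aside on the analytic-continuation issue discussed above (my own toy illustration, not from the lecture): for a random variable whose moments we can evaluate at any real $n$, the replica identity is easy to verify directly. The trouble in the replica method is precisely that $\mathbb{E}[Z^n]$ is only accessible at integer $n$, and one has to continue from $\{1,2,3,\dots\}$ down to $0$.

```python
import numpy as np

# Toy check of the replica identity E[log Z] = lim_{n->0} (1/n) log E[Z^n],
# for a log-normal Z = e^g with g ~ N(0,1), so E[log Z] = 0 exactly.
# Here n can slide continuously to 0; in the replica method one only
# controls integer n, which is where the non-rigor enters.
rng = np.random.default_rng(4)
Z = np.exp(rng.normal(size=10_000_000))
for n in [1.0, 0.5, 0.1, 0.01]:
    print(n, np.log(np.mean(Z**n)) / n)      # -> E[log Z] = 0 as n -> 0
print(np.mean(np.log(Z)))                    # direct estimate of E[log Z]
```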