OK, welcome to the last day. And what better way to start the last day than with a computation with Pierpaolo? So please go ahead. Today we'll speed up and shorten the pauses a bit, and also the lunch, because there are strikes affecting the buses, as you may have seen. So we'll all try to catch the last bus into town. That means we have to finish, let's say, 15 minutes before 4 p.m., OK? So just follow my lead. All right, let's try to keep to time today, I promise. OK, thanks very much, and welcome back to the last couple of hours on this problem. I took the liberty of filling the blackboard already with a short summary, because there's been a large gap since the last lecture, so let me just refresh your memory. We are considering the following problem. We have a system of M linear equations in N unknowns. The system is random in the sense that the rectangular matrix of coefficients A is random, and the known term b on the right-hand side is also random: it plays the role of noise, its entries are centered, and their variance is the parameter sigma squared. And there is a nonlinear constraint on the space of solutions: the solutions must live on the sphere |x|^2 = N, that is, the sphere of radius square root of N. We define the rectangularity parameter alpha, the ratio between the number of equations and the number of unknowns, alpha = M/N. And we introduce a loss function that I call H; it is essentially the squared norm of the difference between the left-hand side and the right-hand side. In general, if this loss function is exactly equal to 0, the system of equations is compatible, in the sense that we can find a solution, obviously subject to the constraint. If it is strictly larger than 0, the system is incompatible, in the sense that in general we cannot find a solution that satisfies the system exactly.
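As a concrete sketch of the setup, here is a minimal random instance in Python. The scaling conventions are my assumption (entries of A of variance 1/N so that each equation is of order one, b with variance sigma squared, and H taken as half the squared residual); if the lecture uses a different normalization, the numbers rescale accordingly.

```python
import numpy as np

def random_instance(N, alpha, sigma, rng):
    """Draw a random instance: A is M x N with i.i.d. N(0, 1/N) entries,
    b has i.i.d. N(0, sigma^2) entries.  (Scaling conventions assumed.)"""
    M = int(alpha * N)
    A = rng.normal(0.0, 1.0 / np.sqrt(N), size=(M, N))
    b = rng.normal(0.0, sigma, size=M)
    return A, b

def loss(A, b, x):
    """H(x) = (1/2) ||A x - b||^2, evaluated at a point x on |x|^2 = N."""
    r = A @ x - b
    return 0.5 * r @ r

rng = np.random.default_rng(0)
N = 500
A, b = random_instance(N, alpha=0.5, sigma=1.0, rng=rng)
x = rng.normal(size=N)
x *= np.sqrt(N) / np.linalg.norm(x)   # project onto the sphere |x|^2 = N
print(loss(A, b, x) / N)              # intensive loss at a random point on the sphere
```

The quantity of interest below is the minimum of this loss over the sphere, divided by N.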
And this loss function quantifies how wrong your putative solution is, OK? Now, the relevant figure of merit we want to compute is the minimal loss, which I call e_min. This minimal loss is obtained using a so-called replica analysis, which is of course heuristic, not rigorous. It was carried out by Rashel Tublin and Yan Fyodorov in a couple of papers that I listed on the handout, and in Tublin's PhD thesis, which is also available online and which I strongly suggest you have a look at: it contains a lot more material than I will be able to cover today. The setup works like this. You define a partition function Z by introducing a fictitious inverse temperature beta: you integrate, over the putative solutions on the sphere, the Gibbs-Boltzmann weight, the exponential of minus beta H, with H the loss function. In the limit beta to infinity, this object is dominated by the minimal loss: Z behaves as the exponential of minus beta times the minimum of H. If you invert this relation, the minimal loss is given by minus the limit, as beta goes to infinity, of 1 over beta times the average of log Z, where of course we take into account the random nature of the problem: we average over the left-hand side and the right-hand side. The technical issue is how to average the logarithm of this partition function over the disorder encoded in the random coefficients of A and the random entries of b. To do that, we use the so-called replica trick, which means we are going to use a mathematically correct identity in the wrong way: we take the limit n to 0 of 1 over n times the log of the average of Z to the n, where little n should be taken in the vicinity of 0. But first we are going to pretend that little n is an integer, so that the task becomes computing the average of a replicated version, little n integer copies, of the partition function Z.
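The logical chain just described can be written compactly (this is the standard form of the replica trick, consistent with what is on the board):

```latex
\overline{E_{\min}} \;=\; -\lim_{\beta\to\infty}\frac{1}{\beta}\,\overline{\log Z_\beta},
\qquad
Z_\beta \;=\; \int_{|x|^2=N}\!\mathrm{d}x\; e^{-\beta H(x)},
\qquad
\overline{\log Z} \;=\; \lim_{n\to 0}\frac{\overline{Z^n}-1}{n}
\;=\; \lim_{n\to 0}\frac{1}{n}\log \overline{Z^n}.
```

The identity on the right is exact for real n; the non-rigorous step is computing the average of Z to the n only for integer n and then continuing the result down to n going to 0.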
Why is this useful? Because the replicated partition function Z^n is nothing but a larger integral, and that's what makes the computation possible: we can exchange the integration over the replicated variables x with the integration over A and b, and perform the average over the disorder first, which we cannot do if there is a logarithm in between. This is the calculation that we did last time, and we ended up with the expression here on the right-hand side, forgetting normalization constants that will not play a role in the large-N limit anyway. It is a nice integral because it has the form of an exponential of minus capital N times some function Phi. This function Phi acts on matrices Q of size little n by little n, with all the elements on the diagonal equal to 1. The explicit expression is given by this difference of log-determinants, where the rectangularity parameter alpha appears. In general, we did the derivation for a generic function f; for our problem, the function f has a particularly simple form, sigma squared plus u. Should you start from a different setting, you could consider here, for example, a polynomial in u; this would correspond to a system of nonlinear equations with a nonlinear constraint, and the problem becomes much more difficult. But for special instances, this has been considered by Tublin in her thesis: there is a chapter devoted to this more general class of problems. OK. So now I can start from here. I will keep this term, because that's the starting point, and I will erase the summary, unless there are questions, of course, and feel free to interrupt me. There is a question from Anna in the chat: can the replica limit, little n going to 0, be justified by analytic continuation? I would say, first of all, that I'd be very happy to be able to answer this question.
And I'd be very happy to duck the question entirely at this point. Mathematically, that's the whole issue: we cannot prove it. Some of these results can be proved rigorously using alternative methods, so in some sense the question should more properly be asked to Jean. The honest answer is that we don't know whether this is true: we need to assume that the calculation we do for little n an integer can be meaningfully continued into the vicinity of 0. That's the heuristic, non-rigorous nature of the method. I wish I had a more convincing answer, but I don't, unfortunately. For this specific problem, I believe the answer is: not yet. Some results, and I will discuss this briefly in the second part, have been made rigorous for a closely related problem that uses these very same techniques. There, instead of minimizing our loss function, one is interested in a quadratic-form problem with a sort of magnetic field: you have something like this, a preferential direction h, and a quadratic form in w, where you take both the quadratic form and h as random. This is a methodologically related problem for which there is a rigorous analysis, and it is very interesting because in some region the rigorous analysis disagrees with the replica result. So it is a very important playground for testing the limits of the replica trick, because there we have the rare luxury of an exact rigorous result. For the present problem, I don't believe there is one. Yet. So OK. For large capital N, this integral lends itself to a Laplace, or saddle-point, approximation: the integral will be dominated by the configurations of Q such that Phi_n of Q is minimized.
So, as usual, we assume that Z^n for large capital N will be dominated by the configurations of Q such that Phi_n is minimized, and Q_min is then the argmin, the matrix that minimizes Phi_n of Q. This step already entails something mathematically touchy, because we are essentially taking the limit capital N to infinity first. The order of limits should be little n to 0, then beta to infinity, and then something about capital N, so we are already exchanging limits, which is not completely legit. But it's the only thing we can do. If we now take the limit capital N to infinity of the average of E_min over N, we get that this is given by the limit beta to infinity of 1 over 2 beta of the limit n to 0 of 1 over n times Phi_n at Q_min. I'm just combining the replica trick, the beta to infinity limit, and the fact that the logarithm of the average of Z^n gives the logarithm of the exponential of minus N over 2 times Phi_n at Q_min; that explains the appearance of the extra factor of 2 here. So now the task is to compute the matrix Q that minimizes this object. We want to find the solution of this minimum problem, so we need to differentiate the function Phi_n with respect to the entries Q_ab for all a smaller than b, since Q is symmetric and is constrained to have 1's on the diagonal. That's the general structure of the matrix Q; let me write it like this. OK, is it clear why the diagonal must be equal to 1? It is because the definition of Q_ab was x_a dot x_b divided by N, so on the diagonal the numerator is the squared norm of the vector x, which by construction, by definition, is equal to N, because we live on the sphere. So on the diagonal we must have 1's. Good. In order to do this derivative, we need to calculate the derivative of these two log-determinants here.
For that we use an identity, which I reported in the handout; it is equation 20 there. The identity is as follows: the derivative with respect to a matrix element M_ab of log det M is equal to the element (b, a) of the inverse matrix, (M^-1)_ba. I don't have time to sketch the proof, but I linked a nice, very compact one; you can find many on the internet. If you trust this result, we can proceed by applying the identity and differentiating our function. When we differentiate minus log det Q, we get an element of the inverse of Q, (Q^-1)_ab, with a minus sign, which I move to the other side. This is equal to alpha times the derivative of the log-determinant of this object; I'm already putting myself in our specific setting, not the general f. We get an extra beta, because we need to use the chain rule and differentiate inside the argument of the determinant, and then we get the corresponding element of the inverse of that object: alpha beta times the element (a, b) of the inverse of I_n plus beta times (Q plus sigma squared E_n), where E_n is the matrix with all entries equal to 1; it's a rank-1 matrix filled with 1's. Where does it come from? It comes from the sigma squared in the function f: that sigma squared is not multiplying the identity, it is added to every single element, so we need to include this all-ones matrix. Good. And the dimension here is little n. Now, in order to solve this matrix equation, what people do is make an ansatz for the structure of the matrix Q. The simplest ansatz you can make is the so-called replica-symmetric ansatz, RS; or RSB, replica symmetry breaking, in case RS doesn't work. The way the game works is: you start with the simplest possible ansatz, the replica-symmetric one that I'm going to describe; if everything goes through without problems, then you assume that, OK, maybe we got the right ansatz.
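The identity from equation 20 of the handout can be checked numerically in a few lines, via a central finite difference on a random (non-symmetric) matrix; note the transposed indices on the inverse.

```python
import numpy as np

# Numerical check of:  d/dM_ab  log det M  =  (M^{-1})_ba
rng = np.random.default_rng(1)
nd = 5
M = rng.normal(size=(nd, nd)) + nd * np.eye(nd)  # shifted to stay well-conditioned
a, b_ = 1, 3
eps = 1e-6

def logdet(X):
    # log|det X|; its derivative with respect to X_ab is also (X^{-1})_ba
    return np.linalg.slogdet(X)[1]

Mp, Mm = M.copy(), M.copy()
Mp[a, b_] += eps
Mm[a, b_] -= eps
fd = (logdet(Mp) - logdet(Mm)) / (2 * eps)   # finite-difference derivative
exact = np.linalg.inv(M)[b_, a]              # note the (b, a) indexing
print(fd, exact)                             # the two numbers agree closely
```

The transposition matters for a general matrix; for the symmetric matrices Q and R used below it is invisible.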
If at some point you get an inconsistency, that may be a signal that your initial ansatz is not correct, and you need something a bit more sophisticated, a bit more subtle. That's a very quick explanation of the whole business. But let's try the simplest, replica-symmetric ansatz. What does it mean? It means you consider all the replicas as equivalent, so all the off-diagonal elements of this matrix Q are identical. Remember, the indices a and b label the replicas; that's why they run from 1 to little n. The idea is that there should be no difference between what you call replica number 2 and replica number 17: they should all be equivalent, and this corresponds to putting the same value q in all the off-diagonal elements of the matrix. If you do that, what you need to compute is the inverse of this matrix. You can do it in several ways; the fastest is to assume an inverse of the same structure, with diagonal element gamma and off-diagonal element eta, and to fix gamma and eta by imposing that Q^-1 times Q equals the identity. This gives two equations for the parameters gamma and eta; it is simple algebra, so I will just give the result. [Question from the audience: could we not have assumed the replica-symmetric form of Q from the start?] Yes; I'm just going a bit slower for pedagogical reasons. Up to this stage everything is completely general, and then we make the ansatz. I appreciate that there are faster ways to do it, but I thought I would give all the steps. So this leads to the following expressions for gamma and eta; let me just report them for completeness. Gamma is 1 plus (n - 2) q, divided by (1 - q) times (1 plus (n - 1) q); and eta is minus q, divided by the same thing, (1 - q) times (1 plus (n - 1) q).
These are the diagonal and off-diagonal elements of the inverse of Q, obtained by imposing that Q^-1 Q equals the identity: you impose this condition, you get two equations, you solve them, you get these results. Good. Now we need to invert the matrix R on the right-hand side, R = I_n plus beta times (Q plus sigma squared E_n), obviously assuming that Q has the same replica-symmetric structure. What is the form of R? On the diagonal you have 1 plus beta times (1 plus sigma squared), since the diagonal element of Q is 1; we call this number r_d. On the off-diagonal you have beta times (q plus sigma squared), and all these elements are identical; we call this number r. So the matrix R, whose inverse we need, has more or less the same structure as Q, except that the diagonal element is different from 1; but it is still constant along the diagonal, and all the off-diagonal elements are identical. So we can play the same trick: we posit that R^-1 has diagonal element rho_d and off-diagonal element rho, and we impose that R^-1 times R is the identity, which fixes rho_d and rho. If you do that, you of course get expressions very similar to the previous ones, reducing to them when r_d equals 1. What you get, and I'm plugging it in here, is: rho_d equals r_d plus (n - 2) r, divided by (r_d - r) times (r_d plus (n - 1) r); and rho equals minus r, divided by (r_d - r) times (r_d plus (n - 1) r). OK? So we have the diagonal and off-diagonal elements of R^-1. Now we need to impose the stationarity condition for the off-diagonal elements; remember, a must be smaller than b. And if we do that, OK, let's keep it there, you get the following: you get eta with a minus sign.
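Both inversion formulas are easy to check numerically; gamma and eta are just the special case r_d = 1, r = q of the rho formulas, so one helper covers both.

```python
import numpy as np

# A replica-symmetric matrix: diagonal `diag`, constant off-diagonal `off`.
def rs_matrix(n, diag, off):
    return (diag - off) * np.eye(n) + off * np.ones((n, n))

n, beta, sigma2, q = 6, 2.0, 0.25, 0.3
r_d = 1.0 + beta * (1.0 + sigma2)   # diagonal of R = I + beta (Q + sigma^2 E)
r = beta * (q + sigma2)             # off-diagonal of R

# Claimed inverse elements from the board:
den = (r_d - r) * (r_d + (n - 1) * r)
rho_d = (r_d + (n - 2) * r) / den
rho = -r / den

R = rs_matrix(n, r_d, r)
Rinv = rs_matrix(n, rho_d, rho)
print(np.allclose(Rinv @ R, np.eye(n)))   # True
```

Setting beta and sigma2 so that r_d = 1 and r = q reproduces the gamma and eta formulas for Q^-1.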
So minus eta is the off-diagonal element of Q^-1 with a minus sign, q divided by (1 - q) times (1 plus (n - 1) q), OK? And this must be equal to alpha beta times the off-diagonal element of R^-1 with a minus sign; the factor of 2 that appears when differentiating the two symmetric entries shows up on both sides and cancels. The off-diagonal element of R^-1 is rho, so minus rho is r divided by (r_d - r) times (r_d plus (n - 1) r). You agree? But now r is this guy here, beta times (q plus sigma squared); r_d minus r is 1 plus beta times (1 plus sigma squared) minus beta times (q plus sigma squared), which is 1 plus beta times (1 - q); and r_d plus (n - 1) r is 1 plus beta times (1 plus sigma squared) plus (n - 1) beta times (q plus sigma squared). So you have an equality that fixes the optimal value of q in such a way that the saddle-point condition is verified. This is an equation to be solved for q, and it involves all the parameters of the model: the replica parameter n, which needs to be sent to 0; the inverse temperature beta, which needs to be sent to infinity; alpha, the rectangularity index; and sigma squared, the variance of the noise. Everything is in here, and if we solve for q, we get the optimal matrix that satisfies this condition. Now let's leave this condition here for a moment and go back to our function Phi. Remember, we need to compute our function Phi at Q_min, this matrix with the particular replica-symmetric structure. Let's do that. [Question from the audience about why we only impose the off-diagonal stationarity conditions.] There is a reason; let me just check my notes and I'll get back to you. But it is true that the diagonal and off-diagonal conditions are consistent, because I checked it. Actually, the answer is probably that in the end you want Q_min in here, and Q_min has the same structure as a generic Q: the diagonal elements are fixed.
Right? So we don't differentiate with respect to the diagonal, because we know that Q_min has the a equals b entries fixed to 1; your freedom is really in the off-diagonal elements. That said, the diagonal condition is still compatible with the off-diagonal ones. [Audience: but both conditions involve the same q, so in principle they could clash.] Sure, that's true; they must be compatible. And they are, because I made sure to check it. OK, I think we are more or less happy with this consistency check. So Q_min has this structure, where this q, which we should really call q*, is a solution of this equation; it's not a generic q. Then Phi_n of Q_min is minus log det of the matrix Q plus alpha log det of the matrix on the right-hand side. In order to proceed, we need to know what these two determinants are equal to, and for this there is a nice result that I gave in equation 21 of the handout: the determinant of a matrix with gamma on the diagonal and eta everywhere off the diagonal is (gamma minus eta) to the power (n - 1), times (gamma plus (n - 1) eta). OK, so all you have to do is plug this formula in here and massage the final result a bit. I'm just quoting the result for lack of time, but it's really simple. There is a question: should we not compute the replica limit n to 0 of Phi_n of Q before finding the equation for little q? Otherwise q would depend on n, the number of replicas. You can do both; the two strategies commute. You can first compute everything in full generality and take the limit at the level of the saddle-point equation and then plug it back, or do it the other way around; they are the same. OK, so this is the expression of my function Phi_n at Q_min, which depends on this q, on little n, on beta, and on sigma squared. And now, as has been suggested, we can take the replica limit.
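The determinant identity from equation 21 of the handout can also be verified numerically in a couple of lines.

```python
import numpy as np

# Check: for an n x n matrix with gamma on the diagonal and eta off it,
#   det = (gamma - eta)^(n-1) * (gamma + (n-1) * eta).
# (The eigenvalues are gamma - eta with multiplicity n-1, and
#  gamma + (n-1)*eta on the uniform vector.)
n, gamma, eta = 7, 1.3, -0.4
M = (gamma - eta) * np.eye(n) + eta * np.ones((n, n))
lhs = np.linalg.det(M)
rhs = (gamma - eta) ** (n - 1) * (gamma + (n - 1) * eta)
print(lhs, rhs)   # equal up to floating-point error
```

This is what turns the two log-determinants in Phi_n into explicit functions of q, n, beta, and sigma squared.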
So we compute the limit n to 0 of 1 over n times Phi_n of Q, and at the same time we take the same limit in the off-diagonal condition, by which I mean the equality fixing q. You need to take the expression that I very wisely just erased and take the limit n to 0 there, and you get an equation for q which simplifies drastically: q over (1 - q) squared equals alpha beta squared times (q plus sigma squared), divided by the square of (1 plus beta times (1 - q)). That's the condition you get by taking this limit. It is really straightforward: it just kills a bunch of terms in the previous equality, and this is what survives. What we have here instead is, OK, you see, there is a 1 over n. The first term survives as alpha times the log of (1 plus beta times (1 - q)). Then there is a log of 1 plus something small: in the limit n to 0, this log of (1 plus something small) goes as the something small, namely n times beta times (q plus sigma squared) over (1 plus beta times (1 - q)); this n is cancelled by the 1 over n, and what survives is plus alpha beta times (q plus sigma squared) divided by (1 plus beta times (1 - q)). Similarly, another n is killed by the 1 over n, and what survives is minus the log of (1 - q); and here we play the same trick once more, so we get minus q divided by (1 - q). Let me try to report it right. Yes. Do you agree? So all the terms are of the correct order in little n, which is killed by the 1 over n in the formula for e_min. OK, now we have the off-diagonal condition and this object, and we need to analyze both in the further limit beta to infinity. Remember, we have this double limit: n to 0 first, and then beta to infinity. And that's where the situation becomes more interesting. In the limit beta to infinity, let's look at the off-diagonal condition first. What happens on the right-hand side? We have a beta squared in the numerator and a beta squared downstairs.
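For reference, here are the two n to 0 limits just computed, written out explicitly (reconstructed from the board; q denotes the off-diagonal overlap):

```latex
\lim_{n\to 0}\frac{1}{n}\,\Phi_n(Q_{\min})
 \;=\; \alpha\log\!\bigl(1+\beta(1-q)\bigr)
 \;+\; \frac{\alpha\beta\,(q+\sigma^2)}{1+\beta(1-q)}
 \;-\; \log(1-q)\;-\;\frac{q}{1-q},
\qquad
\frac{q}{(1-q)^2}
 \;=\; \frac{\alpha\beta^{2}\,(q+\sigma^{2})}{\bigl(1+\beta(1-q)\bigr)^{2}}.
```

The second equation is the n to 0 limit of the off-diagonal stationarity condition; the first is what gets divided by 2 beta in the formula for e_min.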
So the right-hand side tends to a finite limit as beta goes to infinity, and the condition becomes q over (1 - q) squared equals alpha times (q plus sigma squared) over (1 - q) squared: what remains downstairs is just (1 - q) squared. With some caveats, which I will tell you about later. So I'll erase here. This gives an equation for q that can be solved easily, and q becomes equal to alpha, the rectangularity index, times sigma squared, divided by (1 minus alpha). And this is nice: it depends on the rectangularity index and on sigma. But there is a problem. Inside the definition of the matrix Q, the diagonal elements are 1 because they are the normalized dot product of x with itself. This q, however, is the dot product between one replica vector x_a and a different replica vector x_b, normalized by N, and such a normalized scalar product cannot be larger than 1. So there is a further condition to impose: this q, alpha sigma squared over (1 minus alpha), must be smaller than or at most equal to 1. Let's first assume it is strictly smaller than 1. This imposes the condition that alpha must be smaller than a critical value, alpha_c = 1 over (1 plus sigma squared), which is itself smaller than 1 whenever the noise is nonzero. So the situation on the alpha axis is: there is the critical value alpha_c, and then there is the value 1. For alpha smaller than alpha_c, the optimal value of q is strictly smaller than 1. OK? Now, if q is strictly smaller than 1, consider the limit n to 0 of 1 over n Phi_n of Q_min as beta goes to infinity: we need to take this guy here in the limit beta to infinity with q understood to be strictly smaller than 1. If we do that, well, the terms grow at most like alpha times the log of beta times (1 - q), plus some finite function of q, and if you divide this by 2 beta, the whole right-hand side goes to 0.
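Explicitly, the beta to infinity step at fixed q < 1 goes as follows:

```latex
\frac{q}{(1-q)^{2}}
 \;=\;\lim_{\beta\to\infty}\frac{\alpha\beta^{2}(q+\sigma^{2})}{\bigl(1+\beta(1-q)\bigr)^{2}}
 \;=\;\frac{\alpha\,(q+\sigma^{2})}{(1-q)^{2}}
\;\Longrightarrow\;
q=\alpha\,(q+\sigma^{2})
\;\Longrightarrow\;
q=\frac{\alpha\sigma^{2}}{1-\alpha},
```
```latex
q<1
\iff
\frac{\alpha\sigma^{2}}{1-\alpha}<1
\iff
\alpha<\alpha_c=\frac{1}{1+\sigma^{2}}.
```

The (1 - q) squared cancels between the two sides, which is what makes the solution so simple.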
All I'm saying is that in the limit beta to infinity, assuming q strictly smaller than 1, this object divided by 2 beta goes to 0: all terms are killed, and the whole thing tends to 0. The consequence is that for alpha smaller than alpha_c, which gives q strictly smaller than 1, the average of the minimum of the loss function divided by N goes to 0. So in this regime the system is typically, or on average, compatible. Remember, alpha is the rectangularity index, and here it is smaller than a number which is itself smaller than 1, so this is a system that is strongly undercomplete: we have many more unknowns than equations, and it is probably not surprising that the system is typically compatible. But there is a region, between alpha_c and 1, where we still have an undercomplete system and yet something else might be happening. This is the interesting region: we have fewer equations than variables, but, due to the nonlinearity of the constraint, something different happens. How can we access this other regime? And then I stop. We access it by noting that the bound on q might be saturated, q exactly equal to 1. Of course, I cannot simply set q equal to 1 in these expressions, because that creates problems in the denominators. So in the limit when q goes to 1, we need to assume that something different happens to beta: we cannot take the limit q to 1 and the limit beta to infinity separately. We assume that q goes to 1 and beta goes to infinity in such a way that the combination beta times (1 - q) converges to a finite value v. In that situation, when q is frozen at its largest possible value, 1, something else happens: this object, in the combined limit q to 1, beta to infinity with beta times (1 - q) equal to v, tends to a finite, non-zero limit, which is alpha times (1 plus sigma squared) over 2 times (1 plus v), minus 1 over 2v.
I leave the details as an exercise; the limit depends on this parameter v. And we have, of course, an equation for v that comes from taking the same limit in the off-diagonal condition: v is not just an arbitrary number, it is fixed by that equation. In this very same limit it reads: 1 equals alpha v squared times (1 plus sigma squared), divided by (1 plus v) squared. If you solve this equation for v and plug it in here, you get the result for the average minimal loss in this other regime, when the rectangularity index is larger than the critical value: one half times the square of (the square root of alpha times (1 plus sigma squared), minus 1). This is the same value that we had obtained using random matrix calculations in lecture number 2. So what happens is that the minimal loss starts to increase here; but it doesn't start increasing at alpha equal to 1, it starts increasing well before that, at alpha_c. So undercomplete systems in the presence of nonlinear constraints are typically incompatible if the noise is strong enough. And the loss, of course, increases with the rectangularity: the more equations you have in the game, the more incompatible the system is. And it increases with the value of the noise. That's the take-home message of this calculation: in the presence of nonlinear constraints, even undercomplete systems may not be typically compatible. You cannot find a solution of the system, even though you have many more variables than equations, because the variables are constrained to live on the sphere. OK, I can stop here if there are questions, and then we move to the last part. [Question:] Sorry, Paolo, that guy over there at the top right: if I divide it by beta after the limit, everything gets killed; we have not divided by beta there. Yeah, you're right, there is essentially a missing beta; I just reported the final result after the 1 over beta. Thank you.
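The exercise is easy to check numerically: solve the saddle equation for v and plug it into the limiting expression e(v) = alpha (1 + sigma^2) / (2(1 + v)) - 1/(2v), which is my completion of the formula on the board; the result collapses to the quoted closed form.

```python
import numpy as np

def emin(alpha, sigma2):
    """Replica prediction for the intensive minimal loss at alpha > alpha_c:
    solve 1 = alpha v^2 (1 + sigma^2) / (1 + v)^2 for v > 0, then evaluate
    e(v) = alpha (1 + sigma^2) / (2 (1 + v)) - 1 / (2 v)."""
    s = np.sqrt(alpha * (1.0 + sigma2))
    v = 1.0 / (s - 1.0)        # positive root: (1 + v) / v = s, needs s > 1
    return alpha * (1.0 + sigma2) / (2.0 * (1.0 + v)) - 1.0 / (2.0 * v)

alpha, sigma2 = 2.0, 1.0
closed_form = 0.5 * (np.sqrt(alpha * (1.0 + sigma2)) - 1.0) ** 2
print(emin(alpha, sigma2), closed_form)   # both equal 0.5
```

Algebraically, with s = sqrt(alpha (1 + sigma^2)) one finds e = (1/2)(s - 1)^2, which is exactly the closed form quoted from lecture 2.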
Remember, you need to take this guy and then take the limit beta to infinity of 1 over 2 beta times it; the 2 is included, yes, so it's really a 2 beta. [Question:] Do simulations corroborate this replica prediction? How large must N be to observe a good approximation of that curve? I haven't done numerical simulations myself, but I guess yes; I think it is testable quite easily. I haven't done it personally, and I haven't seen it in the papers or in Tublin's thesis, though I might have missed it, even if I looked quite carefully. But I think it is testable, yes. [Question:] I wonder: if the spherical constraint is relaxed to a softer one, where you only require the norm to be at most that of the sphere, I guess it all becomes a convex program that can be solved easily numerically. Yes: if you don't impose that the norm is strictly equal to its value on the sphere, but allow anything up to it, then yes, I think so. I'm also wondering what happens if you impose the equality only on average, a softer constraint where your norm is N on average and fluctuates around it. There you would have another parameter, of course, the width of the fluctuations around the radius, so you would get some sort of... Yes, yes. All right, so let's start again at 10:40 sharp; there is the coffee break.
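On the question about simulations: a quick check is indeed easy, under the same scaling assumptions as the sketch above (A with N(0, 1/N) entries, b with N(0, sigma^2) entries, H = half the squared residual; these conventions are my assumption, fixed so that the sigma = 0 limit reproduces the Marchenko-Pastur edge). Minimizing a quadratic on a sphere is a trust-region ("secular equation") problem: the stationarity condition is (A^T A + mu I) x = A^T b, and one bisects on the Lagrange multiplier mu until |x|^2 = N.

```python
import numpy as np

def min_loss_on_sphere(A, b):
    """Minimize H(x) = 0.5 * ||A x - b||^2 subject to |x|^2 = N.
    Stationarity gives (A^T A + mu I) x = A^T b; bisect on mu above
    -lambda_min(A^T A) until the sphere constraint |x|^2 = N holds."""
    N = A.shape[1]
    lam, U = np.linalg.eigh(A.T @ A)     # eigenvalues in ascending order
    z = U.T @ (A.T @ b)

    def norm2(mu):                       # |x(mu)|^2, decreasing in mu
        return np.sum((z / (lam + mu)) ** 2)

    lo, hi = -lam[0] + 1e-10, 1.0
    while norm2(hi) > N:                 # grow hi until |x(hi)|^2 < N
        hi *= 2.0
    for _ in range(200):                 # bisection on the secular equation
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if norm2(mid) > N else (lo, mid)
    x = U @ (z / (lam + 0.5 * (lo + hi)))
    return 0.5 * np.sum((A @ x - b) ** 2) / N

rng = np.random.default_rng(42)
N, alpha, sigma = 400, 2.0, 1.0
M = int(alpha * N)
A = rng.normal(0.0, 1.0 / np.sqrt(N), size=(M, N))
b = rng.normal(0.0, sigma, size=M)
pred = 0.5 * (np.sqrt(alpha * (1.0 + sigma ** 2)) - 1.0) ** 2   # replica prediction
print(min_loss_on_sphere(A, b), pred)    # empirically close already at N = 400
```

Here alpha = 2 is in the incompatible regime, and a single instance at N = 400 already lands near the predicted intensive loss of 0.5; averaging over instances tightens the agreement.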