Okay, so I think we can start. I just prepared a quick summary of what we did yesterday. The problem that we are trying to solve is the so-called oblique Procrustes problem. Essentially, we have a linear system of equations: x is our unknown, A is the matrix of coefficients, which we assume to be of size M × N, with, in general, the number of equations M larger than the number of unknowns N. So the system is in general overcomplete, with a quadratic constraint on the norm of the solution.

By introducing a Lagrangian that takes this constraint into account and computing the critical points, we arrived at an equation for the solution in terms of the matrix W, the Wishart matrix corresponding to the matrix of coefficients, and at an equation that fixes the Lagrange multiplier. That equation depends on random objects: the positive eigenvalues of the Wishart matrix W and, essentially, the components of the noise vector b. In general, the profile of the function on the left-hand side must match the constant 1/σ², and depending on the noise level at which we cut this equation, we get a certain number of real solutions, which is the number of critical points, irrespective of their type, irrespective of whether they are minima, maxima, or saddles in this large-dimensional space. The noise here enters essentially through the term b: we define b as σψ, where ψ is a Gaussian vector with mean zero and variance one, so b carries the σ².

We wanted to compute the average number of critical points, irrespective of whether they are minima, maxima, or saddles. After averaging over b, and then over the matrix W, which we represented in this form, we ended up with a formula containing: a known constant that depends on M, N, and σ; an integral over λ, one of the N+1 degrees of freedom (N for the components of x, plus one for the Lagrange multiplier); an integral over q, which will produce a Bessel K function coming from a change of variables; and then the remaining important bit that we wish to compute, this Φ function, which is an integral over, essentially, Wishart matrices of size N−1, but with an extra absolute value of a determinant that arises from the Jacobian in the Kac–Rice formula.

This is the main technical hurdle that we would like to overcome, and it can be done, and this is the last thing I wanted to show you, thanks to a trick due to Yan Fyodorov. The evaluation of this integral proceeds separately depending on whether λ is larger or smaller than zero, but I will only do the case λ > 0, for time reasons. The idea is to first perform an orthogonal decomposition of the Wishart matrix in terms of its eigenvalues, W = O diag(s_1, …, s_{N−1}) O⁻¹, where O is an orthogonal matrix of size N−1. If we do that, we notice that the integrand is rotationally invariant, so it only depends on the eigenvalues, and the measure dW gets transformed into an integral over the orthogonal group times an integral over the eigenvalues.
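To keep the notation straight, here is a compact sketch of the setup summarized above. It assumes A is Gaussian and that the constraint is normalized as ‖x‖² = N (the handout fixes the exact convention), so it is a sketch rather than the definitive statement:

```latex
% Sketch of the setup, assuming the constraint is normalized as \|x\|^2 = N.
\begin{aligned}
&\min_{x\in\mathbb R^N}\ \mathcal H(x)=\tfrac12\|Ax-b\|^2
 \quad\text{s.t.}\quad \|x\|^2=N, \qquad b=\sigma\psi,\quad \psi_i\sim\mathcal N(0,1),\\
&\mathcal L(x,\lambda)=\tfrac12\|Ax-b\|^2-\tfrac{\lambda}{2}\bigl(\|x\|^2-N\bigr),
 \qquad \nabla_x\mathcal L=0\ \Rightarrow\ (W-\lambda I)\,x=A^{\mathsf T}b,
 \quad W=A^{\mathsf T}A.\\
&\text{Writing } W=\textstyle\sum_i s_i v_i v_i^{\mathsf T}
 \text{ and using (for Gaussian } A\text{) } v_i^{\mathsf T}A^{\mathsf T}\psi
 \overset{d}{=} \sqrt{s_i}\,\tilde\psi_i,\quad \tilde\psi_i\sim\mathcal N(0,1),\\
&\text{the constraint } \|x_\lambda\|^2=N \text{ becomes the secular equation}\qquad
 g(\lambda)\equiv\frac1N\sum_{i=1}^{N}\frac{s_i\,\tilde\psi_i^{\,2}}{(\lambda-s_i)^2}
 =\frac{1}{\sigma^2}.
\end{aligned}
```

Every real root λ of this secular equation is a critical point of the constrained loss, which is what the counting below is about.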
The Jacobian of this transformation is given by the Vandermonde determinant in N−1 variables; I put a pointer to this transformation law in the handout. [Question: what is Δ?] This Δ object is just the product over j < k of (s_j − s_k), so it contains the differences of all the N−1 eigenvalues of this matrix. There is a pointer to a derivation of this in the handout too.

Now, using the rotational invariance, we can proceed formally by changing variables, and we get an integral over the orthogonal group times an integral over the eigenvalues. All the terms can be expressed through the eigenvalues alone: we get exp(−(N/2) Σ_{k=1}^{N−1} s_k), and then a product over k from 1 to N−1 of s_k raised to the power (M−N−1)/2. The determinant term can also be written in terms of the eigenvalues only: it is the product over k from 1 to N−1 of |λ − s_k|. And then we have the absolute value of the Vandermonde determinant in the N−1 variables. The integral over the orthogonal group is just the volume of the orthogonal group, a known constant that we don't care about; we will absorb all constants into the main constant in front.

So we have this (N−1)-fold integral over the eigenvalues, and what Fyodorov noticed is that we can lump two of these terms together, using the following identity: the absolute value of the Vandermonde in N−1 variables, times the product over k from 1 to N−1 of |λ − s_k|, can be written in terms of the Vandermonde in N variables s_1, …, s_N. So we enlarge the space from N−1 to N variables, we stick in a delta that sets s_N equal to λ, and we integrate over s_N. This is an identity that you can prove directly: you write out the Vandermonde, this extended determinant, and you recognize that the terms pertaining to the eigenvalue s_N are the product (s_N − s_1)(s_N − s_2) ⋯ (s_N − s_{N−1}); through the delta function all the s_N become equal to λ, which reproduces exactly this term here. Using this trick, which is the fundamental observation, we can rewrite the integral in a much more convenient form.

[Question about the absolute value.] The absolute value is in here as well, yes, sorry, of course. Thanks. If only things were so simple that we could get rid of the absolute value at this point... I wish.

Okay. So what we have, thanks to this identity, is that we can extend this integral to include a further variable: ds_1 ⋯ ds_{N−1}, plus an integration over s_N, and we write exp(−(N/2) Σ_{k=1}^{N} s_k). We extend this sum up to N (remember, it was up to N−1), and of course we need to correct for the N-th term in the exponential, so we write in front a factor exp((N/2) s_N), where s_N is equal to λ. (A quick numerical check of Fyodorov's identity follows.)
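The identity itself is easy to check numerically. Here is a minimal sketch, not from the handout, verifying that |Δ_{N−1}(s)| ∏_k |λ − s_k| equals |Δ_N(s_1, …, s_{N−1}, λ)| for random inputs:

```python
import numpy as np

def abs_vandermonde(v):
    """Absolute value of the Vandermonde determinant: prod_{j<k} |v[k] - v[j]|."""
    out = 1.0
    for j in range(len(v)):
        for k in range(j + 1, len(v)):
            out *= abs(v[k] - v[j])
    return out

rng = np.random.default_rng(0)
s = rng.exponential(size=6)        # stand-in for the N-1 = 6 Wishart eigenvalues
lam = 2.5                          # the extra variable fixed by the delta function

lhs = abs_vandermonde(s) * np.prod(np.abs(lam - s))
rhs = abs_vandermonde(np.append(s, lam))  # Vandermonde in N variables, s_N = lam
print(lhs, rhs)                    # the two numbers agree to machine precision
```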
Then we also extend the product of eigenvalues, ∏ s_k^{(M−N−1)/2}, from 1 to N, and again we need to correct: we divide by s_N to the same power, but s_N is λ, so we divide by λ^{(M−N−1)/2}. We make these corrections, and then we have a Vandermonde in N variables of our vector of eigenvalues, times a delta that fixes s_N = λ. And now we are essentially done, because in this integral, with this term, we recognize the joint probability density function of the eigenvalues of the Wishart ensemble, which I give in equation 2 of the handout; with this extra delta here, the integral becomes just a marginal of the joint PDF. So this is just the spectral density.

So we have related the average of the absolute value of the determinant of a Wishart to the spectral density of a larger Wishart. That's the essence of this trick. And if you're not totally mesmerized by this trick, well, I don't know what will do it for you. It is just a wonderful trick, isn't it? The object here, this entire integral, is nothing but the spectral density, let's call it ρ_N(λ), of a Wishart matrix of size N × N, which is known in closed form in terms of Laguerre polynomials; I give a reference for this in the handout as well. Everything is now written in terms of quantities that either are known or can be easily computed.

So the problem is effectively solved. But remember that we had to start with the hypothesis λ > 0. Why? Because in order to recognize this object as the spectral density of an enlarged Wishart, we need λ to be a legitimate eigenvalue of a Wishart matrix, and the Wishart matrix has positive eigenvalues. For negative λ we need a different set of arguments, but luckily the case λ < 0 is not as interesting. The number of critical points of all types is essentially the sum of two contributions, N₊ and N₋, where plus and minus refer to the sign of λ, so you do need to split the two contributions.

[Question: for λ < 0, don't all the factors λ − s_k have a fixed sign, so that the absolute value is no longer a worry?] Sure, but you can no longer make the connection: true, you don't have to worry about the absolute value in that case, but you still have to do the computation, and you can no longer relate it to the spectral density of a Wishart. So that solves one problem, but it is not the full story: the absolute value is a problem in general, not in that case, but you can no longer relate the integral to the spectral density of a larger Wishart.

So essentially the problem is solved. All you have to do is take this formula and stick it into the formula that I gave you before. Two integrals remain, the integral over λ and the integral over q; one gives a Bessel function, and there is a surviving integral that is difficult to write in closed form. In the end you get a final formula, which I can write for N₊, of course: a fully explicit constant that depends on M, N, and σ, times an integral over λ of an exponential factor involving N, λ, and 1/σ², times a power of λ, times ρ_N(λ), some combination of Laguerre polynomials, times the integral over q of q^{(M−N−2)/2} times the exponential of another constant times (q + 1/q).
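The q-integral that appears here is the standard Bessel K integral, ∫₀^∞ q^{ν−1} e^{−c(q+1/q)} dq = 2 K_ν(2c) (Gradshteyn–Ryzhik 3.471.9). A quick numerical sanity check, with hypothetical stand-in values of ν and c rather than the actual constants of the lecture:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import kv

# Identity: int_0^inf q^{nu-1} exp(-c (q + 1/q)) dq = 2 K_nu(2 c)
nu, c = 1.5, 2.0          # hypothetical values; in the lecture nu ~ (M - N)/2
integrand = lambda q: q ** (nu - 1.0) * np.exp(-c * (q + 1.0 / q))
numeric, _ = quad(integrand, 0.0, np.inf)
closed = 2.0 * kv(nu, 2.0 * c)
print(numeric, closed)    # both ~= 0.0287; agreement to quadrature precision
```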
So the q-integral becomes essentially a Bessel K function, and the other integral must stay as it is: with a lot of work you can reduce it to a finite sum, but it's not much more illuminating. It is still usable, because you can evaluate and plot this function explicitly.

Why did I say that N₋ is not very interesting? Well, you understand it from the finite-N picture that I drew before. Remember, you have something like this for negative λ: the number of intersections of the horizontal line in the negative region is either one or zero, essentially. If the horizontal line is here, you have one negative intersection; if it is here, you have zero. So for large N this object is essentially either zero or one, and even after averaging you have a situation where N₊ starts from 2N and goes down to one, while N₋, if you plot it, starts from zero and quickly rises up to one. If you put the two terms together (let me change color), the average number of critical points is precisely this smoothed staircase, where the number of critical points goes from 2N, when we are up here, down to just two critical points, a minimum and a maximum, when we are down here.

So I showed, quite quickly but with all the steps, that this problem can be solved completely at finite N if, of course, you are not interested in the type of the critical points. If you want to distinguish whether you are talking about a minimum, a maximum, or a saddle, the problem is orders of magnitude more complicated, and for this particular problem we do not know the answer. The reason is that in the Kac–Rice formula you then need to put in an extra constraint, namely that you want the Hessian to be, for example, positive or negative definite, and that constraint is something we cannot easily handle with this formalism.

Okay, so what I wanted to do now is go back to a perhaps more stringent question, the statistics of the smallest solution. Can I erase everything? I'm happy to take questions, if there are any. I will keep this sketch here.

Let's go back to the equation for the Lagrange multiplier. Call its left-hand side g(λ): g(λ) = (1/N) Σ_i s_i (v_i^T ψ)² / (λ − s_i)², and this must be equal to 1/σ². Every real solution of this equation gives an intersection here, and each of these intersections corresponds to a critical point of our constrained loss function. This is, of course, a random function of λ. The most interesting of these intersections for us is this guy here, because it is the smallest solution of the equation, which according to Brown's theorem corresponds to the smallest value of the loss function; and the smallest value of the loss is what we are interested in, because it indicates whether our system is compatible or incompatible in large dimensions. So we want the statistics of this guy, and to do so we try to average this equation over the disorder, with a suitable condition on λ.
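As an aside, the intersection count just described is easy to simulate at finite N. This is a minimal sketch under assumed normalizations (A with i.i.d. N(0, 1/N) entries, (v_i^T ψ)² replaced by i.i.d. χ²₁ variables, as rotational invariance suggests): g blows up at every eigenvalue, so each interior gap contributes two roots or none depending on whether the minimum of g there dips below 1/σ², while the two outer regions always contribute one root each, giving the staircase from 2N down to 2.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def count_critical_points(M, N, sigma2, rng):
    """Count real roots of g(lam) = 1/sigma^2 for one random instance."""
    A = rng.standard_normal((M, N)) / np.sqrt(N)   # assumed normalization
    s = np.linalg.eigvalsh(A.T @ A)                # Wishart eigenvalues, sorted
    psi2 = rng.standard_normal(N) ** 2             # (v_i^T psi)^2 ~ chi^2_1

    g = lambda lam: np.sum(s * psi2 / (lam - s) ** 2) / N

    roots = 2  # one root below s_1 and one above s_N always exist
    for a, b in zip(s[:-1], s[1:]):                # interior gaps between eigenvalues
        res = minimize_scalar(g, bounds=(a + 1e-9, b - 1e-9), method="bounded")
        if res.fun < 1.0 / sigma2:                 # the line cuts the dip: 2 roots
            roots += 2
    return roots

rng = np.random.default_rng(1)
for sigma2 in [1e-4, 1e-2, 1.0, 100.0]:
    n_cp = np.mean([count_critical_points(40, 20, sigma2, rng) for _ in range(20)])
    print(sigma2, n_cp)   # decreases from ~2N = 40 toward 2 as the noise grows
```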
The condition is that λ must be smaller than s₁, the smallest eigenvalue of our Wishart. In this way we are singling out the smallest eigenvalue, and hence the smallest Lagrange multiplier. If we average this equation and solve it under the constraint that λ is smaller than the smallest eigenvalue s₁, then we have some chance of characterizing the smallest value.

[Question.] I mean, it's either this one or this one; you should imagine that you have only one cut here. So whatever the first intersection to the left of s₁ is, that will be the smallest value of the Lagrange multiplier.

So we want to average this guy over the distribution of ψ, which is normal with mean zero and variance one, and over the distribution of the eigenvalues of a Wishart matrix; those are the two sources of disorder. We first average over ψ. We get (1/N) Σ_i s_i/(λ − s_i)² times a sum over l and m of v_{il} v_{im} ⟨ψ_l ψ_m⟩, and since the components of ψ (remember, b is just σ times ψ) are i.i.d. Gaussian with variance one, this average is just δ_{lm}. What we get is then Σ_l v_{il}², which is equal to one by the normalization of the eigenvectors. Having performed the average over ψ, over the noise disorder, we can rewrite the object that remains as (1/N) times minus the derivative with respect to λ of Σ_i s_i/(λ − s_i): I am pulling out a derivative so that I can remove the power of two in the denominator. Then I add and subtract λ in the numerator, so that I get a factor of −1 plus a term where s_i appears only in the denominator. If we do that, this object becomes minus the derivative with respect to λ of λ times the piece that survives, (1/N) Σ_i 1/(λ − s_i). This is an object that Mark has discussed at length in his lectures: it is just the resolvent of the Wishart matrix.

So we get to the final formula: averaging the left-hand side of the equation over ψ, and then over A, or over the Wishart matrix W = AᵀA, gives minus the derivative with respect to λ of λ ḡ(λ), where ḡ(λ) is the average over the Wishart matrix of (1/N) Tr(λI − W)⁻¹. So we essentially have an explicit expression, provided we can compute the average resolvent of the Wishart matrix, which is something we have touched on, somewhat sideways, with Mark. (The manipulation is spelled out below.)

[Question.] Yes, so this notation is a bit misleading in the large-N limit. Of course, when we average over the Wishart matrix, the eigenvalues will form a continuum, described by the spectral density, and we will need to require that λ is smaller than the lower edge of the Marchenko–Pastur sea, essentially.
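For reference, here is the ψ-averaging manipulation just described, spelled out in the notation above:

```latex
% psi-average using E[psi_l psi_m] = delta_{lm} and sum_l v_{il}^2 = 1,
% then pulling out a derivative and adding/subtracting lambda in the numerator.
\begin{aligned}
\mathbb E_\psi\,g(\lambda)
 &= \frac1N\sum_i \frac{s_i}{(\lambda-s_i)^2}
  = -\frac{\mathrm d}{\mathrm d\lambda}\,\frac1N\sum_i \frac{s_i}{\lambda-s_i}
  \qquad\text{(pull out the derivative)}\\
 &= -\frac{\mathrm d}{\mathrm d\lambda}\!\left[-1+\lambda\,\frac1N\sum_i
      \frac{1}{\lambda-s_i}\right]
  = -\frac{\mathrm d}{\mathrm d\lambda}\bigl[\lambda\,g_N(\lambda)\bigr],
  \qquad g_N(\lambda)=\frac1N\operatorname{Tr}(\lambda I-W)^{-1}.
\end{aligned}
```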
So what is going to happen in the large-N limit, when we average, is that the eigenvalues form a continuum here, described by the Marchenko–Pastur law, and we are going to impose that λ is smaller than the lower edge of the Marchenko–Pastur support. [Question.] Of course, of course it's not; yes, we are just assuming that we take the solution that lies outside of the Marchenko–Pastur sea. Good. Excellent.

To compute this object we use what I call, in the handout, equation 16, the average trace identity, which is something that Mark also essentially covered. What we need to do is compute, with some constant C, the integral of the Marchenko–Pastur density against 1/(λ − s), which comes from here, under the condition that λ is smaller than the lower edge. This integral can be computed exactly by a change of variables. The integration runs between s₋ and s₊, and you reconstruct an integral in x of the form √(x(1−x)) / [(1 + a x)(1 − b x)], which can be computed explicitly; it is in various tables, and I can give you the result here. So you make this change of variables, the constants are known, you use this table integral and plug it in, and the result becomes a function of the parameter λ, which we assume is smaller than s₋. Then all you have to do is take a derivative of this object; it is just long but trivial algebra.

If you do that, you get an object, let me write it like so, where a and b are coefficients expressed, of course, in terms of s₊ and s₋, the two edges of the Marchenko–Pastur density. With this change of variables (you don't find the original integral in tables, but you do find this one), all you have to do is match the coefficients a and b with the particular combinations of s₊ and s₋. So you get this equation; it is the left-hand side of the averaged version of that equation, and it must equal 1/σ². This is an equation you need to solve for λ; call the solution λ*. In the large-N limit, this λ* is essentially the location of the smallest Lagrange multiplier, after the left-hand side has been averaged over all sources of disorder.

Now, if you solve this equation for λ*, you get a very interesting expression, which is one of the main take-home messages:

$$\lambda_* \;=\; \frac{\bigl(\sqrt{\alpha}-\sqrt{1+\sigma^2}\bigr)\bigl(\sqrt{\alpha(1+\sigma^2)}-1\bigr)}{\sqrt{1+\sigma^2}},$$

where α = M/N, the ratio of the number of equations to the number of unknowns, is larger than one. A few consistency checks. First of all, we see that λ*, the minimum Lagrange multiplier, changes sign depending on whether α is larger or smaller than 1 + σ², which is in agreement with this picture: depending on whether α is larger or smaller than 1 + σ², the smallest intersection here will happen here or here, will be positive or negative. And then we have this very important combination, √(α(1+σ²)) − 1, which will play an important role in what follows.
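The closed form can be checked against a direct simulation: draw one large Wishart matrix, solve the (self-averaging) secular equation (1/N) Σ_i s_i/(λ − s_i)² = 1/σ² for the unique root below the smallest eigenvalue, and compare. A minimal sketch under the same normalization assumptions as before (A with i.i.d. N(0, 1/N) entries, Marchenko–Pastur edges (1 ± √α)²):

```python
import numpy as np
from scipy.optimize import brentq

N, alpha, sigma2 = 2000, 4.0, 1.0
M = int(alpha * N)
rng = np.random.default_rng(2)

A = rng.standard_normal((M, N)) / np.sqrt(N)   # assumed normalization
s = np.linalg.eigvalsh(A.T @ A)                # eigenvalues of the Wishart W

# Averaged secular equation: (1/N) sum_i s_i/(lam - s_i)^2 = 1/sigma^2,
# solved for the unique root below the smallest eigenvalue s_1.
f = lambda lam: np.mean(s / (lam - s) ** 2) - 1.0 / sigma2
lam_mc = brentq(f, s[0] - 50.0, s[0] - 1e-8)

t = np.sqrt(1.0 + sigma2)
lam_star = (np.sqrt(alpha) - t) * (np.sqrt(alpha * (1.0 + sigma2)) - 1.0) / t
print(lam_mc, lam_star)   # ~0.76 vs 0.757...: agreement up to finite-N effects
```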
Having the result for the typical smallest multiplier, we can proceed by computing the typical value of the smallest loss. What we want to compute is H(x) = ½‖Ax − b‖², with x replaced by the solution of the critical-point equation. We know that the solution was x = (W − λI)⁻¹ Aᵀb, and then in Ax − b there is the extra −b. We then compute the squared norm of this object.

[Question: is this equal to the value at λ*, or the average?] It is the average; but we have assumed some sort of self-averaging from the beginning. So it is the average, and it is also the typical value, if our whole derivation goes through; otherwise we would not have been allowed to start by averaging the entire equation. If this disturbs you: this is the average value of the smallest multiplier, but in the back of our minds we must assume it is also the typical value, so that we are not in a situation where average and typical values do not coincide. That assumption is behind the fact that we averaged the left-hand side of the equation from the beginning; otherwise, instance by instance, that equation would have a widely fluctuating solution, not necessarily well represented by the average. [Comment: the fact that typical and average values do not match is not necessarily linked to a lack of concentration; they may simply be two different quantities.] Right; but if you assume concentration, then it is fine again, I would say. Okay, good.

So what you have to do is compute this object and then set λ = λ*. You multiply ½ (Ax − b)ᵀ, with this object in place of Ax − b, by itself, you evaluate at λ = λ*, and of course you need to average over b and W again. These are algebraic steps, we are just multiplying matrices, so I will give you only the last step. Averaging over A, or W, and over b gives the average value of the minimal loss. What you get is ½ bᵀ(⋯)b, with some matrix built out of A and (W − λI)⁻¹ in the middle. The average over b of ‖b‖² is easy: it is Σ_{i=1}^{M} ⟨b_i²⟩, and since the variance of each b_i is σ², this gives Mσ², which takes care of that piece. Then we need the b-average of the term bᵀRb, where R is the matrix in the middle: the average of bᵀRb is σ² times the trace of R, a simple one-liner exercise for you. So the b-average is easy. Once it is done, we are left with a term Tr R, and the trace of R is, to nobody's surprise, the trace of W(λI − W)⁻¹, which we need to average over W. This can be done using the Marchenko–Pastur law, so this is easy too. So the minimal loss can be computed by averaging over the Wishart distribution; the Gaussian averages are spelled out below.
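The two Gaussian b-averages used here, written out (a sketch; R stands for whatever matrix is sandwiched between bᵀ and b at this point of the computation):

```latex
% With b = sigma * psi, E[b b^T] = sigma^2 I_M, so:
\begin{aligned}
\mathbb E_b\,\|b\|^2 &= \sum_{i=1}^{M}\mathbb E\,b_i^2 = M\sigma^2,\\
\mathbb E_b\bigl[b^{\mathsf T}R\,b\bigr]
 &= \sum_{l,m} R_{lm}\,\mathbb E[b_l b_m] = \sigma^2\operatorname{Tr}R.
\end{aligned}
% The surviving trace, Tr[ W (lambda I - W)^{-1} ], is then averaged with the
% Marchenko-Pastur density:
\mathbb E_W\,\frac1N\operatorname{Tr}\bigl[W(\lambda I-W)^{-1}\bigr]
 = \int \rho_{\mathrm{MP}}(s)\,\frac{s}{\lambda-s}\,\mathrm ds .
```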
And what we get in the end is this final result: in the limit N → ∞, the average minimal loss divided by N is

$$\lim_{N\to\infty}\frac{\langle \mathcal H_{\min}\rangle}{N} \;=\; \frac12\Bigl(\sqrt{\alpha(1+\sigma^2)}-1\Bigr)^{2},$$

and here you see essentially the same combination that we had before. So we have a formula for the average minimal loss in the large-N limit, obtained by averaging over all sources of disorder. Of course, we computed this formula assuming that α, the ratio M/N, is larger than one; using a different technique, the replica technique, which I am going to cover next, for the remaining two hours, we will extend this result down to a critical value α_c, which is smaller than one and given by α_c = 1/(1 + σ²).

What is this result telling us? Well, it tells us that for an overcomplete system the minimal loss is positive, so we will not be able to satisfy the system exactly. That much is quite obvious: if the number of equations is larger than the number of unknowns, we do not expect to find an exact solution. But we will show, with another method, that this carries over below one: even undercomplete systems are in general not compatible, due to the nonlinear constraint. Only when the ratio α is smaller than α_c, a number smaller than one, do we typically get that the average minimal loss is zero, which means we can satisfy the system exactly. So the presence of the nonlinear, quadratic constraint makes undercomplete systems in general non-compatible. That is the take-home message, and you can see it in a simple example: with M = 1 and N = 2 you have a linear system with one equation and two unknowns, plus the nonlinear constraint. Without the nonlinear constraint you would have a very large space of solutions, but with the constraint you get a real solution only if b₁ is smaller than or equal to a certain value. If the noise is too big, we cannot satisfy the equation, even though the number of unknowns is larger than the number of equations. So the presence of the nonlinear constraint is very important: it can make a system that would otherwise be solvable, not solvable. And that is the source of this behavior.

[Question.] So the question is whether it is known if, below this transition point, the loss function is on average exactly zero, or whether there are sublinear terms. I would guess it is certainly sublinear in N: what we have is that, to leading order in N, the average is zero, and that is the only conclusion we can draw. I am sure there are corrections, terms that vanish relative to N as N goes to infinity, but the loss is not identically zero. So, in conclusion, what we are saying is that a very large system, on average, or typically, is solvable; a finite system will have corrections. If you are talking about finite N, then most systems will not be exactly solvable, that's for sure. This is a statement about typical behavior in very large systems.
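For the M = 1, N = 2 example, the solvability condition can be made explicit with a one-line Cauchy–Schwarz argument (a sketch, with the constraint normalized to the unit circle; as the questioner points out below, the condition does involve the absolute value of b₁):

```latex
% M = 1, N = 2: one equation on the unit circle (the normalization is immaterial).
a_{11}x_1 + a_{12}x_2 = b_1, \qquad x_1^2 + x_2^2 = 1.
% Cauchy-Schwarz: |a \cdot x| \le \|a\|\,\|x\| = \sqrt{a_{11}^2 + a_{12}^2},
% with equality attained at x = \pm a/\|a\|, so a real solution exists iff
|b_1| \;\le\; \sqrt{a_{11}^2 + a_{12}^2}.
```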
Let me also mention that it is obvious that this threshold decreases, moves to the left, when the noise σ increases: the larger the noise, the more we expect the system to be incompatible in general, so the threshold is expected to move to the left as the noise grows. Up to now we only have a result for α larger than one, but with the method that we are going to develop, the replica method, we can push this line down to α_c. And this is a very important message: undercomplete systems, in the presence of nonlinear constraints, are generally, or at least can be, non-compatible.

[Question about the example.] Sorry, I didn't hear... You're right; to be honest, I did this very quickly, and it might be that there should be an absolute value there; I'm not sure, I would need to redo it more carefully. This was just an illustration of the fact that the quadratic constraint produces an inequality constraint on the known term. Yes, probably you are right.

Okay. Any other questions? Okay, well, luckily I can now relax. I'm sure it's not true. Okay, we have a break, right?