So our next speaker is Dominik Schröder, who is at ETH in Zurich, and he's going to give us a bit more of an introduction into random matrix theory and some of the key quantities that appear there, particularly the resolvent analysis. Thanks, Dominik, for coming, and go ahead. First of all, thank you very much for the invitation, it's great to be in Trieste. In light of the audience I thought that today it would be most appropriate not to give a random matrix talk about the latest and greatest results, but rather to give an introduction to one of the key techniques, which is the cumulant expansion. So in the first part of the talk I will discuss the matrix Dyson equation, which is one of the main equations in random matrix theory, already mentioned by Ben in the last talk. And then in the last part of the talk I will go beyond the matrix Dyson equation for a concrete model, which is maybe more interesting from the applied perspective, namely the non-linear random matrices occurring in random feature models. So in the first part I want to talk about correlated Hermitian random matrices. The general setting in this part is that you have a large Hermitian random matrix, say with complex entries. What I always want to assume is that the expectation is bounded, because we want to talk about a bounded spectrum in the end. Then, and I want to stay vague here, I want some sort of decaying correlations which make this whole story applicable; there will be a concrete assumption in the theorem, but generally just some decaying correlations. And then one important assumption concerns the so-called covariance tensor, this S operator here, which is obtained when you take a deterministic argument, sandwich it with the mean-zero part of the random matrix, and take the expectation, so S[B] = E[(W - EW) B (W - EW)] is a deterministic operator. You want to assume that this is flat, in the sense that it is upper and lower bounded by the average trace for all positive semi-definite arguments. This is maybe a quite abstract condition, but what it really ensures is the following: if you take, for example, the simple matrix E_bb which has a single one on the diagonal and zeros everywhere else, evaluate the covariance operator on this matrix and take the (b,b) entry, then this gives you the variance of the corresponding entry. So this condition ensures that the variance of every entry is upper and lower bounded by something of order one over n, which is exactly the right condition to ensure a bounded spectrum. That's the flatness. These matrices you can analyze on three levels, basically. The first level is the macroscopic level: you ask, what is the global density of states, the global law of the eigenvalues? This is the top picture here, and the message is that this is not universal. Depending on how you choose your model, this can be anything; usually it consists of these kinds of bumps and cusps, but it is not universal. The next level is the mesoscopic level. This is when you zoom in a little bit, but only so far that you don't look at single eigenvalues, you still look at mesoscopically many eigenvalues. And then the final level is the microscopic level, which I don't want to talk about today, but it is shown in the bottom picture.
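To make the flatness condition from a moment ago a bit more tangible, here is a minimal numerical sketch (not from the talk; it assumes the simplest stand-in model, a Wigner-type Hermitian matrix with entry variance of order 1/n): evaluating the covariance operator S[B] = E[W B W] at the matrix E_bb with a single one on the diagonal recovers the entry variances, and the result is comparable to the average trace of E_bb, which is 1/n.

```python
import numpy as np

rng = np.random.default_rng(3)
n, b, samples = 50, 7, 20_000

def wigner(rng, n):
    """Hermitian random matrix with independent centered entries of variance ~ 1/n (a stand-in model)."""
    a = (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))) / np.sqrt(2 * n)
    return (a + a.conj().T) / np.sqrt(2)

# Covariance operator S[B] = E[W B W] (W already has mean zero), evaluated at B = E_bb,
# estimated by Monte Carlo.  Since E_bb has a single 1 at (b, b), W E_bb W = outer(W[:, b], W[b, :]).
S_Ebb = np.zeros((n, n), dtype=complex)
for _ in range(samples):
    W = wigner(rng, n)
    S_Ebb += np.outer(W[:, b], W[b, :]) / samples

print(S_Ebb[b, b].real, 1 / n)                 # the (b,b) entry is the variance of w_bb, of order 1/n
print(np.linalg.norm(S_Ebb - np.eye(n) / n),   # S[E_bb] is close to <E_bb> * Id = Id / n,
      np.linalg.norm(np.eye(n) / n))           # i.e. flatness holds with constants close to one here
```

In this simplest model the covariance operator acts almost exactly like the average trace times the identity; the flatness assumption asks only that it is comparable to it from above and below.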
Coming back to the pictures, the interesting message is that the more microscopic you get, the more universality you expect. This is really Wigner's vision for random matrices: no matter the details of the model, once you zoom in to the spacing distribution, you expect this universal gap distribution. An interesting fact is that this formula here is actually wrong, but it is very close, something like 1% away from the truth; the formula is just not exactly right. So first the macroscopic scale. The macroscopic scale asks the question: how do you determine this black curve here? The histogram is of course the histogram of eigenvalues, and then there is a limit. How do you determine this limit? The answer in this setting is that you solve the matrix Dyson equation. The matrix Dyson equation is an equation for an unknown N by N matrix which involves the expectation A of your matrix and the covariance operator S. You want to solve the equation -M(z)^{-1} = z - A + S[M(z)] for a spectral parameter z in the upper half-plane. Under the condition that the imaginary part of the unknown matrix M is positive definite, this has a unique solution. And what one should think is that the solution of this equation really is a good approximation to the resolvent of the random matrix. Once you know the resolvent, you can go via Stieltjes inversion and get the density. That is how you compute this black curve here. Depending on the precise model, actually solving this equation analytically might be hard, but numerically you can usually just do a fixed-point iteration, and it converges quite fast. So that's the density of states, and there is actually an interesting classification. Under quite general assumptions, what you can prove is that the density of states is 1/3-Hölder continuous, and wherever it is positive, it is analytic. The only singularities which can appear are shown in this picture: you can have these square-root edges, where locally the density behaves like a square root, and you can have these cube-root cusps, so when you have a zero of the density of states inside the support, then locally it looks like a cube root. And that is all that can appear; you cannot have a quartic root or anything like that. Cusps don't always appear, but basically when you tune some parameter in such a way that two bumps merge, you get a cusp, so they are quite ubiquitous as well. Now to the mesoscopic scale. Let's zoom in a bit and state a theorem. The theorem is that the resolvent is well approximated by the solution of the matrix Dyson equation down to the fluctuation scale of single eigenvalues; I will show what this means on the next slide. This is a theorem which assumes some polynomially decaying correlations. The exponent 12 here is far from optimal; in fact, in the Gaussian case two is enough, and I would guess that one is enough. Also the spatial structure is not important; this was just a convenient way to quantify what is meant by decaying correlations. The statement is twofold, and the more important one is probably the second: when you take the resolvent, subtract the solution of the matrix Dyson equation, multiply by anything deterministic and take an averaged trace, then this is small, of order one over N eta, where eta is the imaginary part of the spectral parameter. The imaginary part of the spectral parameter tells you, roughly, how finely you can resolve eigenvalues. The other statement is an isotropic statement, where the error is larger.
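Going back for a moment to the remark that the matrix Dyson equation can be solved numerically by iteration, here is a minimal sketch (purely illustrative, not the algorithm from the talk; it assumes the simplest special case A = 0 with flat covariance S[M] = <M> * Id, for which the equation reduces to the semicircle law): iterate M -> (A - z - S[M])^{-1} with some damping and read off the density via Stieltjes inversion.

```python
import numpy as np

def solve_mde(z, A, S, tol=1e-10, max_iter=10_000):
    """Damped fixed-point iteration for the matrix Dyson equation
        -M(z)^{-1} = z - A + S[M(z)],  Im z > 0,
    i.e. M = (A - z - S[M])^{-1}."""
    n = A.shape[0]
    M = 1j * np.eye(n)                      # any starting point with positive imaginary part
    for _ in range(max_iter):
        M_new = np.linalg.inv(A - z * np.eye(n) - S(M))
        if np.linalg.norm(M_new - M) < tol:
            return M_new
        M = 0.5 * M + 0.5 * M_new           # mild damping for stability
    return M

# Simplest special case: zero expectation and flat covariance S[M] = <M> * Id (Wigner-type).
n = 50
A = np.zeros((n, n))
S = lambda M: (np.trace(M) / n) * np.eye(n)

# Density of states via Stieltjes inversion: rho(x) ~ Im <M(x + i*eta)> / pi for small eta.
eta = 1e-2
xs = np.linspace(-2.5, 2.5, 101)
rho = [np.imag(np.trace(solve_mde(x + 1j * eta, A, S)) / n) / np.pi for x in xs]
# For this choice of A and S the output approximates the semicircle density sqrt(4 - x^2) / (2*pi).
```

For a genuinely correlated model one would plug in the actual expectation A and covariance operator S; the damping is there because the plain iteration can be only marginally stable inside the bulk.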
Local laws of this type have a long history of results, and I cannot mention all of them. But the interesting thing here, I think, is this fluctuation scale, which I will show on the next slide. According to the classification of singularities, the fluctuation scale interpolates between three behaviors. The fluctuation scale is defined locally at an energy as how big a window you have to integrate over in order to find, in expectation, one eigenvalue. In the bulk you have to integrate over a horizontal window of size one over n to find one eigenvalue in expectation; at a cusp this becomes n to the minus three quarters, and at an edge it becomes n to the minus two thirds. That is the fluctuation scale. Above that scale you have a local law, in the sense that if you look at distances a bit bigger than these fluctuation scales, then you have a law of large numbers for the eigenvalues. This is of course related to the quantiles of the density, in the sense that the fluctuation scale is the difference of two neighboring quantiles. The important corollary of the local law is rigidity. Rigidity means that you pick an index i and order your eigenvalues lambda one to lambda n; the quantile tells you where the limiting object would have its i-th eigenvalue, if one can talk about that, and the statement is that the actual eigenvalue is as close to the quantile as it can possibly be, namely within the fluctuation scale. I want to highlight that this is maybe quite unexpected. If you sampled independent points according to any distribution, sorted them by size and compared them to the quantiles, you could never expect this. So the eigenvalues of a random matrix are much more rigid than if you ask a child to draw 100 random points on a line. That is rigidity, and the local law implies rigidity in this optimal form. Now to the proof idea of this optimal local law, which is what I mainly wanted to talk about: it uses a so-called cumulant expansion, and the thing to go back to is really integration by parts. The central idea is that the matrix Dyson equation is an equation for an approximation of the resolvent, so let us instead write an equation for the resolvent itself. This is a triviality: the identity is equal to W, the centered part of your random matrix, times the resolvent, plus the expectation of the random matrix times the resolvent, minus the spectral parameter times the resolvent; in short, 1 = WG + AG - zG. I want to focus on the first term, W times G. For scalar random variables one has a beautiful identity, which is just integration by parts: take any C^1 function F and a scalar Gaussian X and compute the expectation of X times F(X); this has two terms, and if X is a centered Gaussian the first term is not even there, so what remains is the variance of X times the expectation of the derivative of F, that is, E[X F(X)] = Var(X) E[F'(X)]. This is a trivial exercise to prove.
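Here is a quick Monte Carlo sanity check of this scalar integration-by-parts identity (purely illustrative; the test function tanh and the variance are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.3                        # any variance works
x = rng.normal(0.0, sigma, size=10_000_000)   # centered Gaussian samples

f  = np.tanh                       # an arbitrary smooth test function
df = lambda t: 1 - np.tanh(t)**2   # its derivative

lhs = np.mean(x * f(x))            # E[X F(X)]
rhs = sigma**2 * np.mean(df(x))    # Var(X) E[F'(X)]  (the E[X]E[F(X)] term drops since X is centered)
print(lhs, rhs)                    # the two numbers agree up to Monte Carlo error
```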
For non-Gaussian random variables, the replacement of this integration by parts is what is called the cumulant expansion. On the left-hand side we have the same expression; on the right-hand side you now have an infinite series over k of the (k+1)-st cumulant of X divided by k factorial, times the expectation of the k-th derivative of F. And it is actually an interesting theorem that there is nothing in between these two cases: a random variable which is not Gaussian has infinitely many non-zero cumulants, so you cannot have only the first three or four cumulants non-zero; either you are in the Gaussian case or you are in the infinite case. The good news is that the higher-order cumulants usually don't matter much. There is a matrix-valued analogue of this integration by parts, which requires heavier notation simply because of non-commutativity. The matrix-valued integration by parts is: you take your random matrix W and a function of the random matrix, and you want to compute the expectation of W times F(W). This again has two terms. The first term is trivial. For the second term I introduce this tilde, W tilde, which is an independent copy of your random matrix; you take an additional expectation over it and then you take the directional derivative of your function F in this W tilde direction, evaluated at the original point W. That is the matrix-valued integration by parts, and if, as in our case, W is centered, then only the second term survives. There is also a matrix-valued cumulant expansion, which is too complicated to write here, but it essentially involves multivariate cumulants. The first few are easy to define: the first one is just the expectation, the second one is the covariance, and the higher-order ones you should think of as higher-order generalizations of covariances, in the sense that they have the property that the cumulant of two independent groups of variables vanishes. This here would be the third cumulant, and you can check that whenever X is independent of (Y, Z), for example, then this expression is algebraically zero. So these are cumulants. Now, how to go from the cumulants to the matrix Dyson equation? That is actually very simple. I wrote here again the matrix-valued integration by parts. What we have to compute is a directional derivative of a resolvent, and that is a triviality thanks to the resolvent identity: when I want to compute the directional derivative of my resolvent G in the W tilde direction, I add epsilon times W tilde, subtract what I had, divide by epsilon and take the limit epsilon to zero, and by the resolvent identity this is simply minus the resolvent times the perturbation W tilde times the resolvent, that is, minus G W tilde G. So in particular, when I take the expectation of the top equation, I get that the identity equals the expectation of (A - S[G] - z) times G, and you recognize that this means the resolvent fulfills the matrix Dyson equation in expectation. This is how you get the equation, basically; it is just Gaussian integration by parts. The difficulty is making this rigorous. To make it rigorous, you replace the Gaussian integration by parts by a cumulant expansion, because usually we work with non-Gaussian matrices, and you prove that the resolvent fulfills the matrix Dyson equation up to a small error, in high probability, which is also of key importance.
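As a small numerical sanity check of the directional derivative formula used in this derivation (again only a sketch, with Wigner-type matrices standing in for W and its independent copy W tilde): a finite-difference derivative of G(W + eps * W_tilde) at eps = 0 should match -G W_tilde G.

```python
import numpy as np

rng = np.random.default_rng(1)
n, z = 100, 0.3 + 0.5j

def wigner(rng, n):
    """Hermitian matrix with centered entries of variance ~ 1/n (stand-in for W)."""
    a = (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))) / np.sqrt(2 * n)
    return (a + a.conj().T) / np.sqrt(2)

def resolvent(H, z):
    return np.linalg.inv(H - z * np.eye(len(H)))

W, W_tilde = wigner(rng, n), wigner(rng, n)        # W_tilde plays the role of the independent copy
G = resolvent(W, z)

eps = 1e-6
fd = (resolvent(W + eps * W_tilde, z) - G) / eps   # finite-difference directional derivative
exact = -G @ W_tilde @ G                           # the resolvent-identity prediction
print(np.linalg.norm(fd - exact) / np.linalg.norm(exact))   # small relative error, i.e. they agree
```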
To be clear, all these statements hold with high probability. Then you conclude the local law by a stability argument: you have a stable equation and you have found an approximate solution of it, so you can conclude that this approximate solution is close to the true solution of the equation. So this concludes the general part: this is the matrix Dyson equation and this is how you can prove local laws for general correlated random matrices. This brings me to the second part, which goes beyond the matrix Dyson equation. The matrix Dyson equation has the feature that its solution only depends on the first and second moments of your random matrix, and there are many scenarios where the random matrix is not fully described by the first two moments. So now I want to show an example where this is the case, an example motivated by applications and, I think, also interesting to many people here. There is a theorem usually called the Gaussian equivalence theorem, and it says the following. You take two independent i.i.d. random matrices W and X; in the application you would think of W as some weights and X as some data, but here they are just independent i.i.d. random matrices. You assume that your non-linear function is centered with respect to the Gaussian distribution; this is not so important, otherwise you just get an additional rank-one perturbation, but let's assume it for simplicity. Then the Gaussian equivalence theorem tells you that this matrix Y, the non-linear function applied entry-wise to the matrix product, has the same asymptotic singular value distribution as a much simpler model, one where you don't have any non-linear function anymore but just some scalar parameters: theta two is the integral of the derivative of F against the Gaussian distribution, and theta one is the expectation of F squared against the Gaussian. That is the Gaussian equivalence theorem, and the Psi here is an independent Gaussian noise matrix. So the message is: if you take this matrix product W times X and apply a function entry-wise, this has the same effect as adding some independent Gaussian noise, which I think is a rather interesting message. And then of course I should show you the equation. The eigenvalues of Y transpose Y, which are equivalent to the squared singular values of Y, converge weakly to a compactly supported measure whose Stieltjes transform satisfies a certain quartic equation. Oh, I think I forgot to say: these parameters phi and psi are the dimension ratios. Everything is in the large-dimensional limit, and the result assumes that the ratios of the large dimensions converge to something finite; these limits are the phi's and the psi's. Sorry, I think I forgot to write that. So this is the quartic equation.
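Before moving on to the proofs, here is a quick simulation sketch of the Gaussian equivalence statement (illustrative only; it assumes the normalization Y = f(WX/sqrt(d)) with f = tanh, which is odd and hence automatically centered, and some arbitrary hypothetical dimensions): the eigenvalues of Y Y^T / n for the nonlinear model and for the linear surrogate theta2 * WX/sqrt(d) + sqrt(theta1 - theta2^2) * Psi should be close.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, m = 2000, 1500, 1000           # hypothetical dimensions, with ratios of order one

f = np.tanh                          # an odd nonlinearity, so E[f(xi)] = 0 automatically

# Gaussian parameters theta1 = E[f(xi)^2] and theta2 = E[f'(xi)] = E[xi f(xi)], xi ~ N(0,1)
xi = rng.normal(size=1_000_000)
theta1 = np.mean(f(xi) ** 2)
theta2 = np.mean(xi * f(xi))

W = rng.normal(size=(m, d))
X = rng.normal(size=(d, n))
Psi = rng.normal(size=(m, n))        # the independent Gaussian noise from the theorem

Y_nonlin = f(W @ X / np.sqrt(d))                                               # nonlinear model
Y_equiv  = theta2 * (W @ X) / np.sqrt(d) + np.sqrt(theta1 - theta2**2) * Psi   # linear surrogate

ev_nonlin = np.linalg.eigvalsh(Y_nonlin @ Y_nonlin.T / n)
ev_equiv  = np.linalg.eigvalsh(Y_equiv  @ Y_equiv.T  / n)
# The two spectra should be close; compare a few quantiles as a crude check:
print(np.quantile(ev_nonlin, [0.1, 0.5, 0.9]))
print(np.quantile(ev_equiv,  [0.1, 0.5, 0.9]))
```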
The original proof of this statement is probably due to Pennington and Worah, who used the moment method and assumed analyticity of the function F. It was later extended to the non-Gaussian setting by Benigni and Péché, and the way this fits into today's framework is that last year, together with a master's student of mine, we gave a resolvent-based proof, which I think sheds a somewhat different light on this theorem. As a by-product the proof is a bit more robust: we can allow for non-analytic test functions, and we can also allow for an additive bias, so we could add a bias term here, and it changes the result a bit. The resolvent viewpoint is the following. This random matrix Y, and recall that Y was F of WX, suitably normalized, you actually view as a correlated random matrix, and the correlations which matter in the end are these. The first thing that matters is that the expectation of every entry is zero; this is due to the assumption that the integral of F against the Gaussian is zero. Then the variance of any single entry is theta one, which was the integral of F squared against the Gaussian; it is not surprising that the variance of any entry is exactly this, essentially by a central limit theorem. And then what is interesting about this correlated random matrix is that it has so-called cycle correlations: for every k, the leading contribution comes from joint cumulants of entries along even-length cycles with fresh indices, and in this case the joint cumulant is related to derivatives of your function F. Once you have this result, you easily get the quartic equation I showed earlier by some simple algebra, and this also highlights the Gaussian equivalence theorem in a matrix sense: by controlling all these joint cumulants you can make sense of this approximation on the level of cumulants and matrices. With that, I think my time is almost up, so let me summarize. I showed you the matrix Dyson equation and how it can be derived using Gaussian integration by parts; then I argued how you can use a cumulant expansion to turn this into a rigorous proof of a local law for correlated random matrices; and in the last part of the talk I showed you an example where all this machinery in some sense fails, because the result does not depend only on the first two moments, but where the cumulant expansion still applies, just outside the framework of the matrix Dyson equation, with the application to this non-linear random matrix model viewed as a correlated matrix. And with that I would like to thank you for your attention. Thank you for this nice exposition and this nice result. Do we have any questions for Dominik? Hi, thanks for the great talk. My question is related to the Gaussian equivalence theorem you were mentioning. Do you expect that you really need both matrices to be sufficiently random, or is that a restriction of these techniques? Would it be sufficient for one of them to be sufficiently random for this kind of result to hold, or do you think not? I think there are two levels to this question. The first is whether the statement is even true if only the W is random, and there I think the equation changes: you should not expect that for a fixed data matrix the equation is the same. What one could probably also do, and might be feasible, is to prove some quenched version of this.
So you say: you fix some realization of X from a high-probability event, and then you only use the randomness of W to still get this equation. For fixed X, I think this has actually been done, to get this equation; I think this is the iterated Marchenko-Pastur law or something like this. As far as I know this has been done for fixed X for this equation; it is a matrix equation. Very nice, thank you. Another question over there, I think. Hi, thanks for the talk. You mentioned this idea that the solution of the matrix Dyson equation serves as an approximation of the resolvent. I was wondering in which sense this approximation holds. Does this mean that if I take, say, quadratic forms, a transpose times the resolvent times b, then this quantity gets close to the same quadratic form of your approximation, in probability? Is it something like that? So, maybe I didn't understand the question, but let me go back to the statement. You take the difference of the resolvent and the solution of the matrix Dyson equation, and the statement is that if you either sandwich it between deterministic vectors, then this is small, or if you multiply it by a deterministic matrix, or in fact an independent matrix, and take the trace, then this is small. But I think you asked whether you can multiply by something random? Yeah, something like that. Okay, so the statement is: whenever you multiply by anything deterministic, then it is small. Beyond that I think it is just not true, because if you chose these u and v to be eigenvectors, then you would not expect this to hold. So it is true whenever you test against something deterministic. Okay, thanks. Are there any more questions from the audience? Maybe you gave us a little bit of a cliffhanger when you said in the very beginning that the result on the spacings that we all know is wrong, but very close, with about 1% missing, I think you said. So what is missing? I think the answer is that there is simply no nice formula for the truth. The physicists got this formula using heuristic arguments and it is extremely close to the truth. But the truth is given by the so-called sine kernel, which basically tells you that the correlation between eigenvalues behaves like a sine of the difference, and to get the spacing distribution from this you have to do an infinite sum over integrals of the sine kernel. This is hard to do, and I think there is no explicit expression, but the end result happens to be within about 1% of this formula. I think the formula goes back to Wigner, it is called the Wigner surmise, and it is extremely accurate but not exactly correct. Okay, that is a nice little bit of trivia, thanks. Marco, I think you had another question. I was wondering whether the techniques that you described can be pushed beyond the simple model F of WX. For example, can you analyze the spectra of deep models, or can you analyze Jacobians, conjugate kernels, and so on? The short answer is: I don't know. I think what is feasible, and what has been done, is the very special case where the theta two parameter is zero; then after the first layer this first term is not there and you end up back in the original setting, in the sense that you effectively have an independent matrix again, and then you can iterate. This has been done by Benigni and Péché, I think. So in this special case you can do it.
In the generic case, tracking the correlations through multiple layers can probably get quite hard. But I mean, if I stick in another matrix W just on the left side, can I just look at that? What is the problem? So now I get a product of three instead of a product of two, but... Yeah, I think... they're independent, right? In this sense you could prove something, just that the result will then have multiple terms. With L the number of your layers, you will have one term with W one times W two up to W L, and you will have terms with all subsets of these weight matrices multiplied. So you will have a sum of many terms, but in the end I think that is probably the truth. Thanks. It is just that if theta two happens to be zero, then it is very simple. All right, I don't see any more immediate questions, so I would say let's thank Dominik and all the speakers of this morning's sessions again.