Thank you. Okay, welcome back. Coffee, you have coffee? No? This is not going to end well. Are we ready? Good. So we continue our study; the topic of today's lecture is the so-called orthogonal polynomials technique. I will explain what it is useful for, but let me set the stage first. Consider a real symmetric matrix H of size n by n. Since H is real and symmetric, it can be diagonalized via an orthogonal transformation: H can be written in the form O Λ O^T, where Λ is a diagonal matrix containing the eigenvalues of H, and O is an orthogonal matrix, i.e. a matrix satisfying O O^T = 1. This is the spectral theorem in its simplest form. Now we want to see what effect this transformation has on the joint distribution of the independent degrees of freedom on the left and on the right of this equation. What do I mean by that? Suppose the matrix H is random. Then it is described by a joint probability density for the entries, P(H_11, ..., H_nn), by which I mean the entries in the upper triangle. These are not n^2 independent variables, because of the symmetry requirement: the number of independent variables is n(n+1)/2, the entries in the upper triangle. So n(n+1)/2 degrees of freedom; this is the measure pertaining to the left-hand side of the equation. Now we want to express this joint probability density, times the measure, in terms of eigenvalues and eigenvector components. So the same object can be rewritten via another function, P-hat, which is a function of the same number of variables, just different ones: the eigenvalues, and a set of real parameters that parameterize the orthogonal matrix. This is just the law of change of variables for probability densities. By the symbol O I denote the collective set of eigenvector parameters: on the right-hand side we have n eigenvalues, so we need n(n-1)/2 real parameters to parameterize the eigenvectors, and on both sides we have the same number of parameters. Every time you can write a set of random variables as a function of other random variables, you can make this change of variables for the probability density: on the left-hand side the entries, on the right-hand side eigenvalues and eigenvectors. Excellent. And the theory of change of variables tells us what P-hat is in terms of P: P-hat(λ_1, ..., λ_n; O) is equal to P evaluated at H_11, H_22, ..., H_nn expressed in terms of the new variables. But this is not enough. In order to find the joint probability density of eigenvalues and eigenvectors, taking the joint probability of the entries expressed in the new variables is not enough; there is a term missing. What is it? Yes: we must multiply by the absolute value of the Jacobian of the transformation between the entries and the set {eigenvalues, eigenvectors}.
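A minimal numerical sanity check of this setup (a sketch; the matrix, size and seed are arbitrary choices of mine, any symmetric H works):

```python
# Check the spectral theorem H = O Lambda O^T for a real symmetric matrix,
# and the degrees-of-freedom bookkeeping mentioned above.
import numpy as np

n = 4
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))
H = (A + A.T) / 2                               # symmetrize: H is real symmetric

lam, O = np.linalg.eigh(H)                      # eigenvalues, orthonormal eigenvectors
assert np.allclose(O @ np.diag(lam) @ O.T, H)   # H = O Lambda O^T
assert np.allclose(O.T @ O, np.eye(n))          # O is orthogonal

# n(n+1)/2 independent entries on the left;
# n eigenvalues + n(n-1)/2 eigenvector parameters on the right.
print(n * (n + 1) // 2, "=", n, "+", n * (n - 1) // 2)   # 10 = 4 + 6
```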
So there is no reference here to random matrices yet; this is just the law of change of variables for joint probability density functions: the new joint density is the old one expressed in the new variables, times the absolute value of the Jacobian of the change of variables. In general this Jacobian could depend on both the eigenvalues and the eigenvector components: if you make a change of variables, the Jacobian can in principle depend on the full set of new variables. Now it turns out that in the case of random matrices this is not so, and this is a fortunate event. For this type of transformation the Jacobian J depends only on the λ's, not on the eigenvector components, and it can be computed exactly: it is exactly equal to the factor we have already seen in the joint probability density of the eigenvalues, the product over j < k of |λ_j − λ_k|^β, where β = 1, 2 or 4 depending on whether the original matrix is real symmetric, complex Hermitian or quaternion self-dual. In our setting β = 1, but you can redo the same calculation for ensembles with unitary or symplectic invariance; all that changes is the exponent β. So this Jacobian, which in principle could depend on eigenvalues and eigenvector components, turns out to depend on the eigenvalues alone. Good. Note also that this derivation makes no assumption on the measure on H: H does not need to be Gaussian or anything. It is completely general, valid for any random matrix that can be diagonalized by an orthogonal transformation. Excellent. Yeah? Yes, exactly: the columns of O are the eigenvectors of the matrix H. So the set O, for me, means the set of independent parameters needed to parameterize the matrix O, that is, the parameters needed to parameterize the eigenvectors of H. And you need n(n−1)/2 of them, because the eigenvectors have some properties: they must be orthogonal to each other and normalized, and these constraints decrease the number of degrees of freedom. Okay. So, can I erase here? Good. So now we have, from first principles and a calculation that I haven't done (I am giving it as a fact; the calculation is not difficult, but a bit boring), this object: the joint probability density of the eigenvalues and the eigenvector parameters, by which I mean the parameters needed to parameterize the eigenvectors, the columns of our matrix O. Good. Now suppose we want the joint PDF, the joint probability density function, of the eigenvalues alone. What do we have to do? Yes: take this object and integrate out the eigenvector components. So I can define another function, say P-double-hat, of the eigenvalues alone; all I need to do is integrate the joint PDF of eigenvalues and eigenvectors over the collective set of variables corresponding to the eigenvectors.
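For n = 2 the claim about the Jacobian can be checked symbolically. A sketch with sympy (my parameterization: H = O(θ) diag(λ1, λ2) O(θ)^T with a single rotation angle θ):

```python
# Jacobian from (h11, h22, h12) to (l1, l2, theta) for a 2x2 real symmetric matrix.
import sympy as sp

l1, l2, th = sp.symbols('l1 l2 theta', real=True)
c, s = sp.cos(th), sp.sin(th)
h11 = l1 * c**2 + l2 * s**2
h22 = l1 * s**2 + l2 * c**2
h12 = (l1 - l2) * c * s

J = sp.Matrix([h11, h22, h12]).jacobian([l1, l2, th])
print(sp.simplify(J.det()))   # -> l1 - l2 (up to sign): |J| = |l1 - l2|,
                              #    beta = 1, and no dependence on theta
```

The angle θ, the eigenvector parameter, drops out of the determinant, exactly as claimed.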
So since we know the expression of this object, what I need to do is perform this collective integral over O of P, the distribution of the entries written in terms of the new variables, H_11(λ, O), ..., H_nn(λ, O), times the absolute value of the Jacobian, which we know has this form, where β = 1 in this setting; I just leave it general for convenience. So if I want to compute the joint PDF of the eigenvalues from the joint PDF of the entries, this is the operation I need, in principle, to perform: integrate out the eigenvectors. The Jacobian factor clearly does not depend on the eigenvectors, so we can pull it out, but the integral over O remains. Now, depending on the random matrix model we started from, this integral may or may not be doable; it is not guaranteed that we can carry out the integration over the eigenvectors. Sometimes we can, sometimes we cannot. And this is the core of the problem: when we cannot perform this operation, we only have very partial information about the eigenvalues. We do not have access to their full joint probability density, so we are in nasty waters: we lack the piece of information we need to compute things about the eigenvalues. There is, however, one case where this integral can be performed. Can I remove this? Well, it's too late. The integral over O can be performed if your random matrix model belongs, remember our scheme, to this category: if the model is rotationally invariant, the integration over the eigenvectors can always be carried out. This is not the only case, there are others, but it is a very large class of models for which the integral can be done exactly. Why? Because rotational invariance poses a very strong constraint on the explicit form of the joint probability density of the entries. There is a theorem which says that if your model is rotationally invariant, then the joint probability density of the entries can only be a function of Tr H, Tr H^2, ..., Tr H^n. This theorem goes under the name of Weyl's lemma, and it poses a very powerful constraint on the form of the joint distribution of the entries: if your model has rotational invariance, the joint PDF of the entries cannot be arbitrary; it can only be a function of the traces of powers of H. It is a very strong constraint. And now we can use Weyl's lemma in this integral: this P will be a function of Tr H, Tr H^2, ..., Tr H^n, which we can write in terms of the eigenvalues, namely as a function of Σ_i λ_i, Σ_i λ_i^2, ..., Σ_i λ_i^n. You see? I am just rewriting the traces in terms of the eigenvalues alone. So this piece no longer depends on the eigenvectors, and the Jacobian piece did not depend on them to begin with.
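Why a function of traces is automatically rotation invariant is easy to see numerically. A small sketch (assuming scipy is available for a Haar-random orthogonal matrix; sizes and seed are mine):

```python
# Traces of powers do not change under H -> O H O^T, so any P(Tr H, ..., Tr H^n)
# takes the same value on H and on its rotated copy.
import numpy as np
from scipy.stats import ortho_group

rng = np.random.default_rng(7)
n = 5
A = rng.standard_normal((n, n))
H = (A + A.T) / 2
O = ortho_group.rvs(n, random_state=rng)     # Haar-distributed orthogonal matrix
Hrot = O @ H @ O.T
for k in range(1, n + 1):
    print(np.trace(np.linalg.matrix_power(H, k)) -
          np.trace(np.linalg.matrix_power(Hrot, k)))   # ~ 0 for every k
```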
So all we have to do then is a trivial integration of a constant: we integrate over the eigenvectors just the constant 1, because everything else depends only on the eigenvalues and goes out of the integral. In summary, the integration over dO just gives us a number, a constant in front. So if our model is rotationally invariant, we can carry out this integration and we have access to the joint probability density function of the eigenvalues alone. For this class of models we are always able to characterize the joint distribution of the eigenvalues alone; this is not guaranteed for models lying anywhere else in our scheme. So, in summary, for rotationally invariant ensembles the joint probability density of the eigenvalues always has this form: a constant, which is due to the integration over the orthogonal group, times the product over j < k of |λ_j − λ_k|^β, with β = 1 if the ensemble has orthogonal invariance, times the joint distribution of the entries written in terms of the eigenvalues. This is the general form of the joint probability density of the eigenvalues for random matrix models characterized by rotational invariance. You can recognize that the Gaussian case we discussed yesterday falls exactly in this category. Why? Because for the Gaussian case the distribution of the entries had the form exp(−(1/2) Tr H^2), so this factor becomes exp(−(1/2) Σ_i λ_i^2), which is exactly of this form. But of course the Gaussian ensemble sits here, while this expression is more general; it holds over here. Now, you might ask: is this the only case where the integral can be performed exactly? The answer is no; there is a larger class of models where it can be done. I will not discuss this in these lectures, but the situation is not hopeless: there is hope for other matrix models too. It is a bit more technical, so I will leave it at that. Yeah, sorry? Yes, exactly: we started off with a real symmetric matrix, so this β is equal to one. I am just leaving it there because the derivation is identical in the other two cases, except that the power will be different: β = 1 in the case we are dealing with, 2 for complex Hermitian matrices and 4 for quaternion self-dual matrices. Good. Okay. What is also important is that for a very important class of random matrices this integration cannot be performed, and these are the matrices here. In general, if the only thing you know about your matrix is that it has independent entries, then in general you are not able to perform the integration. Although the joint PDF of the entries has a product form, the way the entries are written in terms of eigenvalues and eigenvectors is so complicated that you cannot carry out the integration over the eigenvectors. The consequence is that for this class of models the joint probability density of the eigenvalues alone is not known to date. This is a very important open problem.
For example, take a random matrix of zeros and ones sampled independently; this could be the adjacency matrix of a random graph, a random network. Is anybody here working on networks, graphs? Yeah. So if you have the adjacency matrix of an Erdős-Rényi network, for example, the entries are independent: each is zero or one depending on whether there is a link between node i and node j. This matrix belongs to this class. Of course you can produce such a matrix and diagonalize it; you get n random variables, the eigenvalues. But the joint probability density function of these numbers is not known to date, just because we are not able to carry out the integration over the eigenvector degrees of freedom. So although we are able to say something about the spectrum, we don't know nearly as much as we would like in order to characterize it more precisely. Good. This is a job for you, for the future; it is one of the biggest challenges in the field: try to write down the joint probability density of the eigenvalues for a random graph, for example. We don't know it. Okay. Questions so far? Okay. Now I will introduce the orthogonal polynomial technique. What is it useful for? First of all, what kind of ensembles does it apply to? It applies to rotationally invariant ensembles, so it applies here; in particular, to ensembles for which the joint PDF of the eigenvalues is known. So what is it useful for? The best setting in which to describe it is not β = 1 but β = 2: the case β = 2 is somehow simpler from the technical point of view, and although there are no conceptual problems in extending the technique to the other β's, I will describe it for β = 2. So what is the problem? Remember, yesterday we introduced the average spectral density of our ensemble, defined as ρ_n(λ) = ⟨(1/n) Σ_i δ(λ − λ_i)⟩. So what is the relation between the average spectral density, this function here, and the joint probability density of the eigenvalues? This is the question. Which of these two objects is more informative, carries more information? We have two functions: one is a function of a single variable that gives some information about the spectrum, the other is a function of n variables that gives some information about the spectrum. Which gives the larger amount of information? Sorry? Yes: the joint probability density function clearly contains more information; the spectral density contains less. This means the spectral density should somehow be obtained from the joint density through some operation. Right? So how do we characterize this object in terms of that one? Well, all we have to do is rewrite this average as an average over the joint distribution of the eigenvalues: we integrate over dλ_1 ... dλ_n the joint density function of the eigenvalues, and we stick this object in here. Right? Do we agree? We are just using the definition of the average: take the function whose average we want, multiply it by the probability density, and integrate.
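The state of affairs is easy to appreciate numerically. A short sketch (my parameter choices: 200 nodes, link probability 0.1): the histogram of Erdős-Rényi adjacency eigenvalues is trivial to produce, while the underlying joint density of those same eigenvalues is the open problem just mentioned.

```python
# Sample Erdos-Renyi adjacency matrices, diagonalize, histogram the spectrum.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
n, p, samples = 200, 0.1, 200
eigs = []
for _ in range(samples):
    A = (rng.random((n, n)) < p).astype(float)
    A = np.triu(A, 1)
    A = A + A.T                      # symmetric 0/1 adjacency matrix, no self-loops
    eigs.append(np.linalg.eigvalsh(A))
plt.hist(np.concatenate(eigs), bins=100, density=True)
plt.xlabel('eigenvalue'); plt.ylabel('density'); plt.show()
```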
Now, I leave this calculation to you: if P is a symmetric function of the eigenvalues, you can prove, using this formula and this assumption, that ρ_n(λ) = ∫ dλ_2 ... dλ_n P(λ, λ_2, ..., λ_n). So the claim is as follows: the average spectral density, which is a function of λ, is obtained by taking the joint probability density function of the eigenvalues, replacing the first eigenvalue with λ instead of λ_1, and integrating over all the others, λ_2, ..., λ_n. The object on the right-hand side is then a function of λ, which is exactly the argument of the spectral density. So the claim is: the average spectral density is the marginal PDF obtained by integrating the joint probability density function over all variables but one. This is an exercise for you. Of course, just in case you don't find a strong enough motivation within yourself to do the exercises I propose, I recall that there will be an exam, and so on, okay? Is the message clear? It gives you a strong motivation, I guess. So I strongly suggest you try this calculation; it is in your own best interest. Now, this is a very interesting point, because if we know the joint probability density of the eigenvalues, and we do, since we are dealing with rotationally invariant ensembles, all we have to do is perform a multiple integral. But this multiple integral is strange, because we are not integrating over all variables: we integrate over all variables but one. And it is also very complicated, because the joint PDF encodes strong correlations between the eigenvalues, so the integrand has very bad factorization properties. We need to invent a technique to deal with (n−1)-fold integrals over a joint PDF of strongly correlated random variables; this is something completely new, and we don't yet know how to deal with integrals of this type. The orthogonal polynomial technique is a technique to compute this object exactly, for any finite n. So you can compute the spectral density of a 17 by 17 or a 31 by 31 random matrix taken from a rotationally invariant ensemble, which is a pretty large class of models. So I just want to introduce this method briefly. Can I erase? Okay. From now on I will not put the double hat on top, just to save space. We have the joint probability density function of the eigenvalues; remember, we are on this side of the board, rotationally invariant ensembles, and on top of that we fix β = 2. The density is 1/Z_n, a normalization constant, times the product over j < k of (λ_j − λ_k)^2; I don't need the absolute value here because of the square. And then, in principle, we should put the joint distribution of the entries written in terms of the eigenvalues. It turns out that for most practical purposes this factor can be rewritten as exp(−Σ_{i=1}^n V(λ_i)) for a certain function V. This is the way you will typically find the joint probability density of the eigenvalues of rotationally invariant models written in textbooks.
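Before doing the marginal exercise analytically, it can be checked by Monte Carlo. A minimal sketch (GOE samples for concreteness; size and seed are mine): histogramming all eigenvalues estimates the averaged spectral density ⟨(1/n) Σ_i δ(λ − λ_i)⟩, while histogramming one eigenvalue picked uniformly at random per matrix estimates the symmetrized one-point marginal; the two histograms should lie on top of each other.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
n, samples = 5, 20000
all_eigs, one_eig = [], []
for _ in range(samples):
    A = rng.standard_normal((n, n))
    lam = np.linalg.eigvalsh((A + A.T) / 2)
    all_eigs.extend(lam)                 # estimates the average spectral density
    one_eig.append(rng.choice(lam))      # estimates the one-point marginal
bins = np.linspace(-5, 5, 60)
plt.hist(all_eigs, bins=bins, density=True, histtype='step', label='all eigenvalues')
plt.hist(one_eig, bins=bins, density=True, histtype='step', label='one random eigenvalue')
plt.legend(); plt.show()
```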
It is a slightly less general form than the one I gave you before, but it is very convenient for manipulations and sufficiently general to cover the majority of practical cases. So you can write it this way. The Vandermonde-squared factor is universal, valid for all invariant models; the V is model specific, so different random matrix models are characterized by different functions V(x). Good. So the task I want to discuss is to evaluate the following (n−1)-fold integral: ∫ dλ_2 ... dλ_n (1/Z_n) ∏_{j<k} (λ_j − λ_k)^2 exp(−Σ_{i=1}^n V(λ_i)), which gives the spectral density at the point λ_1. If the Vandermonde factor were not there, the evaluation would be trivial: the exponential factorizes, and the multiple integral would just be a product of independent integrals. But this factor makes life much, much harder, because every pair of eigenvalues is correlated; every eigenvalue feels the presence of all the others, and the integrand does not factorize at all. So we need to find a much more clever way to kill this integral, and the orthogonal polynomial technique is exactly what allows us to achieve this. Okay. Let's first give this object a name: Δ_n(λ) = ∏_{j>k} (λ_j − λ_k). Now it turns out this object can be written in the form of a determinant, constructed as follows: the first row is all 1's; the second row is λ_1, λ_2, ..., λ_n; the third row is λ_1^2, λ_2^2, ..., λ_n^2; and so on, down to λ_1^{n−1}, λ_2^{n−1}, ..., λ_n^{n−1}. For example, in the 2 by 2 case this determinant is λ_2 − λ_1. Not really a proof of the general case, but convincing, isn't it? Okay. This equality is an exercise for you. Got it? Okay. So this object can be written as a determinant, and the determinant has a name; it is named after a person who never wrote anything on the subject. It is called the Vandermonde determinant, and Monsieur Vandermonde never wrote a single line on it; it was actually Cauchy who studied this type of object extensively. Okay. Now, the Vandermonde determinant has a very interesting property, which we can understand from a 2 by 2 calculation. I showed you that the determinant of [[1, 1], [λ_1, λ_2]] is equal to λ_2 − λ_1. Fine. But now suppose I compute the determinant of [[1, 1], [3λ_1 + 17, 3λ_2 + 17]]. I get (3λ_2 + 17) − (3λ_1 + 17): the 17's cancel out and I am left with 3(λ_2 − λ_1), which is 3 times the object I had before. What is interesting is that the 17 has completely disappeared: I could have taken 18, 19 or any number, and the result would still be 3 times the Vandermonde determinant we had before. So maybe we are onto something here. Indeed, this property carries over to larger and larger determinants.
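The product-versus-determinant identity is another thing you can check in two lines before proving it. A quick sketch (random points, my seed):

```python
# Check: prod_{j>k} (lam_j - lam_k) equals the determinant of the Vandermonde
# matrix with rows 1, lam, lam^2, ..., lam^{n-1}.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(3)
n = 5
lam = rng.standard_normal(n)
prod_form = np.prod([lam[j] - lam[k] for k, j in combinations(range(n), 2)])  # j > k
det_form = np.linalg.det(np.vander(lam, increasing=True))
print(np.isclose(prod_form, det_form))   # True
```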
So, if you introduce polynomials π_k of degree k, with leading coefficients a_k, the Vandermonde determinant can be written as 1/(a_0 a_1 ... a_{n−1}) times the determinant whose rows are π_0(λ_1), ..., π_0(λ_n); π_1(λ_1), ..., π_1(λ_n); down to π_{n−1}(λ_1), ..., π_{n−1}(λ_n). What I am saying is that the same Vandermonde determinant, which had this simple expression, is left unchanged, apart from a trivial rescaling, if we replace each row of monomials by a row of polynomials: in row k we put a polynomial of degree k − 1. For example, in the 2 by 2 situation we had a polynomial of degree 0 and a polynomial of degree 1; the constant term cancelled out, and the factor 3 is precisely the leading coefficient of the first-degree polynomial, the a_1 in this notation. So why is this important? Because this matrix is now full of numbers we don't need, numbers like that 17, which are washed out when you take the determinant. We have increased immensely the number of parameters in the matrix without changing the object. So the idea is: maybe we can choose these polynomials, these numbers, in a very clever way to simplify our calculation. So far the identity is completely general, valid for any polynomials of this form; but maybe a particular choice of polynomials makes the calculation we need to carry out simpler. This is the idea. Now we take a 5 minute break, and then we go on with this. So, this was the representation of the Vandermonde determinant in terms of arbitrary polynomials of degree k, and our goal will be to choose these polynomials in a clever way to facilitate the evaluation of the multiple integral we had at the beginning. This is the first ingredient. The second ingredient: remember that in the joint density for β = 2 we had the product over j < k of (λ_j − λ_k)^2, so in the JPDF we have the square of the Vandermonde determinant, which means the square of this object. Then we use the algebraic property that the square of a determinant satisfies det(A)^2 = det(A^T A), because A and A^T have the same determinant; and det(A^T A) is the determinant of the matrix with entries Σ_{j=1}^n A_{ji} A_{jk}, which is just the product of the two matrices written out explicitly. Good. Applying this formula to our determinant of polynomials, the square of the Vandermonde equals 1/∏_{j=0}^{n−1} a_j^2, which is the prefactor squared, times the determinant of Σ_{j=1}^n π_{j−1}(λ_i) π_{j−1}(λ_k). So the square of the Vandermonde is this squared prefactor, constructed from the leading coefficients of the polynomials π, times the determinant of this special matrix indexed by i and k, involving the polynomials of degree j − 1; this comes from taking A_{ij} to be the matrix of polynomials we had there. Now remember that in the joint density we also had exp(−Σ_i V(λ_i)); we have taken care of the Vandermonde squared, and next we need to take care of this exponential factor.
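A numerical check of the polynomial-row replacement (a sketch; the random polynomials and seed are mine): whatever lower-degree "junk" the polynomials carry, the determinant only picks up the product of the leading coefficients.

```python
# Replacing monomial rows by ARBITRARY degree-k polynomials pi_k rescales
# the Vandermonde determinant by a_0 a_1 ... a_{n-1}.
import numpy as np

rng = np.random.default_rng(4)
n = 5
lam = rng.standard_normal(n)
vandermonde = np.linalg.det(np.vander(lam, increasing=True))

# pi_k(x) = a_k x^k + lower-order terms; np.polyval takes highest power first,
# so coeffs[k][0] is the leading coefficient a_k.
coeffs = [rng.standard_normal(k + 1) for k in range(n)]
M = np.array([[np.polyval(coeffs[k], x) for x in lam] for k in range(n)])
lead = np.prod([c[0] for c in coeffs])
print(np.isclose(np.linalg.det(M), lead * vandermonde))   # True
```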
So the third ingredient is to put the weight function exp(−Σ_i V(λ_i)) inside the determinant. To squeeze this factor into the determinant we use the following property: a product of coefficients ∏_i γ_i multiplying a determinant det(F_{ij}) equals det(√(γ_i γ_j) F_{ij}). So we can push the product that is there inside the determinant. Therefore exp(−Σ_i V(λ_i)) times the Vandermonde squared can be rewritten as 1/∏_{j=0}^{n−1} a_j^2 times the determinant of Σ_{j=0}^{n−1} φ_j(λ_i) φ_j(λ_k), where I now tell you what these φ's are: φ_i(x) = exp(−V(x)/2) π_i(x), the polynomial of degree i times the square root of the weight. So all I am doing is rewriting the joint PDF of the eigenvalues in a more interesting form: this object, apart from the constant in front, is the joint PDF of the eigenvalues, and look at what we have achieved. We have written the joint probability density function as a determinant, which makes it clear that the eigenvalues are not independent; they are entangled in a very difficult way, because they appear inside a determinant. But the fact that the joint PDF is a determinant has very interesting and very important consequences, which will eventually allow us to crack the integral. To summarize what we have done: the joint probability density of the eigenvalues for a generic invariant random matrix model is 1/Z_n, the normalization constant, times 1/∏_{j=0}^{n−1} a_j^2, where the a_j are the leading coefficients of the polynomials we have picked, times the determinant of this object. This object is so important that it has a name: K_n(λ_i, λ_k). This K_n is a function of two variables, K_n(x, x') = exp(−(1/2)(V(x) + V(x'))) Σ_{j=0}^{n−1} π_j(x) π_j(x'). It is called the kernel. What does the kernel depend on? It depends on the potential V(x), which comes from the definition of your original model; V is reminiscent of the model we started from. For a Gaussian model, for instance, V(x) is proportional to x^2; for other invariant models it can be anything. And it also depends on the choice of our polynomials π. Remember that at this stage these polynomials are completely arbitrary: π_k is an arbitrary polynomial of degree k. This is quite powerful, because I haven't told you which polynomials you need to use; you can use whatever polynomials you want, provided you divide by the leading coefficients, and your result will always be correct. That is very powerful and absolutely not a trivial fact. Good. Now, what is the next ingredient? Can I erase here? We have rewritten the joint PDF of the eigenvalues as the determinant of a kernel. The new ingredient is: how do we choose the polynomials π?
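The weight-squeezing identity is elementary (it is det(D F D) with D = diag(√γ_i)), and a one-line numerical sketch confirms it (random F and γ, my seed):

```python
# Check: (prod_i gamma_i) * det F  ==  det( sqrt(gamma_i gamma_j) F_ij ).
import numpy as np

rng = np.random.default_rng(5)
n = 4
F = rng.standard_normal((n, n))
gamma = rng.random(n) + 0.1                      # positive weights
G = np.sqrt(np.outer(gamma, gamma)) * F          # entrywise sqrt(g_i g_j) * F_ij
print(np.isclose(np.prod(gamma) * np.linalg.det(F), np.linalg.det(G)))  # True
```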
So it turns out, after long attempts, that people realized there is one choice that makes life easier; one choice is the best choice: choose the π's orthonormal with respect to the weight exp(−V(x)), that is, ∫ π_j(x) π_k(x) exp(−V(x)) dx = δ_jk. The reason is unclear at this stage, but I will try to explain why this choice is absolutely the best you can make to crack the original (n−1)-fold integral we started from. Let's make an example for the Gaussian case, where V(x) = x^2/2. What are the polynomials orthonormal with respect to the weight exp(−x^2/2) on the real line? Any idea? Sorry? Hermite. Yes. For the Gaussian case you need the Hermite polynomials, normalized of course in such a way that the orthogonality integral gives precisely δ_jk: π_j(x) = H_j(x/√2) / √(√(2π) 2^j j!). This is the best choice for the Gaussian case. Now I will try to explain why. Let's leave the kernel here. If the polynomials π_k are chosen this way, then the kernel satisfies a curious reproducing property. What is it? If you integrate over y the product K_n(x, y) K_n(y, z), the result of the integration is K_n(x, z). You take the product of two kernels with arguments (x, y) and (y, z), integrate over y, and the two objects collapse onto a single kernel with arguments x and z. To prove this, just take the definition of the kernel, plug it into the integral, and use the orthonormality property: one of the sums cancels out, and the remaining sum reproduces the kernel. Now you might wonder why this property is useful: after all, we need to compute a multiple integral of a determinant of the kernel, and here I am just giving you a property of the kernel itself; there is still a determinant to compute. It turns out that the reproducing property has a very important effect on determinants of kernels constructed this way, and this is what will eventually allow us to compute the integral. Why? Well, every time I see this proof I am amazed by how beautiful it is; I wish I had discovered it myself, but I didn't. I wasn't even born. So, there is this beautiful theorem we can use. In summary, what we have to compute is ∫ dλ_2 ... dλ_n det[K_n(λ_i, λ_k)]. In essence, this is the object we need, and now let's assume we picked the polynomials according to the orthonormality condition, so that the kernel satisfies the reproducing property. There is a theorem that tells us: given a kernel satisfying the reproducing property, how can we compute an integral of the determinant of this kernel? Wow.
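Both claims, the orthonormality of the rescaled Hermite polynomials and the reproducing property of the kernel they generate, can be verified by brute-force quadrature. A minimal sketch (function names `pi_j` and `K` are mine; grid, degrees and test points are arbitrary choices):

```python
import numpy as np
from numpy.polynomial.hermite import hermval
from math import factorial, pi, sqrt

def pi_j(j, x):
    # pi_j(x) = H_j(x/sqrt(2)) / sqrt(sqrt(2 pi) 2^j j!),
    # orthonormal w.r.t. the weight exp(-x^2/2)
    c = np.zeros(j + 1); c[j] = 1.0
    return hermval(x / sqrt(2), c) / sqrt(sqrt(2 * pi) * 2**j * factorial(j))

def K(n, x, y):
    # kernel K_n(x, y) for the Gaussian potential V(x) = x^2/2
    return np.exp(-(x**2 + y**2) / 4) * sum(pi_j(j, x) * pi_j(j, y) for j in range(n))

t = np.linspace(-30, 30, 60001); dt = t[1] - t[0]
w = np.exp(-t**2 / 2)

# orthonormality: integral of pi_j pi_k exp(-x^2/2) dx = delta_jk
print(np.sum(pi_j(2, t) * pi_j(2, t) * w) * dt)    # ~ 1.0
print(np.sum(pi_j(2, t) * pi_j(4, t) * w) * dt)    # ~ 0.0

# reproducing property: integral over y of K(x, y) K(y, z) dy = K(x, z)
x, z, n = 0.7, -1.3, 6
print(np.sum(K(n, x, t) * K(n, t, z)) * dt, K(n, x, z))   # the two agree
```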
So let's start with a 2 by 2 example, to see how this all works. Suppose we have a 2 by 2 matrix with entries of this form: J_2(x) = [[f(x_1, x_1), f(x_1, x_2)], [f(x_2, x_1), f(x_2, x_2)]], where the vector x is (x_1, x_2). So we have a 2 by 2 matrix whose entries are values of a function of two variables, and we assume that f satisfies the reproducing property: ∫ f(x, y) f(y, z) dy = f(x, z). Now, what is the integral over x_2 of det J_2? We are integrating, over one of the variables, the determinant of a matrix whose entries satisfy the reproducing property. Let's compute this integral explicitly. The determinant is f(x_1, x_1) f(x_2, x_2) − f(x_1, x_2) f(x_2, x_1). In the first term, f(x_1, x_1) is not integrated over, so it goes out of the integral, and we are left with f(x_1, x_1) times a number; call it q, with q = ∫ dx_2 f(x_2, x_2). Then we subtract ∫ dx_2 f(x_1, x_2) f(x_2, x_1). How do we compute this integral? We just use the reproducing property: integrating f(x_1, x_2) f(x_2, x_1) over x_2 gives f(x_1, x_1). So the final result is (q − 1) f(x_1, x_1), and, although I probably cannot convince you of this, f(x_1, x_1) is the determinant of J_1(x). If you are willing to believe this, look at what is happening: we integrate the determinant of this 2 by 2 matrix over a single variable, and the result, apart from a constant in front, is the determinant of a smaller matrix of the same type. If this property carries over to larger matrices, we have a domino effect: we integrate an n by n determinant over one single variable and the result is an (n−1) by (n−1) determinant; then we integrate over the next variable and shrink the determinant again, and we can basically kill all the integrations one by one. So the reproducing property carries over from the kernel to the determinant of the kernel; I am just trying to summarize it for you. The theorem is as follows. Let J_n(x) be an n by n matrix whose entries depend on a real vector x = (x_1, x_2, ..., x_n) and have the form (J_n)_{ij} = f(x_i, x_j), where f satisfies the reproducing property ∫ dy f(x, y) f(y, z) = f(x, z). Then, and this is the beautiful, I would say amazing, theorem: the determinant of J_n(x) integrated over the last variable, a single integral over one variable, is equal to (q − n + 1) times det J_{n−1}(x̃), where the vector x̃ is (x_1, ..., x_{n−1}) and q = ∫ dx f(x, x). This is known as the Dyson-Gaudin integration lemma. I think the most accurate reference is in the handbook; where is the handbook? Must be somewhere. I just reproduced one of the best references on this.
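The 2 by 2 step can be checked numerically on any reproducing kernel. A self-contained sketch, using a toy kernel of my own construction (orthonormal sine modes on [0, π], not the random matrix kernel), for which q = ∫ f(x, x) dx = m:

```python
# Toy reproducing kernel: f(x, y) = sum_k phi_k(x) phi_k(y),
# phi_k(x) = sqrt(2/pi) sin(k x), orthonormal on [0, pi]. The lemma predicts
# integral over x2 of det J_2 = (q - 1) f(x1, x1), with q = m here.
import numpy as np

m = 5
phi = lambda k, x: np.sqrt(2 / np.pi) * np.sin(k * x)
f = lambda x, y: sum(phi(k, x) * phi(k, y) for k in range(1, m + 1))

y, dy = np.linspace(0, np.pi, 20001, retstep=True)
x1 = 0.8
det_J2 = f(x1, x1) * f(y, y) - f(x1, y) * f(y, x1)
print(np.sum(det_J2) * dy)        # ~ (m - 1) * f(x1, x1)
print((m - 1) * f(x1, x1))
```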
It is on page 18, the paper by Mahoux and Mehta, where they prove this theorem exactly in the form I gave you; that's page 18 of the handbook. So you see: provided we have a function that satisfies the reproducing property, and we integrate the determinant of a matrix built out of this function, the result has the form of a smaller determinant. This is a completely non-trivial fact: you integrate a determinant and still get a determinant, just a smaller one. But if this is true, and it is, because it's a theorem, then we can use it to kill our multiple integral one variable at a time: first we integrate over λ_n and obtain a smaller determinant, then over λ_{n−1}, obtaining a smaller determinant again, and so on and so forth, until the domino effect kills all the integrals. That's a powerful trick. So: integrating the determinant of the kernel built out of orthonormal polynomials, det[K_n(λ_i, λ_j)] for i, j from 1 to n, over dλ_n, gives the determinant of K_n(λ_i, λ_j) for i and j between 1 and n − 1. Note that the n in K_n does not change: the object keeps the same functional form as the original kernel; only the variable λ_n no longer appears among the arguments. In doing this I used the fact that q = ∫ dx K_n(x, x) = n; just prove it using the orthonormality property. The lemma then gives the coefficient (q − n + 1) = (n − n + 1) = 1 for the first integration. Iterating the domino effect, we can prove the stronger result: integrating det[K_n(λ_i, λ_j)], i, j from 1 to n, over λ_{k+1} up to λ_n gives (n − k)!, a factorial coefficient, times det[K_n(λ_i, λ_j)] for i and j between 1 and k. So if we integrate an n by n determinant, built from a function satisfying the reproducing property, over the variables λ_{k+1}, ..., λ_n, the final result is, apart from the factorial coefficient, the determinant of a k by k matrix. In particular, if you set k = 0, what you have is the integral of the determinant over dλ_1 ... dλ_n: you are integrating the determinant of the kernel over all variables, over all eigenvalues, okay? And the result is (n − 0)! = n!; the determinant is no longer there. Using this result you can compute the normalization constant Z_n, is this clear? I am just applying this formula with k = 0. So now, by normalization, 1 must equal ∫ dλ_1 ... dλ_n P(λ_1, ..., λ_n). But we know this object equals (1/Z_n) (1/∏_{j=0}^{n−1} a_j^2) ∫ dλ_1 ... dλ_n det[K_n(λ_i, λ_j)], and we have proven that the integral equals n!.
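As a sketch of the iterated counting, here is the two-variable analogue on the same toy sine-mode kernel as before (my construction; here q = m, so integrating the 2 by 2 determinant over both variables should give q(q − 1), the analogue of the n! that appears when the matrix size equals q):

```python
import numpy as np

m = 5
phi = lambda k, x: np.sqrt(2 / np.pi) * np.sin(k * x)
f = lambda x, y: sum(phi(k, x) * phi(k, y) for k in range(1, m + 1))

g, dg = np.linspace(0, np.pi, 1001, retstep=True)
X, Y = np.meshgrid(g, g)
det_J2 = f(X, X) * f(Y, Y) - f(X, Y) * f(Y, X)
print(np.sum(det_J2) * dg**2, m * (m - 1))   # the two agree (q(q-1) = 20)
```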
So this means that Z_n, the normalization constant, is equal to n! divided by ∏_{j=0}^{n−1} a_j^2, where the a_j are the leading coefficients of the orthonormal polynomials corresponding to the weight. This is a general result; it is the normalization constant of the joint PDF of the eigenvalues. And now we are almost done, right? Remember what we wanted to compute: the average spectral density, defined as (1/Z_n) times the integral over all variables but one of exp(−Σ_i V(λ_i)) ∏_{j<k} (λ_j − λ_k)^2. This was the task, to compute this (n−1)-fold integral, and now we have all the ingredients for an exact formula. Why? I should probably keep this formula here. This multiple integral, after the manipulations above, can basically be rewritten as (1/n!) times ∫ dλ_2 ... dλ_n det[K_n(λ_i, λ_j)], using the normalization fact; and using the domino property with k = 1, this equals (n − 1)! times a 1 by 1 determinant, which simply means the kernel itself. So this object is the kernel evaluated at equal arguments. The final result: the average spectral density of a rotationally invariant random matrix model is just 1/n times the reproducing kernel evaluated at equal arguments, ρ_n(λ) = (1/n) K_n(λ, λ). I give you a rotationally invariant random matrix model characterized by a certain potential V(x); you construct the reproducing kernel from the orthonormal polynomials corresponding to V, evaluate it at equal arguments, and you are done. This might seem slightly weird, so to convince yourself you have done the right thing, apply it to the Gaussian Unitary Ensemble case; remember we are working at β = 2. The average spectral density at finite n is 1/n times the reproducing kernel at equal arguments, and the kernel is built out of the Hermite polynomials I gave you a few blackboards ago. So we have (1/n) exp(−(1/2)(λ^2/2 + λ^2/2)), the kernel at equal arguments, times Σ_{j=0}^{n−1} H_j^2(λ/√2) / (√(2π) 2^j j!), which simplifies to ρ_n(λ) = (1/(n√(2π))) e^{−λ^2/2} Σ_{j=0}^{n−1} H_j^2(λ/√2) / (2^j j!). This is the exact result for the average spectral density, for the shape of the histogram of eigenvalues of GUE matrices of size n. It is valid for any n: n = 16, 71, 120, this result holds, and it is expressed as a sum of squared Hermite polynomials. Of course you might not be convinced. So what do you do? You generate GUE matrices of the size you want, diagonalize them, collect all the eigenvalues, produce your histogram, and plot this formula on top of it. This is what I have done in the handouts, on page 20; that's the figure I showed you yesterday. There you have the histogram of eigenvalues of a GUE matrix of size 8, I guess.
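This is exactly the kind of check one can script; a sketch in Python rather than MATLAB (function names are mine, and the sampling convention P(H) ∝ exp(−Tr H²/2) is assumed, to match the weight e^{−λ²/2} used above):

```python
# Exact finite-n GUE density (1/n) K_n(lambda, lambda) vs sampled eigenvalues.
import numpy as np
import matplotlib.pyplot as plt
from numpy.polynomial.hermite import hermval
from math import factorial, pi, sqrt

def rho_exact(n, lam):
    # (1/(n sqrt(2 pi))) e^{-lam^2/2} sum_{j<n} H_j(lam/sqrt(2))^2 / (2^j j!)
    s = np.zeros_like(lam)
    for j in range(n):
        c = np.zeros(j + 1); c[j] = 1.0
        s += hermval(lam / sqrt(2), c) ** 2 / (2**j * factorial(j))
    return np.exp(-lam**2 / 2) * s / (n * sqrt(2 * pi))

def sample_gue(n, rng):
    # P(H) ~ exp(-Tr H^2 / 2): diagonal variance 1,
    # off-diagonal real/imaginary parts with variance 1/2 each
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return np.linalg.eigvalsh((A + A.conj().T) / 2)

rng = np.random.default_rng(6)
n = 8
eigs = np.concatenate([sample_gue(n, rng) for _ in range(5000)])
lam = np.linspace(-7, 7, 400)
plt.hist(eigs, bins=80, density=True)
plt.plot(lam, rho_exact(n, lam))
plt.show()
```

The histogram of the sampled eigenvalues should sit right on the exact curve, wiggles included.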
This should be an 8 by 8 GUE matrix; you see a lot of wiggles in the histogram, and on top of that I plot this function for n = 8. You can see that the agreement is perfect. I expected a "whoo". It's nice, right? Come on, I worked two hours to obtain that. So I think we are almost done for today, but let me just give you an idea of what we should in principle do next. This result is valid for any finite n: 17, 21, 8, whatever you want. Of course, if we now send n to infinity, in an appropriate scaling limit, what should we find? All together: the semicircle, right? We proved that the limit for n to infinity of the spectral density of the Gaussian ensembles is the semicircle. So we should send n to infinity here and recover the semicircle. Now, you can imagine that this procedure is not completely trivial, because we have a sum that runs up to n; to evaluate the limit we need to massage this expression a bit before taking n to infinity. Actually this can be done pretty efficiently, so maybe at the very beginning of the next lecture I will use up ten minutes to show you how to recover the semicircle from this finite-n formula. If you look at the pictures, just to give you a flavor of what happens: for finite n the spectral density extends over the entire real axis, which means that for finite n, say n = 8 or 15, you have a non-zero probability of finding eigenvalues anywhere on the real axis. Finding an eigenvalue equal to 20 would be extremely rare, but it is possible at finite n. In the limit n to infinity this is impossible, because the semicircle has compact support: it is impossible for the suitably rescaled eigenvalues of an infinite matrix to exceed a bound like √2. And what happens if you increase n in this picture? The number of wiggles increases: the wiggles become more and more frequent, the edges of the semicircle become steeper and steeper, and in the end the spectrum is vibrating so much that it reconstructs the smooth semicircular shape. So if you are able to use MATLAB, and if you are not, just copy and paste the code: this is a very instructive exercise, and it gives you a lot of physical pleasure when you obtain this curve. Now go and take a coffee; you need one. Thanks very much. See you tomorrow, or at some point.
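A sketch of what those pictures look like, under the rescaling x = λ/√(2n), for which the edges sit at ±√2 as mentioned above (the rescaling convention and the recurrence-based implementation are my choices; the three-term recurrence for normalized Hermite functions avoids the overflow that the raw H_j² / (2^j j!) form hits at large n):

```python
# Finite-n GUE density, rescaled, converging to the semicircle (1/pi) sqrt(2 - x^2).
import numpy as np
import matplotlib.pyplot as plt

def rho_exact(n, lam):
    # rho_n(lambda) = (1/(n sqrt(2))) sum_{j<n} h_j(lambda/sqrt(2))^2, where
    # h_j are normalized Hermite functions, h_0(t) = pi^{-1/4} exp(-t^2/2),
    # h_{j+1} = t sqrt(2/(j+1)) h_j - sqrt(j/(j+1)) h_{j-1}
    t = lam / np.sqrt(2)
    h_prev = np.zeros_like(t)
    h = np.pi**-0.25 * np.exp(-t**2 / 2)
    s = h**2
    for j in range(n - 1):
        h, h_prev = t * np.sqrt(2 / (j + 1)) * h - np.sqrt(j / (j + 1)) * h_prev, h
        s += h**2
    return s / (n * np.sqrt(2))

x = np.linspace(-1.8, 1.8, 600)
for n in (8, 32, 128):
    lam = np.sqrt(2 * n) * x
    plt.plot(x, np.sqrt(2 * n) * rho_exact(n, lam), label=f'n = {n}')
plt.plot(x, np.sqrt(np.maximum(2 - x**2, 0)) / np.pi, 'k--', label='semicircle')
plt.legend(); plt.xlabel('x'); plt.show()
```

As n grows you see exactly the behavior described in the lecture: faster and faster wiggles, steeper edges, and the curve hugging the semicircle.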