Yeah, welcome back. So today, this morning and this afternoon, I want to discuss another important problem in the context of random matrices: the statistics of the largest eigenvalue. This is a class of problems that belongs to a branch of statistics called extreme value theory. Before tackling the random matrix case and explaining why this is an interesting problem (we will give some motivation and some applications), I want to go back briefly to the case of independent random variables, so that we can draw the parallel between independent random variables and the eigenvalues of random matrices, which we know are strongly correlated random variables, and appreciate the differences between the two cases. So, one step back: I will describe the problem in the simplest possible terms. We have a collection of random variables x1, ..., xn, which are iid, independent and identically distributed, drawn from a common PDF that we denote p(x). So p(x) is the common PDF we draw each of these variables from. Now we ask: what is the statistics of x_max, the largest element of this set? What is the probability distribution of the largest element? This is a random variable which in general has a certain distribution, and we want to compute it starting from the only information we have, namely this PDF. For independent and identically distributed random variables the problem can be tackled efficiently if we define an object Qn(x), the cumulative distribution function of the maximum: the probability that the largest element of the set is smaller than or equal to x. Now, if you think about it, the event that x_max is smaller than or equal to x occurs with the same probability as the event that all the random variables are smaller than or equal to x. So this object is exactly the probability that x1 is smaller than or equal to x, x2 is smaller than or equal to x, and so on up to xn. This obvious equality gives us a way forward, because we can evaluate this object quite easily. Since the random variables are independent, this probability is simply the probability that the first random variable is smaller than or equal to x, times the probability that the second is smaller than or equal to x, and so on and so forth. Eventually, you can write that this object is just the n-th power of the probability that a single one of these random variables is smaller than or equal to x, which is the integral up to x of the PDF. Do we all agree? This is the probability that one single random variable is smaller than or equal to x, and we have n of them, so we multiply them together: this object raised to the power n. Okay, excellent. We can call this object capital P(x); it is the cumulative distribution function of each individual random variable. Now we are interested in what happens if we send n to infinity, and then the situation becomes in principle quite trivial, but it turns out that there is a very deep and non-trivial result here.
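A quick way to convince yourself of this identity is to simulate it. The following is a minimal sketch (the uniform distribution on [0, 1] and all sample sizes are arbitrary choices made here for concreteness, not part of the lecture): it draws many sets of n iid variables, records the maximum of each set, and compares the empirical CDF of the maximum with [P(x)]^n.

```python
import numpy as np

# Minimal check (not from the lecture) that the CDF of the maximum of n iid
# variables equals the n-th power of the single-variable CDF, Q_n(x) = [P(x)]^n.
# Uniform variables on [0, 1] are used only for concreteness, so P(x) = x.
rng = np.random.default_rng(0)
n, n_samples = 10, 200_000

samples = rng.uniform(0.0, 1.0, size=(n_samples, n))
x_max = samples.max(axis=1)                    # one maximum per experiment

for x in (0.5, 0.8, 0.95):
    empirical = np.mean(x_max <= x)            # empirical CDF of the maximum
    predicted = x**n                           # [P(x)]^n for the uniform case
    print(f"x={x}: empirical {empirical:.4f} vs [P(x)]^n {predicted:.4f}")
```

With this many repetitions the two columns should agree to two or three decimal places.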
So you see, if we do nothing to x and just naively send n to infinity, this object is a function that by definition lies between 0 and 1 for all x. If it is strictly less than 1 and we send n to infinity, what is the result of this limit? Zero. And if it is exactly equal to 1 and we send n to infinity? One. So if we send n to infinity naively, without doing anything to the variable x, the result of this limit is trivial: it can only be 0 or 1. In order to get a non-trivial limit as n goes to infinity, we need to do something else. What we need to do is send n to infinity and x to infinity in such a way that a certain combination of n and x stays finite. More precisely, let's define a variable z = (x - a_n) / b_n, where a_n and b_n are certain constants that depend on n. Equivalently, x = b_n z + a_n. Now we substitute this x as the argument of the cumulative distribution function of the maximum, so we obtain Qn evaluated at the scaling variable b_n z + a_n, and then we send n to infinity. The question now is: can we find constants a_n and b_n, depending on n, such that this limit exists and is a non-trivial function of z alone? We are scaling x, which is the only thing we can play with here, with n, trying to find something non-trivial, not just 0 or 1, but a non-trivial function of z. So the main problem in this business is to find the constants a_n and b_n, if they exist. Now, there is a quite powerful and very deep result here. Can I erase here? There is a very powerful result which goes under the name of the Fisher-Tippett-Gnedenko theorem, which in essence says that for iid random variables the scaling function f(z), this function here, can only be of three different types. So there is a complete classification of the scaling functions you can obtain from this limiting procedure. The constants a_n and b_n depend on the common probability distribution of each random variable, but the function f(z) is universal in the sense that it can only come in three different species, three different categories. What are these three types? Well, what you need to define is the upper edge of the support of the PDF: x* is the supremum of the points x for which the CDF P(x) is smaller than 1, which basically means it is the upper end point of the support of p(x). For example, if your PDF is uniform between 0 and 1, then this x* is 1. If your PDF is an exponential, then this end point is at infinity. So x* is the upper end point, the largest value, if you want, that each of your random variables can take. Now there are three cases. If x* is finite or infinite and p(x) falls off faster than any power as x goes to x*, then the limiting distribution is... anybody knows? No one has an idea? There's Matteo down there. The limiting distribution is Gumbel, which means that f(z) is the exponential of minus the exponential of minus z. This needs to be a cumulative distribution function, because it is the limit of cumulative distribution functions. Can you see it? If z goes to plus infinity, can you see that this object goes to 1? And if z goes to minus infinity, this object goes to 0? Can you see it?
So this is a proper cumulative distribution function, and it is the limiting distribution if, for example, your PDF is a Gaussian or an exponential: the probability distribution of the maximum, after being properly centered and scaled, converges to a Gumbel distribution. Coffee? Yes? Okay. So we are now dealing with the Gumbel distribution, is that better? Okay. I will give you an example later so that everything will be clarified. For now I am just trying to describe the three different types of limiting distribution that, according to this theorem, can be reached when you consider the maximum of a set of independent and identically distributed random variables. So I'll just give you the names. The first one is Gumbel. The second one is Fréchet. The third class is called Weibull. Has any of you ever heard of these three classes? No? Okay. Yeah? So yes, in that first case x* can be finite, but we also want p(x) to approach x* faster than any power law. This is a bit of a pathological case, but it is true that if the way your PDF approaches the finite end point is faster than any power law, then the limiting distribution is still Gumbel. So it is not true that if your support is finite you always get a Weibull. Good. The second class is when x* is infinite and p(x) falls off as a power law, for example p(x) behaving as x to the minus (gamma plus 1) at infinity. Then the limiting distribution is Fréchet. The formula, let's call it f2(z) and call the previous one f1(z), is f2(z) = exp(-1 / z^gamma) for z larger than 0, and 0 otherwise, where gamma is the exponent appearing here. So if the PDF you sample your individual random variables from is a power law, then the statistics of the maximum, after being properly centered and scaled, converges to this function. Both of these functions are clearly cumulative distribution functions, because they are limits of cumulative distribution functions: as the argument goes to plus infinity they saturate to 1, and as the argument goes to minus infinity they saturate to 0. Question: the exponent here is gamma plus 1, while here you have an exponent gamma; is it the same exponent? Well, it comes out of the calculation. This does not follow trivially from the distribution of each individual random variable, because you need to scale the argument and take the limit. For example, this Gumbel here: you cannot guess it from the fact that each individual random variable is, say, Gaussian. It is more complicated than that. But I will give you an example; I'm just giving you the classification now. Finally, if x* is finite and p(x) goes as (x* - x) to the power gamma minus 1, then the limiting distribution is Weibull, which is defined as f3(z) = exp(-(-z)^gamma) for z smaller than 0, and 1 for z larger than 0. So far, this is just the theory. Now we can do an example just to understand how this whole business works in practice. Can I erase on this side? Good. It's actually very, very simple if you think about it. The result is non-trivial, but once you see an example you will understand that the situation is quite simple. So what is one of the simplest PDFs that you can imagine? Uniform, another one? Gaussian, another one? Sorry? Paolo, another one? Okay, I'll do the exponential, just because I've prepared it. The exponential is quite easy, right? So, exponential PDF.
We take p(x) = mu exp(-mu x), with support x larger than or equal to 0. So this is the PDF you sample each of your random variables from. Now let us compute the cumulative distribution function of the maximum of a set of exponential random variables. We compute the object Qn(x), which, as I told you, is the integral from 0 up to x of mu exp(-mu x') dx', all raised to the power n. Agreed? This is exactly the formula I gave you before, specialized to the exponential case. Good. If you perform this integration, what you get is (1 - exp(-mu x))^n. This is an exact result, valid for any finite n. You take two random variables, or 17, or 91: this is an exact result. You can simulate your system, even though you will need a lot of statistics, and find a perfect match between this formula and the histogram of the maximum of randomly generated exponential variables. Now we want to take the large-n limit of this object. What can we do? We can first rewrite it as exp(n log(1 - e^(-mu x))). I'm just taking the exponential of the log; I haven't done anything fancy, except that I have destroyed the marker. I will have to repay it. Excellent. So now, as x goes to infinity, we can expand this logarithm, because this bit becomes small. What is the expansion of log(1 - epsilon) for epsilon going to zero? It goes as minus epsilon, right? Correct? Good. If we apply this here, we get that this object is approximately exp(-n exp(-mu x)), because the epsilon is this. We can bring the n inside the exponential: we can write this as exp(-exp(-(mu x - log n))), just by rewriting n as exp(log n). Okay. So now we have this expression. Do you recognize anything here? Exponential of minus exponential of minus something? Gumbel. So here is where Gumbel crops up. If we define this object as our scaling variable, z = mu x - log n, and we keep it fixed as we send n and x to infinity, then this object converges to a Gumbel distribution with constants a_n and b_n that are simple: a_n = log(n) / mu and b_n = 1 / mu. Just compare this definition with the one we gave before in terms of a_n and b_n. So the precise statement is that the limit as n goes to infinity of Qn evaluated at the argument x = (z + log n) / mu is equal to exp(-exp(-z)). That's the precise statement for the exponential distribution. You see that this object depends only on z. It is a non-trivial function, not just zero or one. And it is due to the fact that you are sending n to infinity not just here, but also inside the argument of your function, in the appropriate way. If you pick different values of a_n and b_n, you might not get anything like this; you might still get a trivial limit, either zero or one. Is that clear? Now, the power of this theorem is that this type of limit, the right-hand side, is universal for a very large class of distributions.
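Here is a small numerical check of this statement, a sketch with arbitrary choices of mu, n, and the number of repetitions (not from the lecture): generate many maxima of n iid exponential variables, rescale them with a_n = log(n)/mu and b_n = 1/mu, and compare the empirical CDF of z with exp(-exp(-z)).

```python
import numpy as np

# Sketch: maxima of n iid exponential(mu) variables, centred and scaled with
# a_n = log(n)/mu and b_n = 1/mu, should follow the Gumbel law
# F(z) = exp(-exp(-z)) for large n.  mu, n and n_samples are arbitrary here.
rng = np.random.default_rng(1)
mu, n, n_samples = 2.0, 500, 20_000

x_max = rng.exponential(scale=1.0 / mu, size=(n_samples, n)).max(axis=1)
z = mu * x_max - np.log(n)                     # z = (x_max - a_n) / b_n

for z0 in (-1.0, 0.0, 1.0, 2.0):
    empirical = np.mean(z <= z0)               # empirical CDF of the rescaled max
    gumbel = np.exp(-np.exp(-z0))              # limiting Gumbel CDF
    print(f"z={z0:+.1f}: empirical {empirical:.4f} vs Gumbel {gumbel:.4f}")
```

For exponential variables the convergence to Gumbel is fast, so even moderate n gives a close match.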
So if instead of picking an exponential we had picked a Gaussian, or a stretched exponential, as our PDF, we would still get this universal function on the right-hand side, provided we pick a_n and b_n appropriately. So a_n and b_n are non-universal: they change from one PDF to the other. But the right-hand side, the limiting distribution, is universal. It's one of these three types. Questions? So we have a complete classification of the limiting distribution of the maximum for iid random variables, which is a quite powerful result. So from now on, ignorance of any of these functions will be sanctioned. From now on you have no excuses when you encounter a Gumbel distribution, a Fréchet distribution, or a Weibull distribution. These are the extreme value classes that you all need to know. You cannot have a career in theoretical physics without knowing these functions. I'm not sure if this is true, but let's pretend it is. Okay? Gumbel, Fréchet, Weibull. Extreme values. You should immediately connect those names to this class of problems. Good. Now, why is this all relevant? This was not about random matrices; we just took a step back to recap the situation for iid random variables. Now we would like to take the same approach for the eigenvalues of random matrices: to consider the largest eigenvalue of, let's say, a Gaussian random matrix. What is the distribution of this object? Now it turns out that the situation is immediately much more complicated, because the eigenvalues talk to each other. The largest eigenvalue knows something about the second largest, the third largest, even the smallest eigenvalue. There is a long-range, all-to-all correlation. So this whole classification goes straight into the bin, and we need to start the calculations and the theory from scratch. Now I want to give you some motivation for why the problem of the largest eigenvalue is interesting in physics. One of the early examples where this type of question came up is in theoretical ecology. Can I erase everything here? So I will briefly describe the first paper which connected theoretical ecology with properties of random matrices. It is a very famous paper by Lord Robert May, Nature 238, 413 (1972), called "Will a Large Complex System be Stable?". If I'm not mistaken, this paper is reproduced in the handout, on pages 24 and 25. It is a two-page paper, so it is reproduced in full. I invite you to read it. It's very beautiful, it's just two pages, it has thousands of citations, and it is a good and pleasant read. Good. The idea of this paper is quite simple. Suppose you have a certain ecosystem composed of many, many species: lions, tigers, sharks in the same ecosystem. I'm not sure, but okay, let's pretend they all live in the same place. And let's assume that this ecosystem at the beginning is non-interacting. We have a non-interacting ecosystem, which means you have lions there, tigers there, they don't talk to each other; they just live in the same area, but they are separate. Let's denote by rho_i the population density of the i-th species, and let us suppose that this non-interacting ecosystem is stable. What does it mean that it is stable? It means that each species has an equilibrium value, which is reached by the natural process of birth and death.
And if you perform a small perturbation, if you push one species away from its equilibrium value, let's denote the deviation by x_i, then there is a spontaneous tendency of each species to go back to this equilibrium value, simply because every species is self-regulated and the rates of birth and death balance off exactly. So the simplest model for this type of decay to zero (what we are saying is that x_i should decay to zero at large times) is a system of differential equations of the type dx_i/dt = -x_i. We are saying that the deviation from the equilibrium value decays exponentially to zero as time gets large. Is this clear? Of course, this is just a model, but it makes sense that if you have an ecosystem that is non-interacting and stable, each species' deviation from its equilibrium value spontaneously and quickly decays to zero as time goes by. Okay, now the question is: what happens if we switch on interactions between the species? If we let lions and tigers interact? What May imagined is a random ecosystem, so that the interactions between species do not show any specific pattern. So how would you modify this law in the presence of interactions? Well, interactions on: what May imagined is that this set of decoupled differential equations becomes coupled in this way: dx_i/dt = -x_i + alpha times the sum over j of A_ij x_j, where j can be taken different from i, or we can simply set the self-interaction to zero. So May assumed that the model gets modified by adding some sort of average interaction strength alpha between the species, modulated by a coupling matrix A_ij between species i and species j. And he assumed that, in the most general setting, we can treat this matrix A as a random matrix: we populate its entries at random from a certain probability distribution. Then he asked the question: will this complex, random ecosystem be stable? That is, in the limit t to infinity, will all the x_i go to zero again? Of course, we have a system of differential equations in the presence of randomness. There is an element of randomness here because this is a random matrix, so the question "will this complex ecosystem be stable" does not really make sense as stated. The question we should ask is: what is the probability that this ecosystem will be stable? What is the probability that all the x_i, which are the deviations of each species from its equilibrium value, decay to zero at large times? If they all do, the ecosystem is stable. It means that even in the presence of pairwise interactions, the ecosystem is able to self-regulate, and you don't have extinctions, for example, species that disappear from the ecosystem, just to give you an example of instability. Is the setting clear? This is just a model; we can argue about whether it is realistic or not, but let's start from here and try to draw some conclusions. So, for example, you can take this matrix A to be a Gaussian symmetric matrix, a real symmetric Gaussian matrix. Which ensemble does that belong to? What's the name of real symmetric Gaussian matrices? G..., yes, GOE, where O stands for orthogonal. Why?
Because it is diagonalized by orthogonal transformations. Excellent, good, we are doing well. So how do we solve this general problem, even if A were a deterministic matrix? This is a linear system of coupled differential equations, so you know how to solve it, right? What is the way to solve this system of first-order differential equations? Yes, exactly. We write it as a matrix equation and then diagonalize the kernel matrix; let me just do it quickly. We introduce the vector x of the deviations of each species from its equilibrium value, and we can rewrite this set of differential equations as dx/dt = (alpha A - I) x, where the identity comes from the self-regulation term. Do we agree? If you multiply this matrix by this vector, you exactly reproduce the component-wise differential equations. Excellent. Now what you do is make an orthogonal transformation, which means you diagonalize your matrix A: you write A = S Lambda S^(-1), where Lambda is the diagonal matrix of eigenvalues, which are real. If you define a new vector y = S^(-1) x, you can show that this matrix equation can be rewritten in terms of y as dy/dt = (alpha Lambda - I) y. And now this is nice, because we have completely decoupled all the equations: this matrix is now diagonal, so the i-th component of the vector y only interacts with itself, and there are no longer any cross terms. Do we agree? This works also if A is not random; it is just how we would solve a system of first-order differential equations. Now, when will this system of differential equations be stable? Yes, when all the eigenvalues are negative. But the eigenvalues of what? The eigenvalues of A? Yes, more precisely of this object here, alpha Lambda minus the identity, which is a diagonal matrix: the system of ODEs will be stable if all the entries on its diagonal are negative. Do we agree? Okay, so the condition we require is that alpha lambda_i - 1 < 0 for all i, where the lambda_i are the eigenvalues of the original interaction matrix. This means lambda_i < 1/alpha, let's call it w, for all i. Now, the statement that all the eigenvalues are smaller than a given threshold is exactly a statement about the largest eigenvalue, right? So the original problem of May's ecosystem has turned into a problem about the probability that the largest eigenvalue of a Gaussian matrix is smaller than a certain threshold. That's how the connection arises between the physical problem we are dealing with and the statistics of the largest eigenvalue of a Gaussian matrix. So, the probability of stability. What May found is something quite striking, okay? If we put w, which is 1 over the interaction strength, on this axis, and the probability that the ecosystem is stable on this axis, let's try to understand what this diagram is telling us. w is 1 over the interaction strength between the species, okay? So if alpha, the interaction strength, is turned off, we are back in the previous case of a system that is non-interacting and stable, okay?
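To see the connection in practice, here is a rough Monte Carlo sketch (the GOE normalisation, which places the spectral edge near 2, and the matrix sizes and trial counts are assumptions made here, not fixed in the lecture): sample GOE matrices and estimate the probability that the largest eigenvalue stays below the threshold w = 1/alpha.

```python
import numpy as np

# Rough sketch: probability that all eigenvalues of a GOE matrix stay below a
# threshold w = 1/alpha.  Entries are scaled so the spectrum edge sits near +2
# (a choice made here; the lecture does not fix the normalisation).
rng = np.random.default_rng(2)

def goe(n):
    g = rng.normal(size=(n, n))
    return (g + g.T) / np.sqrt(2.0 * n)            # real symmetric, edge near +2

def prob_stable(n, w, trials=200):
    hits = 0
    for _ in range(trials):
        lam_max = np.linalg.eigvalsh(goe(n))[-1]   # largest eigenvalue
        hits += lam_max < w                        # stable iff lambda_max < w
    return hits / trials

for w in (1.8, 1.95, 2.05, 2.2):
    print(f"w={w}: P(stable) ~ {prob_stable(100, w):.2f} (N=100), "
          f"{prob_stable(400, w):.2f} (N=400)")
```

You should see the estimate swing from near 0 to near 1 as w crosses the spectral edge, and more sharply for the larger N: a finite-size version of the transition discussed next.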
So what is the probability that the ecosystem is stable in this region? One. One, right? Because we know for sure that the ecosystem is stable: it is non-interacting. Then we turn on some interaction, but it is weak, okay? The interaction between the species is still weak. Now what happens is that, in the limit n to infinity, the probability of stability is still 1, so it stays there for a while. Here the probability that the ecosystem is stable is still equal to 1 in the strict limit n to infinity. Now the interaction gets stronger and stronger between the species as I move this way, towards smaller w. What May found is that there is a critical value of the interaction strength beyond which, so to the left of this point in w, the probability of stability drops suddenly to zero. So there is basically a sharp phase transition: in the limit n to infinity, the probability of stability jumps from 1 to zero, okay? An ecosystem that is almost surely stable, if the web of interactions becomes sufficiently strong, will turn immediately, overnight, into a system that is almost surely unstable. So there is a sharp phase transition. The precise statement is that the limit n to infinity of the probability that the ecosystem is stable equals 1 if alpha is smaller than a certain critical value, or equivalently if w is larger than a certain critical value, and it is zero if alpha is larger than that critical value. Of course this result sparked a lot of interest and also a lot of controversy. Now, this was just to give you some motivation for why it is interesting and often useful to study the statistics of the largest eigenvalue of a Gaussian matrix. We would like to understand this phase transition a bit more. For example, and I will just say this and then we can have a break, one may ask the question: what happens if n, the number of species, is large but not strictly infinite? What happens if n is large but finite? Any idea, based on your physical intuition, of what happens when you have a phase transition that can only occur strictly in the limit n to infinity and you relax this condition? What you will have is that the sharp phase transition typically gets smoothed out and you get basically a smooth crossover between the two regions. So for large but finite n this phase transition will be smoothed, and you will get a continuous curve, where the range over which you have this crossover becomes steeper and steeper as you increase n. So this one: if you increase n you will get something like this, and then something like this, and then at some point, boom, in the limit n to infinity you recover the discontinuous jump. This means that if you plot the derivative of this object (it is a cumulative distribution function, if you want, because it is the cumulative distribution function of the largest eigenvalue), what do you get? What is the derivative of the cumulative distribution function? The probability density function, right? So if you plot the derivative of this object, what you will typically get, for large but finite n, is some sort of peaked shape with a certain typical width. This typical width will basically shrink as you increase n, because in the limit this should become very peaked, in the limit a delta function at the transition. So what is the size of this width?
Can we estimate the size of this typical width as a function of n? Well, it turns out that we can, and the width turns out to be of order n to the power minus two-thirds. I will try later on to give you a heuristic explanation for why this power minus two-thirds appears in the game. So, as you see, when you increase n this becomes sharper and sharper: the PDF becomes narrower and narrower, more peaked around the critical value, okay? Good. Now that I have given you some motivation for why the study of this type of problem might be interesting, in the next part of the lecture I will give some more introductory statements on the distribution of the largest eigenvalue for Gaussian matrices, and then we will continue this afternoon. There is a lot to be said, okay? Let's come back in five minutes or so. So, to summarize what we did at the beginning: I gave you a crash course on extreme value theory, for half an hour or so, then we discussed the ecosystem problem, which is connected to the largest eigenvalue of random matrices. Now I want to give you another crash course, this time on large deviations, so that in the end we can combine all these tools together to analyze the distribution of the largest eigenvalue properly this afternoon, okay? The reason I'm doing this crash course is that we will need extreme value theory and we will need large deviation theory to understand the properties of the largest eigenvalue, and also because I find the example I'm about to give extremely instructive. Actually, even though I worked on large deviations for a long time, I think I only really understood them when I learned about this example, okay? So I thought it would be my third or fourth gift to you during these lectures to tell you what I've learned from this example, which I find very instructive, okay? So, crash course on large deviations. The example from which everything will become clear is a standard random walk in one dimension, okay? Random walk in 1D: you have a one-dimensional lattice with spacing 1, and we have a walker that can hop to the right or to the left at each discrete time step, with probability p and with probability q = 1 - p respectively, okay? If we denote by x_n the position of the walker at step n, this position is given by the position of the walker at step n - 1 plus a random variable psi_n. This random variable can take the value plus 1 or minus 1 at each time step with certain probabilities: the probability of psi_n can be written as p delta(psi_n - 1) plus q, which means 1 - p, times delta(psi_n + 1), okay? So with probability p our random variable takes the value plus 1, and with probability q the value minus 1. This is the standard recurrence equation for random walks in one dimension. The position of the walker at time n is entirely determined by the position at time n - 1 plus a random variable which can take the value plus 1 or minus 1, okay? In essence this is the translation of the Markov property for random walks. Yeah, a regular lattice, the spacing is 1, and I take discrete time, okay? The evolution is in discrete time. Discrete space and discrete time, in one dimension, okay? Now, I'm sure you all know how to solve this recurrence equation. We can iterate it down to the first step, and we obtain that the solution x_n is given by the sum from k = 1 to n of the noise variables, right?
And then from this equation we can, for example, obtain that the average position of the walker at time n is just the sum from k = 1 to n of the averages of psi_k. So what is the average of psi_k? Zero? Well, maybe not zero, right? Because if p is much larger than q, your walker tends to drift to the right. Yeah, but since q is 1 - p, we can write that this object here, the sum from k = 1 to n, is n times (p - q), which is positive if p is larger than q. And we know that p + q must equal 1 because of conservation of probability. At the same time, you can compute the variance, let's say the second moment minus the square of the first moment. This is the variance of the position, and I'll just give you the result: 4 p q n. It is a nice exercise if you have never seen it before. Okay, now we have a result for the mean position of the walker at time n and for the variance. So, since the variance is finite, what is the tool we can use to approximate the probability distribution of the position for large times? What is the name of the mathematical tool we need to apply to approximate the probability distribution of the position of the walker at time n? I only computed the mean and the variance, and I obtained that the variance is finite. Yeah? Yes, we can apply the central limit theorem, and state that the probability that the position of the walker is x_n at time n can be approximated, for large n, as 1 over the square root of 2 pi times the variance, which is 4 p q n, times the exponential of minus (x_n - (p - q) n) squared over 8 p q n. So the position of the walker at time n, for n large, will be peaked around (p - q) n, which is the average, and it will have a distribution with a Gaussian shape around this value, with the variance that we have computed. Clearly, if p and q are equal, one-half probability of jumping to the right and one-half to the left, then the average position will be around zero, with fluctuations described by a Gaussian PDF. This is all standard stuff. Another, more mathematician-friendly way to write the statement of the central limit theorem, the form probabilists would prefer, is this: x_n = (p - q) n + sqrt(4 p q n) times chi, where chi is a random variable (and x_n clearly is a random variable too). So we say that this random variable equals the mean plus the standard deviation times another random variable, and the statement is that the PDF of chi converges to a standard Gaussian as n goes to infinity. This is just a restatement of the central limit theorem in a form that probabilists find more appealing: I define a new random variable, related to the first one in this way, and the PDF of this new random variable is just a standard Gaussian. Good. And now things become quite interesting, right? We want to understand large deviations. Okay. So x_n, the position of the walker at time n: what is the largest value that x_n can take? What is the largest possible value? n. Yes, n, right, because although it is unlikely, I can in principle make n steps to the right. In n time steps, of course, if I start from the origin, the farthest position to the right that I can in principle reach is n, correct?
So n is the farthest position that I can reach on the right. What is the farthest position on the left? Minus n. Right? Good. So here we have the origin, and here I'm plotting the probability of x_n at time n. We know that it is peaked around the value (p - q) n, and this PDF looks like a Gaussian around this average value, with a certain width, which is the standard deviation we have here. Do we agree on this? Now, a property of the Gaussian PDF: what is its support? What are the values over which a Gaussian PDF is defined? The whole real axis, right? So in principle, although the probability decays very, very fast, you have non-zero probability for any value on the real axis. So if we interpret this relation literally, and this is a Gaussian, then we run into a problem. Because if we try to extend this Gaussian PDF over the full real axis, we run into trouble: we know for a fact that the largest value that x_n can take is not infinite, it is n. So what are we doing here? You see what the problem is. We have the central limit theorem telling us that the distribution in the large-n limit is a Gaussian, but we know that our random variable has a finite support, up to n. Great, but there is more, because you can compute the probability that x_n is equal to n exactly, right? Let's compute it. What is the probability that x_n = n? p to the n, right? Because I need to take n steps to the right, each of which happens with probability p. So p^n, which I can write in exponential form as exp(n log p), right? Now let's try to compute P(x_n = n) from the central limit theorem. So I'm computing it in two ways. This is the exact result, valid for any n and also in the large-n limit. But now, if I replace x_n with n in the Gaussian, what do I get? I get 1 over the square root of 2 pi times 4 p q n, times the exponential of minus (n - (p - q) n) squared over 8 p q n, right? You see what I'm doing? I'm computing the probability that x_n takes the value n in two different ways: using the exact result, which is p^n, and using the central limit theorem approximation, just replacing x_n with n. And in fact the Gaussian even assigns weight to values bigger than n. Absolutely. Absolutely. But look at the decay: if you take the logarithm of this probability, and the logarithm of this probability density, divide by n, and then send n to infinity, here you obtain one very specific result, okay? If you do the same operation at the level of the Gaussian probability density, you get a different result, okay? So it means that this object, on a logarithmic scale, is different in the two cases. It means that the shape of the probability density in the limit n to infinity is only accurate up to a certain distance from the average, and this distance is precisely given by the standard deviation. When you try to extend the logarithm of this probability density away from the region around the average, the mean plus or minus the standard deviation, you run into trouble, because the logarithms of the Gaussian probability distribution and of the exact result only match around the mean, not near the edges.
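The mismatch can be made quantitative with a few lines of arithmetic; this is a sketch with an arbitrary choice of p, comparing the exact log-probability per step, log p, with the value the Gaussian approximation gives when pushed all the way out to x_n = n.

```python
import numpy as np

# Sketch (the value of p is an arbitrary choice): compare the exact
# log-probability of the extreme event x_n = n, which is n*log(p), with the
# log of the Gaussian (CLT) density evaluated at x_n = n.  The difference
# between a probability and a density is subleading on this scale.
p = 0.6
q = 1.0 - p

for n in (10, 100, 1000):
    exact = n * np.log(p)                                  # log of p**n
    var = 4.0 * p * q * n                                  # CLT variance
    clt = -0.5 * np.log(2.0 * np.pi * var) \
          - (n - (p - q) * n) ** 2 / (2.0 * var)           # log Gaussian at x_n = n
    print(f"n={n}: exact/n = {exact / n:.4f},  CLT/n = {clt / n:.4f}")
```

Per step, the exact answer tends to log p, while the Gaussian extrapolation tends to -q/(2p) (up to a vanishing logarithmic correction), and the gap does not close as n grows.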
So the only point I'm trying to make is that we need to be careful in using the central limit theorem away from the mean, because the central limit theorem is not able to describe events that happen far away from the mean, okay? What happens is that there is a function that interpolates smoothly between the average and the extreme events, and this function that interpolates smoothly between the average and the tails has a name: the large deviation function, or rate function. Do you know how to compute the large deviation function, the rate function, for this problem? Well, we can compute it quite easily using one trick. Let me write here an exact expression for the probability that the position equals x_n at time n, okay? This is a well-defined, discrete distribution. There is no problem at finite n, right? There is no approximation. So how can we write an exact expression for this object? We computed the average, we computed the variance, but how can we compute the full distribution? Yes, but that's complicated. There is a more direct, combinatorial way of doing it. Suppose we take n_plus steps to the right and n_minus steps to the left, okay? Can we write equations for n_plus and n_minus? Probably yes, right? Because n_plus minus n_minus is equal to what? The total n is a time variable, so n can only increase. But if I made 10 steps to the right and 5 steps to the left, in the end, where do I end up? Yeah, at the final position that I reached, right? Is it clear? I moved 10 steps to the right and 5 steps to the left, so in the end I am at position 5, correct? This is what you wanted to say. So n_plus - n_minus = x_n. And what is instead n_plus + n_minus? The total number of steps, n, right? So what I can do here is write that this probability is p raised to the power n_plus, because with probability p I take a step to the right and all the steps are independent, times q raised to the power n_minus. And then I need to add here what? A combinatorial factor, right? Because I can take the steps in any order and still get to the same position. Yes, the binomial coefficient n choose n_plus, which gives you the number of ways of performing n_plus steps to the right out of a string of n total steps, okay? Now, using this exact expression, which is valid for any finite n, and using these two equations, it is up to you to obtain the following result. The exercise is to show that P(x_n, n), for large n, behaves to leading order as exp(-n phi(x)), where small x is x_n / n. All you have to do is replace n_plus and n_minus with the solution of these two equations and then approximate the binomial for large n. How do we approximate the binomial for large n? What's the name of the guy? Stirling, yes. You write the binomial as a ratio of factorials and then expand the factorials for large n using Stirling's approximation. So you will find that the leading term of this probability has this exponential form, exp(-n times a function of x) that you need to determine. Now, the solution for this function of x is very interesting.
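Before taking the large-n limit, one can check the finite-n formula directly against a simulation of the walk; this is a sketch with arbitrary parameters (p, n, and the number of walks are choices made here).

```python
import numpy as np
from math import comb

# Sketch: check the exact combinatorial formula
#   P(x_n) = C(n, n_plus) * p**n_plus * q**n_minus,  n_plus = (n + x_n) / 2,
# against a direct simulation of the walk.
rng = np.random.default_rng(3)
p, n, n_walks = 0.6, 20, 500_000
q = 1.0 - p

steps = rng.choice([1, -1], p=[p, q], size=(n_walks, n))
x_n = steps.sum(axis=1)                         # final positions

for x in (-4, 0, 8, 20):                        # must have the same parity as n
    n_plus = (n + x) // 2
    exact = comb(n, n_plus) * p**n_plus * q**(n - n_plus)
    empirical = np.mean(x_n == x)
    print(f"x_n={x:+d}: exact {exact:.5f} vs simulated {empirical:.5f}")
```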
So we started with an exact result and we are taking the limit n to infinity. The function phi(x) that you need to derive (please do this exercise, it's very instructive) is phi(x) = (1 + x)/2 times the logarithm of (1 + x)/(2p), plus (1 - x)/2 times the logarithm of (1 - x)/(2q). This will be the result of the operation: you start from a finite-n formula, where everything is well defined, you just take the large-n limit, and the leading term has this exponential form with this explicit function. Have you ever seen this function before? Well, I want an answer: either you have seen it or you haven't. How many people have seen this before? So why is this function important? Why is it the central object of the theory? Let's compute what happens when x_n is equal to n, the problem we had before. When x_n goes to n, the extreme situation where you've taken all your steps to the right, the small x becomes equal to 1. What happens to phi when x is equal to 1? The second term is killed in the limit, if you technically send x to 1, and from the first term you get log(1/p), which is equal to minus log p. And minus log p is precisely the object that we had here: it came from an exact calculation at finite n. So this function reproduces the exact leading behavior of this probability for the extreme event where you've taken all your steps to the right, or, for that matter, all your steps to the left. If you plot this function, it has a domain between -1 and 1; at x = 1 it goes to minus log p, and at the other end, x = -1, it goes to minus log q, and in between it looks like this. So we have seen that this function correctly reproduces what happens to your probability in the extreme case where you've taken all your steps to the right. But it also gives you a lot of information around its minimum here. If you expand this function around its minimum, you will find that phi(x) goes as (x - (p - q)) squared over 8 p q. So this function has a quadratic behavior around its minimum, and this quadratic behavior is very interesting. You see what the center of the parabola is? It's p - q, which is exactly the imbalance between the right and left probabilities. Now, if you substitute this quadratic behavior back in here, what do you get? Yes, a Gaussian, with which mean and which variance? P(x_n, n) goes as exp(-n times this function), which is exp(-n times (x - (p - q)) squared over 8 p q), with small x = x_n / n. And if you rearrange these terms by pulling a factor of n outside, this is equal to exp(-(x_n - n(p - q)) squared over 8 p q n), which is precisely the peak of the Gaussian PDF that we obtained from the central limit theorem, with the correct mean and the correct variance. So this function, which is called the large deviation function or rate function, is an amazing object. It is the central object of the theory. Why? Because it reproduces the central limit theorem, but it also reproduces the extreme events.
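A short numerical check of both properties, a sketch with an arbitrary p, using logarithms of factorials to avoid overflow: compare phi(x) with -log P(x_n = x n)/n computed from the exact binomial formula at a large but finite n, and with the quadratic (central-limit) parabola around the minimum.

```python
from math import lgamma, log

# Sketch: the rate function
#   phi(x) = (1+x)/2 * ln((1+x)/(2p)) + (1-x)/2 * ln((1-x)/(2q))
# should match -ln P(x_n = x*n)/n from the exact binomial formula as n grows,
# and reduce to the Gaussian (CLT) parabola near its minimum at x = p - q.
p = 0.6
q = 1.0 - p

def phi(x):
    return 0.5 * (1 + x) * log((1 + x) / (2 * p)) + 0.5 * (1 - x) * log((1 - x) / (2 * q))

def log_binom(n, k):
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def minus_log_prob_over_n(x, n):
    n_plus = round((n + x * n) / 2)             # number of right steps
    logprob = log_binom(n, n_plus) + n_plus * log(p) + (n - n_plus) * log(q)
    return -logprob / n

for x in (0.0, 0.2, 0.6, 0.9):
    quad = (x - (p - q)) ** 2 / (8 * p * q)     # CLT parabola around the minimum
    print(f"x={x}: phi={phi(x):.4f}, exact (n=2000)={minus_log_prob_over_n(x, 2000):.4f}, "
          f"parabola={quad:.4f}")
```

Near the minimum x = p - q the three columns essentially coincide; far from it the exact result follows phi(x), while the parabola drifts away.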
So with one single function you are matching the regime close to the mean, which is described by the central limit theorem on the scale of the standard deviation, but also the extreme events where, for example, your walker has taken all its steps to the right or all its steps to the left, all in a single function. Unfortunately, in most undergraduate probability courses there is a large emphasis on the central limit theorem and not so much on large deviations, even though from large deviations you can somehow reconstruct the central limit theorem, but not the other way around. If you only know the central limit theorem, you cannot say what is happening very far away from the mean, for deviations that are much larger than the standard deviation. So what I wanted to argue is that the rate function, the large deviation function, should be the central object we aim for, because from the rate function we can reconstruct the central limit theorem and also obtain information about extreme events that are not covered by it. So, what time is it? 10:49, which is the end, right? Okay, excellent. I did exactly what I planned to do. This afternoon we will try to combine extreme value statistics and large deviations, and I will give you an example of an application of all this in a field outside random matrices, just to make an interesting connection between a random matrix problem and a combinatorial problem. I will show you some nice slides as well. So see you at 2:30, right? Okay.