All right, so let me thank the organizers for the invitation. I'm going to talk about some large deviation problems for the spectrum of Wigner matrices. As we know, many spectral asymptotics of Wigner matrices are universal, but much less is known about the large deviation behavior, and in particular whether a universality phenomenon also happens there. In this talk, I will present different kinds of large deviation results for this model of random matrices.

So we consider a Wigner matrix, that is, a random Hermitian matrix with independent coefficients up to the symmetry. We assume that the variance of the off-diagonal entries is normalized to be 1. We can then define the empirical eigenvalue distribution, which accounts for the global behavior of the spectrum. We know that if we normalize the spectrum by 1 over square root of n, then this random probability measure converges towards a deterministic measure, which is the semicircle law.

Given this convergence, one may ask about the large deviation behavior. One way to formalize this question goes through establishing a large deviation principle. The goal is to compute, at the logarithmic scale, the probability that the empirical eigenvalue distribution is close to some fixed measure which is different from the semicircle law: how fast does this probability decrease to 0 (what is the speed of large deviations?), and what is the constant in front, as a function of the target measure (what is the rate function?). In this talk, we will focus on the question of how, for such results, the speed and the rate function can depend on the distribution of the entries.

Very few results are actually known for the large deviations of these statistics of the spectrum. The first result was given for the classical Gaussian ensembles. Those are Wigner matrices whose law has a density proportional to exponential of minus beta over 4 times the trace of H squared, where beta equals 1 in the case of the GOE and beta equals 2 in the case of the GUE. This beta over 4 is chosen so that the off-diagonal entries have variance equal to 1. What we saw in Alice Guionnet's lectures is that the empirical eigenvalue distribution of this model satisfies a large deviation principle with speed n squared and with an explicit rate function, which involves the so-called non-commutative entropy. The point is that for this model the joint law of the eigenvalues is completely explicit, and this allows one to perform a Laplace method to obtain this kind of large deviation principle. But what is crucial is to have this knowledge of the joint law of the spectrum, and it is very challenging to understand the large deviations of the spectrum in cases where this is not accessible.

That said, there is actually a class of Wigner matrices where this is possible, which has been called the class of Wigner matrices without Gaussian tails. Those are Wigner matrices where the distribution of the entries has a tail which decreases like exponential of minus some constant times t to the alpha, with some alpha strictly less than 2; so the entries have a tail distribution which is strictly heavier than the Gaussian one. For these Wigner matrices, Bordenave and Caputo proved a large deviation principle for the empirical eigenvalue distribution.
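Before describing their result, let me record schematically the setting and the Gaussian result in symbols. This is my own transcription of what was said, with constants and topologies suppressed; I believe this matches the Ben Arous–Guionnet normalization, but the normalization should be checked against the original papers.

```latex
% Wigner matrix: X Hermitian of size n, entries (X_{ij})_{i \le j} independent,
% \mathbb{E} X_{ij} = 0, \ \mathbb{E}|X_{ij}|^2 = 1 for i < j.
% Empirical eigenvalue distribution of X/\sqrt{n}, and the semicircle law:
\hat{\mu}_n = \frac{1}{n}\sum_{i=1}^{n}\delta_{\lambda_i(X/\sqrt{n})}
\;\longrightarrow\; \mu_{sc}, \qquad
d\mu_{sc}(x) = \frac{1}{2\pi}\sqrt{4 - x^2}\;\mathbf{1}_{\{|x| \le 2\}}\,dx.
% LDP for the Gaussian ensembles (density \propto e^{-(\beta/4)\mathrm{Tr}(H^2)}):
\mathbb{P}\big(\hat{\mu}_n \approx \mu\big) = e^{-n^{2} I_{\beta}(\mu) + o(n^{2})}, \qquad
I_{\beta}(\mu) = \frac{\beta}{4}\int x^{2}\,d\mu(x)
- \frac{\beta}{2}\iint \log|x - y|\,d\mu(x)\,d\mu(y) + c_{\beta},
% the double logarithmic integral being the non-commutative entropy \Sigma(\mu).
% Wigner matrices without Gaussian tails:
\mathbb{P}\big(|X_{ij}| \ge t\big) \simeq e^{-c\,t^{\alpha}}, \qquad 0 < \alpha < 2.
```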
Bordenave and Caputo proved that the speed of large deviations actually depends on how fast the tail distribution decreases: they showed that the speed is n^{1+α/2}, and they provided an explicit rate function, given in terms of a complicated variational problem. This variational problem can actually be solved for specific target measures, namely free convolutions of the semicircular law with a symmetric probability measure ν; for those measures, the rate function is essentially the α-th moment of ν. This rate function also has the surprising property that it is infinite for any measure which cannot be written as a free convolution with the semicircular law. So for any measure which is not of this form, it is very unlikely to see such a deviation.

The reason why the speed of large deviations depends on the tail distribution is the following fact: what they proved is that the large deviations of the empirical eigenvalue distribution are created by deviations of order square root of n of a number of entries which is of order n. Each entry pushed to the scale square root of n costs about exponential of minus c times n^{α/2}, and you have order n of them, so you get this speed n^{1+α/2}. In this large deviation regime, you have this network of entries making deviations of order square root of n, and the spectrum of the matrix formed by the rest of the entries concentrates at the large deviation scale. Therefore the only possible deviations are around those free convolutions, because somehow, at the large deviation scale, your matrix looks like a deformed Wigner matrix.

This strategy, however, seems to be very specific to this class of Wigner matrices. In particular, if you are looking at the case of bounded entries, you cannot possibly implement this kind of strategy. So what do we know in this case? We have an idea of what the speed of large deviations should be from a concentration result due to Guionnet and Zeitouni: they proved that the probability that the empirical eigenvalue distribution is away from the semicircular law decreases as exponential of minus n squared. But proving the existence of a large deviation principle is completely open for any example of Wigner matrices with bounded entries.

Motivated by this problem, I will consider a simpler problem, which is to understand the large deviations of the moments of the empirical eigenvalue distribution. Those are just the normalized traces of (X/√n)^d, where X has centered and bounded entries; the object is written out in the display below. We know that in this case the normalized traces converge in probability to the moments of the semicircular law, and again we can ask about the large deviation behavior. This is the question I will address in this talk.

First of all, in order to give you an intuition about the nature of the large deviations, I want to give you an example of a simple lower bound. Assume that we are looking at an even power and that our Wigner matrix has only Rademacher entries. [Someone asks what a Rademacher variable is.] A Rademacher variable takes the values plus 1 and minus 1 with probability one half each; so the entries are distributed with this distribution.

So the existence of the rate function is not known. You have those concentration results, which are very general, but proving the actual existence of the limit of this probability is not known.
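To fix what this object is, here it is in my notation, together with its limit; m_d denotes the d-th moment of the semicircle law, which is a Catalan number for d even.

```latex
% The d-th moment of the empirical eigenvalue distribution:
\mathrm{tr}_n\big(X/\sqrt{n}\big)^{d}
= \frac{1}{n}\,\mathrm{Tr}\big(X/\sqrt{n}\big)^{d}
= \int x^{d}\,d\hat{\mu}_n(x)
\;\xrightarrow[n \to \infty]{\ \mathbb{P}\ }\;
m_d = \int x^{d}\,d\mu_{sc}(x),
% with m_d = 0 for d odd, and m_d = \mathrm{Cat}(d/2), a Catalan number, for d even.
```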
You can always get lower bounds; I don't know how to do it for general target measures, but for certain deviations you can build explicit strategies and obtain examples of lower bounds. The issue is to get matching lower and upper bounds.

So let's assume we want to give a lower bound on the probability that the trace of (X/√n)^d is above some constant c times n. One way to do that is to push the top eigenvalue of X/√n to a level of order n^{1/d}. If you have plus and minus 1 entries, one way to do that is to ask for a principal submatrix of a certain size whose entries are all ones. The size you have to take is of order n^{1/d + 1/2}, and since such a submatrix has of order its size squared independent entries, each equal to 1 with probability one half, you obtain a lower bound of exponential minus a big O of n^{1+2/d}. This lower bound indicates that the speed of large deviations for those traces will actually depend on the power d of the moment.

What about upper bounds now? We know by concentration results that we can get an upper bound on the probability that the trace of (X/√n)^d takes some unexpected value. We have this result of Meckes and Szarek, which gives a two-level deviation inequality: on the right-hand side you have a competition between two kinds of behavior, a sub-Gaussian behavior in n²t², and a behavior in t^{2/d} times the speed we saw before, n^{1+2/d}. You see that if you take t of order 1 over n, then it is the first, Gaussian term which wins, and you obtain a concentration inequality which reflects the Gaussian fluctuations of the traces. On the other hand, if you take t of order 1, then it is the other term which dominates in the minimum, and you obtain that the large deviation speed of the upper tail should be n^{1+2/d}, since we saw before that we also have a lower bound of matching order.

Now, in the case where d is even, there is a different behavior for the lower tail, that is, for deviations below this moment of the semicircular law. The reason is the following: the event where this moment of the empirical eigenvalue distribution is below the one of the semicircular law minus something is a closed event for the weak topology. Therefore, if it happens, it means that the empirical eigenvalue distribution is away from the semicircular law, and we saw before that this event has probability of order exponential minus n squared. So you have two kinds of behavior in the case where the power d is even, according to whether you are above or below the limiting value. In this talk, I will only consider the large deviation problem of the upper tail of those normalized traces, where the speed is n^{1+2/d}.

I would like now to state the main result of this talk, and for this I need to introduce some notation. I will introduce the logarithmic Laplace transform of our Wigner matrix, which I denote by Λ of H; this is defined as the logarithm of the expectation of the exponential of the trace of X times H. We will also introduce its Legendre transform, Λ star. Our main result gives a description of the large deviation upper tail in terms of this function Λ star. What we prove is the following.
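Here is the statement in symbols, as I transcribed it from the slide (schematic; tr_n denotes the normalized trace and m_d the d-th moment of the semicircle law, as above):

```latex
% Logarithmic Laplace transform of the matrix X, and its Legendre transform:
\Lambda(H) = \log \mathbb{E}\,e^{\mathrm{Tr}(XH)},
\qquad
\Lambda^{*}(Y) = \sup_{H \in \mathcal{H}_n}\big\{\mathrm{Tr}(YH) - \Lambda(H)\big\}.
% Main result (upper tail of the moments, schematic): for t > 0,
\log \mathbb{P}\Big(\mathrm{tr}_n\big(X/\sqrt{n}\big)^{d} \ge m_d + t\Big)
= -(1 + o(1))\,\inf\Big\{\Lambda^{*}(Y) \,:\, Y \in \mathcal{H}_n,\ \mathrm{tr}_n\big(Y/\sqrt{n}\big)^{d} \ge t\Big\},
% both sides being of order n^{1+2/d}.
```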
So we prove that, at the logarithmic scale, the probability that the normalized trace of (X/√n)^d is above this moment of the semicircular law plus some positive t is, to leading order, equal to exponential of minus some variational problem, which consists in optimizing this function Λ star of Y, where Y ranges over Hermitian matrices of size n, under the constraint that the normalized trace of (Y/√n)^d is above t.

Note that this is not exactly a large deviation principle: in this generality, I do not know if this variational problem actually has a limit. But we will see later that for a certain class of Wigner matrices, we can actually give a large deviation principle. Here I wrote that I assume that the entries are bounded, but actually I can also work with matrices which have the same concentration property as the ones with bounded entries: I need to know that convex Lipschitz functionals of my matrix have sub-Gaussian concentration. In particular, you can also allow entries which have a density proportional to exponential of minus x to the power alpha with some alpha bigger than 2; this will also work.

So this main result is this description of the upper tail in terms of some variational problem. [Question from the audience.] Yes, everything here is of order n^{1+2/d}: this variational problem is also of that order, so this is like an expansion at the exponential scale. I cannot pass to the limit at that stage because I don't know whether this variational problem has a limit.

However, we can actually go to the limit for a certain class of Wigner matrices which was introduced by Guionnet and Husson, called the class of Wigner matrices with sharp sub-Gaussian tails. Those are Wigner matrices where the logarithmic Laplace transform is pointwise bounded by the one of the GOE or the GUE, according to whether you are in the symmetric real case or the Hermitian complex case. Examples of Wigner matrices satisfying this condition are given by those with Rademacher entries, or with entries uniformly sampled in some interval; here I put square root of 3 so as to get a variance equal to 1. For this class of Wigner matrices, we can actually prove a full large deviation principle for the normalized traces, with an explicit rate function. This rate function is actually very simple: it is the function beta over 4 times t to the power 2 over d. And what you can observe is that this rate function does not depend on the distribution of the entries: you have some kind of universality result at the level of the large deviations for this class of Wigner matrices.

This universality phenomenon was already observed by Alice Guionnet and Jonathan Husson: they proved, using different techniques, that the top eigenvalue of those Wigner matrices also has a universal large deviation behavior, with the same speed and rate function as in the GOE or GUE case. So our result complements theirs, and complements this picture that in this class of Wigner matrices, the moments of the empirical eigenvalue distribution also have a universal large deviation behavior. This result contrasts with what is expected for the empirical eigenvalue distribution itself, where we do not really expect such universality to happen.
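In symbols, the class and the rate function are the following; the explicit form of the Gaussian log-Laplace transform is my normalization, consistent with off-diagonal variance 1, and should be checked against the Guionnet–Husson paper.

```latex
% Sharp sub-Gaussian tails: the log-Laplace transform of X is dominated
% pointwise by the Gaussian one,
\Lambda(H) \le \Lambda_{G\beta E}(H) = \frac{1}{\beta}\,\mathrm{Tr}(H^{2})
\qquad \text{for all } H \in \mathcal{H}_n,
% with \beta = 1 (GOE, real symmetric case) or \beta = 2 (GUE, complex Hermitian case).
% Examples: Rademacher entries, or entries uniform on [-\sqrt{3}, \sqrt{3}].
% For this class, the upper tail satisfies a full LDP at speed n^{1+2/d} with rate
I_{d}(t) = \frac{\beta}{4}\,t^{2/d},
% which does not depend on the distribution of the entries.
```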
However, what may happen is that certain deviations of the empirical eigenvalue distribution are universal; but this is again completely open.

During the rest of this talk, I would like to give you an interpretation of the variational problem which arises in this description of the large deviation upper tail of those traces. What we show is that this variational problem corresponds to saying that certain changes of measure are optimal for this large deviation problem: the best large deviation strategy consists of those changes of measure. If P is the law of your Wigner matrix, you may consider the family of all its tilts, which are the probability measures with an affine log-density with respect to P: the density is the exponential of the trace of some matrix H times X, and you have to subtract the logarithmic Laplace transform to get a probability measure. What we show is that those changes of measure are the optimal ones. More concretely, if you start from a Wigner matrix with Rademacher entries, then this family of probability measures is the set of laws of Wigner matrices whose entries still take the values plus and minus one, but with different parameters.

So what do I mean by optimal strategies? In the large deviation literature, there is this prevalent idea that one can obtain a large deviation lower bound by performing a change of measure which transforms a rare event into a typical one. I would like to illustrate this philosophy, which is very general, and justify it in a few lines. Take some function f on the set of Hermitian matrices and a potential change of measure Q, absolutely continuous with respect to your initial probability measure P, and say we want to lower bound the probability of the rare event where f(X) is above some level t. You can always write this probability as an expectation under this new probability Q, and using Jensen's inequality you get a lower bound on this probability which involves the relative entropy of Q with respect to P, together with the probability of the event where f(X) is above t, but this time under the new probability measure Q.

The consequence you can derive from this lower bound is the following philosophy: if you manage to find a change of measure which makes your rare event typical, then you obtain a lower bound which is, roughly speaking, exponential of minus the relative entropy of Q with respect to P. The main part of the work is then to find good changes of measure which give you a sharp lower bound.

For our example of the normalized traces of (X/√n)^d, what we claim is that a certain family of changes of measure is optimal, namely the family of tilts, which I denote by P_H. What you can show is that the relative entropy of P_H with respect to P only depends on the mean of X under P_H: it is just equal to the function Λ star of Y, where Y is the matrix of means. So remember the picture I told you just before: the idea is that if you find a change of measure which makes your rare event typical, then you get a large deviation lower bound of exponential minus the relative entropy.
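Here is this computation in a few lines, in my notation. It is a standard sketch: the 1/e term comes from the bound x log x ≥ −1/e, and everything is schematic.

```latex
% Tilted measures of the law P of the Wigner matrix:
dP_{H}(X) = e^{\mathrm{Tr}(HX) - \Lambda(H)}\,dP(X).
% Change-of-measure lower bound: for the rare event A = \{f(X) \ge t\}
% and any Q \ll P, Jensen's inequality under Q(\cdot \mid A) gives
\mathbb{P}(A) = \mathbb{E}_{Q}\Big[\mathbf{1}_{A}\,\frac{dP}{dQ}\Big]
\ge Q(A)\,\exp\Big(-\frac{H(Q \mid P) + 1/e}{Q(A)}\Big),
% so if the rare event becomes typical under Q, i.e. Q(A) \to 1:
\log \mathbb{P}(A) \ge -(1 + o(1))\,H(Q \mid P).
% Relative entropy of a tilt: with Y = \mathbb{E}_{P_H}[X] = \nabla\Lambda(H),
H(P_{H} \mid P) = \mathbb{E}_{P_H}\big[\mathrm{Tr}(HX) - \Lambda(H)\big]
= \mathrm{Tr}(HY) - \Lambda(H) = \Lambda^{*}(Y).
```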
So somehow it is very important to understand now what the typical value of the normalized trace is under this probability measure P_H. Under P_H, your matrix no longer has centered entries, so you can see your matrix X under P_H as a perturbation of a centered Wigner matrix: you subtract the mean and add the mean. What you can show is that if the Hilbert-Schmidt norm of Y is not too big, then essentially this trace splits into two parts: the normalized trace of ((X−Y)/√n)^d plus the normalized trace of (Y/√n)^d. Indeed, if the Hilbert-Schmidt norm of Y is not too big, all the mixed moments vanish in the large n limit. You can also prove that under this probability measure P_H, although you moved the entries of your Wigner matrix, under this assumption that Y is not too big you did not change the variance profile too much, so that the typical value for the recentered Wigner matrix is still this moment of the semicircular law. This means that under this change of measure P_H, the typical value of our normalized trace is the sum of this moment of the semicircular law plus the normalized trace of (Y/√n)^d.

Now if we want to obtain a lower bound on the probability that the normalized trace of (X/√n)^d is above this moment of the semicircular law plus some t, then we can perform this change of measure P_H, where the mean Y under P_H satisfies the condition that the normalized trace of (Y/√n)^d is above t. Using this change of measure P_H, you transform this rare event into a typical one, and therefore, using the kind of bound from before, you get a lower bound of exponential minus Λ star of Y. This is true for any such Y satisfying these conditions, so you can now optimize over Y, and you obtain a lower bound which can be described in terms of some variational problem. You can also show that this variational problem is actually achieved at matrices which satisfy the constraint on the Hilbert-Schmidt norm, so that you can remove it from the variational problem. This variational problem is exactly the one which, as I claimed, describes the upper tail, and the main point is that it corresponds to saying that this special family of tilts of your Wigner matrix consists of the optimal changes of measure.

It happens that many large deviation problems follow this strategy, and those have been put together under the name of nonlinear large deviations: problems where the tilts of the background measure are the optimal changes of measure. For example, the large deviations of the empirical mean of an i.i.d. sample is an example of a large deviation problem where those tilts are the optimal changes of measure. There is also the example of subgraph counts in Erdős–Rényi graphs. There is a huge literature on that subject; I am only citing the first paper which really shows a large deviation principle for subgraph counts in the dense case. Maybe let me quickly remind you of the definition: an Erdős–Rényi graph is a random graph on n vertices where, for each pair of vertices, you put an edge with a fixed probability, independently over the pairs. When you have such a random graph, you may ask about the number of subgraphs which look like a given graph: for example, how many subgraphs are triangles, or look like stars or cliques of a certain size.
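Just to make this object concrete, here is a minimal numerical sketch of counting triangles in an Erdős–Rényi graph. The parameters (n = 200, p = 0.1) and all names are mine, not from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 0.1

# Sample the adjacency matrix of an Erdos-Renyi graph G(n, p):
# each pair {i, j} with i < j gets an edge independently with probability p.
upper = np.triu(rng.random((n, n)) < p, k=1)
A = (upper | upper.T).astype(float)

# Number of triangles = tr(A^3) / 6: each triangle is traversed by
# 6 closed walks of length 3 (3 starting points x 2 orientations).
triangles = np.trace(A @ A @ A) / 6
expected = (n * (n - 1) * (n - 2) / 6) * p**3  # E[#triangles] = C(n,3) p^3

print(f"observed triangles: {triangles:.0f}, expected: {expected:.1f}")
```

At this dense scaling the triangle count is of order n³p³, and its upper tail is exactly the kind of nonlinear large deviation problem discussed next.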
The large deviations of those objects also fall into this nonlinear large deviation theory. This theory was introduced by Chatterjee and Dembo, and the aim was really to unify all those large deviation problems and to find a criterion under which this strategy, consisting of those tilts, is the optimal one.

So let me now describe the criterion they found. Their theory aims to be quite general, so here we move to a slightly more abstract setting where we have a random vector in R^n. Canonically, you can think of this random vector as having i.i.d. coordinates, like Rademacher entries, for example. Now we study a smooth function f of this vector, and the main question we want to ask is: what are the large deviations of f(X) in the large n limit?

What we saw is that, using those tilts, using this family of probability measures, we can always more or less provide a lower bound on the probability that f(X) is above t in terms of some variational problem. This is provided that you have some concentration property under your tilted measure: if f(X) concentrates well enough under that measure, then this strategy gives you a lower bound, which is this one. The nonlinear large deviation theory aims at understanding when this lower bound is actually sharp, and when we can reverse this inequality.

Chatterjee and Dembo found the following criterion. They proved that if you take X uniformly sampled on the discrete hypercube, then this lower bound is indeed sharp if the set of gradients of the function f is of low complexity. This means that the gradient does not span too many directions; more precisely, they gave a criterion in terms of low metric entropy: the criterion is that you can cover your set of gradients by a small number of L2 balls of radius δ/√n. Remember that we are on the discrete hypercube, so you have a function defined on this hypercube: you have 2^n vertices, and at each vertex you have one gradient. Essentially, for a generic function f, the covering number of the set of gradients will be exponential in the dimension; it will essentially be the number of vertices. So this criterion asks for a sub-exponential covering number of the set of gradients.

We can also discuss why we ask for a mesh of δ/√n. The reason is that √n is, up to a constant, the diameter of the hypercube. You can see that if you make an error on the gradient of order δ/√n, you make an error of order δ between the function f and its tangent: on every convex subset where the gradient is constant up to an error of δ/√n, the error between the tangent and the function is of order δ.
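In symbols, this last point is just the Cauchy–Schwarz inequality (my notation):

```latex
% On the hypercube \{-1,1\}^n we have \|x - y\|_2 \le 2\sqrt{n}.
% If g approximates the gradient at scale \delta/\sqrt{n}, i.e.
\|\nabla f(x) - g\|_2 \le \delta/\sqrt{n},
% then the linearization based on g is accurate at scale \delta:
\big| \langle \nabla f(x) - g,\; y - x \rangle \big|
\le \|\nabla f(x) - g\|_2\, \|y - x\|_2 \le 2\delta .
% Low complexity: the covering number of the gradient set
% \mathcal{G} = \{\nabla f(x)\} by L2 balls of radius \delta/\sqrt{n} satisfies
\log N\big(\mathcal{G}, \delta/\sqrt{n}\big) = o(\text{large deviation speed}).
```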
More precisely, what they did is give a quantitative upper bound on the probability that f(X) is above some level t: they proved an upper bound matching this rate function I(t) up to error terms. The error terms are of two types. The first one is a polynomial in the derivatives of f up to second order; it is explicit, but very complicated. Then you have another error term, which comes from the complexity of the set of gradients. The bound is maybe more complicated than this in reality, but this is the structure of their bound. What you can see is that if this probability is of order exponential minus v_n, so if you know the large deviation speed, then this bound is meaningful if the log covering number of the set of gradients by balls of radius δ/√n is a little o of the speed.

So let us now see some examples of functions f where this bound is actually meaningful, that is, functions whose gradient has low complexity. The first example is just linear forms: there the covering problem for the gradient set is trivial. You can also look at an example of a function related to the Curie-Weiss model: you take the quadratic form given by the sum of x_i x_j over i different from j, normalized by 1 over n squared in order to study the deviations of order 1 of this function f. Here you can prove that the speed of large deviations is just n. The gradient at any point x is almost a constant vector, so the set of gradients of this function is almost one-dimensional; and as it is almost a one-dimensional set, it has covering numbers which are polynomial in the dimension. So the log of the covering number is only logarithmic, and therefore it is indeed a little o of the speed of large deviations.

Now an example of a function which is not of low complexity: the largest eigenvalue of a Wigner matrix. Here the ambient dimension N should be of order n choose 2, the number of independent entries. We know for this function that the speed of large deviations should be n. However, the gradient of this function is essentially the projection onto the top eigenvector, so if you look at the image of the gradient map, it spans the whole sphere. In particular, this has a covering number which is exponential in n, and therefore this is not a function of low complexity. A function of low complexity is a function whose gradient really spans only a few directions, somehow.

So what about the normalized traces of powers of X/√n? The bad news is that this is a function which is not of low complexity: the gradient spans too many directions. But what we can show is that a probabilistic argument reduces the complexity: we can reduce ourselves to the large deviations of a function of low complexity if we truncate the trace. What we can show is that at the large deviation scale, the deviations are only due to the edges of the spectrum, and therefore we can truncate the trace. Here I denote by trace subscript k the trace where you keep only the k largest eigenvalues in absolute value. Using concentration arguments, one can show that the part of the trace corresponding to the n minus k smallest eigenvalues, where k is a small fraction of n, concentrates at the large deviation speed n^{1+2/d}, so that we are reduced to studying this truncated trace. We also have a condition on how many eigenvalues we keep in the truncated trace, but it is actually sufficient to reduce the complexity enough.
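To see why truncation helps, here is the schematic computation of the gradient, by first-order perturbation of simple eigenvalues; this is my notation and assumes the top eigenvalues are simple.

```latex
% Truncated trace: keep the k largest eigenvalues in absolute value,
\mathrm{Tr}_k\big(X/\sqrt{n}\big)^{d} = \sum_{i=1}^{k} \lambda_i^{\,d},
\qquad \lambda_1, \dots, \lambda_k \ \text{the top eigenvalues of}\ X/\sqrt{n},
\ v_1, \dots, v_k \ \text{the eigenvectors}.
% First-order perturbation of a simple eigenvalue gives \partial\lambda_i = v_i v_i^{*},
% so the gradient with respect to the matrix X is
\nabla_X\, \mathrm{Tr}_k\big(X/\sqrt{n}\big)^{d}
= \frac{d}{\sqrt{n}} \sum_{i=1}^{k} \lambda_i^{\,d-1}\, v_i v_i^{*},
% a matrix of rank at most k.
```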
So for this truncated trace, you can see that the set of gradients is now a subset of matrices of rank at most k, and therefore it has the potential to be of low complexity; if k is small enough, we can actually show that this function is of low complexity. However, we should come back to the bound of Chatterjee and Dembo: we fixed the complexity error term, but we still have this smoothness error term, which makes it difficult to apply their bound in this context, for our example of the normalized traces.

So, motivated by this example, I proved an improvement of their bound using different techniques. What I showed is that we can remove this smoothness term. The counterpart is that I have to ask for a covering number, not of the set of gradients, but of the convex hull of the set of gradients. This may increase the complexity enormously for certain examples: for example, if you take a function where the set of gradients is the set of all unit coordinate vectors, this has low complexity in Chatterjee and Dembo's bound, but a very high complexity in the bound here, because the convex hull is then a solid, full-dimensional body, which has huge covering numbers at this scale. However, introducing this convexity makes the bound a bit neater, and for many examples it gives an improvement. In particular, we are able to use this new bound to derive the large deviation upper tail of the moments of the empirical eigenvalue distribution, and to show that this variational problem describes the upper tail, which corresponds to saying that certain changes of measure are optimal.

To conclude, let me just remind you that from this estimate we can derive a large deviation principle for those traces in the case of Wigner matrices with sharp sub-Gaussian tails, and this complements the result of Guionnet and Husson on the top eigenvalue for this model of Wigner matrices. Thank you for your attention.