Okay, so first of all, I'm very happy to be here this afternoon, speaking after so many prestigious speakers. My talk is going to be about random matrices, with some applications to 5G, and the title is "From Shannon to Wiener", of course. Before introducing the topic, you have to know that, as I said in the beginning, Huawei has opened a research center, and we have a huge effort in our teams working on 5G technologies, something that is going to be happening in 2020. For people who are not familiar with this: generations of telecommunications last around 20 years. Things started with 2G, which is what we call voice with mobility; it started in the 1990s and it's still going on, but you could see a lot of GSM networks going down around 2010, so roughly 20 years. After voice, people thought about bringing video, and that's where 3G came in, with what we call visiophony. That started around 2000 and continues until roughly 2020. Of course, it turned out that visiophony was not working that well, and there was a shift toward what we call mobile internet around that time. Same thing around 2010: people started deploying 4G technologies, with a rough horizon of 20 years, meaning until 2030, and at the moment we are deploying those technologies. You have to know that research starts roughly 5 to 10 years before each generation. Same thing for 5G: it will start in 2020 and go on until 2040. We started the research already in 2012, 2013, and it's going on, with the aim of developing the standard between 2016 and 2018. So now a lot of research teams around the world are preparing the technologies: from the standardization point of view things start in 2016, go into the standard by 2018, and then, with all the different prototypes and developments, by 2020 we have the technology. So what was 4G, by the way? Mostly mobile internet: providing high-speed mobile internet to the user over a full-IP, flat-IP framework. You have to know that one of the drawbacks of 4G is that, because we wanted so much mobile internet, we forgot about voice: voice was considered within 4G not as a technology anymore but as an application. You have things like voice over LTE going on there, with the fact that it's not working that well, and people are trying to solve it right now. Now, what is 5G about? The same thing: when you define a generation, you define what you want, and then behind it there's the technology which is going to deliver it. So you have to define the aims of 5G first. In 4G, the technology behind is what we call LTE-Advanced or Mobile WiMAX, 802.16m. In 3G, we have CDMA as a technology, in different flavors. In 2G, we have IS-95, and also other technologies like GSM, which is the best known in the world. Around 5G, we're still not at the technology phase but at the "what do we want" phase. And it turns out that there's a common consensus across different organizations that these are the things we want from the new technology we're going to deploy.
We want systems which have a huge capacity, meaning providing more bits per second per hertz per square kilometer around the earth, and people are trying to find a way to do it. We also want each user at a given place on earth to get a download speed of around 10 Gb/s. We also want the latency to be reduced to around 1 millisecond. Why? Because many vertical industries are looking for a very low response time: typically the Google car, the electric car business, industries looking at cars which can be piloted automatically from far away. We also have everything that is called the Internet of Things. In 5G we're looking at one possible waveform, one possible way to absorb all these links, one technology which can absorb the massive number of objects that are communicating. You have to know that these objects in general send Twitter-like communications, so it's not about high data rate; it's about massive numbers of low-data-rate links. And the last point is what we call energy efficiency. Basically we need to find one technology, or a couple of technologies (this is a big discussion at the moment), which can subsume all these factors at the same time. We know that there are going to be a lot of devices, around 50 billion by 2020, and we need to absorb them. We know that a lot of people are looking for higher data rates for many new applications related to e-learning and holographic purposes. We also have a lot of vertical industries, such as the automotive industry and others, looking at being controlled over a network, and all this has to be developed. At the moment, of course, I'm not going to give you the solution, because there are a lot of discussions on many ways to do it. I'm going to give you a generic path, a general view of where mathematical tools can be used to analyze one key technology, which is called MIMO: multiple input, multiple output. I'm trying to explain it to everybody here: I know that the guys in the lab know these things, but it's also for the people in the room who have no clue about it. So, as before, before predicting what can be done, let me go to the past and give you a bit of flavor about wireless technologies, how they have evolved, and how we can capture new things out of the past. You have to know that the mathematical framework of communication started roughly in 1948. That was a very important year for people working in telecommunications; there were two important contributions. One was a contribution by Shannon called "A Mathematical Theory of Communication", published in the Bell System Technical Journal. The idea of Shannon was to say the following: if I had a model of my environment, and at that time that model was the AWGN model, the additive white Gaussian noise model, then I could know the amount of information I can transmit from here to here without error. At the same time Wiener, who published, I don't know why, with a French publisher, his book "Cybernetics: Or Control and Communication in the Animal and the Machine", had the idea of saying the following: well, in general it's very hard to have a model of my environment. Why? Because things are varying; there are a lot of changes.
So basically, what I'll do is, through a process called feedback, try to shape my input so that I can target my output. In that case, of course, you don't have a notion of zero error; you have a notion of outage. At that time the criterion was the MMSE, the minimum mean square error. And through an analog feedback process, you could drive this process exactly to the target that you wanted. So what happened 60 years later, around the year 2000 and after? Well, instead of having one input in our system, and this is exactly what we have in 5G, we have a framework with many, many inputs. And the same thing at the other end: instead of one output, we have multiple outputs. These could be a lot of base stations, a lot of small cells, a lot of femtocells, transmitting to many users. Of course, these boxes can be related to each other or not, and the users can be related to each other or not. If they are all related, then it's a multiple antenna system, meaning many antennas on one device transmitting to many antennas on another device. But it could also be many boxes at home connected to an ADSL line, and many users connected through what we call device-to-device communications. And you can also have feedback from some of the outputs to some of the inputs. Typically, in a classical cellular system, you have a base station and a user in a cell, another base station and another user in its cell, and each user feeds back to its own base station; but of course you also receive signal from the other cell. So there are many ways of doing it. One of the big problems we have to solve, to understand how much MIMO can bring us in 5G, is to understand how much information I can transmit from here to here under the constraints that have been put on 5G. What are those constraints? You have mobility: you need to transport a certain amount of information from here to here within a fraction of time, because the box here moves across time. For people who are familiar with this, this is a channel. And with finite energy. What does finite energy mean? It means the total energy you can use to transport this information from here to here is limited. And you have to know that when you have that constraint, things change radically. I'll give you an example of what we do in communication. In telecom, to transmit information from here to here, there's a fraction of your time, before your transmission, that you dedicate to training: you spend a certain amount of energy to train on and estimate the box here. Once you have estimated the box, you can recover the rest of your data, because you have a good estimate. Now, if you have a lot of inputs, be careful: with a limited amount of energy, this strategy becomes bad, and there is an optimal number of antennas that you should use. Why? If you have too many inputs, you are sending peanuts of energy per input. So once I send my training sequences, I get a very bad estimate of my channel, and when I transmit my data, I make a lot of errors. So the story tells you that there's a trade-off between exploration and exploitation, something people know from classical learning problems.
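To make this trade-off concrete, here is a deliberately crude toy sketch (not a scheme from the talk; the parameters and the way the estimation error is modeled are my own assumptions): a fixed training energy is split across k transmit antennas, so the channel-estimate error grows with k, and the residual error acts as extra interference on each data stream. The resulting sum rate then has an interior optimum in k.

```python
import numpy as np

# Toy model (assumed): fixed training energy E_t split across k antennas,
# so the per-coefficient least-squares estimation error grows like k / E_t.
# The residual estimation error is treated as additional noise on each stream.
sigma2, rho, E_t = 1.0, 10.0, 10.0   # noise power, data SNR, training energy
ks = np.arange(1, 21)                # candidate numbers of transmit antennas

est_err = ks * sigma2 / E_t                     # channel-estimate error variance
sinr = (rho / ks) / (sigma2 + rho * est_err)    # effective per-stream SINR
rate = ks * np.log2(1.0 + sinr)                 # sum rate over the k streams

print("rate vs k:", np.round(rate, 2))
print("best number of antennas:", ks[np.argmax(rate)])
```

With these hypothetical numbers, the maximum sits at a small k: adding antennas first helps through multiplexing, then hurts because each antenna's share of the training energy becomes "peanuts".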
The same holds for bandwidth, be careful: there is also a limit on the bandwidth you should use, because the more bandwidth you have, the more energy you need to estimate all the degrees of freedom you have there. Of course, the scheme I just described for transmitting information is not the optimal one. We know that if, for example, this box changes every time, then for sure you should never send a training sequence; you should do what we call non-coherent communication. What it means is that you use a strategy called on-off keying: either you transmit or you don't, and at the receiver there is an energy detector. So splitting your transmission between training and communication is not necessarily the best way, and determining the best strategy for a given mobility pattern is also an open problem. OK, good. Now I will solve one case, because what I told you is the generic problem in its full generality. I'm going to solve one very specific case: the case where all the inputs are related and all the outputs are related. This is the classical MIMO system, multiple input, multiple output, also called a multiple antenna system. And when all of this is related, I'm going to ask myself: suppose you are in a linear system, so y = Wx + n. Of course, what I showed you before also covers nonlinear media like fiber, but I want to be very specific and go step by step. Suppose it's linear, and suppose for some reason that you know this W: there is no need to send a training sequence; there is a genie telling you the box. Then there are two approaches to determine how much information you can transmit from here to here, and these are results developed around 1995 by a guy called Emre Telatar. The story tells you: if you want to know the information you can transmit from here to here without error, it's the differential entropy of the vector y minus the differential entropy of y knowing x. Since W is known, the only source of uncertainty in the differential entropy of y knowing x is the differential entropy of the noise. Is that clear? If I suppose, because I don't want to go into the details, that x is Gaussian and n is Gaussian (in fact it turns out this is the best thing you can do for transmitting information, but I don't want to spend time on that), then the differential entropy of the vector y is nothing else than log det(πe R_y), and the differential entropy of the vector n is log det(πe R_n), where R_y is the covariance of y and R_n is the covariance of the noise. So the story tells you that the amount of information you can transport through a box, a medium, is only related to the covariance of the received vector and the covariance of the noise. More specifically, it's related to the eigenvalues, the eigenmodes, of the covariances R_y and R_n.
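To collect that chain of identities in symbols (a summary of the argument just given, under the stated assumptions: Gaussian input x with identity covariance, noise covariance R_n = σ²I, and the linear model y = Wx + n that comes next):

```latex
C = h(y) - h(y \mid x)
  = \log\det(\pi e \, R_y) - \log\det(\pi e \, R_n)
  = \log\det\!\left(R_y R_n^{-1}\right)
  = \log\det\!\left(I + \tfrac{1}{\sigma^2}\, W W^{H}\right)
  = \sum_k \log\!\left(1 + \tfrac{\lambda_k}{\sigma^2}\right),
\qquad R_y = W W^{H} + \sigma^2 I ,
```

where the λ_k are the eigenvalues of the Gram matrix WW^H; this is exactly the formula written out in a moment.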
You can also see it from the point of view of Wiener. Wiener would tell you the following, very easily: y is a multidimensional vector that I'm receiving (is that clear for everybody?), and this multidimensional vector spans a certain sphere. I'm transmitting the vector Wx; y is Wx plus a certain noise, and I know W. So the number of vectors Wx that I can transmit is the number of little spheres that I can pack in the big sphere. OK. What is the volume the big sphere occupies? It's governed by the covariance: the volume is proportional to the determinant of the covariance, so the volume of the big sphere is proportional to det R_y, and the volume of the little sphere is proportional to det R_n. How many little spheres can I pack in the big sphere? The ratio. And what is the rate? The log of that ratio. Of course, both formulas are the same. So the idea of Wiener, taken up afterwards by Shannon, tells you that if you minimize what we call the error (you see, y minus Wx is the error), if you minimize the error at each step, you can increase the rate until you hit the noise. Minimizing the mean square error at each step increases your rate until you get exactly the capacity, and we'll see that there's a link between the capacity and the MMSE. Now let's go specifically to the case where y = Wx + n. You can write the formula in a very easy manner: R_y is given by WW^H + σ²I, R_n by σ²I, and the rate is nothing else than log det(I + (1/σ²) WW^H). The superscript H here is the transpose conjugate, and WW^H is the Gram matrix associated with W. So the rate at which you can transmit is related to nothing else than the eigenvalues of this box here. This you have to capture. Okay. Now let's go back to history and try to understand how I am able to compute the rate at which I can transmit through a medium for generic boxes. For people who are more specialist in physics, there's something called the Schrödinger equation, which is quite well known: you have an operator, the Hamiltonian, a wave function φ_i, and the energy level E_i on which this wave function sits, H φ_i = E_i φ_i. For people who are not in the operator business: this is a matrix, this is an eigenvector, and this is an eigenvalue. One of the big problems with the Schrödinger equation is to solve it and find the energy levels of the electrons. That was a big problem, especially for heavy nuclei. It turns out that a guy called Wigner had the smart idea of approaching that equation by asking himself the following. It's quite difficult to solve the Schrödinger equation for one very specific interaction of all the different components. He said: what if I replace the matrix by a random matrix which has the same properties? What does "the same properties" mean? You have the spin, which is minus 1 or plus 1, and it's symmetric, because the interaction this one has with that one is the same as the interaction that one has with this one. So you have zeros on the diagonal, and plus and minus ones flipped randomly with probability one half, and the matrix is symmetric. And instead of asking himself "what are the eigenvalues of this matrix", he asked himself: could I find a result on the counting measure of these eigenvalues?
It turns out that if you take MATLAB, plot the eigenvalues of that matrix, and start increasing the size, it's quite incredible: you get something called the semicircle law. It means that as the size increases, the counting measure, what we call the empirical eigenvalue distribution of that matrix, converges, according to a certain type of convergence that we'll talk about, to this semicircle law. What's very neat is that the eigenvalues are between minus 2 and plus 2; they are not outside the support, you find them all inside. What it means is that there's some kind of averaging effect: the entries can be almost anything, and we'll see to what level they can be anything. In fact, you can weaken the assumptions on this matrix, and it took some time: instead of minus one and plus one, any random entries with zero mean, variance one, plus some constraint on the fourth moment, and every year the moment assumptions are being pushed down further, give this same result. So what does it mean for telecommunications? It means we are able to compute the rate you can transmit from here to here through very generic boxes. We don't need to specify the exact boxes: as long as the transmission environment fulfills those conditions, I can tell you how much rate I can transport in the network. Of course, there is no reason why my box should be symmetric, so this particular result will not apply directly; it's just to give you the idea. What's the idea of the proof, for people interested? It's quite easy; I'm giving you the combinatorics approach. The combinatorics approach is at least straightforward, but then you have to be good at calculus. If you calculate one over n of the trace of the matrix, what is this trace? It's the average of the eigenvalues, so it's nothing else than the integral of λ dF_n(λ), where F_n is the empirical eigenvalue distribution, the counting measure on the λ_i. What is one over n of the trace of H squared? The average of the squared eigenvalues. What is one over n of the trace of H to the power k? The integral of λ^k dF_n(λ). So if you're smart enough, you first calculate the expectation of all these terms as n goes to infinity and see if you get some kind of generic terms. If you do, then from the moments you'll be able to recover the distribution, up to a condition called the Carleman condition. Then you take off the expectation and do the calculus again to see if you have a stronger result in terms of convergence. It turns out that the numbers which come out are the Catalan numbers. For this specific case, and Wigner of course used this result, if you calculate one over n of the trace of H to the power 2k, you can show that you get the k-th Catalan number, and the odd moments, 2k plus 1, vanish. And it turns out that the only distribution which has those moments is the semicircle law. So that's one way of proving it. Now, if you want to work more in the field, what you do is take any other matrix, Hermitian or symmetric, calculate the moments, and if you're good at that calculus, prove convergence of the terms and identify the limit of the empirical eigenvalue distribution.
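The MATLAB experiment mentioned above is easy to reproduce; here is a minimal numpy sketch (sizes and bin counts are my own choices), building the symmetric ±1 matrix, scaling by 1/√n, and overlaying the semicircle density:

```python
import numpy as np
import matplotlib.pyplot as plt

n = 2000
# Symmetric matrix with +/-1 entries above the diagonal, zeros on it,
# scaled by 1/sqrt(n) so the spectrum converges to the semicircle on [-2, 2].
signs = np.sign(np.random.randn(n, n))
W = np.triu(signs, k=1)
W = (W + W.T) / np.sqrt(n)

eig = np.linalg.eigvalsh(W)

x = np.linspace(-2, 2, 400)
plt.hist(eig, bins=60, density=True, alpha=0.5)
plt.plot(x, np.sqrt(4 - x**2) / (2 * np.pi))   # semicircle density
plt.show()
```

Swapping the ±1 entries for any zero-mean, variance-one distribution leaves the histogram essentially unchanged, which is the distribution insensitivity discussed below.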
Of course, for that moment approach, you need to be very, very good at the calculus of moments. You have to know that in 1958 this is exactly the result that was proven by Wigner. It was not restricted to minus 1 and plus 1: if you took any standard Wigner matrix, with zero-mean, variance-one entries and bounded fourth moment, the result was true. And since that time things have moved further in our community, showing that you need fewer constraints, down to moments of order 2 plus epsilon, to have this convergence. Another case, and you'll see how things are built: if you take a matrix which is not symmetric this time, any n times n matrix whose entries you fill randomly with minus 1 and plus 1, and you look at how the empirical eigenvalue distribution is distributed, you'll see that it is uniformly distributed on the unit circle, on the unit disk, sorry: it fills the unit disk uniformly. In fact, I didn't say it before, you have almost sure convergence of the empirical eigenvalue distribution towards this limit. Same thing here: many successive results showing fewer and fewer constraints. You have to know that one of the guys who proved this is called Girko. For people who know Girko a bit, it's quite surprising: this guy is able to prove lots of theorems, but he's not so well accepted by the mathematical community, basically because his proofs do not follow the classical path the community uses. He's from Ukraine, and by the way, he still doesn't have a permanent position; he goes from place to place on short-term positions. And because some of his papers were not accepted, he even opened his own journal to publish them. So if you have a problem getting your papers accepted, just create your own journal and then ask some friends to publish there; I think that's one way. Now, those were asymptotic results, and as you know, we work in the finite realm. It turns out there are a lot of interesting properties of random matrices that help. The first is what we call distribution insensitivity, meaning the asymptotic distribution does not depend on the distribution of the independent entries. You can take this quite broadly, and it's very good for us, because in channel modeling we want the fewest possible assumptions on the environment. The second is ergodicity: the eigenvalue histogram of one realisation converges almost surely to the asymptotic eigenvalue distribution; almost sure convergence is obtained in many cases, and I'll show you at which level. The speed of convergence also tends to be very good: through what are called central limit theorems, you can show that the fluctuations die out quickly as the dimension grows, so the convergence turns out to be not so bad. And if you decide one day not to use the asymptotic approach but the finite approach, it turns out to be a real mess; I've tried it many times. If you try to analyse things in the finite case, outside the Gaussian case, it's very hard to calculate any kind of distribution; things get very messy. You ask: calculate the distribution in which sense? Meaning having an explicit form: an explicit form of the distribution when it's not Gaussian.
So typically you ask me: in the finite case, give me the empirical eigenvalue distribution, or the eigenvalue distribution, of this matrix. For a specific case which is not Gaussian, it tends to be very messy, whereas in the asymptotic setting you don't need Gaussianity; you do whatever you want and you still get the result. Yes, there are some integrable-system issues lurking behind: already in the Gaussian case you get into things like Laguerre polynomials and it gets very, very messy, and if it's not Gaussian, it's even worse. That's what I'm saying. And in fact you don't necessarily get the distribution; in general you get the Fourier transform of the distribution, which is already a problem, because then you have to go back, and it's really tedious. Okay. Now let me take this case, which is the one of interest for communication. Remember, in the initial problem I showed you, we were interested in a case which was WW^H; that was the case I was interested in. I was not interested in a Hamiltonian matrix; I was interested in the case where you have a matrix times another one, with k here and n there, meaning you can have more inputs than outputs. In the case where the entries are Gaussian, zero mean, variance one, this is called a Wishart matrix. In the general sense, it's nothing else than the Gram matrix associated with the H that I'm writing here. You have to know that in the case where k and n are big enough, but the ratio tends to a constant α, so k is big, n is big, but k over n tends to a constant, we have an explicit form of the limiting distribution, and it's called the Marchenko-Pastur law, given by an explicit formula. What's interesting is not the formula itself; there are two things about it which are interesting. The first is that, in the end, the formula depends only on α, the ratio between k and n. The second is that this distribution has compact support, lower bounded by (1 − √α)² and upper bounded by (1 + √α)². To give you an example, this is the kind of shape you get. Same thing as before: when the dimension of your matrix increases but the ratio tends to a constant, the empirical eigenvalue distribution converges, and you can show the convergence is almost sure, towards the Marchenko-Pastur law. So very neat, and you have an explicit form of that.
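Here is a minimal numpy sketch of that convergence (dimensions and α are my own choices; ±1 entries are used deliberately, to illustrate the distribution insensitivity again): the histogram of the Gram-matrix eigenvalues is compared against the Marchenko-Pastur density.

```python
import numpy as np
import matplotlib.pyplot as plt

n, alpha = 3000, 0.5
k = int(alpha * n)

X = np.sign(np.random.randn(n, k))   # +/-1 entries work as well as Gaussian ones
S = (X.T @ X) / n                    # k x k Gram matrix, entries of X scaled by 1/sqrt(n)
eig = np.linalg.eigvalsh(S)

a, b = (1 - np.sqrt(alpha))**2, (1 + np.sqrt(alpha))**2
x = np.linspace(a + 1e-6, b - 1e-6, 400)
mp = np.sqrt((b - x) * (x - a)) / (2 * np.pi * alpha * x)  # Marchenko-Pastur density

plt.hist(eig, bins=60, density=True, alpha=0.5)
plt.plot(x, mp)
plt.show()
```

All eigenvalues land inside [a, b], which is exactly the compact-support statement above.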
Of course, be careful; some people ask about numerics. When you multiply this matrix, which is fully IID, by its conjugate transpose, it does not give you exactly the identity matrix, although if you implement it naively, that's what you may see, so you have to be careful. Many times when I ran my programs, you have to know, I made a lot of mistakes: if you store the matrix first and then compute the eigenvalues, you'll get ones. Why? This is an n times k matrix with IID elements, zero mean, variance one over n; take minus one and plus one, flip them, it's not a problem. So when you take the products, on the diagonal you get ones, and off the diagonal you get terms which go to zero asymptotically, so the product looks like the identity. But you have to know that although the off-diagonal terms are individually small, there are n² of them, going to zero at the speed of one over n², and their sum does not go to zero. So numerically, if you implement a system which is big enough, you have to be careful, because rounding can give you results which are wrong relative to the mathematics. This gave me a headache a couple of years ago: I thought my formulas were wrong, but it was the processing, when I was running the thing, which was wrong. It depends how you write your code: if you store things first, the rounding effect puts the small terms to zero. It's a trick you have to be familiar with. Someone asks: and a particular case of Marchenko-Pastur is the semicircle, right, when α equals one? Yes: if α equals one, the matrix is square, and it still gives you a Marchenko-Pastur law, but the formula simplifies. The lower edge (1 − √α)² becomes zero and the upper edge becomes four, so you get something supported between zero and four; in fact it's the distribution of the square of a semicircular variable. OK, so let us continue. Of course, to solve the more complicated cases, people do not use the moment method I showed you before; they use something called the Cauchy-Stieltjes transform. What you call it depends on whether you are on the European side or the American side: on the American side they always call it the Stieltjes transform, and in Europe they call it the Cauchy transform. There is always a fight, so let's call it the Cauchy-Stieltjes transform here. So what is the Cauchy-Stieltjes transform? Mathematically, if you have a measure μ, the Cauchy-Stieltjes transform is nothing else than the integral of 1/(t − z) dμ(t). That's the definition. And of course, since it's a transform, there is an inverse: by taking the imaginary part of the Cauchy-Stieltjes transform near the real axis, you can get back to the measure. This sounds very abstract, but it turns out that the Cauchy-Stieltjes transform, because of the 1/(t − z), is essentially a moment generating function: if you expand it in powers of 1/z, the moments appear as the coefficients. So the story tells you: instead of calculating the moments one at a time, it's much better to calculate the Cauchy-Stieltjes transform in one shot, and then you get the distribution; you're not going to calculate all the moments. That's the first thing you have to capture. Now, the second thing, which is very important for telecommunications, as we'll see, is that the Cauchy-Stieltjes transform has a concrete meaning. Mathematically, not telecommunication-wise, the meaning is what we call the trace of the resolvent.
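Before that, a quick numerical illustration of the definition and the inversion formula just stated (the discretization and the choice of the semicircle as test measure are mine): compute m(x + iε) by direct integration and check that Im m / π recovers the density.

```python
import numpy as np

# Stieltjes inversion, numerically: m(z) = integral of d mu(t) / (t - z),
# and density(x) = lim_{eps -> 0} Im m(x + i*eps) / pi.
t = np.linspace(-2, 2, 4001)
mu = np.sqrt(4 - t**2) / (2 * np.pi)       # semicircle density as test measure
dt = t[1] - t[0]

eps = 1e-2
for x in (-1.5, 0.0, 1.5):
    m = np.sum(mu / (t - (x + 1j * eps))) * dt
    print(x, m.imag / np.pi, np.sqrt(4 - x**2) / (2 * np.pi))  # recovered vs true
```

As ε shrinks, the recovered values converge to the true density; this is exactly the inverse transform mentioned above.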
So what is a resolvent? You take a matrix W and you compute (W − zI)^{-1}: the matrix minus a perturbation z, inverted. I'm saying that in telecommunications this is very important, because people working in telecommunications will see very quickly that this is related to the MMSE receiver. But without getting into the MMSE receiver yet, just talk about the resolvent: a matrix minus a perturbation. Now, if you take the empirical eigenvalue distribution F_n, the counting measure, and you write the integral of dF_n(x)/(x − z), you get one over n of the trace of the resolvent. What it means is that the Cauchy-Stieltjes transform of the empirical eigenvalue distribution is nothing else than one over n of the trace of the resolvent. So to get the eigenvalue distribution of a matrix rapidly, you only need to compute one over n of the trace of this resolvent and see how it behaves. And that's how they do it. That's the first trick you have to know. And of course, once you have this, why would you invert it? Because, remember, the transform has an inverse, so you can go back to the measure. We'll see that in communication we never actually need to invert it. Why? Because the transform itself has a very specific meaning: it is what we call the SINR at the output of an MMSE receiver, when z is minus sigma squared, the noise variance. Good. Now, without getting into more results, let me go from the fifties to something that happened in the eighties in the field of random matrices, where we started to get more complicated results. Why more complicated? Because everything I gave you at the beginning was always zero mean, variance one, things like that, and there are cases where the mean is not zero, and it gets more complicated. The question one can ask, and this is a guy called Voiculescu, who was a professor at Berkeley in the eighties, who opened the field, is the following. Suppose you have a matrix C which is equal to A plus B. Typically, remember, you had a W which is your zero-mean random matrix, to which you add a deterministic component, because you need something with non-zero mean. And you ask yourself: if I know the eigenvalues of A and I know the eigenvalues of B, can I know the eigenvalues of C? It turns out, of course, that in general it's not possible. Why? Because you also need to know the eigenstructure: depending on where your matrices point their energy, if they point in the same direction you have a constructive effect, and if they point in other directions something else happens. Now, it turns out that there are cases, which Voiculescu exhibited, where if you calculate the moments of C, one over n of the trace of C^k, which are related, remember, to the eigenvalues of C, you get something which depends only on the moments of A and the moments of B, not on the joint distribution, only on the marginals. When you get this effect, what you have is something called freeness: the two matrices A and B have decoupled eigenvector structures; their eigenstructures do not depend on each other. And you can have this effect not only for the sum, but also for the product, A times B.
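A small experiment makes this concrete (a sketch; the sizes, the ±1 spectra, and the use of real orthogonal rather than complex unitary rotations are my own choices): take A and B with identical ±1 spectra. Aligned, A + B = 2A has eigenvalues only at ±2; after conjugating B by a Haar-distributed rotation, the spectrum of A + UBUᵀ stabilizes to a law (here the arcsine law on (−2, 2)) that depends only on the two marginal spectra.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import ortho_group   # Haar-distributed orthogonal matrices

n = 1000
d = np.concatenate([np.ones(n // 2), -np.ones(n // 2)])
A = np.diag(d)
B = np.diag(d)                        # same +/-1 spectrum as A

aligned = np.linalg.eigvalsh(A + B)   # fully aligned eigenbases: only +/-2

U = ortho_group.rvs(n)                # random rotation decouples the eigenbases
free = np.linalg.eigvalsh(A + U @ B @ U.T)

plt.hist(free, bins=60, density=True) # approaches the arcsine law on (-2, 2)
plt.show()
```

The rotated histogram is the free additive convolution of the two spectra; no information beyond the marginals was used.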
Now, of course, I don't want to get into the details of this theory, but be careful: it has nothing to do with independence. You can always have two matrices which you generate independently but which point in certain directions, because that's their distribution. Freeness is exactly the property that all the moments of the matrix C, or of more complicated combinations, depend only on the marginals. So how can you make matrices free? It's very easy: decouple the eigenstructure. If you take a matrix A_n, deterministic or anything else, and a matrix B_n, and they're not free, just conjugate one of them by a Haar-distributed unitary matrix. If you do that, the two matrices become asymptotically free, because the unitary invariance decouples the eigenstructures, and I am able, for example, to calculate the eigenvalue distribution of A_n + Θ B_n Θ^H even if I was not able to calculate it for A_n + B_n. Now, there are a lot of applications of this theory, because once you have it, you can calculate, and we'll see some examples, concatenated products of matrices, sums of many matrices: matrices with more and more structure, which are related to our problems of communication, and we'll see the cases which are of interest for us. OK, let me now go to communication, because all this theory I've been showing, and of course there are many more results, has applications. This, for people who are not in the communication realm, is a typical MIMO system: you have a transmitter with many antennas (sorry that it's on the left) transmitting towards a receiver with many antennas. The input-output relation is exactly what I showed you before: you have a vector y which is the channel applied to what you transmitted, plus noise. I'll jump over this. You have to know that the basic models people use in communication assume that the matrix here is IID zero mean; we'll see there are many more cases, but many applications in communication, and this was the starting point of the MIMO hype, considered the entries of the matrix to be IID Gaussian, zero mean, variance one. The main reason, and we heard about this this morning, is the maximum entropy principle: you assume you know nothing about where the transmission took place, which is basically what happens when you don't do measurements. You have no clue where the transmission took place, but you know that your channel has finite energy, because of course the energy does not grow. Then it's very simple to apply the maximum entropy principle to show that the distribution you should assign is a Gaussian distribution with IID entries. That's one of the reasons this IID model became such a big hype: it's the one which makes the fewest assumptions about your environment. Of course, the more assumptions you have about your environment, meaning you know there's a wall, there are chairs, there are people, the more you can do, and this is a big framework in maximum entropy methods.
There are a lot of maximum-entropy engineering methods where you add more and more constraints on your problem to find the distribution which is matched to those constraints. But let's forget about this. The most important thing for me now is to suppose that W here is IID, zero mean, Gaussian. And remember, this was the rate: the rate is nothing else than one over n times the sum over k of log(1 + λ_k/σ²), with λ_k the eigenvalues. Clear? By the way, this is why people believe a lot in MIMO: because you sum up the rates, you're able to go beyond the classical Shannon rate. For people not familiar with the MIMO hype, this is why people got so excited in 5G, but also in the latest 4G releases: the more antennas you add, the more of what we call spatial multiplexing gain you get. But of course, as you can see, everything depends on the λ's: if a λ is equal to zero, it doesn't give you a new mode. Now, remember: when n goes to infinity and k over n goes to a constant, this sum is nothing else than the integral of log(1 + t/σ²) against the limiting eigenvalue distribution, the limit of the empirical eigenvalue distribution. What's exciting about this formula is that if you differentiate the capacity with respect to 1/σ², it turns out that the derivative of the rate is directly related to the mathematical object called the Cauchy-Stieltjes transform. What I'm saying is: the derivative of the rate of a MIMO communication system is a Cauchy-Stieltjes transform. So deriving the Cauchy-Stieltjes transform of many, many MIMO models is of big interest because of this tight link between the two. Second point now. Someone asks: before, there was a statement about finite-energy knowledge; what exactly do you mean by that? Yes: you see, my assumption, what I did, is two things. I assume that I know nothing, and the only other thing I know is that the expectation of the energy is this. So if you want to derive a rate for an environment about which you know nothing, except that it has finite energy, then the model which makes the fewest assumptions, from a maximum entropy principle, is the IID Gaussian model. This is one of the reasons for the centrality of the IID Gaussian model in all the papers you see in our community. Another question: the energy here is a square; what does this square correspond to physically? So, the H_ij's are these links: H_11, H_12, and so on. The link here is related to the environment; for the moment it's just a link. The value that you measure, which depends among other things on the distance, is related to the path loss: the square of the link coefficient is the energy, the path loss that you get. Am I clear? If not, just ask me another question. But OK.
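Coming back to the rate formula: that asymptotic expression is easy to sanity-check numerically. A minimal sketch (the dimensions, SNR, and the choice α = 1 are assumptions of mine), comparing the finite-size per-antenna rate with the Marchenko-Pastur integral:

```python
import numpy as np

n = k = 400                  # alpha = k/n = 1
sigma2 = 0.1
W = np.random.randn(n, k) / np.sqrt(n)      # i.i.d. entries, variance 1/n
lam = np.linalg.eigvalsh(W @ W.T)

# Per-antenna rate: (1/n) log2 det(I + WW^H/sigma2) = (1/n) sum log2(1 + lam/sigma2)
rate = np.mean(np.log2(1.0 + lam / sigma2))

# Asymptotic limit: integrate against the Marchenko-Pastur density for alpha = 1.
t = np.linspace(1e-6, 4, 200000)
mp = np.sqrt(t * (4 - t)) / (2 * np.pi * t)
asym = np.sum(np.log2(1.0 + t / sigma2) * mp) * (t[1] - t[0])

print(rate, asym)            # the two values agree closely already at n = 400
```

Even at moderate dimensions the random rate sits right on top of its deterministic limit, which is the ergodicity property from before at work.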
So, what I did here, for your information: I took the Cauchy-Stieltjes transform of the Marchenko-Pastur law and calculated it. If you do that, you can get an explicit formula for the rate. You see? You can get an explicit formula for the rate of a MIMO system, and the rate is given in terms of the ratio k over n, the number of transmitting to receiving antennas, and it depends only on the SNR, because this γ is the SNR. The exact expression is not important; the point is that you're able to compute the rate of a MIMO system in a very efficient manner. OK? Now, this is just an example; reality, and we've been doing measurements, is much more complicated. In general, given that you have buildings, you have what we call scatterers, you have models where, from the transmitter to the receiver, the wave undergoes some reflections before it arrives: it bounces on something here, then it bounces again, and then it arrives. The models that people use are what we call the Kronecker model, or more complicated ones, which tell you that the matrix is not IID zero mean: there is a matrix before and a matrix after, so your channel is the product of three matrices. Why? Because you have a scattering effect at the transmitter side, then the wave goes through many things, and the receiver sees its own scatterers. What's interesting is that with the same techniques, though it's no longer the Marchenko-Pastur law, you can derive the rate explicitly, and you can also get some very neat information on something called the outage capacity: you can derive a central limit theorem telling you that the rate, as the number of antennas increases, concentrates around a certain value with a certain fluctuation; this is called the outage. OK? And to give you an example, we can jump over the details, but we did some measurements a couple of years ago at 2.1 GHz, with 100 MHz of bandwidth, with an antenna with eight elements at the transmitter and 32 elements at the receiver. But be careful about what I'm going to show you: we used only eight of the elements, so it's an 8-by-8 MIMO system. For people who are not familiar, this is a patch array antenna, with the elements arranged in an eight-by-four grid; we used only a subset of them here and here. And typically what you do: these are the receiving antennas, this is the transmitting antenna that you move, and you do measurements in different environments, different scenarios, indoor, urban, atrium, things like that, and you capture exactly the different channels. What's interesting is that by calibrating the number of scatterers in the model, calculating the cumulative distribution of the rate, and comparing with the formulas, you get a close match between the two. And it matches exactly what you see there. OK?
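The Kronecker model just mentioned is also easy to play with numerically; a sketch (the exponential correlation profile and every parameter are assumptions of mine, not the measured model from the talk): sandwich an i.i.d. matrix between square roots of receive and transmit correlation matrices, and compare the resulting rate with the uncorrelated case.

```python
import numpy as np

n = k = 200
sigma2 = 0.1

def corr(m, rho):
    # Exponential correlation profile (a common toy choice, assumed here).
    idx = np.arange(m)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def sqrtm_psd(M):
    # Symmetric square root of a positive semidefinite matrix.
    vals, vecs = np.linalg.eigh(M)
    return vecs @ np.diag(np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

G = np.random.randn(n, k) / np.sqrt(n)                        # i.i.d. inner matrix
W = sqrtm_psd(corr(n, 0.7)) @ G @ sqrtm_psd(corr(k, 0.7))     # Kronecker channel

rate_kron = np.mean(np.log2(1 + np.linalg.eigvalsh(W @ W.T) / sigma2))
rate_iid = np.mean(np.log2(1 + np.linalg.eigvalsh(G @ G.T) / sigma2))
print(rate_kron, rate_iid)   # correlation typically lowers the rate in this toy setup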
Other point; time is running. Let's go back now to the receiver. Typically, for people who are not familiar with signal processing, let me take a generic model where you have a vector y equal to Ws plus n. W is my matrix, which is n times k, the emitted vector s is k times 1, and n is additive white Gaussian noise. In general, when you transmit, your goal is to retrieve the emitted signal: a signal has been transmitted, and your goal is to recover it. This vector has several components; I split them into s1 and the rest of the vector, x, and correspondingly I split W into its first column, u, and the remaining columns, the matrix U. OK? Now, recovering s1 is not so simple. I have y = Ws + n, which I write as u s1 + Ux + n, and my goal is to retrieve s1, which is what the first antenna is transmitting. Of course, for people who are interested, you could minimize over y minus Ws and test all the cases: this is called the maximum likelihood receiver. In general it's very complex, because it depends on the constellation of s: the bigger the constellation and the more antennas you have, the more complicated it gets. With BPSK and n antennas, there are 2 to the power n cases to check. So in practice you try to build a simpler receiver that tends to recover the stream. One classical receiver says: if you're interested in s1, consider all the rest as interference. This is called interference plus noise: n is the noise, and the other streams arriving at you are the interference, so you write y = u s1 + n′, where n′ collects interference plus noise. Now, the new noise n′ is not white. Why not? If you calculate the expectation of n′ n′^H, it has a certain structure, because of the interference due to the streams. So the first thing any good signal processing engineer does is whiten the noise; they always like to whiten the noise. You take the covariance, you take the square root of the covariance, and you multiply it here and here: that gives you a new signature plus b, and one of the good things about b is that it is white. I'll explain why whiteness matters: it goes back to Wiener, because the idea of Wiener was that in a white environment the matched filter is optimal, and everybody likes to go back to the matched filter as the processor when the noise is white. So the first thing you do is whiten the noise. Going back to my model, this is what you get: a new signature times s1, plus b. And this theory, going back to Wiener, says that with white noise the SINR is maximized by the matched filter. So if I call the new signature j, I just apply the matched filter, and I get this. At the end, if I look at all the different steps I did, the linear receiver which maximizes the SINR is nothing else than what we call the MMSE receiver, the minimum mean square error receiver. This is what I put here. Of course, for people familiar with this, there are the biased and unbiased MMSE receivers; they are the same up to a scalar constant, and the bias is not a big issue, but you have to be careful that one is written with the full W and one with the column extracted. This has implementation consequences, depending on whether you take out the column or always keep the same inverse; people tend to implement the full-W version because then you don't need to recompute the inverse every time.
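The chain "whiten the noise, then matched-filter, and you land on the MMSE receiver" can be checked in a few lines; a sketch (dimensions and noise level are assumed, and Cholesky is used as the whitening square root):

```python
import numpy as np

n, k, sigma2 = 8, 4, 0.5
W = np.random.randn(n, k)
u, U = W[:, 0], W[:, 1:]

# Direct MMSE-style filter for stream 1: the other streams are interference.
R = U @ U.T + sigma2 * np.eye(n)      # interference-plus-noise covariance
g_mmse = np.linalg.solve(R, u)        # direction of (UU^H + sigma2 I)^{-1} u

# Whiten with a square root of R, then apply the matched filter.
L = np.linalg.cholesky(R)
Linv = np.linalg.inv(L)
g_wmf = Linv.T @ (Linv @ u)           # = R^{-1} u: the same receiver

print(np.allclose(g_mmse, g_wmf))     # True: whitening + matched filter = MMSE
```

The two filters coincide exactly, because L⁻ᵀL⁻¹ is just R⁻¹ written through its square root.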
My goal now is to calculate the SINR. If I look at the SINR at the output, it's my signal of interest divided by the interference plus noise, and it is a quadratic form: everybody sees the quadratic form here? It is u^H (σ²I + UU^H)^{-1} u, with U the other columns of my matrix. Good. Now, the SINR is a quadratic form, and it turns out, and it's not very difficult to show, that if this vector u is independent of the other signatures, then this quadratic form is close to a normalized trace: if u is independent of A, the quadratic form is close to one over n of the trace of A. This is called the trace lemma. So what it means is that my SINR is nothing else than one over n of the trace of something, and for people who followed me previously, that something is a resolvent: one over n of the trace of a resolvent. So the SINR in communication is exactly a Cauchy-Stieltjes transform. Very spectacular. The Cauchy-Stieltjes transform has a very specific meaning in telecommunications: it is the SINR at the output of an MMSE receiver of a MIMO system, a multiple antenna system. That's why a lot of people have spent time using random matrices to calculate the Cauchy-Stieltjes transform of many, many models: it gives you exactly the performance of the MMSE receiver. Now, to summarize: the SINR at the output of the MMSE receiver is the Cauchy-Stieltjes transform at the point minus sigma squared; the derivative of the capacity is also given by the Cauchy-Stieltjes transform; so the derivative of the capacity is directly linked to the SINR at the output of the MMSE receiver. You have to know that these links were brought out around 2005 in several papers in our community, showing that there is a deep connection between Shannon and Wiener: every time you're able to calculate the SINR, you're able to calculate the capacity, because of the strong link between the two. And effectively, because of that, there have been many generalizations of what I've been showing, even rethinking the way what we call the water-filling formula is done. As you know, the classical water-filling formula in communication has one big problem: it assumes Gaussian signaling. Here, for the SINR, I didn't assume any Gaussian signaling. And because of the link, you can do much more than capacity: you can show that the derivative of the mutual information is related to an MMSE as well, and people have derived new versions of water-filling, called mercury water-filling, which take into account constellations which are not Gaussian. Now, why is there such a deep link between the two? I don't know; that's the mathematics. But if you read the paper of Shannon, in any case, it turns out that Wiener had a big, big impact on the work of Shannon; he's even one of the people who re-read and corrected Shannon's paper in many respects, telling him: look at this point, and at what you're going to do.
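Before moving on, that trace-lemma identity is easy to verify numerically; a sketch (the dimensions, the ratio α, and σ² are my own choices): compare the exact MMSE output SINR for stream 1 with one over n times the trace of the resolvent at −σ².

```python
import numpy as np

n, sigma2 = 1000, 0.5
k = n // 2                                # alpha = k/n = 1/2
W = np.random.randn(n, k) / np.sqrt(n)    # i.i.d. signatures, variance 1/n
u, U = W[:, 0], W[:, 1:]

# Exact MMSE output SINR for stream 1: a quadratic form in u.
R = U @ U.T + sigma2 * np.eye(n)
sinr = u @ np.linalg.solve(R, u)

# Trace lemma: u is independent of U, so the quadratic form concentrates
# around (1/n) tr (UU^H + sigma2 I)^{-1}, i.e. a Stieltjes transform at -sigma2.
approx = np.trace(np.linalg.inv(R)) / n

print(sinr, approx)   # the two numbers are close for large n
```

This is the "SINR is a Cauchy-Stieltjes transform" statement, checked in one random realisation.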
Okay, I think I took much more time than expected, so I'll jump ahead; by the way, for people who are interested, there are a lot of other interesting things you can do. You can also do implementations based on random matrices, with what we call multi-stage precoders and detectors; I'll jump over this and finish with the following. There is a movie that you should see. It's a very bad movie, by the way, called Proof. In fact, I was flying to Singapore to give a talk on random matrices and I was watching this movie, and what you have to know is that Anthony Hopkins plays a great mathematician. He has a nice daughter, played by Gwyneth Paltrow, and there is the lover; let's say there's always a lover in these movies. So Anthony Hopkins, a great mathematician at the university, dies in the movie, and then his daughter shows a proof, claiming that she's the one who made it. Of course, nobody believes her: they think it's the father who made the proof before dying, and that she just took the papers and showed them. So there's a whole battle where the lover starts trusting her, then doesn't trust her, then they fight, and so on. But at the end of the movie they bring a lot of mathematicians around a table to decide whether it's true or not, and the resolution turns out to be random matrices and free probability theory: they show that many of the results used in the proof were results obtained after the death of the father, in '83, '84, related to Voiculescu. So I strongly encourage you to see it. It's a very bad movie, but it's the only movie with a citation on random matrices, and you know, if your discipline makes it to Hollywood, it means it's good; people are going to give you some funding for that. Okay. What I showed you here, of course, I did because we're in the mathematical and algorithmic sciences lab, oriented toward communication, but there are other applications of random matrices outside the field of communication. One of the big ones: it turns out that Potters and Bouchaud made a lot of money, in any case, using random matrices for finance, and they were bought by UBS, using exactly some of these random matrix tools; and other disciplines use them too. I think I'll finish here, I've taken too much time. Any questions? Should we go back to this? Someone asks: apparently it is well known, but there is a remark by Wiener about white noise, that the matched filter becomes best; what exactly is the statement? So, you mean the signal to interference plus noise ratio, is that it? No, you're asking: why do we whiten here, why keep the property of white noise? Okay, two things. If you have a system y = Ws + n, and your goal is to maximize what we call the SINR, then the linear receiver which maximizes the signal to interference plus noise ratio is the MMSE receiver. Linear, be careful: nonlinear is another story. Among linear receivers, linear structures, meaning I only do linear operations on the observation, no nonlinear operations. What the community does in general, and what I showed you here is the decomposition into the individual steps, is just apply the final formula directly.
But the idea behind it, of course, is that when you write it this way, you first whiten the noise; that's exactly what the MMSE receiver is going to do, it's going to whiten the noise. You whiten it, meaning what? Meaning you make an operation: here the noise has a certain structure, and the first thing you do is apply to your vector y an operation which whitens the noise, which is the square root of its covariance. See, if I multiply by the square root of this, well, not exactly the square root: I'm taking Λ^(−1/2) Q^H y from the eigendecomposition, then the new noise I have is white. Yes, okay. You see? And once you have whitened, you have a system which is totally different: y tilde = j s1 + b. Then there's a result, very easy to show, it's a Cauchy-Schwarz inequality, known from the radar community: if you have y tilde = j s1 + b with b white, the linear processor which maximizes the signal to noise ratio (because now it's only noise) is the matched filter, the "filtre adapté". If you decompose it this way you see it, but if you don't want to decompose, it's also easy to show directly that the receiver which maximizes the signal to interference plus noise ratio is the MMSE receiver. Minimum mean square error, yes. Other questions? Yes. Another question: there were two quite distinct movements in your talk. First the future, what will occur, 5G, et cetera; then communication and the Gaussian matrices and so on. Where is the link between the two? Of course, there is this multi-channel operator and so on, but where are the difficulties, what do people have to solve to get to 5G? Okay, good question. So first, there are a lot of things in 5G. One of the things I inserted here is a technology called MIMO. Let me take the first figure, the one about the 5G requirements. (I think I'm going to have a hell of a time going back; good.) So, of course, there are a lot of things in 5G. One of the things I've been looking at is what we call the MIMO technology, meaning having base stations with multiple antennas. At the moment, in the 5G realm, people are thinking of what we call massive MIMO, meaning arrays on the order of 128 to 400 antennas: base stations with many, many antennas. Now, of course, there are many other technologies, but on this specific slide, that's one option. When you start adding a lot of antennas, there are many problems. First of all, you don't get more physical space, so the antennas become closer together, and depending on the frequency, you get more correlation. So the IID assumption I made does not hold: the links between the different antennas are not IID anymore. That's the first thing. The distribution is also not Gaussian; it's something else, and we don't know what, because depending on the frequency band you go to, millimeter wave or lower, the channel has new properties, and that will change the results too.
So the randomness in the matrix I'm showing here becomes, first of all, a new kind of randomness, not IID. You also have line-of-sight effects, line of sight meaning that you see your destination. So it is not zero mean; you cannot have a link which is zero mean on average, because you are always seeing the other end. You have a deterministic component, hence a non-zero mean. So basically you start working on matrices with new properties for the MIMO case. That's the first thing. Then you have to know that everything I did until now is the case where, for some reason, you know everything; remember, you know the W. In practice you need estimation protocols, and when you run such a protocol, the distribution of your matrix changes too, because the way you estimate means the links are no longer IID. Remember, when I derived the rate, I supposed that everything was known. So you need sophisticated channel estimation techniques which take this into account, and we have derived some results there. But that is still only for the case I presented here. Also, to keep things short, I only treated what we call point-to-point MIMO, meaning one base station with many antennas transmitting to one receiver with many antennas. In communication, you never transmit from just one guy to one other guy; it is one guy to many guys. That is what we call multi-user MIMO, for which the formula is a bit different: you have a capacity region, for which you have to derive other things. And there, too, you get new problems whenever many users transmit at once; things change again. And then, on the receiver structure, the MMSE I presented is not the only one, because people have been working on non-linear techniques, successive interference cancellation, and many other more sophisticated techniques. But all of that is still only MIMO. Why am I talking about MIMO? Because it is one of the technologies which may solve one of the points here, not all of them; there are many other issues. Other questions? Yes.

Question: In information theory, the books of course talk a lot about the problem with transmitters, et cetera, but they also talk a lot about the coding problem. How should you code? For decades it was a big deal to construct codes which come near the Shannon capacity. How do these issues affect things, and what kinds of codes are used?

Okay, that's a very good question. I didn't talk about the coding part; I just talked about what we call, in our discipline, performance, what performance you can get. What you're talking about is the constructive way of achieving that performance. In the point-to-point, single-antenna setting, as you know since you've been looking at coding, France made the headlines with the invention of turbo codes, which were supposed to achieve capacity and are indeed not far from it. You have to know that, because of the many patents placed on them, there was a revival of another family of codes, called LDPC codes.
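As an illustration of those two departures from the IID model, here is a hedged sketch of one standard way to build such a channel, a Kronecker-correlated Rayleigh part plus a deterministic line-of-sight component; the array sizes, correlation coefficients, angles, and Rician K-factor below are made-up illustration values, not parameters from the talk.

```python
import numpy as np

rng = np.random.default_rng(2)
n_rx, n_tx = 64, 8                     # "massive" array sizes, illustrative only
kappa = 3.0                            # Rician K-factor: LOS vs scattered power

def exp_corr(n, rho):
    """Exponential correlation matrix: neighboring antennas correlate by rho."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

R_rx, R_tx = exp_corr(n_rx, 0.7), exp_corr(n_tx, 0.5)

# Scattered (non-line-of-sight) part: Kronecker-correlated complex Gaussian
G = (rng.normal(size=(n_rx, n_tx)) +
     1j * rng.normal(size=(n_rx, n_tx))) / np.sqrt(2)
H_nlos = np.linalg.cholesky(R_rx) @ G @ np.linalg.cholesky(R_tx).T

# Deterministic line-of-sight part: outer product of array steering vectors
a_rx = np.exp(1j * np.pi * np.arange(n_rx) * np.sin(0.3))   # arbitrary angles
a_tx = np.exp(1j * np.pi * np.arange(n_tx) * np.sin(0.1))
H_los = np.outer(a_rx, a_tx.conj())

# Rician channel: non-zero mean and correlated entries -- no longer IID Gaussian
H = np.sqrt(kappa / (kappa + 1)) * H_los + np.sqrt(1 / (kappa + 1)) * H_nlos
```

The point of the construction is exactly the two effects named above: the Cholesky factors couple the entries so they are no longer IID, and the steering-vector term gives the matrix a deterministic, non-zero mean.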
And now there's a whole flavor of codes because, of course, once you make people pay royalties, they try to look at other options, and with LDPC there was a revival because they are free. In fact, they're free because somebody rediscovered them, and they had already appeared in a doctoral thesis long before. In any case, for point-to-point coding, we nearly achieve what we should. The only problem we have is that we basically know how to code when you have infinite block length, because of the averaging effect, the ergodicity. Whenever you have short blocks, and this is one of the problems you have with these things here, meaning the Internet of Things, with devices which are emitting, you cannot have messages which are too long; you need short messages. So there we do not know the best codes for transmitting a very short, very bursty packet (the sketch below gives a rough idea of how much rate a short packet loses). And then there is the MIMO setting, for which, since 2000, a lot of work has been done on generalizing point-to-point coding to the dimension where you have space. So people have been working on space-time codes, new types of codes, and this field is still rather open: there is still a lot of innovation to be done, because of complexity issues and performance, and there are many discussions on the types of codes you can build. That's for the coding part. Coding for very short blocks, for example, is one of the problems we have, but there are of course other problems in coding that I did not go into. Other problems of telecommunication? Okay, no questions, okay. So I think we'll go to the next speaker. Thank you.
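To give a feel for the short-packet penalty mentioned in that last answer, here is a small sketch of the well-known finite-blocklength normal approximation for the real AWGN channel, R*(n, eps) ~ C - sqrt(V/n) * Q^{-1}(eps), due to Polyanskiy, Poor, and Verdu; the SNR and target error probability below are made-up illustration values.

```python
import numpy as np
from scipy.stats import norm

def awgn_rate_short_block(snr, n, eps):
    """Normal approximation R ~ C - sqrt(V/n) Q^{-1}(eps), in bits/channel use."""
    C = 0.5 * np.log2(1 + snr)                            # Shannon capacity
    V = (snr * (snr + 2)) / (2 * (snr + 1) ** 2) * np.log2(np.e) ** 2
    return C - np.sqrt(V / n) * norm.isf(eps)             # Q^{-1} = isf of N(0,1)

snr, eps = 10.0, 1e-3                                     # illustrative values
for n in (100, 1000, 100000):
    print(f"n = {n:6d}: about {awgn_rate_short_block(snr, n, eps):.3f} "
          f"of {0.5 * np.log2(1 + snr):.3f} bits/use")
```

Even at a thousand channel uses the backoff from capacity is visible, which is exactly the regime the short, bursty Internet-of-Things packets discussed above live in.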