Today we will discuss the central limit theorem. With today we will be done: this is the last lecture as far as your syllabus is concerned. Again, the central limit theorem is not one theorem; it is a family of theorems, of which we will do the simplest version, for IID random variables. Just as the law of large numbers is also valid under more general assumptions, with some weak correlations and so on, so is the central limit theorem, but we will do it only for IID random variables.

At a very high level, the central limit theorem establishes the importance of the Gaussian distribution. We already know that if you add any number of independent Gaussian random variables, you get another Gaussian random variable; that much we have already established. So as long as they are independent, if you keep adding Gaussians, you get another Gaussian. Another way of saying this is that the Gaussian distribution is stable. Similarly, the Cauchy distribution is stable: keep adding Cauchy random variables and you get another Cauchy random variable. The central limit theorem says a little more about the Gaussian. In particular, if you keep adding a large number of zero-mean, finite-variance random variables, the sum looks like a Gaussian even if the original distribution is not Gaussian; it can be anything with finite variance. That is roughly what the central limit theorem says. You are adding X_i's which are independent and identically distributed; their common distribution can be anything with finite variance, but the result of adding them looks more and more like a Gaussian as you add more and more terms. So the central limit theorem establishes the Gaussian as an attracting distribution among finite-variance random variables, not just a stable one. We already know it is stable, because adding Gaussians gives a Gaussian; now we are saying that even if you are not adding Gaussians, as long as you add enough finite-variance random variables, you get something that looks like a Gaussian, which means it is also an attractor in this world of finite-variance random variables. That is the high-level picture.

Let the X_i's be IID with mean E[X] and variance σ_X². Define S_n = Σ_{i=1}^{n} X_i. From the law of large numbers we know that S_n/n goes to E[X]: the weak law says in probability, the strong law says almost surely. Another way of saying this is that (S_n − n·E[X])/n goes to 0, in probability or almost surely; I am just bringing the mean over to this side. So look at the numerator, S_n − n·E[X]. This result is saying that the difference S_n − n·E[X] is sublinear in n.
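To see this numerically, here is a small simulation sketch; it is not from the lecture, and the Exponential(1) distribution, the seed, and the sample sizes are arbitrary choices of mine. The ratio (S_n − n·E[X])/n shrinks toward 0, while the raw fluctuation S_n − n·E[X] still grows with n, just sublinearly.

```python
import numpy as np

# Sketch: IID Exponential(1) samples, so E[X] = 1 and Var(X) = 1, both finite.
rng = np.random.default_rng(seed=0)

for n in [10**2, 10**4, 10**6]:
    x = rng.exponential(scale=1.0, size=n)
    fluctuation = x.sum() - n * 1.0      # S_n minus n times E[X]
    print(f"n={n:>7}: (S_n - n E[X])/n = {fluctuation / n:+.5f}, "
          f"S_n - n E[X] = {fluctuation:+.2f}")
```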
As n becomes large the ratio (S_n − n·E[X])/n goes to 0, which means the fluctuation of S_n away from n times the mean is sublinear in n. That much is clear from the law of large numbers. Now the central limit theorem gives a finer characterization of this fluctuation. It says that if you have finite variance, the difference S_n − n·E[X] is not merely sublinear in n; it is approximately like √n. That is one of the things the CLT says. Even more remarkably, since the numerator is like √n, if you divide it not by n but by √n, you get something that is order 1, and that order-1 term is a Gaussian fluctuation; after scaling by σ it is an N(0,1) fluctuation. Let me put that down. In very loose terms, the CLT says that (a) S_n − n·E[X] is about as big as √n. This is obviously a very imprecise statement, but the fluctuation is roughly of size √n for large n. And (b) the distribution of (S_n − n·E[X])/√n approaches the Gaussian distribution N(0, σ²), irrespective of the distribution of the X_i's. So it says two things, as I said. The sublinearity we got from the law of large numbers is sharpened: the fluctuation is actually like √n for large n. And the order-1 quantity you get by dividing by √n is still a random variable, and statistically it looks like a Gaussian random variable; its distribution in fact approaches the Gaussian as n becomes large. This is true irrespective of the distribution of the X_i's; all we need from the X_i's is finite variance. So things like Cauchy are not included: the Cauchy distribution does not have a finite variance, it does not even have a mean. As long as you have finite variance, the sum is definitely attracted to a Gaussian. Is this clear? This is the IID version of the CLT. There are more general versions where you do not demand IID: you can have some decaying correlations, or the X_i's can be differently distributed as long as no one of them dominates too strongly. In those circumstances you can still get a CLT, but we will only bother with the IID version.

Here is the precise statement. Theorem: let X_1, X_2, ... be IID random variables with mean μ and variance σ², where you have to assume σ² is finite. Define U_n = (S_n − nμ)/(σ√n). Then U_n converges in distribution to a standard Gaussian random variable. That is the central limit theorem. I put the σ in the denominator in order to get a standard Gaussian limit; otherwise you would get a Gaussian with variance σ². So the U_n's are centered by taking away the mean and scaled by σ√n, making U_n a zero-mean, unit-variance sequence, and we are saying this sequence of random variables converges in distribution to a standard Gaussian N(0,1).
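Here is a hedged Monte Carlo check of this statement; it is my own illustration, with Exponential(1) variables (so μ = σ = 1), n = 50, and an arbitrary grid of evaluation points. The empirical CDF of U_n should already be close to the standard Gaussian CDF Φ.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(seed=1)
mu, sigma = 1.0, 1.0                     # mean and std of Exponential(1)
n, trials = 50, 200_000

# Many independent realizations of U_n = (S_n - n*mu) / (sigma * sqrt(n)).
x = rng.exponential(scale=1.0, size=(trials, n))
U = (x.sum(axis=1) - n * mu) / (sigma * np.sqrt(n))

for t in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    empirical = np.mean(U <= t)          # empirical CDF of U_n at t
    print(f"x = {t:+.1f}: F_Un(x) ~ {empirical:.4f}, Phi(x) = {norm.cdf(t):.4f}")
```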
So what is convergence in distribution? Looking at the CDF F_{U_n}(x), the limit as n tends to infinity of the CDF must equal the CDF of the standard Gaussian, which I wrote as the error function: Φ(x) = ∫_{−∞}^{x} (1/√(2π)) e^{−y²/2} dy. So the theorem says that for the sequence of random variables U_n, the sequence of CDFs converges to the CDF of the standard Gaussian.

What the central limit theorem does not say is that the U_n's have a pdf converging to the bell curve. That is a very big misconception. See what I mean: first of all, these U_n's need not have a density at all, because we have not assumed that the X_i's have a density. They need not be continuous; they can be whatever you want, any finite-variance distribution will do. And in any case, even if they have a density, the central limit theorem does not say that the density of U_n looks like the bell curve. That is not what the theorem says. It only says convergence of CDFs, not pdfs; in fact the pdf may not even exist. The convergence is of the CDF to Φ; it is not true in general that you approach a bell curve. Is that clear? Any questions on the statement? Note also that this limit Φ is a continuous function, so the usual caveat in convergence in distribution, that you only need convergence at points of continuity of the limit, causes no problem here: Φ is such a nice function, continuous everywhere. Any questions?

Proof. The proof is actually very simple in the IID case. Let Z_i = (X_i − μ)/σ: I am just going to center and scale the X_i's, taking the mean away and scaling by the standard deviation, so the Z_i's have zero mean and unit variance. But the Z_i's need not be Gaussian; they are just scaled versions of the X_i's and can be anything, they merely have zero mean and unit variance. Then U_n = Σ_{i=1}^{n} Z_i / √n. Now consider C_{Z_i}(t). These random variables Z_i have zero mean and unit variance; therefore the characteristic function admits a Taylor expansion up to second order. It must look like C_{Z_i}(t) = 1 + i·0·t + i²·E[Z_i²]·t²/2! + o(t²) = 1 − t²/2 + o(t²), since the mean is 0 and the second moment E[Z_i²] equals 1. Note the remainder is little-o of t², not big-O.
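As a quick numerical sanity check of this expansion (again my own sketch; standardized Uniform(0,1) variables are an arbitrary choice), the real part of the empirical characteristic function should track 1 − t²/2 for small t.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
u = rng.uniform(size=500_000)
z = (u - 0.5) / np.sqrt(1.0 / 12.0)     # standardize: Uniform(0,1) has mean 1/2, var 1/12

for t in [0.1, 0.3, 0.5]:
    ecf = np.mean(np.exp(1j * t * z))   # empirical characteristic function E[e^{itZ}]
    print(f"t={t}: Re C_Z(t) ~ {ecf.real:+.5f}, 1 - t^2/2 = {1 - t**2 / 2:+.5f}")
```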
Now, what I eventually want is the characteristic function C_{U_n}(t); what I will show is that C_{U_n}(t) converges, as n tends to infinity, to the Gaussian characteristic function, which we know is e^{−t²/2}. So I am happy to already have a term like t²/2. Let us calculate the characteristic function of U_n. There is a √n in the denominator of U_n, but other than that U_n is just a sum of n IID terms, and the characteristic function of the sum alone would be the nth power of the characteristic function of Z_i, because they are IID. Because of the √n in the bottom, wherever there is a t, I write t/√n, so C_{U_n}(t) = [C_{Z_i}(t/√n)]^n; this is fairly easy to verify. Therefore C_{U_n}(t) = [1 − t²/(2n) + o(t²/n)]^n. Now, as n becomes large, the o(t²/n) term goes to 0 faster than 1/n, so it plays no role in the limit, and the whole expression converges to e^{−t²/2}. So C_{U_n}(t) → e^{−t²/2} for all t as n tends to infinity, and this is the characteristic function of a standard Gaussian. Thus U_n converges in distribution to N(0,1).

It is a very short proof if you look at it: we have used characteristic-function convergence to establish a very fundamental result in just three steps. And this is a perfectly rigorous proof; we have not cut any corners. The only thing you have to be slightly careful about is showing that the o(t²/n) term really does not matter in the limit. Many elementary textbooks on probability give pseudo-proofs: they try to manipulate the density and get e^{−x²/2} for the density. Those proofs are not correct, because the convergence is not in the density; the convergence is of the characteristic functions, and therefore in distribution. At best those manipulations found in more elementary textbooks serve to provide some intuition, but they are not correct proofs. This is the simplest complete proof of the central limit theorem for the IID case. Any questions?
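For reference, here is the whole computation in one display; the final step from characteristic functions to distributions is Lévy's continuity theorem, which the lecture uses implicitly.

```latex
% Proof in one chain, with C denoting characteristic functions and
% Z_i = (X_i - \mu)/\sigma,  U_n = \sum_{i=1}^{n} Z_i / \sqrt{n}:
C_{U_n}(t)
  = \left[ C_{Z}\!\left( \tfrac{t}{\sqrt{n}} \right) \right]^{n}
  = \left[ 1 - \frac{t^2}{2n} + o\!\left( \frac{t^2}{n} \right) \right]^{n}
  \xrightarrow[n \to \infty]{} e^{-t^2/2}
  \quad \text{for every fixed } t.
% By Levy's continuity theorem, pointwise convergence of characteristic
% functions to a limit continuous at 0 implies convergence in distribution,
% so U_n converges in distribution to N(0,1).
```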
So these X_i's can even be discrete, or mixtures, or whatever you want; as long as there is finite variance, convergence in distribution is guaranteed. Suppose the X_i's were discrete. Then the U_n's also take only a countable set of values, so U_n will not even have a density, and because U_n is discrete, the CDF of U_n will have discontinuities. The CDF will have jumps, but in the limit these jumps become smaller and smaller, and you converge to Φ. So although the CDF F_{U_n}(x) may have a number of discontinuities, because the X_i's, and therefore the U_n's, may be discrete, the limiting CDF is the nice continuous Φ. Is that clear? The limit looks like the smooth S-shaped Gaussian CDF, while the finite-n CDFs may have these little jumps, which gradually become finer and finer and converge to Φ; yet the sequence U_n may remain discrete, or mixtures, or singular, or what have you.

More generally, even if you have a sequence of random variables with densities which converges in distribution, it is not necessarily the case that the sequence of densities converges. There are counterexamples for this; I believe there is one in the homework (did you put that in the homework?), and I think Grimmett has a counterexample. So you may have X_n converging to X in distribution, with all the X_n's continuous random variables having densities, and still the densities of the X_n's do not converge to the density of X. That is also possible. So the U_n's may have densities but still fail to converge to the bell curve; that is not a failure of the central limit theorem, because convergence of densities is not a consequence of the CLT. The CLT says convergence of CDFs. Both sides have densities; it is just that the sequence of densities does not converge to what you want, while the sequence of CDFs converges. Generally speaking, if F_n(x) converges to F(x), it does not mean that F_n′(x) converges to F′(x); a sequence of derivatives need not converge. If you want to assert the convergence of densities, you need additional assumptions.

Such results do exist; for example, there is something called the local central limit theorem. This is just for your information, not something you have to learn very seriously. Grimmett and Stirzaker have a version of this local CLT in Chapter 5. Let X_1, X_2, ... be an IID sequence of random variables with zero mean and unit variance. Suppose the characteristic function C_X(t) of the X_i's satisfies ∫ |C_X(t)|^r dt < ∞ for some integer r ≥ 1, that is, the rth power of |C_X| is integrable. Then U_n has a density, by which I mean a pdf, say f_{U_n}, for every n ≥ r. And further, f_{U_n}(x) converges to (1/√(2π)) e^{−x²/2} as n tends to infinity, uniformly in x ∈ ℝ. Grimmett gives a full proof of this. So this is one example of a local central limit theorem; it requires the additional assumption that for some integer r ≥ 1, the rth power of the absolute characteristic function is integrable.
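Written out in display form (same content as just stated):

```latex
% Local CLT (following Grimmett and Stirzaker): X_1, X_2, \ldots IID with
% zero mean and unit variance, U_n = S_n / \sqrt{n}.
\text{If } \int_{-\infty}^{\infty} \lvert C_X(t) \rvert^{r} \, dt < \infty
\text{ for some integer } r \ge 1,
\text{ then the density } f_{U_n} \text{ exists for } n \ge r \text{ and }
\sup_{x \in \mathbb{R}}
  \left\lvert f_{U_n}(x) - \tfrac{1}{\sqrt{2\pi}} e^{-x^2/2} \right\rvert
  \xrightarrow[n \to \infty]{} 0.
```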
For r equal to 1, this condition is absolute integrability of C_X. If you take r = 1 for the sake of simplicity, it says C_X is absolutely integrable, and if C_X is absolutely integrable, you know there necessarily exists a density; we stated a theorem about that. In fact you get a density which is uniformly continuous, and when you have a density that is uniformly continuous, you can prove that the convergence actually happens at the level of densities, to the density of the Gaussian. In general r does not have to be 1; r can be 2 or 3, so |C_X| can be square-integrable or cube-integrable, and then for n ≥ r your U_n will have a density. That is an example of a local central limit theorem. So you need more assumptions in order to get convergence of densities, even when you have continuous random variables. Are there any questions?

The central limit theorem is the physical reason behind the common occurrence of the Gaussian random variable in so many engineering and statistical applications. If you were to measure the noise voltage across a wire which does not carry any current, you would find it is Gaussian distributed; the thermal noise across a resistor is Gaussian noise. This is something people often say, and the underlying reason is in fact the central limit theorem. Thermal noise is a statistical-thermodynamic phenomenon. You have a huge number of electrons and no electric field; it is just a wire, so there is no current. The electrons are jiggling around due to their thermal energy, so there is no mean drift, only variance from these thermal disturbances. Each electron creates an ever so small voltage because of these movements, and you have so many of these electrons that the net voltage looks like a Gaussian, by the central limit theorem: zero mean and some finite variance. The variance is finite because this is a thermodynamic, finite-energy system, which is why in all these finite-energy, finite-variance systems you see the Gaussian everywhere. And the variance of the electrons' jiggling is related to the temperature: the larger the temperature, the more they jump around, so the more the thermal noise and the larger its variance. The central limit theorem is at the heart of these things.
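To close the loop on this picture, here is a toy simulation; it is entirely my own sketch, and the number of electrons, the per-electron voltage scale, and the uniform micro-distribution are made-up stand-ins, not a physical model. The net voltage, a sum of many small independent zero-mean contributions, matches Gaussian quantiles closely.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(seed=3)
electrons, samples = 1_000, 10_000

# Each "electron" contributes a tiny zero-mean voltage; the per-electron
# distribution is deliberately NOT Gaussian (uniform here).
v = rng.uniform(-1e-6, 1e-6, size=(samples, electrons))
net = v.sum(axis=1)                      # net voltage across the wire
u = (net - net.mean()) / net.std()       # standardize the net voltage

# Compare a few empirical quantiles against standard Gaussian quantiles.
for q in [0.10, 0.25, 0.50, 0.75, 0.90]:
    print(f"q = {q:.2f}: sample {np.quantile(u, q):+.3f}  vs  "
          f"N(0,1) {norm.ppf(q):+.3f}")
```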
Any questions on the central limit theorem? If you do not have any more questions, that completes the central limit theorem. Actually, there is another theorem, a more advanced one, which further qualifies this fluctuation (S_n − nμ)/√n. Because I have some 10, 15 minutes left, I will just put the theorem down; it is certainly not in your syllabus. It is called the law of the iterated logarithm. It is a pretty hard theorem to prove, a fairly advanced theorem, but a fairly remarkable result; see Grimmett and Stirzaker, Section 7.6. Let the X_i's be IID with zero mean and unit variance, and let U_n = S_n/√n, so you are essentially summing the zero-mean random variables and dividing by √n.

The central limit theorem says that as n tends to infinity, this fluctuation U_n is roughly like a Gaussian N(0,1) random variable. The law of the iterated logarithm looks instead at the largest value taken by this U_n. How big do these fluctuations get? If you fix any particular large n, U_n is going to look approximately Gaussian. But suppose I am at a very large n, say n equal to a million, and I look at the entire sequence coming after it: what is the largest value it takes? How big are these fluctuations at their maximum? Do you understand what I mean? Pretend I am plotting U_n against n. If you fix a particular n and run over the various ω's, you get the approximately Gaussian distribution for U_n, because of the central limit theorem. But the law of the iterated logarithm does not fix a particular n and look at its distribution, which is what the CLT does; it looks at the largest value taken beyond n. Is there a very big fluctuation somewhere, and how big is it? The remarkable result is that the largest fluctuation, in the sense of the lim sup of U_n, grows like √(2 log log n); you have a log of a log of n, which is why it is called the law of the iterated logarithm. So although for a particular n the fluctuation is approximately Gaussian, over the entire realization the largest value U_n takes is like √(2 log log n). In particular, the theorem, under these assumptions, is that the lim sup of S_n/√(n log log n) equals √2 almost surely. That is the law of the iterated logarithm, a pretty remarkable result. We are saying S_n/√n is approximately Gaussian, but the lim sup, the largest variation looking beyond n, once scaled down by √(log log n), is almost surely equal to √2. This means that for any value c less than √2, the ratio will exceed c infinitely often; but if c is greater than √2, the excursions beyond c will happen only finitely often. I put this down because there was some time left; proving it is quite hard, it is a fairly nontrivial result. So the lim sup of S_n grows like √2 times √(n log log n). That is what the law of the iterated logarithm says. I will stop here. The lecture is over.
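For the record, the statement from the board in display form:

```latex
% Law of the iterated logarithm (Grimmett and Stirzaker, Section 7.6):
% X_1, X_2, \ldots IID with zero mean and unit variance,
% S_n = X_1 + \cdots + X_n.
\limsup_{n \to \infty} \frac{S_n}{\sqrt{\,n \log \log n\,}} = \sqrt{2}
\quad \text{almost surely.}
% Consequently S_n / \sqrt{n \log \log n} exceeds any c < \sqrt{2} infinitely
% often (a.s.), and exceeds any c > \sqrt{2} only finitely often (a.s.).
```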