So, as I said in the last lecture, having stated the central limit theorem we will now talk about its importance. Once I prove the theorem and give you applications, you will see how important and how widely used it is. The central limit theorem says that if you have a sequence of independent, identically distributed random variables with finite mean μ and finite variance σ², then the sum X_1 + X_2 + ... + X_n has mean nμ and variance nσ², and the standardized sum (X_1 + ... + X_n − nμ)/(σ√n) converges to N(0, 1) as n goes to infinity. That means, no matter what the original distribution of the X_i was, when you take this standardized sum and let n go to infinity, the random variable converges to the standard normal variate. This is convergence in distribution, and we can also state it another way: for any finite a, −∞ < a < ∞, the probability that the standardized sum is less than or equal to a converges to the standard normal probability; that is, F_Zn(a) converges to F_Z(a) as n goes to infinity. So, the convergence is in distribution. Now, in order to prove this, I need to use a lemma about the uniqueness of the MGF, and I will not give a proof for it; you will just accept the lemma as it is. It says: let Z_1, Z_2, ..., Z_n, ... be a sequence of random variables having distribution functions F_Zn and MGFs M_Zn, n ≥ 1, and let Z be a random variable having F_Z as its distribution function and M_Z as its MGF. Then, if M_Zn(t) converges to M_Z(t)...
That means, as n goes to infinity, the moment generating function M_Zn(t) converges to M_Z(t), the moment generating function of the variable Z. Then the corresponding distribution functions F_Zn converge to the distribution function F_Z at all points at which F_Z is continuous, which means where it is well defined. This is what the uniqueness of the MGF says. While discussing the MGF I also tried to tell you that the MGF uniquely gives you the density function, or the distribution function: not only can you compute the parameters from it, but the form of the MGF uniquely fixes the distribution itself. This is what we are stating here, and we have been using it elsewhere as well. Now, the proof is not very difficult; it is straightforward. I rewrite the random variable as Z_n = (X_1 + X_2 + ... + X_n − nμ)/(σ√n), so one μ attaches to each term of the sum. Then I write the MGF of this random variable, which we want to show converges in distribution to a standard normal variate: M_Zn(t) = E[e^(t Z_n)] = E[exp(t((X_1 − μ)/(σ√n) + (X_2 − μ)/(σ√n) + ... + (X_n − μ)/(σ√n)))]. Now, since X_1, X_2, ..., X_n are identically distributed and independent random variables...
...by the property of the MGF — well, yes, the ordering has been a little off, because I will be talking of the MGF of more than one independent random variable, so that lecture should have come before this one; anyway, we can talk about it here — this MGF can be written as [M_((X_1 − μ)/(σ√n))(t)]^n because of the independence. This follows from independence because, when you write E[e^(t(Z_1 + ... + Z_n))], the joint density function is the product f_Z1 ··· f_Zn, so the expectation is an n-th order integral from −∞ to ∞ (wherever the integrals are defined) in dz_1 ... dz_n, and you can separate out the integrals. Each integral is the MGF of one of the variables, and since they are identical — f_Z1, ..., f_Zn are all the same — every factor is the same. So you can immediately see that this MGF can be written as a power, because the random variables are independent and identically distributed. Now, I can write this as [M_(X_1 − μ)(t/(σ√n))]^n, taking the argument to be t/(σ√n). Next, expand the exponential: e^(tX) = 1 + tX + (tX)²/2! + ..., so the MGF is the expectation of this series.
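The factorization step can be checked numerically in a tiny case. This is my own illustration, not from the lecture: I take X_i ~ Bernoulli(p), whose MGF is 1 − p + p·e^t, and compare the exact expectation E[e^(t·S_n)], summed over all 2^n outcomes, with the n-th power of the single-variable MGF.

```python
# Numeric check that for i.i.d. X1,...,Xn the MGF of the sum factors:
# E[e^{t Sn}] = (M_X(t))^n.  Bernoulli(p) is an arbitrary choice.
import itertools
import math

def mgf_sum_exact(n, p, t):
    """E[e^{t(X1+...+Xn)}] for i.i.d. Bernoulli(p), summed over all 2^n outcomes."""
    total = 0.0
    for xs in itertools.product([0, 1], repeat=n):
        prob = math.prod(p if x == 1 else 1 - p for x in xs)
        total += prob * math.exp(t * sum(xs))
    return total

n, p, t = 5, 0.3, 0.7
lhs = mgf_sum_exact(n, p, t)
rhs = (1 - p + p * math.exp(t)) ** n   # (M_X(t))^n, by independence
print(abs(lhs - rhs))
```

The two agree up to floating-point error, which is exactly the separation of the n-fold integral (here a finite sum) used in the proof.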
So, I take the expectation inside. Here, besides assuming finite variance in the central limit theorem, another assumption I should have made is that the MGF of the X_i exists, because I am using moment generating functions to prove the theorem; obviously we will be talking about all those t at which the MGF is defined. Since the MGF exists, the series is convergent, and so I can take the expectation inside term by term. That gives [E(1 + t(X_1 − μ)/(σ√n) + t²(X_1 − μ)²/(2! σ²n) + ...)]^n, and I collect all the higher powers of t — the cubes, fourth powers and so on — into one higher-order remainder term. Now, E(X_1 − μ) = 0, because we have assumed each X_i has mean μ, so that term vanishes; and E(X_1 − μ)² = σ². So we are left with [1 + σ²t²/(2! σ²n) + higher-order terms in t]^n — note the 2! coming from (tX)²/2!. As n goes to infinity, since √n is in the denominator, the third power carries n^(3/2) in its denominator and so on, so those terms go to 0; I will ignore them. Then the σ² cancels out and what remains is (1 + (t²/2)(1/n))^n.
I hope most of you know that this converges to e^(t²/2), because n is in the denominator and n is also the power: (1 + (t²/2)(1/n))^n → e^(t²/2) as n goes to infinity, and as I said, the remaining terms can safely be ignored. And you know that e^(t²/2) is the MGF of a random variable which is N(0, 1). So we have shown that the MGF of the random variable (ΣX_i − nμ)/(σ√n) converges to the MGF of a standard normal variate; therefore, by the lemma, this random variable converges to N(0, 1) in distribution as n goes to infinity. So, using the MGF, the proof really simplifies. In the proof I used the expression o(t/(σ√n)); the understanding is that this denotes terms of the type (t/(σ√n))^r for r ≥ 3. In proving the central limit theorem I wrote down the terms up to t² and said that the later terms all carry higher powers of t/(σ√n). The understanding is that, because r ≥ 3, each such term goes to 0 as n goes to infinity; as n becomes larger and larger the terms become very small, their contribution is negligible, and therefore we ignore them. That is the idea, but for convenience I have also used this notation to include the factor E(X_i − μ)^r; that means, in that proof, the small-o term denotes for me E(X_i − μ)^r · (t/(σ√n))^r for r ≥ 3.
And since we have assumed that the MGF exists for the X_i, and the X_i are all identically distributed, all moments exist and are finite, so the numbers E(X_i − μ)^r are finite for all r. Hence the same argument applies: as n becomes larger and larger, each such term becomes very small and goes to 0, and therefore we can neglect it. I have used this notation elsewhere also, and the understanding is that when you write small o, it means all the higher-order terms beyond what you have written down — here of power higher than 2, that is, r ≥ 3 the way I am using it — so for large n I can ignore such terms in my sum. This version of the CLT also goes under the name of the Lindeberg–Lévy theorem: Lindeberg in 1922 and Lévy in 1925 gave this result independently of each other, three years apart. So it is sometimes known as the Lindeberg–Lévy theorem, but most commonly it is referred to as the central limit theorem. The proof is simple, using just the independence, the identical distributions, and the properties of the MGF: through the MGF we could show that (X_1 + ... + X_n − nμ)/(σ√n) converges to a standard normal variate. So, now let us look at an interesting example; this is from Dodevich and Mishra, and I will give you the references at the end of the course. Remember, at a casino people bet, and the game here is tossing a coin to show a head.
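Before the example, the theorem just proved can be illustrated numerically. The following is a minimal Monte Carlo sketch, my own and not from the lecture; Uniform(0,1) is an arbitrary choice of i.i.d. distribution (so μ = 0.5, σ² = 1/12), and any finite-variance choice would behave the same way.

```python
# Monte Carlo sketch of the CLT: the standardized sum of i.i.d. Uniform(0,1)
# variables should behave like a standard normal variate.
import random
import math

def standardized_sum(n, trials=20000, seed=0):
    """Samples of (X1+...+Xn - n*mu) / (sigma*sqrt(n)) for Uniform(0,1) Xi."""
    rng = random.Random(seed)
    mu, sigma = 0.5, math.sqrt(1 / 12)
    out = []
    for _ in range(trials):
        s = sum(rng.random() for _ in range(n))
        out.append((s - n * mu) / (sigma * math.sqrt(n)))
    return out

z = standardized_sum(n=30)
# Empirical CDF values should be close to the standard normal ones:
frac_zero = sum(1 for v in z if v <= 0) / len(z)     # Phi(0)    = 0.5
frac_196 = sum(1 for v in z if v <= 1.96) / len(z)   # Phi(1.96) is about 0.975
print(frac_zero, frac_196)
```

Even at n = 30 the empirical fractions sit close to Φ(0) and Φ(1.96), which is the practical content of the theorem.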
So, a casino has a coin and wishes to estimate p, the probability of a head on any toss, in such a way that they can be 95 percent confident that the estimate p̂ is within 0.02 of p. Obviously they want an idea of how likely the coin is to show a head when tossed; it is important to them because whenever a head comes up, the person playing the game wins and the casino has to pay. So they want to be confident that their estimate is within 0.02 of the actual p, and the weak law of large numbers helps out here. Given ε and δ greater than 0, we know that there exists an n_0, the smallest value of n, such that for all n ≥ n_0, P(|X̄_n − p| ≥ δ) < ε. The complementary event then gives P(|X̄_n − p| < δ) ≥ 1 − ε for all n ≥ n_0. For the casino problem, your δ is 0.02, so that X̄_n is within 0.02 of p: it can be a little less or a little more, anywhere between p − 0.02 and p + 0.02. That is what you want: X̄_n should lie in this interval. And your 1 − ε is 0.95, so the probability that X̄_n lies in this interval should be at least 1 − ε; that means you will be 95 percent confident. So, once you put in the values of δ and ε, you get that P(|X̄_n0 − p| < 0.02)...
...this probability should be ≥ 0.95. Writing it out with n_0: P(|(X_1 + X_2 + ... + X_n0)/n_0 − p| < 0.02) ≥ 0.95. So you can find such an n_0, and by tossing the coin that many times the casino can find the estimate p̂ of p. It was not really necessary to rewrite the event with n_0; since we have said there is an n_0 which does the job, I could have carried n along and found it at the end, but anyway, everything is n_0 here. Now I take this event and divide throughout by √(n_0·pq), because the variance of each X_i is pq and therefore the variance of X_1 + X_2 + ... + X_n0 is n_0·pq; dividing by the standard deviation standardizes the variate, so the event, and hence the probability, stays the same, with the bounds becoming ±0.02·n_0/√(n_0·pq) = ±0.02√n_0/√(pq). By the CLT, this standardized variate is approximately a standard normal variate. So the probability comes out to be a function of p, because the two endpoints of the interval in which you want Z to lie depend on p. Now, here again you have to work with n_0, and the whole idea is this: if I put the maximum value of pq in the denominator, then, since it sits in the denominator, the bound 0.02√n_0/√(pq) will be the smallest.
That means the interval is smallest for the maximum value of pq, and if the probability for this smallest interval is ≥ 0.95, then for any other p the probability would also be ≥ 0.95; that is the idea. What I am doing here is: the event written with the maximum value is a subset of all the other events, whatever the value of p (and q is 1 − p). So if this probability can be made ≥ 0.95, then for all values of p it will be ≥ 0.95. And we know — I am sorry, not npq — that the maximum value of pq is 1/4 for all 0 < p < 1. So, putting pq = 1/4, √(pq) = 1/2, and the bound becomes 0.04√n_0. The required probability, since by the CLT the variate is standard normal, is Φ(0.04√n_0) − Φ(−0.04√n_0), where Φ denotes the cumulative standard normal probability. By the symmetry of the standard normal variate, Φ(−0.04√n_0) = 1 − Φ(0.04√n_0), so this becomes 2Φ(0.04√n_0) − 1, and we want it to be ≥ 0.95. When I compute the value of n_0 by setting it equal to 0.95, then for all n ≥ n_0 the inequality will again be satisfied. So Φ(0.04√n_0) = (1 + 0.95)/2 = 0.975, and the normal tables tell me that the value corresponding to this probability — the area under the standard normal curve equal to 0.975 — is 1.96. That means 0.04√n_0 = 1.96.
So, from the normal tables, this number must correspond to 1.96, and therefore √n_0 = 1.96/0.04 = 49, so n_0 ≥ 49² = 2401. That means, with that many sample values, that many trials, the estimate of p obtained as p̂ = X̄_(49²) = (1/49²) Σ X_i will be within 0.02 of the original p with probability 0.95. So this is an interesting application of the central limit theorem: because we could reduce the whole thing to computing a standard normal probability, we got the answer. Now, I will try to give a variety of examples to show you the use of the central limit theorem. So here is another question. Suppose X_1, X_2, ..., X_n is again a sequence of independent, identically distributed random variables, and these are Bernoulli random variables: P(X_i = 1) = p and P(X_i = 0) = 1 − p for all i, and again p is unknown. You want to estimate this probability, and as I told you, the weak law of large numbers says that X̄_n will be a good estimate provided n is large enough. Now define S_n = X_1 + X_2 + ... + X_n, and let us fix t. In this example I am trying to show you the accuracy of the central limit theorem: obviously we expect a better answer from the central limit theorem than if I just use Chebyshev's inequality. That is the whole idea of wanting to do this exercise, so let us see.
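The casino result above can be sanity-checked by simulation. This is a sketch under my own assumptions: the true p is chosen arbitrarily (0.5, the worst case, where pq is maximal); the guarantee should hold for any p since pq ≤ 1/4.

```python
# Simulation sketch: with n0 = 49^2 = 2401 tosses, the estimate p_hat should
# land within 0.02 of the true p roughly 95% of the time (or more, since
# 1/4 upper-bounds pq).
import random

def estimate_coverage(p, n0=49 ** 2, reps=2000, seed=1):
    """Fraction of repetitions in which |p_hat - p| < 0.02 with n0 tosses."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        heads = sum(1 for _ in range(n0) if rng.random() < p)
        if abs(heads / n0 - p) < 0.02:
            hits += 1
    return hits / reps

cov = estimate_coverage(p=0.5)   # worst case: pq is maximal at p = 0.5
print(cov)
```

For p away from 0.5, pq is smaller than 1/4 and the observed coverage only goes up, matching the subset argument made above.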
So, the question is: using Chebyshev's inequality, how large an n will guarantee that P(|S_n/n − p| ≥ t) ≤ 0.01, where S_n is the sum of the random variables X_1 to X_n? You fix the t; given that t, you want this probability to be at most 0.01. We will compare the two values of n, because Chebyshev's inequality will give me one value and the central limit theorem will give me another. Here E(S_n/n) = p and Var(S_n/n) = p(1 − p)/n, because the variables are independent and identically distributed. So we will first use Chebyshev's inequality to compute this estimate of n, and then the central limit theorem. Applying Chebyshev's inequality, since E(S_n/n) = p and Var(S_n/n) = p(1 − p)/n, we get P(|S_n/n − p| ≥ t) ≤ Var(S_n/n)/t² = p(1 − p)/(nt²). So by Chebyshev's inequality we get this bound, and here again we apply the same logic: I hope you can easily see that for 0 < p < 1 the maximum of p(1 − p) is 1/4, the fact I also used earlier. You can easily check this; let us just spend a minute on it. This is a function of p, and I can find its derivative: with f(p) = p(1 − p) = p − p², the derivative is f′(p) = 1 − 2p — I am sorry, the −p² gives −2p — and setting it equal to 0 gives p = 1/2.
So this implies p = 1/2; certainly p cannot be anything else here, because p lies between 0 and 1. This is the critical value, and to make sure it gives the maximum: if f(p) = p(1 − p), then f′(p) = 0 gives p = 1/2, and f″(p) = −2 < 0, which means the critical value we have obtained is the maximizing value. So p = 1/2 gives the maximum of p(1 − p), and you can also prove this by concavity and so on. Therefore, putting in the maximum value converts the equality into an inequality: P(|S_n/n − p| ≥ t) ≤ 1/(4nt²). Setting this equal to 0.01, we get the estimate for n: n = 1/(4t² × 0.01) = 25/t². So this is your Chebyshev value of n, the value for which this probability would be at most 0.01. Now let us apply the central limit theorem, which says that (S_n/n − p)/√(p(1 − p)/n) — when you divide by the standard deviation, the √n goes upstairs — is approximately standard normal for large enough n. You want to compute the probability that the deviation is ≥ t, so here again I standardize, dividing the whole thing by the standard deviation, which turns the right-hand side into √n·t/√(p(1 − p)).
Then, as I argued earlier, if I put in the maximum value of p(1 − p), this threshold becomes smallest — and sorry, I think we missed the absolute values; it should be |S_n/n − p| here and |Z| there. So, as I said, if you put the maximum value of p(1 − p) in the denominator, the threshold √n·t/√(p(1 − p)) becomes smallest. What does that mean? We are asking for P(|Z| ≥ z_0) — the two tail areas beyond ±z_0 — to be ≤ 0.01, which is what the problem stated. If I take the minimum threshold, that is, put in the maximum value of p(1 − p), then obviously I am taking the larger tail area; so the value of n which works for the maximum value of p(1 − p) will work for all values of p. That is the idea. Now, substituting 1/4 for p(1 − p) gives √(p(1 − p)) = 1/2, which makes the threshold 2√n·t. It is just the same argument: draw the figure and you can verify it for yourself. So we are looking at P(|Z| ≥ 2√n·t). Let me show you the details: P(|Z| ≥ z_0) is the same as the probability of the union of the events Z > z_0 and Z < −z_0; that is what the absolute value means.
Now, these are disjoint events, so I can write the probability of the union as the sum of the probabilities: P(Z > z_0) + P(Z < −z_0). The first is 1 − P(Z ≤ z_0). And for P(Z < −z_0), I can again show you: the area to the left of −z_0 is, by symmetry, the same as the area to the right of z_0, which is 1 minus the whole area to the left of z_0. So P(Z < −z_0) = 1 − P(Z ≤ z_0) as well, and the total is 2 − 2P(Z ≤ z_0), where Z is your standard normal variate. Using this with z_0 = 2√n·t, the whole probability is 2 − 2Φ(2√n·t), which we set equal to 0.01; we find the value of n for which this holds with equality, and then for higher values of n it will always be less than 0.01. So 2 − 2Φ(2√n·t) = 0.01; bring the terms across, and Φ(2√n·t) = 1.99/2 = 0.995. You look up the standard normal tables, and corresponding to this probability the value of the variate is 2.57: when the number is 2.57, the normal probability from −∞ to 2.57 is 0.995. So 2√n·t = 2.57, which gives √n = 2.57/(2t), and therefore, by the central limit theorem, the estimate is n = (2.57/(2t))².
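The two sample-size formulas just derived can be put side by side in a few lines; this is a sketch, with 2.57 taken as the table value used in the lecture.

```python
# Chebyshev:  n = 25 / t^2           (from p(1-p)/(n t^2) <= 1/(4 n t^2) = 0.01)
# CLT:        n = (2.57 / (2t))^2    (from Phi(2 sqrt(n) t) = 0.995)

def n_chebyshev(t):
    """Sample size guaranteed by Chebyshev's inequality."""
    return 25 / t ** 2

def n_clt(t):
    """Sample size suggested by the central limit theorem."""
    return (2.57 / (2 * t)) ** 2

t = 0.01
print(round(n_chebyshev(t)), round(n_clt(t)))
```

For t = 0.01 this prints the two numbers compared in the next part of the lecture, with the CLT figure more than an order of magnitude smaller.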
So, this is our central-limit estimate, and 25/t² is our Chebyshev's-inequality estimate. The third part of the question is to compare the results for t = 0.01. For t = 0.01, Chebyshev's inequality gives a value of n equal to 25/(0.01)² = 250,000, whereas the central limit theorem asks for only (2.57/0.02)² ≈ 16,500 sample values. So, to get a sample size such that the probability that S_n/n differs from p in absolute value by t or more is less than 0.01, the number for the central limit theorem is much smaller than for Chebyshev's inequality. So this was another aspect of the central limit theorem which I thought you should have a look at. Then another usage of the central limit theorem is to approximate the chi-square distribution for large values of n: you see, somebody has already done the calculations for the normal variate, so we can make use of those tables to compute chi-square probabilities when n is large. The idea here is that if you take X_1, X_2, ..., X_n, again independent, identically distributed random variables, each χ²(1), then E(X_i) = 1 and Var(X_i) = 2. And we also know from the reproductive property of the chi-square — in fact, we can show it through the joint MGF as well, and we have seen it otherwise — that when you sum these independent, identically distributed chi-square variables, X_1 + X_2 + ... + X_n is χ²(n). So the sum S_n is χ²(n), and later on we will also show through the MGF how quickly you can say that this sum is χ²(n) if each is χ²(1).
So, by the CLT, (S_n − n)/√(2n) converges in distribution to N(0, 1) as n goes to infinity: we have standardized, subtracting the mean of S_n, which is n because each X_i is χ²(1) with mean 1, and dividing by √(2n). In other words, χ²(n) is approximately normal with mean n and variance 2n, whichever way you want to put it. So, when you want to compute P(S_n ≤ a) for large n using the normal tables, you standardize: P((S_n − n)/√(2n) ≤ (a − n)/√(2n)), which is approximately the value Φ((a − n)/√(2n)). For large n the central limit theorem gives me a good approximation — (S_n − n)/√(2n) is close to a standard normal variate — and therefore I can compute this probability for a large chi-square n from the standard normal table. So now, as I said, I will show you how we can approximate the chi-square probabilities using the central limit theorem. Let n be 100, and we wish to find a such that P(χ²(100) ≤ a) = 0.95; I gave you the formula, that is, you standardize the chi-square random variable and use the central limit theorem to get the probability, and let us compare the values for n = 100. By the central limit theorem, this probability can be converted to Φ((a − 100)/√200), since the mean is 100 and the variance is 200.
This is approximate, and let us see whether n = 100 is large enough for a good approximation. Setting Φ((a − 100)/√200) = 0.95, the standard normal tables say that probability 0.95 corresponds to the value 1.645. So you equate these two numbers, and that gives a = 100 + √200 × 1.645, which comes out to 123.2638. That is the value using the central limit theorem: we standardized the variate and said that if n is large enough this must be approximately N(0, 1). Now, from the tables for χ²(100), the exact value of a comes out to be 124.342. So you can see that the approximation is really very, very good, and this is for n = 100; the point we are making is that if your n is larger you will get an even better approximation — the difference will be only at the decimal places — and so you can use your standard normal tables for computing these probabilities. This was one more point that I wanted to make about using central limit theorem results. So, now another application of the central limit theorem. The question is: what happens if X_1, X_2, ..., X_n are again independent, identically distributed random variables, you are given that the expectation of each variable is μ and the variance is σ², and also that E(X_i − μ)⁴ = σ⁴ + 1, which is less than infinity because σ is a finite number? The first question is: does the weak law of large numbers hold for X_1², X_2², ..., X_n², the sequence of squares of the random variables? And the second question is to find the limit of the probability involving ((X_1 − μ)² + ... + (X_n − μ)²)/n.
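The n = 100 chi-square calculation above can be reproduced with only the Python standard library; the exact value 124.342 is the figure quoted from the chi-square tables, taken here as given.

```python
# Normal approximation to the 95% quantile of chi-square(100):
# solve Phi((a - n)/sqrt(2n)) = 0.95 for a.
import math
from statistics import NormalDist

n = 100
z95 = NormalDist().inv_cdf(0.95)       # about 1.645, as in the tables
a_approx = n + math.sqrt(2 * n) * z95  # a = n + sqrt(2n) * z
print(round(a_approx, 3))

exact = 124.342  # value quoted in the lecture from the chi-square tables
print(round(abs(a_approx - exact), 3))
```

The approximate quantile differs from the tabulated chi-square value by about one unit out of 124, and the gap shrinks further as n grows.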
So we are now working with the squared random variables. The weak law of large numbers holds for X_1, X_2, …, X_n itself, because the mean and variance are finite; the question is whether it also holds for the sequence X_1², X_2², … of squared random variables. For answering part (a), I need to show that the X_i² are identically and independently distributed and that Var(X_i²) is finite. Now, we have done enough probability theory by now to say that if X_1, X_2, …, X_n are identically distributed, then obviously the X_i² are also identically distributed; and if X_1, X_2, …, X_n are independent, then the X_i² will also be independent. I am sure you can work this out with what we have covered in the course so far.

To show that the variances of the X_i² are finite (they will all be the same), I have done the following exercise; surely there are other ways of doing it. Open up the given fourth central moment: (X_i − μ)⁴ = X_i⁴ − 4μX_i³ + 6μ²X_i² − 4μ³X_i + μ⁴, and take the expectation inside term by term. Here, for example, the term 6μ²X_i² has expectation 6μ²E(X_i²) = 6μ²(σ² + μ²), because E(X_i²) = σ² + μ² is already given to me, and E(X_i) = μ. So you get E[(X_i − μ)⁴] = E(X_i⁴) − 4μE(X_i³) + 6(σ² + μ²)μ² − 4μ⁴ + μ⁴.
Collecting terms, the μ⁴ terms give 6μ⁴ − 4μ⁴ + μ⁴ = 3μ⁴, and the remaining cross term is 6σ²μ². So E(X_i⁴) − 4μE(X_i³) = E[(X_i − μ)⁴] − 6σ²μ² − 3μ⁴ = σ⁴ + 1 − 6σ²μ² − 3μ⁴, a finite number. Since the fourth central moment is finite, E(X_i⁴) is finite as well, and therefore Var(X_i²) = E(X_i⁴) − (E(X_i²))² is finite. So the weak law of large numbers can be applied: the condition I needed is that the X_i are identically and independently distributed and that the fourth power expectation about the mean is finite; then the weak law of large numbers holds for the squares.

Now, for the probability in part (b), we again need to standardize. First compute the variance of each term: Var((X_i − μ)²) = E[(X_i − μ)⁴] − (E[(X_i − μ)²])². We are given E[(X_i − μ)⁴] = σ⁴ + 1, and E[(X_i − μ)²] = Var(X_i) = σ², so its square is σ⁴. Therefore Var((X_i − μ)²) = σ⁴ + 1 − σ⁴ = 1 for each i, and hence, by independence, Var(Σᵢ₌₁ⁿ (X_i − μ)²) = n, because each of them has variance 1. Also, each (X_i − μ)² has mean σ², so the sum has mean nσ², and we standardize by subtracting nσ² and dividing by √n.
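To make the moment conditions concrete, here is a small Python check (my own construction, not from the lecture) using a hypothetical three-point distribution chosen so that μ = 0, σ² = 1, and E[(X − μ)⁴] = σ⁴ + 1 = 2; it confirms that Var((X − μ)²) = 1 for such a variable:

```python
import math

# Hypothetical example: X takes the values -sqrt(2), 0, sqrt(2)
# with probabilities 1/4, 1/2, 1/4, so mu = 0 and sigma^2 = 1.
values = [-math.sqrt(2), 0.0, math.sqrt(2)]
probs = [0.25, 0.5, 0.25]

m2 = sum(p * x ** 2 for x, p in zip(values, probs))  # E[(X - mu)^2], ≈ 1
m4 = sum(p * x ** 4 for x, p in zip(values, probs))  # E[(X - mu)^4], ≈ 2 = sigma^4 + 1
var_sq = m4 - m2 ** 2                                # Var((X - mu)^2), ≈ 1

print(m2, m4, var_sq)
```

Any distribution meeting the lecture's moment conditions would give the same variance of 1 for (X − μ)², which is what makes the standardization in part (b) work.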
So, by the central limit theorem, (Σᵢ₌₁ⁿ (X_i − μ)² − nσ²)/√n converges in distribution to N(0, 1) as n goes to infinity. Now look at the required probability, P(|Σᵢ₌₁ⁿ (X_i − μ)²/n − σ²| ≤ 1/√n). Notice that 1/√n is exactly the standard deviation of the sample mean here, since Var(Σᵢ₌₁ⁿ (X_i − μ)²/n) = n/n² = 1/n. Multiplying through by √n inside the absolute value, the event |Σᵢ₌₁ⁿ (X_i − μ)²/n − σ²| ≤ 1/√n is the same as |(Σᵢ₌₁ⁿ (X_i − μ)² − nσ²)/√n| ≤ 1. Since this standardized quantity converges to a standard normal as n goes to infinity, the required probability converges to P(|Z| ≤ 1) = 2Φ(1) − 1; we have done this with absolute values so many times already. From the standard normal tables, Φ(1) = 0.8413, so the limit is 2 × 0.8413 − 1 = 0.6826. So we can approximate such probabilities, and so on; there is really no end to the results. I think I will continue the discussion of the central limit theorem in the next lecture as well.
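The limiting value 2Φ(1) − 1 can also be computed directly from the error function instead of the tables; a quick Python check (mine, not from the lecture):

```python
import math

def std_normal_cdf(x):
    """Phi(x), the standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

limit = 2 * std_normal_cdf(1.0) - 1  # P(|Z| <= 1) for Z ~ N(0, 1)
print(round(limit, 4))  # 0.6827
```

The table value Φ(1) = 0.8413 used in the lecture is truncated, which is why 2 × 0.8413 − 1 gives 0.6826 rather than the more precise 0.6827.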