We now come to the proof of the central limit theorem (CLT). Let us assume that a sampling has been done and that the sample consists of $n$ measurements: the random variable $x$ has been realized as the values $x_1, x_2, \ldots, x_n$, and we can call this set $S$. Each of these $x_i$ is drawn from a population having mean $\mu$ and standard deviation $\sigma$. They are independent, meaning that no measurement has influenced any other, and they are identically distributed, because they all come from the same underlying population with some distribution function, say $p(x)$, with mean $\mu$ and standard deviation $\sigma$. For the moment, let us say $x$ is continuous.

What this means is the following. Regard the set $S$ itself as one element of a super-set: I carry out several such trials of $n$ measurements each and take the average of the first value obtained in each trial. If a very large number of such samples of samples is taken, the average of $x_1$ will be $\mu$. It has to be; otherwise it would not have come from the same population. The same is true for the second element, and in general for the $i$-th, so in the usual notation $E[x_i] = \mu$. Similarly, $\operatorname{Var}(x_i) = \sigma^2$, for the same reason: each of the $x$'s has come from the same underlying population $p(x)$, so if I had a large sample of samples and averaged over the $i$-th element of each of them, its variance would exhibit the variance of the universal population. This holds for all $i = 1, \ldots, n$.

Now we construct a random variable, the sample mean. It can be denoted $\bar{x}$, but here, for distinctiveness, we call it $y$. It is defined by

$$ y = \frac{1}{n} \sum_{i=1}^{n} x_i. $$

Since each member of this sum is an independently distributed random variable, $y$ is also a random variable. Let us evaluate its parameters: if it is a random variable, it must have a distribution (we do not know it, but it must have one), so it must have a mean and a variance. Its expectation is

$$ E[y] = E\left[ \frac{1}{n} \sum_{i=1}^{n} x_i \right] = \frac{1}{n} \sum_{i=1}^{n} E[x_i] = \frac{n\mu}{n} = \mu, $$

where the sum and the expectation commute and $E[x_i] = \mu$ by definition. Hence the law of expectation shows that the sample mean should eventually approach the true mean of the population itself: if you take a large number of samples of samples, the average of the means so obtained should eventually be the same as the mean of the population from which they came. And this has followed from nothing more than the definitions of randomness, independence, and identical distribution.
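Nothing in this argument depends on the particular population, so the "sample of samples" picture can be checked numerically. Below is a minimal sketch, assuming Python with NumPy; the exponential population, the seed, and the sample sizes are arbitrary illustrative choices, since the lecture leaves $p(x)$ unspecified.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: exponential with mean mu = 2.0 (its standard
# deviation is then also 2.0). Any population with finite mean and
# variance would serve equally well.
mu = 2.0
n = 50            # measurements per sample
trials = 100_000  # size of the "sample of samples"

samples = rng.exponential(scale=mu, size=(trials, n))

# Averaging the first element over many repeated samples recovers mu,
# and the same holds for every position i.
print(samples[:, 0].mean())   # ~ 2.0, i.e. E[x_1] = mu

# The sample mean y = (1/n) sum x_i also has expectation mu.
y = samples.mean(axis=1)
print(y.mean())               # ~ 2.0, i.e. E[y] = mu
```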
Similarly, we can now estimate the variance of the sample mean about its mean. By definition,

$$ \operatorname{Var}(y) = \operatorname{Var}\left( \frac{1}{n} \sum_{i=1}^{n} x_i \right) = \frac{1}{n^2} \sum_{i=1}^{n} \operatorname{Var}(x_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}. $$

Here $1/n$ is a fixed constant, and because variance is a squaring-and-averaging process, pulling it out of the variance gives a factor $1/n^2$; in deriving the above we have used the property $\operatorname{Var}(\alpha x) = \alpha^2 \operatorname{Var}(x)$ for a constant $\alpha$ and random variable $x$, together with $\operatorname{Var}(x_i) = \sigma^2$ and the independence of the $x_i$, which lets the variance commute with the sum. So we have another very important result, again obtained by very simple operations: while the sample mean approaches the universal mean, the variance of the sample mean for a finite sample is the variance of the population divided by the number of elements in the sample.

This almost deceptively simple derivation shows what is called the law of large numbers: if I take a sufficiently large sample, the sample mean will approach the true mean; the variance of the sample mean about the true mean becomes progressively smaller and goes to zero as $n \to \infty$. It is a kind of assurance theorem, saying that one will definitely obtain the true mean of the population given sufficient effort, time, and number of measurements. A quick numerical check of the $\sigma^2/n$ law is sketched below.

With these observations we proceed further. It is advantageous to postulate a so-called normal variate,

$$ z = \frac{y - \mu}{\sigma / \sqrt{n}}, $$

the sample mean minus the true population mean, divided by the square root of the variance we just derived. This is actually a definition, a construction, and it does not assume or presuppose anything: although it looks like a standard normal variate, it is simply a new variable which measures the deviation of the sample mean from the true mean and scales it in terms of the standard deviation of the sample mean. We can write it explicitly in terms of the $x$'s:

$$ z = \frac{\sqrt{n}}{\sigma} \left( \frac{1}{n} \sum_{i=1}^{n} x_i - \mu \right) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{x_i - \mu}{\sigma}, $$

which is very easy to see by writing $\mu$ as $\frac{1}{n} \sum_{i=1}^{n} \mu$ and taking the common factor out of the bracket. In other words, we have expressed the variate $z$ as a sum of individual variates $z_i$:

$$ z = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} z_i, \qquad z_i = \frac{x_i - \mu}{\sigma}. $$

We can do a similar exercise for both the mean and the variance of $z$. For the mean,

$$ E[z] = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{E[x_i] - \mu}{\sigma} = 0, $$

since $E[x_i]$ is $\mu$ itself. The advantage of having chosen the variable in the form of $z$ is that its mean is zero.
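The $\sigma^2/n$ law referenced above can be checked in the same way. A small sketch under the same illustrative assumptions (NumPy, an arbitrarily chosen exponential population):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 2.0, 2.0   # exponential with mean 2.0 has sigma = 2.0
trials = 20_000

# Empirical Var(y) tracks sigma^2 / n as the sample size n grows.
for n in (10, 100, 1000):
    y = rng.exponential(scale=mu, size=(trials, n)).mean(axis=1)
    print(n, y.var(), sigma**2 / n)
```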
The variance of $z$, you can show very easily, is $1$, because the variance of each $z_i$ is also $1$: from the very definition, propagating the variance,

$$ \operatorname{Var}(z_i) = \operatorname{Var}\left( \frac{x_i - \mu}{\sigma} \right) = \frac{\operatorname{Var}(x_i)}{\sigma^2} = \frac{\sigma^2}{\sigma^2} = 1, $$

the $\sigma$'s cancelling out, and then $\operatorname{Var}(z) = \frac{1}{n} \sum_{i=1}^{n} \operatorname{Var}(z_i) = 1$, the $n$'s cancelling as well.

Now, something about the domain of the $z$ variable. Even if our original variable was discrete, defined only on integer points (such as a Poisson-distributed variable), we saw that we can render it continuous via delta functions. Thus, if the original distribution was defined only as $P_n$ for $n = 0, 1, 2, \ldots$, we can convert it into

$$ p(x) = \sum_{n} P_n\, \delta(x - n), $$

defined on those points, for all $n$. That is point number one: we can convert it into a continuous distribution. Similarly, even if the probability distribution is defined only on a finite interval, say between $a$ and $b$, I can construct an extended probability which represents the original probability in the domain where it is required, namely $[a, b]$, and is zero outside that interval. This allows us to treat $x$ as a continuous variable on $(-\infty, \infty)$. Hence we can assume, without loss of generality (this is important), that $x$ is defined on the entire real line. To repeat: even if the actual domain of the probability is some finite interval, I assume the function to be extended with the value zero outside it. Mathematically, a function $f(x)$ which coincides with the probability function but is defined over the entire real line can always be constructed.

Now, coming to the value of $z$: we defined $z = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} z_i$. Each $z_i$ corresponds to a measured value $x_i$, with mean $\mu$ of course, so it is also a random variable. If I obtain particular values $z_1, z_2, \ldots, z_n$, the value of $z$ is fixed. So for a given set of values $z_1, z_2, \ldots, z_n$, the distribution function followed by $z$, conditional on that constraint, should be a delta function: it is a definite value, and we stated when discussing the delta function that one of its uses is to define a distribution for a definite quantity, a very narrow distribution. So we can write, conditional on a given realization of the random variables,

$$ p(z \mid z_1, \ldots, z_n) = \delta\left( z - \frac{1}{\sqrt{n}} \sum_{i=1}^{n} z_i \right). $$

I have made measurements, I have got some numbers, and those numbers are now fixed; the distribution of the $z$ value coming out of those numbers is a delta function. That is all this states: it is actually a statement of equality, in distribution form.
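Both properties of $z$, zero mean and unit variance, hold for every $n$ and can be confirmed with the same illustrative setup (NumPy, an exponential population chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 2.0, 2.0
n, trials = 50, 200_000

x = rng.exponential(scale=mu, size=(trials, n))
zi = (x - mu) / sigma              # individual variates z_i
z = zi.sum(axis=1) / np.sqrt(n)    # z = (1/sqrt(n)) * sum of z_i

print(z.mean(), z.var())           # ~ 0 and ~ 1, whatever the value of n
```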
However, the fluctuations in $z$ that will arise out of fluctuations in the values of the $z_i$ can now be obtained by multiplying this conditional distribution by the underlying distribution function for each of the $z_i$ values. In other words, the true distribution function for the variable $z$ is constructed from integrals over all values of the $z_i$: there are $n$ of them, and each can fall anywhere in the original distribution, so it must be obtained by varying over all possibilities,

$$ P_n(z) = \int \cdots \int \delta\left( z - \frac{1}{\sqrt{n}} \sum_{i=1}^{n} z_i \right) f(z_1)\, f(z_2) \cdots f(z_n)\; dz_1 \cdots dz_n, $$

where $f$ is the original, underlying distribution and each $z_i$ is an independent variable that can take any value; it is an $n$-fold integral. It is not easy to evaluate this integral in general, and in the present context we do not even know the distribution $f$; in fact, we have set out to calculate a mean of that distribution without any detailed knowledge of it. So we proceed by using the concept of Fourier transforms to transform this function into its characteristic function.

We define the characteristic function of this distribution of $z$ for $n$ samples, $k$ being the conjugate variable and the caret denoting the transform, via

$$ \hat{P}_n(k) = E\left[ e^{ikz} \right] = \int P_n(z)\, e^{ikz}\, dz, $$

that is, we multiply $P_n(z)$ by $e^{ikz}$ and integrate over all $z$ values, the domain of $z$. When we go back to the expression we found for $P_n(z)$, we can execute this Fourier transform very easily:

$$ \hat{P}_n(k) = \int dz\, e^{ikz} \int \cdots \int \delta\left( z - \frac{1}{\sqrt{n}} \sum_{i=1}^{n} z_i \right) f(z_1) \cdots f(z_n)\; dz_1 \cdots dz_n. $$

The characteristic function requires an integration over $z$, whereas the probability functions have to be integrated over $dz_1$ to $dz_n$, so there are $n + 1$ integrals now. Here, of course, we have assumed commutability between the integral over $z$ and the integrals over each of the $z_i$ values. With that commutability, we can first do the integration over the delta function easily: the delta function acts as a selection function, selecting the value of the quantity at which its argument becomes zero. Here the argument becomes zero when $z = \frac{1}{\sqrt{n}} \sum_i z_i$, so wherever $z$ appears it is replaced by $\frac{1}{\sqrt{n}} \sum_i z_i$. The $z$ integration having been carried out, the $(n+1)$-fold integral becomes an $n$-fold integral again, over each of the $z_i$:

$$ \hat{P}_n(k) = \int \cdots \int f(z_1) \cdots f(z_n)\, \exp\left( \frac{ik}{\sqrt{n}} \sum_{i=1}^{n} z_i \right) dz_1 \cdots dz_n. $$

In view of the independence of each of these distributions, and the fact that the exponential of a sum of quantities equals the product of the exponentials of those quantities, we can write this as

$$ \hat{P}_n(k) = \left( \int f(z_1)\, e^{ikz_1/\sqrt{n}}\, dz_1 \right) \left( \int f(z_2)\, e^{ikz_2/\sqrt{n}}\, dz_2 \right) \cdots, $$

an $n$-fold product of integrals of $f$ against the corresponding exponential factor. Since each of the $f$'s is identical, each factor gives the same value: the integration variable has now become a dummy index.
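The delta-collapsed integral has a direct Monte Carlo reading: $\hat{P}_n(k)$ is simply the average of $e^{ikz}$ over realizations of $z$. A minimal sketch, under the same illustrative assumptions as before:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 2.0, 2.0
n, trials = 50, 200_000

x = rng.exponential(scale=mu, size=(trials, n))
z = ((x - mu) / sigma).sum(axis=1) / np.sqrt(n)

# The delta function reduces the (n+1)-fold integral to an expectation,
# P_n_hat(k) = E[exp(ikz)], estimated here by a sample average.
for k in (0.5, 1.0, 2.0):
    print(k, np.exp(1j * k * z).mean())
```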
So this is equivalent to

$$ \hat{P}_n(k) = \left( \int_{-\infty}^{\infty} f(z_1)\, e^{ikz_1/\sqrt{n}}\, dz_1 \right)^{n}, $$

where the domain runs from $-\infty$ to $\infty$ and we may call the integration variable $z_1$ or anything else. But by definition, the Fourier transform of the underlying distribution $f$ with respect to $k$ is

$$ \hat{f}(k) = \int_{-\infty}^{\infty} e^{ikz}\, f(z)\, dz, $$

$z$ being a dummy variable; in the present case each factor is this transform evaluated at $k/\sqrt{n}$, that is, $\int_{-\infty}^{\infty} e^{ikz/\sqrt{n}}\, f(z)\, dz = \hat{f}(k/\sqrt{n})$. Using this definition, the characteristic function of the distribution of $z$ becomes the $n$-fold power of the characteristic function of the underlying distribution evaluated at $k/\sqrt{n}$:

$$ \hat{P}_n(k) = \left[ \hat{f}\left( \frac{k}{\sqrt{n}} \right) \right]^{n}. $$

This is a very important result: the Fourier transform of the joint variable equals the Fourier transform of the individual distribution, evaluated at $k/\sqrt{n}$ and raised to the $n$-th power (a quick numerical check is sketched below). From this result we can now go forward: we use the concept of moments as coming from the derivatives of the characteristic function, and then use the property that moments exist at least up to second order; that naturally leads us to the central limit theorem. We will see it in our next lecture. Thank you.
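As mentioned above, the factorization can be checked numerically: estimate the left side from realizations of $z$, and the right side by raising a one-variable estimate of $\hat{f}(k/\sqrt{n})$ to the $n$-th power. A rough sketch, again with an arbitrarily chosen exponential population; the two sides agree only to within Monte Carlo error, which the $n$-th power amplifies:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 2.0, 2.0
n, trials = 50, 200_000

x = rng.exponential(scale=mu, size=(trials, n))
zi = (x - mu) / sigma

for k in (0.5, 1.0, 2.0):
    # Left side: characteristic function of z = (1/sqrt(n)) * sum of z_i.
    lhs = np.exp(1j * k * zi.sum(axis=1) / np.sqrt(n)).mean()
    # Right side: one-variable characteristic function at k/sqrt(n), raised to n.
    rhs = np.exp(1j * (k / np.sqrt(n)) * zi.ravel()).mean() ** n
    print(k, lhs, rhs)
```

For large $n$ both sides approach $e^{-k^2/2}$, the characteristic function of the standard normal, which is precisely where the next lecture's expansion in moments is headed.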