So, having discussed the weak law of large numbers, I will now talk about the strong law of large numbers, and I will first state the theorem. The theorem says: if X_1, X_2, ..., X_n, ... is a sequence of independent and identically distributed random variables, each having a finite mean μ = E[X_i], then with probability 1 the average (X_1 + X_2 + ... + X_n)/n converges to μ as n goes to infinity. This is the important point: the convergence itself is now an almost sure event, happening with probability 1. You can immediately see the difference from the weak law of large numbers: there we said that X̄_n converges to μ in probability; here we are saying that X̄_n converges to μ with probability 1, provided the expectation of each X_i is finite.

Before we prove the theorem, let us interpret what it means. Suppose you conduct a sequence of independent trials of some experiment, and let E be a fixed event of that experiment. For example, you toss two coins repeatedly, and E is the event that both coins show heads; you might toss the two coins 10 times and count how many times both coins show heads. Let p(E) denote the probability of the occurrence of E on a particular trial, and define the indicator variable X_i = 1 if E occurs on the i-th trial and X_i = 0 if E does not occur on the i-th trial. Then X_1 + X_2 + ... + X_n is the number of occurrences of E in the first n trials, because X_i contributes 1 exactly when E occurs on the i-th trial. The strong law of large numbers says that (X_1 + X_2 + ... + X_n)/n — that is, the number of times E has occurred divided by the total number of trials — converges, with probability 1, to E[X_i] = p(E). So the relative frequency of E converges to the probability of E, and this happens with probability 1. This is an interesting interpretation, and it shows that the strong law of large numbers reinforces the way we had defined probability through relative frequency.
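To make this interpretation concrete, here is a minimal simulation sketch in Python (my own illustration, not part of the lecture; the choice of event and trial counts is arbitrary): it tosses two fair coins repeatedly, takes E to be the event that both show heads, and tracks the running relative frequency (X_1 + ... + X_n)/n, which should settle near p(E) = 1/4.

```python
import numpy as np

rng = np.random.default_rng(0)

n_trials = 100_000
# Each trial: toss two fair coins; X_i = 1 if both show heads (event E), else 0.
coin1 = rng.integers(0, 2, size=n_trials)
coin2 = rng.integers(0, 2, size=n_trials)
x = (coin1 == 1) & (coin2 == 1)

# Running relative frequency (X_1 + ... + X_n) / n.
running_mean = np.cumsum(x) / np.arange(1, n_trials + 1)

for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {n:>6}:  relative frequency = {running_mean[n - 1]:.4f}")
# The printed values drift toward p(E) = 0.25, as the strong law predicts.
```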
Now, let us prove the result. We assume additionally that E[(X_i − μ)^4] = K < ∞, that is, the fourth moment about the mean is finite. Define S_n = Σ_{i=1}^n (X_i − μ). We want to compute E[S_n^4], the expectation of (Σ_{i=1}^n (X_i − μ))^4. Expanding the fourth power by the multinomial theorem gives terms of a few kinds: pure fourth powers (X_i − μ)^4; products (X_i − μ)³(X_j − μ) with i ≠ j, each with coefficient 4; products (X_i − μ)²(X_j − μ)² over pairs i < j, each with coefficient 6; products (X_i − μ)²(X_j − μ)(X_k − μ) with three distinct indices, each with coefficient 12; and products (X_i − μ)(X_j − μ)(X_k − μ)(X_l − μ) with four distinct indices, each with coefficient 24. Now take the expectation; it can go inside the sums by linearity. Since X_1, X_2, ..., X_n are independent, the expectation of a product of functions of distinct X_i's is the product of the expectations, and since E[X_i − μ] = 0 for every i, any term containing a factor (X_j − μ) to the first power has expectation zero. So the terms with a cubed factor, the three-index terms, and the four-index terms all disappear, and we are left with

E[S_n^4] = Σ_{i=1}^n E[(X_i − μ)^4] + 6 Σ_{i<j} E[(X_i − μ)²] E[(X_j − μ)²].
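The multinomial coefficients quoted above can be checked mechanically. Here is a small SymPy sketch (my own illustration, not from the lecture) that expands (x_1 + x_2 + x_3)^4 and groups the terms by exponent pattern; with only three symbols the four-distinct-indices pattern, coefficient 24, cannot appear.

```python
import sympy as sp

# Symbols x1, x2, x3 stand in for the centered variables X_i - mu.
xs = sp.symbols('x1:4')
expanded = sp.expand(sp.Add(*xs)**4)

# Group the multinomial coefficients by exponent pattern.
patterns = {}
for term in expanded.as_ordered_terms():
    coeff, monom = term.as_coeff_Mul()
    pattern = tuple(sorted((sp.degree(monom, x) for x in xs), reverse=True))
    patterns[pattern] = coeff

for pattern, coeff in sorted(patterns.items(), reverse=True):
    print(pattern, coeff)
# Prints: (4, 0, 0) 1, (3, 1, 0) 4, (2, 2, 0) 6, (2, 1, 1) 12,
# matching the coefficients used in the expansion above.
```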
Now, we have assumed that E[(X_i − μ)^4] = K < ∞. The first sum has n such terms, so it adds up to nK. In the second sum i < j, so there are n(n − 1)/2 pairs. To bound each pair term, note that the variance of (X_i − μ)² is non-negative, and

Var[(X_i − μ)²] = E[(X_i − μ)^4] − (E[(X i − μ)²])²,

where E[(X_i − μ)²] is of course just Var(X_i). Since this variance is non-negative, it follows that (E[(X_i − μ)²])² ≤ E[(X_i − μ)^4] = K, so each factor E[(X_i − μ)²] is at most √K, each product of two such factors is at most K, and in particular everything here is finite. Therefore

E[S_n^4] ≤ nK + 6 · [n(n − 1)/2] · K = nK + 3n(n − 1)K.

Dividing both sides by n^4,

E[S_n^4]/n^4 ≤ K/n³ + (3K/n²)(1 − 1/n),

and the factor (1 − 1/n) is at most 1. Now, Σ 1/n³ and Σ 1/n² are both convergent series, so summing this bound over n shows that Σ_{n=1}^∞ E[S_n^4/n^4] < ∞. Moreover, because the terms S_n^4/n^4 are non-negative, the expectation and the infinite sum can be interchanged (for a finite sum this would just be linearity; for the infinite sum it is the monotone convergence theorem), so

E[ Σ_{n=1}^∞ S_n^4/n^4 ] = Σ_{n=1}^∞ E[S_n^4/n^4] < ∞.
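As a sanity check on this computation (a sketch of my own, assuming standard normal X_i, so that μ = 0, σ² = 1 and K = E[X_i^4] = 3): the two-term expression above then equals nK + 3n(n − 1)σ⁴ = 3n², which a quick Monte Carlo estimate should reproduce.

```python
import numpy as np

rng = np.random.default_rng(1)

n, n_reps = 50, 100_000
# X_i standard normal, so mu = 0 and S_n is just the sum of the X_i.
s_n = rng.standard_normal((n_reps, n)).sum(axis=1)

estimate = (s_n**4).mean()   # Monte Carlo estimate of E[S_n^4]
exact = 3 * n**2             # n*K + 3*n*(n-1)*sigma^4 with K = 3, sigma = 1
print(f"Monte Carlo estimate of E[S_n^4]: {estimate:.0f}   exact value: {exact}")
# The estimate should come out near 3 * 50**2 = 7500.
```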
And so we are saying that the random sum inside, Σ_{n=1}^∞ S_n^4/n^4, must be finite with probability 1. Why? Because if there were some positive probability that this sum is infinite, then its expectation would be infinite, and we have just shown that the expectation is finite. So, with probability 1, Σ_{n=1}^∞ S_n^4/n^4 < ∞. And if an infinite series has a finite sum, that is, if it is convergent, then its n-th term must go to 0; you know from the theory of convergent series that this is a necessary condition. Therefore S_n^4/n^4 → 0 as n → ∞, and taking the fourth root, S_n/n → 0 as n → ∞. Substituting the definition of S_n, this says (1/n) Σ_{i=1}^n (X_i − μ) → 0, that is,

lim_{n→∞} (1/n) Σ_{i=1}^n X_i = μ with probability 1.

So let us be clear about what we have proved: as n becomes larger and larger, X̄_n gets closer and closer to μ, and this convergence happens with probability 1. In the weak law of large numbers we only said that P(|X̄_n − μ| > δ) can be made smaller than ε for all sufficiently large n, so that statement was only in terms of the probabilities of individual deviations; here we are saying that the event that X̄_n → μ as n → ∞ itself has probability 1. Of course, the proof needed the condition E[(X_i − μ)^4] < ∞. But for all the distributions we have discussed in this course I could show you the existence of the MGF; I have not taken any distribution for which the MGF did not exist, or for which the mean and the variance did not exist. So for all of them this condition is also satisfied, because if the MGF exists then the fourth moment is also finite: when you expand the MGF, the coefficient of t^n/n! gives the n-th moment about the origin, so if the MGF is finite near 0, then in particular the fourth moment is finite.
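As an illustration of reading moments off the MGF (a sketch of my own, using the standard normal, whose MGF is e^{t²/2}): differentiating the MGF n times at t = 0 gives the n-th moment about the origin, and the fourth moment comes out finite, equal to 3.

```python
import sympy as sp

t = sp.symbols('t')
M = sp.exp(t**2 / 2)   # MGF of a standard normal random variable

# The n-th moment about the origin is the n-th derivative of M at t = 0,
# i.e. the coefficient of t**n / n! in the expansion of M.
for n in range(1, 5):
    moment = sp.diff(M, t, n).subs(t, 0)
    print(f"E[X^{n}] = {moment}")
# Output: 0, 1, 0, 3 -- in particular the fourth moment is finite.
```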
So the strong law of large numbers also holds for all these distributions; the weak law and the strong law both hold. Essentially, the proof has been given under the condition E[(X_i − μ)^4] < ∞, and the conclusion is that X̄_n converges to μ as n → ∞ with probability 1.

Now I just want to look at Stirling's formula. All of you know that n! can be approximated by √(2πn) (n/e)^n. Many times this is a very useful way of approximating factorials, and in many limiting situations it is very helpful to be able to replace n! by this expression and get good results. In other words, the claim is that n! / (√(2πn)(n/e)^n) → 1 as n → ∞.

Here is a derivation using the central limit theorem. Let X_1, X_2, ... be independent Poisson random variables with parameter λ = 1, and let N = Σ_{i=1}^n X_i. Then N is Poisson with parameter n, and for a Poisson distribution the mean and the variance are both equal to the parameter, so E[N] = Var(N) = n. We want to estimate the probability P(N = n). By the central limit theorem, for large n, N behaves like a normal variable X with mean n and variance n, so applying the continuity correction,

P(N = n) ≈ P(n − 1/2 ≤ X ≤ n + 1/2) = (1/√(2πn)) ∫_{n−1/2}^{n+1/2} e^{−(x−n)²/(2n)} dx,

since the standard deviation of X is √n. Now look at the integrand. At both limits x = n ± 1/2, the exponent (x − n)²/(2n) equals (1/4)/(2n) = 1/(8n), which goes to 0 as n → ∞, and e raised to something going to 0 is 1. So for large n the integrand is close to 1 throughout the interval, and the integral is approximately the maximum value of the integrand, which is 1, times the length of the interval, which is also 1. Hence

P(N = n) ≈ 1/√(2πn).
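Before equating this with the exact Poisson probability, here is a quick numerical check of the approximation (a sketch of my own): for N ~ Poisson(n), compute P(N = n) = e^{−n} n^n / n! via logarithms to avoid overflow, and compare it with 1/√(2πn).

```python
import math

# P(N = n) for N ~ Poisson(n), computed via logs to avoid overflow,
# compared with the CLT approximation 1/sqrt(2*pi*n).
for n in (10, 100, 1000, 10000):
    log_pmf = n * math.log(n) - n - math.lgamma(n + 1)   # lgamma(n+1) = log(n!)
    pmf = math.exp(log_pmf)
    approx = 1 / math.sqrt(2 * math.pi * n)
    print(f"n = {n:>5}:  P(N=n) = {pmf:.6e},  1/sqrt(2 pi n) = {approx:.6e}")
# The two values agree more and more closely as n grows.
```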
So, applying this approximation — the integrand is essentially 1 throughout, multiplied by the length of the interval — the probability P(N = n) is approximately 1/√(2πn). But N is a Poisson random variable with parameter n, so this same probability can be written exactly, in terms of the Poisson probability mass function, as e^{−n} n^n / n!. Equating the two,

e^{−n} n^n / n! ≈ 1/√(2πn),

and solving for n! gives n! ≈ √(2πn) (n/e)^n, which is Stirling's formula. This is an interesting application of the central limit theorem that I came across and thought I would discuss with you. So the strong law of large numbers we have now established; but, as we saw, in practice it makes no difference for us, and we will continue to approximate μ by X̄_n for reasonably large values of n.
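And a direct numerical check of Stirling's formula itself (again my own illustration, not from the lecture): the ratio n!/(√(2πn)(n/e)^n) tends to 1, and in fact behaves roughly like 1 + 1/(12n).

```python
import math

# Compare n! with Stirling's approximation sqrt(2*pi*n) * (n/e)**n.
for n in (5, 10, 50, 100):
    exact = math.factorial(n)
    stirling = math.sqrt(2 * math.pi * n) * (n / math.e) ** n
    print(f"n = {n:>3}:  n! / approximation = {exact / stirling:.6f}")
# The ratio tends to 1 as n grows (roughly 1 + 1/(12n)).
```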
Now I want to talk about joint moment generating functions. We defined the moment generating function for a single random variable, and for two independent random variables X and Y we could obtain the MGF of X + Y, because it was just the product of the MGFs of X and Y. But there should be a general definition of the MGF of more than one variable when the variables are not necessarily independent, so let us complete this part of the theory. The definition simply says: if X_1, X_2, ..., X_n are n random variables with a given joint distribution, their joint moment generating function is

M(t_1, t_2, ..., t_n) = E[e^{t_1 X_1 + t_2 X_2 + ... + t_n X_n}]

for all real numbers t_1, t_2, ..., t_n for which this expectation is defined. The individual MGFs can be obtained from the joint one by putting all but one of the t_i's equal to 0: for the i-th random variable, M_{X_i}(t) = E[e^{t X_i}] = M(0, ..., 0, t, 0, ..., 0), with t in the i-th place and zeros elsewhere. Just as in the one-variable case, where we stated without proof that the MGF uniquely determines the distribution function, here also we will simply assume that the joint moment generating function uniquely determines the joint distribution of X_1, X_2, ..., X_n.

Under this assumption we can prove: the random variables X_1, X_2, ..., X_n are independent if and only if

M(t_1, t_2, ..., t_n) = M_{X_1}(t_1) M_{X_2}(t_2) ··· M_{X_n}(t_n),    (1)

that is, if and only if the joint MGF decomposes into the product of the individual MGFs. Let us show both directions. One direction is immediate: if the variables are independent, then

E[e^{t_1 X_1 + ... + t_n X_n}] = E[e^{t_1 X_1} ··· e^{t_n X_n}] = E[e^{t_1 X_1}] ··· E[e^{t_n X_n}],

because the expectation of a product of independent random variables is the product of the expectations, and this is exactly the right-hand side of (1).

Now the converse: suppose (1) holds; we want to conclude that X_1, X_2, ..., X_n are independent. Look at the right-hand side of (1). It is a product of n MGFs, so it represents the joint MGF of n independent random variables, the i-th of which has the same distribution as X_i — each factor M_{X_i}(t_i) is the MGF of X_i, and, as we have been saying, the MGF characterizes the distribution uniquely. And just as for a single random variable the MGF uniquely determines its distribution, the joint MGF uniquely determines the joint distribution. So from (1), the joint distribution of X_1, X_2, ..., X_n must be the one represented by the right-hand side, namely the product of the individual distributions of the X_i's.
That means the joint density function of X_1, X_2, ..., X_n is the product of the individual densities, and that is exactly how we defined independence of the random variables X_1, X_2, ..., X_n. So, using the fact that MGFs uniquely characterize densities, we conclude that X_1, X_2, ..., X_n are independent. This is a neat proof of the fact that if you can write the joint MGF as a product of the individual MGFs then the random variables are independent, and conversely, if they are independent then the joint MGF is the product of the individual MGFs. We have been using some of these results, but now I have supported them by theory.

In the next example I want to demonstrate the use of joint MGFs. Let X and Y be independent normal random variables, each with mean μ and variance σ². We have already shown, by the method of transformations, that X + Y and X − Y are also independent random variables, and in fact normal. Now I want to use the method of MGFs to show that X + Y and X − Y are independent; and once we obtain the individual MGFs, since the MGF determines the distribution, we can also read off the distributions. So, as an illustration of what we have just discussed, let us go through this example.

Since X and Y are independent normals, each with mean μ and variance σ², X + Y is normal with mean 2μ, and because of independence the variances add, so its variance is 2σ². For X − Y the mean is 0 and the variance is again 2σ². Therefore the MGF of X + Y is

M_{X+Y}(s) = e^{2μs + (1/2)(2σ²)s²},

and similarly, since the mean of X − Y is 0,

M_{X−Y}(t) = e^{(1/2)(2σ²)t²} = e^{σ²t²}.

Now, by our definition, the joint MGF of X + Y and X − Y is M(s, t) = E[e^{s(X+Y) + t(X−Y)}] for real numbers s and t. Why rewrite this? Collecting the X terms and the Y terms, the coefficient of X is s + t and the coefficient of Y is s − t, so M(s, t) = E[e^{(s+t)X + (s−t)Y}]. This has exactly the form E[e^{t_1 X + t_2 Y}], with s + t treated as one real number t_1 and s − t as another real number t_2, and since X and Y are independent, this expectation decomposes into E[e^{(s+t)X}] · E[e^{(s−t)Y}].
Now let me write out these MGFs: X is normal with mean μ and variance σ², and so is Y. So

E[e^{(s+t)X}] = e^{(s+t)μ + (1/2)(s+t)²σ²}  and  E[e^{(s−t)Y}] = e^{(s−t)μ + (1/2)(s−t)²σ²}.

Now rearrange the terms and simplify: (s + t)μ + (s − t)μ = 2sμ, and in (1/2)σ²[(s + t)² + (s − t)²] the cross terms +2st and −2st cancel, leaving (1/2)(2σ²)(s² + t²). So, collecting the s terms and the t terms,

M(s, t) = e^{2μs + (1/2)(2σ²)s²} · e^{(1/2)(2σ²)t²}.

The first factor is exactly the MGF of X + Y, with mean 2μ and variance 2σ², and the second factor is exactly the MGF of X − Y, with mean 0 and variance 2σ². Since the joint MGF can be written as the product of the individual MGFs, the theorem I just stated and proved tells us that the variables must be independent. So we conclude that X + Y and X − Y are independent, and also that X + Y is normal(2μ, 2σ²) and X − Y is normal(0, 2σ²).

Through a series of examples I will try to revisit results which we have already obtained in other ways, especially by applying this concept of the joint MGF to sums of random variables, and to show you that sometimes this method is easier and gets the results faster. It depends on the situation, and of course on experience, but this is another important tool, and I thought that in this course we must define it and give you the results, so that when other methods do not work, this one may prove quite handy.
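As a check on this computation, here is a small SymPy sketch (my own illustration, not from the lecture): working with the logarithms of the MGFs, it verifies symbolically that the exponent of the joint MGF of (X + Y, X − Y) equals the sum of the exponents of M_{X+Y}(s) and M_{X−Y}(t), i.e. that the joint MGF factors.

```python
import sympy as sp

s, t, mu, sigma = sp.symbols('s t mu sigma', real=True)

# Exponent of the MGF of a normal(mu, sigma^2) variable at a:
# log E[e^{aX}] = a*mu + a**2 * sigma**2 / 2.
def log_mgf(a):
    return a * mu + a**2 * sigma**2 / 2

# log of the joint MGF of (X+Y, X-Y): the coefficients of X and Y are s+t
# and s-t, and independence of X and Y makes the two log-MGFs add.
log_joint = sp.expand(log_mgf(s + t) + log_mgf(s - t))

# log MGFs of X+Y ~ N(2*mu, 2*sigma^2) and X-Y ~ N(0, 2*sigma^2).
log_product = sp.expand(2 * mu * s + sigma**2 * s**2 + sigma**2 * t**2)

print(sp.simplify(log_joint - log_product))   # prints 0: the joint MGF factors
```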