Welcome back. We were discussing the law of large numbers, in particular for IID random variables. We said that the weak law of large numbers says that the sample average converges in probability to E[X], and similarly, the strong law of large numbers says that the same sample average converges to E[X] almost surely. These two results we will prove today. The strong law of large numbers we will not prove in full generality, because that is a fairly long proof; we will prove it under slightly stronger assumptions.

First, the weak law of large numbers. If you remember the setup: the X_i are IID random variables with E[X] finite, and we have to show that the sample average S_n/n converges to E[X] in probability. Let me give two proofs. One of them is a very simple proof which assumes finite variance. The theorem itself does not assume anything other than a finite mean, but if you assume in addition that the variance is finite, the proof is very easy.

Partial proof. Assume σ_X² is finite. In this case it turns out that you can prove mean square convergence very easily. First, E[S_n/n] = E[X], because E[S_n] is nothing but n times E[X]; this you will agree with. What I am claiming is that I can prove the mean square convergence of S_n/n to E[X], and since mean square convergence implies convergence in probability, I will be done. So consider E[(S_n/n − E[X])²]. If I prove this goes to 0 as n tends to infinity, I will have proved mean square convergence. Now, E[X] is exactly E[S_n/n], so this whole term is the variance of S_n/n, which can be written as Var(S_n)/n². And what is Var(S_n)? S_n is the sum of the first n X_i's; since they are independent, you can add their variances to get the variance of S_n, so Var(S_n) = n σ_X², which is assumed to be finite. So the mean squared error is n σ_X²/n² = σ_X²/n, which goes down to 0 like 1/n as n becomes large. This means S_n/n converges to E[X] in the mean square sense, and then, since mean square convergence implies convergence in probability (by a Chebyshev-type inequality), you have convergence in probability. It is really a very simple proof; there are only three or four steps to it if you assume finite variance.

But that is just a partial proof, because it only works when the variance of the X_i is finite. You may have a situation where E[X] is finite but the variance is infinite; that is possible, as we saw. Can you think of an example? You can think of a pdf or a pmf that goes down like 1/x³; that will have finite mean and infinite variance.
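For concreteness, here is one such density written out (a standard example; the particular normalization is chosen here for illustration, it was not specified in the lecture):

```latex
% Standard example of finite mean with infinite variance
% (the normalization 2/x^3 on [1, infinity) is an illustrative choice).
f(x) = \frac{2}{x^{3}}, \quad x \ge 1:
\qquad
\mathbb{E}[X] = \int_{1}^{\infty} x \cdot \frac{2}{x^{3}}\, dx = 2 < \infty,
\qquad
\mathbb{E}[X^{2}] = \int_{1}^{\infty} x^{2} \cdot \frac{2}{x^{3}}\, dx = \infty .
```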
The law of large numbers does not demand that the variance be finite, so this is not a complete proof; we want a full proof. The full proof is also easy, but you have to invoke characteristic functions. What we will prove is that the characteristic function of the sequence of random variables S_n/n converges to the characteristic function of the constant random variable E[X]. That will finish it, because if I know that the sequence of characteristic functions converges to a legitimate characteristic function, then I have convergence in distribution; and convergence in distribution is the same as convergence in probability when the limit is a constant. So I would have proved my result. That is what I will do now.

Consider the characteristic function of S_n/n: I am looking at E[e^{itS_n/n}]. This I can write as E[e^{i(t/n)S_n}]. The reason I moved the n next to the t is that this now looks like the characteristic function of S_n, evaluated not at t but at t/n. But what is the characteristic function of S_n? S_n is the sum of IID random variables, so I can multiply the characteristic functions: c_{S_n}(t) is the n-th power of c_X(t). So the characteristic function of S_n/n works out to [c_X(t/n)]^n. That is fairly simple.

Next, since E[X] is finite, c_X admits the following expansion: c_X(t) = 1 + i E[X] t + o(t) as t tends to 0. This is because if the k-th moment is finite you can expand up to the k-th term and write a o(t^k) remainder; this is the moment-generating property of characteristic functions. So you can write the characteristic function of S_n/n as (1 + i E[X] t/n + o(t/n))^n, writing t/n in place of t. Now, as n tends to infinity, what happens? The o(t/n) term goes to 0 faster than t/n; it consists of terms like t²/n² and smaller, so as n becomes large it becomes insignificant. For all practical purposes this looks like (1 + i E[X] t/n)^n, which has the form (1 + z/n)^n, and that goes to e^z as n becomes large. So in the limit the characteristic function converges to e^{it E[X]} for all t. And for any constant c, the constant random variable c has characteristic function e^{itc}; so the limit is the characteristic function of the constant random variable E[X]. Here you should view E[X] not merely as a number but as a constant random variable.
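In symbols, the whole computation in one chain (this is just the argument above, compressed):

```latex
c_{S_n/n}(t)
= \mathbb{E}\!\left[ e^{\, i t S_n / n} \right]
= \left[ c_X\!\left( \frac{t}{n} \right) \right]^{n}
= \left( 1 + \frac{i\,\mathbb{E}[X]\, t}{n} + o\!\left( \frac{t}{n} \right) \right)^{\! n}
\;\xrightarrow[n \to \infty]{}\; e^{\, i t\, \mathbb{E}[X]}
\quad \text{for every } t .
```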
So the limit is the characteristic function of the constant random variable E[X], which implies that S_n/n converges to E[X] in distribution; that is because we proved a theorem saying that convergence of characteristic functions to a valid characteristic function implies convergence in distribution. But now the limit is a constant, and convergence in distribution to a constant is the same as convergence in probability. So this implies S_n/n converges in probability to E[X].

About the o(t/n) term once more: these are terms going to 0 faster than t/n; there is a t/n term, n is becoming big, and these terms are becoming very small. You can prove that (1 + z/n)^n goes to e^z for every complex z; here you have a little bit more than z/n, but that little bit more is very small, so it does not affect the convergence. So as n tends to infinity this characteristic function converges to e^{it E[X]}. Any questions? Note that here we have not assumed finite variance; the characteristic function proof is a fully general proof. One clarification that was asked: E[X] is of course always a constant, but I am saying that the constant random variable taking value, say, c has characteristic function e^{itc}; so the limit above is the characteristic function of the constant E[X], that is all.

So let us move on to the strong law of large numbers. Again the X_i are IID with E[|X|] < ∞, and we want to prove that S_n/n converges to E[X] almost surely. The general proof of the strong law is quite long. Just as I did two proofs for the weak law, one partial proof assuming something more and then a completely general proof, for the strong law I will give you two partial proofs and mention how the result is extended. The first partial proof assumes that the fourth moments of the X_i are finite. So you assume a little more: not just finite variance, but finite fourth moments. It so happens that if you assume finite fourth moments, you have a very simple, very elegant proof of the strong law; it comes out very nicely. But of course, assuming the fourth moment is finite is a fairly strong assumption, so I will also give you another partial proof that assumes only finite variance; that second partial proof is a little bit longer. And if you do not even assume finite variance, it is a fairly long proof; the proof is there in Grimmett and runs to about three pages or so. It is long, but not horribly long; it is not 40 pages long, as some proofs can be.

Partial proof 1. Assume E[X_i⁴] = η_X, some finite number. Also, without loss of generality, assume E[X] = 0. The reason I say there is no loss of generality is that we can get the mean back: if the X_i have finite fourth moment they have finite mean, so if they have nonzero mean you define X̃_n = X_n − E[X], prove the result for X̃_n, and then go back to deduce it for X_n.
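The centering step in symbols, for the record (the tilde denotes the centered variables):

```latex
\tilde{X}_n := X_n - \mathbb{E}[X],
\qquad
\frac{\tilde{S}_n}{n} = \frac{S_n}{n} - \mathbb{E}[X],
\qquad \text{so} \qquad
\frac{\tilde{S}_n}{n} \xrightarrow{\ \mathrm{a.s.}\ } 0
\iff
\frac{S_n}{n} \xrightarrow{\ \mathrm{a.s.}\ } \mathbb{E}[X] .
```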
So this "without loss of generality" can be justified; there is no loss of generality in assuming that all these random variables are zero mean. Now, with finite fourth moments assumed, look at the probability that |S_n/n| exceeds ε. Since |S_n/n| > ε is the same event as |S_n|⁴ > n⁴ε⁴, Markov's inequality gives P(|S_n/n| > ε) ≤ E[|S_n|⁴]/(n⁴ε⁴). And |S_n|⁴ is the same as S_n⁴; taking the absolute value makes no difference because you are raising to the fourth power anyway.

So now I am going to compute E[S_n⁴] = E[(X_1 + ... + X_n)⁴]. Just expand this out. First you get the fourth powers of all the X_i, that is, X_1⁴ + X_2⁴ + ..., and there are n such terms, contributing n η_X. What else will you have? You will have terms that look like X_i X_j³, over all pairs of distinct indices i and j. What is the expectation of such a term? The variables are independent, so E[X_i X_j³] = E[X_i] E[X_j³], and E[X_i] = 0, so all these odd terms go to 0; the same happens for terms like X_i X_j X_k² and X_i X_j X_k X_l. What else remains? Terms that look like X_i² X_j². How many of those do you have? You can pick the pair {i, j} in n choose 2 ways, and then pick which 2 of the 4 places in the product carry the index i, which is 4 choose 2 = 6 ways. So you have 6 times (n choose 2) terms whose expectation is E[X_i²] E[X_j²] = σ_X² times σ_X², which is σ_X⁴. You can verify all of this by writing it out; it is just algebra: expand the whole thing and use the fact that the X_i are zero mean, so terms like X_2 X_3³ go to 0 in expectation. Essentially you get E[S_n⁴] = n η_X + 6 (n choose 2) σ_X⁴, which grows like n² σ_X⁴.

So now put this back into the Markov bound: P(|S_n/n| > ε) ≤ η_X/(n³ε⁴) + 6 (n choose 2) σ_X⁴/(n⁴ε⁴), and 6 times (n choose 2) is 3n(n−1).
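In one display, what the expansion gives (exactly the terms counted above):

```latex
\mathbb{E}\!\left[ S_n^{4} \right]
= n\,\eta_X + 6 \binom{n}{2} \sigma_X^{4}
= n\,\eta_X + 3 n (n-1)\, \sigma_X^{4},
\qquad
\mathbb{P}\!\left( \left| \frac{S_n}{n} \right| > \epsilon \right)
\;\le\; \frac{n\,\eta_X + 3 n (n-1)\, \sigma_X^{4}}{n^{4}\, \epsilon^{4}} .
```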
Let me upper bound that: 6 times (n choose 2) is nothing but 6 n(n−1)/2 = 3n(n−1), which I am upper bounding by 3n². Great. So what do I have? P(|S_n/n| > ε) ≤ η_X/(n³ε⁴) + 3σ_X⁴/(n²ε⁴), after the n² cancels against the n⁴. So what do you have if you sum from n = 1 to infinity? I have a 1/n³ term and a 1/n² term, and both series are finite; as long as I do not have a 1/n term, I am fine. So the sum over n from 1 to infinity of P(|S_n/n| > ε) is finite for all ε. What does this imply? We have seen this: by the Borel-Cantelli lemma, if the sum over n of P(|X_n − X| > ε) is finite for every ε, then X_n → X almost surely. So S_n/n → 0 almost surely. This is a very short proof of the strong law, proved here for zero mean; now you can put back the mean, taking X̃_i = X_i − E[X], and then you can prove it for nonzero mean as well. Any questions on this?

A question that came up: the same bound equally shows (S_n/n)⁴ → 0 almost surely, so how does one conclude from there that S_n/n → 0 almost surely? Consider the set of all ω where (S_n(ω)/n)⁴ → 0; for those ω, S_n(ω)/n will have to go to 0, because if the fourth power of a sequence goes to 0, the sequence itself has to go to 0.

If there are no questions on this, I will give you another partial proof of the strong law of large numbers. This time I will only assume finite variance; the first proof assumed something very strong, since finite fourth moments is a very strong assumption. Once you prove the strong law for finite variance, then in order to generalize fully you have to go through a truncation argument: you truncate the random variables at some big value b, so that the truncated random variables have finite variance, and then you show that when b is very large the truncated random variables and the original random variables are close. Then you will get your strong law.

Partial proof 2. Assume now that σ_X² is finite, and let E[X] = μ. To begin, we assume the X_i are IID nonnegative random variables; so we will prove the strong law for nonnegative random variables with finite variance.
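Stated in symbols, the goal of this second partial proof under these extra assumptions:

```latex
X_i \ \text{IID}, \qquad X_i \ge 0, \qquad \sigma_X^{2} < \infty, \qquad \mathbb{E}[X] = \mu
\quad \Longrightarrow \quad
\frac{S_n}{n} \;\xrightarrow[n \to \infty]{\ \mathrm{a.s.}\ }\; \mu .
```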
We computed earlier, in the weak law proof, that E[(S_n/n − μ)²] = Var(S_n/n) = σ_X²/n; call that equation (★). From (★), I am going to look at the probability P(|S_n/n − μ| > ε). This probability is at most σ_X²/(nε²). Why is this? I am skipping a step, so let me fill it in: |S_n/n − μ| > ε is the same event as (S_n/n − μ)² > ε², so by Markov's inequality the probability is at most E[(S_n/n − μ)²]/ε², and by (★) that expectation is σ_X²/n. Come to think of it, this is just Chebyshev's inequality: the probability that |S_n/n − μ| exceeds ε is at most the variance over ε², and the variance is σ_X²/n. So this is 100 percent correct.

Now, from this, if you take the limit as n tends to infinity you get 0, but that only helps you prove convergence in probability; it does not give you almost sure convergence. So you have to somehow jump from convergence in probability to almost sure convergence. Jumping the other way, from almost sure convergence to convergence in probability, is trivial; it is always true. But you know that there is a partial converse going from convergence in probability to almost sure convergence. Recall that if you simply sum the bound σ_X²/(nε²) over all n, you get a harmonic series, which is not finite, so that is not going to help you here; instead, find a subsequence. When you have convergence in probability, there is always a subsequence, a deterministic subsequence, that goes to the same limit almost surely; remember the theorem, and remember that the subsequence is deterministic, not depending on ω. I think I mentioned this: you have these occasional ε-excursions under convergence in probability, but by choosing the subsequence you are able to avoid those excursions. See, if your subsequence could depend on the realization, that would be no big deal; I could avoid precisely those excursions, which is not surprising at all. But I can choose a subsequence which is deterministic beforehand, before ω is even realized, such that along it I do not have these extra-long excursions, and therefore I have almost sure convergence along it.

So, looking at our bound, what should the subsequence be? Let n_k = k². The reason I take k² is that I want the bound to go to 0 fast enough; actually I do not need k², I could even take k^{1+δ}, but k² is fine. So I am not looking at S_2/2, S_3/3, S_4/4, and so on; I am looking only at n equal to a perfect square. Then P(|S_{n_k}/n_k − μ| > ε) ≤ σ_X²/(n_k ε²) = σ_X²/(k²ε²). So now sum this over k.
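Summing along the subsequence, in one display:

```latex
\sum_{k=1}^{\infty}
\mathbb{P}\!\left( \left| \frac{S_{n_k}}{n_k} - \mu \right| > \epsilon \right)
\;\le\;
\frac{\sigma_X^{2}}{\epsilon^{2}} \sum_{k=1}^{\infty} \frac{1}{k^{2}}
\;<\; \infty .
```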
So this is a subsequence n_k, indexed by k, and as the display shows, if I sum the bound over k I get something finite. Thus the sum over k from 1 to infinity of P(|S_{n_k}/n_k − μ| > ε) is finite for all ε, and by Borel-Cantelli this means S_{n_k}/n_k → μ almost surely as k → ∞. This is not all that surprising: if you know that S_n/n → μ in probability, then by the theorem there must be some subsequence S_{n_k}/n_k which goes to μ almost surely. The good thing is we have identified what that n_k is; n_k = k² is enough.

Now, that is not enough by itself: I have to prove that S_n/n → μ almost surely along the whole sequence, and that is where the nonnegativity assumption comes in to help us. Whenever n is a perfect square, those terms go to μ almost surely; but what if n is not a perfect square? Say n is 13 or something. Look at the perfect square just below it and the next perfect square above it. Since the X_i are nonnegative, S_n can only increase with n, so S_n can be bounded between S_{k²} and S_{(k+1)²}, where k² ≤ n ≤ (k+1)². So if n is not a perfect square, you find the nearest smaller and the nearest larger perfect squares; you can always do that, and the bound holds precisely because the X_i are nonnegative.

Now, ideally I want a sandwiching argument to go through. So I am going to divide by n, which is the quantity I want: dividing everything by n is definitely fine, giving S_{k²}/n ≤ S_n/n ≤ S_{(k+1)²}/n. Then, on the lower side, what should I divide by in order to preserve the inequality? I have to divide by something larger than n; I cannot do k², because then the inequality would not hold. So I write S_{k²}/(k+1)² ≤ S_{k²}/n. On the upper side I have to do a similar trick: instead of n I cannot write (k+1)², so I write k², giving S_{(k+1)²}/n ≤ S_{(k+1)²}/k². You agree with that; this is just algebra, and the full chain is displayed below. Now I send n to infinity, and as n tends to infinity, k also goes to infinity. What happens to the bounding terms? See, I know that S_{k²}/k² → μ almost surely, which means S_{k²}/(k+1)² also goes to μ almost surely; you will see why, as it is again just simple algebra.
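To have the whole chain in one place (using k² ≤ n ≤ (k+1)² and X_i ≥ 0):

```latex
\frac{S_{k^2}}{(k+1)^2}
\;\le\; \frac{S_{k^2}}{n}
\;\le\; \frac{S_n}{n}
\;\le\; \frac{S_{(k+1)^2}}{n}
\;\le\; \frac{S_{(k+1)^2}}{k^2},
\qquad
k^{2} \le n \le (k+1)^{2} .
```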
So the limit as k → ∞ of S_{k²}/(k+1)² is at most the limit as n → ∞ of S_n/n, which is at most the limit as k → ∞ of S_{(k+1)²}/k², and I have a sandwiching argument coming up. As k tends to infinity, the lower term can be written as S_{k²}/k² times k²/(k+1)². The almost sure limit of the first factor is μ, and the second factor goes to 1, deterministically. So the limit of the lower term is almost surely equal to μ; and similarly, the upper term is also almost surely equal to μ. Therefore the limit of S_n/n exists almost surely, because it is sandwiched between two objects which are almost surely converging to μ, and that limit is equal to μ almost surely. That is the sandwich argument: S_n/n → μ almost surely. Here we assumed finite variance and we also assumed nonnegative random variables.

To extend to possibly negative random variables with finite variance, write X_i = X_i⁺ − X_i⁻ and proceed; with the same trick, X = X⁺ − X⁻, you can get rid of the nonnegativity assumption. So for finite-variance random variables we have, more or less, actually proved the strong law of large numbers. In order to remove the finite variance assumption as well, as I said, you have to do a truncation argument: you define X_i^{(b)} = min(X_i, b), truncating the random variables X_i whenever they take very large values, greater than some b, by capping them at b. Now these truncated random variables have finite variance, so the law will go through for them, and then you have to make sure that b goes to infinity at an appropriate rate. That is where the difficulty is; it is not conceptually very demanding, it is just messy. Grimmett has a full proof without any further assumptions, and Terence Tao, the famous mathematician, has a very nice blog article on the strong law of large numbers that you can read. I will stop here.
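A short postscript on that truncation step, written out for the nonnegative case (a standard bound; b here is just the truncation level):

```latex
% b is the truncation level; the bound below is standard for nonnegative X_i.
X_i^{(b)} := \min(X_i,\, b)
\quad \Longrightarrow \quad
\mathrm{Var}\!\left( X_i^{(b)} \right)
\;\le\; \mathbb{E}\!\left[ \bigl( X_i^{(b)} \bigr)^{2} \right]
\;\le\; b^{2} \;<\; \infty ,
```

so the finite-variance strong law applies to the truncated variables, and the remaining (messy) work is to control the gap between X_i^{(b)} and X_i as b grows.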