So, we were discussing convergence of characteristic functions. One result we proved yesterday: if X_n converges to X in distribution, then C_{X_n}(t) converges to C_X(t) for all t. So convergence in distribution necessarily implies convergence of the characteristic functions. We proved this using the Skorokhod representation theorem and the dominated convergence theorem. Now we were talking about the converse: is it true that if the sequence of characteristic functions has a certain limit, then you have convergence in distribution? Let me state this properly. Let X_n be a sequence of random variables with characteristic functions C_{X_n}(t), and let X be a random variable with characteristic function C_X(t). If C_{X_n}(t) converges to C_X(t) for all t, then X_n converges to X in distribution. I do not think I stated this very precisely towards the end of the last class; this is the correct statement. So if a sequence of characteristic functions converges to the characteristic function of another random variable, you have convergence in distribution. The catch is that the sequence of characteristic functions may converge to some function of t which is not a valid characteristic function. What do I mean by not valid? A characteristic function has three defining properties: its value at t = 0 equals 1, it is uniformly continuous, and it is a non-negative definite kernel. If even one of these three properties fails for the limit function, it is not a valid characteristic function, and there is no question of convergence in distribution. However, if the limit is a valid characteristic function, namely if you can verify that the limit satisfies these three fundamental properties, then you do have convergence in distribution. Try constructing an example of a sequence of characteristic functions which converges to something that is not a characteristic function; it is fairly easy. Now, there is actually a more refined result called the continuity theorem, which says you do not even have to check all three properties of the limiting function. Here we are saying that if a sequence of characteristic functions converges to some function, you must verify the three basic properties to decide whether the limit is a legitimate characteristic function. It turns out you do not have to work so hard: it is enough to verify continuity of the limit function at t = 0. If the limit function is continuous at t = 0, you are guaranteed convergence in distribution. It is a fairly sophisticated theorem; the proof involves tools from harmonic analysis, complex analysis and so on. Let me state it and then explain.
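Here is one standard example of the kind this exercise asks for (worked out here; it was left as an exercise in class): let the variance blow up.

```latex
% Take X_n ~ N(0, n). Its characteristic function is
\[
  C_{X_n}(t) = e^{-n t^2 / 2}
  \;\xrightarrow[n \to \infty]{}\;
  \begin{cases} 1, & t = 0, \\ 0, & t \neq 0. \end{cases}
\]
% The pointwise limit is discontinuous at t = 0, hence not a valid
% characteristic function, and indeed X_n converges in distribution
% to nothing: the probability mass spreads out to infinity.
```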
So, the continuity theorem for convergence of characteristic functions. It says: let X_n be a sequence of random variables with characteristic functions C_{X_n}(t), and suppose lim_{n→∞} C_{X_n}(t) exists for all t. Call this limit C(t); of course, I am not saying that C(t) is a valid characteristic function. Then exactly one of the following is true. (a) C(t) is discontinuous at t = 0, and in this case X_n does not converge in distribution. (b) C(t) is continuous at t = 0, and in this case C(t) is a valid characteristic function of some random variable X, and X_n necessarily converges in distribution to X. So there are only two possibilities, and all you do is check C(t) for continuity at t = 0. If it so happens that C(t) is discontinuous at t = 0, then C(t) cannot possibly be a characteristic function, because characteristic functions are continuous, in fact uniformly continuous. In this case you can conclude that X_n does not converge in distribution to anything, which means the CDFs F_{X_n} do not converge. The other possibility is continuity at t = 0, and that is all you need: the limit automatically becomes the characteristic function of some random variable, and you then have convergence in distribution, because when the limit is a valid characteristic function you have convergence in distribution. So why do you think this is true? Intuitively, without going into the mathematics of the proof: I said that for C(t) to be a valid characteristic function it must satisfy the three basic properties, but now I am saying it is enough to check continuity at t = 0, not even at other values of t; no uniform continuity, no non-negative definite kernel, and so on. It might seem that C(t) could be continuous at t = 0 but lack the other properties of a characteristic function. This theorem says that is not possible: given continuity at t = 0, you are guaranteed a valid characteristic function. Do you have any guesses why this happens? Essentially, C(t) is not some arbitrary function of t; it is obtained as the limit of a sequence of characteristic functions, and these already have a lot of structure to them, all the properties of a characteristic function. Its limit is not the limit of an arbitrary sequence of functions; it already inherits a lot of structure. So it turns out that if you just verify continuity at t = 0, all the other structure falls into place: uniform continuity, non-negative definiteness, all of that works out automatically.
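As a numerical illustration of the theorem in action (not worked out in class; the distributions and parameters here are chosen just for demonstration), take the classical Binomial-to-Poisson limit:

```python
import numpy as np

# The characteristic function of Binomial(n, lam/n) is (1 - p + p e^{it})^n,
# and that of Poisson(lam) is exp(lam (e^{it} - 1)). The former converges
# pointwise to the latter; the limit is continuous at t = 0, so the
# continuity theorem gives Binomial(n, lam/n) -> Poisson(lam) in distribution.
lam = 2.0
t = np.linspace(-5.0, 5.0, 11)
poisson_cf = np.exp(lam * (np.exp(1j * t) - 1))
for n in [10, 100, 10000]:
    p = lam / n
    binom_cf = (1 - p + p * np.exp(1j * t)) ** n
    print(n, np.max(np.abs(binom_cf - poisson_cf)))  # sup gap shrinks with n
```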
Because C(t) is not any old function of t; it is obtained as a limit of a sequence of characteristic functions, which already have a lot of structure. This is very useful: I take the limit of some sequence of characteristic functions, verify only continuity at t = 0, and immediately assert convergence in distribution. This theorem will be very instrumental in proving the central limit theorem and even the weak law of large numbers. Any questions on this? Now, there is one outstanding result that I forgot to do in the previous class, about convergence in r-th mean, which I will just mention now. Theorem: X_n converges to X in r-th mean implies X_n converges to X in s-th mean, if r > s ≥ 1. This basically says that if you have, say, convergence in mean square (r = 2), then you necessarily have convergence in mean: convergence for a bigger value r gives convergence for any smaller value s. This is something I forgot to mention; it seems believable. The proof follows in a straightforward way from an inequality called Lyapunov's inequality, and that brings me to a digression about certain geometric inequalities which I have not done so far. For five minutes I will digress into three well-known inequalities. I will just mention them; I will not have class time to prove them, but I will probably upload some material on the proofs. There are three of them: Hölder's inequality, Minkowski's inequality, and Lyapunov's inequality. This is a digression, not about convergence. Hölder's inequality (Hölder, with an umlaut) says: if p, q > 1 and 1/p + 1/q = 1, then

E|XY| ≤ (E|X|^p)^{1/p} (E|Y|^q)^{1/q}.

This generalizes the Cauchy-Schwarz inequality, which is the case p = q = 2. The proof is actually quite short, but you have to do it in a particular way, otherwise you will never get it; done that way, it comes out in three steps. I will point you to a reference for the proof. Then there is Minkowski's inequality. All these inequalities are highly geometric in nature, as I will explain shortly. Minkowski's inequality says: if p ≥ 1, then

(E|X + Y|^p)^{1/p} ≤ (E|X|^p)^{1/p} + (E|Y|^p)^{1/p}.
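As a quick numerical sanity check (not from the class; sample means stand in for expectations here, and since a sample mean is an expectation under the empirical measure, both inequalities hold exactly):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)
y = rng.standard_normal(100_000)
p, q = 3.0, 1.5  # conjugate exponents: 1/3 + 2/3 = 1

# Holder: E|XY| <= (E|X|^p)^{1/p} (E|Y|^q)^{1/q}
lhs = np.mean(np.abs(x * y))
rhs = np.mean(np.abs(x) ** p) ** (1 / p) * np.mean(np.abs(y) ** q) ** (1 / q)
print("Holder holds:", lhs <= rhs)

# Minkowski: (E|X+Y|^p)^{1/p} <= (E|X|^p)^{1/p} + (E|Y|^p)^{1/p}
m_lhs = np.mean(np.abs(x + y) ** p) ** (1 / p)
m_rhs = np.mean(np.abs(x) ** p) ** (1 / p) + np.mean(np.abs(y) ** p) ** (1 / p)
print("Minkowski holds:", m_lhs <= m_rhs)
```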
Finally, Lyapunov's inequality says: if r > s ≥ 1, then

(E|X|^s)^{1/s} ≤ (E|X|^r)^{1/r}.

That should immediately imply the r-th mean result above. A brief note: Minkowski's inequality tells you that if you take the p-th moment and raise it to the power 1/p, that quantity satisfies the triangle inequality; that is essentially what it is saying. So the object (E|X|^p)^{1/p} acts like a norm, and it can actually be shown, using Minkowski's inequality, that it is a norm on the space L^p of random variables for which the p-th moment is finite. We mentioned earlier that the standard deviation behaves like a norm (on zero-mean random variables) and that E[XY] behaves like an inner product on the space of L^2 random variables. This is more generally true: on L^p, the space of random variables with finite p-th moment, (E|X|^p)^{1/p} behaves like a norm, and Minkowski says it satisfies the triangle inequality. And Hölder's inequality, which also holds in Euclidean space (you can easily prove it for vectors in R^n), is a generalization of Cauchy-Schwarz in some sense; it gives you a relationship between the two dual norms, the p-norm and the q-norm, when 1/p + 1/q = 1. An exactly analogous inequality works in Euclidean space. So on the space of random variables which have both p-th and q-th moments finite, the dual norms satisfy this relation; these facts are exactly analogous to the Euclidean case. That was just a digression. You may encounter these inequalities, they are useful to know, and Minkowski in particular proves that (E|X|^p)^{1/p} is a norm on the space L^p. So that completes what I had to say about convergence of random variables in general. What remains is just the limit theorems, two or three major ones, and then we will be done: the weak law of large numbers, the strong law of large numbers, and the central limit theorem. These are very fundamental results. Note that I say laws of large numbers in the plural, because it is not one theorem; it is a family of theorems. Similarly I should say central limit theorems, because that too refers to a family of theorems. We will only do very specific versions of these theorems, for the IID case. But these two families of theorems really are at the core of probability theory; they are very, very important, and it is no exaggeration to say that the laws of large numbers are the backbone of probability theory.
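Here is a sketch of the deferred proofs: Lyapunov's inequality follows from Hölder's by applying it to the product |X|^s · 1 with conjugate exponents p = r/s and q = r/(r - s), and the r-th-mean theorem then follows immediately.

```latex
% Lyapunov from Holder, applied to |X|^s \cdot 1:
\[
  \mathbb{E}|X|^s
  = \mathbb{E}\!\left(|X|^s \cdot 1\right)
  \le \left(\mathbb{E}|X|^{s \cdot r/s}\right)^{s/r}
      \left(\mathbb{E}\, 1^{\,r/(r-s)}\right)^{(r-s)/r}
  = \left(\mathbb{E}|X|^r\right)^{s/r};
\]
% raising both sides to the power 1/s gives
% (E|X|^s)^{1/s} <= (E|X|^r)^{1/r}.
% The r-th-mean theorem follows by applying this to X_n - X:
\[
  \mathbb{E}|X_n - X|^s
  \le \left(\mathbb{E}|X_n - X|^r\right)^{s/r}
  \longrightarrow 0 .
\]
```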
So, if you had to choose one theorem, or family of theorems, that makes probability worth it, it is the laws of large numbers. That is because the laws of large numbers give you an interpretation of the expected value E[X] as an average value. So far, nothing we have said about E[X], which we just defined as the integral of X dP, makes it anything more than a number. At the back of your mind you may have the notion that it is some average value, but we have not said anything to that effect; I kept saying that so far it is just a number for you. It is the laws of large numbers that give the expected value of a random variable its operational, practical meaning as an average. If the laws of large numbers did not hold, nobody would study probability theory, because the whole frequentist interpretation of probability, that if you toss a coin a million times then roughly a fraction p of the tosses will be heads, rests essentially on the laws of large numbers. The laws of large numbers are therefore the backbone of probability theory; without them nobody would study such a theory, and the theory would be entirely useless. And the central limit theorems, in some sense, establish the importance of the Gaussian. This family of theorems says that in the space of finite-variance random variables, the Gaussian is like an attractor: if you add a large number of independent finite-variance random variables, no matter what their distribution, the sum will look roughly Gaussian in distribution. So in some sense the CLT establishes the Gaussian as the most preeminent distribution in the finite-variance world. That is the high-level picture; I will now put down the results more properly. Any questions at this very high level? All right, let us first deal with the weak law of large numbers. Let X_1, X_2, ... be independent, identically distributed random variables whose expected value E[X] exists and is finite, and let S_n = X_1 + X_2 + ... + X_n. Then S_n/n converges to E[X] in probability. So you have IID random variables with finite expected value E[X], and you are looking at the sample average, the sum over i = 1 through n of X_i, divided by n: you have IID realizations of this random variable, you add the first n and divide by n, taking the sample average of the X_i. The weak law of large numbers says that this sample average converges in probability to E[X]. It is in this sense that the law of large numbers says the sample average converges to the statistical average, and that is why E[X] has the interpretation as the statistical average value of the random variable. This weak law of large numbers is fairly old. In fact, the first published proof goes back to Jacob Bernoulli, published posthumously in the year 1713. So it is almost exactly 300 years old.
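A small simulation of the weak law (not from the class; the Exponential distribution and the parameters are chosen just for illustration):

```python
import numpy as np

# Estimate P(|S_n/n - E[X]| > eps) by Monte Carlo for IID Exponential(1)
# random variables (E[X] = 1), and watch the probability shrink as n grows.
rng = np.random.default_rng(1)
eps, trials = 0.05, 2000
for n in [100, 1000, 5000]:
    avg = rng.exponential(scale=1.0, size=(trials, n)).mean(axis=1)
    print(n, np.mean(np.abs(avg - 1.0) > eps))  # fraction of excursions
```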
So, it is an old result, not a modern one. Most of what we have done is modern probability theory, all these measures and so on; this predates all of that by quite a bit. If you want a more explicit statement, it says:

lim_{n→∞} P(|S_n/n − E[X]| > ε) = 0 for all ε > 0.

As n becomes large, the probability of the sample average being different from the expected value goes to 0. This means that with probability very close to 1, if you just look at S_n/n, it takes a value close to E[X]. You can choose any ε you want, 10^{-6} say, and you will find a large enough n for which this probability is as small as you like; for large enough n it will be very close to 0, and as n tends to infinity it will be 0. So that is the weak law of large numbers. Maybe I should state the strong law as well before proving both; the strong law, on the other hand, is a modern result. The strong law of large numbers says that under exactly these conditions, the convergence is almost sure. That is the only difference between the weak law and the strong law: instead of "in probability" you write "almost surely", under no further assumptions. So the strong law subsumes the weak law, because almost sure convergence implies convergence in probability. One might ask why we even bother with the weak law; it is for historical reasons. People figured it out 300 years ago, whereas the strong law needed all this measure-theoretic machinery. Borel proved a special case of it in 1909 or so, and then Kolmogorov proved the general strong law. So the strong law of large numbers as we know it today is only about 100 years old, whereas the weak law is about 300 years old. For 200 years people did not know that a stronger result was possible, and that is because they did not understand these measure-theoretic tools properly; those came up only in the early 1900s. Strong law of large numbers: if X_i, i ≥ 1, are IID random variables with finite mean, so E[X] exists and is finite, then S_n/n converges to E[X] almost surely, or with probability 1. So you are saying that the sequence of random variables S_n/n converges to E[X], and you should look at E[X] here not as a constant but as a random variable which always takes the value E[X]: you have a sequence of random variables converging to another random variable, and it just happens that that random variable is a constant. That is the correct way to interpret it. And this convergence is almost sure. To write it a little more explicitly:

P({ω : S_n(ω)/n → E[X]}) = 1.

This is a very different statement from the weak law; we in fact know that it is a stronger statement. If you look at it, it is a statement about the sample paths, the samples ω. The weak law merely says that S_n/n is close to E[X] for large n, that is all; but this says that the sequence S_n(ω)/n, as a function of ω, converges to E[X] for almost all values of ω, with probability 1 essentially.
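To see the path-wise picture numerically (again an illustration, not from the class): fixing the random seed fixes one ω, and we watch the running average along that single path.

```python
import numpy as np

# One sample path omega: the running average S_n(omega)/n along this
# single path drifts toward E[X] = 2 as n grows.
rng = np.random.default_rng(42)
x = rng.exponential(scale=2.0, size=10**6)
running_avg = np.cumsum(x) / np.arange(1, x.size + 1)
for n in [10, 1000, 10**6]:
    print(n, running_avg[n - 1])
```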
So let me help you interpret this statement a little; it is very important to understand it. Remember that all these X_i live on some probability space (Ω, F, P). The moment nature picks a particular ω, all the random variables X_i(ω) realize as real numbers, and then obviously S_n(ω)/n also realizes as a sequence of real numbers. Now, the sequence of real numbers S_n(ω)/n may or may not converge; a priori it is just some sequence of real numbers for each ω, and if you change ω, the sequence takes other values. What the strong law of large numbers is saying is this. Imagine, if you like, three buckets. You pick an ω and check whether the sequence converges to E[X]. If the sequence does not converge at all, put that ω in the first bucket. If it converges, but to some value other than E[X], put it in the second bucket. If it converges to E[X], put it in the third bucket. You see what I mean: you are effectively partitioning the sample space into the ωs for which the sequence does not converge, the ωs for which it converges but not to E[X], and the ωs for which it converges to E[X]. The strong law of large numbers says that the first two buckets have probability 0 and all the probability is in the third. So the moment ω realizes, with probability 1 you get a sequence whose average converges to E[X], and the probability of either the average not converging at all or converging to some value other than E[X] is 0. That is what the statement means, and it is very different from the weak law, you see. We also know an equivalent characterization of almost sure convergence, which is worth stating here. What is it? We did a theorem about the equivalence of almost sure convergence to the excursions never happening beyond a certain m. So the strong law is equivalent to saying:

lim_{m→∞} P(∪_{n≥m} {ω : |S_n(ω)/n − E[X]| > ε}) = 0,

or, another way of writing it,

lim_{m→∞} P(sup_{n≥m} |S_n/n − E[X]| > ε) = 0,

for all ε > 0; the union event is the same as the sup event. This statement is equivalent to the almost sure statement, because we proved that equivalence. To emphasize: call the difference between the sample average and E[X] exceeding ε an ε-excursion at n. Then this says that if you fix a large m, the probability of having even one ε-excursion at m or beyond goes to 0 as m → ∞; whereas the in-probability convergence says only that the probability of an ε-excursion at n goes to 0.
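A Monte Carlo comparison of these two kinds of events (an illustration of my own, truncating the infinite horizon at a finite N):

```python
import numpy as np

# For Bernoulli(1/2) averages, estimate the weak-law-style event
# P(|S_m/m - 1/2| > eps) at the single index n = m, versus the
# strong-law-style event P(sup_{m <= n <= N} |S_n/n - 1/2| > eps).
# The sup event contains the single-index event, so it is at least as likely.
rng = np.random.default_rng(0)
eps, m, N, trials = 0.02, 2000, 10000, 2000
single = excursion = 0
for _ in range(trials):
    avg = np.cumsum(rng.integers(0, 2, size=N)) / np.arange(1, N + 1)
    dev = np.abs(avg[m - 1:] - 0.5)
    single += dev[0] > eps       # excursion exactly at n = m
    excursion += dev.max() > eps  # any excursion from m up to N
print(single / trials, excursion / trials)
```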
The strong law is saying that the probability of an excursion at m or beyond goes to 0, so it is stronger; we have seen this already. Are there any questions on these two statements? Did you understand the difference between them? The strong law implies the weak law, so in some sense the weak law is completely subsumed: if you prove the strong law, you have proved the weak law. On the other hand, proving the weak law is much easier, which is why it was done 300 years ago, and the strong law is much harder, which is why it was done only 100 years ago; it is a genuinely measure-theoretic statement, since you are looking at the measure of the set of all ωs where this convergence happens. So I think I will stop here; it is a good place to stop. Next class we will prove these results, and then I will have one class for the central limit theorem. I think that is just about right.