So, we were dealing with the hierarchy of convergence concepts. We said that convergence almost surely and convergence in the rth mean each imply convergence in probability, and that convergence in probability implies convergence in distribution. Between these last two we know that the converse is not true in general, but if the limit is a constant then they are equivalent; here we gave a counterexample, which we proved last class. We proved a major theorem about the equivalence of almost sure convergence and the vanishing probability of excursions at n and beyond, and using that we proved that almost sure convergence implies convergence in probability. We also gave a counterexample showing that the converse is not true. The implication from mean convergence to convergence in probability we proved using the Markov inequality, and we will find an example showing its converse fails as well. So, what remains is to give counterexamples showing that neither of almost sure convergence and mean square convergence implies the other. To see that X_n converging to X almost surely does not imply convergence in, let us say, mean square, we can use an example we have already seen. Take the (0, 1) interval as your sample space, and let X_n(ω) = n for ω in (0, 1/n), and 0 otherwise. Remember this example. In this case we do have convergence almost surely, X_n tends to 0 almost surely; in fact, the way I have defined it, it is sure convergence: as n tends to infinity, for every ω in the sample space, X_n(ω) goes to 0. But what is E[X_n²]? It is n² times (1/n) plus 0, which is n. So the limit as n tends to infinity of E[(X_n − 0)²] is infinity.
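To make this concrete, here is a small sketch in Python (my own illustration, not part of the lecture; the helper names `x_n` and `second_moment` are mine):

```python
# The lecture's counterexample on the sample space (0, 1):
# X_n(omega) = n if omega in (0, 1/n), else 0.
def x_n(n, omega):
    return n if omega < 1.0 / n else 0

# Pointwise (in fact sure) convergence: for any fixed omega > 0,
# X_n(omega) = 0 as soon as n >= 1/omega.
omega = 0.3
tail_values = [x_n(n, omega) for n in range(4, 50)]  # all zero once n > 1/0.3

# Second moment: E[X_n^2] = n^2 * P((0, 1/n)) = n^2 * (1/n) = n -> infinity.
def second_moment(n):
    return n ** 2 * (1.0 / n)
```

So every sample path is eventually 0, yet the second moment grows without bound.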
So, the mean square goes to infinity whereas the random variable goes to 0 almost surely: with very small probability the random variable takes very large values. Although the random variable is tending to 0 almost surely, the mean square goes to infinity, and the mean goes to 1, as we have already seen. That is the counterexample showing almost sure convergence does not imply mean square convergence. To show that X_n tending to X in mean square does not imply almost sure convergence, again you have seen an example already. Take X_n = 1 with probability 1/n, and 0 with probability 1 − 1/n, with all the X_n independent. In this case we have already shown that X_n does not tend to 0 almost surely; remember how we showed it, with the second Borel–Cantelli lemma. On the other hand, it is very easy to show that X_n approaches 0 in mean square, because E[X_n²] = 1 × (1/n), which goes to 0. So you do have mean square convergence and you do not have almost sure convergence. So, between these two there is no implication in either direction: one may hold and the other may not. Any questions? Now, there are a few other convergence results which I will state without proof; they are useful in what we are going to study about the law of large numbers and so on. The proofs in some cases may be somewhat long and technical, so I am just going to state the theorems, so that you are aware of the results; you can always consult more advanced textbooks if you want to read the proofs. So, here is the theorem: if X_n converges to X in probability, then there exists a deterministic increasing subsequence n_1 < n_2 < … such that X_{n_i} converges to X almost surely as i tends to infinity.
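As a sanity check on this example (again my own illustration, not from the lecture): by independence, the probability that no 1 appears between indices N and M is a product that telescopes to (N − 1)/M, which vanishes as M grows, so with probability 1 a 1 appears beyond any N:

```python
def prob_all_zero(N, M):
    """P(X_n = 0 for every n in {N, ..., M}) with X_n = 1 w.p. 1/n,
    independent.  The product prod_{n=N}^{M} (1 - 1/n) telescopes
    to (N - 1) / M, which -> 0 as M -> infinity: no a.s. convergence."""
    p = 1.0
    for n in range(N, M + 1):
        p *= 1.0 - 1.0 / n
    return p

p_no_late_ones = prob_all_zero(1000, 10**6)  # close to 999/10**6, tiny

# Mean square convergence is immediate: E[X_n^2] = 1 * (1/n) -> 0.
def second_moment(n):
    return 1.0 / n
```

The telescoping confirms numerically what Borel–Cantelli II guarantees: 1's keep occurring forever.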
So, this is a nice result to know; in fact, it is going to help us later when we prove the strong law of large numbers. You know that convergence almost surely always implies convergence in probability, but the converse is not true: in this example we have convergence in probability but not convergence almost surely. Are you with me? The difference is essentially that convergence in probability only demands that an ε-excursion at time n have vanishingly small probability, whereas convergence almost surely demands that ε-excursions at n and beyond have vanishingly small probability. What this theorem gives is like a partial converse. We know the implication cannot be reversed, but the theorem says that if X_n converges to X in probability, it may not be true that X_n converges to X almost surely, yet there is some subsequence which converges almost surely. I think the best way to explain this is with an example, in fact our favorite example; this is not a proof, it is only an example to make my point. Again take X_n = 1 with probability 1/n, and 0 with probability 1 − 1/n, with the X_n independent. What was the problem here? X_n was converging to 0 in probability: the probability of X_n being 1 is very small, just 1/n. But the second Borel–Cantelli lemma says that no matter how far out you go, there will be some 1 popping up somewhere. So you do not have the statement that beyond some n you are always guaranteed to be within ε. So, I drew a picture; let us say this is your n.
So, the probability of X_n being 1 is very small, but no matter how big n is, there will, with probability 1, be an occasional 1 popping up, because the second Borel–Cantelli lemma says so. Now, what this theorem is saying is that if you subsample, that is, take a subsequence, not looking at all indices but at a certain sampling of the indices, you can sample in such a way that you avoid these occasional 1's, because they are rare enough. And this subsequence, the n_i's that you select, can be chosen in a deterministic way; by deterministic I mean it does not depend on the particular realization ω. For example, in this case, can you think of a subsequence for which the convergence to 0 is almost sure? Suppose you take n_i = i² as the subsequence. So instead of considering the sequence X_1, X_2, and so on, I am considering X_1, X_4, X_9, X_16, and so on; I am not looking at all n, only at 1, 4, 9, 16. Now what is the probability that X_{n_i} = 1? It is 1/i², correct. So the first Borel–Cantelli lemma will imply that X_{n_i} converges to 0 almost surely. If you look at X_1, X_2, X_3, and so on, no matter how far out you go you are guaranteed to find some 1 popping up, but now you are looking at rare enough instances, 1, 4, 9, 16; you are going very quickly.
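Numerically (my own sketch, not from the lecture), the contrast is between a divergent and a convergent sum of the excursion probabilities P(X_n = 1), which is exactly what the two Borel–Cantelli lemmas feed on:

```python
# Along the full sequence: sum of P(X_n = 1) = sum 1/n is the harmonic
# series, which diverges, so Borel-Cantelli II gives infinitely many 1's.
harmonic_1000 = sum(1.0 / n for n in range(1, 1001))  # already exceeds 7

# Along the subsequence n_i = i^2: sum of P(X_{n_i} = 1) = sum 1/i^2 is
# finite (it converges to pi^2 / 6), so Borel-Cantelli I gives only
# finitely many 1's, hence X_{n_i} -> 0 almost surely.
subseq_total = sum(1.0 / i**2 for i in range(1, 10**6))
```

The summability along the subsequence is the whole point: rare enough sampling misses the occasional 1's.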
So, this subsequence ends up avoiding all the 1's eventually. Again, this is not a proof, just an example, but it is always true: whenever X_n converges to X in probability, there is always some subsequence, it may not be i², it may be some other subsequence, but a deterministic subsequence, such that X_{n_i} converges to X almost surely. Is this clear, deterministic subsequence? As I said, what I mean by deterministic is that the n_i you choose do not depend on ω. In this case, for example, n_i = i² is a deterministic relationship; it does not depend on my realization of what the X_n's are, it does not depend on ω in any way. It may not be i² in particular, it may be something else, but it is a deterministic relationship: it can be chosen irrespective of ω. That is what deterministic means here. Actually, it does not have to be i²; it could be i^(1+δ) and that would be enough, because I just need the Borel–Cantelli summation to be finite and I am done. You have to figure out whether the convergence holds for a given subsequence; the theorem says some subsequence always exists, but whether your favorite subsequence works is a different story. In this particular case I do not have to take 1, 4, 9; I can take i^(1+δ) or something like that. So, if you look at rare enough indices, you are going to end up missing all the occasional 1's; there are very few 1's, by the way, but they do occur no matter how far out you go, and if you slightly undersample the indices you end up missing all of them eventually. So, this theorem is good to know and very useful; it is going to help us in going from the weak law of large numbers to the strong law of large numbers.
There is another theorem, again one which I will state without proof; this is called Skorokhod's representation theorem. Take X_n, n ≥ 1, and X to be random variables on (Ω, F, P) such that X_n converges to X in distribution. Then there exists a probability space (Ω′, F′, P′) and random variables Y_n and Y on (Ω′, F′, P′) such that Y_n has the same distribution as X_n, Y has the same distribution as X, and Y_n converges to Y almost surely. You know that almost sure convergence is a very strong notion of convergence: it is stronger than convergence in probability, which is stronger than convergence in distribution. Now, Skorokhod's representation theorem says: suppose you have convergence in distribution, which does not necessarily imply any other form of convergence; you can nevertheless find a sequence of random variables on another probability space which has the same distributions as your initial sequence, but for which the convergence in the new space is almost sure. Actually the proof is constructive: if I remember correctly, you can take (Ω′, F′, P′) to be the (0, 1) interval with the Borel σ-algebra and Lebesgue measure. So you can explicitly construct a sequence Y_n with the same distribution as X_n, and a Y with the same distribution as X, in a new probability space which may have nothing to do with the original one, such that Y_n converges to Y almost surely. These two spaces can be very different; one may be a space of coin tosses, for example, and the other may be the real line or the (0, 1) interval, but the distributions will be the same and the convergence will be almost sure, and the proof is constructive: you can explicitly construct such a sequence.
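The construction on ((0,1), Borel, Lebesgue) that the lecture alludes to takes Y_n(u) = F_n^{-1}(u), the generalized inverse of the CDF of X_n; then Y_n has distribution F_n, and convergence of the quantiles gives Y_n → Y almost surely. Here is a sketch (my own illustration, with an exponential family chosen for convenience):

```python
import math

def inv_cdf_exp(rate):
    """Quantile function (generalized inverse CDF) of Exponential(rate):
    F^{-1}(u) = -ln(1 - u) / rate for u in (0, 1)."""
    return lambda u: -math.log(1.0 - u) / rate

# Take X_n ~ Exp(1 + 1/n), which converges in distribution to X ~ Exp(1).
# On Omega' = (0, 1) with Lebesgue measure, Y_n(u) = F_n^{-1}(u) has the
# same distribution as X_n, and for each fixed u, Y_n(u) -> Y(u):
# that is the almost sure convergence in the new space.
u = 0.5
y_n_values = [inv_cdf_exp(1.0 + 1.0 / n)(u) for n in (1, 10, 100, 1000)]
y_value = inv_cdf_exp(1.0)(u)  # = ln 2
```

The pointwise convergence of the quantile functions at (almost) every u is what makes this a representation with almost sure convergence.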
This is again a useful theorem to invoke; it comes in handy in a few places, namely when you are not really bothered about the specific probability spaces but are instead concerned only about the distributions. For example, the next theorem I am going to state is going to use it. If X_n converges to X in distribution and g is continuous, then g(X_n) converges to g(X) in distribution. Here we will use Skorokhod's theorem. So we are saying that if g is a continuous function and X_n converges to X in distribution, then g(X_n) converges to g(X) in distribution. The way to prove it is as follows. You will agree with me that if X_n converges to X almost surely and g is continuous, then g(X_n) converges to g(X) almost surely. Why is that true? It follows from continuity: after all, if you have a sequence x_n converging to x, then g(x_n) converges to g(x) for a continuous function g. It is not convergence everywhere, just almost sure convergence; you only have to prove that on a set of probability 1, g(X_n) converges to g(X). Except that this theorem is not talking about almost sure convergence; it is talking about convergence in distribution. So you use Skorokhod to make the connection. The proof is as follows. By Skorokhod there exist Y_n converging to Y almost surely; by writing that, I mean Y_n and Y have the same distributions as X_n and X, but they may live in a completely different probability space from the X_n. So Y_n converges to Y almost surely, and these Y's live in (Ω′, F′, P′). Next, since g is continuous, consider the following sets.
So, the set of all ω′ in Ω′ for which g(Y_n(ω′)) converges to g(Y(ω′)) is at least as big as the set of all ω′ in Ω′ for which Y_n(ω′) converges to Y(ω′). Do you agree with this statement? The convergence Y_n → Y happens almost surely, and these Y_n live in some other probability space; that is by Skorokhod. Now, I am saying that since g is a continuous function, the set of all ω′ in the new probability space Ω′ where the first convergence happens is at least as big as the set of ω′ where the second convergence happens. Agreed? Why? It is nothing but continuity. Suppose an ω′ is such that Y_n(ω′) converges to Y(ω′); then for that ω′, the convergence g(Y_n(ω′)) → g(Y(ω′)) is guaranteed. So any such ω′ is necessarily an element of the first set. So the containment is clear, and it follows purely from continuity; there is nothing sophisticated here. To make things perfectly clear, I should write an ω′ everywhere: these are sets of ω′ for which the convergence holds, and the convergence inside each set is just convergence in the sense of sequences of real numbers. Now, what is the probability of the smaller set? It is 1. Therefore the probability of the bigger set must be greater than or equal to 1, and therefore equal to 1. Which means g(Y_n) converges to g(Y) almost surely, correct? This implies g(Y_n) converges to g(Y) in distribution, because convergence almost surely certainly implies convergence in distribution. But on the other hand, you know that Y_n and X_n have the same distribution.
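The only analytic ingredient in this containment is the sequence fact that y_n → y forces g(y_n) → g(y) for continuous g; a quick numerical check (my own sketch, not from the lecture):

```python
import math

# If y_n -> y and g is continuous, then g(y_n) -> g(y).
# This is exactly the fact used pointwise, at each omega', in the proof.
g = math.exp
y = 1.0
gaps = [abs(g(y + 1.0 / n) - g(y)) for n in (1, 10, 100, 1000, 10000)]
# The gaps shrink monotonically toward 0 as n grows.
```

Applied at every ω′ where Y_n(ω′) → Y(ω′), this gives the set containment for free.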
So, if you put a continuous transformation g on them, g(Y_n) will have the same distribution as g(X_n), and g(Y) will have the same distribution as g(X). So the statement that g(Y_n) converges to g(Y) in distribution implies that g(X_n) converges to g(X) in distribution, again because the Y's and X's have the same distributions. Is the proof clear? Any questions? This is called the continuous mapping theorem: convergence in distribution is preserved under continuous maps. Then, finally, here is a very important theorem on convergence in distribution; this is Chapter 7 of Grimmett and Stirzaker, Theorem 19. I will prove one direction and leave the other. The theorem says: X_n converges to X in distribution if and only if for every bounded continuous function g we have E[g(X_n)] converging to E[g(X)]. So this is an important theorem: it says that X_n converging to X in distribution is equivalent to saying that, for every bounded continuous g, E[g(X_n)] converges to E[g(X)]. There are two things to prove here. One is that if you have convergence in distribution, then for any bounded continuous function g, E[g(X_n)] converges to E[g(X)]. What is the sense of this convergence? These expectations are just numbers, so this is a sequence of numbers converging to a number. The easy part is proving this direction; the more challenging part is proving the converse. So I will prove only the "only if" direction, which means I am going to assume convergence in distribution. If X_n converges to X in distribution, then g(X_n) converges to g(X) in distribution, correct? Agreed? Why? It is nothing but the continuous mapping theorem. So now I have to invoke Skorokhod.
So, I can invoke Skorokhod and get there; actually, I do not even need to go through the continuous mapping theorem. What I really need is g(Y_n) converging to g(Y), as in the previous theorem; in what sense? Almost surely. These Y_n are as in Skorokhod, living in some different probability space, not the same as the X_n. Now, what happens? Since g is bounded, E[g(Y_n)] converges to E[g(Y)]. Why is that true? The dominated convergence theorem, correct. But Y_n and X_n have the same distribution, and Y and X have the same distribution, so this implies E[g(X_n)] converges to E[g(X)]. So that direction is easy: it is just Skorokhod followed by an application of the DCT. The converse is more complicated; I will not spend class time on it. It is not enormously difficult, just long. The reason this theorem is important is that in more complicated spaces this condition is taken as the definition of weak convergence. Here the X_n are just real-valued random variables, so you can define convergence in distribution in terms of convergence of CDFs. But in more advanced probability, the X_n may take values in some Hilbert space or some other complicated space, say some Polish space; they may not be real-valued or even R^n-valued. In that case you cannot talk about convergence of the CDFs, and you take this expectation condition as the definition of weak convergence. That is why this theorem is important. Great. Then I will state two theorems about convergence of characteristic functions. The first theorem is going to say that if X_n converges to X in distribution, then the characteristic functions converge.
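A quick numerical illustration of the "only if" direction (mine, not from the lecture): take X_n uniform on {1/n, 2/n, …, 1}, which converges in distribution to Uniform(0, 1); for bounded continuous g, E[g(X_n)] is a Riemann sum converging to E[g(X)], the integral of g over [0, 1]:

```python
import math

def e_g_discrete(g, n):
    """E[g(X_n)] for X_n uniform on {1/n, 2/n, ..., 1}: a Riemann sum."""
    return sum(g(k / n) for k in range(1, n + 1)) / n

g = math.sin                  # bounded and continuous on [0, 1]
limit = 1.0 - math.cos(1.0)   # E[g(X)] = integral of sin over [0, 1]
approx = e_g_discrete(g, 1000)
```

The sequence of numbers e_g_discrete(g, n) converges to the number `limit`, exactly the kind of convergence the theorem asserts.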
And the second will say: if the characteristic functions converge, does it imply that the distributions converge? Not always, but more or less; those are the two things I will state. Theorem: if X_n converges to X in distribution, then c_{X_n}(t) converges to c_X(t) for all t. So convergence in distribution definitely implies convergence of characteristic functions; in some sense, in the hierarchy of convergence, you can put convergence of the characteristic functions c_{X_n}(t) for all t below convergence in distribution. Now, how does this follow? Proof: X_n converges to X in distribution, so by Skorokhod there are Y_n converging to Y almost surely, with Y_n having the same distribution as X_n and Y the same distribution as X. Then Y_n → Y almost surely implies cos(tY_n) → cos(tY) almost surely, for all t, and similarly for sine, again for all t. Now, cos is a bounded function and sin is a bounded function, so I can invoke the dominated convergence theorem and say that E[cos(tY_n)] goes to E[cos(tY)], and likewise for sine. Taking the first plus i times the second, E[cos(tY_n)] + i E[sin(tY_n)] converges to E[cos(tY)] + i E[sin(tY)] for all t; this is by the DCT, and the left-hand side is the characteristic function of Y_n. So c_{Y_n}(t) converges to c_Y(t) for all t. But c_{Y_n}(t) equals c_{X_n}(t), and c_Y(t) equals c_X(t), because they have the same distributions; after all, the characteristic function only depends on the marginal CDFs of Y_n and X_n respectively. Understood? This is why Skorokhod's representation theorem is very useful: it helps you go to a new space where you can invoke the DCT, the MCT, and so on, and then come back to the space you want. Is this proof clear? Fine. I think I am out of time.
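For the Bernoulli example from earlier in the lecture (my own illustration), the characteristic function c_{X_n}(t) = (1 − 1/n) + (1/n)e^{it} converges pointwise to 1, which is the characteristic function of the constant 0, the distributional limit:

```python
import cmath

def cf_bernoulli(p):
    """Characteristic function of Bernoulli(p): (1 - p) + p * e^{it}."""
    return lambda t: (1.0 - p) + p * cmath.exp(1j * t)

# X_n = 1 w.p. 1/n converges in distribution to 0, whose c.f. is
# identically 1; the gap |c_{X_n}(t) - 1| = (1/n)|e^{it} - 1| -> 0.
t = 2.0
gaps = [abs(cf_bernoulli(1.0 / n)(t) - 1.0) for n in (1, 10, 100, 1000)]
```

Every characteristic function equals 1 at t = 0; here the convergence to 1 holds at every t.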
So, maybe I will just state the next theorem and explain it in the next class. Let c_{X_n}(t) converge, for every t, to a valid characteristic function c_X(t). Then X_n converges to X in distribution. This is like a converse to the previous theorem: there we said that if you have convergence in distribution, you are guaranteed that the sequence of characteristic functions will converge. Here, and I have not stated it fully precisely, we are saying that if the limit of your sequence of characteristic functions is another valid characteristic function, then you have convergence in distribution. The problem is that sometimes you may have a sequence of characteristic functions whose limit function does not satisfy the required properties, you know, nonnegative definiteness, uniform continuity, and all that. Those properties may not be satisfied by the limit function, in which case there is no question of the limit function even being a characteristic function. But if that problem is not there, if the limit is in fact a characteristic function, then you have convergence in distribution. So it is not a full converse in some sense; it is a converse with a caveat. We will take it up tomorrow.
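A standard illustration of the caveat (not given in the lecture; added here as a hedged example): for X_n distributed N(0, n), the characteristic functions c_{X_n}(t) = e^{−nt²/2} converge pointwise to a function that is 1 at t = 0 and 0 elsewhere; that limit is discontinuous at 0, hence not a valid characteristic function, and indeed X_n does not converge in distribution, since the mass escapes to infinity:

```python
import math

def cf_normal(variance):
    """Characteristic function of N(0, variance): exp(-variance * t^2 / 2)."""
    return lambda t: math.exp(-variance * t * t / 2.0)

# As the variance grows, the c.f. collapses to the indicator of {t = 0},
# which is discontinuous at 0 and therefore not a characteristic function.
at_zero = cf_normal(10**6)(0.0)    # always exactly 1
off_zero = cf_normal(10**6)(0.1)   # essentially 0 for large variance
```

This is exactly the failure mode the theorem's hypothesis rules out by requiring the limit to be a valid characteristic function.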