So, in the last lecture we introduced the Markov and Chebyshev inequalities, but I feel that revisiting them is necessary, because some aspects need to be emphasized. The strength of the Markov inequality lies in its simplicity and its generality: the inequality is very simple to state, yet it can be very useful and powerful in places, and it is general because all you need is a non-negative random variable whose expected value exists; then you can already state facts about certain probabilities. So, let us see some interesting applications of the Markov inequality. Consider a group of 500 people and ask three questions: first, is it possible that at least 90 percent are younger than the average age of the group; second, is it possible that at least 50 percent are older than twice the average age; and third, is it possible that more than one third are older than three times the average age? Let us see what kind of answers the Markov inequality gives. For the first part the answer is yes, and I will explain why, but notice first that the Markov inequality by itself says nothing here: P(X ≥ E[X]) ≤ E[X]/E[X] = 1, which is no bound at all, because every probability is at most 1. The complementary event gives P(X < E[X]) ≥ 1 − 1 = 0, which again carries no information. So the inequality tells us nothing directly about the event "at least 90 percent are younger than the average".
"Younger than the average" means we want the probability of the event X < E[X], and the Markov inequality only tells us that this probability is at least 0, which is no help. But we can reason directly that the answer is yes: there may be a few people who are very, very old, and they lift the average, so even if 90 percent are younger, the average can still sit above all of them. In other words, P(X < E[X]) ≥ 0.9 can certainly be satisfied. To answer the second question, "older than twice the average age" means we want to bound P(X > 2E[X]). Now P(X > 2E[X]) ≤ P(X ≥ 2E[X]), because the second event contains the first, and by Markov's inequality P(X ≥ 2E[X]) ≤ E[X]/(2E[X]) = 0.5. So the probability that a person is older than twice the average can never exceed one half: at least 50 percent being strictly older than twice the average is ruled out, and the answer here is no. So, interesting applications, you see. Then, to answer the third part, we want a bound on P(X > 3E[X]).
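The first question can be made concrete with a tiny numeric sketch. The specific ages below are hypothetical, chosen only to illustrate how a few very old people lift the average above 90 percent of the group:

```python
# Hypothetical age distribution (illustration only):
# 450 people aged 20 and 50 very old people aged 100.
ages = [20] * 450 + [100] * 50

avg = sum(ages) / len(ages)                      # (450*20 + 50*100)/500 = 28
younger = sum(1 for a in ages if a < avg)        # everyone aged 20 is below 28

print(avg)                     # 28.0
print(younger / len(ages))     # 0.9 -> exactly 90% are younger than the average
```

The 50 centenarians pull the average up to 28, so all 450 twenty-year-olds, i.e. 90 percent of the group, sit below it, exactly as the argument in the lecture says.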
Again P(X > 3E[X]) ≤ P(X ≥ 3E[X]), the same argument as earlier, and by Markov's inequality this is at most E[X]/(3E[X]) = 1/3. We asked whether more than one third can be older than three times the average age; since the probability of this event cannot exceed 1/3, the answer is no. Now let us look similarly at Chebyshev's inequality, which says that P(|X − μ| ≥ cσ) ≤ σ²/(c²σ²) = 1/c². So if you consider the event |X − μ| ≥ 2σ, the bound is 1/4 = 0.25. Now compare this with an actual probability: for X distributed normally with mean μ and variance σ², the probability that |X − μ| ≥ 2σ is 0.0456. So the actual probability is much, much smaller than the bound 0.25. If you look at the diagram, with μ on the x-axis and the pdf drawn above it: the complementary event |X − μ| < 2σ means X lies between μ − 2σ and μ + 2σ, so those are the limits, and the area between them is 1 − 0.0456 = 0.9544; what Chebyshev bounds is the remaining area in the two tails.
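The comparison between the Chebyshev bound 1/c² and the exact normal tail probability can be checked with the standard library alone, writing the normal CDF via `math.erf`:

```python
import math

def normal_two_sided_tail(c):
    """P(|X - mu| >= c*sigma) for X ~ N(mu, sigma^2), i.e. 2*(1 - Phi(c))."""
    phi_c = 0.5 * (1 + math.erf(c / math.sqrt(2)))   # standard normal CDF at c
    return 2 * (1 - phi_c)

for c in (2, 3):
    bound = 1 / c**2
    actual = normal_two_sided_tail(c)
    print(c, round(bound, 4), round(actual, 4))
# c=2: Chebyshev bound 0.25   vs actual ~0.0455
# c=3: Chebyshev bound 0.1111 vs actual ~0.0027
```

This reproduces the numbers in the lecture (up to table rounding): the universal bound is correct but far from tight for the normal distribution.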
And the difference becomes even more glaring between the Chebyshev bound and the actual probability if you take P(|X − μ| ≥ 3σ): Chebyshev's inequality gives at most 1/9 ≈ 0.111, but for the normal the actual probability is only about 0.0027. Here again, by symmetry, with μ − 3σ and μ + 3σ marked on the axis, the area between them is 0.9973, and the remaining 0.0027 splits equally between the two tails, 0.00135 in each; we have discussed this symmetry many times before. So the difference between the bound and the truth becomes bigger and bigger as c grows. One can go on exploring these inequalities; they can be loose, but they are very useful tools, and as I told you, the Markov inequality can answer some very interesting questions; here also we will see various applications of Chebyshev's inequality. Coming back to the age questions: for the first one the Markov inequality is not able to say much, but you can see that the answer is yes, because you can always have a small number of people whose ages are very large, and these older people pull up the average; therefore 90 percent can still be younger than the average age. So the answer is yes. Now, for the second question, you are asking for the probability that X is at least twice E[X], twice the average age.
And therefore, by the Markov inequality, this is at most E[X]/(2E[X]) = 1/2 = 0.5. So Markov's inequality says this probability cannot exceed 0.5, and therefore the answer here is no: it is not possible that more than 50 percent are older than twice the average age. Similarly, for the third question, P(X ≥ 3E[X]) ≤ E[X]/(3E[X]) = 1/3, so more than one third being older than three times the average is not possible, because the upper bound on this probability is 1/3, and again the answer is no. I just thought this gives you another insight into the Markov inequality and its uses, and one can go on discovering more about the usage of this particular inequality. Now similarly, for Chebyshev's inequality, I wanted to point out the following: you have a random variable X with expected value μ and variance σ², and you ask for the probability that |X − μ| ≥ cσ. By Chebyshev's inequality this is at most σ²/(c²σ²) = 1/c². In particular, if you put c = 2, then the probability that |X − μ| ≥ 2σ is at most 1/4 = 0.25. In other words, you see here I have drawn the normal curve, though the bound does not depend on it; the points marked are μ − 2σ and μ + 2σ.
So, in this picture we are asking for the probability that |X − μ| ≥ 2σ, which is the area to the left of μ − 2σ plus the area to the right of μ + 2σ, and this is at most 1/4 = 0.25; this bound is universally true, for every distribution with finite variance. Now, if you compare this with the normal: if your random variable X is N(μ, σ²), then this probability is 0.0456, so compared to that, the Chebyshev bound is a really loose upper bound. But later on we will see that, precisely because of its universality, Chebyshev's inequality is very useful in proving many other results in probability theory. The reason the normal probability is so small is that the normal curve is symmetric about μ and bell shaped, so the mass is concentrated around μ, and the area lying to the left of μ − 2σ and to the right of μ + 2σ is much smaller than the area around μ. Similarly, if you take c = 3, the difference is even more marked: Chebyshev gives P(|X − μ| ≥ 3σ) ≤ 1/9 ≈ 0.11, while for the normal the area between μ − 3σ and μ + 3σ is 0.9973, so only 0.0027 lies outside, and by symmetry half of that, 0.00135, lies in each tail. So this is the idea.
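The universality of the 1/c² bound can be illustrated by a small Monte Carlo check across a few different distributions (the particular distributions and sample size here are my own choices for illustration):

```python
import random

random.seed(0)
N = 200_000
c = 2.0

# Three different distributions, one universal claim:
# P(|X - mu| >= 2*sigma) <= 1/4 for all of them.
samples = {
    "uniform(0,1)": [random.random() for _ in range(N)],
    "exponential":  [random.expovariate(1.0) for _ in range(N)],
    "normal(0,1)":  [random.gauss(0, 1) for _ in range(N)],
}

results = {}
for name, xs in samples.items():
    mu = sum(xs) / N
    sigma = (sum((x - mu) ** 2 for x in xs) / N) ** 0.5
    results[name] = sum(1 for x in xs if abs(x - mu) >= c * sigma) / N
    print(f"{name:12s} P(|X-mu| >= 2*sigma) ~ {results[name]:.4f}  (bound {1/c**2})")
```

The empirical probabilities differ wildly between the distributions (for the uniform it is essentially 0, for the exponential about 0.05), but every one of them respects the same 0.25 bound, which is exactly the point of the lecture.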
So, Chebyshev's inequality is only an upper bound, but because it is applicable to all distributions it has its own uses and applications. Now, the third inequality that we want to talk about is Jensen's inequality, and this inequality relates expectations instead of probabilities: the previous two inequalities gave you upper bounds on the probabilities of certain events, whereas Jensen's inequality relates expectations. But before I give you Jensen's inequality I need to define convex and concave functions; some of you may have already come across the words, for example in convex and concave lenses. A twice differentiable real-valued function f is said to be convex if its second derivative is non-negative in the domain of f: wherever f is defined, f″(x) ≥ 0 at all those points. If instead the second derivative is less than or equal to 0 everywhere, the function is said to be concave. The relationship between convex and concave is that if f is convex, then −f is concave. Now, here I have drawn a convex, twice differentiable function for you, and the point is this: f″(x) ≥ 0 implies that f′(x) is non-decreasing, because whenever a function has a non-negative derivative it is non-decreasing, and here f″ is the derivative of f′.
You can see this in the picture through the tangents to the curve. The slope at a point is f′(x) = tan θ, where θ is the angle the tangent at the curve makes with the x-axis. On the left part of the curve the tangents make obtuse angles, so their slopes are negative; recall the graph of tan: between π/2 and π it is negative and increasing towards 0 at π. As you move right along the curve, the slope increases to 0 at the minimum, and then the angles become acute, where tan is again increasing and positive. So the first derivative is non-decreasing. It is also true that the tangent at any point of a convex function lies below the curve, as you can see in the figure: the tangent is always below. And when you take −f(x), you turn the picture upside down, so a convex function, you can say, holds water, while a concave function will not hold water, because it is upside down. Now, of course, I have given you the definition for a twice differentiable function, but if you take y = |x|, this is also convex even though it is not differentiable at the origin: it is differentiable everywhere else, the slope is −1 on the left and +1 on the right, so the slope is still non-decreasing wherever it is defined.
So |x| is also a convex function, and of course there are many ways of characterizing a convex function. Now I will state Jensen's inequality for convex and concave functions. Jensen's inequality says: if f is a real-valued convex function and X is a random variable with E[X] = μ finite, then E[f(X)] ≥ f(E[X]) — f here is a function of the random variable X, which is why it is a capital X. That means, when you exchange f and E, the inequality is of this kind; the only requirement is that the expected value of X exists. Now, if you multiply the inequality by a minus sign, the sign goes inside and it says E[−f(X)] ≤ −f(E[X]), and since, as we said when defining convexity, −f is concave when f is convex, for a concave function the inequality reverses: E[f(X)] ≤ f(E[X]). So this is your Jensen's inequality. It just relates expected values: if your function is convex the inequality is of the "greater than" kind, and for concave it is of the "less than" kind. We already know an instance of this: if the second moment exists, E[X²] ≥ (E[X])². Here my function is f(x) = x², which we know is convex — everybody knows it is a parabola, and its second derivative is the constant 2, which is non-negative. But we also already know that Var(X) = E[X²] − (E[X])² is always non-negative, so from there too it follows that E[X²] ≥ (E[X])².
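The x² case of Jensen's inequality can be seen numerically: the gap between E[X²] and (E[X])² is exactly the sample variance, which is never negative. The exponential distribution below is an arbitrary choice for illustration:

```python
import random

random.seed(1)
# X ~ Exponential with rate 0.5, so E[X] = 2 and E[X^2] = 8 in theory.
xs = [random.expovariate(0.5) for _ in range(100_000)]
n = len(xs)

e_x = sum(xs) / n
e_x2 = sum(x * x for x in xs) / n

print(e_x2, e_x ** 2)      # E[X^2] >= (E[X])^2, by Jensen for f(x) = x^2
print(e_x2 - e_x ** 2)     # the gap is the sample variance, non-negative
```

For this distribution the true gap is Var(X) = 4, so the inequality holds with plenty of room, but the same check passes for any sample whatsoever.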
Consider the function f(x) = 1/x for x > 0. The first derivative is −1/x², and differentiating −x⁻² again gives the second derivative 2/x³, which is positive for x positive; therefore this is a convex function, and by Jensen's inequality E[1/X] ≥ 1/E[X]. Quite a few people mistake this and assume that E[1/X] equals 1/E[X]; now you know better, because Jensen's inequality says the first is greater than or equal to the second — they are not the same thing. Similarly, consider the function log x, defined for x positive. The first derivative is 1/x and the second derivative is −1/x², which is negative for x > 0, so log is concave, and by Jensen's inequality E[log X] ≤ log E[X], because for a concave function the inequality reverses. The proof of Jensen's inequality is simple. I will use the property that the tangent at any point of a convex function lies below the curve: the curve always goes above the tangent, and of course they meet at the point of tangency. Take the tangent at the point μ, where the coordinates are (μ, g(μ)), and write it as a + bx. Then g convex implies that g(x) ≥ a + bx for all x, and g(μ) = a + bμ, because the curve and the tangent line meet at that point. And since this holds for every x, the inequality remains intact when I replace x by the random variable X.
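Both examples from this passage can be verified on a sample; the Uniform(1, 3) distribution is an arbitrary strictly positive choice so that 1/X and log X are well defined:

```python
import math
import random

random.seed(2)
# X ~ Uniform(1, 3): strictly positive, E[X] = 2.
xs = [random.uniform(1, 3) for _ in range(100_000)]
n = len(xs)

e_x = sum(xs) / n
e_inv = sum(1 / x for x in xs) / n
e_log = sum(math.log(x) for x in xs) / n

print(e_inv, 1 / e_x)         # E[1/X] >= 1/E[X]   (1/x is convex on x > 0)
print(e_log, math.log(e_x))   # E[log X] <= log E[X]  (log is concave)
```

For this distribution the exact values are E[1/X] = (ln 3)/2 ≈ 0.549 versus 1/E[X] = 0.5, and E[log X] ≈ 0.648 versus log E[X] ≈ 0.693, so both strict gaps are visible in the output — the two quantities are genuinely different, not equal.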
So g(X) ≥ a + bX, and taking expectations on either side does not change the inequality: E[g(X)] ≥ a + bE[X], since a and b are constants. But a + bE[X] = a + bμ = g(μ) = g(E[X]), where μ is the expected value. So from here you have shown the inequality E[g(X)] ≥ g(E[X]) — a simple proof using the convexity of the function. I hope you can all agree that taking expectations preserves the inequality: even if X is a continuous random variable, its density function is non-negative, and the difference g(x) − a − bx is a non-negative function, so its integral against the density, whatever the limits, is also non-negative; that is exactly the step from the pointwise inequality to the inequality in expectation. So there is no problem in going from one to the other, and the figure is also quite self-explanatory. Now, an alternate proof, since our definition of convexity assumes twice differentiability: I will use the Taylor expansion of f at x = μ up to second-order terms. Those of you who feel comfortable with calculus know about Taylor's expansion — a function can be expanded in the neighborhood of a point where it has the required derivatives, and here I have assumed that the second-order derivative exists.
So I can write f(x) = f(μ) + (x − μ)f′(μ) + ((x − μ)²/2!) f″(ξ), where ξ lies between μ and x. Such a ξ exists — whether the interval is (μ, x) or (x, μ) does not matter — and this is an exact expansion; that is what Taylor's theorem says. Now, f″(ξ) is non-negative, because f″ is non-negative on the whole domain, and (x − μ)² is the square of a real number, so the last term is non-negative. Therefore f(x) ≥ f(μ) + (x − μ)f′(μ), which is just the tangent-line inequality we used in the first proof. Now take expectations; again, by the same reasoning, the inequality does not get reversed: E[f(X)] ≥ f(μ) + f′(μ)E[X − μ]. Since E[X − μ] = 0, you are left with only f(μ), and f(μ) = f(E[X]). So again Jensen's inequality has been proved. As we go along we may see some more occasions to use this inequality, but I think this gives you a good feeling for Jensen's inequality. An interesting example: an investor is faced with two choices. She can either invest all her money in a risky proposition that will lead to a random return X with mean m, or she can put the money into a risk-free venture that will lead to a return of m with probability 1. These are the two choices she has, and suppose — on somebody's advice, say — she bases her decision on maximizing the expected value of u(R), where R is the return and u is her utility function. Now, by Jensen's inequality, if u is concave then E[u(X)] ≤ u(E[X]) = u(m), so the risk-free venture is better: the expected utility of the risky return is always at most u(m), the utility of the sure return. If instead u is convex, then E[u(X)] ≥ u(E[X]) = u(m), so the risky venture is profitable: since X is a random return, the expected utility E[u(X)] is at least u(m), the utility of the exact return m she gets from the risk-free venture.
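The inequality both proofs rest on — that a convex function lies above its tangent at μ — is easy to sanity-check numerically. Here e^x is my example convex function (its derivative is also e^x), and μ = 0.7 is an arbitrary point:

```python
import math

# Tangent-line inequality for the convex function f(x) = e^x at mu = 0.7:
# f(x) >= f(mu) + (x - mu) * f'(mu) for every x.
mu = 0.7
f = math.exp
fp = math.exp   # derivative of e^x is e^x

grid = [i / 100 - 5 for i in range(1001)]   # x values from -5 to 5
worst_gap = min(f(x) - (f(mu) + (x - mu) * fp(mu)) for x in grid)

print(worst_gap)   # smallest gap over the grid: 0 at x = mu, never negative
```

The minimum gap is attained at the point of tangency itself and is zero there; everywhere else the curve sits strictly above the tangent line, which is exactly what makes the step E[f(X)] ≥ f(μ) + f′(μ)E[X − μ] work.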
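The investor example can be played out with two hypothetical utility functions — a concave √r for a risk-averse investor and a convex r² for a risk-seeking one — and a hypothetical risky return, Uniform(50, 150), whose mean equals the sure return m = 100:

```python
import math
import random

random.seed(3)
m = 100.0
# Hypothetical risky return X ~ Uniform(50, 150), so E[X] = m = 100.
xs = [random.uniform(50, 150) for _ in range(200_000)]
n = len(xs)

def u_concave(r):       # risk-averse utility: concave
    return math.sqrt(r)

def u_convex(r):        # risk-seeking utility: convex
    return r * r

e_u_concave = sum(u_concave(x) for x in xs) / n
e_u_convex = sum(u_convex(x) for x in xs) / n

print(e_u_concave, u_concave(m))   # E[u(X)] <= u(m): take the sure return
print(e_u_convex, u_convex(m))     # E[u(X)] >= u(m): take the risky one
```

With the concave utility the expected utility of the gamble comes out around 9.89, below √100 = 10, so the risk-free venture wins; with the convex utility it comes out around 10,833, above 100² = 10,000, so the risky venture wins — the two directions of Jensen's inequality in one experiment.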
And there can be many more interesting examples of the inequalities we have just studied. The next thing we want to talk about, which again has a very important role to play, are the limit theorems, and first let us understand the underlying concept: what it means for a sequence of random variables to converge to another random variable. The first definition is convergence in probability. Let X₁, X₂, …, Xₙ, … be a sequence of jointly distributed random variables — so you must have more than one, all defined on the same sample space Ω — and let X be another random variable defined on Ω. Then we say that Xₙ converges to X in probability, written Xₙ →ᴾ X, if for every ε > 0, limₙ→∞ P(|Xₙ − X| > ε) = 0. Please understand how this differs from your usual concept of a limit, where the P is missing: there, xₙ → x in value means that as n becomes larger and larger, the distance between xₙ and x becomes smaller than any ε > 0 you choose, and ε can be made as small as you wish. Here the limit is in terms of probability: the probability of the event that the difference |Xₙ − X| exceeds ε tends to 0. So this is the idea of convergence in probability. It is also called stochastic convergence, or convergence in measure — the measure here being probability — and it is one of the weak modes of convergence.
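A minimal simulated example of convergence in probability, with a sequence of my own construction: Xₙ = X + Zₙ/n, where the perturbation Zₙ is standard normal, so |Xₙ − X| = |Zₙ|/n shrinks as n grows:

```python
import random

random.seed(4)
eps, trials = 0.1, 50_000

# X_n = X + Z_n / n: as n grows, the perturbation |X_n - X| = |Z_n| / n
# exceeds eps less and less often, so X_n -> X in probability.
probs = []
for n in (1, 5, 25, 125):
    exceed = sum(1 for _ in range(trials)
                 if abs(random.gauss(0, 1) / n) > eps)
    probs.append(exceed / trials)
    print(n, probs[-1])   # estimates of P(|X_n - X| > eps), heading to 0
```

The estimated probabilities drop from about 0.92 at n = 1 to essentially 0 by n = 125, which is precisely the statement limₙ P(|Xₙ − X| > ε) = 0 for this ε.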
So, that is one definition, and the other is convergence in distribution. We say that Xₙ converges to X in distribution, or in law, if limₙ→∞ F_{Xₙ}(t) = F_X(t) as n goes to infinity, at each point t where F_X is continuous; here F_{Xₙ} is the cumulative distribution function of Xₙ and F_X is the cumulative distribution function of X. Abbreviating the notation, this says Fₙ → F, where Fₙ denotes the cdf of Xₙ and F the cdf of X. The notation is Xₙ →ᵈ X, and the d can also be replaced by an l, for "law"; both notations are valid, and this is also called weak convergence, convergence in law, or convergence in distribution. You can see the difference between the two modes: in convergence in probability we are only saying that the probability of a certain event goes to 0 as n → ∞, whereas here the whole cumulative distribution function of Xₙ converges to the cumulative distribution function of X, at every point t where the latter is continuous. Convergence in probability and convergence in law are very important — we will see numerous applications of these convergences as we go along — and they are easier to establish than the stronger mode, called strong (almost sure) convergence.
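Here is an illustrative sequence of my own choosing for convergence in distribution: Xₙ uniform on the n points {1/n, 2/n, …, 1}, whose cdf converges pointwise to that of Uniform(0, 1) — the limit cdf is continuous everywhere, so convergence is required at every t:

```python
def F_n(t, n):
    """CDF of X_n, uniform on the n atoms {1/n, 2/n, ..., 1}."""
    if t < 1 / n:
        return 0.0
    if t >= 1:
        return 1.0
    return int(n * t) / n   # fraction of atoms k/n that are <= t

def F(t):
    """CDF of the Uniform(0, 1) limit."""
    return min(max(t, 0.0), 1.0)

ts = [i / 1000 for i in range(1001)]
gaps = []
for n in (2, 10, 100, 1000):
    gaps.append(max(abs(F_n(t, n) - F(t)) for t in ts))
    print(n, gaps[-1])   # largest gap over the grid shrinks roughly like 1/n
```

Note that each Xₙ is discrete while the limit is continuous: the individual distributions look nothing alike, yet the cdfs converge at every point, which is exactly what convergence in law asks for.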
Maybe in this course we will have a chance to look at one or two strong types of convergence as well, but the more widely used are the weak ones: convergence in law and in probability. We will now state the weak law of large numbers. It says: let X₁, X₂, … be independent, identically distributed random variables whose expected value μ and variance σ² exist and are finite. Define X̄ₙ = (Σᵢ₌₁ⁿ Xᵢ)/n, the average of the values up to n; as n grows you generate a sequence of averages — X̄ₙ₊₁ is the average of X₁, …, Xₙ, Xₙ₊₁, and so on. In simple terms, the weak law of large numbers says that this sequence of averages converges to μ, the mean of the random variables, and — since we call it the weak law — the convergence is in probability. The idea is this: since X̄ₙ converges to μ in probability, for large enough n I can, with high probability, take X̄ₙ as a good estimate of μ. Otherwise, how would we estimate it? We just have the sample values, which we have taken randomly, and we want to estimate the mean of the distribution; the law tells us the sample average provides a good estimate of the value μ. For example, if all the Xᵢ are Bernoulli, then μ is the probability of success p.
If the probability of success is p, then the expected value of each Bernoulli random variable is also p, so what the law is saying is that when you take n large enough, the sample average gives you a good estimate of the probability of success. This is how the law of large numbers provides a way of estimating the mean of a distribution — that is the whole idea. Formally, the concept is this: given arbitrary numbers δ > 0 and ε > 0, there exists a number M(ε, δ) such that P(|X̄ₙ − μ| > δ) < ε for all n ≥ M(ε, δ), where X̄ₙ = (X₁ + X₂ + ⋯ + Xₙ)/n. This simply extends the notion you already have from continuity of functions, where you want to say that function values can be brought as close together as you wish: here, the probability that |X̄ₙ − μ| exceeds δ can be made smaller than ε, provided n is big enough, namely n at least some number M(ε, δ) depending on ε and δ. So when I choose δ and ε small, this is essentially saying that X̄ₙ comes closer and closer to μ, in the sense that the event |X̄ₙ − μ| > δ becomes less and less probable.
So, the probability of this difference being greater than delta is what we control; the whole thing is talked about in terms of probability. The proof is simple, and here I will use Chebyshev's inequality. As we have already seen, since the X_i's are independent and identically distributed, the variance of X_n bar is sigma squared over n, and the expected value of X_n bar is mu. So, by Chebyshev's inequality, the probability that |X_n bar - mu| (the deviation of X_n bar from its expected value) is greater than delta is less than or equal to sigma squared / (n delta squared). Now, I did say that epsilon and delta are arbitrary, but I can choose my epsilon to be sigma squared / (n delta squared); in a way, epsilon is then tied to delta and n. If I denote this number by epsilon, then the probability above is at most epsilon. Solving for n, the smallest value of n is sigma squared / (epsilon delta squared), and for all n greater than or equal to this number the inequality is satisfied. So my number M(epsilon, delta) can be chosen as sigma squared / (epsilon delta squared). What we have shown is that, given epsilon and delta greater than 0, the inequality P(|X_n bar - mu| > delta) <= epsilon holds for all n >= sigma squared / (epsilon delta squared). This is our M(epsilon, delta) in the definition of the limit in the probability sense: given an epsilon and a delta, for all n greater than or equal to this number the inequality is satisfied.
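The sample-size formula n >= sigma squared / (epsilon delta squared) derived above is easy to compute. The following is a small Python sketch (the numerical values sigma squared = 1, delta = 0.1, epsilon = 0.05 are illustrative, not from the lecture):

```python
import math

def chebyshev_sample_size(sigma2, delta, eps):
    """Smallest integer n with sigma^2/(n*delta^2) <= eps,
    i.e. n >= sigma^2 / (eps * delta^2) = M(eps, delta)."""
    return math.ceil(sigma2 / (eps * delta * delta))

# e.g. variance 1, and we want P(|X_n bar - mu| > 0.1) <= 0.05
n = chebyshev_sample_size(1.0, 0.1, 0.05)
print(n)  # 2000 samples suffice by the Chebyshev bound
```

Note that Chebyshev's inequality is distribution-free, so this n is usually quite conservative; for a specific distribution far fewer samples may do.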
And therefore, it follows immediately that the probability of |X_n bar - mu| being greater than delta goes to 0 as n goes to infinity, because as n becomes larger and larger I can choose epsilon = sigma squared / (n delta squared) smaller and smaller. In my definition of the limit of a probability, we chose epsilon to be sigma squared / (n delta squared); so we are saying that P(|X_n bar - mu| > delta) <= epsilon, and as epsilon becomes smaller and smaller, the required n becomes larger and larger. So, from the definition of a limit in terms of probability, it follows that this probability tends to 0 as n goes to infinity. You see again that we have made very good use of Chebyshev's inequality: it shows that the limiting value of P(|X_n bar - mu| > delta) is 0, and so we have shown that X_n bar converges to mu in probability. Essentially, when you take the limit as n goes to infinity, the bound sigma squared / (n delta squared) goes to 0. Now, of course, there can be different interpretations, and one of the students interpreted this as follows: suppose somebody is practicing to be, let us say, a swimmer. Then he will say: you mean that no matter how hard I practice, my average performance will remain the same, because X_n bar is converging to mu in probability?
So, he says there is no scope for improvement. But the fallacy in his argument is that we are proving this result under the assumption that the sequence X_1, X_2, ..., X_n is independent and identically distributed. The "identically distributed" part is not valid when you are practicing: your performance is improving every day, so the X_i's are no longer identically distributed, and to say that you will never rise above the average, that your average performance will remain the same no matter how hard you work, is not correct. So this is not a good way to interpret the weak law of large numbers; what the law does give you is a tool for estimating the mean of the distribution from which the random variables are coming. We can now look at some examples to see the application of the weak law of large numbers. Suppose the sequence is independent, identically distributed samples from an exponential distribution with mean lambda, that is, the pdf is (1/lambda) e^(-x/lambda) for x > 0. Then E(X_i) = lambda, the inverse of the rate parameter, and Var(X_i) = lambda squared for the exponential distribution. So the variance of X_n bar is lambda squared / n, and again by Chebyshev's inequality, P(|X_n bar - lambda| > delta) <= lambda squared / (n delta squared), which goes to 0 as n goes to infinity. So, for any delta we can choose epsilon as I showed you before, and the definition is satisfied.
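The exponential example above can be checked numerically. A minimal Python sketch, assuming the mean-lambda parametrization used in the lecture (note that `random.expovariate` takes the rate, i.e. 1/lambda); the value lambda = 2 is an illustrative choice:

```python
import random

def estimate_exp_mean(lam, n, rng):
    """X_n bar for n samples from an exponential with mean lam.
    random.expovariate expects the rate 1/lam."""
    return sum(rng.expovariate(1.0 / lam) for _ in range(n)) / n

rng = random.Random(7)
# with lambda = 2, the sample average should approach 2 as n grows
print(estimate_exp_mean(2.0, 200000, rng))
```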
Anyway, what we are saying is that X_n bar, for large enough n, is a good estimate of lambda, the mean of the distribution. Similarly, if the sequence comes from a Poisson distribution with rate lambda, then E(X_i) = lambda, and for a Poisson the variance is the same, so Var(X_i) = lambda as well. Then the variance of X_n bar is lambda / n, and P(|X_n bar - lambda| > delta) <= lambda / (n delta squared), which again goes to 0 as n goes to infinity, since lambda and delta are finite (remember, we are in the situation where the mean and variance are finite). I am just giving you a few examples, but you can see this is universally true, because we did not specify the distribution; we simply required that the X_i's be independent, identically distributed random variables. As a third example, if the sequence is from a normal(mu, sigma squared) distribution, then E(X_i) = mu and Var(X_i) = sigma squared, so the variance of X_n bar is sigma squared / n, and the bound again goes to 0 as n goes to infinity. So, Chebyshev's inequality has proved to be a strong tool for proving weak convergence; I showed you an application of Jensen's inequality earlier as well, and we will again make use of these inequalities when we look at more limit theorems. The whole idea, and one needs to emphasize this, is that we are not saying X_n bar tends to mu in value; we are saying it tends to mu in probability.
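The three examples all use the same Chebyshev bound Var(X_i)/(n delta squared); only the variance changes. A small Python sketch of that common bound (the parameter values lambda = 2, sigma squared = 4 and delta = 0.5 are illustrative):

```python
def chebyshev_bound(var_x, n, delta):
    """Upper bound on P(|X_n bar - mu| > delta): Var(X_i) / (n * delta^2)."""
    return var_x / (n * delta ** 2)

# the three lecture examples with illustrative parameter values:
# exponential mean 2 -> var 4; Poisson rate 2 -> var 2; normal with sigma^2 = 4
for var in (4.0, 2.0, 4.0):
    print([chebyshev_bound(var, n, 0.5) for n in (10, 100, 1000)])
```

Whatever the variance, the bound is proportional to 1/n, which is exactly why the argument is "universally true" for any iid sequence with finite variance.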
So, when we say it is a good estimate, this is in terms of probability: the probability is very high that X_n bar becomes closer and closer to mu. Again, as I said, it is a matter of interpretation. You might go to a casino and keep putting money in a slot machine, and after a number of unsuccessful tries say, "it will surely happen soon"; but that is not true, because again it is a matter of probability. Yes, the probability of a long run of failures is getting closer to 0, that is fine, but it may still happen that you have to go on playing at the slot machine for a long time before your luck turns. So one should not say it will surely happen: if you flip a coin and keep getting tails, then eventually you will get heads also, but the law does not say when. The important thing to understand is that we are talking about convergence in probability. And this gives you a good way of estimating the mean of the distribution: you take a large enough sample, take the average, and that gives you an idea of what the mean of the distribution is. We will continue the discussion with the central limit theorem. Here I want to address two questions. First, what does the distribution of X_n bar look like? We will use the central limit theorem to answer that. Second, how fast does X_n bar converge to mu? The central limit theorem states that (sum of X_i - n mu) / (root n sigma) converges to N(0, 1), the standard normal distribution, as n goes to infinity.
That means this variate, which is a random variable for each n, converges to the standard normal variate as n goes to infinity. Here, the expected value of sum X_i, i varying from 1 to n, is n mu, and its variance is n sigma squared, because the X_i's are a sequence of independent, identically distributed random variables. So you are standardizing the variate: subtracting its mean n mu and dividing by its standard deviation root n sigma. The central limit theorem says that after standardizing the sum, the result converges to N(0, 1) in distribution, whereas the weak law of large numbers said that (sum X_i)/n, that is X_n bar, converges to mu in probability. To answer the first question, divide numerator and denominator by n: the standardized sum becomes (X_n bar - mu) / (sigma / root n), and the central limit theorem says this variate converges to N(0, 1). In other words, the limiting distribution statement tells us that for large n the distribution of X_n bar is close to a normal distribution with mean mu and variance sigma squared / n. So the central limit theorem says that whatever distribution your X_1, X_2, ..., X_n come from, when you look at X_n bar for large enough n, the curve becomes bell shaped.
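The standardization step above can be demonstrated by simulation. This is a Python sketch, not from the lecture; it uses Uniform(0, 1) samples (mu = 1/2, sigma squared = 1/12) and n = 50 as illustrative choices, and checks that the standardized average behaves like a standard normal by estimating P(Z <= 1), which should be close to Phi(1) ~ 0.8413.

```python
import math
import random

def standardized_mean(n, rng):
    """(X_n bar - mu) / (sigma / sqrt(n)) for Uniform(0,1) samples,
    where mu = 1/2 and sigma^2 = 1/12."""
    mu, sigma = 0.5, math.sqrt(1.0 / 12.0)
    xbar = sum(rng.random() for _ in range(n)) / n
    return (xbar - mu) / (sigma / math.sqrt(n))

rng = random.Random(1)
zs = [standardized_mean(50, rng) for _ in range(20000)]
frac = sum(1 for z in zs if z <= 1.0) / len(zs)
print(frac)  # should land close to Phi(1), about 0.84
```

The uniform distribution is nothing like a bell curve, yet the standardized average already matches the normal CDF well at n = 50; that is the force of the theorem.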
It gets closer and closer to the normal curve for large n, and in the limit the standardized variate converges to the standard normal distribution. In this sense the central limit theorem implies the weak law of large numbers: the weak law only said that X_n bar converges to mu in probability, that is, P(|X_n bar - mu| > delta) converges to 0, whereas here we have a statement about convergence in distribution. I should be careful, though: I should not say that X_n bar itself converges to the standard normal. For X_n bar I have not stated a limiting distribution; the proper statement is that for large enough n the distribution of X_n bar looks like normal(mu, sigma squared / n), and as n goes to infinity the whole mass gets concentrated at mu. But if you look at (X_n bar - mu) / (sigma / root n), then this can be approximated by the standard normal, while X_n bar itself is approximately normal(mu, sigma squared / n). So we can now state the theorem formally: suppose X_1, X_2, ... is a sequence of independent, identically distributed random variables, each X_i having mean mu and variance sigma squared, with the variance finite. If the variance is finite, then the mean exists as well, so we do not have to say separately that mu is finite.
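The claim that the mass of X_n bar concentrates at mu, with spread sigma / root n, is easy to check empirically. A Python sketch (Uniform(0, 1) data, with sigma squared = 1/12, and the trial counts are illustrative choices):

```python
import math
import random

def xbar_spread(n, trials, rng):
    """Empirical standard deviation of X_n bar over many trials,
    for Uniform(0,1) data; theory predicts sigma/sqrt(n), sigma^2 = 1/12."""
    means = [sum(rng.random() for _ in range(n)) / n for _ in range(trials)]
    m = sum(means) / trials
    return math.sqrt(sum((x - m) ** 2 for x in means) / trials)

rng = random.Random(5)
for n in (10, 100, 1000):
    # empirical spread vs. the theoretical sigma/sqrt(n)
    print(n, xbar_spread(n, 2000, rng), math.sqrt(1.0 / 12.0 / n))
```

The shrinking spread is the "mass concentrating at mu"; the normal shape of that shrinking distribution is what the central limit theorem adds on top.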
Then the distribution, and this is important, of (X_1 + X_2 + ... + X_n - n mu) / (root n sigma) converges to the standard normal distribution N(0, 1) as n goes to infinity. In other words, the probability that (X_1 + X_2 + ... + X_n - n mu) / (root n sigma) <= a converges to (1 / root(2 pi)) times the integral from minus infinity to a of e^(-x squared / 2) dx, for all a in R. This limit is the cumulative distribution function of the standard normal, that is, P(Z <= a). Equivalently, if you define the random variable Y_n as (sum of X_i, i varying from 1 to n, minus n mu) / (root n sigma), then the cumulative distribution function of Y_n converges, as n goes to infinity, to the cumulative distribution function of the standard normal variate Z, for all a. Remember, earlier I defined convergence in distribution, or in law: if the cumulative distribution functions of a sequence of random variables converge to a particular cumulative distribution function, we say the sequence converges to that random variable in law or in distribution. So here the sequence of random variables Y_n, for n = 1, 2, 3, ..., converges to the standard normal variate in law. We have now had a look at the central limit theorem in various forms and its implications, and of course we will continue looking at its applications more and more.
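The CDF convergence stated above can be checked directly by comparing the empirical CDF of Y_n with Phi(a). A Python sketch (not from the lecture): it uses exponential samples with mean 1, so mu = sigma = 1, and Phi is computed via the error function; n = 100 and the evaluation points a are illustrative.

```python
import math
import random

def phi(a):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

def y_n(n, rng):
    """Y_n = (sum X_i - n*mu) / (sqrt(n)*sigma) for X_i ~ Exp(mean 1),
    where mu = sigma = 1."""
    s = sum(rng.expovariate(1.0) for _ in range(n))
    return (s - n) / math.sqrt(n)

rng = random.Random(3)
samples = [y_n(100, rng) for _ in range(20000)]
for a in (-1.0, 0.0, 1.0):
    # empirical CDF of Y_n vs. the limiting normal CDF Phi(a)
    emp = sum(1 for y in samples if y <= a) / len(samples)
    print(a, emp, phi(a))
```

The agreement at each a is exactly the statement "F_{Y_n}(a) converges to F_Z(a) for all a", i.e. convergence in law.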