Today, we will discuss some concentration inequalities. Concentration inequalities are a very vast topic; we will only discuss some very elementary ones. Basically, what a concentration inequality gives you is a probability bound on a random variable taking atypical values. That is roughly what it is, at the plain-English level. For example, take a random variable with a certain mean and a certain variance; one of the inequalities we will do says that the probability of the random variable taking values outside a certain range around the mean is very small. It is called concentration because the probability concentrates around a certain range.

The first, and most important, one is Markov's inequality. It says that if X is a non-negative random variable and α > 0, then

    P(X > α) ≤ E[X] / α.

This is always true for any non-negative random variable, but it is meaningful only for values of α bigger than E[X]. If not, the right-hand side is bigger than 1, and saying that a probability is less than 1 is completely trivial. So let me say: useful for α > E[X]. It is a very weak kind of inequality, but a very universal one for non-negative random variables. If you consider a value α bigger than the expectation, it says that the probability of the random variable exceeding α is at most E[X]/α; the tail goes down at least as fast as 1/α. For example, the probability of a non-negative random variable taking a value greater than twice its expectation is at most one half. That is the kind of result this is.

The proof is very elementary. Since the indicators 1{X ≤ α} and 1{X > α} add up to 1, linearity of expectation gives

    E[X] = E[X · 1{X ≤ α}] + E[X · 1{X > α}].

The first term is non-negative, because X is a non-negative random variable, so we get an inequality going the right way — expectation greater than or equal to something:

    E[X] ≥ E[X · 1{X > α}] ≥ E[α · 1{X > α}] = α · P(X > α).

The middle step holds because the indicator fires only when X > α, so replacing X by α can only make the expression smaller, and the last step uses the fact that the expectation of an indicator is the probability of the corresponding event.
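As a quick numerical sanity check (my own sketch, not part of the lecture; the choice of an exponential distribution is just an assumption for illustration), here is a small Python comparison of the Markov bound against the actual tail probability:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)  # non-negative samples, E[X] = 1

for alpha in [2.0, 5.0, 10.0]:
    empirical = np.mean(x > alpha)   # Monte Carlo estimate of P(X > alpha)
    markov = np.mean(x) / alpha      # Markov bound E[X]/alpha
    print(f"alpha={alpha}: P(X > alpha) ~ {empirical:.4f}, Markov bound = {markov:.4f}")
```

You should see the empirical tail sitting far below E[X]/α, anticipating the remark below that Markov's inequality is often very conservative.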
So, that is Markov's inequality, in just three steps. It is often very loose: it only says that the probability of X exceeding some very large value α goes down like 1/α. By the way, this is for a non-negative random variable with E[X] finite. This 1/α decay of the tail probability is often a very conservative estimate; the probability typically goes down much quicker, but Markov's inequality only gives you that. If you look at the probability of X exceeding ten times its mean, it only says roughly 1/10, whereas for commonly encountered distributions the tail decays much faster. Still, although this is the simplest concentration inequality, you can strengthen these kinds of bounds by making further assumptions. For example, if you assume a finite variance you get a quicker decay, like the square of the deviation. That is the next thing I will do.

This is Chebyshev's inequality. The name is spelt in several different ways — with a "Tch", with a "tz", and so on; I do not know which spelling is the standard one. If X is a random variable with mean μ and variance σ², both finite, then

    P(|X − μ| ≥ kσ) ≤ 1/k².

The random variable need not be non-negative here; it can be any random variable, but it must have finite variance, and finite variance implies finite mean, so μ and σ are both finite. In words: the probability that the random variable deviates from the mean, in absolute value, by more than k times the standard deviation goes down like 1/k². To give you a picture: take a random variable — for the sake of argument, say it has a density, though it does not have to — with the mean μ somewhere in the middle, and mark off μ − kσ and μ + kσ on either side, a distance of kσ from the mean each way. The inequality bounds the probability of falling outside that window.

Another way of saying it: for any c > 0,

    P(|X − μ| ≥ c) ≤ σ²/c².

Put c = kσ and you get the first form back. Recall that σ represents how widely spread the random variable is, so this bound will be fairly small if your σ is small; in any case, the important thing is that it goes down like the square of the deviation you consider, c or k.

The proof is very simple: apply Markov's inequality to the non-negative random variable |X − μ|². We get

    P(|X − μ| ≥ kσ) = P((X − μ)² ≥ k²σ²) ≤ E[(X − μ)²] / (k²σ²) = σ² / (k²σ²) = 1/k²,

since the numerator E[(X − μ)²] is nothing but σ².
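Here is a similar hedged sketch for Chebyshev's inequality (again my own illustration; the Gaussian and its parameters are assumptions chosen just to have something concrete):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=3.0, scale=2.0, size=1_000_000)  # mu = 3, sigma = 2

mu, sigma = x.mean(), x.std()
for k in [2, 3, 4]:
    empirical = np.mean(np.abs(x - mu) >= k * sigma)  # P(|X - mu| >= k*sigma)
    chebyshev = 1.0 / k**2                            # Chebyshev bound
    print(f"k={k}: empirical = {empirical:.5f}, Chebyshev bound = {chebyshev:.5f}")
```

For the Gaussian the true deviation probability decays much faster than 1/k², which is exactly the looseness discussed next.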
So: Markov's inequality assumes only a finite mean, and it gives you a 1/α kind of decay. Chebyshev's inequality assumes a finite variance, and it gives you a 1/k² kind of decay — a little bit tighter in that sense, but you need finite variance. Actually, even Chebyshev's inequality often turns out to be quite loose. For example, if you have an exponentially distributed random variable, you know that P(X > α) goes down like e^(−λα), where λ is the rate — exponentially fast. So even the 1/k² kind of estimate is not a very good estimate. That is because here you are using only mean and variance information (in Markov's inequality, only the mean). You can often tighten these kinds of inequalities if you make further assumptions: if you assume the third moment is finite you can get a 1/k³ decay, if you assume the fourth moment is finite you can get 1/k⁴, and so on; that is not a big deal.

Taking this even further: if you assume that the moment generating function of the random variable X is finite in some neighborhood of the origin, then you can get exponentially decaying concentration bounds. After all, the moment generating function is E[e^(sX)], and if you are saying that this is finite in some neighborhood of the origin, you are saying that some exponential moment exists — which means you can get exponentially decaying concentration bounds for such random variables. Such an inequality does exist, and it is a very important one, called the Chernoff bound. Have you heard of the Chernoff bound? I will just briefly indicate what it is, but I will not spend too much time on it, because we have to move on to limit theorems and so on.
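To see this looseness numerically (my own sketch; the unit-rate exponential is an assumption, chosen because its tail is known exactly), one can compare the exact exponential tail with the Chebyshev bound:

```python
import math

lam = 1.0                         # rate of Exp(lam); mean = 1/lam, variance = 1/lam**2
mu, var = 1.0 / lam, 1.0 / lam**2

for c in [2.0, 5.0, 10.0]:
    exact = math.exp(-lam * (mu + c))   # exact tail: P(X > mu + c) = e^{-lam(mu+c)}
    cheb = var / c**2                   # Chebyshev: P(|X - mu| >= c) <= var/c^2
    print(f"c={c}: exact tail = {exact:.2e}, Chebyshev bound = {cheb:.2e}")
```

The exact tail decays exponentially in c while the bound decays only like 1/c², so the gap widens rapidly.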
So, the Chernoff bound is very useful because it gives you exponential estimates. Assume that M_X(s), which is nothing but E[e^(sX)], is finite for s in some interval (−ε, ε) around the origin. Now suppose you are looking for the probability of X becoming bigger than some α, the same kind of event as before. For s > 0 you can write

    P(X > α) = P(e^(sX) > e^(sα)),

because for s > 0 the map x ↦ e^(sx) is monotonically increasing, so {X > α} is the same event as {e^(sX) > e^(sα)}. Now, e^(sX) is a non-negative random variable, so I can apply Markov's inequality. You see, we keep going back to Markov's inequality; it is a very weak inequality, but it forms the foundation of so many other inequalities. By Markov's inequality,

    P(X > α) ≤ E[e^(sX)] / e^(sα) = M_X(s) e^(−sα).

This is valid for all s > 0 such that M_X(s) is finite; we assumed finiteness for s in (−ε, ε), but that need not be the whole domain of convergence, so let us say it is valid for all s > 0 with s in the domain of convergence D_X, as we denoted it. Well and good. Notice this is already an exponential bound: since s > 0, the right-hand side decays exponentially in α. So this is a much tighter estimate. And you have the freedom to choose s. What kind of s will you choose? You should choose the s that infimizes the whole thing:

    P(X > α) ≤ inf { M_X(s) e^(−sα) : s > 0, s ∈ D_X }.

Sometimes people use the log moment generating function: define Λ_X(s) = log M_X(s), wherever it is defined. Then you can write the bound as

    P(X > α) ≤ e^(−sup_{s>0} [sα − Λ_X(s)]);

the infimum becomes a supremum because of the negative sign in the exponent. That is the Chernoff bound — not Markov's inequality, the Chernoff bound — for s > 0. Similarly, for s < 0 you will get a similar bound for the probability that X < α: if s < 0, then {X < α} is the same event as {e^(sX) > e^(sα)}, and the same argument gives a similar exponential estimate on the other side — the left-side tail.
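To make the optimization over s concrete, here is a small worked sketch (my own example, assuming a standard normal, which is not worked out in the lecture): for X ~ N(0, 1), M_X(s) = e^(s²/2), so sα − Λ_X(s) = sα − s²/2 is maximized at s = α, giving the bound P(X > α) ≤ e^(−α²/2).

```python
import math
from statistics import NormalDist

# Chernoff bound for X ~ N(0,1): Lambda_X(s) = s^2/2, and
# sup_{s>0} [s*alpha - s^2/2] = alpha^2/2, attained at s = alpha.
for alpha in [1.0, 2.0, 3.0]:
    chernoff = math.exp(-alpha**2 / 2)   # e^{-alpha^2/2}
    exact = 1 - NormalDist().cdf(alpha)  # true tail P(X > alpha)
    print(f"alpha={alpha}: exact = {exact:.2e}, Chernoff = {chernoff:.2e}")
```

Unlike the 1/α and 1/k² bounds earlier, this bound captures the correct exponential order of the Gaussian tail.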
So, if you have some random variable — say, for the sake of argument, one with a density — the bound above gives you the probability that X is bigger than α, the right tail, and by taking s < 0 you can likewise get the probability that X is less than α, the left tail. Actually, you do not need the density at all; you just need the moment generating function to exist. This kind of calculation forms the foundation of what is known as the theory of large deviations. If you ever pick up and study large deviations, you will encounter this expression a lot. It has lots of nice properties: Λ_X is a convex function, you can show, and this supremum is called the convex dual of the log moment generating function, and so on. But that is really taking us too far afield; all you need to know is that you can get exponentially decaying bounds on this kind of probability. Any questions?

Note that after you supremize over s, what remains is only a function of α. So the bound will look like

    P(X > α) ≤ e^(−Λ*(α))

for some function Λ*; I am just calling the supremized function Λ*. This is a very typical large-deviations bound. Similarly, you can do the calculation for s < 0 for the left-side tail. Any questions? If you are interested in this kind of stuff, you can first look up Wikipedia or some textbook on the Chernoff bound; large deviations theory basically takes off from here.

Any other questions? Yes — for a continuous random variable, does the mean divide the pdf into two sections of probability one half each? No. That point is called the median: for any random variable, the point with half the probability on one side and half on the other is the median, not the mean.

I think that concludes this small section on concentration inequalities, and it brings us to the final module, the final section of this course: convergence of random variables and limit theorems. It is probably the most technically challenging part of the course, and it requires a lot of effort. We will discuss various notions of convergence of random variables, and we will also see how these notions are related. My experience has been that students often struggle with this; it is a little difficult to digest and get into your head, but once you understand it, it is just fine. I am saying this because we are getting to the end of the semester, and this is the time to really put in some thought and effort to study convergence of random variables. It is a somewhat challenging section, not easy when you are seeing it for the first time.

If you are given a sequence of real numbers x_1, x_2, ..., we know what it means for x_n to converge: if x is the limit, then for any ε > 0, |x_n − x| can be made less than ε for all large enough n.
Now, we are going to talk about convergence of a sequence of random variables. As usual, you have some probability space (Ω, F, P), and let X_1, X_2, ... be a sequence of random variables on this space. If you want to speak about the convergence of a sequence of random variables — well, random variables are functions, so you are essentially talking about convergence of functions. Come to think of it, this is something we already defined when we did the MCT and so on: we talked about pointwise convergence of random variables and almost sure convergence of random variables. Those two I will repeat, and then I will also give you a few other commonly encountered notions of convergence. We will define about four or five different convergence notions in all and talk about how they are related.

Definition 0 (this you already know): the sequence (X_n) is said to converge pointwise, or surely, to X if X_n(ω) converges to X(ω) as n → ∞ for every ω in the sample space. This is exactly pointwise convergence of functions: you pick any ω in the sample space, and for that ω you have a sequence of real numbers X_1(ω), X_2(ω), ... — as soon as ω is realized, the sequence of random variables becomes a sequence of real numbers, and we know what it means for a sequence of real numbers to converge. If this sequence of real numbers X_n(ω) converges to X(ω) for every ω in the sample space, that is called pointwise convergence, or sure convergence.

But this is too strict a notion of convergence for probability, because in probability we are willing to sacrifice sets of probability measure 0; we do not really care about them. So we will slightly weaken the definition and say that X_n converges to X almost surely if this convergence happens except possibly on a set of probability 0. I think we did this too; we called it almost everywhere convergence — μ-almost everywhere, where μ is now this probability measure.

Definition 1 (I called the previous one Definition 0 because in probability you never really use sure convergence; it is too strong a notion to be useful): we say X_n converges to X almost surely, or with probability 1, if X_n(ω) converges to X(ω) on a set of probability 1, i.e.,

    P({ω : X_n(ω) → X(ω)}) = 1.

This says that the convergence happens not necessarily for all ω in the sample space, but on a set of probability 1. There may be some ω left out where the convergence does not happen, but they have probability 0. That is almost sure convergence; sometimes it is also called strong convergence. Now, there is a little bit of a technicality here; one has to be a little careful.
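A concrete illustration of the difference between sure and almost sure convergence (my own example, assuming Ω = [0, 1] with the uniform measure; it is not one given in the lecture): take X_n(ω) = ωⁿ. For every ω in [0, 1) we have X_n(ω) → 0, but at ω = 1 the sequence stays at 1, so X_n → 0 almost surely but not surely — the exceptional set {1} has probability 0.

```python
import numpy as np

rng = np.random.default_rng(2)
omegas = rng.uniform(0.0, 1.0, size=5)   # sample points from Omega = [0, 1]

# X_n(omega) = omega**n -> 0 for every omega in [0, 1); it fails only at
# omega = 1, a set of probability zero under the uniform measure.
for n in [1, 10, 100]:
    print(f"n={n}:", np.round(omegas**n, 6))
print("at omega = 1:", 1.0**100, "(the sequence stays at 1, never reaching 0)")
```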
So, one has to prove that whatever is inside the probability here is indeed an event: you are looking at the set of ω for which X_n(ω) converges to X(ω), and you have to prove that such a subset of Ω is indeed an event; otherwise you cannot speak of its probability. But that can be done. The way you prove it: you may as well consider |X_n − X| and take the limiting random variable to be 0. With 0 as the limiting random variable, you can actually write out this event as countable unions and intersections of F-measurable events — for example,

    {ω : X_n(ω) → 0} = ⋂_{k≥1} ⋃_{N≥1} ⋂_{n≥N} {ω : |X_n(ω)| ≤ 1/k}.

Maybe I should put this in the homework: show that this set is indeed an event. Is this clear? There is also one other technicality we left out, which can also be proven: the limit of a sequence of measurable functions, or random variables, is always a measurable function. I mentioned this earlier; the proof is not all that difficult — again, you write sets of the form {X > α} in terms of countable unions and intersections of such sets. These are technicalities I did not really emphasize too much, but they can be done. So the limit of a sequence of random variables is always a random variable, an F-measurable function, and these kinds of sets are indeed events. Any questions on this?

The way I will do this is to give you all the definitions in one shot; hopefully you will digest them abstractly, and then we will give examples to illustrate the differences between the various notions of convergence. Any questions on this?

Definition 2: the notion of convergence in probability. We say X_n converges to X in probability if for all ε > 0,

    lim_{n→∞} P(|X_n − X| > ε) = 0.

So you look at the absolute value of the difference between X_n and X; you fix some very small ε > 0 and look at the probability that this difference exceeds ε. If this limit is 0 for every ε > 0, then you say X_n converges to X in probability.

Convergence in probability is very different from convergence with probability 1. It may not be totally apparent to you now that they are in fact very different, but notice one thing: in almost sure convergence the limit is inside the probability — you write it as P({ω : lim X_n(ω) = X(ω)}) = 1, with the probability outside and the limit inside — whereas here the limit is outside the probability. They are actually not the same, and that is why it gets a little tricky; you have to digest these notions very carefully and understand what they mean. In fact, the name is a bit of a misnomer, as I will explain. Compare with sure convergence and almost sure convergence.
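Here is a small Monte Carlo sketch of Definition 2 (my own illustration, anticipating the limit theorems to come; the choice X_n = sample mean of n uniforms, with limit X = 1/2, is an assumption): we estimate p_n(ε) = P(|X_n − 1/2| > ε) and watch it shrink with n.

```python
import numpy as np

rng = np.random.default_rng(3)
eps, trials = 0.05, 10_000

# X_n = mean of n Uniform(0,1) samples; the limit X = 1/2 is a constant.
# Estimate p_n(eps) = P(|X_n - 1/2| > eps) by Monte Carlo.
for n in [10, 100, 1000]:
    sample_means = rng.uniform(0.0, 1.0, size=(trials, n)).mean(axis=1)
    p_n = np.mean(np.abs(sample_means - 0.5) > eps)
    print(f"n={n}: p_n(eps) ~ {p_n:.4f}")
```

Note that what converges here is the sequence of numbers p_n(ε), which is exactly the point made next.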
In those two cases, the sequence of random variables itself converges. Here it is a bit of a misnomer to say that X_n converges to X in probability: after all, it is not that the sequence X_n itself converges to X; it is just that some sequence of probabilities converges to 0. This has become standard terminology, but a more accurate description would be that a sequence of probabilities converges: if you call this quantity p_n(ε) = P(|X_n − X| > ε), then it is the sequence p_n(ε) that goes to 0; it is not that X_n goes to X in any pointwise sense. What I am trying to say is that in almost sure convergence, when you pick a very large n, X_n(ω) itself becomes very close to X(ω): the difference between X_n(ω) and X(ω) becomes very small for almost all ω — not all ω, but X_n(ω) itself gets close to X(ω). Here that is not necessarily the case. All this says is that the probability of a deviation by more than ε goes to 0; there may still be sets of probability greater than 0 on which X_n(ω) and X(ω) are not close. Again, this is not so apparent at this point, which is why I am saying these notions are a little subtle. What we will show is that convergence with probability 1, that is, almost sure convergence, implies convergence in probability, but convergence in probability does not imply almost sure convergence. So the former is a strictly stronger notion of convergence than the latter. I will stop here; I will continue with a couple more definitions, and then we will see how these convergence notions are related to each other.
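As a preview of why the implication fails in one direction (a standard textbook example, not one worked in this lecture; all names here are my own): on Ω = [0, 1] with the uniform measure, let X_n be the indicator of a sliding interval of shrinking length. Then P(|X_n| > ε) equals the interval length, which tends to 0, so X_n → 0 in probability; yet for every fixed ω, the intervals keep revisiting ω, so X_n(ω) = 1 infinitely often and the sequence converges for no ω at all.

```python
# "Sliding window" example on Omega = [0,1]: enumerate the intervals
# [j/k, (j+1)/k) for k = 1, 2, 3, ... and let X_n = indicator of the n-th one.
def interval(n):
    """Return (k, j) so that the n-th interval (0-indexed) is [j/k, (j+1)/k)."""
    k, j = 1, n
    while j >= k:          # walk through blocks of sizes 1, 2, 3, ...
        j -= k
        k += 1
    return k, j

omega = 0.37               # any fixed sample point
hits = []
for n in range(60):
    k, j = interval(n)
    hits.append(1 if j / k <= omega < (j + 1) / k else 0)

print(hits)  # 1s keep recurring forever: X_n(omega) does not converge,
             # even though P(|X_n| > eps) = 1/k -> 0 (convergence in probability)
```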