So, in the last class we started talking about stochastic bandits and described the setting we will be interested in. We introduced bandit environments: each arm's reward distribution comes from some class of distributions, and that class defines our environment class. Picking one particular distribution and assigning it to each arm defines a bandit instance. After that we started looking into estimation. One natural estimator we considered is simply the average of all the samples, and we were interested in how far this sample average is from the true mean — ultimately our interest is to identify, among the arms we can play, the arm with the highest mean. We discussed both the Markov inequality and the Chebyshev inequality, and then, using the central limit theorem, we argued that for n sufficiently large the probability of the estimator being ε away from the mean is approximately upper bounded by exp(−nε²/(2σ²)). We also had another bound, based on Chebyshev's inequality: σ²/(nε²). The Chebyshev bound decays inversely in n, whereas the CLT bound decays exponentially in n, so it is tighter — but we got it only approximately, assuming n is sufficiently large. Still, it gave us the intuition that the Chebyshev bound may be weak: the deviation probability is possibly falling exponentially in n, not just inversely in n.
So, for that we will now look further into whether we can get bounds that decay exponentially in n and are exact — valid for whatever number of samples n we have, not just approximately for n sufficiently large. For that we are going to introduce one class of random variables called sub-Gaussian random variables. What is a sub-Gaussian random variable? A random variable X is σ-sub-Gaussian if for all λ ∈ ℝ it holds that E[e^{λX}] ≤ exp(λ²σ²/2). The quantity E[e^{λX}] is what we call the moment generating function of X, and if it is upper bounded like this we call X sub-Gaussian with parameter σ. Just to see that there is a random variable satisfying this property, let us take X Gaussian with mean μ and variance σ². If we compute its moment generating function, we get the closed form E[e^{λX}] = exp(μλ + λ²σ²/2). Now suppose this Gaussian distribution has mean μ = 0; then E[e^{λX}] is exactly exp(λ²σ²/2). So the Gaussian random variable with mean 0 and variance σ² is σ-sub-Gaussian — we have at least one random variable which is sub-Gaussian. Since the definition is expressed in terms of the moment generating function, there is an equivalent way to state it using its logarithm.
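To see the Gaussian example concretely, here is a small numerical sketch (my own illustration, not from the lecture): for a zero-mean Gaussian, a Monte Carlo estimate of the moment generating function E[e^{λX}] should land right on the sub-Gaussian bound exp(λ²σ²/2), since for this distribution the bound holds with equality.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.5
# One million draws from a zero-mean Gaussian with variance sigma^2
samples = rng.normal(0.0, sigma, size=1_000_000)

results = {}
for lam in [-1.0, -0.5, 0.5, 1.0]:
    empirical_mgf = np.mean(np.exp(lam * samples))   # Monte Carlo estimate of E[e^{lam X}]
    bound = np.exp(lam**2 * sigma**2 / 2)            # sub-Gaussian MGF bound
    results[lam] = (empirical_mgf, bound)
    print(f"lam={lam:+.1f}  E[e^(lam X)] ~ {empirical_mgf:.3f}  bound = {bound:.3f}")
```

For a σ-sub-Gaussian distribution that is not Gaussian, the empirical value would sit strictly below the bound rather than on it.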
So, the alternative characterization of what we have here: if you take log on both sides, it just says X is σ-sub-Gaussian if log E[e^{λX}] ≤ λ²σ²/2 for all λ ∈ ℝ. And it is not the case that every random variable is sub-Gaussian. Here we have shown that a random variable which is Gaussian with mean 0 and variance σ² is σ-sub-Gaussian, but an arbitrary random variable need not satisfy this condition. For example, take X exponentially distributed with parameter μ for some positive μ, and just compute the moment generating function: E[e^{λX}] = ∫₀^∞ e^{λx} · μe^{−μx} dx, whose antiderivative is (μ/(λ−μ)) e^{(λ−μ)x}, evaluated between 0 and ∞. If I choose λ greater than μ, then e^{(λ−μ)x} blows up when I put in the upper limit, and the whole quantity is infinite. When λ is less than μ, the exponential term goes to 0 at infinity and the integral evaluates to the finite value μ/(μ−λ). But the sub-Gaussian condition has to hold for all λ ∈ ℝ, and since the moment generating function is infinite for λ ≥ μ, for no value of σ will you be able to come up with such a characterization. Because of this, the exponential random variable is not σ-sub-Gaussian for any σ you take. So we will see a bit more of these examples later, and some in the assignments — what kinds of distributions satisfy sub-Gaussianity and what do not.
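This failure can also be seen numerically. The sketch below is my own illustration; the closed-form log-MGF, −log(1 − λ/μ), follows from the integral above. For each λ < μ it computes the smallest σ² compatible with the sub-Gaussian condition at that λ; since this implied σ² blows up as λ approaches μ, no single σ can work for all λ.

```python
import math

mu = 1.0  # rate of the exponential distribution; E[e^{lam X}] = mu/(mu - lam) for lam < mu

def log_mgf(lam):
    """log E[e^{lam X}] for X ~ Exp(mu); finite only for lam < mu."""
    assert lam < mu
    return -math.log(1.0 - lam / mu)

# sigma-sub-Gaussianity would require log_mgf(lam) <= lam^2 * sigma^2 / 2 for ALL lam,
# i.e. sigma^2 >= 2 * log_mgf(lam) / lam^2 for every lam < mu.  That lower bound
# diverges as lam -> mu, so no finite sigma works.
implied_sigma_sq = []
for lam in [0.5, 0.9, 0.99, 0.999]:
    s2 = 2.0 * log_mgf(lam) / lam**2
    implied_sigma_sq.append(s2)
    print(f"lam={lam:>5}  need sigma^2 >= {s2:.2f}")
```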
So, related to sub-Gaussianity, one can also define something called heavy-tailed and light-tailed. This is not important for us, but I will just make a remark — maybe it will be useful to you in other cases: a random variable X is called heavy-tailed if its moment generating function E[e^{λX}] is infinite for every λ > 0; otherwise it is called light-tailed. Basically, we want the log of the moment generating function to be upper bounded by some finite number, and that is why we will be focusing on light-tailed distributions. Fine. So we have shown that at least one random variable — the Gaussian with mean 0 and variance σ² — is σ-sub-Gaussian. What is the support of this Gaussian random variable? It is the entire real line, from −∞ to +∞. So its support is unbounded: a random variable need not be bounded to have the sub-Gaussianity property; even an unbounded random variable can be sub-Gaussian. The only thing required is that it is light-tailed in a suitable sense. How does the Gaussian density look? Something like an inverted bell: most of the probability mass is in the central region, and as you go further out, the probability contained in the tail becomes very small. So our distributions need not have finite support; they can have unbounded support. Now, the first result. If X is σ-sub-Gaussian, then for all ε ≥ 0 its tail probability is upper bounded as P(X ≥ ε) ≤ exp(−ε²/(2σ²)). What are we interested in? The probability of the random variable taking a value beyond ε.
So here, suppose this point is ε; we are interested in the probability that the random variable falls in the part beyond it, and we are saying that is upper bounded as above. Now, why does this bound hold? The simple proof follows from Markov's inequality. Suppose we are interested in P(X ≥ ε). This X need not be a positive random variable — it can take positive and negative values. But, as we already did when we proved Chebyshev's inequality, I will exponentiate both sides: for any λ > 0, the event {X ≥ ε} is the same as {e^{λX} ≥ e^{λε}}, so the inequality still holds and the probabilities are equal. Now, e^{λX} is a positive-valued random variable — whatever X is, e^{λX} is positive — so I can apply Markov's inequality here. What does that give me? P(e^{λX} ≥ e^{λε}) ≤ E[e^{λX}] / e^{λε} — just this, not the square of it. Now I invoke the assumption that X is σ-sub-Gaussian: E[e^{λX}] ≤ exp(λ²σ²/2). Bringing e^{λε} into the exponent, this gives P(X ≥ ε) ≤ exp(λ²σ²/2 − λε), and this is true for any λ > 0. Moreover, the exponent here is a convex function of λ.
What I will now look for is the value of λ that makes this exponent smallest. So, can you optimize over λ and tell me the value of λ that minimizes the exponent? Setting the derivative λσ² − ε to zero gives λ = ε/σ². Now just plug in that optimal value: λ²σ²/2 − λε = ε²/(2σ²) − ε²/σ² = −ε²/(2σ²), so P(X ≥ ε) ≤ exp(−ε²/(2σ²)), which is exactly what we wanted. Where did we use that λ is positive? The optimizer λ = ε/σ² is indeed positive for ε > 0. This was for X ≥ ε; you can also get a similar bound for the lower tail. Instead of asking for X greater than or equal to ε, ask for X less than or equal to −ε; the same argument gives P(X ≤ −ε) ≤ exp(−ε²/(2σ²)). And now if you want the two-sided version — the probability that |X| is greater than or equal to ε — can I write it like this? P(|X| ≥ ε) = P(X ≥ ε) + P(X ≤ −ε). So if you just plug in the two bounds, we get P(|X| ≥ ε) ≤ 2 exp(−ε²/(2σ²)). Now, in this bound I am going to choose a specific value of ε: let me choose ε = √(2σ² log(1/δ)). I am now interested in knowing the probability that X is larger than this ε.
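As a sanity check on this Chernoff-style argument, a simulation with Gaussian samples confirms both the one-sided and the two-sided bound. This is my own sketch, not from the lecture; σ = 1 so the samples are 1-sub-Gaussian.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma = 1.0
x = rng.normal(0.0, sigma, size=1_000_000)  # 1-sub-Gaussian samples

checks = []
for eps in [0.5, 1.0, 2.0]:
    one_sided = np.mean(x >= eps)             # empirical P(X >= eps)
    two_sided = np.mean(np.abs(x) >= eps)     # empirical P(|X| >= eps)
    bound = np.exp(-eps**2 / (2 * sigma**2))  # exp(-eps^2 / (2 sigma^2))
    checks.append((one_sided, two_sided, bound))
    print(f"eps={eps}: P(X>=eps)~{one_sided:.4f} <= {bound:.4f}, "
          f"P(|X|>=eps)~{two_sided:.4f} <= {2 * bound:.4f}")
```

The bound is loose for small ε (it can even exceed 1), but it is the exponential decay in ε² that we care about.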
So, if you plug in that value, what is the probability that X ≥ √(2σ² log(1/δ))? Just substitute this ε into the bound: exp(−2σ² log(1/δ) / (2σ²)) = exp(−log(1/δ)) = exp(log δ) = δ. So what it says is: if you want X to be larger than this quantity, the probability of that is not more than δ. By the symmetric argument, the probability that X goes below −√(2σ² log(1/δ)) is also at most δ. So, for any given δ, can I say what the probability is that X lies in the interval (−√(2σ² log(1/δ)), √(2σ² log(1/δ)))? That is just the complement: with probability at most δ it can be above the interval, with another probability at most δ it can be below it, and if you remove both, X lies in this interval with probability at least 1 − 2δ. Suppose your δ is very small — that means you want X to be in this interval with very high probability. If δ becomes very small, what do you expect the two endpoints to be like?
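The resulting high-probability interval can be checked by simulation too; a small sketch of mine, with the arbitrary choices σ = 1 and δ = 0.05:

```python
import numpy as np

rng = np.random.default_rng(3)
sigma, delta = 1.0, 0.05
eps = np.sqrt(2 * sigma**2 * np.log(1.0 / delta))  # half-width of the interval

x = rng.normal(0.0, sigma, size=500_000)
coverage = np.mean(np.abs(x) < eps)                # empirical P(X in (-eps, eps))
print(f"eps = {eps:.3f}, empirical coverage = {coverage:.4f}, guarantee >= {1 - 2 * delta}")
```

The empirical coverage comes out well above the guaranteed 1 − 2δ = 0.9; the bound is conservative, which is the price of it holding for every σ-sub-Gaussian distribution, not just the Gaussian.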
If δ is small, then log(1/δ) is large, and because of that the two endpoints move away from each other. Fine. Now, this was for a single random variable; what I was finally interested in is my estimator: when I have n samples and average them, how far will that average be from the true mean? That is what interests me. So finally I am asking: if my random variables are sub-Gaussian, what does the probability of μ̂ deviating from μ look like, where μ̂ is the average of n i.i.d. samples? Before we conclude what bound we get using the tail result, we need some properties of sub-Gaussian random variables, which I will state now as a lemma. Suppose X is σ-sub-Gaussian, and X₁ and X₂ are σ₁-sub-Gaussian and σ₂-sub-Gaussian respectively and independent. Then the following properties hold. First, if X is σ-sub-Gaussian, it must be the case that its mean is 0, and its variance is at most σ². Second, if you multiply the random variable by some constant c — it could be positive or negative — then cX is |c|σ-sub-Gaussian, for all c ∈ ℝ. So multiplying a sub-Gaussian random variable by a constant gives a new sub-Gaussian random variable whose parameter is scaled by |c|. Third, the sum X₁ + X₂: we are saying that if X₁ is σ₁-sub-Gaussian and X₂ is σ₂-sub-Gaussian, then X₁ + X₂ is still sub-Gaussian, but with parameter √(σ₁² + σ₂²).
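Properties (2) and (3) can be sanity-checked directly through the MGF definition. A sketch of mine using independent zero-mean Gaussians, for which the sub-Gaussian bound is tight, so the empirical MGF should match the stated bound closely:

```python
import numpy as np

rng = np.random.default_rng(4)
s1, s2, c = 1.0, 2.0, -3.0         # arbitrary parameters for the illustration
n = 500_000
x1 = rng.normal(0.0, s1, size=n)   # sigma1-sub-Gaussian
x2 = rng.normal(0.0, s2, size=n)   # sigma2-sub-Gaussian, independent of x1

lam = 0.5
# Property (2): c * x1 should be |c|*sigma1-sub-Gaussian.
mgf_scaled = np.mean(np.exp(lam * c * x1))
bound_scaled = np.exp(lam**2 * (abs(c) * s1)**2 / 2)
# Property (3): x1 + x2 should be sqrt(s1^2 + s2^2)-sub-Gaussian.
mgf_sum = np.mean(np.exp(lam * (x1 + x2)))
bound_sum = np.exp(lam**2 * (s1**2 + s2**2) / 2)
print(f"scaled: {mgf_scaled:.3f} <= {bound_scaled:.3f}   sum: {mgf_sum:.3f} <= {bound_sum:.3f}")
```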
Now, I am going to leave the proofs to you — they are just applications of the definition; work them out yourself. How can we get what we want using the tail bound and this lemma? First of all, notice that μ̂ − μ is nothing but (1/n) Σᵢ₌₁ⁿ (Xᵢ − μ). What we will show is the following: assume the Xᵢ − μ are independent σ-sub-Gaussian random variables; then for all ε ≥ 0, our claim is that P(μ̂ ≥ μ + ε) ≤ exp(−nε²/(2σ²)), and likewise P(μ̂ ≤ μ − ε) ≤ exp(−nε²/(2σ²)). Now, why is this true? Look at the summation, taking the 1/n inside: μ̂ − μ = Σᵢ₌₁ⁿ (Xᵢ − μ)/n. Each Xᵢ − μ is σ-sub-Gaussian, and we have an addition of n such random variables. If Xᵢ − μ is σ-sub-Gaussian, what is (Xᵢ − μ)/n — is it sub-Gaussian, and with what parameter? I know that if a sub-Gaussian random variable is multiplied by a constant c, it is still sub-Gaussian with parameter |c|σ. So (Xᵢ − μ)/n is (σ/n)-sub-Gaussian. And now we are taking a sum of n random variables, each of them (σ/n)-sub-Gaussian.
Now, what will the entire sum be? It is sub-Gaussian — with what parameter? We have to apply the third property of the lemma: Σᵢ₌₁ⁿ (Xᵢ − μ)/n is sub-Gaussian with parameter √(n · (σ/n)²) — each term contributes (σ/n)², added n times — which gives σ/√n. So this whole sum is (σ/√n)-sub-Gaussian. Now treat it as a single random variable and ask for the probability that this random variable is greater than ε: just apply the tail bound, replacing the sub-Gaussianity parameter by that of this random variable, which we have just shown is σ/√n. So to get a bound on this, all you need to do is replace σ by σ/√n: P(μ̂ − μ ≥ ε) ≤ exp(−ε²/(2(σ/√n)²)) = exp(−nε²/(2σ²)). You get an extra factor n in the numerator of the exponent, and that is exactly what we wanted. So notice that the bound we finally got decays exponentially in n: if you increase n, the bound falls exponentially. And this bound is not only for large n — it holds true for any n. When we applied the central limit theorem we also got an exponential decay, but that held only for large n.
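The whole chain — center each sample, scale by 1/n, sum, then apply the tail bound with parameter σ/√n — can be checked by simulation. This is my own illustration with Gaussian rewards; μ, σ, and ε are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(5)
mu, sigma, eps = 2.0, 1.0, 0.5
n_trials = 20_000                    # independent repetitions of the experiment

results = {}
for n in [10, 50, 200]:
    # Each row holds n i.i.d. samples; each mu_hat is one sample average
    mu_hat = rng.normal(mu, sigma, size=(n_trials, n)).mean(axis=1)
    empirical = np.mean(mu_hat >= mu + eps)        # empirical P(mu_hat >= mu + eps)
    bound = np.exp(-n * eps**2 / (2 * sigma**2))   # exp(-n eps^2 / (2 sigma^2))
    results[n] = (empirical, bound)
    print(f"n={n:4d}  empirical ~ {empirical:.5f}  bound = {bound:.5f}")
```

Both columns shrink as n grows, and the empirical deviation probability always stays below the exponential bound.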
But here, when I use the sub-Gaussianity property, I get a bound which holds true for any n — I did not make any large-n assumption. Did I need any other assumption? Yes: the random variables need to be independent; the summation property of the lemma requires independence. Just before we close, let me take one more minute. When we applied Chebyshev's inequality, what bound did we get? We got P(μ̂ ≥ μ + ε) ≤ σ²/(nε²) — that is the Chebyshev bound I am talking about. Whereas when I exploit sub-Gaussianity, I get exp(−nε²/(2σ²)). Of course this is tighter: this value decays exponentially in n, whereas the Chebyshev bound decays only inversely in n. Actually, we can make the comparison precise. We know that e⁻ˣ ≤ 1/(e·x) for all x > 0, since the maximum of x·e⁻ˣ over x > 0 is 1/e. If you apply this property, treating nε²/(2σ²) as x, you get exp(−nε²/(2σ²)) ≤ (1/e) · 2σ²/(nε²). And is the ratio 2/e less than 1 or greater than 1? It is less than 1, because e ≈ 2.7. So finally our bound is at most (2/e) · σ²/(nε²), which is of course tighter than the Chebyshev bound σ²/(nε²).
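This closing comparison can be made concrete; a short sketch of mine evaluating both bounds, together with the relaxation via e⁻ˣ ≤ 1/(e·x):

```python
import math

sigma, eps = 1.0, 0.5                # arbitrary illustration parameters
rows = []
for n in [10, 100, 1000]:
    chebyshev = sigma**2 / (n * eps**2)                 # Chebyshev bound
    subgauss = math.exp(-n * eps**2 / (2 * sigma**2))   # sub-Gaussian bound
    relaxed = 2 * sigma**2 / (math.e * n * eps**2)      # (2/e)*Chebyshev, via e^-x <= 1/(e x)
    rows.append((chebyshev, subgauss, relaxed))
    print(f"n={n:5d}  Chebyshev={chebyshev:.2e}  sub-Gaussian={subgauss:.2e}  "
          f"(2/e)*Chebyshev={relaxed:.2e}")
```

Even the relaxed form sits below the Chebyshev bound for every n, and the exact exponential bound falls far below both as n grows.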