So, today we are going to continue with the problem of pure exploration. I will try to wrap it up today: we will discuss the lower bound and see how this lower bound compares with the upper bound we obtained for KL-LUCB. Recall that we started looking at the best arm identification problem with fixed confidence: given a δ, how can I find the best arm with that much confidence using as few samples as possible. Of the two algorithms we have seen, KL-LUCB at least was intuitive: it tries to identify the best m arms, and most of the time it is trying to resolve the ambiguity between the m-th and (m+1)-th arms; once it is sufficiently confident that the m-th and (m+1)-th arms are separated, it stops. Now the question is: in general, for any algorithm applied to any instance, how many samples does it need on average before it gives me the correct answer? So we are going to look at a lower bound on the sample complexity.

The bound we are going to see is a bit involved; right now it is not clear why it looks the way it does, but let us write it down and then try to understand it. I am going to state it as a theorem. Suppose you are given an environment ν whose associated means are denoted μ_i(ν), and for that environment ν let the optimal arm be denoted i*(ν). That is, if you give me an environment ν, then i*(ν) is just the optimal arm for that setting. Now I define the set of alternative environments: all environments ν′ whose optimal arm is different from the optimal arm in the original environment. So the environment ν is given to me, i*(ν) is its optimal arm, and I am looking at all other environments ν′ such that the optimal arm in ν′ is different from i*(ν). Is that fine? The lower bound is defined in terms of this set.

Now, assume that the pair (π, τ) is sound — we have already defined what sound means — for a class of environments at confidence level δ. By soundness at confidence level δ we mean that the probability that the policy stops in finite time and outputs a non-optimal arm is less than δ. So take any policy and stopping time pair that is sound. Then, on any environment ν, the expected sample complexity of that algorithm is lower bounded by a problem-dependent constant c*(ν) multiplied by log(1/(4δ)). What is the input here? We are talking about fixed confidence, so the sample complexity depends on the confidence through a log(1/δ)-type term, and there is a problem-dependent constant. That constant is a bit involved: the reciprocal of c*(ν) is a weighted average of Kullback–Leibler divergences between the arms of ν and the arms of an alternative environment.
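To fix notation, here is the statement in symbols (a standard way of writing this bound; P_{k-1} denotes the probability simplex over the k arms and E_alt(ν) the alternative set just described):

\[
\mathbb{E}_{\nu}[\tau] \;\ge\; c^*(\nu)\,\log\frac{1}{4\delta},
\qquad\text{where}\qquad
\frac{1}{c^*(\nu)} \;=\; \sup_{\alpha\in\mathcal{P}_{k-1}}\;\inf_{\nu'\in\mathcal{E}_{\mathrm{alt}}(\nu)}\;\sum_{i=1}^{k}\alpha_i\,\mathrm{D}\big(\nu_i,\nu'_i\big),
\]

and D(ν_i, ν′_i) is the KL divergence between the reward distribution of arm i under ν and under ν′.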
So, what I am doing in the definition of c*(ν) is this: I take an environment ν′ from the set of alternative environments, I look at the Kullback–Leibler divergence between my environment and the alternative environment arm by arm, and I average these divergences with weights α coming from a probability vector. So this quantity depends on all possible weight vectors α and on all possible alternative environments ν′ with respect to the given environment ν. Stated like this it is hard to interpret what the quantity means, so let us at least see why the lower bound makes sense with this definition.

To prove the theorem, we start by looking at the ratio E_ν[τ]/c*(ν), and right away I plug in the definition of 1/c*(ν) written above, which involves a supremum over all possible weight vectors. Now, instead of the supremum, I will look at one particular weight vector: if I do that I get a lower bound. The particular choice I am interested in is the expected fraction of pulls of each arm before stopping — E_ν[T_i(τ)]/E_ν[τ], where T_i(τ) is the number of times arm i is pulled before we stop. These fractions form a probability vector, so this choice belongs to the simplex, and that is why replacing the supremum by it gives a lower bound. Notice that if the supremum is in fact achieved at this particular vector, then the lower bound holds with equality; right now we do not know that, which is why I write it as an inequality, but it gives us a sense of when the lower bound will be tight: it is tight if the proportions with which the policy π pulls the arms before stopping happen to maximize this divergence quantity.

Next, a further simplification: E_ν[τ] does not depend on ν′, so we can pull it out of the infimum. What remains inside is a weighted sum of divergences: the weights depend on the expected number of pulls of each arm, and the divergences are between the pairs of arm distributions. We are next going to show that this weighted sum is lower bounded by log(1/(4δ)). If I can show that, then I am done: the expected sample complexity is at least c*(ν) log(1/(4δ)), which is what we want. To prove it, we go back to the trick we used in the regret lower bound. What was the main result we used to prove the lower bound on cumulative regret? Pinsker? Yes — but which version of Pinsker? The high-probability version we have seen.
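Before moving to that inequality, here is the step just described written out (with α_i = E_ν[T_i(τ)]/E_ν[τ], which is well defined when E_ν[τ] is finite):

\[
\frac{1}{c^*(\nu)}
= \sup_{\alpha\in\mathcal{P}_{k-1}}\inf_{\nu'\in\mathcal{E}_{\mathrm{alt}}(\nu)}\sum_{i=1}^{k}\alpha_i\,\mathrm{D}(\nu_i,\nu'_i)
\;\ge\;
\inf_{\nu'\in\mathcal{E}_{\mathrm{alt}}(\nu)}\sum_{i=1}^{k}\frac{\mathbb{E}_{\nu}[T_i(\tau)]}{\mathbb{E}_{\nu}[\tau]}\,\mathrm{D}(\nu_i,\nu'_i),
\]

so that, multiplying both sides by E_ν[τ],

\[
\frac{\mathbb{E}_{\nu}[\tau]}{c^*(\nu)} \;\ge\; \inf_{\nu'\in\mathcal{E}_{\mathrm{alt}}(\nu)}\sum_{i=1}^{k}\mathbb{E}_{\nu}[T_i(\tau)]\,\mathrm{D}(\nu_i,\nu'_i),
\]

and it remains to show that the right-hand side is at least log(1/(4δ)).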
Right. So if we have two distributions P and Q, the result lower bounds, for any event A, the sum of the mass Q puts on A plus the mass P puts on the complement of A. We exploited that result to control the probability of one environment being confused for another under two different distributions. We will use the same idea here. To do that we need to define an event. Take the event E: the policy stops in finite time and outputs an arm other than i*(ν′). If the pair (π, τ) is sound, what is the probability of this event? It should be less than δ — that is the definition of soundness. Here I have taken the environment ν′, and I am asking for the probability that you stop and output an arm other than the optimal arm of ν′; if the algorithm is sound, that probability is less than δ.

So now let us consider the two environments ν and ν′ and ask about the soundness of the algorithm on each of them. If the policy π is sound, then the probability under ν that it stops and outputs an arm other than i*(ν) is at most δ; and likewise the probability under ν′ that it stops and outputs an arm other than i*(ν′) — note this probability is with respect to the distribution P_{ν′π} induced by ν′ and π — is also at most δ. The soundness definition holds irrespective of which underlying environment in the class we look at: under each environment, the probability that the policy stops in finite time and, when it does, outputs an arm other than the optimal arm must be bounded by δ. So each of the two probabilities is at most δ, and their sum is at most 2δ. And, by definition, the second of these events — stop and output an arm other than i*(ν′) — is exactly the event E.
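Writing this out (with â_τ denoting the arm output at the stopping time), the event and the two soundness bounds are:

\[
E \;=\; \{\tau < \infty \ \text{and}\ \hat a_\tau \neq i^*(\nu')\},
\qquad
\mathbb{P}_{\nu'\pi}(E) \;\le\; \delta,
\qquad
\mathbb{P}_{\nu\pi}\big(\tau < \infty \ \text{and}\ \hat a_\tau \neq i^*(\nu)\big) \;\le\; \delta .
\]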
Now we are going to argue that the event "the policy stops and outputs an arm other than i*(ν)" is, under ν, essentially a superset of E complement; only if it is a superset of E complement can I lower bound its probability by the probability of E complement the way I want to. So let us understand this step. What is E complement? If E is the event that τ < ∞ and the output â_τ is not i*(ν′), write E = A ∩ B with A = {τ < ∞} and B = {â_τ ≠ i*(ν′)}; then by De Morgan, E complement = A complement ∪ B complement = {τ = ∞} ∪ {â_τ = i*(ν′)}.

One more thing: if the policy π is such that the expected sample complexity is already infinite, then the bound holds trivially — there is nothing to prove, since infinity is an upper bound on anything — so we may assume E_ν[τ] < ∞. And if the expected sample complexity is finite, then τ is finite almost surely, so under ν the event {τ = ∞} has probability zero.

Next, observe a couple of things about the way ν′ has been chosen. I took ν′ from the alternative set, because that is the only kind of ν′ I am interested in, so it always holds that the optimal arms under ν and ν′ are different. Therefore, on the event {â_τ = i*(ν′)}, the output is an arm different from i*(ν); that is, {â_τ = i*(ν′)} ⊆ {â_τ ≠ i*(ν)}, and the latter event could of course contain more arms. Putting these together: E complement is contained, up to the probability-zero event {τ = ∞}, in the event that the policy stops and outputs an arm other than i*(ν), whose probability under ν is at most δ by soundness. So is this clear now? Because E complement is contained in that set, its probability is a lower bound on that probability, and we get P_{νπ}(E complement) ≤ δ, which is exactly what we wanted.

Now we are ready to appeal to the high-probability Pinsker inequality. We have an event E; we look at the probability of E under the distribution induced by ν′ and at the probability of its complement under the distribution induced by ν. What does the inequality give for such a sum? It says the sum is at least one half times the exponential of minus the divergence between these two induced
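In symbols, the high-probability Pinsker (Bretagnolle–Huber) inequality applied to the event E and the two induced distributions, combined with the two soundness bounds above, gives:

\[
2\delta \;\ge\; \mathbb{P}_{\nu'\pi}(E) + \mathbb{P}_{\nu\pi}(E^{c})
\;\ge\; \tfrac{1}{2}\exp\!\big(-\mathrm{D}\big(\mathbb{P}_{\nu\pi},\,\mathbb{P}_{\nu'\pi}\big)\big).
\]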
distributions. Notice that these are not just ν and ν′: they are P_{νπ} and P_{ν′π}, the distributions further induced by the way you pull your arms according to the policy π. We already encountered this kind of object when we did the cumulative regret case. Once we have this, what did we do there? We expressed this divergence as the weighted sum of the divergences between the distributions of the individual pairs of arms. Let us get that here too. I am writing this step directly, but it is exactly how we obtained it in the cumulative regret proof: the divergence between the two induced distributions can be decomposed as a sum over arms of the divergence between each pair of arm distributions, weighted by the expected number of pulls of that arm. There we showed it formally, with a bit more notation, for a fixed horizon — the stopping was fixed, after t rounds — whereas here the horizon is the random quantity τ. The same kind of statement can be shown; in fact it turns out we do not need exact equality, only that the divergence we use is bounded above by this weighted sum, which is enough for our purposes. This is given as one of the exercises in the book, so please check it.

So, from this point, after using that relation we have 2δ ≥ (1/2) exp(−Σ_{i=1}^{k} E_ν[T_i(τ)] D(ν_i, ν′_i)): I have just plugged the decomposition in for the divergence term. If you simplify this — take logarithms and rearrange — you get a bound on the weighted sum: Σ_i E_ν[T_i(τ)] D(ν_i, ν′_i) ≥ log(1/(4δ)). (Does any of you have the high-probability Pinsker inequality handy? Please check that the constant I used when applying it is right — yes, it is a half, so we should be okay.)

Finally, I want to conclude the result. We now have a lower bound of log(1/(4δ)) on this weighted sum, and this is true for any ν′ we take from the alternative set: we fixed one ν′ and did this, but the bound must therefore also hold after taking the infimum over ν′. It looks like there is a slight confusion in the way this is expressed in the book: from our derivation we get log(1/(4δ)), but the book seems to write it as 4/δ, so one of you please verify whether that is just a typo. Let us stick to what we derived: combining with the earlier step, we have shown that the expected sample complexity is lower bounded by c*(ν) times log(1/(4δ)). So, for the time being, this is what we have shown: the expected sample complexity is at least c*(ν) log(1/(4δ)).
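Putting the pieces together — this only restates what was derived above — for every ν′ in the alternative set, and hence for the infimum over ν′:

\[
\sum_{i=1}^{k}\mathbb{E}_{\nu}[T_i(\tau)]\,\mathrm{D}(\nu_i,\nu'_i)\;\ge\;\log\frac{1}{4\delta}
\quad\Longrightarrow\quad
\frac{\mathbb{E}_{\nu}[\tau]}{c^*(\nu)}\;\ge\;\inf_{\nu'\in\mathcal{E}_{\mathrm{alt}}(\nu)}\sum_{i=1}^{k}\mathbb{E}_{\nu}[T_i(\tau)]\,\mathrm{D}(\nu_i,\nu'_i)\;\ge\;\log\frac{1}{4\delta},
\]

so E_ν[τ] ≥ c*(ν) log(1/(4δ)).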
If you want the confidence to be very high, that means δ is very small, and the sample complexity is naturally going to be large; if you are happy with lower confidence, δ is larger and the log(1/(4δ)) factor is smaller. And then there is this c*(ν) term, which is a problem-dependent quantity. What do you expect of it? It somehow depends on the means μ: do you expect it to be, say, inversely related to the suboptimality gap? If it is inversely related to the suboptimality gap, then it is natural that when the gaps are small the sample complexity is high — but it is not obvious that this expression contains that. So let us work it out for a simple case.

Let us take my environment from E^2_N(1). What does this notation indicate? It has two arms (the superscript is the number of arms), N indicates Gaussian reward distributions, and the variance equals 1, so each arm is 1-sub-Gaussian in this case. Let me be precise: this is the set of all two-armed environments whose reward distributions are Gaussian with variance 1 — the means are not restricted — and take a ν that has a unique optimal arm. Now let us compute the quantity. Since there are only two arms, the weight vector has just one free variable: I take α ∈ [0, 1], and if one weight is α the other is 1 − α, because together they constitute a probability vector. Now, if the rewards are Gaussian, what is the divergence between two Gaussian distributions with the same variance? It is the squared difference of their means divided by twice the variance, so with variance 1, D(N(μ, 1), N(μ′, 1)) = (μ − μ′)²/2. So the inner term is simply α (μ_1(ν) − μ_1(ν′))²/2 + (1 − α)(μ_2(ν) − μ_2(ν′))²/2.

Now let us try to optimize the inner term: we have to take the infimum over all ν′ from the alternative set, with ν given and fixed. Recall what this ν′ is: it comes from the set in which the optimal arm has to be different from the optimal arm under ν. Under that constraint, if you carry out the minimization — the cheapest way to flip the optimal arm is to move both means of ν′ towards a common point between μ_1(ν) and μ_2(ν) — you end up with (1/2) α(1 − α)(μ_1(ν) − μ_2(ν))². Please check this; I am writing it directly so that we can see how the quantity looks. Now only the supremum over α remains, and α appears only through the factor α(1 − α).
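As a sanity check on the infimum just quoted (a short computation, assuming arm 1 is the optimal arm of ν, so the alternative environments are those in which arm 2 is optimal): the infimum is approached by taking both means of ν′ equal to a common value x, and then

\[
\inf_{\nu'\in\mathcal{E}_{\mathrm{alt}}(\nu)}\Big[\alpha\,\mathrm{D}(\nu_1,\nu'_1)+(1-\alpha)\,\mathrm{D}(\nu_2,\nu'_2)\Big]
=\min_{x}\Big[\alpha\,\tfrac{(\mu_1-x)^2}{2}+(1-\alpha)\,\tfrac{(\mu_2-x)^2}{2}\Big]
=\tfrac{1}{2}\,\alpha(1-\alpha)\,(\mu_1-\mu_2)^2 ,
\]

with the minimum attained at x = α μ_1 + (1 − α) μ_2, i.e. μ′_1 = μ′_2 = x (approached in the limit, which is why it is an infimum).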
When is α(1 − α) maximized over [0, 1]? At α = 1/2, so this is going to be (1/8)(μ_1(ν) − μ_2(ν))². And what is μ_1 − μ_2 for this particular environment? It is the suboptimality gap: we have taken only two arms, this is the mean of the first arm and this the mean of the second, and we take the square of their difference; whichever one is optimal, this difference gives me the suboptimality gap Δ = Δ_2(ν). So, having optimized over all ν′ (after taking the infimum) and then over α, we get 1/c*(ν) = Δ²/8, that is, c*(ν) = 8/Δ². Again, you have to do the optimization making sure that ν′ is such that its optimal arm is different from the optimal arm of ν, and then you get this. If you plug it back into the theorem, you see that the expected sample complexity is of the order (1/Δ²) log(1/(4δ)) — Δ being what we have been calling the suboptimality gap for this environment — so it behaves like (8/Δ²) log(1/(4δ)). So this complex-looking quantity is trying to capture how complex the problem instance is, and that complexity is captured directly through the suboptimality gap: once we simplify it for this specific class of Gaussian distributions with variance 1, that is exactly what we get.

In the book they compute a couple more examples, but I am just going to write one. In the example above we fixed the variance of every arm to be 1; even if you relax that and take k = 2 arms with an arbitrary mean vector and arbitrary variances σ_1², σ_2², you can compute that this quantity turns out to be c*(ν) = 2(σ_1 + σ_2)²/Δ_2². Here in our example σ_1 and σ_2 are both 1, which is why we got 8/Δ²; if you allow any σ_1, σ_2, this is what you get, following the same approach. So again you see that this quantity is inversely proportional to the square of the suboptimality gap. This is how the lower bound on the sample complexity looks.

Now the question is: are the algorithms we had earlier optimal, in the sense that their sample complexity bounds are of the same order as this lower bound? I am just going to write the upper bound we get for KL-LUCB. If I wanted to state it in full generality it would be very complicated, so I will just give you the order-wise flavour of the theorem. Recall that in KL-LUCB we have this term β(t, δ), which came up in the computation of our upper and lower confidence bounds. Suppose we set β(t, δ) in a suitable fashion — it involves a logarithmic term in which k and δ appear; H_ε here is a problem-dependent constant, and k_α and κ_1 are further constants which I am not going to define. If you look at the resulting bound, the dependence I get is through log(k/δ). But in the actual lower bound, how was the dependence on δ? It was log(1/(4δ)): the term inside the logarithm did not depend on how many arms there are, whereas here it does.
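Order-wise, and keeping only the dependence on k, δ, and the gaps that the discussion above highlights (H_ε and the other constants are the unspecified ones from the KL-LUCB theorem, so this is a rough summary rather than the exact statement):

\[
\text{lower bound:}\quad \mathbb{E}_\nu[\tau] \;\ge\; c^*(\nu)\,\log\frac{1}{4\delta},
\qquad
\text{KL-LUCB upper bound:}\quad \mathbb{E}_\nu[\tau] \;=\; O\!\Big(H_\varepsilon \,\log\frac{k}{\delta}\Big).
\]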
In that sense this is not optimal. Even though there are other constants involved, if you look only at the parameters of interest — the number of arms k, the confidence parameter δ, and the problem-specific gaps — and focus on the k and δ dependence, an extra factor of k has popped up inside the logarithm, and in that sense this upper bound does not match the lower bound. I will leave the upper bound here; you can look into the paper for the full description. I just wanted to draw the comparison between the upper bounds of the algorithms we have discussed and our lower bound. The point is that, just looking at this, the algorithm may well perform well empirically, but bound-wise it does not match the lower bound. So you might want to look for a better algorithm, one with a tighter bound that matches the lower bound, and hopefully better empirical performance as well. As I mentioned in class, lately many algorithms have come up along these lines; we do not have time to discuss them all, but if any of you are interested, you could survey the newer algorithms, compare their numerical performance, and see which one has the best theoretical guarantees and which one has the best empirical performance.

The last thing, which I do not want to delve into, is that there are two flavours of best arm identification problems. One, as we said, is fixed confidence; the other is fixed budget. You may ask: suppose I can only explore for a hundred rounds; after those hundred rounds, you tell me the best arm. In that case the goal is to output an arm that is optimal with high probability, so you want to minimize the failure probability. Let me just write that down: the number of rounds T is given, and you have to output an arm at the end, at round T + 1, such that the probability that it is not equal to i* is as small as possible — that is your criterion, so give a policy that minimizes this quantity. In this line of attack, whatever policy you develop, you show with what probability it guarantees this. What do you expect that probability to be when you have T rounds? You use the T rounds, then output an arm, and you tell me with what probability it fails to be the optimal arm; you want to bound that. You may say: whatever arm I give you, the probability that it is not the optimal arm is bounded by some δ, and you want to make this δ as small as possible. For a reasonable algorithm, how do you expect this δ to depend on T — like 1/T? Actually, you should expect it to be exponential, something like e to the power minus T. So, for the fixed confidence setting we got a sample complexity of order log(1/δ) for a given δ, and for fixed budget we are saying the error probability is of order exp(−T) — of course with some problem-dependent constants in the exponent.
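In symbols, the fixed-budget objective just described is (writing â_{T+1} for the arm recommended after the budget of T rounds is spent; this particular notation, and the complexity term H(ν), are mine rather than the lecture's):

\[
\text{minimize over policies}\quad \mathbb{P}_\nu\big(\hat a_{T+1} \neq i^*(\nu)\big),
\qquad
\text{with guarantees typically of the form}\quad
\mathbb{P}_\nu\big(\hat a_{T+1} \neq i^*(\nu)\big) \;\le\; C\,\exp\!\big(-T/H(\nu)\big),
\]

where H(ν) is a problem-dependent complexity term: this exponential-in-T shape is the standard form of fixed-budget guarantees.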
So what you are suggesting is: just equate the two and read off the sample complexity from T. Okay — but it turns out the fixed-budget setting is a bit more nuanced than the fixed-confidence one: the relation does not really translate like that; one has to prove it separately. However, there is recent work showing that it is not necessary to build an altogether different algorithm for the fixed-budget case. The idea is that we can come up with a common algorithm — with a small difference — that uses a common module and can simultaneously perform well in the fixed-budget case as well as the fixed-confidence case. The difference lies in the stopping: at what point the algorithm stops. If you give it T, the fixed-budget case, it explores for T rounds and then outputs the arm; if you instead pass it δ, it makes sure it explores sufficiently and then the arm it outputs is optimal with probability at least 1 − δ. So in a way one can think of a common framework which works in both settings. We will not go into that, but if you are interested, that is also one of the good papers to explore, to see how it performs empirically and what its theoretical guarantees are: it is called the UGap algorithm, from, I think, a NIPS 2012 paper, if you are interested to look into it. So let us stop here.