Let me define, at any time t, U_a(t) as our upper confidence bound based on Hoeffding's inequality, which we have already obtained: the confidence width is of the form sqrt(2 log t / N_a(t)). Now, how is the set J(t) set? At every time t you have the estimates for all arms, which are simply the empirical averages, and based on them I take the top m arms; the remaining arms form the other set. That is what defines my set J(t), and to resolve the conflict between the edge points I use the confidence terms, upper and lower. The upper confidence bound is actually the mean estimate plus the confidence term, and the lower confidence bound is the mean estimate minus the same confidence term. So this is the input: you can use whatever confidence term you like, plug it in here, and you get this algorithm. But when I call this algorithm KL-LUCB, it uses confidence terms based on the KL divergence, which came from a different family of concentration inequalities, namely large deviation principles. The confidence terms above came from Hoeffding's inequality. But we said that how good the algorithm is depends on how tight these confidence terms are: if the confidence terms are tighter, maybe I can also get a better performing algorithm. So let me call these the UCB confidence terms; the other possibility is the KL-divergence-based terms. And what confidence term did we use in our KL-UCB algorithm? It looks like this, the same thing. So remember, these are the confidence terms.
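As a concrete sketch of these Hoeffding-based confidence terms (the function name and the exact constant in the width are my own illustrative choices, not fixed by the lecture):

```python
import math

def hoeffding_bounds(mean_hat, n_pulls, t):
    """Hoeffding-based lower/upper confidence bounds for one arm.

    mean_hat: empirical mean of the arm, n_pulls: number of times it
    was pulled, t: current round. The exploration width
    sqrt(2 log t / n) is a common choice; the exact constant varies
    across analyses.
    """
    width = math.sqrt(2.0 * math.log(t) / n_pulls)
    return mean_hat - width, mean_hat + width  # (L_a(t), U_a(t))
```

The point is that the upper and lower bounds are symmetric around the empirical mean, with the same width plugged in on both sides.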
These are the indices. In UCB we used only the upper confidence bound to rank all the arms and played the one with the highest index. In KL-UCB we use this KL-based quantity as the index, rank the arms by it, and play the arm with the highest index. Now, when defining this pure exploration algorithm, we actually use both the upper and lower confidence bounds, and we can either use the Hoeffding pair or the KL pair, which come from two different methods. The question is: which one is tighter? This is exactly what Pinsker's inequality tells us. Suppose the KL ones are the tighter ones; which confidence bounds would you like to use? Naturally the second ones, because they are tighter. It can be shown that the upper confidence term we get from the KL method is dominated by the one we could have obtained from Hoeffding, meaning the KL one is tighter, and similarly for the lower confidence bound: the Hoeffding lower bound is dominated by the KL one. So now the algorithm is complete: in the KL-LUCB version we use these KL-based confidence terms. Just replace the U and L functions, defined like this at every time, and use them to compute the U_t and L_t quantities; once you can compute those, you can compute the B_t quantity, and your algorithm is complete. As you might have noticed when you implemented and compared UCB against KL-UCB, KL-UCB performs better, but which indices are you able to compute efficiently?
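A rough sketch of the KL-UCB index computation by binary search, as discussed in class (Bernoulli KL is assumed; the function names and the 1e-7-style tolerance are illustrative choices of mine):

```python
import math

def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mean_hat, n_pulls, t, tol=1e-7):
    """Largest q >= mean_hat with n * KL(mean_hat, q) <= log t,
    found by binary search; tol is the precision discussed in class."""
    budget = math.log(t) / n_pulls
    lo, hi = mean_hat, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if bernoulli_kl(mean_hat, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo
```

Because KL(mean_hat, q) is increasing in q above mean_hat, binary search converges in a few dozen iterations even at 1e-7 precision, which is why the optimized implementation runs in seconds rather than hours.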
The Hoeffding indices are much easier to compute: it is just the empirical estimate plus the confidence term. The KL index, on the other hand, is itself an optimization problem. Were all of you able to compute it fast? It took time, but once you optimized it, say with binary search, did you notice how much the time reduced? If it was taking two hours before, afterwards it took just a few seconds. And how did you set the precision of the binary search? Something like 10^-7, and for that it was converging in a few seconds. You will face the same difficulty here, but you need to resolve it when you implement this algorithm. Okay, fine. This algorithm actually performs well; I am not going to give its bound, its sample complexity, right now, we will discuss it in the next class. Just take it as an algorithm for now. Next we are going to look at another algorithm called lil'UCB, where LIL stands for the law of iterated logarithm. I am just going to describe this algorithm, and again we will not discuss much about its sample complexity, but we have asked you to compare it with the KL-LUCB algorithm, and you will see that each has its own issues. By the way, what are the tuning parameters in the KL-LUCB algorithm? What are the things you need to tune? Epsilon is one thing. There is also the beta sequence, which is another thing to tune. So how did you tune this beta in the KL-LUCB algorithm before? It was something like log t plus a log log t term. And what did you do with the log log term? You just set it to zero.
So that is one tuning parameter you have to set. Had you not set it to zero, you would have set it to some other value: if I remember right, it was log t plus 3 log log t, with some constant multiplying the log log t term. You set that constant to zero, but did any of you try setting it to 1 instead, to see whether it improves things? You did not run that, so you should try it this time and see where you get the best performance. When you submit your report, do mention what value of that constant c you ended up using. So, you have to set this parameter epsilon, and you have to set it without knowing the problem, which is one hurdle in this algorithm: how to set it appropriately. It is a tuning parameter. Another thing: we know that setting that constant to zero worked better here, but it is not necessary that zero is always the better setting; in some cases a positive value may work out better. Anyway, the main tuning parameters are epsilon and this beta. So when you implement it, play with these parameters, see whether you get a better sample complexity, and report that: for this problem setting, this value of epsilon worked best. Okay. Keeping that in mind, let us see what this new algorithm needs to have tuned. What kind of algorithm do you prefer: one with the minimum number of tuning parameters, or one with a lot of them? You want the one with the minimum, right? That means your intervention in the algorithm is minimal, and these are, after all, supposed to be learning, artificially intelligent algorithms.
Your interference should be small, and if there are parameters at all, they should be tuned automatically instead of you interfering manually. Okay, this algorithm takes some inputs. Where is the delta? I said these are all algorithms for a given delta; where did the delta come into the picture here? The delta is absorbed into the beta function. How does delta influence beta? When you implement it, check where this delta is used, and what happens to the sample complexity if you change it. If you increase delta, what should happen? Increasing delta means you can tolerate more errors, so your algorithm should terminate faster. Play with that and see; in your assignment you might have fixed delta to some value, which is fine, but if you want, check for yourself whether anything in the algorithm varies when you change delta. So the algorithm takes a confidence delta and parameters epsilon and lambda, and then initializes. The way I defined the KL-LUCB algorithm, it aims to identify the top m arms, but you can set m = 1, in which case it aims to identify just the best arm. The algorithm I am writing here aims only at the best arm, not the top m arms. What it does is initially sample every arm once, and then it has a stopping criterion; recall that here, earlier, the stopping criterion was whether B_t falls below epsilon or not.
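The B_t stopping check for the top-m setting can be sketched like this, using Hoeffding-style widths purely for illustration (the helper names are mine; the KL-LUCB version would substitute KL-based bounds for the same U and L quantities):

```python
import math

def lucb_should_stop(means, pulls, t, m, eps):
    """LUCB-style stopping check for the top-m problem.

    J(t) = the m arms with the largest empirical means. Stop when the
    highest upper bound outside J(t) is within eps of the lowest lower
    bound inside J(t), i.e. B_t <= eps. Hoeffding widths are used here
    for illustration only.
    """
    width = [math.sqrt(2.0 * math.log(t) / n) for n in pulls]
    order = sorted(range(len(means)), key=lambda a: means[a], reverse=True)
    top, rest = order[:m], order[m:]
    l_t = min(top, key=lambda a: means[a] - width[a])   # weakest top arm
    u_t = max(rest, key=lambda a: means[a] + width[a])  # strongest rest arm
    b_t = (means[u_t] + width[u_t]) - (means[l_t] - width[l_t])
    return b_t <= eps
```

With well-separated arms and many pulls the gap B_t goes negative and the check fires; with few pulls the confidence intervals still overlap and the algorithm keeps sampling.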
Here it checks this condition: once the condition is violated, that is, if for some arm the count n_i happens to become larger than the total number of pulls of all the others, the algorithm stops; otherwise it continues. When it continues, it always plays the arm with the highest UCB index, where the index comes from a new concentration bound called the law of iterated logarithm; I will not go into it, but it is another concentration bound in the same spirit as Hoeffding's inequality and the one behind KL-UCB. They define the UCB index from that bound, and whenever they play an arm they update that arm's count; no other count is updated. So here it should be n_i, and once the number of pulls of a particular arm becomes larger than the sum of the pulls of all the other arms, the algorithm stops and outputs the arm that has been pulled the maximum number of times as the best arm. Intuitively the stopping criterion makes sense: if one student has scored so much that it exceeds the total scored by the rest of the class, maybe that student is the best. That is the rule it uses, and by exploiting the law of iterated logarithm they show that this algorithm stops in finite time and has a good sample complexity; I think the sample complexity result holds for two arms, but you can check how good it is compared to the KL-LUCB algorithm. Okay, so now there are again certain parameters to be tuned here. What are they? There is sigma squared: they assume the distributions are sigma-sub-Gaussian, and that parameter is sigma.
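A minimal sketch of the lil'UCB loop, under simplifying assumptions of mine: the index width follows the sqrt(log log n / n) shape suggested by the law of iterated logarithm, but the exact constants and the delta calibration in the actual paper differ, and the "+ 2" guard inside the inner logarithm is my own addition to keep it positive:

```python
import math

def lil_ucb(pull, k, delta=0.1, eps=0.01, lam=9.0, sigma2=0.25, max_steps=100000):
    """Sketch of the lil'UCB loop for best-arm identification.

    pull(i) returns one reward sample from arm i. Stops when one arm's
    pull count exceeds 1 + lam times the pulls of all the others
    combined, then outputs that arm.
    """
    pulls = [1] * k
    sums = [pull(i) for i in range(k)]            # sample everybody once
    for _ in range(max_steps):
        for i in range(k):                        # stopping criterion
            if pulls[i] >= 1 + lam * (sum(pulls) - pulls[i]):
                return i
        def index(i):
            n = pulls[i]
            width = math.sqrt(2 * sigma2 * (1 + eps)
                              * math.log(math.log((1 + eps) * n + 2) / delta) / n)
            return sums[i] / n + width
        a = max(range(k), key=index)              # play the highest-index arm
        sums[a] += pull(a)
        pulls[a] += 1                             # only this arm's count changes
    return max(range(k), key=lambda i: pulls[i])  # fall back to most-pulled arm
```

With a deterministic two-arm instance where arm 0 always pays 1 and arm 1 always pays 0, the loop keeps playing arm 0 until its count exceeds lam times arm 1's single pull, then stops and outputs arm 0.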
But they also assume that even though the distributions are sub-Gaussian, the means all lie in the interval [0, 1]: the support could be large, but the means are restricted to [0, 1]. So everything is defined for a sub-Gaussian distribution with parameter sigma squared. Okay, now let us focus on this lambda. Suppose I increase lambda: do you think the stopping criterion will be met earlier or later? Later, right? I am requiring the best arm to be pulled much, much more than all the rest put together; if lambda is smaller, the threshold comes down. So again this becomes a tuning parameter: I cannot decide a priori what the best lambda is. You will have to play with this one too. When you claim one algorithm is superior to the other, or the other way around, see which parameters you needed to tune to get there. Suppose one algorithm beats the other; the loser is a poor chap, so try to give it some advantage: tune one of its parameters and see whether it overtakes the winner. It may happen that one beats the other by too large a margin, but if they are close, say one has a sample complexity of 50 and the other 60, try tuning a parameter and see whether the second can come down to 50 with some simple manipulation; that is just for experimental purposes. Okay. Now, the two algorithms I presented are already somewhat old: KL-LUCB came in 2013 and lil'UCB came in 2014, and people have developed many algorithms since.
Okay, so if any of you are interested in studying pure exploration, you can take this as a project topic; earlier it was not listed, or maybe you were not aware of the topic, so if you have not taken a topic yet but want to look into this, you can still take it. I do not recall all the names, but quite a few more algorithms have come since, some with better theoretical guarantees. On the empirical side there are debates: some people say KL-LUCB is hard to beat most of the time, but I remember somebody saying there is a more recent algorithm that sometimes does better; that is open for exploration. If one of you is really interested, you can compare all these algorithms and write a nice report, doing a careful job of comparing them under different environment settings. So okay, let us stop here.