So, we have this bound. Now, the input parameter to the algorithm is m: the number of plays of each arm before we commit to one of them. How should we select m? Suppose we select m to be small; is that good? If m is small, we are not going to have enough samples, so the estimates may not be good, and if you commit to one arm based on those estimates, it may not be the best one. With only a few samples, an arm that is not actually the best may come out empirically best, and you may end up playing it for the rest of the horizon. So that is bad. Why not then choose m large? If we set m large, we have a large number of samples and a very good estimate, and after that we commit to a good arm. But before that we have wasted a lot of time collecting samples, and while collecting them we are also collecting bad samples. So neither extreme is good, and how to choose m is exactly what this bound captures. The first term is the loss you incur in the initial exploration phase, and the second term comes from the commit phase. If you choose m small, the exploration term is small, but the commit term can be potentially large: with a small number of samples, the probability that you end up committing to the wrong arm can be large.
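It may help to put the algorithm itself down in code. Here is a minimal sketch of explore-then-commit; the Bernoulli arms and the particular values of m and n below are just illustrative assumptions, not anything fixed by the lecture.

```python
import random

def explore_then_commit(arms, m, n):
    """Play each of the k arms m times, then commit to the empirically
    best arm for the remaining n - k*m rounds.  `arms` is a list of
    zero-argument reward samplers."""
    k = len(arms)
    totals = [0.0] * k
    rewards = []
    # Exploration phase: m pulls per arm.
    for i in range(k):
        for _ in range(m):
            r = arms[i]()
            totals[i] += r
            rewards.append(r)
    # Commit phase: play the arm with the highest empirical mean.
    best = max(range(k), key=lambda i: totals[i] / m)
    for _ in range(n - k * m):
        rewards.append(arms[best]())
    return best, sum(rewards)

# Illustrative two-arm Bernoulli instance (means 0.5 and 0.8).
random.seed(0)
arms = [lambda: float(random.random() < 0.5),
        lambda: float(random.random() < 0.8)]
best, total = explore_then_commit(arms, m=50, n=1000)
```

Everything before the commit line is pure exploration cost; everything after it rides on whether `best` really is the best arm, which is exactly the trade-off in the bound.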
So, because of that, the expected regret you incur in the commit phase can be larger. On the other hand, the only way to make the commit term small is to make m large, and then the exploration term takes a hit. You can already see that we have to balance how much of our budget goes into exploration and how much into exploitation. So how are we going to choose m? If I increase m, one term takes a hit; if I decrease m, the other does. So why not, even though m is an integer, treat the bound as a function of m and optimize over it, finding the value that minimizes the upper bound? Can we do that? The exploration term is linear in m, and the commit term is exponential in minus m, so it is decreasing in m; you should verify that the sum is convex in m, and say it achieves its minimum at some particular m. Can we find that m by differentiating and equating to zero? One thing makes this slightly complicated: the exponential in minus m is convex on its own, but its product with the remaining factor involving n minus mk is not obviously a convex function.
So, let us simplify. Instead of that factor, I will take a further upper bound and just replace it with n. Now the only m-dependence in the second term is in the exponential, and the exponential in minus m is convex if you treat m as continuous. So the second term is convex and the first is linear; the whole expression is a convex function of m, so just differentiate it and find the m that minimizes it. Let us also make one more simplification and take k = 2: assume there are only two arms. What does the expression become? With two arms, Δ₁ = 0 by our definition, so in the summation only the second term survives, and the bound is m·Δ₂ + n·Δ₂·exp(−m·Δ₂²/4). Differentiating and equating to zero, the common factor Δ₂ can be got rid of, and solving gives m = (4/Δ₂²)·log(n·Δ₂²/4). Now, m has to be an integer, even though I differentiated treating it as a continuous variable. So whatever this value is, I will take its ceiling: even if it happens to be a fraction, that gives me an integer. But can the quantity inside the ceiling be negative? Look at what is inside the logarithm: n·Δ₂²/4.
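Written out, the optimization just described goes as follows. This takes the two-arm upper bound in the form stated above, with Δ₂ the gap and m treated as continuous:

```latex
R_n \;\le\; m\,\Delta_2 \;+\; n\,\Delta_2\, e^{-m\Delta_2^2/4}
% differentiate in m and set to zero:
\frac{d}{dm}\!\left( m\,\Delta_2 + n\,\Delta_2\, e^{-m\Delta_2^2/4} \right)
  \;=\; \Delta_2 \;-\; \frac{n\,\Delta_2^{3}}{4}\, e^{-m\Delta_2^2/4} \;=\; 0
% hence
e^{-m\Delta_2^2/4} \;=\; \frac{4}{n\,\Delta_2^{2}}
\qquad\Longrightarrow\qquad
m \;=\; \frac{4}{\Delta_2^{2}}\,\log\!\left(\frac{n\,\Delta_2^{2}}{4}\right)
```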
Suppose Δ₂ is so small that n·Δ₂²/4 happens to be less than 1. Then its logarithm is negative, and the ceiling of a negative value is some non-positive integer. But I want m to be at least 1, because m is a number of samples. So I redefine m as max(1, ⌈(4/Δ₂²)·log(n·Δ₂²/4)⌉). That is how to set m for the two-arm case. Now plug this value of m back into the expression for k = 2. I am writing the result directly after simplification, and you can verify it: the regret is at most Δ₂ + (4/Δ₂)·(1 + log(n·Δ₂²/4)) when n·Δ₂²/4 ≥ 1. So see how this regret depends on n: ignoring the terms that do not matter, the bound is logarithmic in n. So if you set m like this, you get a regret bound of this form.
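To see the logarithmic growth numerically, here is a small sketch that evaluates this choice of m and the resulting bound. The form m·Δ₂ + n·Δ₂·exp(−m·Δ₂²/4) is the two-arm upper bound assumed above, and Δ₂ = 0.3 is just an example value.

```python
import math

def optimal_m(n, delta):
    """m = max(1, ceil((4/delta^2) * log(n*delta^2/4))) for two arms."""
    return max(1, math.ceil((4.0 / delta**2) * math.log(n * delta**2 / 4.0)))

def regret_bound(n, delta):
    """Two-arm ETC upper bound: m*delta + n*delta*exp(-m*delta^2/4)."""
    m = optimal_m(n, delta)
    return m * delta + n * delta * math.exp(-m * delta**2 / 4.0)

delta = 0.3
for n in (10**3, 10**4, 10**5):
    # The bound grows roughly like (4/delta) * log(n * delta^2 / 4).
    print(n, optimal_m(n, delta), round(regret_bound(n, delta), 1))
```

Multiplying the horizon by 10 only adds a constant to the bound, which is what "logarithmic in n" means in practice.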
So, now look at how this regret bound depends on n: it is logarithmic in n, which is definitely sublinear. But to set m like this, what do you need to know? Δ₂, the gap between μ₁ and μ₂. The algorithm does not need to know which arm is optimal, whether it is arm 1 or arm 2, but it does need to know the separation between them, Δ₂. If it knows that separation, it knows exactly how much to explore and when to commit. Does that make sense? It should: the algorithm has to estimate the parameters, and if it already knows that the separation is Δ₂, all it needs to ensure is that its estimation error is less than Δ₂. Again, Δ₂ = μ₁ − μ₂, where we are assuming μ₁ is the larger mean. What my algorithm does is estimate μ̂₁ and μ̂₂. Suppose it estimates the mean values accurately enough that each true value is contained within Δ₂ of its estimate; in fact, instead of Δ₂, let us make it Δ₂/2.
So, if I can estimate μ̂₁ and μ̂₂ to within a Δ₂/2 approximation of the true values, then when I compare the two estimates I am not going to make a mistake. Let me rephrase what the algorithm is doing: it estimates μ̂₁, it estimates μ̂₂, and it finds which of the two is bigger. I know that the true values μ₁ and μ₂ differ by Δ₂. If I estimate μ₁ to within Δ₂/2, then with high confidence the true mean lies in an interval of half-width Δ₂/2 around μ̂₁; likewise, the true value of μ₂ lies within Δ₂/2 of μ̂₂ with high confidence. Since the true means are separated by Δ₂, those two intervals cannot overlap: the interval around the better arm's estimate sits entirely above the other one. So when I compare the estimates, their ordering will not change, and the arm with the highest true mean will be declared empirically best. That is why, if I know the true difference between μ₁ and μ₂, I can decide to what accuracy I should estimate each mean: estimate within Δ₂/2 and the comparison is safe.
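This argument is where the exponential term in the bound comes from. Assuming 1-sub-Gaussian rewards (which holds, for instance, for bounded rewards), a misordering of the estimates after m samples of each arm forces at least one estimate to be off by Δ₂/2, and a concentration bound gives:

```latex
\{\hat\mu_2 \ge \hat\mu_1\} \;\subseteq\;
  \{\hat\mu_1 \le \mu_1 - \Delta_2/2\} \,\cup\, \{\hat\mu_2 \ge \mu_2 + \Delta_2/2\}
% and since (\hat\mu_2 - \mu_2) - (\hat\mu_1 - \mu_1) is \sqrt{2/m}-sub-Gaussian,
\mathbb{P}(\hat\mu_2 \ge \hat\mu_1)
  \;=\; \mathbb{P}\big( (\hat\mu_2 - \mu_2) - (\hat\mu_1 - \mu_1) \ge \Delta_2 \big)
  \;\le\; \exp\!\left( -\frac{m\,\Delta_2^{2}}{4} \right)
```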
So that is why, once I know Δ₂, setting m to that many samples makes the error in separating the two arms through their estimates very small: exactly the exponential part of the bound, exp(−m·Δ₂²/4). Telling you by how much the actual values differ already hints at the accuracy to which each arm should be estimated. Set the number of samples m like this, and you get an error bound of that form, which leads to a regret that is logarithmic in n. Good, we have logarithmic regret in n. The bad part is that to set m in this fashion I need to be told Δ₂: even in the two-arm case, I need to know the gap between arm 1 and arm 2. But we may not know it. All we assume is that each arm has some reward distribution; we do not know the means, so we also do not know the gap between them. We are interested in algorithms that do not need this knowledge: even without it, can we get regret bounds that are logarithmic in n? So far the option we have is the explore-then-commit algorithm, which says: yes, if you tell me the gap between the best arm and the suboptimal arm, I will give you sublinear regret, but I need that extra information. If you tell me Δ₂, I know exactly how many rounds to explore, and after that I commit, and that commit identifies the right arm with high probability.
So explore-then-commit needs to be told the gap between μ₁ and μ₂, and I do not want that. What other options do I have? Another possibility is called epsilon-greedy, or maybe just greedy. What is greedy? You sample each arm once (or maybe a certain number of rounds each), and after that, in every round, you play the arm with the highest empirical mean, observe the reward, update the means, and repeat. That is the greedy version: instead of committing to one arm after the initial exploration, you keep picking, in every round, the empirically best arm. Will that be good? If explore-then-commit, with some knowledge of Δ₂, can get logarithmic regret, why not keep being greedy instead of committing? Why would that be a bad idea? So make sure you understand the question: you sample each arm once, and after that you select the arm greedily in every round. Is that good or bad? Take any example you like and see in which cases it is good, and whether there is a case where it is bad.
So, along that line, take the two-arm case again. Say arm 1 is Bernoulli with p₁ = 0.5, and arm 2 is Bernoulli(0.8). Which arm has the highest mean? Arm 2: its mean is 0.8. Suppose you take one sample from each arm. It may happen that you get a 1 from arm 1 and a 0 from arm 2. Then in the second round, which arm will greedy pick? Arm 1, and it may again give you a 1. You take no further sample from arm 2, so its only sample stays 0, while arm 1's empirical average stays greater than 0. So you will always play arm 1 and never get back to arm 2, even though it has the highest mean. If you act greedily, it is entirely possible that you miss the optimal arm, and because of that your regret could be linear. The other option: do not play the empirically best arm every time, since you know you could be missing the other ones. In each round, with some probability ε, go and explore instead; that way you may end up also playing arm 2, getting samples from it, and its empirical average can improve. Then the question is how to set this ε.
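The failure just described can be reproduced in a few lines. This is a sketch; the helper `tricky_arm`, which returns an unlucky 0 on its first pull and 1 afterwards, is an illustrative stand-in for the Bernoulli(0.8) arm happening to draw a 0 in round one.

```python
import random

def greedy(arms, n, epsilon=0.0, seed=0):
    """Pull each arm once, then in every remaining round play the arm
    with the highest empirical mean; with probability epsilon, pull a
    uniformly random arm instead (epsilon = 0 is pure greedy)."""
    rng = random.Random(seed)
    k = len(arms)
    counts = [1] * k
    sums = [arms[i](rng) for i in range(k)]   # one initial sample per arm
    for _ in range(n - k):
        if rng.random() < epsilon:
            i = rng.randrange(k)              # occasional forced exploration
        else:
            i = max(range(k), key=lambda j: sums[j] / counts[j])
        sums[i] += arms[i](rng)
        counts[i] += 1
    return counts

def tricky_arm():
    """An arm whose rewards are high, but whose first sample is 0."""
    state = {"first": True}
    def pull(rng):
        if state["first"]:
            state["first"] = False
            return 0.0                        # the unlucky first sample
        return 1.0
    return pull

# Arm 1 pays 0.5 every round; arm 2 would pay 1.0 after its unlucky
# first 0, but pure greedy never returns to it: counts end up [99, 1].
counts = greedy([lambda rng: 0.5, tricky_arm()], n=100)
# With forced exploration, arm 2 gets revisited and can take over.
counts_eps = greedy([lambda rng: 0.5, tricky_arm()], n=100, epsilon=0.5)
```

Pure greedy locks in on the arm whose single sample looked best, which is exactly the linear-regret trap; the ε > 0 run keeps sampling arm 2 and lets its average recover.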
So, how to set this ε and how this algorithm works, you will do as an assignment question; there you will be asked to discuss how to choose ε and to derive regret bounds like the ones here. Let us stop here.