So, now let us quickly discuss the result we are going to get for this. We are going to state it for a particular case: let γ = 0. Notice that we are setting γ to 0 in this statement. When γ = 0 we are just going to call the algorithm EXP4, and when γ > 0 we are going to call it EXP4-IX. The result is stated here for the case γ = 0, but you can work out that a similar bound holds for γ > 0 as well.

So, now let us revisit this. What are we saying? Take γ = 0, set η = √(2 log(M)/(TK)), and assume that the experts are deterministic. What I mean by deterministic here is: yes, each expert is actually giving me a probability vector, but it is going to be the same vector every time I give him a particular context. We are also assuming that these experts are oblivious. What does that mean? It means that, whatever rewards they are going to observe, they are not going to change their distributions accordingly. In other words, you can assume that each expert has already come up with the distribution he is going to put on the arms for each context, and he is not going to change it based on the rewards he has been observing so far. The rewards are irrelevant to him; you just tell him the context and he tells you the distribution.

In that case, the regret we are going to get is √(2TK log M). This looks very similar to what we had gotten for EXP3, except that the log factor is now in M instead of K. The regret bound for EXP3 was √(2TK log K), where K is the number of arms, but that log K has now become log M, where M is the number of experts.

The proof we are going to skip; there is just one lemma you need to prove this, which I am going to state, and even its proof you can work out. It says that for any expert m, we can bound the quantity

Σ_{t=1}^{T} ( X̃_tm − Σ_{m'} Q_{tm'} X̃_{tm'} ).

So what are these terms giving you here? If you look at X̃_tm, it gives the estimated reward of expert m in round t, so the sum over t can be treated as the total estimated reward obtained by expert m. And we are comparing it against the quantity Σ_{m'} Q_{tm'} X̃_{tm'}, where Q_tm is the probability with which I choose expert m in round t, so this sum gives the expected estimated reward I would have gotten averaged over the experts. So the lemma is basically saying how the reward obtained by one expert compares with the mean reward over all the experts, and the theorem uses this fact to bound the regret. If you look into the book for how to get this: using all these facts, the quantity above is bounded by log(M)/η plus a second-moment term whose expectation is at most ηTK/2, and plugging in the chosen η gives the stated upper bound √(2TK log M). We will just skip the details; most of the proof is very similar to what is there for EXP3. The ideas are all similar, except for the fact that now we have to take two levels of randomization into account: one with respect to the expert selection and another with respect to the arm selection.
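To make the two levels of randomization concrete, here is a minimal sketch of EXP4 in the γ = 0 case. The callback names get_advice and get_reward are hypothetical, not from the lecture, and the sketch uses the importance-weighted loss estimate (loss = 1 − reward) that the standard analysis works with; take it as an illustration under those assumptions, not a definitive implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def exp4(T, K, M, get_advice, get_reward, eta=None):
    """Sketch of EXP4 (the gamma = 0 case); rewards assumed in [0, 1].

    get_advice(t)    -> (M, K) row-stochastic matrix E_t of expert advice
    get_reward(t, a) -> reward X_t[a] of the pulled arm (bandit feedback)
    """
    if eta is None:
        # the learning rate from the theorem: balancing log(M)/eta against
        # eta*T*K/2 yields the regret bound sqrt(2*T*K*log(M))
        eta = np.sqrt(2.0 * np.log(M) / (T * K))
    Q = np.full(M, 1.0 / M)          # exponential-weights distribution over experts
    for t in range(T):
        E = get_advice(t)            # first level of randomization: experts
        P = Q @ E                    # induced distribution over arms
        a = rng.choice(K, p=P)       # second level: sample an arm from P_t
        x = get_reward(t, a)
        # importance-weighted loss estimate: hat_Y_ti = 1{a_t = i}*(1 - x)/P_ti
        Y_hat = np.zeros(K)
        Y_hat[a] = (1.0 - x) / P[a]
        Y_tilde = E @ Y_hat          # propagate to experts: tilde_Y_tm
        Q = Q * np.exp(-eta * Y_tilde)
        Q /= Q.sum()                 # renormalize the expert distribution
    return Q
```

Notice how P = Q @ E mixes the two randomizations: Q is the distribution over experts, and each row of E is that expert's distribution over arms.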
So, now let us discuss a couple of special cases. Earlier we started by saying that I am going to take my Φ to be {φ_1, φ_2, ...}, where each φ is a mapping from C to [K], and we make the standard assumption that rewards are in the interval [0, 1], i.e., each X_ti ∈ [0, 1]. Suppose this C is finite, I mean the cardinality of C is finite, and there are K arms. What will be the cardinality of the set Φ of all such maps? It is going to be K^|C|, right? In this case, a particular φ assigns one value in [K] to each context. If I am going to think of that as a probability vector, it looks like a unit vector: 1 on that particular arm and 0 on the others. We got this point? I take a particular map; it assigns a unique arm to each context, but I can think of that assignment as a probability vector.

So let us enumerate all these maps: call them φ_1, φ_2, φ_3, all the way up to φ_{K^|C|}. I have these many maps. Then I am going to treat each of these K^|C| maps as an expert, and define the advice of expert m in round t as the indicator

E_tmi = 1{φ_m(c_t) = i}.

So now the experts are nothing but these functions. If I give them a particular context, what are they going to return me? They are going to return the arm that should be selected according to that φ in that round; but in terms of a probability vector I can write it exactly as above: for the m-th expert, the vector puts value 1 on whatever arm the function assigns to the context c_t and 0 everywhere else. In that way I can treat all the maps as different experts, and we are already in the setup where my E_tmi is defined exactly like this based on the map I am going to use in that round.

So now, even if I define my regret in terms of all possible policies φ, does the regret bound still apply? It should apply, right? It is just like the experts are these maps, and even though they are not giving distributions, I can treat their recommendations as distributions. Now, if I apply the earlier result here, what do I get? I replace M by K^|C|, so the bound becomes √(2TK log(K^|C|)) = √(2TK·|C| log K). But we had gotten this regret bound by another method also, right? What was that method? By applying EXP3 for each of the contexts: if I maintain one EXP3 instance for each context, I would have obtained the same bound. Then what is the big deal about this algorithm? Why is it of any interest?
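To see this reduction concretely, here is a small sketch (the helper name all_maps_advice is mine, purely illustrative) that enumerates all K^|C| maps φ: C → [K] and, given the round's context, builds the advice matrix whose m-th row is the one-hot vector E_tmi = 1{φ_m(c_t) = i}:

```python
import itertools
import numpy as np

def all_maps_advice(K, contexts):
    """Enumerate every map phi: C -> [K] and return a function that, given
    the round's context c_t, builds the (K^|C|) x K one-hot advice matrix."""
    C = list(contexts)
    maps = list(itertools.product(range(K), repeat=len(C)))  # K^|C| maps
    idx = {c: j for j, c in enumerate(C)}

    def advice(c_t):
        E = np.zeros((len(maps), K))
        for m, phi in enumerate(maps):
            E[m, phi[idx[c_t]]] = 1.0  # probability 1 on the arm phi_m assigns to c_t
        return E
    return advice

# e.g. K = 2 arms and contexts {a, b}: 2^2 = 4 maps / experts
advice = all_maps_advice(2, ["a", "b"])
print(advice("a"))   # 4 x 2 matrix with one-hot rows
```

Of course K^|C| blows up very quickly, which is exactly why one usually restricts to a smaller expert class, as discussed next.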
So, when I write this bound, it depends on the cardinality of C. But when I applied the algorithm in this setup, I only cared about the number of experts; I did not care how many possible contexts there are. Whatever context I have, I just give it to an expert, and that expert gives me a distribution on my arms. So this bound only worries about how many experts there are; as long as I have finitely many experts, the bound works. And often, as I said, we will not be working with all the possible maps; we will be working with a restricted set of experts. As I discussed in the beginning of the class, that restricted class could be based on a partition, or on similarity, or you just fix finitely many maps φ_1, φ_2, ..., φ_M. In that case you have only finitely many experts, M, and irrespective of how many contexts you are going to deal with, this bound is valid.

You ask: the number of contexts is finite? Yes, it is finite, but it could be arbitrarily large, which I do not worry about here. When you deal with all possible maps, I am just giving you the worst case. Because, see, when I worked out the approach of maintaining one EXP3 algorithm for each context, I did not care how many policies I would be competing against; it was just maintaining one EXP3 for each context, and that gave me this bound. But here I am deriving the bound based on how many experts I am dealing with. The number of contexts could be the same in both cases: here I have only M experts, but the number of contexts I deal with could be just as large as in that case. So in this case you see that I get log M, but there it could be |C| log K, which could be much larger if the number of contexts is very large.

So, with this we will conclude the discussion on adversarial contextual bandits. We will not go into the lower bound proof, but do you expect this bound to be optimal? Do you expect it to be simply √(TK), or should M come into the picture? M should be there, at least order-wise: √T we know we cannot do better than, and maybe K also I cannot get rid of, because from the non-contextual adversarial setting I already know the lower bound is like √(TK). So for us the new term that has popped up is log M. Is log M the best, or can we do better? I also do not know; maybe.

Yes, in the full-information setting with weighted majority there is a similar log d term, where d is the number of experts. But we do not know that weighted majority is the best; I mean, whatever bound we got there is the best we could get in the full-information case. And K will not be there in that bound, because that is the full-information case, whereas we are dealing with the bandit case here. So K will definitely come into the picture, because we are getting only one K-th of the information compared to the full-information case.
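Just to see the gap numerically, here is a quick back-of-the-envelope comparison of the two guarantees: √(2TK log M) for EXP4 with a restricted class of M experts versus √(2TK·|C| log K) for one EXP3 per context. The numbers are illustrative, not from the lecture:

```python
import numpy as np

T, K = 10_000, 10
card_C = 1_000   # |C|: number of contexts (illustrative)
M = 50           # size of a restricted expert class (illustrative)

exp4_bound = np.sqrt(2 * T * K * np.log(M))                  # depends on M only
per_context_exp3 = np.sqrt(2 * T * K * card_C * np.log(K))   # grows with |C|
print(f"EXP4 with M = {M} experts: {exp4_bound:,.0f}")
print(f"one EXP3 per context:     {per_context_exp3:,.0f}")
```

The second bound grows with √|C| while the first does not, which is the whole point of competing against a restricted expert class.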