So, let us start. Today we are going to complete our discussion of adversarial multi-armed bandits. We will finish the proof we left last time, and then introduce two variants of the algorithm that give a regret bound not just in expectation but with high probability. Last time we discussed the EXP3 algorithm, and we said its pseudo-regret can be bounded by roughly √(2 n K log K); we still need to show this. Before we do, I am going to rewrite the EXP3 pseudocode, the same algorithm we wrote last time, but in a slightly more compact way, just for our reference. What does EXP3 stand for? Exponential-weight algorithm for Exploration and Exploitation. Its inputs are the horizon n and the number of arms K. Earlier we initialized by setting p_1 to be the uniform distribution; instead we will now initialize the cumulative estimated losses: L̂_{0,i} = 0 for all i. (Recall L̂ is our notation for the cumulative estimated loss, and the subscript 0 stands for the 0th round.) Then, at the start of every round t, we form the distribution p_t with components p_{t,i} ∝ exp(−η_t L̂_{t−1,i}) and draw the arm i_t according to p_t. It is just the same pseudocode we wrote last time, written a bit differently. Let us understand what is happening. I initialize all cumulative losses to be 0 in the beginning, so when I start in round t = 1, every p_{1,i} is equal to 1/K.
So, in a way, in the beginning I am giving equal weight to all the arms; the first round is effectively a uniform distribution. Earlier we made that explicit by declaring p_1 uniform; now the same thing follows from initializing the cumulative losses to 0. After that, at the beginning of every round I update my probabilities as above, pick an action i_t according to this distribution, and observe the loss component l_{t,i_t} of the arm I played. Then I update the loss estimates of all arms, l̂_{t,i} = (l_{t,i} / p_{t,i}) · 1{i_t = i}, and also update the cumulative losses, L̂_{t,i} = L̂_{t−1,i} + l̂_{t,i}. It is the same thing as we did earlier; the only difference is that the updates happen right at the beginning of the round rather than at the end, and you can see both versions are the same. In terms of the analysis, where were we? We wanted to show that the pseudo-regret of EXP3 is bounded by (K/2) Σ_{t=1}^{n} η_t + (log K)/η_n. We showed that if we substitute the fixed learning rate η_t = √(2 log K / (nK)) for all t, which requires knowing n, this becomes the bound √(2 n K log K); and I also said that we get a bound of the same order, 2√(n K log K), if we instead set η_t = √(log K / (tK)) in every round. So this is what we argued, and this is what we now want to show; this was our goal. First we made the claim that the regret can be decomposed using our basic inequalities. Then, for the first term, we wrote everything in terms of moment generating functions and were able to bound each term in steps 2 and 3.
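The compact pseudocode just described can be sketched in Python roughly as follows. This is a minimal illustration under my own naming conventions (`loss_fn` and `eta_fn` are placeholders, not from the lecture), a sketch rather than a definitive implementation:

```python
import math
import random

def exp3(n, K, loss_fn, eta_fn):
    """One run of EXP3 (sketch). loss_fn(t, i) returns the loss in [0, 1]
    of arm i in round t; eta_fn(t) returns the learning rate eta_t."""
    L_hat = [0.0] * K          # cumulative estimated losses, L_hat_{0,i} = 0
    total_loss = 0.0
    for t in range(1, n + 1):
        eta = eta_fn(t)
        # p_{t,i} proportional to exp(-eta_t * L_hat_{t-1,i});
        # subtracting the min is for numerical stability only (p is unchanged).
        m = min(L_hat)
        w = [math.exp(-eta * (L - m)) for L in L_hat]
        W = sum(w)
        p = [wi / W for wi in w]
        # sample i_t ~ p_t and observe only the played arm's loss
        i_t = random.choices(range(K), weights=p)[0]
        loss = loss_fn(t, i_t)
        total_loss += loss
        # importance-weighted estimate: nonzero only for the played arm
        L_hat[i_t] += loss / p[i_t]
    return total_loss
```

With the anytime schedule from the lecture you would pass `eta_fn = lambda t: math.sqrt(math.log(K) / (t * K))`; in round 1 all `L_hat` entries are 0, so `p` is uniform, exactly as argued above.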
I am directly going to write the result here: using steps 2 and 3 we were able to show that the per-round term is bounded by (η_t/2) · l_{t,i_t}² / p_{t,i_t} + Φ_{t−1}(η_t) − Φ_t(η_t). And how did we define the potential Φ_t at any time t? We said Φ_t(η) = (1/η) log( (1/K) Σ_{j=1}^{K} exp(−η L̂_{t,j}) ). One correction here: last time I wrote this definition without the 1/K factor inside the logarithm; that was a mistake, and the correct definition includes the 1/K, as written above. Now let us plug this back in: summing over rounds, the quantity we are bounding is at most Σ_{t=1}^{n} (η_t/2) l_{t,i_t}²/p_{t,i_t} + Σ_{t=1}^{n} (Φ_{t−1}(η_t) − Φ_t(η_t)). So, using steps 2 and 3, this is the bound we ended up with in the last class. Continuing from here: right now this is a bound on a random quantity, but what I am interested in is its expected value. So I am going to take expectations, which means taking the expectation of the first sum as well as the expectation of the second sum. Now let us try to bound the expectation of each of these terms.
So, now, when I look into the expectation of the first quantity, what is random here? Remember that when I take this expectation, I am taking it with respect to two random quantities: the randomization of the player, who picks an arm according to the distribution p_t in each round, and the adversary, who can also randomize his loss vectors. The expectation is with respect to both sources of randomness. So one thing I can do is split this expectation into two parts. By the way, I missed the summation over t earlier: there should be a summation t = 1 to n over these per-round terms, so let me write it as a summation of all these quantities. Is it clear why the summation has to be there? Now, dealing with the summation of the first term, I can always write its expectation as a nested expectation of two parts: the outer one with respect to the random selection of the losses by the adversary, and the inner one with respect to the random choice of the player. Is it clear why I have split the expectation into two parts like this? The inner expectation is the conditional expectation given the losses selected by the adversary: conditioned on those, the learner has observed the losses of the actions he played, has updated his probability distribution p_t accordingly, and then draws i_t from p_t. So the inner expectation is with respect to the probability distribution of the learner, whereas the outer one is with respect to that of the adversary, or the environment, whichever you prefer.
Now, what is this inner expectation? It is the expectation with respect to p_t, because that is the randomness with which the learner plays the action: i_t is a random quantity drawn according to the distribution p_t. So I can write E_{i_t ∼ p_t}[ l_{t,i_t}² / p_{t,i_t} ] = Σ_{j=1}^{K} p_{t,j} · l_{t,j}² / p_{t,j}, because i_t equals j with probability p_{t,j}. If you simplify this, the p_{t,j} cancels, and since each loss lies in [0, 1], the remaining sum Σ_j l_{t,j}² is at most K. Notice also that this bound holds irrespective of which losses the adversary chose, so the outer expectation does not change it, and I can simply bound the expected first term by K in every round. So now we have dealt with this term. Next, let us analyze the second term, Σ_t (Φ_{t−1}(η_t) − Φ_t(η_t)). Let us first expand it and see what it looks like: the first term is Φ_0(η_1) − Φ_1(η_1), the second is Φ_1(η_2) − Φ_2(η_2), then Φ_2(η_3) − Φ_3(η_3), and so on; the last term is Φ_{n−1}(η_n) − Φ_n(η_n). Now what I will do is club together the two terms involving Φ_1, the two involving Φ_2, and keep clubbing like this.
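As a numerical sanity check on the calculation above (my own illustration, not from the lecture): for any distribution p_t and losses in [0, 1], the expectation over i_t ∼ p_t of l_{t,i_t}²/p_{t,i_t} equals Σ_j l_{t,j}², which is at most K:

```python
# Quick check: E_{i_t ~ p_t}[ l_{t,i_t}^2 / p_{t,i_t} ] = sum_j l_{t,j}^2 <= K
# for losses in [0, 1]. Computed as an exact finite expectation.
def expected_first_term(p, losses):
    # expectation over i_t ~ p of losses[i_t]**2 / p[i_t]
    return sum(p[j] * losses[j] ** 2 / p[j] for j in range(len(p)))
```

The p_{t,j} inside the sum cancels exactly, which is the whole point of the step: the importance weighting makes the second-moment term distribution-free.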
If I keep clubbing like this, I can rewrite the summation in a different form: Σ_{t=1}^{n} (Φ_{t−1}(η_t) − Φ_t(η_t)) = Φ_0(η_1) − Φ_n(η_n) + Σ_{t=1}^{n−1} (Φ_t(η_{t+1}) − Φ_t(η_t)). This kind of regrouping is usually called an Abel summation (or Abel transformation). And by the way, with the way we have defined Φ_0: at t = 0 all the cumulative losses are initialized to 0, so every exponential term is 1, the summation equals K, the whole summation divided by K is 1, and log 1 = 0. So Φ_0(η_1) = 0, and what we end up with is −Φ_n(η_n) plus the remaining summation. We are just playing with each of these terms. Next, let us see what Φ_n(η_n) is. At least we know that for the last round n, with the way I have defined η_t, it is simply η_n = √(log K / (nK)). So let us substitute the definition and see what we get: Φ_n(η_n) = −(1/η_n) log K + (1/η_n) log( Σ_i exp(−η_n L̂_{n,i}) ); I have just separated the 1/K out of the logarithm into the first term.
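The regrouping just described, together with the Φ_0 computation, can be summarized in one line, matching the definitions above:

```latex
\sum_{t=1}^{n}\bigl(\Phi_{t-1}(\eta_t)-\Phi_t(\eta_t)\bigr)
  = \Phi_0(\eta_1) - \Phi_n(\eta_n)
    + \sum_{t=1}^{n-1}\bigl(\Phi_t(\eta_{t+1})-\Phi_t(\eta_t)\bigr),
\quad\text{where}\quad
\Phi_t(\eta) = \frac{1}{\eta}\log\!\Bigl(\frac{1}{K}\sum_{i=1}^{K} e^{-\eta \hat L_{t,i}}\Bigr),
\qquad
\Phi_0(\eta) = \frac{1}{\eta}\log\!\Bigl(\frac{1}{K}\cdot K\Bigr) = 0 .
```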
So, what is t affecting? t determines which cumulative losses we look at, and the input variable η tells you what you multiply those losses by. With that definition we have Φ_n(η_n), but what I am interested in is not Φ_n(η_n) itself; it is −Φ_n(η_n). Taking the minus sign, the log K term becomes positive and the other term becomes negative: −Φ_n(η_n) = (log K)/η_n − (1/η_n) log( Σ_i exp(−η_n L̂_{n,i}) ). Now let us focus on the second part. If I remove some terms from the summation inside the logarithm, the summation only gets smaller, because each term is a positive quantity; and with the minus sign in front, removing terms can only make the whole expression larger. So I will retain only a single component, say the one for a fixed comparator arm k: −(1/η_n) log( exp(−η_n L̂_{n,k}) ). It is up to me which component I retain, and in this case I have retained the kth. If I simplify further, the log and the exponential cancel, the η_n's cancel, and what I am left with is L̂_{n,k}. So −Φ_n(η_n) ≤ (log K)/η_n + L̂_{n,k}. But by definition, L̂_{n,k} is the cumulative estimated loss of the kth action up to round n, so I can write it as Σ_{t=1}^{n} l̂_{t,k}; note the index here should be k, not i, because we are looking at the kth component.
So, this should be k: l̂_{t,k} = (l_{t,k}/p_{t,k}) · 1{i_t = k}, and L̂_{n,k} is the cumulative sum of the first n of these, Σ_{t=1}^{n} l̂_{t,k}. Now, what I am interested in is not this quantity but its expected value, because there is an expectation over all of these quantities. If I take the expectation, what I finally end up with is (log K)/η_n plus E[ Σ_{t=1}^{n} l̂_{t,k} ]. As we did before, I can split this expectation into two parts: with respect to the random adversary and with respect to the player's strategy. If you do that, you will see that the inner expectation gives E_{i_t ∼ p_t}[ l̂_{t,k} ] = p_{t,k} · l_{t,k}/p_{t,k} = l_{t,k}, so the estimator is unbiased and the expectation is simply E[ Σ_{t=1}^{n} l_{t,k} ]. So now we are almost there. One point about the signs: when the logarithm cancels the exponential we get −η_n L̂_{n,k} inside, but there is also a minus sign in front of the (1/η_n) log term, so we end up with a positive term +L̂_{n,k}, not a negative one. Putting all of this together, we now have a bound on the first term and a bound on the second, and just this one remaining summation left.
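The unbiasedness step above can be checked numerically. This is my own illustrative snippet (the names are not from the lecture), computing E_{i_t ∼ p_t}[ l̂_{t,k} ] exactly for one fixed round:

```python
# The importance-weighted estimator l_hat_{t,k} = (l_{t,k}/p_{t,k}) * 1{i_t = k}
# is unbiased: E_{i_t ~ p_t}[ l_hat_{t,k} ] = l_{t,k} for every arm k.
def estimator_mean(p, losses, k):
    # exact expectation over i_t ~ p of the estimator for arm k:
    # the estimator is losses[k]/p[k] when i_t == k, and 0 otherwise
    return sum(p[i] * (losses[k] / p[k] if i == k else 0.0)
               for i in range(len(p)))
```

Only the i = k term survives, and p_{t,k} · l_{t,k}/p_{t,k} = l_{t,k}, exactly the cancellation used in the proof.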
So, finally, let us put all the pieces together. What we have bounded is the expectation of the whole decomposition, term by term. The expectation of the first sum is upper bounded by (K/2) Σ_{t=1}^{n} η_t. The next piece is the Abel-summation remainder, Σ_{t=1}^{n−1} E[ Φ_t(η_{t+1}) − Φ_t(η_t) ]. Then we have the bound (log K)/η_n, and finally the term Σ_{t=1}^{n} E[ l̂_{t,k} ], where the inner expectation is with respect to i_t ∼ p_t and there is another expectation outside with respect to the randomness of the adversary; I have just split the expectation into these two parts. Whenever I take an expectation here, I always have to take it with respect to both the randomness of the adversary and the randomness of the learner.
So, that is why the expectation is split into two parts: the inner part with respect to the randomness of the learner, and the outer part with respect to that of the adversary; it is the same splitting as before, because here also the expectation is over both. Is that clear? Now we are almost there. The term Σ_t E[l_{t,k}] is exactly the comparator's cumulative loss, which appears in the definition of the pseudo-regret and knocks off against it, and the terms (K/2) Σ η_t and (log K)/η_n are what we have in the target bound. So, finally, we need to show that the remaining summation, Σ_{t=1}^{n−1} (Φ_t(η_{t+1}) − Φ_t(η_t)), is at most 0. If we can show that, we are done: we get the bound (K/2) Σ_{t=1}^{n} η_t + (log K)/η_n, which is what we wanted to show. Now, why is this summation ≤ 0? Notice that each term is the difference of the same function Φ_t evaluated at two learning rates, η_{t+1} and η_t. How are the η_t's chosen? Either they are all the same quantity, if we know a priori the number of rounds n we are going to play, or they are chosen to be decreasing in t: η_t = √(log K/(tK)) decreases as t increases, so η_{t+1} ≤ η_t. Therefore, if we can show that Φ_t is increasing in η, then each difference is always going to be negative.
So, how are we going to show that? You take the derivative of Φ_t with respect to η and argue that it is greater than or equal to 0. I will leave this as an exercise for you, since we have already done many steps; this is the one step we are not verifying here. To verify it cleanly you can use a quantity called the KL divergence, which we have not yet introduced, but you can look it up in the book. Even if you do not know the KL divergence, that is fine: you should be able to just differentiate in the standard fashion and manipulate the derivative to see that it is always nonnegative, whatever η you choose. Because of that, each difference Φ_t(η_{t+1}) − Φ_t(η_t) is negative, and if that summation is negative then we have the upper bound. So, fine; this completes the proof for EXP3. What we were able to show is that the bound on the pseudo-regret translates to a regret bound of order √(nK log K). Fine. Do you have any doubts regarding this proof so far? We have gone through many steps, but most of them were standard manipulations, even though it may not be clear why we followed them in this particular sequence. These are the kind of standard steps you will encounter whenever you deal with proofs of regret bounds in the adversarial setting. So, fine.
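Since the monotonicity step is left as an exercise, here is a quick numerical sanity check (my own illustration, not part of the lecture) that Φ_t(η) is nondecreasing in η for a fixed cumulative-loss vector:

```python
import math

def phi(eta, L_hat):
    """Phi_t(eta) = (1/eta) * log((1/K) * sum_i exp(-eta * L_hat_i))."""
    K = len(L_hat)
    return math.log(sum(math.exp(-eta * L) for L in L_hat) / K) / eta

# Check monotonicity in eta on a grid, for one fixed cumulative-loss vector.
# As eta -> 0, phi approaches minus the average loss; as eta grows, it
# approaches minus the minimum loss, so it should increase in between.
L_hat = [0.0, 3.5, 1.2, 7.0]
etas = [0.05 * k for k in range(1, 40)]
vals = [phi(e, L_hat) for e in etas]
```

This is only a spot check on one loss vector, not a proof; the derivative argument (or the KL-divergence identity mentioned above) is what establishes it in general.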