So, before we bound the regret of the follow-the-leader (FTL) algorithm, we first show that the regret can be bounded in terms of the losses observed at two consecutive predictions. Let us make that formal as a lemma. Write w_1, w_2, … for the FTL predictions, and let u be one fixed reference point in S. Earlier, the regret involved a minimum over all u; right now I am not looking at all u, but at one particular u, as if we applied that same u throughout all the rounds. We compare the loss ∑_{t=1}^n f_t(u) we would get that way with the loss ∑_{t=1}^n f_t(w_t) your algorithm incurs, and we argue that the difference is upper bounded as

∑_{t=1}^n f_t(w_t) − ∑_{t=1}^n f_t(u) ≤ ∑_{t=1}^n ( f_t(w_t) − f_t(w_{t+1}) ).

Here w_t is the point selected in round t and w_{t+1} is the one selected in the next round, so everything is represented in terms of the difference between f_t evaluated at w_t and at w_{t+1}. This is the major step: after it, depending on what the function f_t is, it will be much easier to show a regret bound. Now, the first term ∑_{t=1}^n f_t(w_t) is the same on both sides, so if we knock it off, showing the lemma is the same as showing

∑_{t=1}^n f_t(w_{t+1}) ≤ ∑_{t=1}^n f_t(u).
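Written out in one place (with S the decision set and w_1, …, w_{n+1} the FTL predictions), the lemma being proved is:

```latex
% Lemma (FTL): with w_t = \arg\min_{w \in S} \sum_{i=1}^{t-1} f_i(w),
% for every fixed u \in S,
\sum_{t=1}^{n} f_t(w_t) \;-\; \sum_{t=1}^{n} f_t(u)
\;\le\;
\sum_{t=1}^{n} \bigl( f_t(w_t) - f_t(w_{t+1}) \bigr),
\qquad\text{equivalently}\qquad
\sum_{t=1}^{n} f_t(w_{t+1}) \;\le\; \sum_{t=1}^{n} f_t(u).
```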
So, I am just rearranging the two sides; saying the first inequality holds is the same as saying the rearranged one holds. We are going to prove it by induction on n. Take n = 1 first: is it true for the first round? Yes, by the way FTL works: as soon as you get f_1, you choose w_2 to minimize f_1, so f_1(w_2) ≤ f_1(u) for any point u. That is exactly how w_2 is chosen. Next, assume the inequality holds for n − 1 rounds; we will show it also holds for the n-th round. The hypothesis is

∑_{t=1}^{n−1} f_t(w_{t+1}) ≤ ∑_{t=1}^{n−1} f_t(u).

Notice that this u is any point in S, but fixed from the beginning; only the w_t vary as per your algorithm. Now, what we will do is add f_n(w_{n+1}) to both sides and absorb it into the left-hand summation, so the range becomes 1 to n instead of 1 to n − 1:

∑_{t=1}^{n} f_t(w_{t+1}) ≤ f_n(w_{n+1}) + ∑_{t=1}^{n−1} f_t(u).

As I said, this u is arbitrary: the hypothesis holds for any u. So I am going to choose u to be w_{n+1}, the specific value chosen by the algorithm for round n + 1. If you do that, you get

∑_{t=1}^{n} f_t(w_{t+1}) ≤ f_n(w_{n+1}) + ∑_{t=1}^{n−1} f_t(w_{n+1}),

which is allowed because the hypothesis is true for any u, in particular for u = w_{n+1}.
But now if you look into the right-hand side, the terms all involve the same point w_{n+1}, so they add up to ∑_{t=1}^{n} f_t(w_{n+1}). Even though w_{n+1} is selected in a specific fashion, it is just a fixed, given quantity here, so I can still combine the sums. And what is this sum, by definition? This w_{n+1} is such that

∑_{t=1}^{n} f_t(w_{n+1}) = min_{u∈S} ∑_{t=1}^{n} f_t(u),

because w_{n+1} was selected precisely by minimizing this sum over all u; that is why the equality holds. So we have shown that ∑_{t=1}^{n} f_t(w_{t+1}) is upper bounded by the minimum over all u. Are we done? Yes: if you take any one particular u, its value has to dominate the minimum; in particular, whatever initial u you started with dominates it. So, recall the structure: I first said the induction hypothesis holds for any u ∈ S; at this step, since it is true for any u, I took the specific u = w_{n+1}; and since what came out is a minimum over all u, the value at any particular u is an upper bound on it. What I have shown is now exactly what we wanted. So, to prove any regret bound, all we need to worry about is this upper bound: if we can bound ∑_{t=1}^n ( f_t(w_t) − f_t(w_{t+1}) ), we have a bound on the regret too. Notice this quantity no longer depends on u, so any bound on it holds for every u, and that is exactly what my regret definition was. All of you followed the steps here? Ok.
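As a quick numerical sanity check of the lemma (my own illustration, not part of the lecture), here is a minimal sketch for scalar quadratic losses f_t(w) = ½(w − v_t)², where the FTL minimizer has a closed form; the variable names and the choice of losses are assumptions of the sketch:

```python
import random

# Check the rearranged lemma for scalar quadratic losses
# f_t(w) = 0.5 * (w - v_t)**2: the shifted FTL sequence satisfies
# sum_t f_t(w_{t+1}) <= sum_t f_t(u) for every fixed u.
random.seed(0)
n = 200
v = [random.uniform(-1.0, 1.0) for _ in range(n)]

def f(t, w):                     # loss of round t+1, 0-indexed
    return 0.5 * (w - v[t]) ** 2

# FTL for this loss: w_{t+1} = mean(v_1..v_t); w_1 is arbitrary (here 0).
w = [0.0]
for t in range(1, n + 1):
    w.append(sum(v[:t]) / t)

lhs = sum(f(t, w[t + 1]) for t in range(n))      # uses the "next" prediction
for u in (-1.0, -0.3, 0.0, 0.5, 1.0):
    rhs = sum(f(t, u) for t in range(n))
    assert lhs <= rhs + 1e-12, (u, lhs, rhs)
print("sum_t f_t(w_{t+1}) <= sum_t f_t(u) held for all tested u")
```

The check exercises exactly the rearranged form of the lemma: the "cheating" sequence w_2, …, w_{n+1} never does worse than any fixed comparator u.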
So, the only trick we used here is that, after adding the extra term, we made sure the right-hand side becomes a minimum, by choosing u in a specific fashion: this w_{n+1} comes from solving the minimization of the sum of all the previous n functions f_t. Fine, good; we have the bound expressed in this form. Now let us compute it for a specific convex function. One possible convex function we have seen earlier is the linear loss f_t(w) = ⟨w, v_t⟩, but I will not jump to that directly; instead we will look into a quadratic loss function. We take

f_t(w) = ½ ‖w − v_t‖².

Do all of you understand what this is? It is half the squared ℓ2 distance to v_t; I have included the factor ½ for convenience. Now let us try to compute what happens when my functions f_t are like this. In each round, what am I going to do? I compute

w_t = argmin_w ∑_{i=1}^{t−1} ½ ‖w − v_i‖².

Can somebody compute this quickly and tell me what w_t is? It should be easy: you can differentiate. (What about w_1? For t = 1 I do not have anything to optimize, so w_1 is chosen arbitrarily, any way you like; we are only worried about the subsequent ones.) From t = 2 onwards the minimizer is just the mean of the v_i seen so far:

w_t = (1/(t−1)) ∑_{i=1}^{t−1} v_i, and correspondingly w_{t+1} = (1/t) ∑_{i=1}^{t} v_i.

So we basically average the first t terms for the (t+1)-th prediction. Now this can be rewritten recursively; you can go and verify the rewriting.
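To see concretely that the mean really is the FTL minimizer here, a small scalar check (the values are illustrative; the same holds coordinate-wise for vectors):

```python
# The mean minimizes F(w) = sum_i 0.5*(w - v_i)^2: perturbing w away
# from mean(v) can only increase F.
v = [0.3, -1.2, 0.7, 2.0, -0.4]

def F(w):
    return sum(0.5 * (w - vi) ** 2 for vi in v)

w_star = sum(v) / len(v)               # closed-form argmin: the mean
for eps in (-0.5, -0.01, 0.01, 0.5):
    assert F(w_star) <= F(w_star + eps)
print("argmin =", w_star)              # approximately 0.28 for these values
```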
This is easy to observe: split the summation, note that the average of the first t − 1 terms is w_t, and the last term comes in with weight 1/t. If you just multiply and divide by t − 1 and split, you end up with

w_{t+1} = (1 − 1/t) w_t + (1/t) v_t.

You can just verify this; it is a manipulation. Now I want to simplify further. Subtract v_t from both sides: on the right, also write v_t = (1 − 1/t) v_t + (1/t) v_t, and the common factor (1 − 1/t) comes out:

w_{t+1} − v_t = (1 − 1/t)(w_t − v_t).

Now go back and plug this relation into what we are really interested in. I am going to take one specific term, f_t(w_t) − f_t(w_{t+1}). By definition, f_t(w_t) is f_t computed at w_t, which is just ½‖w_t − v_t‖². For the other quantity, I have already expressed w_{t+1} − v_t in terms of w_t − v_t, so plugging that relation back in gives f_t(w_{t+1}) = ½ (1 − 1/t)² ‖w_t − v_t‖². Is this correct? I have just substituted the relation and pulled out the common term ‖w_t − v_t‖². Now expand the square, (1 − 1/t)² = 1 − 2/t + 1/t², and knock off the 1 that cancels; what you end up with is

f_t(w_t) − f_t(w_{t+1}) = ½ (2/t − 1/t²) ‖w_t − v_t‖².
(Note the sign: the 1/t² term enters with a minus, and that is what makes the next step work.) Now, what I will do is ignore the −1/t² term: it has a negative sign, so dropping it only increases the expression, and the 2 cancels the ½. I am going to get the upper bound

f_t(w_t) − f_t(w_{t+1}) ≤ (1/t) ‖w_t − v_t‖².

At this point let me make an assumption about the loss vectors. Notice that the environment is choosing the vectors v_t, and I am defining the loss through them. Suppose all the loss vectors are bounded, ‖v_t‖ ≤ L for some number L which I know, just a bound. If you have this, recall that w_{t+1} is just the average of v_1, …, v_t; because of that, is it true that the w_t also satisfy this condition? Yes: an average of vectors of norm at most L has norm at most L, so ‖w_t‖ ≤ L. Do I need to be careful anywhere? The only issue is w_1, which is arbitrary, and it appears only in the t = 1 term; you can always split off the t = 1 part and bound it separately (or simply assume ‖w_1‖ ≤ L as well), so let us focus on the other terms, where you agree this holds. Now apply the triangle inequality: ‖w_t − v_t‖ ≤ ‖w_t‖ + ‖v_t‖ ≤ 2L, so ‖w_t − v_t‖² ≤ 4L², and each per-round term is at most (1/t) · 4L².
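Collecting the algebra of this step in one chain, using w_{t+1} − v_t = (1 − 1/t)(w_t − v_t) and the assumption ‖v_t‖ ≤ L (so ‖w_t‖ ≤ L as well):

```latex
f_t(w_t) - f_t(w_{t+1})
  = \tfrac{1}{2}\|w_t - v_t\|^2
    - \tfrac{1}{2}\Bigl(1 - \tfrac{1}{t}\Bigr)^{2}\|w_t - v_t\|^2
  = \Bigl(\tfrac{1}{t} - \tfrac{1}{2t^{2}}\Bigr)\|w_t - v_t\|^2
  \;\le\; \tfrac{1}{t}\,\|w_t - v_t\|^2
  \;\le\; \tfrac{1}{t}\bigl(\|w_t\| + \|v_t\|\bigr)^{2}
  \;\le\; \frac{4L^{2}}{t}.
```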
Now, what I will do is take the summation on both sides, because that is what I am finally interested in:

∑_{t=1}^n (1/t) · 4L² = 4L² ∑_{t=1}^n 1/t.

What bound can I get on this harmonic sum as t goes from 1 to n? At most 1 + log n. Why is that? Because the integral of 1/x is log x; here we are doing a discrete sum, but I can compare it with the continuous integral and get an upper bound. So this is going to give me 4L²(1 + log n). Notice that this bound is independent of which u I am interested in: it is true for any u. So what I finally ended up showing is that for the FTL algorithm the regret is upper bounded by 4L²(1 + log n), where L is a constant bounding the norms of the loss vectors. Is this regret sublinear? Yes. So we have shown that the follow-the-leader algorithm has a good regret bound, of order log n. Then the next question is: fine, by taking a quadratic function like this you are able to get it, but is it true for any convex function? In particular, why did we not consider the linear function f_t(w) = ⟨w, v_t⟩ and show that the same bound holds there too? We will argue in the next class that if you just apply the follow-the-leader algorithm in this fashion to such linear functions, we may not end up with a bound like this; in fact, we may end up with a linear regret bound. So something more has to be done there. For that, we will see that we have to do a kind of regularization: we should not arbitrarily allow the predictions w_t to change too much from round to round; we should constrain them slightly.
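The contrast between the two cases can be simulated. The following sketch is my own illustration (scalar decision set S = [−1, 1], alternating v_t in the linear case are assumptions of the sketch, not the lecture's construction): FTL on quadratic losses stays within the 4L²(1 + log n) bound, while on linear losses it keeps flipping to the wrong endpoint and its regret grows linearly.

```python
import math
import random

n = 1000

# (a) Quadratic losses f_t(w) = 0.5*(w - v_t)^2 with |v_t| <= 1 (so L = 1).
#     FTL plays the running mean; regret should be O(log n).
random.seed(1)
v = [random.uniform(-1, 1) for _ in range(n)]
w, s, ftl_loss = 0.0, 0.0, 0.0         # w_1 = 0, chosen arbitrarily
for t in range(1, n + 1):
    ftl_loss += 0.5 * (w - v[t - 1]) ** 2
    s += v[t - 1]
    w = s / t                          # w_{t+1} = mean of v_1..v_t
u_star = s / n                         # best fixed u is the overall mean
best = sum(0.5 * (u_star - vi) ** 2 for vi in v)
regret_quad = ftl_loss - best
assert regret_quad <= 4 * (1 + math.log(n))   # 4 L^2 (1 + log n) with L = 1

# (b) Linear losses f_t(w) = w * v_t with v = (-0.5, +1, -1, +1, ...).
#     FTL jumps between the endpoints, always landing on the wrong one.
v_lin = [-0.5] + [1.0 if t % 2 == 0 else -1.0 for t in range(2, n + 1)]
w, s, ftl_loss = 0.0, 0.0, 0.0
for t in range(1, n + 1):
    ftl_loss += w * v_lin[t - 1]
    s += v_lin[t - 1]
    w = -1.0 if s > 0 else 1.0         # argmin over [-1, 1] of w * s
best = -abs(s)                         # min over u in [-1, 1] of u * sum_t v_t
regret_lin = ftl_loss - best
assert regret_lin >= n / 2             # linear in n
print(f"quadratic regret: {regret_quad:.3f}, linear regret: {regret_lin:.1f}")
```

In case (b), FTL pays loss 1 in every round after the first while the best fixed point pays about 0, which is exactly the instability that regularization will fix.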
Then we will show that if you do that, even in this linear case we end up achieving a good bound like this; and further, any convex function, which need not be linear, can be translated to a linear one, so the regularized version of the follow-the-leader algorithm will also give good performance there. So, let us do all this in the next class.