So, let us just recap what we did in the last class. We introduced the notion of a strongly convex function and derived some of its properties, and then we started talking about how to use different strongly convex functions as regularizers. The two candidates we discussed were the Euclidean regularizer and the entropy regularizer. The Euclidean regularizer is strongly convex in which norm? In the L2 norm; and the entropy one is strongly convex in the L1 norm. We will see how to use these two regularizers. We also discussed that the norm of the gradient being bounded is, in a sense, equivalent to the Lipschitz property of a convex function. Now, continuing from the end of the last class, we had started showing how the difference in the function values looks if I use a strongly convex regularizer. We are always interested in regret bounds, which involve terms of the form f_t(w_t) - f_t(w_{t+1}), and that is exactly what we want to bound. So let us rewrite it and try to complete the proof we discussed last time.

We want to argue that f_t(w_t) - f_t(w_{t+1}) ≤ L_t²/σ. What is L_t here? It is the Lipschitz constant of the function f_t. And what is σ? It is the regularizer parameter: the regularizer is σ-strongly convex. So we are saying that if I use Follow the Regularized Leader with a σ-strongly convex regularizer, and each of my functions f_t is Lipschitz with constant L_t, and the w_t's are computed by Follow the Regularized Leader, then this bound holds. I made a formal statement of this in the last class; here I am just restating what we wanted to show in that claim. Now let us see why it is true.

For all t, define F_t(w) = R(w) + Σ_{i=1}^{t-1} f_i(w). This is the function that Follow the Regularized Leader minimizes in round t; I am just calling it F_t. All the functions f_i are assumed to be convex and Lipschitz, and R(w) is assumed to be strongly convex with parameter σ. Now, R(w) is in particular convex, and all the f_i's are convex, so F_t is a convex function; but further we are saying that R is σ-strongly convex. Does that make the whole function F_t also σ-strongly convex? You can verify that it is indeed true: if I add two convex functions the sum remains convex, but if I add a σ-strongly convex function to another convex function, the sum becomes σ-strongly convex. So this entire F_t is σ-strongly convex.

Now, with this definition of F_t, let us work out what happens. By the rule of Follow the Regularized Leader, w_t is computed as w_t = argmin_w F_t(w). This is how Follow the Regularized Leader works. And since F_t is a σ-strongly convex function, I can write F_t(w_{t+1}) ≥ F_t(w_t) + (σ/2)||w_t - w_{t+1}||². Why is this true? Well, w_t here is the minimizer of F_t(w); that is my definition.
And now I am applying what we know: since F_t is a σ-strongly convex function, it has to satisfy this lower bound. But the strong convexity inequality also had a linear term, right? What was that? It was the gradient of F_t at w_t, in an inner product with w_{t+1} - w_t. But because w_t is the minimizer of F_t, that term gets nullified. We had given this as one of the properties of a σ-strongly convex function. So this inequality holds: here I am fixing w_t, the minimizer, and w_{t+1} is just another point.

If that is the case, I can do the same thing with F_{t+1}. Now w_{t+1} is the minimizer of F_{t+1}, so F_{t+1}(w_t) ≥ F_{t+1}(w_{t+1}) + (σ/2)||w_t - w_{t+1}||². Fine, I have these two inequalities now. Let us add them together; while adding I will also do some simplification: [F_{t+1}(w_t) - F_t(w_t)] - [F_{t+1}(w_{t+1}) - F_t(w_{t+1})] ≥ σ||w_t - w_{t+1}||². Is this correct? On the right-hand side, adding the two (σ/2)-terms, which are identical, gives σ||w_t - w_{t+1}||², and the F-terms I have simply taken to the left-hand side.

Now go back to the definition of the function F_t. F_{t+1} involves the sum up to t, whereas F_t involves the sum up to t-1, and in the first bracket both are computed at the same point w_t. So what is that difference going to be? It is going to be f_t(w_t). And what about the second bracket? It is computed at the point w_{t+1}, and it equals f_t(w_{t+1}). So we get f_t(w_t) - f_t(w_{t+1}) ≥ σ||w_t - w_{t+1}||².

So far I have only used the structure of the functions F_t and the σ-strong convexity of R. Now I also want to bring in the Lipschitz property of my f_t functions. By the Lipschitz property of f_t, how can I bound this difference? It is upper bounded: f_t(w_t) - f_t(w_{t+1}) ≤ L_t ||w_t - w_{t+1}||. Is this correct? I am just applying the definition of Lipschitzness of my f_t function; note that there is no square here, for Lipschitzness it is just the norm. Also note that everything is with respect to some norm that I have not specified; it could be L1, L2, whatever. But if I am saying that R(w) is strongly convex with respect to some norm, then this Lipschitz constant L_t is with respect to the same norm.

So now I have a lower bound on f_t(w_t) - f_t(w_{t+1}), namely σ||w_t - w_{t+1}||², and an upper bound, namely L_t ||w_t - w_{t+1}||. If I compare the two, σ||w_t - w_{t+1}||² ≤ L_t ||w_t - w_{t+1}||, which gives ||w_t - w_{t+1}|| ≤ L_t/σ. Now just plug this relation back into the Lipschitz bound, and that is exactly what we wanted to show: f_t(w_t) - f_t(w_{t+1}) ≤ L_t ||w_t - w_{t+1}|| ≤ L_t²/σ.
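Just to make this lemma concrete, here is a minimal numerical sketch (not something from the lecture) for the special case of linear losses f_t(w) = ⟨g_t, w⟩ with the Euclidean regularizer R(w) = (σ/2)||w||₂², for which the Follow the Regularized Leader minimizer has a closed form. The dimension, horizon, and random g_t's are made-up illustration values; for linear losses the inequality in fact holds with equality.

```python
import numpy as np

# Sketch: FTRL with R(w) = (sigma/2)||w||_2^2 and linear losses f_t(w) = <g_t, w>.
# Here L_t = ||g_t||_2, and the lemma says f_t(w_t) - f_t(w_{t+1}) <= L_t^2 / sigma.
rng = np.random.default_rng(0)
d, N, sigma = 5, 50, 2.0
G = rng.normal(size=(N, d))              # hypothetical loss gradients g_1, ..., g_N

cum = np.zeros(d)                        # running sum of past gradients
for t in range(N):
    w_t = -cum / sigma                   # argmin_w (sigma/2)||w||^2 + sum_{i<t} <g_i, w>
    cum += G[t]
    w_next = -cum / sigma                # w_{t+1}, with f_t now included
    diff = G[t] @ (w_t - w_next)         # f_t(w_t) - f_t(w_{t+1}) for a linear loss
    bound = np.linalg.norm(G[t]) ** 2 / sigma
    assert diff <= bound + 1e-12         # the lemma's bound L_t^2 / sigma
print("lemma holds on every round")
```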
Strictly speaking, the Lipschitz condition gives the bound with a modulus, |f_t(w_t) - f_t(w_{t+1})| ≤ L_t ||w_t - w_{t+1}||, but if you remove the modulus the bound still holds: dropping the absolute value on the left side can only make the quantity smaller, so the upper bound remains valid. Fine.

So, with this lemma, what are we finally able to show? Let me write it as a theorem now; is it clear why this is true, what we did? Earlier we already had a result which says that the regret is upper bounded by R(u) - R(w_1) + Σ_{t=1}^{N} [f_t(w_t) - f_t(w_{t+1})]. We had this result earlier; this is how we bounded the regret of Follow the Regularized Leader with regularizing function R. And what was w_1 here? w_1 is whatever your algorithm plays in round 1, and that is obtained by minimizing just the R function: in the first round the summation is empty, so you are only minimizing the regularizing function. That is why I can simply write R(w_1) as min_w R(w). And for the summation part, we have just demonstrated that each term is at most L_t²/σ.

Let me make one more point here. Suppose all the f_t's are L-Lipschitz, that is, they have the same Lipschitz constant L. In that case the summation simply turns out to be N L²/σ, and that is what we have written. But suppose the L_t's are not the same and are different in each round; then the summation is Σ_{t=1}^{N} L_t²/σ. So either I can assume that the Lipschitz constant is the same L for all the functions, in which case every L_t is replaced by L and the sum is simply N L²/σ; or, instead of assuming all the functions have the same Lipschitz constant, I can take L² to be the average value of the squared Lipschitz constants, L² = (1/N) Σ_{t=1}^{N} L_t², and in this case again I get the same N L²/σ.

So what we are saying is: as long as all your functions f_t are convex with the same Lipschitz constant L, and you use some σ-strongly convex regularizer, then the bound Regret(u) ≤ R(u) - min_w R(w) + N L²/σ holds. Or, let the f_t functions be convex with each f_t being L_t-Lipschitz; the same bound holds provided I interpret L² as the average value of all the L_t². Fine.

So now we have this nice simplified version of the regret bound, provided I use Follow the Regularized Leader with a strongly convex regularizer, a σ-strongly convex function in this case. Now let us go back and work out, if I use the different regularizers which we discussed, the Euclidean one and the entropy one, what bound we are going to get. Ok, let us go back; the first regularizer I am going to take to be the following.
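As a small bookkeeping sketch (again, not from the lecture), here is how one might evaluate the resulting bound Regret(u) ≤ R(u) - min_w R(w) + (1/σ) Σ_t L_t², using the average-of-L_t² interpretation discussed above; the function name and the numbers in the usage line are hypothetical.

```python
import numpy as np

def ftrl_regret_bound(R_u, R_min, L_ts, sigma):
    """Regret(u) <= R(u) - min_w R(w) + N * Lbar^2 / sigma,
    where Lbar^2 is the average of the squared Lipschitz constants L_t^2."""
    L_ts = np.asarray(L_ts, dtype=float)
    N = len(L_ts)
    avg_L_sq = np.mean(L_ts ** 2)        # reduces to L^2 when every L_t equals L
    return R_u - R_min + N * avg_L_sq / sigma

# Hypothetical usage: 100 rounds, all losses 1-Lipschitz, a 2-strongly convex regularizer.
print(ftrl_regret_bound(R_u=3.0, R_min=0.0, L_ts=[1.0] * 100, sigma=2.0))   # 3 + 100/2 = 53
```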
So, let me call it: the first is R(u) = (1/(2η)) ||u||₂², and the second is R(u) = Σ_i u_i log u_i, where u lives in some d-dimensional space. Is the first one a strongly convex function? We discussed this last time; it is strongly convex, but with what σ value? The η enters here: without the η factor it is 1-strongly convex. Again, what was the property that we used to check strong convexity? It was the Hessian-based condition: if xᵀ ∇²R(w) x ≥ σ ||x||² for all x, with respect to some norm, then the function R is σ-strongly convex with respect to that norm. Now, can you check: for the first regularizer the norm is the L2 norm. If you take this function and compute the condition, for what σ does it hold? For 1/η. So this one is (1/η)-strongly convex, and with respect to which norm is that true? The L2 norm.

And what about the entropy one? For this we made some more assumptions about the u_i's. The Euclidean regularizer was defined for all u, but when we defined the entropy regularizer we said that u comes from a set S where all the components u_i are positive and they sum to 1. This is what we called the probability simplex (the name you did not like, so we also called it the probability space); u comes from this space. Is this function again strongly convex, and with respect to which norm? That norm is nothing but the L1 norm. I do not know whether we computed this last time or not, but you can verify that if Σ_i u_i = 1, the entropy regularizer is 1-strongly convex with respect to the L1 norm. One can also show that if instead you only require Σ_i u_i ≤ B for some number B, it is (1/B)-strongly convex; so the earlier case is not exactly "equal to 1", it works for anything less than or equal to 1, or less than or equal to B in general.

So now, what is the regret bound going to look like if I apply these regularizers? If I take my Euclidean regularizer, can you work out what the bound is going to be?
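If you want to see the Hessian-based condition in action, here is a small numerical sketch (the test direction x, the simplex point u, and the value of η are arbitrary illustration choices): it checks xᵀ ∇²R(u) x ≥ σ ||x||² with σ = 1/η in the L2 norm for the Euclidean regularizer, and σ = 1 in the L1 norm for the entropy regularizer on the probability simplex.

```python
import numpy as np

# Sketch of the strong-convexity check x^T (grad^2 R(u)) x >= sigma * ||x||^2
# for the two regularizers from the lecture; all inputs below are illustrative.
rng = np.random.default_rng(1)
d, eta = 6, 0.1
x = rng.normal(size=d)                      # an arbitrary direction

# Euclidean: R(u) = (1/(2*eta)) ||u||_2^2, Hessian = (1/eta) * I, sigma = 1/eta, L2 norm.
quad_euclidean = (1.0 / eta) * (x @ x)
print(quad_euclidean >= (1.0 / eta) * np.linalg.norm(x, 2) ** 2 - 1e-12)

# Entropy: R(u) = sum_i u_i log u_i on the simplex, Hessian = diag(1/u_i), sigma = 1, L1 norm.
u = rng.random(d)
u /= u.sum()                                # a point in the probability simplex
quad_entropy = np.sum(x ** 2 / u)
print(quad_entropy >= np.linalg.norm(x, 1) ** 2 - 1e-12)
```

The entropy check relies on Σ_i u_i = 1; if the components only sum to at most B, the same argument gives (1/B)-strong convexity, matching the remark above.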