So now the update rule is no longer a plain gradient-based update; we have added a regularization function. What is of interest to us is: if we do this, what regret bound are we going to get? Recall how we proved the regret bound for Follow the Leader: we showed that
$$\sum_{t=1}^{n} \bigl(f_t(w_t) - f_t(u)\bigr) \;\le\; \sum_{t=1}^{n} \bigl(f_t(w_t) - f_t(w_{t+1})\bigr).$$
Now, with the added regularization function, can we have an equivalent version of this? That is what our first lemma says: once we have a regularizer, the only extra terms we pick up on this bound are $R(u) - R(w_1)$, so the bound changes to
$$\sum_{t=1}^{n} \bigl(f_t(w_t) - f_t(u)\bigr) \;\le\; R(u) - R(w_1) + \sum_{t=1}^{n} \bigl(f_t(w_t) - f_t(w_{t+1})\bigr).$$
Does anybody see how to get this, or why it makes sense? When I run Follow the Regularized Leader, my algorithm plays
$$w_t = \arg\min_{w \in S} \Bigl[\sum_{i=1}^{t-1} f_i(w) + R(w)\Bigr],$$
and this $R(w)$ is something I decide; it is not generated by the adversary, it is totally in my control. So I am going to treat $R$ as a loss function $f_0$ that is generated in a 0th round. If I do this, I am basically saying that instead of starting my algorithm from round 1, I start it from round 0, and the same Follow the Leader inequality should hold for the extended sequence of functions $f_0, f_1, f_2, \dots, f_n$, where $f_0 = R$. Now simply expand that inequality and separate out the $t = 0$ term on both sides: on the left you get $f_0(w_0) - f_0(u)$ plus the sum from $t = 1$ to $n$; on the right you get $f_0(w_0) - f_0(w_1)$ plus the remaining terms. The $f_0(w_0)$ terms cancel, and taking $f_0(u)$ to the other side gives exactly the bound above. And what is $f_0$? It is just the $R$ function, so you replace it: the extra terms are $R(u) - R(w_1)$. One more thing: we will assume $R$ is also convex, just like each $f_t$, so that the cumulative objective, being a sum of convex functions, stays convex. So now, with this lemma, what regret bound are we going to get? Let us try to work out the regret bound for a given $u$; this is the regret we are interested in, and we need to compute the right-hand side.
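Since the expand-and-cancel step goes by quickly in lecture, here is the same derivation written out compactly (this is just a restatement of the argument above; $w_0$ denotes the iterate of the fictitious 0th round):

```latex
% FTL bound applied to the extended sequence f_0 = R, f_1, ..., f_n:
\[
\sum_{t=0}^{n} \bigl( f_t(w_t) - f_t(u) \bigr)
  \;\le\; \sum_{t=0}^{n} \bigl( f_t(w_t) - f_t(w_{t+1}) \bigr).
\]
% Separate out the t = 0 terms; f_0(w_0) = R(w_0) appears on both sides and cancels:
\[
- R(u) + \sum_{t=1}^{n} \bigl( f_t(w_t) - f_t(u) \bigr)
  \;\le\; - R(w_1) + \sum_{t=1}^{n} \bigl( f_t(w_t) - f_t(w_{t+1}) \bigr).
\]
% Moving R(u) to the right-hand side gives the lemma:
\[
\sum_{t=1}^{n} \bigl( f_t(w_t) - f_t(u) \bigr)
  \;\le\; R(u) - R(w_1) + \sum_{t=1}^{n} \bigl( f_t(w_t) - f_t(w_{t+1}) \bigr).
\]
```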
Let us compute this upper bound for our case. What is our case? I now take $R(w) = \frac{1}{2\eta}\lVert w \rVert^2$ and $f_t(w) = \langle w, z_t \rangle$. The regret of FoReL against a given $u$ is upper bounded by $R(u) - R(w_1) + \sum_{t=1}^{n} \bigl(f_t(w_t) - f_t(w_{t+1})\bigr)$, so let us substitute each piece. First, $R(u) = \frac{1}{2\eta}\lVert u \rVert^2$. What is $w_1$ going to be? In the regularized-leader rule the sum of losses runs up to $t - 1$, so at $t = 1$ there are no loss terms and we are minimizing only $R(w)$. One more thing I forgot to mention: let us assume $S = \mathbb{R}^d$, the entire Euclidean space. So what minimizes $\frac{1}{2\eta}\lVert w \rVert^2$ over $\mathbb{R}^d$? Zero. So $w_1 = 0$, the $R(w_1)$ term vanishes, and we do not need to care about it. Now let us work out the remaining terms: $f_t(w_t) = \langle w_t, z_t \rangle$ and $f_t(w_{t+1}) = \langle w_{t+1}, z_t \rangle$, so, because $z_t$ is common, $f_t(w_t) - f_t(w_{t+1}) = \langle w_t - w_{t+1}, z_t \rangle$. But what do we know about $w_t - w_{t+1}$? We obtained $w_{t+1}$ from $w_t$ through a gradient step: $w_{t+1} = w_t - \eta z_t$, so $w_t - w_{t+1} = \eta z_t$. Substituting, the inner product between them gives $\langle \eta z_t, z_t \rangle = \eta \lVert z_t \rVert^2$. So finally the regret over $n$ rounds is upper bounded by
$$\frac{1}{2\eta}\lVert u \rVert^2 + \eta \sum_{t=1}^{n} \lVert z_t \rVert^2.$$
Now let us again use the same condition we used earlier: assume the vectors $z_t$ are such that $\lVert z_t \rVert \le L$ for all $t$ (here $z_t$ denotes the gradient in round $t$). With that, what bound are we going to get? We also need to control $\lVert u \rVert$. And what is this $u$? It is our reference point, the comparator. So let us assume the reference points come from a bounded set $U$: instead of considering all points in $\mathbb{R}^d$, I will only compare against points in some ball.
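To see why the gradient-step relation $w_{t+1} = w_t - \eta z_t$ used above is not an extra assumption but follows from the FoReL rule itself, solve the unconstrained argmin (this uses $S = \mathbb{R}^d$):

```latex
\[
w_{t+1}
  = \arg\min_{w \in \mathbb{R}^d}
    \Bigl[ \sum_{i=1}^{t} \langle w, z_i \rangle
           + \tfrac{1}{2\eta} \lVert w \rVert^2 \Bigr].
\]
% Setting the gradient of the objective to zero gives
% \sum_{i=1}^{t} z_i + w/\eta = 0, hence
\[
w_{t+1} = -\eta \sum_{i=1}^{t} z_i
        = \Bigl( -\eta \sum_{i=1}^{t-1} z_i \Bigr) - \eta z_t
        = w_t - \eta z_t .
\]
```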
So one possibility is to take $U = \{x : \lVert x \rVert \le B\}$. What does this set denote? If the dimension is 2, it is a disc of radius $B$; in general dimension, a ball of radius $B$. Because of that, $\lVert u \rVert^2 \le B^2$, and each $\lVert z_t \rVert^2$ is bounded by $L^2$. So finally what we end up with is
$$\text{Regret}_n(u) \;\le\; \frac{B^2}{2\eta} + \eta L^2 n,$$
where the $n$ is coming because we are adding the same bound over $n$ terms. Now, we have seen this kind of bound earlier; what are we going to do? The step size $\eta$ is a parameter of the regularizing function, so it is up to us how to choose it. Can we choose it in some specific fashion? Treat the upper bound as a function of $\eta$. Is this a convex function of $\eta$? The second part is linear in $\eta$ and the first part is proportional to $1/\eta$; differentiating $1/\eta$ twice gives $2/\eta^3$, which is positive for $\eta > 0$, so it is convex. So just differentiate, find the optimal $\eta$, and plug it back: setting $-\frac{B^2}{2\eta^2} + L^2 n = 0$ gives the optimal value
$$\eta = \frac{B}{L\sqrt{2n}}.$$
If you plug this back, both terms equal $\frac{BL\sqrt{2n}}{2}$, so the final bound is
$$\text{Regret}_n(u) \;\le\; B L \sqrt{2n}.$$
Since $B$ and $L$ are constants, when $\eta$ is chosen in this fashion the bound is of order $\sqrt{n}$. So you see that even for linear loss functions, if I use regularization in this fashion, Follow the Regularized Leader gives me a regret bound of order $\sqrt{n}$, that is, sublinear regret. So far we have looked into two types of convex functions, the linear function and the quadratic function. But what about other convex functions: can we do something about them? It so happens that studying other convex functions is almost the same as studying linear ones, because of a property of convex functions that allows us to lower bound a convex function by something with a linear term. So let us discuss that: linearization of convex functions. How many of you know the definition of a convex function? A function $f$ is convex if $f(\lambda x + (1-\lambda) y) \le \lambda f(x) + (1-\lambda) f(y)$ for all $\lambda \in [0, 1]$. We have already seen this in another class, but let us go through it again here.
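As a quick numerical sanity check of this $BL\sqrt{2n}$ bound, here is a small simulation sketch (the function name and setup are my own illustration, not from the lecture): random linear losses with $\lVert z_t \rVert = L$, the gradient-step update with the optimized $\eta$, and the exact best comparator in the radius-$B$ ball, which for linear losses is $u^\ast = -B \sum_t z_t / \lVert \sum_t z_t \rVert$.

```python
import numpy as np

def foirel_linear_regret(n=1000, d=5, B=1.0, L=1.0, seed=0):
    """Run FoReL with R(w) = ||w||^2/(2*eta) on random linear losses
    f_t(w) = <w, z_t> and compare the realized regret to B*L*sqrt(2n)."""
    rng = np.random.default_rng(seed)
    eta = B / (L * np.sqrt(2 * n))       # the optimized step size from the lecture
    w = np.zeros(d)                      # w_1 = argmin R(w) = 0
    z_sum = np.zeros(d)
    alg_loss = 0.0
    for _ in range(n):
        z = rng.normal(size=d)
        z *= L / np.linalg.norm(z)       # enforce ||z_t|| = L
        alg_loss += w @ z                # play w_t, then observe z_t
        z_sum += z
        w = w - eta * z                  # FoReL update = gradient step
    u_star = -B * z_sum / np.linalg.norm(z_sum)   # best fixed point in the ball
    best_loss = u_star @ z_sum
    return alg_loss - best_loss, B * L * np.sqrt(2 * n)

regret, bound = foirel_linear_regret()
print(f"realized regret = {regret:.2f}, bound BL*sqrt(2n) = {bound:.2f}")
```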
So suppose my function looks like a bowl, and I have two points in its domain, one at $x$ and another at $y$, with values $f(x)$ and $f(y)$. Where is $\lambda x + (1-\lambda) y$ going to lie? Somewhere in between, because $\lambda$ is between 0 and 1, and we are also assuming that $f$ is defined on a convex set: if you take a convex combination of any two points of its domain, it also lies in the domain. And what is the quantity $\lambda f(x) + (1-\lambda) f(y)$? As you vary $\lambda$, it sweeps along the chord joining $(x, f(x))$ and $(y, f(y))$: for $\lambda = 0$ you are at one endpoint and for $\lambda = 1$ at the other. So what is the definition saying? It is saying that the function value at the combined point, $f(\lambda x + (1-\lambda) y)$, always lies below the chord joining the two. That is the standard thing we know about convex functions. But another interesting property of a convex function is this: take any point on the graph; I can draw a tangent at that point which acts as a lower bound for the entire function. Maybe I should draw a slightly better picture: the tangent touches the function only at that point, and if you look at any other point, the value on the tangent line is always smaller than the corresponding value of the function. Let us say the touching point is $w$. Formally, the property is: let $S$ be a convex set. A function $f$ on $S$ is convex if and only if for every point $w$ there exists a vector $z$ such that
$$f(u) \;\ge\; f(w) + \langle u - w, z \rangle \quad \text{for all } u \in S.$$
That is, you tell me a point $w$ and I can come up with a lower bound on $f$: the right-hand side is an affine function of $u$, defined in terms of the value at $w$ and this other vector $z$, and it sits below $f$ everywhere. If such a $z$ exists at every point, then the function must be convex, and conversely. This $z$ is called a subgradient of $f$ at $w$, and the set of all such $z$ is denoted $\partial f(w)$. Does anybody have a question about that? Note that $z$ need not be unique; all we are saying is that there exists a $z$, and that $z$ can in fact depend on the point $w$ you are looking at. Any such $z$ is going to be called a subgradient of $f$ at $w$. We are not going to prove this; it is a standard result in the theory of convex functions. What we are going to do is exploit this result to linearize a convex function. By the way, the way I have drawn my function here, it is differentiable at every point, because it is smoothly changing everywhere.
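Here is a one-line worked example of this inequality for a smooth case (my own illustration): take $f(w) = w^2$ on $\mathbb{R}$ and $z = 2w$, the derivative at $w$. Then for every $u$,

```latex
\[
f(u) - f(w) - z\,(u - w)
  \;=\; u^2 - w^2 - 2w(u - w)
  \;=\; (u - w)^2 \;\ge\; 0,
\]
% i.e. f(u) >= f(w) + <u - w, z>, so z = 2w satisfies the lower-bound
% property at every point; here it is the unique such z, the gradient.
```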
In such a case the subgradient is unique: if, for a given $w$, there is exactly one $z$ for which the relation holds, then that particular $z$ is what we call the gradient of the function $f$ at the point $w$, and we denote it $\nabla f(w)$. When the relation is satisfied by more than one $z$, all of them are called subgradients of $f$ at $w$, and the set of them is the subdifferential, denoted $\partial f(w)$; whenever it is a singleton, we simply write the gradient. Is that clear? So what I want to say is: if $f$ is differentiable at $w$, then $\partial f(w)$ has a single element, the gradient $\nabla f(w)$. Now let us see why $\partial f(w)$ can genuinely be a set with more than one element. Can you think of such a case? Take a smooth convex function first: it is differentiable at every point, so at any point there is exactly one tangent line passing through it. But now take a function with a kink, something like $f(x) = \lvert x \rvert$. It is still a convex function: it satisfies our chord property. But is it differentiable at the kink? No: when you approach from the left the slope is negative, and when you approach from the right the slope is positive, so it cannot be differentiable there. But is there a $z$ there? Take $w$ to be the kink. Can the relation be satisfied by a unique $z$, or could multiple $z$ satisfy it? It so happens that here multiple $z$ work: you can think of many lines that pass through that point, touch the function only there, and lower bound the function everywhere. There is no unique tangent; there are many such lines, and each of them corresponds to one $z$. So at this point $w$, $\partial f(w)$ has multiple elements: the subdifferential is a set, not a single term. But at any point where the function is differentiable, the subdifferential has a single element, which we simply call the gradient of the function at that point. So now, how can we exploit this? Look back at the regret: take a particular round $t$ and look at the difference $f_t(w_t) - f_t(u)$.
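To make the set-valued case concrete, here is a small numerical check (my own illustration; the names are hypothetical): for $f(x) = \lvert x \rvert$ at the kink $w = 0$, every $z \in [-1, 1]$ satisfies the subgradient inequality, while a slope like $z = 1.5$ does not.

```python
import numpy as np

def is_subgradient(f, w, z, grid):
    """Check f(u) >= f(w) + z*(u - w) on a grid of test points u."""
    return all(f(u) >= f(w) + z * (u - w) - 1e-12 for u in grid)

f = abs                                   # |x|: convex, not differentiable at 0
grid = np.linspace(-2.0, 2.0, 401)

for z in [-1.0, -0.3, 0.0, 0.7, 1.0, 1.5]:
    print(f"z = {z:+.1f}: subgradient at w=0? {is_subgradient(f, 0.0, z, grid)}")
# Every z in [-1, 1] passes, so the subdifferential at 0 is the whole
# interval [-1, 1]; z = 1.5 fails, since the line 1.5*u rises above |u| for u > 0.
```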
If $f_t$ is a convex function, I can appeal to the subgradient property: applying the inequality at the point $w_t$, there is some $z_t \in \partial f_t(w_t)$ with $f_t(u) \ge f_t(w_t) + \langle u - w_t, z_t \rangle$ for all $u$; and if $f_t$ is differentiable everywhere, then $z_t$ is simply $\nabla f_t(w_t)$. Reorganizing, and being careful with the sign (it is $w_t - u$, not $u - w_t$; just verify this),
$$f_t(w_t) - f_t(u) \;\le\; \langle w_t - u, z_t \rangle \;=\; \langle w_t, z_t \rangle - \langle u, z_t \rangle.$$
So even though $f_t$ may be an arbitrary convex function, we can linearize it: we have upper bounded its regret term by the regret term of the linear function $w \mapsto \langle w, z_t \rangle$, where $z_t$ is a subgradient of $f_t$ at $w_t$. Once I have this, everything we did for linear functions applies, and I get the same bound for the linearized problem; and since the linearized regret is already an upper bound on the true regret, that bound also holds for the regret on the original convex losses. With this, let me just write the pseudocode and then we will stop. What we have finally ended up with is what we call the online gradient descent algorithm. How does it work? It takes a parameter $\eta$, initializes $w_1 = 0$ (round 1 is not in our control anyway), and then the update rule is $w_{t+1} = w_t - \eta z_t$, where $z_t$ is a subgradient of the loss $f_t$ at $w_t$. So we have simplified Follow the Regularized Leader, for the specific case of $L_2$ regularization, to this online gradient descent algorithm. We already discussed why this is gradient descent. So let us stop here.
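Here is a minimal sketch of that pseudocode in code (the subgradient oracle `subgrad` is an assumed callback, and the example losses at the end are my own illustration, not from the lecture):

```python
import numpy as np

def online_gradient_descent(subgrad, n, d, eta):
    """Online gradient descent: w_1 = 0, then w_{t+1} = w_t - eta * z_t,
    where z_t is a subgradient of the round-t loss f_t at w_t.

    `subgrad(t, w)` is a caller-supplied oracle returning some z_t in
    the subdifferential of f_t at w."""
    w = np.zeros(d)              # w_1 = 0: round 1 is not in our control anyway
    iterates = [w.copy()]
    for t in range(1, n + 1):
        z = subgrad(t, w)        # adversary reveals f_t; query a subgradient at w_t
        w = w - eta * z          # the FoReL update for L2 regularization
        iterates.append(w.copy())
    return iterates

# Example: squared losses f_t(w) = ||w - x_t||^2 with gradient 2*(w - x_t).
rng = np.random.default_rng(0)
xs = rng.normal(size=(100, 3))
ws = online_gradient_descent(lambda t, w: 2 * (w - xs[t - 1]), n=100, d=3, eta=0.05)
print("final iterate:", ws[-1])  # should drift toward the mean of the x_t
```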