So, let us first compute the regret with this Euclidean regularizer. We will assume that all the functions f1, f2, ... that I am going to observe are L-Lipschitz with respect to the L2 norm, because my regularizer is defined with respect to the L2 norm, so I take the Lipschitz parameter with respect to the same norm. Now, if I apply Follow the Regularized Leader here, what does the bound give? The first term is R(u). The second term is minus the minimum of R(u) over u; if u ranges over the whole Euclidean space, that minimum is 0. The third term is N L squared divided by sigma, and sigma here is 1 by eta, so that term becomes eta N L squared. To be consistent, let us also assume that the L2 norm of u is upper bounded by B. If that is the case, the regret is at most B squared by 2 eta, plus eta N L squared. Note that there should be an eta in the first term: I had removed it only to argue initially that the regularizer is 1-strongly convex, but with the 1 by eta scaling the regularizer is (1/eta)-strongly convex, and that is what I am interested in. So, do you see we had obtained this bound earlier as well? Right now we are obtaining it as a consequence of the theorem we stated, but earlier, directly from my analysis of Follow the Regularized Leader with the Euclidean regularizer, I had already gotten this; now I am recovering it by substituting the values into the theorem. Now that we have gotten this, what should we do with eta?
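The substitution just described can be written out compactly (a sketch, following the FTRL theorem stated earlier with strong-convexity parameter sigma = 1/eta):

```latex
\mathrm{Regret}_N(u)\;\le\; R(u)\;-\;\min_{v\in\mathbb{R}^d} R(v)\;+\;\frac{N L^2}{\sigma}
\;=\;\frac{\|u\|_2^2}{2\eta}\;-\;0\;+\;\eta N L^2
\;\le\;\frac{B^2}{2\eta}\;+\;\eta N L^2 .
```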
So, what is the value we got earlier? Earlier we had gotten it as 2BL times root N, but now if you optimize this bound over eta you get BL root(2N). The difference is just because of how we bounded u earlier versus taking the L2 norm of u to be at most B now; otherwise, optimizing over eta gives you this bound, fine. Now, when I apply this Euclidean regularizer, the w's I get through Follow the Regularized Leader need not necessarily be probability vectors; they could be any vector in the Euclidean space. Because of that, if you use this algorithm on something like prediction with expert advice, where in every round my goal was to come up with a distribution over the set of experts, every update used to give me a probability vector, but with this Euclidean regularizer and Follow the Regularized Leader I will not get that. So, how do we account for the case where I want the iterates to always take values from the probability simplex, or more generally from a specific set, not anywhere in the whole space? If I just run Follow the Regularized Leader as is, we showed that the update rule we get is gradient descent: w_{t+1} = w_t - eta z_t. This was for the case where my loss functions f_t were linear, with z_t the gradient in round t. Depending on z_t, this update can take me anywhere in the Euclidean space. To ensure that my weights finally stay within the set I am interested in, one possibility is to redefine your R(u) function to be the Euclidean regularizer if u is in the set S that you are interested in (this set could simply be the probability simplex), and plus infinity otherwise, so that the minimization never picks a point outside S.
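One way to realize this restricted Euclidean regularizer in code is as projected gradient descent: the unconstrained update w_{t+1} = w_t - eta z_t followed by a Euclidean projection onto S. A minimal sketch for S being the probability simplex; the function names and the sorting-based projection routine are my own illustration, not from the lecture:

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex,
    using the standard sorting-based algorithm."""
    u = np.sort(v)[::-1]                      # sort components in decreasing order
    css = np.cumsum(u)
    ks = np.arange(1, len(v) + 1)
    rho = np.nonzero(u + (1.0 - css) / ks > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0.0)

def ftrl_euclidean_step(w, z, eta):
    """One FTRL step with the (restricted) Euclidean regularizer on linear
    losses: plain gradient descent followed by projection onto the simplex."""
    return project_to_simplex(w - eta * z)
```

The projection keeps the iterate a valid probability vector while changing the regret analysis only through the restricted domain, as discussed above.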
Now, if you do this, where you want your u's to come only from a particular set, then everything we have done still holds; it is just that we are not allowing the iterates to take values outside the set S. Even if you work out all of these things again, the regret bound you get is again of the order root N, ok. Now, let us see what happens if I use my entropy regularizer. The entropy regularizer, when I am dealing with the probability simplex, is strongly convex with respect to the L1 norm. So, I am now going to assume my f1, f2, ..., fN are L-Lipschitz with respect to the L1 norm. Notice that if a function is L-Lipschitz with respect to the L2 norm and I now ask for its Lipschitz constant with respect to the L1 norm, the two constants need not be the same; they could potentially be different. Now, let us compute again what happens to the regret; here also my space is defined to be the simplex. So, the first part is R(u); what is the minimum value of this function under this constraint? Suppose this is a minimization over all w's satisfying the simplex constraint; for the time being take the total mass to be 1. What I am now interested in computing is R(u) minus the minimum of R over u coming from S, where S is defined like this. That minimum enters with a negative sign. If the minimum were nonnegative I could simply drop it and still get an upper bound; but here the minimum of the entropy regularizer over the simplex is a negative quantity, so dropping it would lose something. So, what I am basically interested in is minus this minimum.
So, instead of minus the minimum, if I take the minus inside, I am looking at the maximum value of minus R(u). Let me write it that way: the regret has R(u) plus the max over u in S of minus R(u), together with the other term. What is minus R(u)? It is the sum of u_i log(1/u_i); I am just writing minus log u_i as log(1/u_i). Now I am interested in the maximum value of this over this space, where the L1 norm of u is at most 1. What will be the maximum value of this? Has anybody come across this? This is actually the entropy function. When is entropy maximized? Careful: it is 1/d, not 1/n; N is the number of rounds and d is the dimension of the space, and you should distinguish between the two. The index i here runs from 1 to d, the number of components of the vector u, not 1 to N. So, when is this maximized? You are right: it is maximized when your distribution is uniform, when you put an equal amount of mass on each component. If u_i = 1/d, each term is (1/d) log d, and summing over i from 1 to d gives log d, because the 1/d terms add up d times to 1. So this whole thing is upper bounded by log d, ok. One more thing I am doing here: I assumed that the components sum to 1, that these w's are all probabilities, so that sum should equal 1. Under that constraint this is the maximizing u. Now, suppose instead of 1 you take the summation of the components to be b. What u vector should maximize the entropy then?
If your constraint is that the L1 norm of u is b, the maximizer is u_i = b/d; why is that? Because all the elements are equal in that case. Each term then becomes (b/d) log(d/b), and you can similarly simplify and get the value. For us, just to be concrete, we will focus on b = 1, where everything is clean: this part is finally upper bounded by log d, ok, fine. Now, what about the other part of the bound? That part was L squared N, with the Lipschitz constant L and a sigma in the denominator. What is sigma in this case? When b = 1, the entropy regularizer is 1-strongly convex. I am going to scale it by 1 by eta; this is just a constant multiplying the whole regularizer, so the scaled regularizer is (1/eta)-strongly convex. So I replace that sigma by 1 by eta, and that gives me L squared N eta for this term, ok, fine. So, finally, writing R(u) as the sum of u_i log u_i, what we got is (1/eta) times (sum of u_i log u_i plus log d), plus L squared eta N; note the 1 by eta factor that I was missing earlier belongs on those first two terms. So, this is what we get. Now, recall what this u was telling us: it measures how competitive my algorithm is against playing the same u throughout. So, if I am going to use the same u throughout, this is my regret with respect to that.
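Putting the pieces together for the entropy regularizer over the simplex (a sketch; here Delta_d denotes the probability simplex):

```latex
-\min_{u\in\Delta_d}\sum_{i=1}^{d}u_i\log u_i
\;=\;\max_{u\in\Delta_d}\sum_{i=1}^{d}u_i\log\frac{1}{u_i}
\;=\;\log d \quad(\text{attained at } u_i = 1/d),
```

```latex
\mathrm{Regret}_N(u)\;\le\;\frac{1}{\eta}\Bigl(\sum_{i=1}^{d} u_i\log u_i+\log d\Bigr)\;+\;\eta L^2 N .
```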
So, here what I am doing is this: let us now try to put it in comparison with prediction with expert advice, because I have brought in distributions here. When I was dealing with prediction with expert advice, how did I define my regret? My regret was the expected cost my algorithm incurs relative to the single best expert. I want to bring that in here. If I want to compare against the single best expert, then these u's are unit vectors: u comes from one of the vectors e1, e2, ..., ed, where e1 is the vector of all zeros except a one in the first place. That means I will basically be comparing my performance against playing expert one all the time; if I use u = e2, it is like comparing my performance against always playing the second expert, and so on. So, taking u to be any one of these, I am comparing my performance with respect to one particular expert being played all the time, which is exactly what I was doing when computing the regret for prediction with expert advice. Now, what does the first term become when u is one of these unit vectors? Take u = e2, so u_2 = 1 and u_1, u_3, u_4 and the rest are all 0. For the zero components we adopt the convention that 0 log 0 is 0; log 0 we do not know how to define, but let us take this convention. If that is the case, when a component is 0, 0 log 0 is 0, and when it is 1, 1 log 1 is also 0; everything becomes 0. So the first term vanishes, and what we end up with is (1/eta) log d plus L squared N eta, ok.
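The 0 log 0 = 0 convention and the vanishing of the entropy term at a unit vector can be checked numerically (a small illustration; `xlogx` and `neg_entropy` are my own helper names):

```python
import math

def xlogx(x):
    """x * log(x) with the convention 0 * log 0 = 0."""
    return 0.0 if x == 0.0 else x * math.log(x)

def neg_entropy(u):
    """R(u) = sum_i u_i log u_i for a probability vector u (negative entropy)."""
    return sum(xlogx(ui) for ui in u)
```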
Now, can you optimize this and tell me what regret bound I finally get by minimizing over eta? What we have is (1/eta) log d plus L squared N eta; just minimize this with respect to eta. The value of eta that minimizes it is eta star = root(log d / (N L squared)). If you plug this back in, the first term, which is the reciprocal of eta star times log d, becomes root(N L squared log d), and the second term is again root(N L squared log d), so the whole thing is nothing but 2 root(N L squared log d). If you simplify, this is like L times root 2 times root(2N log d). So you see that here also I get the root(2N) behaviour; in the Euclidean case I had gotten L root(2N) with B set to 1, which is fine, and the extra factor here is simply root(log d). Also, can you compare this regret with the regret you got for prediction with expert advice? What was the regret you had gotten there? It was root(2N log d), so this is the same, but now I have this extra factor L here. But what is this L? The Lipschitz constant. When I was dealing with prediction with expert advice, what were my convex functions? There, every round the convex function was a linear function, because I was looking at my expected loss, which was the inner product of v_t and w_t, where v_t was the loss vector chosen by the adversary and w_t was your weight vector. Now, for this function f_t, what will be its Lipschitz constant with respect to the L1 norm? How are you going to compute it? Take f_t at two points, w and u; by the definition, the difference is the inner product of w with v_t minus the inner product of u with v_t.
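The optimization over eta can be sanity-checked numerically (a sketch; the function names are my own):

```python
import math

def entropy_ftrl_bound(eta, L, N, d):
    """The regret bound (1/eta) * log d + eta * L^2 * N from the lecture."""
    return math.log(d) / eta + eta * L * L * N

def optimal_eta(L, N, d):
    """Minimizer of the bound above: eta* = sqrt(log d / (N L^2))."""
    return math.sqrt(math.log(d) / (N * L * L))
```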
Now, let us see what we get when we compute the Lipschitz constant of this function with respect to the L1 norm. I can write the difference as the inner product of v_t with (w minus u). Now I am interested in bounding this using the L1 norm of w minus u. Is this bound true? We discussed last time that if I take this factor in the L1 norm, the other factor gets the dual norm of that, and the dual of L1 is the L-infinity norm. So the absolute value of the inner product of v_t with (w minus u) is at most the infinity norm of v_t times the L1 norm of (w minus u); I am just applying the definition we had last time. If I can find the infinity norm of v_t, I get the Lipschitz constant for this function. So, what are these v_t's? They are vectors we assumed to be loss vectors with entries in the range 0 to 1. Because of that, the infinity norm of v_t can be at most 1. So, if I take my loss functions to be linear functions of this form, these linear functions happen to be Lipschitz with constant 1 with respect to the L1 norm. So, what we have is all fine: by this analysis, the regret bound we are going to get is L times root(2N log d), but the Lipschitz constant for the linear functions we encounter in prediction with expert advice is 1. So, for my linear functions, what I am getting is root 2 times root(2N log d). When I applied my weighted majority algorithm, we were able to derive a regret bound of only root(2N log d), but here we got an extra factor of root 2.
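The Hölder step in this argument can be verified numerically (a small sketch with my own function names, for loss vectors with entries in [0, 1]):

```python
import numpy as np

def linear_loss_gap(v, w, u):
    """|f_t(w) - f_t(u)| for the linear loss f_t(x) = <v, x>."""
    return abs(float(np.dot(v, w) - np.dot(v, u)))

def l1_holder_bound(v, w, u):
    """Hoelder upper bound ||v||_inf * ||w - u||_1 on the gap above."""
    return float(np.max(np.abs(v)) * np.sum(np.abs(w - u)))
```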
So, we have to go back a bit and revisit whether there is some error here, or whether there is no error and we really do pick up an extra factor of root 2. But if we just look at it order-wise, ignoring the constants, let us compare with respect to my parameters: the dimension d, which here equals the number of experts, and the number of rounds N. Whatever order we get, it is the same as that of the weighted majority algorithm; there also, if you ignore the constant, we had gotten root(N log d), and here we have the same thing, ok. Actually, if you work out all of these things with this regularizer and find the optimizer in each round, you will see that the weights turn out to be the same exponentially weighted values that we used in weighted majority. So, if you just go back and simplify what the w_t is in each round for this regularizer function, it will actually turn out to be the same update rule as in weighted majority, ok, fine. Now, finally: when to use which kind of regularizer? We said that in one case the Euclidean regularizer can also be used by restricting it to the domain of interest; if your domain of interest is only the probability simplex, you could also have gone and used the entropy regularizer. Then the question is which one you want to use. You notice that the regret bounds are getting affected by the Lipschitz constants, and these Lipschitz constants are basically governed by which norm you look at: first, the norm with respect to which your regularizer is strongly convex, and that same norm in turn governs the Lipschitz constants, ok.
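The claim that entropic FTRL recovers the exponentially weighted update can be checked directly: the closed-form FTRL iterate with the negative-entropy regularizer on linear losses puts weight proportional to exp(-eta times the cumulative loss), which is exactly the weighted-majority-style multiplicative update applied step by step. A minimal sketch (function names are my own):

```python
import numpy as np

def entropic_ftrl_weights(Z, eta):
    """Closed-form FTRL iterate with the negative-entropy regularizer on the
    simplex and linear losses: w_i proportional to exp(-eta * cumulative loss).
    Z is a (t, d) array of the loss/gradient vectors seen so far."""
    scores = -eta * Z.sum(axis=0)
    scores -= scores.max()          # shift for numerical stability
    w = np.exp(scores)
    return w / w.sum()

def multiplicative_update(w, z, eta):
    """One step of the exponentially weighted (weighted-majority style)
    update: w_i <- w_i * exp(-eta * z_i), then renormalize."""
    w = w * np.exp(-eta * z)
    return w / w.sum()
```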
When you took your regularizer to be strongly convex in the L1 norm, that is, when you used the entropy kind of regularizer, the Lipschitz constant of your losses is measured in the L1 norm; with the Euclidean regularizer it is measured in the L2 norm, and as we noted, these two constants need not be the same. Fine, there is also the norm of the competitor to consider. When your competitors are these unit vectors, each u coming from this space has L1 norm at most 1, because only one component is 1. If instead you let your u be any value where each component takes a value in the interval [0, 1], not over the entire space, what will that norm be? It will be upper bounded only by d, whatever the dimension is, since it is possible that all the components take the value 1. So, it depends on which competitor you are going to measure against: in this case, when your u comes from one of these unit vectors, you have a low value for the L1 norm, and because of this you may want to use a regularizer which is strongly convex in the L1 norm, because what matters is this constant, and here it may turn out to be smaller.
So, when you compare the regret bounds: the L root(2N) factor is there in both of them; let us for the time being ignore d. What now matters for you is which of the two has the larger Lipschitz constant, and you may want to choose the regularizer that results in the smaller Lipschitz constant. Based on that, you decide. And what is this Lipschitz constant here? It is the Lipschitz constant of your loss functions, which is actually governed by the norm of your regularizer. So you see how the regularizer you use affects the Lipschitz constant, and that in turn affects your regret bound. So, depending on the norm with respect to which your regularizer is strongly convex, and the corresponding Lipschitz constant it gives you, you may want to appropriately choose what kind of regularizer to go with, and that is also going to affect your regret bounds, ok. Let us stop here.