So, today we will continue our discussion of stochastic linear bandits. In the last class, we argued that if we make certain assumptions about the confidence set, about the mean rewards, and also about the norm of the contexts, that is, the norm of the decision elements, then we can derive a regret bound. Can anybody tell me what the bound was that we finally derived last time?

So, today what we will do is try to understand whether it is at all possible to come up with a confidence set which satisfies our assumption. What was our assumption? We said that there exists some set

$$C_t = \{\theta : \|\hat{\theta}_t - \theta\|_{V_t} \le \beta_t\}$$

such that $\theta^\star \in C_t$ with high probability, for all $t$. We will henceforth call such sets confidence ellipsoids, because each one is like a ball centered around the point $\hat{\theta}_t$ (an ellipsoid, since the norm is weighted by the matrix $V_t$). So, the question is whether such a thing exists: can I define some $\beta_t$ for which this holds?

Before we formally state that such a $\beta_t$ exists, we are going to make some further assumptions and first see whether it is possible in a more restricted case; then we will see how to relax that restricted case to the more general one. The general case requires something more sophisticated, namely martingales and the method of mixtures. We will not go into that here; we will only talk about the basic ideas and point out where the method of mixtures and the martingale arguments would have to be used, so that we get a sense of how such a $\beta_t$ can exist.

To understand this, we are going to make a couple of simplifying assumptions, the first of which is to set $\lambda = 0$, that is, no regularization. Before I write that down, recall how we found $\hat{\theta}_t$: by minimizing a regularized least-squares loss. What was that function? We defined

$$L_t(\theta) = \sum_{s=1}^{t} \left(R_s - \langle d_s, \theta \rangle\right)^2 + \lambda \|\theta\|_2^2,$$

and minimizing it over $\theta$, by setting the gradient $\nabla_\theta L_t(\theta)$ to zero, gave us

$$\hat{\theta}_t = V_t(\lambda)^{-1} \sum_{s=1}^{t} R_s\, d_s,$$

where the $R_s$ are the rewards we have already observed. What do I mean by $V_t(\lambda)$? Earlier I had just defined a matrix $V_t$ as $\lambda I$ plus a sum of outer products; since that depends on which $\lambda$ I chose, I now make it an explicit function of $\lambda$:

$$V_t(\lambda) = \lambda I + \sum_{s=1}^{t} d_s d_s^\top.$$
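To keep this estimator concrete, here is a minimal NumPy sketch of the regularized least-squares computation, assuming the played arms are stacked row-wise; the names `arms`, `rewards`, and `lam` are my own choices, not notation from the lecture.

```python
import numpy as np

def ridge_estimate(arms, rewards, lam):
    """Compute theta_hat = V_t(lam)^{-1} * sum_s R_s d_s,
    where V_t(lam) = lam*I + sum_s d_s d_s^T.

    arms:    (t, dim) array whose rows are the decision vectors d_s
    rewards: (t,) array of observed rewards R_s
    lam:     regularization parameter lambda >= 0
    """
    dim = arms.shape[1]
    V = lam * np.eye(dim) + arms.T @ arms   # V_t(lambda)
    b = arms.T @ rewards                    # sum_s R_s d_s
    return np.linalg.solve(V, b)            # solve V x = b rather than inverting V
```

For $\lambda > 0$ the solve always succeeds, since $V_t(\lambda)$ is positive definite; for $\lambda = 0$ it can fail, which is exactly the caveat discussed next.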
Now let us try to simplify this estimate slightly. What is $R_s$ here? It is the reward I observed in round $s$, right. And how was that generated? $R_s$ is the arm I played in round $s$, in inner product with $\theta^\star$, plus sub-Gaussian noise:

$$R_s = \langle d_s, \theta^\star \rangle + \eta_s.$$

This is the noisy version of the reward: simply the reward sample I observed in round $s$. So, let me replace $R_s$ by that quantity, which gives

$$\hat{\theta}_t = V_t(\lambda)^{-1} \sum_{s=1}^{t} \left(\langle d_s, \theta^\star \rangle + \eta_s\right) d_s.$$

Now, before I start simplifying this further and see what the confidence ellipsoid is going to look like, I am going to make a couple of simplifying assumptions.

First, I assume $\lambda = 0$, the case corresponding to no regularization. The thing is, without regularization this inversion may not be well defined, whereas with $\lambda > 0$ it always was; let us ignore that fact for now.

Second, I assume the noise terms $\eta_s$ are independent sub-Gaussian. In the original setup the noise was only assumed to be conditionally sub-Gaussian; now we are simply assuming the $\eta_s$ are sub-Gaussian and mutually independent.

Third, and this is the really restrictive assumption, I assume the arms $d_1, d_2, \ldots, d_t$ played up to round $t$ are chosen without knowledge of the rewards $R_1, R_2, \ldots, R_t$. That means we are saying the algorithm is not adaptive. Normally, the arm I choose in each round depends on which arms I have played previously and the corresponding rewards I got for them; with this assumption, I am ignoring the rewards observed so far when making my choice of arm. We are decoupling the dependence of the arm chosen in round $t$ from the observations made so far. This is of course not true of a learning algorithm: in our setup, to learn, we have to choose the arm we play next based on the observations we have made. But, as I said, that dependence makes things complicated to analyze, which is why we make this simplifying assumption.
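As a small illustration of the first and third assumptions (a hypothetical setup of mine, not from the lecture): a round-robin schedule is one deterministic rule that never looks at rewards, and with $\lambda = 0$ the matrix $\sum_s d_s d_s^\top$ stays singular until the arms played span $\mathbb{R}^d$, while any $\lambda > 0$ makes it invertible.

```python
import numpy as np

dim = 3
decision_set = [np.eye(dim)[i] for i in range(dim)]   # example finite arm set

# Round-robin: a deterministic, non-adaptive rule (never looks at rewards).
arms = np.array([decision_set[s % dim] for s in range(2)])  # only 2 rounds played

V0 = arms.T @ arms                       # lambda = 0: rank 2, not invertible yet
V1 = np.eye(dim) + arms.T @ arms         # lambda = 1: full rank, always invertible
print(np.linalg.matrix_rank(V0), np.linalg.matrix_rank(V1))  # -> 2 3
```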
Let us try to analyze, then. Just follow the sequence of manipulations as it goes; as of now, our focus is to see whether we can come up with a set which contains $\theta^\star$ with high probability for every $t$.

First, we look at the quantity $\langle d, \hat{\theta}_t - \theta^\star \rangle$. What is this, basically? $\hat{\theta}_t$ is the estimate in round $t$ and $\theta^\star$ is the true parameter, so $\hat{\theta}_t - \theta^\star$ is the error we are making in that round, and this quantity is the projection of an arm onto this error. (Instead of $x$, let me call the arm $d$, because we are denoting our decision points by $d$; these are all vectors.)

Now, replacing $\hat{\theta}_t$ by the expression above with $\lambda = 0$,

$$\langle d, \hat{\theta}_t - \theta^\star \rangle = \left\langle d,\; V_t(0)^{-1} \sum_{s=1}^{t} \left(\langle d_s, \theta^\star \rangle + \eta_s\right) d_s \;-\; \theta^\star \right\rangle.$$

What I will do is pull the $d_s$ inside and split the sum. For the first part, since $\theta^\star$ is a fixed constant, I can pull it out, and

$$V_t(0)^{-1} \left( \sum_{s=1}^{t} d_s d_s^\top \right) \theta^\star = V_t(0)^{-1}\, V_t(0)\, \theta^\star = \theta^\star,$$

because the matrix $\sum_s d_s d_s^\top$ is exactly $V_t(0)$, and multiplying it by its inverse gives the identity. To make this explicit, that is why I am writing $V_t(0)$: with $\lambda > 0$ the inverse would always have been well defined, but here I am forcing $\lambda = 0$ and assuming the inverse exists so that this exact cancellation holds. This $\theta^\star$ then gets knocked out by the $-\theta^\star$ we already had. So, finally, what remains is

$$\langle d, \hat{\theta}_t - \theta^\star \rangle = \left\langle d,\; V_t(0)^{-1} \sum_{s=1}^{t} \eta_s\, d_s \right\rangle = \sum_{s=1}^{t} \left\langle d,\; V_t(0)^{-1} d_s \right\rangle \eta_s.$$

In the last step I have simply pulled the sum out; just notice that everything is fine: it is the inner product of $d$ with the matrix $V_t(0)^{-1}$ applied to $d_s$, summed over $s = 1$ to $t$.

What have I basically done? The projection of my arm $d$ onto the error term is written as nothing but a linear combination of the noise terms: each $\eta_s$ is a 1-sub-Gaussian noise term, multiplied by some weight, which is the inner product $\langle d, V_t(0)^{-1} d_s \rangle$. So this quantity is a weighted sum of sub-Gaussian noise, and if I want to bound the projection, I now have to bound this weighted sum.
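As a quick numerical sanity check of this identity (a sketch under the simplified assumptions; the dimensions, round-robin arms, and direction $d$ below are arbitrary choices of mine):

```python
import numpy as np

# Check: <d, theta_hat - theta*> == sum_s <d, V_t(0)^{-1} d_s> * eta_s  (lambda = 0)
rng = np.random.default_rng(0)
dim, t = 3, 50
theta_star = np.array([1.0, -0.5, 0.25])
arms = np.array([np.eye(dim)[s % dim] for s in range(t)])  # round-robin basis arms
noise = rng.standard_normal(t)                             # independent 1-sub-Gaussian
rewards = arms @ theta_star + noise                        # R_s = <d_s, theta*> + eta_s

V = arms.T @ arms                                          # V_t(0), invertible here
theta_hat = np.linalg.solve(V, arms.T @ rewards)
d = np.array([0.2, 0.3, -0.1])                             # an arbitrary direction

lhs = d @ (theta_hat - theta_star)                         # projection onto the error
weights = arms @ np.linalg.solve(V, d)                     # w_s = <d, V^{-1} d_s>
assert np.isclose(lhs, weights @ noise)                    # weighted noise sum matches
```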
So, now suppose I am interested in how big this quantity is. Suppose I ask for the probability that $\langle d, \hat{\theta}_t - \theta^\star \rangle$ is greater than some particular number; you will see in a moment why I use this particular number. I am now dropping the $0$ and simply writing $V_t$ for $V_t(0)$. The question is:

$$\mathbb{P}\left( \langle d, \hat{\theta}_t - \theta^\star \rangle \ge \sqrt{2\, \|d\|_{V_t^{-1}}^2 \log\tfrac{1}{\delta}} \right) \;=\; ?$$

By what we have just written, this probability is nothing but

$$\mathbb{P}\left( \sum_{s=1}^{t} \langle d, V_t^{-1} d_s \rangle\, \eta_s \;\ge\; \sqrt{2\, \|d\|_{V_t^{-1}}^2 \log\tfrac{1}{\delta}} \right).$$

So I have a weighted sum of sub-Gaussian noise, and I am asking for the probability that it exceeds this value. What result are you going to apply to find this? We have already bounded how the tails of a sub-Gaussian random variable look, so let us try to apply that here.

Is this entire quantity sub-Gaussian, and with what parameter? Recall: if $X_1$ is $\sigma_1$-sub-Gaussian and $X_2$ is $\sigma_2$-sub-Gaussian, and they are independent, then $X_1 + X_2$ is $\sqrt{\sigma_1^2 + \sigma_2^2}$-sub-Gaussian. Also, if $X_1$ is $\sigma_1$-sub-Gaussian and I multiply it by a constant $c$, then $c X_1$ is $|c|\,\sigma_1$-sub-Gaussian. Now, each $\eta_s$ is 1-sub-Gaussian, but it is multiplied by a weight, so each term in the summation is $|\langle d, V_t^{-1} d_s \rangle|$-sub-Gaussian, and the sub-Gaussian parameter of the whole sum is the square root of the sum of the squares of these weights.

Fine; so I know this quantity is $\sigma$-sub-Gaussian for some $\sigma$, and I want to bound the probability that it is at least the threshold above. If $X_1$ is $\sigma_1$-sub-Gaussian, what is $\mathbb{P}(X_1 \ge \epsilon)$? We have discussed this: it is upper bounded as

$$\mathbb{P}(X_1 \ge \epsilon) \le \exp\left( -\frac{\epsilon^2}{2 \sigma_1^2} \right).$$

There is no $n$ here, there is only one random variable being considered. And was there any factor of $2$ in front? No, because I am only looking at the one-sided tail, not the two-sided one. We already discussed this when we talked about the set of concentration inequalities. Now, let $X_1$ correspond to our weighted sum and $\epsilon$ to our threshold.

Notice that I can treat the weights here as constants only because of the assumption I am making. Whatever I have observed so far, the rewards, are random quantities, but I am not making my decisions based on them; I am playing the arms in every round according to some rule, and, to be specific, the arms are chosen deterministically. They are chosen without knowing those random quantities, and I am also not randomizing my choices: every time, I choose something deterministically. For example, one strategy could be to choose the arms in round-robin fashion: in the first round choose $d_1$, in the second round $d_2$, in the third round $d_3$, and after exhausting all of them, come back and start again.
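The lecture only quotes this tail bound from the earlier class; for completeness, here is a sketch of the standard Chernoff argument behind it, assuming the usual definition that a mean-zero $\sigma_1$-sub-Gaussian $X_1$ satisfies $\mathbb{E}\, e^{\lambda X_1} \le e^{\lambda^2 \sigma_1^2 / 2}$ for all $\lambda$:

$$\mathbb{P}(X_1 \ge \epsilon) \;\le\; e^{-\lambda \epsilon}\, \mathbb{E}\, e^{\lambda X_1} \;\le\; \exp\left( \frac{\lambda^2 \sigma_1^2}{2} - \lambda \epsilon \right) \quad \text{for every } \lambda > 0,$$

where the first step is Markov's inequality applied to $e^{\lambda X_1}$; optimizing the exponent at $\lambda = \epsilon / \sigma_1^2$ gives the stated bound $\exp(-\epsilon^2 / (2\sigma_1^2))$.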
Let us assume one such deterministic rule, where we have simply ignored what has happened and keep choosing according to a fixed schedule. So, let us see what we get if we apply the tail bound. What is $\sigma_1$ for me? $\sigma_1^2$ is the sum of the squares of the weights, each times $1^2$ because each $\eta_s$ is 1-sub-Gaussian (that is the only reason the noise parameter itself does not show up):

$$\sigma_1^2 = \sum_{s=1}^{t} \langle d, V_t^{-1} d_s \rangle^2.$$

Is it correct that this equals $\|d\|_{V_t^{-1}}^2$? Yes:

$$\sum_{s=1}^{t} \langle d, V_t^{-1} d_s \rangle^2 = \sum_{s=1}^{t} d^\top V_t^{-1} d_s\, d_s^\top V_t^{-1} d = d^\top V_t^{-1} \left( \sum_{s=1}^{t} d_s d_s^\top \right) V_t^{-1} d = d^\top V_t^{-1} V_t\, V_t^{-1} d = \|d\|_{V_t^{-1}}^2,$$

using the symmetry of $V_t^{-1}$ and the fact that the middle sum is exactly $V_t$. So, let me go back and apply the tail bound with $\epsilon = \sqrt{2\,\|d\|_{V_t^{-1}}^2 \log(1/\delta)}$, which is the quantity over there, and $\sigma_1^2 = \|d\|_{V_t^{-1}}^2$:

$$\mathbb{P}\left( \langle d, \hat{\theta}_t - \theta^\star \rangle \ge \sqrt{2\,\|d\|_{V_t^{-1}}^2 \log\tfrac{1}{\delta}} \right) \le \exp\left( -\frac{2\,\|d\|_{V_t^{-1}}^2 \log\tfrac{1}{\delta}}{2\,\|d\|_{V_t^{-1}}^2} \right) = \exp\left( -\log\tfrac{1}{\delta} \right) = \delta.$$

After squaring $\epsilon$, you see that the $2$ cancels and the $\|d\|_{V_t^{-1}}^2$ cancels; what remains is $\exp(-\log(1/\delta))$, and what is that quantity? It is just $\delta$.

So, let us try to understand what we have been able to argue: the probability that this quantity, which is nothing but the projection of my decision onto the error term, is larger than this threshold is very small, upper bounded by $\delta$. What we have just shown is that the projection will not be larger than this quantity, except with probability $\delta$, depending on whatever parameter $\delta$ we have chosen.
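Here is a quick Monte-Carlo check of this conclusion (an empirical illustration, not a proof; the arms, direction, and $\delta$ are arbitrary choices of mine): the empirical exceedance frequency should come out below $\delta$.

```python
import numpy as np

# Estimate P( sum_s <d, V^{-1} d_s> eta_s >= sqrt(2 ||d||^2_{V^{-1}} log(1/delta)) ).
rng = np.random.default_rng(1)
dim, t, delta, n_trials = 3, 30, 0.05, 100_000
arms = np.array([np.eye(dim)[s % dim] for s in range(t)])  # fixed deterministic arms
V = arms.T @ arms
d = np.array([0.2, 0.3, -0.1])
weights = arms @ np.linalg.solve(V, d)                     # <d, V^{-1} d_s>
threshold = np.sqrt(2 * (d @ np.linalg.solve(V, d)) * np.log(1 / delta))

noise = rng.standard_normal((n_trials, t))                 # fresh 1-sub-Gaussian draws
exceed_freq = (noise @ weights >= threshold).mean()
print(exceed_freq, "<=", delta)                            # empirically well below delta
```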
Now we want to convert this into the result we are after: how to come up with the confidence set. Recall the set $C_t$ I had; its defining condition involves the term $\|\hat{\theta}_t - \theta\|_{V_t}$, so I first need to see how this term looks. What do we want? If I consider the set of $\theta$ satisfying this condition, I want to show that it contains $\theta^\star$ with high probability; that is, I specifically want to show that $\theta = \theta^\star$ itself satisfies the condition, which is exactly why $\theta^\star$ is contained in the set. So, let us take $\theta^\star$ and try to connect what we have shown with the quantity $\|\hat{\theta}_t - \theta^\star\|_{V_t}$.

The trick is that this norm can itself be written as a projection onto a particular direction:

$$\|\hat{\theta}_t - \theta^\star\|_{V_t} = \left\langle \frac{V_t \left( \hat{\theta}_t - \theta^\star \right)}{\|\hat{\theta}_t - \theta^\star\|_{V_t}},\; \hat{\theta}_t - \theta^\star \right\rangle.$$

Do you agree that if I take this inner product, it is exactly equal to the norm? The first argument is simply the $V_t$ matrix applied to the error, and the denominator is a constant. Taking the transpose of the first quantity and multiplying,

$$\frac{\left( \hat{\theta}_t - \theta^\star \right)^\top V_t^\top \left( \hat{\theta}_t - \theta^\star \right)}{\|\hat{\theta}_t - \theta^\star\|_{V_t}} = \frac{\|\hat{\theta}_t - \theta^\star\|_{V_t}^2}{\|\hat{\theta}_t - \theta^\star\|_{V_t}} = \|\hat{\theta}_t - \theta^\star\|_{V_t},$$

where I do not have to care about the transpose on $V_t$ because it is a symmetric matrix. The numerator is nothing but the squared quantity of the denominator, and that is why we get the norm back.
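To indicate where this is heading (my extrapolation of the next step, not yet stated in the lecture): if we call this particular direction $d$, then its $V_t^{-1}$-norm is exactly $1$,

$$\|d\|_{V_t^{-1}}^2 = \frac{\left(\hat{\theta}_t - \theta^\star\right)^\top V_t\, V_t^{-1}\, V_t \left(\hat{\theta}_t - \theta^\star\right)}{\|\hat{\theta}_t - \theta^\star\|_{V_t}^2} = \frac{\|\hat{\theta}_t - \theta^\star\|_{V_t}^2}{\|\hat{\theta}_t - \theta^\star\|_{V_t}^2} = 1,$$

so plugging it into the tail bound would suggest $\mathbb{P}\left(\|\hat{\theta}_t - \theta^\star\|_{V_t} \ge \sqrt{2 \log(1/\delta)}\right) \le \delta$, that is, a candidate $\beta_t = \sqrt{2 \log(1/\delta)}$ in this restricted setting. The caveat is that this $d$ depends on $\hat{\theta}_t$ and hence on the noise, so the weights are no longer constants; handling that dependence properly is precisely where the martingale and method-of-mixtures arguments mentioned at the start come in.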