So, now you see what I have done: I have expressed this quantity in a particular form. What is this? If you look at it, this is a matrix product: a matrix times a vector gives a vector, and the denominator is just a scalar. So this quantity is a vector, and we are looking at its inner product with the error term. That is similar to what I was doing earlier, except that instead of the arm d I now have this entire quantity. So if I want to apply a probability bound in the same fashion as before, I just need to treat this quantity as d and reuse what I did. But there is an issue. Earlier, when I took the inner product, d was a particular fixed arm and I projected it on the error term. Here, the quantity I want to treat as d is not fixed; it is random. Why is that? Because theta hat has come into the picture. So if I want to apply any tail bound, say the probability that this quantity is greater than or equal to some beta, I cannot directly use what I did before, because there I was dealing with the projection of a fixed arm on my error term, whereas now I have to deal with the projection of a random vector. How do we account for the randomness in this quantity? For that we have to apply one more trick, and that trick is called a covering argument. This quantity here is random, and it is a column vector which can take any possible value in R^d.
Because theta t hat is a random quantity, this vector can lie anywhere in R^d; I do not know its value, since it depends on theta t hat, on theta star, and on what the matrix V_t was at that time. Since it can take uncountably many possible values in R^d, we are first going to discretize the set of possible values and then focus on the discrete ones. Whenever we face such a situation, we usually apply a union bound. Recall how we did this for multi-armed bandits: there, the number of times an arm had been pulled was a random quantity. When we wanted to apply Hoeffding's inequality, what did we do? We considered all possible values of the number of pulls and took a union bound over them. There, applying a union bound was fine because the number of pulls took only finitely many values, 1, 2, all the way up to t. But here, our quantity can take uncountably many values. So first we are going to discretize, and we are going to show that whatever value the quantity takes, there exists a point in the discretized set that is arbitrarily close to it. We will make that notion precise. Once we have discretized, we have only finitely many points in the set, and we can apply a union bound over them. So what do we mean? We identify a finite set C_epsilon, a subset of R^d, such that, calling this entire quantity Y: for whatever value Y takes, there exists some y in C_epsilon that is epsilon-close to it.
So what I mean by this: for whatever value of Y, there exists some y in C_epsilon such that ||Y − y|| ≤ epsilon. Understand what covering means here. The set of possible values Y can take could be uncountable; I want to approximate it by finitely many points, such that whatever value Y takes, there exists a point in my set C_epsilon that is epsilon-close to it. That is the discretization. Now, the question is whether such a C_epsilon exists. Note that I have written the subscript epsilon to indicate that the set depends on epsilon: naturally, if you want a small epsilon, you should expect the size of C_epsilon to be large, right? Here is a lemma asserting the existence of such a covering set. First, denote by S_d the set of all x such that ||x|| ≤ 1; that is, S_d is the unit ball in dimension d. The lemma says: there exists a set C_epsilon with cardinality |C_epsilon| ≤ (3/epsilon)^d such that for all x in S_d there exists y in C_epsilon with ||x − y|| ≤ epsilon. Do you follow what this lemma is telling us? A unit ball contains uncountably many points, yet I can come up with a discretized version of it having at most (3/epsilon)^d points, such that for any point x you give me in the unit ball, there is a corresponding y in my set C_epsilon within distance epsilon of it.
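To see the covering lemma in action, here is a small numerical sketch (this is an illustration, not the lemma's proof, and the helper names `grid_net` and `dist` are my own choices). A simple grid gives an epsilon-net of the unit ball with O((c/epsilon)^d) points, the same flavour as the (3/epsilon)^d bound, though with a worse constant than an optimal net:

```python
import itertools
import math
import random

def grid_net(d, eps):
    # Grid spacing h chosen so that any point of the unit ball is within
    # eps of its nearest grid point: (h/2) * sqrt(d) <= eps.
    h = 2 * eps / math.sqrt(d)
    ticks = [i * h for i in range(-int(1 // h) - 1, int(1 // h) + 2)]
    # Keep only grid points near the ball; the nearest grid point to any
    # x in the ball has norm at most 1 + eps, so nothing needed is lost.
    return [p for p in itertools.product(ticks, repeat=d)
            if math.sqrt(sum(c * c for c in p)) <= 1 + eps]

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

random.seed(0)
d, eps = 2, 0.5
net = grid_net(d, eps)

# Every random point of the unit ball has a net point within eps.
for _ in range(1000):
    v = [random.gauss(0, 1) for _ in range(d)]
    r = random.random() ** (1 / d) / math.sqrt(sum(c * c for c in v))
    x = [r * c for c in v]
    assert min(dist(x, y) for y in net) <= eps
```

Notice the curse of dimensionality built into the lemma: the net size grows exponentially in d, which is exactly why the final bound picks up a factor of d inside the logarithm.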
So, this is your ball, let us say, and you have discretized it with some finite number of points. Now we are saying: give me any point in the ball, say this one, and I will be able to come up with a point from my discrete set that is within epsilon distance of it. I should be able to do this for any point in the ball. This is true, and we are just going to take the lemma as given; we are not going to prove it. It is a pretty handy lemma to know, and you can use it elsewhere too: whenever you want to discretize, the loss you incur is at most epsilon, provided the number of elements in your discretization is this much. But one also has to be careful about where the discrete points lie. You cannot have all the discretized points lying in some small region; then, if I choose a point far from that region, I will not find a good approximation to it. Naturally, the discretized points have to cover the entire region, and that is why we are looking for a cover of this set that consists of only finitely many points. Fine. Now, the question is that this lemma holds only for points coming from a unit ball. Does our quantity Y come from a unit ball, that is, does Y belong to the set S_d? Let us just compute and see. Take the inner product of Y with itself: the numerator gives the squared V_t-norm of the error, and of course there is the same square term in the denominator.
So, if you look into this, you will see that our vector is x = V_t^{1/2} (theta t hat − theta star) / ||theta t hat − theta star||_{V_t}. Let us just take this part and call it x, and check whether x is going to lie in the unit ball, that is, whether x transpose x = 1. Writing it out, x transpose x has numerator (theta t hat − theta star) transpose (V_t^{1/2}) transpose V_t^{1/2} (theta t hat − theta star), and denominator ||theta t hat − theta star||^2_{V_t}; so provided (V_t^{1/2}) transpose V_t^{1/2} = V_t, the ratio is 1. Next, I want you to check the quantity ||V_t^{1/2} x||_{V_t^{-1}}: this is x transpose V_t^{1/2} V_t^{-1} V_t^{1/2} x, and splitting V_t^{-1} as V_t^{-1/2} times V_t^{-1/2}, this is nothing but x transpose x again, which is 1; it is just the norm of x. Now, about the claim that (V_t^{1/2}) transpose V_t^{1/2} = V_t: a caution here, V_t^{1/2} does not mean taking the square root of every element of V_t; that would work only in special cases, and entries need not even be positive. Rather, V_t is symmetric and positive definite, and every such matrix has a symmetric positive definite matrix square root V_t^{1/2}; since that square root is itself symmetric, (V_t^{1/2}) transpose V_t^{1/2} = V_t^{1/2} V_t^{1/2} = V_t, which is what we need.
The question was whether V_t^{1/2} is symmetric when V_t is. V_t is positive definite for us, we already know that, so we do not need to worry about it, and its positive definite square root is symmetric as well; if you are confused, just work it out yourself. So for now let us take this as true, and that is why the identity holds for us. Let us continue. Our goal is still to find out whether we can bound such a probability with some beta. Now consider the event that there exists an x in C_epsilon which violates the condition, that is, for which the inner product is large: the event that ⟨V_t^{1/2} x, theta t hat − theta star⟩ ≥ sqrt(2 log(|C_epsilon|/delta)) for some x in C_epsilon. C_epsilon is now a finite set for me. If I want the probability of this event, can I write it as the sum over x in C_epsilon of P(⟨V_t^{1/2} x, theta t hat − theta star⟩ ≥ sqrt(2 log(|C_epsilon|/delta)))? Is this correct? I am interested in the event that there exists an x for which this condition holds, and I can always upper bound its probability by this sum; that is the union bound. Now each x is deterministic: it is one fixed element of the discretized set. And we also know that V_t^{1/2} x, for x of this form, has norm 1 with respect to V_t^{-1}; that means these points are already coming from a unit ball in the appropriate norm, with the same V_t appearing there as well. Earlier, when we applied the concentration result, what did we do? We took an arbitrary fixed arm.
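The two norm facts just used can be checked numerically. The sketch below builds a random symmetric positive definite V_t, takes its symmetric matrix square root via the eigendecomposition, and verifies both that x = V_t^{1/2}(theta_hat − theta_star)/||theta_hat − theta_star||_{V_t} is a unit vector and that ||V_t^{1/2} x||_{V_t^{-1}} = 1; the particular theta_hat, theta_star, and dimension are arbitrary stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4

# A symmetric positive definite V_t (e.g. a regularized Gram matrix).
A = rng.standard_normal((d, d))
V = A @ A.T + np.eye(d)

# Symmetric PD matrix square root via the eigendecomposition
# (NOT an entry-wise square root).
w, U = np.linalg.eigh(V)
V_half = U @ np.diag(np.sqrt(w)) @ U.T

theta_hat = rng.standard_normal(d)    # stand-in estimate
theta_star = rng.standard_normal(d)   # stand-in true parameter

err = theta_hat - theta_star
vt_norm = np.sqrt(err @ V @ err)      # ||theta_hat - theta_star||_{V_t}
x = V_half @ err / vt_norm

# x lies on the unit sphere ...
assert np.isclose(x @ x, 1.0)
# ... and V_t^{1/2} x has unit V_t^{-1}-norm, as used in the bound.
assert np.isclose(x @ V_half @ np.linalg.inv(V) @ V_half @ x, 1.0)
# V_t^{1/2} is symmetric and squares back to V_t.
assert np.allclose(V_half, V_half.T)
assert np.allclose(V_half @ V_half, V)
```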
So, when we started, we just had d in an inner product with some quantity: that was the projection of my arm d on the error, which I was trying to bound. Now, whatever this new quantity is, it is a vector, and I am trying to project it in the same way. In the first step we had shown that the error can be written as a linear combination of the noise, and then I applied a concentration result to that quantity. So what do I show here? Now all these quantities are deterministic; there is no randomness in x. So what is this probability? Earlier I had 1/delta inside the logarithm, but now I have |C_epsilon|/delta. If you work it out, for each fixed x in C_epsilon this probability is at most delta/|C_epsilon|. I erased it, but if you look at the bound we had earlier, summing delta/|C_epsilon| over the |C_epsilon| elements gives exactly delta. So what we have basically done is: take a set C_epsilon which is a discrete version of all the points that could arise as values of Y. We remove the randomness by considering all possible x coming from this discrete set, which has only finitely many elements, and even after doing that, the probability of error is still only delta. Now I have to translate back: fine, this probability is upper bounded by delta, but there was an error incurred when we passed from arbitrary points to the discretized points in this set, and we have to worry about that part as well. You will see that now. Note that up to this point I have not actually used the fact that the norm of this vector has to be 1.
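The union-bound accounting can be sanity-checked by Monte Carlo. In the sketch below I take standard Gaussian noise (so each projection is exactly 1-sub-Gaussian) and use a fixed batch of N random unit vectors as a stand-in for the finite net C_epsilon; the sizes and the seed are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
d, N, delta = 3, 50, 0.05
trials = 20000

# N fixed unit vectors standing in for the finite net C_epsilon.
X = rng.standard_normal((N, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)

# Each <x, Z> with Z ~ N(0, I) is N(0, 1); the Gaussian tail bound
# P(<x, Z> >= t) <= exp(-t^2 / 2) at t = sqrt(2 log(N/delta)) gives
# delta/N per point, so the union bound over the net gives delta.
thresh = np.sqrt(2 * np.log(N / delta))

Z = rng.standard_normal((trials, d))
# Fraction of trials in which ANY net point exceeds the threshold.
violations = (Z @ X.T >= thresh).any(axis=1).mean()
assert violations <= delta
```

Empirically the failure frequency comes out well below delta, since the union bound ignores the strong correlation between the projections and the looseness of the per-point tail bound.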
Up to now, V_t^{1/2} x could be any vector; for that, we know the upper bound holds. Now let us see where the error part theta t hat − theta star comes in. Here is another result I am going to write down, which you should verify; I am just stating it right away. The claim is that the V_t-norm can be written as a maximization over the unit ball: ||theta t hat − theta star||_{V_t} = max over ||x|| ≤ 1 of ⟨V_t^{1/2} x, theta t hat − theta star⟩. Please check this. (A question from the class: in the threshold sqrt(2 log(|C_epsilon|/delta)), should there not be some extra factor relating to the norm, as we had earlier, if we call V_t^{1/2} x our d?) That earlier factor came because we were dealing with a summation over s = 1 to t. If you look carefully, there is no summation on this side now; I am dealing with only one term. Earlier, the sub-Gaussian noise was getting amplified by that factor, something like ||V_t^{-1} d_s|| squared, and we had a sum of many such terms. Here there is just a single term and no summation, and that is why the factor that appeared after the 2 is not there. It is the same argument, but with the summation replaced by one term; just work it out. That is also why the lambda is still going to be there. So take this inequality for granted; notice that we had stated the earlier result, and this result too we are just stating.
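The variational identity just stated, ||v||_{V} = max over the unit ball of ⟨V^{1/2} x, v⟩, can be checked numerically: no unit vector beats ||v||_{V}, and the explicit maximizer x* = V^{1/2} v / ||V^{1/2} v|| attains it (by Cauchy-Schwarz, since ⟨V^{1/2} x, v⟩ = ⟨x, V^{1/2} v⟩ ≤ ||V^{1/2} v|| = ||v||_{V}). The matrix and vector below are arbitrary stand-ins:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4
A = rng.standard_normal((d, d))
V = A @ A.T + np.eye(d)              # symmetric positive definite V_t
w, U = np.linalg.eigh(V)
V_half = U @ np.diag(np.sqrt(w)) @ U.T

v = rng.standard_normal(d)           # stands in for theta_hat - theta_star
v_norm = np.sqrt(v @ V @ v)          # ||v||_{V_t}

# Random unit vectors never exceed ||v||_{V_t} ...
X = rng.standard_normal((10000, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
assert (X @ (V_half @ v)).max() <= v_norm + 1e-9

# ... and the explicit maximizer attains it exactly.
x_star = V_half @ v / np.linalg.norm(V_half @ v)
assert np.isclose(x_star @ (V_half @ v), v_norm)
```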
So, we just take these for granted. Now I am going to do some quick manipulations; with these, we are almost done. For each x, write ⟨V_t^{1/2} x, theta t hat − theta star⟩ = ⟨V_t^{1/2} (x − y), theta t hat − theta star⟩ + ⟨V_t^{1/2} y, theta t hat − theta star⟩, and take the minimum over y in C_epsilon. What have I done here? The minimization is artificial: I have simply subtracted V_t^{1/2} y and added it back, so y actually has no effect, but I have introduced it and I am minimizing over it. These are the manipulations I need. Now, if you apply the Cauchy-Schwarz inequality to the first term, what do I get? ⟨V_t^{1/2} (x − y), theta t hat − theta star⟩ ≤ ||x − y|| · ||theta t hat − theta star||_{V_t}. Can you quickly check this? And for the second term, we already know that the event that it exceeds sqrt(2 log(|C_epsilon|/delta)) has probability at most delta; so, with high probability, we replace it by this upper bound. Now, we know the x's fall in the unit ball, and the y's come from its discretized version, and I am taking a minimum over y: so by the covering property, the minimum value of ||x − y|| is at most epsilon. Putting this together, the quantity is at most epsilon times ||theta t hat − theta star||_{V_t} plus sqrt(2 log(|C_epsilon|/delta)), and we can get rid of the maximum over x because x no longer appears. Now let us plug in the size of C_epsilon: we know |C_epsilon| is at most (3/epsilon)^d, so this quantity is as follows.
So, what we have finally got is: ||theta t hat − theta star||_{V_t} ≤ epsilon · ||theta t hat − theta star||_{V_t} + sqrt(2 log(|C_epsilon|/delta)). If I simplify further, taking the epsilon term to the other side gives a factor of (1 − epsilon), and I end up with ||theta t hat − theta star||_{V_t} ≤ (1/(1 − epsilon)) · sqrt(2 d log(3/epsilon) + 2 log(1/delta)), and this is true for any epsilon I choose. So you see that I have an upper bound on the difference between these two quantities, and it holds with probability at least 1 − delta, because I have used the fact that the probability of the bad event is at most delta; so the probability that the quantity is upper bounded like this is at least 1 − delta. That is what I have done here: this bound holds with probability at least 1 − delta. Now, epsilon is what I chose when discretizing my unit ball. If I choose epsilon small, the log(3/epsilon) term becomes large, since epsilon sits in the denominator inside the logarithm; whereas if I choose epsilon close to 1, the factor 1/(1 − epsilon) shoots up. So there is a tension between these two terms: in epsilon, one is increasing while the other is decreasing, and we have to choose epsilon appropriately. You can tune it up, but for the time being let us just set epsilon = 1/2. Then, doing the simplification, the factor 1/(1 − 1/2) becomes 2, and we get the upper bound 2 · sqrt(2 d log 6 + 2 log(1/delta)).
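The tension between the two terms is easy to see by just evaluating the bound for a few values of epsilon. The helper name `beta` below is my own; the formula is the one derived above, and the epsilon = 1/2 choice recovers the closed form 2 · sqrt(2 d log 6 + 2 log(1/delta)):

```python
import math

def beta(eps, d, delta):
    # Confidence width from the covering argument:
    # (1/(1 - eps)) * sqrt(2 d log(3/eps) + 2 log(1/delta)).
    return math.sqrt(2 * d * math.log(3 / eps)
                     + 2 * math.log(1 / delta)) / (1 - eps)

d, delta = 5, 0.01

# The tension: the log(3/eps) term diverges as eps -> 0, while the
# 1/(1 - eps) factor blows up as eps -> 1; both extremes are bad.
assert beta(1e-8, d, delta) > beta(0.5, d, delta)
assert beta(0.99, d, delta) > beta(0.5, d, delta)

# eps = 1/2 recovers the closed form 2 * sqrt(2 d log 6 + 2 log(1/delta)).
closed = 2 * math.sqrt(2 * d * math.log(6) + 2 * math.log(1 / delta))
assert math.isclose(beta(0.5, d, delta), closed)
```

Setting epsilon = 1/2 is within a constant factor of the best tuning, which is why the lecture does not bother optimizing it further.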
So, now you see that if my beta_t (or beta capital T) has been chosen like this, then I know my theta star satisfies this inequality: the V_t-norm of the difference is bounded by beta_t with probability at least 1 − delta. What we have basically demonstrated is that if I choose beta in such a fashion and look at the set C_t = { theta : ||theta t hat − theta||_{V_t} ≤ beta_t }, then theta star is going to lie in this set with probability at least 1 − delta. So, through this sequence of steps, and under some assumptions that may be very restrictive as of now, we are able to come up with some beta_t such that, if I look at this confidence ellipsoid, it contains my true parameter theta star with high probability, and this is true for any t. Now, the assumptions I made were mostly impractical, in the sense that I am not going to select an arm in a deterministic fashion while ignoring what has been observed. If I select an arm in each round after observing what has happened, things are more complicated. There is more coupling going on, because every action you choose is based on everything that has been observed, and there is further coupling through theta star, which is common to every reward; all of that has to be taken care of. Once you take all that coupling into account, you have to use more sophisticated machinery. Here, we mostly used only simple linear algebra and then exploited the sub-Gaussian tail behavior.
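Here is a toy end-to-end sketch of the confidence ellipsoid under the lecture's simplifying assumptions: deterministic arms fixed in advance, standard Gaussian (1-sub-Gaussian) noise, and a ridge estimate with regularizer lambda. The beta below is the epsilon = 1/2 choice from above, the check is purely empirical for this one random instance, and all sizes and seeds are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
d, T, lam, delta = 3, 500, 1.0, 0.05
theta_star = rng.standard_normal(d)

# Deterministic arms chosen in advance, independent of the observations.
arms = rng.standard_normal((T, d))
rewards = arms @ theta_star + rng.standard_normal(T)   # Gaussian noise

# Ridge estimate: V_t = lam*I + sum_s a_s a_s^T,
# theta_hat = V_t^{-1} sum_s a_s r_s.
V = lam * np.eye(d) + arms.T @ arms
theta_hat = np.linalg.solve(V, arms.T @ rewards)

# beta from the covering argument with eps = 1/2.
beta = 2 * np.sqrt(2 * d * np.log(6) + 2 * np.log(1 / delta))

# Membership test for the confidence ellipsoid
# C_t = { theta : ||theta_hat - theta||_{V_t} <= beta }.
diff = theta_hat - theta_star
vt_norm = np.sqrt(diff @ V @ diff)
assert vt_norm <= beta   # theta_star lies in the ellipsoid here
```

In a typical run vt_norm is far below beta, reflecting how conservative the union-bound constants are.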
But once we have a bit more coupling among the rewards, we have to use the analysis shown in the book, which relies on martingale properties and the method of mixtures; with those you can show that, even if you relax the assumptions we made, it is still possible to come up with such a beta_t which guarantees a confidence ellipsoid containing theta star in every round with high probability. And once that happens, we already know how to do the analysis: in the first class, when we started stochastic linear bandits, I already gave a recipe for getting the regret bound once we can construct these confidence ellipsoids. There, we assumed such a confidence ellipsoid exists and then showed a regret bound. Now we have demonstrated, under restricted assumptions, that such a confidence ellipsoid indeed exists. One can go further and show that even after removing some of the assumptions we made, not all but a few, we can still come up with the confidence ellipsoid, and once we have that, we also have the regret bounds. I will leave you to read that part; it is a bit more mathematical, and one can get easily lost in it. Just read how the martingale properties are applied; you only need to know what a basic martingale is, that should be enough, and the rest is all algebra. So our study of stochastic linear bandits will stop here. From the next class, we will continue to study linear bandits, but we will go in reverse gear and return to the adversarial setting. Remember, we started by studying adversarial bandits, then shifted to stochastic bandits, and continued the stochastic study in the linear setting; now we will go back and study linear bandits in the adversarial case in the next class. Yeah, let us stop here.