So, first, let us draw the parallel with multi-armed bandits. There, for every arm we maintained two things: an estimate of its mean, and a confidence interval around that estimate. We are going to do a similar thing here. First, based on my past observations, I am going to estimate theta star, and then I am going to maintain a confidence region around that estimate, okay. In the standard multi-armed bandit, what did my confidence interval tell us? It told us that, with high probability, the true mean lies within this region around the estimated value. There we looked at an interval for each mean because we were estimating real numbers, so for a real number I maintained an interval. But now I am trying to estimate theta star, which is a vector. So instead of an interval, we will maintain a ball around the estimate such that theta star lies in that ball with high probability, okay.

So I want to do these two things; first, let us focus on estimating theta star. What is a good way to estimate theta star? Gradient descent? No, we just want estimation here, not optimization. Maximum likelihood is one thing we could do, but this is a linear model, right, so do you see anything simpler? You all know linear regression, and the reward is just a linear function of the action, so can we do linear regression here? Theta star is the parameter I want to estimate, and I want an estimate in every round. How do we set it up, what is the formula that gives us theta star? Instead of jumping to the closed form, tell me how you get it: differentiate. Differentiate what? The mean squared error. And what is the mean squared error in this case? The error between what you actually observed and what your observation should be according to your parameter. Note that every reward r_t you observe actually depends on theta star, so r_t carries information about theta star; the only thing is, the r_t you observe is perturbed by noise, okay.
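To make the observation model concrete, here is a minimal Python sketch. The linear reward model itself is from the lecture; the Gaussian noise, the noise level, and the dimension are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
theta_star = rng.normal(size=d)  # the unknown parameter theta* we want to estimate

def observe_reward(action, noise_std=0.1):
    """One round of the linear model: r_t = d_t' theta* + noise.

    The observed reward carries information about theta*, but is
    perturbed by noise (Gaussian with std 0.1 here, purely illustrative).
    """
    return action @ theta_star + noise_std * rng.normal()
```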
So, one possibility is to go with the regularized least-squares estimator. What is this regularized least-squares estimator? Forget the regularizer for a moment: what does the plain least-squares estimator look like? It defines a loss over the rounds so far. Say over time t you have observed rewards r_s by playing actions d_s. All you know is that each r_s has actually come from d_s' theta star, but you do not know theta star; what you get is a noisy version of that value. Now you want to ask: which theta best approximates your observations? You take the squared error and minimize it over theta:

minimize over theta: sum_{s=1}^t (r_s - d_s' theta)^2.

But if you just do this minimization, there is an issue: the estimate theta hat you get out of it may not always be unique, okay. Just to avoid those cases, you regularize, adding lambda ||theta||^2 for some positive number lambda, and you can verify that the regularized objective

theta_hat_t = argmin over theta of sum_{s=1}^t (r_s - d_s' theta)^2 + lambda ||theta||^2

is a strictly convex function in theta. Once it is strictly convex, we know it has a unique minimizer; call that theta_hat_t. And what does the solution look like? It has a closed form:

theta_hat_t = V_t^{-1} sum_{s=1}^t d_s r_s, where V_t = lambda I + sum_{s=1}^t d_s d_s'.

(A note on notation: by the prime I mean transpose; sometimes I may also end up writing a superscript T for transpose.) So this is one natural estimator, right. In the multi-armed bandit, the natural estimator was to just average the samples we had observed for each arm. Here, instead, we are fitting our observations to what we were supposed to have gotten: in other words, we minimize the least-squares error and find a theta, which turns out to have this closed-form solution, fine.

So we have one thing now: an estimate theta_hat_t, based on our observations so far, namely the actions d_s I played for s = 1 to t and the corresponding rewards I observed. But this was based on a set of noisy samples: each r_s still has noise in it, so theta_hat_t is itself a random quantity, right. So I need a confidence statement about this noisy estimate: how do we construct a confidence ball around it, so that theta star lies in that ball with high probability? As for lambda, we choose it once and fix it; it is just a tuning parameter, and you will see that the purpose of choosing this lambda is only to make sure that theta_hat_t is unique, or in other words that this matrix V_t is invertible.
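As a sketch of this closed form, the following snippet computes the regularized least-squares estimate from the played actions and observed rewards. The formula is the one above; the function name and the default lambda are my own choices.

```python
import numpy as np

def ridge_estimate(actions, rewards, lam=1.0):
    """Regularized least-squares estimate of theta*.

    theta_hat_t = (lam*I + sum_s d_s d_s')^{-1} * sum_s d_s r_s

    actions: (t, d) array; row s is the action d_s played in round s
    rewards: (t,) array of observed rewards r_s
    lam:     regularizer lambda > 0; keeps V_t invertible and the
             minimizer unique
    """
    t, d = actions.shape
    V = lam * np.eye(d) + actions.T @ actions  # V_t = lam*I + sum_s d_s d_s'
    b = actions.T @ rewards                    # sum_s d_s r_s
    return np.linalg.solve(V, b)               # solve instead of inverting V_t
```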
Yes, sorry, this matrix V_t is invertible. Had you chosen lambda to be 0, we would not be sure this matrix is invertible. But given that we have used this regularizer term lambda, that technicality is taken care of: the matrix is invertible. You can choose lambda to be some very small number; I do not see any optimal value of lambda that you can choose here.

Now, in all of stochastic linear bandits, the big thing is constructing these confidence balls. If you remember, in multi-armed bandits we mainly studied two algorithms. What were they? One was UCB, and the other was KL-UCB. For UCB, how did we derive its bound, where did its confidence interval come from? It came from Hoeffding's inequality. And in KL-UCB, where did the confidence intervals come from? It was a somewhat better version, a Chernoff bound: for the special case of Bernoulli rewards we had a tighter confidence bound, and we exploited that there. So all those algorithms depended on how you constructed the confidence interval. Here also, how you construct the confidence set is going to change how your algorithm performs, okay. We will talk later about how to choose these confidence sets and how the parameters involved in them are chosen; right now, let us assume a generic structure.

Say I am going to construct some confidence set C_t using my observations so far: I played d_1 and observed r_1, I played d_2 and observed r_2, all the way up to d_{t-1} and r_{t-1}. Based on whatever I have observed till round t - 1, I construct a confidence set, and we are going to assume that my theta star lies in that confidence set with high probability. We will later see how to define these confidence sets. A generic example is to choose

C_t as a subset of {theta : ||theta - theta_hat_{t-1}||_{V_{t-1}} <= beta_t},

where beta_t is some parameter I have not yet specified; we will see later how it looks. What we are assuming is that it is the set of all theta that lie around theta_hat_{t-1} within a radius of beta_t. Do you understand how this set looks? Visualize it in three dimensions: here is where your theta_hat_{t-1} lies, and you look for all theta that lie around it within a radius of beta_t. But you see that it is not the simple Euclidean distance here; it is a distance with respect to this matrix V_{t-1}. What is V_t? It is exactly the quantity lambda I + sum_s d_s d_s' from the estimator. C_t is the confidence set in which, with high probability, I would like my theta star to lie. And yes, it is centered at theta_hat_{t-1}: based on this set, I am going to decide which decision d_t I should play in the next round.
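A membership test for this generic confidence set might look like the following sketch, assuming V is the positive definite matrix V_{t-1} and beta is the radius beta_t; the function names are mine.

```python
import numpy as np

def v_norm(x, V):
    """||x||_V = sqrt(x' V x) for a positive definite matrix V."""
    return float(np.sqrt(x @ V @ x))

def in_confidence_set(theta, theta_hat, V, beta):
    """Membership in C_t = {theta : ||theta - theta_hat||_V <= beta}."""
    return v_norm(theta - theta_hat, V) <= beta
```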
So, it will become more apparent later. But right now we are just saying that whatever information we have till round t - 1, we use it to build a confidence set in which my theta star lies with high probability. Later we will see whether that is a possibility at all: can I come up with some beta_t such that such a set contains my theta star with high probability?

Next, what is this norm? The plain squared Euclidean norm is nothing but x'x. Here instead we take ||x||_A^2 = x'Ax, where A has to be positive definite; for a positive definite matrix A, we define the norm of x this way. There is a name for this, right? It is called the Mahalanobis metric: it is with respect to this metric A that we measure the size of x. So is the Euclidean norm a special case of this? For what A? Identity: when A is the identity, we just get back x'x. And why theta_hat_{t-1}? Because that is what I would have estimated till time t - 1; that is the estimate I have of theta star. Now, I know theta_hat_{t-1} is a random quantity, right, and I know theta star lies somewhere around it. I just want to pin down the region around theta_hat_{t-1} in which my theta star lies with high probability. We are defining this at every time t: before I decide which d_t to play, based on my previous information I first construct this set C_t in round t, and I am going to use this set later to come up with the d_t I should play in the current round. We will see exactly how to do this.

As of now this is a bit abstract. The way you can think about it is that this is an ellipsoid whose principal axes are the eigenvectors of this V matrix, okay? Do you understand what the principal axes are? In this 3D picture, instead of the x, y, z directions, you can think of the directions given by the eigenvectors of this matrix as your basis directions for the space, okay? And further, the lengths of the semi-axes along those directions are inversely proportional to the square roots of the corresponding eigenvalues. So you can picture a ball, an ellipsoid, described by these axes, whose extent along each eigenvector direction shrinks as the corresponding eigenvalue grows. Then the set is smaller in that direction, yes, exactly: each direction has a different length corresponding to its eigenvector.

So, okay, now let us come back to this V_t matrix. As you see, V_t keeps on accumulating these d_s terms, right? If you go from round t to round t + 1, what changes? You add the new vector you played: V_{t+1} = V_t + d_{t+1} d_{t+1}'. So compared to V_t, what do you think about the eigenvalues of V_{t+1}? Do they increase or decrease? Larger, it looks like? So the eigenvalues should be larger, okay. But that answer was a bit of reverse engineering from where we are heading.
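The geometry can be made concrete with a small helper. Under the assumption that the set is {x : x' V x <= beta^2}, the eigendecomposition gives the principal-axis directions and semi-axis lengths beta / sqrt(eigenvalue); the helper name is my own.

```python
import numpy as np

def ellipsoid_axes(V, beta):
    """Principal axes of {x : x' V x <= beta^2} for positive definite V.

    Returns the eigenvectors of V (the axis directions) and the semi-axis
    length beta / sqrt(eigenvalue) along each one: a larger eigenvalue
    means a shorter axis, i.e. a tighter set in that direction.
    """
    eigvals, eigvecs = np.linalg.eigh(V)  # V is symmetric, so eigh applies
    return eigvecs, beta / np.sqrt(eigvals)
```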
Any forward-direction argument? The determinant increases? Yes, the determinant is nothing but the product of the eigenvalues, right? So if the determinant has increased, the eigenvalues should on the whole have increased. But does adding one more term necessarily increase the determinant? That is not obvious, right? You can think about it offline.

So if you are just going to keep adding terms like this, notice that d_s d_s' is itself a matrix: d_s is a column vector, d_s' is a row vector, so their product is a matrix. V_t is basically nothing but a sum of matrices, right? The term lambda I is one matrix, an identity matrix whose diagonal entries are not one but lambda; and for each s, d_s d_s' is another matrix. Is it true that each matrix in this sum is PSD? Yes, and they are all rank-one matrices, right? So if you keep on adding many PSD matrices, what is going to happen to the rank, and what is going to happen to the eigenvalues? I will just leave it to you: think about what happens if you keep adding positive semi-definite matrices to a positive semi-definite matrix. Do the eigenvalues increase or not? For now I am just asserting that they are going to increase; verify that. If they do increase, what you are going to see is that as more and more data points come in, I get a confidence region which is shrinking in all directions, right? That is what we want: I should be more confident, I should be able to come up with smaller balls in which my theta star is contained, okay?

So at this point, let us compare the difficulty of getting confidence intervals in multi-armed bandits with what we have here. When we built a confidence interval in the multi-armed bandit, how did we do it? We used Hoeffding's inequality, right? And in applying Hoeffding's inequality, what did we have? We had all IID samples. Just go back and recall: say we have pulled an arm n times in the standard multi-armed bandit. We got n samples which are IID, right? We just took their average, which gives the sample mean, and then saw how that sample mean deviates from the true mean by applying Hoeffding's inequality. Things were easy there because it is easy to handle IID samples.

But do I have that luxury here? The noise terms, I assume, are all independent, but you see that theta star is there in every r_t, right? All my reward samples are correlated because this theta star sits in each of them; and further, even my choice of d_t in every round is correlated, because d_t in each round depends somehow on the past observations, and those observations are themselves correlated. So even my choices of arms across rounds are correlated. Because of that heavy correlation across different rounds, coming up with such a confidence ball around my estimate is in general hard here. Do you see this? Even though we have assumed this simple, nice structure, this very structure has made all my observations correlated, so I cannot go and simply leverage the results we have for IID observations. I have to somehow take into account the correlation across them, and that is why this V_t matrix keeps popping in: V_t contains all the observations you have
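Here is a quick numerical sanity check of the claim you are asked to verify: adding rank-one PSD terms d_t d_t' never decreases any (sorted) eigenvalue. Random Gaussian actions, the dimension, and lambda below are arbitrary choices for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, lam = 4, 0.1
V = lam * np.eye(d)                    # V_0 = lam * I
prev = np.linalg.eigvalsh(V)           # sorted eigenvalues
for t in range(200):
    a = rng.normal(size=d)             # the action d_t played in round t
    V += np.outer(a, a)                # rank-one PSD update d_t d_t'
    cur = np.linalg.eigvalsh(V)
    # each sorted eigenvalue is non-decreasing (Weyl's inequality)
    assert np.all(cur >= prev - 1e-9)
    prev = cur
print("eigenvalues after 200 rounds:", np.round(prev, 1))
```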
so far, and they are going to determine what your confidence ball looks like around your estimates, okay? So that is why, when we move to the stochastic linear bandit, the estimation of theta star is easy: we just saw it is nothing but regularized least-squares estimation. But the confidence ball construction is involved, okay?

So, I said that the eigenvectors of this matrix V_t define the principal axes of the ball we are looking at. And I also said that the length of each principal semi-axis is inversely proportional to the square root of the eigenvalue corresponding to that eigenvector. Each eigenvalue has a corresponding eigenvector, right? So take a particular eigenvector direction: the length of the axis in that direction shrinks as the corresponding eigenvalue grows. And we are saying the eigenvalues are increasing. It is not necessary that all the eigenvalues are increasing; some may stay constant. If this matrix is invertible, how many eigenvalues does it have? If it is invertible, it is full rank, right? It has d nonzero eigenvalues. So some of them may not be increasing while some are, but none of them decreases. Again, to make that argument we have to go back and check what happens to the eigenvalues when we add a positive semi-definite matrix like this. But let us say they increase, if not all then some. Because of that, the principal axis lengths are shrinking, right? Yes: if the eigenvalue of this matrix has increased, then the size of the ball has decreased along the principal axis on which the eigenvalue has increased. And we are saying that eventually, along every principal axis, the eigenvalue increases, so this confidence set keeps on shrinking.

So let us try a special case: the case where my actions d_t are unit vectors. I have told you already that if the decision set is the unit vectors, this is nothing but the stochastic d-armed bandit, right? And we already know what my confidence intervals look like in that case, so let us see. Let me change notation: instead of d_s, I am going to write e_{i_s}, meaning that d_s, the arm I pulled in round s from the decision set, is the unit vector whose i_s-th component, and only that component, is 1. So now let us go back and plug this in. What is each term e_{i_s} e_{i_s}' going to look like? e_{i_s} is a unit vector in which only one component is nonzero, right? So each term in the sum is a matrix with exactly one nonzero entry, a 1 on the diagonal at the position (i_s, i_s) given by its index. So, writing V_t out as a matrix with rows and columns indexed 1, 2, ..., d: what is the first diagonal entry going to look like?
It is the number of times arm 1 has been played, that is, the number of times unit vector e_1 has been played up to time t, plus lambda, and everything else in that row is going to be 0, right? What is the second row going to look like? The same with arm 2: N_2(t) + lambda on the diagonal. And similarly all the way down, so V_t is a diagonal matrix with entries N_i(t) + lambda.

Now, if you look at V_t^{-1}, what role is it playing? It is just like 1 over these quantities, right? It is a diagonal matrix whose entries are 1/(N_i(t) + lambda) for every component. And what does the other term look like? Each d_s r_s is what? r_s is a scalar and d_s is a column vector, so it is a column vector, right? Now, over t rounds, what does the sum over s = 1 to t of d_s r_s look like once I replace d_s by the unit vectors e_{i_s}? It is a column vector. What is the first entry going to be? The sum of the rewards from the rounds when unit vector e_1 was played; in words, the total reward from arm 1. Similarly, the second entry is the total reward from arm 2, and so on, right?

Now, in this special case, what is the total reward from arm i? It is nothing but rewards collected from a distribution whose mean value is theta_i star, right? Because r_t is, as I said, d_t' theta star plus noise, and when d_t is the i-th unit vector, only the i-th component survives, plus noise. So the first entry is the total reward collected from a distribution whose mean value is theta_1 star, the second is the total reward collected from a distribution whose mean is theta_2 star, and like this.

Now, what does theta hat give in this case? Just combine these two pieces: theta_hat_t = V_t^{-1} times this vector of total rewards. So what is my theta_hat_i going to be? The total reward from arm i divided by N_i(t) + lambda, right? So this is nothing but what we had earlier, exactly the estimation we had gotten earlier, the sample mean of arm i, except for the fact that a lambda has come into the picture here, but that is fine. In a way this gives me an estimate of theta_i star, and it decouples: for every component I can find it like this, right? So overall, theta_hat_t is nothing but (theta_hat_1, theta_hat_2, ..., theta_hat_d). Am I correct? I am just applying the theta hat formula we have, with V_t computed like this and the reward term computed like this, and it just gives the sample mean for each arm.

Now similarly, if you go back and apply the confidence set to this case, you will see that it is nothing but a confidence bound similar to what we had in the multi-armed bandit setting. Remember, in the multi-armed bandit, what was the confidence term? It was something like sqrt(2 log t / N_i(t)): 2 log t divided by the number of pulls, right?
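A small simulation makes this decoupling visible: with unit-vector actions, the regularized least-squares estimate essentially reproduces the per-arm sample means (up to the lambda in the denominator). The problem sizes, noise level, and lambda below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
d, t, lam = 3, 5000, 0.01
theta_star = np.array([0.2, 0.5, 0.8])

arms = rng.integers(0, d, size=t)            # i_s: which arm was pulled
actions = np.eye(d)[arms]                    # d_s = e_{i_s}, one per row
rewards = actions @ theta_star + 0.1 * rng.normal(size=t)

V = lam * np.eye(d) + actions.T @ actions    # diagonal: N_i(t) + lam
theta_hat = np.linalg.solve(V, actions.T @ rewards)

counts = np.bincount(arms, minlength=d)
sample_means = np.bincount(arms, weights=rewards, minlength=d) / counts
print(np.round(theta_hat, 3))                # ~ the per-arm sample means
print(np.round(sample_means, 3))
```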
You can see that now; let us just write it down. What is ||theta - theta_hat_t||_{V_t}? By definition, its square is nothing but (theta - theta_hat_t)' V_t (theta - theta_hat_t). Now, the structure of V_t here is diagonal, and because of this diagonal nature, what happens to this quadratic form? Every component decouples: each diagonal entry gets multiplied by the square of the corresponding difference, right? So it is going to look like

(theta_1 - theta_hat_1)^2 (N_1(t) + lambda) + (theta_2 - theta_hat_2)^2 (N_2(t) + lambda) + ... + (theta_d - theta_hat_d)^2 (N_d(t) + lambda).

Why a summation? Because what we are saying is that the total sum across all these directions is bounded: I want this to be at most beta_t squared, right? At first this does not look like the confidence terms we had in the bandit case, okay. But if this sum must stay below a constant (beta_t is fixed; this is for a given beta_t), then it must be the case that for those components where N_i(t) is large, the difference theta_i - theta_hat_i has to be smaller for the constraint to hold. You see what it is saying? I am looking for all the theta for which this holds, right? What will those theta be? If some arm's number of pulls is already large, then that component of theta is going to have a smaller range, right? And note carefully that we have V_t here, not V_t inverse. So with V_t, we see that if some component has been explored a lot, I already have high confidence in it, and its range is going to be smaller.

We will discuss how the optimism comes in, in the next class, okay? With this confidence set, we will see how to play an arm optimistically in the next class. Okay, let us stop here.
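As a closing sketch of that last point: from the diagonal constraint sum_i (theta_i - theta_hat_i)^2 (N_i(t) + lambda) <= beta_t^2, each single coordinate by itself can deviate by at most beta_t / sqrt(N_i(t) + lambda), which mirrors the sqrt(2 log t / N_i(t)) width from the multi-armed bandit. The helper below just computes those per-arm half-widths; it is my own illustration, not something from the lecture.

```python
import numpy as np

def per_arm_half_widths(counts, beta, lam=0.01):
    """Largest deviation each coordinate alone can take inside
    sum_i (theta_i - theta_hat_i)^2 * (N_i + lam) <= beta^2,
    i.e. when the whole budget beta^2 is spent on one coordinate."""
    return beta / np.sqrt(counts + lam)

# Arms pulled more often get tighter ranges around their estimates.
print(per_arm_half_widths(np.array([10, 100, 1000]), beta=2.0))
```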