The feature map φ(x, a) here depends on both the context x and the arm a. Let us take the recommendation system example again. Users coming to the system can be categorized: male or female, old or young, by geographical location, and so on. The movies, or whatever products you want to recommend, can also be categorized: this product is related to sports, this one to academics, this one to daily-use items; whatever the product is, you put it into one of these categories. Now, given the context we have observed and a candidate arm, φ(x, a) tells us the joint features of that pair. For example, suppose a user arrives and there is a product you might show. You have already categorized the product, say as sports, movies, or entertainment, and you categorize the user by features such as sex, age group, and location. How do you build a feature map out of this? For every user you set the user features, and for every product you set the product features; you have a collection of products, and different users keep logging in. Suppose, say, you happen to show a DVD corresponding to some movie.
How do you assign the feature values for that DVD? It is not a sports-related item, so you set that component to 0. It is related to a movie, so you set that component to 1. If you think it is a good, enjoyable movie, the "joy" component could also be 1. Now suppose a girl enters the website. For her, the sex component can take the value 1 or 0, say 1 for female; for age you may have classes such as old and young, and since she is young you set that accordingly; location you would also have categorized. You see that, depending on the user and the product, you get a feature vector, and that is what φ(x, a) gives. The feature map I have written here depends on who the user was and what the product was; that is why it depends on both, and I can generate such features for any pair. All recommendation systems work essentially on features extracted like this: when you log in, the system stores your information through such categorizations and generates features from your profile. And yes, the map is known a priori: some experts have already built in how to map a given pair of context and product to features. I have given a toy example here, but such feature vectors can have dimension in the millions, because you can have that much categorization.
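As a rough sketch of the kind of 0/1 feature map described above (the category names, the dictionaries, and the helper `phi` are illustrative assumptions, not any real system's API):

```python
# Illustrative joint feature map phi(x, a) for a recommendation setting.
# All user/product categories below are made-up examples.

def phi(user, product):
    """Return a 0/1 feature vector for the (context, arm) pair."""
    return [
        1 if product["category"] == "sports" else 0,   # product is sports-related
        1 if product["category"] == "movie" else 0,    # product is movie-related
        1 if product.get("enjoyable", False) else 0,   # product judged enjoyable
        1 if user["sex"] == "female" else 0,           # user feature: sex
        1 if user["age_group"] == "young" else 0,      # user feature: age group
    ]

user = {"sex": "female", "age_group": "young"}
dvd = {"category": "movie", "enjoyable": True}
tennis_ball = {"category": "sports", "enjoyable": True}

print(phi(user, dvd))          # movie components set, sports component 0
print(phi(user, tennis_ball))  # sports component set instead
```

The same user paired with a different product yields a different vector, which is exactly why φ must take the pair (x, a) as input.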
If you want more precise, narrowed-down information, you will have a lot of features: you could also log the time of login, the season, and so much more, because the more information you have, the better the decisions you can make. But the map itself is known: you come up with it offline, somehow, perhaps with a lot of discussion with your product team about how these feature maps should be generated. (Student: Sir, are these the inputs to the feature map?) No, this vector is the output of the feature map; the input is the (context, product) pair. That is what I said: in this case the product is what you want to sell, so the products are your arms here. When a user comes, you want to decide which product to show him. Say you have thousands of products; your goal is to show him the right one. Users log in, there could be many of them, and given a user and a product you should be able to come up with a feature vector like this. Here I said it was a DVD; if instead it were a tennis ball, the feature vector might change to (1, 0, 1, ...). So depending on the product and the user, the features change. You can decide how you want to come up with this vector; the only requirement is that, given a context and a product, you can generate it. Now θ* is unknown while this feature map is known, so the problem boils down to determining θ*: if I know θ*, I already know my reward function. The question is how to determine θ* from interaction with the environment that is generating these contexts.
Does θ* change? It is a constant: it does not change over time, and it is independent of the context and the actions. For a given environment it is fixed; if your environment changes, it can potentially change. Remember how we identified a bandit instance through the mean values of the arms: if the mean values change, it is a different bandit instance. Here one θ* captures one instance, and if θ* changes we may be dealing with a different instance. The feature maps we will freeze, because they are known to us: θ* is the only unknown, the map is known, and whatever map you use is fine. What matters is that the quantity not known to me should not change while the game is going on; we are going to assume it remains fixed throughout the game, and in that sense we will also assume that once we fix the feature maps we stick to them throughout. I have written the reward in such a way that θ* does not depend on x and a, while φ(x, a) does depend on x and a but is known to us. For this setup the reward is linear in θ*, but you could assume other forms. If you assume nothing about these values, then, as I said, you have to treat each (context, arm) pair as a different arm and learn over all possible pairs, which could make the regret too large. And we also said that when you have such contextual information it is natural to assume that one context reveals some information about another. That is why we have this common parameter: it makes whatever information I get about one (context, arm) pair useful for extracting information about other (context, arm) pairs.
Now, as I said, the problem boils down to this: if I know θ*, I know my reward function already and I know how to play optimally, but I do not know θ*. How to find it? Whatever the dimension of θ* is, we assume that dimension is known, and we further assume that the norm of θ* is bounded: ‖θ*‖ ≤ L for some bound L that is known to us. What does this mean? Take one (context, arm) pair (x, a) and another pair (x′, a′). Since θ* is common and does not depend on x and a, I can apply the standard Cauchy-Schwarz inequality to the difference of the two mean rewards and then use the bound on ‖θ*‖. This is saying that the reward function has a Lipschitz property. Do you understand what a Lipschitz function is? A function f is Lipschitz with constant L if |f(x) − f(x′)| ≤ L‖x − x′‖, in whatever the appropriate norm is. As somebody said, the mean reward here is a linear function: for a given θ*, if I view the feature vector as the variable, then constraining ‖θ*‖ to be bounded by L is exactly saying that this function is Lipschitz with constant L. So we are just going to assume this, and that L is known to us.
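The Cauchy-Schwarz step sketched above can be written out explicitly (with ⟨·,·⟩ the inner product and L the assumed bound on ‖θ*‖):

```latex
\left|\langle \phi(x,a),\theta^*\rangle - \langle \phi(x',a'),\theta^*\rangle\right|
  = \left|\langle \phi(x,a)-\phi(x',a'),\,\theta^*\rangle\right|
  \le \|\theta^*\|\,\bigl\|\phi(x,a)-\phi(x',a')\bigr\|
  \le L\,\bigl\|\phi(x,a)-\phi(x',a')\bigr\|
```

So the mean reward, viewed as a function of the feature vector, is Lipschitz with constant L.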
So it basically says that θ* lies in some bounded set. Now I want to look at this from a slightly different angle, which gives another well-studied setup; we are going to stick to the assumption that the rewards are linear, parameterized by some θ* that is unknown to us. What is basically happening? When a context comes to us, say I have arm 1, arm 2, arm 3, I have already precomputed the feature map: for every action a, I can look up φ(x, a) for that context. So I have that many feature vectors, extracted from the feature map for that particular context. Now, because I know my reward is an inner product, all I have to do, with x fixed (since x has been observed), is find the feature vector that maximizes the inner product with θ*. The problem has boiled down to looking for a feature vector that maximizes this product. I have abstracted the arms out through their features: it is now about which is the best feature vector, the one that optimizes my function in round t. So forget the arms; in every round I have a bunch of feature vectors, and I have to decide which one maximizes this linear function.
Because, as I have told you, I know the feature map, whenever a context comes I have these feature vectors, and the problem boils down to maximizing a linear function over them. So we can forget the arms and think of this as a linear optimization problem over a feature set. Does that make sense? What we are now saying is: forget about my arms and the contexts coming. In every round I have a feature set; x_t gives me this set because I already know the map φ. Let me call this set D_t. So in round t I have the feature set D_t, and I have to decide which feature vector maximizes my inner product. Earlier, in round t I was trying to find an a_t* that maximizes the linear function over arms; now, with the feature vectors collected in D_t, the problem becomes: find d_t* = argmax over d ∈ D_t of ⟨d, θ*⟩. Can I map my problem this way? Yes. The initial setup, where we have K arms and contexts arriving, we are going to refer to as the stochastic contextual K-armed bandit, because it is exactly the same as the stochastic K-armed bandit except that we have contexts there.
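A minimal sketch of this reduction, with a toy feature map and, for illustration only, a known theta_star (in the actual problem θ* is unknown and must be learned):

```python
import numpy as np

def decision_set(phi, context, num_arms):
    """Build D_t: one feature vector per arm for the observed context."""
    return [phi(context, a) for a in range(num_arms)]

def best_feature(D_t, theta_star):
    """Oracle choice: index of the vector in D_t maximizing <d, theta*>."""
    values = [float(np.dot(d, theta_star)) for d in D_t]
    return int(np.argmax(values))

# Toy example: 3 arms, 2-dimensional features (all values are made up).
phi = lambda x, a: np.array([1.0, float(a)]) * (x + 1)
theta_star = np.array([0.1, 1.0])

D_t = decision_set(phi, context=0, num_arms=3)
print(best_feature(D_t, theta_star))  # the arm whose feature vector wins
```

Once the best vector in D_t is identified, the mapping back to the arm is immediate, since each vector came from one arm.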
Having made the assumption that my reward function is linearly parameterized through θ*, I no longer have to worry about the arms, only about the features: if I know which feature vector maximizes this function, I already know which arm is good for that particular context x. So we are now going to abstract this. In every round I have a decision set D_t, which is simply a set of vectors, the feature vectors. The set is revealed, and in every round I have to pick the one vector from it that maximizes my inner product: I want to choose d_t* ∈ D_t. If I can identify that feature vector, then through the mapping I also know which action is good for it. We are going to call this setting simply the stochastic linear bandit: in every round I have a decision set, θ* is unknown, and my goal is to identify the best feature vector in that set. Now, what do I observe? If I am going to play a vector from D_t in round t, the reward I observe is the inner product with θ* plus noise. Let me not call the chosen vector x_t, because that would be confused with the context; say I play d_t ∈ D_t, and the observed reward is ⟨d_t, θ*⟩ plus a noise term, the same noise as earlier.
So I am going to call this setup simply the stochastic linear bandit, but you see that this is nothing but an abstraction of our stochastic contextual K-armed bandit. Now, what is the regret in this setting? I want to select the best feature vector from my decision set in every round; if you could do that in every round you would get the best actions, but you do not know θ*, so you cannot. The best total reward is therefore the sum over t = 1 to T of max over d ∈ D_t of ⟨d, θ*⟩, and the shortfall of your policy relative to this is the regret R_T you incur. If you play d_t in round t, the value the environment gives you is ⟨d_t, θ*⟩. The arms and contexts have been abstracted away; everything is now in terms of features, because for every (context, arm) pair we have a feature vector, and when a context x_t arrives in round t I get the corresponding set of feature vectors using the feature map. That is all that matters to me now; I do not care what the arms are. The arm identities do not matter, only this set of features, and what matters is whether I can identify the vector that optimizes the inner product with θ*. If I know which vector optimizes it, I already know which arm gives me the best reward. That is why, instead of worrying about the arms, in every round, rather than saying x_t has come, I say that the feature set D_t has been revealed, and I have to identify the best vector in it.
If I knew θ*, this is what I would have played in every round, but I do not know it, so my policy plays some d_t in each round and obtains the corresponding reward. I compare the total reward my policy obtains against what an oracle who knew θ* would have obtained, and I call the difference the regret; we are just going to work with the expected regret. Now let us quickly look at a couple of special cases. Suppose the decision set D_t consists of the unit vectors: e_1 = (1, 0, ..., 0), e_2 = (0, 1, ..., 0), and so on. You understand what I mean by this: each e_i is a vector of dimension d in which only the i-th component is nonzero, and it equals 1. Suppose all my decision sets, in every round, are like this. What does the argmax give me? Whichever component of θ* is the largest, it picks that one. Now suppose θ* is a vector in which each component corresponds to the mean of an arm. Then this operation is just pulling the arm with the highest mean. So with this kind of decision set, is it not the same as my stochastic K-armed bandit problem? When the decision set contains only unit vectors, ⟨e_i, θ*⟩ is nothing but θ*_i, so the inner product just gives me the corresponding component of θ*.
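Written compactly, with d_t the vector the policy plays in round t, the regret discussed above is:

```latex
R_T \;=\; \sum_{t=1}^{T} \max_{d \in D_t}\, \langle d, \theta^*\rangle
       \;-\; \sum_{t=1}^{T} \langle d_t, \theta^*\rangle ,
\qquad \text{and we study } \mathbb{E}[R_T].
```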
So if, in my earlier stochastic K-armed bandit problem, I concatenate the means of all the arms and write them as a vector θ*, then the decision is about which unit vector to pick from the decision set, and picking a unit vector is just picking an arm. You see that the stochastic linear bandit, when D_t is restricted to the unit vectors, already captures my stochastic K-armed bandit setting, where the unknown parameter θ* is nothing but the vector of arm means that characterized the environment there; here θ* characterizes the environment. So for the special decision sets D_t of unit vectors, the stochastic linear bandit is the same as the stochastic K-armed bandit with K = d arms: if θ* has dimension d, it is dealing with d arms. But the D_t need not always be unit vectors like this; each D_t could be any subset of whatever feature space we are looking at. If θ* belongs to R^d, then, since each element of D_t enters an inner product with θ*, each of them has to be of dimension d. (Student: But don't you require K to be at most d?) Here K = d for us: each arm has a feature vector, D_t is the collection of those K feature vectors, and in this unit-vector case there are exactly as many of them as the dimension.
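A quick sketch of the unit-vector special case: with D_t the standard basis of R^d and theta_star a made-up vector of arm means, picking a vector from D_t is the same as picking an arm:

```python
import numpy as np

d = 4                                         # dimension = number of arms K
theta_star = np.array([0.2, 0.7, 0.5, 0.1])   # made-up arm means
D_t = list(np.eye(d))                         # decision set = unit vectors e_1..e_d

# The inner product with a unit vector just reads off one component of theta*:
values = [float(np.dot(e, theta_star)) for e in D_t]
best = int(np.argmax(values))
print(best, values[best])  # the arm with the highest mean, and that mean
```

So restricting every decision set to the standard basis recovers the stochastic K-armed bandit, with θ* playing the role of the vector of arm means.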
Each unit vector corresponds to one component of θ*, so I want as many unit vectors as the dimension of θ*; that is why, in this case, I want K to be the same as d. So is it clear why, when the D_t's are like this, this gives me back my stochastic K-armed bandit? Yes: d has to equal K for this mapping to work. Okay, so let us stop here. We will continue this in the next class.