So, what we are going to see next is: instead of treating every context separately, can we restrict or group the contexts and get a better regret? So, what could be the possibilities? Naturally, if the context set is very large, maintaining a separate EXP3 algorithm, one algorithm for each context, is a very costly affair, so we have to think a bit more here ok. For that, let us look for some special cases, just as we did in linear bandits where we assumed that the mean rewards are linear. Let us look for special cases where we can bring down the dependence on the cardinality of the context set ok. So, what is the learner doing in all these setups? Given a context, it is trying to identify the best arm for that context. In the standard bandit setup, where we totally ignored contextual information, what did we do? We always looked for a single best arm, and that was the benchmark against which we compared our algorithm. But now we are saying it is not always the case that there is a single best arm; the best arm is context dependent, as we discussed last time. If you want to recommend movies to people who log into your recommendation system, there is no single best movie that everybody likes; it depends on who that person is and on his profile, and by profile I mean things like his age, his past activities and so on. So now we accept that for every context the best arm could be different, and instead of finding one arm, we are trying to learn a function which, given a context, tells us the best arm to play.
Earlier we were interested in finding one arm; now, for every context, we want to find the best arm. So the problem boils down to searching for a function: given my context, it tells me which arm to play, and my search is over such functions. So I can rewrite the regret in this form, where Phi is the collection of all functions phi that map C to K. What does this say? If you are going to use a function phi, that means you are going to select the arm phi(c_t) in round t, where c_t is the context that appears in round t, and the sum is the total reward you would have got if you had used the function phi. Now you are interested in the best phi, that is, the best total reward you could have got over the time horizon T; that is your benchmark, and you compare it against the total reward you actually collected using your own policy. So what is the learner basically doing? It is mapping contexts to arms, and we are asking: what is the best map, the one that maximizes this reward? For every context you have to decide which arm to play, and you would like to map contexts to arms in such a way that the reward over the time horizon T is maximized.
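In symbols, writing x_t(a) for the reward assigned to arm a in round t, c_t for the round-t context, and a_t for the arm the learner actually plays (notation carried over from the earlier lectures; this is a reconstruction of the board formula, not a verbatim copy), the regret against the best map reads:

```latex
R_T \;=\; \max_{\phi \in \Phi} \sum_{t=1}^{T} x_t\big(\phi(c_t)\big)
\;-\; \mathbb{E}\!\left[\sum_{t=1}^{T} x_t(a_t)\right],
\qquad
\Phi \;=\; \bigl\{\phi : \mathcal{C} \to \{1,\dots,K\}\bigr\}.
```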
So, in fact, when Phi contains all possible maps, this benchmark gives exactly the same thing as the per-context best arms; you could as well write it either way, because this is just a generalized version of that. In that case I can define phi(x) to be the arg max, over arms, of the total reward accumulated on the rounds where c_t equals x; since every possible map is allowed, the map built this way is a valid member of Phi and achieves the maximum. But note: this is a quantity in hindsight. When I do this maximization, I am assuming I already know the reward assignment that has happened in every round to each of the arms ok. So look at what you are posing for yourself: this is what you would have got, and you are trying to compete against a benchmark that gets the best possible value by knowing all the information, while you know none of this information in advance; you are only learning as you go. So this is the oracle here: the oracle knows all the values that have been assigned to each arm in each round, and then it sees the best you could have got.
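Concretely, when Phi is the set of all maps, the maximum decomposes per context, and the hindsight-optimal map just described is (again a reconstruction in the lecture's notation):

```latex
\phi^{*}(c) \;=\; \arg\max_{a \in \{1,\dots,K\}} \;\sum_{t \le T:\; c_t = c} x_t(a),
```

so the oracle's total is obtained by summing, over contexts, the best arm's cumulative reward on the rounds where that context appeared.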
Now, competing against such an oracle may be asking for too much, because the benchmark you are setting knows all the information. If you want comparable performance, running one EXP3 per context guarantees the bound we wrote, and that regret bound is reasonable only if the cardinality of C is small. If it is large, you may not be able to guarantee a good regret performance. So, for that, we are going to weaken this benchmark ok. Whenever you set up this kind of performance comparison, there are two sides to it. What is the benchmark you are going to set? If you set yourself a weak benchmark, yes, your actual algorithm may come close to it and your regret may be nearly zero, but the actual reward you collected may be poor ok. Or you may set a very high benchmark, as we have done here: this is what the oracle would have got, and you are trying to get something as close to the oracle as possible. Then you are making your life very difficult; it is very hard to achieve. So it is always important, whenever you set up such a comparison, to ask: what is the benchmark I want to compare against? You should not set it so easy that matching it is trivial, and you do not want to set it so difficult that you are never going to achieve it. For example, in our class we do not want to set only simple questions so that everybody scores 100 even though nobody actually learnt anything, and on the other hand we do not want to set only very hard questions so that none of you gets a high score.
In the hard-questions case I just made your life difficult: maybe you have learnt, but you could not score well because the benchmark was too hard. So, what benchmark are we going to set? We will now look at a few possibilities and then focus on a particular one. The first possibility is partitions. Here we assume that, yes, we have many possible contexts, but some contexts are essentially identical: if you know something about a particular context, you effectively already know the same thing about another context that you believe behaves the same ok. Let us formalize that. Let P be a partition of C; you understand what I mean by a partition. We then look only at those phi with the further constraint that phi(c_1) = phi(c_2) whenever c_1 and c_2 belong to the same set of P. What are we doing here? Take my context set and one cell of the partition: if two contexts come from the same cell, they get the same value; everybody in that region is treated identically. That means if I know something about one context in that region, I already know it for every other context in that region. So in this case, do I need to maintain one EXP3 algorithm for each context? No: I need to maintain one EXP3 per cell of the partition ok. I can just maintain one for each cell, and then how does the bound change? It is simply going to depend on the number of cells in the partition.
So, in that case, if I restrict myself to such phi and use one EXP3 per cell, the guarantee becomes sqrt(2 |P| T K log K), with |P| in place of |C|. Do not confuse the notation: P is a set of sets, it is a partition, and |P| is the number of sets in it ok. So that is fine, and depending on how many cells the partition has, you get the corresponding regret bound ok. For example, if you make the whole context set a single cell, |P| is 1; if you have two cells, |P| is 2 ok. And note what we are really doing: we are assuming prior information about the context set, and we are weakening the benchmark. We are looking only for those phi that respect the partition. With no constraint on phi, we have the hardest benchmark I could set; when I put such a constraint on phi, it is a less competitive benchmark for me ok. So, can you think of other possibilities? How can I further weaken my benchmark, keeping in mind that I do not want a very tough benchmark that is hard to crack, nor a trivial one that anybody will crack? Think of recommendation systems: you have all seen movie recommendation systems, right? How do they recommend? Even though you might have logged into the system for the very first time, they are still able to show something to you. How would they have done it?
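To make the "one EXP3 per cell" idea concrete, here is a minimal sketch. It assumes rewards in [0, 1] and a fixed learning rate, and the names `Exp3`, `PartitionedExp3`, and `cell_of` are illustrative, not from the lecture; the update is the standard importance-weighted exponential-weights update.

```python
import math
import random

class Exp3:
    """Standard EXP3 over k arms with learning rate eta (rewards in [0, 1])."""
    def __init__(self, k, eta):
        self.k, self.eta = k, eta
        self.weights = [1.0] * k

    def probs(self):
        total = sum(self.weights)
        return [w / total for w in self.weights]

    def draw(self):
        return random.choices(range(self.k), weights=self.weights)[0]

    def update(self, arm, reward):
        # Importance-weighted estimate: only the played arm was observed.
        p = self.probs()[arm]
        self.weights[arm] *= math.exp(self.eta * reward / p)

class PartitionedExp3:
    """One EXP3 instance per cell of the partition P of the context set."""
    def __init__(self, k, eta, cell_of):
        self.k, self.eta = k, eta
        self.cell_of = cell_of      # maps a context to its cell of P
        self.learners = {}          # created lazily, one per cell

    def _learner(self, context):
        cell = self.cell_of(context)
        if cell not in self.learners:
            self.learners[cell] = Exp3(self.k, self.eta)
        return self.learners[cell]

    def act(self, context):
        return self._learner(context).draw()

    def feedback(self, context, arm, reward):
        self._learner(context).update(arm, reward)
```

The point of the sketch is only structural: the number of EXP3 instances, and hence the regret, scales with the number of cells `|P|`, not with `|C|`.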
So, they might have seen that your age is such and such, you come from this geographical area, and so on: that is your profile. But you are not the only one entering their system; many, many people have also logged in to watch or download movies, and those people have already rated things, because every time you watch something the system asks you to rate it ok. So whatever a similar user has liked may provide some information about what you will like. How can the system derive this? We can define a similarity score between contexts: I compare this user's feature vector with all the profiles already in my database, and if it matches some profile closely, I may feel that this profile looks very similar to that one, that this user will also like the things the user with the similar profile liked, and recommend accordingly. So in this way we can come up with a similarity score and, based on it, further restrict my phi functions ok. For this we obviously need to define a similarity function. Let us say there is some function S I have already come up with: given two contexts, it gives a number between 0 and 1, where the score is 1 if the two profiles match very well and 0 if they do not match at all ok. In this case we can consider only those phi functions for which the average dissimilarity score is less than some threshold ok. So, let me formalize this: what is this average?
So, given a phi, I now define its average dissimilarity score ok. Can you all read this? What it does is: it takes every pair of contexts on which your function phi does not agree, computes the dissimilarity score on that pair, sums over all such pairs, and normalizes by the number of pairs. So if my phi agrees on some pair I do not need to worry about it; I worry only about the pairs where my phi function assigns different arms, and I ask how dissimilar those contexts are. I defined S as the similarity, so 1 minus S is the dissimilarity score here ok. And note, I am not working with a partition here; that was the previous part. Now I am just defining my similarity function over all pairs and looking only at the dissimilarity scores where phi does not agree. Take any phi that is given to you; so many assignments are possible. On some pairs it may assign different values, and on some pairs it may assign the same value; the definition only charges the pairs where phi does not assign the same value, and averages the dissimilarity over those. Now, we want to make sure this dissimilarity score is not too high. And yes, this quantity is well defined: it is defined for a given phi, and S is something you have already chosen, whatever your favourite similarity function is. You pick S, fine, but a single phi is not the only one you want to learn over, so let me set up the class of phi properly.
So, let me give this quantity a name: the dissimilarity of phi, and let me also write the S in it, d_S(phi), because it depends on the function S as well. Now what I will do is look only at those phi such that d_S(phi) is less than some threshold theta. So I am no longer considering the optimization over all phi, but only over those phi whose dissimilarity is not too much. We are not trying to reduce this quantity; we have defined it this way, and we simply restrict attention to the phi that do not have too much dissimilarity. As I said, if you let theta be arbitrarily large, this incorporates all possible phi and you are back to the earlier benchmark. But, as I am telling you, I want to weaken this benchmark: I only optimize over those phi which effectively group the contexts based on the similarity scores. It is up to you: you either search over all possible phi, or only over those whose dissimilarity is not too high ok. Another way of thinking about it: when a recommendation system searches and gives recommendations, it has a certain set of policies like this, and finding the best one over all of them may be too big a task. That is why I restrict to these phi, assuming this set already captures a big chunk of the good maps; whatever profiles I am going to create among the users, I am assuming the profiles will not vary drastically ok, fine.
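A small sketch of this restriction step, d_S(phi) and the set of phi with d_S(phi) at most theta. One detail is my assumption rather than the lecture's: I normalize by the total number of unordered pairs of contexts, since the transcript only says "sum over all possible pairs and normalize"; the function names are illustrative.

```python
def dissimilarity(phi, contexts, S):
    """Average dissimilarity d_S(phi): sum 1 - S(c1, c2) over the pairs of
    contexts on which phi disagrees, normalized by the total number of
    unordered pairs (the normalization is an assumption)."""
    total, n_pairs = 0.0, 0
    for i, c1 in enumerate(contexts):
        for c2 in contexts[i + 1:]:
            n_pairs += 1
            if phi(c1) != phi(c2):       # only disagreeing pairs are charged
                total += 1.0 - S(c1, c2)
    return total / n_pairs if n_pairs else 0.0

def restrict(policies, contexts, S, theta):
    """Keep only the policies phi with d_S(phi) <= theta."""
    return [phi for phi in policies if dissimilarity(phi, contexts, S) <= theta]
```

A constant map has dissimilarity 0 and always survives, while a map that splits very similar contexts gets charged heavily, which is exactly the grouping effect we want from the weakened benchmark.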
Let us say some people are visiting a recommendation system. When I design this theta, I assume that they will at least have something in common, so their dissimilarity will not be too much, and that is why I restrict my search space to a smaller set of phi ok. And if I further know that all these people are coming from IIT Bombay, I will set theta even smaller, because they are all young people who are likely to have similar interests. So this is based on my prior information: if it is a sports site, it already knows that most of the people coming to it are young, so some of their features are going to be common, they will already have some similarity, and based on that you can restrict ok, fine. So this is just another possibility, and such restrictions are indeed used in practice, depending on your prior information about what kind of customers you are going to get, because you do not want to search arbitrarily over a big space here ok. What could be the other possibility? It is as simple as this: however many maps there are, I do not care about all of them; I am going to restrict myself to just a few. So you have some further prior information that not all mappings are plausible, only a few are, and you shortlist them initially. You still do not know which among them is the best, you still have to figure that out, but at the very start you have already narrowed down your search. By restricting yourself to a certain number of phi functions, is that fine, or is it a very bad thing to do? So I am just going to take Phi to be a collection phi_1, phi_2, all the way up to, let us say, phi_M, some M functions, and I want to find out which one is best among them.
I mean, this is the usual case: we will not always have infinitely many options. We have a few options, and from those we have to pick the best one, and in that sense we can think of the number of possible mappings as finite. Which phi you shortlist comes from your expertise in that particular application. For example, on an e-commerce site you may already know that a young person is unlikely to be looking for an insurance policy; so the mapping that sends a young user to an insurance policy can be ruled out in advance, and that already eliminates a certain number of policies. Based on that I rule out policies, narrow down to a certain number, and then search for which one is the best among them ok. So now, fine, I have narrowed down: is it any simpler? Can you think of a good way to search over this set? Note that it is still not necessary that two of these functions assign the same arm to any particular context; they could all be assigning different arms. How was the narrowing done? Right now I am not telling you; somebody did it for you. There will always be some sales team or big PR teams and so on who will say this option is not making sense, rule it out; then you narrow down, and from the available options you optimize for the best you can do. Beyond that we are not going to make any assumption. So, what is this phi function?
The phi function is still dealing with the different contexts: phi again has to say, for each context, which arm we are going to select ok. So then, what is a good policy here? One possibility: each phi is like a bandit problem in its own right; if you just take a policy phi, you basically want to identify the best one. But let us go back: do you recall our weighted-majority setting? What did we call that setup? We called it prediction with expert advice. In prediction with expert advice, each expert tells us which arm to play; we assign a weight to each expert, pick an expert based on the weights, and observe the loss or reward. Now, can we think of these phi functions as different experts, among which I want to identify the best one? But what is the difference here? In that setting we had full information; compare with EXP3, where we had only bandit information: we only observed the arm we played. So is it then simply EXP3 here? Recall what EXP3 did: it maintained a distribution over the arms, based on the cumulative loss or reward observed so far, updated that distribution in every round, and played according to it. So now, is it worthwhile to maintain a distribution over these experts? How exactly to update it, we will come to later; that is an algorithm. For now, in terms of the broad structure of the policy, given that we have narrowed things down, I want you to just think about the broad steps of such an algorithm.
So, can we think of these as different experts over which I maintain a distribution? What can each of these guys do? Each one is a mapping: for each context, it tells me which arm to use. Now, I maintain a distribution over the experts; if I pick a particular expert, then whatever that expert tells me to play in that round, I play that, I observe the reward, and based on whether it was good or bad I update my weight on that expert, and I continue like this ok. Based on this idea we are going to see an algorithm called EXP4 next time. You have seen EXP3: the Exponential-weight algorithm for Exploration and Exploitation. So what is EXP4? It is the Exponential-weight algorithm for Exploration and Exploitation with Experts, and this is the EXP4 algorithm we will study in the next class. Could you instead do A/B testing on each expert? What would we get if we did A/B testing on each phi here? In a sense, we will do a version of exactly that; it is what EXP3 was already doing: we play them, gather information about how they are doing, and update the weights. It is just that here we have experts, and we have to identify the best expert ok. So, let us continue in the next class.
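The broad steps just described, maintain a distribution over the experts, play the arm a sampled expert recommends, form an importance-weighted reward estimate, and exponentially reweight the experts, can be sketched as follows. This is only a preview sketch, not the exact algorithm we will write down next class, and it makes a simplifying assumption of mine: each expert is a deterministic map from context to arm (full EXP4 allows experts to recommend probability vectors over arms). The learning rate `eta` is illustrative.

```python
import math
import random

def exp4(experts, rounds, eta):
    """EXP4-style sketch with deterministic experts (context -> arm).
    `rounds` yields (context, reward_vector) pairs; only the reward of the
    played arm is used, as in the bandit setting."""
    m = len(experts)
    log_w = [0.0] * m                    # log-weights over the experts
    q = [1.0 / m] * m
    total_reward = 0.0
    for context, rewards in rounds:
        # Distribution over experts (normalized in log space for stability).
        mx = max(log_w)
        z = sum(math.exp(lw - mx) for lw in log_w)
        q = [math.exp(lw - mx) / z for lw in log_w]
        # Induced distribution over arms: mass of experts recommending each arm.
        k = len(rewards)
        p = [0.0] * k
        for qm, phi in zip(q, experts):
            p[phi(context)] += qm
        arm = random.choices(range(k), weights=p)[0]
        total_reward += rewards[arm]
        # Importance-weighted estimate: nonzero only for the played arm.
        x_hat = [0.0] * k
        x_hat[arm] = rewards[arm] / p[arm]
        # Credit each expert with the estimated reward of its recommendation.
        for i, phi in enumerate(experts):
            log_w[i] += eta * x_hat[phi(context)]
    return total_reward, q
```

Notice the two levels: the distribution lives over the experts, exactly as in prediction with expert advice, while the importance weighting handles the bandit feedback, exactly as in EXP3.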