So, with this notion of equivalence defined, we arrive at the basic question: this is a very strong notion, but does it even hold? For every mixed strategy σᵢ, does there exist an equivalent behavioral strategy βᵢ, equivalent in the sense defined above? And likewise, for every behavioral strategy βᵢ, does there exist an equivalent mixed strategy σᵢ?

For those of you who know a little about Markov decision processes or stochastic control, a behavioral strategy is what the MDP literature calls a randomized strategy or randomized policy; you also encounter it in learning theory, in policy iteration or Q-learning. A randomized policy is a random choice of an action at each state, and that is essentially what this is: the state is your information set, that is what you know, and you produce a random choice of an action at that information set.

So we are left with this question: is it true that for every mixed strategy there exists an equivalent behavioral strategy, and for every behavioral strategy an equivalent mixed strategy? If we settle this, then we are clean as far as the theory of games is concerned, because then we do not need to tell the player what to play; we can stick to our role as observers. Unfortunately, this is not the case; there are counterexamples. In fact, I do not even need multiple players to give you one. Here is an example of a mixed strategy with no equivalent behavioral strategy.

The game begins at a root node where player 1 plays L or R. Once he is done, it is again his turn, and again he plays L or R; I am using the same notation. But he has forgotten what he played in the first step: he remembers that he played, but not what he played. Is this clear?

Now, what are his pure strategies? There are four: LL, LR, RL, RR. Consider the mixed strategy that plays LL with probability one half, RR with probability one half, and LR and RL with probability zero. This is a valid mixed strategy: a probability distribution on the set of four pure strategies.

Now let us look at the outcomes. There are four possible ways the game can end, four leaf nodes: O1, O2, O3, O4. What is the probability of reaching each under this mixed strategy? The way we write out games, we keep track of the entire history even though players may not remember it; as observers we know exactly what history has happened. To reach O1 there is only one way, L followed by L, and the probability that this sequence of actions is chosen is one half. To reach O2 it has to be L followed by R, which has probability zero. Similarly O3 has probability zero and O4 has probability one half.

Now suppose the player plays a behavioral strategy; let me write it in red. He has to pick an action at random at each of his two information sets: the starting node and the second information set. Say he chooses L with probability α and R with probability 1 − α at the first, and L with probability β and R with probability 1 − β at the second. Then what is the probability of getting outcome O1? It is αβ. Writing out all of them: O1 comes up with probability αβ, O2 with probability α(1 − β), O3 with probability (1 − α)β, and O4 with probability (1 − α)(1 − β).

So here is the question. The green strategy is the valid mixed strategy I just wrote, and it produces the distribution (1/2, 0, 0, 1/2). Can I simulate this distribution using a behavioral strategy? No. Why not? If you want the probability of reaching O2 to be zero, you need α(1 − β) = 0, which implies α = 0 or β = 1. But you also want αβ = 1/2 and (1 − α)(1 − β) = 1/2, and if α = 0 or β = 1 there is no way O1 and O4 can both have probability one half. The constraints are inconsistent.

So for this mixed strategy there is no equivalent behavioral strategy. You can see what has happened here: again there is an issue of memory. When you randomize over pure strategies, a pure strategy implicitly has memory built into it, because it is a complete sequence of actions; by choosing a pure strategy you can commit to a particular path. In a behavioral strategy, you randomize afresh at each information set, and at that information set you may or may not know what happened in the past.
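As a quick numerical check, here is a minimal sketch in Python (not part of the lecture; the function name and the grid search are mine, but the (α, β) parametrization and outcome labels are from the example above) confirming that no behavioral strategy comes close to the distribution (1/2, 0, 0, 1/2):

```python
import itertools

def behavioral_outcome_dist(alpha, beta):
    # Outcome probabilities (O1, O2, O3, O4) when L is played with
    # probability alpha at the first information set and with
    # probability beta at the second.
    return (alpha * beta,              # O1: L then L
            alpha * (1 - beta),        # O2: L then R
            (1 - alpha) * beta,        # O3: R then L
            (1 - alpha) * (1 - beta))  # O4: R then R

# Distribution induced by the mixed strategy (1/2 on LL, 1/2 on RR).
target = (0.5, 0.0, 0.0, 0.5)

# Grid search over (alpha, beta) for the smallest max-deviation from target.
grid = [i / 100 for i in range(101)]
best = min(itertools.product(grid, grid),
           key=lambda ab: max(abs(p - q) for p, q in
                              zip(behavioral_outcome_dist(*ab), target)))
dev = max(abs(p - q) for p, q in
          zip(behavioral_outcome_dist(*best), target))
print(f"best (alpha, beta) = {best}, max deviation = {dev:.3f}")
```

The best achievable deviation stays bounded well away from zero, which is the numerical face of the inconsistency argued above.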
In this case the reverse direction actually holds: for every behavioral strategy there exists an equivalent mixed strategy. In fact, it is apparent from the table itself. I take a behavioral strategy (α, β); this is the outcome distribution I will get, and I just need to assign these probabilities to the corresponding pure strategies. The whole thing adds up to one, so that gives me an equivalent mixed strategy. So in this game, for every behavioral strategy there is an equivalent mixed strategy, but for a mixed strategy there need not be an equivalent behavioral strategy. Clear?

[Student question.] No, the mixed strategy does not require him to remember his first action; he just needs to look at what information set he is at and play. He does not remember what he played, but he knows that he has to play, say, L as part of his strategy. That is the point: whatever he knows at this information set is already encoded in the information set, and his strategy simply says to play L there. The question is only whether he can recognize the information set. If he has some measurement apparatus that tells him he is at this information set, that is all the game requires. When the extensive form has this shape, it says he can tell that he is at one of these two nodes, but not which one; he has some sensor that tells him he is in this set, and based on that he takes an action. He does not need to have stored in memory what he did before, nor does he need a sensor good enough to tell him which history has been followed; his strategy says that when his sensor gives this reading, he plays this action. Is this clear?

[Student question.] No, to show that no behavioral strategy is equivalent, it is enough to exhibit one contradiction. To show that two strategies are equivalent I would have to check every outcome; to show they are not, it is enough to show that it breaks at one place.
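To make the construction just described concrete (a reconstruction in the notation of this example, not a verbatim formula from the lecture): assign to each pure strategy the product of the behavioral probabilities of the actions it prescribes,

```latex
% Equivalent mixed strategy built from the behavioral strategy (alpha, beta):
% each pure strategy gets the product of its branch probabilities.
\sigma_i(LL) = \alpha\beta, \qquad
\sigma_i(LR) = \alpha(1-\beta), \qquad
\sigma_i(RL) = (1-\alpha)\beta, \qquad
\sigma_i(RR) = (1-\alpha)(1-\beta).
```

These are nonnegative and sum to (α + (1 − α))(β + (1 − β)) = 1, so σᵢ is a valid mixed strategy, and by construction it induces exactly the outcome distribution (αβ, α(1 − β), (1 − α)β, (1 − α)(1 − β)) of the behavioral strategy.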
Let us take another example: a behavioral strategy with no equivalent mixed strategy. Here is the game. Player 1 starts at a node where he can take two actions, L and R. If he takes L, the game comes to another node, at which he again has two actions, L and R. And now the information set is funky: both of these nodes lie in the same information set. This example does not start from a singleton information set, but that is fine; I could have put a bigger tree above it and ended up here, so just take it as part of some subtree.

Here is the situation. Player 1 has just one information set, with two actions, L and R. When he takes action L, the game actually moves forward to another node, but he has no way of knowing that this has happened. For him it is no different from not having pressed L at all. It is like a situation where he has applied a control but there is no physical change around him: his sensor readings are exactly the same, and he cannot tell whether his input was applied or whether the thing is not working. If he plays R, he does know, and the game goes to another node. But when he plays L, the world around him looks the same, so for him it is: well, let us press once more. Is that clear?

You can give many physical interpretations, but this is essentially the case where an actuator has acted but does not get the acknowledgement to know that it has acted. We send all these signals into outer space; how do we know that anyone has even received them? If taking action L means that someone has received them, then you need an acknowledgement: something has to bounce back. If you do not get that, it is as good as the action not having been taken. There is also a more amusing interpretation: think of this player as a driver taking left turns and right turns, but absent-minded. He takes a left turn, forgets that he took it, and now wants to take one more left or right turn. Think of this driver in outer space: the world around him looks dark and the same, and he has no way of knowing that his turn actually got executed.

So, what are his pure strategies? There is just one information set with two actions, so there are exactly two pure strategies: L and R.

[Student question.] Yes, as I said when I wrote out the definition of the extensive form, some books include a no-absent-mindedness condition as part of the definition. This is precisely a case of absent-mindedness: you have taken an action but do not know that you have taken it. Whether you allow for that is a matter of how general you want the model to be; you are allowing players to forget things, and the question is whether you want to allow them to forget that they even acted.

[Student question.] No, this is necessarily a multi-act game, because there is a path that intersects the player's node set twice; it just happens that the path passes through the same information set. That is why I said we need to put all that structure into feedback games, to keep multi-act games structured. Imagine if an information set like this were extended across stages of a game; imagine how complicated that could become.

So there are two pure strategies, and the game can end at three possible nodes: the outcomes are O1, O2, O3. Now let me ask you: is it possible for the driver to ever reach O2 by playing a pure strategy? Impossible. Why? If he plays L, the game comes to the second node, where he is in the same information set, so his policy again says to play L, and the game reaches O1. If he plays R at this information set, that is it: the game goes to O3. There is no way to reach O2 with a pure strategy. Which means that even after randomizing over the set of pure strategies, the probability of reaching O2 is zero for any distribution, because no pure strategy reaches O2: if you choose L with probability λ and R with probability 1 − λ, then in either case you do not reach O2. So the probability of reaching O2 is zero under every mixed strategy σᵢ.

But now suppose he plays a behavioral strategy: say L with probability one half and R with probability one half, chosen afresh each time he is at the information set. Then O1 is reached with probability one half times one half, which is one fourth; O2 is also reached with probability one half times one half, one fourth; and O3 is reached with probability one half. So this absent-minded driver can actually get to the right destination by playing a behavioral strategy when he could not have done it with any mixed strategy.

There is something quite subtle here about how to randomize. You may make fun of this kind of game, you might find it ridiculous, but the point is that this way of randomizing lets you do something you could not have done in the conventional mixed-strategy way. You can think of it as a form of exploration: the player is exploring the space, and choosing a whole policy at random turns out to be limiting compared to exploring in this way.
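Here is a minimal sketch in Python (not from the lecture; the function names and the λ and p parameters are mine) contrasting the two ways of randomizing for the absent-minded driver:

```python
# Pure strategies: 'L' (always play L) and 'R' (always play R).
# Outcomes: O1 = (L, L), O2 = (L, R), O3 = (R).

def outcome_dist_mixed(lam):
    # Mix over pure strategies: play 'L' w.p. lam, 'R' w.p. 1 - lam.
    # Pure 'L' reaches O1 and pure 'R' reaches O3; O2 is never reached.
    return {"O1": lam, "O2": 0.0, "O3": 1 - lam}

def outcome_dist_behavioral(p):
    # Behavioral strategy: at the single information set, play L w.p. p,
    # flipping the coin afresh each time the set is visited.
    return {"O1": p * p, "O2": p * (1 - p), "O3": 1 - p}

print(outcome_dist_mixed(0.5))       # O2 has probability 0 for every lam
print(outcome_dist_behavioral(0.5))  # {'O1': 0.25, 'O2': 0.25, 'O3': 0.5}
```

Under every mix over the two pure strategies, O2 has probability zero, while the fresh coin flip at each visit to the information set puts probability one fourth on O2.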
So, since there are counterexamples on both sides, we need a theorem: a theorem for when there exists a behavioral strategy equivalent to a given mixed one, and a mixed strategy equivalent to a given behavioral one. In each of these examples, if you look, the player is forgetting something, and it turns out that this is fundamental. What has the player forgotten? In the first game, at the second information set, he has forgotten what he played the previous time he played. In the second game, at his information set, he has forgotten even that he played. Is this clear?

So let us now think about what the various things are that a player could forget. This is a multi-act setting: when you are at some information set, somewhere among its predecessors there may be another information set where the same player previously acted. We should be able to compare what the player knows at this information set with what he knew at an earlier stage in the history of the game, along the same path. What could he forget?

He could, of course, forget what action he took. Or rather, let me put it positively: if a player is to forget nothing, what must he not forget? First, he cannot forget the actions he took in the past. This is something very common; we use it in Markov decision processes, stochastic control and so on as well: the information you have at any time step includes the observations up to that time and the actions you have taken up to that time. So a player must remember the actions he has taken in the past.

What else can he not forget? [A student suggests the state.] We have not defined a state here, and in any case this is not about what he knows; forgetting is different from knowing. Knowing means he can actually tell which information set or node he is at. A player may know nothing at all, and that is fine provided he also knew nothing in the past; then he has not forgotten anything. Forgetting is about the differential information: what he knows now relative to what he knew earlier. Is this clear?

So: he cannot forget the actions he has taken. Second, he cannot forget that he took an action at all, which is what happened with the absent-minded driver. Well, actually these two are not that different: if he remembers the action, then he knows that he has taken an action. Third, and this is the important thing: it is of no use to remember the action if you have forgotten what information you had when you took it. Again, as in partially observed Markov decision processes, the information a player has at any time is the set of all observations up to that time, which includes all the previous information along with the actions that have happened. So he cannot forget what he previously knew.

This is what we want, and now we have to capture it in the extensive form; we need to put it into a formal definition. We say that player i has perfect recall if the following holds. First: every information set intersects every path at most once, where a path means a path from the root node to a leaf node. This rules out absent-mindedness, but only that; we still have to ensure that the player does not forget actions and does not forget what he previously knew.

Now, here is the thing. Suppose two nodes x′ and x″ are in the same information set of player i, so he cannot distinguish between them. Suppose x is a predecessor of x′ belonging to player i, and x̂ is a predecessor of x″ belonging to player i. I am drawing this as a yellow highlighted region because there could be a long intermediate tree in between, with many other players playing, but x is a predecessor of x′ and x̂ is a predecessor of x″. Question: could x and x̂ have been in different information sets? No. Why? If they were in different information sets, then there was something player i knew at that time that helped him distinguish between x and x̂. But the game has since reached a stage where he cannot distinguish between the two histories any more: the history has passed either through x and reached x′, or through x̂ and reached x″, and in the red information set the player cannot tell which. If x and x̂ were in different information sets, then at that earlier time of play he could tell, but now he cannot; that means there is something he knew then that he has since forgotten. So if he has perfect recall, x and x̂ must necessarily be in the same information set: the fact that he lacks this information now means he cannot have had it in the past either; it could not be that he had it and has forgotten it.

So, second condition: if x and x′ are in the same information set of player i — rather, if x′ and x″ are in the same information set of player i and x is a predecessor of x′ belonging to player i, then there is a node x̂ in the same information set as x that is a predecessor of x″.

Now, can these two nodes be the same? Can x and x̂ coincide? No, because then he would have forgotten the action: if the two nodes were the same, the two predecessors would collapse into one, and then the player has taken two different actions at x, one of which has led the game down the path to x′ and the other down the path to x″. But he cannot distinguish between x′ and x″, even though up to node x he had remembered what had happened.
He took two different actions, one leading to x′ and the other to x″, but now he does not remember which action was taken, because he cannot distinguish between x′ and x″. So x and x̂ must be in the same information set but cannot be the same node; if they were the same node, it would mean he has forgotten the action he took. All right.

But let me ask you this: is it enough that these are two distinct nodes? Does that ensure he has not forgotten anything? Suppose he took action L at x and action R at x̂; then again he has reached a stage where he does not remember whether L or R was taken. So he must also have taken the same action at the two nodes, otherwise he has again forgotten the action. If the two nodes were the same, of course, two distinct actions would have led to the two paths; here the nodes are not the same, but the actions could still be distinct, and for the player not to forget the action, the actions leading to these two paths must also be the same. So, third condition: there is a node x̂ in the same information set as x that is a predecessor of x″, and the action taken at x along the path to x′ is the same as the action taken at x̂ along the path to x″.

What this essentially means is: if you did not know something in the past and you took an action based on that imperfect information, on some ignorance, then you continue to remember that you took that action, but you need not have any further memory of what you previously knew; you are allowed to remain as ignorant as you were. If these actions had been distinct, the player would essentially have forgotten which of two distinct actions leading to these paths was taken. In other words, it would be as if two different trees were merged into one, with an information set lumping together a node coming from one action and a node coming from the other. Again, in MDP language, it is as if some action at a past time has been forgotten: you have a memory of actions up to some time, that one action is lost, and maybe after that you again remember something or other. Is it clear? So this is what we mean by perfect recall.

And what we will show, next time, is two theorems. If a player has perfect recall, then for every mixed strategy there is an equivalent behavioral strategy. And if the no-absent-mindedness condition holds, that is, if every path intersects each information set at most once, then for every behavioral strategy there is an equivalent mixed strategy.
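For reference, here is a formal restatement of the definition and the two results (a reconstruction of the standard statements, essentially Kuhn's theorem, consistent with the lecture; the proofs are deferred to next time as stated):

```latex
% Perfect recall and the two equivalence theorems (reconstruction).
Player $i$ has \emph{perfect recall} if:
\begin{enumerate}
  \item every path from the root to a leaf intersects each information set
        of player $i$ at most once (no absent-mindedness);
  \item whenever $x'$ and $x''$ lie in the same information set of player
        $i$, and $x$ is a predecessor of $x'$ at which player $i$ acts,
        there is a node $\hat{x}$ in the same information set as $x$ that
        is a predecessor of $x''$;
  \item the action taken at $x$ along the path to $x'$ equals the action
        taken at $\hat{x}$ along the path to $x''$.
\end{enumerate}
\textbf{Theorem.} If player $i$ has perfect recall, then every mixed
strategy $\sigma_i$ has an equivalent behavioral strategy $\beta_i$.
If every path intersects each information set of player $i$ at most once,
then every behavioral strategy $\beta_i$ has an equivalent mixed strategy
$\sigma_i$.
```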