Welcome back to this NPTEL course on game theory. In the previous sessions we introduced evolutionary game theory. The major idea there was the concept of an evolutionarily stable strategy (ESS) and its connection with replicator dynamics. One very interesting fact about the replicator equation is that, if you look back at it, the strategies which perform better are adopted by the population. As time progresses the population adapts to fitter and fitter strategies, and eventually this leads to an evolutionarily stable strategy. In fact, one very interesting idea here is that sub-rational behavior gets connected with rational behavior, which is captured by Nash equilibrium. So the evolutionarily stable strategy is, in that sense, a very interesting subject on its own. Moreover, the replicator dynamics also provides a nice way of learning an equilibrium: which equilibria will be the stable points under this dynamics? This question is very important, and people have tried a variety of methods; one method that we have seen earlier is fictitious play. Even though we did not give a detailed description of fictitious play at that point, we will now look at it more closely. Before going further into fictitious play, I want to mention one example which I did not complete in the previous sessions: the Hawk-Dove game. Let me recall the Hawk-Dove game, make a statement about it, and then we will go to fictitious play. So what is the Hawk-Dove game? It was introduced by Maynard Smith to illustrate the idea of evolutionarily stable strategies. The basic setup is that there is a large population of some species, and the individuals have two types of behavior.
One is the Hawk behavior and the other is the Dove behavior. Hawk models an aggressive trait, while the passive trait is modeled by the Dove behavior. In this large population the individuals contest for some resource: it could be food, a nesting site, territory, and so on. When two members of this population meet, each can show the Hawk behavior or the Dove behavior. If both show the Dove behavior, neither fights for the resource, so they do not fight each other and they share the resource. If one Hawk and one Dove meet, the Hawk wins over the Dove and takes the resource, and the Dove loses it. If two Hawks meet, both are aggressive, so one of them loses (suffering an injury) and the other gets the resource. Let us assume the resource has value V, which is greater than 0, and there is a cost of injury C, which is again positive. Now let us write down the payoffs. If both players play Hawk, one of them gets the resource V and the other incurs the cost C, so on average each gets (V − C)/2; the payoff pair is ((V − C)/2, (V − C)/2). If a Hawk meets a Dove, the Hawk gets the resource at no cost, so the payoffs are (V, 0): the Dove gets nothing because it simply gives up without fighting. Symmetrically, Dove against Hawk gives (0, V). When both play Dove they simply share the resource at no cost, giving (V/2, V/2). So this is the Hawk-Dove game. We introduced this game previously but did not do the complete analysis; I will just make the statement here.
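The lemma stated next will claim that x* = (V/C, 1 − V/C) is the unique ESS when C > V. As a numerical sanity check, here is a small sketch (not from the lecture; V = 2 and C = 4 are illustrative values) that tests the ESS conditions against a grid of mutant strategies:

```python
import numpy as np

# Hawk-Dove payoff matrix for the row player; V = 2, C = 4 are
# illustrative values with C > V (the "game of chicken" case).
V, C = 2.0, 4.0
A = np.array([[(V - C) / 2, V],
              [0.0,         V / 2]])   # rows/cols: Hawk, Dove

def u(x, y):
    """Expected payoff of mixed strategy x against mixed strategy y."""
    return x @ A @ y

x_star = np.array([V / C, 1 - V / C])  # candidate ESS from the lemma

# ESS conditions against every mutant y on a grid: either
# u(x*, x*) > u(y, x*), or they tie and u(x*, y) > u(y, y).
for p in np.linspace(0.0, 1.0, 101):
    y = np.array([p, 1 - p])
    if np.allclose(y, x_star):
        continue
    gap = u(x_star, x_star) - u(y, x_star)
    assert gap > 1e-9 or (abs(gap) < 1e-9 and u(x_star, y) > u(y, y))
print("x* =", x_star, "satisfies the ESS conditions on the grid")
```

Here every mutant ties on the first condition, so the check always lands on the second condition, u(x*, y) > u(y, y), which is exactly the local-superiority property mentioned below.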
If the resource value V is higher than the cost C, then Hawk is a strictly dominant strategy, and in that case the game becomes a prisoner's-dilemma-type game. If C is greater than V, the game turns into what is called the game of chicken. In this game there are two asymmetric pure Nash equilibria, in which one player plays Hawk and the other plays Dove; you can verify this easily. However, there is also a symmetric Nash equilibrium; in fact, let me state it as a lemma: the Hawk-Dove game with C > V has exactly one ESS, namely x* = (V/C, 1 − V/C). So playing the Hawk behavior with probability V/C and the Dove behavior with probability 1 − V/C is an ESS. I will not go into the details of the proof here; you can simply verify the conditions of local superiority, which we introduced in one of the lemmas, or verify it directly from the definition of an ESS. With this, I will now shift the focus to fictitious play. What is fictitious play? We have already introduced it in the context of zero-sum games, so let me recall it. There is an underlying game (A, B), and the game proceeds in several rounds between player 1 and player 2. In round 1, player 1 plays a pure strategy A0 and player 2 plays a pure strategy B0. In round 2, player 1 knows that player 2 played B0 in the previous round, so he plays a pure best response to B0; that is, A1 is a pure best response to B0. Similarly, player 2 plays B1, which is a pure best response to A0. In round 3, player 1 sees that player 2 has played B0 and B1 in the previous rounds, so he assumes that player 2 will play B0 and B1 with equal probability.
So he plays a pure best response to the mixed strategy (B0 + B1)/2, and similarly player 2 plays B2, a pure best response to (A0 + A1)/2, and this goes on. So the players are best-responding to these empirical averages of past play. Now what fictitious play asserts is that these empirical averages converge to a Nash equilibrium. That is, if we go for n + 1 rounds, (A0 + A1 + … + An)/(n + 1) converges to x*, (B0 + B1 + … + Bn)/(n + 1) converges to y*, and (x*, y*) is a Nash equilibrium. This was a conjecture, made by George W. Brown, and Julia Robinson proved it for zero-sum games. The proof uses only elementary arguments, but it is quite long and technical, so we will not go into it; instead we will see some of its consequences. First, before going further, let us understand why this is an interesting learning algorithm. It is a learning algorithm in the sense that when the game is played repeatedly, you form an opinion based on your belief about what the other player is playing. Basically, you assume that player 2 is going to play B0, B1, …, Bn with equal probabilities; that is why we consider this empirical distribution, and my response in the next round is to play a best response to it. Whether these best responses eventually converge to the optimal strategy for me or not is the biggest question here. In fact, the answer to this general question is negative: it need not be true. To see why, let us first note another interesting aspect of the procedure: I do not need to know what the other player's payoff is. The players do not need to know anything about the payoffs of their opponents.
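To make the procedure concrete, here is a small simulation sketch of fictitious play on ordinary zero-sum rock-paper-scissors (the payoff matrix, first-index tie-breaking, and the round-0 actions are illustrative assumptions, not from the lecture). By Robinson's result the empirical averages should approach the unique equilibrium (1/3, 1/3, 1/3):

```python
import numpy as np

# Fictitious play on standard zero-sum rock-paper-scissors.
# Rows/cols: R, P, S. Player 1's payoffs; player 2's are -A.
A = np.array([[0.0, -1.0, 1.0],
              [1.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0]])

T = 50_000
n1 = np.zeros(3)   # how often each of player 1's actions has been played
n2 = np.zeros(3)   # how often each of player 2's actions has been played
a, b = 0, 0        # arbitrary pure strategies in round 0
for t in range(T):
    n1[a] += 1
    n2[b] += 1
    a = int(np.argmax(A @ (n2 / n2.sum())))     # best response to 2's empirical mix
    b = int(np.argmax((n1 / n1.sum()) @ -A))    # best response to 1's empirical mix

print("empirical averages:", n1 / T, n2 / T)    # both drift toward (1/3, 1/3, 1/3)
```

Note that each player uses only the observed actions of the opponent, never the opponent's payoff matrix, which is exactly the informational point made above.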
They only need to observe the strategies the opponent has played. In a zero-sum game the payoff information is automatically available, because my payoff is just the negative of the other player's payoff. But in a non-zero-sum game the other player's payoff function is completely immaterial here. Therefore it is difficult to believe that this always converges, and in fact Lloyd Shapley has provided a counterexample, which we will see. So, as I pointed out, the advantage of fictitious play is that I need to know nothing about the payoffs of others; the only thing I need to know is what they have played in the past rounds. Let us now set this up a little more formally; let me rewrite fictitious play. We consider only two players, playing the game at times t = 0, 1, 2, …. Define η_i^t : S_{−i} → ℕ, where η_i^t(s_{−i}) is the number of times player i has observed s_{−i} in the past. Let η_i^0(s_{−i}) represent the starting point: at time t = 0 we have no past information, so we take these initial weights as given. Since η_i^t counts how often i has observed s_{−i}, the players choose an action in each period to maximize that period's expected payoff given their prediction of the opponent's action. Opponent actions are predicted according to the beliefs the players form, and the beliefs are formed according to

μ_i^t(s_{−i}) = η_i^t(s_{−i}) / Σ_{s̄_{−i} ∈ S_{−i}} η_i^t(s̄_{−i}).
So this quantity μ_i^t(s_{−i}) is player i's belief that the opponent is playing s_{−i}: the denominator is the total number of observations in the past, and η_i^t(s_{−i}) tells you how many times s_{−i} was observed among them. So μ_i^t(s_{−i}) is the relative frequency of observing s_{−i}, and μ_i^t is player i's belief about the opponent's play. Once this belief is there, player i chooses his action at time t to maximize his payoff; that is, he chooses

s_i^t ∈ argmax_{s_i ∈ S_i} u_i(s_i, μ_i^t),

where u_i is player i's payoff and μ_i^t is the strategy he believes the opponent is following: when the opponent follows μ_i^t, the s_i that maximizes my payoff is what I choose as s_i^t. This is the reason I said the players do not need to know the other players' payoffs; the choice only depends on the beliefs about what the others are playing. One more thing: this choice is not unique, because there may be multiple best responses, so depending on the tie-breaking, different plays can result. One has to keep that in mind. Now let us consider an example. Take the 2×2 game with row strategies U, D and column strategies L, R:

          L        R
  U     3, 3     0, 0
  D     4, 0     1, 1

If you look at it, this game is dominance solvable; you can easily verify that it can be solved by iterated domination, and the unique Nash equilibrium is (D, R). Now assume the starting weights are η_1^0 = (3, 0) and η_2^0 = (1, 2.5). Then in round 1 the beliefs are μ_1^0 = (1, 0) and μ_2^0 = (1/3.5, 2.5/3.5), and play follows: s_1^0 = D, s_2^0 = L.
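The round-by-round play of this example can also be generated programmatically; here is a minimal sketch using the belief-and-best-response rule just described (first-index tie-breaking is an assumption):

```python
import numpy as np

# Rows (U, D), columns (L, R); payoffs from the example above.
U1 = np.array([[3.0, 0.0], [4.0, 1.0]])   # player 1
U2 = np.array([[3.0, 0.0], [0.0, 1.0]])   # player 2

eta1 = np.array([3.0, 0.0])   # player 1's initial weights on (L, R)
eta2 = np.array([1.0, 2.5])   # player 2's initial weights on (U, D)

names1, names2 = ("U", "D"), ("L", "R")
history = []
for t in range(6):
    s1 = int(np.argmax(U1 @ (eta1 / eta1.sum())))   # BR to belief about player 2
    s2 = int(np.argmax((eta2 / eta2.sum()) @ U2))   # BR to belief about player 1
    history.append((names1[s1], names2[s2]))
    eta1[s2] += 1    # player 1 observed player 2's action
    eta2[s1] += 1    # player 2 observed player 1's action
print(history)   # [('D', 'L'), ('D', 'R'), ('D', 'R'), ...] -- locked into (D, R)
```

After the first round the play locks into (D, R) and never leaves it, matching the hand calculation that follows.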
To see why s_1^0 = D and s_2^0 = L: with belief μ_1^0 = (1, 0) the best response for player 1 is D, and the best response to μ_2^0 for player 2 is L. These are straightforward calculations, so I will skip the details. In period 2 the beliefs are updated: η_1^1 becomes (4, 0) and η_2^1 becomes (1, 3.5). Basically, η_1^t tells you how many times player 1 has seen L played and how many times R; similarly η_2^t tells you how many times U and D have been played. So those counts keep increasing. From η_1^1 and η_2^1 we get the corresponding beliefs, and in fact we can see that s_1^1 = D and s_2^1 = R. It follows from the same reasoning: look at μ_1^1 and μ_2^1, which are the probability distributions coming from the counts, take the best responses, and D and R come out. In period 3, η_1^2 becomes (4, 1) and η_2^2 becomes (1, 4.5), and in this case s_1^2 = D and s_2^2 = R. Now the play has reached (D, R), and whatever happens afterwards, they continue playing D and R; you can verify a few more rounds and see that they play (D, R) throughout. So in this game the play has arrived at the Nash equilibrium. In fact we can prove the following statements; I will state them and not go into the proofs. Let (s^t) be the sequence of strategy profiles generated by fictitious play. We say the sequence (s^t) converges to s if there exists T such that s^t = s for all t ≥ T; this is the definition. Then we can prove the following theorem: if (s^t) converges to s̄, then s̄ is a Nash equilibrium. That is one point.
The other point: if for some T we have s^T = s*, where s* happens to be a strict Nash equilibrium, then s^τ = s* for all τ ≥ T. So if at some point a strict Nash equilibrium is played in fictitious play, then it will be played forever afterwards; and the first theorem says that if the fictitious play sequence (s^t) converges to some s̄, then s̄ is automatically a Nash equilibrium. The proofs are actually not very hard and can be verified easily. Of course, this concerns pure strategies; if we look for a mixed equilibrium then, as I said, the problem is that the play need not converge. But let me mention one more result. We say the fictitious play sequence (s^t) converges to a mixed strategy profile σ in the time-average sense if for all i and all s_i ∈ S_i we have

lim_{T→∞} (1/T) Σ_{t=0}^{T−1} 1[s_i^t = s_i] = σ_i(s_i),

where 1[·] is the indicator: whenever s_i^t = s_i it equals 1, and otherwise it is 0. So basically you count how many of the s_i^t up to time T − 1 are equal to s_i, divide by T, and if this quantity converges to σ_i(s_i), then I say that (s^t) converges to σ in the time-average sense. In other words, what we are saying is that μ_{−i}^T(s_i) converges to σ_i(s_i); you can verify that this is the same thing. Then we have another theorem: if (s^t) converges to σ in the time-average sense, then σ is a Nash equilibrium. Again, this is not very hard to prove. Here is one exercise I would like to give: check this for matching pennies. In the matching pennies example, write down the fictitious play sequence, take the time averages, and see that convergence happens.
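For the suggested matching-pennies exercise, a quick numerical sketch (the initial weights and tie-breaking are arbitrary assumptions) indicates that the time averages indeed approach the mixed equilibrium ((1/2, 1/2), (1/2, 1/2)):

```python
import numpy as np

# Matching pennies: player 1 wants to match, player 2 to mismatch.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])   # player 1's payoffs; player 2 gets -A

T = 100_000
eta1 = np.array([1.0, 0.0])   # player 1's initial weights on 2's actions (assumed)
eta2 = np.array([1.0, 0.0])   # player 2's initial weights on 1's actions (assumed)
plays1 = np.zeros(2)          # counts of 1[s_1^t = H] and 1[s_1^t = T]
plays2 = np.zeros(2)
for t in range(T):
    s1 = int(np.argmax(A @ (eta1 / eta1.sum())))
    s2 = int(np.argmax((eta2 / eta2.sum()) @ -A))
    plays1[s1] += 1
    plays2[s2] += 1
    eta1[s2] += 1             # update beliefs with the observed actions
    eta2[s1] += 1

print("time averages:", plays1 / T, plays2 / T)   # both tend to (1/2, 1/2)
```

The pure plays themselves keep cycling (H, T alternating in longer and longer blocks), so the sequence does not converge in the pointwise sense; only the time averages settle down, which is exactly the distinction made above.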
So, the most important thing is the non-convergence. This counterexample is due to Shapley; it is a modified rock-paper-scissors game, and he considers the following matrix. In fact, one exercise is to show that it has a unique mixed Nash equilibrium, namely each player playing (1/3, 1/3, 1/3); this is not hard to verify. Now let me outline the ideas behind the non-convergence. Start with η_1^0 = (1, 0, 0); the initial belief η_2^0 is as given. With these initial beliefs, in period 0 the play is (P, R), and in fact the play stays at (P, R) for a certain number of periods, until player 2 switches to S. I would like you to write down the arguments: because of these beliefs, (P, R) is played for some periods until player 2 switches to S. Once player 2 switches to S, the play continues with (P, S), again for a certain number of periods, until player 1 switches; then it goes to (R, S), and it continues again until player 2 switches, and so on around the cycle. The whole idea here is that the amount of time spent playing (P, R), then (P, S), and so on keeps increasing. Therefore this can never lead to convergence. I would like all of you to write down these details. In fact, if you look at how many periods each profile is played: say the first block lasts for some k periods; then the next block lasts for roughly βk periods, the next for β²k periods, and so on, for some β > 1. That means the length of time each profile is played keeps growing geometrically with the number of switches, and therefore the time average can never converge.
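Since the matrix itself is on the slide, here is one standard form of Shapley's game from the literature (the exact payoff values are an assumption and may differ from the lecture's), together with a simulation that records the lengths of the blocks of repeated play:

```python
import numpy as np

# One common form of Shapley's modified rock-paper-scissors: the winner of a
# round gets 1 and everyone else gets 0, so the game is not zero-sum.
# Unique Nash equilibrium: both players mix (1/3, 1/3, 1/3).
U1 = np.array([[0.0, 1.0, 0.0],
               [0.0, 0.0, 1.0],
               [1.0, 0.0, 0.0]])
U2 = np.array([[0.0, 0.0, 1.0],
               [1.0, 0.0, 0.0],
               [0.0, 1.0, 0.0]])

T = 200_000
eta1 = np.array([1.0, 0.1, 0.3])   # generic starting weights (avoid exact ties)
eta2 = np.array([0.5, 1.0, 0.2])
runs, last = [], None              # lengths of maximal blocks of repeated play
for t in range(T):
    s1 = int(np.argmax(U1 @ eta1))   # normalizing eta would not change the argmax
    s2 = int(np.argmax(eta2 @ U2))
    if (s1, s2) == last:
        runs[-1] += 1
    else:
        runs.append(1)
        last = (s1, s2)
    eta1[s2] += 1
    eta2[s1] += 1

print("number of runs:", len(runs))
print("last few run lengths:", runs[-6:])   # the blocks keep getting longer
```

The block lengths keep growing, so at any time the empirical average is dominated by the most recent blocks and keeps swinging around; this is the geometric-growth phenomenon described above.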
So, I will leave these details here for you to fill in. With this we will stop, and we will continue in the next session. Thank you.