Welcome back to the NPTEL course on game theory. In the previous session we saw some ways of computing the saddle point equilibrium for matrix games. One interesting property we used there is that if player 1 is playing his optimal strategy x*, we check it against the pure strategies of player 2. Now an interesting question is: do we really need to check against all the pure strategies? So here we will make an interesting statement, whose proof we will also see. Consider a game of size m by n and let v be the value of the game. Let x* = (x*_1, x*_2, ..., x*_m) be any optimal strategy for player 1 and y* = (y*_1, y*_2, ..., y*_n) be any optimal strategy for player 2. Then the following holds: sum_{j=1}^{n} a_ij y*_j = v for all i such that x*_i > 0, and similarly sum_{i=1}^{m} a_ij x*_i = v for all j such that y*_j > 0. What is this quantity sum_j a_ij y*_j? It is nothing but the payoff to player 1 when he plays his i-th row and player 2 plays y*; let me write it as pi(e_i, y*). So the first statement says pi(e_i, y*) = v for each i with x*_i > 0, and similarly the second says pi(x*, e_j) = v for all j such that y*_j > 0. In fact, this is not a difficult statement to prove, because it follows from v = pi(x*, y*). If you really look at the proof: pi(x*, y*) = v, the value, and by bilinearity pi(x*, y*) = sum_{j=1}^{n} y*_j pi(x*, e_j).
If y*_j = 0 for some j, that particular term can be removed, and what you get is sum over {j : y*_j > 0} of y*_j pi(x*, e_j) = v. Now this is a convex combination of the pi(x*, e_j)'s, and it equals v. By the optimality of x*, each pi(x*, e_j) is at least v; so if they were not all equal, some term with positive weight would be strictly bigger than v and the convex combination would exceed v, which is impossible. So, using the bilinearity of pi and the fact that this is a convex combination, what you get is that pi(x*, e_j) = pi(x*, e_l) = v for all j, l such that y*_j, y*_l > 0. Whenever the probability with which those columns are chosen is strictly positive, the corresponding payoffs are all the same, and that is exactly what we wanted to prove. Such strategies are called equalizer strategies. This is quite useful in solving the problem, by which I mean computing an equilibrium: if you can find some x*, y* and v satisfying these properties, then a converse kind of result holds and you can verify that they form an equilibrium. In fact, we used this idea in the early lectures, where we showed that in the matching pennies game choosing each action with probability half gives an equilibrium. You can go back to that video and try to relate it to this theorem; in that way this theorem is also quite useful in solving zero sum games. Next we will look at another very interesting class of games, known as symmetric games. Symmetric games are a nice set of games where the players are in a kind of symmetric position.
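Before moving on, the equalizer property above is easy to sanity-check in code. Here is a minimal sketch in Python (not from the lecture; it uses the matching pennies game mentioned above, with the known optimal strategies x* = y* = (1/2, 1/2) and value v = 0):

```python
# Sanity check of the equalizer property on matching pennies:
# with optimal x*, y* and value v,
#   pi(e_i, y*) = v whenever x*_i > 0, and
#   pi(x*, e_j) = v whenever y*_j > 0.

def payoff(A, x, y):
    """Bilinear payoff pi(x, y) = sum_i sum_j x_i * A[i][j] * y_j."""
    return sum(x[i] * A[i][j] * y[j]
               for i in range(len(A)) for j in range(len(A[0])))

A = [[1, -1],
     [-1, 1]]                                      # player 1's payoffs
x_star, y_star, v = [0.5, 0.5], [0.5, 0.5], 0.0    # known optimal play

# The value itself: pi(x*, y*) = v.
assert abs(payoff(A, x_star, y_star) - v) < 1e-12

m, n = len(A), len(A[0])
for i in range(m):
    if x_star[i] > 0:    # row i is in the support of x*
        assert abs(sum(A[i][j] * y_star[j] for j in range(n)) - v) < 1e-12
for j in range(n):
    if y_star[j] > 0:    # column j is in the support of y*
        assert abs(sum(A[i][j] * x_star[i] for i in range(m)) - v) < 1e-12

print("equalizer property verified for matching pennies")
```

The same check works for any proposed (x*, y*, v); if all the assertions pass, the converse result mentioned above lets you conclude you have found an equilibrium.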
So what exactly do I mean? Whatever player 1 is trying to do, player 2 is also trying to do the same thing. Player 1 is maximizing a certain utility, and player 2 faces exactly the same problem; and since it is a zero sum game, whatever you are trying to maximize I am trying to minimize. In this setup the condition is that the payoff matrix A should equal -A^T, that is, A is skew symmetric: the game is symmetric if the payoff matrix A is skew symmetric. Let us look at an example; the best example is the rock paper scissors game, and we will understand the symmetry better through it. There are three strategies, and the payoff matrix for player 1 (rows and columns ordered rock, paper, scissors) is

A =
[  0  -1   1 ]
[  1   0  -1 ]
[ -1   1   0 ]

Whatever player 1 is doing, player 2 is trying the same thing here, and under the same circumstances he gets the same amount; both players are effectively maximizing against the same matrix. Let us see this. When player 2 chooses column 1, he is minimizing over the entries (0, 1, -1) of that column; minimization is nothing but maximizing the negative, so he is maximizing over (0, -1, 1). And when player 1 chooses row 1, he is maximizing over the entries (0, -1, 1) of that row. So minimizing column 1 and maximizing row 1 are exactly the same problem; a symmetric game is exactly that. What is the condition for this? The negative of each row should be the corresponding column, which is exactly the condition A = -A^T; when this holds we say the game is symmetric.
Symmetric games have an interesting property. Intuitively, in a symmetric game whatever I am maximizing, you are in exactly the same position and maximizing the same thing, so whatever I get you should also get. Because it is a zero sum game, what I get plus what you get is 0, and therefore the payoff that each of us gets under optimal play is 0. That is the interesting situation here, and in fact we can write it as a theorem: a symmetric game has value 0; not only that, whatever is optimal for player 1 is also optimal for player 2. Let us try to prove this. What do we have? pi(x, y) = sum_{i=1}^{m} sum_{j=1}^{n} x_i a_ij y_j. By the symmetry of the game the payoff matrix is skew symmetric, so a_ij = -a_ji, and this becomes sum_{i=1}^{m} sum_{j=1}^{n} x_i (-a_ji) y_j. Rewriting, this is -pi(y, x). There are a few things I have used here, with the m and the n. By the definition of a symmetric game, A = -A^T, which forces the number of rows to equal the number of columns, m = n; we use that explicitly when we swap the roles of y and x. So pi(x, y) = -pi(y, x); that is the important fact here: if x and y are what player 1 and player 2 play, then pi(x, y) = -pi(y, x). This in fact gives the following: pi(x, x) = -pi(x, x), which implies pi(x, x) = 0.
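These two facts, A = -A^T and pi(x, y) = -pi(y, x), can be verified numerically on the rock paper scissors matrix. A small sketch (the matrix is from the lecture; the particular mixed strategies x and y are arbitrary choices for illustration):

```python
# Rock paper scissors payoff matrix for player 1
# (rows/columns ordered rock, paper, scissors).
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]

# Symmetric game: A = -A^T, i.e. a_ij = -a_ji for all i, j.
assert all(A[i][j] == -A[j][i] for i in range(3) for j in range(3))

def payoff(x, y):
    """pi(x, y) = sum_i sum_j x_i * A[i][j] * y_j."""
    return sum(x[i] * A[i][j] * y[j] for i in range(3) for j in range(3))

x = [0.2, 0.5, 0.3]   # an arbitrary mixed strategy for player 1
y = [0.6, 0.1, 0.3]   # an arbitrary mixed strategy for player 2

# Skew symmetry gives pi(x, y) = -pi(y, x) ...
assert abs(payoff(x, y) + payoff(y, x)) < 1e-12
# ... and in particular pi(x, x) = 0 for any strategy x.
assert abs(payoff(x, x)) < 1e-12

print("pi(x, y) = -pi(y, x) and pi(x, x) = 0 verified")
```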
If both players play the same strategy, the payoff is 0. This immediately shows that the value of the game is at most 0. Why does this come easily? The lower value of the game is max_x min_y pi(x, y). For each x, min_y pi(x, y) <= pi(x, x) = 0, since x itself is one particular choice of y; so the maximum over x is also at most 0, and hence the lower value is <= 0. Now we use the symmetric argument. Look at min_y max_x pi(x, y): for each y, max_x pi(x, y) >= pi(y, y) = 0, so the minimum over y is also >= 0, and hence the upper value is >= 0. So the lower value is at most 0 and the upper value is at least 0; since the game admits a value, the lower and upper values coincide, and hence the value of the game is 0. This is a very interesting argument, which shows that for a symmetric game the value is always 0. Now you can check the rock paper scissors game and see that it has value 0. In fact, here is another interesting point: because you know that rock paper scissors is a symmetric game, its value is 0, so you can look for an equalizer strategy assuming v = 0. You put v = 0, use the equalizing property of an equilibrium, and you can easily solve the game. That is an interesting and simple exercise. Now, what else is required? We need to show that if x* is optimal for player 1, then x* is also optimal for player 2. So how do we prove this fact?
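Before that, here is how the rock paper scissors exercise just mentioned works out in code (a sketch of mine, assuming the standard answer that the uniform strategy is optimal for both players):

```python
# Rock paper scissors with v = 0: check that the uniform strategy is an
# equalizer, i.e. it yields exactly 0 against every pure reply.
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]
u = [1/3, 1/3, 1/3]   # candidate optimal strategy for both players

# Player 1 plays u: payoff against each pure column j of player 2.
against_columns = [sum(u[i] * A[i][j] for i in range(3)) for j in range(3)]
# Player 2 plays u: payoff against each pure row i of player 1.
against_rows = [sum(A[i][j] * u[j] for j in range(3)) for i in range(3)]

# All entries are 0, so player 1 secures at least 0 and player 2 concedes
# at most 0: together this certifies value 0 and optimality of u for both.
assert all(abs(g) < 1e-12 for g in against_columns + against_rows)
print("uniform play is an equalizer; value of rock paper scissors is 0")
```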
This fact once again comes from the same identity. Suppose (x*, y*) is an optimal pair, by which I mean a saddle point equilibrium. Then we know pi(x*, y*) = 0, the value, and using pi(x, y) = -pi(y, x) we also have pi(x*, x*) = 0. So if player 1 fixes x*, then y* minimizes against it, and by this equality x* also minimizes against it. Therefore, using properties like the equalizing property, or going back to the bilinearity of the payoffs, you can now conclude that x* is also optimal for player 2; in fact (x*, x*) is a saddle point equilibrium, which you can prove without much difficulty. So this completes the proof, but the way to visualize the fact is this: it is a symmetric game, so whatever I am maximizing, you are maximizing the same thing; whatever is good for me should also be good for you, because it is a symmetric environment, and hence such a result. This is quite useful, and symmetric games form a very interesting class of games. In fact, there are ways to symmetrize a non-symmetric game; we will see more of that later, when we go to non-zero sum games.

Now we introduce an iterative method to solve zero sum games. This method is known as fictitious play. The method was introduced by Brown, and the convergence proof was given by Robinson. Fictitious play is an example of what are known as learning methods. What does learning mean here? There is a game that people are playing. Let us assume they cannot really compute the equilibria. The only thing they can do is play. As they play the game again and again, can they infer, can they learn, the equilibrium, that is, what the optimal strategy should be? That is the idea behind learning algorithms, and fictitious play is one such algorithm. Let me describe how the method looks. We have a zero sum game A and there are two players.
The most important thing here is that the players do not play mixed strategies. In other words, in the first round player 1 does not know anything about the game; he only knows what choices he has. So he chooses some strategy, let us say a pure strategy, which I will call x1. Player 2 then plays a best response to x1, and once again he chooses only a pure strategy: he plays y1, a pure best response to x1. In the next round player 1 observes that player 2 has played y1; so what should he play? He plays x2, a pure best response to y1. In the next iteration player 2 observes that player 1 has played x1 and x2. Therefore he thinks that player 1 will play x1 or x2 with equal probability, and he looks for a best response to (1/2)x1 + (1/2)x2, again only among pure strategies. That is, since player 2 thinks player 1 plays x1 with probability half and x2 with probability half, he plays a pure best response to that mixture; let me call it y2. Once player 2 chooses y2, in the next round player 1 looks at player 2's history, y1 and y2, thinks that player 2 will play y1 with probability half and y2 with probability half, and plays x3, a pure best response to (1/2)y1 + (1/2)y2. And it goes on like this.
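The procedure just described can be simulated directly. Here is a minimal sketch in Python (not the lecture's code; ties between equally good pure best responses are broken by lowest index, which is one arbitrary convention):

```python
# Fictitious play for a zero sum matrix game: in each round both players
# play a pure best response to the empirical average of the opponent's
# past choices (ties broken by lowest index, an arbitrary convention).

def fictitious_play(A, rounds):
    m, n = len(A), len(A[0])
    row_counts = [0] * m        # how often player 1 played each row
    col_counts = [0] * n        # how often player 2 played each column
    i = 0                       # player 1's arbitrary first move
    for _ in range(rounds):
        row_counts[i] += 1
        # Player 2 (minimizer): pure best response to player 1's history.
        j = min(range(n),
                key=lambda c: sum(row_counts[r] * A[r][c] for r in range(m)))
        col_counts[j] += 1
        # Player 1 (maximizer): pure best response to player 2's history.
        i = max(range(m),
                key=lambda r: sum(A[r][c] * col_counts[c] for c in range(n)))
    # Empirical averages of both players' choices.
    return ([c / rounds for c in row_counts],
            [c / rounds for c in col_counts])

# Matching pennies: the empirical averages should approach (1/2, 1/2).
A = [[1, -1],
     [-1, 1]]
x_bar, y_bar = fictitious_play(A, 10000)
print(x_bar, y_bar)
```

With this many rounds the printed frequencies come out very close to [0.5, 0.5] for both players, consistent with the matching pennies equilibrium seen earlier in the course; Robinson's theorem guarantees this convergence for any zero sum game.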
So every time a player makes a decision, he looks at the empirical behavior of the other player, how often each pure strategy has been played; from that he forms an opinion, and he plays a pure best response to that opinion. This is what is known as fictitious play. So, to recall: in fictitious play, players form an opinion about the other player's strategy, this opinion is the empirical average of the observed plays, and they choose a pure best response to that empirical average. The convergence statement is this: the empirical average (x1 + x2 + ... + xn)/n of player 1's choices converges to some x*, and similarly the empirical average of player 2's choices converges to some y*. The theorem proved by Robinson is that (x*, y*) is a saddle point equilibrium. The convergence proof is actually lengthy; at this moment we will not go into it, but we will see an example illustrating the result. So let us consider matching pennies. This is a very simple example: each player has two pure strategies, H and T; if the choices match, player 1 gets 1 unit from player 2, and if they do not match, player 2 gets 1 unit from player 1; it is a zero sum game of course. How does fictitious play go here? Say in the first round player 1 plays H. Then player 2 plays T, since T is his best response to H. In the next round player 1, having seen player 2 play T, plays T himself, since T is his best response to T. Now player 2 sees that player 1 has played H once and T once, so his opinion is that player 1 plays H with probability half and T with probability half, and he can play anything that is a best response to that. Let us look at the payoff player 2 faces: against a choice y, the payoff to player 1 is pi((1/2)H + (1/2)T, y) = (1/2) pi(H, y) + (1/2) pi(T, y). If y = H, this is (1/2)(1) + (1/2)(-1) = 0, and if y = T it is (1/2)(-1) + (1/2)(1) = 0. So whichever player 2 chooses, H or T, he concedes the same payoff; both T and H are best responses to (1/2)H + (1/2)T. So he will pick one of them; let us say he picks T. Now player 1 has seen player 2 play T both times, so his best response is still T and he continues to pick T. Next, player 2 sees that player 1 has played H once and T twice, so his opinion is (1/3)H + (2/3)T. Since player 1 has played T more often than H, player 2's best response is to play something not equal to T, that is, H. So he plays H. You go on like this, and in fact it is an interesting exercise to see that the empirical average converges to (1/2)H + (1/2)T for player 1, and here also to (1/2)H + (1/2)T for player 2. One can actually run these iterations many times and form an opinion about how it goes. One very important point is that the convergence rate is actually very slow; this is not a fast method. Nevertheless, it is a good method in that the empirical averages do converge to a saddle point equilibrium, and for any zero sum game this convergence automatically happens; this is the proof due to Robinson. In the next sessions we will explore further properties of fictitious play and study some of its convergence properties. Okay, with this we will stop this session. We will continue in the next session.