Last time we started the study of zero-sum games. These were games involving two players, with finitely many strategies for each player, and we said that they can be represented in matrix form. A zero-sum game has the property that the payoffs of the players sum to 0, so you can represent the game using only one set of payoffs. So, a zero-sum game was represented using a matrix $A$, where the rows of the matrix were the strategies of player 1 and the columns were the strategies of player 2. Player 1 wanted the least value in the matrix, so he was the minimizing player, and player 2 was the maximizing player.

We then introduced something called a security strategy. The security strategy for the row player was a row $i^*$ such that the maximum damage he could suffer when he plays $i^*$ is no worse than what he could suffer under any other row: $\max_j a_{i^* j} \le \max_j a_{ij}$ for all $i$. Similarly, for the column player we had a security strategy $j^*$, defined with the property that the maximum damage he could suffer, which is the minimum over $i$ of $a_{i j^*}$, satisfies $\min_i a_{i j^*} \ge \min_i a_{ij}$ for all $j$.

The significance of security strategies, I said, was that they bound the loss that a player could incur. In other words, if the row player played $i^*$, then the worst-case payoff that he could get was the number $\max_j a_{i^* j}$, and, to recall, this was called the security level.
The worst-case payoff he could get was this security level because the worst case arises when the column player plays the $j$ that maximizes $a_{i^* j}$. If the column player played any other $j$, the row player would only be better off, because this was the maximum and the row player is looking for the least possible value in the matrix. So, in particular, the security strategy satisfies the property that $\max_j a_{i^* j} \ge a_{i^* j}$ for all $j$. What is the term on the right? It is the row player's objective when he plays $i^*$ and the column player plays $j$. So, when the column player plays any $j$, what the row player gets is only better than the left-hand side, which is his security level. The security strategy guarantees him a certain level of his objective: he always gets a number no higher than this one. That is why we gave it a notation, $\overline{V}$, where $\overline{V} = \min_i \max_j a_{ij}$.

There was an analogous statement for the column player: $\min_i a_{i j^*} \le a_{i j^*}$ for all $i$. This means that whatever strategy the row player plays, the column player is guaranteed a payoff greater than or equal to the left-hand side, which is his security level, $\underline{V} = \max_j \min_i a_{ij}$. So, in other words, if the row player and the column player play $i^*$ and $j^*$, they are each guaranteed a payoff at least as good as their security levels. That was what we discussed last time.
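For reference, the recap above can be collected into one display. This is only a restatement of what was said, for the convention that the row player minimizes and the column player maximizes; the $\arg\min$ and $\arg\max$ notation is an assumption of this summary and was not used on the board.

```latex
\begin{align*}
\overline{V} &= \min_i \max_j a_{ij}, & i^* &\in \arg\min_i \max_j a_{ij}, \\
\underline{V} &= \max_j \min_i a_{ij}, & j^* &\in \arg\max_j \min_i a_{ij}, \\
a_{i^* j} &\le \overline{V} \ \text{ for all } j, & a_{i j^*} &\ge \underline{V} \ \text{ for all } i.
\end{align*}
```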
Then we discussed saddle points, which I will come to in a moment. But what I want to do for a little while is give you another interpretation of the significance of security strategies and security levels. So far we have been assuming that the players play simultaneously: there is no communication between the players, and a player is not aware of what his opponent has played. But now suppose there is a definite order of play. Suppose player 1 plays first, and then player 2 observes player 1's action and responds to it. So, player 1 chooses a row, player 2 gets to see what that row is and then decides what column he should play. If there is such a definite order of play, what should the players play? The player moving first should play his security strategy. Why is that? Why would the row player play his security strategy?

Notice this. Suppose player 1 plays first and picks a row $i$. Player 2 can now observe that row $i$ has been played. So, what are the choices for player 2? He has to pick a column, but his payoff is now well defined, because it is going to be $a_{ij}$ where $i$ has already been declared to him. So $i$ is known, and he looks over the columns and asks which column gives him the best payoff. Player 2 will respond with the $j$ that maximizes $a_{ij}$, with $i$ known, remember. In other words, player 2 is just going to maximize $a_{ij}$ over $j$.
Now, this is what happens when player 1 plays a row $i$. But player 1 knows this is what is going to happen, because player 2 is going to maximize $a_{ij}$. So he can choose his row $i$ in anticipation of how player 2 is going to respond. What row should he play in that case? He should play the row that minimizes this term: he should choose the row $i^*$ that minimizes $\max_j a_{ij}$. So, another interpretation of the security strategy is that it is the strategy the player would have played if he were playing first. If the row player were playing first he would have played $i^*$; if the column player were playing first he would have played his corresponding security strategy $j^*$.

To repeat: player 1 plays first and declares his $i$, and player 2 responds to that with a $j$. Knowing that this is going to be the response, player 1 chooses the $i$ that minimizes the prospective response that will come after he chooses it. If this were in fact the game, where player 1 has to play first and player 2 has to follow, then this is in fact the solution; there is not much else left to do as far as solving the game is concerned. So the security strategy is the strategy to play if there is a definite order of play.
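The sequential reasoning above can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture: the function names are hypothetical, indices are zero-based, and the matrix is the 3x3 one used in the worked example later in this lecture.

```python
def row_player_moves_first(A):
    """Row player (minimizer) commits to a row; the column player
    (maximizer) observes it and replies with his best column."""
    best = None
    for i, row in enumerate(A):
        # Follower's best reply to row i: a column with the largest entry.
        j = max(range(len(row)), key=lambda c: row[c])
        # Leader keeps the row whose worst-case entry is smallest.
        if best is None or row[j] < best[2]:
            best = (i, j, row[j])
    return best


def column_player_moves_first(A):
    """Column player (maximizer) commits to a column; the row player
    (minimizer) observes it and replies with his best row."""
    best = None
    for j in range(len(A[0])):
        # Follower's best reply to column j: a row with the smallest entry.
        i = min(range(len(A)), key=lambda r: A[r][j])
        # Leader keeps the column whose worst-case entry is largest.
        if best is None or A[i][j] > best[2]:
            best = (i, j, A[i][j])
    return best


A = [[4, 0, -1],
     [0, -1, 3],
     [1, 2, 1]]

print(row_player_moves_first(A))     # (2, 1, 2): row 3, column 2, payoff 2
print(column_player_moves_first(A))  # (1, 0, 0): row 2, column 1, payoff 0
```

In both cases the player who moves first ends up playing his security strategy and receives exactly his security level.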
The solution of the game would be that the player playing first plays a security strategy and the other player just responds to that. Yes, and whatever you assume about player 2 knowing that player 1 is rational, it makes no difference, because eventually player 2 faces what is called a fait accompli: his options are locked once player 1 has declared his strategy. Player 2 just has to pick the column that gives him the best payoff in that row. That is it.

So the security strategy can be interpreted as the right solution for a game where play is sequential. We will talk more about sequential play much later; right now this is just a device for interpreting security strategies: the security strategy is what you would have played if the game were sequential, with a predefined, definite order of play.

Why am I saying that this is the right solution for this problem? You can think of it this way: if player 1 plays $i^*$, he is guaranteed that he will not come back with any regret. Player 2 will respond with a column that maximizes $a_{ij}$ with $i = i^*$. Given that this is how player 2 was going to respond, it makes sense that player 1 picks the $i^*$ that minimizes $\max_j a_{ij}$. If he had picked any other $i$, he could in hindsight have regretted it: knowing that this is how player 2 was going to respond, I could have played differently.
So, in other words, this is the right solution concept because ex post, after play has played out, there is nothing for the player to say that he could have done better. Now, if there is no definite order of play, this situation changes. So, let us do an example. Consider this game, where player 2 picks the column and player 1 picks the row:

        4    0   -1
        0   -1    3
        1    2    1

Can you find the upper and lower security levels? What are $\overline{V}$ and $\underline{V}$, and what are the security strategies for the two players?

What is $\overline{V}$ here? To find it, you look for the maximum over the columns in every row, and then look for the row that gives you the least value of this maximum. So, let us just do that. I will write the max over the columns next to each row. In the first row it is 4. In the second row it is 3. In the third row it is 2. Now you look for the minimum of these numbers, so $\overline{V}$ is equal to 2. What about $\underline{V}$? For $\underline{V}$, what I need to do is take the minimum over the rows in each column.
So, what is the minimum over the rows in each column? First column, 0; second column, minus 1; third column, minus 1. Then look at the maximum of these, which gives me $\underline{V} = 0$. And what are the security strategies? It is easy to see. What is the strategy that gives the row player his security level? Row 3. And for the column player? Column 1.

Suppose now there is no definite order of play and the players actually play these two strategies $i^*$ and $j^*$. Let us take it from player 1's point of view. What happens if player 1 chooses $i^*$, that means row 3? What does the column player play? The column player would play column 2. But if the column player was playing column 2, what would the row player have played? Row 2, not row 3. Likewise, think about it from the column player's point of view. The security strategy is telling him to play $j^* = 1$, which is column 1. If he plays column 1, player 1 would respond with row 2, where the payoff is 0. But if player 1 was indeed playing row 2, what would the column player respond with? Column 3.

So, another way to think about it: the row player plays row $i^*$ thinking that the column player is going to do the worst possible damage, and the column player plays column $j^*$ thinking that the row player is going to do the worst possible damage. But the worst possible damage to $i^*$ is not $j^*$. What is the worst possible damage to $i^*$? It is column 2.
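The arithmetic of this example can be checked with a short script. This is only a sketch: the variable and helper names are illustrative, indices inside the code are zero-based, and the printed labels add 1 to match the row and column numbers used in the lecture.

```python
A = [[4, 0, -1],
     [0, -1, 3],
     [1, 2, 1]]

row_max = [max(row) for row in A]        # worst case for the row player, per row
col_min = [min(col) for col in zip(*A)]  # worst case for the column player, per column

v_upper = min(row_max)                   # upper security level
v_lower = max(col_min)                   # lower security level
i_star = row_max.index(v_upper)          # a security strategy for the row player
j_star = col_min.index(v_lower)          # a security strategy for the column player


def best_column(i):
    """Column player's best reply to row i (he maximizes)."""
    return max(range(len(A[i])), key=lambda j: A[i][j])


def best_row(j):
    """Row player's best reply to column j (he minimizes)."""
    return min(range(len(A)), key=lambda i: A[i][j])


print(row_max, col_min)                  # [4, 3, 2] [0, -1, -1]
print(v_upper, v_lower)                  # 2 0
print(i_star + 1, j_star + 1)            # 3 1  (row 3, column 1)

# Regret check: each player's best reply to the other's security strategy.
print(best_column(i_star) + 1)           # 2  (column 2, not column 1)
print(best_row(j_star) + 1)              # 2  (row 2, not row 3)
```

Since the best reply to row 3 is not column 1, and the best reply to column 1 is not row 3, the pair of security strategies leaves both players with something to regret.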
If these two players had chosen $i^*$ and $j^*$ and played this way, each has a reason for regret. Each would think: if I knew that this guy was going to play that, I would have played differently. If the column player had played $j^*$, player 1 would not have played $i^*$; player 1 would have played row 2, not row 3. Likewise, if player 1 is playing $i^*$, which is row 3, the column player would not have played column 1; he would have played column 2. So, if you think of $(i^*, j^*)$, just a pair of security strategies, as a solution, it is not satisfactory, simply because there is a reason for regret after the game. Somewhere you feel that the reasoning process the players have put in has not given you a satisfactory conclusion: each player thinks he could have done better given the circumstances. The reason this has happened in this particular example is that $\overline{V}$ is not equal to $\underline{V}$.

That brings me to the other theorem that we proved last time. What did we conclude? We concluded that if $\overline{V}$ is equal to $\underline{V}$, then there exists a saddle point. What is a saddle point? A saddle point is a pair of strategies $(i^*, j^*)$ such that $a_{i^* j} \le a_{i^* j^*} \le a_{i j^*}$. So, from the point of view of the row player, he is better off playing $i^*$ in response to $j^*$, and the column player is better off playing $j^*$ in response to $i^*$.
So, if $\overline{V}$ is equal to $\underline{V}$, then there exists a saddle point, and once there exists a saddle point, the kind of regret that we just saw does not exist. And the saddle point inequality has to hold for all $i$ and for all $j$. But there is a slight catch. I said $(i^*, j^*)$ is a saddle point, but what is the guarantee that this is in fact the same $(i^*, j^*)$ that we wrote for security strategies? I was talking of regret in terms of security strategies. What is very nice is that the saddle point in fact comprises security strategies. This is what we showed, and if you look at the proof that we discussed last time, you will see that the converse is also true: if there is a saddle point, then $\overline{V}$ and $\underline{V}$ are equal. So, the way I have written it, if $\overline{V}$ is equal to $\underline{V}$, then there exists a saddle point and the saddle point comprises security strategies. The converse is that if there exists a saddle point $(i^*, j^*)$, then $\overline{V}$ is equal to $\underline{V}$, and both are equal to $a_{i^* j^*}$.

Okay, any questions about this? Yes, a saddle point is the same as a Nash equilibrium when applied to zero-sum games. You can see it from this inequality: assuming the other player plays his star strategy, each player would not want to deviate from his own star strategy. So, I will write this here: a saddle point is equivalent to a Nash equilibrium for zero-sum games. Now, here is something to take note of.
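The saddle point condition and its link to the two security levels can be tested directly on small matrices. The sketch below assumes the lecture's convention (row player minimizes, column player maximizes); the function names and the 2x2 matrix are hypothetical, chosen only for illustration, and indices are zero-based.

```python
def saddle_points(A):
    """Return all (row, column) index pairs (i, j) with
    A[i][j'] <= A[i][j] <= A[i'][j] for every i' and j'."""
    points = []
    for i, row in enumerate(A):
        for j, a in enumerate(row):
            column = [r[j] for r in A]
            # Largest in its row: the column player cannot gain by deviating.
            # Smallest in its column: the row player cannot gain by deviating.
            if a == max(row) and a == min(column):
                points.append((i, j))
    return points


def security_levels(A):
    """Return (upper, lower) security levels of the matrix."""
    v_upper = min(max(row) for row in A)
    v_lower = max(min(col) for col in zip(*A))
    return v_upper, v_lower


no_saddle = [[4, 0, -1],    # the lecture's example
             [0, -1, 3],
             [1, 2, 1]]
with_saddle = [[1, 2],      # hypothetical matrix with a saddle point
               [3, 4]]

print(security_levels(no_saddle), saddle_points(no_saddle))      # (2, 0) []
print(security_levels(with_saddle), saddle_points(with_saddle))  # (2, 2) [(0, 1)]
```

In the second matrix the two levels coincide, a saddle point exists, and its entry equals the common level, in line with the theorem and its converse.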
The way the saddle point is defined requires you to define the pair. In fact, this definition of a saddle point is exactly what you encounter in multivariable calculus, optimization, and so on. So, the saddle point requires you to define the pair $(i^*, j^*)$. Security strategies do not require you to concern yourself with what the other player is going to play. A security strategy was defined this way: the $i^*$ that minimizes $\max_j a_{ij}$, and the $j^*$ that maximizes $\min_i a_{ij}$. So, I do not need to know the other component of the pair if I want to find a security strategy. I do not need $j^*$ to find $i^*$, and likewise I do not need $i^*$ to find $j^*$. A saddle point, on the other hand, is always defined with $i^*$ and $j^*$ as a pair: the pair is called a saddle point if together they satisfy these two inequalities. So, a saddle point is in some sense a more demanding concept, whereas a security strategy is something each player can compute on his own. I do not even need to take the view of an observer; each player just computes a security strategy by his own calculation. So, that is one thing.
So, security strategies can be computed by the players independently, while a saddle point is a joint definition. But when $\overline{V}$ is equal to $\underline{V}$, the two coincide: a saddle point comprises security strategies and vice versa. This means that if the game satisfies $\overline{V} = \underline{V}$, the players can reason about the game in some sense independently. Each player just thinks of the worst that could happen and plays accordingly, and that turns out to be zero regret, or satisfactory, at the end of it all.

The reason I brought this up is the comparison with the Nash equilibrium in general. The Nash equilibrium, remember, is defined as a profile: a profile of strategies for the players such that no player has an incentive to deviate. For that, the entire profile has to be defined together; you cannot define each player's action independently. If you go back and think about the deer-rabbit game, for example, I cannot think of one player's strategy as deer and another player's strategy as rabbit and treat those two as separate. Deer-deer is a Nash equilibrium there, rabbit-rabbit is a Nash equilibrium, but deer-rabbit and rabbit-deer are not.

So, this is again a peculiar structure of the zero-sum game. The zero-sum game has all these nice properties essentially because the players are in mortal combat: they are basically just out to kill each other, and any gain to one is a loss to the other. That is why all this structure prevails. Here is another corollary of all this.
Whenever $\overline{V}$ is equal to $\underline{V}$, a saddle point must comprise security strategies. But, as I said, security strategies can be computed by each player separately, which means the two quantities $\overline{V}$ and $\underline{V}$ can be computed by the players separately. In other words, they are properties of the matrix itself; they are not properties of a specific saddle point. The saddle point is just a pair that satisfies the inequalities, whereas whenever a saddle point exists, its value ends up being equal to $\overline{V}$ and $\underline{V}$, and these two are functions of the matrix. What does this mean? It means all saddle points must have the same value, because they all end up being equal to $\overline{V}$ and $\underline{V}$. So, the corollary here is: all saddle points have the same value.

This is significant in itself, because it says that even if there are multiple saddle points in the game, it does not matter which one you pick; the value that each player gets is the same regardless. Again, contrast with the deer-rabbit game: deer-deer gave you one payoff and rabbit-rabbit gave you a different payoff, so there were two different Nash equilibria that gave two different payoffs to both players. In a zero-sum game, on the other hand, regardless of the structure of the matrix, once there is a saddle point, all saddle points give you the same value.
So, for us as observers of the game, this is very useful. I may not be able to predict the saddle point, but I can certainly predict the value each player is going to get. I can say something very definite about that. So, all saddle points have the same value. There is another corollary of what I wrote on the previous page.

Suppose there are two saddle points: $(i_1, j_1)$ is a saddle point and $(i_2, j_2)$ is another saddle point. What does this mean? Every saddle point must comprise security strategies, which means $i_1$ is a security strategy for player 1 and $i_2$ is also a security strategy for player 1; $j_1$ is a security strategy for player 2 and $j_2$ is also a security strategy for player 2. But we also said that if there is a saddle point, then every pair of security strategies is also a saddle point, which means $(i_1, j_2)$ is also a saddle point and $(i_2, j_1)$ is also a saddle point.

For this you will have to go back to the proof that we showed at the end of the last class. Essentially, if $\overline{V}$ equals $\underline{V}$, there is a saddle point and it comprises security strategies. Conversely, if a saddle point exists, then $\overline{V}$ equals $\underline{V}$, and once $\overline{V}$ equals $\underline{V}$, any pair of security strategies you put together has to be a saddle point, because security strategies give you $\overline{V}$ and $\underline{V}$. So, if $(i_1, j_1)$ and $(i_2, j_2)$ are saddle points, then so are $(i_2, j_1)$ and $(i_1, j_2)$. This is what is called the interchangeability of saddle points.
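Both corollaries, equal values and interchangeability, can be observed on a matrix with several saddle points. The matrix below is a hypothetical one constructed for this purpose and is not from the lecture; the helper name is illustrative, and indices are zero-based.

```python
from itertools import product


def saddle_points(A):
    """Return all (row, column) pairs that are pure saddle points.

    Lecture convention: the row player minimizes and the column player
    maximizes, so a saddle entry is the largest in its row and the
    smallest in its column.
    """
    points = []
    for i, row in enumerate(A):
        for j, a in enumerate(row):
            column = [r[j] for r in A]
            if a == max(row) and a == min(column):
                points.append((i, j))
    return points


# Hypothetical matrix chosen to have several saddle points.
A = [[2, 1, 2],
     [2, 0, 2],
     [3, 5, 4]]

points = saddle_points(A)
values = {A[i][j] for i, j in points}

# Interchangeability: mixing the row of one saddle point with the
# column of another must again give a saddle point.
interchangeable = all(
    (i1, j2) in points and (i2, j1) in points
    for (i1, j1), (i2, j2) in product(points, repeat=2)
)

print(points)           # [(0, 0), (0, 2), (1, 0), (1, 2)]
print(values)           # {2}
print(interchangeable)  # True
```

Rows 1 and 2 are both security strategies for the row player, columns 1 and 3 are both security strategies for the column player, and every combination of them is a saddle point with value 2.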
Again, remember that in the deer-rabbit game, deer-deer was a Nash equilibrium and rabbit-rabbit was a Nash equilibrium, but deer-rabbit is not. In a zero-sum game, you see what has happened: you can take one player's strategy from one saddle point, combine it with the other player's strategy from another saddle point, and the combination still remains a saddle point. This kind of mix and match of the players' strategies is possible essentially because a saddle point is arrived at by both players reasoning independently. You do not have to define the pair together if the saddle point exists, because it comes out of security strategies.

Yes, I think having multiple saddle points should be possible in general; I do not know if I have a numerical example. But I can give a pathological example: I can just repeat the last row. I create one more row by copying the last row here, and then the security strategy would be either row 3 or row 4. This is a bad example, though, because there is no saddle point here to begin with; here we did not have $\overline{V}$ equal to $\underline{V}$. Nonetheless, just imagine a game where you had a saddle point and you duplicated the row, or the column, that had the saddle point; you will get this particular property. That is of course a very contrived example, but mathematically there could very well be multiple rows that give you the same worst-case payoff, or likewise multiple columns that give the same worst-case payoff.
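The duplicated-row construction described here is easy to try out. The sketch below uses a hypothetical 2x2 matrix that does have a saddle point (the lecture's 3x3 example does not), with the same convention as before: row player minimizes, column player maximizes, indices zero-based.

```python
def saddle_points(A):
    """All (row, column) pairs whose entry is the largest in its row
    and the smallest in its column."""
    return [
        (i, j)
        for i, row in enumerate(A)
        for j, a in enumerate(row)
        if a == max(row) and a == min(r[j] for r in A)
    ]


base = [[1, 2],
        [3, 4]]                 # hypothetical; saddle point at row 1, column 2

# Copy the row that contains the saddle point and put it on top.
duplicated = [list(base[0])] + [list(row) for row in base]

print(saddle_points(base))        # [(0, 1)]
print(saddle_points(duplicated))  # [(0, 1), (1, 1)]
```

The duplicated game has two saddle points with the same value, and the row player is indifferent between the two copies of his security strategy.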
So, because of this interchangeability property, how to play in a zero-sum game has a very simple answer. If $\overline{V}$ is equal to $\underline{V}$, all you need to do is play your security strategy, that is it. It does not matter which security strategy the other guy is going to play; you would get the same payoff regardless. So, this is essentially how you would reason if $\overline{V}$ is equal to $\underline{V}$ for a zero-sum game.