Here is an application of mixed strategies in a zero-sum game, taken from a tale in the Sherlock Holmes stories. The story is as follows. Holmes is leaving London and wants eventually to get to a place called Dover. Dover is on the coast of England, and from there you can get a ship to the European continent. That is where he was planning to go, but as the train is pulling out of the platform, his nemesis, a man called Moriarty, is standing on the platform and sees that Holmes is on the train. Now Holmes has a ticket to Dover. He can go all the way to Dover, that is one option, or he can get off at Canterbury, which is the only intermediate stop. He knows that Moriarty is going to follow him, and there are two possibilities: either Moriarty takes another train and gets off at Canterbury, or he takes another train and gets off at Dover. Based on what actually happens, here are the various ways things could work out. The payoff for Holmes is the probability of his survival, and Holmes is looking to maximize this number. If Holmes reaches Canterbury and Moriarty also reaches Canterbury, then Holmes is killed for sure, so his probability of survival is zero. If Holmes reaches Canterbury and Moriarty reaches Dover, then Moriarty can retrace his steps from Dover to Canterbury and find Holmes somewhere in the country.
In that case we will assume that Holmes gets killed with probability 0.5, so his probability of survival is 0.5, and the payoff for Moriarty is therefore minus 0.5. If Holmes reaches Dover but Moriarty gets off at Canterbury, Holmes survives for sure. His probability of survival is 1, so the payoff that Holmes receives is 1 and the payoff that Moriarty receives is minus 1. Actually, since the game is zero sum, I do not need the second entry at all, so let us remove it and write only Holmes's payoff. The probability of survival for Holmes is 0 here, 0.5 here, 1 here, and if they both reach Dover the probability of survival for Holmes is again 0. Moriarty wants to reduce this probability as much as possible and Holmes wants to maximize it. Now think of the following situation. Suppose Moriarty knew what Holmes would do. Then what is the probability of survival for Holmes? Zero. If Moriarty knew where Holmes was headed, he would just follow him there and kill him. If he knows Holmes is going to get off at Canterbury, he goes to Canterbury and kills him there. If he knows Holmes is going to Dover, he goes to Dover and finds him there. So if Moriarty knows what Holmes is going to play, then Holmes is killed for sure. This is where mixed strategies become useful. Instead of doing this, Holmes rolls a fair die. If it shows one or two, he goes to Dover, and if it shows any of the other numbers, he goes to Canterbury.
Since he goes to Dover only if the die shows one or two, he goes to Dover with probability one-third and to Canterbury with probability two-thirds. Let us now calculate the probability of survival for Holmes, looking at the two options from the point of view of Moriarty. Holmes has a two-thirds probability of Canterbury and a one-third probability of Dover. Let us write out the payoffs once again: 0 and 0.5 in the first row, where Holmes gets off at Canterbury, and 1 and 0 in the second row, where Holmes goes to Dover. Now let us look at the payoff that Moriarty sees if this is what Holmes is going to play. He does not know where Holmes is now headed. He just knows that Holmes is going to Dover with probability one-third and to Canterbury with probability two-thirds. I think I was not very clear earlier, so here is the main point. If Moriarty knows what Holmes is going to play in the sense that he knows where Holmes is headed, then he will just follow. Now let us again assume that Moriarty knows what Holmes is going to play, but what is Holmes going to play? He is going to play a roll of the die. Moriarty knows that if the die shows 1 or 2, Holmes goes to Dover, and if it shows 3, 4, 5 or 6, he goes to Canterbury. So what Moriarty knows is that Holmes is going to play Dover with probability one-third and Canterbury with probability two-thirds.
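The setup so far can be written down concretely. The following is a minimal sketch, not part of the lecture itself: the names `A`, `pure_security_level` and `die_rule_distribution` are hypothetical, and the row and column ordering (Canterbury first, Dover second) is an assumption made for illustration. It records the payoff table, confirms that any pure plan known to Moriarty guarantees Holmes nothing, and converts the die rule into probabilities.

```python
from fractions import Fraction

# Rows: Holmes's choice; columns: Moriarty's choice.
# Order in both cases (an assumption for this sketch): Canterbury, Dover.
# Entries: Holmes's probability of survival, as stated in the lecture.
A = [
    [Fraction(0), Fraction(1, 2)],  # Holmes gets off at Canterbury
    [Fraction(1), Fraction(0)],     # Holmes rides on to Dover
]

def pure_security_level(payoff):
    """Best survival probability Holmes can guarantee with a pure plan
    when Moriarty knows the plan and picks the worst column for Holmes."""
    return max(min(row) for row in payoff)

def die_rule_distribution():
    """Holmes's rule: a fair die showing 1 or 2 means Dover, else Canterbury."""
    dover_faces = sum(1 for face in range(1, 7) if face in (1, 2))
    p_dover = Fraction(dover_faces, 6)
    return {"Canterbury": 1 - p_dover, "Dover": p_dover}

print(pure_security_level(A))    # 0
print(die_rule_distribution())   # Canterbury with 2/3, Dover with 1/3
```

Exact fractions are used instead of floats so that the one-third and two-thirds in the lecture appear without rounding.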
With this, let us see what the payoffs are for Moriarty. If Moriarty goes to Canterbury, Holmes's probability of survival is two-thirds into 0 plus one-third into 1, which is just one-third. And if Moriarty goes to Dover? It is two-thirds into 0.5 plus one-third into 0, which is again one-third. So what has happened as a result? Moriarty has been left ambivalent. Even if he knows the probabilities with which these things are being chosen, he does not know which one is actually going to be played, and both of his options are now equally good for him. So what is the best response for Moriarty? He can go to Canterbury. He can go to Dover. He can go to Canterbury with probability 0.5 and to Dover with probability 0.5. He can do whatever he wants, basically. All of them are equally good, right? What Holmes has done is make Moriarty ambivalent, or indifferent, between his alternatives. So this is point one: every strategy of Moriarty is a best response. Now, keeping in mind that Moriarty will play some best response, suppose Holmes has to decide the probability with which he should pick Canterbury and the probability with which he should pick Dover. What Holmes would effectively be doing is trying to find the max min of this value.
He would look at the worst-case damage that Moriarty could do to him and then try to minimize that worst-case damage, right? Now, what we know is that this max min is actually equal to the min max. That is what we showed in the previous class. Therefore you get a saddle point, and a saddle point would involve a distribution for Moriarty such that, when Moriarty plays that distribution, it is optimal for Holmes to continue to play his own distribution. What I will show you now is that this two-thirds, one-third is in fact the max min, or equivalently Holmes's security strategy. You can actually check this. As you saw, any strategy is a best response for Moriarty, so Moriarty can respond with anything. Let us take one particular strategy for Moriarty: suppose he plays Canterbury with probability one-third and Dover with probability two-thirds. Now let us see what the probability of survival is. If Holmes goes to Canterbury, it is one-third into 0 plus two-thirds into 0.5, which is one-third. If Holmes goes to Dover, it is one-third into 1 plus two-thirds into 0, and you again find one-third. So what has happened as a result of this? Holmes has come up with a roll of the die, and Moriarty has also come up with a coin toss or a roll of a die. Essentially, both have come up with mixed strategies such that, given one guy's mixed strategy, the other guy becomes indifferent between what to choose. Right?
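The two indifference calculations above can be checked mechanically. This is a small sketch under the same payoff table as in the lecture; the function names `survival_vs_each_column` and `survival_for_each_row` are hypothetical, and the ordering (Canterbury, Dover) for both players is an assumption of the sketch.

```python
from fractions import Fraction

# Holmes's survival probability; rows = Holmes, columns = Moriarty,
# both ordered (Canterbury, Dover). The ordering is an assumption of this sketch.
A = [
    [Fraction(0), Fraction(1, 2)],
    [Fraction(1), Fraction(0)],
]

holmes = [Fraction(2, 3), Fraction(1, 3)]    # Canterbury 2/3, Dover 1/3
moriarty = [Fraction(1, 3), Fraction(2, 3)]  # Canterbury 1/3, Dover 2/3

def survival_vs_each_column(payoff, row_mix):
    """Holmes's survival probability against each pure choice of Moriarty."""
    return [sum(row_mix[i] * payoff[i][j] for i in range(len(payoff)))
            for j in range(len(payoff[0]))]

def survival_for_each_row(payoff, col_mix):
    """Holmes's survival probability for each of his pure choices,
    when Moriarty plays the mixed strategy col_mix."""
    return [sum(payoff[i][j] * col_mix[j] for j in range(len(col_mix)))
            for i in range(len(payoff))]

# Moriarty is indifferent: both of his pure choices leave Holmes with 1/3.
print(survival_vs_each_column(A, holmes))
# Holmes is indifferent: both of his pure choices give him 1/3.
print(survival_for_each_row(A, moriarty))
```

Both lists contain the same number twice, which is exactly the indifference described in the lecture.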
Now, because the other guy is indifferent, he can pick any best response, but that best response also has to match up with what the first guy considered as his security strategy, his worst case. So each of them may as well play for the worst case. Effectively, two-thirds, one-third for Holmes and one-third, two-thirds for Moriarty becomes a saddle point for this game. And this is the nature of saddle points in a zero-sum game. What a saddle point tends to do is make the other player indifferent between a bunch of strategies. In particular, even a pure strategy becomes a best response, but that does not mean the player will actually play a pure strategy. For the other player to play a pure strategy, the strategy you started with also has to be a best response to that pure strategy. Got it? If Holmes plays two-thirds, one-third, Moriarty can play whatever he wants. But if Moriarty plays whatever he wants, will Holmes continue to play two-thirds, one-third? No. It is only when Moriarty plays one-third, two-thirds that Holmes would want to play two-thirds, one-third. So this is a saddle point, or a Nash equilibrium, of this game: two-thirds Canterbury, one-third Dover for Holmes, and one-third Canterbury, two-thirds Dover for Moriarty. So we have found a saddle point. You can argue this in many different ways. One of the reasons is that it equalizes the upper and lower values as well.
One-third, two-thirds for Moriarty gives you a value of one-third, and likewise two-thirds, one-third for Holmes gives you exactly the same value. That is another way of justifying that this is a saddle point. Now, given that this is what both of them are going to do, what is the probability of survival for Holmes? The probability of survival is just the average payoff, and you can compute that. The only nonzero terms are two-thirds into two-thirds into 0.5, where the 2 and the 0.5 cancel and you get 2 by 9, and one-third into one-third into 1, which is 1 by 9. Together that is 1 by 3. So, in short, by playing this mixed strategy Holmes survives with probability one-third. Earlier, he was surviving with probability 0. And remember, the point is that he survives with probability one-third even if Moriarty knows that this is what Holmes is going to do. Even if Moriarty knows that Holmes is going to roll a die and go to Dover with probability one-third and to Canterbury with probability two-thirds, even if he knows this whole plan, Holmes still survives with probability one-third. What is the reason for that? The main reason is the inherent nature of randomness. When you roll a die, Moriarty may know the probabilities with which the various things are going to play out, but he does not know what will actually play out. This is exactly what mixed strategies are doing. In fact, never mind Moriarty, even Holmes does not know where he is going to go until he rolls the die. Right?
He only knows that with a certain chance he is going to go here and with a certain chance he is going to go there. And because Moriarty does not know the outcome, Moriarty is in turn forced to do the same thing, and so they both effectively try to fool each other, to the point where they have no preference over their pure strategies anymore. Both pure strategies are just as good for each of them, because you have really made them indifferent. That is the ultimate ambiguity you can create in a game: the opposing player has no way to distinguish whether this is better or that is better. This is effectively what has happened here. By rolling the die, Holmes has ensured that he survives with probability one-third, and that is because even if the mixed strategy itself is known, the exact action that he will take is not known. That is the smoke screen that randomization creates. So this is one more example of how players can use randomization, and also of why you should allow players to randomize, because you can see that there is an actual strategic benefit in randomization. Now, let us play this out a little further and see how it shapes up. One of the things we just saw in this example was that when a player plays his saddle-point mixed strategy, any pure strategy of the opponent is as good as any other.
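The strategic benefit of randomization can be made concrete by computing what Holmes can guarantee for each mixing probability. The sketch below is illustrative only: the function names, the grid of candidate probabilities, and the (Canterbury, Dover) ordering are assumptions, and a grid search stands in for solving the max min exactly.

```python
from fractions import Fraction

# Holmes's survival probability; rows = Holmes, columns = Moriarty,
# both ordered (Canterbury, Dover). The ordering is an assumption of this sketch.
A = [
    [Fraction(0), Fraction(1, 2)],
    [Fraction(1), Fraction(0)],
]

def guaranteed_survival(p):
    """Worst-case survival when Holmes picks Canterbury with probability p
    and Moriarty, knowing p, replies with whichever stop hurts Holmes most."""
    vs_canterbury = p * A[0][0] + (1 - p) * A[1][0]
    vs_dover = p * A[0][1] + (1 - p) * A[1][1]
    return min(vs_canterbury, vs_dover)

def expected_survival(holmes, moriarty):
    """Average payoff when both players randomize independently."""
    return sum(holmes[i] * A[i][j] * moriarty[j]
               for i in range(2) for j in range(2))

# Hypothetical grid of candidate probabilities: 0, 1/60, 2/60, ..., 1.
grid = [Fraction(k, 60) for k in range(61)]
best_p = max(grid, key=guaranteed_survival)

print(best_p, guaranteed_survival(best_p))   # 2/3 1/3
print(guaranteed_survival(Fraction(1)))      # 0, a pure plan guarantees nothing
print(expected_survival([Fraction(2, 3), Fraction(1, 3)],
                        [Fraction(1, 3), Fraction(2, 3)]))  # 1/3
```

On this grid the guarantee peaks at two-thirds Canterbury, matching the security strategy in the lecture, and the average payoff at the saddle point is 2/9 + 1/9 = 1/3.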
Technically, what is happening here is a result that we have seen before. Remember, we wrote that $\min_{y \in Y} \max_{z \in Z} y^\top A z$ is the same as $\min_{y \in Y} \max_j (y^\top A)_j$, where $(y^\top A)_j$ is the j-th entry of the row vector $y^\top A$. What was the reason? For a fixed y, the inner objective is linear in z, so all you have to do is pick out the element of that row vector which is largest, and that is what the max over j does. What this is effectively saying is that if I fix a y, even the saddle-point y, the security strategy y, the best response to it is any pure strategy j for which $(y^\top A)_j$ is largest. And in particular, any mixing over those j's is also a best response, right? So once you fix one player's strategy, the other player's problem becomes linear and he just has to pick the entry that is largest. Any j that maximizes $(y^\top A)_j$ is a best response of the column player. The same is true for the minimizing player: for a fixed z, any pure strategy i that minimizes $(Az)_i$ is a best response of the row player. Is this clear? But that does not mean that you can restrict attention to pure-strategy best responses. A pure strategy j is a best response to a mixed strategy y, and a pure strategy i is a best response to a mixed strategy z, but that does not mean that i and j are going to be best responses to each other.
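Written out, the identity recalled above and the reason behind it look roughly as follows. This is a sketch under the assumption that Z is the set of probability vectors over the column player's pure strategies; the subscript notation for the j-th entry of the row vector is a presentational choice made here.

```latex
\begin{aligned}
y^\top A z &= \sum_j z_j \,(y^\top A)_j \;\le\; \max_j \,(y^\top A)_j
  \qquad \text{for every } z \in Z, \\
\max_{z \in Z} \, y^\top A z &= \max_j \,(y^\top A)_j , \\
\min_{y \in Y} \max_{z \in Z} \, y^\top A z &= \min_{y \in Y} \max_j \,(y^\top A)_j .
\end{aligned}
```

The inequality in the first line holds with equality exactly when z puts positive weight only on indices j that attain the maximum, which is why every mixture over the maximizing pure strategies is also a best response.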
So a saddle point will necessarily have the following property. Suppose j is a strategy that maximizes $(y^\top A)_j$, and suppose there are multiple such j's. Then in any saddle point, what would the column player be playing? The column player's mixed saddle-point strategy z has to be a best response. So let us fix y at $y^*$, which is one component of the saddle point. Any pure strategy j that maximizes $(y^{*\top} A)_j$ is a best response. Now, what about mixed strategies? What kind of mixed strategies will be best responses to this? A combination of those j's that maximize. If, for some j, the entry is not the largest, if it is even a little bit less than the largest, then it does not make any sense to give that component positive weight, because you are unnecessarily reducing your own payoff. What you would want to do is randomize over the best pure strategies, correct? So if I take y as $y^*$, then every best response z randomizes only over those j's that maximize this, and there can be multiple such z's. Out of those z's there will be one, $z^*$, such that if you fix that z, the best response y to it is a y that is supported only on the i's that minimize $(A z^*)_i$, and $y^*$ is such a y. A pair $y^*, z^*$ that keep each other in equilibrium in this way will therefore be a saddle point.
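The support property described above can be turned into a small check. The following is a sketch, not a definitive implementation: `is_saddle_point` is a hypothetical helper that follows the convention of this part of the lecture, where the row player (mix y) minimizes and the column player (mix z) maximizes. To reuse the Holmes example under that convention, the sketch assumes Moriarty is cast as the row player and Holmes as the column player, so the payoff table is transposed relative to how it was written on the board.

```python
from fractions import Fraction

def is_saddle_point(A, y, z):
    """Support check for a zero-sum game where the row player (mix y)
    minimizes y^T A z and the column player (mix z) maximizes it.
    Returns True when each mix puts weight only on pure best responses
    to the other mix."""
    m, n = len(A), len(A[0])
    col_values = [sum(y[i] * A[i][j] for i in range(m)) for j in range(n)]  # y^T A
    row_values = [sum(A[i][j] * z[j] for j in range(n)) for i in range(m)]  # A z
    best_col = max(col_values)   # what the maximizing column player can get
    best_row = min(row_values)   # what the minimizing row player can get
    z_ok = all(z[j] == 0 or col_values[j] == best_col for j in range(n))
    y_ok = all(y[i] == 0 or row_values[i] == best_row for i in range(m))
    return z_ok and y_ok

# Assumed casting for illustration: rows = Moriarty (minimizer),
# columns = Holmes (maximizer), both ordered (Canterbury, Dover).
# Entries are Holmes's survival probabilities.
A = [
    [Fraction(0), Fraction(1)],      # Moriarty at Canterbury
    [Fraction(1, 2), Fraction(0)],   # Moriarty at Dover
]
y_star = [Fraction(1, 3), Fraction(2, 3)]   # Moriarty's mix
z_star = [Fraction(2, 3), Fraction(1, 3)]   # Holmes's mix

print(is_saddle_point(A, y_star, z_star))                      # True
print(is_saddle_point(A, y_star, [Fraction(1), Fraction(0)]))  # False
```

The second call shows the point made earlier in the lecture: a pure strategy for Holmes is a best response to Moriarty's mix, but Moriarty's mix is not a best response to that pure strategy, so the pair is not a saddle point.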
In other words, in any saddle point, no player is going to give weight to a pure strategy that gives anything less than the best possible response. Each player distributes his probability over the pure strategies that are best responses to the other guy's strategy. Is this clear? So, in summary, if $y^*$ is what is chosen by the row player, then $z^*$ will be supported on the j's that maximize $(y^{*\top} A)_j$. And $y^*$ in turn will be supported on the i's that minimize $(A z)_i$ with z replaced by $z^*$. Now let us actually see an application of this, and how we could use this particular fact.