In this video, we're going to look at the equilibria of infinitely repeated games. So in order to talk about equilibria, we need to begin by asking: what are we going to mean by a pure strategy in an infinitely repeated game? You may like at this point to pause the video and see if you can answer that question before I go on and tell you the answer. If you need a hint, remember that the rule of thumb we've used to define strategies as we've changed game representations has been to say that a pure strategy should be whatever you would need to tell another person to play the game on your behalf and end up doing exactly what you would have done. So a pure strategy has always been the policy you would follow, making all of the decisions in the same way you actually would have made them; it's whatever you would need to communicate about that policy. Well, in an infinitely repeated game, your strategy is going to be a choice of action at every decision point you would face, which means an action you would take at every stage game. And bear in mind that when you take those actions, you get to reason about everything you've seen before in the game: you can remember all of your own previous actions, and you can also remember all of the actions taken in previous stage games by the other players. So really, a pure strategy is a mapping from every possible history to the action you would choose there. Clearly, this specifies an infinite number of action choices. So unlike the previous games we've looked at, extensive-form games and normal-form games, you're not even going to have a finite set of pure strategies in an infinitely repeated game. Nevertheless, to give you the sense that we can say interesting things about pure strategies even though the set of possible pure strategies is infinite, let me give you some examples of some famous pure strategies in infinitely repeated games.
So think about playing repeated prisoner's dilemma: I'm going to play the prisoner's dilemma game infinitely. One famous strategy is called tit for tat. It turns out that there were famous competitions where people submitted programs that played the prisoner's dilemma repeatedly against each other, and then they looked at how these programs did. Tit for tat is a strategy that famously did really well in those competitions. The way it works is that it starts out cooperating, and if it observes that its opponent defected in the previous round, then tit for tat defects in the next round. Then it goes back to cooperation: if it defected but its opponent cooperated, it responds to that by cooperating. So if tit for tat plays itself, it'll cooperate forever. But if it plays against somebody who defects, it's going to, intuitively, punish that defection by defecting itself in the next round. But it's very forgiving: it goes back to cooperation after one punishment. In contrast, the trigger strategy is a really mean-spirited version of tit for tat. It starts out by cooperating, but if its opponent ever defects, that's it: it's just going to defect forever. It's going to pull a trigger and say, I'm never going to forgive you. You've wronged me once; I'm going to wrong you back until the end of time. That's why we call it trigger. So you can see that in both of these cases, I've been able to describe how these strategies would respond to anything in the infinite history they might see, in algorithmic terms, without actually writing out the strategy in a formal language. So you can see that it is possible to think coherently about interesting strategies in infinitely repeated games. Now, of course, what we would really like to do as game theorists is describe the Nash equilibria of infinitely repeated games.
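These two strategies are simple enough to write down directly as functions from the history of the opponent's past moves to this round's action. Here is a minimal Python sketch; the labels "C" and "D" and the history-as-list representation are just illustrative choices, not anything fixed by the lecture:

```python
def tit_for_tat(opponent_history):
    # cooperate in the first round, then copy whatever the
    # opponent did in the previous round
    if not opponent_history:
        return "C"
    return opponent_history[-1]

def grim_trigger(opponent_history):
    # cooperate until the opponent has ever defected, then defect forever
    return "D" if "D" in opponent_history else "C"

# tit for tat punishes once, then forgives
print(tit_for_tat([]))          # "C"
print(tit_for_tat(["C", "D"]))  # "D"
print(tit_for_tat(["D", "C"]))  # "C"

# grim trigger never forgives a single defection
print(grim_trigger(["C", "D", "C", "C"]))  # "D"
```

Note that both strategies only ever consult the opponent's side of the history, even though in general a pure strategy may condition on the entire history of play.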
The kind of approach that we've taken in the past has been to first show how we can make an induced normal form out of a game. We figure out what a strategy is in the game, we show how to make the induced normal form, and then we just appeal to Nash's theorem and say: because we've got a normal-form game, we know that there is a Nash equilibrium, and everything goes through the way it did before. Well, that worked for us in the past because we always ended up with finite-sized induced normal forms. Unfortunately, because we now have an infinite number of pure strategies, we're going to get an infinity-by-infinity matrix, even in the two-player case. And so we no longer have something to which Nash's theorem applies, because we have an infinite-sized game, and Nash's theorem only works for finite games, meaning games whose matrices are finite: they contain a finite number of numbers. That means we don't have any reason, based on what we know so far, even to be sure that equilibria exist in these games. On the other hand, because there are an infinite number of strategies, there might even be an infinite number of pure-strategy equilibria. Now, we've seen in the past that it's possible to have an infinite number of mixed-strategy equilibria: for example, if in a normal-form game I have two pure strategies that I'm indifferent between, then any mixture between them can also be a best response for me, so in that sense I can have an infinite number of mixed-strategy equilibria. But here I can potentially have an infinite number of qualitatively different pure-strategy equilibria. That seems like a problem: we're not necessarily going to be able to list off the equilibria of a repeated game. Interestingly, there is still something that we can coherently do to give a satisfying answer about the Nash equilibria of infinitely repeated games.
And in this video, I'm going to tell you what it is. And actually, because it's so satisfying, I'm even going to prove this theorem to you, so you really will understand how this one works. Here's the idea of the theorem that we will eventually prove: we can characterize what payoffs can be achieved under equilibrium, and we can give an example of equilibrium strategies that result in those payoffs, without giving every equilibrium that results in those payoffs. So we're going to characterize which strategies are in equilibrium in infinitely repeated games via their payoffs: I'm going to be able to tell you precisely which payoff vectors are achievable in equilibrium. To do that, I need to give you a little bit of notation so we can talk about these payoff vectors. So this is our slide of notation; once we get through this, we'll have all the building blocks we need to prove our theorem. We're going to start with some n-player game. This is just our stage game, some normal-form game. And we're also going to start with some payoff vector, by which I mean the utilities that the players get in the game. Now, in this video, I'm going to talk about the average-reward case, where each of these numbers encodes the utility that I get on average by following my strategy in the infinitely repeated game. That's going to say that I care just as much about payoffs I get now as about payoffs I get a million iterations into the future. We can do something similar in the discounted-reward case, and that's going to be the topic of a different video. But for this video, we're going to talk only about the average-reward case, because the proof is a little bit easier to think about there, even though the discounted-reward case is maybe a bit more of a practical setting. We can get across the key ideas of both proofs with this proof, and this one's a bit easier.
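For reference, the average reward described here is commonly formalized as the limit of the running mean of the stage-game payoffs along the play of the game. This precise form, with lim inf guarding against plays where the plain limit does not exist, is a standard convention rather than something stated explicitly in the video:

```latex
% Average reward of player i along the play a^{1}, a^{2}, \dots
% of the repeated game:
r_i \;=\; \liminf_{k \to \infty} \, \frac{1}{k} \sum_{t=1}^{k} u_i\!\left(a^{t}\right)
```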
So let's do it. To do that, I need to remind you of the concept of the minmax value, which is something that we introduced in the context of zero-sum games, but which turns out to be very important for repeated games. So I'm going to remind you of what it is here. This is the mathematical definition of the minmax value, but let me start by saying in words what it means, because the nested min and max operators are a little bit hard to think about. Essentially, player i's minmax value is the amount of utility that player i is able to obtain for himself if all of the other players, who we call minus i, are completely unconcerned about their own utility and are instead just trying to hurt i as much as possible. So they all play the strategy profile, from their joint mixed-strategy space, such that when i responds by doing the best thing that he can for himself, i's utility is as low as possible. They're trying to minimize i's utility, given that i is maximizing in response to their minimization. And the number that comes out at the end is the amount of utility that i can get by doing as well as he can against everybody trying to hurt him as much as possible. That's i's minmax value. So intuitively, if I want to punish you as much as possible in a game, and you know that I'm trying to punish you, then the amount of utility that you can still get is your minmax value. Now, I will say that a payoff profile (remember, a payoff profile is a payoff amount for everybody) is enforceable if everybody's payoff in that profile is at least their minmax value. If anybody is getting less than their minmax value in a given payoff profile, I will say that that's an unenforceable payoff profile. And I will say that a payoff profile is feasible (I'll use this technical term, feasible) if the following condition is true.
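To make the nested min and max concrete, here is a brute-force sketch for a two-by-two stage game: the opponent sweeps a grid of mixed strategies, and we keep the one that makes the row player's best-response payoff as low as possible. The prisoner's dilemma payoff numbers (3, 0, 5, 1) are one common choice, used here purely for illustration; in general (and for more than two actions) one would solve this minimization as a linear program instead of a grid search.

```python
def minmax_value_2x2(U, steps=1000):
    """Row player's minmax value in a 2x2 game, by brute force.

    U[i][j] is the row player's payoff when she plays row i and the
    opponent plays column j.  The opponent searches over mixed
    strategies (q, 1 - q) on his two columns to minimize the row
    player's best-response payoff."""
    best = float("inf")
    for k in range(steps + 1):
        q = k / steps  # probability the opponent puts on his first column
        best_response = max(U[0][0] * q + U[0][1] * (1 - q),
                            U[1][0] * q + U[1][1] * (1 - q))
        best = min(best, best_response)
    return best

# Prisoner's dilemma, row player's payoffs: (C,C)=3, (C,D)=0,
# (D,C)=5, (D,D)=1.  The opponent punishes hardest by always
# defecting, and the best response to that is to defect too.
U = [[3, 0],
     [5, 1]]
print(minmax_value_2x2(U))  # → 1.0
```

So in the prisoner's dilemma, your minmax value is the mutual-defection payoff: that is the most you can guarantee yourself against an opponent bent on hurting you.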
Intuitively, what I want to say about feasibility is that it's possible to actually get that payoff profile by combining together payoffs in the actual game that's being played. Notice that enforceability doesn't actually guarantee that it's possible. I could say that the payoff profile "a million for me, a million for you" in prisoner's dilemma is an enforceable payoff profile, because it's above both of our minmax values, but there's no way we can actually each get a million in prisoner's dilemma: the numbers don't go that high. So feasibility is going to talk about what it's actually possible to do. And the way we're going to say this is that I'm going to restrict myself to rational numbers. I'm going to say that there must exist rational and non-negative values alpha-a such that, for every player i, I can express player i's element of this payoff vector as a sum, over all of the action profiles a in the normal-form game, of alpha-a times i's utility for that a; and, as the last part of the condition, these alphas must sum to one across all of the action profiles. So let me explain what this means. I have a normal-form game, and what I want to say is that I have some weight on each cell of the game; these are the alpha-a's, and they all sum to one. What I want to do is take my payoff from this cell weighted by the alpha for that cell, my payoff from the next cell weighted by the alpha for that cell, and sum all of those weighted utilities up, and then I get r-i. So what I want to say is that there exist alphas that I could use to blend together the payoffs in the actual game in such a way that I get r-i. And furthermore, that has to simultaneously be true for everybody else, and in particular it has to be true with the same alpha-a's for everybody else: I can come up with some alphas that get me my r-i, but then I have to use those same alphas to get r-j for player j.
I have to have the same weights on all of the cells here so that I get the right numbers for everybody. So let's think about the following game, which will give you an example of how feasibility works. Let's say I have a game that looks like this. In this game, I claim the payoff profile (1, 1) is feasible. The reason is that I can put a weight of 50% on this cell, a weight of 50% on this cell, and weights of zero on these cells. And you can see my payoff in this game is 50% times two, plus zero times zero, plus zero times zero, plus 50% times zero, which is one. The other player's payoff in this game is 50% times zero, plus zero times zero, plus zero times zero, plus 50% times two, which is also one. So that is a feasible payoff in this game. On the other hand, the payoff (2, 2) is not feasible in this game. The reason is that there's no way I can put weights on these cells that sum up to one and that give both of us two: for me to get a payoff of two, this weight would have to be one, because that's the only cell where I get any payoff at all; and for my opponent to get two, this other weight would have to be one; and then together they would sum to more than one. This condition over here would be violated, and so we just can't do it. So that's not a feasible payoff. Okay, now we're ready to prove the folk theorem. And it's kind of funny that it's called the folk theorem; it's like a folk song for mathematicians. A folk song is a song that everybody knows, but nobody really knows who wrote it first. And a folk theorem is a theorem that people all kind of knew and talked about before they got around to writing it down, and they're not really sure quite where it came from. So this is the folk theorem of game theory, and despite having uncertain origins, it's very important. The folk theorem basically tells us what the Nash equilibria of an infinitely repeated game are, in terms of these payoffs and in terms of the definitions that I just told you.
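This feasibility argument can be checked mechanically. The sketch below is hardcoded to a two-player game with four cells, with the payoffs (2, 0) in one corner cell and (0, 2) in the other as my reading of the game described above; it searches a grid of rational weight vectors alpha that sum to one:

```python
def feasible(r, cells, steps=100, tol=1e-9):
    """Search a grid of weight vectors (alpha_1, ..., alpha_4) summing
    to 1 for one that blends the four cells' payoffs into the target
    profile r.  cells[k] = (u1, u2): the two players' payoffs in cell k."""
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            for k in range(steps + 1 - i - j):
                alphas = (i / steps, j / steps, k / steps,
                          (steps - i - j - k) / steps)
                # both players' payoffs must blend with the SAME alphas
                u1 = sum(a * c[0] for a, c in zip(alphas, cells))
                u2 = sum(a * c[1] for a, c in zip(alphas, cells))
                if abs(u1 - r[0]) < tol and abs(u2 - r[1]) < tol:
                    return True
    return False

# the example game: I get 2 in the first cell, the other player
# gets 2 in the last cell, and all other payoffs are 0
cells = [(2, 0), (0, 0), (0, 0), (0, 2)]
print(feasible((1, 1), cells))  # True: weight 1/2 each on (2,0) and (0,2)
print(feasible((2, 2), cells))  # False: the weights would have to sum past 1
```

The grid only visits rational weights, which is exactly the restriction the definition makes anyway.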
So there are different folk theorems for different settings; this is the folk theorem for infinitely repeated games with average rewards. The folk theorem has two parts, which basically stems from the fact that I've made a restriction here to rational alphas. It turns out we don't actually need that restriction, but the math gets more annoying if we have to deal with real values, so to keep things simple, I'm doing this just for rational numbers. The folk theorem says that in any n-player game which we're gonna repeat infinitely, and for any payoff vector r-1 through r-n just like we've been talking about: first of all, if r is the payoff in a Nash equilibrium of the infinitely repeated game with average rewards, then what we can conclude about the payoff vector is that it has to be enforceable for every player. Remember, enforceable means each player's payoff is greater than or equal to that player's minmax value. So if r is the payoff in an equilibrium of the infinitely repeated game with average rewards, then r must be enforceable. That seems like kind of a weak thing to say. The second part of the folk theorem is more surprising. It says that, basically, that's the only restriction we need: if r is enforceable, and furthermore if it's feasible, meaning it's possible to make it out of the payoffs of the game, then r is the payoff in some Nash equilibrium. So together, what this says is that feasibility and enforceability are basically the whole story of Nash equilibrium. Enforceability seems like this very small thing: it says you're getting no less than the payoff you would get if everyone punished you as much as it's possible for them to punish you. And feasibility says it's possible to get these payoffs in the game at all. And the folk theorem says that's basically it: as long as you meet those two conditions, you've got a Nash equilibrium. That's basically all there is to say about Nash equilibrium here.
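Stated compactly, in the notation of this video, the two parts of the theorem read as follows (the explicit min-max formula for the minmax value is the one given earlier):

```latex
\paragraph{Folk theorem (average rewards).}
Let $G$ be an $n$-player stage game, let $r = (r_1, \dots, r_n)$ be a payoff
vector, and let $v_i = \min_{s_{-i}} \max_{s_i} u_i(s_i, s_{-i})$ be player
$i$'s minmax value.
\begin{enumerate}
  \item If $r$ is the payoff profile of some Nash equilibrium of the
        infinitely repeated $G$ with average rewards, then $r_i \ge v_i$
        for every player $i$; that is, $r$ is enforceable.
  \item If $r$ is both feasible and enforceable, then $r$ is the payoff
        profile of some Nash equilibrium of the infinitely repeated $G$
        with average rewards.
\end{enumerate}
```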
You'll notice there's a little bit of wiggle room between these two parts of the theorem statement, which has to do with the fact that we're talking about feasibility in the second part but not in the first. So we can't conclude that every Nash equilibrium is feasible. You might wonder why that is. It's just because for some equilibria the payoffs aren't expressible with rational weights: the alphas wouldn't be rational. That's the part I'm leaving out of this proof, but it's a technical point. The broad thing that you should remember about the folk theorem is that feasibility and enforceability together are essentially equivalent to Nash equilibrium in repeated games. So if you wanted to stop here, you now know what the folk theorem is, but I encourage you to keep watching the rest of the video and you'll see how we actually prove it. So let's begin by proving the first part: why is it that if a payoff is achievable in a Nash equilibrium, I can conclude that it must be enforceable? I'm going to prove this by contradiction. So I'm gonna begin by supposing that the payoff vector is not enforceable. If it's not, that means there must be some player i for whom his payoff r-i is strictly less than his minmax value. Now let me consider a situation in which this player i changes his strategy. Let's imagine that he instead deviates to playing b-i of s-minus-i given h (I'll tell you in a second what that means) any time he's in a history h of the repeated game. So what is this thing? Here s-minus-i is the strategy profile of the other players; let's assume that i gets to know what that is, because we're talking about a Nash equilibrium. And b-i is just his best response, for every history h, to the other players playing their strategy profile s-minus-i given h. So let's just consider the case where he best responds to what the other players are doing.
Well, by the definition of the minmax value, you have to get a payoff of at least your minmax value in every stage game if you follow the strategy that we just defined. Remember, the definition of the minmax value is that everyone is trying to hurt you as much as possible and you're best responding to that. So intuitively, if I'm getting less than my minmax value, that means I can't be best responding to everybody else: if they were trying to hurt me as much as possible, which is the worst case for me, and I best responded to them, I would get my minmax value. So I could change my strategy to best responding, and that would have to improve things for me, bringing me up to at least my minmax value. And because that would improve my utility, that's a better response for me than what I was doing before. So we've just constructed a profitable deviation for player i, and because a profitable deviation exists for him, the strategy profile being played before couldn't possibly have been a Nash equilibrium. And that contradicts the assumption we made at the beginning: we can't have both a Nash equilibrium and an unenforceable payoff vector. So you can see, we can conclude that if r is not enforceable, we can't have a Nash equilibrium, and that proves what we wanted to prove: being the payoff of a Nash equilibrium implies enforceability. That's part one, and it's the easy and less interesting direction. Let's now do the second, more interesting part, which says that all I need to assume is that the payoff profile is both feasible and enforceable, and that means I can find a Nash equilibrium achieving it. And what's interesting about this part is that we're going to do it by construction.
So I'm gonna show you how, given any feasible and enforceable payoff profile, you can build a set of strategies for all the players which are in equilibrium with respect to each other and which obtain exactly that average payoff for every player. First of all, let's introduce some bookkeeping notation that we'll use here. Since r is feasible, and since we've assumed that the alphas are rational, we can write each r-i as follows. Basically, we make up new variables beta-a and gamma, where we replace each alpha-a by beta-a divided by gamma. That's not hard to do, because we know the alphas are rational: we can write each alpha as a fraction, make gamma their common denominator, and set the betas appropriately, and then this expression is true. So that shouldn't be too surprising; it just follows from feasibility and the fact that the alphas are rational. Now I'm gonna construct a strategy profile, and then on the next slide I'm gonna argue that that strategy profile is in equilibrium, or that we can turn it into an equilibrium. Let's make a strategy profile that cycles through all of the outcomes, using cycles of length gamma, where each cycle repeats action profile a exactly beta-a times. So we have our game matrix here, and remember that the betas and gamma are all integers. For example, let's say gamma is seven, and let's say this cell has beta equal to two, so its alpha-a is two over seven. Let's say these next cells have zero over seven; I'll just write the betas from this point on: zero, one, two, zero, two, zero. Let's say that's what we had in this particular game. Then what the strategy would do is just cycle through. Let's give the actions names: A, B, C for player one's rows, and D, E, F for player two's columns.
So player one's strategy would be to play A, A, B, B, B, C, C, and then go back and keep doing that forever. And player two's strategy would be to play D, D, E, F, F, E, E, and then go back. And if the two players played those strategies together in a coordinated way forever, they would cycle through, hitting every action profile in this game exactly as often as the betas say. Let's denote such a sequence A-t. Now, let me construct the real strategies for the players that I'm gonna claim are in equilibrium here. Let's define the strategy s-i of player i to be a trigger version of this cycling behavior. If nobody deviates, then s-i tells the player to play just what A-t would tell them to play; so the players are gonna begin by cycling through these outcomes in just the way that I told you. But if one player notices that another player didn't do what they were supposed to do according to A-t, then they're gonna hit the trigger, and from that point on they're gonna play the minmax strategy against that player. That's just what this says here: if there's ever a period in which somebody does the wrong thing, then everybody's gonna gang up on that person forever and play the strategy profile that causes that person to get their minmax value. Let me just say that one more time to make sure we've all got it. A-t is a sequence constructed so as to hit every action profile exactly the number of times given by these beta integers, and to cycle through that forever. And the strategy we're interested in is a trigger version of that: everyone tries to do that, but if somebody does the wrong thing, everybody punishes them forever, holding them to their minmax value. Okay, so I wanna claim that this is a Nash equilibrium of the infinitely repeated game with average rewards.
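One way to generate the length-gamma cycle from the betas is to simply repeat each action profile beta-a times and read off each player's coordinate. Using the beta values from the example (two on (A, D), one on (B, E), two on (B, F), two on (C, E), and zero everywhere else, so gamma is seven):

```python
# beta weights from the example: each (player 1 action, player 2 action)
# profile is repeated beta times per cycle; profiles with beta = 0
# are simply omitted.  gamma = 2 + 1 + 2 + 2 = 7.
betas = {
    ("A", "D"): 2,
    ("B", "E"): 1,
    ("B", "F"): 2,
    ("C", "E"): 2,
}

cycle = [profile for profile, b in betas.items() for _ in range(b)]
p1_seq = [a1 for a1, a2 in cycle]  # player 1's actions over one cycle
p2_seq = [a2 for a1, a2 in cycle]  # player 2's actions over one cycle
print(p1_seq)  # ['A', 'A', 'B', 'B', 'B', 'C', 'C']
print(p2_seq)  # ['D', 'D', 'E', 'F', 'F', 'E', 'E']
```

The two sequences have to be read in lockstep: round t of the cycle pairs p1_seq[t] with p2_seq[t], which is exactly what makes every action profile come up the right number of times.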
So first, notice that if everyone does play according to this strategy, then everyone will get an average payoff of r-i, just as we wanted. Now, you might be thrown off by the fact that midway through a cycle they're not getting exactly r-i, but we're only interested in the limit. If you look at averages over periods of length gamma, so you look at what happens after gamma rounds, after two gamma, after three gamma, you'll be able to convince yourself that they really do get the right payoffs, because after every period of length gamma they get exactly the payoff r-i by construction. So indeed this does lead to the payoff that we're trying to get. What remains to show is that this is a Nash equilibrium, that nobody can gain by deviating. And indeed, I claim that it is. To show that, let's imagine that everyone else is playing according to this strategy and some player j deviates at some point. Well, if that happens, then for all time after that point, player j is gonna get his minmax payoff, and that's gonna render the deviation unprofitable, because we've assumed that this payoff profile is feasible and enforceable, which means he was already getting at least his minmax value. And because the punishment is gonna happen for all time, forever afterwards, it ends up dominating the average: everything that happened in the finite amount of time beforehand gets washed out of the average, and his average payoff turns out to be his minmax payoff, which, by enforceability, is less than or equal to r-j. So there's no way he can gain by such a deviation, and that suffices to prove the folk theorem. So what we've seen now is that, essentially, feasibility and enforceability characterize the payoff profiles that are achievable in Nash equilibrium of an infinitely repeated game with average rewards.
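The "washed out of the average" step is easy to see numerically. In the sketch below, the stage game is the prisoner's dilemma with the illustrative payoffs used earlier (mutual cooperation 3, temptation 5, mutual defection 1), the target profile is (3, 3) via the trivial length-one cycle "always cooperate", and player one deviates to always defecting against a trigger opponent. The one-shot gain of 5 is swamped by the eternal punishment, and the deviator's average payoff tends to the minmax value 1:

```python
# row player's payoffs in the stage game (hypothetical PD values)
payoff1 = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

T = 100_000
total_follow = 0    # player 1 follows the cycle (everyone cooperates)
total_deviate = 0   # player 1 defects every round against a trigger opponent
opp_saw_deviation = False
for t in range(T):
    total_follow += payoff1[("C", "C")]
    opp = "D" if opp_saw_deviation else "C"  # trigger: punish from round 1 on
    total_deviate += payoff1[("D", opp)]
    opp_saw_deviation = True                 # the deviation has been observed

print(total_follow / T)   # 3.0: the target average payoff
print(total_deviate / T)  # ≈ 1.0: the deviator's minmax value
```

Note this only shows the deviation is unprofitable under average rewards; under discounting, the up-front gain of 5 is not automatically washed out, which is why the discounted case needs its own argument.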