In this video we're going to look at how to define utility in infinitely repeated games. Remember that the way an infinitely repeated game works is that we have some stage game, which is a normal-form game, and the players repeatedly play that same game over and over again. What that means is that each player gets a sequence of payoffs: let's say player i gets payoff r1 in the first repetition, r2 in the second repetition, r3 in the third, and so on infinitely. So we have an infinite sequence of real values, which are the payoffs this player has received. But if we want to reason about this game using utility theory, we can't work directly with an infinite sequence. We instead have to take this sequence and turn it into a single number that represents the utility the player has for having played it. So how do we do that? What's the right way of thinking about it? The first thing to notice is that the tools we've already learned in game theory aren't going to be sufficient to answer this question. You might wonder whether we can take this infinitely repeated game and just write it in extensive form. We can't, and the reason is that the extensive form would be infinitely deep: we would never reach a leaf node where we could write a payoff. So that won't help us. You might also wonder whether we can just sum up the sequence of payoffs and say that my utility is the sum of those values. The problem is that the sum can be unbounded, because if, for example, every payoff I get is positive, then I'll have an unbounded amount of utility at the end, and I want utilities to be finite. So instead there are two canonical ways that utility gets defined in these games, and I'll tell you about both of them in this video. Here's the first one. It says: let me look, intuitively, at my average payoff over this sequence.
Now, the average payoff of an infinite sequence is also not well defined, because of course the way I take an average is to sum everything up and then divide by the number of things. We've already seen that the sum can be unbounded, and the number of things is unbounded as well, so I would have infinity divided by infinity, which wouldn't help me out here. What I can do instead is look at the limit of finite averages as those averages get longer and longer. So what I can say is: let me look at the average over the first k entries in my sequence, and then let me take the limit of that average as k goes to infinity. It turns out that, technically, this limit isn't always well defined. It's almost always well defined, and there's an easy fix we can make to this definition that I've left out here to keep it from getting too technical. In cases where it is well defined, this is the right thing to do, and everything we'll talk about in this course will be well defined. This number is called the average reward that the player gets over the infinite sequence. So this gives us one number. The reason we have a second definition is that there's something kind of counterintuitive about the average reward. Let me put it back up. It's counterintuitive because if I get some bad payoff for a finite amount of time, let's say for the first hundred thousand iterations I get a payoff of negative one million, and then for the rest of time after that I get some good payoff, let's say one unit of utility, then the limit of the means would be one. The negative payoff that I got at the beginning lasted only a finite amount of time, and it washes out of the average if I go far enough into the future. That's what the math says, but it doesn't always model what we really want to reason about, because we have an intuition that payoffs you get early on are more important than payoffs you get really far into the future.
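The washing-out effect is easy to see numerically. Here's a small sketch (the payoff values and sequence lengths are made up for illustration, smaller than the lecture's example): a sequence that pays -100 for the first 1,000 rounds and 1 forever after, whose finite averages nevertheless converge to 1.

```python
def running_average(k, bad_len=1000, bad_payoff=-100.0, good_payoff=1.0):
    """Average of the first k payoffs of a sequence that pays bad_payoff
    for the first bad_len rounds and good_payoff forever after."""
    if k <= bad_len:
        return bad_payoff
    # Closed-form average: no need to materialize the sequence.
    return (bad_len * bad_payoff + (k - bad_len) * good_payoff) / k

for k in (10**3, 10**5, 10**7, 10**9):
    print(f"average over first {k:>10} rounds: {running_average(k):.4f}")
```

The early disaster dominates the short averages but vanishes in the long ones; the limit of the means here is exactly 1.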
So if we want a model of utility that has that property, we need to say that different payoffs matter differently: it's more important to me to get a good payoff in the first iteration than to get one in the millionth iteration. The way I can model that is by saying my payoffs are multiplied by some discount factor, which captures my value for payoffs at different times. My discount factor beta is some value strictly between 0 and 1, and you can think of it a bit like an interest rate. With money, if I wanted to tell somebody that I'm going to pay them a hundred dollars in a year, they would value that at less than a hundred dollars today, and the amount by which they would value it less today roughly corresponds to the interest rate. That's essentially what's going on in the math here. So what I'm saying is that my utility for this stream of payoffs, this stream of r's, is the sum of the payoffs, each weighted by the discount factor raised to the power of that payoff's position in the sequence. I'm going to discount each payoff successively: the first one has the discount factor applied once, the second one has it applied twice, so it gets the discount factor squared, and so on all the way through the sequence. Each payoff is diminished, but each still matters. There are two ways we can think about what the discount factor means. The first is the interpretation I've been giving you so far: the agent just cares more about the near term than the long term. There's another interpretation which is different but mathematically equivalent, so it's interesting to think about. Under it, the agent, just like the agent we talked about in the average-reward case, cares just as much about every payoff.
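Written out, following the convention in the video that the first payoff is discounted once, the discounted utility of the stream r1, r2, r3, ... is:

```latex
u_i \;=\; \sum_{t=1}^{\infty} \beta^{t}\, r_t
    \;=\; \beta\, r_1 \;+\; \beta^{2} r_2 \;+\; \beta^{3} r_3 \;+\; \cdots,
\qquad 0 < \beta < 1 .
```

Because beta is strictly less than 1 and the stage-game payoffs come from a fixed finite game, this sum always converges, unlike the undiscounted sum. (Some texts instead discount the first payoff by beta to the power zero; that shifts the formula by one factor of beta but changes nothing essential.)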
But with some probability, specifically 1 minus beta, the game ends in each given round. So our game is not necessarily infinitely repeated; it's potentially infinitely repeated. Every time we play the stage game, we flip a coin: with probability 1 minus beta the game just ends, and with probability beta the game continues. What that means is that the discounted sum is now my expected reward in the game: there's a beta chance that I'll reach the next round, a beta squared chance that I'll go two rounds forward, a beta cubed chance that I'll go three rounds forward, and so on. So my expected utility in this game would be given by exactly the same formula. And that's it for defining utility in these games. Thanks very much.
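This equivalence can be checked with a small simulation (a sketch with made-up numbers, not anything from the lecture): play a game with a constant stage payoff of 1 that continues with probability beta each round, and compare the average total payoff across many plays to the discounted sum beta / (1 - beta).

```python
import random

def simulate_total_payoff(beta, rng):
    """Play one potentially-infinite game: each round, continue with
    probability beta and collect a stage payoff of 1."""
    total = 0.0
    while rng.random() < beta:  # with probability 1 - beta, the game ends here
        total += 1.0
    return total

beta = 0.9
rng = random.Random(0)  # fixed seed for reproducibility
trials = 200_000
estimate = sum(simulate_total_payoff(beta, rng) for _ in range(trials)) / trials

# Discounted-utility formula for a constant payoff of 1, discounting the
# first payoff once: sum over t >= 1 of beta**t  =  beta / (1 - beta)
analytic = beta / (1 - beta)
print(f"Monte Carlo estimate: {estimate:.3f}, discounted sum: {analytic:.3f}")
```

With beta = 0.9 the discounted sum is 9, and the Monte Carlo average lands close to it: the undiscounted expected payoff of the randomly terminating game matches the discounted utility of the infinite one.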