Let's now speak briefly about stochastic games. This is a topic that lends itself to a very long and quite complicated discussion, but we'll touch on the main points and position it in the landscape of topics we're discussing. The starting point is repeated games. As we know, a repeated game is simply a game in normal form, for example, that we repeat over and over again. For example, we play Prisoner's Dilemma once, twice, three times, maybe a finite number of times, maybe an infinite number of times, and we aggregate the rewards from all the stages into some overall reward.

A stochastic game is a generalization of this in which we play games repeatedly, but not necessarily the same game. We play a game, say Prisoner's Dilemma, and each of us receives some payoff. But depending on how we played that game, we also probabilistically transition to some other game, play that in turn, and the process continues. Graphically: whereas in a repeated game you play the same game over and over again, in a stochastic game, depending on how you play the current game, you transition probabilistically to one of the other games, or perhaps back to the very same game, and so on from each game.

Formally speaking, a stochastic game is the following tuple. It's a lot of notation, but the concept is exactly what we saw. We have a finite set of states, Q. We have a set of players. We have a set of actions, where A sub i is the set of actions available to player i. And then we have two functions. The transition probability function tells us, depending on the state we're in and the actions we took, the probability with which we move to any of the other states, or to the very same state. Similarly, the reward function tells us, if in a certain state a certain action profile was taken by the agents, what the reward to each of the agents is; R sub i is the reward to agent i. That's the formal definition (it is written out below). Notice that it implicitly assumes you have the same action spaces in every state, but you could define it otherwise; that would simply involve more notation, and nothing inherently important hinges on the action spaces being the same across the different games within the stochastic game.

Just a few final comments. First of all, as we saw, this obviously generalizes the notion of a repeated game, but it also generalizes the notion of an MDP, or Markov decision process. Whereas a repeated game is a stochastic game with only one game, a Markov decision process is a stochastic game with only one player. There too you have states in which the agent takes an action, receives an immediate reward, and probabilistically moves to some other state; the only difference is that it is the only actor in the setting. I mention this because MDPs have been studied substantially in a variety of disciplines, from optimization to computer science to pure math and beyond, but also because these two perspectives, as a generalization of repeated games and as a generalization of MDPs, give you a sense of the theory and investigation into stochastic games (a small simulation sketch of the dynamics also appears below).
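To pin down the notation, here is one standard way of writing out this tuple. The components match what the lecture describes, though the specific symbols $N$, $P$, and $r_i$ are a common convention rather than something fixed by the lecture:

A stochastic game is a tuple $(Q, N, A, P, R)$, where
$Q$ is a finite set of states (the stage games);
$N$ is a finite set of $n$ players;
$A = A_1 \times \cdots \times A_n$, where $A_i$ is a finite set of actions available to player $i$;
$P : Q \times A \times Q \to [0, 1]$ is the transition probability function, with $P(q, a, q')$ the probability of moving from state $q$ to state $q'$ when the action profile $a$ is played;
$R = (r_1, \ldots, r_n)$, where $r_i : Q \times A \to \mathbb{R}$ is the reward function for player $i$.

In this notation, a repeated game is the special case $|Q| = 1$, and an MDP is the special case $n = 1$.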
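And to make the dynamics concrete, here is a minimal simulation sketch in Python. Everything in it is a made-up illustration, not something from the lecture: two hypothetical stage games ("PD" and "COORD"), invented payoffs and transition probabilities, and players who simply act at random in place of real strategies.

import random

# Hypothetical two-state stochastic game (payoffs and transition
# probabilities are invented for illustration). Each state maps a joint
# action (a1, a2) to a pair:
#   (reward vector for the two players, dict of next-state probabilities).
GAME = {
    "PD": {  # a Prisoner's Dilemma-like stage game
        ("C", "C"): ((3, 3), {"PD": 0.8, "COORD": 0.2}),
        ("C", "D"): ((0, 5), {"PD": 0.5, "COORD": 0.5}),
        ("D", "C"): ((5, 0), {"PD": 0.5, "COORD": 0.5}),
        ("D", "D"): ((1, 1), {"PD": 0.2, "COORD": 0.8}),
    },
    "COORD": {  # a coordination-like stage game
        ("C", "C"): ((2, 2), {"PD": 0.1, "COORD": 0.9}),
        ("C", "D"): ((0, 0), {"PD": 0.9, "COORD": 0.1}),
        ("D", "C"): ((0, 0), {"PD": 0.9, "COORD": 0.1}),
        ("D", "D"): ((2, 2), {"PD": 0.1, "COORD": 0.9}),
    },
}

def simulate(steps=1000, beta=0.95, seed=0):
    """Play random actions and accumulate future discounted rewards."""
    rng = random.Random(seed)
    state = "PD"
    totals = [0.0, 0.0]
    discount = 1.0
    for _ in range(steps):
        # Stand-in for real strategies: each player acts uniformly at random.
        profile = (rng.choice("CD"), rng.choice("CD"))
        (r1, r2), trans = GAME[state][profile]
        totals[0] += discount * r1
        totals[1] += discount * r2
        discount *= beta
        # Transition probabilistically to the next stage game.
        state = rng.choices(list(trans), weights=list(trans.values()))[0]
    return totals

print(simulate())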
From repeated games, we inherit the definitions of different ways of aggregating rewards over time: the limit-average reward and the future discounted reward (written out below). From the literature on optimization and on MDPs, we get notions such as stationary and Markovian strategies, as well as notions of reachability, which concern the structure of the underlying transition probabilities. Again, these are issues that we won't get into further in this lecture, but at least we have flagged their existence.
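To make the two aggregation criteria concrete: writing $r_i^{(t)}$ for player $i$'s reward at stage $t$, the limit-average reward is $\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} r_i^{(t)}$ (with $\liminf$ used when the limit does not exist), and the future discounted reward is $\sum_{t=1}^{\infty} \beta^{t-1} r_i^{(t)}$ for a discount factor $\beta \in (0, 1)$. As a one-line gloss on the strategy notions: a strategy is standardly called Markovian if its choice at each stage depends only on the current state rather than the whole history, and stationary if, in addition, that choice does not depend on the stage number $t$.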