All right, now that we've seen some model-free reinforcement learning approaches, let's move on to our next class of approaches, which we'll cover briefly: model-based reinforcement learning. To motivate it, let's go back to why we want to do reinforcement learning in the first place. Remember, from classes of approaches based on dynamic programming, like value iteration and policy iteration, and from other branches of study, like planning and trajectory optimization, we know what to do when the environment dynamics are known. When p(s_{t+1}, r_t | s_t, a_t) is known, you know exactly what to do to maximize your reward.

It's only in the setting where we don't know these dynamics, where we know neither the state transition dynamics nor the reward function, that we turn to reinforcement learning. In particular, model-free reinforcement learning learns a policy mapping from states to optimal actions. You can also do this slightly indirectly, by learning a Q function and then using that to determine the policy.

Model-based reinforcement learning asks: if the whole point of doing RL is that you don't know the state transitions and you don't know the rewards, why not learn them directly? Why not learn the environment dynamics, meaning p(s_{t+1}, r_t | s_t, a_t)? You could also think of this as p(s_{t+1} | s_t, a_t) together with R(s_t, a_t, s_{t+1}); those are two equivalent ways of writing the same thing. Once you have learned the environment dynamics, you have reduced your problem to the setting we started with, and planning, trajectory optimization, and dynamic programming can handle it from there.
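To make the idea concrete, here is a minimal tabular sketch in Python (my own illustration, not code from the lecture): we fit a maximum-likelihood estimate of p(s' | s, a) and R(s, a) from observed (s, a, r, s') transitions, then run ordinary value iteration on the learned model, exactly as we would if the dynamics had been handed to us. The function names and array layout are assumptions made for illustration.

```python
import numpy as np

def learn_model(transitions, n_states, n_actions):
    """Fit a maximum-likelihood tabular model from (s, a, r, s') tuples."""
    counts = np.zeros((n_states, n_actions, n_states))
    reward_sums = np.zeros((n_states, n_actions))
    for s, a, r, s_next in transitions:
        counts[s, a, s_next] += 1
        reward_sums[s, a] += r
    visits = counts.sum(axis=2)              # N(s, a): times (s, a) was tried
    visits_safe = np.maximum(visits, 1)      # avoid division by zero
    P = counts / visits_safe[:, :, None]     # estimated p(s' | s, a)
    R = reward_sums / visits_safe            # estimated R(s, a)
    return P, R

def value_iteration(P, R, gamma=0.99, tol=1e-6):
    """Plan on the learned model as if the dynamics were known."""
    n_states, n_actions, _ = P.shape
    V = np.zeros(n_states)
    while True:
        Q = R + gamma * P @ V                # Q(s, a) = R + gamma * E[V(s')]
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    return Q.argmax(axis=1), V               # greedy policy and state values

# Usage: collect transitions with any behavior policy, then plan.
# P_hat, R_hat = learn_model(transitions, n_states, n_actions)
# policy, V = value_iteration(P_hat, R_hat)
```

Note that unvisited (s, a) pairs default here to zero reward and an all-zero transition row, so a real implementation would also need exploration to cover the state-action space; the sketch only shows the learn-then-plan structure the lecture describes.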