Let's see some examples of model-based reinforcement learning successes. Here's an example of a race car that's driving with model-based reinforcement learning, and it's doing a very good job of it. This tiny race car runs at really high speed, I believe up to 40 miles per hour, on this mud race track, which is quite hard to race efficiently on. You can see that it's pulling off some pretty impressive maneuvers.

On the right, you see some examples of manipulation tasks being learned with model-based reinforcement learning: in this case, a simulated task of writing with a pencil, and, in this case, juggling two balls in a robotic hand. Again, these are quite difficult tasks. Manipulation with such contact-rich events tends to be really difficult, so it's quite impressive that it's actually able to learn them within two hours of real robot training.

Here are some examples of learning from pixels. In this case, you're given purely the image representation of the scene, and you learn the dynamics in image space, because you have a deformable object that's not easy to capture in a lower-dimensional representation. You are learning the task of folding cloth: you're trying to take this part of the shirt and place it near this point. That's the task that's specified on the bottom right.

Finally, here is the task of learning to control these different simulated robots, including a humanoid, a half-cheetah, and so on. You can see that over the course of only about 2,000 attempts, it's actually able to learn pretty good policies for controlling these using model-based reinforcement learning.

To wrap up, let's compare model-based reinforcement learning to model-free reinforcement learning along a few axes. Model-based reinforcement learning tends to be modular: you learn a model separately, and then you apply some kind of dynamic programming or other approaches like we've discussed. That lends itself to easy debuggability, because you have these two phases and can test them independently to see what's going on. It's also easy to inject some approximate physics knowledge, especially when you're trying to learn a model of a physical system. It's easy to say, "I know this physical system roughly follows Newton's laws," and injecting that domain knowledge allows you to learn more efficiently. One nice thing about model-based reinforcement learning is also that you can piggyback on a really large literature of many decades on planning, trajectory optimization, and dynamic programming. Model-based reinforcement learning is sample efficient: like we've seen, you can learn some robotics tasks within only a couple of hours that would be much more difficult with model-free reinforcement learning, at least with the methods we have today. And you get reusable dynamics models. At least in theory, even if you do incremental model-based reinforcement learning, you're learning, after all, a dynamics model of the system, and that is not entirely task-specific; you should be able to reuse the dynamics to some extent for new tasks as well.

Now, among the negatives of model-based RL is that the models are trained independently of the task, so you don't directly get task gradients into the dynamics models. This can lead to biased models that limit performance. Also, sometimes it's actually harder to learn the dynamics than to learn the task itself.
So for example, if you're trying to learn to swim in a fast-flowing river, the dynamics of that river are probably harder to learn than the fact that you should move your hands in a particular way. So in some cases it might not be worthwhile to try to learn the dynamics exactly. And finally, there's also a problem with the kinds of model-based RL approaches that rely purely on planning to select actions: planning can often be quite slow and deliberative compared to reactive policies like the ones we were learning with Q-learning or with direct policy search.
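To make the modular structure concrete, here is a minimal sketch in Python of the learn-a-model-then-plan loop described above. Everything here is an illustrative assumption rather than any specific system's implementation: the toy linear `DynamicsModel` stands in for a learned neural network, and random-shooting model-predictive control is just one simple way to select actions with a learned model.

```python
# Minimal sketch of the modular model-based RL loop: (1) fit a dynamics
# model from transitions, (2) plan actions against it. All names here
# (DynamicsModel, plan_random_shooting, reward_fn) are hypothetical.
import numpy as np

class DynamicsModel:
    """Toy linear model s' ≈ s + W @ [s, a]; a stand-in for a learned
    neural network dynamics model."""
    def __init__(self, state_dim, action_dim, lr=1e-2):
        self.W = np.zeros((state_dim, state_dim + action_dim))
        self.lr = lr

    def predict(self, s, a):
        return s + self.W @ np.concatenate([s, a])

    def train_step(self, s, a, s_next):
        # One gradient step on the squared one-step prediction error.
        # Note: no reward appears here; the model is task-agnostic.
        x = np.concatenate([s, a])
        err = self.predict(s, a) - s_next
        self.W -= self.lr * np.outer(err, x)

def plan_random_shooting(model, s, reward_fn, horizon=10,
                         n_candidates=256, action_dim=2):
    """Random-shooting MPC: sample action sequences, roll each one out
    in the learned model, return the first action of the best sequence."""
    best_return, best_action = -np.inf, None
    for _ in range(n_candidates):
        actions = np.random.uniform(-1, 1, size=(horizon, action_dim))
        sim_s, ret = s.copy(), 0.0
        for a in actions:
            sim_s = model.predict(sim_s, a)
            ret += reward_fn(sim_s, a)
        if ret > best_return:
            best_return, best_action = ret, actions[0]
    return best_action

# Usage: reuse the same learned dynamics for a new task by swapping
# in a different (hypothetical) reward function.
model = DynamicsModel(state_dim=3, action_dim=2)
s = np.zeros(3)
reach_goal = lambda s, a: -np.sum((s - 1.0) ** 2)
a = plan_random_shooting(model, s, reach_goal)
```

Two details of this sketch mirror the comparison above: the model is trained without any notion of reward, so you can hand the planner a new `reward_fn` and reuse the same dynamics for a new task; and selecting a single action costs `n_candidates × horizon` model rollouts, which is exactly why pure planning tends to be slower than querying a reactive policy.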