So let's look at that in a little more detail and, in particular, think about exactly what the workflow looks like. Imagine that you execute some random actions and aggregate the (s, a, s') tuples you observe into a data set D. You can then train a model P(s' | s, a) in a supervised manner, because you now have the label s' and the inputs (s, a) for that model; plain supervised learning gives you the model. And once you've learned it, you use P in some way for task execution.

One good thing about this is that nothing in the process of executing random actions and aggregating the (s, a, s') tuples was specific to the task we eventually want to execute. So the model is very general: you could train it once and then use it for potentially any task afterwards. The flip side is that, even though in theory you could do something like this, it is really hard to collect training data good enough to produce a model that works well for any task. This style of model-based reinforcement learning is called one-shot model learning: you only have one shot at collecting data and training a model, and once you've done that you have to use it for whatever task comes afterwards.

There is another, more commonly used category of approaches called incremental model learning, which looks very similar except that it iterates. At the beginning you might do a poor job when you use the learned model for task execution, because, as we said, it's hard to collect good training data. But once you have tried to use that learned model for task execution, you start producing task-specific data: you get task-specific (s, a, s') tuples and you grow your data set in the direction that matters the most for your task. Then you can repeat the process all over again, going back and forth between aggregating data, training a model, and using that model to perform task execution and collect better data.

Doing this is great because the target task is taken into account as you're collecting data, so it's obviously going to work much better on that target task than the one-shot model learning approach. The flip side is that if model training is expensive, for example if you're training a deep neural network, this will be cumbersome to do: it takes a lot of computation, and it might take time. And of course, by specializing the model for the task that you want to execute, you're also potentially giving up the ability for that model to work well on other tasks.
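To make that workflow concrete, here is a minimal sketch of the loop in Python. Everything in it is illustrative rather than taken from the lecture: the toy one-dimensional environment, the linear least-squares fit standing in for P(s' | s, a), and the simple one-step planner are all assumptions chosen just to show the cycle of aggregating data, training the model, and executing the task with it.

```python
# Minimal sketch of the model-learning workflow (toy setup, not the lecture's).
import numpy as np

class PointEnv:
    """Toy 1-D environment: the state moves by the action plus noise; the goal is state 0."""
    def reset(self):
        self.s = np.random.uniform(-1.0, 1.0)
        return self.s
    def step(self, a):
        self.s = self.s + a + np.random.normal(scale=0.01)
        reward = -abs(self.s)            # closer to 0 is better
        return self.s, reward

def fit_model(data):
    """Supervised learning step: fit s' ~= theta . [s, a] by least squares."""
    X = np.array([[s, a] for s, a, _ in data])
    y = np.array([s_next for _, _, s_next in data])
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

def plan_action(theta, s, candidates=np.linspace(-0.1, 0.1, 21)):
    """Use the learned model: pick the action whose predicted next state is closest to 0."""
    preds = np.array([theta @ np.array([s, a]) for a in candidates])
    return candidates[np.argmin(np.abs(preds))]

env, data = PointEnv(), []

# One-shot phase: random actions only, then a single supervised fit.
s = env.reset()
for _ in range(200):
    a = np.random.uniform(-0.1, 0.1)
    s_next, _ = env.step(a)
    data.append((s, a, s_next))
    s = s_next
theta = fit_model(data)

# Incremental phase: alternate between task execution with the current
# model and re-fitting on the task-specific data that execution produces.
for iteration in range(5):
    s = env.reset()
    for _ in range(50):
        a = plan_action(theta, s)        # task execution using the model
        s_next, r = env.step(a)
        data.append((s, a, s_next))      # grow D with task-specific tuples
        s = s_next
    theta = fit_model(data)              # retrain the model on the aggregated data
```

Note that stopping after the first fit is exactly the one-shot setting; the outer loop over iterations is what turns it into incremental model learning.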
Okay, so on the previous slide we glossed over what it means to use the dynamics model for a particular task. We mentioned this a couple of slides ago: once you've learned the dynamics model, the state transitions, you can simply use dynamic-programming-style approaches like policy iteration to solve the learned MDP. Another class of approaches is to treat the learned MDP as a simulator and run model-free reinforcement learning inside it. After all, once you've learned how the environment transitions and what the rewards are, you have essentially created a simulated world. Inside that simulated world you can imagine experience and train a reinforcement learning algorithm for as long as you would like. Even though model-free reinforcement learning, as we discussed, needs a lot of experience, you're generating it all inside a simulator now, so you don't really have to worry about how much experience you need to collect; you can collect millions of episodes if you like and that's all fine. So these are the two main strategies for using the dynamics model after you've learned it.
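For the first strategy, here is a hedged sketch of what solving the learned MDP with policy iteration might look like in the tabular case. The arrays P_hat and R_hat are random placeholders standing in for the learned transition and reward estimates; in practice they would be fit from the aggregated (s, a, s') data.

```python
# Sketch of strategy 1: treat the learned quantities P_hat[s, a, s'] and
# R_hat[s, a] as a fully known MDP and solve it with policy iteration.
import numpy as np

n_states, n_actions, gamma = 5, 2, 0.9
rng = np.random.default_rng(0)
P_hat = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # placeholder learned transitions
R_hat = rng.normal(size=(n_states, n_actions))                        # placeholder learned rewards

policy = np.zeros(n_states, dtype=int)
for _ in range(100):
    # Policy evaluation: solve V = R_pi + gamma * P_pi V exactly under the learned model.
    P_pi = P_hat[np.arange(n_states), policy]          # (n_states, n_states)
    R_pi = R_hat[np.arange(n_states), policy]
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
    # Policy improvement: act greedily with respect to the learned model.
    Q = R_hat + gamma * P_hat @ V                      # (n_states, n_actions)
    new_policy = Q.argmax(axis=1)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy
```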
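And for the second strategy, a similar sketch: ordinary tabular Q-learning trained entirely on imagined transitions sampled from the learned model, never touching the real environment. Again, the placeholder model arrays and the sampling hook are assumptions made purely for illustration.

```python
# Sketch of strategy 2: use the learned model as a simulator and run a
# model-free algorithm (tabular Q-learning) on imagined experience.
import numpy as np

n_states, n_actions, gamma, alpha, eps = 5, 2, 0.9, 0.1, 0.1
rng = np.random.default_rng(1)
P_hat = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # placeholder learned model
R_hat = rng.normal(size=(n_states, n_actions))

def sample_next_state(s, a):
    """Imagined transition: sample s' from the learned P(s' | s, a)."""
    return rng.choice(n_states, p=P_hat[s, a])

Q = np.zeros((n_states, n_actions))
s = 0
# Imagined experience is cheap, so we can afford many simulated steps.
for _ in range(100_000):
    a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
    s_next = sample_next_state(s, a)
    r = R_hat[s, a]
    # Standard Q-learning update, except every transition came from the model.
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = s_next
```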