So let's have a quick look at the training loop. What do we have? We build our agent and then create the optimizer: we have the MSE loss here, and we have an optimizer, in this case the Adam optimizer. Keep in mind that in week four we will learn a lot more about optimizers. Then we have the training loop. What's going on in here? We iterate through the epochs, and in each epoch the AI plays against itself. We then put the self-play data into the right format to produce a batch that we can use for training. Next, we run a number of iterations in which we train the policy-value network: we have a policy loss and a value loss, and the overall loss is the sum of the two. We compute the gradient of that overall loss with respect to all the parameters, take an optimizer step in the direction that reduces the loss, and then do some housekeeping. So that's it: that's the training loop for the system.
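To make the "sum the two losses, take the gradient, take a step" part concrete, here is a minimal plain-Python sketch. It is a toy illustration under stated assumptions, not the course's actual code: the network is replaced by a hypothetical parameter vector holding three policy logits plus one value output; self-play is replaced by a single made-up target (a move distribution `target_pi` and a game outcome `target_z`); the policy loss is assumed to be cross-entropy (AlphaZero-style; the lecture only names the MSE loss); and Adam is written out by hand rather than taken from a framework.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def losses(params, target_pi, target_z):
    """Return (total, policy_loss, value_loss).

    Hypothetical toy model: params[:-1] are policy logits and params[-1]
    is the predicted value, standing in for a network's two output heads.
    """
    probs = softmax(params[:-1])
    value = params[-1]
    # Assumed policy loss: cross-entropy between target and predicted policy.
    policy_loss = -sum(t * math.log(p) for t, p in zip(target_pi, probs) if t > 0)
    # Value loss: squared error between predicted value and game outcome.
    value_loss = (value - target_z) ** 2
    # The overall loss is the sum of the two.
    return policy_loss + value_loss, policy_loss, value_loss


def gradients(params, target_pi, target_z):
    """Analytic gradient of the summed loss with respect to every parameter."""
    probs = softmax(params[:-1])
    # d(cross-entropy)/d(logit_i) = p_i - t_i, valid when target_pi sums to 1.
    grad = [p - t for p, t in zip(probs, target_pi)]
    # d((v - z)^2)/dv = 2 (v - z).
    grad.append(2.0 * (params[-1] - target_z))
    return grad


class Adam:
    """Minimal Adam optimizer over a flat list of floats."""

    def __init__(self, n, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * n  # first-moment (mean) estimates
        self.v = [0.0] * n  # second-moment (uncentered variance) estimates
        self.t = 0

    def step(self, params, grads):
        """Update params in place using bias-corrected moment estimates."""
        self.t += 1
        for i, g in enumerate(grads):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            params[i] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


def train(target_pi, target_z, iterations=500):
    """Repeated training iterations on one fixed, made-up self-play target."""
    params = [0.0] * len(target_pi) + [0.0]  # policy logits, then value
    opt = Adam(len(params))
    history = []
    for _ in range(iterations):
        total, _, _ = losses(params, target_pi, target_z)
        history.append(total)
        # Gradient of the overall loss, then one optimizer step.
        opt.step(params, gradients(params, target_pi, target_z))
    return params, history


params, history = train(target_pi=[0.7, 0.2, 0.1], target_z=1.0)
```

With all parameters starting at zero the first recorded loss is ln 3 + 1 (about 2.10); it then falls toward roughly 0.80, the entropy of `target_pi`, which is the lowest the cross-entropy term can reach. In a framework such as PyTorch, the analytic `gradients` function and hand-written `Adam` class would be replaced by `loss.backward()` and `torch.optim.Adam`, with `optimizer.zero_grad()` before each backward pass.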