So, finally, we get to the real deep learning training. What we will do now is train a network that gives us a policy and a value, and we will use it to get better policies, which in turn will give us better search.

One idea here is quite interesting. We could, in principle, have one function that deals with the policy and a different function that deals with the value. However, it makes a lot of sense to share the two. Why does that make sense? Well, the features that help us know how good or bad our position is might be similar to the features that tell us which moves we should choose. So we will have one network that gives us the policy and the value at the same time.

Let's briefly look at the code. This is almost exactly the same code as you saw in tutorial one, with a few differences. We have regularization, which we will cover in week five in great detail; specifically, here we use dropout. We otherwise have the same network, but now with two outputs: we ultimately return P, the policy, and also V, the value. By having both in the same network, we will be able to produce good trees efficiently, because the policy network allows us to go through only the parts of the tree that are actually decision-relevant for us, and at the same time it gives us the value, which is what we ultimately want to learn about.

And this is really it. The reason I thought it would be so cool to do AlphaZero during the first week is that it consists of relatively simple components: a policy network and a value network, together with a non-deep-learning planning component (MCTS), which is quite typical. It all comes together, and it's awesome. So what do we have here? We have a network that does value and policy, a way of doing self-play, and an MCTS system built around it that allows us to learn meaningfully. Now, we will put it all together.
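The shared-network idea described above can be sketched as follows. This is only a toy NumPy illustration with made-up layer sizes (9 inputs, 16 hidden units, 9 moves), not the tutorial's actual code: one shared trunk feeds both a policy head and a value head, with dropout applied to the shared features during training.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration only.
IN, HID, MOVES = 9, 16, 9

# Shared trunk parameters, plus one head each for policy and value.
W_shared = rng.normal(0, 0.1, (IN, HID))
W_policy = rng.normal(0, 0.1, (HID, MOVES))
W_value = rng.normal(0, 0.1, (HID, 1))

def forward(x, train=False, dropout_p=0.5):
    """Return (policy, value) computed from one shared hidden representation."""
    h = np.tanh(x @ W_shared)                 # shared features used by both heads
    if train:
        # Inverted dropout: randomly zero features, rescale the survivors.
        h = h * (rng.random(h.shape) > dropout_p) / (1 - dropout_p)
    logits = h @ W_policy
    p = np.exp(logits - logits.max())         # numerically stable softmax
    p /= p.sum()                              # P: distribution over moves
    v = float(np.tanh(h @ W_value)[0])        # V: scalar value in [-1, 1]
    return p, v

p, v = forward(rng.normal(size=IN))
```

The design choice worth noticing is that both heads read the same hidden representation `h`, so features learned for evaluating a position are reused for choosing moves.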
We will use AI-versus-AI play to generate the training data. We now need data out of our trees, such as the number of visits at each node, as part of that training data. And now, before we do the actual training, ask yourself: what exactly is the data that you need to be able to train the network as we have it here?
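One common way to turn the visit counts mentioned above into a policy training target is to normalize them, optionally sharpened by a temperature. A small sketch under that assumption (the temperature parameter is an illustration, not something the text specifies):

```python
import numpy as np

def policy_target(visit_counts, temperature=1.0):
    """Turn MCTS root visit counts into a policy target pi(a) ~ N(a)^(1/temperature)."""
    counts = np.asarray(visit_counts, dtype=float) ** (1.0 / temperature)
    return counts / counts.sum()

# Example: root visit counts after one search. Together with the final game
# outcome z, this gives one training example (state, pi, z): pi trains the
# policy head and z trains the value head.
pi = policy_target([10, 30, 5, 55])
```

With temperature 1 this is just the normalized visit distribution; lowering the temperature concentrates the target on the most-visited move.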