 So, how would you train a policy network? You saw how we trained a value network last time in the first tutorial of the week. What do we need? We need an objective function, because that's what you will be optimizing for. We need a functional neural network that ultimately produces a policy as a function of the board as an input. And then we need a ton of data. Now describe how you would train such a network.