Now that we can estimate values, let's see whether these values make any sense. What we did here for you is take all the games where player one won, shown in green, versus the games where player two won, shown in red. There are two perspectives, player one's and player two's: whatever is good for player one is bad for player two. The predicted outcome starts near zero, because at the beginning of the game we cannot yet know what the outcome will be.

There are a few things we can observe. The first is that the value jumps up and down every single turn. This is a little suboptimal if you think about it, but what happens every turn? After I have played, I have more pieces of my color on the board, so it is natural to see this effect. Still, it is a bit surprising that it survives: in principle an ideal value function would abstract over it. It is not surprising that the player who just moved has that momentary advantage, but here you can see the effect clearly across the turns.

The second observation is that the green curves, the games where player one wins, tend to go up over time, while the games where player two wins tend to go down over time. This is simply the effect that, if I am going to win in the end and the value function isn't completely bad, its estimate should move in the right direction as the game progresses. So in general this value function seems to roughly have the properties we should expect.

Now the next step is to build a neural epsilon-greedy player, something that actually plays. Let's see how well it works against the previous value estimator that you built by hand. Keep in mind that our neural network was trained on games where the heuristic player played against itself in an epsilon-greedy way. So can we do better? Well, why not have the neural network evaluation play against itself?
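The epsilon-greedy player described above can be sketched in a few lines. This is a minimal illustration, not the course's implementation: `value_fn` stands in for the trained neural network, and `apply_move` and `legal_moves` are hypothetical helpers for whatever game you are playing.

```python
import random

def epsilon_greedy_move(state, legal_moves, value_fn, apply_move,
                        epsilon=0.1, rng=random):
    """Pick a uniformly random move with probability epsilon; otherwise
    pick the move whose successor state the value function rates highest."""
    if rng.random() < epsilon:
        return rng.choice(legal_moves)
    # Greedy case: evaluate every successor state and take the best one.
    return max(legal_moves, key=lambda m: value_fn(apply_move(state, m)))
```

With `epsilon=0` this is a purely greedy one-ply lookahead player; raising epsilon injects the exploration that generated the training games in the first place.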
It's like a circle: the network generates the very games it is trained on. The question for you now is to explore this and see whether the self-playing agent ends up better than the agent that plays against the value function you constructed by hand.
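The self-play loop that closes this circle might look like the sketch below. Again this is a hedged illustration under toy assumptions: `legal_moves_fn`, `apply_move`, `is_terminal`, and `outcome` are hypothetical game hooks, and `value_fn` stands in for the current neural network. Each finished game yields (state, final outcome) pairs that would feed the next round of training.

```python
import random

def self_play_game(initial_state, legal_moves_fn, apply_move, value_fn,
                   is_terminal, outcome, epsilon=0.1, rng=random):
    """Play one game where both sides use the same epsilon-greedy value
    player; return every visited state paired with the final outcome,
    ready to be used as regression targets for the value network."""
    states = [initial_state]
    state = initial_state
    while not is_terminal(state):
        moves = legal_moves_fn(state)
        if rng.random() < epsilon:
            move = rng.choice(moves)
        else:
            move = max(moves, key=lambda m: value_fn(apply_move(state, m)))
        state = apply_move(state, move)
        states.append(state)
    result = outcome(state)
    return [(s, result) for s in states]
```

Retraining the network on the data this loop produces, and repeating, is the self-play cycle the text asks you to compare against training on games versus the hand-constructed evaluator.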