 So, as it is pretty clear, it is easy to win against such a hand-designed value optimizer. Why? I simply cannot enumerate all the things that can be good or bad about a position. In fact, the history of software packages that play games such as Stockfish or Deep Blue is full of people pushing a lot of human knowledge into such systems. They generally didn't produce systems that were all that good at playing. AlphaZero, on the other hand, doesn't even need to be taught the rules of the game. It's not full of human knowledge about it, and this human knowledge is not actually all that useful. And in fact, that's the history of machine learning. As we keep going, more and more we realize that replacing human knowledge, human biases with things that we learn from the real world leads to systems that work better. So now we have a subprime of game playing where deep learning clearly feels like the right answer, which is estimating value. But before we start, how would you solve such a problem?