So, when I played Othello, what was I thinking? Well, the first observation is that if I play white, having more white stones on the board seems better than having more black stones. Focusing only on that can actually be a bad strategy, but it feels roughly right. The second observation I had while playing is that corners are great, because once you have a corner, no one can take it away from you. More generally, some pieces are safe: you can know that a piece can never again be taken away from you. And there's also something about local configurations that's very hard to describe, but if I look locally at the board, certain things feel good and certain things don't. You could say that could be the basis of a value judgment.

Now we want to distinguish three different strategies here. The first could be snap value estimation: I choose the action that brings me to the board with the highest estimated value. If you give me a really great value estimator, this is a very nice strategy. Alternatively, we could use snap policy judgment: I look at the board, see which moves look right, and simply choose the action I estimate to be the best. And then there's the possibility of planning: I look into the future, maybe using snap policy judgments to decide where I should go with which probability, and then, of all the possible futures I can foresee, I choose the one with the highest associated value.

So you see, there are different, subtle ways of thinking about how we should play. They all feel right in a way, and they all feel incomplete in a way. And you will find that this property of real-world problems comes up very often in deep learning, and in machine learning in general. Now let's first see if we can get away with a relatively simple strategy.
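The three strategies above can be sketched in a few lines. Everything here is a hypothetical stand-in, not anything defined in the lecture: `legal_moves` and `apply` form an assumed game interface, and `value_fn` / `policy_fn` are assumed estimators returning a value in [-1, 1] and a move-probability dict, respectively.

```python
def snap_value_move(board, legal_moves, apply, value_fn):
    """Strategy 1: pick the move leading to the board with highest estimated value."""
    return max(legal_moves(board), key=lambda m: value_fn(apply(board, m)))

def snap_policy_move(board, legal_moves, policy_fn):
    """Strategy 2: pick the move the policy itself rates highest."""
    probs = policy_fn(board)
    return max(legal_moves(board), key=lambda m: probs.get(m, 0.0))

def plan_move(board, legal_moves, apply, value_fn, depth=2):
    """Strategy 3: look a few plies ahead, then judge the foreseen futures by value.
    Simplified: always maximizes; a full planner would alternate max/min with the opponent."""
    def foresee(b, d):
        moves = legal_moves(b)
        if d == 0 or not moves:
            return value_fn(b)
        return max(foresee(apply(b, m), d - 1) for m in moves)
    return max(legal_moves(board), key=lambda m: foresee(apply(board, m), depth - 1))
```

Note how the first two strategies differ only in what the network is asked to judge (positions versus moves), while the third reuses a value judgment at the leaves of a short lookahead.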
One possibility is to plan until the end of the game: go through all possibilities. In that case we don't need any value judgment at all, since we go all the way to the end. And in fact there exist some games, like tic-tac-toe, that are simple enough that we can plan through all of them. Alternatively, if I had a perfect value estimate, I would know the probability of winning given optimal play for every possible board, and there would be no need for planning.

But let's first see how well we can do with a strategy of value-based snap judgments. What do we expect the value evaluation to look like? Well, if I'm going to win, I might expect the value of the board to be one. If it's going to be a draw, I might expect the value to be zero. If I'm going to lose, the value should be minus one. And if we are farther away from the end of the game, we might want the value to be somewhere between one and minus one.

Now let's implement a pure snap-judgment strategy based on values and see how well it plays. We will build a system that tries to estimate how good a given position is. The value here is defined as the difference between the probability of winning and the probability of losing. We will solve this problem, and many others, with deep learning, and you can see how naturally deep learning fits: we are looking for a mapping from a complex structure, the board, onto the difference between the probabilities of winning and losing. But before we go there and solve it with deep learning, we will use the strategy that had previously been popular: let's try to solve the problem by hand. So that's what you will do first now.
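To make the "plan until the end of the game" idea concrete, here is a sketch for tic-tac-toe, which is small enough to search exhaustively. The board encoding (a tuple of nine cells, +1 for X, -1 for O, 0 for empty) is an assumption for illustration; the returned game value uses exactly the 1 / 0 / -1 convention discussed above.

```python
def winner(b):
    """Return 1 if X has three in a row, -1 if O does, else None."""
    lines = [(0,1,2), (3,4,5), (6,7,8),   # rows
             (0,3,6), (1,4,7), (2,5,8),   # columns
             (0,4,8), (2,4,6)]            # diagonals
    for i, j, k in lines:
        if b[i] != 0 and b[i] == b[j] == b[k]:
            return b[i]
    return None

def solve(b, player):
    """Plan through every continuation: the game value (1, 0, or -1) for X,
    assuming optimal play from both sides. No value estimator needed."""
    w = winner(b)
    if w is not None:
        return w
    moves = [i for i in range(9) if b[i] == 0]
    if not moves:
        return 0  # board full, draw
    vals = [solve(b[:i] + (player,) + b[i+1:], -player) for i in moves]
    return max(vals) if player == 1 else min(vals)
```

Running `solve` on the empty board confirms the well-known result that tic-tac-toe is a draw under optimal play; for Othello the same exhaustive search is hopeless, which is why we need value judgments at all.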
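As a starting point for the by-hand approach, here is one rough value estimator built from the earlier observations: stone difference matters a little, corners matter a lot. The board encoding (an 8x8 grid with +1 for my stones, -1 for the opponent's, 0 for empty) and the weights are illustrative assumptions, not the lecture's actual solution.

```python
CORNERS = [(0, 0), (0, 7), (7, 0), (7, 7)]

def hand_value(board):
    """Rough value in [-1, 1]: positive means the position looks good for me."""
    stone_diff = sum(sum(row) for row in board)          # my stones minus theirs
    corner_diff = sum(board[r][c] for r, c in CORNERS)   # corners can never be flipped
    # Weight corners much more heavily than raw stone count, then clamp to [-1, 1].
    score = 0.01 * stone_diff + 0.2 * corner_diff
    return max(-1.0, min(1.0, score))
```

Such a hand-crafted function captures the easy-to-state observations, but not the hard-to-describe local configurations; that gap is exactly what a learned value function will later fill.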