So, if I have a value function, how should I play to generate good data for a machine learning system that estimates the value of a situation? What should I do?

One possibility is a system that plays against itself and always chooses the best move. Well, in that case, we would play the same game over and over. This would not produce the diversity of training data that we need for estimating how good a given position is, because we would never see just about any other position.

The second possibility would be to choose a random action. But if both players just act randomly, the games tell the system almost nothing about the logic of the game.

This brings us to a strategy that we often use in this domain, which is epsilon-greedy decision making. The idea is that I choose the best action with probability 1 - epsilon, which means that usually I play the action my value function rates highest. And then sometimes, with probability epsilon, I randomly choose a different action. What we get is a player that somewhat approximates the good player and, at the same time, produces a good amount of exploration. That exploration is necessary to get the diversity we need for meaningful value estimates.

Now it's your turn to implement such an epsilon-greedy agent. So implement the choice rule for that.
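
As a starting point for that exercise, here is a minimal sketch of the choice rule in Python. The names `epsilon_greedy`, `actions`, and `value_of` are hypothetical and not from the lecture. It also assumes the common variant in which the exploratory move is drawn uniformly from all legal actions, so it can occasionally coincide with the best one. If you want the random move to always differ from the greedy one, you would sample from the remaining actions instead.

```python
import random
from typing import Callable, Hashable, Optional, Sequence


def epsilon_greedy(
    actions: Sequence[Hashable],
    value_of: Callable[[Hashable], float],
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> Hashable:
    """Pick an action: explore with probability epsilon, otherwise exploit.

    `value_of` is an assumed callable that returns the estimated value of
    the position reached by taking an action (higher is better).
    """
    if not actions:
        raise ValueError("no legal actions to choose from")
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")

    rng = rng if rng is not None else random.Random()

    # Explore: with probability epsilon, ignore the value estimates and
    # draw uniformly from all legal actions (assumed variant, see above).
    if rng.random() < epsilon:
        return rng.choice(list(actions))

    # Exploit: otherwise take the action with the highest estimated value.
    return max(actions, key=value_of)


# Hypothetical usage with three moves and made-up value estimates.
estimates = {"left": 0.1, "center": 0.9, "right": 0.5}
move = epsilon_greedy(list(estimates), estimates.get, epsilon=0.1)
```

With epsilon set to 0 this reduces to the always-greedy player from the first possibility, and with epsilon set to 1 it reduces to the purely random player from the second, so epsilon is the knob that trades exploitation against exploration.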