 So what could be meaningful policies in this context? We saw in snap judgment we take the action that has the highest associated value with it. Now of course to produce a policy we want a real probability distribution. So there's multiple ways so we could do it. We could make it that the probability for the action is one minus epsilon if the action is the snap judgment and with epsilon probability it's just a random action out of all the other actions that are there. Alternatively we could say we could formulate this as a deep learning problem where we can say the probability of an action is a function that depends on parameters of the bot. So it basically we look at the bot and basically just figure out what the good moves are here. Alternatively we could have a version that combines it all where the probability for an action depends on the position of the bot and maybe calls to the value function as well. What if you had to design a meaningful policy by hand? Right up. What do you would do?