So now we can change the epsilon, and we can thereby dial between a player that absolutely optimizes value, always taking the best step for it, and a random player that always chooses a random step. We now have a dial that is effectively an exploration-versus-exploitation trade-off: if I have a large epsilon, the system will explore all kinds of different games; if I make epsilon very small, it will effectively exploit, taking the actions that are optimal for it.

Now we have a playing agent. We can generate training data by self-play: we will have a list of board positions that we saw, and we will have the respective end result, which is who won. And now effectively we want a machine learning system that takes a position and converts it into an estimate of value, so that that estimate of value is similar to the respective end result. Just as a reminder: one if I win, minus one if I lose, zero if I draw. This is now a quintessential problem for neural networks.

So let's look at what code to do such a thing looks like. We first need to set up a network, and then we need to define the computation. So let's look at the initialization: here we have a couple of parameters. Then we will have two convolution layers. We will learn a lot more about convolution layers later in the course, but the idea is that there are aspects of the board that are local, and these local aspects can be extracted by local filters. The meaning of these local motifs will be the same regardless of where they appear, therefore we can apply the same computation to many locations on the board. So we will have two convolution layers, and we have two linear layers here, and then we have this so-called forward computation. What do we have?
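The epsilon dial and the self-play data collection described above can be sketched as follows. This is a minimal illustration, not the lecture's actual code: `position`, `legal_moves`, and `value_fn` are hypothetical placeholders, where `value_fn(position, move)` is assumed to return the estimated value of making `move` from `position`.

```python
import random

def epsilon_greedy_move(position, legal_moves, value_fn, epsilon):
    """With probability epsilon pick a random move (explore);
    otherwise pick the value-maximizing move (exploit).

    position, legal_moves, and value_fn are hypothetical placeholders."""
    if random.random() < epsilon:
        return random.choice(legal_moves)          # explore: random step
    # exploit: take the best step according to the current value estimate
    return max(legal_moves, key=lambda m: value_fn(position, m))
```

With `epsilon=1.0` this is the fully random player; with `epsilon=0.0` it always optimizes value, and values in between trade off exploration against exploitation. Each self-play game would then yield a list of visited positions, all labeled with the game's final result (+1 win, -1 loss, 0 draw) as the training target.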
We take the board, we run a convolution layer on it with a ReLU nonlinearity, we do another convolution with a ReLU nonlinearity, and then ultimately we have a fully connected network at the end, and we return the tanh of the value that we have there. So this fully specifies the network that defines the value here. Every line here is really valuable and interesting, and we will touch a lot on the ideas here: we'll talk about linear networks in week two, we'll talk about these nonlinear transfer functions, for example ReLU, in week three, and we will talk about convolution in week six in the context of computer vision.
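A PyTorch sketch of the architecture described above, assuming it follows the standard `nn.Module` pattern: two convolution layers with ReLU, two linear layers, and a tanh output in [-1, 1]. The board size, channel counts, and kernel sizes here are illustrative assumptions, not the lecture's exact values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ValueNet(nn.Module):
    """Value network sketch: board position in, value estimate in [-1, 1] out.
    Sizes (board_size, channels, hidden) are assumptions for illustration."""

    def __init__(self, board_size=3, channels=16, hidden=64):
        super().__init__()
        # two convolution layers: the same local filters are applied at
        # every location on the board, extracting local motifs
        self.conv1 = nn.Conv2d(1, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        # two linear layers mapping the flattened features to a single value
        self.fc1 = nn.Linear(channels * board_size * board_size, hidden)
        self.fc2 = nn.Linear(hidden, 1)

    def forward(self, board):
        # board: tensor of shape (batch, 1, board_size, board_size)
        x = F.relu(self.conv1(board))            # convolution + ReLU
        x = F.relu(self.conv2(x))                # convolution + ReLU
        x = F.relu(self.fc1(x.flatten(start_dim=1)))
        # tanh squashes the output to [-1, 1]: win / loss / draw targets
        return torch.tanh(self.fc2(x))
```

Training would then minimize the difference between this output and the self-play game outcomes (+1, -1, or 0) for each stored position.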