So, how could we solve such problems? The idea is that we want planning; we want to look into the future. How could we do that? There is a lot of older work on this. One example is score function estimators, an older idea for improving policies, and it is incredibly beautiful.

So let's say we want to optimize a policy. We can calculate the gradient, with respect to the parameters $\theta$ of our policy, of the expected value of some meaningful function $f$ under the probability distribution $p(z; \theta)$ induced by that policy. Writing out the expected value as an integral, we get

$$\nabla_\theta \, \mathbb{E}_{p(z;\theta)}[f(z)] = \int \nabla_\theta \, p(z;\theta) \, f(z) \, dz.$$

Keep in mind that we used here that $\nabla_\theta$ is a linear operator, so we can change the order in which the gradient and the integral appear. Then we can multiply by one, in the form $p(z;\theta)$ divided by $p(z;\theta)$:

$$= \int p(z;\theta) \, \frac{\nabla_\theta \, p(z;\theta)}{p(z;\theta)} \, f(z) \, dz.$$

This is just a simple rewrite of the previous line. Since $\nabla_\theta \log p(z;\theta) = \nabla_\theta p(z;\theta) / p(z;\theta)$, we can rewrite this as

$$= \int p(z;\theta) \, \nabla_\theta \log p(z;\theta) \, f(z) \, dz,$$

and if you look at this, it is clearly an expected value again: the expected value, under the relevant probability distribution, of $f(z)$ times the gradient of the log probability, $\nabla_\theta \log p(z;\theta)$.

Now we can approximate such expected values with samples from the probability distribution. If $S$ is the number of samples we take, the estimate is

$$\nabla_\theta \, \mathbb{E}_{p(z;\theta)}[f(z)] \approx \frac{1}{S} \sum_{s=1}^{S} f(z_s) \, \nabla_\theta \log p(z_s;\theta),$$

where the $z_s$ have to be drawn according to $p(z;\theta)$. With this, we can calculate the gradient with respect to the parameters of the policy, and a lot of such approaches work surprisingly well in practice. This is all touching on ideas of deep reinforcement learning.
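The sample-based estimate above can be sketched in a few lines of NumPy. This is a minimal illustration, not part of the lecture: as an assumption purely for demonstration, the "policy" here is a one-dimensional Gaussian p(z; theta) = N(theta, 1) and f(z) = z^2, so the true gradient of E[f(z)] = theta^2 + 1 is known to be 2·theta and we can check the estimator against it.

```python
import numpy as np

def score_function_gradient(theta, f, num_samples, rng):
    """Estimate d/dtheta E_{p(z;theta)}[f(z)] as
    (1/S) * sum_s f(z_s) * grad_theta log p(z_s; theta),
    assuming p(z; theta) = N(theta, 1) for this illustration."""
    z = rng.normal(loc=theta, scale=1.0, size=num_samples)  # z_s drawn from p(z; theta)
    score = z - theta  # grad_theta log N(z; theta, 1) = z - theta
    return np.mean(f(z) * score)

rng = np.random.default_rng(0)
theta = 1.5
estimate = score_function_gradient(theta, lambda z: z**2, 100_000, rng)
print(estimate)  # should be close to the true gradient 2 * theta = 3.0
```

Note that the estimator only needs samples of z and the gradient of the log density; it never differentiates through f itself, which is exactly why this trick is useful when f is a black-box reward.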
The reason I think it is so great that we are doing this today is that it will appear again in weeks 11 and 12. In those weeks, you will get a much deeper insight into what can be done in the area of deep reinforcement learning, and you will see lots of cool ways of setting up such problems.