So now we see that life isn't all about winning. The discriminator can only learn from the errors the generator actually makes, and the generator can only get better if it sees a useful gradient from the discriminator. So they need to learn concurrently: if you like, the generator can only learn not to make the mistakes the discriminator knows about, and the discriminator can only learn about the generator's mistakes by the generator actually making them.

There is a serious vanishing-gradient problem here. Imagine you have a perfect discriminator. It will say of every image produced by the generator, "that's just totally fake." The gradient flowing to the generator will then be essentially zero, and so the generator can never become better. The same thing happens with a perfect generator: the discriminator would be entirely unable to see any differences and would again see zero gradients, because it operates in a regime where there is no relevant information to pick up on.

In practice, therefore, we use the following setup. We have a generator, and we feed it samples z drawn from the relevant prior distribution. We have real images, and the images fed to the discriminator alternate between real images and generated images; in many settings, each batch contains some of both (though there are ways to optimize this split). The generator's output goes to the discriminator, which produces a cost; that cost is seen by the generator and used to improve it. In this setting, both networks learn according to their respective gradients.

Technically, this is a competitive game. What we are looking for is the minimum over generators of the maximum over discriminators of a value function that depends on both:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]$$

The first term is the expected value, over the data distribution of x, of the log discriminator output; this is what the discriminator wants to assign high values to. The second term is the expected value over the z values, the latent noise vectors that go into the generator, of the log of one minus the discriminator applied to the generator's output.

This process only stops once we reach a Nash equilibrium. Now, what is a Nash equilibrium? It's an important concept in game theory: a situation where no player can improve its own outcome by changing its actions. That means neither side can improve its outcome by acting differently. There is no change to the generator that would improve its cost, and no change to the discriminator that would improve its cost: any change to the generator makes it more detectable, and any change to the discriminator makes it worse at detecting. So it's interesting to have that link to game theory.

Now, why is this going to be hard? There are going to be lots of convergence issues. If the generator is too good, the discriminator doesn't see gradients anymore; if the discriminator is too good, the generator can't see gradients anymore. And there are instabilities. Say the discriminator figures out a new dimension in which the generator can be wrong. All of a sudden the discriminator will be winning, and it's going to take the generator a long time to figure out what mistake it is making. And by the time it gets there, the discriminator may have already moved on.
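To make "learning according to the relevant gradients" concrete, here is one standard way to write the alternating minibatch updates (this follows the original GAN formulation of Goodfellow et al., 2014; the learning rate $\eta$ and minibatch size $m$ are notation introduced here, not from the lecture):

$$\theta_D \leftarrow \theta_D + \eta\, \nabla_{\theta_D}\, \frac{1}{m} \sum_{i=1}^{m} \Big[ \log D\big(x^{(i)}\big) + \log\Big(1 - D\big(G(z^{(i)})\big)\Big) \Big]$$

$$\theta_G \leftarrow \theta_G - \eta\, \nabla_{\theta_G}\, \frac{1}{m} \sum_{i=1}^{m} \log\Big(1 - D\big(G(z^{(i)})\big)\Big)$$

The discriminator ascends its objective on a mixed minibatch of real samples $x^{(i)}$ and fakes $G(z^{(i)})$; the generator then descends on the fake-sample term only. The vanishing-gradient problem can be read off this form: once the discriminator confidently rejects every fake, the generator's term is nearly flat and its update does essentially nothing.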
Then there's the problem of so-called mode collapse. We'll talk more about that later, but we want to make sure that every possible image gets produced according to its probability under the data distribution. And then there's, of course, the general problem that we're dealing with very high-dimensional statistics: we're talking about images in million-dimensional spaces, and statistics in high dimensions is hard in general. We will also see that everything is super dependent on hyperparameter settings, and it can feel very random. A lot of people who play with GANs say it feels a little bit like a dark art, where what works and what doesn't depends on very minor differences in how the problem is set up.

Now it's your turn to implement the full GAN. Take our discriminator and our generator and put them into this joint setting where they learn at the same time. Will it work? And will the images be any good?
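As a starting point, here is a minimal sketch of that joint training loop. It assumes PyTorch, that `Generator` and `Discriminator` are the networks built in the earlier exercises (with the discriminator ending in a sigmoid and outputting one probability per image), and that `dataloader` yields batches of real images; the latent dimension and learning rates are placeholder choices, not values from the lecture:

```python
import torch
import torch.nn as nn

latent_dim = 100                      # size of the noise vector z (assumed)
G, D = Generator(), Discriminator()   # the models from the earlier exercises
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()                    # binary cross-entropy: the log terms above

for real in dataloader:               # assumed to yield batches of real images
    batch = real.size(0)
    ones = torch.ones(batch, 1)       # label for "real"
    zeros = torch.zeros(batch, 1)     # label for "fake"

    # Discriminator step: push D(x) toward 1 and D(G(z)) toward 0.
    z = torch.randn(batch, latent_dim)    # sample z from the prior
    fake = G(z).detach()                  # detach: no generator gradients here
    loss_d = bce(D(real), ones) + bce(D(fake), zeros)
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Generator step: push D(G(z)) toward 1. Minimizing bce(D(G(z)), ones)
    # maximizes log D(G(z)) -- the "non-saturating" variant commonly used in
    # practice because it gives stronger gradients early in training than
    # minimizing log(1 - D(G(z))) directly.
    z = torch.randn(batch, latent_dim)
    loss_g = bce(D(G(z)), ones)
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
```

Note how the discriminator sees both real and generated images in every iteration, matching the mixed-batch setup described above, and how each network updates only its own parameters even though the other's output flows through its loss.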