So, this worked great, no? We can absolutely distinguish images from white noise. So maybe images are easy; maybe the discriminator has simply learned what's real about real images and fake about white noise. Well, let us see about that. Let's focus on the generator part now. What is the generator trying to do? In this view, it's trying to minimize the probability of being caught. In other words, the generator wants to produce images such that the discriminator takes a fake image for a real one, and vice versa.

One easy way of doing that is to give it a cost function that is just the opposite of the discriminator's cost function. Mind you, there are these two components: this part is the log probability assigned to the images if they are real, and here we have the log of one minus the probability assigned to the images that are fake. Now, there's a slight problem here: if D goes to one, then this term goes to minus infinity. So what people often do is rewrite it; they replace the term log(1 - D) with -log D, which is also a monotonically decreasing function of D within the relevant domain, so the direction of optimization is preserved. The value of D always lies between zero and one, because we have a normalization at the output (typically a sigmoid) that only allows values in that range. The original cost function goes to minus infinity as D approaches one, whereas the rewritten one stays in a more meaningful range there. That is the cost function people then often use for GANs.

Now, what is the GAN learning, when we talk about probability distributions? What goes in is a random number. Maybe what we want to end up with is a probability distribution like the green one here, and what we are learning is effectively a mapping of a random number into the space X, so that after learning the two distributions are the same, and therefore the loss stays low. How is that going to work? We produce a vector z of random numbers, drawn either from a Gaussian distribution or from a uniform distribution. Any one such vector maps onto an image through the network we will be training here, the generator network; any vector maps to something.

Why do we need those random numbers? Well, we need them to describe the variability that is in the dataset: the poses of the cats' heads, the size of their eyes, the position of their ears, and so on. So there are a lot of degrees of freedom within that image manifold. In fact, this cat is quite interesting; it's a little disturbing, because it's unclear what happened to the rest of the cat's body. In any case, it's a fun webpage, thiscatdoesnotexist.com, which gives you a new cat that doesn't actually exist every time you go there.

So that's the generator mechanism, and this is where the neural network sits: it gets this vector as input and produces an image like that as output. And now that we have a discriminator, let us freeze the discriminator and use a simple multi-layer perceptron generator.
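To make the behavior of the two generator losses concrete, here is a minimal numerical sketch (not from the lecture; the Python code and the sample values of D are illustrative assumptions). It evaluates the original term log(1 - D) and the rewritten term -log D for a few values of D, showing that the former heads toward minus infinity as D approaches one while the latter stays near zero.

    import numpy as np

    # D is the probability the discriminator assigns to a generated image being real.
    d = np.array([0.01, 0.5, 0.9, 0.999999])

    original = np.log(1.0 - d)   # log(1 - D): heads toward -infinity as D -> 1
    rewritten = -np.log(d)       # -log D: heads toward 0 as D -> 1

    for di, o, r in zip(d, original, rewritten):
        print(f"D = {di:.6f}   log(1-D) = {o:9.3f}   -log D = {r:9.3f}")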
Let's give it a couple of layers with ReLU non-linearities and ask: can such a trivial generator win, that is, make the discriminator no longer able to tell the difference between real and fake images? And then, of course, there's the underlying question: are the images it produces actually in any way close to real images?
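As a rough sketch of what that experiment could look like in code (the framework, layer sizes, learning rate, and image dimensions are all assumptions, not details given in the lecture), a PyTorch-style version might be:

    import torch
    import torch.nn as nn

    latent_dim, img_dim = 100, 28 * 28   # assumed latent and image sizes

    # A trivial generator: a couple of fully connected layers with ReLU non-linearities.
    generator = nn.Sequential(
        nn.Linear(latent_dim, 256), nn.ReLU(),
        nn.Linear(256, 256), nn.ReLU(),
        nn.Linear(256, img_dim), nn.Sigmoid(),   # pixel values in [0, 1]
    )

    # A stand-in for the previously trained discriminator: outputs P(image is real).
    discriminator = nn.Sequential(
        nn.Linear(img_dim, 256), nn.ReLU(),
        nn.Linear(256, 1), nn.Sigmoid(),
    )

    # Freeze the discriminator so only the generator's weights are updated.
    for p in discriminator.parameters():
        p.requires_grad = False

    opt = torch.optim.Adam(generator.parameters(), lr=2e-4)

    for step in range(1000):
        z = torch.randn(64, latent_dim)            # random vectors from a Gaussian
        d_fake = discriminator(generator(z))
        loss = -torch.log(d_fake + 1e-7).mean()    # rewritten (non-saturating) generator loss
        opt.zero_grad()
        loss.backward()
        opt.step()

If the generator "wins" in this setup, the frozen discriminator's output on generated images drifts toward one; whether those images actually look like real images is the separate question raised above.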