So, how can we improve these problems with GANs, and with mode collapse in particular? Lots of people had lots of ideas: we can use the Jensen-Shannon divergence, we can use the earth mover's distance, we can do something about weight decay, we can give it an L2 loss objective. People came up with many great ideas for making GANs better. Now, these ideas are often based on mathematical intuitions, and sometimes mathematical intuitions are great, but sometimes they just hide that there are things we don't really understand yet. I can highly recommend the paper "How Well Can GANs Learn Densities: A Nonparametric View" if you are interested in more details there.

So, given all these mathematical ideas, and there are countless papers in this list here dealing with different ways to make GANs better, did they succeed? Did anyone find a magical, well-working formulation that doesn't feel as hacky as what we have right now? Well, let's see.

Say we want to measure how good a GAN is. What does it even mean to evaluate how well a GAN works? It's very difficult. We want to produce images that are like real images, but how could we know that we do? Here is a simple, intuitive idea: the Fréchet Inception distance (FID). We take the Inception network and use its fully connected, 2048-unit-wide layer. If the GAN produced the same probability distribution over images as the real world, we would expect the activations at that layer to be very similar for real and generated images. What does that mean concretely? It means that at that level, the mean of each feature should be the same, and of course the higher moments should be matched too; the easiest next step is that the variances should be the same. The Fréchet Inception distance combines those.
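The distance just described can be sketched in a few lines of numpy/scipy. This is a minimal sketch of the FID formula on two sets of feature activations, not the reference implementation; the `fid` helper name is mine, and in practice the inputs would be Inception pool-layer activations rather than arbitrary arrays:

```python
import numpy as np
from scipy import linalg

def fid(feats_real, feats_fake):
    """Fréchet distance between two Gaussians fitted to feature sets.

    feats_real, feats_fake: (n_samples, n_features) arrays of activations,
    e.g. from Inception's 2048-wide fully connected layer.
    """
    mu1, mu2 = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    sigma1 = np.cov(feats_real, rowvar=False)
    sigma2 = np.cov(feats_fake, rowvar=False)
    # squared distance between the means ...
    diff = mu1 - mu2
    # ... plus a term comparing the covariance structure
    covmean = linalg.sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):      # drop tiny imaginary parts from sqrtm
        covmean = covmean.real
    return diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean)
```

Identical feature distributions give an FID near zero; shifting the means of one set drives it up, which is exactly the "means should match, variances should match" intuition above.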
It asks how different the means are, and then how similar the covariance structure is. Now, does this FID actually make sense when we apply it? What we have here is FID on this axis versus various kinds of disturbance at different levels. Say we add a little color noise: there is a good number of photographs that look like that, yet the FID already goes up. If we add a lot of noise, the FID goes up an awful lot. If we blur the image, the FID goes up. If we apply rotational distortions, it goes up too, but interestingly only very little. Look here: this face clearly doesn't look right to a human, yet the activations higher up in a convnet may be quite similar, which is actually interesting because it means there is something missing there. And then we can corrupt individual random pixels, more and more of them, and the FID already reaches very different levels with really small amounts of this extra noise. So FID kind of makes sense as a measure of how realistic images are.

Then we can look at the FID scores of the various different GANs. What we see here is that across restarts and with different hyperparameters there is huge variability: the same network can have an FID score as low as 15 and suddenly much, much higher. We see this for the different data sets here. They look at MNIST, Fashion-MNIST, CIFAR-10, CelebA, and lots of different models, but the upshot is huge variability, and also that no one of them is clearly better than the others. So despite all these good ideas, they don't seem to be helping all that much, at least not if we take FID as a measure of quality for a GAN.

Now, there is one thing that I should have mentioned about FID. Why might we expect FID to be a good idea in the first place? At the very output of Inception we have object categories.
So a good generative model should produce object categories at the same frequencies as they occur in the real world. But we might want to go down a level further and say we want not just the object categories, but maybe poses, maybe additional parameters. Therefore, close to the output of an object recognition network, we might expect exactly the kinds of things that we would ideally like to keep constant.

Now here is an idea for improving certain things: conditional GANs. There are some things we can easily do, at least making sure that every class appears with the right frequency and cannot be mode-collapsed away. In a normal GAN, mind you, we put in z, a random vector, which produces cats. Now we can add the properties of what we are looking for: instead of just z, we also feed in the class, one-hot encoded. Alternatively, we can take this one-hot encoding and add it to the image inputs, which is how we do it in practice. Say we have different cat breeds. This could be a Russian Blue; this is another Russian Blue; both of them would be one-hot encoded, maybe as 1, 0, 0 at the end of the input. And here is Conrad's kind of cat, which would be 0, 1, 0, and so on and so forth for the other breeds. Now the discriminator, importantly, needs to get that same information. If the discriminator got this image together with the label claiming it is Conrad's kind of cat, and it isn't, the discriminator should say: no, that is definitely fake. Whereas if the image really is Conrad's kind of cat, the probability it assigns to the image being fake should be much lower. So to build a conditional GAN: first sample the labels, or just take them out of the data set, and then make the label part of the input to the GAN.
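The two conditioning routes above, appending the one-hot code to z for the generator and stacking it as extra image channels for the discriminator, can be sketched like this. A minimal numpy sketch under assumed sizes; the function names, `Z_DIM`, and the three-breed setup are illustrative, not from any particular library:

```python
import numpy as np

NUM_CLASSES = 3          # e.g. three cat breeds, one-hot encoded
Z_DIM = 100              # assumed latent dimensionality
IMG_H = IMG_W = 32       # assumed image size

def one_hot(label, num_classes=NUM_CLASSES):
    v = np.zeros(num_classes, dtype=np.float32)
    v[label] = 1.0
    return v

def generator_input(z, label):
    # condition the generator: concatenate the one-hot class code to z
    return np.concatenate([z, one_hot(label)])

def discriminator_input(image, label):
    # condition the discriminator: broadcast each one-hot entry to a
    # constant plane and stack the planes as extra channels, so the
    # discriminator sees the claimed label alongside the image
    planes = np.broadcast_to(
        one_hot(label)[None, None, :], (IMG_H, IMG_W, NUM_CLASSES)
    )
    return np.concatenate([image, planes], axis=-1)
```

With this input layout, a (image, wrong label) pair looks unlike anything in the real data, so the discriminator can learn to reject mismatches, which is what forces the generator to respect the class code.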