So, what did we just see? The generated images are pretty bad. Why? We're using a model trained without any constraints on the distribution in that latent space, and then we've sampled new vectors according to some probability distribution. That probability distribution p(z) might not look anything like the distribution p(h) of the codes the model actually computes in the forward pass. This is not what the original autoencoders were trained to do. What they are trained to do is just use a representation that internally allows them to reproduce the images that we put in. Where we're going now is variational autoencoders, which make autoencoders explicitly probabilistic and generative.

So what's the idea here? Implicitly, we've defined a probability distribution over x by passing a probability distribution over z through the decoder. Technically, this is a density network, and MacKay and Gibbs were already doing these things in the 1990s. So what's the idea there? We take a probability distribution p(z), which we assume to be a Gaussian, and it goes through the decoder. At the end, we will have a very complex probability distribution in x space. Already here, with a non-linear decoder, the Gaussian that goes in will come out as a complex distribution. Now we'd like this distribution to be good, so we effectively have a maximum likelihood problem: we want to maximize the log probability of the points from the training data, so that the model distribution becomes similar to the real distribution. This is, of course, hard. Even the formulation of the problem is hard: what does it mean to produce a good probability distribution? So read through the section called "Formalizing the problem" and discuss with your partner whether it makes sense to you.
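To make the density-network picture concrete, here is a minimal sketch (not from the lecture; the random weights are a stand-in for a learned decoder) of pushing a Gaussian p(z) through a small non-linear decoder and inspecting the distribution over x that comes out:

    import numpy as np

    rng = np.random.default_rng(0)

    # A tiny fixed non-linear "decoder" from a 1-D latent z to a 1-D x.
    # Weights are random here; in a real autoencoder they would be learned.
    W1 = rng.normal(size=(8, 1))
    b1 = rng.normal(size=8)
    W2 = rng.normal(size=(1, 8))
    b2 = rng.normal(size=1)

    def decode(z):
        h = np.tanh(W1 @ z + b1)   # non-linear hidden layer
        return (W2 @ h + b2)[0]    # scalar output x

    # Pass the Gaussian p(z) through the decoder: this implicitly defines p(x).
    z = rng.normal(size=(10000, 1))
    x = np.array([decode(zi) for zi in z])

    # p(x) is in general no longer Gaussian, even though p(z) is.
    counts, edges = np.histogram(x, bins=40)
    for c, e in zip(counts, edges):
        print(f"{e:7.2f} | " + "#" * (60 * c // counts.max()))

The printed histogram of x is typically skewed or multi-modal even though p(z) is a standard Gaussian, which is exactly the point of the argument above: even a simple non-linear decoder turns a simple latent distribution into a complex distribution over x, and maximum likelihood asks us to shape that complex distribution to match the data.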