So, let's briefly think about the big picture. It's easy to get lost if we go straight into the formalism of the ELBO, even though it does the right thing. What we're ultimately looking for is an encoder network that maps an input to a mean vector and a standard deviation vector, and from those we obtain a sampled vector in latent space. Based on these samples, we can then use the decoder network to get back to images. And now for the conv VAE: it's very similar to the conv autoencoder, but now with real distributions.
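To make that data flow concrete, here is a minimal sketch in plain Python. It is not the network from this lecture. The layer sizes, the random untrained weights, the helper names (`linear`, `encode`, `sample_latent`, `decode`, `init_params`), the choice to have the encoder predict a log-variance, and drawing the sample as mean plus standard deviation times Gaussian noise are all assumptions made for illustration.

```python
import math
import random


def linear(x, weights, bias):
    """Affine map: one output value per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def encode(x, params):
    """Toy encoder: returns a mean vector and a standard deviation vector."""
    mean = linear(x, params["w_mean"], params["b_mean"])
    # Assumption: the encoder predicts a log-variance; exp keeps std positive.
    log_var = linear(x, params["w_log_var"], params["b_log_var"])
    std = [math.exp(0.5 * lv) for lv in log_var]
    return mean, std


def sample_latent(mean, std, rng):
    """Draw a latent vector: z = mean + std * eps, with eps ~ N(0, 1)."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mean, std)]


def decode(z, params):
    """Toy decoder: maps a latent vector to pixel intensities in (0, 1)."""
    logits = linear(z, params["w_dec"], params["b_dec"])
    return [1.0 / (1.0 + math.exp(-v)) for v in logits]


def init_params(n_pixels, n_latent, rng):
    """Small random weights; a real model would learn these by training."""
    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                for _ in range(rows)]

    return {
        "w_mean": matrix(n_latent, n_pixels), "b_mean": [0.0] * n_latent,
        "w_log_var": matrix(n_latent, n_pixels), "b_log_var": [0.0] * n_latent,
        "w_dec": matrix(n_pixels, n_latent), "b_dec": [0.0] * n_pixels,
    }


rng = random.Random(0)
n_pixels, n_latent = 16, 2                       # hypothetical sizes
params = init_params(n_pixels, n_latent, rng)
image = [rng.random() for _ in range(n_pixels)]  # stand-in for a flattened image

mean, std = encode(image, params)                # encoder -> mean and std vectors
z = sample_latent(mean, std, rng)                # sampled vector in latent space
reconstruction = decode(z, params)               # decoder -> back to image space
```

Calling `sample_latent` several times with the same `mean` and `std` gives a different `z` each time, and that is the difference from a plain autoencoder, where the encoder output is a single fixed point.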