So today, for the first time, we will have some serious time dedicated to unsupervised learning. Why is unsupervised learning so important? Well, the world around us mostly gives us data without labels. But we do know aspects of the representation that we seek. What do we know? We want the representation to somehow explain the data that comes in, and much of what we will say about VAEs today is about exactly that. But we might also believe that the right representation is sparse, whereas wrong representations will be less sparse. We might believe that the representation should be able to tell us something about the future. We might in fact believe that a good representation should tell us something about the causal structure of the world. In any case, this is the domain of unsupervised learning.

People have long argued that unsupervised learning is extremely important, and we should start with the picture of a cake. It was popularized by Yann LeCun, who says: if we think about the problems that we solve in machine learning, then the cake itself, if you want, is unsupervised learning. Why? Because the bulk of the data that we get comes without labels. Then we have the icing on the cake, a small amount, but it makes the cake more beautiful: that's supervised learning. In some cases we do get labels; at some point I may have seen an object and someone said, that's a phone. And on top of it all sits reinforcement learning, the so-called cherry on the cake. Why is the cherry so small? Well, the number of things we actually try in our lives is very, very small. Most of the time we just watch what happens around us. Sometimes we do try something, a little experiment in this complex world around us, but the amount of information we get from it is extremely small. Hence the realization that we live in a world that must be dominated by unsupervised learning, with a little supervised learning and a tiny bit of reinforcement learning added for good measure.

So, again, remember the exercise where we asked: what makes for a good representation? We said much of the variance should be described, the kind of variance that matters for a task, and the representation should be temporally smooth, sparse, compositional, and so forth. Today we will take some of these ideas and implement them.

Today is also a special day, because eight is a wonderful number: today you get to choose your own adventure. Within each part you can choose one of two data sets. Either MNIST, which is a well-understood, low-dimensional, historic data set of images of digits. Or CIFAR-10, which is a considerably more complex data set that contains photographs of things like deer, trucks, or cats. There are 10 categories in each of them. The important thing is that everything we do today, you can do on MNIST and you can do on CIFAR-10. Each one of you chooses one of the two, and that way you can compare results. And we will see that in a lot of cases the results look rather dissimilar between MNIST and CIFAR-10. So right at the beginning of today you choose your data set, and then you do all the exercises on that one data set.

Now, images are high-dimensional, structured data. CIFAR-10 images are in color, 3×32×32, so they are 3072-dimensional. MNIST images, which you may remember from previous sessions, are 28×28 grayscale, so 784-dimensional.
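To make those dimensionalities concrete, here is a minimal loading sketch. It assumes torchvision as the way to obtain MNIST and CIFAR-10 (one common choice, not the only one), and the `./data` download path is arbitrary.

```python
# Minimal sketch: load both data sets and confirm their dimensionalities.
# Assumes torchvision is installed; root="./data" is an arbitrary choice.
import torchvision
import torchvision.transforms as T

transform = T.ToTensor()  # converts PIL images to float tensors in [0, 1]

mnist = torchvision.datasets.MNIST(root="./data", train=True,
                                   download=True, transform=transform)
cifar = torchvision.datasets.CIFAR10(root="./data", train=True,
                                     download=True, transform=transform)

x_mnist, _ = mnist[0]
x_cifar, _ = cifar[0]

print(x_mnist.shape, x_mnist.numel())  # torch.Size([1, 28, 28]) -> 784
print(x_cifar.shape, x_cifar.numel())  # torch.Size([3, 32, 32]) -> 3072
```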
Now, let's think a little about meaningful dimensions of variation. How many ways are there to write a two? Well, you can make it smaller, narrower, taller, bolder, and so on. Think about the variations among frogs and trucks in CIFAR-10; there are quite a lot of different dimensions along which frogs and trucks can differ. Many of these variations have short descriptions: for example, a loopy two, or a two slanted to the left with thick lines. We would like to automatically take a data set and find the dimensions along which the number two varies, so that we can produce short descriptions. And ideally we can invert this, so that we can say: can you draw for me a bold, tall, but narrow number two? In a way, we humans can all do that.

So in this context, we can view unsupervised learning primarily as compression. We want to find the dimensions along which examples of a certain category vary; we want to do dimensionality reduction, in a way. But before we get there, we should do what we should always do as data scientists: really understand the data set. So each of you should choose a data set and then make sure that you understand it, because you will be working with it a lot today.
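As one sketch of what "understanding the data set" can look like, the snippet below plots a small grid of raw examples and then runs PCA as a quick probe of the compression idea above. It assumes matplotlib and scikit-learn are available and reuses the `mnist` object from the earlier loading sketch; the choices of 1000 images and 50 components are arbitrary, and the same code works on CIFAR-10 with 3072-dimensional vectors.

```python
# Exploration sketch: look at raw examples, then get a rough sense of
# how many linear dimensions capture most of the variance.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# A 4x4 grid of images with their labels, to see the data before modeling it.
fig, axes = plt.subplots(4, 4, figsize=(6, 6))
for i, ax in enumerate(axes.flat):
    img, label = mnist[i]
    ax.imshow(img.squeeze().numpy(), cmap="gray")
    ax.set_title(str(label))
    ax.axis("off")
plt.tight_layout()
plt.show()

# Compression intuition: flatten 1000 images to 784-dim vectors and ask
# how much variance 50 principal components explain.
X = np.stack([mnist[i][0].numpy().ravel() for i in range(1000)])
pca = PCA(n_components=50).fit(X)
print("variance explained by 50 components:",
      pca.explained_variance_ratio_.sum())
```

If most of the variance is explained by far fewer dimensions than 784 (or 3072), that is exactly the hint that a short description of each example exists, which is what the rest of today builds on.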