So let's talk about the early history of convnets. Relatively shortly after the original experiments on visual cortex by Hubel and Wiesel, Fukushima tried to build a network out of this idea. Here is what it looked like. We have an input layer. We have a first layer that does contrast extraction, a bit like the lateral geniculate nucleus, the part of the visual system that sits between our eyes and our visual cortex. Then we have simple cells, just as Hubel and Wiesel described, and complex cells. And then we have the generalization of that, where we basically take a linear combination of the outputs of those complex cells and then again add further invariances. Layers were added and added, and ultimately it could do some object recognition. So arguably these are the oldest versions of something that resembles a convnet. It basically had an alternation between so-called S-layers (simple layers) and C-layers (complex layers), and it also had a non-linear transform between the layers. It was much more complicated than later networks because, I believe, at that time it wasn't yet clear which things are not important. In fact, we also see that in the transition from early convnets to today's convnets: at some level a lot of things get simpler over time instead of more complicated. So the neocognitron, in a way, was a rather complex network.

Now, another 20 years later. Keep in mind that at the time of Fukushima computers were extremely slow; this was 1980, so what we could do was severely limited by compute power and of course also by the potential sizes of data sets. Twenty years later computers have come a long way. I'm in grad school and things are much faster. And Yann LeCun then did the so-called LeNet. I remember thinking about it and asking myself, well, that looks mighty like a neocognitron. But it turns out that in deep learning, oftentimes by making things better and taking them to the next generation, they actually get to be much better than they were before. So LeNet takes what was a nice idea in the neocognitron into the domain where it actually was usable.

Let me briefly introduce Yann LeCun. He worked as a postdoc in Geoff Hinton's lab. He's now chief AI scientist at Facebook AI Research. He also wrote a paper arguably discovering backprop, although Werbos has a similar claim to it. And he co-founded ICLR, the International Conference on Learning Representations.

The problem LeNet was solving was to classify 7-by-12-bit images, which were binary, so every pixel either black or white, of 80 classes of handwritten characters. And you can immediately see why, at a time when we couldn't solve such recognition problems, this was very useful. The architecture was the following. You have an input, then a convolution with filters of size 5 by 5, so you go from the input to six feature maps, each of them 28 by 28. Why is this smaller than the original image? Because he didn't use the trick of padding that we'll be talking about later. Then it goes from six maps of 28 by 28 to six maps of 14 by 14, so subsampling. Then we have a convolution that gets us to 16 filters of 10 by 10. And then we have a fully connected layer and a Gaussian layer at the end. It used average pooling, not max pooling, within each area. Now, what were the results? Yann LeCun successfully trained a network with 60,000 parameters, which was very big at that time.
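To make that layer-by-layer walkthrough concrete, here is a minimal sketch of a LeNet-5-style network in PyTorch. It is an illustration, not the original 1998 model: the 32-by-32 grayscale input size is inferred from the unpadded 5-by-5 convolution producing 28-by-28 maps, the tanh nonlinearities and the 120/84-unit fully connected layers follow the usual LeNet-5 description, and the original Gaussian (RBF) output layer is replaced with a plain linear layer.

```python
# Illustrative LeNet-5-style sketch (not LeCun's original implementation).
import torch
import torch.nn as nn

class LeNetSketch(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),   # 1x32x32 -> 6x28x28 (no padding, as described above)
            nn.Tanh(),
            nn.AvgPool2d(2),                  # 6x28x28 -> 6x14x14 (average pooling, not max pooling)
            nn.Conv2d(6, 16, kernel_size=5),  # 6x14x14 -> 16x10x10
            nn.Tanh(),
            nn.AvgPool2d(2),                  # 16x10x10 -> 16x5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),
            nn.Tanh(),
            nn.Linear(120, 84),
            nn.Tanh(),
            nn.Linear(84, num_classes),       # stand-in for the original Gaussian (RBF) output layer
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

if __name__ == "__main__":
    model = LeNetSketch()
    x = torch.randn(1, 1, 32, 32)                        # one grayscale 32x32 image
    print(model(x).shape)                                # torch.Size([1, 10])
    print(sum(p.numel() for p in model.parameters()))   # roughly 60,000 parameters
```

Counting the parameters of this sketch gives roughly 62,000, which matches the "60,000 parameters" figure mentioned above.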
And that was entirely without GPU acceleration. It solved handwriting recognition for banks; it pioneered automatic check reading. And it got the error down to 0.8% on MNIST, which was pretty much state of the art at the time. The alternative was a kernelized virtual SVM, which achieved the same error. So in a way this wasn't a breakthrough, because there were other techniques, namely these SVM techniques, with similar error rates, and in that sense LeNet didn't have a strong influence on the computer vision field. We will see next week how applying similar ideas another 10 years later led to massively outperforming the previous state of the art, and that is how neural networks had their breakthrough in the early 2010s. And here are just the links if anyone wants to play with it.

So artificial neural networks have come a really long way. Keep in mind that Yann LeCun's LeNet in lots of ways has all the things that modern neural networks have. And yet in the late 90s, there was nothing as uncool as artificial neural networks. It was uncool enough that when I organized little workshops with friends, Yann LeCun and Fukushima would both come and participate, despite me just being a grad student. The field was tiny. Now, if Yann LeCun gives a talk, there will be 10,000 people; it's a spectacle. It's interesting how the field has changed: the content hasn't changed all that much, but the way the world thinks about it has changed massively.

In general, deep learning is a story of representation learning. Hence ICLR, the conference on learning representations, is a big conference. So what do we want a good representation to do for us? What does it mean for a representation to be good? Let's briefly think about the philosophy here. What do we mean when we say "representation"? Well, a representation is a representation of something. You can say that, in a way, the neurons in a layer of Yann LeCun's LeNet jointly represent the important aspects of the characters the system is seeing at the time. So in a way, they're supposed to reflect the world. That also philosophically means it must be possible for them to be wrong. And arguably a representation must be used for something; otherwise it's hard to say that it's a representation of something. There's a huge intellectual history here, going back at least to Aristotle. For anyone interested in that, I highly recommend it; it's a beautiful branch of philosophy.

Now I want you to ask yourselves: what are good representations? I have something in the outside world and I represent it as a vector of activities. Each element in that vector tells us something about what's out there. There's no doubt that some representations of the world are more useful than others. If I encrypt them, they are maximally useless; and clearly if I somehow figure out what's in there, they start to be very, very useful. So if I looked at a feature, could I figure out that it is a useful feature? Or could I not? And if we learn representations, what would we want them to represent? Just to be clear what the setting is: we're looking for a mapping of an image into a vector, so that that vector representation is useful for us. So let's list criteria that you can come up with that capture what it means for a representation to be good.
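As a purely illustrative reading of "mapping an image into a vector", one could take the activations of the next-to-last layer of the hypothetical LeNetSketch above as the representation; whether such a vector is actually useful is exactly the question being posed here.

```python
# Illustrative only: extract an 84-dimensional "representation" from the LeNetSketch above.
import torch

def representation(model, image: torch.Tensor) -> torch.Tensor:
    """Map an image to a vector: run the conv layers and all classifier stages except the last."""
    with torch.no_grad():
        h = model.features(image)
        for layer in list(model.classifier)[:-1]:  # stop before the final class-score layer
            h = layer(h)
    return h.squeeze(0)

# vec = representation(LeNetSketch(), torch.randn(1, 1, 32, 32))
# print(vec.shape)  # torch.Size([84])
```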