So let's talk about an issue that I feel very strongly about, which is continual, or lifelong, learning. Say you train a neural network on one dataset until convergence. You train it on MNIST digits, say, until it is really very good at that. And then you retrain that network on something else, say to recognize characters. What we saw before in domain adaptation is that the second task might be easier now that we have already learned the first one. But there is another effect that we did not talk about: you are messing up the original performance. If I take that network, train it on digits, then train it on characters, and then test it on digits again, it will be very, very bad. This phenomenon is called catastrophic forgetting.

Now, when you want to train these networks, what you generally do is take all the images and randomize their order. If you instead took even something like MNIST and first trained it on the zeros, then the ones, then the twos, then the threes, and so forth, it would work really, really badly. But in the real world, we learn like that. Maybe the first time in my life, I see a ball. We have all seen kids see a ball early in their life and go, wow, a ball, look at that, and then stare at that ball for an hour. The next time they look at a ball might be a long time later, maybe weeks later. But it is not the case that whatever they learned in the meantime, say about a rattle, deletes the ball; quite the opposite. By learning about some visual things, they get better at visual things in general. So this is a strange blind spot where deep learning methods seem to work very, very differently from biological beings.

This points to a set of major differences between biological intelligence and AI. Biology gets better the more it learns. Biological intelligence is curious. It prepares for new tasks. All of these relate to continual learning: I am curious because I know there will be new tasks in the future. I get better at new tasks because I know there will be new tasks, and I know that tasks will come back. Biological intelligence has prior knowledge about the world; it has constraints on what it expects. Why does it need that? Because you need to know something about the kind of problems you will have to solve in the future. It is compositional, so it can take things it has learned and use them for these future tasks. And it focuses on causal things, the things we could act on to make the world better for us in the future.

Now, there is a very simple way of saying all that. What we do in deep learning, and in machine learning in general, is minimize the sum of the loss over past events, where the loss depends on the parameters theta that we train with gradient descent and so forth. But this is clearly the wrong thing to optimize. What we really want to optimize is the sum of the loss over future events: the future tasks, the future problems. And that immediately shows that what we should be after is finding the things that are potentially useful for the future.
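To make that contrast concrete, here is a minimal way of writing down the two objectives. The notation is mine, not from the lecture: tasks are indexed by t, D_t is the data of task t, and L is the per-task loss.

```latex
% What we actually minimize in practice: the loss summed over past data
\theta^{\star} \;=\; \arg\min_{\theta} \sum_{t \,\in\, \text{past}} \mathcal{L}\!\left(\mathcal{D}_{t};\, \theta\right)

% What we would like to minimize: the (expected) loss on future tasks,
% which we cannot observe and which need not follow the same distribution
\theta^{\star} \;=\; \arg\min_{\theta} \; \mathbb{E}\!\left[\, \sum_{t \,\in\, \text{future}} \mathcal{L}\!\left(\mathcal{D}_{t};\, \theta\right) \right]
```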
Instead, we keep solving problems for which we simply assume stationarity, and keep in mind that this assumption is not even true for ImageNet: when Ben Recht and colleagues produced a second test set just like ImageNet, performance was considerably worse. And this problem runs through all of deep learning and most of machine learning.

Now, let's briefly talk about causality, because that is very close to my heart. We are interested in potential future actions, and you will see in the reinforcement learning weeks how we can capture that from a reinforcement learning perspective. So we care about potential actions in the future. What does that mean? The world has lots of parameters; there are lots of things we can do in the world, but only a small number of them have interesting, useful causal effects on the future. What we really want to understand are these causal levers: the things we can actually do in the world. Those are the things we really want to understand, and we live in an incredibly complicated world where most things are not of a kind that we can change.

Now, I want to argue that continual, lifelong learning is one of the main blind spots, or maybe the blind spot, of deep learning and machine learning in general. We do not really have solutions for it. And even simple animals solve real-world problems better than the best deep learning systems. Why? Animals are really good at dealing with worlds that change. Animals are really good at interpreting the world in terms of cause and effect. Animals are really good at bringing prior knowledge; they do not start from scratch. I want you all to think a little bit about this. That is why I added a continual learning component to this course, even though it is usually not taught as part of such courses.

In this exercise, I want you to see a simple strategy to counteract catastrophic forgetting. It is not the solution to how we should think about continual, lifelong learning, but it might be a starting point. The strategy is replay: occasionally we reactivate some input stimuli that we saw before and train on them again, and this leads to less catastrophic forgetting. You will find a minimal sketch of this idea below. Although I should mention there are interesting findings about critical periods: if there are certain stimuli that a network does not see early on, it is very difficult for it to learn them at a later point in time, and that might in fact be something it shares with us humans. And so enjoy the continual learning exercise.
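Here is a minimal sketch of the replay idea in PyTorch-style Python. The names (ReplayBuffer, train_task) are mine and this is not the exercise's actual code; the point is only that mixing a few stored examples from earlier tasks into each update counteracts catastrophic forgetting.

```python
import random
import torch
import torch.nn.functional as F


class ReplayBuffer:
    """Keep a small random subset of past (input, label) pairs via reservoir sampling."""

    def __init__(self, capacity=500):
        self.capacity = capacity
        self.data = []
        self.seen = 0

    def add(self, x, y):
        # Reservoir sampling: every example seen so far has the same chance
        # of being in the buffer, without storing the whole stream.
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append((x, y))
        else:
            idx = random.randrange(self.seen)
            if idx < self.capacity:
                self.data[idx] = (x, y)

    def sample(self, batch_size):
        batch = random.sample(self.data, min(batch_size, len(self.data)))
        xs, ys = zip(*batch)
        return torch.stack(xs), torch.stack(ys)


def train_task(model, optimizer, task_loader, buffer, replay_batch=32):
    """Train on the current task while replaying stored examples from earlier tasks."""
    model.train()
    for x, y in task_loader:
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x), y)
        # Mix in a replayed batch from earlier tasks, if we have one.
        if len(buffer.data) > 0:
            rx, ry = buffer.sample(replay_batch)
            loss = loss + F.cross_entropy(model(rx), ry)
        loss.backward()
        optimizer.step()
        # Store some of the current examples for future replay.
        for xi, yi in zip(x, y):
            buffer.add(xi.detach(), yi.detach())
```

Reservoir sampling keeps the buffer small while giving every past example the same chance of being stored, so the replayed batch stays roughly representative of everything seen so far.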