Well, how else might we build smarter, more general deep learning systems? Three methods, one of which we've already seen, self-supervision, and two we haven't, continual learning and maybe curiosity, seem like ways to go beyond the simple supervised learning paradigm that dominates most of deep learning. Now, we've seen that self-supervision is critical. Auto-encoders take an image, encode it into some hidden state, and reconstruct it, maybe after adding noise. We saw a variety of auto-encoders. We saw language models: given a sequence of words, predict the next word; or given a set of words, mask some of them and predict the masked words. All of these let us use lots of data with no labels, where we still learn useful things from the data, which we can then generalize and apply to small labeled sets, right? And this has really been the key to many of the gains in deep learning over the last few years.

Well, it would be nice if we could combine that with something where we keep learning as we see more and more different problems. Unfortunately, neural nets suffer from catastrophic forgetting. It's been known since the 80s that if you train a neural net on the single-digit times tables, 3 times 4, 7 times 9, and then train it on 12 times 14 and go back to test the earlier ones, it's forgotten what it learned before. If you think about what gradient descent does, this makes sense: it moves the weights to reduce the loss on whatever you're currently training on, with nothing to stop it from undoing what you learned first. Humans are pretty good at this. When we learn a new fact, or a new language, we don't immediately forget the first language we learned. A little bit, yes, we get some degradation, some interference, but mostly we're pretty good at learning a second language without forgetting the first one. And by the time we learn a third or fourth language, we're getting better at it.
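The times-tables story can be sketched numerically. Here's a hypothetical toy, a single weight trained by gradient descent with NumPy, where the tasks and names are made up for illustration: after fine-tuning on a second task, the fit to the first task is gone.

```python
import numpy as np

def train(w, xs, ys, lr=0.1, steps=200):
    """Plain gradient descent on mean squared error for a 1-weight model."""
    for _ in range(steps):
        grad = np.mean(2 * (w * xs - ys) * xs)  # d/dw of mean((w*x - y)^2)
        w -= lr * grad
    return w

def mse(w, xs, ys):
    return float(np.mean((w * xs - ys) ** 2))

xs = np.linspace(-1, 1, 50)
ya = 2.0 * xs    # "task A": learn y = 2x
yb = -3.0 * xs   # "task B": learn y = -3x

w = train(0.0, xs, ya)
loss_a_before = mse(w, xs, ya)   # near zero: task A is learned
w = train(w, xs, yb)
loss_a_after = mse(w, xs, ya)    # large: task A has been forgotten
print(loss_a_before, loss_a_after)
```

Nothing in the update rule remembers task A; the gradient only points toward whatever data is currently in front of the model.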
Now, how might we implement this sort of learning in an artificial neural net, a deep learning system? People have tried a bunch of methods, and this is still an open problem for you to work on. One is: as you see new observations, new input-output pairs, x's and y's, add more neurons, so that as you learn more and more you just add more and more neurons and don't change the existing ones too much, because the new ones can learn the new facts. Okay? Another way is to use regularization; we've seen versions of this a couple of times. You could say: I want to learn to fit the new x-y pairs without changing the weights too much. So I'm going to regularize the weights so that they stay roughly as they were before, so that they tend to make the same predictions they made before, but also learn the new pieces. I build in some sort of soft constraint, a penalty, that says: don't forget, try to remember things. Or people do things that are maybe a little more hacky or tricky. Once the system sees the new facts, the new x's and y's, they have it watch some of the old ones again, interleaving new and old, maybe regenerating synthetic versions of the past, maybe dreaming about things it's seen in the past as it consolidates that memory. But all of these are trying somehow to pull in the notion that as you learn new facts, new x-to-y mappings, you shouldn't forget all the ones you've seen before.

Cool, that's good. But that's not enough to be like a human. We can ask: how do humans, or even rats, learn? Here's a question that turns out to be a trick question. I drop a rat into a maze with a start and some food at the end. What does the rat do? I talked to some rat people at Penn and asked what rats mostly do. They wander around, they sniff it out, they check it out. If they're hungry, they will run and find the food. But the rats have to be really hungry to run straight to the food.
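The regularization idea can be sketched on the same kind of toy problem. This is a hedged, minimal illustration (a plain L2 pull toward the old weights, a simplified cousin of methods like elastic weight consolidation; all names and the penalty strength are my own choices): when training on a second task, add a penalty for drifting away from the weights learned on the first task.

```python
import numpy as np

def mse(w, xs, ys):
    return float(np.mean((w * xs - ys) ** 2))

def train(w, xs, ys, w_anchor=None, lam=0.0, lr=0.1, steps=300):
    """Gradient descent on task MSE, optionally plus lam*(w - w_anchor)^2."""
    for _ in range(steps):
        grad = np.mean(2 * (w * xs - ys) * xs)   # gradient of task loss
        if w_anchor is not None:
            grad += 2 * lam * (w - w_anchor)      # "don't forget" pull
        w -= lr * grad
    return w

xs = np.linspace(-1, 1, 50)
ya, yb = 2.0 * xs, -3.0 * xs                      # task A, then task B

w_a = train(0.0, xs, ya)                          # learn task A first
w_plain = train(w_a, xs, yb)                      # plain fine-tuning on B
w_reg = train(w_a, xs, yb, w_anchor=w_a, lam=1.0) # B with a pull toward w_a

print(mse(w_plain, xs, ya), mse(w_reg, xs, ya))
```

The regularized run ends up between the two tasks: worse on B than unconstrained fine-tuning, but far less forgetful of A. That tradeoff is exactly what the penalty strength controls.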
They're like you. If you're hungry, you run to the refrigerator. But the rest of the time you're out trying stuff out. So rats spend a lot of time just checking things out, unsupervised learning, just seeing what's there, looking at the maze, wandering around it, sniffing stuff; they've got great smell systems. And then, when they need to learn to find the food, they can learn it much more quickly. Our current computers are pretty crappy at that. They don't do a good job of exploring and thinking. So one question you could ask is: how can I make an AI, a deep learning system, that's more curious, more interested? Currently we do reinforcement learning to maximize reward. But instead, maybe I should encourage the reinforcement learning system to take actions that are likely to be surprising, to give some result I didn't expect. And to encourage it not just to greedily race to the cheese at the end of the maze, but to explore the maze and discover parts it hadn't visited, to try pushing different levers to see what happens, to try taking different actions. So: a hybrid reward system that rewards both exploration, learning new stuff, and exploitation, going to the reward.

A number of researchers have tried this. They often combine it with something like episodic memory: remember not just the rules, this is what happens if you take this action and how good or bad it is, but also remember what you've seen. Is this familiar? Is this not familiar? Humans and rats are remarkably good at recognizing, yep, I've seen this before, nope, I've never seen this. And if you augment your learning with that sort of episodic memory, I remember having seen this yesterday, I had this for lunch yesterday, oh, I've never eaten that, I'm curious, let me try it, then you can actually reward the RL system for discovering novel environments and novel actions, as well as for trying to succeed at its end task.
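The hybrid reward can be sketched in a few lines. This is an illustrative sketch, not any particular system: a simple episodic memory counts state visits, and a count-based novelty bonus (the 1/sqrt(count) decay and all names here are assumptions of mine) is added to the task reward, so unfamiliar states pay out more than familiar ones.

```python
import math
from collections import Counter

class EpisodicNovelty:
    """Toy episodic memory: count visits, pay a shrinking novelty bonus."""
    def __init__(self, bonus_scale=1.0):
        self.counts = Counter()
        self.bonus_scale = bonus_scale

    def reward(self, state, extrinsic):
        self.counts[state] += 1
        bonus = self.bonus_scale / math.sqrt(self.counts[state])
        return extrinsic + bonus  # hybrid: task reward + exploration bonus

nov = EpisodicNovelty()
r1 = nov.reward("cell_3_4", extrinsic=0.0)  # first visit: full bonus
r2 = nov.reward("cell_3_4", extrinsic=0.0)  # revisit: smaller bonus
print(r1, r2)
```

An RL agent maximizing this hybrid signal is pushed toward states its memory has rarely or never seen, while the extrinsic term still pulls it toward the cheese.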