So I'd like to introduce Josh Tenenbaum, who is a professor in brain and cognitive sciences at MIT. He's also a member of CSAIL. We are absolutely honored that he's also the principal investigator for two of our MIT-IBM Watson AI Lab projects. And also just recently, we noticed that Josh is the number one author by accepted papers at NIPS, with 11 papers, which is as many as most entire institutions. So well done and congratulations, Josh. Please welcome Josh.

OK. Thanks so much. Thanks. It's a great honor to be here and to be able to work with so many colleagues. First of all, it's 10 papers, not 11, but who's counting? And again, all of those are collaborations, where I played a small part in each. I'm going to tell you about some work that's been around for a little while and also a few of those new projects.

The work I'm going to tell you about here represents what we do in our lab at MIT, which is both cognitive science and AI engineering. The basic thing we're after is building machines that learn and think like people, and trying to understand the roots of intelligence in the human mind and brain. That's the really big question. I have many affiliations here. Again, I'm very proud to be associated with the MIT-IBM Watson AI Lab. I also represent CSAIL and the Center for Brains, Minds, and Machines. My academic affiliation is in brain and cognitive science. And I'm one of the leaders of the Quest for Intelligence, one of the scientific directors. So I'm going to try to introduce all of these things to you here today.

I think the easiest way to motivate the big question is to just point out something which I think we all recognize to be true. At this point, we have amazing AI technologies, but we don't have any real AI. We have systems that do things that we used to think only humans could do, and now we have machines that do them. But we don't have any machine with the flexible, general-purpose intelligence that each and every one of you can use to learn on your own to do each and every one of these things, without having to be built by a special devoted team of engineers. So what's missing? And how do we fill in the gap? That's our fundamental question. And there are many, many different perspectives on that. I'll just give you the perspective behind our work and what we focus on. And it fits very well; you'll see many resonances with the previous panel. It's this point here: what's driving today's AI technologies are dramatic advances in pattern recognition and function approximation. But intelligence is so much more. In particular, it's all these ways in which we model the world: our ability to explain and understand what we see, to imagine things that we haven't seen, maybe things that nobody's seen, and then to make plans and solve the problems that come up along the way to make those things real, and then learning as building new models of the world.

So I'm going to tell you about the work that we've been doing trying to address these things, and especially studying the model-building abilities of even young children. I'm extremely motivated by the fact that even a one-year-old child has a kind of common-sense intelligence that none of our machines yet have. But imagine if we could build that. So here's one of our Quest for Intelligence moonshot slides. And I've got a few of my collaborators on here, but it's a much bigger team at this point.
But we ask: imagine if we could build a machine that grows into intelligence the way a person does, that starts like a baby and learns like a child. That would be real AI in some sense, machine learning that really learns. And it's probably the oldest good idea in AI, for good reason; in some sense, it's the only scaling route that we actually know works. Now you might say, well, could we actually achieve this? Probably not on a short time scale, and maybe not in our lifetimes. But we already know from the history of both AI and neuroscience and cognitive science that even small steps towards ambitious goals like this could be big.

So just to remind us all, the history of deep learning and reinforcement learning started in papers like these, papers published in the 1960s, 70s, and 80s, mostly in journals of psychology and computational neuroscience. Each of these was a very simple model of the most basic learning processes; think Pavlov's dogs, simple associative learning. But what we saw is that, formulated in the right way, and realistically understood, as Joey was saying, simplified scientific models scaled up can still do big things and change the world. So imagine if we could take the next small steps towards learning like a child.

Now this, as I said, is probably the oldest good idea. It goes back even earlier, to Turing's famous paper of 1950, where he introduced the Turing test along with his one good idea of how we might solve it: to build a child machine, a machine that, again, starts off like a child and is taught. Turing inspired many others. Marvin Minsky, and in some sense all the great figures of AI, at one time or another championed this vision. But we might ask, if it's such a good idea, why hasn't it worked yet? And I think the basic thing we have to recognize is that only now are the fields of AI and the scientific study of children's minds and brains mature enough that they can talk to each other and usefully inform each other.

Turing was brilliant, but look, he could only presume how children's brains worked. Here's a famous line: "Presumably, the child brain is something like a notebook as one buys it from the stationers. Rather little mechanism, and lots of blank sheets." But we've learned a lot from several decades of research, and two of my colleagues here, Laura Schulz, in the middle, and Rebecca Saxe, are two of the leading figures in the study of human cognitive development. What we've learned, for example, from Rebecca's work: she's best known for functional brain imaging, and she's recently released some of the first functional magnetic resonance imaging studies of four- to six-month-old human infants and shown that a lot of the architecture of high-level vision is already in place very early, in some sense even at the time that you're born: the large-scale brain wiring and systems for seeing the world in terms of objects and other people and people's faces, for example. Laura Schulz studies children's learning and has studied all the ways in which children are not just passively writing things down in their notebook from the blackboard, but are active learners, forming intuitive theories, what's sometimes called the child as scientist: testing hypotheses, exploring with curiosity, doing the same kinds of experiments scientists do. When children do them, we call it play. Actually, when scientists do it, we also call it playing around in the lab.
But she's studied the ways in which children's play is perhaps the root of what makes human children the smartest learners in the known universe. So what I'm trying to do, working with them, is to capture those insights in computational form and engineering terms. We start with what we call intuitive physics and intuitive psychology. These are some of the basic common-sense knowledge systems which, again, in some form seem to be present from birth, not just in humans but in other animals as well. And then we try to capture them in engineering terms, and also understand how they might be learned and how we go way beyond what we start with.

By intuitive physics, we mean what you see this one-and-a-half-year-old doing here, stacking up cups. If you compare that with the state of the art in robotics, you realize quickly that there's no comparison: the understanding that the world has real objects, the forces, the fine-grained physics needed to make that tower of cups. If we could put that into robots, that would be amazing. And I think it's not crazy to aim for that. Or when we talk about intuitive psychology, we mean what this one-and-a-half-year-old is doing here, from the famous experiments of Felix Warneken and Michael Tomasello. Warneken is the big guy there. The subject in the experiment is the little guy in the corner. Just like you, he's seeing an action that he's never seen before, but he figures out what's going on and even how to help out. Just wait. It gets better. It's cute. It's sweet. But it's also really deep. Consider what has to be inside his head to make sense of that action and to know how to help out. If we could build robots that could help out around the house like that, it would be amazing.

So how are we going to do that? Well, again, calling back to the previous talk: on the engineering side, we start by building on some of the tools that Vikash Mansinghka and colleagues have built, these new AI programming languages known as probabilistic programs. The way I like to think about them is that they're tools that bring together a number of the best ideas in AI, going back to the beginnings of the field. So yes, neural networks for pattern recognition, but also symbolic languages for abstraction and real knowledge representation and reasoning, probabilistic models for dealing with uncertainty, causal models for explaining and understanding and intervening and asking counterfactuals, hierarchical structure so that we can learn to learn new things faster and not just learn one thing at a time, and so on. And I think the ability to bring all these tools together, that, as much as anything, is what's making AI so mature and exciting right now, and actually able to engage with the kinds of cognitive phenomena that we've just been talking about.

Some of the particular kinds of probabilistic programs that we've been looking at wrap, inside this general-purpose framework, programs that come from the game industry. Just as the game industry transformed AI on the hardware side, driving GPU development, it's also transforming AI on the software side: the kinds of tools for very fast, interactive, real-time graphics, rendering a 2D perspective view of a 3D world as a player moves around, physics simulation, and even game AI, simulating other agents.
We think that those programs, when embedded in a framework for probabilistic inference and causal reasoning, provide the basic tools for capturing, in some sense, what babies' brains start with, and then, extended with new tools I'll tell you about at the end for program synthesis, for how you might be able to learn beyond what you start with.

So we've built, for example, what we call the intuitive physics engine, which takes a game physics engine, wraps it inside a probabilistic programming framework for inference, and can address the basic challenge of how you can answer so many questions about a physical scene. For example, in these stacks of blocks, think of the blocks from the game Jenga, you might ask which of these are more or less stable. Think for yourself, on a scale of one to seven: how likely is it that any one of these stacks of blocks will fall over, or is it really stable? The data you're seeing there, that scatter plot, is plotting human judgments on the y-axis against the predictions of our probabilistic intuitive physics engine, and it captures them pretty well. But it doesn't just answer this question; it's not like it's trained to answer this question. We do this by the kind of probabilistic simulations that Joey and Vikash were talking about before. I'll show you one in a second. The same system, though, can answer: will the stack of blocks fall? If they do fall, which way will they fall? How far will they fall? What happens if the different colors are different materials with, say, different densities? What if the gray stuff is 10 times heavier than the green stuff? How will that change your answer? Or in scenes like this that look surprisingly stable, which color material is much heavier? Or consider this last question here, which is, again, a perfect connection to what we saw before. It's a kind of counterfactual reasoning that we might use in planning actions. And all of these things are not based on data at all; they're based on having a mental model and being able to do the right probabilistic simulations.

So here's a question which, unlike the question of whether these blocks will fall over, which you might have a lot of experience with and can do by pattern recognition, you've probably never thought about, unless you've seen me talk about this before. We have a table like this, and we ask: what if the table is bumped hard enough to knock some of the blocks onto the floor? Is it more likely to be red blocks or yellow blocks? So what do you say, red or yellow? Red? It depends, right, but you have to choose. It's a two-alternative forced choice. Red, very good. OK, how about here? Yeah, yellow. Yellow, good. OK, you guys were great. So you just saw how we do behavioral experiments in cognitive science, and we just replicated it here. You could tell, in the different reaction times and different amounts of uncertainty, you could see the probabilistic inference going on in each and every one of your heads.

Now, how do you do that? How can you do that? Well, here's the way our model tries to capture this. I'm not saying we know for sure this is right, but it's something like this. We can reconstruct one of these scenes in one of these game physics engines and simulate a small bump. And here we can simulate a large bump. And the key is that it doesn't really matter which of those we simulate. And you don't have to simulate it for more than a few time steps. And you don't have to simulate with high precision to be able to just read off the answer, which is, very quickly, you can see in your mental simulation that it's going to be all or most of the yellow blocks and few if any of the red blocks. And that sort of idea allows us to make a quantitative model, which is just as predictive of your counterfactual judgments as of the much more basic question of whether this stack will fall over. No learning yet; that'll come in a second.
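To make the simulation story concrete, here is a minimal sketch of the inference pattern, a Monte Carlo version of the bump question. To be clear, this is not the actual intuitive physics engine, which wraps a full game physics engine; the tabletop layout, the bump model, and the friction constant below are all made-up stand-ins, just to show how a handful of short, noisy, coarse simulations let you read off a probability.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tabletop: block x-positions on a table spanning [0, 1], with
# the yellow stack near the table edge (x = 0) and the red stack near the middle.
blocks = {"yellow": np.array([0.05, 0.08, 0.10]),
          "red":    np.array([0.45, 0.50, 0.55])}

def simulate_bump(positions, strength, steps=5, friction=0.6):
    """Coarse, short simulation: the bump gives every block a noisy leftward
    velocity, and friction bleeds it off over just a few time steps."""
    x = positions.copy()
    v = -strength * (1 + 0.2 * rng.standard_normal(x.shape))
    for _ in range(steps):
        x = x + v
        v = v * friction
    return x

def p_any_falls(color, n_samples=2000):
    """Monte Carlo estimate: over bumps of uncertain strength, how often does
    at least one block of this color end up past the edge (x < 0)?"""
    hits = 0
    for _ in range(n_samples):
        strength = rng.uniform(0.02, 0.25)  # we don't know how hard the bump is
        hits += np.any(simulate_bump(blocks[color], strength) < 0.0)
    return hits / n_samples

print("P(some yellow block falls) ~", p_any_falls("yellow"))
print("P(some red block falls)   ~", p_any_falls("red"))
```

The point of the sketch is that the answer comes from counting outcomes across imperfect simulations, not from any training data about this particular question.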
Now, the intuitive psychology engine. Just to show how these ideas are not only about intuitive physics, although we'll mostly be talking here about objects and physics, consider a scene like this, where now we're going to build on physics and forces to understand action. Again, a basic insight that goes back to even very young infants is that when people move, they are acting efficiently, in some sense rationally, according to a cost-benefit analysis. The costs are the forces that they have to exert and deal with, and the benefits are the rewards they attach to their goals. And using that, we can work backwards to infer somebody's goals from how they act.

So when you watch this woman here, she's reaching for one of the 16 objects on the table. Ask yourself which object you think she's reaching for, and just raise your hand when you think you know the answer. So here we go. It's moving in slow motion. Just raise your hand when you think you know the answer. Okay, I think most of the hands are up now. All right. I wasn't watching the screen, but you might have seen that dashed line go up around the time when I said most of the hands are up. And that's the prediction of our model, which is working backwards in a Bayesian way, inferring the most likely goal that best explains her action under a rational cost-benefit trade-off.

The same kind of idea can explain scenes where multiple people are acting, doing, again, slightly weird things. Why does this scene here look like one person is helping another person? You have to understand that person's goal, why they're reaching the way they are, and why one person is doing something to help them. Or in a scene like this, why it doesn't look like helping, but the opposite. By embedding these models in a recursive way, where agents' utility functions can depend on their expectations about other agents' utility functions, we can capture helping and hindering.

And maybe most excitingly from a scientific point of view, the same kinds of models can be tested quantitatively in young babies. So we've done studies of intuitive physics in 12-month-olds and been able to predict, in these simple bouncing-gumball, lottery-ball machines, quantitatively how long infants will look at scenes, according to whether they're more surprising under our probabilistic physics engine. Or in scenes like this: this is intuitive psychology in 10-month-olds. You see, when faced with a costly action, the red guy declines it here, but now, when faced with the same costly action, well, he's going to accept it. And infants also do a cost-benefit trade-off in understanding his actions. They say, well, he must like the yellow one more, because he's willing to do more work for it, which we actually quantify as physical work done, force applied over a path. And it doesn't matter whether the work is jumping over a wall of varying height, or sliding up a ramp of varying slope, or jumping over a gap of varying width. In each of those cases, the infants infer a graded preference, stronger for the one that the agent is willing to work harder for.
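Here is a minimal sketch of that Bayesian inverse-planning computation: assume the agent moves roughly efficiently toward its goal (a softmax-rational action model), observe a partial trajectory, and update a posterior over candidate goals step by step. The scene layout, the trajectory, and the rationality parameter BETA are all illustrative inventions, not the actual model.

```python
import numpy as np

goals = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])  # candidate objects
BETA = 8.0  # how efficiently/rationally we assume the agent acts

def step_loglik(pos, nxt, goal):
    """Log-likelihood of moving pos -> nxt if 'goal' is the target:
    efficient steps are ones that reduce the remaining distance to the goal."""
    progress = np.linalg.norm(goal - pos) - np.linalg.norm(goal - nxt)
    return BETA * progress  # softmax-rational action model

# A hand trajectory that gradually curves toward the second object.
traj = np.array([[0.0, 0.0], [0.3, 0.2], [0.6, 0.5], [0.8, 0.8]])

log_post = np.zeros(len(goals))  # uniform prior over goals
for pos, nxt in zip(traj[:-1], traj[1:]):
    log_post += [step_loglik(pos, nxt, g) for g in goals]
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    print("posterior over goals:", np.round(post, 3))
```

Run it and the posterior sharpens toward the goal the trajectory is heading for, which is the same pattern as the dashed line rising as hands go up in the audience.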
So that's the common sense, basically; that's our attempt to engineer it, and so far there hasn't been any learning. But now we want to ask: where does learning come into the picture? One possibility, which is certainly a legitimate one to pursue, although it's not the one we've been pursuing, and there's a lot of momentum in AI around it, especially among people interested in what they call AGI, is to see how much we can just learn from scratch. One possibility is that that's how it works in the baby's mind, or we might just try it out in AI. So we've seen amazing successes, for example, in learning generative models from raw pixels, as in GANs, including some classic papers of two years ago; and this is a recently submitted ICLR paper. It's amazing what you can do right now in synthesizing pixel images. But these systems, as far as we know, don't have that same kind of understanding of the world behind the pixels. Although it's an open question: they have some understanding, and we're actually trying to figure out what they do understand. But at least that's not the way they're built. At DeepMind, they've also been inspired by the infant development literature; these are two DeepMind papers, one on intuitive physics, one on intuitive psychology. But again, they're trying to build models that learn this stuff from scratch. And in many ways, interesting things are coming from this. But I think what we also see is that it's very hard for systems that try to learn everything from scratch to generalize in fundamental, non-trivial ways to questions and objects and worlds that they've barely experienced before, the way each and every one of you does every day, the way every young child does, the way you're doing even in this talk.

So the kind of approach we've been pursuing is to think about learning in the context of these rich knowledge systems. How can we learn within the game engine to make it better? How can we possibly learn something like the game engine itself? And how can we use all of these components to learn everything else in intelligence, like language, and then all the things that language enables?

So just to give you a brief tour of some recent work here. When it comes to learning in the game engine, what that means is using, in this case, deep learning technology, we're using GANs, but taking some basic insights about how the physical world works: that it's three-dimensional, that it's made out of objects. So for example, Jiajun Wu and colleagues built, a couple of years ago, a 3D GAN which can learn generative models of quite rich object shape. Most recently this has been extended, in collaboration with Jiajun Wu and colleagues, in a NIPS paper to be presented this year called Visual Object Networks, where we take a 3D GAN model of shape in terms of fine-grained voxels, and then we just take the basic insight of graphics, which is that you have shape, you have a viewpoint, you have material which turns into surface texture, and then all of that can be rendered to form a pixel image, like this image of a car. But it's generated in a way that has learned components, where the components follow our basic understanding of how objects and graphics work. So it comes pre-explainable, pre-disentangled. You don't learn that graphics structure, because, and now I'm switching back to science, this structure has existed in the world since forever, and it's quite possible, I'm not saying it's true, but it's quite possible, that our brains and many others might come pre-built with that kind of architecture, but then we learn what the actual things in our world are and what they're made of. So this system can do a much better job of learning generative models at the pixel level, but the key isn't just that our cars and chairs look better than ones from models that only try to capture pixels; it's that we can actually think about them. We can imagine the same object from different viewpoints, or keep the viewpoint and the material the same but change the shape, or change the material while keeping the other factors the same. So I think this is one illustration of how we can combine learning with something like the structure of basic graphics engines.
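Here is a schematic of that factorization, image = render(shape, viewpoint, material), with every stage replaced by a crude hand-written stand-in. In the real Visual Object Networks the shape prior is a learned 3D GAN and the rendering is learned and differentiable; this toy version only shows how the disentangled factors can be swapped independently.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_shape(rng, n=16):
    """Stand-in for the learned 3D shape prior: a random solid voxel box."""
    vox = np.zeros((n, n, n), dtype=bool)
    lo = rng.integers(2, 6, size=3)
    hi = rng.integers(10, 14, size=3)
    vox[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]] = True
    return vox

def render(shape, viewpoint, material):
    """Stand-in renderer: orthographic projection of the voxels along one of
    three axes (the 'viewpoint'), shaded with a flat 'material' value."""
    silhouette = shape.any(axis=viewpoint).astype(float)
    return silhouette * material

shape_a, shape_b = sample_shape(rng), sample_shape(rng)

base        = render(shape_a, viewpoint=0, material=0.8)
other_view  = render(shape_a, viewpoint=2, material=0.8)  # same object, new view
other_shape = render(shape_b, viewpoint=0, material=0.8)  # new shape, same rest
other_skin  = render(shape_a, viewpoint=0, material=0.3)  # same shape, new skin
for name, img in [("base", base), ("view", other_view),
                  ("shape", other_shape), ("material", other_skin)]:
    print(f"{name:8s} image sum = {img.sum():6.1f}")
```

Because generation factors through shape, viewpoint, and material, each of those "what if we change only X" questions is a one-line intervention on the generative program.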
You can take these same kinds of models and put them into a vision pipeline, something that, for those of you who remember Marr's 2.5D sketch, also knows about the value of that representation, let's say the depth to the nearest visible surface, and that becomes a way to actually see rich 3D shape in real images. This is a paper that was just presented at ECCV by basically the same crew; that's Jiajun Wu again there, and a number of colleagues, together with Bill Freeman. Most recently, and here's another NIPS paper to come this year, you can put in a little more information: not just that objects have visible surfaces, but that they have invisible surfaces, like the backs of things, and that there are statistical regularities relating the visible to the invisible surfaces. We put that in with what's called a surface inpainting network, and now we have a system that can actually see 3D shape for novel objects, not just ones it's trained on. In many cases it's not perfect by any means, but it can see 3D shape. So a system trained to do 2D-to-3D for, say, cars, planes, and chairs: now we can give it tables, sofas, benches, and lamps, and it does remarkably well, especially for objects that have some symmetries to them, which is what that surface inpainting network is capturing. Asymmetric objects like this one can break it, but that's plausible also from a biological point of view. Oops, sorry. Here we've got sofas, even people and animal shapes. It's really quite remarkable, I think, what these guys have been able to do. This paper is the work of many, many colleagues, and I'm lucky to even be slightly associated with it.

Okay, but none of that actually learns concepts; that's all just seeing objects. What if we want to actually see one instance of a concept and, like any young child, infer a broader, generalizable class? So for example, how do we see an object like this piece of rock climbing equipment here, this cam that you've never seen before, unless you've seen me talk about this or you're an expert climber, and recognize other instances? We need to do the 3D vision that you've just seen, but we also need to generalize to things that are different in shape, color, and so on. We tackled this a couple of years ago in the PhD thesis of Brenden Lake, who was a student here and is now a professor at NYU. And at that time, we didn't know how to solve the 3D vision problems. I'm not saying we know now, but we've made a lot of progress.
So we worked, like other people in machine learning, think about how much of deep convolutional networks was driven by the MNIST handwritten digits, with handwritten characters, to try to build models of how we can see, again, not just the 3D structure, but the abstract structure that describes the way things within a class vary, how they're different, and how those dimensions of variability differ across classes. And that requires us to bring in, again, the tools of probabilistic programs, but now as the basis for the models that we're learning. We call this work Bayesian program learning, because we're learning a probabilistic program that describes each class. You can see that here: these are essentially motor programs or action programs, very simple visual classes that have parts and subparts, or routines and subroutines. The idea is that those describe the causal process of how characters are generated, and then we can do Bayesian inference to work backwards, to infer the probabilistic program most likely to have generated any one new character. And that allows us to see characters, even in alphabets that you've never seen before, or that our system has never seen before, figure out how they're drawn, and thus how to generalize them.

And we can test this in a simple kind of Turing test. Here, again, if you haven't seen this before, and actually I've seen this 100 times and I still don't remember: in each of these cases, one grid of nine is from human drawers, who were shown the one new character on top and asked to imagine another one, just draw another one. And the other grid is our model. So see if you can guess which is the human and which is the machine. Anybody think they can tell? It switches always? Yes, it switches always. So can you tell which is which? Well here, I'll trust you. I'm going to show you the answer, and you tell me how many you got right. So there's the answer. Did anyone get them all right? No; probably most of you got about three right, if I had to guess. How about here? Can you tell? Yeah? Okay, again, here's the right answer. Basically, we pass this very simple Turing test, because our machine is indistinguishable from humans here. It's obviously just a very small step.
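Here is a toy version of the Bayesian program learning idea: a character is generated by a simple motor program (strokes made of pen moves), and perception works backwards, scoring candidate programs by how well their renderings explain an observed image, with a prior favoring simpler programs. The stroke language, the likelihood, and the tiny candidate set below are drastic simplifications of the real model, which has hierarchical priors over parts and subparts and much richer inference.

```python
import numpy as np

N = 8  # image grid size

def render(program, start=(1, 1)):
    """Execute a motor program: each stroke is a list of (dy, dx) pen moves
    starting from a fixed pen position, inking every visited cell."""
    img = np.zeros((N, N))
    for stroke in program:
        y, x = start
        img[y, x] = 1
        for dy, dx in stroke:
            y, x = y + dy, x + dx
            img[y, x] = 1
    return img

def log_posterior(program, observed, noise=0.1):
    """Bernoulli pixel likelihood plus a simplicity prior over programs."""
    pred = render(program)
    loglik = np.sum(np.log(np.where(pred == observed, 1 - noise, noise)))
    n_moves = sum(len(s) for s in program)
    return loglik - 1.0 * n_moves  # prior favors shorter programs

down3, right3 = [(1, 0)] * 3, [(0, 1)] * 3
candidates = {"vertical bar": [down3],
              "L shape":      [down3 + right3],
              "two strokes":  [down3, right3]}

observed = render(candidates["L shape"])  # a character someone drew
for name, prog in candidates.items():
    print(f"{name:12s} log-posterior: {log_posterior(prog, observed):8.2f}")
```

Working backwards from the image to the drawing program is what lets the model both classify a new character and generate fresh examples of it, by re-running the inferred program with variation.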
We recently made some progress on extending this idea. For example, a recently submitted paper is trying to learn 3D shape programs: to take the same kind of voxel image as you saw before, but learn abstract programs which describe the repeatable structure, like the arms and the legs of a chair. But this is just in its infancy, so stay tuned there.

In the last few minutes, I just want to move towards the more fundamental learning problems. So we've talked about all these ways that, if you assume something like that game engine in your head, you can learn what kinds of things are out there in the world and how to see them. But what about actually learning the mental simulator? Could we build a system that, from some kind of experience, either in an individual child's life or more evolutionarily, experiences the physical world and builds one of these mental physics engines? This is a very hard problem. We sometimes describe it as an instance of what we call the hard problem of learning, because it's a lot harder than learning in neural networks. Neural networks are end-to-end differentiable systems. The reason why they are so appealing technologically and from an engineering standpoint is that there's a smooth optimization landscape, and learning, even in a very high-dimensional space, just comes down to rolling downhill. But if you're talking about search in the space of programs, symbolic programs with rich data structures like those in the game engine, then there's no nice smooth search space. Yet somehow children solve this. So how are we going to do it? Well, we don't know. But there are two ways that people are exploring.

One is to try to turn it into the first kind of problem, to basically turn it into a neural network learning problem. And here I think nothing has fully worked yet, but we've learned something important, which is that you need objects. You need, or at least it seems very valuable and likely something you actually need, the basic symbolic structure of objects and their interactions that is at the heart of any physics engine. If you try to learn physics from raw pixels, as a number of people have tried, it works okay, but you need huge training sets and it doesn't generalize very well. But recently a number of papers, such as the interaction networks from DeepMind or the neural physics engine of Michael Chang (these are Chang's slides, by the way; he was an undergrad here and is now a PhD student at Berkeley), have shown that if you actually build in the notion that there are discrete, individual objects and some kind of graph structure of their interactions, and then you learn the interactions, so it's like you learn the forces, you learn F = ma, but you build in the idea that there are objects and that there's something like pairwise forces guiding the dynamics, then you can build models that scale to different numbers of objects and different environments. It's still all balls bouncing in a box, but it's an important step. And we think it illustrates the value of building into our neural networks certain kinds of basic structure that cognitive scientists have long recognized might be built into the human brain.
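Here is a skeletal sketch of the structure those papers build in: discrete objects, a learned pairwise relation function applied to every ordered pair, per-object aggregation of effects, and a per-object state update. The MLPs below are random and untrained, purely to show the architecture; the key property on display is that the very same network applies to a scene with any number of objects.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(sizes):
    """A tiny feedforward net with random (untrained) weights."""
    Ws = [rng.standard_normal((a, b)) * 0.1 for a, b in zip(sizes, sizes[1:])]
    def f(x):
        for W in Ws[:-1]:
            x = np.tanh(x @ W)
        return x @ Ws[-1]
    return f

D_STATE, D_EFFECT = 4, 8                     # (x, y, vx, vy) per object
relation = mlp([2 * D_STATE, 32, D_EFFECT])  # effect of object j on object i
update   = mlp([D_STATE + D_EFFECT, 32, D_STATE])

def step(states):
    """One simulated step: sum learned pairwise effects, update each object."""
    n = len(states)
    effects = np.zeros((n, D_EFFECT))
    for i in range(n):
        for j in range(n):
            if i != j:  # the relation net sees (receiver, sender) states
                effects[i] += relation(np.concatenate([states[i], states[j]]))
    return np.stack([update(np.concatenate([states[i], effects[i]]))
                     for i in range(n)])

# The same network rolls out scenes with 3 or 10 balls alike.
for n_objects in (3, 10):
    states = rng.standard_normal((n_objects, D_STATE))
    print(n_objects, "objects ->", step(states).shape)
```

Only the relation and update functions are learned; the objecthood and the pairwise graph structure are built in, which is exactly what buys the generalization to different numbers of objects.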
The other approach is to go completely to the other side and say, all right, forget trying to make it into a neural network problem; we're going to actually learn programs. For example, and this is a very simplified, science-fictional model, but I want you to get the idea: you might start off with something like the Unity game engine, and then you learn the game engine of your life. That means that learning is a kind of programming. We have this metaphor we've been playing with for the last year or two, which we call the child as, well, at MIT we call it the child as hacker, because at MIT we know hacking is a good thing. For the rest of the world, we call it the child as coder. But it's the same idea: that in some sense, learning is like coding, and the goal of learning, just like the goal of coding, is making your code more awesome. And that can mean many things: faster, more efficient, more robust, more explainable, more elegant, more reusable. And all the activities that coders do, that we all do when we write code to make our code more awesome, have analogs in children's learning. And if we want real learning algorithms that learn like children, we have to have algorithmic versions of all these activities: not just tuning parameters of existing functions, as in stochastic gradient descent or backprop, but all these various ways of writing and rewriting code.

So in one recent development, we're presenting an early version of this at NIPS. This is work led by Kevin Ellis, and we have other new versions still in the pipeline. We take inspiration from the way a lot of learning goes on in sleep, where you replay experiences from life, you consolidate out abstractions, and you imagine situations that are somewhat like ones you've been in, but different. We have a system that we informally call DreamCoder, which learns to write code; it learns to write programs. And it does it with a combination of hierarchical Bayesian learning and, basically, self-trained neural networks. The same way we use pattern recognition to learn to see the 3D world, we can learn to find patterns in data that suggest the kinds of code we need to describe them.

And then the last paper I'll tell you about, which is also a NIPS paper, is, very happily, a collaboration between MIT and a number of other groups, including Harvard. The first author, Kexin Yi, is a physics student at Harvard, and a key collaborator on this is Chuang Gan, who's a postdoc at IBM and MIT as part of the joint lab. Here we've put a bunch of these pieces together: learning to de-render a scene, inverting a graphics engine to see visual scenes like these blocks-world scenes from the CLEVR dataset, and then, analogously, learning to de-render language, which means learning to parse strings of words into functional programs. We have a system that tackles the popular task of visual question answering, as people call it. But the key here is that all the actual reasoning is done completely symbolically. There's a symbolic description of the scene and a symbolic description of the meaning of a question, and then you just put them together to compute the answer in a very traditional way. What the neural nets do is learn to produce the symbolic parse of the scene and the symbolic parse of the words. And I'm not saying that this is the way it works in the human brain, or the way we ultimately want to build AI. But what we show here is the value of pursuing this kind of idea, and I can tell you more later, if you're interested, about where you go from this. It's the value of combining pattern recognition and symbolic reasoning, which is effectively also a kind of causal inverse here, bringing all these ideas together to really start to connect the different parts of intelligence: seeing, understanding language, thinking.
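Here is a miniature of that pipeline. In the real system, one neural network de-renders the image into the symbolic scene and another parses the question into the functional program; below, both of those outputs are written by hand (a hypothetical scene and a hypothetical parse) so that the purely symbolic execution step is visible on its own.

```python
# Hypothetical output of the scene de-renderer for a CLEVR-style image.
scene = [
    {"shape": "cube",   "color": "red",  "size": "large"},
    {"shape": "sphere", "color": "gray", "size": "small"},
    {"shape": "cube",   "color": "gray", "size": "small"},
]

# Hypothetical parse of the question "How many gray objects are cubes?"
program = [("filter", "color", "gray"),
           ("filter", "shape", "cube"),
           ("count",)]

def execute(program, scene):
    """Run the functional program over the symbolic scene representation.
    All the reasoning happens here, entirely symbolically."""
    objs = scene
    for op, *args in program:
        if op == "filter":              # keep objects matching attr == value
            attr, value = args
            objs = [o for o in objs if o[attr] == value]
        elif op == "count":             # terminal op: how many remain?
            return len(objs)
        elif op == "query":             # terminal op: e.g. ("query", "color")
            (attr,) = args
            return objs[0][attr]
    return objs

print(execute(program, scene))  # -> 1
```

Because the executor is transparent and compositional, a correct answer comes with an inspectable chain of filtering steps, rather than an opaque end-to-end mapping from pixels and words to an answer.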
So just to sum up, then: I've tried to give you an overview of the work that we and many other people have been doing to try to reverse engineer the basic origins of common sense, the things that we start with as infants, and how language might build on that. And then you can start to think, well, once you have language, you can get everything else. I've shown you a little bit about how new tools from probabilistic programs, game engines, and program synthesis, together with deep learning methods, provide a powerful engineering toolkit for this reverse engineering enterprise, and for transporting some of those ideas, hopefully, directly into AI. But I hope you have also seen that we're really just at the beginning of this, especially for the big part of the quest for us, which is: can we actually fulfill AI's oldest dream, to build a machine that learns like a child? I think we don't know, but at least we have a roadmap. In contrast to Turing, who could only presume about the starting state and the learning mechanisms, now we actually have some good ideas, both scientifically and in engineering terms. So if you're interested and excited about this, I hope you'll work on it in some form, or maybe even join us. Thanks.

So I guess we will have some questions. And should I just, how does this work? Do I just read out the question? Yeah, yeah, okay. Sorry, I can't actually read that, so I'm sorry I have to keep looking back at this. "Human intuition as an inference mechanism": that's been said in many fields. Well, I guess, no, that's new. I guess I'll go with this one: is machine, bottom-up, algorithm-driven intuition different? I'm not, well, I like raised hands, honestly. So yeah, intuition: I use the word intuitive, and intuition is one of these loaded words; it probably means different things to different people. It is our goal as cognitive scientists to reverse engineer human intuitions. And what we mean by that are things where the output of the computation pops immediately, or pops readily, to consciousness, but you don't know how you got there. And interestingly, I think some of our intuitions are driven by things like pattern recognition, which might be well captured by neural networks, and others are driven more by the kinds of mental simulations that I've talked about here, or, probably most intriguingly, hybrids of those. So yes, it really is our goal to try to reverse engineer human intuition, and if we want to give intuition to machines, I think that's going to be very much part of what we're going to come to recognize as machines that actually have some kind of common sense.

And that's huge. People mentioned earlier autonomous driving, and how that's a relatively simple problem. It is a relatively simple problem in the grand scheme of things, because each and every one of us, when we put our 16-year-old kids behind the wheel, or when we were once 16-year-olds getting behind the wheel, at least in the United States, had very minimal driver education before being just unleashed on the world. It had better be common sense driving your driving, right? And I think the more we can, for example, put in things like the kind of common sense that every 16-year-old has. I know, we often say some 16-year-olds could have more common sense, but some 46-year-olds could have more common sense too, as my 16-year-old daughter will tell me. But the point is, all of us have a basic understanding of the physics of the world and of what other people are doing, and if we could give that, for example, to our self-driving cars, it would be worth far more than all the data in the world could buy you on its own.

What else do we have here? What is the role of emotion? I'll answer that one because, reliably, in about half of the talks I give, it's the first or second question asked, so I'm glad to see it comes up here. I didn't say anything about emotion, but it's a really good question. People often pit emotion against rationality, or against computation, and I don't want to suggest that we should see those in opposition. In fact, some of the exciting new work that is just starting to appear, and that I am very honored to be a little bit involved in, is work being done by Dae Houlihan in Rebecca Saxe's lab.
Dae is a grad student, and Rebecca Saxe is one of my colleagues who you saw there. Rebecca's work is far broader than the baby fMRI work I told you about; she's really well known for studying theory of mind, both in cognition and in the brain. And she and Dae have been launching an exciting new research program that I'm helping them with a little bit, along with Max Kleiman-Weiner, a former PhD student of mine, which is trying to reverse engineer how we think about emotions. It's not about trying to understand our own emotional experience, and on the AI side, it's not about trying to give machines emotion, although it could be used for that. It's first and foremost about what you might call emotional intelligence: how do we understand each other's emotions? And as I think many of you know, and some of you probably even work on this, there's a lot of work in computer vision and AI on trying to give machines emotional intelligence. But what that usually means is that you look at pictures or movies, or listen to speech, and try to see whether this person is happy or angry: to diagnose either one dimension of valence, positive or negative, or maybe a few, like four or five or six basic emotions: surprised, happy, angry, sad. Okay, that science is grounded, because if you go back many decades, that was how we thought about emotions, famously starting from Darwin. But in the meantime, cognitive neuroscientists and social neuroscientists have learned a lot more about how emotions actually work and how we understand emotions. And it's not just a one- or two-dimensional space, or five categories. There are literally hundreds of words that we use in English to describe emotions, and think about what they all mean; they all mean something important and different. What's the difference between being angry and being upset? There's a little bit of a difference. What's the difference between being disappointed and feeling regret? What makes frustration, or satisfaction? Those go well beyond a few basic, fundamental emotions, and it might not be something that our two-year-olds understand. In fact, it's one of the cognitive developmental achievements in which language and theory of mind, both your own and others', play a key role. The ability to understand emotions at that level is a cognitive achievement, and we're trying to reverse engineer it: what are those emotional concepts? How do children learn them? And ultimately, how do they work in the brain? I think if we can do that, it will be the basis, together with the things I've been talking about, for building machines that can really interact in a human world, machines that we can talk to and trust, and that might be able to trust us. And that's independent of whether they have the same kind of emotional experiences that we do. It's about their ability to understand our emotions, and our ability to understand that they're understanding us. So I'll just say that's a long-term project, but it's something that I hope will be appearing in this setting in the not too distant future.

One more question: how do we scale Bayesian inference to be done on the same time scale as human infants? Okay, great. So, whoever asked that: do you mean the learning, like Bayesian learning of programs, or do you mean perception of things in the moment? The second one? Both, okay.
So for perception of things in the moment: a lot of the things I described that use deep learning for perception, you can think of those as approximate inference algorithms for Bayesian inference in a probabilistic program. And indeed, that's how they're trained. If you've heard the terms amortized inference or inference compilation, that's the idea that a probabilistic program, such as one of these probabilistic graphics engines or physics engines, can be used to provide training data for a neural network, and the neural network is explicitly trained to do approximate posterior inference. So that's probably our best guess right now at how you make really, really fast, reliable, online inferences in perception, again, in settings that have existed for your whole life, as well as over much of evolutionary time.
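Here is a minimal sketch of amortized inference under toy assumptions: a made-up probabilistic program generates (latent, observation) pairs, a small recognition network is trained on those samples, and at run time inference over a new observation is a single fast forward pass rather than a per-example search.

```python
import numpy as np

rng = np.random.default_rng(0)

def generative_program(n):
    """Toy stand-in for a graphics/physics program: latent z renders to x."""
    z = rng.uniform(-1, 1, size=(n, 1))
    x = np.concatenate([np.sin(3 * z), z ** 2], axis=1)
    return z, x + 0.05 * rng.standard_normal((n, 2))

# Recognition network: one hidden layer, trained by plain gradient descent
# on data "dreamed" from the generative program itself.
W1, b1 = rng.standard_normal((2, 32)) * 0.3, np.zeros(32)
W2, b2 = rng.standard_normal((32, 1)) * 0.3, np.zeros(1)
lr = 0.05
for _ in range(3000):
    z, x = generative_program(128)           # fresh simulated training batch
    h = np.tanh(x @ W1 + b1)
    pred = h @ W2 + b2
    err = pred - z                           # squared-error loss gradient
    gW2, gb2 = h.T @ err / 128, err.mean(0)
    gh = (err @ W2.T) * (1 - h ** 2)
    gW1, gb1 = x.T @ gh / 128, gh.mean(0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

z_true, x_obs = generative_program(5)
z_hat = np.tanh(x_obs @ W1 + b1) @ W2 + b2   # fast amortized inference
print(np.round(np.c_[z_true, z_hat], 2))     # true latent vs. inferred latent
```

The expensive model-based work happens offline, during training; at test time the network gives an immediate approximate posterior guess, which is the proposed analog of fast perceptual inference.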
But then you might say, well, what about the learning processes? Those unfold over a much longer time. Okay, I don't think that learning is primarily driven by what we now call, say, gradient descent. But you might ask, again, how does it work? Well, that's what I was trying to get at. It's possible that it's something like those neural physics engines that I talked about, although those take a huge amount of data and still have not yet generalized in fundamental ways. But that's where the learning-as-programming idea comes in. Programming is not fast, and it's not easy, and it's not fun, except when it's done. But it does work. If you have a well-defined goal, you can hire a good software engineering team, or even one good software engineer, and with some reasonable expectation, it will be successful. If it's not, it's probably because you didn't have a well-defined task, or you hired the wrong person, or it turned out to be harder than you thought, whatever. But in all these ways, it is a lot like children's learning: it's not fast, and it's not fun, except at the end, or when you make a little breakthrough.

So all those ideas I was trying to give are really pointing to the future. We've made a little progress on this, and it's not just us. There's a really exciting community right now, including folks at IBM and IBM partners, but also within Microsoft, within Google, at Stanford, at Berkeley, at NYU, leading academic and industry labs, interested in this idea of what's sometimes called ML meets PL: machine learning meets programming languages, and the idea that we could have machine learning algorithms which learn to write code, or to rewrite code, or to write whole new libraries of code, like the system I was talking about there. That's a super exciting development. And again, it may not look easy and fast in the way that we've come to expect, say, training a neural network to be. But remember, neural networks weren't always easy and fast either.

And I think, to me, it's a good bet that I'm willing to make that, if a bunch of us work hard on these things, and software and hardware engineering both keep up with us and go way beyond us, and we try to keep up with them, then within our lifetimes, within my lifetime, we'll come to see, say, program learning as being as reliable and as scalable a tool as we now see gradient descent in neural networks. It may not be as fast for some things, but then again, the kinds of learning that it's trying to capture also aren't as fast. Okay, I think that's more than enough. Thanks very much.