Thank you, Yoshua. Just to introduce myself, I'm David Cox. I'm the director of the MIT-IBM Watson AI Lab, and we're thrilled to have you here. We're going to have a brief fireside chat; apologies that there's no fire here. I was actually told that technologically it was possible, but we thought that would be tacky. So Yoshua, before we dive into the technical discussion on the fascinating talk: it's been a while since I've seen you, and since then you won the Turing Award, and I just wanted to take this opportunity to congratulate you publicly. It's a tremendously well-deserved honor.

Also, just as a bit of a plug for some of the other activities happening at AI Research Week, we're running a workshop on causality and transfer learning, which is consonant with many of these ideas; we'll share some of our work, but also work from the community in the Boston area. We also have a workshop on neurosymbolic reasoning and machine common sense, which I think touches on a lot of the same themes. But I just wanted to unpack two things, and one of them starts with that word, the S word. You didn't say the word symbols, and yet in many ways symbols embody a lot of the ideas you have: they're compact, low dimensional. What are your thoughts?

So in deep learning, we don't do symbols; we do distributed representations of concepts. What I personally want to take out of the classical AI work with reasoning and logic and so on is actually not the symbols. I mean, there are discrete concepts for sure, but it's much more profitable from a generalization perspective to think of these discrete concepts as having attributes that make cats and dogs close to each other, so you can generalize across them. But that's an old idea from Geoff Hinton, right, from the early 80s. The new thing is the attention mechanisms I've been talking about. What attention mechanisms can bring to deep learning, and this is also inspired by classical AI, is the ability to manipulate variables: the ability to do computation on objects, and not always the same objects. Just like in programming we have functions, and we can call those functions with different arguments depending on the context. Whereas in a normal neural net it's always the same neurons sending their outputs to the same other neurons. But when you put in an attention mechanism, it's like you're saying, oh, I can choose which neurons are going to send their outputs to this guy, right? And now you have to start thinking in ways that look more like classical programming or classical AI, where in addition to the values being sent, you have to tell the guy who receives: who is the guy that's sending? Like, what is my name, right? And the name now is not going to be a symbol, it's going to be a vector, which is like a key; in transformer networks and attention mechanisms you have keys and queries. So the key becomes like a type or a name that allows the recipient to know what this variable is that it's getting, not just its value but who it is, right? What is that variable? And then of course with recurrence you can recurse and do these things.
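A minimal sketch of that key/query idea (illustrative only; the function name, dimensions, and toy data here are made up, not from the talk): each sender exposes a key vector, acting like its "name" or type, plus a value vector, and the recipient emits a query that softly selects which senders to read from.

```python
import numpy as np

def soft_attention(query, keys, values):
    """Scaled dot-product attention over a set of sender elements.

    query:  (d,)     the recipient's request ("which variable do I need?")
    keys:   (n, d)   one key per sender, acting like a name or type
    values: (n, d_v) the content each sender would transmit
    """
    scores = keys @ query / np.sqrt(query.shape[0])  # match the query against the "names"
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                         # softmax: a soft, differentiable choice
    return weights @ values                          # a convex mix of the selected values

# Toy usage: 3 senders, 4-dim keys/queries, 2-dim values.
rng = np.random.default_rng(0)
keys = rng.normal(size=(3, 4))
values = rng.normal(size=(3, 2))
query = keys[1] + 0.1 * rng.normal(size=4)           # the recipient "asks for" sender 1
print(soft_attention(query, keys, values))           # should lean toward values[1]
```

Because the selection is a softmax rather than a hard lookup, the "name" stays a vector and the whole choice remains differentiable, which is the point Bengio is making about keeping things soft and distributed.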
So I think a lot of the concepts that have been developed in classical AI, including things like chains of reasoning and so on, can be transported to the neural net world, but in that way: transforming the way we think about neural nets, from vector-processing machines to systems that can handle variables, recursion, and all these things.

Yeah, so we've been investing heavily in what we, and other people, are calling neurosymbolic systems, and we're actually agnostic about what we mean by that; it's always good to have a little bit of vagueness if you're not sure yet what you want to brand something as. There's a version of it where we say, okay, how can we teach neural networks to process things that are symbol-like, like what you're describing, and there's another view which says, well, we have symbolic systems, and we can use neural networks to get from the world to those symbolic systems. That's not going to work. You don't think that's going to work. So David Cecillo has an interesting quote, which is, you know, human reasoning and logic is crappy Turing machines running on amazing neural networks, and neural networks today are crappy neural networks running on amazing Turing machines. So, you know, maybe we actually combine the two; and again, we're agnostic on this, we're doing a little bit of both. Yeah, nobody knows. I mean, we're exploring. Exactly. So what's your take?

My feeling is that just sticking classical AI on top of neural nets is not sufficient. For one, you want to keep these soft, probabilistic, distributed attributes and not suddenly move to the pure discrete logic thing, at least if the goal is to mimic the kind of reasoning that humans do, right? You can do amazing things with classical programming and optimization and search, but if your goal is to emulate the sort of reasoning and high-level cognition that humans do, I think you have to build the system 1 part, the deep learning part, inside of the reasoning, right? It can't be totally separated. For example, the control mechanism which decides where to search, because when you do reasoning, it's basically search, right? And right now our reasoning systems are super expensive in terms of search; they look at zillions of things. Monte Carlo tree search is probably the best integration we have right now, and human reasoning is very, very different: we just look at two or three things. Well, it sounds like magic, right? And it's because we have a controller that decides what to focus our attention on, our thoughts on, and it's very, very efficient, but it can only do that because it's grounded in all of that intuitive background context that system 1 provides.

Sure, and increasingly there are formal reasoning systems that rely on a neural network to do premise selection, for instance, to reduce search spaces, and things like that, so there's lots of confluence of different ideas. Probabilistic programming is another sort of intermediate area, where you actually have programs you invoke with arguments, and as you imagine increasingly sophisticated systems, probabilistic programs give you the ability to represent things that are not fixed and discrete and don't have that brittleness. How do you feel about that?

I think it's an interesting direction, but again, there are many ways you could do it, and I would favor the ways which don't lose these distributed representations, right? If you do it purely in the symbolic realm, even probabilistic, it's not sufficient, I think.
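A rough illustration (my own sketch, not anything from the talk) of that point about reasoning as search guided by a learned controller: a best-first search where a scoring function, standing in for system 1 intuition, keeps the frontier narrow by only ever expanding the few most promising candidates instead of zillions of them.

```python
import heapq

def guided_search(start, expand, is_goal, score, k=3, max_steps=1000):
    """Best-first search that only considers the k highest-scoring successors per step.

    expand(state) -> iterable of successor states
    is_goal(state) -> bool
    score(state)   -> float, higher = more promising (the learned "controller")
    """
    frontier = [(-score(start), start)]
    for _ in range(max_steps):
        if not frontier:
            return None
        _, state = heapq.heappop(frontier)
        if is_goal(state):
            return state
        # Keep only the few most promising successors ("look at two or three things").
        children = sorted(expand(state), key=score, reverse=True)[:k]
        for child in children:
            heapq.heappush(frontier, (-score(child), child))
    return None

# Toy usage: reach 37 from 1 using +1 and *2, guided by distance to the target.
goal = 37
result = guided_search(
    start=1,
    expand=lambda s: [s + 1, s * 2],
    is_goal=lambda s: s == goal,
    score=lambda s: -abs(goal - s),
)
print(result)  # 37
```

In a full system the scorer would itself be learned (as in the policy/value networks used with Monte Carlo tree search), which is what makes the search cheap: the intuitive part prunes almost everything before explicit reasoning ever looks at it.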
So just to dig into another area, which you invoked a lot: biology. In your talk you laid out this idea that the original motivation of deep learning came from biology, but it's a fairly loose inspiration. I know this is an area you work in; we've talked about it before. How important do you think biology is? How does that flow? I mean, the work you do on biologically plausible backpropagation, is that meant as a service to neuroscience, as scientific discovery, or do you think there's real gold there in inspiring new approaches for us?

So in a good fraction of my talks, the first slide is about this hypothesis that there might be a few simple principles which explain biological intelligence and that we could use to build intelligent machines. If you believe this hypothesis, which is a hypothesis of course, then the kind of work that tries to build things that are compatible both with our scientific understanding of what works in AI and machine learning and with what seems to happen in biology should bear fruit on both sides. And I would say these days I'm more interested in talking to people who work on child development or high-level cognition, because I'm thinking about these things about causality and reasoning and so on. But I think in general we should look at potentially all sources of information about human intelligence, including even the social aspects, right? People are starting to look at that, in order to serve as inspiration for our machine learning approaches.

Yeah, and the connection to cognitive and developmental psychologists is interesting. We're now participating together with MIT in a DARPA program on machine common sense, and a mandate of the program is that we include developmental psychologists as members of our team, and we're actually running violation-of-expectation tests on the agent-based systems that we've been asked to build.

So one of the areas of neuroscience where I'd like to see more work done, work that would really help the program I've been talking about, is memory and attention, because the kind of memory that we've put in neural nets these days, I mean the extended-memory neural nets like Neural Turing Machines and all this, is not the kind of memory that you find in human brains. It doesn't look very plausible. It looks like a lot of our memory, and the way that we select with attention, is more a process coming from the dynamics of the circuit, which leads to a concentration of activity in a few selected places, rather than copying information into some memory bank and then retrieving it.

Another area of intersection with biology, of course, is that you mentioned sample efficiency, and natural selection has a sample efficiency problem insofar as your samples are babies: you only have a few of them, they're expensive to make, and choosing the wrong sample means they die. So do you think there are tie-ins there? But evolution doesn't care if we die, because you have billions of humans, right? True, true, true. But we care. We care, we care. And the process as a whole also needs to be efficient, and so in many ways evolution is another layer of learning on top. So actually the work on meta-learning started in the early 90s, and with my brother Samy we worked on this; it was actually his thesis topic.
We worked on inner learning and outer learning, where the outer learning was meant to be like evolution, but because we didn't have supercomputers we said, oh, let's backprop into the outer learning (see the sketch after this exchange). So that was the initial inspiration. Now people are thinking about meta-learning potentially as something that may happen within a lifetime: within one brain you could have learning-to-learn going on, because we are exposed to changes in distribution, as I was talking about in my presentation.

Yeah, and changes in distribution: arguably natural selection is an example of adapting to non-stationary distributions by adapting the architecture. Of course there are connections to neural architecture search and methods like that. Except that traditionally people have been using meta-learning and architecture search to optimize for i.i.d. generalization, but there's nothing that says you can't use the same mechanism to optimize for transfer learning, for what happens when I use my learning in a new environment.

Very good. I just wanted to end with a bit of a sociology question. So the first time I heard you speak was at a workshop at Snowbird, and I think there were... That's the pre-ICLR conference. Yeah, yeah, and I think there were probably a couple hundred people there, at most, and... Good old times. The good old times. And now we live in a world where NeurIPS is doing a lottery this year; last year it sold out in 12 minutes. I was in a cab coming from the Sao Paulo airport desperately trying to register during those 12 minutes, and I didn't quite make it. I eventually got a ticket, thankfully. But how has that changed things? I mean, as a field explodes, the mean time somebody has spent in the field decreases dramatically, and you get all these people who are new to the field and whose history starts post-2000. Right, right. How has that changed things? Is that all positive?

Yeah, so it has a lot of positives. I mean, progress has accelerated for sure because of the sheer number of people doing research in this field. At the same time, of course, they don't read the old papers from the 90s, and that's not just deep learning or neural net papers; they don't read other papers either, and they don't even know what an expert system is, or search; they just never learned it in school. It's just very strange. And there are some negative effects in the culture too: it feels like everyone is in this crazy race for the next first-author paper, instead of thinking about the longer-term picture. When things were slow and there were just a few hundred people working on this, you could dream of problems that might take years to solve and quietly hack away at them, but now grad students are afraid of someone else in another lab doing the same thing, and so they're rushing like crazy. So I don't know, there is something that needs to be fixed. I don't know if we can fix it, but the culture is too focused on short-term incremental progress rather than attacking really, really hard problems that might take a decade to address.

So how do we give the next generation that space? How do we do that? So I think we should move from conference-publication-centered mechanisms to more journal-like things where there is no deadline; you submit your work when you feel it's ready, and that gives less of the pressure to submit, submit, submit because there's a deadline and I have to submit something, otherwise I'm not a good human being. So we need to reassign self-worth. Yes, yes. That's what you're saying.
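On that inner/outer idea mentioned above: a minimal first-order meta-learning sketch (purely illustrative, not the 1990s setup being described; the toy task, learning rates, and the Reptile-style outer update are my own assumptions). The inner loop adapts a parameter to one environment, and the outer loop updates the shared initialization so that adaptation to new, shifted environments improves, i.e., optimizing for transfer rather than for i.i.d. generalization.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_task():
    """Each task is 1-D regression y = a*x with its own slope a (a distribution shift)."""
    a = rng.uniform(-2.0, 2.0)
    x = rng.normal(size=20)
    return x, a * x

def inner_adapt(w, x, y, lr=0.05, steps=5):
    """Plain SGD on squared error for one task (the inner, fast learner)."""
    for _ in range(steps):
        grad = 2 * np.mean((w * x - y) * x)
        w = w - lr * grad
    return w

# Outer loop: nudge the meta-initialization toward each task's adapted weight
# (a first-order stand-in for "backpropping into the outer learning").
w_meta, meta_lr = 0.0, 0.1
for _ in range(200):
    x, y = sample_task()
    w_task = inner_adapt(w_meta, x, y)
    w_meta += meta_lr * (w_task - w_meta)

print("meta-initialization:", w_meta)  # hovers near 0, the best single start across symmetric slopes
```

The same two-level structure carries over to architecture search: nothing forces the outer objective to be i.i.d. test accuracy; it can just as well be performance after adaptation in a new environment.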
And also, I think it's important for community service. It used to be that doing things for the community was highly valued, and now, because there's so much pressure to publish, we're losing some of that, and it's very, very dangerous.

Okay, and one last question. I've seen you give talks before where you end with a few slides about social good. That's right. And the importance of... And that's exactly connected to this. If we want people to start understanding the impact of their work on the broader society, they need to step back a little bit from the next deadline and start talking to philosophers, start talking to legal experts, start talking to medical people, and learn about the ethics of what they're doing. I think it's very, very important, because the progress in AI is leading to very, very powerful tools, and these tools could be misused, and they will be misused, and so we really need to pay attention to this.

Okay, and with two minutes and forty seconds on the clock: we have many students in the audience today. What's your advice to them? Don't fall prey to this sort of pressure to publish short-term; work on hard problems; be ambitious, and ambitious doesn't mean more papers, it means hard problems. Don't go for what's popular. Don't do the things that I say you should do, but think for yourself. Fantastic. Well, on that note, thank you guys so much.