Hello, and welcome to Three Questions, a new series in which we feature new scientific publications and ask the authors, one, what they've contributed to the community, two, why it's important, and three, what's next on the horizon for discovery. I'm your host, Mark Webber, and today I'm delighted to welcome Tianmin Shu and Abhishek to represent their team in discussing their new publication at the 2021 International Conference on Machine Learning. Tianmin is a postdoc at MIT, advised by Josh Tenenbaum and Antonio Torralba. Abhishek is a research software engineer at IBM Research, and both are part of the MIT-IBM Watson AI Lab, which has funded this work as part of an effort focused on developing what we call machine common sense. Welcome, gentlemen, and thanks so much for joining us today. We have your paper, AGENT: A Benchmark for Core Psychological Reasoning, and we'd love to ask you first: what have you contributed in this publication?

Right, first of all, thanks for having us here. In this work, we are interested in building a benchmark to evaluate how much, and in what ways, machines can understand core concepts in intuitive psychology. To give you a little background, intuitive psychology is the ability to reason about the mental states of other agents by watching their actions. Mental states include agents' goals, their beliefs, and their desires. This ability comes naturally to people: even preverbal infants understand many core concepts of intuitive psychology. For example, babies understand that an agent has a goal, and that to reach that goal it should take the most efficient path allowed by the physical constraints of the scene; in that way, the agent maximizes its reward and minimizes its cost. The question is that we don't actually know whether the machine agents we build today, such as robots, understand these core concepts, and there is currently no way to evaluate that. We want to fill that gap by presenting this benchmark to the machine learning community.

Specifically, the trials we designed to test machine agents' core psychological reasoning are very similar to the experiments used to test whether babies understand such concepts. There have been many studies in developmental psychology where researchers design experiments specifically to probe whether and when babies understand core concepts such as the goals, beliefs, and desires of an agent just by watching its behavior. We adapted and extended those trial designs, but now present them to machines, and test whether machines can understand these concepts in the same way babies do. We also have a hypothesis that core social understanding builds on core physical understanding: for a machine learning model to succeed on our benchmark, it also needs a basic concept of how the physical world works.
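To make the efficiency principle concrete: a rational agent should reach its goal along the least-cost path that the scene's physics allows, so a costlier-than-necessary trajectory is the surprising one. Below is a minimal Python sketch of that check; the waypoints, the distance-based cost, and the function names are illustrative assumptions, not code from the AGENT benchmark.

```python
# Illustrative sketch of the efficiency principle: an agent is expected to
# take the least-cost feasible path to its goal, given physical constraints.
# All names and numbers here are made up for illustration.

def path_cost(path):
    """Cost of a path: total Euclidean distance between successive 2D waypoints."""
    return sum(
        ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
        for (x1, y1), (x2, y2) in zip(path, path[1:])
    )

def violates_efficiency(observed_path, feasible_paths, tol=1e-6):
    """True if some feasible path to the goal is strictly cheaper than
    the trajectory the agent actually took."""
    cheapest = min(path_cost(p) for p in feasible_paths)
    return path_cost(observed_path) > cheapest + tol

# With no barrier in the way, jumping is wasteful: the straight path is
# cheaper, so the jumping trajectory should register as surprising.
straight = [(0, 0), (5, 0)]
jumping = [(0, 0), (2, 0), (2.5, 2), (3, 0), (5, 0)]
print(violates_efficiency(jumping, [straight, jumping]))  # True
```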
And just as you're showing here, these are very nice videos about how babies understand these kinds of situations.

Exactly.

I want to pull up the dataset your team has created. Abhishek, perhaps you could walk us through some of it.

Sure. In the goal preference task, we test which object the agent prefers. You have two objects, one on either side, and we show the model a video in which the agent prefers one object over the other. This forms the familiarization video, the expected behavior. Then in testing, we show two videos: one where the agent goes toward the object that was not preferred before, and one where it goes toward the object it did prefer. One forms the expected behavior and the other the surprising behavior. These are the kinds of videos used in the baby studies to see if babies understand an agent's preferences, and that's exactly what we are doing here: showing a machine learning model these videos to see whether it, too, understands the agent's preference.

Right. And there are four different scenarios in the dataset: goal preferences, action efficiency, unobserved constraints, and cost-reward trade-offs. Tianmin mentioned one where an agent might be jumping over something. First you teach through familiarization, and then it's surprising if the agent jumps when there is nothing to jump over, right?

Yes.

And those are the common-sense things we take for granted as humans, but that babies have to learn in their first 18 months. And so do AI models, right?

Yes, exactly.
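The test design Abhishek describes is the violation-of-expectation paradigm from infant studies. As a rough sketch of how such paired trials could be scored, assuming a model that returns a scalar surprise rating per video (the `rate_surprise` callable and the `Video` type are hypothetical stand-ins, not the paper's actual evaluation API):

```python
# Hedged sketch of violation-of-expectation scoring: each test trial pairs an
# expected video with a surprising one, and a model passes the trial if it
# rates the surprising video as more surprising than the expected one.

from typing import Callable, List, Tuple

Video = List[dict]  # e.g. a sequence of per-frame scene states (assumed format)

def voe_accuracy(
    trials: List[Tuple[Video, Video]],        # (expected, surprising) pairs
    rate_surprise: Callable[[Video], float],  # model under test; higher = more surprising
) -> float:
    """Fraction of trials where the surprising video out-scores the expected one."""
    correct = sum(
        rate_surprise(surprising) > rate_surprise(expected)
        for expected, surprising in trials
    )
    return correct / len(trials)
```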
One of the things that excites me is that this paper represents a line of work focused on those first 18 months of human developmental psychology, and on the various stages of maturation and intelligence that take place over them. So I'm hopeful this is not the last paper we'll discuss; I expect we'll get to talk about a number of papers with you and the colleagues working on this. Let's turn to the second question. I want to bring this into the real world. Intellectually this is super interesting, but on a short-term and a long-term horizon, what do you see as the potential real-world impact of this kind of work, and how does that energize you to continue?

Right, so essentially this relates to my personal goal as a researcher. Ultimately, I want to build robots that can understand humans and then help them in different situations. For robots to develop that kind of powerful social intelligence, they have to have a way to infer the mental states of humans. We are doing something like that in this benchmark, but what the models are watching are very simple animations, far from the real human activity we see every day. However, the idea is that the principles we use to reason about agents in these simple animations are the same principles we use to infer the mental states of real humans in real life. Real humans also have goals. Their goals are not as simplified, but they have goals nonetheless, and they want to reach them in the most efficient way. Those principles are basically the same. The question then is: if we build models with structured knowledge about agents in these simplified environments, how do we scale that up to the real world? I think that's still an open question. We have some ideas, but they are at a very early stage. Still, solving problems in simple worlds may give us insight into how to solve problems in the real world.

Very good. And thinking about those latter two scenarios, unobserved constraints and cost-reward trade-offs: those are so fundamental to even basic human intelligence and human common sense, and they become quite complex when you move from a very simple setting into more advanced ones. So I see that as really relevant. What about you, Abhishek? What aspect of the real-world impact excites you, and where do you see this going over the next couple of years?

As an engineer, I'm really interested in transparent systems. The way I look at this project, and the way it fits into the broader machine common sense effort, is that systems like this would form the low-level intelligence, the intelligence that comes to basic conclusions, on top of which more complex intelligence can be built. A system that can tell you exactly how it reached a conclusion is much more attractive than a black box whose reasoning you simply cannot inspect. So for me, a system that understands basic goals and desires could be used to build a much more complex socially interactive machine, one that not only has real-world applications but is much more reliable than today's systems.

And do you think the real-world applications are limited to robotics, or what other kinds of applications are relevant here?

I think any kind of interaction. It doesn't have to be conversational or robotic: it could be part of a fleet of self-driving cars, for example. When you drive a car, there is a kind of etiquette you have to maintain. Or it could be a chatbot handling customer service, or a robot working in a logistics warehouse. For any kind of machine interaction, this kind of system would help make it a much more social interaction.

And when we think about the ability to reason about costs and trade-offs, there's nothing stopping us from exploring the area of ethical AI: understanding the different cost functions involved in human ethics and making some of those trade-offs from a machine common sense standpoint. Is that right as well?

I think there's definitely a great opportunity, whether building on current studies or working as machine learning researchers, to think about how we can capture the kinds of complex costs and rewards that people weigh in our society, and how we can build systems that adapt to those rewards and costs.

Exactly, yeah. Very good. And to close us out: what's next on the horizon beyond this paper? What do you hope comes out of it, and what are the unsolved questions we still need to answer?

One thing we're hoping for is to push the AI and ML communities toward building more general and explainable systems. Our dataset should encourage people to build models that don't just learn to perform a task, but instead capture the underlying principles that govern the whole process of coming to a conclusion. We hope that our dataset, in conjunction with other datasets, can be used to build models that better understand the principles of the whole interaction.

This is basically a first step toward a benchmark for machine social intelligence, but there are other aspects of social intelligence that we are not capturing here. For example, we also have the ability to reason about other people's perceptual access, for example what other people can and cannot see, and from that to infer their beliefs, what they think about the world. And there are aspects that go beyond inferring humans' mental states. Suppose I do have an understanding of your mental states. Now what? Now I can try to help you. But how do I develop a good plan to help you? That's another aspect of social intelligence we also want to build.

The connection between the inference side, about my mental states, my goal preferences, my trade-offs, and the planning side: the intelligence of how I organize my actions to support your actions and your needs.

Exactly. Imagine you have a robot helping you at home. The robot may sometimes have no clue what you actually want to do, but it can still help you. Maybe the best way to help in that situation is just to reorganize your home and clean up all the messes you left. But to do that, the robot actually needs to infer what you want, and it also needs a sense of the uncertainty of that inference. It's a very complex problem, and it's definitely something we also want to pursue, either as a benchmark or as better models.
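The uncertainty Tianmin mentions is often handled with Bayesian goal inference: the observer keeps an explicit posterior over what the person might want, under the assumption that people act roughly efficiently toward their goals. A toy sketch of that idea follows; the goal names, the costs, and the simple cost-based likelihood are all made-up assumptions, not a method from the paper.

```python
# Toy Bayesian goal inference: maintain a posterior over candidate goals,
# assuming observed behavior is likelier when it makes efficient progress
# (lower remaining cost) toward a goal. Everything here is illustrative.

import math

def posterior_over_goals(remaining_costs, prior, beta=1.0):
    """P(goal | behavior) is proportional to exp(-beta * remaining_cost) * P(goal).
    beta controls how strictly we assume the person acts efficiently."""
    unnorm = {g: prior[g] * math.exp(-beta * remaining_costs[g]) for g in prior}
    z = sum(unnorm.values())
    return {g: v / z for g, v in unnorm.items()}

# After watching someone step toward the kitchen, "make tea" becomes most
# probable, but the alternatives keep nonzero probability: the robot can act
# while knowing how unsure it is.
costs = {"make tea": 1.0, "tidy desk": 3.0, "leave house": 4.0}
prior = {g: 1 / 3 for g in costs}
print(posterior_over_goals(costs, prior))
```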
Well, I know that across the MIT-IBM Watson AI Lab and the campuses of MIT and Harvard there are a lot of great AI planning experts, including ones we have funded, and hopefully there will be opportunities for those two fields to come together in an integrated approach once the time comes. I'm sure it's already started.

Yeah, definitely.

Well, this has been such a pleasure and an honor to speak with you, gentlemen. I know you represent a larger author team who have been working together, so I want to thank them and all your collaborators, as well as DARPA, which is supporting this line of work. We look forward to the next phases. I'm very excited about how you're looking to combine physical common sense and social common sense, with an eye on the horizon toward how these connect to planning, so that we can develop AIs that are truly supportive and helpful to human beings in all kinds of real-world scenarios. So thank you again, and until next time: be well, and good luck with all of your research.

Yeah, thanks.

Thank you. Bye, everyone.

Bye.