Hello, my name is David Cox, and I'm the IBM director of the MIT-IBM Watson AI Lab, a unique industry-academic collaboration between MIT and IBM. It was founded in 2017, when IBM announced it would invest close to a quarter of a billion dollars over ten years to fund a collaborative lab, and I'm pleased to be here today to tell you about some of the work we're doing together, particularly in the space of neuro-symbolic AI. Now, I just said "AI," and I want to stop for a moment and comment on the term "artificial intelligence." I work at IBM Research now, but before that I was a professor at Harvard for a number of years, and I think in 2018 and before we all had the experience of being quite uncomfortable with this term. We would try to say "machine learning" or "computer vision," or be more specific and say things like "deep learning." But for whatever reason, from 2018 on, I feel like we've all given up and we're all calling it AI: IBM calls it AI, academics call it AI. When I came to IBM Research two years ago, I discovered a framework that I really like, and I want to share it with you briefly, because I think it frames the discussion we're about to have in a good way. It simply qualifies what type of AI we're talking about when we use the term. It distinguishes what we have today, narrow AI, which is not to say it isn't powerful or disruptive, but it is limited in important ways, from general AI, the sort of strong AI, systems that can think for themselves. And in between is what I think is the real opportunity: what we're calling broad AI. So let's drill down: why is narrow AI narrow?
Well, it's typically single-task, single-domain: one thing. You can achieve some pretty amazing things with it, superhuman accuracy and superhuman speed in certain cases, but it does basically that one thing, and only that one thing, well. General AI, meanwhile, means cross-domain learning and reasoning, systems that are broadly autonomous and decide for themselves what they want to do. That's the kind of thing Elon Musk calls "summoning the demon," or that Stephen Hawking warned could end mankind. These are interesting philosophical debates, but I think everyone in this audience here at ICLR would agree that we're not close to this. In between, though, is where the real stakes are, and that's really what the lab I run is about and what we're targeting. Broad AI is multi-task and multi-domain, able to take knowledge from one place and apply it in another. It's multi-modal, able to take data from lots of different sources, images, video, audio, structured data, unstructured data, text, you name it, and bring it together. Systems need to be distributed, able to run in the cloud but also, increasingly, at the edge. And increasingly we need systems for the enterprise, which is what IBM serves, where the systems are explainable, because people aren't going to incorporate AI into their workflows unless they can understand what those systems are doing.
So this is basically a roadmap for what we at the MIT-IBM lab think we need to do, the kinds of AI we need to push and advance to make AI broadly applicable to all the hard problems we'd like to solve. That includes a number of facets. One is explainability: we can't have systems that are strictly black boxes; people have to understand why systems make the decisions they make, and how to fix those systems when they make mistakes. Systems have to be secure; there's a whole new world of AI security emerging, which is both concerning and also revealing interesting things about the underlying science of deep learning that we didn't understand before. Systems have to be ethical and fair; that's pro-social good, but in many industries it's also a regulatory consideration: banks can be fined huge sums of money if they fail to demonstrably show fairness. The next piece is really about learning from small data. As much as people talk about big data, most of the problems we see across all of our customers at IBM are in many cases small-data problems. And even when enterprises have huge amounts of data, if you don't have huge amounts of labels, you're obviously back in this problematic small-data regime. We think transfer learning is very important there, but increasingly so is reasoning: being able to more flexibly extract structure from images and then logically and flexibly reason over it. That's really the thing I'm going to talk about and focus on today. So: narrow, broad, general. What's narrow about today's AI toolbox?
Okay, so first of all, I think we all saw this coming: by the year 2015, Forbes had published an article saying that deep learning and machine intelligence would eat the world, and I think it's fair to say it really has in many ways. This is an example of a piece of work from Andrej Karpathy that really made me stop and say, okay, this is incredible what neural networks are able to achieve. It's a classic image-captioning system: you take an image, and the system produces a beautiful natural-language caption, like "man in black shirt is playing guitar," or, on the right, "construction worker in orange safety vest is working on road." Meanwhile, there aren't many games left at which humans are better than machines, everything from Jeopardy!, which IBM did almost a decade ago, to really amazing feats like AlphaGo from DeepMind beating the world Go champion; a group from Carnegie Mellon beat the world champions in poker, and IBM even built a system called Project Debater that can carry on a debate. So we even have systems that can argue with us, if you're into that sort of thing. And even domains like art, as we all know, are increasingly being invaded, in some ways, by AI: everything from early demonstrations of style transfer, like the paper from Matthias Bethge's lab on the left, where you can take a photograph and re-render it in the style of any artist you like, all the way up to, more recently, BigGAN, which can produce these beautiful, photorealistic images basically out of thin air. It's hard not to have the feeling that deep learning is taking over everything. But there are some concerns. So: what is this image?
Now, if you ask a state-of-the-art, ImageNet-trained convolutional neural network, you'll get an answer like "teddy bear." And I would submit that this is pretty much the least teddy-bear image possible; this is very much not a teddy bear. In fact, it's a piece of subversive modern art, Meret Oppenheim's "Luncheon in Fur," and it's in the Museum of Modern Art in New York. If we look at it and think for a moment, we can see where things are going wrong, and practitioners of computer vision already know this, so it's perhaps not a surprise. But the public often has the idea that computer vision is a completely solved problem, when in reality, when you have corner-case images that fall outside the distribution of support you have for your classifier, you very frequently get answers like "teddy bear," because things that were furry and roughly round in the training set tended to be teddy bears, and it was very unlikely that you'd have a fur-covered saucer, cup, and spoon in your training set. And it's actually worse than that. Take state-of-the-art object-detection systems, and again, this is no secret to people who work in computer vision; this is a paper from Alan Yuille's lab in 2018.
They showed that even just putting a few objects out of context can cause state-of-the-art systems to fail badly. If we put this guitar in front of a monkey, even though we have no trouble as humans telling that it's a guitar and a monkey, the neural network now decides the guitar is a bird, perhaps not surprising, because the kinds of colorful things you see in jungles tend to be tropical birds. Interestingly, it also takes the monkey and decides it's a person, because there probably weren't very many guitar-playing monkeys in the training set. So: some pretty serious weaknesses. Even the captioning system we saw a minute ago, the thing that blew my mind and made me think, okay, deep learning really is an incredible force to be reckoned with: when we show it slightly corner-case images, and this is some work from Brenden Lake and colleagues, it says this one is "a man riding a motorcycle on a beach," this one is "an airplane is parked on the tarmac at an airport," and this one is "a group of people standing on top of a beach." That last one is true, so score one for the deep learning system. But it gives you a sense that even though these systems often produce beautiful captions that are in many cases correct, they may not be truly understanding the structure of what they're looking at. And it goes even deeper than that. If we ask what was really the inflection point for deep learning, it was arguably ImageNet, the dataset created by Fei-Fei Li, that was the spark that lit the revolution: millions of carefully curated images.
This was made possible by the digitization of the world and the creation of the internet, and by the fact that we all now carry digital cameras on our person at all times. But we know that we as humans don't need this kind of data. If I show you this image, even if you've never seen an example of it before, from one example we can all become experts. If I ask you whether that object is present in this image, I think we'd all agree that it is. I can ask how many of that object are present in this image, and I think we can all agree the answer is two. And I could show you images like this and ask whether that object is present, and I think we'd all agree: yes, but it's weird, right? So we can extrapolate beyond the setting in which we've seen an object, and we only need one example, single-shot or one-shot learning, to be able to do that. That's in contrast to our current supervised systems, which require thousands to millions of examples with carefully curated labels to perform the amazing feats they perform. By contrast, when my daughter learned what a cat and a dog were, I didn't need to show her flashcards, cat cat cat, dog dog dog; I could show her one, and boom, she'd get it. So this is really something we need if we're going to apply AI much more broadly. And the problem is actually even a little more severe and subtle. The other problem with ImageNet is that if you look at a category, for instance chairs, we have all of these canonically positioned, framed images of chairs making up the chair category of ImageNet. But there's an interesting project we did in the lab, led primarily by Boris Katz and Andrei Barbu together with Dan Gutfreund from IBM. They created a dataset called ObjectNet, where they basically asked: what would happen if we broke those correlations, the canonical
views that you find in a dataset like ImageNet, which was scraped from the internet? What if we took an object like a chair and tipped it over, or took a hammer and put it on a bed? How would these models perform? What the team did was create an app for phones, and crowdsourcing workers got assignments like: take a hammer, bring it into your bedroom, fit it inside a bounding box on the phone, and snap a picture; or take a knife, put it in the bathroom on the sink, fit it in the bounding box, and snap. They did this for 50,000 images across 300 object classes that overlap with the ImageNet object categories, in four different kinds of rooms. What they found was really striking. The top ImageNet-trained models were performing in the 90% range, in many cases even better than humans can do, simply because there are so many different dog-breed categories in ImageNet that it's hard for humans to do as well in some cases as ImageNet-trained models. But these models, and this is arguably the success story of deep learning, take an enormous hit: they're down to 40 to 45% performance when you put the objects in just a little bit of unusual context. And just to be clear, humans retain 95% performance despite these differences. So there's a sense in which something is missing.
Deep learning is clearly very powerful, but we need something more, and we need to keep pushing toward this goal of having systems that are genuinely, broadly applicable in any kind of setting we want to put them in. And of course, there's also this issue of hacking: adversarial examples. This is an example I like from Pin-Yu Chen, who's at IBM Research. He took a captioning system, more or less like the one I showed at the beginning of the talk, which will produce a beautiful caption for that stop sign, like "a red stop sign is sitting on the side of the road." With just a little bit of perturbation, below the perceptual threshold that we can see as humans, you can get the caption to be whatever you like, including "a brown teddy bear laying on top of a bed." So these are vulnerabilities these systems have, and it's really fascinating to see just how severe these gaps are. Here's another thing that came out of the lab I run, from colleagues including Sijia Liu, Pin-Yu Chen, and Quanfu Fan. One of them, as you can see, is wearing this very, very ugly shirt, and the other has a detection box around him: there's a person detector detecting him. You could imagine this being an AI surveillance setting. But that shirt is very carefully constructed to be an adversarial example, and even under different lighting conditions, even when the cloth is folded or bent, you can see it basically makes the wearer invisible to the AI detector. It's sort of like AI camouflage.
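The kind of attack just described can be illustrated on a toy linear classifier. This is a hedged sketch of the general fast-gradient-sign idea, not the actual method behind the captioning attack or the shirt; the model, dimensions, and perturbation budget are all illustrative stand-ins.

```python
import numpy as np

# Toy adversarial example on a linear "classifier": find the smallest
# max-norm perturbation, in the fast-gradient-sign style, that flips
# the predicted class.
rng = np.random.default_rng(1)

W = rng.normal(size=(2, 100))   # weights for 2 classes, 100-dim "image"
x = rng.normal(size=100)        # a random clean input

scores = W @ x
orig = int(scores.argmax())     # class predicted on the clean input
target = 1 - orig               # the class we want to flip to

# For a linear model the attack direction is exact: the gradient of
# (target score - original score) with respect to the input.
grad = W[target] - W[orig]
margin = scores[orig] - scores[target]

# Smallest per-pixel step (in max-norm) that crosses the decision boundary.
eps = 1.01 * margin / np.abs(grad).sum()
x_adv = x + eps * np.sign(grad)

print(orig, int((W @ x_adv).argmax()), eps)  # small eps, yet the class flips
```

Because the model is linear, the minimal flipping perturbation can be computed in closed form; for deep networks the same sign-of-gradient step is typically applied iteratively.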
So there really are serious vulnerabilities, and we need to figure out how those work. But even in cases where we can give the system as much data as we like, we're still in a regime where we struggle to get neural networks, deep learning, to solve certain kinds of problems. There are lots of these common-sense, intuitive-physics kinds of problems; these examples are from our collaborator Josh Tenenbaum at MIT. If you have to answer "how many blocks are on the right of the three-level tower?", that's the kind of thing even a child can reason about and answer, but it turns out these things are still quite challenging for deep learning systems. Or intuitive-physics questions like "will the block tower fall if the top block is removed?" or "are there more trees than animals?", again from Josh Tenenbaum's lab. Or, in this case, "what is the shape of the object closest to the large cylinder?" This one is from a dataset created by a group across Stanford and FAIR, called CLEVR. That dataset was really created to illustrate this problem. You have these combinatorially complex arrangements of objects, which are just rendered objects, so it's very easy to create question-answer pairs where the answer is known, because the objects are rendered. But even these simple sorts of questions, which require us to reason about the relationships between objects, seem to require almost unreasonable amounts of data for deep learning systems, to the extent that they're able to solve them at all. So this is really what we want to get at and ask: if we want AI to be broadly applicable, how do we solve this problem?
One of the approaches we're taking, one of the big bets we're making, goes all the way back to the beginning of AI. I mentioned the term AI at the beginning of the talk, and it turns out, if you don't know the history, that the term "artificial intelligence" was coined back in 1956 at a Dartmouth workshop co-proposed by people like John McCarthy, a future MIT professor, and also, it turns out, Nathaniel Rochester, an IBMer who developed the IBM 701, one of the first mass-produced computers. You can tell who the IBMers are in this picture: they're basically the ones wearing ties. Together with people like Marvin Minsky and Claude Shannon, they got together back in 1956, coined this term artificial intelligence, and imagined this future we'd all be in together. The interesting thing is that neural networks were, of course, around back then. They weren't called deep learning yet; "deep learning" was a rebrand attached in the 21st century. But neural networks existed at that time, and as everyone in the audience is perfectly well aware, a neural network is basically a kind of non-linear function approximator. In this setting, we can take a complex input, like an image of an apple, and map from what we have, images, to what we want, some kind of class label. If that's the neuron that corresponds to "apple," then we can read out the probability that an apple is present. That's the neural network approach, and everything in between is learnable weights; that's how the magic happens. But there has been another kind of AI since the beginning, and of course many in the audience will be very familiar with this notion of
symbolic AI. In the world of symbolic AI, we have a slightly different representation of an apple. An apple is not just something a machine turns an image into, a one-hot-coded vector; there's stuff we know about the apple. We know an apple has an origin: it comes from an apple tree. We know an apple has structure: it's got a body and a stem, different parts. The body can have a shape, round; it can have a size, it fits in your hand; it has a color, it can be red or green. We know an apple is a kind of fruit; we know all kinds of things about the taxonomy of apples and the evolution of different kinds of plants. So there's all this knowledge that we have, and the conceit, the central premise of symbolic AI, is that we can use that knowledge, and different kinds of knowledge, to symbolically manipulate the symbols represented in a structure like this and bring them to bear on solving problems. And I would argue that when we make a decision about whether something is an apple or not, or whether something is a fur-covered saucer, cup, and spoon or a teddy bear, we bring all of that knowledge to bear. I think that's one of the things that may be missing from today's AI. In particular, we've been working on this theme within the MIT-IBM lab, in a collaboration between Chuang Gan, who's at IBM Research with us in Cambridge, together with Josh Tenenbaum at MIT, as well as his former student Jiajun Wu, who's now a professor at Stanford, on this notion of neuro-symbolic AI. So we have neural networks and we have symbolic AI, and the idea is that in many ways deep learning and symbolic AI complement each other's strengths and weaknesses. As you all know, artificial neural networks have been around for a
long time, and in many ways they were waiting. There have been AI springs and winters, and there were various times when everyone "knew" neural networks didn't work. What really made them advance wasn't, initially at least, a conceptual advance; it was the availability of compute, particularly in the form of GPUs, together with the availability of data. The world became digital, we had digital images all over the place, and that, together with the explosion of Moore's law and especially the emergence of GPUs, which turned out to be great for this, was the thing that ignited deep learning and neural networks. Symbolic AI has been around the entire time; it's still around, but it hasn't enjoyed a resurgence yet. One of the central theses of the work we're doing is that in many ways symbolic AI has also been waiting, just the same way deep learning was waiting, but what it's been waiting for is neural networks. The idea is that neural networks solve many of the problems of symbolic AI, and symbolic AI in turn either solves, or provides a roadmap for solving, the problems I just illustrated with today's deep learning. So let's look at this dataset CLEVR again for a moment. CLEVR was created, to a first approximation, to illustrate a problem with current neural networks, using simple visual question-answering problems where you have an image and you have to answer a question like: what's the shape of the red object?
These problems require huge amounts of supervision to train in the traditional end-to-end way, and of course in neural networks, end-to-end is the mantra. Even when you do train end-to-end with huge amounts of data, in many cases the systems don't completely solve the task. In some sense, this is what you're supposed to do with a neural network: you take what you have, map it to what you want, which is an answer, and then try not to get in the way. There are a lot of decisions about how to construct the architecture in between, but the idea, basically, is don't do anything that gets in the way, because you're just getting farther from optimizing the thing you're trying to optimize. That's been one of the lessons of the last number of years in deep learning. Now, the problem with this is that the concepts, things like colors and shapes, and the reasoning processes, like counting or doing an equality operation, are entangled. Because you're just training end-to-end, there's not much incentive for the network not to mix these things together. And that does a couple of things: it makes it hard to really completely solve the task, but it also makes it very difficult to transfer to other kinds of tasks, like image captioning or instance retrieval. This is the kind of thing where humans, again, don't need to train on multiple different tasks to solve a problem; once we know something about the world, we can very flexibly apply it in different problem settings. So let's unpack this question-answering task. If we have an image like the one on the left and a question like: are there an
equal number of large things and metal spheres?, how would we solve this problem? Well, first of all, the question asks something about large things, so we interrogate the image, we use our visual system: how many large things are there? Three large things, good. We look at another part of the question and see there's something about metal spheres, so we use our visual system and identify all the things that are metal and spheres. Good. But then, critically, we have to do an equality operation, which is fundamentally more like a logical, symbolic operation: we compare two quantities. And then we decide: yes, the answer to this question is yes. If we unpack that, we see that part of the problem is visual perception, and that's something CNNs, convolutional neural networks, are good at. Part of it is question understanding, and we know from the amazing successes of deep learning in natural language processing that current neural networks and other kinds of deep learning methods are very strong at understanding questions. But then there's a piece that is much more like logical reasoning: we're going to filter, we're going to do a series of operations in a particular order, and then we're going to do an equality operation. So what the team did, again in this collaboration between folks at IBM and MIT and a few other people at other institutions, was to build a hybrid system. It takes a convolutional network for the vision component, the standard thing you would expect to do. But rather than just trying to go straight to the answer, it's going to de-render the scene into a structured scene representation. De-rendering in the sense that renderers, like the ones used for 3D movies, you
know, go from a structured representation, what the objects are and where they are, to an image; de-rendering is just going the other direction, taking the image back to a structured representation of all the objects, their properties, and everything we know about them. Then we take the question and, in this case, use a recurrent neural network, because we know those are good for natural language processing. But instead of a traditional end-to-end system, where we just put it all together and try to produce the answer, or a traditional language application, where we might translate into another sequence or classify the sentiment directly, here the neural network translates the natural-language question into a symbolic program: a series of symbolic operations we can run to get the answer. You take that symbolic program and run it on the structured representation, and you get the answer. Now, a critical piece here is that we're not just taking the pencil and the eraser and putting them together and calling it a day. We're not just taking neural networks and symbolic processing, bolting them together, and saying we're done. Instead, the system is trained jointly with reinforcement learning, which allows the neural networks to learn something different than they would have learned on their own, by virtue of being part of the symbolic system.
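A minimal sketch of this run-the-program-on-the-structured-scene step: the scene entries stand in for what a de-renderer might emit, and the filter/query mini-language is my own illustration, not the paper's actual domain-specific language.

```python
# A toy "structured scene representation", as a de-renderer might produce:
# one attribute dictionary per object.
scene = [
    {"shape": "cube",   "color": "green", "material": "rubber", "size": "large"},
    {"shape": "sphere", "color": "red",   "material": "metal",  "size": "small"},
    {"shape": "sphere", "color": "blue",  "material": "metal",  "size": "large"},
]

def filter_objs(objs, attr, value):
    """Keep only the objects whose attribute matches the value."""
    return [o for o in objs if o[attr] == value]

def query(objs, attr):
    """Read out an attribute of the single selected object."""
    assert len(objs) == 1, "query expects exactly one object"
    return objs[0][attr]

# "What is the shape of the red object?" as a filter -> query program:
red = filter_objs(scene, "color", "red")
answer = query(red, "shape")
print(answer)                      # -> sphere

# "Are there an equal number of large things and metal spheres?"
n_large = len(filter_objs(scene, "size", "large"))
n_metal_spheres = len(filter_objs(filter_objs(scene, "material", "metal"),
                                  "shape", "sphere"))
print(n_large == n_metal_spheres)  # -> True
```

In the real system, both the de-renderer and the question-to-program translator are learned networks; here both are hard-coded so that the symbolic execution step stands on its own.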
So you can exercise the whole system after some pre-training: run the symbolic program, get the answer, the answer is right or wrong, and with the standard REINFORCE algorithm you can then train both the RNN and the CNN, to get better at de-rendering the symbolic representation of the scene and better at translating from natural language to a symbolic program. To walk through how that works: you take an image; the language is "what's the shape of the red object?". The scene is parsed into a series of object entries: there's an object, a green cube, at a particular location; there's another object, a red sphere, made of rubber. Semantic parsing takes the question and turns it into a program, in this case a filter operation followed by a very simple query operation; this is just a purpose-built, domain-specific language for the purpose of this demonstration. Then you do some symbolic reasoning: you walk through, you filter, you find the object that's red, you query what its shape is, and you get the answer. In some ways very simple, but I think there's a powerful idea here about mixing these two things together. Now, there are three advantages that come with this. Just adding a very simple dash of symbolic processing to the mix, one thing you get right out of the box is incredibly high accuracy. When this paper was published at NeurIPS back in 2018, the performance was 99.8%.
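The joint training idea can be sketched with a toy REINFORCE loop. Everything here (the four candidate programs, the answers they yield, the learning rate) is an illustrative stand-in rather than the paper's actual setup: the "parser" is just a softmax over candidate programs, and the reward is 1 when executing the sampled program gives the right answer.

```python
import numpy as np

# Toy REINFORCE: learn to pick the program whose execution is correct.
rng = np.random.default_rng(0)

answers = ["cube", "red", "sphere", "2"]  # answer each candidate program yields
correct = "sphere"                        # ground truth for this question

logits = np.zeros(len(answers))           # policy parameters over programs
lr = 0.5

for step in range(500):
    probs = np.exp(logits) / np.exp(logits).sum()
    a = rng.choice(len(answers), p=probs)            # sample a program
    reward = 1.0 if answers[a] == correct else 0.0   # execute, check answer
    # REINFORCE: reward-weighted gradient of log pi(a)
    grad_logp = -probs
    grad_logp[a] += 1.0
    logits += lr * reward * grad_logp

probs = np.exp(logits) / np.exp(logits).sum()
print(probs.round(3))   # probability mass concentrates on the correct program
```

In the actual system, the policy is an RNN emitting program tokens from the question and the CNN-based de-renderer is trained through the same reward signal, but the credit-assignment mechanics are the same.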
So this is effectively perfect performance. While previous methods had gotten close to perfect performance on this task, requiring huge amounts of data, none of them had really quite completely cracked the nut. As of that time, the CLEVR dataset was basically a solved dataset, and it just took a little dash of symbolic processing to get it done. Even more important, though, is this notion of data efficiency. Again, remember that humans can in many cases work from single examples, whereas supervised learning with deep learning requires huge amounts of data, the more the better. The interesting thing about this algorithm, again just adding in a dash of symbolic processing, is that beyond higher accuracy than other methods, you can get acceptable performance with just one percent of the amount of data the other methods require, and if you allow yourself ten percent of the data, you can basically completely solve the task. So where other methods that are purely deep-learning-based require many hundreds of thousands, almost a million, examples to perform well, this method can train with one or even two orders of magnitude less data. So this is a huge advantage.
Again, this was one of the things preventing us from getting to this world of broad AI that we're trying to reach. The third thing is transparency and interpretability. If you have a neural network, there's a lot of work on how to back out what that network knows, what it's doing, how it works, and how to think about the answers it gives. But this method, intrinsically, because it has this symbolic choke point in the middle where it produces a symbolic program, and it has a symbolic representation of the scene, lets you step through in a very understandable way, to understand not only how it arrived at the answer, but, if it got the right answer, whether it got there by the right path: was it looking at the right objects, was it following the right series of steps? And I cannot overstate how important this is in the real world, that people who are going to be using AI can understand it. Imagine mission-critical settings where people's lives might be on the line; you really want a system where you're sure it's giving you an answer that makes sense.
It kind of needs to show you its work, and this is something that just intrinsically falls out of these kinds of hybrid approaches.

Now, the work I showed you is actually over a year and a half old; it was the first work this team did, at NeurIPS in 2018. The team followed up last year at ICLR with something called the Neuro-Symbolic Concept Learner. The way to think about this, and I'll tell you a little more in a moment, is that where previously properties like color and values like red were predefined, as in that original paper, with the Neuro-Symbolic Concept Learner you could autonomously learn new values like red: you could introduce a new color and the system would adopt a new concept to go with it. Then last year at NeurIPS, the team extended this further with a concept-metaconcept learner that can learn relationships between concepts, and started to get at the notion of what it would take to learn a new concept.
Then at ICLR this year, in the main conference, you'll see this work extended to dynamic scenes and counterfactual scenarios; I'll tell you about that in a moment as well. This is a program of work, a bit of a roadmap, and what we're really driving towards is systems that are less and less predefined, that can learn autonomously while still retaining this symbolic understanding of the world, and that are increasingly autonomous. For us, and I think for others in the field as well, this is the goal: how can we make systems that are more genuinely autonomous, that can acquire flexible knowledge usable across multiple different tasks, but do it without the huge amounts of supervision that current methods seem to require?

So, the Neuro-Symbolic Concept Learner. I'm going to run through this pretty quickly, but it was an attempt to have a more flexible notion of concepts, like colors such as red.
They don't have to be predefined. Instead, we again go from the question to a symbolic program, a series of operations: filter for red, then query the shape of the object. The team is still using a CNN to represent the visual content of the image, which gets mapped into a general representation space, but critically, we also map down into conceptual spaces, like a color space. Concepts like red then live in that color space as vector embeddings, where you can reason about whether an object is likely to have a particular property. And importantly, because it's now a learned concept space, you can acquire new concepts, so it's much more flexible.

Of course, I laid out the vision earlier that one of the advantages of symbolic AI is that we also have relationships between concepts, so we can make logical leaps: we can understand that two things are related, or similar, or stand in particular relationships. At the very next NeurIPS, on the next click of the conference clock, the team introduced this metaconcept learning notion. There are still visual reasoning questions, like "Is there a red cube?" (yes) and "Is there a green block?" (yes). Interestingly, tied up in those two very simple questions there's a lot of subtlety and structure. We can ask metaconceptual questions, like: is red the same kind of concept as green? Are they both examples of colors? And we can also ask questions about synonyms.
The first question said "cube"; the second question said "block." That's not something we typically struggle with, because we know that cube and block are synonyms of each other. So there's the conceptual level, a set of questions about the scene, and there's also a metaconceptual level of understanding: the relationships between those concepts, how they interlock and relate to each other. The team also worked on not just synonyms but things like hypernyms: is an ivory gull an example of a species in the family Laridae? (This example is from the Caltech-UCSD Birds dataset.) And again, we can ask metaconceptual questions about these. The critical idea is that the metaconceptual questions operate on the same conceptual space, helping to bring more structure to that space through learning, but also enabling some of the conceptual and logical leaps that we make fluidly as humans but that our systems really struggle with today.
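A metaconcept like "synonym" can be sketched as an operator over pairs of vectors in one shared concept space (a toy stand-in: the embeddings are hand-set here, whereas in the real system they would be learned from question-answer pairs, and the threshold is invented for the example):

```python
import numpy as np

# Hand-set stand-ins for learned concept embeddings; synonyms such as
# "cube" and "block" end up close together in the shared space.
concepts = {
    "cube":   np.array([0.90, 0.10, 0.00]),
    "block":  np.array([0.88, 0.12, 0.02]),
    "sphere": np.array([0.10, 0.90, 0.05]),
    "red":    np.array([0.00, 0.10, 0.95]),
}

def synonym(a, b, threshold=0.98):
    # The "synonym" metaconcept is an operator on pairs of concept vectors.
    # Training questions fix the threshold; the same operator then
    # generalizes to concept pairs it was never explicitly told about.
    u, v = concepts[a], concepts[b]
    sim = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return bool(sim > threshold)

print(synonym("cube", "block"))   # the pair from the questions above
print(synonym("cube", "sphere"))  # not synonyms: far apart in the space
```

The point is that the metaconceptual operator and the visual-reasoning operators act on the same embeddings, so structure learned from one kind of question constrains the other.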
Sorry, this slide is just a repeat. So when you have a question, again, we detect the objects and extract features, and those features can then be embedded in the conceptual space. We parse the question into a symbolic program, and then you can basically execute that program step by step and make a decision about what the answer is. For the metaconceptual questions, a separate system is trained on just the natural-language questions themselves, but operating on the same conceptual space. You can parse a question into a meta-program, meta-verify whether red and yellow are the same kind of concept, and then produce an answer.

What that lets us do is this: during training, you see examples like "Is there an airplane?" (yes) and "Is there a plane?", a slightly different way of saying airplane (yes), along with other pairs like "Is there a kid?" (yes) and "Is there a child?"
Yes. We know that kid and child are the same, but the system hasn't been explicitly told that. If we've seen questions like this before, we understand the concept of a synonym, so we can ask "Is airplane a synonym of plane?" and know the answer is yes; we've trained a notion of what a synonym is. Then the system can make a generalization leap and say that kid and child are also synonyms, and it can do that because these operators act on the same conceptual space. And you can show that, compared to straight neural-network methods, some of these inductive leaps become much easier when you add in a little bit of this symbolic processing.

Then, this year at the conference, the team, which is still actively publishing on this roadmap, is introducing a new dataset called CLEVRER, which takes CLEVR from static scenes to video, to dynamic scenes. It includes descriptive questions, like "What is the material of the last object to collide with the cyan cylinder?" These are still hard questions to answer with a pure end-to-end approach, but they're fundamentally descriptive: we can watch the scene unfold and then answer. There are also explanatory questions, like "What is responsible for the collision between the rubber and metal cylinders?" Now we're starting to get to another level of reasoning, where we can say not only what happened but what the causes were, what explains the results we saw. And then, really interestingly, we can also do counterfactuals, which are included in the CLEVRER dataset as well: what would happen if the cyan cylinder weren't there? We can now ask the system to reason about alternatives. This is something humans are perfectly adept at: we can not only say what happened, but we can also
imagine alternate scenarios. This is a big part of how we plan our lives and achieve complex goals: we ask ourselves, what would happen if I did this? What could I have done differently that would have led to a different result? We think this is really the core of an awful lot of intelligence.

Now, you might ask why IBM cares about this, aside from it being, I think, an interesting scientific direction. When you look at the kinds of problems we face in the real world, the problems businesses face and the problems we face in our daily lives, I would argue that the vast majority of them aren't of the form "we have a river of data, we train a classifier, and we get results out." They're much more like one-off puzzles. I might want to know how many employees with over ten years of experience we had by location last year, or what factors might contribute to better output from factory A versus factory B, or why our database is down: are we being hacked, or did somebody push bad code into production
and that's why our database is being DDoSed, with our own systems effectively DDoSing our own database? With these kinds of problems, you don't get lots and lots of training examples. Hackers aren't going to hack your system a million times before it matters; it matters the very first time it happens. And when you look at how we as humans solve these problems, it's much more like problem solving: it's a puzzle. We go in, we extract the structure of the world, we build a mental model, we reason about it, and then we do more experiments to refine our mental model. I think that really is the crux of where we're heading. We think these ideas of fundamentally symbolic manipulation, whether done with traditionally symbolic systems or with neural networks that operate on symbols, are really where the magic is, and we think that 99 percent of the problems we face in the real world have much more of this flavor than the simple huge-amounts-of-data kind of approach.

Now, this is actually a research program across much of the lab. I showed you one example of work we're doing together with Josh Tenenbaum's lab at MIT in particular, but we think this is important across the entire spectrum of AI work that we do. We have work on neurosymbolic generative models that is hopefully coming out soon. We have work on safe machine learning, safe RL, basically using symbolic methods to verify the behavior of neural-network and machine-learning systems. We have work in neurosymbolic natural language understanding. We have examples of mixing neural networks together with planning, another classical tradition in symbolic AI. And in neurosymbolic code optimization, we have a paper (sorry, the year on the slide is wrong;
it's at this year's ICLR, in 2020) on using neurosymbolic methods to look at code and optimize it. And then we also have a lot of work going on, again together with Josh's lab, as part of a DARPA program on how you get machines to have what we call common sense: all the base-level knowledge, not written down anywhere, about the affordances of how the world works.

I'll just give you a few little teasers of some of that work. This is work mixing traditional, good-old-fashioned-AI symbolic planning with neural networks, from Masataro Asai, who's with us in Cambridge at IBM Research at the MIT-IBM lab. It looks at how you solve classical problems like sliding-tile puzzles, where you move the pieces around and try to get them into a particular order or produce a picture, or problems like the Tower of Hanoi. Those are classically things that planning algorithms are very good at; we have lots of heuristics for searching the space of potential actions to get to an answer, and this is a very mature field. But these methods assume that you start with the symbols. The question is: can we go much more flexibly from a messy world, which is really the domain where neural networks excel, and make that world compatible, through neural networks, with algorithms like planning? The Latplan algorithm does just that. It uses autoencoders, forcing them with a Gumbel-softmax to produce discrete representations that are compatible with planning, and then plans directly in the discrete latent space of the autoencoder. We find that we can solve problems with this method that would be difficult to solve otherwise, and this is a line of research that we're continuing in many different
directions.

Also, just to give another shout-out: there's work on verifiably safe reinforcement learning. There are scenarios where you want a machine-learning, deep-learning, or reinforcement-learning algorithm to take actions, such as driving an autonomous vehicle, but you really want some kind of guarantee on the safety, and the safety policy, around those systems. Nathan Fulton, who's with us, again at IBM Research in Cambridge and the MIT-IBM lab, is working on building hybrids of these systems, where you can combine some of the guarantees of formal verification with deep learning to get a best of both worlds.

So those are just a few very brief glimpses of some of the things we're working on. We're very excited about this intersection, and I think the key of it boils down to being able to leverage the power of neural networks. We're not saying neural networks are wrong in any way; they're an incredibly powerful set of tools. But if we want the sample efficiency we need to address real-world problems, and the richness and flexibility we need, we think we need something like symbols. Whether we manipulate those symbols with traditional symbolic AI or build neural networks that manipulate them, the combination of rich representations, structures like trees, graphs, and programs, together with all the strength we've developed over the years in manipulating those kinds of structures, married to neural networks: we think that's a tremendously powerful combination. We're making kind of our big bet on that and putting a lot of resources behind it, and we're very excited about it.
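The discrete-bottleneck trick behind Latplan, using a Gumbel-softmax so that an autoencoder's latent code becomes discrete enough for a classical planner to treat as symbols, can be sketched in isolation like this (just the sampling step, with made-up logits; in the real system this would sit in the bottleneck of a trained autoencoder):

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax(logits, temperature):
    # Add Gumbel noise, then apply a softmax: a differentiable approximation
    # to sampling a one-hot categorical code. Lower temperatures push the
    # output toward a hard one-hot vector a planner can treat as a symbol.
    gumbel = -np.log(-np.log(rng.uniform(size=logits.shape)))
    y = (logits + gumbel) / temperature
    y = np.exp(y - y.max())
    return y / y.sum()

logits = np.array([2.0, 0.5, -1.0])  # made-up encoder outputs for one latent unit
soft = gumbel_softmax(logits, temperature=5.0)   # high temperature: blurry, easy to train
hard = gumbel_softmax(logits, temperature=0.1)   # low temperature: close to one-hot

print(np.round(soft, 3))
print(np.round(hard, 3))
```

Annealing the temperature during training gives a latent space of near-binary codes, and planning then happens over those codes rather than over raw pixels.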
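One common shape for such a verification-plus-learning hybrid is a safety shield: the learned policy proposes an action, and a symbolic monitor with a proven invariant overrides any proposal that could violate safety. Here is a toy one-dimensional braking example (the dynamics, the invariant, and the always-accelerate policy are all invented for illustration, not taken from the actual work):

```python
def safe_action(state, proposed_accel, dt=0.1, max_brake=-5.0):
    # state: (distance_to_obstacle, velocity). The shield checks a simple
    # invariant: even after taking this step, full braking must still be able
    # to stop before the obstacle (stopping distance v^2 / (2*|b|)).
    distance, velocity = state
    new_velocity = max(0.0, velocity + proposed_accel * dt)
    new_distance = distance - new_velocity * dt
    stopping_distance = new_velocity ** 2 / (2 * abs(max_brake))
    if new_distance > stopping_distance:
        return proposed_accel  # proposal keeps the invariant: allow it
    return max_brake           # otherwise override with full braking

# A stand-in for a learned policy that always wants to accelerate:
policy = lambda state: 2.0

print(safe_action((100.0, 10.0), policy((100.0, 10.0))))  # far from obstacle: allowed
print(safe_action((5.0, 10.0), policy((5.0, 10.0))))      # too close: shield brakes
```

The learned policy stays free to optimize reward inside the safe region, while the symbolic envelope carries the guarantee, which is the "best of both worlds" point above.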
We hope you're excited about it too. I'm very happy at this point to take any questions you might have over the video conference. Thank you.