Hello, hello, and welcome, welcome to the second week in the What's Next in AI seminar series. I'm coming to you live from the IBM Research Johannesburg lab in South Africa. My name is Dr. Ismail Akulwaya. I'm a research scientist at IBM Research Africa, and it's my great honor to be co-MC of this session. So to kick things off, I'd like to invite our senior manager, Charity, to tell us about this next session. She was inducted into the Top 40 Under 40 Women in Kenya honor roll and has done amazing work in pushing Kenya up the World Bank's ease of doing business ranking. Charity, please kick things off. Thank you, Ismail, and good afternoon, ladies and gentlemen. Thank you so much for taking the time to join us for this second week of the AI seminar series here at IBM Research. We're really pleased and honored that you've taken the time to join us today, and last week as well. For those of you who were unable to join last week, I hope you'll be able to go back to our YouTube channel so that you can catch up on the fantastic conversations we had in that session. As you've been informed, this week's session is going to be focused on topics around learning, reasoning and language understanding as parts of AI, and I really look forward to the wonderful lineup of presenters that we have for you. Our goal for these sessions is really to have a conversation. One thing I wanted to share with all of you is a statistic that's quite fascinating, from the Brookings Institution last year: by 2030, AI will add about $15.7 trillion to the global GDP. And I think the question to ask is: that's to the global GDP, but how much of that will be produced by the African continent? For that to happen, we have to become a major, major part of the conversation, and not just as consumers of AI, but at the table, able to produce AI intellectual property and create products. And as Solomon mentioned last week, not just small innovations, but innovations that we can produce at scale so that they have impact. So I think as a continent we have a unique opportunity to leverage these technologies. We have the population; as everyone talks about the demographic dividend, we have the young population to be able to do it. And there is certainly no shortage of challenges that need to be addressed by AI. So I really think this is a fantastic opportunity for the continent, and all we need to do is ensure that we have the right skills to take it. So I believe this is a fantastic conversation for us to be having, and a timely one at that, so that we can see how all of us can start to participate in the creation of AI technologies. Here at the research lab, our teams have been working on part of what you will hear today, but also on applying AI in various areas, ranging from healthcare, to maternal, neonatal and child health, to food production risk, all the way to climate change. And there are plenty of papers and other material that we have produced describing that work. I hope you'll follow us on social media so that you can learn more about it and look at some of the technical papers we have written in that regard. I'm hoping, as you listen to the conversations today, that you will also provide your own perspective and ask questions, and we really welcome you to collaborate with us.
And the collaboration opportunities are endless, all the way from collaborating as part of a university or an organization, to joining us as an intern or a full-time hire. All those are opportunities for you to join us and work with us, so that we can really take advantage of the opportunity and potential that AI brings us. So without further ado, let me welcome you once more, and I hope that you have a fantastic rest of the session. Back over to you, Ismail. Great, great. Thank you very much, Charity. As you can see, it's about skills development, it's about impact at scale, but impact bearing in mind the human element, of course. I would like to connect this week's session to last week's. You remember that last week we talked about responsible and trustworthy AI. Kush gave us four principles of trustworthy AI and his motto, "no shortcuts". Tathi talked about values and meta-values, Azina about ethical frameworks and what they miss. That laid the groundwork for what we need to move forward in trustworthy AI. We had Vera telling us what understandable, explainable AI is, and Benji gave a hint of what we need to inject into our AI: it needs to be symbolic, transferable and algebraic. Finally, Celia and Gilmore showed us trustworthy AI put into practice in the real world. So that lays the ground nicely for this week's sessions, which are about filling in the missing gaps. What's needed? We know that AI is about programming computers with data. So data is key, but that opens up a host of problems: biases in the data, and shortcuts in how AI learns from the data. So we need something other than just data. We need reasoning, and you'll see that thread coming through in today's sessions. We laid the groundwork: values, trustworthy AI. We know what we want: understandable, explainable AI. And the missing element is reasoning. I like to think about the scientific method. If you remember from your history lessons, Tycho Brahe was a famous astronomer who collected data about the position of Mars. For me, that represents an example of big data: he just had lots and lots and lots of measurements. Then along came Kepler, who built an empirical model from the data, but he didn't have understanding. It took Newton to bring explainability to Kepler's three laws. Where are we now in AI? We have massive data sets and massive empirical models; we're missing the Newtonian understanding. And that's where today's sessions come in. One of the most exciting current-day examples of massive models is GPT-3. But as you know, it doesn't understand the text it generates, and it's definitely not responsible: we've seen horrible examples of it producing racist and sexist stereotypes and irresponsible outputs. So we know what we want, but how do we achieve it? We need reasoning; we need symbolic treatment. So let me then introduce you to our speakers for the first session. We have Dr. Noam Slonim. He's a Distinguished Engineer and the principal investigator of Project Debater, which has recently featured on the cover of Nature. Noam has co-authored more than 50 publications in peer-reviewed conferences, workshops and journals. And some fun facts, as I always like to throw in some fun facts as MC: he did his PhD under Professor Naftali Tishby, who's famous for information bottleneck theory.
And by the way, there's a very nice Quanta Magazine article on information bottleneck theory. I throw that in because that's trying to get at some understanding, opening the black box, or opaque box, as Vera likes to remind us to say. And another fun fact: Noam was also a script writer for a TV sitcom. So not only is he a candidate Big Bang Theory character, he's a script writer for such sitcoms. So let me hand over to Noam for 20 minutes. Thank you. Hi, hello, everybody. Thank you, Ismail, for this very nice introduction. I'm Noam, the principal investigator of Project Debater, and I'm very happy for the opportunity to speak to everybody today about this exciting project. As you perhaps know, at IBM Research we have this interesting tradition of grand challenges in artificial intelligence. Back in the '90s, IBM introduced Deep Blue, which was able to defeat Garry Kasparov in chess. In 2011, IBM introduced Watson, which defeated the all-time winners of the TV trivia game Jeopardy!. And just a few days after this event, an email was sent to all the thousands of researchers of IBM across the globe, myself included, asking us what should be the next grand challenge for IBM Research. I was intrigued by that, so I suggested to my office mate at the time that we brainstorm together, and this is what we did. We sat in the office in Tel Aviv and raised many different ideas. At some point, towards the end of the hour, I suggested this notion of developing a machine that would be able to debate humans, and that this is how we would demonstrate the technology: through a full live debate between this envisioned system and an expert human debater. This sounded better than all the other thoughts we had had up to that moment, so we decided to submit it. The only guidance that we got was to submit the proposal in a single slide, so the relevant management would not be swamped with too many details, and we followed these guidelines: we submitted a single slide. This was February 2011, more than 10 years ago now. And this started a fairly long and thorough review process that lasted for a year, and eventually, in February 2012, this proposal was selected as the next grand challenge for IBM Research. We started to work a few months later with a small team that gradually expanded. And after nearly seven years of intensive work dedicated only to this mission of developing a machine that can debate humans, we demonstrated the system to the general public for the first time. It was a full live debate between this system, now called Project Debater, and one of the legendary debaters in the history of university debate competitions, Mr. Harish Natarajan. It was a full live debate in San Francisco in February 2019, and it was surprisingly reminiscent of the vision that we had on that single slide back in the office in Tel Aviv. It was in front of around 800 people in the audience, mainly journalists from all over the world, and it was also broadcast live on the internet. As perhaps expected, this event attracted massive media attention: 2.1 billion social media impressions, a lot of video views and press coverage, and so on and so forth. What we will do next, in just a couple of minutes, is see short segments out of this debate; the full live debate is available on YouTube. But before that, here is the premise. The debate starts with a motion that defines what we are debating.
And in this debate in San Francisco, it was whether or not the government should subsidize preschool. There are many considerations around how this motion is selected, which I'm omitting here. I will just emphasize that, obviously, the motion we picked was never included in the training data or the development data of the system. We were on the government side, so we support the motion; Harish is on the opposition. We have four-minute opening speeches for each side, four-minute rebuttal speeches for each side, and two minutes of closing statements. So all in all, including the pauses to think, so to speak, we are talking about nearly 25 minutes of a discussion between man and machine. We will now see just three minutes out of that, and then start to talk about how the underlying technology actually works. So if you can play the video now. "Greetings, Harish. I have heard you hold the world record in debate competition wins against humans. But I suspect you've never debated a machine. Welcome to the future. When we subsidize preschools and the like, we are making good use of government money, because they carry benefits for society as a whole. Decades of research have demonstrated that high-quality preschool is one of the best investments of public dollars, resulting in children who fare better on tests and have more successful lives than those..." So sorry, guys, the video is not really working. This is really too bad. Perhaps you have the link that I shared yesterday and you can put it in the chat, and maybe we can pause for three minutes so people can watch it, or do you want me to continue? Okay, so I will continue. You will need to take my word that this is a fantastic video, and I encourage you to either watch this short three-minute video or go to YouTube and search for Project Debater; you can see the whole event, the full live debate, which is about one hour. The question that we would like to address from a technical perspective is how the underlying technology really works, and it is summarized in this fairly elaborate slide. So let me take you through it. The system has two major sources of information. One of them is a massive collection of around 400 million newspaper articles taken from the LexisNexis corpus. When the debate starts, the system searches for short pieces of text that satisfy three criteria: they should be relevant to the topic; they should be argumentative in nature, so they should argue something about the topic, not just be relevant to it; and finally, they should support our side of the debate. Once these short pieces of text are found, the system uses other AI engines, cluster analysis for example, to glue these pieces together into a compelling narrative; a toy sketch of this selection step follows below. The second major source of information is a unique knowledge graph that we developed over the years, which aims to capture the commonalities between the many different debates that humans are having. In this knowledge graph, we have thousands of more principled argumentative elements, and when the debate starts, the system navigates this knowledge graph, aiming to find the most relevant principled arguments and use them at the right time.
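To make that selection step concrete, here is a minimal sketch with the three criteria modeled as pluggable scoring functions. All names and thresholds here are illustrative assumptions, not Project Debater's actual code:

```python
from typing import Callable, List

def select_evidence(
    passages: List[str],
    topic: str,
    relevance: Callable[[str, str], float],   # is the passage about the topic?
    argumentative: Callable[[str], float],    # does it actually argue something?
    stance: Callable[[str, str], float],      # >0 supports our side, <0 opposes
    k: int = 50,
) -> List[str]:
    """Keep the best passages that pass all three criteria."""
    kept = [
        (relevance(p, topic) + argumentative(p), p)
        for p in passages
        if relevance(p, topic) > 0.5
        and argumentative(p) > 0.5
        and stance(p, topic) > 0   # must support our side of the motion
    ]
    return [p for _, p in sorted(kept, reverse=True)[:k]]
```

In the real pipeline, the surviving passages would then be clustered into themes and glued into the speech narrative.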
Just to give an example of what we mean by a principled argument, imagine that we are debating whether or not to ban the sale of alcohol, or whether or not to ban organ trade. In both cases, the opposition may argue that if you ban something, you are at risk of the emergence of a black market, which by itself has a lot of negative consequences. So the black market argument is a principled one: it can be used in almost the same phrasing in very different debates. One may naively assume that this is therefore just a keyword-matching thing; that is, if the debate is about banning something, we should anticipate the opposition using the black market argument. But of course, this is not always the case. For example, imagine that we are debating whether or not to ban the use of internet cookies. Obviously, we're not going to see a black market for internet cookies, with people standing at street corners offering internet cookies or something like that. Or imagine a debate about banning homework. So again, the system needs to develop a more subtle and nuanced understanding of human language in order to perform well in this task. And the third issue is, of course, the rebuttal: how do we respond to the opponent, which is the most challenging part. This starts by listening to the words articulated by the opponent, and for that task we used Watson's speech recognition capabilities out of the box. But of course, we need to go beyond the words and really understand the gist, the main claims the opponent is raising. For that, we developed several techniques that work in parallel, and almost all of them rely on the same principle: trying to anticipate in advance what kind of claims or arguments the opposition might use, then listening to determine whether the opposition indeed made such a claim, and then responding accordingly. So at a very high level, this is how the system operates. Over the years, we published many papers and released many data sets that are freely available for the research community to use, and I really encourage you to take a look at our website. We have published, I think, around 55 papers to date and released more than 20 data sets. But these papers usually highlight a particular aspect of the system, not the system as a whole. Later on, we will discuss a more recent paper that describes the system in its entirety. But before that, just to give you a better feeling of what it takes to develop even a single component in this project, let's dive deeper for a couple of minutes into a specific task: evidence detection. What I'm going to describe now is a paper led by Liat Ein-Dor from our team, published last year at AAAI. So first of all, here is the problem. We have a motion that we haven't seen in the training data; in this example, whether blood donation should be mandatory. And we see two sentences. The first one is telling us that blood donation is good for your health to some extent, and we understand that this sentence can be used in this debate: it's persuasive to some extent. The other sentence is telling us that students are more abundant among blood donors. This is a statistical fact, but it's not really persuasive in the context of this debate. So how do you train an AI system to make this subtle distinction? Here is another example, about a very important debate: whether or not we should abandon Valentine's Day. We see here a sentence about a survey conducted in Canada, finding that most Canadians agreed that Valentine's Day is a waste of time and money.
So that sentence is to some extent persuasive, and we can use it during the debate. But now we see another, quite similar sentence, about a survey conducted in the US, where most people said that if they were going to break up with someone, they would do it just before Valentine's Day to save money. This is perhaps a useful fact to know, but not very useful in the context of the debate. And again the question arises: how do you make these subtle distinctions? The high-level architecture of the system is depicted in this slide. We have this massive collection of newspaper articles, around 10 billion sentences, and we are given a controversial topic, the motion of the debate. We have some queries to find candidate sentences that may contain evidence relevant to the debate, but these queries are quite basic, obviously, so we end up with perhaps tens of thousands of candidate sentences. Then we would like to apply some kind of ranking model, BERT, a deep learning model, to highlight the most promising evidence that we are going to use during the debate. And of course the question is: how do you collect the training data to train this BERT model, this deep learning model? The problem is that if you just go ahead and annotate random sentences in the corpus, the prior of a sentence being evidence for a controversial topic is fairly low, just around 2% or so. So it would require annotating a very large number of sentences, and only a small fraction of them would represent positive examples. To overcome that, we started with a relatively simple model that we had developed in the first few years of the project, before moving to transformer technology like BERT. This was just a logistic regression classifier based on feature engineering, and its precision was around 40%. Obviously that is not usable for a live debate, but it is much better than selecting sentences at random. So we started by using this logistic regression classifier, annotating data drawn from the predictions of this classifier, then training a more advanced model over the obtained data, and then repeating this process in iterations to gradually improve the models. Eventually, as indicated in this figure and in more detail in the paper, we reached very high precision. These results are actually from after the live debate in San Francisco; they are results from 2020. On average, the precision of the model is around 95% across 100 different topics, measured on the top 40 predictions. This was very impressive from our perspective.
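The iterative labeling loop just described can be sketched in a few lines. The callables stand in for the real components (a feature-engineered logistic regression, human annotators, a BERT ranker), so this is an outline of the procedure, not the team's code:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Scorer = Callable[[str, str], float]   # (sentence, topic) -> evidence score

@dataclass
class Example:
    sentence: str
    topic: str
    is_evidence: bool                  # the human label

def bootstrap(
    candidates: List[Tuple[str, str]],          # retrieved (sentence, topic) pairs
    weak_scorer: Scorer,                        # e.g. the ~40%-precision classifier
    annotate: Callable[[str, str], bool],       # the human annotators
    train: Callable[[List[Example]], Scorer],   # e.g. fine-tune a BERT ranker
    rounds: int = 3,
    budget_per_round: int = 200,
) -> Scorer:
    scorer, pool, seen = weak_scorer, [], set()
    for _ in range(rounds):
        # Annotate only the top-scored, not-yet-seen candidates: this yields far
        # more positive examples per label than random sampling, where the prior
        # of a sentence being evidence is only ~2%.
        ranked = sorted(candidates, key=lambda c: scorer(*c), reverse=True)
        batch = [c for c in ranked if c not in seen][:budget_per_round]
        seen.update(batch)
        pool += [Example(s, t, annotate(s, t)) for s, t in batch]
        scorer = train(pool)           # retrain a stronger model each round
    return scorer
```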
Now, taking a step back, at a higher level we needed to develop three capabilities. First of all, data-driven speech writing and delivery. Think about the opening speech of the system: a four-minute opening speech, around 700 words, reminiscent perhaps of an opinion article that you might read in your favorite newspaper, but this one was written by the system in a completely automatic manner, on a topic it was never trained on before. This is quite difficult to accomplish. The second capability is listening comprehension, for the rebuttal. We can compare this to the virtual assistants or personal assistants that we are all familiar with on our phones. These assistants are also powered by interesting and modern AI technology, but usually they need to understand a single sentence with a functional flavor, like "turn off the lights" or "find me a restaurant nearby". The debater is facing a much more complicated situation, where it needs to understand the gist of a speech, a four-minute speech. The human debater is usually speaking fast, with a lot of emotion, and we still need to identify the main claims. So this is a different level of difficulty. And finally, there is the issue of modeling human dilemmas. This is the story of the principled arguments that I mentioned earlier: trying to capture the commonalities between the many different debates that humans are having. And of course, the problem is that when you build such a system, with many, many components, many things need to succeed simultaneously, and many things can go wrong, sometimes in unexpected ways. For example, just getting the stance right is quite difficult, and if you get the stance wrong, it means that you are going to raise arguments that support your opponent, which, from our experience, is not a very recommended tactic in a debate. Drifting from the topic is also an issue. Many team members vividly remember a debate that we had in 2016, one of the first live debates with the system. The topic of the debate was whether or not physical education should be compulsory, and the debater system started to argue with passion about the benefits of sex education, while the human debater was trying, in vain, to bring it back on topic. This was quite amusing for some people in the audience and quite disturbing for others, depending on your perspective, I guess. Also, the system is naturally only as good as its corpus. For example, if we have a sentence saying that global warming will lead the malaria virus to creep into new areas, the system cannot correct the fact that malaria is caused by a parasite, not a virus. Such errors can propagate into the text suggested by the system. Sarcasm is also fairly hard to detect: if we see a sarcastic sentence, say one about university scientists, the system may still use the claim hidden in that sentence, missing the point that it is a sarcastic comment. And it can get even more complicated. In some cases, when we raise a principled argument, we try to massage it a little so that it sounds even more relevant to the debate that we are having. For example, instead of saying "people enjoy that, therefore we should attempt to fix it rather than eliminate it" in a debate about whether or not to ban gambling, we're going to say "people enjoy gambling, therefore we should attempt to fix it rather than eliminate it", which is fine for this debate. But then, in a different debate, you may end up saying things like "people enjoy assisted suicide, therefore we should attempt to fix it rather than eliminate it", which of course does not make a lot of sense.
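The failure mode is easy to see if you write the massaging step down naively; the template wording here paraphrases the talk:

```python
# Naive slot-filling of a principled argument into the current debate topic.
TEMPLATE = "people enjoy {x}, therefore we should attempt to fix it rather than eliminate it"

print(TEMPLATE.format(x="gambling"))          # fine in a debate on banning gambling
print(TEMPLATE.format(x="assisted suicide"))  # grammatical, but the presupposition
                                              # ("people enjoy X") no longer holds
```

The template is syntactically safe to instantiate with any topic, but whether the resulting argument makes sense depends on world knowledge the template itself does not carry.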
And in some other cases, we had a different component trying to expand the topic of the debate in order to bring more content into it. For example, if we are debating whether or not surrogacy is a good policy, then we would like to bring arguments about adoption into the debate as well, with the relevant stance. So the system needs to automatically understand what the alternative to surrogacy is, and then it can introduce these arguments by saying something like "let me discuss a welcome alternative to surrogacy: adoption", which in this case makes sense. But in a different debate the system may say "let me discuss a welcome alternative to global warming: global cooling". We understand the source of this error, but of course it is the kind of funny statement that we prefer to avoid. And it can get even worse: for example, the system may say "let me discuss an alternative to suicide, which has some advantages: homicide", which again is not a good statement in front of a live audience. But in spite of these difficulties, we were able to make nice progress. In 2016 we had the first live debates with the system, after nearly four years of work, and the system at that time was basically at the level of a toddler; it was not making a lot of sense. But in only three years, from 2016 to 2019, we made it from kindergarten to university. In 2019, the system was at the level of a decent human university debater, and this was very nice progress to see over the years. Now, I mentioned at the beginning that we recently had a paper describing the system in its entirety, and one of the points that we wanted to consider in this paper was how to evaluate a live debate system in a more systematic manner, beyond a live demonstration. The public debate approach is very limited, for various reasons that I'm skipping here; basically, it is not even practical. You cannot run 100 live debates, with a different live audience each time, and evaluate the system. So we needed to find some other technique, and this was one of the contributions of this recent paper, published this year, that was featured on the cover of Nature. You can see the full details in the paper, but we'll just skim through them. We were not able to compare the system to other live debate systems because, as far as we know, the only live debate system available is Project Debater. So we focused the comparison to alternatives on the opening speech, since the ability to generate an opening speech is obviously a prerequisite for participating in a live debate. Here we have summarization techniques; we have GPT-2, which was mentioned earlier in the introduction (we asked for access to GPT-3 but unfortunately did not get permission; GPT-2 is also a powerful model, and we fine-tuned it on thousands of debate speeches and human-written arguments); and also other techniques. Basically, we showed human annotators opening speeches by these alternatives, by Project Debater, and also by expert human debaters, and asked them to rate, on a scale from one to five, to what extent they agreed with a statement like "this speech is a good opening speech for supporting the topic". As you can see in the figure, the results of Project Debater were only a little below the results of the human expert debaters, and well beyond the fully automatic alternative systems, including GPT-2 fine-tuned for this task. We also had an evaluation of the final system, which I'm skipping here for the sake of time but which you can read about in the paper. One last point that I would like to mention in this context is that many of Project Debater's capabilities are now freely available for academic research.
So you are very much welcome to take a look at these capabilities and try them yourself; we are curious to see what people can do with them. This was also released as part of the paper in Nature. Before wrapping up, one final question is why pursue a grand challenge to begin with. I think the answer lies along several dimensions. First of all, it enabled us to advance science and push the boundaries of artificial intelligence. I mentioned the papers that we published, and we also organized workshops and tutorials; we just had a tutorial at ACL and another one at IJCAI. It helps to found research on new problems: for example, this problem of finding evidence relevant to a controversial topic in a large body of data is very natural in our context, but we were actually the first to formulate this problem and suggest a working solution. And of course, there are also many use cases for the technology that are of interest to IBM customers. Finally, we started this discussion by mentioning earlier grand challenges in artificial intelligence, and indeed grand challenges in AI have been there from day one, back in the '50s, with Arthur Samuel, who was also working at IBM. The person who coined the term "machine learning" started by working on an AI that could play checkers. It took him decades, and eventually, in the '70s, it was performing very well, which looks obvious today, but back then it was a sensation. Then we had backgammon in the late '80s, also from IBM Research, by Gerald Tesauro, who is still at IBM Research. We had chess in the '90s, and Go more recently, by DeepMind. But I think, and we argued this in the paper, that all these board games lie in what we refer to as the comfort zone of artificial intelligence. There are several reasons for that; I will mention just one of them. When a board game is done, we know who won the game, and this implies that we can apply reinforcement learning techniques: we can have the system play millions of games against itself, see the result of each game, and use this signal to improve the system. When a debate is done, we do not necessarily know who won the debate; it is not even clear how to define that. So we don't have this path, and we need to think of other paradigms. Project Debater certainly does not lie in the comfort zone of artificial intelligence. It is in what we refer to as a new territory for AI grand challenges, where humans are actually still better. But for us, this means that it is actually interesting, and we still have many open questions to explore. So I will end by just thanking the remarkable team of Project Debater at IBM Research, and thank you for your attention and your time. If we still have time, I can take a couple of questions. Thank you very much. Thank you, that was excellent. I had an independent researcher, I won't mention his name, saying of your work that the Project Debater system is a marvel of engineering, so I just wanted to throw that out there. Let me take some questions, because you're not available for the Q&A at the end of the session and some questions have come in. So let me fire away. Could I also invite you, after this five-minute Q&A, to log on to Crowdcast, where maybe you could offer responses to the questions I cannot get to. Let's start with the most upvoted question: "Project Debater is an amazing project.
It seems so difficult to make the system respond organically and in real time, since the opposing human debater can be so unpredictable. What does the system do if it cannot match the incoming speech as expected? Does it argue preconceived notions?" That's from Assad. That's a very good point; this was indeed one of the most stressful moments. When you sit in the live debate in San Francisco, everybody is watching, and I think the first issue was whether the system would speak at all, because my concern was, you know, it is enough that someone unplugs one of the cables or something like that, and that's it, nothing will work. These errors may happen, so we were really, really nervous. And then the question is right on point: we cannot anticipate what the human opponent will say. It is unpredictable to some extent, and they can do anything. So the system had a lot of fallback options. For example, imagine that we prepare a set of claims that we believe the opposition may argue, but the opposition does not raise any of these claims, or raises one of them and we cannot detect it. This may certainly happen. In that case, we had different fallbacks, such as making a more generic rebuttal about the motion, regardless of what the opposition was saying. Which, by the way, is what sometimes happens with humans, right? You can imagine a politician: you ask them a question, they don't have a good answer, so they speak about something else which is somewhat related. So the system had this politician mode in the background. Great, that's a wonderful answer, thank you for that. I'm going to squeeze in one of my questions. I'd like to connect Project Debater to the rest of the series on trustworthiness. What are your thoughts on Project Debater being used in the future as a tool for fake news and for pushing various human agendas? Yeah, I think the connection is absolutely there, because the technology of detecting arguments could be sensitive to that. It may find fake-news arguments, but it can also be used the other way: I think the evolution of this technology can be used to make the distinction between real, credible arguments and fake arguments. And I know that there are teams actually working on related topics in academia. So I think we are not solving all the problems in this project, but we are building technology that will help us meet this very important challenge. Great, so it's nice to see that you've thought of this and have potential solutions. I'd like to end on a quote from your abstract, where you say that debating humans lies outside the comfort zone of AI, for which novel paradigms are required to make substantial progress. This, in a sense, covers three questions that have come in about how you tackle tricky things like sarcasm and metaphor, and how you tackle bias in the data corpus, and I get the sense from that last sentence of your abstract that new methods are needed. Maybe I want to connect that to the second half of today's session, on new neurosymbolic techniques: what are your views, could they be a candidate for these new techniques that you speak of? Yeah, absolutely. We should definitely explore this path, because I think the notion that the language models we have today are sufficient to solve these kinds of problems is simply wrong.
And we don't have enough time to go deeper into that, but it is clear to me that you cannot solve this kind of problem using just a larger and larger language model. I cannot even conceptualize and envision an end-to-end system doing a full live debate. Maybe in decades, who knows? But if we want to make progress in the upcoming years, I think we need to nurture and consider this alternative path, and perhaps other directions as well. So absolutely, I think you are right to the point that this is needed and should complement the technologies that we have. Wonderful, thank you. We can't applaud you, but please join me in thanking the speaker for his time. Thank you very much. And if you want to hang around for a few minutes on Crowdcast to answer any remaining questions, that would be greatly appreciated; otherwise, thank you for your time. Thank you very much, Noam. So let's move on to the next item on the program. I'd like to introduce two speakers at the same time, because they're working on the same project. I'll start with Joan, my colleague from IBM Research Johannesburg. Joan hails from Uganda and finished her PhD at UCT in Cape Town on natural language generation for the healthcare sector, after which she joined IBM Research Johannesburg. There's a very nice article about her and her PhD on the UCT website, so I encourage you to search for that. For me, she's a brilliant, brave and humble scientist. I'll take a quote from that article, where she talks about "a skewed perception that I defied the odds" (as the editor notes, achieving her PhD while sight impaired): "that couldn't be further from the truth. My university helped me, my friends helped me, the world folded to accommodate me." You can see how brave and humble someone has to be to say that. So that's Joan. And speaking alongside her, for this next 20-minute session, is Alexandre Rademaker from our Brazil lab; that's one thing I love about IBM, we have labs all over the world. Alexandre is the author or co-author of more than 90 papers published in peer-reviewed journals and international conferences. His areas of expertise and interest are logic, proof theory, knowledge representation and reasoning, and everything relevant to this NLP topic. He's also an adjunct professor at the FGV university. A fun fact: he's a fan of Lisp, so he's definitely got my respect there. Let me hand you over to the two of them for the next talk: AI and NLP for social good. Thank you very much, Ismail. As introduced, I am Joan Byamugisha, a research scientist in the Johannesburg lab in South Africa. Today I will be talking about what IBM is doing in the NLP space for under-resourced languages, specifically for the Bantu class of languages, and looking at what we're doing in developing computational resources and tools. The structure of this talk will be such that we'll first have a brief background on what Bantu languages are and their linguistic features, and then we'll look at the computational resources that we've built, as well as the computational tools. So let's look at the spread of Bantu languages throughout the continent. It's very widespread: 27 of the 54 countries of the continent have Bantu-speaking communities, with an estimated total of over a quarter of a billion speakers. And the number of languages ranges from about 300 to 680, depending on the linguistic classification. So it is definitely an important class of languages.
These languages are as complex as they are diverse, and they have three key features: a noun classification system; an agglutinative morphology, which we shall see shortly; and a complex verb morphology. The noun classification system is a system that groups nouns into different categories. First, there is the semantic categorization of nouns: we group together nouns that have similar meanings. So people and kinship terms are put in one class, body parts in another class, animals in another class; sometimes a class is mixed. This is the first level of categorization, where nouns are grouped according to their semantics, and from there we consider the morphological categories of these nouns. The importance of the noun class system is that it gives meaning to the nouns, because a noun comprises a prefix and a stem. To see what that means, let's look at noun semantics in Runyankore, a language spoken in the southwestern part of Uganda, and my own language. The stem "ntu" on its own has no meaning; only when we give it a prefix does it gain one: with the prefix "omu" it means person, and with the prefix "aha" it means place. Similarly with the stem "nyankore": Runyankore is the language of Ankole, and an Omunyankore is a person from Ankole. And these prefixes depend on the noun class system. If we look at the noun class system of Runyankore specifically (this is just an excerpt of the entire system; Runyankore has 20 noun classes, and here I show just the first six), you can see that we have the noun prefix, and the noun class also determines the subject concord. That means that if you're conjugating a verb, the subject marker of the verb depends on the noun class. The object marker of the verb depends on the noun class. The adjective, if you say "beautiful woman", "omukazi murungi", depends on the noun class, and the possessives as well. So the noun class is a very strong linguistic feature in Bantu languages and controls a lot of the linguistics of the language. Looking at the generic verb morphology of Bantu languages, a verb can be split into three parts: the section before the verb root, the verb root itself, and the section after the verb root. The morphemes before the verb root specify things like the subject, negation, tense and aspect of the conjugation, and the ones after the verb root are extensions. So there is a generic structure for how these different morphemes come together in the verb morphology. Another feature of Bantu languages is that we typically have at least two, three or even four past and future tenses, so again these languages are very grammatically complex. And if we look at Runyankore's agglutinative morphology, it brings together all these things we've been talking about. In the example there, it's a single word, but it represents a whole sentence: "we have never ever brought it in". It brings together the verb morphology we've discussed, where we see the negation, the past tense, the emphatic, the subject, the object, all represented within a single word; a toy sketch of this slot template follows below. And these are the complexities that we deal with when we try to develop computational resources for Bantu languages.
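To make the slot template concrete, here is a toy sketch. The slot names follow the generic template just described, but the morphemes in the example are invented placeholders, not real Runyankore forms:

```python
# Fixed slot order of the (simplified) Bantu verb template:
# pre-root slots, then the root, then a final slot for extensions.
SLOT_ORDER = ["negation", "subject", "tense", "emphatic", "object", "root", "final"]

def build_verb(morphemes: dict) -> str:
    """Concatenate whichever slots are filled, in the fixed template order."""
    return "".join(morphemes[slot] for slot in SLOT_ORDER if slot in morphemes)

# One word carrying a whole sentence ("we have never ever brought it in"):
print(build_verb({
    "negation": "ti", "subject": "tu", "tense": "ka",   # placeholder morphemes
    "emphatic": "na", "object": "gi", "root": "reet", "final": "amu",
}))   # -> "titukanagireetamu" (illustrative only)
```

The point of the sketch is that a single orthographic word is assembled from many grammatical slots, which is exactly what makes tokenization, tagging and translation hard for these languages.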
So what resources have we developed? For Runyankore specifically, we've started with a single language and developed a corpus of labeled data, as well as another corpus of unlabeled data. We have one million sentences in a general-purpose dataset: sentences not restricted to, say, the healthcare domain, the education domain or the banking domain. We decided to create enough diversity in this synthetic dataset, so it has over 18,000 variations in how it was generated. We also generated labeled datasets, labeled for morphology and for sentiment. We varied the tense, such that we have at least seven of the 14 tenses in the corpus, and we embedded sentiment depending on the adjectives, the adverbs and the types of nouns used. The next resource that we're developing is a treebank. Now, treebanks are especially important because, as we traverse a tree, we can look at the different linguistic aspects. We can generalize from one language to another, because languages are not only different at the lexical level, where one morph can be different from another: we can look at the depth of the treebank and how different languages differ from each other, and we can look at the linguistic divergences within the Bantu languages. What we're seeing here are the production rules of the context-free grammar, which are what we initially used to generate sentences. Instead of generating a conjugated verb, here we output a tree structure, and we collect these into a treebank; we now have over 500,000 parse trees in this treebank. Another resource that we have developed is pre-trained word embeddings and classifiers. From the one-million-sentence corpus, we used the distributional context to obtain word embeddings, and these have been very useful in solving one of the core problems in Bantu computational linguistics, as we shall see. We also pre-trained models for morphology and for sentiment, and these have been used for sentiment analysis, morphological analysis, and so on. So, moving on to the tools, we'll look at one specific example in detail, and that is the problem of how to disambiguate noun semantics. This is very, very important because, as we've seen, a noun's semantics is obtained from its noun class: it gives the noun stem its meaning. And we have a problem where we actually have nouns that belong to different noun classes but have the same prefix. Here are a few examples. If you speak the language, you'll know what these words mean; you'll know which is a person noun and which isn't, and that they belong to different classes. If you don't, then you can't tell: they have the same prefix. So how do you tell? And similarly, how does an algorithm determine which noun belongs to which noun class, in order not to pass errors on to downstream tasks? What we've done is to look at this from various angles. The literature says that we need to know the concords to assist us where nouns have the same prefix but belong to different noun classes. The other thing we've done is to look at the distributional context: from the one-million-sentence corpus, we obtained the pre-trained word embeddings and got nearest neighbors. Have a look at these nearest neighbors here; there is something interesting, namely that when we have a person as the query word, we get person nouns as nearest neighbors.
When we have a plant as the query word, we get plants as nearest neighbors; likewise body parts, animals, et cetera. Why is this important? If you look at the query words "omuntu", "omuti" and "omukono", they all start with the prefix "omu", and this would be one of those cases that we need to disambiguate. So already we are seeing that if we stick to the semantic groupings, we can actually start to see the differentiation here. And if we look again at the semantic categorization of nouns that I introduced earlier, you will see that what we're seeing in the nearest neighbors of the pre-trained word embeddings actually reflects what we see in the literature on Bantu language computational linguistics. So we have used this to create a tool for noun class determination. In the results here, you can see that when we use the morphology only, that is, we only look at the class prefixes, the accuracy is extremely low, because we cannot disambiguate between nouns with the same prefix. When we use the semantics only, we also get quite a low accuracy, because nouns can have the same semantics but a different prefix. Only when we combine the morphology, the syntax and the semantics are we able to get the best results. These are early results, but it's a first approach to determining, given a noun, what its noun class is; this can then be passed on to other computational tasks. Other tools that we've developed, which I'll speak about very briefly, come from the pre-trained models and the treebank. We have a tool for word segmentation based on the morphological model, and we've built a sentiment analysis tool based on the sentiment classifier. The tools built from the treebank are still work in progress: we're looking at using it for machine translation, again looking at how we traverse a tree for different languages. This can be used for translation between Bantu languages and, most importantly, from non-Bantu languages to Bantu languages and vice versa, where traversing the treebank can help us determine where, say, the tense morpheme is, so that we can then translate it into the tense of the target language. We're also using this to perform automated evaluation of generated text. Again, the one-million-sentence corpus is a synthetic dataset, and it would be very difficult to use human evaluation; but if we had curated, human-generated corpora, then we could use the treebank to look at how we traverse the trees and where the differences occur between the human-authored and the computer-generated text. And finally, looking at the different structures of the trees that we have linguistically, we can actually compute a figure for the linguistic diversity within all these languages and see how that can help us to develop generic tools. So that is all for me. Thank you very much.
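To make that combination of morphology and semantics concrete, here is a toy reconstruction of the idea: morphology proposes the candidate classes for an ambiguous prefix, and the embedding nearest neighbours vote among them. The prefix table, neighbour lists and class numbers are invented for illustration; this is not the actual tool:

```python
from collections import Counter
from typing import Dict, List

def candidate_classes(noun: str, prefix_table: Dict[str, List[int]]) -> List[int]:
    """Morphology alone: an ambiguous prefix like 'omu' maps to several classes."""
    for prefix in sorted(prefix_table, key=len, reverse=True):  # longest match first
        if noun.startswith(prefix):
            return prefix_table[prefix]
    return []

def noun_class(
    noun: str,
    prefix_table: Dict[str, List[int]],
    neighbours: Dict[str, List[str]],   # nearest neighbours from the embeddings
    known: Dict[str, int],              # nouns whose class we already know
) -> int:
    """Semantic vote restricted to the morphologically possible classes."""
    allowed = set(candidate_classes(noun, prefix_table))
    votes = Counter(
        known[n] for n in neighbours.get(noun, [])
        if n in known and known[n] in allowed
    )
    if votes:
        return votes.most_common(1)[0][0]
    return next(iter(allowed), -1)      # no semantic signal: fall back to morphology

# Invented toy data: two classes sharing the 'omu-' prefix.
prefixes = {"omu": [1, 3], "aha": [16]}
neighbours = {"omuntu": ["omukazi", "omushaija"]}
known = {"omukazi": 1, "omushaija": 1}
print(noun_class("omuntu", prefixes, neighbours, known))   # -> 1
```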
Thank you, thank you very much, Joan. That was amazing, and it reminds me of the Bender Rule, which I actually learned about from Alexandre: the importance of doing research in different languages, because the techniques you need differ across languages. And you showed how you got better results when you took the morphology into account. So great work, and a great demonstration of the Bender Rule. We'll hold questions until after. I've already introduced Alexandre, but please, all attendees, ask questions in the Crowdcast Q&A box. I'll throw in this thought: last week we learned about the importance of participation, consultation and different perspectives. So this is the platform; let's break all records. Ask questions and upvote. I love the idea of the power of the wisdom of the crowds; let's show it working in action, engage with the technology now, and give back to the speakers. We have wonderful speakers here. Alexandre, as an adjunct professor at a university in Brazil, knows, when it comes to engaging with students, how important input and feedback are for the lecturer. So that's a gift we can give the lecturers for the time they've given us. Let me hand over to Alexandre. Thank you so much. Okay, hello, everyone; I hope that everybody can see me. My name is Alexandre, I'm from Brazil, a researcher at IBM, and I want to talk with you a little bit about all the work that we've been doing on language resources for natural language understanding. It's important to emphasize this idea of understanding. When we process language, we may need different levels of understanding, right? For example, the applications that nowadays we are very accustomed to living with, like an email application that's able to detect a reference to a place or an event and automatically add that information to your agenda: this demands a completely different level of understanding compared to an application that, for example, tries to answer a question. So understanding is a complex thing; understanding is even difficult to define. How can we define understanding? There are different ways to define it, and over the years people have used different proxies: instead of trying to define understanding, people create tasks that try to measure the understanding of a system by its ability to solve that task. For example, one very well-known task is textual entailment, where systems should be able to detect that one sentence follows from another: the truth of one sentence implies the truth of the other; or, the other way around, we have a contradiction, which means that if the first sentence is true then the second sentence must be false; or we have neutrality (a small illustration follows below). This is a proxy for understanding, precisely because of the difficulty of defining understanding. Language also has this problem, or feature, that understanding language requires knowledge of the world, common-sense knowledge as we say. Simple strings of text are only understandable if we have that knowledge, which comes in different forms. For example, a geologist may know about the periods of the history of the world, and is then able to understand that two different sentences are referring to the same age of the world. We also have this last example here: when we see a sentence like "I saw a man with a telescope", it could be completely ambiguous, in the sense that I don't know if the telescope is the instrument I used, or just an object that the man I saw is holding. This ambiguity could be clarified if you add some extra information to the sentence, because we as humans would then be able to work out the preferred reading of the sentence. All of this makes language understanding very hard, and that's why we are researching this area.
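To pin down the label set of the entailment task mentioned above, here is a minimal illustration with invented premise-hypothesis pairs (not drawn from any benchmark):

```python
from enum import Enum

class NLILabel(Enum):
    ENTAILMENT = "entailment"        # premise true => hypothesis true
    CONTRADICTION = "contradiction"  # premise true => hypothesis false
    NEUTRAL = "neutral"              # premise settles nothing about the hypothesis

pairs = [
    ("A man is playing a guitar on stage", "A man is performing music",
     NLILabel.ENTAILMENT),
    ("A man is playing a guitar on stage", "The stage is empty",
     NLILabel.CONTRADICTION),
    ("A man is playing a guitar on stage", "The concert is sold out",
     NLILabel.NEUTRAL),
]
```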
And moreover, to connect with Joan's talk, we need to be aware that most of the work nowadays is still on English, but we have many more languages in the world, and potentially we would like to understand those languages, because all of those languages bring us different cultures, different knowledge encoded in them, and different perceptions of the world, and that's all important for us as humans. So, my journey in the development of language resources started back in 2009, 2010, when I started to develop the first resource, the Portuguese wordnet, based on the lexical resource known as WordNet, from Princeton. WordNet is a kind of dictionary, but different from a dictionary: we have words, and words are grouped together into synsets, forming concepts, and the concepts are connected by different semantic relations. With this kind of resource, we can potentially tackle the task of disambiguating a word in context. Here, for example, I have the example of "mouse": a mouse could be many different things depending on the context, and the task of word sense disambiguation is to detect the right sense of the word in that particular context. At the top I have the simple sentence "they can fish": for non-native English speakers, "can" is almost always understood as a modal verb, but "can" could also be the verb for putting something in a can, and then we have two completely different understandings of the sentence. So the work that I started was to produce a similar resource for Portuguese, and nowadays it's considered the most complete and stable wordnet for Portuguese. It has incorporated parts of many other resources that people sometimes use without knowing they are embedded in it, such as the Open Multilingual Wordnet and many others. It has been used by different tools, even tools that people use every day, like the Google Translate service, and it's completely freely available. On my page, people can see the different publications that we have about these resources. So this was the first work. Later on, we realized that this is a first resource for the language, but a language is not only that, right? A language has many different needs, and so we started this long-term project, hosted on GitHub as an organization, on language resources for Portuguese, with many collaborations from students and colleagues of mine. A second resource that we built, and this is very connected with Joan's talk, is a morphological analyzer for Portuguese: a complete full-form dictionary of lexical entries for Portuguese. I'm highlighting the numbers here: this is a huge resource, in which we combined and normalized information from different resources covering European Portuguese and Brazilian Portuguese into a single resource. And those huge numbers can only be used effectively because we use techniques from finite-state morphology: all these entries are compiled into a single network, an automaton, that can efficiently process text and produce the possible analyses of a word. This matters because, again, unlike English, which is what people remember most of the time, Portuguese is a morphologically rich language.
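To make the synset idea concrete, here is a small sketch using NLTK's WordNet interface; it assumes the nltk package is installed, with the "wordnet" and "omw-1.4" data downloaded (the Portuguese lemmas come through the Open Multilingual Wordnet, which includes data from projects such as OpenWordnet-PT):

```python
# Assumes: pip install nltk, then nltk.download('wordnet') and
# nltk.download('omw-1.4') for the multilingual (including Portuguese) lemmas.
from nltk.corpus import wordnet as wn
from nltk.wsd import lesk

# "mouse" belongs to several synsets (concepts): the rodent, the pointing device...
for synset in wn.synsets("mouse", pos=wn.NOUN):
    print(synset.name(), "-", synset.definition())

# A classic word sense disambiguation baseline (Lesk): pick the synset whose
# dictionary gloss overlaps most with the context words.
sense = lesk("I moved the mouse and clicked on the icon".split(), "mouse", pos=wn.NOUN)
print(sense, "-", sense.definition() if sense else "no sense found")

# Via the Open Multilingual Wordnet, the same synsets carry Portuguese lemmas:
print(wn.synsets("rato", lang="por"))
```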
Returning to morphology: for many types of natural language processing, having this information about morphology, inflections and derivations is a rich and very useful resource, and we should never underestimate the importance of understanding a language in full detail. Next came my curiosity about parsing. Over time, what happened is precisely that, as you can see, I'm presenting different resources that happened to catch my attention, because over the years we came to understand the necessity of different levels of understanding and processing. Parsing is the task of trying to understand the structure that you have in the sentence: you move from a simple string and transform it into a tree or a graph that carries information. Parsing comes in different flavors. Normally, when we talk about parsing, we may be talking just about the syntactic structure of the sentence, subject, object, verb, or we can even be talking about a more semantic understanding of the sentence; I have a few examples of all of these in the next slides. Here I'm just showing that the sentence I pointed out in the last slides will have different analyses that highlight different understandings of the sentence. Another thing that we learn in linguistics is that there are many different theories of parsing. Parsing could be considered one of the core topics of the linguistic study of language: a people's grammar is the main documentation of their language. So there are many theories of parsing, many ways to represent the syntactic structure of a sentence. And among those possibilities, the existence of so many alternatives was actually one of the things that put up barriers to the development of tools for understanding that are broadly available for different communities in different languages. To overcome that difficulty, we have the Universal Dependencies project, which started a few years back. The idea is to produce a simple and consistent way to analyze the grammar of sentences across different languages. It was designed with a few principles in mind, and those principles emphasize different aspects; as a whole, they play an important part in the design of the Universal Dependencies guidelines. After some years, what we have is more than 200 treebanks and more than 100 languages documented and freely available on the project website. And as you would expect, the project defines not only the tags and the way the annotations are done, but also the format in which the data should be encoded. I joined this project maybe five years ago, and I'm now responsible for at least three of the Portuguese corpora in this project. The most important one is the Bosque, the one used by many of the tools that we have nowadays. It comprises 9,000 sentences, which is small compared to the English corpora, but this is a challenge in itself: we have 9,000 sentences and we want to be consistent across the analyses of those sentences. We want the same kind of linguistic structure to be analyzed the same way in the different contexts where it appears. And we need tools: the tool I'm showing in this screenshot was developed by one of my students a few years back, and we need tools for searching, for browsing, and libraries for processing those files.
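For readers who have not seen the format: Universal Dependencies treebanks are distributed as CoNLL-U files, plain text with one token per line and ten tab-separated columns, separated into sentences by blank lines. A minimal reader is only a few lines; the file name in the comment is the actual UD Portuguese-Bosque training file, everything else is a generic sketch:

```python
CONLLU_COLS = ["ID", "FORM", "LEMMA", "UPOS", "XPOS",
               "FEATS", "HEAD", "DEPREL", "DEPS", "MISC"]

def read_conllu(path: str):
    """Yield each sentence as a list of {column: value} token dicts."""
    sentence = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if line.startswith("#"):     # metadata: sent_id, text, ...
                continue
            if not line:                 # a blank line ends the sentence
                if sentence:
                    yield sentence
                sentence = []
                continue
            sentence.append(dict(zip(CONLLU_COLS, line.split("\t"))))
    if sentence:
        yield sentence

# e.g. for the UD Portuguese-Bosque training file:
# for sent in read_conllu("pt_bosque-ud-train.conllu"):
#     print([(t["FORM"], t["UPOS"], t["DEPREL"]) for t in sent])
```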
And all of these are long-term projects that we want to keep working on, trying to improve these resources over the years, right? Now, despite the fact that this dataset was created over years, we still have many issues open, and all the work is done on GitHub. So this is an example of an analysis using Universal Dependencies. On the left we see the Portuguese version of the sentence. On the right we have the translation of the same sentence and the analysis with the English model. As you can see, and this is one of the benefits of Universal Dependencies, we have a very similar structure despite the difference in language. And this is one of the important outcomes of the project. Okay, but moving from syntax to semantics: we're talking about understanding, right? So is syntax enough to understand language? My claim is that it's not. And we have different levels of trying to understand the whole structure of the sentence. One of those is semantic role labeling, right? When we do semantic role labeling, we want to extract the propositional information from the sentence: we want to know who did what to whom. What was the main verb? What are the arguments of the main verb? In other words, what is the core of the semantics of the sentence? This is considered a kind of shallow semantics, but it is a first step toward the semantics of the sentence. And for doing this, again, we need data, right? For English, again, we have a standard dataset that people use to train different systems, but for Portuguese and other languages we still lack resources. I'm contributing now to the Universal Proposition Banks, a repository created by IBM Research together with collaborators. These resources are a projection, on top of the Universal Dependencies syntactic analyses, of the semantic information: the verbs and the arguments of the verbs at the semantic level. And moving on in semantic representation, one of the things that we learned is that semantics is so hard that we have many different approaches to it, and not all of them capture all the details that we have in language. So another proposal for having a bit more information about the semantics of the sentence is the AMR representation. We have work inside IBM to develop parsers that are able to take a sentence and produce these graph representations, which provide not only the information that we have from PropBank at the semantic level, but a little more information in the nodes of the graph. Here we have examples of two different graphs. For example, on the left we have a sentence where the boy is at the same time an argument of two different verbs, right? And on the right, we also have ways to formalize these sentences; I will compare that with another model that I was showing before. Okay, the problem with these AMR representations is that in most cases people build datasets to train a system to produce the AMR, and this is trained on pairs: the text of the sentence paired with the final graph for the sentence. And the problem is that these can be two very different and distant representations of the same information, right? On one side we have the surface string; on the other side we have a graph, and this graph is not produced compositionally from the grammar of the language.
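To give a flavor of what such a graph looks like, here is a hedged illustration in PENMAN notation, the usual textual encoding of AMR, decoded with the third-party `penman` package. This is the well-known "The boy wants to go" example from the AMR literature, not necessarily the sentence on the speaker's slide; note how the variable b (the boy) fills a role in both verbs:

```python
# A minimal sketch using the `penman` package (pip install penman)
# to decode an AMR graph. The variable b is an argument of both
# want-01 and go-01, i.e. a re-entrant node in the graph.
import penman

amr = """
(w / want-01
   :ARG0 (b / boy)
   :ARG1 (g / go-01
            :ARG0 b))
"""

graph = penman.decode(amr)
for source, role, target in graph.triples:
    print(source, role, target)
# ('w', ':instance', 'want-01'), ('w', ':ARG0', 'b'), ...
```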
And this brings me to a discussion about what is needed to deeply process human language, right? Then I started my collaboration with a consortium that is organized around the idea of developing tools and techniques to actually produce grammars, formal and computational grammars, for languages, and those computational grammars can potentially be used to process language in the most automatic way possible. Those grammars are heavily lexicalized, and they also produce semantic representations of the sentence. I'm bringing here an analogy that Emily Bender presented in one of her talks a few months back, where she compared this view of language to rain patterns on a window, right? The idea is that language is the pattern of raindrops that we see on the window. And the question is: are we just focused on the scene outside the window, wanting to extract information from language regardless of how language works, or do we want to focus on the patterns of the raindrops themselves, and try to understand those patterns and how they are connected to the scene outside the window? I thought this analogy was very interesting. So to give you an example of this kind of work: as I said, it's a much deeper analysis of language, so we have a grammar behind the scenes, and from the string of the sentence we can produce a deep semantic representation, with the logical connecting words transformed into predicates, and arguments that show how words complement each other. Those semantic representations are interoperable, so we can convert between formats in different ways; for different purposes, different formats may be more useful. And we have a link to a demo here at the top. And of course, as you can guess, after starting to work with these tools and techniques, I started to wonder: okay, maybe we can start to produce a Portuguese grammar, to be able to use those same tools for processing Portuguese. And that was the project that I started last year. I have a collaboration on that, and we have long-term, medium-term and short-term goals to produce a grammar for Portuguese in this formalism, right? Here I'm showing just a simple screenshot of how these grammars are implemented, to spark people's curiosity. This is basically very lexicalized, as I said before: those grammars are driven by the lexicon, so the main work is lexical acquisition and the organization of the lexical entries. Most of the grammar is in the lexicon. Okay, and finally, to conclude my talk, I just want to call attention to one project that I have been working on in the past months, and this project is about how we can put everything that I said here together and produce a new resource that could potentially be used in final applications, for example question answering, or projects like Project Debater, and many others, right? So what I'm trying to do is this: I want to have a knowledge base, and constructing a knowledge base is a very hard task, right? Because knowledge comes in many, many different contexts, at many different levels of granularity, right? So producing an ontology or a knowledge graph is a very difficult task. So why not try to produce that in a lightweight manner, right? Trying to get the information from text. And this started from WordNet itself, right?
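As a hedged, simplified illustration of what such a predicate-argument representation can look like, here is a toy logical form of my own; it is a schematic stand-in, not output from the speaker's grammar:

```latex
% A toy logical form for "Every dog barks": content words become
% predicates and the determiner supplies the quantifier linking them.
\forall x \, \big( \mathit{dog}(x) \rightarrow \mathit{bark}(x) \big)
```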
So if you take, for example, WordNet and look up "butter", then from the structure of WordNet you know that butter has some semantic relations, and you can guess the meaning of the word "butter" from the semantic relations it has with other concepts, right? For example, it's a solid food, it's a dairy product, and there are two or three different kinds of butter. But if you read the definition of butter, the gloss of the concept that has the word "butter" as one way to lexicalize it, then you have much more information, right? You know that it's an edible emulsion of fat globules made from milk or cream, and this kind of thing. So the idea was: okay, let's try to process those glosses and, first of all, annotate those glosses with senses. Every word that appears in these glosses should be disambiguated against WordNet itself. So if the gloss uses the words "edible", "emulsion", "fat", "globules", all of these words should have their own senses connected to this definition in WordNet. That was the first step, and I have been doing that for some years. Princeton had already made a first start on that: they released a tagged-gloss corpus, but they didn't complete the annotation. What I'm trying to do is to complete this annotation and start the same annotation for the Portuguese WordNet too. Once we have the annotation of the words, next, using the tools I just described, supported by a grammar, we can produce the semantic representation of those same sentences, right? So for all the definitions and examples that we have inside WordNet, we will have not only the sense annotation but also the semantic analysis. And putting these two layers together, we end up with a knowledge graph that connects concepts and their use in particular contexts, right? And we can browse that knowledge base, right? We can search and browse those representations, looking for patterns in the way concepts are used and described. And of course we can also use that to train models that could potentially be used in different applications. Okay, so to conclude, I hope that I'm on time. I love this quote: linguistic resources are very easy to start, hard to improve, and extremely difficult to maintain. That's the thing that I learned over the years, and I use this phrase many times to remind myself how hard this work is. Linguistic resources are easy to compare in terms of quantity: many people want to compare dictionaries by the number of entries they have, or compare two wordnets by the number of words they have. But their quality is hard to compare, right? Interoperability is a complex and important issue, right? Because interoperability brings quality: if you can make a wordnet interoperable with a morphological dictionary or with a treebank, we discover a lot of problems when we compare those resources, and this helps both resources to improve, right? But it's hard to achieve interoperability when you have different resources developed by different people at different moments, right? Another thing is that when you talk about language resources, there are many shapes for that, right? You can be talking about dictionaries, corpora, grammars, annotated data, datasets for tasks such as textual entailment, and many others. And finally, I want to highlight the Bender Rule, which was just mentioned, right?
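Continuing the earlier NLTK sketch, here is a hedged illustration of how the gloss and the relation links for "butter" differ in information content; the synset name "butter.n.01" is an assumption that happens to hold in recent Princeton WordNet versions:

```python
# Continuing the earlier sketch: the gloss (definition) of a concept
# carries far more information than its relation links alone.
from nltk.corpus import wordnet as wn

butter = wn.synset("butter.n.01")
print(butter.definition())
# e.g. "an edible emulsion of fat globules made by churning milk or cream ..."
print(butter.hypernyms())
# relation links such as [Synset('dairy_product.n.01'), Synset('food.n.02')]
```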
So the Bender Rule asks researchers to always name the language they are working on. Never assume English is the default language, right? In many papers you see people presenting a technique without even making clear that the datasets or experiments are only in English. The Bender Rule says: always mention the language that you use; please don't assume that English is the only language or the default language, right? And moreover, to make it very formal, there is also the proposal of data statements, which I would like to call people's attention to. Data statements are a way to precisely describe how your data was created, who was involved, what the potential biases are, and what domains are covered. It's the idea of a template for a document that describes a dataset. Okay, thank you, that's all from me. Great, great. Thank you so much, Alexandre, that ties in very nicely to Joan's talk as well. Now we have time for questions, so let me read them out. Yes, for Joan first: a big issue is a lack of training resources, so naturally the Runyankore corpus is hugely useful for the community. Is the corpus publicly available? And can you explain how it was generated: was it scripted? Yes, thanks, Ismail. The corpus is not publicly available yet, but we are taking that request seriously. And it wasn't scripted; it was generated synthetically. The paper on this was published at INLG 2020, and I think you can find it on my Google Scholar page. It details how the corpus was generated. Great, I think that also answers Lauren's question: are these tools publicly available? We'd like to use them for a related language, Luganda. Good, another question for Joan. How are you dealing with the same word having different meanings, like "clala", I hope I'm pronouncing it right, which means hunger in Setswana, in Botswana, and going to the restroom in Sesotho, in Lesotho? That's from Ditiro. Right, right. So first of all, such words in different languages often belong to different noun classes. And as I've said, even within the same noun class, looking at the corpus helps to disambiguate between words that have the exact same spelling but different meanings. So for example, in Runyankore you find that the words for accountant and medicine have the exact same spelling, and only through the corpus can we distinguish the two. And just another thing I'd forgotten to mention for the person who asked about Luganda: in the tool that we built for affix determination, the underlying word embeddings are for Runyankore, but they're helping us to determine the affixes in Luganda. So we are looking at the generalizability of these tools across languages: even when the word embeddings are in another language, we can generalize to other languages. Great, that's some very nice cross-fertilization there. Let me give you a break and ask Alexandre some questions. How do you handle loanwords? For example, I don't know how to pronounce this, L-O-L-O-L-E for lorry, or ambiguous ones which mean clause and hunger. That's from Ron Kis. Yeah, regarding the ambiguity: this is precisely the kernel of the idea of WordNet, that a word is just one way to lexicalize a concept. And WordNet is centered on the idea of the synset, the concept.
So a concept may have different words that lexicalize it, and a word can be part of different concepts, because it can be used to lexicalize different concepts. This is the kernel of the idea of WordNet: to move from words to concepts. And then we can handle the ambiguity of language, at the lexical level at least. And regarding loanwords, this is at the heart of many problems in lexical semantics, right? What's the frontier, right? What is the border between the two languages? At which point can you say that a word has its own sense in one particular language, distinct from the other? There are many issues we could discuss for hours here, but in some situations we may have the word in the Portuguese WordNet even in its English form, right? The English word inside the Portuguese WordNet. In other situations, we may decide that the semantics is compositional, and then we skip having that word there. So there are many possible solutions for that. And actually this brings up an even more complex issue: multiword expressions, right? In language you have these expressions where we use more than one word to express one single concept, and this single concept is not always compositionally made from the single words, right? So there are many issues there; I hope this gives you an idea. In fact, it reminded me of an even higher level of creativity, when humans play with language: they often play off these ambiguities you just mentioned. So perhaps this is a clear argument for why you need to understand the structure of language; you can't just rely on these stochastic-parrot models. That's actually my question to you. You've taken us through many theoretical aspects of language understanding, but as you know, there are two camps. There's the camp that says neural networks are enough, you just need a large enough model, and there's the camp that advocates that you need explicit symbols. You know where we lie, but I would like to know if you could bolster the argument for the symbolic camp with other thoughts that maybe you haven't mentioned so far. Yeah, so I think my point here is very clear. It's important to clearly understand that the distributional-semantics way of handling the semantics of words doesn't have any concrete model of the relation between words, beyond the fact that those words are closely related through the contexts in which they can appear. And it's important to realize that in most cases we don't have anything more than this correlation between concepts. This is not enough when we want to really understand the semantics of the words and the connections between them. So I think the real question we have to ask is: okay, if we have this need for language resources, how can we improve the way we construct those language resources? What are the methodologies? What are the tools that we need to be productive at that, right? I think those are the real questions. It's not only about avoiding manual construction and saying, oh, we won't need this manual work, we'll do it automatically; that's not completely true, right? It's more about: okay, let's be more productive in constructing resources, and think about what we can develop to assist that. Thank you. That is a new perspective: what more you can extract from the data sources. That actually leads in nicely to a question for Joan.
Let me read that out. Thank you, Joan, for an interesting overview of the grammatical and lexical structure of Bantu languages. My question is, why did you create a treebank from a handcrafted grammar instead of inducing the grammar from an annotated corpus? Exactly. So these languages are very, very under-resourced. And you need a linguist, because much as they're widely spoken, not many people understand the linguistic structure. So if we use the synthetic dataset that we've generated, then we can automate the creation of the treebank. The important thing is that we can then use the treebank as a logical analyzer to differentiate between human-authored and computer-generated text. So it can help us detect where computer-generated text goes wrong and how we can correct it. Great, thank you for that answer. And that uses up our Q&A time. If you have time, could I ask Joan and Alexandre to please continue to answer some of the other excellent questions in the Crowdcast Q&A box? So this is great: after my plea, my call for more questions, we got a good run of questions. Let me ask for more questions for the subsequent speakers as well. In fact, I'm going to tease you attendees. When I attend a work meeting, I always tell my colleagues there's no point in attending a meeting if you're not going to contribute. So one way to contribute is to ask questions, or even a simple upvote, that's your input. Otherwise, you might as well watch a recording of this at 1.5 times speed. So let me conclude that session and move on to the next one. The next session is NLP on African and low-resource languages. And let me give you a quick bio of the speakers. We have Youssef Mroueh from the IBM Thomas J. Watson Research Center in Yorktown. He is a technical team leader in the trustworthy machine learning field. He received his PhD in computer science from MIT, where he was advised by Professor Tomaso Poggio, a famous computational neuroscientist. So it looks like our speakers have very good academic lineages. And a fun fact from an IBM point of view: he organizes a lot of internal seminars, so I want to personally thank him for that. All the way from Africa, I attend your seminars. You haven't really heard much from me, but now I get to thank you in public for all those excellent seminars you organize. And now we get to hear you. At the same time, let me introduce Richard, because it's again a two-speaker session on the same topic. Richard is an accomplished research engineer at our IBM Research Johannesburg Lab. And I must tell you, working with him is like watching someone do magic with computers. You may get to see what I mean in his live demo. So let me hand over to Youssef first, please. Thank you. Sorry, Youssef, just for a second, you're on mute there. Can you unmute yourself? Oh, sorry, sorry about that. Thanks a lot, Ismail. I was saying thanks a lot for the nice introduction, and sorry for being on mute. So this talk will be on image captioning as an assistive technology, and I'll be presenting it along with my colleague from the South Africa lab, Richard Young. This project is part of the AI for Social Good initiative at IBM Research.
And I invite you to check out the link for the Social Good initiative, which is led by Saška and by Kush Varshney at IBM Research. It contains a lot of other projects, fellowships and initiatives. Let me start by presenting the team behind this work. This work was done as part of a grand challenge at CVPR 2020, along with my colleagues Brian Belgodere, Igor Melnyk, and Richard Young, who, as I said, will be talking five minutes from now. So let me start by saying what the problem is. In image captioning, you give a computer system an image, and you want the AI system to produce a sentence that describes what is going on in the image. We've effectively been working on this problem since 2017 at IBM Research, and we've made a lot of progress on it. But this problem of captioning images has so far been dominated by what we call descriptive image captioning, where the images are open domain and not specific to helping a visually impaired person in their everyday life, because the datasets that the images are collected into are open domain and not tailored to the specific needs of a visually impaired person. This field has effectively been dominated by two main datasets, one from Microsoft and one from Google. The main difference between them is that the Microsoft dataset has around 500K images and the Google dataset around 3 million images. So as you can see, descriptive image captioning, on the left, uses open-domain images that are not specific to the needs of the visually impaired person: they supply some signal, but they are not specific. And I will explain what we mean by a goal-oriented caption, the kind that would be useful as an assistive technology for the visually impaired person. As you see on the right here, this image was actually captured by a visually impaired person on their phone. And the visually impaired person is not expecting the AI system to just say "this is a bag of food on a counter". They need specific information. Hence we need reading ability: we need to be able to detect what type of objects there are, their location in space, and things like that. What they want to know is that it's a bag of turkey; they may need to know, for example, the indications or the instructions to be able to cook it. This is what we call a goal-oriented caption. Fortunately, in recent years, and I also invite you to go to vizwiz.org, there has been a big effort to collect data from visually impaired people. And as you see here, these are very challenging images for an AI system to caption. Why? Because, as you see here, for example, this is a medication and it's flipped: the orientation is not the correct one. And with this orientation problem, the visually impaired person would not be able to tell whether the orientation is correct or not, so the AI system has to anticipate that and be able to correct for it. Images can also be blurry, or have certain aspects of the visual scene blocked. They very often contain a lot of text, and it's important to bring this text into the output as well. So the system that we built has multiple components, and it tries to address this problem of building image captioning as an assistive technology. We start, as I said, from the importance of text.
So we need to augment the captioning system with optical character recognition, and we need to do angle correction, since, as I said, the orientation can be wrong. We need to do image feature extraction, which was done using ResNeXt, a residual network trained on one billion images. And then we need to detect all the objects in the image; this was done using open object detectors. So now, how do we fuse all this information to get to the caption? I will not go through the technical details, but we use a multimodal transformer network along with a mechanism called the copy mechanism, because remember, once I detect words like "Arbor Mist" or "Chardonnay", these may not be in the vocabulary; these words might be out of vocabulary. So we need to give the system the ability to copy text from the input to be used in the output. Adding this copy mechanism was very important for our system to work, because very often what is useful for the visually impaired person is most likely out of vocabulary for the system. So this is a recap of the system, but what I want to explain is how we handled the challenging problem of the image being given in the wrong orientation. What we do is take the image, flip it into four orientations, run the OCR detector on the four orientations, and then select the view that has the most intelligible words according to a dictionary. So as you see here, this view has "Price Chopper" and "crushed"; in a sense, this view is the correct one, and this is the one we use to get useful information to the end user. So this is the pipeline of the work. It contains a lot more details that I will not go over, but I will tell you that with this type of system, in 2020, we were the first and winning team in this CVPR 2020 challenge. And this was by a large margin: on images with text, thanks to our orientation-invariant text detection as well as the copy mechanism, we were able to achieve very high accuracy with respect to all the other teams, and likewise across multiple types of images, easy, medium and hard. We have a paper that we put together summarizing our system, and there's a blog summarizing it as well. And now I'm going to hand over to Richard, who will tell us how we took this system and built a real-time demo of it. Richard. Thanks, Youssef. So now I'll talk a little more about some of the engineering aspects of this project, and after that I'll show a quick demo video of the system. Once you've developed a model pipeline that works well on your test and validation data, you then need to evaluate how your models work in a real-world use case. The performance of the system can be different when you actually build it into an app, because things like lighting conditions, blurry images, low-res images and even network latency can affect your performance. Another complication is that we don't always have the same resources available as we had during testing: things like the amount of memory, the amount of processing power, or even time. The system needs to give a real-time response for it to actually be useful. So we built a demo application server, which is hosted on IBM Cloud. It has access to two GPUs, and on this we have a Python app server that loads the models into GPU memory on startup. This takes a couple of minutes, but once the models are loaded we can run data through the model pipeline very quickly.
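As a hedged sketch of the orientation-selection idea just described (this is not the team's actual code; `pytesseract` and the toy word-list check stand in for whatever OCR engine and dictionary the real pipeline uses):

```python
# A minimal sketch of orientation selection: try all four rotations,
# run OCR on each, and keep the rotation whose OCR output contains
# the most recognizable dictionary words.
from PIL import Image
import pytesseract

ENGLISH_WORDS = {"price", "chopper", "crushed", "tomatoes"}  # toy dictionary

def intelligibility(text: str) -> int:
    """Count OCR tokens that appear in the dictionary."""
    return sum(tok.lower() in ENGLISH_WORDS for tok in text.split())

def best_orientation(image: Image.Image) -> Image.Image:
    """Return the rotation whose OCR output looks most like real words."""
    candidates = [image.rotate(angle, expand=True) for angle in (0, 90, 180, 270)]
    return max(candidates,
               key=lambda im: intelligibility(pytesseract.image_to_string(im)))
```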
The user captures an image using the camera on their phone, and it's uploaded to the application server. We then feed the image data into the model pipeline. One GPU does the optical character recognition, feature extraction and object detection. We pass that data to the second GPU, which runs the transformer model, and that gives us the caption text. We then send the caption, the OCR text and the list of objects back to the client. The client application is written as a progressive web app using JavaScript and web components. It runs in a web browser, so it can run on most devices. It's designed to be mobile friendly, because this is the most likely way people will use it, but you can also use it on a desktop. The user captures and uploads an image, which gets sent to the captioning API. We then get the text response. We then call the Watson text-to-speech API, which converts that text to audio that we play for the user. We can change the audio output to use the on-device text-to-speech engine, but the quality of these engines varies quite a lot depending on the device. We can also turn off the audio, which is useful if you're using a screen reader, because the screen reader will pick up the caption of the image. So I'll now show a quick demo of the system. This is a pre-recorded video. I tested the app on a few objects I had lying around the house. We've got a live video stream on the phone, and if you tap anywhere on the screen, it takes a snapshot and sends that to the server to get a caption. "I see a black laptop with white keyboard and a white keyboard. I detected the following text in the image: option, command; and the following objects: laptop." So you can see in this example it has correctly identified the laptop, even though only a small portion of the laptop is visible, and it has picked up some of the text from the keyboard, but it has got some of the colors confused. It maybe would have been more correct to say it's a white laptop with a black keyboard, but it is picking up color. "I see a computer mouse sitting on a table. I detected the following text in the image: Lenovo; and the following objects: mouse." Okay, this one was quite easy for it to pick up what it was. "I see a TV remote control lying on top of a table. I detected the following text in the image: for Bell; and the following objects: remote." In this example, it was able to correctly identify the object, but it doesn't have any useful text that it's been able to pick up from the TV remote. Perhaps the text was a little bit too small for it to read. "I see a half empty bottle of red Mountain Dew sitting on a wooden table. I detected the following objects in the image: bottle." This one is quite interesting, because it has managed to pick up quite a few different features of the image, such as that it's a half-empty bottle, it's red and it's sitting on a wooden table, but what it hasn't got correct is the brand name of the cold drink. It probably can't figure out what the brand name is, so it's defaulting to whatever bias it had inside the training data; here it just said it was Mountain Dew. Perhaps if we had moved the camera a bit closer, it would have picked up some of the text on the label and been able to tell that it's Coca-Cola. "I see a can of beans on a table. I detected the following text in the image: Bobby Butterbean's Rhodes Brine." This is a good example of where it's using the text that it picks up to inform the caption.
So it could have just said that there was a can of food on the table, or just a can on the table, but because of the text that it picked up, it's able to say that there's a can of beans on the table. "I see a white bottle of dietary supplement sitting on a counter. I detected the following text in the image: Vigmas for the Turing 100; and the following objects: bottle." Once again, it's most likely using the text to inform what type of bottle it is. "I see a clear glass cup with a clear liquid in it sitting on a table. I detected the following objects in the image: cup." And then I also just want to show you what happens when you give it a large amount of text. "I see a white paper with black text on it. I detected the following text in the image: that leads natural separator centers led close to the distance the square that fetch class choose the best decision the classes." So we can see what's happened here: it has correctly captioned the image by saying that it's a white paper with black text on it. But the system is not designed for reading text; it hasn't been optimized for that. In fact, it will only make use of the salient text in the image to provide a caption, so it's not designed for reading large amounts of text at all. So that brings me to the end of my demo. Thank you for your attention, and I'll now hand back to Ismail. Yes, thank you. Thank you for that. Let me use this opportunity now to first thank the speakers, all four speakers from the previous two sessions, and you can see the thanks coming in on Crowdcast. People are very impressed with the demo, but we're also thanking Alexandre and Joan. So for these last few minutes, my task is to do a Q&A with Richard and Youssef. So let me start by asking again: please go ahead and ask questions. Here's one with regard to VizWiz: I know that traditional captioning is supervised, but the labels in those datasets are not visually-impaired friendly. Is this an unsupervised approach? VizWiz is actually supervised. For VizWiz, the data was collected by having annotators on Mechanical Turk caption images that were taken by visually impaired people. So it's a supervised task here. And maybe to speak to the essence of the question: for traditional datasets, which were not captured by the visually impaired, how did you account for that? So actually the main thing is that if you use a system that was trained on traditional datasets that were not captured by visually impaired people, it will fail, and it will fail big time. The thing is that in VizWiz they collected the dataset from visually impaired people who took the pictures themselves. So there is no longer the discrepancy between the training and test sets that was present before. Oh, great. So that answers the question very well. Thank you. And one last quick question for Richard. As I was watching your demo, I thought about Vera's point from last week: this is cool technology and cool techniques, the whole pipeline, and then the work starts for her and her design-thinking and HCI colleagues. How would this perform in a real-life setting? I don't know if you've done any experiments; that's part of my question. But also, have you thought about how a visually impaired person would want to adjust the outputs? Maybe always reading it out loud is not what they want, and maybe they have earphones, I don't know. Have you thought of variations for real-world, contextual use?
Yeah, so we haven't gone too much into that yet. The one thing that we have figured out is that in most cases, a visually impaired person is going to be using a screen reader to navigate their phone anyway. So they're likely not going to want a separate text-to-speech engine reading out the text; they'd want to reuse the screen reader, because they can customize that to how they want it: it'll be the correct speed for them, the correct voice. But I do think there will probably need to be an extensive amount of user feedback sessions before we get to the point where it fully caters for everyone, because there are undoubtedly things that we haven't thought about yet. Thank you very much. So this is very cool technology, and it just warms my heart to think about how you're improving people's lives, and the human story behind how it may be used. You can imagine it being a central plot device in a movie or something. It would be interesting to see how this integrates into society. Great, so thank you for that. Let's close the session. And let me invite our speakers to please, if you can, answer the questions coming in on Crowdcast as well. There are a few questions there; please do answer as many as you can. And my last task is to tell you about my work. I am a research scientist at IBM Research South Africa in the Johannesburg lab. I work on LNN, and the next session, actually being MC'd by Ndivhuwo, is part of that team's work. And I also do quantum. So I'm part of two projects, and there is a logical overlap: they're both about machine learning. As you know, quantum computing is a new technology, a completely revolutionary computing technique that uses the fundamental physics of nature to do exponentially faster computing. It's early days, but it's the right time for us as researchers and students to start to get familiar with quantum computing and how to program a quantum computer. So to this end, we are running a quantum hackathon, so to speak. Let me share my screen. You can see the challenge: IBM Quantum Challenge Africa 2021. We invite you, please, to register for this challenge. The idea here is to introduce quantum computing to the continent, to our student body and our developer communities, and to get them interested, and to make sure that you don't need to know quantum mechanics to take part. We've gone to great pains to make this accessible to the end user, because after all, we hope quantum computing will be like classical computing: pervasive, used by everyone. And we've chosen three fields: medicine, finance and logistics. The quantum algorithms in medicine, finance and logistics actually have an overlap with machine learning. So if you have a machine learning interest, as you do because you're attending this wonderful seminar, you can see how some aspects of machine learning can be sped up on a quantum computer. In particular, for the medicine use case we use an algorithm called VQE, and in finance and logistics we use quantum optimization, and all of that has versions for speeding up classical machine learning. So I invite you to please consider joining us. It launches on the 9th of September. Its format is online Jupyter notebooks that you run through, and you submit your answers. We've tried to make it fun: we'll have a leaderboard, and you get a certificate for participating.
So we invite you, please, to attend. That's it from me. I want to hand over to Ndivhuwo, my colleague from the IBM Research Johannesburg Lab. I'll just mention that he's a research scientist here, working in the fields of RL and LNN, two separate fields. He's also currently a visiting lecturer at the School of Computer Science and Applied Mathematics at the University of the Witwatersrand (Wits) in Johannesburg, South Africa. And a fun fact: he did his PhD in Tokyo, Japan, so he speaks Japanese. Let me hand you over to him. Thank you, and that's all from me as co-MC of this afternoon session. In your good hands, Ndivhuwo. All right, thank you, Ismail. Yeah, I'd like to welcome everyone to the second session of the day. The theme is advances in learning and reasoning: neuro-symbolic AI. I hope you enjoyed the previous session on language understanding. Yes, as Ismail said, my name is Ndivhuwo Makondo, a research scientist at IBM working on reinforcement learning and neuro-symbolic AI. I did my undergraduate degree at UCT in electrical engineering, and also my master's in robotics at UCT. I'm very excited for our team to talk about some of the great work we've been doing in-house, and I hope that you will find this exciting and that it will spark future collaboration opportunities. We have three speakers from our neuro-symbolic team to talk about neuro-symbolic AI and our approach to learning and reasoning. I would like to provide some context for the next set of talks by attempting the impossible task of summarizing the history of AI in a few minutes. If you were exposed to AI before the deep learning era, or if you took a comprehensive AI course at university, you might have encountered the concept of symbolic reasoning, one of the earliest approaches to AI. The core idea is to hard-code intelligence into machines, where human understanding of the world is encoded in a language a computer can process and reason about, such as first-order logic and many other logical formalisms. However, this approach quickly fell out of favor in the broader AI community due to several issues, including scalability, as it required what used to be called knowledge engineers to hard-code knowledge into the machines. This made the idea of learning from examples appealing, and gave rise to machine learning, which required less domain knowledge, as the machine could learn models from data. However, this too required a lot of feature engineering, and could not deal well with large, high-dimensional datasets. Then enter neural networks and deep learning. I can now take the same neural network architecture and more or less use it in different applications: games, question answering, chatbots, machine translation, et cetera, as long as I have large enough data and compute, if I can wait long enough, and if I'm willing to do that for every new problem I encounter. This sparked the idea of learning end to end: a learning system that learns from scratch with as little human assistance as possible. However, the old approaches were not all bad; there are some desirable properties. In the case of symbolic reasoning, I can easily encode domain knowledge into the model, which helps with sample efficiency and generalization. And the representation language used has a direct mapping to natural language, so a human can interpret the reasoning steps, understand the model, and debug it.
In our neuro-symbolic AI team, we aim to develop, from the ground up, a general learning framework that naturally combines these properties of symbolic reasoning with the power of deep learning: learning from large, noisy, raw data through general learning methods such as backpropagation. And with that in mind, I'd like to invite you to join us on this exciting journey. As usual, please use the Crowdcast platform to write down your questions, and we'll have the speakers answer them during the panel discussion at the end. Upvote questions you would like answered if you don't have your own question. The first speaker is Ryan Riegel. Dr. Riegel is a researcher in the AI reasoning group based at the Watson Research Center in Yorktown, New York. He is part of the core team developing neuro-symbolic AI methods and collaborates heavily with our Johannesburg lab. So yeah, Ryan, please take over. All right, my name is Ryan Riegel. As Ndivhuwo mentioned, I'm going to be talking about a new paradigm of logical neural networks that we have dubbed "neural = symbolic". This is in contrast to some previous work that has existed. And if you haven't seen it, I would encourage you to go and find Henry Kautz's excellent lecture on the different styles of neuro-symbolic methods that he categorized. All of those are roughly in the vein of getting a neural net to respect symbolic information, or somehow extracting symbolic information from a neural net. They're very much structured so that the neural components of the model and the symbolic components are on opposite sides of a wall from each other. But nonetheless, they're definitely aiming towards these goals of understandability, task generalizability, and being able to answer more complex problems. So we offer a new paradigm, separate from the other categories that Henry Kautz proposed: "neural = symbolic", which means to break down that wall. This achieves understandability by having a single, human-readable representation. So we don't have separate neural and symbolic layers, where one might be doing something other than what you expect. Less data is necessary to train this compared to other neural networks, because it does respect true symbolic inference behavior. And further, you can model much more complex problems, and even unanticipated problems, in the same sense that symbolic inference can answer questions it was not originally designed to answer, just using logical consistency and other rules. So I want to give some background on where neural nets really came from, and why we believe this is the logical direction to take them in the future. The very first paper on artificial neural networks actually proposed neurons as logic gates. So the concepts of a neuron and a logic gate have really always been intermingled with each other. And in this case, it was very easy to construct a neuron that returns the same kind of truth function you would expect from something like AND, where when all of its inputs are true, the output is true, and it's false otherwise. As you can see from the diagram here, the step function serves as the activation function, making this a very discontinuous and undifferentiable neuron. And in fact, that persisted into the next rendition of how neurons might be formed.
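As a hedged sketch of that original idea (my own toy code, not from the talk): a McCulloch-Pitts-style neuron with a step activation behaves exactly like an AND gate.

```python
# A toy logic-gate neuron: unit weights, a step-function activation,
# and a threshold equal to the number of inputs yields classical AND.
def step(x: float) -> int:
    return 1 if x >= 0 else 0

def and_neuron(*inputs: int) -> int:
    threshold = len(inputs)          # all inputs must fire
    return step(sum(inputs) - threshold)

assert and_neuron(1, 1) == 1
assert and_neuron(1, 0) == 0
assert and_neuron(0, 0) == 0
```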
The perceptron was later proposed by Rosenblatt, where rather than having fixed roles of AND and OR, neurons could be learned by tuning their weights. You can still achieve AND-like behavior for a specific region of the weight space. Now, the activation function was still a step function, but the next innovation in neural net training was to smooth the activation function in various ways: you could use a sigmoid, or a rectified linear unit, which has become much more popular recently. So this permits differentiability, or at least more differentiability. And of course, these innovations are ultimately what brought about the deep learning revolution, where you could use backpropagation to train many connected neurons together. But of course, when you go that route, you've really diverged from the original interpretation of the neuron as a logical construct. We no longer have values of zero or one; we don't really have any intuitive meaning for what those values express. So this is where we start to introduce some of the first ideas defining logical neural networks: it's still the case that a portion of the weight space maps to logical behavior for inputs within a certain range. For instance, if I define a threshold of truth alpha, which expresses that everything above alpha is as good as true and everything below one minus alpha is as good as false, I can establish some reasonable constraints so that the evaluation of an AND neuron, for inputs that are all either above alpha or below one minus alpha, will also return values that agree with the classical logic gate. There are a couple of problems with just throwing that at normal neural net training software. Now we have to deal with these constraints, and neural nets are usually not constrained in that sense. Further, the constraints can be quite overbearing: this, for instance, without modification, would prevent inputs from having a weight of exactly zero, though there are ways to circumvent that. It also doesn't quite address what's going on when a neuron's inputs and outputs are between alpha and one minus alpha. And for that, we turn our attention to real-valued logic as studied in other contexts. This has been called multi-valued logic or fuzzy logic. There are lots of different flavors of real-valued logic that have been well studied. And they all have the property that when you have inputs of exactly zero or one, they behave the same as classical logic, but they have different properties in between those ranges. In some cases, they can be configured to do things like capture probabilities; alternately, they can have high fidelity to various tautologies of classical logic. So these are the sorts of things that inform the choice: if we're going to use a real-valued logic, which should we use? There's one in particular that we have focused on; it's by no means the only one useful for logical neural networks, but Łukasiewicz logic has a very nice property in that its truth function is basically equivalent to the ReLU activation function. Now, Łukasiewicz logic in its ordinary form does not have any kind of weights; it always maps exactly to the diagram shown, which here would be for a conjunction. The other operations have related diagrams.
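A hedged illustration of that equivalence (my own toy code): the Łukasiewicz conjunction max(0, a + b − 1) is literally a ReLU applied to a + b − 1.

```python
# The (unweighted) Łukasiewicz conjunction is a ReLU in disguise:
# and(a, b) = max(0, a + b - 1) = relu(a + b - 1).
def relu(x: float) -> float:
    return max(0.0, x)

def lukasiewicz_and(a: float, b: float) -> float:
    return relu(a + b - 1.0)

# At the corners it agrees with the classical truth table ...
assert lukasiewicz_and(1.0, 1.0) == 1.0
assert lukasiewicz_and(1.0, 0.0) == 0.0
# ... and in between it degrades gracefully.
print(lukasiewicz_and(0.9, 0.8))  # ~0.7
```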
So we have extended Łukasiewicz logic with a notion of weighting the operands to each of the connectives, which brings it even more in line with ReLUs. Now, it turns out that weighted Łukasiewicz logic has also been studied by other groups. And it still has a lot of the desirable properties that you get from fuzzy logics in general. Number one, it has interpretability: the weights express the importance of the operands, and there's a bias term that expresses which operation you're doing. You can recover the unweighted case rather trivially. All the operations are continuous, but also many classical tautologies remain completely intact. So an expression written in weighted Łukasiewicz logic has very good intuitive behavior for human observers. It really bridges these two desires: being interpretable for humans, but also having rigorous logical semantics that we can exploit to perform inference. So, to define inference: this is a step that's different from the ordinary evaluation of a neural network. In a neural network, the very familiar thing to do is a feedforward evaluation of its neurons: all the neurons at the bottom layer are evaluated based on the inputs, and neurons in the layers above are evaluated based on the layers beneath them. This feedforward computation is analogous to simply computing the truth value of an expression in a logical system, which is a useful thing to do. However, it doesn't really constitute inference. Ordinarily in a logical system, you would start with something like: okay, here's a collection of formulae that I assert to be true; what can I learn about the subformulae and atoms present in those formulae? Inference is the term we use for the general process of determining information about every subformula that exists in the system. From a neural point of view, that would be learning information about every neuron in the system based on every other neuron in the system. Some important concepts about inference are that it can be sound and complete. Sound means that when you perform inference, you only learn things that are true. Complete means that when you perform inference, you are capable of learning everything that is true. Systems that are both sound and complete are highly desirable. There has been some work showing soundness and completeness for real-valued logics, but we go a step further: in collaboration with Ron Fagin, we have determined an axiomatization that gives sound and strongly complete inference for all real-valued logics, meaning all logics where you can choose absolutely anything for the connectives' activation functions. We further demonstrated that there is a mixed-integer linear program that you can use to perform inference in Łukasiewicz logic and Gödel logic. Mixed-integer linear programming is nice, but it has significant computational overhead. So we're interested in establishing a cheaper message-passing procedure, which is what the LNN method provides, with neural-net-style propagation. What we needed to add here was the ability to allow information to flow not just upward through a neuron, but also downward.
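One common form of this weighted conjunction from the LNN literature, sketched in toy code; the clamping to [0, 1], the bias beta and the per-operand weights follow the published formulation, but treat the details as illustrative rather than as the exact library implementation:

```python
# A sketch of a weighted Łukasiewicz conjunction:
# out = clamp(beta - sum_i w_i * (1 - x_i), 0, 1).
# beta = 1 with unit weights recovers the unweighted conjunction.
def weighted_and(xs, weights, beta=1.0):
    raw = beta - sum(w * (1.0 - x) for x, w in zip(xs, weights))
    return min(1.0, max(0.0, raw))

print(weighted_and([0.9, 0.8], [1.0, 1.0]))  # ~0.7, the unweighted case
# A low-weight (unimportant) operand barely affects the result:
print(weighted_and([0.9, 0.2], [1.0, 0.1]))  # ~0.82
```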
And I'm using these directions to suggest that, if you think of the abstract syntax tree of a logical sentence, upward would be the ordinary evaluation of a truth value in that sentence, whereas downward would be the inference of a truth value at a subformula or atom in that sentence, based on truth values known elsewhere. Downward inference through neurons is conveniently related to the functional inverse of the activation function of that neuron. So we devised an upward-downward algorithm, which provably converges in finite time, that in a nutshell starts at the bottom of the neural network, with all of the atoms known throughout the system and their truth values, which may range anywhere from zero to one; propagates that information to the neurons above them, in the usual sense for neural networks, until it reaches the top; and then turns around and, in a dynamic-programming sense, propagates information collected from everywhere else in the network downward along all of the possible edges through a neuron. So each of a neuron's inputs gets an updated value based on the other inputs, as well as the computed or assumed result of that neuron: assumed in the case that the neuron represents a formula taken to be true. This procedure is sound, and our paper available on arXiv suggests it is not complete. However, recent work has shown that it can be made complete with a brief extension to the neural network, which I won't get into because it's still a work in progress, being drafted for publication soon. One of the caveats of the upward-downward algorithm is that neurons, which in a neural network we're used to defining in terms of a single value for each input and each output, now have to work in terms of bounds. This is because the functional inverse of an activation function may not have a unique value. For instance, ReLUs have flat regions, so if you take the functional inverse of a ReLU, you have to do something special at that flat region; in particular, we can define upper- and lower-bound results for it. From a logical and intuitive standpoint, this is like saying: when I perform a downward inference through a neuron, it's possible for me not to find a result for a particular input. And that's a reasonable thing. If I say A implies B, but I don't know anything about A, I shouldn't be able to prove anything about B. Having lower and upper bounds expressed separately allows me to state that in a fairly concise and interpretable representation. Bounds also allow us to work under the open-world assumption, in which not all information is known from the get-go. So in our neural network, which starts with the truth values at all of the atoms, many of those atoms may simply have unknown truth values: a lower bound of zero and an upper bound of one. And further, this grants us another convenient ability, which is the explicit representation of contradiction, which is simply when the bounds cross. In contrast, a lot of other neuro-symbolic methods don't have this; rather, contradiction is expressed the same as, for instance, ambiguity or an inability to prove something, by simply returning something like 0.5. Whereas we have different representations for contradiction (bounds crossing), absence of knowledge (loose bounds), and genuine ambiguity (tight bounds around 0.5, for instance).
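A hedged toy sketch of downward inference with bounds (my own simplification of the idea, not the LNN library): given a lower bound on A and on the asserted formula A implies B, under Łukasiewicz implication min(1, 1 − a + b) we can tighten the lower bound on B, and crossed bounds flag a contradiction.

```python
# Toy bounds arithmetic. Truth values live in [0, 1]; "unknown" is
# the loose interval [0, 1]; a contradiction is a crossed interval.
def modus_ponens_lower(a_lower: float, imp_lower: float) -> float:
    """From A >= a_lower and (A -> B) >= imp_lower, derive a lower
    bound on B, since imp <= 1 - a + b implies b >= a + imp - 1."""
    return max(0.0, a_lower + imp_lower - 1.0)

b_bounds = [0.0, 1.0]                          # B starts unknown
b_bounds[0] = max(b_bounds[0], modus_ponens_lower(0.9, 1.0))
print(b_bounds)                                 # [0.9, 1.0]

def contradicts(lower: float, upper: float) -> bool:
    return lower > upper                        # crossed bounds

print(contradicts(0.9, 0.3))                    # True: evidence conflicts
```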
And it turns out that these bounds computations can also give you some useful results from a probability point of view. You can use a hybrid Łukasiewicz/Gödel activation function implementing the Fréchet inequalities, and as a result, the inferred truth values end up being upper and lower bounds on the probabilities at every subformula throughout the entire tree. They're bounds in the sense that they make no assumptions about the independence or dependence between any of the atoms in the system. So it's a powerful result, though we're still working to improve our ability to deal with probabilities. So you might be wondering: how do we train a logical neural network? So far we've mostly considered cases where we have a scaffold of candidate formulae that may or may not be correct, and we want to train those formulae, in the sense of tweaking the weights on the operands within them, in order to bring them into closer agreement with observed data. In this case, observed data would be, for instance, knowledge graph triples. These serve as a grounded representation of things that are known. In a first-order setting for logical neural networks, these are represented as tables, equivalently tensors, or repeated applications of neurons at each grounding, and the operations on them perform joins or reductions in order to work on non-scalar data. So now the output would be the truth value at really any neuron in the system. You may have some target formula that you want to know whether or not is true; you may have some target atom in the system that you want to know whether or not is true. In fact, for a given training dataset, every single training example may have a different target neuron in the system, representing some specific logical truth. So as a result, we can model any-task learning: this network can be trained to predict anywhere within its entire structure. And further, we can support a single knowledge-graph training world, or universe you might call it, which is fairly typical of very large knowledge graphs, though of course that can be replicated into multiple distinct observations that have nothing to do with each other: completely independent settings, et cetera. The learning process is now optimization as usual. We have a loss function expressed in terms of, number one, prediction error: some measure of fitness based on what it is that you're trying to predict. A candidate that I propose would be a hinge loss, trying to make the predicted truth-value bounds at least as tight as the ground-truth bounds. There's a handful of variants of that; you can add exponents, et cetera, to encourage even tighter bounds, or bounds closer to the exact ground truth, for instance. But another very important element of the loss function is our ability to work directly in terms of contradiction. Remember I mentioned that we can represent it directly, so we might as well use it. We know that contradiction is something we don't want to see anywhere in our system. So whenever bounds cross, we can apply a potentially very large penalty, and this wouldn't be just for your target neuron, the thing you're trying to predict, but really for every neuron in the system. So the contradiction loss is trying to uphold the interpretability of the entire network, so that even things you weren't strictly trained on should at least behave in a consistent manner.
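A hedged sketch of what such a contradiction penalty might look like (my own illustrative code; the published loss functions differ in detail): penalize every neuron whose lower bound exceeds its upper bound, hinge-style.

```python
# A toy contradiction penalty over a network's bounds: each neuron
# carries (lower, upper) truth bounds, and any crossing contributes
# max(0, lower - upper) to the loss, exactly like a hinge.
def contradiction_loss(bounds, weight=10.0):
    return weight * sum(max(0.0, lo - hi) for lo, hi in bounds)

network_bounds = [(0.2, 0.9),   # consistent
                  (0.0, 1.0),   # unknown, also consistent
                  (0.8, 0.3)]   # crossed: a contradiction
print(contradiction_loss(network_bounds))  # 10 * 0.5 = 5.0
```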
And of course, these components of the loss function are optimized with respect to the constraints that I suggested earlier. There are some means of working around those constraints, and in fact, if you're comfortable working in a completely real-valued setting, you don't even need to use them. Even without constraints, the model still represents a completely rigorous, now real-valued logical system, as opposed to a classical one. The interesting result of our work on LNN is that it demonstrates a complete equivalence between this form of neural evaluation and logical inference using otherwise symbolic methods. In this sense, standard neural net training with the constraints ends up producing a set of weighted logical statements, and standard inference with the neural net, along with the reverse or downward inference that I mentioned, performs the same operations as are performed in a logical inference system. For instance, there are many different logical inference systems possible, but Hilbert's system is one in which you use rules such as modus ponens in order to derive truths about the world. This is completely equivalent to doing the downward direction of inference through a neuron modeling implication. So this establishes the logical neural network not just as a means of bridging neural networks and symbolic logic, but in fact as a single model that is both at exactly the same time. With the constraints, you can make it behave exactly classically; and further, in the same sense that neural networks can be linked together in many interesting ways, forming completely end-to-end differentiable systems, you can have standard neural networks as a special case, or occupying special portions, of this neural network. There are several other neurosymbolic methods that are closely related in task to what LNN attempts to perform. The chief, I won't say competitors, but alternatives are the logic tensor network and the Markov logic network. In a logic tensor network, you start with a logical representation in the same sense that logical neural networks do, but once training goes on, you really don't know what the inference process is. You just have a regular neural network that's attempting to predict the truth value of a particular atom, having been constrained at training time to produce the truth value that the training examples showed that atom to have. So it's not particularly interpretable, though it's effective and easy to implement on existing hardware. Markov logic networks are a fairly different approach, using Markov random fields. Again, you start with a collection of logical statements, and what learning in the Markov logic network does is figure out the degree of truth of these logical statements. It's a little difficult to normalize the weights that it produces into probabilities, so while it's marginally interpretable, there are some roadblocks to understanding what it's really doing and what its results are. So I've suggested that logical neural networks sidestep these issues of translation from a logical form into some connectionist form by having those two forms be exactly the same. And we have demonstrated logical neural networks in a handful of different applications now. Our first use case was, in fact, knowledge base question answering, or KBQA.
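To illustrate that equivalence with rule-based inference, here is a tiny Python sketch of modus ponens carried out as downward inference through a Łukasiewicz implication neuron, y = min(1, 1 - a + b); again a toy under those assumptions, not the library itself:

```python
def modus_ponens_lower(impl_lo, a_lo, b_lo):
    """Downward inference through an implication neuron y = min(1, 1 - a + b).
    From y >= L (with L <= 1) it follows that 1 - a + b >= L, so b >= a + L - 1.
    """
    return max(b_lo, a_lo + impl_lo - 1.0)

# "A implies B" asserted true (lower bound 1.0) and A proved true:
print(modus_ponens_lower(1.0, 1.0, 0.0))  # 1.0, i.e. B is proved true
# If nothing is known about A (lower bound 0.0), B stays unproved:
print(modus_ponens_lower(1.0, 0.0, 0.0))  # 0.0, matching the A-implies-B caveat
```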
So this is a task where you have a natural language question that should be answerable from knowledge contained in some kind of knowledge graph, for instance DBpedia, though there are other data sets that can be used as well. Answering these questions is often more complicated than just lookup, and that's one of the reasons that existing neural network approaches run into trouble: novel questions never appeared in a training set, so it's difficult to get them to extrapolate to answers that must be found in different ways. And many questions just inherently require some degree of reasoning. For instance, one of the examples is: does Breaking Bad have more episodes than Game of Thrones? You have to figure out that you're doing a numerical comparison in order to answer that correctly. Also, I think one of the bigger downsides here is that these end-to-end systems really don't have any ability to explain how they arrived at their answer. They just come back with the tokens they purport to be the answer to a question and leave it at that. Our LNN approach, on the other hand, actually uses a pretty significant pipeline in order to take a natural language question, parse it into a logical form, and then use our LNN as a means of answering the logical question via inference. Now, this has demonstrated more generalizability: you can ask it questions in unseen situations and the logic still evaluates the way that it really ought to. Explainability is very apparent, because you actually have a logical statement that you can understand. For instance, if it got something wrong for some reason, you might be able to see that the logical statement was the problem, as opposed to some aspect of the computation. There is a brief demonstration of the computations that actually go on inside this pipeline, and LNN in general, answering a question having to do with where Albert Einstein was born. As you can see from the video, there was quite a large amount of information that had to be picked through; that information led to a couple of different statements pertaining to the location of Albert Einstein's birth, which was listed as a city, as well as the location of that city, and finally to ruling out whether or not he was born in Switzerland. Other projects involving LNN include learning to reason, also referred to as TRAIL. This is a deep learning approach where the task is to determine what steps to take in logical inference. This actually bolsters the effectiveness of LNN by limiting the number of logical inferences that it needs to perform in order to answer a question, so it can focus computation near where relevant truths are being proved. And this work has actually recently surpassed the E prover on one of the very difficult datasets out there, so it's making good progress in the direction of just general logical inference, period. Another task concerning LNN is logical rule induction, also called inductive logic programming, or ILP. This asks: given a collection of facts, can we determine rules that explain the facts' relationships to each other? This allows us to sidestep the need in logical neural networks to start with an established set of formulae; we could start with just facts and learn what those formulae should have been from the get-go. This has been a work called RuleNN, and it outputs a logical neural network, so again bolstering its abilities.
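The Breaking Bad versus Game of Thrones question is a good illustration of why lookup alone isn't enough: the knowledge graph stores the two episode counts, but answering requires a comparison step on top of retrieval. A minimal Python sketch of that reasoning step, with a hand-built stand-in for the knowledge graph:

```python
# toy stand-in for knowledge-graph triples (the real final-season counts)
kb = {
    ("Breaking_Bad", "numberOfEpisodes"): 62,
    ("Game_of_Thrones", "numberOfEpisodes"): 73,
}

# "Does Breaking Bad have more episodes than Game of Thrones?"
# lookup alone returns two numbers; the answer needs a comparison on top
answer = (kb[("Breaking_Bad", "numberOfEpisodes")]
          > kb[("Game_of_Thrones", "numberOfEpisodes")])
print(answer)  # False: 62 is not more than 73
```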
There's also been some work concerning how best to optimize logical neural networks. With the constraints in place, it's a fairly difficult optimization problem: it's non-convex and non-smooth, and there's a handful of parameters that were difficult to learn. However, using an ADMM method, it's now possible to learn all of the parameters in LNN, including that alpha parameter that I mentioned, the threshold of truth, which can be unique for each neuron. And this work has actually demonstrated a really quite amazing convergence rate, so it was very successful not just at dealing with the constraints, but simply at training a neural net in general. And lastly, LNN has seen application to reinforcement learning as well. These are situations where rules are learned in order to guide an agent through a simple text world, and those rules have to obey certain logical constraints, such as whether or not you can move a block in this example: the block must not have anything on top of it, et cetera. Introducing those rules in fact led to about a 50-fold reduction in the amount of data needed to adequately train the system. So, wrapping up, we believe that LNN is really offering the best of all worlds on the neurosymbolic spectrum. It meets most of the Bengio-Marcus desiderata for what would constitute a truly neurosymbolic system, by having elements of true symbolic AI (the ability to do rigorous inference), elements of true statistical AI (being able to deal with uncertain knowledge), and also elements of neural nets in the way that it's able to train in a gradient-based manner. There are, of course, still quite a lot of open ends. We are always seeking collaborators concerning the ability to scale LNN to massive scale; improved representation of, and dealing with, probabilities; embeddings in the subsymbolic sense, where rather than having a knowledge graph of data, we might have some embedded representation of a less structured thing, like an image; how to handle knowledge acquisition via parsing of natural language, for instance; and improving its inference via reinforcement learning and different multitask learning settings. And yeah, we believe this represents a philosophical shift in the way that people can conduct AI. Compared to the previous paradigm of machine learning, where humans would collect the dataset and hand it over to the machine, the machine would just do something, and there's really not much more interaction on the part of the human, logical neural networks, and really the neural-equals-symbolic paradigm, open up the opportunity of having a lot of interplay between what the system is learning and doing and what the human can make of the situation. So that's the conclusion of my presentation. In a nutshell: yes, logical neural networks are neural networks that have constraints, contradiction loss, bidirectional inference, and truth bounds. They grant the full power of classical logic, but also subsymbolic neural net reasoning; they can deal with uncertainty, and yet they are still human-readable and rigorous. Thank you. All right, thanks, Ryan, for that amazing introduction to the LNN. Now that we have some understanding of the theoretical foundation of this approach of combining reasoning with neural networks, we'll move on to the next speaker, Naweed Khan, who will take us through the development of the LNN software and its application to several domains within IBM Research.
I've only seen two questions so far. Please keep the questions coming and the discussions going on the Crowdcast. Naweed is a research scientist in AI science at IBM Research Africa, here in the Johannesburg lab. He's leading the open source development of the LNN in the global neurosymbolic AI team at IBM Research. Naweed, please take over. Thanks, Ndivhuwo. So yeah, my name is Naweed Khan, research scientist at the Johannesburg lab, and I'm leading the LNN development in this case. It's really a tough ask to come after Ryan; he did such an excellent job in explaining what the LNN is, in terms of both its breadth and depth. So maybe I'm just gonna take a step back and give you a high-level overview of where the LNN is playing right now, touching on some of the domains, focusing on some research that has been released within the space, and maybe having a short discussion at the end around what comes next for LNN, looking beyond research. So LNN really does try to answer a very difficult question: how do you combine neural networks with symbolic reasoning? The way that we've attempted to do that is to take an individual neuron and constrain it logically, in a way that allows you to take a symbol and apply it to that neuron on a per-neuron basis. And as described, we get this very nice class of neural networks that wasn't there before, called neural-equals-symbolic, which allows us to truly lift the lid off of neural networks, so they're more than just black-box models. But I'll describe a bit of that later. So just to give you an overall look at the landscape: as part of our first iteration of releasing the LNN, what we've opted for is a specific subset of logic called first-order logic, and the propositionalization of groundings within those logical statements, making sure that each fact in our network is in fact just a proposition. That's a common approach in neurosymbolic systems. As Ryan mentioned, we do have bounds on individual propositions. This allows us to express uncertainty about what truth values can be, and wherever you are maximally uncertain, that facilitates unknown inputs, so we can cater to the open-world assumption. The LNN has also been rigorously tested against classical logic and extended towards different classes of real-valued logic, like Łukasiewicz and Gödel, as Ryan described. And we've really taken it forward in terms of the parameterization, putting weights on a per-input basis. So, for example, we can actually identify what the contribution of individual operands is in the computation of an operator. So where does the LNN really apply? That will probably be the fundamental question in your mind. Right now, the starting point would just be any system that has first-order logic as some kind of representation of knowledge. If you do have first-order logic statements, then the LNN is automatically for you. But the one thing that we do have to do differently when training LNNs, compared to normal neural networks, is this additional step of constrained optimization. It's not as simple as just saying, okay, backpropagation finds the gradients and updates the weights and the biases. We actually have to say that specific neurons within the system have been defined according to predetermined symbols set by a human, and those symbols may have their own constraints associated with them.
For example, if I specified a conjunction, or an AND gate, as a particular neuron, that means there are constraints placed on that neuron that say: if all of the inputs to the conjunction are true, then the output should be true; but if a single input is false, then the conjunction should be false. That's the classical interpretation of an AND gate. And this applies to all kinds of neurons that can be defined; typically these are conjunction, disjunction, and implication under the hood. What this really allows us to do, at a micro level, is lift the lid off of any neural network defined as an LNN and inspect each neuron within the system, to ask that neuron: what are you? What is your symbol? What is the truth of that symbol, and how much certainty do you have in that individual truth? That is something you couldn't do with normal neural networks, because there is no interpretability on a per-neuron basis. But because we have this symbolic system implemented as a neural network, what you can do is take an existing knowledge base, a database of some kind, and use the symbolic reasoning engine of the LNN to identify how logical facts and reasoning might propagate throughout the system. Maybe just to touch on this idea of where rules come into play: you may have human-generated rules, but as Ryan alluded to, you don't only have to have human-generated rules. Those rules can also be learned from the facts themselves, and he did demonstrate one piece of work that looks at templated rule learning. So it is possible to have this combination of human-generated and machine-generated rules that work together with one another, along with the neural components of the system. But you can imagine that having machine-generated rules introduces a lot of noise into the system. A rule may not always apply, and that noise might introduce some kind of logical inconsistency. Ryan did flash some of the loss functions there, and what they really facilitate is this ability to identify where there are inconsistencies within the system. You can imagine one part of the network saying a neuron should be true and another part of the network saying that neuron should be false. That inconsistency, in a normal symbolic system, would actually cause it to throw its hands up in a very brittle way and say: I can't compute over inconsistencies, it doesn't make any logical sense. But what we do in the LNN is turn that inconsistency into a loss. Then, as part of the backpropagation process, through the gradients, we can actually identify the sources of those inconsistencies, and the weighted formulation then allows you to slowly change the parameters to reduce the number of inconsistencies within the system. So that allows the LNN to operate under two modes: a typical supervised setting, if you give a particular label to any neuron within the system, and a self-supervised setting, where the whole system tries to converge towards being logically consistent as it learns. There is also this combination of the open-world assumption and the knowledge-based reasoning engine, which play nicely with one another. So you can imagine a scenario where you may not have all of the information within a neural network: either some signal that came from a hardware system got corrupted, or there's just missing data, or you simply didn't have any information.
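As a rough sketch of what those per-neuron constraints look like, here is a small Python check for an AND neuron. It assumes the weighted activation f(beta - sum of w_i * (1 - x_i)) and the alpha threshold-of-truth from the talk; the exact constraint form is a simplified reading of the LNN arXiv paper, and the numbers are illustrative only:

```python
def and_constraints_ok(beta, weights, alpha=0.7):
    """Check classical-behavior constraints for a weighted AND neuron with
    pre-activation beta - sum_i w_i * (1 - x_i) and truth threshold alpha."""
    # if every input is true (x_i >= alpha), the output must be true
    # (>= alpha); the worst case is all inputs sitting exactly at alpha
    if beta - (1 - alpha) * sum(weights) < alpha:
        return False
    # if any single input is false (x_j <= 1 - alpha) while the others are 1,
    # the output must be false (<= 1 - alpha)
    return all(beta - alpha * w <= 1 - alpha for w in weights)

print(and_constraints_ok(beta=3.75, weights=[5.0, 5.0]))  # True: behaves
# classically at alpha = 0.7
print(and_constraints_ok(beta=1.0, weights=[1.0, 1.0]))   # False: two inputs
# at 0.7 only push the pre-activation to 0.4, below the truth threshold
```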
The LNN's open-world assumption allows you to stipulate that that fact is unknown in the system. But through the reasoning process and the rules that exist on top of the facts, you can actually start to fill in information that was missing, if there is a reasoning path that makes sense to the LNN. And through multiple hops of reasoning, it facilitates knowledge base completion. Finally, just to touch on how LNN integrates with its backpropagation engine: any individual proposition or atom in the system can be given a truth value by a human, but it need not be. You can also have those propositions defined by whole deep learning systems. So we get this very nice interplay between a subsymbolic deep neural network and the rules that might exist on top of it, and the backpropagation process of LNNs allows you to use one deep learning system, through the rules, to train another deep learning system, so that the rules make sense and there's logical coherence. So that's a very high-level overview of where the LNN is right now. Just to quickly flash some slides around the research: the link is at the bottom, and I do encourage everybody who is research-oriented to go to this link. You'll see that IBM has invested heavily in the space of neurosymbolic AI: lots of publications, many projects and collaborations. So for those of you who are actively working within these research areas, do reach out to the authors of those particular papers. I'm gonna flash three quick pieces of research; you would have seen some pieces of them in Ryan's presentation. The first is the main LNN arXiv paper that's available right now. It identifies and explains how to set up an individual neuron, how to construct a graph, and what the reasoning steps and the learning steps might be in order to create an LNN. A second paper is around the soundness guarantees of the bounded computations of LNNs: how we go from computing over complete uncertainty and then, as new facts come in, become more certain about particular propositions in the system. Finally, there is the paper that Ryan mentioned on how to train LNNs within the constrained optimization process. It is a fairly difficult problem, and there are a lot of different approaches that we could use; in this case, there was one approach that uses an augmented Lagrangian method. Then there are four papers on where the LNN has been applied, building on top of Ryan's papers that he discussed. The first is the knowledge-based question answering; that ACL paper will be explored a bit more in depth by my colleague Francois, who will speak after me. The second is using LNN within logical optimal actions; you can find that on arXiv as well, along with the other RL applications that Ryan mentioned. What also comes to mind is Bengio's presentation yesterday, where he showed that an RL agent really needs to execute on some logical function, and those logical functions can directly be replaced by LNNs under the hood. LNN has also been used within time series classification, with weighted signal temporal logic as a subset of that. And then also in knowledge base completion: this arXiv paper looks at how to use the LNN's bounds as an embedding space in order to perform multiple hops of reasoning. So that's a very high-level overview of some of the research. Please do reach out and build on top of the literature that is there.
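A tiny Python sketch of that open-world, multi-hop idea, using the bounds convention from earlier; the facts, the rule, and all names here are made up for illustration:

```python
# unknown facts start maximally uncertain: bounds (0.0, 1.0)
facts = {
    "born_in(Einstein, Ulm)":             (1.0, 1.0),
    "located_in(Ulm, Germany)":           (1.0, 1.0),
    "born_in_country(Einstein, Germany)": (0.0, 1.0),  # unknown
}

def luk_and_lower(lowers):
    """Lower bound of a Lukasiewicz conjunction of several premises."""
    return max(0.0, sum(lowers) - (len(lowers) - 1))

def apply_rule(premises, conclusion):
    """Rule taken as true: the conclusion's lower bound is lifted to the
    lower bound of the conjoined premises (a downward/modus-ponens step)."""
    lo = luk_and_lower([facts[p][0] for p in premises])
    c_lo, c_hi = facts[conclusion]
    facts[conclusion] = (max(c_lo, lo), c_hi)

# rule: born_in(x, c) AND located_in(c, y) -> born_in_country(x, y)
apply_rule(["born_in(Einstein, Ulm)", "located_in(Ulm, Germany)"],
           "born_in_country(Einstein, Germany)")
print(facts["born_in_country(Einstein, Germany)"])  # (1.0, 1.0): filled in
```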
But just to wrap up the conversation around what's next, what's beyond LNN research: two things really come to mind. The first is development and the second is integration. What we're gonna do as part of the development push is to really cement this idea that we wanna move forward in an open framework. We are working right now to implement the LNN so that it can be released on GitHub as an open platform that tackles all of those different task sets that you saw. It's a really difficult task, but we are in the process of working on that. As part of this first implementation, the code base will be released with a backbone of Python and PyTorch. So do familiarize yourself with those tools if you haven't already, and go over the research on some level to understand the different mindset that's required to do computations with LNN; it is a little bit different from standard neural networks. When we release the LNN as part of the open framework, we will continue to innovate and push the state of the art within this new domain that we've really created. As part of the first push, we'll be requiring the user, the human, to put in logical rules. But as Ryan mentioned, you're not restricted to that; we can do rule learning. So over time, we will integrate different ways of doing rule learning from data, and the data could come from images or text, across different kinds of domains. But where LNN will really shine, and we'll include this as a whole new section, and it's related to all of the conversations that were had yesterday around explainable and trusted AI, is to look at the LNN not only at the micro level of inspecting individual neurons and asking what they are, but also at the macro level, to really ask a neuron: why do you believe what you believe? How have you come to reason about some fact of the universe? And to have this real human-computer interaction where the machine can explain itself, which is something that we really couldn't do before. So that's a high-level overview of the domain, and I'm looking forward to some of your questions in the chat. Thanks. All right, thanks, Naweed, for that great presentation on the development of the LNN and its applications. Seems like I came onto the stage too early there. I hope Naweed has helped solidify some of the theoretical ideas that Ryan introduced. We now move on to the next speaker, Francois Luus, who will delve deeper into the application of neurosymbolic AI to question answering, contrasting it with end-to-end, pre-trained, deep-learning-based language models in a natural language context. Dr. Luus is lead of the Learnable Reasoning sub-theme and team lead of the LNN at IBM Research. He's also based in the Johannesburg research lab, leading the global effort in the development of the LNN and its application to several domains within IBM Research. Take it over, Francois. Thank you, Ndivhuwo, and thank you, Ryan, for the comprehensive overview of LNN, and Naweed for the high-level summary of LNN yet again. So in this closing talk of our session, I wanted to take a step back and make a case for neurosymbolism and LNN as a way to achieve next-generation advances in natural language understanding, or NLU. What I'm gonna do in this presentation is review some of the problems that we have with deep-learning-based, transformer-based language models, and I'm gonna discuss how neurosymbolism can address those deficiencies.
And I'm also gonna show an example of how we apply neurosymbolism to a question-answering task. So why do we want natural language understanding? Firstly, natural language ability is one of the hallmarks of human intelligence, and language is central to the human experience. With the ability to inject automated NLU into our everyday lives, we can add considerable value to so many industries and human pursuits: industries like healthcare, law, and education, but also recreation and other creative outlets. By injecting automated natural language capability into our everyday lives, we can enhance and enrich the human experience. One of the prime examples of this is the BERT language model, which is used by Google Search and serves more than 4 billion people on a daily basis. So, looking at the philosophy, or the metaphysics, of understanding: what is understanding? There are basically three broad categories of understanding: referentialism, internalism, and pragmatism. Starting with referentialism: referentialism basically maps your text input onto external referents, or it maps contexts onto truths. So referentialism has the ability to evaluate the truth of sentences given a context, and referentialism is the typical mode that neurosymbolism uses. This typically gets executed as linking things that you see in the text to an external knowledge base; we then gain knowledge of the things that we see in the text and can make sense of them to a degree. Secondly, internalism. This is somewhat related to referentialism. With internalism, we retrieve internal representational structure given linguistic data: basically, we map text to internal objects or processes, and so this suggests a little bit of a causal understanding. Going back a step, referentialism supports systematicity. Systematicity is where you systematically arrive at an answer or an output, going at it step by step. And this is a very important quality for interpretability, because you can examine the inner workings of a model if it's based on referentialism or internalism; internalism would be intrinsically systematic. In contrast, we have the mode that is predominantly used by deep learning models: the mode of pragmatism. Imagine you have an agent, and the agent is intercepting communications from an unknown universe. It might be able to imitate or mimic the conversational patterns that it sees, but its inability to ground into things that are known would eventually betray its lack of understanding. So in this talk, I want to play these two against one another: the approach of referentialism for neurosymbolic AI, and pragmatism for deep learning. Looking at the best deep learning models for natural language understanding that we have today, we see that deep learning, through the attention mechanism in transformers, has brought us large-scale language models like ELMo, BERT, different variants of GPT, and many other language models that are based on these foundation models. These models display a very impressive variety of linguistic capabilities and adaptability to so many different situations. And these models can actually train, through deep learning, on large unstructured data, raw text data, through a simple self-supervised objective of autoregression: you only need to predict the next word given the preceding text. So what's the problem with these transformer-based, deep-learning-based models for language understanding?
If we take large-scale language models like GPT-3, these models are exceptional at producing fluent and plausible-sounding text without necessarily being grounded in truth. When I say grounded, I mean being able to link things that you see in the text, on the surface form, to things that you have knowledge of, for instance by pointing to a knowledge base. Having knowledge of something allows you to reason more systematically and more soundly about the things in your context. So even though these deep learning models can produce text with striking fluency, they eventually lapse into a state of incoherence, and this suggests that they are merely parroting according to the statistics of the data. These deep learning models rely on something called emergence, where behavior is implicitly induced with minimal priors: in the case of transformers, they basically build statistical representations given the current patterns of words. Now, these deep learning models can accompany their outputs and decisions with generated explanations, but there's little assurance of the faithfulness and soundness of these explanations, or whether they actually give the right kind of insight into the model's behavior. So these models lack intrinsic interpretability and systematicity, which means they have unmapped capabilities, and that means these models can lead to unintended consequences and unknown failure modes. As an example of some of the reasoning flaws experienced by these transformer-based language models, here we have a study that was done on a BERT classifier, BERT being the basis for a lot of natural language inference transformer models. It's very capable at performing inference, but basically it's a statistical learner, so it's prone to adopting shallow heuristics, or so-called shortcut features. In these examples, the shortcut feature, or shallow heuristic, that BERT adopts is to say: if it sees part of one sentence inside another sentence, then it readily assumes that the one entails the other. Let's look at some examples of this. We have three different types of word overlap, starting with constituent overlap: we have a sentence starting with "if the artist slept", and BERT immediately says that implies the artist slept. But of course this is a logical error, because the precondition is not necessarily fulfilled. Then for subsequence overlap we have another example: "the doctor near the actor danced", and BERT concludes that it's the actor that danced, which again is incorrect. BERT gets confused here just because the hypothesis appears verbatim inside the premise, even though "the actor" is not the subject of "danced", and it draws the incorrect logical conclusion. And similarly with lexical overlap: here we just have a disconnected subsequence that appears in the consequent of the implication, and BERT again makes the mistake, this time induced by a flip between active and passive voice, simply because the two sentences share the same words. So how can neurosymbolism help us with some of these issues? If we look at something like GPT-3, it has actually trained on three to four orders of magnitude more data than a human will ever read or hear, yet humans achieve better language understanding. So advancing grounded language learning is essential to approaching human language-acquisition efficiency. Furthermore, a neurosymbolic approach uses logical rules, and these logical rules are either acquired or utilized during inference.
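To see how shallow that shortcut is, here is a few-line Python stand-in for the heuristic that this kind of study probes for, not BERT itself, just the word-overlap rule it appears to fall back on:

```python
def overlap_heuristic(premise: str, hypothesis: str) -> bool:
    """Declare 'entailment' whenever every hypothesis word appears in the
    premise; this is the shallow shortcut, not a sound inference rule."""
    premise_words = set(premise.lower().split())
    return all(w in premise_words for w in hypothesis.lower().split())

# the heuristic happily 'entails' conclusions that do not actually follow:
print(overlap_heuristic("the doctor near the actor danced",
                        "the actor danced"))       # True, but wrong
print(overlap_heuristic("if the artist slept the critic left",
                        "the artist slept"))       # True, but wrong
```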
And these are inspectable and auditable rules, so you get some insight into the model that you perform inference with. This means that you can map the behavior and capabilities of the system to a larger extent, and the system's behavior can be anticipated more reliably. Here we have an example, previously shown by Ryan and Naweed, where we apply logical neural networks to the natural language inference task of question answering. This is a neurosymbolic approach based on the mode of reasoning, or understanding, called referentialism. If you recall, referentialism is where we link things that we see in the text, on the surface form, to things that exist in a knowledge base; this is the act of grounding. We take the entities and relations that we see in the text, in some abstract representation of the text, and we map those to corresponding entities and relations in a knowledge base. That very act gains a lot of information about what we're dealing with, and it gives us the ability to perform proper logical reasoning in order to determine the answer to the given query. Looking a bit more closely, we have the pipeline on the left-hand side, and we start with the natural language input to the pipeline, which is the question. Here the question is: which actors starred in Spanish movies produced by Benicio del Toro? The first step is to convert that into an abstract meaning representation, an AMR, as shown in the example. In the second step, we take the entities and relations that exist inside this AMR and link them to our external knowledge base; in this case we used DBpedia, but you can also use Wikidata for most of the popular QA benchmark datasets. After we have grounded these pieces of the AMR, we have a grounded AMR representation. The next step is to take this grounded AMR representation and convert it into a logical form; in this case, a first-order logical form for our question. This becomes the input to the LNN: we express this logic as an LNN, as a neural network, and then we permit the LNN to perform the reasoning in order to obtain the answer. This is proper logical reasoning whilst making reference to all of the knowledge in the knowledge base that's pertinent to this scope; the LNN gets to traverse the knowledge base through a reasoning approach in order to answer the question. And then we get out the answer. So how does this system perform? We call this system NSQA. In the bottom line here, we can see the performance, and this is for two quite difficult QA datasets. If we compare our method against the state-of-the-art QA systems on these benchmark datasets, we see, for an F1 accuracy score, a higher score being better, that we outperform the state of the art, in some cases with statistical significance. The hypothesis here is that because we symbolize everything in our pipeline as early as possible, we get the best opportunity, the best chance, to perform sound reasoning in order to obtain the answer, and this produces better answers, which could contribute to the better performance of our system. So this is a quick overview of this specific case, and the paper is linked here.
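Structurally, the pipeline reads as four stages. Here is a heavily stubbed Python sketch of that shape, using the simpler Einstein question from earlier: every function body and identifier here is a hypothetical stand-in, not the NSQA code (the real components are an AMR parser, an entity/relation linker, a logic translator, and the LNN reasoner):

```python
def parse_to_amr(question: str) -> dict:
    # stand-in for the AMR parser; a toy graph for a simple question
    return {"predicate": "bear-02", "person": "Albert Einstein",
            "focus": "location"}

def ground(amr: dict) -> dict:
    # stand-in for linking AMR nodes to DBpedia-style entities and relations
    return {"person": "dbr:Albert_Einstein", "predicate": "dbo:birthPlace"}

def to_logic(grounded: dict) -> str:
    # stand-in translation of the grounded AMR into a first-order query
    return f"exists x . {grounded['predicate']}({grounded['person']}, x)"

def lnn_answer(logical_form: str) -> str:
    # stand-in for LNN inference over the knowledge base
    kb = {("dbr:Albert_Einstein", "dbo:birthPlace"): "dbr:Ulm"}
    return kb[("dbr:Albert_Einstein", "dbo:birthPlace")]

question = "Where was Albert Einstein born?"
print(lnn_answer(to_logic(ground(parse_to_amr(question)))))  # dbr:Ulm
```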
Then, in closing, answering the question of what is next in natural language understanding: we firstly recognize that the growth in model resource requirements will still far exceed generational hardware improvements. That means we're gonna have to keep working at these models; we're gonna have to keep introducing further useful priors into these models in order to improve model performance. Then we're also gonna see a trend of the use of deep learning and black-box models for high-stakes decisions being scrutinized more and more. We're also gonna see a greater emphasis placed on intrinsic interpretability, explainability, and causality in these future language models, and, to that end, greater adoption of experimental protocols that examine the internal dynamics, the inner workings, of these models. We're gonna see the deep learning community turn to neurosymbolic approaches, integrating them more and more in order to retain all of the benefits of deep learning but also gain the intrinsic interpretability and systematicity of a neurosymbolic approach. And we're gonna see powerful neurosymbolic representations like LNN being adopted more and more in these future, desirable language-model approaches. And with that, I'm gonna conclude and hand over to Ndivhuwo. Thank you. All right, thanks, Francois, for the great talk. I hope that has helped clarify the advantages of neurosymbolic AI and LNN in language understanding tasks. So that concludes this session on advances in learning and reasoning. I'd like to thank our speakers for their amazing presentations. I hope we've managed to get you excited about our journey to neurosymbolic AI, and that the application examples have helped make the material relatable. I'd like to invite all our speakers back to the stage for the final discussion; we have Francois and Naweed, and I don't know if Ryan is joining. All right. I will pick three questions from the audience, one for each speaker. I'm not sure if we have enough time for all the questions, so I'll read all three questions and you can answer once I've read them all. Starting with Ryan, we have a question from Hope. The question is: ultimately, what is the general problem or concept that LNN is based on or trying to solve? Moving on to Naweed: what problems can a young researcher start working on in LNNs? And then Francois: can you give us a concrete example of a problem that is generally tackled with traditional neural networks, and contrast how LNN would solve it better? I do feel like your presentation already touched on that; this question was asked earlier, before you had a chance to speak. Ryan, your question. Sure, okay. So, what is the general problem solved by LNN? I think that LNN has a lot of different fronts, a lot of different things that it's trying to do. If we boil that question down to something like: what sorts of problems, business tasks for instance, might LNN be well suited for? LNN is very good at prediction problems, in the same sense that neural nets are very good at prediction problems. They can be trained in the same way that neural nets are, in the sense that I have a training data set with different examples of observed facts and particular ground-truth answers for what the prediction should have been.
So given thousands of such examples, I can do the gradient-based optimization in order to make those predictions accurately, though of course with the distinction that LNNs are seeking to also maintain logical consistency throughout. LNNs are more flexible than that, though. They can be used in a setting that is not very typical of neural networks, where, for instance, you have multiple different things that you are trying to predict at different times. I'm still phrasing that as a prediction problem, but it's a predicting-different-things-at-different-times kind of problem. As a point of comparison, the various language models like BERT and GPT and so forth are also just prediction problems; they're attempting to predict the next token in a sequence. So you could map that kind of problem into LNN as well. But I would stress, it's meant to be a very general model. It's almost at a paradigm level, in the sense of: what can you use neural nets to solve? And the answer is almost anything. Yeah, with the added benefits of interpretability and generalization. Yes, yes. Yeah, you can incorporate domain knowledge much more easily. Yes, definitely. So I guess, to get to that aspect of the question of why neural nets and not some other method, the concrete things that we can say are: it's trivial to incorporate domain knowledge, because you just directly represent the formulae expressing domain knowledge as extensions to the neural net. And another thing, too, which I didn't stress in my talk, is the modularizability. I can take a portion of a logical neural network and reuse it elsewhere. I can combine logical neural networks that are trained on different tasks but in the same setting, because the truths of the different formulae learned should be compatible with each other. Combining two networks into a larger one and then retraining might constitute a good method of building up a large network from smaller, independently trained components, for instance. So again, there are lots of desirable reasons to investigate LNNs. All right, thanks, Ryan. Naweed, what problems can a young researcher start working on in LNNs? It's actually a really interesting question, because Ryan just touched on so many of those topics right now. The one thing that stands out for me, when you look at new or young researchers coming into the space, is that some researchers don't have a lot of experience with logic, or they don't have a lot of experience with deep neural networks, but they may come from different environments: a physics background, an applied math background, many different areas that are only now starting to look into how to apply neural networks in those fields. So you may come from a different environment, and the question is, when you're looking at neural networks being applied in those spaces, LNN becomes applicable where you don't just have data, you don't just have neural networks trying to do pattern matching on that data, but you actually have some kind of domain knowledge as well that you can enforce with it. And it's truly this interaction between the data and the knowledge that you can reason about that provides some of the gains that we might see as young researchers, because new fields may not have as much data as existing fields.
So, for example, if you're working in computer vision, there's a lot of data associated with the field; but if you're coming from a physics background, it's more likely that you'll have knowledge that you can represent than a whole lot of clean data that you can actually work with. And this is where the LNN can shine in these kinds of environments, because the LNN can fundamentally operate as a zero-shot learning process: a rule is something you can reason with without even having any facts associated with the system. But as a little bit of data starts to come in, that's where this interaction between the rules and the facts starts to shine. So you can see how coming from a different environment means less data, which makes deep-learning-based systems less viable, because for deep learning you need a lot of data to train, whereas with the LNN you can simply start from the rules themselves. It's this interplay between the knowledge and the data that you have that might shine in those ways. All right, thanks, Naweed. Francois, maybe you can just re-emphasize some of the examples you showed in your presentation. Yeah, the most prominent example that I can think of, a concrete example of where LNN does better than normal neural networks, is knowledge base completion, or link prediction. We showed a couple of papers in the list of things that we've done with LNN before. The reason this is, for me, a very strong example is because you do have the use of neural networks in combination with embeddings, which is a very popular approach for link prediction: they would use neural networks in the way of learning a knowledge base, essentially as a neural network operation, so that they can go from one embedding to another embedding given a certain relation. And LNN outperforms some of those methods. We have a very strong hypothesis for why that is the case, and it's simply because LNN has the right kind of inductive bias for rule induction. We place the template in exactly the form that you can audit after you have learned everything that you need to learn, and you also have the added capability of a parameterization of that model, where you can learn nuance if that is necessary. So this is, for me, a very prominent example of where LNN does all of the right things in order to outperform neural networks. Additionally, with this right kind of inductive bias that LNN has, you can directly leverage the knowledge base as it is; you don't have to start approximating it with a neural network. That means LNN can leverage human-authored data directly, and we have the right kind of inductive bias for reasoning the way that we wish to interpret it. And that means we go beyond what neural networks can do, in terms of their lack of concise interpretability. So that's the answer. All right, thanks, Francois. Yeah, so we've come to the end of the learning and reasoning session. I hope you enjoyed the talks and the discussion at the end, and I hope to see more participation in this journey going forward. I would like to invite John to close. Thank you, colleagues, for this excellent session, and also all of you for bearing with us during this last part of the What's Next in AI session. I'd like to thank you all for being with us during this particular round of the lecture series, and I'm looking forward to having you when we run it next.
On behalf of my colleagues at IBM Research, we'd like to thank you all for being here, and just to remind you that you can learn a lot more on the topic of AI through the various resources that we at IBM make available to our academic partners. Through the IBM Academic Initiative program, IBM provides a lot of learning resources as well as software and access to IBM Cloud, all available free of charge. Just log on to ibm.com/academic, and through that you can access the Academic Initiative and many other resources from IBM for your learning on artificial intelligence. On behalf of my colleagues at IBM, thank you so much for being with us during this two-part series, and we look forward to hosting you for the coming quantum hackathon that my colleague told you about. The links will be available in the chat; please register for those, and we look forward to hosting you once again. Thank you very much, and goodbye for now.