Good afternoon, everyone. I'll try to keep it short because we're running a little out of time. I'm presenting a recently funded project, so I don't have results to show you. What I can show you is our ideas, and I can promise you that the technology exists. This is basically what we got money to do in the next two years.

So this is the outline of the talk. I'm going to introduce myself, say what the problem is, go a little into what has been done until now, and then cover what we're going to do over the next year and a half to try to push language learning through computers onwards.

As a self-introduction: I'm a cognitive scientist who came from Portugal about three years ago. I'm at NTU, in the Computational Linguistics Lab, and we do a whole bunch of semantics- and syntax-related work. So, a lot of meaning banks, both treebanks and sense banks. We work with wordnets, with computational grammars, with semantics-based machine translation, and also with multilingual sentiment analysis. But what I'm going to present today is one of my contributions to the multitude of projects we run at NTU, which is computer-assisted language learning. It comes from my passion for learning languages and for trying to get computers to help me learn them, because sometimes humans cannot meet everyone's needs.

If you've been to a university in Singapore, for example, you know that technology-enhanced learning is a very big keyword right now, and you've probably heard about MOOCs, e-learning platforms, and blended-learning classrooms. So it's very much in demand, but there's a problem. Some courses, some content, are easier to implement on these kinds of platforms than others, and language is one of those that is very hard to do. Why? Because language is infinite.
If you do computational linguistics, you know that language is a very interesting, hard problem. But it's also hard because you have to be able to scale, to adapt to students, to have a model of the language, and sometimes even to know what students are trying to mean when they say something and you want to help them. That's where we're trying to help.

I'm going to start with a little background. This already exists. Colleagues of ours at Stanford have been implementing systems that help students in US public schools learn English. This is teaching English to native speakers of English in US schools, and the goal, very broadly, is to help improve writing skills. The way they do it is to present a few sentences of context. This is between grade two and grade six. They have the student read the little passage and then they ask the student a question. Then they grade the answer: if the answer is both grammatical and meaningful, that is, if it answers the question, they give the student a pat on the back and say go on. Otherwise they try to help, saying: something is wrong with your answer, let me see what I can do to help you.

Here is an example: "Abigail didn't want to go hiking with her parents because she felt too tired and wanted to rest instead." Then the question to the student, and this is a fifth-grade example: "Why didn't Abigail want to go hiking?" Then you have a whole bunch of words that they can click and drag. The interface has changed over time, but the idea of the exercise is the same. So, given the words that have been selected for answering this question, what can students write that counts as a correct answer? Well, there are a few possibilities. For example: "She was tired." "She was too tired." "She was too tired to go hiking." Or: "She didn't want to go hiking because she was too tired."
But there are actually a few more possibilities, as you may imagine, and I'm not going to read all of these. If you think about it, there are more than you thought there were just now. And I can keep going, and there are definitely more than you want to enumerate. So enumeration is not going to work for you. Why? As I said three minutes ago, the list is very long because language is infinite. So you have a problem. How do you actually go about this? And now you're thinking: how many more slides of sentences do you have to show us? Only one more. As long as you get the point that this is not all of them, right? I could go on forever and ever.

So how do we deal with this? Okay, we don't want to enumerate them. We know that probabilistic grammars are fun to work with, but if you're teaching language to fifth or sixth graders, you probably don't want to go that way. You cannot deal with 80%-correct grammar on sentences that you want to teach students, right? That's also a problem. And as you can imagine, we need ways to check both the meaning of what the student is trying to say and the grammaticality of the sentence. You have to check that they actually answer the question and that the sentence is a correct English sentence.

So we reuse a handwritten computational grammar to do this. In this case we use the English Resource Grammar, which has been developed at Stanford for many years now, and extend it with mal-rules. And if you're now going "mal what?", I will say it again: mal-rules. It's a very simple idea. You extend the grammar rules to accept something that is mildly ungrammatical, but you keep control of it yourself. In this case, if you look at the English example on the right, "dogs is cute" is an ungrammatical sentence in English, and I could handle it by having a rule that says: I will accept a plural subject with a singular predicate. But I will flag this rule in my list, saying: oh, this is one of the bad ones.
So if students produce a sentence that I had to use this rule for, then I know it's not good. For Chinese, with a very similar sentence, you don't have the same mistakes. You have other types of mistakes, like using the copula shi (是) before an adjective. You wouldn't use that, right? So this is a different kind of mal-rule, but still a mal-rule, or a mal lexical entry, as we call it within our system. Okay?

So here are some simple errors that the English system has actually implemented and is capable of checking for. Again, it's beside the point to go through them, because what I want to convey is the mechanism behind it.

What is an implemented HPSG grammar? It's slightly beyond the scope of this talk, but you have to believe me that they exist, right? They have been handwritten by linguists for decades. In the case of the English grammar, for example, you have very high coverage, maybe around 90% of the whole English Wikipedia. These grammars are basically lexicalized. They have rich lexicons with lots of features, and they use unification to unify lexical entries with rules and with other lexical entries. And at the end, if you get a full sentence, you also get a meaning representation. In this case we use Minimal Recursion Semantics, MRS, which is a very computer-friendly semantic representation. And this representation can actually be merged across languages, so we have used it for semantics-based machine translation. With an MRS you can also generate sentences back, and that is how we generated some of the answers I just showed you. I didn't write them all down.

Now, the extensions you need for these grammars to help you out. First, you need to add the mal-rules that I just told you about.
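The mal-rule idea can be illustrated with a toy sketch. This is not how the English Resource Grammar is written (that is a unification-based HPSG grammar); the lexicon, the `Rule` class, the rule names, and the `analyse` function below are all hypothetical, and the "grammar" only covers three-word subject-verb-adjective sentences. It shows just the core trick: ungrammatical combinations are accepted by dedicated rules that are flagged as bad and carry feedback.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy lexicon: word -> (category, grammatical number).
LEXICON = {
    "dogs": ("noun", "pl"),
    "rex": ("noun", "sg"),
    "are": ("verb", "pl"),
    "is": ("verb", "sg"),
    "cute": ("adj", None),
}


@dataclass(frozen=True)
class Rule:
    name: str
    subj_num: str
    verb_num: str
    is_mal: bool          # True means: accepted, but known to be an error
    feedback: str = ""


RULES = [
    Rule("subj-pred-sg", "sg", "sg", is_mal=False),
    Rule("subj-pred-pl", "pl", "pl", is_mal=False),
    # Mal-rules: licence the ungrammatical combination and remember why it is wrong.
    Rule("mal-pl-subj-sg-pred", "pl", "sg", is_mal=True,
         feedback="Plural subject with a singular verb."),
    Rule("mal-sg-subj-pl-pred", "sg", "pl", is_mal=True,
         feedback="Singular subject with a plural verb."),
]


def analyse(sentence: str) -> Optional[Rule]:
    """Return the rule that licenses the sentence, or None if nothing parses."""
    words = sentence.lower().rstrip(".").split()
    if len(words) != 3 or any(w not in LEXICON for w in words):
        return None
    (c1, n1), (c2, n2), (c3, _) = (LEXICON[w] for w in words)
    if (c1, c2, c3) != ("noun", "verb", "adj"):
        return None
    for rule in RULES:
        if rule.subj_num == n1 and rule.verb_num == n2:
            return rule
    return None
```

With this sketch, `analyse("Dogs are cute")` is licensed by an ordinary rule, while `analyse("Dogs is cute")` is licensed only by a rule whose `is_mal` flag is set, so the system knows both that the sentence is wrong and which feedback message applies.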
So you go into the field, you check what students are doing in their papers, and you say: okay, these are ungrammatical, what are the errors they're making? Let's extend the grammar further, allow these, and remember that they are wrong. And then, unfortunately, we also need to do some reductions, because language is too ambiguous. In this case, for example, we might remove some very ambiguous lexical entries, for example "to flower" as a verb. Fifth graders don't need to know that. So if I can control the lexicon, I also don't need to worry about all the ambiguity that a word can have in English but that a fifth grader wouldn't know. That way I can reduce my search space: there are of course multiple parses for many of these sentences, but I get fewer in this case.

So does this really work? Yes. The example I was just talking about has been deployed in the US. Many, many thousands of kids in the US have been put through these systems, and they get better scores at the end of their school years. This has been made into a startup company, but the core technology is all open source. The grammar is open source, so you can use it for whatever you want, and even the mal-rules are open source, so you can use them for whatever you want as well.

And similar projects have started to appear. For Norwegian, for example, there is such a grammar that lets you do the same thing, and then the Norwegians had the idea: well, if I'm helping other people learn Norwegian, maybe I want to give the feedback in multiple languages. So that's what they did. If you want to learn Norwegian, you can get the feedback in Chinese, Japanese, Portuguese, whatever you want. They give feedback in maybe 20-plus languages. But the problem is that language is still too ambiguous.
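The lexicon reduction mentioned above (dropping readings such as "flower" as a verb for fifth graders) can be sketched roughly as follows. The `Entry` class, the `level` field, the sample entries, and both helper functions are illustrative assumptions, not part of any real grammar; the point is only that removing readings a learner does not know shrinks the number of combinations a parser has to consider.

```python
from dataclasses import dataclass
from math import prod
from typing import List


@dataclass(frozen=True)
class Entry:
    word: str
    pos: str
    level: int  # hypothetical: earliest grade at which a learner knows this reading


# Hypothetical sample of a full lexicon.
FULL_LEXICON = [
    Entry("flower", "noun", 1),
    Entry("flower", "verb", 9),   # too advanced for a fifth grader
    Entry("rest", "noun", 4),
    Entry("rest", "verb", 2),
    Entry("tired", "adj", 1),
]


def restrict(lexicon: List[Entry], max_level: int) -> List[Entry]:
    """Keep only the readings a learner at max_level is expected to know."""
    return [e for e in lexicon if e.level <= max_level]


def lexical_combinations(lexicon: List[Entry], words: List[str]) -> int:
    """Upper bound on lexical ambiguity: product of readings per word.

    Returns 0 if some word has no reading in the lexicon.
    """
    return prod(sum(1 for e in lexicon if e.word == w) for w in words)
```

For the word sequence `["flower", "rest"]`, the full sample lexicon allows four combinations of readings, while `restrict(FULL_LEXICON, 5)` leaves two, because the verb reading of "flower" is gone.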
In the examples you just saw, there are parse-ranking algorithms that say: well, the most likely parse out of the ones I have here is this one, so I'm going to give the student a feedback message based on my guess of what they meant. Because, as it turns out, even ungrammatical sentences are ambiguous. And this is where we come in. This is where our project actually starts.

If you don't believe me that ungrammatical sentences are ambiguous, my question would be: how would you correct the sentence at the top? "The dog like the cat happy." That is an ungrammatical sentence of English. This is the prototype of what we promised to build for our project. And I could argue that you can correct that sentence in many ways, and three of them are up there. The current system would choose the most likely error made by students, and that's fine, as long as you're not the student saying: that's not what I meant, I know how to say that.

So what we promised to do is the following. We use this machine translation technology, which is very high quality but hard to implement, because it's mostly built by hand. You can do some automatic learning of the rules, but what we want is high-quality, teaching-quality precision. So what we promised is that we're going to take this semantics-based machine translation, which we have between English and Japanese, for example, and implement it for Chinese. Then, when an English student or a Chinese student wants to learn the other language and the grammar knows that the sentence is wrong, instead of choosing the most likely error they made, it's going to ask back in the student's own language. This is an example for Chinese native speakers learning English who have just written that sentence.
And the grammar in the system knows that something is wrong with it, and instead of guessing which error it is, it's going to reply in Chinese (you have the glosses underneath), saying: there's something wrong with your sentence; if you can tell me what you meant, I can tell you exactly what your problem is. In this case the student wants option C, for example, and then you can say: okay, if you want option C, you need to conjugate the verb first. For the other options there would be other kinds of syntactic errors. If "like the cat" is a comparative, it's completely different: "The dog, like the cat, is happy" is missing the copula. Missing the copula and not conjugating the verb are two very, very common mistakes from Chinese learners of English.

So how does the system choose right now? It chooses by likelihood, and it gets it right maybe 60-70% of the time. It always corrects something, which may be helpful, but speaking as a language student myself, if you get corrected for something you didn't mean, it's very frustrating. You say: no, I know how to say that, I was trying to say something else.

Okay, so this is basically what I've said. We're going to make this system that exists for English multilingual. We do have a small grammar of Chinese, not as large as the English grammar, and our target audience is university students. Right now we are teaming up with the Chinese teaching department at NTU, and we are taking their courses, the syllabus, the grammar constructions they use, and the vocabulary the students learn, and mining all of that into our grammar and grading it by lesson. So in the first lesson they learn 50 words, and by the tenth lesson they have learned maybe 500 words altogether.
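The contrast between guessing by likelihood and asking the student back can be sketched as below. Everything here is a hypothetical illustration: the `Correction` class, the `guess` and `ask_back` functions, and the `choose` callback (which stands in for showing the student paraphrases in their native language) are assumptions, and in the planned system the candidate corrections and paraphrases would come from the grammar and the machine translation component rather than from a hand-written list.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Correction:
    corrected: str     # the repaired target-language sentence
    error: str         # which mal-rule or construction was involved
    paraphrase: str    # stand-in for the native-language paraphrase shown to the student
    likelihood: float  # score a parse ranker might assign


def guess(candidates: Sequence[Correction]) -> Optional[Correction]:
    """Likelihood-based behaviour: always pick the top-ranked correction."""
    return max(candidates, key=lambda c: c.likelihood) if candidates else None


def ask_back(candidates: Sequence[Correction],
             choose: Callable[[Sequence[str]], int]) -> Optional[Correction]:
    """Ask-back behaviour: only consult the student when the repair is ambiguous.

    `choose` receives the paraphrases and returns the index the student picked.
    """
    if not candidates:
        return None              # no error analysis available
    if len(candidates) == 1:
        return candidates[0]     # a single possible repair, so no question is needed
    picked = choose([c.paraphrase for c in candidates])
    return candidates[picked]    # feedback now matches what the student meant
```

The design point is that `ask_back` never picks between competing repairs on its own: with several candidates it defers to the student, and the chosen `Correction` carries the `error` label that the feedback (and the teacher's report) can be based on.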
With this we're also integrating these lexicons with wordnets, because we have wordnets for all of these languages, and we would like to use the wordnets to help with the machine translation. The pairwise semantics-based machine translation I'm again going to gracefully skip, but I can promise you that it exists. It's not very popular, for multiple reasons: if you want to apply it to very broad domains, you get a headache. But applied to small domains like language teaching, where you know exactly what you want to do, it works much better, because you get all the possible syntactic constructions for a given meaning representation. Okay?

At the same time, the system we build is going to be able to collect a very large learner corpus for Chinese learners at NTU, which is going to be treebanked and sense-tagged at the same time. And there is going to be a blended-learning experiment going on at NTU starting at the end of this year.

The power of semantic disambiguation: this is just a little workflow of how the system would run. If you give it a sentence and the system thinks there's nothing wrong with it, that it looks grammatical, it again just pats you on the back and says move on to the next sentence. But if something is wrong, it starts this disambiguation process. If there's no ambiguity, we just say: okay, the sentence is ungrammatical and there's only one possible way for me to correct it, so we just do it. If there are many ways, we pipe it through the machine translation system and ask the student, in their native language, what they meant. After that we can say: okay, if you meant that, I can trace back the construction I used to give you this translation, so I know exactly what your problem was. The goal of this is to allow systems to not guess anymore. When you want to teach students, you don't want high coverage; you want 100% quality and not so much guessing. We call it
intelligent, again, because it's bilingual: we'll have very rich knowledge about the syntax and the semantics of both languages, and we'll be able to help students correct their language.

This I've already said, but doing this requires knowing the curriculum very well. That is basically the point we are at right now: we know very well what is happening with the Chinese curriculum, and we know all the grammar structures and all the vocabulary used in the early levels of Chinese, so that we can strip the grammar down to that level. And as I also mentioned before, we are serving not only NTU; we also want to make the tool more general, so we also serve, for example, the HSK, the official Chinese examination, and other textbooks that we don't use here. Because, again, the system is going to be open source and web-based. If you say: oh, I'm using that textbook and I've been through lessons one to seven, can you help me? Then yes, the system is going to be customizable that way, by level or by textbook, say this textbook after lesson seven, and the system will know what your Chinese is supposed to be like. The evaluation we're doing against a corpus that we are building ourselves, and as I mentioned before, the system is going to go through a blended-learning experiment at NTU starting at the end of this year.

Future work: it involves a PhD that I'm actually applying for and will definitely meet. So the concept will work; we know it works. We don't know if it scales too well with the language level, because as the language gets more and more complicated the implementation cost becomes more troublesome, so as levels go up you need a little more time to get there. But what I would mostly like to work on is gamification and user modeling for this system. The example I showed you was not very exciting, and we saw
just before me, in a very nice presentation, that games are cool, right? So if you can make this into something where the students actually get points, or compete with each other, or for example team up within one classroom to go against the other classrooms, or NTU against NUS, for example, to see who learns Chinese the quickest, I believe that would work very well in Singapore, and we have reasons to believe that.

Later on, and this is not on the agenda right now, I would also like to turn this into a standalone app. What we are developing, and the only thing we promised to do, is a support system for in-classroom use, so that the teacher knows exactly what is wrong with the student, which constructions they are failing, and which lexical units they have failed. But there is nothing stopping us from making this into a full self-paced app where you don't need a teacher anymore, and in some cases that would be fun.

This is what I just said, so I'm not going to summarize, but my hope is that you thought what I just presented is cool, and I can promise you that it exists. What I have to acknowledge is that the MOE has given us a bit of money for the next two years to show that this is possible, and if it is possible, then more will come. And basically that's it. Thank you for your time.

It's incredibly fascinating work, and thank you for the insight into the digital linguistics behind all of this. I think it would be an excellent idea to do the app side of things, because as far as Chinese and Singapore are concerned, if teaching is simplified to something as simple as downloading an app to get your children to learn by themselves, while it fuels the research that's going on in the back, I think that would be amazing. Singaporeans would love to download this.

Though when you are speaking with language teachers, this is where they draw the line in helping you so much. Right now we need to be able to work with language teachers, and they don't like to feel like the
role doesn't exist anymore, and I do believe that even with an app the language teacher's role is still very important. I've said nothing about speech, because speech recognition, yes, you can use it, but is it going to be great? No. So if you want to learn Chinese without native speakers telling you to say it again, you're going to have a hard time. But for some skills I believe you don't need a person. So yes, to some extent I would like to develop the app, but we're also aware that to get help from the teaching community we need to do things right, at the right pace. There is no replacement for the human, but computers can help.

We have a quick question. Sorry, is there a way to keep up to date with this project?

Yes, you can email me, for example. We don't have anything on GitHub yet, but we will. We're developing a little bit in-house for now, but again, the technology, both the English and the Chinese grammars, is completely open source, and we have made it very clear to MOE that we want it to remain open source. So the project is funded to be open source at the end.

One last question.

Not yet. We're thinking about cool names to market it with, but not yet. It will be open source, yes.

Yeah, last one.

Okay, so this is one of the things I touched on only a little under future work. I talked about scaling up to different levels of Mandarin Chinese, but I also have here that I want to include other languages, languages I speak and have been interested in before, such as Japanese, for which there is also a grammar. So adding Japanese would not be too troublesome. In terms of the work needed to extend it, for the mal-rules and the lexicon, you still have lots of work, but at least there is such a parser for Japanese. If you're talking about a new language, this has actually been discussed as a nice way of making grammar engineering useful at very early stages of implementation, right? If you do grammar engineering, or have tried it, you know that to get a nice
coverage, so that people are actually interested in your work, is a pain, right? Because people say: yes, but I can get some neural network to do the parsing, or some probabilistic grammar to get me what I want, so what is the benefit of starting a handmade computational grammar? This could be one of those benefits. We work in a group called DELPH-IN, where many people use the same technology to develop grammars that look the same, that produce the same semantics, and that can very easily be strung together in these kinds of pipelines. So if you want to do such a thing for a minority language, then join DELPH-IN, for example, start a grammar for that language, and you can get there very soon. Again, the difference in size between the English grammar and the Chinese grammar is huge, but regardless, the Chinese grammar has maybe three years of development and is still sizable enough to do this very cool project. So for Singlish, then, come to NTU. Where are you? We can talk about that.

Singlish would be amazing, though. Thank you again.