Welcome, everyone, to another meeting of this integrative seminar. We are getting quite integrated, I guess, so that is very good. It is my pleasure to introduce Leo Wanner today. A colleague was just looking into Leo's biography: Leo got his PhD from the University of Saarland relatively late, in 1997, so it seems he had a lot of fun before the PhD, and he has been here in Barcelona at Pompeu Fabra since 2005, as the founder and director of the natural language processing group. His focus is normally more on natural language generation, but he will talk about a number of issues across natural language processing today. So, without further ado: Leo, thank you.

OK, thank you very much for the introduction, Hector. Hello, everybody, and thanks for coming. The title might look a little pretentious, but what it is supposed to suggest is that natural language processing is nowadays much more popular than it used to be in former times. You will agree that I could not put such a prosaic statement as a title, so I looked for a fancier one. What I would like to do today is present the research activities of our group, then very briefly sketch how our activities fit into larger applications, and finally conclude by stating what natural language processing is, in general, good for.

Not all of you may be very familiar with natural language processing, so what is it about? At a very general level, it is a very simple thing: what we are doing is looking for the definition of a function F which maps an element of a domain A to an element, or a subset of elements, of a codomain B. So far so good, but obviously things get a little more complicated once we look into A and into B. What can A be? It can be an audio signal, if we are talking about speech. It can be a text, or a whole pile of texts. And it can be a formal representation: a more linguistically oriented formal representation, or an ontological representation, a very abstract representation which is very popular nowadays in Semantic Web publications. If we look at B, it can again be speech, a text, a formal representation (linguistically oriented or ontological), but it can also be something really very different. It can be, for example, just two classes, male/female, if we are trying to identify the gender of the author of a text. Or minus/plus, if we are looking for the overall sentiment of a text, that is, whether it expresses a positive or a negative attitude. Or it can be whatever else we can think of: a sequence of tags, a set of classes.

Let us now group this a little and assume that we have natural language as input and a formal representation as output. If the input is speech and the output is a sequence of prosodic contours, then what we are doing is prosody analysis. If we have a text, or a pile of texts, and we get out a linguistic representation, a syntactic structure, say, then we are doing parsing; and we are doing natural language understanding if the result is an ontological representation. If we get out a sequence of morphological tags, then obviously we are doing morphological tagging, or morphological analysis. And if the classes or tags look different, then we are probably doing lexical analysis. If our output is a sequence of, let us say, emotion tags, then we are doing sentiment analysis. If it is minus or plus, or the like, then maybe we are trying to detect a sarcastic or satirical piece. And male or female, we already had this: that is about author gender recognition.

Now, if we have natural language as input and natural language as output as well, then what we are doing is document summarization or text summarization (possibly multi-document, if we look at those piles of texts), paraphrasing or simplification of texts, and also machine translation: natural language in, natural language out. If we have speech on both sides, then we are doing, for example, speech-to-speech translation. Obviously, we can also imagine speech as input and text as output, if we are doing speech transcription. And if we have a formal representation as input and natural language as output, then what we do is multilingual, or monolingual, text generation from a formal representation; again, this representation can be closer to the language surface or further away from it.
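To make this "NLP as a function F from A to B" view a little more concrete, here is a purely illustrative Python sketch. The type names and the three task signatures are my own shorthand for the categories just listed, not any real API:

    # Purely illustrative sketch of the "NLP as a function F: A -> B" view.
    # All type names and task signatures are invented for this example.
    from typing import Callable, List, Union

    AudioSignal = bytes          # speech input
    Text = str                   # a single text
    Corpus = List[str]           # a whole pile of texts
    FormalRepr = dict            # linguistic or ontological representation

    Input = Union[AudioSignal, Text, Corpus, FormalRepr]
    Output = Union[Text, FormalRepr, str]   # str also covers class labels

    # Each NLP task is one instantiation of F with a fixed domain and codomain:
    Parser = Callable[[Text], FormalRepr]        # text -> syntactic structure
    SentimentAnalyzer = Callable[[Text], str]    # text -> "+" or "-"
    Generator = Callable[[FormalRepr], Text]     # formal repr -> text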
So which of all this are we doing? Basically everything, or at least quite a lot. The research topics of the group can be grouped into two broad areas. The first is natural language analysis: prosody analysis; parsing; understanding of written and spoken discourse; and lexical analysis, where we also look into the second-language-learning context, as we will see later. A further important aspect is the meta-analysis of written discourse, where several members of the group work on sentiment analysis, figurative language recognition, and author profiling. The second area is natural language production, the more traditional area of the group, as Hector mentioned. There we do content-to-text and content-to-speech generation and, more recently, speech synchronization with mimics, that is, with the lip movements, as we will see; then text-to-text generation, that is, summarization, paraphrasing, and simplification; and prosody translation, again a more recent field we are working in.

Apart from our research topics, we have a couple of application areas. The first and oldest one is patent material processing: for ten years or so we have been working on summarization of patents, analysis of patents, and the representation and visualization of their content, collaborating with a number of companies, with the European Patent Office, and with other institutions. The second application area is text simplification; this is mainly Horacio's area, and there we are developing portals that offer text simplification to end users. And the last one is language learning, where we focus on what we call collocation checkers for learners; in a couple of minutes we will see what we mean by collocation and by a collocation checker.
Now, in order to get a handle on this function F that I mentioned at the beginning, and on all these different topics, it is very useful to look at the level of processing within each topic. We have three different levels: the sentence or utterance, then the text or document, and then the text or document collection. Each of these levels obviously has a representation model and a processing model associated with it. At the sentence level, we work with a dependency language model. At the text level, we have a lexical distribution model and a discourse structure model; we will see this in a minute. And at the collection level it is, again, basically lexical distribution. In my talk, I will focus mainly on the first two of these models.

A little more detail on the representation model, then, before we look into how we do the processing. This is our sentence representation model, and as you see, we have quite a few layers. The first layer is surface phonology, which is basically speech. Then deep phonology, which holds all the features we need for prosody analysis and prosody generation. Then we have topology and morphology; the latter is actually two sub-layers, surface morphology and deep morphology, but that is not important at this stage. You see that here we have a text in which "John" is identified as a proper noun, "thinks" as an inflected verb, et cetera. Then we have surface syntax, a surface-oriented syntactic structure; we know these labels, subject, direct object, and so on, from school, a long time ago, and here we recycle those notions. At the deep-syntactic level we already have a more abstract representation, but it is still syntactic. From there we go down towards semantics and deep semantics, and below that is the ontological representation of the sentence. This modeling in itself is already a research topic. On the slides I show pictures of the people, sitting over there, who are mainly involved in the respective research topics, and from time to time I mention some reference publications; I did not have the space to spell them out, but if you are interested, please let me know afterwards and I will give you the complete reference and the publication itself. It is Simon Mille and Alicia Burga who deal with this part, and Mónica Domínguez who takes care of the phonological layers.

So now we have the descriptions of the layers, but we are still some way from our function F. What we basically do is establish correspondences between the different layers of the representation at the sentence level. That is, if we do, for example, prosody analysis, the whole story is to find an accurate mapping between one representation and what we want to get at the next. If we are doing parsing, we have a written sentence, say "John thinks that his professor has been bewitched", and we go down either to this layer or to that one. We can do it step by step, from one layer to the adjacent layer, or we can skip a couple of layers and go directly from the surface to, for example, deep syntax; that depends on our approach. We can also stay at the same layer: if we are doing paraphrasing, for example, there is no need to go from one layer to another; what we do there is find a mapping between two equivalent, or nearly equivalent, representations. And this is the whole story.
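As a toy illustration of what such layered representations can look like as data structures, here is a minimal sketch. The classes, labels, and the two example layers are invented for illustration; they are not the group's actual formalism:

    # Minimal sketch of a multi-layer sentence representation as simple graphs.
    # Layer names follow the talk; the classes themselves are invented.
    from dataclasses import dataclass, field
    from typing import Dict, List, Tuple

    @dataclass
    class LayerGraph:
        layer: str                                            # e.g. "surface-syntax"
        nodes: Dict[int, str] = field(default_factory=dict)   # node id -> label
        edges: List[Tuple[int, str, int]] = field(default_factory=list)  # (head, relation, dependent)

    # "John thinks that his professor has been bewitched" at two layers:
    ssynt = LayerGraph("surface-syntax",
                       nodes={1: "John", 2: "thinks", 3: "bewitched"},
                       edges=[(2, "subject", 1), (2, "clausal-object", 3)])
    dsynt = LayerGraph("deep-syntax",
                       nodes={1: "John", 2: "think", 3: "bewitch"},
                       edges=[(2, "I", 1), (2, "II", 3)])  # abstract argument relations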
In processing terms, what this means is that we do classification, and we do graph transduction; these two instruments cover the whole processing at the sentence level. Why classification? If we go back and take this layer here, wanting to get to the next one, what we do is classify the subject relation in terms of the relations that a subject can be mapped to at that layer; likewise, we take the direct object and classify how it translates into that layer. Graph transduction is even more straightforward: we develop graph grammars which are capable of mapping any representation at one layer to a correct representation at the next adjacent layer. And that is the whole story: classification and graph transduction.

Let me now very briefly go through the different research topics at the sentence level and explain, more or less, how we do things. First, prosody analysis. The goal of prosody analysis is to derive from the acoustic signal what we call prosodic phrases, prosodic words, and prosodic contours, which is what you see here. I will not go into the details of what the individual symbols mean; that is not important here. What is important, and what is novel in our research, is that in order to derive the prosodic contour (how we speak, how expressive our speech is) we draw upon acoustic features, such as intensity and pitch, and, at the same time, linguistic features. In the current state of the art, people focus either only on acoustic features or take only some of the linguistic features into account, first of all syntax and some lexical features. And this hierarchical representation of the prosodic structure allows us to be much more accurate. For this, we use supervised machine learning; for the time being we have experimented with several classifiers from the Weka environment, and let us see whether we can talk to the machine learning people here at the department to get something more advanced and better. The members of the group involved in this research are Mónica Domínguez (this is her PhD thesis), Mireia Farrús, and Alicia Burga, and the reference publications are two papers at Speech Prosody last year.

Prosody generation is just the other way around. We start from a syntactic structure and add what we call communicative structure, that is, information structure: theme and rheme, what we are talking about and what we are saying about it, what we want to emphasize, et cetera. This comes from deeper levels. We then map this onto prosodic contours and acoustic features. As I said, at this layer we have only linguistic features available, and again it is the same story: we use classification for this.
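Here is a toy version of this supervised set-up: predicting a prosodic label from combined acoustic and linguistic features. The talk mentions Weka classifiers; for brevity I sketch it with scikit-learn, and all feature names, values, and labels are invented:

    # Toy sketch: classify a prosodic label from mixed acoustic + linguistic
    # features. Data and labels are made up; the real work uses annotated corpora.
    from sklearn.svm import SVC

    # each row: [mean_pitch_hz, intensity_db, is_sentence_final, is_content_word]
    X = [[210.0, 62.0, 1, 1],
         [180.0, 55.0, 0, 0],
         [240.0, 70.0, 1, 1],
         [175.0, 54.0, 0, 1]]
    y = ["rising-contour", "flat", "rising-contour", "flat"]

    clf = SVC(kernel="rbf").fit(X, y)
    print(clf.predict([[220.0, 65.0, 1, 1]]))  # most likely "rising-contour"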
Now, something we took up quite recently. Since we are working together with Josep's group and with Xavier Binefa's group in a project, we realized that the way avatars, or agents, move their lips is rather unnatural if you look closely at it; in other words, speech and lip movements are not synchronized well, and obviously this has a lot to do with phonetics. What we do now is develop a system which takes prosody parameters and facial action point correlations and synchronizes them better. It is Joan Pere Sanchez who is doing this, in collaboration, as I said, with Josep's people and, above all, with Federico.

OK, now syntactic parsing. If we take a sentence and get to this structure, with subject, object, et cetera, then we are doing surface-syntactic parsing. Again, we can either go through the layers one by one or, as people usually do, go straight to surface syntax in one shot, skipping layers; this turns out to give better results. We usually use a transition-based parsing strategy: we start from the left, at the first word of the sentence, and go through the sentence word by word. This is in contrast to the other main approach to parsing, where a syntactic tree is built starting from the root, top down. The advantage of a transition-based parser is that it is linear in time, that is, much faster than a graph-based parser. The trick, or the challenge, is to find an accurate model for establishing the relations between tokens, and obviously for this we need relatively large training corpora annotated with those structures. To learn this model (how, when, and which relation to assign) we use SVMs and, very recently, Stack-LSTMs, a variant of recurrent neural networks, which work really very well. This is basically Miguel Ballesteros' work; he has published it together with several colleagues from CMU, where he is now. And this is the best existing parser: if you need a syntactic parser for analyzing your language, this is the parser you should use; it is the best you can get nowadays in the research community.

If we go further, to this more abstract structure, where we already have fewer nodes, then what we do is combine the surface parser with a deep parser, using a so-called hypernode approach; a hypernode is basically a node which consists of several nodes. Why, put simply? Because the standard statistical parsers rely upon isomorphy between structures, and here we see that we cannot rely on that: there is no isomorphy. To provide a solution to this problem, we pack several nodes into one hypernode, and then we simulate isomorphy, so to speak. Here Miguel is also one of the main figures, together with Simon. We have published this work at several first-rate venues, and again, this is the only parser of this kind you can get, one that goes deep while staying syntactic.

From parsing, it is only one step to language understanding: what we do here is go from deep syntax down to an ontological representation. In between we have semantics, which is a standard predicate-argument structure, nothing else, and deep semantics, where the linguists among you will recognize semantic roles used in standard resources; "FR" here, for example, stands for FrameNet. We use standard lexical resources like WordNet, FrameNet, and BabelNet in order to map first to this layer and then to that one. What we do is graph transduction. We draw on these existing resources because they have turned out to be very, very useful, also because, for example, BabelNet allows us to be multilingual, to get the connection between concepts of different languages. Here Simon is involved, Gerard Casamayor, and Stamatia, who is responsible, first of all, for the ontological part.
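A deliberately tiny, hand-rolled illustration of the graph-transduction idea: one rule relabels the surface "subject" relation as the deep argument relation "I". Real transduction grammars contain many such rules, conditions, and node operations; the rule table here is invented:

    # Minimal graph-transduction sketch between two adjacent layers:
    # rules map relation labels of one layer to labels of the next.
    def transduce(edges, rules):
        """edges: list of (head, relation, dependent); rules: {old_rel: new_rel}."""
        return [(h, rules.get(rel, rel), d) for (h, rel, d) in edges]

    surface_edges = [("think", "subject", "John"),
                     ("think", "clausal-object", "bewitch")]
    rules = {"subject": "I", "clausal-object": "II"}
    print(transduce(surface_edges, rules))
    # [('think', 'I', 'John'), ('think', 'II', 'bewitch')]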
OK, sentence generation; it always goes one way and then the other way. Generation is basically just the reverse. Here we have used SVMs so far, and we are working on a deep learning approach as well, using this hypernode strategy. If you want to look into this work, there are, again, some reference publications. For the step from the ontology to deep syntax, we use graph-transduction grammars.

Text simplification and text paraphrasing: so far we went from the top to the bottom and from the bottom to the top; now we move, let us say, horizontally. What is it about? Imagine "John thinks that his professor has been bewitched", our example. We know from studies (Horacio and his team know) that passive constructions are more difficult to read for people with reading disabilities than active constructions. This means that we first need to analyze our sentence to a certain extent, and then, once we are, for example, at the surface-syntactic layer, we can paraphrase it into a simpler form, using, again, graph transduction: "John thinks that somebody bewitched his professor." Then we can regenerate, using the standard generation grammars, to get the simplified sentence. But syntactic constructions are only half of the story, because there are also complex words. Our example does not contain any, but in principle it is known that complex, let us say literary, words should be avoided when texts are simplified. For this purpose, we use corpus-based complex word detection and then substitution with simpler words; a small sketch of this idea follows below. As I already mentioned, it is Horacio and his team working on this topic, and quite a few members of the group are involved in this task; we have, again, quite a few publications on this if anyone is interested.
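A rough sketch of the corpus-based complex-word substitution idea just described: words that are rare in a frequency list count as complex and, where a simpler synonym is known, get replaced. The frequency list, threshold, and synonym lexicon are all made up:

    # Toy corpus-based lexical simplification: rare words are "complex";
    # replace them with a simpler synonym when one is available.
    freq = {"use": 9000, "utilize": 40, "help": 8000, "facilitate": 60}
    simpler = {"utilize": "use", "facilitate": "help"}  # toy synonym lexicon

    def simplify(tokens, threshold=100):
        # unknown or rare words are treated as complex; unchanged if no substitute
        return [simpler.get(t, t) if freq.get(t, 0) < threshold else t
                for t in tokens]

    print(simplify(["we", "utilize", "graphs"]))  # -> ['we', 'use', 'graphs']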
Now let us move from the sentence to the text. You remember that we had the sentence layer, then the text layer; we will skip the text collection layer. I took this piece of text from The New York Times a couple of days ago. If we read this text and understand it, then just by looking at the terms and the relations between them we can grasp what it is about. For example: IBM is a company, and "the company" refers to IBM; "its management workforce" is part of the company, that is, part of IBM; "the company's quarterly reports" also have something to do with IBM, et cetera. That is, if we identify all terms referring to IBM in one way or another, and we get the relations between those terms, then we know what the text is about: we have an understanding of the text.

But we can get even more out of a text. If we look, for example, at "firms", "companies", "IBM": "firms" and "companies" are quasi-synonyms, more or less, and IBM is a company. With "design experts" and "professional designers", again we have synonyms. We have "revenues", and we know that revenues have something to do with profits. And we have fields, such as data analytics, cloud computing, et cetera, and all these fields are also businesses. So if we are able to identify those terms and the relations between them, what we can do is construct, basically, a taxonomy: we can automatically construct a taxonomy and create large lexical resources, which can then be reused in other NLP applications. This is what we do in this research topic: automatic induction of taxonomies for the compilation of lexical resources.

An important first aspect of this work is to identify definitions. For example: "Euphrates was an eminent Stoic philosopher who lived ..." is a definition, while "Euphrates was a native of Tyre and showed great power as an orator" is not. So we need to be able to distinguish between these two statements, which are, as a matter of fact, rather similar. The second topic in this area is to retrieve relations in definitional statements. Here is an example. We have "Henri La Fontaine was a Belgian international lawyer". "A lawyer is a professional who practices law" is another statement we get from somewhere, and "a professional can be any person who carries out a professional activity". Our goal is to deduce the chain: Henri La Fontaine is a Belgian international lawyer; a Belgian international lawyer is an international lawyer; an international lawyer is a lawyer; a lawyer is a professional; and a professional is a person. That is, fully automatically, we get the whole line from the bottom to the top. And obviously, when we do deep analysis, for example, we can then use those resources for understanding. The work goes even further: so far we have only words here, so instead of constructing taxonomies of words, the next step is to construct taxonomies of word senses, and for this purpose the colleagues are using word embeddings, again a strategy from deep learning. This is mainly Luis Espinosa-Anke's work; Francesco Ronzano and Horacio also participate, under the leadership of Horacio.
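A very crude, pattern-based sketch of the two subtasks just mentioned: deciding whether a sentence looks like a definition and, if so, pulling out the hypernym (the genus). The real systems use trained classifiers over richer features; the single regular expression here is only mine, for illustration:

    # Crude "X was/is a(n) Y ..." definition detector and hypernym extractor.
    import re

    IS_A = re.compile(
        r"^(?P<term>[A-Z][\w ]+?) (?:was|is) an? (?P<genus>[\w -]+?)(?: who| which| that|\.)")

    def definition_hypernym(sentence):
        m = IS_A.match(sentence)
        return (m.group("term"), m.group("genus")) if m else None

    print(definition_hypernym("Euphrates was an eminent Stoic philosopher who lived in Rome."))
    # -> ('Euphrates', 'eminent Stoic philosopher')
    print(definition_hypernym("Euphrates showed great power as an orator."))
    # -> None (not recognized as a definition)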
The next step: in the previous topic we focused on taxonomies, on specific tokens, but the interest goes further; we want to capture the concepts of a whole text, that is, to understand what a text is about. For this purpose we use dependency-parsing-driven candidate detection, because not just any noun is a concept. We then use rule-based strategies to identify properties, and then we classify, or basically typify, the identified concepts. Here is an example to make it a little more concrete. If we have "a battery", we want to know that there is a concept called battery. If we have "a nickel-cadmium battery", we need to understand that it is of the type nickel-cadmium, or made of nickel-cadmium; it is not like "a blue battery", which is different. And if we have "a first arm" and "a second arm", then we need to understand (which seems trivial, but in fact is not for the machine) that there are two separate arms: we can do something with one arm and something else with the other. If we have "the bottom of the vessel", we need to understand that this is the bottom part of the object vessel, and that the bottom part is, well, the bottom.

Another thing we need to be able to do is identify different mentions of the same concept, that is, coreference; we need to do coreference resolution. Here is an easier case: "a charging regulator ... the regulator incorporates two switches". We need to understand that it is the same regulator. And here: "the upper part and the bottom part of the regulator alternate with each other". We need to understand that these are two parts, and in the semantic representation of the statement we obviously must not put "each other", which would make no sense; we need to put "the upper part" and "the bottom part". A novelty here is also that we extract open-class relations, whatever relations hold between concepts, instead of focusing on a closed set of relations, as is often done. Who is working on this: Joan Codina, Alicia, and Sergio Cahal.
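A deliberately naive sketch of the nominal coreference case just described: a definite NP ("the regulator") is linked to the most recent earlier mention with the same head noun ("a charging regulator"). Real resolution is far more involved; this only shows the mention-linking principle, and the data is invented:

    # Naive definite-NP coreference linking by head-noun match.
    def link_definite_mentions(mentions):
        """mentions: list of (determiner, head_noun) in text order."""
        links, seen = {}, {}
        for i, (det, head) in enumerate(mentions):
            if det == "the" and head in seen:
                links[i] = seen[head]   # corefers with an earlier mention
            seen[head] = i
        return links

    mentions = [("a", "regulator"), ("two", "switch"), ("the", "regulator")]
    print(link_definite_mentions(mentions))  # {2: 0}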
Let us look one more time at the text we had, and move to another topic. I have highlighted in green a couple of word co-occurrences: "hire experts", "initiative stands out", "revenues are huge", "steadily declining", "fall-off in revenue", "shaved profit", et cetera. At first glance there is nothing special about these co-occurrences: you hire an expert, no problem. But it becomes more interesting if we look at other languages, cross-linguistically. Take "hire an expert": in Spanish you would say "contratar a un experto"; you would not use the direct equivalent of "hire". In German you would say "einstellen"; the Spanish counterpart, "colocar a alguien", has a very negative connotation, so you would not want to translate "einstellen" as "colocar a alguien". And in French you would say "engager", to engage someone, while in English you would not say "engage someone". And so on, all the way down. That is, these co-occurrences, these word combinations, are language-specific. So in order to generate a text from an abstract representation, for example, we need to know what kind of combinations to use. And, not surprisingly, language learners tend to have huge problems with these combinations. Here are two examples from learner corpora, only in Spanish, because we work on Spanish here: "Pasamos por la universidad de Vermont, hicimos un recorreo", which a learner, a native speaker of American English, wrote in an essay; and "el pasado fin de semana tomamos un paseo ... recibimos mucho sol". We can imagine why this is not trivial: "tomamos un paseo" (literally "we took a walk") occurs because in English you say "to take a walk", and learners tend to translate from their native language, or from another L1, into the second language.

So what we want is, first, to be able to recognize these word co-occurrences, what we call collocations, and to compile resources: for example, dictionaries that can be used both by language learners and for automatic processing. And we also want to be able to recognize miscollocations and to correct them.

First, collocation recognition and classification. We need to identify collocations in native corpora, and for this part we use normalized pointwise mutual information, from information theory. We need to normalize it because we have to take into account the asymmetric nature of these co-occurrences; a small sketch of the measure follows at the end of this passage. Then we use supervised machine learning, again support vector machines, our standard vehicle, to classify them. Here you see how the system performs right now with "café": "tomar", "beber", "apurar"; the same color means they are considered to express the same semantics. In the end we want to be able to say that "to take a walk" and "to make a suggestion" share the same semantics of "take" and "make" in these contexts, because "take" and "make" are basically devoid of any concrete semantics here: they just signal an abstract meaning of, let us say, performing something. And we want to be able to say that "heavy storm" and "huge revenue" are also of the same class. Who is involved here: Joan, Sara Rodríguez, and Roberto Carlini. We have also started working with Luis on using word embeddings for example-based retrieval and classification of collocations, which is really very promising: we want to be able to say, find me all collocations that have the same semantics as "heavy storm". So far we do this in two steps, and here we want to do it in one. The first results are very promising and very exciting; we have submitted a publication on this, and I am very happy about this work.

Then, miscollocation recognition and classification. First we need to identify what is wrong: we identify miscollocations using the distance to valid collocations from our reference corpora, and then we classify the miscollocations with respect to a given typology and order them, so as to be able to correct them. Here, again, it is Sara and Roberto who are heavily involved.
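Here is a small, self-contained sketch of scoring a word pair with normalized pointwise mutual information, the measure mentioned above for collocation candidates. The counts are invented; real counts come from large native corpora:

    # Normalized PMI for a candidate collocation; counts are made up.
    import math

    def npmi(pair_count, count_x, count_y, total):
        p_xy = pair_count / total
        p_x, p_y = count_x / total, count_y / total
        pmi = math.log(p_xy / (p_x * p_y))
        return pmi / -math.log(p_xy)   # normalizes the score into [-1, 1]

    # "take" + "walk" co-occurring unusually often in a toy corpus:
    print(npmi(pair_count=80, count_x=5000, count_y=200, total=1_000_000))
    # ~0.46: clearly above chance, a plausible collocation candidate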
Well, another research topic, again with this text. So far we looked at the text as such, but if we are doing generation, text generation, then we need to look behind the text: what comes before the text? In a very abstract way, this is what is behind it: if we think, for example, of DBpedia or other large-scale ontological resources, this is what we have. And in order to generate, say, a short statement on Albert Einstein or whoever, we need to be able to select the appropriate content from this mess. And what else is behind the text? What we call discourse structure. A text is usually not an unordered sequence of unrelated sentences; a text is a text because, ideally at least, there is a red thread running through the whole of it, and we want to identify this structure. When we look at this text: "while its revenues are huge, the company's quarterly report ...", there is a relation between those two statements. And here, "the fall-off in revenue is partly intentional, as the company sold off ...", that is a justification. So if we use this set of different relations, we get a graph, or in the simple case a tree: a discourse tree.

What we want to do is text planning. The first step is to plan what information should be communicated in a text; this obviously depends on the goals of the writer (that is, the machine), on the addressee, on the context in which it is written, et cetera. For this purpose we use graph navigation techniques and distance metrics; we have experimented a lot with reinforcement learning, and also with supervised learning, annotating corpora with these structures. The second step is planning the discourse structure of the text that is to be generated, and for this purpose we use the structure we just saw, following rhetorical structure theory. That is Gerard Casamayor's work; his PhD is exactly on this topic.

Let us now get to the last branch of our activities: meta-analysis of written discourse. Look at this text, another one; I think this one went through the press. I have highlighted some parts which are really striking. This was an article in The Onion, the satirical newspaper, and, curiously enough, not all readers got the message that it is satirical. For example, there was an official from Trinidad who, on his Twitter, welcomed the soccer World Cup in the US, blah, blah, blah: he took it seriously. And the Chinese press is notorious for taking such pieces seriously and republishing the articles, et cetera. So there is a need to tell those people: hey, this is a satirical text, or this is sarcasm. This is about the recognition of figurative language in multilingual social media; here we see a couple of examples which are satirical. This work is being done in Horacio's team, by Francesco Barbieri, Francesco Ronzano, and Horacio himself; again, a couple of reference publications.

Then, author profiling. If we have this text again: who has written it? Was it a man or a woman? How old is the author? What is their professional background, their native language, et cetera? It is Juan Soler who is looking at those questions. Just to give you an example: a couple of studies, and you see here some features, such as the use of quotes, question marks, colons, semicolons, and commas, across languages and across genders. And if we want to get something political out of it, we can compare Spanish and Catalan writers; there are quite a few differences if we look into other features.
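As an illustration of the punctuation-based profiling features just mentioned, here is a toy version: relative frequencies of quotes, question marks, colons, semicolons, and commas, fed to a standard classifier. The texts and labels are invented:

    # Toy author-profiling features: punctuation frequencies per text.
    from sklearn.svm import SVC

    PUNCT = ['"', '?', ':', ';', ',']

    def punct_features(text):
        n = max(len(text), 1)
        return [text.count(p) / n for p in PUNCT]   # relative frequencies

    texts = ['Well, yes: I agree; mostly.', 'Why? Why not? "Really?"']
    labels = ['author-A', 'author-B']
    clf = SVC().fit([punct_features(t) for t in texts], labels)
    print(clf.predict([punct_features('Sure, fine: whatever; ok.')]))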
OK, the integration of this research into broad applications. I am running out of time, so I will go through this very briefly.

One of the projects we are working on is the KRISTINA conversational agent, where we collaborate with GTI, with Josep, and with the computer vision people. What we are doing there is basically everything, or nearly everything, that has been mentioned before: analysis, semantic representation, and then the generation part. The idea is to develop a socially competent conversational agent which speaks Arabic, Polish, and Turkish with migrants. Why Arabic? Because in Spain there are many Arabic-speaking immigrants who do not speak Spanish or Catalan, so they need to be talked to in Arabic. Why Turkish? Because in Germany there are many Turkish migrants, and the elderly among them who now come to residences for the elderly have huge problems conversing with the personnel. And why Polish? Because what the Germans do is hire Polish caregivers for three months, without caring whether they speak the language or not; after three months, having not learned German, they go back home, and so on. So there is, again, a huge social problem in this, and we are trying to provide a solution. GTI works on its domain of expertise, the generation of the avatar, and the computer vision people on facial and body action parameter extraction. Quite a few people are involved in this project.

MULTISENSOR: the goal there is to extract information from the web and summarize it, tuned to the profile of the user; we are involved in content extraction and content summarization. Again, quite a few people. Dr. Inventor is an especially interesting project for us: the goal is to support scientists in their creativity and to assess their research, that is, how good it actually is compared to the state of the art, compared to other publications; here we obviously cover the natural language processing part, and it is Horacio's team working on this. Then there is a project about patents, where, again, we take care of the whole natural language processing chain. And the last one, Able-To-Include, is about simplification and the inclusion of impaired people into, let us say, the whole of normal societal communication, because European legislation states that no person must be at a disadvantage because of their disabilities; legally speaking, every text that is published must be simplified, must be adapted to the needs of impaired people. Here is the processing chain; this project is specifically about simplification, these are the different steps, and again that is Horacio.

NLP in the world: my last slide. NLP is basically central to everything, without being modest. Look at any application where the user is involved, say problem solving or decision support: nowadays, if you want to interact with the user, you ought to do it in natural language; you ought to explain to the user what the issue is, what the solution looks like, et cetera. Look at computing applications more broadly: I see a lot of potential, for example, in what Rafa is doing.
Then natural language for intelligent social agents, of course; information search, where, even if Ricardo would not admit it, I think natural language is a huge asset, and especially question answering; knowledge acquisition, information access, et cetera; forensics; whatever you name, we can discuss how natural language fits into it. OK, so thank you very much, and sorry for running over time. If you have any questions, you are very welcome. Thanks.

Methodology alone is just not good enough. We do work rule-based to a certain extent, but it is very expensive to hire linguists who would write a complete grammar for, let us say, Catalan. A generation grammar, as you saw, spans different layers, so someone would need to sit down and really spell it out, and that is just too expensive. There are systems that are still rule-based, if we think of the famous Systran, and there are companies that are really successful with rule-based machine translation, but those are niches; on a large scale, you cannot do it.

Partially related to that: how do you see what giants like Google are doing, and do models come out of that which are useful for the research community, or is it a totally different, parallel chain?

What Google is doing here is purely machine-learning-based, as you know. We collaborate with people from Google; a former student of mine is now one of the key persons for parsing at Google, so we are collaborating with them, and they publish their research, so we can all profit from it. They are also investing a lot of money into the annotation of corpora, the treebanks we need for training, and this then becomes publicly available, so we do get something out of them.

A follow-up question on that. This move to machine learning, and, as you also raised, the abundance of data now: one of the striking things is the amount of things you can do with natural language, extracting information and so on, without understanding anything. So the question is whether this move to supervised machine learning methods is, in a sense, taking researchers away from the goal of understanding, of getting a semantic representation of a natural language sentence or text, and moving them towards just extracting useful information from texts without necessarily understanding them.

I think it will always be useful, and there will always be an added value, if we understand what the whole story is about. We did not do any experiments with information extraction, but we did some with parsing: we took a parser, looked at where it goes wrong, and, applying our linguistic knowledge, developed a multiple-SVM set-up that takes the linguistic facts into account, and we improved quite a lot. So it is not hopeless for linguists; that is the message. We will always need an understanding of the language.

A question here. You briefly touched on accuracy earlier in one of your answers. How do you actually measure accuracy?

In parsing, we always have a ground truth, a gold standard, and there are very concrete measures.
The unlabeled attachment score tells us in how many cases we attach the relations correctly; the labeled attachment score additionally checks the relation labels; then precision, accuracy, et cetera. So we have a ground truth, and we assess against it.

Yes, but in general you can never test against a ground truth, right? For an arbitrary input.

Obviously, I can only say: I need to select my ground truth adequately, so that it is representative, and then I can only say that this is what my parser, my generator, whatever, is able to perform on that ground truth. It is an approximation. For example, parsing in the KRISTINA project: parsing of spoken language has little to do with the ground truth we have for written language, so there the parser performs much more poorly.

But in the end, in the generic case, it is perceptual, right? If you do not have any ground truth to compare with, you can perhaps parameterize it or give a quality parameter, but then...

Yes, if there is no ground truth, then it is perception, a qualitative assessment.

Hi. I am wondering whether, in your machine learning set-up, you have dimensionality problems, since you extract so much information and have a lot of features; do you apply feature selection methods from machine learning as well?

We do a lot of feature engineering, that is for sure, but we do not experience any problems with feature vectors being too large. There is a lot of feature engineering behind it, as I say: for some tasks, some features are just useless, or they even impede good performance. But dimensionality problems, no.

Another question. You said that you work a lot with support vector machines for classification, and the support vector machine is a binary classifier, so I am wondering whether you have a high computational cost in the training step.

You can actually do multi-class classification with support vector machines; the machine learning people here will answer this better than I can, but yes: per se it is a binary classifier, but you can tune it to handle multiple classes.

Could you please elaborate a little on the text simplification tool that you mentioned? It seemed very interesting and inclusive. How is it being used? Is it available? Does it include the things you mentioned, like the passive constructions and complex word detection?

For this, Horacio would be the more appropriate person to answer, because text simplification is his work; so, another seminar, yes. The tool is being developed: the Spanish tool is already available, the English tool is under development, and the project is still running for another year and a half. The idea is that the software libraries will be available for any developer to integrate, so it will be free software, and all the resources will be available as well.

Thank you. Thanks so much.