So, I swear, I swear I'm the same guy as the picture at the beginning; that's what the years do to you. So, you see, this is an updated version of the tutorial we did at ISMIR, in New York. It's a conference for music information retrieval, so we thought it would be important to have a kind of, let's say, beginner-friendly introduction to NLP, since we didn't know how familiar everyone would be with it. So, a lot of the concepts you'll see here many of you in the audience may already know, but we hope it is still entertaining for you.

Okay, so I'm going to start with a loose definition of what natural language processing is, and break it down into two types of tasks. What we call the core tasks in NLP: any kind of intelligent text processing that you have to do automatically is going to rely on some, if not all, of these core tasks. And then the applications, which are the intelligent things you can do based on those core tasks. Then, as Jose said, since we work a lot on semantics, we rely a lot on knowledge repositories: digital places where you put knowledge, which you can then use to improve artificial intelligence tasks, inference, reasoning, or, in this case, music information retrieval. And at the end, a lot of resources, references, et cetera.

So, let's start with the definition. NLP is a field of computer science and artificial intelligence concerned with the interaction between computers and human natural language. It's believed that Turing's famous "Computing Machinery and Intelligence" was the first proper NLP paper. It stated that a computer could be considered intelligent if it could carry on a conversation with a human being without the human realizing that he or she was actually talking to a machine. From there, NLP evolved a lot. Realistically, when we speak about natural language processing, in more than 90% of the cases we speak about English processing, because it's the language of research, commerce, entertainment, et cetera. But this reality in research and industry doesn't really reflect the reality of the world. In the world there are more than 7,000 languages, and a language, as we all know, is not simply a collection of words and meanings that you put together, sum the meanings, and that's what you have; it's something much more complex. Languages bear social and cultural traces. They are chronologically sensitive, meaning that the Spanish, Catalan or English we speak today has little to do with the language that was spoken a few hundred years back, and it will probably keep changing a lot over the years. These particularities of language are something that automatic systems find difficult to cope with. But this is what makes language beautiful, right? This is why we have figurative language, why we have a lot of fun with language. It's also the reason why language has not been treated with much success in many aspects of artificial intelligence up to now: the meaning is not explicitly stated in the words. In fact, look at this quote from a Wired article in 2013: in the future, the most useful data will be the kind that was too unstructured to be used in the past. So processing language actually has a lot of value, but it has value precisely because it's difficult.
If it were easy, it would be a solved problem, and it's far from being a solved problem today. And still, we use NLP every day even if we don't realize it: it's present in web search, speech recognition and synthesis, automatic summarization, product recommendation (including music recommendation), machine translation, et cetera.

There's a very classic book for those who study NLP, by Jurafsky and Martin, called Speech and Language Processing, and they say that, even if it comes as a surprise to many, when you do some kind of automatic processing of text, at some point you have to resolve an ambiguity of some kind: morphological, lexical, syntactic, discourse, communicative. Ambiguity is the key to why natural language is difficult. And well, considering the audience and the context in which this talk is being given, if you read this sentence I expect most of you to picture this kind of representation in your head, because there are certain intangible characteristics of our communicative context that lead you to that reading. But if you only read the sentence, there is absolutely nothing preventing you from understanding something completely different. The reason for this is what we call lexical ambiguity: we have words that have several meanings, and an intelligent system has to go word by word and say, okay, this word is ambiguous, so what am I going to do with it?

Another important aspect of natural language processing is that it's not one large, uniform task. You don't think, all right, it's time for research, so I'm going to write an NLP paper in general. It doesn't work that way because, as I said, processing language is difficult enough, so we follow a divide-and-conquer approach: you break it down into smaller, more tractable problems, you build datasets and systems for that specific sub-task, and then you can apply them to other tasks. There's another quote I'd like to read, from Collobert and colleagues in 2011, and the reality today is pretty much the same. They said: will a computer program ever be able to convert a piece of English text into a programmer-friendly data structure that describes the meaning of the natural language text? Unfortunately, no consensus has emerged about the form or even the existence of such a data structure. So, if we don't even agree within the community on whether such a data structure, a formal way to represent language as a whole, should exist, it makes sense to keep breaking the problem down into smaller tasks.

So we're going to start with what we would call core natural language processing tasks, and the first one is part-of-speech tagging. Again, this could be called a custom list of very important tasks in NLP selected by Luis. It is no universal truth, but I believe most researchers in the area would agree with me, maybe removing one or adding one; in general it's more or less this kind of list. In part-of-speech tagging, what we do is resolve ambiguity in terms of the grammatical category of each of the words appearing in your text. You can then apply this information to further downstream applications, for example in parsing, where you can know who is the subject, who is performing an action and who is receiving that action, or you can do other things based on this information. So, for example, take the sentence "I like this music; it's like being alive for a second", tagged with an off-the-shelf tagger in the sketch below.
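A minimal sketch of what a part-of-speech tagger outputs, assuming NLTK and its standard tokenizer and tagger data are available (resource names vary slightly across NLTK versions):

```python
# Minimal part-of-speech tagging sketch with NLTK.
# Assumes the tokenizer and tagger data have been downloaded; on newer NLTK
# versions the resources may be named "punkt_tab" / "averaged_perceptron_tagger_eng".
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

sentence = "I like this music; it's like being alive for a second."
tokens = nltk.word_tokenize(sentence)
print(nltk.pos_tag(tokens))
# The two occurrences of "like" should receive different Penn Treebank tags
# (roughly VBP for the verb and IN for the preposition), although an
# off-the-shelf tagger can make mistakes on informal text.
```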
We have two occurrences of the word like, but one of them is a verb and the other one is a preposition. So, by having the part of speech, we know that the first like is different from the second, and for a system that works with statistics this distinction matters. And, as I said, it also helps downstream to work out who is the subject and who is the recipient of an action.

Now we go one level up in the scale of linguistic description, and we're going to speak about syntax. Let's start with a joke, and some people don't find it funny, but it is a joke: "One morning I shot an elephant in my pajamas. How he got into my pajamas, I'll never know." The reason this is funny is that the author is playing with the fact that "in my pajamas" is a prepositional phrase, but on paper you don't really know whether it's modifying "elephant" or "I". There's nothing explicit in the sentence telling you which. You know, because you expect a person to be wearing pajamas, and that's where the joke comes in: he says no, it was the elephant that was wearing the pajamas. This kind of ambiguity is called syntactic ambiguity, and it's resolved with syntactic parsing, which builds syntactic trees where you can then see which words are more important than others: more important words appear at the top of the tree, and lesser words, we could say, appear at the bottom, so prepositions, conjunctions or other types of modifiers.

There are two main paradigms for syntactic parsing in NLP. One of them is constituency parsing. The main idea is that you draw the syntax of a sentence by representing words as nodes, and you also have super-nodes, phrasal nodes, that subsume other words. So here you have a super-node, a verb phrase, that contains a whole noun phrase, like "shot an elephant". Realistically, in natural language processing the tendency nowadays goes more towards dependency parsing, where you essentially get rid of phrasal nodes, so there are no nodes that subsume others as a phrase. Each node is a word, relations are bilexical, so each relation holds between one word and another, and the idea is more or less the same: you have important words at the top of the syntactic tree and modifiers at the bottom. This is extremely difficult to do, from the computational side but also from the linguistic side, so I think it's enough to know this for now.

Then, one step further in parsing is what we call semantic parsing. This is mostly used to treat verbs, and I'm going to give you a fairly well-known example. The idea is that for every verb, and for every sense of that verb, you expect arguments, a kind of slots that have to be filled by the information that appears in the sentence, so this gives you a bit more information than the regular syntax, which in turn gives you more than the morphology. In the sentence "Mary left the room", Mary would be the entity who is leaving, and argument one would be the place that is being left, so this sense of leaving expects two arguments. But in a sentence like "Mary left her daughter her pearls", this sense of leaving has another connotation; here the arguments differ in number, but they are also different in kind: this sense of leaving expects a giver, a thing which is given, and a beneficiary.
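To make the parsing discussion concrete, here is a minimal sketch using spaCy (my choice of tool for illustration; any dependency parser would do), which prints, for each word, its part of speech, its dependency relation and the word it attaches to:

```python
# Minimal dependency-parsing sketch with spaCy. Assumes the package and the
# small English model are installed:
#   pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("One morning I shot an elephant in my pajamas.")

for token in doc:
    # token.head is the word this token attaches to in the dependency tree.
    print(f"{token.text:10} {token.pos_:6} {token.dep_:10} -> {token.head.text}")

# Whether "in" (and therefore "pajamas") attaches to the verb "shot" or to the
# noun "elephant" is exactly the syntactic ambiguity the joke plays on; the
# parser has to commit to one of the two readings.
```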
Then another important task in NLP is named entity recognition. Not everything is covered by a grammatical category; some things are proper nouns. Named entity recognition takes a text as input, goes word by word and identifies the offset, the snippet of text, that refers to an entity. Traditionally these entities were well defined as persons, locations or organizations, and in some cases currencies (dollars, euros, etc.) and dates, and that was it. But this is a very flexible task, so you can reformulate it a little and say, okay, I need named entity recognition in the music domain, and I'll look specifically for bands, music genres, albums or other types of entities that may then be useful for another application in which I want to use this information.

Another example of a very important task, at the discourse level, is coreference resolution. Your algorithm not only has to read the text word by word, knowing the morphology, the syntax and the semantics, and knowing whether there are named entities or not, but it also has to keep track of mentions of entities that have already been mentioned, which is called anaphora. Here, in the sentence "I voted for Nader because he was most aligned with my values," she said, your algorithm has to know that "she", "my" and "I" refer to the same entity, because in the third mention there might appear a very important bit of information that you don't want to miss, and you can't expect every mention of an entity to be explicitly stated. And then "Nader" and "he" refer to another entity. We are speaking about difficult tasks, and this is one of the most difficult ones, I would say.

Then you have word sense disambiguation. This is the case of, say, a metal fan writing "the performance of that bass was outstanding": you have bass in the musical sense and bass in the fish sense. Your word sense disambiguation system should look at a word that is ambiguous and then leverage, exploit, some kind of contextual information, like we do, to decide that this seems to be the music sense. Simply by looking at the sentence, we see "performance" and a few other words that, if we know what they mean, will probably lead us to say: okay, this seems to be a sentence where bass has the music sense and not the fish sense.

Okay, so this should cover most of the core tasks in NLP, and then you can use the information you get from them to build the very nice intelligent applications, like for example summarization. Summarization is basically reducing the size of a document while keeping the most important information. In NLP we work with two types of summaries. Extractive summarization, where we simply select the most important sentences in a document or in a corpus and put them together in another document, after getting rid of redundancy among the things that are important. Abstractive summarization has one additional module for generation: not only do you select the important information, but then, to make everything more readable, you have a language generation module that puts sentences together, resolves coreferences and makes everything more natural and fluent for the reader.
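Before moving on, a minimal sketch of the extractive idea (my own toy illustration, not a tool mentioned in the talk): score each sentence by how frequent its words are in the document and keep the top ones.

```python
# Toy extractive summarizer: rank sentences by the average frequency of the
# words they contain and keep the top-k, in their original order. Real systems
# add redundancy removal, position features, etc.; this is only a sketch.
import re
from collections import Counter

def extractive_summary(text: str, k: int = 2) -> str:
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> float:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    top = sorted(sorted(sentences, key=score, reverse=True)[:k],
                 key=sentences.index)
    return " ".join(top)

document = (
    "The band released a new album this year. "
    "Critics praised the album for its production. "
    "The tour starts next month in Barcelona. "
    "Fans expect the album songs to dominate the setlist."
)
print(extractive_summary(document, k=2))
```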
Summarization also comes as multi-document summarization, where you take a bunch of documents as input and output one single document with the important information across all of them, or single-document summarization, where you have one document from which you derive the summary.

Author profiling is another very interesting task, which is revealing demographic traces behind the writers of a message, also known as digital text forensics: give me a text as input, and my algorithm might say whether the mother tongue of the writer of this English text was English, Spanish, and so on. This is actually a bit of the format of a shared task in author profiling, so this is what is expected from the systems: to also predict the age group of the writer of the text, or whether the writer was male or female. This has clear applications in plagiarism detection and cybersecurity, so industry-wise this is a very nice area to work in, because there is a lot of interest from public institutions.

Machine translation is like the poster child of natural language processing tasks. Sometimes, when you are asked what you do, "you teach computers how to speak and how to understand", you don't know how to explain it; but if you say machine translation, pretty much everybody understands what you are doing. And the task is in fact very simple to state: from a text in language one you have to derive a text in language two, preserving the meaning, and that's it. It was originally approached as a rule-based task, but today statistical approaches have clearly taken over. Still, Apertium is one of the best-known rule-based machine translation systems, so I thought it was worth mentioning it at least. For statistical machine translation you need parallel corpora: a statistical model that looks at a bunch of parallel sentences, parallel texts, and learns a kind of transformation, or translation, or however you want to call it, from language one to language two. Nowadays we have a lot of data, so this is good, but it comes with a lot of challenges. For example, sentence, word or phrase alignment: there are many cases where languages do not follow the same order, so it's difficult to align, say, English and Arabic. Statistical anomalies: in a language there may be things that are very rare but still very important, so the statistical model has to be sensitive to things that matter. Idioms, collocations, compounds, multi-word expressions, also different word orders, or out-of-vocabulary words: what would a statistical machine translation system do with the word "selfie" when it was starting to be used? What do you do with things you are seeing for the first time? You have to account for that.

And finally, sentiment analysis. Sentiment analysis, as its name says, is the computational study of sentiment: given a piece of text, you want to know the polarity of that text, the stance of the author, whether the author is happy or sad or angry or intrigued about a certain topic. Usually this is done on music or product reviews and in social networks, and this is another task with very strong applications in industry, because as a company I'm interested in knowing whether people are speaking well or badly about me in social networks, or as an investor I want to know the probability of a certain stock or company doing well in the stock market, and maybe with information about how it fares with the community I can get a little bit of insight from there.
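A minimal sketch of off-the-shelf polarity scoring, assuming NLTK's VADER lexicon is available (my choice of tool for illustration, not one endorsed in the talk):

```python
# Minimal polarity-scoring sketch with NLTK's VADER, a lexicon/rule-based
# sentiment model for English social-media text. Assumes the vader_lexicon
# data has been downloaded.
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
analyzer = SentimentIntensityAnalyzer()

for review in [
    "This album is absolutely brilliant, I can't stop listening to it.",
    "The mix is muddy and the songs drag on forever.",
]:
    # polarity_scores returns negative/neutral/positive proportions and a
    # compound score in [-1, 1].
    print(review, "->", analyzer.polarity_scores(review))
```

Of course, a lexicon-based score like this knows nothing about context, which is exactly the limitation the next example ("go read the book") illustrates.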
Of course it's very difficult, like everything that has to do with language. "Go read the book" can be an extremely positive review for a book and an extremely negative review for the movie that was based on the book. So, if you think about it, without changing one single character we have two opposite scenarios. What do you do? You have to look at the context, you have to consider the profile of the user; here is where we as practitioners have to be imaginative. And yeah, I could go on and on, but I think that's enough to give you an idea of what, once you break natural language processing down into these tasks, you can expect to do with them.

One of the resources, one of the ways you can improve your natural language processing adventures, or even a line of research on its own, is working with knowledge repositories, or knowledge bases. The nomenclature tends to be fuzzy, since "knowledge base" has been used over time as a term for all kinds of databases and knowledge repositories, but let's just define it as a graph-like data structure where nodes are entities and the edges between them are called relations. Relations can be semantic, they can be ontological, like "Barack Obama" "was president of" "United States", and so on; this is an example that has been used forever. Anyway, KBs are essential because they provide a ground of knowledge that your system can then take advantage of, for example in web search engines, which is the classic case. They can be constructed manually or automatically, and our preference is to design algorithms that construct them automatically, especially because there are fields in science and culture that evolve so fast that it's not feasible to have knowledge engineers updating them manually. If you have an algorithm that is able to read what is going on and update the knowledge base automatically, you're saving a lot of time and probably getting results at least as good as you would have with human engineers.

Types of knowledge bases: as I said, this is a loose, custom definition of knowledge bases, so I included WordNet as one. Those who know WordNet know it is a lexical database, so strictly speaking it's not a knowledge base, but we can fit it under that definition. SNOMED is another, a terminology for the medical domain. Then there are integrative projects, systems whose aim is to put together in one single resource automatically and manually constructed resources. One of the best-known systems, at least to my knowledge, is BabelNet, which was originally a mapping between Wikipedia and WordNet but now includes medical and domain-specific terminologies and other resources: Wikidata, OmegaWiki, Wiktionary, et cetera. Just as a side note, there's a whole line of research in artificial intelligence called Open Information Extraction, this is machine reading: you have a system that goes online and harvests facts, and whenever these facts pass a certain confidence threshold they are incorporated into the knowledge base. NELL from Carnegie Mellon University or PATTY, among others, are some examples. NELL even has an automatic Twitter account that posts facts: "I read this, do you agree with it, should I incorporate it into the knowledge base?", and depending on what people reply to the tweet it decides. The system has been running 24/7 since 2010, and yeah, it's kind of a reference work in Open Information Extraction.
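Just to make the "nodes and relations" definition concrete, a tiny sketch of a knowledge base as a set of triples and a lookup over it (entity and relation names are purely illustrative):

```python
# A knowledge base reduced to its essentials: a set of
# (subject, relation, object) triples plus a simple query over them.
triples = {
    ("Barack Obama", "was_president_of", "United States"),
    ("Barack Obama", "born_in", "Honolulu"),
    ("United States", "has_capital", "Washington, D.C."),
}

def query(subject=None, relation=None, obj=None):
    """Return every triple matching the fields that are not None."""
    return [
        (s, r, o)
        for (s, r, o) in triples
        if (subject is None or s == subject)
        and (relation is None or r == relation)
        and (obj is None or o == obj)
    ]

# Who was Barack Obama president of?
print(query(subject="Barack Obama", relation="was_president_of"))
# Everything we know about the United States as a subject:
print(query(subject="United States"))
```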
And there are music knowledge bases, of course; not many, but there are a few. MusicBrainz and Discogs are two well-known resources; they are open, encyclopedia-like data and, importantly also for the sake of this presentation, they are published as linked data. There's Grove Music Online, also encyclopedic, although in this case the information is more inclined towards scholarly information, and there's also a flamenco music knowledge base that was developed at the MTG.

And I guess that's it. So, up to here, a big picture without too much detail about natural language processing. We've left a lot of links to software tools, libraries, Python references, et cetera. I don't think we should take questions throughout the presentation, so if you have questions, maybe ask before the break or at the end.