Information extraction — that is one of the applications Luis was showing before, and it is a very important application in NLP, so we give it its own section. Information extraction is the task of automatically transforming unstructured information into structured information, in order to make the computer able to understand what is written. For example, take the sentence "Hummingbird was written by Wilco frontman Jeff Tweedy." This is unstructured text, and there are different steps to add structure to it. First, entity identification: you detect the different entities that are mentioned in the text. Hummingbird, Wilco, and Jeff Tweedy are entities, okay? Then entity recognition: which kind of entities are they? Hummingbird is a work of art, Wilco is an organization, Jeff Tweedy is a person. Okay, that is the definition, but there is still a problem. For example, Wilco: if we look for Wilco in Wikipedia, there are different entries — Wilco the band, Wilco the album, Wilco the song. So another step in this recognition of entities is to disambiguate them. This is called entity linking, or entity disambiguation, and the idea is: we identify the entities, we recognize their types, and we also link them to a knowledge base where these entities are uniquely defined. For example, Hummingbird is defined in MusicBrainz and has a unique URI identifying this song by Wilco; Wilco has its page in Wikipedia; and Jeff Tweedy is also in Wikipedia. In this way we fully disambiguate the sentence. This entity linking is the building block of most of the work we are going to present in this tutorial.
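The three steps — identify the spans, type them, link them to unique identifiers — can be sketched as a toy data structure. A minimal sketch only: the entity types and the DBpedia-style URIs below are illustrative, not the output of any real linker.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Entity:
    surface: str         # text span as it appears in the sentence
    etype: str           # type assigned by entity recognition
    uri: Optional[str]   # unique identifier from linking; None if not linked

sentence = "Hummingbird was written by Wilco frontman Jeff Tweedy."

# Step 1 + 2: identification and recognition — which spans, which types.
entities = [
    Entity("Hummingbird", "work_of_art", None),
    Entity("Wilco", "organization", None),
    Entity("Jeff Tweedy", "person", None),
]

# Step 3: linking — attach a unique URI from a knowledge base.
# This toy "knowledge base" deliberately lacks an entry for the song,
# to show that coverage is never complete.
links = {
    "Wilco": "http://dbpedia.org/resource/Wilco",
    "Jeff Tweedy": "http://dbpedia.org/resource/Jeff_Tweedy",
}
for e in entities:
    e.uri = links.get(e.surface)
```

Keeping the three steps separate in the data model matters: a mention can be identified and typed yet still fail to link, which is exactly the missing-entity case discussed below.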
Another step in information extraction is relation extraction. That means we have several entities in a sentence, and we want to extract the relations between those entities that are expressed in the sentence. Here it is quite straightforward: "Hummingbird was written by Jeff Tweedy" — this is one relation — and "Jeff Tweedy is the frontman of Wilco" — this is another relation that can be extracted from this sentence. These triples are structured information; this is something a computer can actually work with.

So, entity linking. Entity linking, as I said, is the task of associating a mention in a text with an entry in a knowledge base. Typically entity linking systems work with Wikipedia or DBpedia, or even with YAGO or Freebase. These are systems where you give them a text and they return annotations over the text, linked to entries in these knowledge bases. How does an entity linking algorithm work? It is split into two steps. First, candidate selection: looking at the text, what are the possible candidates to be an entity? Then, disambiguation: for this candidate, what is the best matching entry in the knowledge base? A proper entity linking system returns the matching entry, or, if the entity does not exist in the knowledge base, it returns NIL. But the thing is that most of the systems out there make a closed-world assumption, which means they always return something — they assume that everything is in the knowledge base. So entity linking has to handle three kinds of problems. First, name variation: the same entity may have different surface forms. For example, "Elvis", "Elvis Presley", "the King of Rock and Roll" — all of these refer to the same entity. That is name variation. Second, entity ambiguity: the same text snippet may refer to different entities. For example, "Prince" may be the artist, or maybe just a prince, the son of a king.
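The two steps — candidate selection, then disambiguation, with NIL for misses — can be sketched over a tiny mock knowledge base. The candidate names and popularity scores below are invented for illustration; real systems rank with context features and link-graph statistics, not a single prior.

```python
# Toy knowledge base: mention string -> candidate entries with a
# popularity prior (made-up numbers).
KB = {
    "wilco": [
        ("Wilco (band)", 0.80),
        ("Wilco (album)", 0.15),
        ("Wilco (song)", 0.05),
    ],
}

def link(mention: str):
    """Return the best KB entry for a mention, or None (NIL) if absent."""
    candidates = KB.get(mention.lower())   # step 1: candidate selection
    if not candidates:
        return None                        # open-world: admit the KB is incomplete
    return max(candidates, key=lambda c: c[1])[0]   # step 2: disambiguation
```

The explicit `None` branch is the part most off-the-shelf systems skip — that is the closed-world assumption mentioned above.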
"Debut" may be the album by Björk, or it might just be any debut album — so this is ambiguity again. And the third problem is missing entities: not everything is in the knowledge base, and in the case of music even more so. For example, many long-tail artists are not in Wikipedia. Okay, so Wikipedia is not perfect. (From the audience: why don't you write the article yourself? Well, you are not supposed to submit articles about yourself. If you want.)

So, the tools. There are many entity linking tools available, and these are the three tools we have worked with the most. Babelfy is very interesting: it maps entities to BabelNet, the knowledge base we explained before. In BabelNet you have Wikipedia, and you also have WordNet. So Babelfy is able to do two things: entity linking, and word sense disambiguation. It can link named entities to Wikipedia, and it disambiguates common words by linking them to WordNet. The thing is, its API has a usage limit, and you have to deal with that. Then there is TagMe, another system that does this annotation of entities against Wikipedia, and it works pretty well. And there is DBpedia Spotlight, which works as a classical entity linking system over DBpedia. DBpedia is a structured form of Wikipedia, so everything in DBpedia is in Wikipedia. The good thing about DBpedia Spotlight is that you can install it locally and run it without API restrictions.

Relation extraction. Relation extraction is another very interesting problem, where the idea is to detect and extract relations from the text. These relations hold between arguments, and the arguments can be entities, noun phrases, chunks — different things. And there are many variants of relation extraction. It depends on the level of supervision used by the system, and on whether the semantic relations are defined a priori or not — whether it has to extract any relation it finds, or only predefined relations.
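As a concrete illustration of using one of these tools, here is a sketch of how one might build a request to DBpedia Spotlight's public annotate endpoint. The endpoint URL and the `text`/`confidence` parameters are taken from Spotlight's documentation, but check the current docs before relying on them; the code only constructs the request and does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

def spotlight_request(text: str, confidence: float = 0.5) -> Request:
    """Build an annotate request for the public DBpedia Spotlight API."""
    params = urlencode({"text": text, "confidence": confidence})
    return Request(
        f"https://api.dbpedia-spotlight.org/en/annotate?{params}",
        headers={"Accept": "application/json"},  # ask for JSON, not the HTML demo page
    )

req = spotlight_request("Hummingbird was written by Wilco frontman Jeff Tweedy.")
# urllib.request.urlopen(req) would return JSON listing the linked
# DBpedia resources with their offsets and confidence scores.
```

The `confidence` threshold trades recall for precision: raising it drops low-confidence annotations, which matters a lot in the music domain where surface forms are ambiguous.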
There has been a lot of work on this for many years, and it is not a solved problem. As for typical features, these systems use all kinds of linguistic features, from morphological to syntactic and semantic, plus named entities, in many variations. So let me illustrate the evolution of the different systems for relation extraction. The first approach to relation extraction was supervised learning. You have a large corpus of unstructured text and you want to extract the relations in this corpus. You have a set of semantic relations defined a priori, and you have a very big labeled training set with the relations annotated. You train a system on features extracted from the text, and the system is able to output a knowledge base: a collection of triples of the form entity–relation–entity. This works great — it's super nice if you have this big labeled training set. But that is not always the case, and it costs a lot of money and time to annotate relations in text. So there are other approaches, like the semi-supervised learning approach, where instead of a big corpus of annotated data you have some high-precision seeds. You train with those, you get some results, and you bootstrap your system, feeding that output back in to continue the training. That is the semi-supervised approach, and then there is the unsupervised approach. In the unsupervised approach you don't need anything annotated, and you don't even need a set of semantic relations you want to extract. You just provide a set of documents, and the system extracts all the possible relations there are. So the output is the knowledge base and also the semantic relations themselves. If we place all these systems in a graph — with the level of supervision on one axis and the level of semantic information used by the system on the other — we get a picture of the state of the art.
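The semi-supervised bootstrapping loop described above can be sketched in a few lines: start from a high-precision seed pair, induce surface patterns from sentences containing the seeds, then apply the patterns to harvest new pairs. The corpus and seed below are toy data; real bootstrapping adds pattern scoring to keep the loop from drifting.

```python
import re

corpus = [
    "Hummingbird was written by Jeff Tweedy",
    "Yesterday was written by Paul McCartney",
    "Imagine was written by John Lennon",
]
seeds = {("Hummingbird", "Jeff Tweedy")}  # one high-precision seed pair

# 1. Induce patterns from sentences that mention a seed pair.
patterns = set()
for e1, e2 in seeds:
    for sent in corpus:
        if e1 in sent and e2 in sent:
            patterns.add(sent.replace(e1, "{X}").replace(e2, "{Y}"))

# 2. Apply the patterns to the corpus to extract new pairs,
#    growing the seed set for the next iteration.
for pat in patterns:
    regex = re.escape(pat).replace(r"\{X\}", "(.+)").replace(r"\{Y\}", "(.+)")
    for sent in corpus:
        m = re.fullmatch(regex, sent)
        if m:
            seeds.add((m.group(1), m.group(2)))
```

One seed yields the pattern "{X} was written by {Y}", which in turn yields two new pairs — exactly the "output feeds the next training round" idea, minus the confidence filtering that keeps real systems precise.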
Traditional information extraction is the one I explained before: fully supervised, and it doesn't use semantic information at all. Then there are the semi-supervised approaches, with less supervision, that only use this bootstrapping idea without semantic information. Self-supervision is the never-ending learning that Luis talked about before: it has an ontology, it uses the ontology in the learning process, and it is also able to learn new relations for the ontology — so there is a predefined set of semantic relations, and there are other relations that are learned. Then there is the distant supervision approach, which uses a knowledge base as training data. The idea is that you have a set of documents and you have a knowledge base, and you look for sentences where two entities that stand in a relation in the knowledge base co-occur; you use those sentences to create your labeled training set. It is like making a synthetic training set, and this kind of system works pretty well. And then appeared the open information extraction paradigm, where you don't have to label any documents and you don't train on predefined relations — you just extract everything. There is a very well-known system of this kind that works very well. The problem is the accuracy of the results: you get a lot of things, but many of those things make no sense. So in recent years another approach has appeared that mixes semantics and open information extraction; there are systems like PATTY that do this. We have focused our research on this last one: we mix semantic information with open relation extraction to build music knowledge bases. Here there is a reference to a survey of the state of the art on all of this, in case you want to go deeper. So, okay, this is the idea. And if we add semantics to open information extraction, what are the advantages?
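The distant supervision labeling step described above is simple enough to sketch directly: every sentence that mentions both entities of a knowledge-base triple is assumed to express that relation — a noisy but effective heuristic. The KB triple and sentences below are toy data.

```python
# Existing knowledge base: (subject, relation, object) triples.
kb = {("Jeff Tweedy", "frontman_of", "Wilco")}

# Raw, unlabeled sentences from the document collection.
sentences = [
    "Jeff Tweedy founded Wilco in 1994.",
    "Jeff Tweedy plays guitar.",
]

# Distant supervision: a sentence containing both entities of a triple
# becomes a (noisy) positive training example for that relation.
training_set = []
for e1, rel, e2 in kb:
    for sent in sentences:
        if e1 in sent and e2 in sent:
            training_set.append((sent, rel))
```

The second sentence is correctly skipped because Wilco is absent; the noise comes from sentences that mention both entities without actually expressing the relation, which is why distant supervision is usually paired with multi-instance learning or filtering.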
First, we are not restricted to a set of predefined relations. This means we don't have to decide a priori that we want to extract a relation like "is composed by". If we find a sentence that says something "was composed by" something, then we get this relation: "was composed by" becomes one of our relations. We don't define a priori which relations we want to extract; we get everything, because it is open information extraction, and it is unsupervised. Second, the use of semantic information improves the precision of open information extraction. We have run experiments on this in the music domain, and using semantic information reduces the noise. So, okay, this is the approach — I will be brief on it. You have a set of documents. You apply entity linking. You apply syntactic analysis, that is, dependency parsing. Then we merge these two things, we integrate them, we do a filtering process, and at the end we get the relations. Take the sentence from before: "Hummingbird was written by Wilco frontman Jeff Tweedy." This is the dependency parse tree of this sentence. We also have entity linking applied, so we know that Hummingbird is an entity of type song, Wilco is an entity of type band, and Jeff Tweedy is an entity of type artist. What we do is collapse the nodes in the dependency tree that belong to the same entity: the nodes of "Hummingbird" we collapse into a single node, and "Jeff Tweedy" we put as a single node. Then we can find the paths between the entities in the dependency graph: "Hummingbird was written by Jeff Tweedy", or "Jeff Tweedy is the frontman of Wilco". These paths give us the relations. This is the idea. Here you have more references for the different approaches to relation extraction, in case you want to go into that, and here are some links to tools for doing relation extraction.
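The path-finding step can be sketched as a shortest-path search: once entity mentions are collapsed into single nodes, the dependency tree is just a graph, and the relation candidate between two entities is the path connecting them. The edge list below is a hand-made approximation of a parse for the example sentence, not real parser output.

```python
from collections import deque

# Undirected adjacency list standing in for a dependency tree whose
# multi-word entity mentions have already been collapsed into one node each.
edges = {
    "Hummingbird": ["written"],
    "written": ["Hummingbird", "was", "by"],
    "was": ["written"],
    "by": ["written", "Jeff Tweedy"],
    "Jeff Tweedy": ["by", "frontman"],
    "frontman": ["Jeff Tweedy", "Wilco"],
    "Wilco": ["frontman"],
}

def shortest_path(start, goal):
    """BFS over the collapsed dependency graph; the path is the relation candidate."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in edges[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```

Reading off the interior nodes of each entity-to-entity path yields relation phrases like "was written by" and "frontman" — exactly the open relations the filtering step then cleans up.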