So, yeah, we're going to move forward with applying the information extraction approach that Sergio briefly introduced to the construction of music knowledge bases. We'll start with the motivation: if you're still not convinced about how difficult it is to do anything in the music domain, here is a little more evidence. Then we'll present two of our papers from last year where we tried to fill a gap, a couple of things that we felt had room for improvement. And then we'll briefly go over a whole pipeline to build a music knowledge base from scratch.

At this stage of the presentation you're already convinced that structuring information, that information extraction, is the big thing. For music information retrieval and for musicology, making sense of what people say about music has very high potential, not only for obtaining knowledge automatically, but for asking complex questions, the type of semantically difficult question that you might not even think of asking a search engine, like how many German bass players performed in Madison Square Garden in the 80s. That is something that requires some kind of semantic information, and for this, ontologies or knowledge bases can be useful for getting a relevant answer. And, of course, for musicology, for improving visualizations, and for personalization.

So, as I just said, generic knowledge repositories are incomplete when it comes to music. There are problems: only well-known popular artists are there, obviously, while less popular ones are not. There is, of course, a bias towards Western music, and there is clearly a bias in terms of coverage: you'll have editorial and a bit of biographical information, but you don't really have the actual meat of what people are saying about certain bands in these structured repositories.

So our claim is that knowledge repositories about music can actually be improved by incorporating information that we find in unstructured texts: artist biographies on Last.fm, trivia websites like TV Tropes or SongFacts where you'll find a lot of facts about artists and songs which, in most cases, are not contained in reference knowledge bases, or even media articles. It doesn't have to be text specifically about music; it can be a generic piece of text with information about music.

As we said, this is very nice, but clearly the bottleneck is at the very beginning of the pipeline. You have to recognize the entities, and then you have to know which of them actually belong to the music domain and which of them are relevant to your reference knowledge repository, because some of them may already be there, so you can look them up and see if there is something. If you want to incorporate novel entities, then it becomes a little bit more difficult. Traditionally, you may use gazetteers, looking this information up either in a list or in a knowledge base, but they only work in trivial and unambiguous cases, like 'Symphony No. 9 in D minor'; whenever there is variation, they fail. And the same mention may refer to different music entities even within the music domain, like 'Carmen' the opera and 'Carmen' the opera's main character. So this is actually a fairly difficult task.
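Just to make that gazetteer failure mode concrete, here is a minimal sketch (the gazetteer and its entries are made up for illustration, not taken from any of the systems discussed): exact string lookup only matches canonical forms, and an ambiguous mention returns several candidates with nothing to choose between them.

```python
# Hypothetical gazetteer: exact lookup works for canonical forms only.
gazetteer = {
    "symphony no. 9 in d minor": ["work:Beethoven_Symphony_No._9"],
    "carmen": ["work:Carmen_(opera)", "character:Carmen"],  # one surface form, two entities
}

def lookup(mention: str):
    """Return candidate entities for a mention via exact (case-insensitive) matching."""
    return gazetteer.get(mention.lower().strip(), [])

print(lookup("Symphony No. 9 in D minor"))  # canonical form: one match
print(lookup("Beethoven's Ninth"))          # [] -> surface variation defeats exact matching
print(lookup("Carmen"))                     # two candidates -> the gazetteer alone cannot disambiguate
```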
There's variability, for example between 'The Rolling Stones' and 'Their Satanic Majesties'. There are, of course, music entities named with common nouns: 'Madonna' could be the artist or a representation of Mary, but even things like 'The Who'. You know, in the early days of web search there was this joke that you would search, I don't know if it was AltaVista or one of those engines, for 'The Who', and it would say: your search did not return any match for 'the'; your search did not return any match for 'who'; try a less common word; zero matches. This doesn't happen today, but it used to happen at the time.

All right, there are also artist names that get shortened in casual language, and casual language is the type of language that at some point you are going to have to address. Album and artist names may be the same, especially for debut albums. And for all these reasons there is software for entity linking; it is what we have. In the music domain, though, we have identified certain shortcomings. One of them is that these systems exploit context, as we saw in the earlier example, and in some cases this can be counterproductive.

So this is a table built over a corpus from Last.fm, with the most frequently identified entities for some album and artist types by Babelfy, TagMe, and DBpedia Spotlight, and the ones in red are those that were wrong in most of these evaluations. As you can see, most of them are musical entities written with stop words; for those of you who don't know, stop words are function words like prepositions, conjunctions, et cetera. So once your system knows that you are speaking about music, it will tend to disambiguate every single occurrence of anything against a musical entity: 'in the end, the artist split up with his manager because they weren't doing well', and 'the end' gets linked to The Doors' song 'The End'. And this happens a lot, in a lot of documents, throughout the whole corpus. So of course these systems need to rely on context, but relying on context also has this drawback.

So we felt there should be a way to propose at least a line, a direction of research, where this was addressed. We thought, okay, it would be great to have training data, so why not think of a way to automatically build a corpus for entity linking in the music domain with high precision? Even if not everything is annotated, at least those things that are annotated should be reliable, so that you feed your learning algorithm with reliable data. So what we did was take Last.fm artist biographies and annotate tens of thousands of entities with very high precision thanks to ELVIS. Not this guy over here, but rather a rather humorous name for an entity linking voting and integration system. The idea is that any one or two systems may be wrong, but if we put together the output of an arbitrary number of generic entity linking tools, we get some kind of evidence of how good that output is. This is a framework for integrating entity linking systems, so the output is a homogenization of the output of each of the systems that you plug in. Keep in mind that some of them report offsets at character level and others at token level, and all of this is integrated in ELVIS.
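A minimal, hedged sketch of that voting idea (this is not the actual ELVIS code, just an illustration under assumed data structures): normalize every tool's annotations to character offsets, then keep only the (span, entity) pairs that enough tools agree on.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    start: int   # character offset, after normalizing any token-level output
    end: int
    entity: str  # e.g. a DBpedia URI

def integrate(outputs_per_tool, min_votes=2):
    """Keep (span, entity) pairs proposed by at least `min_votes` different tools."""
    votes = Counter()
    for tool_annotations in outputs_per_tool:
        for ann in set(tool_annotations):  # one vote per tool per annotation
            votes[ann] += 1
    return [ann for ann, n in votes.items() if n >= min_votes]

# Hypothetical outputs from three generic entity linking tools over the same sentence.
tool_a = [Annotation(0, 9, "dbpedia:The_Doors"), Annotation(24, 31, "dbpedia:The_End_(song)")]
tool_b = [Annotation(0, 9, "dbpedia:The_Doors")]
tool_c = [Annotation(0, 9, "dbpedia:The_Doors"), Annotation(24, 31, "dbpedia:End")]

print(integrate([tool_a, tool_b, tool_c]))  # only the annotation with enough agreement survives
```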
So we processed 13,000 artist biographies from Last.fm, which are collaborative efforts pretty much like Wikipedia. On Last.fm people add hyperlinks, and these hyperlinks point to other Last.fm documents whose titles are almost the same as in Wikipedia. We used those hyperlinks to map from Last.fm to Wikipedia, and after a thorough evaluation of different degrees of agreement we manually evaluated, I believe, 1,400 sentences. Fewer entities get annotated, but precision is very high, so we are very confident that this is a corpus where, even if not everything is annotated, whatever is annotated is right in the vast majority of cases. And in fact, if you go to the website, what you're going to download is an improved version of ELMD. It's a bigger corpus because it has more annotations, and it's better because it comes in different output formats, including JSON and NIF, and a lot of annotations have been propagated automatically. So yes, this is one of the resources that we have made available for the research community.

In this context, and we're going to see a couple of things about this at the end of the presentation, we are organizing a challenge at the European Semantic Web Conference about entity linking in the music domain, and we are validating a small set of these documents manually. This is being done by Yalvin, who's there in the audience, so keep working. But it's going well. It's task number three in a three-task Open Knowledge Extraction challenge, and we are excited about it because this is the first time there's going to be a task at the intersection of natural language processing and the music domain. The training data is available already, and the test data is going to be released.

All right, so now towards the music knowledge base built from scratch. We have been talking a lot about entity linking; now we are going to say a couple of things about a pipeline that does the whole thing from scratch. So again, the task: once you have your entity linking, the question is how to leverage this information as the cornerstone of a music knowledge graph. The approach combines linguistically motivated rules over syntactic dependencies, pretty much as Sergio described, with statistical evidence. This was a very dense paper, too dense even for a three-hour tutorial, so many technical details are left out here; if you're interested, we encourage you to have a look at it. But the main idea is that the shortest path over syntactic dependencies sometimes breaks, because people can be very creative in how they write, and syntactic parsers don't really know what they are parsing: cases like reported speech, sentences with a verb like 'said', or titles starting with an article are hard for syntactic parsers. So we enforce certain syntactic relations; for example, to get subject, verb, object triples, we only extract them if this information was explicitly encoded and the dependency path was below a certain threshold, or we look at the first relation word, et cetera (a simplified sketch of this kind of rule follows). Then we cluster relations together: since our approach is on the open information extraction side, a lot of the relations were pretty much the same but were verbalized differently.
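Here is a minimal, hedged sketch of that subject-verb-object rule using spaCy (this is not the actual system; the dependency labels, the distance threshold as a stand-in for the path length, and the model name are standard spaCy choices rather than details taken from the paper):

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def span_text(token):
    """Return the full phrase under a token (its syntactic subtree)."""
    return " ".join(t.text for t in token.subtree)

def extract_svo(sentence, max_distance=4):
    """Toy subject-verb-object extraction over syntactic dependencies."""
    triples = []
    for token in nlp(sentence):
        if token.pos_ != "VERB":
            continue
        subjects = [c for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
        objects = [c for c in token.children if c.dep_ in ("dobj", "attr", "dative")]
        for subj in subjects:
            for obj in objects:
                # crude stand-in for the path-length threshold: token distance
                if abs(subj.i - obj.i) <= max_distance:
                    triples.append((span_text(subj), token.lemma_, span_text(obj)))
    return triples

print(extract_svo("Bruce Springsteen covered Jersey Girl."))
# e.g. [('Bruce Springsteen', 'cover', 'Jersey Girl')]
```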
So we proceeded, from right to left, to cluster together triples that had, for example, song and artist names as argument one or argument two: 'song (as a type) was written by artist (as a type)' and other similar clusters were merged into one single relation. This also gives a less sparse set of relations, which is better for navigation and for relation scoring. There's always this problem that you have a huge, dense graph and you want to trim it a little and get rid of bad relations, and there are certain heuristics you can follow. We looked at the proportion of triples each relation encoded and whether they were evenly distributed. We looked at the degree of specificity: if we have a relation like 'performed', we are fairly confident that it's going to enforce certain entity types on the left and on the right, so whenever a relation is very specific and an outlier comes along, we expect it to be an error and we get rid of it. And we also looked at other criteria like frequency, length, and fluency, which generally favor relations that preserve in some way the original order of the sentence.

This is precision, that is, how well the relations extracted for the knowledge base preserved the original meaning of the sentence. From left to right are different configurations of the algorithm, and the last column is the competitor system that we evaluated against, based on ReVerb, the system that Sergio mentioned earlier, which does very well actually; only the most sophisticated version of our algorithm is more precise. This evaluation comes from manually checking the veracity of facts extracted from the corpus. We also wanted to know to what extent this approach, since it is based on statistical evidence and entity linking, actually extracted more information per pair of entities, and in fact the music knowledge base that we have released has over 3,600 relations between pairs of entities, compared with about 1,500 in MusicBrainz and lower numbers in other resources.

And another nice application, and it's a shame that this is a PDF because it would be great to play these songs: since the relations between entities in a graph constructed from natural language are themselves expressed in natural language, you can use the graph to explain recommendations. We are not evaluating the quality of the recommendation itself; rather, we ran a survey and asked people whether they valued the fact that the recommendations they got were explained in natural language, even if the recommendation itself was questionable, because, again, the recommendation was not evaluated in this paper.
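A hedged sketch of how such a graph can produce an explanation (not the actual implementation; the tiny graph below is hand-built from the example discussed next): every edge keeps its natural language relation, so a path between the seed artist and the recommended one can simply be read out sentence by sentence.

```python
import networkx as nx

# Tiny hand-built knowledge graph; the "rel" attribute keeps the natural language relation.
G = nx.Graph()
G.add_edge("Tom Waits", "Bruce Springsteen",
           rel="Bruce Springsteen covered 'Jersey Girl' by Tom Waits")
G.add_edge("Bruce Springsteen", "Clarence Clemons",
           rel="Bruce Springsteen played along with Clarence Clemons")
G.add_edge("Clarence Clemons", "Lady Gaga",
           rel="Clarence Clemons is featured on a Lady Gaga song")

def explain(graph, seed, recommended):
    """Turn the shortest path between two artists into a natural language explanation."""
    path = nx.shortest_path(graph, seed, recommended)
    return [graph.edges[u, v]["rel"] for u, v in zip(path, path[1:])]

for sentence in explain(G, "Tom Waits", "Lady Gaga"):
    print(sentence)
```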
So you like 'Jersey Girl' by Tom Waits. For those of you who don't know him, Tom Waits is not exactly known for being the happiest artist on earth, you know, the cigarette, the glass of whiskey, and the hoarse voice. And then you get a recommendation of a song by Lady Gaga, which on principle doesn't make much sense. But if you look at the explanation of the recommendation, it says that Bruce Springsteen covered 'Jersey Girl' by Tom Waits, that Bruce Springsteen played along with Clarence Clemons, and that Clarence Clemons is featured in the Lady Gaga song. Clarence Clemons is this saxophonist over here. The conclusion we got was that people with a musical education didn't really appreciate it, but for people without much musical background, who didn't know too much about music, these explanations actually made questionable recommendations look at least a bit better from the user's side. So that was another conclusion.

And a final conclusion is that, as you can see, there is a lot of unstructured information about music in the form of natural language that we can take advantage of, and we have barely scratched the surface. In our work we never touched social networks, Wikipedia articles, lyrics, or subtitles, so the potential for improving MIR applications and musicology applications using text processing is still there; there is a lot left to be covered, and there is plenty of room for publications: potential for improving MIR by integrating a deeper, text-acquired understanding of music into the process. These are a bunch of references, and we're going to finish with the