So this is the idea we are going to use for MIR; I'm going to go through this before the break. So, applications in MIR, where MIR means music information retrieval. I'm going to focus on three very important applications in MIR: similarity, classification, and recommendation. Similarity is how similar two songs or two artists are. Classification is putting a label on items, like genre classification. And then recommendation. These three tasks have something in common. This is the similarity problem: in the input you have a matrix of items and features, and the idea is to get a matrix of items and items, where the cells of the matrix represent the similarity between the items. So this is a similarity matrix, and at the beginning you have this matrix of features; you measure in different ways the similarity between the different items and represent it as a similarity matrix. That is similarity. This is classification: we have a matrix of items and features, we have a vector of labels for these items, we split into training and test, and we try to train a system that learns to give these labels given the features. And this is recommendation. We have two matrices in recommendation: a matrix of items and features, and a matrix of items and users, which contains information about whether a user liked an item, did not like it, or liked it a lot, depending on the dataset. And the objective in recommendation is... well, in this matrix there are many elements that are not known, so this matrix is incomplete. You know some information about some users and some items, but you don't know everything. You don't know whether a user would like or not like every item in the catalog; you only know some of them.
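To make the similarity setup concrete, here is a minimal Python sketch. The item names, feature values, and the choice of cosine similarity are all assumptions for the example, not the actual system: it just shows how an items-by-features matrix turns into an items-by-items similarity matrix.

```python
import math

# Toy item-feature matrix: rows are items, columns are features (made-up data).
items = {
    "song_a": [1.0, 0.0, 2.0],
    "song_b": [1.0, 0.0, 1.0],
    "song_c": [0.0, 3.0, 0.0],
}

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Item-item matrix: each cell is the similarity between a pair of items.
similarity = {
    (i, j): cosine(fi, fj)
    for i, fi in items.items()
    for j, fj in items.items()
}
```

Any similarity function over the feature vectors would do here; cosine is just a common default.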
So the idea here is that you want to predict the missing elements in this matrix. You can use this matrix and this matrix to recommend, but the idea is to learn this one. So this is recommendation. In the end we always have some matrix of items and features, and this is what we are going to see here. So, what is an item for us in our problems? Typically in MIR, the items are the audio clips and the features are extracted from the audio: an item can be a song, and the features are extracted from the audio. In our approach, an item is a document. An item can be an artist's biography, a description of a sound, a review of an album, or a story about a song; a text document that refers to an entity. That is an item for us. Okay, so what we are going to do is build a matrix of items and features from the documents. Now, the typical approaches for document classification, document similarity, and document recommendation all exist outside of MIR; they exist in NLP itself. The typical approach is to use what is called the vector space model, and the classical version is the bag of words. You create a feature vector where every feature is a word. You define a vocabulary of 500,000 words, or 15,000 words, and every feature is one of these words; an item is represented with a one if it contains this word and a zero if it doesn't. This is the vector space model with a bag-of-words representation. Then you can weight these words in terms of their frequency, using TF-IDF or fancier schemes like BM25, to reweight the matrix. So this is the typical document representation approach, and we can do all these tasks with it.
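The bag-of-words vector space model just described can be sketched in a few lines. The three-document corpus is invented for the example, and the TF-IDF formula is the plain textbook variant; real systems would use a library and a much larger vocabulary.

```python
import math
from collections import Counter

# Made-up corpus: three tiny "documents" about entities.
docs = {
    "bio_1": "gorillaz virtual band formed by damon albarn".split(),
    "bio_2": "blur band fronted by damon albarn".split(),
    "bio_3": "tank girl comic by jamie hewlett".split(),
}

# The vocabulary: every distinct word becomes one feature.
vocab = sorted({w for words in docs.values() for w in words})

# Binary bag of words: 1 if the document contains the word, else 0.
bow = {d: [1 if w in words else 0 for w in vocab] for d, words in docs.items()}

# TF-IDF reweighting of the same matrix: frequent-in-document but
# rare-in-corpus words get the highest weight.
df = Counter(w for words in docs.values() for w in set(words))
n = len(docs)

def tfidf(words):
    tf = Counter(words)
    return [tf[w] / len(words) * math.log(n / df[w]) for w in vocab]

tfidf_matrix = {d: tfidf(words) for d, words in docs.items()}
```

Note that a word appearing in every document (here "by") gets TF-IDF weight zero, which is exactly the point of the reweighting.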
But our idea here is: okay, we have the text, and we enrich the text with the knowledge graph that we explained before. So, how do we embed this graph into a matrix of items and features? That is the problem here. Once we do that, we can run the same approaches that are used for document classification, document similarity, everything. So we are focusing on this: first, how to embed the graph, this relational information, into a matrix. First, let's talk about the concept of hops and the neighborhood subgraph. The red points here, imagine that they are the items; they may be artists, for example. And the blue nodes are entities that are linked to these items. For example, we may have applied entity linking on the biographies of the artists and connected the detected entities as nodes, and so we have this graph. The one-hop neighborhood subgraph of an item is all the nodes that are one hop from the item. The two-hop neighborhood subgraph is all the nodes that are up to two hops from the item. So this is the idea of hops. To do an embedding, we can use many different approaches; there are a few kinds of embeddings we can do. We can use the distance from the item to a node. We can use the frequency of a node inside the subgraph. We can use the TF-IDF of a node with respect to the whole corpus. We can count the number of inlinks that a node has. We can take into account the paths, the sequences of nodes. So there are many ways to embed a graph in a vector representation; there is no one fixed way. There are, let's say, three ways that we have used in our approaches, and I'll talk about some experiments with them. First, the flat embedding is the most straightforward way, and it works pretty well. The intuition is just: instead of bag of words, bag of nodes.
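The k-hop neighborhood subgraph is easy to extract with a breadth-first search. The tiny graph below is a made-up illustration around the Gorillaz example used later in the talk; the edges are invented for the sketch.

```python
from collections import deque

# Toy undirected knowledge graph as adjacency lists (hypothetical edges).
graph = {
    "gorillaz": ["damon_albarn", "jamie_hewlett"],
    "damon_albarn": ["gorillaz", "blur"],
    "jamie_hewlett": ["gorillaz", "tank_girl"],
    "blur": ["damon_albarn"],
    "tank_girl": ["jamie_hewlett"],
}

def k_hop_neighborhood(graph, item, k):
    """All nodes at most k hops from the item, i.e. its k-hop neighborhood
    subgraph, returned as a node -> hop-distance mapping."""
    seen = {item: 0}
    queue = deque([item])
    while queue:
        node = queue.popleft()
        if seen[node] == k:          # don't expand beyond k hops
            continue
        for nb in graph[node]:
            if nb not in seen:
                seen[nb] = seen[node] + 1
                queue.append(nb)
    return seen
```

With `k=1` you get the item plus its direct neighbors; with `k=2` the second-level entities (Blur, Tank Girl) come in too.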
So we have the graph, and each node that is in the graph has a feature in our feature matrix, and every item gets a score depending on whether the node is in its neighborhood subgraph or not. Then we can also apply TF-IDF or whatever to reweight this matrix. Another method that we have used is the entity-based embedding. Here we also want to measure the distance from the node to the item; it is not the same to be in the first level or in the second. So we can apply a decreasing weight in terms of the distance from the item. Or, if a node has more inlinks, perhaps this node is more important, so we can give more weight to these nodes as well. That is the idea of the entity-based embedding. Then we can also take into account the paths, the sequences of nodes that there are in the subgraph. This is what we call the path-based embedding. A subgraph like this may have different paths; this is one path, and a path goes from a leaf node to the root. A path can also be divided into different subpaths: you can have this subpath, and this other subpath, inside the path. In the end, if you take all these possible subpaths, you can associate a feature with every different subpath that exists in the subgraphs. So this is another approach. Okay, let's try to use this for similarity. Remember, we have a matrix of items and features, and then we want to get the matrix of items and items with the similarity between the items. We can take the neighborhood subgraph of every item, compute the embedding, and simply compute a similarity; for example, something like the intersection between two subgraphs. We did an experiment with this on Last.fm. We took a corpus of artist biographies from Last.fm, applied entity linking to all the biographies using Babelfy, and built a large semantically enriched graph, an entity graph, for every biography.
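The two weighting ideas can be sketched very compactly. The decay function `1 / (1 + hops)` and the "subpaths of length at least two" definition are my assumptions for illustration; the actual embeddings may use different weighting and path definitions.

```python
# Entity-based embedding sketch: nodes closer to the item get a higher
# weight, here with an assumed decay of 1 / (1 + hop distance).
neighborhood = {          # node -> hop distance from the item (toy data)
    "damon_albarn": 1,
    "jamie_hewlett": 1,
    "blur": 2,
    "tank_girl": 2,
}
entity_features = {node: 1.0 / (1.0 + hops) for node, hops in neighborhood.items()}

# Path-based embedding sketch: every contiguous subpath (length >= 2,
# an assumption) of a leaf-to-root path becomes a feature of its own.
def subpaths(path):
    """All contiguous subpaths of length >= 2 inside a path."""
    return [
        tuple(path[i:j])
        for i in range(len(path))
        for j in range(i + 2, len(path) + 1)
    ]

path_features = subpaths(["tank_girl", "jamie_hewlett", "gorillaz"])
```

So a three-node path contributes three path features: its two edges and the full path itself.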
And we also did relation extraction and built a graph with the relations that we extracted from the biographies. We did two experiments. So remember, this is the difference. This band is Gorillaz: a band formed in 1998 by Damon Albarn of Blur and Jamie Hewlett, the creator of the comic book Tank Girl. An ideal relation extraction system would have built this graph: Gorillaz is formed by Damon Albarn and by Jamie Hewlett; Damon Albarn is a member of Blur; Jamie Hewlett is the creator of Tank Girl. That is the ideal relation extraction. And this would be the graph of entities: Gorillaz in the middle, and all the entities that were detected connected to it. So it seems that the relation graph has more information. But the problem is that the relation extraction system is not able to give you this kind of graph; it makes errors. So in the end, the performance of the entity graph is much better than the performance of the relation extraction graph in MIR. We ran the experiment with a pure text-based approach, with the relation extraction graph, with the graph of entities, and with the semantically enriched graph. In the end, with the semantically enriched graph we outperformed the text-based approach, and it was the best one. Another application: genre classification. We have the matrix of items and features, and we have the vector of labels; we want to train a system to predict the labels. We built a dataset with album reviews from Amazon; these are reviews from the customers. We mapped them to MusicBrainz to have more information. In the end, we have a dataset with 1,300 albums and 13 genres. The idea is: we have the reviews of an album, and we want to get the genre of this album from the reviews, from the text of the reviews written by the customers.
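The classification pipeline (an items-by-features matrix plus a label vector, train and predict) can be sketched like this. A nearest-centroid classifier stands in for whatever supervised model was actually used, and all the vectors and genre labels are invented for the example.

```python
# Toy training data: each item is a feature vector (e.g. bag of words over
# its reviews) paired with a genre label. Entirely made-up numbers.
train = [
    ([2.0, 0.0], "metal"),
    ([3.0, 1.0], "metal"),
    ([0.0, 2.0], "jazz"),
    ([1.0, 3.0], "jazz"),
]

def centroids(data):
    """Mean feature vector per label."""
    sums, counts = {}, {}
    for x, y in data:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(model, x):
    """Assign the label whose centroid is closest to x."""
    def dist2(y):
        return sum((a - b) ** 2 for a, b in zip(x, model[y]))
    return min(model, key=dist2)

model = centroids(train)
```

In the real setup you would hold out a test split and report accuracy per genre; the point here is only the shape of the task: features in, label out.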
So we used different sets of features: textual features; semantic features, so we apply entity linking again, build these graphs, and get information from Wikipedia to build the semantically enriched graphs; we also did some sentiment analysis to get features related to sentiment; and we also have the acoustic descriptors of the audio of the songs in the album, to compare audio classification with text classification. These are the results. Audio classification for genre identification works much worse than using text. Text classification has high scores; this is the pure text approach with bag of words. If we use semantic information through entity linking to enrich this information and combine it with the words, we improve the results to 69%. If we use sentiment features, we improve them too, but not by much. The thing is that the bag-of-words model is very strong and works very well; it is difficult to beat the bag-of-words model. But okay, if you add sentiment information and do the filtering properly, you can improve over bag of words. This is the fusion, the combination of the audio-based and the text-based approaches. We tried to combine audio and text here to see if it improves, but it didn't improve, and the problem is that when the text-based approach fails, the audio-based approach also fails. So if the combination doesn't improve the results, you can just use text only. There is a lot of room for research in the combination of text features and audio features. And finally, music recommendation, which is the problem I have worked on the most. Typically in music recommendation there are three approaches: collaborative filtering, which uses only the user matrix; content-based recommendation, which uses only the item-features matrix; and hybrid approaches, which take advantage of both matrices.
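Before the hybrid approach, it helps to see what pure collaborative filtering looks like: only the item-user matrix is used, and a missing cell is scored from the items this user already liked. Everything below is a toy sketch with invented data, using a simple co-like overlap as the item-item similarity.

```python
# Items x users rating matrix: 1 = liked, 0 = unknown (made-up data).
ratings = {
    "song_a": [1, 1, 0],
    "song_b": [1, 1, 1],
    "song_c": [0, 0, 1],
}

def overlap(i, j):
    """Item-item similarity: number of users who liked both items."""
    return sum(a and b for a, b in zip(ratings[i], ratings[j]))

def predict(item, user):
    """Score an unknown cell from the items this user liked: the more
    similar the item is to the user's liked items, the higher the score."""
    liked = [j for j in ratings if j != item and ratings[j][user] == 1]
    return sum(overlap(item, j) for j in liked)
```

Note that no feature of the item itself is ever used; that is exactly why collaborative filtering fails for brand-new items with no ratings, and why content and hybrid approaches exist.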
So we did research with a hybrid approach: we concatenate the item features from the item-features matrix and the item features from the item-user matrix. We concatenate them and use them to train a model for every user, trying to predict the items that were not rated by the user. That is the idea, and we use semantic information and entity linking to build these item features. We did the experiments on two different datasets. One is sound recommendation from Freesound, using the descriptions of the sounds and their tags; the other is music recommendation, using descriptions of songs extracted from Songfacts, a website with stories about songs, and with information about tags from Last.fm. What we did is apply entity linking to the descriptions of the sounds and the descriptions of the songs, build these semantically enriched graphs, embed them with the entity-based embedding and with the path-based embedding, and then compare them. This is the graph; this is the representation we used before. For every sound, we have the entities; we have information from Wikipedia; we have information from WordNet. This subgraph describes a sound. These are the results. We get the best results in terms of precision and recall using the entity-based embedding combined with collaborative features. But if we look closely, there is not that much difference whether we use only the user matrix, which is the pure collaborative approach, or a vector space model, the typical text-based approach, or semantic features as well. They are pretty similar; these are a bit better, but not by much. So we used another evaluation method to measure in which way these semantic features are improving the recommendations: we look at the novelty and the aggregate diversity of the recommendations.
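The concatenation at the heart of the hybrid approach is simple to sketch. The content features, the interaction matrix, and the similarity-based ranking below are all toy assumptions standing in for the per-user models actually trained.

```python
import math

# Content features per item (e.g. semantic features from entity linking)
# and an items x users interaction matrix (1 = liked, 0 = unknown).
# All names and numbers are invented for the example.
content = {
    "song_a": [1.0, 0.0],
    "song_b": [0.9, 0.1],
    "song_c": [0.0, 1.0],
}
interactions = {
    "song_a": [1, 0, 1],
    "song_b": [0, 1, 1],
    "song_c": [0, 1, 0],
}

# Hybrid representation: one concatenated vector per item.
hybrid = {i: content[i] + [float(r) for r in interactions[i]] for i in content}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def recommend(user, k=1):
    """Rank this user's unrated items by similarity to the items they liked."""
    liked = [i for i in interactions if interactions[i][user] == 1]
    unrated = [i for i in interactions if interactions[i][user] == 0]
    scored = sorted(
        unrated,
        key=lambda i: max(cosine(hybrid[i], hybrid[j]) for j in liked),
        reverse=True,
    )
    return scored[:k]
```

Because both content and collaborative columns live in the same vector, items with no ratings still get sensible scores from their content half.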
So this means that we are recommending items that the user would like, but how diverse are these items? Are we always recommending the same items, or a wide range of items? That is what this measure tells us. Here there is a much bigger difference between the semantic approaches and the non-semantic approaches. The idea is that here we are giving more diverse recommendations while keeping a high accuracy; that is the result here. And if we look at the approaches that don't use collaborative information, meaning they don't use the user-item matrix, only the feature matrix, you get very bad recommendations even though the diversity and novelty are super high. So the lesson is: use collaborative information combined with item features, and if you add semantic information you can provide more diversity and novelty in the recommendations. So these are the conclusions, and the last thing we can do, like Luis explained before, is to explain the recommendations. We can explain a recommendation if we build a knowledge graph and we have the entities defined in this knowledge graph. We don't have to compute the recommendation using the knowledge graph; we can compute it with another approach, using collaborative information, but if we have a knowledge graph that connects the entities, we can use it to explain how two entities are related. We can give two kinds of explanations here: an explanation based on the labels of the knowledge base, the labels of the relations, or one based on the sentences where the entities co-occur, using the original sentence to provide the explanation. So here, for example, you may have a recommendation explained with a complete sentence; it doesn't matter what the relation label says, it is a sentence where these two entities appear, the original sentence from the text.
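Finding a connection between two entities to explain a recommendation reduces to a path search over the labeled knowledge graph. The triples below are invented for the sketch, and breadth-first search simply returns one shortest relation path; choosing the best among many paths is the harder open problem mentioned next.

```python
from collections import deque

# Toy labeled knowledge graph as (head, relation, tail) triples (made-up).
triples = [
    ("gorillaz", "formed_by", "damon_albarn"),
    ("damon_albarn", "member_of", "blur"),
    ("gorillaz", "formed_by", "jamie_hewlett"),
    ("jamie_hewlett", "creator_of", "tank_girl"),
]

# Adjacency index; edges are treated as undirected for the path search.
adj = {}
for h, r, t in triples:
    adj.setdefault(h, []).append((r, t))
    adj.setdefault(t, []).append((r, h))

def explain(a, b):
    """Shortest relation path between two entities, as a list of triples;
    the relation labels along it can be verbalized as the explanation."""
    queue = deque([(a, [])])
    seen = {a}
    while queue:
        node, path = queue.popleft()
        if node == b:
            return path
        for r, nb in adj.get(node, []):
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, path + [(node, r, nb)]))
    return None
```

For the alternative, sentence-based explanation, each edge would instead store the original sentence in which the two entities co-occur, and the same path search applies.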
And here we have a recommendation explained using only the label of the relation, like a song that is a version of "Sanctuary" by Iron Maiden. So one kind of explanation gives short information snippets, and the other gives long sentences talking about the connection. We did an experiment to compare what users preferred. Another challenge when building explanations is finding the best path that connects two entities in a knowledge graph, because two entities may be related through many different paths in the graph; selecting the best one is another challenge. We did this experiment, and in the end we found that using the original sentences is better for explanation. There are other applications that we have worked on, like question answering and aided search, but we are not covering them here. So here are the references. There is some supplementary material on the web page of the tutorial, including a Python notebook that uses the library Luis mentioned before, Elvis, for doing entity linking. So I will show you just a brief example of entity linking with it, and then we go to the break.