Let's go to the last section: deep learning. I'm not going to explain deep learning in depth; I just want to give examples of how we use deep learning in the approaches we saw before, and in MIR. First of all, deep learning has improved almost every task in NLP, as in many other fields. We have seen that deep learning is very popular now, and it's used everywhere. This is the typical picture of what happens with deep learning: with traditional machine learning, when you have a lot of data, performance usually plateaus, but deep learning can outperform traditional machine learning if you have enough data. In NLP too, for every single task we have seen, there are now deep learning approaches that improve it, but I'm not going to explain how to do NLP with deep learning. The typical architectures used in NLP are LSTMs and convolutional neural networks: LSTMs for tasks like parsing, named entity recognition, and sentiment analysis, and convolutional neural networks for classification and sentiment analysis as well. Deep learning has also opened up another possibility: end-to-end learning, so you can train a system without hand-crafting features as I did before. You can also work at the character level, using characters instead of words, or you can use word embeddings, as we just explained. So first, let's talk a bit more about word embeddings. They are not exactly deep learning, because there is no deep network, but you are doing an optimization and using a softmax, so they are very related to deep learning. What are you learning in word embeddings when you learn the vectors? There are two sets of vectors, two embeddings: the word embeddings and the context embeddings.
So the idea is this: the architecture here is not deep at all. We have two embeddings, we compute the dot product of these two embeddings, apply a softmax function to it, and we have to maximize the probability. If the word can be predicted from the context, the probability given by this dot product and softmax should be one, and if not, the probability should be zero. These are the training samples: the observed word-context pairs are the positive samples, and the negative samples are randomly taken from the corpus. So we have positive samples and negative samples, and we train until the embeddings converge. When they converge, we have the embeddings for the words, keeping in mind this objective function.

There is also work from Levy et al. which says that in word embeddings we learn two matrices of embeddings: the matrix of words and the matrix of contexts. Typically we use the matrix of words in NLP, and the matrix of contexts is not used. But what are these matrices, really? Levy et al. asked: what happens if you take the dot product of the matrix of contexts and the matrix of words? What matrix do you get? They showed that this matrix is essentially the pointwise mutual information (PMI) matrix of words and contexts. There are papers based on that which also use the context embeddings, not only the word embeddings.

The idea here is that instead of words and contexts, we can use the same methodology with other things. For example, the words can be songs and the context can be the playlist, so we can find the embeddings of songs, or the embeddings of artists, using word2vec. Or we can say that tags are the words and items are the contexts, and learn the embeddings of tags. And I insist: we can do many different things with this.
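The training objective described above can be sketched in a few lines of NumPy. This is a minimal, self-contained illustration of skip-gram with negative sampling, not the real word2vec implementation: the "corpus" is synthetic (each word id simply co-occurs with its neighbour id), and a sigmoid is used per sample, as in the negative-sampling variant of the objective.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d, lr = 12, 8, 0.1

# Two embedding matrices, as in the talk: one for words, one for contexts.
W = rng.normal(0, 0.1, (V, d))   # word embeddings
C = rng.normal(0, 0.1, (V, d))   # context embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Observed (word, context) pairs are the positive samples. Here each word
# co-occurs with its neighbour id -- a made-up stand-in for a real corpus.
positives = [(w, (w + 1) % V) for w in range(V) for _ in range(50)]

def avg_loss():
    # Negative log-probability of the positive pairs.
    return -sum(np.log(sigmoid(W[w] @ C[c])) for w, c in positives) / len(positives)

before = avg_loss()
for w, c in positives:
    # Positive sample: push sigmoid(word . context) towards 1.
    g = sigmoid(W[w] @ C[c]) - 1.0
    W[w], C[c] = W[w] - lr * g * C[c], C[c] - lr * g * W[w]
    # Negative sample: a randomly drawn context, pushed towards 0.
    n = rng.integers(V)
    g = sigmoid(W[w] @ C[n])
    W[w], C[n] = W[w] - lr * g * C[n], C[n] - lr * g * W[w]
after = avg_loss()
```

After training, `after` is lower than `before`: the dot products of observed pairs have been pushed up and those of random pairs pushed down, which is exactly the convergence criterion mentioned above.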
So we did an experiment with a dataset of playlists for this tutorial. Using a very old playlist dataset, from La Rosa, we took the playlists of songs and extracted the artists of those playlists. Every playlist defines a context, and we learn vector representations, embeddings, for the artists. We have put a Python notebook on the tutorial website with this; I will show you. I hope this does not crash, because it's not using the Internet. The idea here is that it's simple and fast. We use Gensim, which is the library we use; it is very good for embeddings in Python. We read the playlist files and process them into lists. Then we train word2vec: we pass in all the sentences, where a sentence for us is a playlist and the words are the artists, so a sentence is a list of artists. We define the number of dimensions we want for the embeddings and the size of the window, that is, the context: the context is 10 songs after and 10 songs before. We set these parameters and train the model. It's super fast. And then we have the trained model, and we can ask for the most similar artists to a given one; the artist is a "word" in the word embedding space that we learned from the playlists. For example, what is most similar to Marilyn Manson? We get Metallica and similar bands. What is most similar to Nirvana? The results make sense. We can also look at the embedding itself: this is the embedding vector of Nirvana, a long list of numbers, one per dimension, and every artist is represented in this space of 400 dimensions.

Okay, so deep learning for music recommendation. This is the last thing I was working on: I was at Pandora, on an internship, doing deep learning for music recommendation.
Well, the idea: the state of the art in recommendation is matrix factorization. You have the user-item matrix, you factorize it, and you get two matrices, one of item factors and the other of user factors. If you multiply these two matrices, you reconstruct the original matrix, and you also fill in the missing cells. So this is an optimization problem: learning these two factor matrices from the user-item matrix. And once you have them, you can multiply them to produce the recommendations. This is the state of the art.

The problem here is: what happens if you have a new item in your system? You don't have any collaborative information; the row of the user-item matrix for this new item is empty. The thing is, if you have no usage information, you have to do a content-based approach, using the features of the item, or a hybrid approach, combining the collaborative information with the features of the item. Among the hybrid approaches, one is aggregation of features, which is what I explained before for recommendation. But we can also learn the item factors from content features. There was a paper from San Diego, and another very good one, where they learn the item factors from the audio content. What we are working on is learning the item factors from the artist biography. So we have the biography of the artist, and the item factors, which were obtained by factorizing the user-item matrix, and we train a network to learn the mapping between them. When we have a new artist, we have the biography of this artist, so we can predict its factors, and once we have these new factors we can combine them with the user factors and predict recommendations for the users for this new item. That's the idea. We also have a dataset with the biographies of all the artists, and we processed that, and we have some very interesting results. So, first, the upper bound.
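The factorization step above can be sketched with plain stochastic gradient descent in NumPy. This is a toy illustration, not Pandora's system: the ratings matrix, learning rate, and rank are all made up, and `np.nan` marks the missing cells the factorization fills in.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy user-item matrix; np.nan marks the missing cells (values invented).
R = np.array([[5.0, 3.0, np.nan, 1.0],
              [4.0, np.nan, np.nan, 1.0],
              [1.0, 1.0, np.nan, 5.0],
              [np.nan, 1.0, 5.0, 4.0]])
n_users, n_items, k = R.shape[0], R.shape[1], 2

U = rng.normal(0, 0.1, (n_users, k))   # user factors
I = rng.normal(0, 0.1, (n_items, k))   # item factors

observed = [(u, i) for u in range(n_users) for i in range(n_items)
            if not np.isnan(R[u, i])]

lr, reg = 0.02, 0.01
for _ in range(2000):
    for u, i in observed:
        err = R[u, i] - U[u] @ I[i]      # error on one observed cell
        U[u], I[i] = (U[u] + lr * (err * I[i] - reg * U[u]),
                      I[i] + lr * (err * U[u] - reg * I[i]))

# Multiplying the two factor matrices reconstructs R and fills the gaps:
R_hat = U @ I.T
```

The cold-start problem in the talk is then visible directly: a brand-new item would be an all-`nan` column, so its row of `I` never receives an update, which is why the content features have to supply those factors instead.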
If we had the matrix filled in, that is, the collaborative information for this item, what recommendation accuracy, in terms of precision, would we get? 0.5. This is the upper bound we can reach with this method. If we go random, creating the factors randomly without factorization, we get 0.001. And if we use the tags of the artist to learn the factors, we get 0.57. The thing is that tags are socially created information that you don't always have: you may have them in a dataset, but if you have a very new artist, perhaps you won't have tags; you just have the biography. So what we want to do is really exploit the information in the text of the biography, to be able to learn as much as the tags allow. We tried a vector space model, the typical bag-of-words representation of the text, and fed that into a feed-forward network. We also compared with not using deep learning, using a classifier such as random forests instead of the deep network, with the typical vector-space text approach as the baseline. Then, if we add semantic information, like we did in the other approaches, enriching the words with this information, we improve further. We also tried word embeddings, but these are very long texts, and the network doesn't work that well with long texts; we are still working on the word-embedding approaches. So this is just an illustration of what can be done: everything can be redone using deep learning instead of the classification approaches I was using before.

Another thing about deep learning is that we can combine text and audio; we can combine images and audio; images, text, and audio. We can do multimodality, and the approach is almost the same for every one of these domains. So there is a real cross-domain transfer of knowledge here, because this is end-to-end learning.
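The core of the cold-start idea above is a regression from content features to the factorization-derived item factors. As a hedged sketch, the snippet below replaces the feed-forward network from the talk with a simple least-squares map, and uses random synthetic data in place of real biographies or tags; everything here (shapes, noise level, names) is an assumption for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

n_items, n_feats, k, n_users = 40, 30, 5, 10
X = rng.random((n_items, n_feats))                 # content features (e.g. bag of words)
M = rng.normal(size=(n_feats, k))                  # hidden ground-truth mapping (synthetic)
F = X @ M + 0.01 * rng.normal(size=(n_items, k))   # "item factors" from the factorization

# Learn a linear map from content features to item factors.
# (Least squares stands in for the feed-forward network in the talk.)
W, *_ = np.linalg.lstsq(X, F, rcond=None)

# A brand-new item: no collaborative data at all, only content features.
x_new = rng.random(n_feats)
f_new = x_new @ W                                  # predicted item factors

# Combine with the user factors to score the new item for every user.
user_factors = rng.normal(size=(n_users, k))
scores = user_factors @ f_new                      # one recommendation score per user
```

This is exactly the pipeline the evaluation measures: the closer the predicted `f_new` is to the factors the factorization would have produced, the closer the precision gets to the 0.5 upper bound.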
You don't have to hand-craft features; you just have to learn how to build the architecture of the network. And if you know how to build the architecture for text, you can do the same with images, connect everything in one network, and do multimodal approaches.