Okay, so back a little bit to text processing. We're going to talk about lexical semantics. I could take the risk and say it's the hottest topic in NLP right now in terms of the tasks you can perform, because it interacts very well with the new trends in machine learning and deep learning. So let's flesh it out from the beginning. What is it about? This quote is from James Pustejovsky, and I think it's a very nice way to start a presentation on semantics: what is it about the representation of a lexical item that gives rise to certain extensions and to the phenomenon of logical polysemy? In other words, what is behind the meaning of words, how do these meanings interact, and how are we as researchers in NLP developing systems that account for these phenomena? Lexical semantics is about understanding the units of meaning of a language: not necessarily words, but also compounds, phrases and affixes, any unit that you may want to encode language as. And in NLP we have come a very long way, from the original formal-logic way of representing semantics, through count-based approaches, to the distributional semantics we are going to talk about now. It also intersects with relational semantics: when you evaluate a system that identifies the meaning of a word, you are intrinsically also putting that word in context with the other words in your vocabulary. The intuition behind distributional semantics rests on an assumption that we can only believe holds, namely that you shall know a word by the company it keeps. And this is a classic example: I don't know if anybody here knows what a "wampimuk" is. No? But if I put the word in context, maybe it will ring a bell. Here the wampimuk appears in a context where a substance is being passed around and shared, and I'm pretty sure most of you now have a rough idea of what it might be.
But if I take the word wampimuk and instead say "we found a little wampimuk sleeping behind the tree", the meaning you assign to that word has probably changed quite a bit. In fact, "wampimuk" doesn't exist; it's a made-up word used by McDonald and Ramscar quite a long time ago, and we use it to illustrate how much we rely on context to understand the meaning of words. This is the distributional hypothesis: words that appear in similar contexts exhibit similar semantics. This assumption is what has given rise to all the work on continuous vector representations and word embeddings, for the following reason. In distributional semantics, as I said at the beginning, you would traditionally compute word-context co-occurrence matrices: you take a word, define what a context is for you, and encode the co-occurrences of each word with each potential context. The problem is that this gives you very long vectors, because the vocabulary of a language is very large. If you dedicate one dimension per word in the vocabulary, you are incurring sparsity. Also, when you build a co-occurrence matrix and encode explicit word-context co-occurrences, for example for "dog", you are implicitly missing a lot of information from "cat" or other animals, because you are only looking at explicit appearances of the word with its contexts. This has changed with predictive models: instead of counting, you predict words from their context. This small change made the representation of text vectors very similar to other types of data whose nature forces you to encode them as real-valued dense vectors, like audio or images.
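As a toy illustration of the count-based approach the lecture describes (my own minimal sketch, not from the talk), here is a word-context co-occurrence count with a symmetric window of one word. Note how "dog" and "cat" end up sharing contexts, which is exactly the signal distributional methods exploit:

```python
from collections import Counter

def cooccurrence(sentences, window=1):
    """Count word-context co-occurrences within a symmetric window."""
    counts = Counter()
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(word, tokens[j])] += 1
    return counts

corpus = [
    "the dog sleeps".split(),
    "the cat sleeps".split(),
]
counts = cooccurrence(corpus)
# "dog" and "cat" never co-occur directly, yet they share the
# contexts "the" and "sleeps".
print(counts[("dog", "the")], counts[("cat", "sleeps")])  # → 1 1
```

In a real setting the matrix has one row per vocabulary word and one column per context, which is where the sparsity problem mentioned above comes from.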
By predicting words from their context, when you are modeling the vector of "cat" you are also capturing inherent characteristics of other words that share that context. And this is a projection; these visualizations are from word2vec vectors, a very famous family of algorithms for building word embeddings, and they come from the TensorFlow website tutorial on word embeddings, which I highly recommend. As you can see, because words in similar contexts get similar embeddings, certain semantic properties emerge in your data even if you never define them as categories. In the first panel the male-female relation appears on the left: the vector difference between "king" and "queen" is very similar to the one between "man" and "woman". The same holds for other relations like verb tense or country-capital, you name it. And I think the coolest thing about this is that you are not telling the model to learn this; it emerges from your data. So as long as you have enough data, you may be able to see regularities in how people write that you had not thought about beforehand. How does this translate into code? Well, everybody who does data science with text has at some point imported gensim, because it has a Python implementation of the original word2vec algorithm. Okay, let's go back to lexical semantics. In semantics, you have a system that needs to learn the meaning of words, and how do you evaluate that? There are different tasks for evaluating how well a system assigns meaning to words. This can be done, for example, with the TOEFL test, the English test, where you have synonym questions with distractors. As a student taking the English test, you say: this word is a synonym of this one, and these other two are not. A distributional semantic system is traditionally evaluated on the same exams. There are also semantic datasets of pairs of words where you look for hypernyms.
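A TOEFL-style synonym question can be answered by picking the candidate whose vector is closest to the target word. This is a minimal sketch with tiny hand-made vectors (the words and numbers are my own toy assumptions; real embeddings would come from a trained model):

```python
import numpy as np

# Toy 3-d "embeddings" for illustration only.
vectors = {
    "big":   np.array([0.9, 0.1, 0.0]),
    "large": np.array([0.8, 0.2, 0.1]),
    "cold":  np.array([0.0, 0.9, 0.3]),
    "fast":  np.array([0.1, 0.2, 0.9]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def pick_synonym(target, candidates):
    """Return the candidate most similar to the target word."""
    return max(candidates, key=lambda w: cosine(vectors[target], vectors[w]))

print(pick_synonym("big", ["large", "cold", "fast"]))  # → large
```

With real embeddings the procedure is identical; only the vector lookup changes.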
So: tell me which of these pairs of words are in an is-a relation. Given "cat, animal" and "cat, hammer", my system needs to say that the first one holds an is-a relation, while the second one is unrelated. And there is also analogy, with datasets where you have to solve the equation "a is to b as c is to x", where x is what your algorithm has to discover. With word2vec this made a lot of news, because it was very cool to see that if you take the vector of "king", subtract the vector of "man", and then add the vector of "woman", the nearest neighbor to the resulting vector is "queen". So you can actually solve analogy tasks in an unsupervised way; this means you don't need training data. And it's also very simple to exploit; you can see it's one line of code, and this one answers the analogy question "man is to king as woman is to queen". Everybody who works in this area has seen this example many times. So how does this translate into music information retrieval? Without any supervision, you can play with your data to look for related instruments. The examples I'm going to give you now come from querying a pre-trained model from the word2vec web page, trained on Google News documents. It's not music-specific; it's generic. If you solve an analogy like "Hendrix is to the guitar as Mozart is to x", the nearest neighbors to that vector are piano and then accordion. So it's kind of nice. You can also look for associated music genres; with another analogy that came to mind, the first outputs were country, rock and then reggae. Of course, reggae doesn't make much sense there, but at least it's a music genre.
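The one line of code in gensim is `model.most_similar(positive=["king", "woman"], negative=["man"])`. Under the hood it is just vector arithmetic plus a nearest-neighbor search, which can be sketched with toy vectors (again, hand-made numbers, not real embeddings):

```python
import numpy as np

vectors = {
    "king":  np.array([0.9, 0.9, 0.1]),
    "queen": np.array([0.9, 0.1, 0.9]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
    "apple": np.array([0.5, 0.5, 0.5]),  # distractor
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def analogy(a, b, c):
    """Solve 'a is to b as c is to x' by nearest neighbor of b - a + c."""
    target = vectors[b] - vectors[a] + vectors[c]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

print(analogy("man", "king", "woman"))  # → queen
```

The input words themselves are excluded from the candidates, which is also what gensim does; otherwise "king" would usually win.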
So there is some kind of semantic property being preserved. On the website of the tutorial you can download a word2vec model trained only on music texts, covering more than 16,000 artists: biographies, Last.fm pages and similar documents. And there has been a lot of work, especially since 2014, so for the last few years or so, on how to enhance these word embedding models: how to make them sense-aware, and also how to make them domain-specific. This is one of the examples. Because we were working with information that was already in the music domain, musical entities were already tokenized. So "Mick Jagger" is not the centroid of the vector of "Mick" and the vector of "Jagger"; it is one single entity with its own vector, so it's easier to explore the data you have. So it's more of the same: "John Lennon is to The Beatles as Mick Jagger is to x" gives The Rolling Stones, or something a little bit more convoluted, I would say, relating AC/DC to jazz rock. With music data you can actually query these things. Another thing that I found very nice is that if you take the centroid of the vectors of some members of a group, the nearest neighbor is another member of the same band; I thought that was very cool. And also the vector closest to a band tends to be its frontman, so it seems to make sense. Of course, I'm giving you the juiciest examples; I'm not showing you all the times where I had an idea, tried it, and it turned out not to be interesting. But it's a matter of exploring your data and seeing. Again, this is unsupervised: you are not pre-labeling anything, and you are discovering what kind of regularities emerge in your data by doing this.
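Treating multiword entities like "Mick Jagger" as single tokens is normally done as a preprocessing step before training the embeddings, for example by replacing known entity names with underscore-joined tokens. A minimal sketch, where the entity list and the sentence are my own examples:

```python
ENTITIES = ["mick jagger", "the rolling stones"]

def tokenize_entities(text, entities=ENTITIES):
    """Replace known multiword entities with single underscore-joined tokens."""
    text = text.lower()
    # Replace longer entities first so their substrings don't clobber them.
    for ent in sorted(entities, key=len, reverse=True):
        text = text.replace(ent, ent.replace(" ", "_"))
    return text.split()

tokens = tokenize_entities("Mick Jagger fronts The Rolling Stones")
print(tokens)  # → ['mick_jagger', 'fronts', 'the_rolling_stones']
```

After this step each entity gets its own row in the embedding matrix, instead of being averaged out of its component words.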
[Question:] So, for the cases where it doesn't work, do you think it's a problem of the model or a lack of data? Well, first it's a problem of lack of experimentation on our side, because it's worth trying all the different parameters that the word2vec algorithm has. You can play with the context window. And it's not quite a single algorithm: you have continuous bag-of-words and you have skip-gram. Then there is negative sampling, whether you want to give relevance to negative cases by introducing artificial negative examples, and we haven't done this. So, if you ask me, the first thing to do is play more with the model and with the data. And of course, as always in data science, getting more data is probably better. Okay, so what can you do with this in the music domain? This is something I've been working on, which is word sense disambiguation and entity linking, but using vectors instead of other off-the-shelf tools. Let's say, for example, you have the following sentence: "The influence of Sisters of Mercy became evident in later poetry." Sisters of Mercy could be the rock band, but it could also be the Hermanitas de la Caridad, the religious order. At first glance I don't have much information about which of them I should choose other than "poetry", and maybe there was even a lot of cultural and literary production from the religious Sisters of Mercy. The thing is that I have at least one informative word that I can use to attempt the disambiguation of that entity. We have not published anything on how well disambiguation with vectors works compared to off-the-shelf entity linking tools, because it has always been part of longer pipelines, but we were very happy with the end results, so implicitly I would say it's at least worth trying.
For this last sentence, we used vectors that are mapped to BabelNet. You may already know that BabelNet is a large semantic network where you have concepts from WordNet, like "hoe", "shoe" or "summer", and named entities like Mick Jagger, the Beatles, or Barcelona (the city, the club, the metro station, you name it, whatever is in the media). These vectors, called SensEmbed, were presented at ACL 2015. And the beauty of these vectors is that, for example, for the vector of "bank" you have as closest neighbors things like "financial", "finance", "money", "transactions", things related to the financial-institution sense, but also "river", "nature", "water", vectors related to the river-bank sense. In this model, since the vectors are disambiguated, the clusters of the two senses are very clearly separated. We used these to perform the following disambiguation strategy: you take your two candidates and you assume that the closest two senses in the vector space, among all the senses associated with each surface form (here, Sisters of Mercy and poetry), are the most likely senses for them. This intuition works, for example, for "Armstrong" and "Tour de France": if you have "Armstrong won the Tour de France", Armstrong may be the jazz musician, the cyclist, or some other guy, but which sense of Armstrong is going to be closest to "Tour de France"? The one with the sports sense. So this was our intuition: from each set of senses you keep the pair that maximizes the cosine similarity, the two that are closest. In code this goes as follows. We wrote a small interface to SensEmbed, so you call something like a get-senses function. I'm not as adventurous as the previous speaker, so I'm not using a live notebook; you'll have to trust me that this works.
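The closest-pair-of-senses strategy can be sketched as follows, with made-up sense vectors standing in for the real disambiguated vectors (all names and numbers here are my own toy assumptions):

```python
import numpy as np
from itertools import product

# Toy sense inventories: two senses for "armstrong", one for "tour_de_france".
senses = {
    "armstrong": {
        "armstrong_trumpeter": np.array([0.9, 0.1, 0.1]),
        "armstrong_cyclist":   np.array([0.1, 0.9, 0.2]),
    },
    "tour_de_france": {
        "tour_de_france_race": np.array([0.2, 0.8, 0.3]),
    },
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def disambiguate(word_a, word_b):
    """Pick the pair of senses (one per word) with maximum cosine similarity."""
    pairs = product(senses[word_a].items(), senses[word_b].items())
    (sense_a, _), (sense_b, _) = max(
        pairs, key=lambda p: cosine(p[0][1], p[1][1])
    )
    return sense_a, sense_b

print(disambiguate("armstrong", "tour_de_france"))
# → ('armstrong_cyclist', 'tour_de_france_race')
```

The sports sense of "Armstrong" wins because its vector is closest to the race vector, mirroring the intuition described above.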
So you get the Sisters of Mercy senses, then you get the poetry senses, and then you take the closest pair of senses; if the intuition is right, you expect the disambiguation to be correct. And I tried this yesterday: with this BabelNet ID here you can actually go to babelnet.org and check it out, and it was the result we were expecting, the sense of Sisters of Mercy as the band. So we thought this was a very nice application of word embeddings for disambiguation, and since we are speaking about music, you can always disambiguate towards the music sense. So, to wrap up lexical semantics: why is it the hottest topic in NLP? Again, you can see it interacts very well with word embeddings. It's not just lexical semantics anymore, it's a word-embeddings scenario, and because it models semantics in a way that matches neural approaches, the research is developing very fast. In fact, there was a joke that is very illustrative. EMNLP is a very strong conference in NLP; it stands for Empirical Methods in Natural Language Processing, but I read a blog post joking that EMNLP now stands for Embedding Methods in Natural Language Processing. So you can see this is clearly something that everybody is willing to work on, because, combined with deep learning, it is giving very good results on pretty much every task. And these are the references.