Thanks for the introduction again. Thank you all for coming and staying so late. I'm going to talk about information embeddings, and here is our agenda. First we're going to talk about how we can use embeddings in the music domain: why we need them, how we can create them, and how we apply them. We'll try to answer the question of how Spotify recommends us the music we like. Then we'll switch to the domain of words and texts, talk about word embeddings, and try to discover how Google Translate knows the meaning of all the words.

So let's go on with music embeddings, and let's start with a situation which might be familiar to some of you. Imagine that you're in your office, sitting at your desk, working hard and listening to your favorite music. Let's say you like to listen to classical music when you try to keep your concentration. That would mean that your Spotify recommendation list consists of a bunch of different classical composers. Then suddenly your phone starts to ring and you realize it's an important call. So you jump up from your desk to pick it up and leave your laptop unattended and unlocked. And the one thing I didn't mention before: you have really nice colleagues with a perfect sense of humor. They simply won't miss a chance to go to your laptop and randomly click on tracks that you probably don't like. So when you're back from your urgent call, you find the list of Spotify recommendations looking somewhat like this.

It's fun, but it actually raises an important question: how can a computer distinguish between classical and metal music? For us humans it's quite obvious. We can easily say that Bach and Beethoven belong to classical music. They both have quite complicated arrangements.
They are both performed by orchestras with a large number of people, they have few repetitive patterns, and so on. And you can easily say that metal music is something different because it's much louder: it has heavy drum patterns, distorted guitars, screaming vocals, you name it. But for a machine it's really a problem. A machine cannot apply the same logic. It needs some help, some hint from us, to convert this complicated and unstructured information, which is music, into something digital, because digital data can be operated on by machines. And this hint is exactly the embeddings.

So what is an embedding? An embedding is the numerical representation of some entity. Here are two examples. One is a song embedding: you have the song Californication and you're presented with a set of numbers, which we call a vector. Or let's say you have the word "mother" and you're presented with another set of numbers.

How can we create embeddings from music? What would be the simplest idea to convert music into something digital? You can look at the content of the song: you can try to analyze the sound spectrum, you can look at the lyrics, you can also try to label everything with different genres and then use that for different purposes. But there are obvious disadvantages. First of all, the algorithms for sound processing are quite complex. Then, sometimes you listen to instrumental music, which doesn't have any lyrics at all. And if you want to label everything, it would be a bunch of manual labor, which is not fun.

But fortunately there is another, simpler approach, which is called collaborative filtering. So what is it? Imagine we're Spotify and we want to recommend to our customers what to listen to next.
So we collect all the historical data of what our customers have been listening to before and we put it into this table. A one in this table means that this user likes this artist, for example Mike likes the Beatles, and a zero means he doesn't, so Mike doesn't like Beyoncé. But we also observe a lot of question marks here. A question mark means that this particular customer hasn't listened to this particular artist before, and that's exactly our point of interest. We want to predict it.

The main idea behind collaborative filtering is very simple. We look at Mike and Kate, and from the fact that they have similar opinions on some subset of the artists, we infer that Kate would follow Mike's opinion about the Beatles, the artist she has never listened to before. So now we can fill in this question mark. If we keep applying the same technique until the end, we have the full table with only ones and zeros and no question marks. And a column of this table would be exactly the embedding of the artist. Here the embedding, the numerical representation, of Red Hot Chili Peppers is highlighted in green.

These collaborative filtering techniques and embeddings are widely used in different music services: Spotify, Google Play Music, SoundCloud, Last.fm, all of them. And there are typical use cases for them. The first use case is recommendation. That's when the user asks what to listen to next. So we look at this matrix, where we have a lot of question marks, we fill it in with ones and zeros, and then we output to the user the artists marked with ones, the ones he would probably like.

The next one is cold start. It's the problem when a customer has just registered in the system and Spotify doesn't know anything about him. You have no history for this user. What would be the way to deal with it? Spotify would give the user a little survey. They would say: hey, here are 20 very famous bands, please say which ones you like, and we'll go on from there.
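
The table-filling idea behind collaborative filtering, as described above, can be sketched in a few lines. This is only a toy illustration under stated assumptions: the table values, the user "Anna", and the helper names `agreement` and `fill_question_marks` are made up for the example, and each question mark is simply copied from the most similar user. It is not a description of how Spotify or any other service actually does it.

```python
import numpy as np

# Toy user-artist table (names and values are illustrative assumptions).
# 1 = likes, 0 = does not like, np.nan = "?" (never listened).
artists = ["Beatles", "Beyonce", "Red Hot Chili Peppers"]
users = ["Mike", "Kate", "Anna"]
ratings = np.array([
    [1.0, 0.0, 1.0],      # Mike
    [np.nan, 0.0, 1.0],   # Kate has never listened to the Beatles
    [0.0, 1.0, 0.0],      # Anna (hypothetical third user)
])

def agreement(a, b):
    """Fraction of artists rated by both users on which they agree."""
    both = ~np.isnan(a) & ~np.isnan(b)
    if not both.any():
        return 0.0
    return float(np.mean(a[both] == b[both]))

def fill_question_marks(table):
    """Fill each '?' with the opinion of the most similar user who has one."""
    filled = table.copy()
    for u, i in zip(*np.where(np.isnan(table))):
        candidates = [v for v in range(len(table))
                      if v != u and not np.isnan(table[v, i])]
        if not candidates:
            continue  # nobody has an opinion on this artist; keep the '?'
        best = max(candidates, key=lambda v: agreement(table[u], table[v]))
        filled[u, i] = table[best, i]
    return filled

filled = fill_question_marks(ratings)
kate_beatles = filled[users.index("Kate"), artists.index("Beatles")]
# A column of the filled table is the artist embedding from the talk.
rhcp_embedding = filled[:, artists.index("Red Hot Chili Peppers")]
print(kate_beatles)
print(rhcp_embedding)
```

Here Kate agrees with Mike on every artist they have both rated, so her question mark for the Beatles is filled with Mike's opinion, and the Red Hot Chili Peppers column can be read off as that artist's embedding.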
So you as a customer tick the boxes in that survey. Spotify then grabs these preferred artists, creates embeddings out of them, looks for the most similar embeddings, and then suggests that you listen to those bands.

The next application is playlist generation. That's when the customer wants to listen to something which sounds like Linkin Park but is not actually Linkin Park. So Spotify grabs one of Linkin Park's songs, converts it to an embedding, and looks for the most similar embeddings. Then it takes a bunch of these similar ones and puts them into something it would call a Linkin Park radio station, and everyone is happy.

Another application is friend suggestion. I'm not sure about Spotify, but Last.fm does it for sure. That's when you're a user and you want to follow someone who has similar interests to yours. If we look back at our table, we see that every row represents the embedding of an actual customer. So we would take Kate's embedding, look for the most similar ones, and suggest that she follow these people, and probably they would become friends.

That's enough for music. Let's switch to the second part of our talk: word embeddings. We're going to briefly look at one of the approaches to creating a numerical representation of words. The main idea of this approach is to use context information. So what does that mean? Imagine you have the word "fox" and you're interested in creating a representation of this word. Here are several sentences about the fox, I think I grabbed them from Wikipedia, and you observe different words around "fox": let's say dogs, animal, habitat, brown, predators, habit, and so on. Each of these words contains a tiny piece of information about the word "fox". That's because they are quite close to the word "fox". They occur together.
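
Collecting the words that occur around "fox" can be sketched as follows. The sentences are hypothetical stand-ins, since the Wikipedia sentences from the slide are not part of the transcript, and the window size of two words on each side and the function name `context_counts` are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical example sentences; the slide's actual sentences are not available.
sentences = [
    "the quick brown fox jumps over the lazy dog",
    "the brown fox is a quick predator",
    "the fox is a wild animal",
]

def context_counts(sentences, target, window=2):
    """Count the words that appear within `window` positions of `target`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for pos, token in enumerate(tokens):
            if token != target:
                continue
            lo = max(0, pos - window)
            hi = min(len(tokens), pos + window + 1)
            for j in range(lo, hi):
                if j != pos:  # skip the target word itself
                    counts[tokens[j]] += 1
    return counts

fox_context = context_counts(sentences, "fox")
print(fox_context.most_common())
```

With these sentences, "brown" is counted twice and "quick" once, while "lazy" never falls inside the window. A real pipeline would usually also drop very common words such as "the" and "is", which carry little information about the fox.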
So you can say that these words represent the word "fox".

Let's try to generalize this idea. We analyze a huge text and put it into this table. The columns and rows are the words from this text; of course, this is just a tiny piece of the whole table. And what are the numbers in the table? Each number shows us how many times these two particular words occur together in the text. So that's our "fox", and this row of numbers would actually be the embedding for the word "fox". What do we see? We see that "fox" and "quick" often occur together. We can also say that "brown" often goes together with "fox". But for "lazy", for example, the value is very low, so you rarely see "fox" and "lazy" together.

When we create a graphical representation of this table, we need to take that into account. So here is our "fox", and around it we can see "brown" and "dog", and "lazy" is a bit further away. That means that foxes are not lazy; they're less lazy than they are brown. Another interesting observation is that "lazy" is closer to "dog" than to "fox". It means that dogs are lazier than foxes, and that probably makes sense, because foxes are wild animals and need to run more than dogs.

So how can we use this knowledge in real-world applications? First of all, if we look at the distances between the words, we can do synonym finding and matching. "Football" and "soccer" would be placed very close to each other, which means that their meanings are very close and you can say that one is a synonym for the other. The second one: we look at the direction between words. We see that the direction from a country to its capital is always the same. That lets us answer simple questions. For example, we know that Paris is the capital of France.
What, then, would be the capital of Spain? We just go to Spain, follow the same direction, and we arrive at Madrid.

If we apply the same technique to whole sentences rather than single words, we can create chatbots. A chatbot is a service which talks to the customer: the customer has a question and the chatbot gives him an answer. So we have the question from the customer, we embed it into some numerical representation, and then we look for the closest embedding, but in the answer space. We get the closest answer and output it to the user.

Another interesting application is about languages. Here I'm sorry for the not very perfect graph, the font is super small. But believe me, there are some words in English on the left side, and on the right side there are the same words in Spanish. What you can see is that the position of the word "one", for example, is really close to the position of the word "uno", which is "one" in Spanish. And you can say the same about any word, like "four" and "cuatro", "cat" and "gato". This idea is actually one of the basic ideas behind the Google Translate algorithms. You take the word in English, you create the embedding out of it, then you look for the closest embedding in the Spanish vocabulary, and that would be your translation.

So, to recap. One thing that I want you to take away from this talk is that embeddings are a concept that allows you to convert complex and unstructured information into digits, and digits can be understood by machines. This concept is applied in different domains. Translation: Google Translate, we just talked about it. Virtual assistants, for example Amazon Alexa.
It's actually a combination of acoustic embeddings and textual embeddings. You have the question, the system converts the audio to text, also with the help of embeddings, then you have the embedding of your question, you look for the closest embedding in the answer space, and you give this answer back to the user.

Recommendations: that's about content, music content as on Spotify or video content as on Netflix. You have your movie on Netflix and you want to be offered similar movies. You have the embedding of the movie, you look for the closest embeddings, and that would be your output for the user.

Image similarity: it's like when you go to Pinterest in search of a new outfit. You start by clicking on a picture of an outfit you like, and then Pinterest shows you many more pictures which look close to the initial one. How do they do it? They grab the embedding of the picture you clicked on, look for the closest embeddings, and output the pictures with the closest embeddings to you.

And actually there is much, much more. You can even try to embed human character and use it for a partner matching service. But unfortunately one question would remain unanswered: would you need to match the characters which have similar embeddings, or the opposite ones? Thank you.

Thank you very much for your talk. It's a very nice picture to finish on. We have some time for questions, so if someone would like to ask a question, please.

Hello. I'm not sure if this goes into your domain or not, but especially for music embeddings: why isn't there a huge random button if you want to discover something that's completely outside of what you like? Or on YouTube, why does it always give you just the same thing over and over again?
I'm not sure if you know the answer to that.

Yeah, thank you for the question. I'm not sure, I don't work for Spotify. But, you know, in the old times Google had this "get lucky" button: you type anything, press it, and the result would be random. And I think at least Google Play Music has the same kind of radio station. I think it's called just the "get lucky" radio station. You just press it, and I think it's based on your previous history but not closely tied to it, so they would recommend you quite random music. So probably that would help.

Some more questions, please.

Thank you very much for the talk. A question about the cold start. To be able to do that kind of thing, you need to collect some initial amount of information, of just knowing things. So before you have that, is it just wild guesses, or are there smarter ways? Like: okay, I have nothing about that guy; I have one interaction; I have ten interactions; or now I'm lucky, I have a thousand interactions with my service. What do those stages look like?

Yeah, I think it's a really nice question. It's always better to have more information than less. But if the user refuses to go through the survey, for example, at least you know his login name, probably his gender, maybe the age group. If you use some third-party databases, you may know some of his interests. That's what helps us to do something.
Yeah, it's about Google. But if you know nothing about him, I think the only way to go is to pick your popular tracks and artists and give them to him, because if you know nothing, that would be the way to start.

Some more questions, please. Please.

How does, for example, Spotify classify something as not liked by the user? If I like something, I download it or I favorite it. But if I don't like something, there's really no way of showing that I don't like it. I might just listen for a couple of minutes, decide I don't like it, and close it. But sometimes that happens and it doesn't mean that you don't like it.

Well, now I'm really sorry for Spotify, because I'm actually a user of Google Play Music and you have a dislike button there. But I think they added it, actually; I'm using Spotify and they have it now. Another way, and I think Google does this, is automatic: when you start listening to a track and almost immediately switch to the next one, they automatically mark it as a dislike. So probably the amount of time that you spend listening to the track would be the implicit feedback.
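
The implicit-feedback idea from this last answer could be sketched roughly like this. The thresholds `SKIP_FRACTION` and `LIKE_FRACTION` and the function `implicit_rating` are hypothetical; the talk does not say what any real service uses. The sketch only shows how listening time could be turned into the ones, zeros and question marks of the user-artist table.

```python
from typing import Optional

# Hypothetical thresholds, chosen only for illustration.
SKIP_FRACTION = 0.1   # played less than 10% of the track: treat as a dislike
LIKE_FRACTION = 0.8   # played at least 80% of the track: treat as a like

def implicit_rating(seconds_played: float, track_seconds: float) -> Optional[int]:
    """Map listening time to 1 (like), 0 (dislike) or None (still a '?')."""
    if track_seconds <= 0:
        raise ValueError("track_seconds must be positive")
    fraction = seconds_played / track_seconds
    if fraction < SKIP_FRACTION:
        return 0
    if fraction >= LIKE_FRACTION:
        return 1
    return None  # ambiguous, e.g. stopped halfway through

print(implicit_rating(5, 240))    # skipped almost immediately
print(implicit_rating(230, 240))  # listened nearly to the end
print(implicit_rating(120, 240))  # stopped halfway through
```

Leaving the middle range as a question mark reflects the point raised in the question: closing a track after a couple of minutes does not necessarily mean the listener dislikes it.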