Okay, and we're back. Excellent. So it's a great pleasure to introduce Sam. All right. Hello, Sam. Where are you from right now? I mean, where are you right now?
I'm coming to you live from Galicia, in the north of Spain.
Oh, excellent, that's great. Galicia is a great place, great food. I'm jealous. So you're going to tell us about simple open music recommendation. Great subject, very excited to hear it. Take it away, Sam.
Okay. So thanks for watching this talk. I'm actually an operating systems developer, but I've developed an interest in recommendation engines recently. I'm also a musician and a keen music listener, so that's what drives my interest. Now, in this talk the emphasis is on simple, so I can promise you there's going to be no machine learning, because, interesting as it is, it's not very simple. Hopefully this will be something anyone can understand. I am going to go into details and do some demos, but the idea is that it's a recommendation engine that we can all understand.
You've probably noticed that recommendation engines are everywhere these days. Think about your day-to-day: have you done something in response to a recommendation algorithm? Maybe you spoke to someone on a social network because an algorithm showed you their post. Maybe you bought something because you were recommended it online. Maybe you got into an argument because an algorithm showed you something you disagree with. Maybe you listened to a song.
My interest in music means that I use different music services, and I found the recommendations sometimes very good, sometimes not, sometimes predictable. I thought: given recommenders are taking over the world, maybe I can make a recommendation algorithm, and then at least I'll know what's going on with it. But I'm an operating systems developer, so I have to start from scratch. I thought: what goes into a recommendation algorithm?
Well, data goes in, then there's a process, and then some more data, which is hopefully more interesting, comes out of the other side. In the case of a music recommender, on the right-hand side we will get a list of songs. On the input side it's more complicated. You might take a music collection. Well, a music collection is really a list of songs, but where the order isn't important. You might take social data like other people's playlists. Well, those are also lists of songs. Maybe you'll take the history of what someone listened to in the past, which is also a list of songs, ordered by time. And maybe you look at analysis of the pieces of music, which might say how fast it is, how dark it is, whether it's metal or ska or gospel. We could really keep that data in a playlist as well, if we can store that data alongside the songs.
So we can already simplify this back down to: a playlist goes in, some kind of processing happens, and a playlist comes out. As long as we can do arbitrary playlists, we can have this simple model. And that's nice, because to me this looks like a shell pipeline. As an operating systems developer I spend a lot of time writing shell pipelines. Data scientists are more likely to use something like IPython or Jupyter. The same tools work for modeling pipelines, whatever you want to use.
So how do we represent a playlist? I've been using this simple JSON format, where each line is a JSON object and a set of attributes describes the song. And I've created some Python code wrapped up in a command line tool called Calliope, or Calliope, depending on how you want to pronounce it. That allows me to create some pipelines. So here's a very simple one. I wouldn't call it a recommender yet, but it's a playlist generation pipeline. It asks the Tracker search engine on my computer to show all of the songs I have, then it shuffles the list, chooses five, and then uses another command line tool to select just the title and the creator. There we go.
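A rough Python sketch of the pipeline just described, for readers who want to see its shape. This is not Calliope's code: the function names are made up for illustration, and the collection is fake data standing in for the Tracker query. Only the one-JSON-object-per-line layout and the `title` and `creator` fields come from the talk.

```python
import io
import json
import random


def read_playlist(stream):
    """Yield one playlist item (a dict) per non-empty line of input."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def shuffle_and_take(items, count, rng):
    """Shuffle has to collect every item before it can emit anything."""
    pool = list(items)
    rng.shuffle(pool)
    return pool[:count]


def select_fields(items, fields):
    """Keep only the named fields; this stage works one item at a time."""
    for item in items:
        yield {key: item[key] for key in fields if key in item}


def write_playlist(items):
    """Serialise items back to text, one JSON object per line."""
    return "".join(json.dumps(item) + "\n" for item in items)


# Made-up stand-in for the output of a music collection query.
collection = io.StringIO(
    "".join(
        json.dumps({"title": f"Song {n}", "creator": f"Artist {n % 3}", "album": "Demo"}) + "\n"
        for n in range(20)
    )
)

rng = random.Random(42)  # fixed seed so the example is repeatable
chosen = shuffle_and_take(read_playlist(collection), 5, rng)
output = write_playlist(select_fields(chosen, ["title", "creator"]))
print(output, end="")
```

Each stage takes a playlist and returns a playlist, so stages can be chained in any order, much like commands in a shell pipeline.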
We created a playlist already. I want to talk a bit more about the format. Firstly, the fact that it's a list of JSON objects is deliberate. That makes it more usable in a traditional command line, because you can start the tool on the left and it prints an object, and then another one, and then another one, and the next tool in the pipeline can read these as you go. So you don't have to generate a huge list of a million songs, wait for that, then shuffle it. In the case of shuffle you do have to wait, but in the case of other processing you can do it a line at a time. It's a kind of lazy evaluation.
Apart from that, the playlist format is not special. It's an adaptation of "spiff", XSPF, which has existed since 2006, I think. It's a pretty common standard playlist format. It's based on XML, which I have not used, because it's no longer 2006, but pretty much everything else is the same. One really cool thing about XSPF is that it's described as portable. In the case of a playlist, being portable means that it's not tied to a specific streaming or storage medium. If you have a list of files on your computer, that's not portable: you can't play it on a different computer. So the way XSPF works is that you record details about the song, like title and artist, and then later you resolve it to some actual content.
So I want to dive in already with some demos to show how this works and how easy it is to play with it. These are all live demos and things can break, so please bear with me, but let's see what happens. I've got a playlist here, but I can't play the playlist because it's just a JSON object. So what if we resolve the playlist against the Spotify web API? This is using an API key that I've got saved locally, and out of the pipeline comes another JSON object. I'm gonna run this into the jq tool.
So it comes out in color. There we go, much better. So now you can see we've got the title, the creator, and we've got a load of metadata from Spotify, including somewhere... well, we should have a location field, but now we don't. This changed recently, but we have enough information that we can generate a Spotify URL.
What about a different service? We could resolve against MusicBrainz. Same input. You would expect that to work... ah, the command is different. So I'm gonna annotate it with some information from MusicBrainz. Again, it's a live demo. The results are cached, but obviously my cache is empty, so if I run this a second time, it will be fast. Okay, and now we have some information from MusicBrainz.
I could also resolve this against files on my local computer. There's another command which interfaces with the beets music library, which is a cool Python tool to organize your music library. So I can ask beets for a list of all the tracks. It's quite slow, because there are some improvements needed in beets. In fact, that's too slow. Let me show you albums instead. Okay.
Hopefully by now you're seeing that all this tool, Calliope, really does is act as a toolkit for reading data from disparate places and writing it out in a standardized format, which is the first thing we need in order to do some kind of interesting recommendations. Most of these tools are simple. They are a few hundred lines of Python, or maybe a thousand lines for the more complicated ones, and I intend for them to stay that way. Now, this is a nice way to generate playlists, but I also want to show you that you can export to a number of different playlist formats.
So now that we've generated our resolved Spotify playlist, I can export it in XSPF format, for example. And I can now import that into a music player application, or put it on my iPod, or anything.
So, back to the slides. Let's talk a little bit about Spotify. Spotify is probably the number one in music recommendations as of today. By their own numbers, they have thousands of engineers working on the product. They have thousands of data pipelines running constantly. They have 50 million tracks, a quarter of which nobody's ever listened to, and they are capturing half a trillion events every day. So that's a lot of events per user. I think there are about 250 million users, right? So think how many events they're capturing per user. They know a lot about you, more than just a list of songs that you listened to. What can they do with that data? Quite a lot.
Can we compete with Spotify? In the open source world that would be difficult, right? Because we don't have 16,000 engineers at our disposal. But we also don't need to. Anything is better than nothing, right? So an open source music recommender, even a very basic one, is already kind of fun. It's already helping us learn about music recommendations, and it's already helping us not depend on Spotify.
So if we're gonna recommend songs, we need more than just randomness. We need some data. I'm gonna talk about some open places and some not open places that we can get data from. Firstly, MusicBrainz, which is a website. It's kind of a catalog. You can go on MusicBrainz and look up any music release, any song, and it will tell you way more information than you ever wanted. It can tell you when and where something was recorded, what record label it was on, who else has released music on that record label, who produced the albums, when they were born. Huge amounts of data. The data is all open data. It's released under an open license. The website itself is open.
It's maintained by volunteers. So it's a really cool data source to use, and obviously Calliope interfaces with that and lets you make use of the MusicBrainz data. I have lots of ideas for fun things, like recommending artists on the same record label or from the same place.
ListenBrainz is from the same team as MusicBrainz, and it's a way to record your listening history. You can hook it up to your music player. You can install a browser add-on called... what's it called? Web Scrobbler, and that'll record things you play in your web browser to ListenBrainz. You can look at other people's listens, and it gives you some kinds of analysis about what you've been listening to. So start using that, it's really fun, and in future it's gonna mean you can get better music recommendations.
Of course, if you use Spotify, they record your listening habits as well, but they won't give them back to you. Even if you go through the long process of "request all my data", which they have to do legally, they still only give you one year of your listening history. They won't give you everything unless you specifically ask for it. So if you're not already, start using ListenBrainz. It's more fun.
This website in the bottom left is Last.fm, which is actually an older website for recording what you listen to. It's not an open website. It's owned by CBS Interactive, I think. But Calliope can also interface with that. I've been recording the music I listen to for years, and I used to do this with Last.fm. Last.fm has an API as well, with a lot of tags created by users.
So there's an interesting data source. You can look up an artist and see what millions of users have labeled it as, and we can use that for recommendations.
Finally, Spotify actually has a really powerful API, and you can access Spotify's own data. My last image, in the bottom right, is an example of the Spotify analysis you can get for a song: how danceable it is, what key it's in, whether it's got lots of speech, whether it sounds acoustic. Now, I don't think you can query five million songs using this API, but you can probably query 20 or 30 songs for free with no problem. So there's a lot of interesting data we can use.
I'm gonna show a couple more demos of things that you can do with the listening history. Now, I said to use ListenBrainz. I'm a bad example here, because I'm still using Last.fm, so I'm gonna be showing the Last.fm history command. Calliope doesn't yet support ListenBrainz, but that would be a fun thing to contribute if you want to join in the project.
Let me show you what the Last.fm history command can do. So if I give it my username and I say... that's not the command. That's the command. So I say, give me a list of the scrobbles, which is the last five songs I listened to, and there we go. There are the last five songs I listened to, in the form of a playlist.
Let's mine this data a bit more. I can ask what artists I've listened to. I wonder if I've got this one recorded in my backlog. Yes. So I'm asking it for all the artists I've listened to in the last six months.
There's an SQL database in the back end here, which I'm not gonna show you, but in the background this is scraping the Last.fm data and putting it in an SQL database so that we can query it locally. And now I'm saying: query everything I've listened to in the last six months, and only return things which I listened to ten times or more. Then I'm gonna select just the creator field, because that's the only one we care about. So here's a list of all the artists I discovered in the last six months, and I must have liked them, because I listened to them ten times or more. Say what you want about my taste in music, it's quite obscure. But now we could perhaps make a playlist of these artists. I'm not gonna show you how.
What am I gonna show you? Ah, here's another interesting thing we can do, kind of the inverse. I can say: show me all the tracks which I didn't play in the last year. This is gonna be a lot of tracks, I think. Let's see how many it is. I'm gonna count them using the word count tool: 34,000 tracks which I haven't listened to in the last year, but which I listened to before that. How can we make a playlist? Let's randomly pick five songs. Okay, so there's an interesting playlist: five songs that I haven't listened to in the last year.
So, anything else we can do? Select. Okay, the last thing I want to talk about before we break a bit for questions is the select command. Shuffle is fine, but there's nothing really too smart going on there. We're taking the data, randomly shuffling it, and spitting it out. That's already a lot better than nothing, but let's do something a bit more advanced. The select command uses a simple type of algorithm called a local search. So I can feed in this data, randomize it, and then I can say: give me a playlist with a duration of 60 minutes. Because this is a pipeline, I need to put a dash to tell each command to read from the previous stage of the pipeline. Let's see what this comes up with. Actually, 60 minutes is quite long.
Let's have a 30-minute playlist, so that hopefully it fits on the screen. Hopefully this doesn't take too long. In fact, this isn't going to work, because we haven't annotated the tracks with duration information. That's possible, but it's outside the scope of this talk. What I would do next would be to resolve the files somehow, either against my local music collection or against MusicBrainz, to find out how long they are. Then I would have the duration field, and then I could actually select them based on the duration. But because time is short, I'm going to move on to the last slide.
This is a paper I read, and it's where I got the idea for the select command. It's from 2008, and you can read it online for free. The link is in the slides, which are in my talk. It uses what's called a local search algorithm to recommend playlists. Local search is quite a simple algorithm, in the sense that it's what a person might do. If you had to choose from a pool of a thousand songs and you had some constraints, then you would start by picking one, then pick another, and eventually you've got too many and you throw some of them away, and then you pick some more, until your playlist fits the constraints. There's more to it than that, but that's the fundamental idea, and this paper is a really interesting read. They demonstrate recommending music, generating playlists, by defining a series of constraints.
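The pick-and-discard loop described above can be sketched in a few lines of Python. This is a minimal hill-climbing illustration under stated assumptions, not the paper's algorithm and not Calliope's select command: the single constraint (a target total duration), the `duration` field measured in seconds, the tolerance, and the iteration cap are all assumptions made for the example, and the song pool is made-up data.

```python
import random


def cost(playlist, target_seconds):
    """Distance from the target duration; 0 means the constraint is met."""
    return abs(sum(song["duration"] for song in playlist) - target_seconds)


def neighbour(playlist, pool, rng):
    """Return a copy of the playlist with one song added or one removed."""
    candidate = list(playlist)
    unused = [song for song in pool if song not in candidate]
    if candidate and (not unused or rng.random() < 0.5):
        candidate.pop(rng.randrange(len(candidate)))  # throw one away
    elif unused:
        candidate.append(rng.choice(unused))  # pick another
    return candidate


def local_search(pool, target_seconds, tolerance=30, max_iterations=10_000, seed=0):
    """Accept only moves that are no worse; stop when close enough or at the cap."""
    rng = random.Random(seed)
    current = []
    current_cost = cost(current, target_seconds)
    for _ in range(max_iterations):
        if current_cost <= tolerance:
            break
        candidate = neighbour(current, pool, rng)
        candidate_cost = cost(candidate, target_seconds)
        if candidate_cost <= current_cost:
            current, current_cost = candidate, candidate_cost
    return current


# Made-up pool of 50 songs with durations between two and almost seven minutes.
data_rng = random.Random(1)
pool = [
    {"title": f"Song {n}", "creator": f"Artist {n % 5}", "duration": data_rng.randint(120, 400)}
    for n in range(50)
]

playlist = local_search(pool, target_seconds=30 * 60)
for song in playlist:
    print(song["title"], song["duration"])
print("total seconds:", sum(song["duration"] for song in playlist))
```

Because worse moves are never accepted, whatever the loop holds when it hits the iteration cap is the best playlist it found, even if the constraint was never fully satisfied.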
So here's an example from the paper, of one of the tests they did in their research. There's a list of different constraints. Now, this is an academic paper, so the right-hand side is quite mathematical, but on the left-hand side, in the description column, you can see that the first constraint is that all the songs should be different. The second constraint is that they should be released between 1980 and 2001. The third constraint is that 20% should be Stevie Wonder, always a good choice. And so on. A local search algorithm can take a collection of songs and it can find the best... not the best, but a result which satisfies those constraints, and the select command is an implementation of that. Now, it's not a complete implementation, and it's very simple, but this is how I really think we're going to get some engaging and useful open source music recommendations.
So, just to summarize: recommendation engines are a thing. They're here to stay. Kids now are growing up watching YouTube and using recommenders every day, and they're going to become more and more a fact of life. The Calliope project aims to make simple, fun music recommendations. It's a project you can hack on and use right now. You can pip install it and run it from the command line. It's full of bugs, so please open issues and merge requests for whatever you find. It's something I work on in my spare time, and I don't have time to polish it. One final point.
I think the design of Calliope is much more important than the code. I think the model of simple, self-contained tools which communicate with a well-defined format can work for any type of recommendations, not just music. It can work for any programming language, not just Python. And with that, I'm interested to see if there are any questions. I'll leave the screen sharing, but that's the end of the talk.
All right, fantastic. This is where we normally have applause, but we'll play it later, after the questions. You were very, very clear, and I think it's a great idea to have a recommendation system which is open source and which everybody can contribute to. There are a few questions.
Okay, great.
So I will copy them into the banners. All right, I will show you the first and ask you the first question. We still have more than five minutes, so it's plenty of time. You mentioned this algorithm to create a playlist out of some music collection and some constraints. What if the constraints are not, you know, the system is not able to satisfy them?
That's a good point. I mean, it's up to you, in a way. I didn't mention this, but the algorithm is implemented with a Python library called simpleai, which you can configure in various ways. So one thing you can do is say: only iterate n times. In fact, you want to limit it, because if you don't put any limitation, then the algorithm will iterate forever and you have an infinite loop. So you put a limit of, say, one million iterations, and if the constraints aren't fulfillable within that time, then it will return the best thing it found within one million iterations. For a recommendation engine, I think that's fine, because if it returns a playlist that's not perfect, it doesn't really matter. You can still listen to it. Maybe you enjoy it. So I think that's the right result for this kind of thing.
Cool.
Yeah, absolutely. Another question here: is there an interface for, say, an open source music player like MPD or others?
Okay. There's nothing specific to MPD at the moment. It's possible to add that as another Python module, but in many cases it's not needed, because there is an import module and that can read existing playlist formats. So if your music player can import and export in a format like XSPF or M3U, then Calliope can import and export that already. So it's possible to interface, but it's often not needed. You can often just export and import using standard playlist formats.
Cool, absolutely. By the way, I really love the fact that you're using beets. I also use that library to organize my music library, and I found it to be fantastic, really a great piece of... yeah, go ahead. Another question: what development are you planning next for Calliope? What things are you either working on right now or planning for the near future?
That's a very interesting question. Actually, I don't have an immediate answer. One thing we just added: someone contributed some much better resolvers, so you can now resolve songs on Spotify and MusicBrainz much more accurately. So I'm interested in exploring those a bit more, maybe not developing Calliope itself, but developing some more examples. The documentation is online. There was a link in my slides, and that has a lot of examples. So probably the future developments will be adding more examples and more things that you can try out, and kind of just fixing bugs in the existing tools.
Awesome.
Contributions are welcome, by the way.
That was going to be my next question.
Good segue. How would somebody contribute, maybe fixing bugs on the GitHub repo, or...?
Yeah, as with all sorts of community-developed projects, the best way to get involved is to try it. Think of what you want to do, and you will certainly find some bugs, and get involved fixing those. The documentation is fairly complete, but it can always have more examples. And think about some fun things you might want to see. In particular, I haven't done much using MusicBrainz yet. We can fetch the MusicBrainz data, but I'm interested to see the things I described, like recommending playlists based on all the same record label or all the same geographical location. So that would be an interesting thing to work on, and I think all the pieces are there.
Yeah, makes sense. Very cool. Somebody is correcting me: it's on GitLab, not GitHub. Excellent. One more question, and then I think we can go to one ad, I think that's the way it works, and then the next speaker. Is there a way, or are you thinking about adding a way if what I'm asking is not already there, whereby you can get a bigger corpus of music than the one that you have locally and use that to suggest new songs which you haven't heard because you don't have them in your collection?
Yeah, that's a really interesting question. Of course, anything that can export playlists can serve as a corpus, so you can export things like the charts. Spotify has huge playlists, and I think we can already export playlists from Spotify. So you could perhaps start with all of their, you know, weekly playlists, and then generate recommendations from that. Or, of course, the MusicBrainz database is open. So if you wanted the biggest possible corpus, I'd say download the MusicBrainz database and work from that. That's a big job, though.
It's a big job indeed. Well, Sam, thank you so much again. Very interesting talk and a great project.
Thank you so much.
Okay, thanks a lot.