Great, thanks for having me here. This is actually my very first EuroPython. A great colleague of mine has mentioned it many times, so I said, okay, give it a try and submit a talk to EuroPython, and now I'm here in lovely Dublin. So thanks for having me.

In recent days we have already seen a couple of things around Spotify, around GDPR, and also about the downsides of, let's say, data science. For me as a data scientist, GDPR sometimes feels like a burden, but of course it is in the interest of consumers, or at least that is how it is intended to be. Quite a while ago, I guess it was a winter day in December, I said to myself: okay, let's give it a shot, pull the GDPR data that Spotify has collected about you so far, check it out, and try to make use of it. And this is how the whole thing developed.

Maybe a couple of words about myself. As you have already heard, my name is Marcel. I'm a senior data scientist and I work for inovex, an IT project house in Germany. I'm based in Cologne, and what we do is projects in data science, application development, and IT operations. I also decided to start my own podcast. That was back in September last year, when I said: I'm a really big fan of recommender systems, I like the technology, I like its implications, and I like to talk to people and exchange ideas with them. This is why I started that podcast, where I talk to industry experts and experts from academia. And of course I build recommender systems myself, for different clients in industry. I do cross-domain recommender systems, so I would like to know, based for example on your listening behavior, which content, maybe not music but video material or audiobooks, you might be interested in. I have also done quite some work in e-commerce. And that's basically it.
So definitely give the Recsperts show a try. But actually you are expecting something from me, and I should not be expecting you to listen or subscribe to my podcast, so now I need to deliver a bit.

Recommenders can grow complex. This is a very good image by Eugene Yan that depicts the different complexities within a recommender system on a high level. I really love it because it is so comprehensive and shows what you can do in such a system, but also how things flow through it.

Back to GDPR. I have to admit I'm a big fan of Spotify, and I guess I am running the risk of doing a lot of Spotify advertising here, so bear with me on that. I have been using Spotify for about 10 years now, and there is a problem, because I have that Liked Songs playlist like many of you have. So maybe just to ask at this point: who is actually using Spotify? Okay, so there might be a couple of people who could make use of this. I at least hope so. If not, there might be many complaints during the weekend, but at least that would be good too, because it shows that people have tried to use my software.

Another question, where I would expect fewer hands: who has requested his or her GDPR data from some platform, recently or not so recently? Okay, I see a couple of hands, nice. And now comes a weird question, and then I will finish the questioning part: who is a semi-professional or professional salsa dancer? Or at least, who is a bit into salsa cubana? Okay, I will come back to you by the end. Okay, but maybe this is going to entertain you. Thanks.

What basically bugs me is this: back when I did that project in December, I had maybe 1,800 songs as a large stack in my Liked Songs playlist.
Every time you listen to your Discover Weekly and you enjoy a song, you push that heart button and the song goes on top of your Liked Songs playlist. Then from time to time I go into the playlist and listen to the most recent songs, which are maybe the 10 or 15 most recent ones. This describes my common behavior very nicely. I listen a lot to the Discover Weekly playlist, but I also listen a lot to the songs that I have liked in the past, which is kind of obvious. The problem is that there are so many songs and I am using so few of them.

I hope Spotify has also recognized that problem to some extent, because from time to time I experience that the shuffle function somehow activates when I am listening to my Liked Songs playlist, and I wonder: why do they do that? Maybe they have recognized that people are not experiencing the music that is further down their Liked Songs playlist, and they try to re-engage people with what they liked in the more distant past.

At inovex we say that we want to use technology that inspires our clients, but we also want to use technology that inspires ourselves, and this is where I started. Since I also have some kind of economics background: economists like these graphs that go in two different directions and have four quadrants. Here we have just seen one quadrant, but we could also check out the other directions.

What I basically did there is I asked myself: there are already automated personalized playlists on Spotify. One of them I already mentioned, which is the Discover Weekly playlist, but there are more. There is On Repeat, which says that it exploits your listening behavior of the past 30 days and compiles, on a daily basis, a list that is a collection of these 30 songs. Then Repeat Rewind goes a bit further. It says it goes beyond that one month, but it seems like it doesn't go that far. And then of course there is the time of the year, at the end of the year, when everybody on social media is posting their top songs of year X, for example most recently 2021: what you have listened to most, who your favorite artists are, and so on and so forth.

But for me, as I said, there was something missing: something for rediscovery that exploits data going far beyond that point. This is when I said, okay, what about a personalized playlist for long-term music rediscovery? I felt that was somehow missing. Then I came up with the thought that GDPR could be of benefit there. I hadn't even seen my data at that point, but this was some kind of parallel process. Sometimes you first come up with an idea and then search for the data, sometimes you first have the data and try to come up with an idea. At least I had the problem framed, so let's now solve it.

What I basically did is I requested my GDPR data from Spotify. A couple of files were provided as nice JSONs, and two of them were interesting. One file, shown at the top, contained all the songs I have liked so far, and the other one contained my streaming history of the past 12 months.
So I knew which song I had listened to, at which time, for how long, from which artist, with which URI, and so on. What we are going to do is use these two JSONs. It is not necessarily only two JSONs, because that depends on how many streams you have performed so far.

We use Python to induce a user music taste profile. I basically create a profile of my own taste based on my streaming behavior of the past year. There were, I guess, 15k streams, so quite some songs that I have listened to, which could give me some evidence about my music taste. Then I use that music taste to search in the space of songs that I last listened to more than 12 months ago. So I ignore everything that I have listened to in the past 12 months but that is still in my Liked Songs playlist, and I use my music taste profile as a filter within that space to retrieve what is closest to my annual, or current, streaming behavior but which I haven't listened to in the past 12 months, to let me rediscover my own past, so to speak.

I do this by fetching some rich additional data from the Spotify Web API. In the end we create a playlist, which is just a simple pandas DataFrame, and then we upload it to create our own playlist on Spotify. Then we have basically solved it and have a playlist for long-term rediscovery of music. And I definitely agree, there is still a lot of blank space there, which I might address with one of my ideas at the end.

Okay, so maybe just a short excursion into the GDPR.
There is Article 15 of the GDPR, so let's do some law. What GDPR Article 15 says in the end is that the information you request shall be provided in a commonly used electronic form. That sounds like JSON, and as I already mentioned, it was JSON.

Of course, and I saw these three, four, five hands here in the audience that said "I have requested my data in the past", you can also do that with other platforms, and maybe there is enough material for a future project. I did it with Twitter, I did it with LinkedIn, I also did it with Instagram. There is no big magic there. What I just want to show you is that most of the time you can request your data in two to three steps. Then it will be provided, I guess as part of a daily batch job that generates the data, and you can download it a couple of days later.

Okay, but I'm supposed to talk about Spotify and not about Twitter, LinkedIn, or any other platform. So what does the process look like for Spotify? For Spotify there are at least two different options. You can request what they call your standard data, or your extended, so-called full history data. The difference, as I already said, is that the standard data only contains your streaming history of the past 12 months, so it is a bit smaller, and the extended data is really the full history. When requesting my full history, I really got my streaming history covering everything since I joined Spotify as a user in 2012 or 2013.
It was about 220,000 streams that I had completed so far. The first part, the standard data, took a couple of days to be provided in my case. The second one is a more extensive process in comparison, because you need to write an email, and then you get a confirmation. In my case, after two weeks I was provided the file for download, and there I had all my data that I could use. But this case will focus on the standard data, so that if you would like to use it, you can easily get started.

So what does it look like? As I said, you get some nice JSONs there, not too comprehensive in terms of the breadth of information. What we are mainly focusing on here are these streaming history JSONs. They are batched by 10,000, which is why we see two of them, because I had, I guess, 15,000 or 16,000 streams. And you have your library. This library that we see here directly relates to that playlist: it resembles the Liked Songs playlist, where I say I have difficulties re-engaging with content that is a bit older, and you directly have it here.

What is a bit unfortunate is that you don't get URIs in that streaming history, though you do have them in the extended history, where you have a bit more breadth of information. So what I did here was some matching by joining on artist name and track name, and then finally matched with the library, so that I could join these two things together. It worked, but this is always a bit, yeah, not so nice. It would be better to join directly via the URIs. But this is very specific.

As a data scientist I like to look into the data and check what is in there, and this was already where the ideas evolved of how I could use it to create my own playlist recommender. As I said, in a year I had about 15,000 streams. Interestingly, among these 15,000 streams there were only 4,500 unique songs, which means that on average I listen to the same song about three times a year. I don't know what this says about me, that's up to you.

What we are also going to do is discard all the streams that we listened to for less than 30 seconds, because this is not really a strong signal, and of course I want to exclude weak signals when building my user profile. Maybe you can also use that data to create some kind of inverse taste and use that somehow, so that is definitely something for your own work.

What was interesting, though: among these 1,800 liked songs, there were 60 percent that I hadn't listened to in the past 12 months. So these roughly 1,100 to 1,200 songs constituted the corpus from which I wanted to retrieve songs using my taste profile. There are also some nice histograms you can build from it, but I will skip that.

So let's come to the core part. Now we have the data, and we use Python and Jupyter notebooks to process it and get some insights about what has been going on there. Now comes the more interesting part, because we want to enrich the data. As I said, we want to build a music taste profile. What I can do so far is, of course, say who my most popular artists are, or something like that.
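The exploration steps described so far (reading the batched streaming history, dropping streams under 30 seconds, and matching against the library by artist and track name) can be sketched roughly as follows. This is a minimal sketch in plain Python, not the code from the talk: the file pattern `StreamingHistory*.json` and the field names `artistName`, `trackName`, `msPlayed`, `artist`, and `track` are assumptions about the export format, and the helper names are made up for illustration.

```python
import json
from collections import Counter
from pathlib import Path

MIN_MS_PLAYED = 30_000  # streams shorter than 30 seconds count as weak signals


def load_streams(export_dir):
    """Read and concatenate all StreamingHistory*.json batches (assumed file pattern)."""
    streams = []
    for path in sorted(Path(export_dir).glob("StreamingHistory*.json")):
        streams.extend(json.loads(path.read_text(encoding="utf-8")))
    return streams


def track_key(artist, track):
    """Join key from artist and track name, since the standard export has no URIs."""
    return (artist.strip().lower(), track.strip().lower())


def play_counts(streams, min_ms=MIN_MS_PLAYED):
    """Count plays per (artist, track), ignoring streams shorter than min_ms."""
    return Counter(
        track_key(s["artistName"], s["trackName"])
        for s in streams
        if s["msPlayed"] >= min_ms
    )


def split_library(liked_tracks, counts):
    """Split liked tracks into recently played ones and rediscovery candidates.

    Assumption: a liked track is a candidate if it has no play of at least
    min_ms in the streaming history.
    """
    recent, candidates = [], []
    for t in liked_tracks:
        played = track_key(t["artist"], t["track"]) in counts
        (recent if played else candidates).append(t)
    return recent, candidates
```

With pandas the same matching would be a merge on the two name columns. Name-based keys are fragile (remasters, featured artists, differing capitalization), which is why joining on URIs is preferable whenever they are available.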
But there is not that much more. I could also, for example, say when my core usage times are, and say that in the morning I rather listen to these artists and in the evening to those other artists. Context-aware recommender systems might ring a bell there, if you have heard about them.

So I wanted to enrich the data, and there is a nice Spotify Web API that I used for that purpose. First, of course, you need to register as a developer, but that is not a big effort. Then you create an app there, and this app you are later going to use to upload the playlist that you created on your machine and make it available in your own Spotify account, or also to provide the service to others, because for one app you can register, I guess it says, up to 25 people who use this app. So that works for a small group of people. I haven't found anyone so far who is interested in it, but maybe I will soon.

Here comes the interesting part, because now we know the artists and we know the song names, but this does not lead to much more. So what I did is I used that data and queried additional audio features for each of the tracks from my Liked Songs playlist and also for what I had been listening to in the past year. This was kind of the key point, and this API is quite nice because it gives you more depth about certain songs.
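Querying audio features for many tracks might be sketched as below. This is an illustration under assumptions, not the talk's code: `client` stands for any object with an `audio_features(tracks)` method that returns one dict (or `None`) per track, which is how Spotipy's client behaves as far as I know; the batch size of 100 reflects the endpoint's documented limit, and the selected feature names are just one plausible subset. Whether the endpoint is available at all may depend on your app's access level.

```python
AUDIO_FEATURE_NAMES = [
    "danceability", "energy", "speechiness", "acousticness",
    "instrumentalness", "liveness", "valence", "tempo",
]


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_audio_features(client, track_uris, batch_size=100):
    """Return {uri: {feature_name: value}} for all tracks the API knows.

    `client` is assumed to expose audio_features(list_of_uris) and to return
    a list aligned with the input, with None for unknown tracks.
    """
    features = {}
    for batch in chunked(list(track_uris), batch_size):
        for uri, item in zip(batch, client.audio_features(batch)):
            if item is not None:  # skip tracks without audio features
                features[uri] = {name: item[name] for name in AUDIO_FEATURE_NAMES}
    return features
```

Passing the client in as a parameter keeps the batching logic testable without network access; in a real run it would be an authenticated Spotipy client.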
To give you an example: there are various features, I guess you can't read them here, but you get for example the tempo of a song in beats per minute, to get an understanding of how fast or slow the song is. You get its danceability, its acousticness, its speechiness, and all this stuff. It is not that many, but they are at least interesting features. It is 10 to 15 features that we use there, and with them I was already able to ask: given my behavior in the past 12 months and given the features of what is in my Liked Songs playlist, how did my taste change? And I could already understand that I had been turning a bit toward less acoustic music but more danceable music. So maybe I was a bit happier in that year, you could infer, or something like that. I'm not sure.

As always, we need to transform the data somehow, because it is on different scales. What I did is, for each of the songs that I listened to in the last year, I fetched the features and I also counted how often I listened to it. Of course I don't want a song that I listened to 150 times, and this was actually the case, to have such a large impact. This is why I dampen the data a bit, and there a log transform or other transformations can help you. In the end my user profile is nothing else than a weighted average of the features of the songs, weighted by the normalized play counts. So I am basically trying to represent myself in the space of the item features, meaning that I now get a danceability distribution for myself, or an acousticness distribution for myself. This I could then use as a search query in the space of unexplored or underexplored items. By underexplored I mean those that I want to retrieve from, so those that I last played more than 12 months ago. And this basically boils down to: I am what I listen to.

In another context, as I mentioned, I have also worked on e-commerce recommender systems. There is a blog post that I wrote about deep learning for vehicle recommendations for a larger platform in Germany, and what you see here is that we can also apply this in different settings. We have a user, and that user interacts with cars, for example by contacting a dealer, clicking, viewing, or printing. Of course you can collect all these events, and they resemble different degrees of preference for the content. If I contact a dealer, maybe to arrange a test drive, this might be stronger feedback than just clicking on a vehicle page. What I can then do is take all the features that are connected to the items, so the things that I want to recommend to the user, and create statistics over those interactions that are also somehow weighted, in order to come up with a probability distribution for the user, such that I am able to represent the user in the space of the items. For example, I then have a mean price for the user that I could use to say: this user might not be interested in cars that are pricier than 20k or 30k.

And then, what data scientists love, especially if they work on recommenders, is the cosine similarity. You might have been asking yourself so far: okay, but how are you actually comparing these two things with each other? Normally we have embeddings there. In my case we don't even have an embedding, because we don't really train anything, but this is not too bad. We can still apply the cosine similarity. One underlying assumption that I am making by doing this is that all these features are of equal importance. This might be something we could question in the future, and it might also be something you could learn from your own data, because you might rather be interested in songs that are closer to your own danceability or your own tempo, while you disregard how far off they are from your mean in, let's say, instrumentalness or something like that. In this way you could say: I will learn a music taste profile for a user and also arrange the retrieval process in my item space by paying more attention to those features that matter to the user.

So what does it finally look like? We basically created that user taste profile. I guess we have a 15-dimensional space.
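The taste profile and the comparison against candidate songs can be sketched like this. It is a minimal sketch under assumptions rather than the talk's implementation: feature vectors are plain lists that are assumed to be already scaled to comparable ranges, `log1p` is one possible choice of dampening, all features get equal weight, and the function names are invented for illustration.

```python
import math


def damped_weight(play_count):
    """Dampen raw play counts so heavily repeated songs do not dominate."""
    return math.log1p(play_count)


def taste_profile(track_features, play_counts):
    """Weighted average of feature vectors, weighted by log-damped play counts."""
    weights = {uri: damped_weight(play_counts[uri]) for uri in track_features}
    total = sum(weights.values())
    dims = len(next(iter(track_features.values())))
    return [
        sum(weights[uri] * vec[d] for uri, vec in track_features.items()) / total
        for d in range(dims)
    ]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(profile, candidates, k):
    """Exact nearest-neighbor search: score every candidate, keep the k best."""
    scored = [(uri, cosine_similarity(profile, vec)) for uri, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

Scoring every candidate is fine for a few thousand liked songs with around 15 features each. Per-feature weights, as discussed above, could be added by multiplying both vectors element-wise with a weight vector before computing the similarity.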
So it's not too large. If you are dealing with embeddings, you are rather somewhere in the hundreds of dimensions, so this is a very small size here. We now have that user representation in the multidimensional space, and we have all the items that we can represent in the same space. Then what we typically do is compute the cosine similarity for all of them. Don't do this for production systems where you have millions of items or even more, because then you are better off using approximate nearest neighbor search. What we are doing here is exact nearest neighbor search, because we can afford it if there are just 1,000 or 2,000 items, each with a dimensionality of 15.

And now comes the most boring and most self-evident chart. Of course the red ones are the ones that look good. If I sort all the songs from my Liked Songs playlist that I haven't rediscovered yet, you see the similarity just drops off, but even the, let's say, worst ones on the list still have a quite high cosine similarity in absolute terms, maybe not in relative terms, of 0.6 or something like that. So it makes sense so far. And now it is up to you: how many do you want to retrieve? How many top-k items do you want to take from those candidates? In my case I said, okay, I just want to take 20.

Then you basically have your songs and you have their URIs. Now comes maybe the easy or light part, and there I made use of Spotipy. Spotipy makes it a bit easier to deal with the Spotify Web API for certain things, especially when it comes to authentication, because sometimes I feel very lazy about this stuff.
Maybe I shouldn't. At least I used Spotipy in the end to simply create a playlist in my account and upload a corresponding image, whose style I kind of borrowed from the existing Spotify images and then used for my own individual playlist.

This brings us to sharing a bit of what the result was in my case. For that I will quickly switch to this notebook here. You will find all this material in the corresponding GitHub repo, which is called liked2play. We just had that session before about good documentation. I'm not sure whether I'm doing that well here, but at least I show you how to set up your environment, how to activate it, how to install the package, and then how to make use of it yourself. But don't forget to first request your GDPR data from Spotify, because of course you will need it. Data science, and this might not even be regarded as data science because it is rather a nice analytical case, is senseless without data most of the time.

You will also find these notebooks there. I will skip the part of creating and uploading the playlist, so let's look into my extended history. As I said, I requested both data sets, the standard set and the extended set. I looked into my extended history, which really covers everything from 2022 back to 2013, to check: how are we actually doing, and how do these 20 songs that I retrieved compare to my listening behavior? How old are these songs? How far back in the past are they? When did I consume them the last time?

What you see here, and these are the URIs of these 20 songs, is that the top song I really consumed a lot in my overall history, and the least played song in this list was only consumed four times. Nevertheless, it was in the Liked Songs playlist, and it was among the 20 songs that were closest to my current music taste profile. I could also check when I listened to them the last time, and this was basically my distribution. So another not too fancy plot. I will definitely check out that tool that was presented yesterday. But what we basically see here is that it works. It is not only music from right before that streaming history, so from 2020. It is music that covers all the time that I have been using Spotify, especially the years 2016 to 2018. So really some older songs I should re-engage with.

This actually brings me to the end of my talk. But now you might say: give us something more concrete, show us what you have found there. And now you come into the game, because I want to at least share with you how it looks on the platform. If we go to my Spotify Rediscover Past playlist, then you will find that playlist, and the very first song was actually a nice one, because I took some salsa cubana classes quite a while ago and there was a song that was played in pretty much every class, and it was this one. Welcome to the Rediscover Past.

Yeah, I was shooting for that, to get some applause already before the end of my talk. Thanks for that. Maybe I should not only re-engage with the music but also with taking further classes, because I'm not sure I was doing my best job here on stage. But at least I tried. I'm not afraid to make a fool of myself.

Okay, so we have that Rediscover Past playlist, but now you might say: come on, this was very easy. Yeah, you took some weighted average.
You took that weighted average to discover Something that you have already seen so far, but I mean we could go further So we can think about extensions here I mean we could also there is another endpoint that delivers the recommendations that Spotify also has there I mean we could now use these songs that we came up with as seeds for the recommender api And this recommender api you can feed with songs So basically was up to five. I guess you are eyes for for for tracks on Spotify, but also with artists or With with genres So why not kind of inferring your genre distribution your top genres and then take those five from those 20 tracks as seeds and look what the recommender api might return you there and this then might give you some uplift here And move that only rediscovery focused playlist more into the discovery space At least this is what I'm going to try in the future And yeah, maybe also thinking about how you can update it more frequently without asking all the time for your provision of data by spotify Yeah, as I said, you will find all the material in the github page There's also a corresponding blog post your rediscover path proposing a new personalized playlist for spotify And of course if you have questions meeting at the conference, I mean, it's the last day, but it's not too late On inovx. We also have nice research and case study. So we also do research that we Show at the recommender systems conference. And of course, please please please listen to my podcast. That's it. Thank you Thanks Marcel. We have a couple minutes for questions. If anyone has some please come to the front Uh I would have one while we wait. Um, were you happy with the The new playlist or was there someone there you're like, uh, don't want to listen to that again I mean the way that I that I that I composed the playlist. Let's let's be honest. It's relatively naive Yeah, it's but it's it's a first shot. 
So these songs are not of one particular kind. They are of totally different genres. There is salsa cubana music next to some really more melancholic music, and then there is pop and there is rock. It spans from U2 over The Killers to Maelo Ruiz, and this might be a bit irritating. So in that sense I'm not really satisfied, but at least I have some hypotheses for how to deal with it.

Hello, I have a similar question regarding all the different features of the music. I have a very diverse music taste. Some of the genres involve no vocals at all, others a lot of vocals, and I would probably hate a playlist with only music that is right in the center of those two. Does this recommendation algorithm account for that? Would you still get the extreme outliers of your music taste?

No, it doesn't account for it yet, or maybe only implicitly. Your taste profile might end up in the middle, but a song that is similarly centered and average in that one feature might still show mismatches in the other dimensions of your profile. Since the cosine similarity incorporates all those different dimensions, that might account for it. So nothing explicit, but it might work implicitly. If not, you have to put in some more brain work there.

Thanks. Okay, I think we'll wrap up now. The lightning talks start in five minutes. So once again, thank you, Marcel.