Hey, thanks for joining me this afternoon. I'm Sophie. I'm a data scientist at Red Hat, and I'm part of a team that tries to solve business problems for both internal and external customers using machine learning. I've also spent some time focusing on putting those machine learning workloads into the cloud. But today I'm going to talk about recommendations. So I've been thinking about recommendations for a while now, and hopefully you're here because you're also thinking about recommendations, or want to start thinking about them. Even if you're not thinking about them, pretty much every company and service that you interact with is. So who here subscribes to a film or TV streaming service? Yay, audience participation. All right. Cool. So they make recommendations to you, and the recommendations they make to you are likely to differ from the recommendations they make to me. In a similar vein, I subscribe to a music streaming service, and every day they recommend to me playlists of songs they think I might like, combined with some things they know I like because I've listened to them a lot. So those are perhaps what we think of as the most traditional recommendations. If we move on to the retail industry, my grocery store sends me coupons to an app for things it thinks I might like. Again, it's trying to sell based on what I've bought before and what it thinks I will want. But we can also step away from the idea of purchasing products or services. Even when we're using free social media, we see these suggestions for things that we should purchase, and other people that we should interact with. For example, social media is always saying, hey, we think you might know this person, you might want to connect with them. So companies are realizing that effective personalized content can be given to us as users in order to really increase their profits and make the customers happy. And this is pretty much prevalent across every industry.
So today I'm going to talk about the step from taking an off-the-shelf textbook recommendation method to give personalized recommendations, to taking that off-the-shelf recommendation method and giving great personalized recommendations. We'll start by talking about the textbook algorithm, which is known as alternating least squares. We'll talk about how it works and what it's good at. And from there we'll think about what it's not so good at. So what do we want from a personalized recommendation service that alternating least squares can't provide when we take it straight off the shelf? We'll talk about how some of these missing items can be fixed with some simple post-processing of our output. But one thing that we can't fix with post-processing is identifying data drift and doing something about it. Users change their tastes for many reasons, on a variety of different time scales, and so we'll really drill down into what the problem is with that. And I'll introduce to you a composable method for recommendation, which we can use to solve this problem. Finally, we'll see how that scales when we're in the situation where we have millions and millions of users. OK, so alternating least squares is the textbook recommendation method. There are off-the-shelf implementations in Python, Scala, and R, so you could go and use it today on your data. In its standard form, it works on explicit data. This is data where the user has given us a distinct rating. In this case, user one has said, I rate film number five three stars. Alternating least squares takes that information and puts it into a matrix, like so. And we populate this matrix with all of the ratings that we have across all of the users. The question marks denote the things we don't know about and want to make predictions for. The way in which the algorithm works is to then factorize that larger matrix into two smaller matrices, one corresponding to users and one corresponding to products.
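The mechanics of that factorization aren't needed to follow the talk, but for the curious, here is a toy sketch of the alternating idea in NumPy. This is an illustration only, not the off-the-shelf implementation you'd actually use; the matrix values, the latent dimension k, the regularization strength, and the iteration count are all made-up assumptions.

```python
import numpy as np

# Toy explicit-ratings matrix: rows are users, columns are films.
# np.nan plays the role of the "question marks" we want to predict.
R = np.array([
    [3.0, np.nan, 5.0, np.nan],
    [np.nan, 4.0, np.nan, 2.0],
    [1.0, np.nan, np.nan, 5.0],
])

n_users, n_items = R.shape
k = 2                            # number of latent features (an assumption)
rng = np.random.default_rng(0)
U = rng.random((n_users, k))     # user factor matrix
V = rng.random((n_items, k))     # product factor matrix
lam = 0.1                        # ridge regularization (an assumption)

# Alternate: fix V and solve a small least-squares problem for each
# user's vector, then fix U and do the same for each product's vector.
for _ in range(20):
    for u in range(n_users):
        rated = ~np.isnan(R[u])
        A = V[rated].T @ V[rated] + lam * np.eye(k)
        b = V[rated].T @ R[u, rated]
        U[u] = np.linalg.solve(A, b)
    for i in range(n_items):
        rated = ~np.isnan(R[:, i])
        A = U[rated].T @ U[rated] + lam * np.eye(k)
        b = U[rated].T @ R[rated, i]
        V[i] = np.linalg.solve(A, b)

# Predicting how user 0 would rate product 1 is just a dot product
# of the corresponding row of U and row of V.
pred = U[0] @ V[1]
```

After a few sweeps, the dot products for the known entries land close to the observed ratings, and the dot products for the question-mark entries are the predictions.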
Now, the way in which that factorization works isn't important for the purpose of this talk. We just need to remember that we got ourselves two matrices, and we can look at the rows of one to find out about the users, and the columns of the other to find out about the products. Once this larger matrix has been factorized, it's really easy to go ahead and make recommendations. So if we want to guess how user one will react to product two, we take the dot product of user one's vector with product two's vector. And from there, we can do things like compute these predicted ratings for a range of movies for a particular user and return to them the films we think they'll like the most. Another great thing about ALS is that it's really easy to compare users. So we can have a look at the feature vector for user one and compare it to the feature vector for this last user. If we find that they're very similar, in the sense that there's not much distance between their vectors, we can use this information: we could say, OK, we know user 10 will react the same as user one to a lot of these products, so there's no need to recompute for those, and we can save some computational cost. So that's the case where our data is explicit. But what happens if we've got implicit data? So if I listen to a song once, do I like it? Yes? No? Maybe, right? We don't know. If I listened to a song 10 times, would you be more sure that I liked it? Yeah? What about 100 times? Perfect. Right, so this makes sense: we want to capture that I'm more likely to like a song if I've listened to it more times. And all we've got to do to use alternating least squares on this data is define a function that maps from our recording, in this case how many times I played a song, to the confidence that I liked that item. And from there, we can go ahead and use the algorithm basically straight off the shelf, as we did before. OK, so we've got alternating least squares.
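That play-count-to-confidence mapping can be as simple as a one-liner. A common choice in implicit-feedback ALS is a linear mapping with a tunable scaling factor; the alpha value below is an assumption you'd tune, not a recommendation.

```python
def confidence(play_count, alpha=40.0):
    """Map an implicit play count to a confidence that the user
    likes the item. Never played gives the baseline confidence 1;
    more plays give higher confidence. alpha is a tuning knob."""
    return 1.0 + alpha * play_count
```

So a song played 100 times yields a far higher confidence than a song played once, which is exactly the intuition from the show of hands.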
We can use it on explicit data, where the user has said, yes, I like this, and I like it this much. We can use it on implicit data, as long as we define that mapping. And it's really easy to go ahead and make predictions and to compare users. But we shouldn't push ALS to production as it is. So what's it missing? Well, if you think about these two matrices, what happens when we add a new product to the market? No users have yet interacted with this product, so there's no way to say how they will rate it. And because these latent vectors for the products don't actually correspond to understandable or explainable features, we can't just generate a feature vector for this product. So until somebody rates it, we've got no idea whether or not anybody else will like it. Similarly, ALS can't cope if a product goes off the market: because of the way the data is traditionally stored when you implement these, it really struggles if you want to no longer recommend a product to a user. Another thing that ALS can't handle is anomalous recordings. These crop up, perhaps, when you hit the wrong star rating by accident. Or maybe there is some real anomaly in your opinion; maybe you hate all horror films except for one. And ALS is extremely sensitive to these anomalous ratings. In the algorithm, these anomalies are fed into the large matrix and then into the matrices that are factorized. And so your user vector is updated because you have an anomalous opinion, and as such, that's going to affect every prediction that is then made for you. Alternating least squares also doesn't take into account changes in users' opinions over time. It's not uncommon for people to have the same accounts for many years. I've had the same Spotify account since 2008, and my tastes have changed. They've grown broader. I don't want to listen to everything that I listened to then, but I might want to listen to some things I listened to then.
And alternating least squares doesn't capture this. And there's not just one way in which tastes can change. We traditionally think of aging as the most general way in which our tastes change: we grow up, we mature. But tastes also change with the seasons. So for example, you don't want to recommend that someone watches a Christmas film or listens to Christmas music in February or March. And some of these changes also represent seasons of life. There's likely some correlation, positive or otherwise, between someone's changing relationship status and their interest in romantic movies. So although ALS can give us these recommendations quickly and easily, it'd be nice if we could solve some of these problems that it doesn't address. Now, some of these problems can be solved by post-processing. So if we think of our alternating least squares algorithm as just a model that lives in this box here, the model takes in data and it spits out recommendations to users. We could go ahead and implement a post-processing microservice between that model and the recommendations. And this would help us out for a few reasons. One is that we could filter out any temporal recommendations using tags: we're not gonna recommend anything that has a tag of Christmas or winter in July. We can also use it to make sure that we recommend new or promoted products to everybody, and then once we see some ratings for those, we can go ahead and put those ratings back into our original matrix and factorize, and so on. And we can make sure that outdated products are never recommended to anybody. Now, in ALS, you probably don't want to remove the information that you had about outdated products if someone had rated them, because this does give insight into those users' vectors, right? It still tells you something about a particular user, even though that product isn't available for purchase anymore.
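A sketch of what that post-processing microservice might do, filtering seasonal and retired items out of the model's output and always surfacing promoted products. All the item IDs, tag names, and the crude November/December seasonal window here are invented for illustration; a real service would look these up from a catalogue.

```python
from datetime import date

# Hypothetical catalogue metadata for the example.
SEASONAL_TAGS = {"christmas", "winter"}
RETIRED = {"film_042"}       # off the market, never recommend
PROMOTED = ["film_900"]      # new products we want everyone to see

def post_process(recommendations, tags_by_item, today=None):
    """Filter a model's raw recommendation list before showing it
    to the user: drop retired items, drop out-of-season items, and
    prepend promoted products so they can collect ratings."""
    today = today or date.today()
    in_season = today.month in (11, 12)   # crude seasonal window
    out = []
    for item in recommendations:
        if item in RETIRED:
            continue
        if not in_season and SEASONAL_TAGS & tags_by_item.get(item, set()):
            continue
        out.append(item)
    return PROMOTED + out
```

So in July a Christmas-tagged film gets filtered out, while in December it passes through untouched, and the model behind the box never needs to know.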
But what we can't do in post-processing is figure out which media or TV programs are gonna spark nostalgia, which you're gonna wanna see again, and separate those from the ones that ultimately you don't mind if you never see again. So post-processing can't tell you if you listened to tons of nursery rhymes this weekend because you were struggling to fall asleep and you really liked their repetitive and calming nature, or if you were babysitting a small child for the weekend. Should your music recommendation algorithm now always recommend nursery rhymes to you on a Saturday, or not? It's also not able to deal with anomalous behavior: anomalies get fed back into the algorithm and further influence your recommendations. So luckily for us, this other class of recommendation methods exists, using composable signatures to make recommendations. The idea is that we summarize a user's behavior by one or many of these composable signatures, which we can then compare and contrast over time to identify these changes in taste. So before I go on and talk about all the benefits of these signatures, we'd better figure out how to make one. We're gonna do this in the context of a music streaming service. So suppose we have a user, and this is the music that they've listened to. The algorithm we're gonna use is called the MinHash algorithm. This algorithm is traditionally used for identifying plagiarized documents. If anyone's ever tried to identify similar texts, this is the algorithm they use, but it applies beautifully to this recommendation context. So what we wanna do is map from that user's listening history down to one of these composable signatures, which is called a MinHash signature. And we can just think of this as a vector, a vector of numbers. It's called MinHash, and the hash in there stands for hash functions. So we need some hash functions to get going.
If you're not familiar with hash functions, we can just think of them as a function which maps from a large range of inputs down to a small integer. And if my MinHash signature is gonna be of length n, I want n independent hash functions. So here we're going with five hash functions. We initialize the signature to infinity, and then we're ready to begin. So we go to an artist that we've listened to, in this case Taylor Swift. What we do is pass that artist through all of our hash functions, and these are gonna map from a string, in our case Taylor Swift, to a small integer, for some definition of small. Okay, this is the exciting bit: we're gonna update our MinHash signature. To update the MinHash signature, we take the row-wise minimum. We take the minimum of what is suggested versus what's already there. What's the minimum of six and infinity? Nice. Next, what's the minimum of 12 and infinity? Okay, you get the idea. So when we come along with our first thing that we've seen, this is how we get our MinHash signature going. From there we move along to our next artist, and again these hash to small integers. It's called MinHash, so guess how we update? We take the minimum. So the minimum of 17 and six is, yeah, okay, you see where this is going. So we're gonna insert the values that were smaller for Pulp than they were for Taylor Swift, and keep everything else the same. And so we continue to update our MinHash signature as we move down the list of artists that our user has listened to. So that's going from our user's listening history to our MinHash signature. And you can imagine doing this live as the user is streaming some media, right? You can update it on the fly. It doesn't matter which order the artists are listened to in; you'd get the same MinHash signature out. Okay, so great. We've represented our user by a smaller signature, but how does this help us with recommendations? Well, if we've got two users, we can compare them by comparing their signatures.
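The whole construction just described fits in a few lines. Here's a sketch using a salted standard hash to fake a family of independent hash functions; the salting trick, the modulus, and the signature length of five are illustrative assumptions, not what you'd tune for production.

```python
import hashlib

def hashers(n):
    """Build n independent-ish hash functions from strings to small
    integers by salting a standard hash (an illustration, not a
    carefully chosen hash family)."""
    def make(seed):
        def h(s):
            digest = hashlib.md5(f"{seed}:{s}".encode()).hexdigest()
            return int(digest, 16) % 1_000_003
        return h
    return [make(i) for i in range(n)]

def minhash(items, hash_funcs):
    """Start every bucket at infinity, then take the row-wise minimum
    as each artist in the listening history streams past."""
    sig = [float("inf")] * len(hash_funcs)
    for item in items:
        for i, h in enumerate(hash_funcs):
            sig[i] = min(sig[i], h(item))
    return sig

hs = hashers(5)
sig = minhash(["Taylor Swift", "Pulp", "Radiohead"], hs)
```

Because each bucket only ever keeps a minimum, you can update the signature on the fly as songs stream in, and the order of the artists makes no difference to the result.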
And the way in which we compare two MinHash signatures is to just count the number of locations in those signatures where the values are the same. So in this case, these users have the same value in the second and the fifth bucket, and as such, we give their similarity as two fifths. So once you've identified similar users, you can go ahead and make predictions. If we deem these two users similar enough, it makes sense that we'd say, hey, user two, you'll like the things that user one listened to but you didn't, and vice versa. Okay, so now we've got another algorithm to make ourselves recommendations. But why does this help us out? What does it give us over ALS? Well, I told you that these signatures were composable, and it's this composability which gives you some really nice properties. We can combine two or more MinHash signatures really simply by taking the row-wise minimum. So instead of assuming that these music histories come from two different users, suppose they come from the same user. We've got two signatures, and we just take the minimum to combine them. So why does this help? Well, who listens to music on their phone on the way to work? Yep, and then keep your hands up. Who then gets to work and plugs their headphones into their work computer and continues to listen to something else? Still some, okay. And does anyone think that the things they listen to on the way to work differ from what they listen to when they're at work? So I like to listen to podcasts when I walk to work. I listen to repetitive music, not nursery rhymes, when I'm sad at my desk, right? And I think people see this behavior a lot; we use different devices for different things nowadays. So with MinHash, what we could do is keep a local copy on our devices of what the particular user has listened to. We can then make device-level recommendations to the user based on which device they're using, or we can combine and get a larger global picture.
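Both operations from above, comparing two signatures and composing them, are one-liners over the signature vectors:

```python
def similarity(sig_a, sig_b):
    """Fraction of buckets where two MinHash signatures agree: an
    estimate of how similar the two listening histories are."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)

def combine(sig_a, sig_b):
    """Composability: merging two signatures (say, the phone signature
    and the work-computer signature for the same user) is just the
    element-wise minimum."""
    return [min(a, b) for a, b in zip(sig_a, sig_b)]
```

Combining the signature of one listening history with the signature of another gives exactly the signature you'd have computed from the two histories merged, which is what makes the device-level and global pictures interchangeable.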
In the same way, we can keep signatures across different time periods. So we could keep one signature for December, one for January, and compare them. If they're notably different for the same user, then it suggests that something has changed in the user's behavior. From there, we can step back and look at what they've listened to or how they've been interacting with the service. If we see that something has changed significantly, we can raise a flag, intervene, and see how we want to sensibly make recommendations from that. And the scale on which you make your MinHash signatures can be as fine as you want it to be. You can always combine them and get a larger picture, but it might make sense to keep them separate, right? What do people listen to in the morning and in the evening, and so on? We can always combine and get a day's view. So if we think about seasonality: if we kept a MinHash signature, or had the means to compute a MinHash signature, for what a user listened to every month, this would enable us to do things like recommend Christmas films that they might like in December, or recommend songs this summer that are similar to the songs they listened to last summer. There's often temporal change in what people listen to in summer versus winter, and MinHash really lets us capture this. Okay, let's step back and think about how we made recommendations when we were using the ALS algorithm. If we wanted to know how a user would rate a particular product, we just had to multiply the corresponding vectors together and get that estimation. With MinHash, the method for computing recommendations is more involved, more complicated. If two users' MinHash signatures are deemed similar enough, then you can go ahead and look at each user's listening history or interactions and make suggestions based on what the other, similar user did. Now, here we've only got six users, so making those comparisons isn't too arduous.
But Spotify, for example, has 217 million users. So if you want to make these comparisons even just for one user, and compare them to all of the users, it's just not feasible; it's not gonna scale at all. Luckily for us, something called locality-sensitive hashing exists. The idea is that it splits a user's MinHash signature, which we've already computed, into subsets, and it then hashes those subsets using a hash function. If, for any two users, a subset maps to the same number when hashed, we consider them candidate pairs. So this reduces the space over which we have to make comparisons, and we can then go and just compare those candidate pairs of users. So here, I've split our MinHash signatures into four bands. Each band contains two of those buckets from our MinHash signature, so remember, they're both containing integers. So this time our hash function is gonna map from two integers down to one integer. And so what we do is we hash band one for user one, and it gives us a value. We then go ahead and do the same thing for user two, using the same hash function, then user three, and user four. And what we see here is that user one and user four both map to the integer four, so we'd add them to our set of candidate pairs down the bottom. We then go ahead and iterate this over all of the bands in our algorithm, and it really reduces the complexity of this system. So Spotify has 217 million users, and even if we just want to keep one MinHash signature for every user, making all those comparisons is just infeasible; and the benefits of MinHash really come in when we want to go ahead and keep track of multiple MinHash signatures, right? So the situation gets much more complicated. So we had no matches in the second band, so nothing to add. In this third band, we've added another couple of candidate pairs, and I think we'll get one more from the fourth band.
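The banding scheme just walked through can be sketched like this. The band hash, the modulus, and the example band count are assumptions for illustration; in practice the number of bands and rows per band are tuned to trade off false positives against false negatives.

```python
import hashlib

def band_hash(values, seed=0):
    """Hash one band (a tuple of signature buckets) down to a single
    integer. Illustrative salted hash, not a tuned hash family."""
    digest = hashlib.md5(f"{seed}:{values}".encode()).hexdigest()
    return int(digest, 16) % 1_000_003

def candidate_pairs(signatures, n_bands):
    """Split each user's MinHash signature into n_bands bands; any two
    users whose band hashes collide in at least one band become a
    candidate pair, and only those pairs get compared in full."""
    rows = len(next(iter(signatures.values()))) // n_bands
    pairs = set()
    for b in range(n_bands):
        buckets = {}
        for user, sig in signatures.items():
            key = band_hash(tuple(sig[b * rows:(b + 1) * rows]), seed=b)
            buckets.setdefault(key, []).append(user)
        for users in buckets.values():
            for i in range(len(users)):
                for j in range(i + 1, len(users)):
                    pairs.add(tuple(sorted((users[i], users[j]))))
    return pairs
```

Users whose signatures share no band land in different buckets everywhere and are never compared, which is where the complexity saving comes from.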
Okay, so you can see that this really reduces the number of users that we have to compare in order to be able to make those useful recommendations. Now, perhaps unsurprisingly, I haven't covered every aspect of recommendation in the past 20 minutes or so, so let's just talk about some things that we didn't have time to talk about in depth today. First up, the MinHash algorithm that I introduced assumed that if you played a song, you enjoyed it, right? But at the start, we agreed that that is likely not the case. So how could we solve that but still use MinHash? Well, one thing that we could do is introduce a threshold and say, I'm only going to record an artist into a user's MinHash signature if they listened to that artist more than a certain number of times. We could set that threshold at two, or 10, and so on. Or you could check more granularly: did they even listen to the whole song? For example, often the skip button is used a lot. So that's fine if we can introduce this threshold, but what if our ratings are in fact explicit and we want to use this method? We know that this user liked this product, or we know that they disliked it. Well, in this case, we can keep multiple MinHash signatures for that user: one for things that they liked, for example, and one for things that they didn't like. So you might think that when we're recommending, we only want to think about things that people liked, but actually, it really makes sense to consider things that people didn't like too. That gives you insight into things that people also won't like, and so should never be recommended. And sometimes there's also strong correlation between things that people didn't like and things that people will like, so you want to keep track of this. Another problem in recommendation is dealing with the situation where multiple people use the same account.
So if you live in a house where everybody uses the same Netflix account, and perhaps multiple people use the same user profile, then you've got yourself a bit of a problem. Again, MinHash can help with this. One thing that you could do is look at which devices people are using and keep device-specific recommendations, because people likely use different devices within the same household. Or you could keep different signatures for different genres. So I might never watch action films, but if a friend suggests, hey, let's watch this action movie, then okay, we watch it, but it's on my account. That doesn't mean that I want recommendations for action films all the time, but if we've kept that in a separate MinHash signature, I, as the user, can easily see the difference when those recommendations are made to me. So we talked a bit about anomalous recordings, and anomalies really influence alternating least squares. They don't have the same effect in MinHash, but we haven't really talked about how to identify them. One way to identify these would be to look at the users which we deem most similar based on MinHash signatures, and then see, hey, how many of these users also listened to this item? If, amongst the users that you're most similar to, no one else has a particular behavior, you can know that that's an anomaly. From there, you can decide what to do with it. You can not put it into the MinHash signature, or you can put it in, or you could keep it separate in an anomaly MinHash signature that you can use for recommending wacky things to them. And finally, we talked about data drift, the way in which tastes change, but we haven't talked about what to do if users explicitly change their opinion. So suppose I rated a film five stars three years ago, and now I go ahead, watch it again, and rate it one star. Should we therefore change the way in which we rate everything that was deemed similar, in that same time period, that I enjoyed?
Should we just not record this change in taste? Is it an anomaly? What should we do? So there are still things like this that we have to think about whenever we're using an algorithm: we have to think, what would we do, and what does this mean? So that's some things we didn't have time to tackle in depth, but let's talk about what we did cover. We talked about alternating least squares. We saw how it can quickly and easily make recommendations to users, and that we can compare users really easily, which is a lovely property. But it's missing the ability to deal with data drift, to either identify it or make recommendations regarding it, and it's also really sensitive to outliers. We saw that some of these problems can be solved with post-processing. For example, we can make sure that we recommend new products to all users really simply with this post-processing microservice. And then we introduced these composable recommendation methods: using the MinHash signature to create multiple summaries for a particular user enables us to go ahead and make more sensible recommendations that are time-appropriate. Finally, we looked at how we can make this method scale when we go ahead and use it with tons of users. So on that note, I'm going to stop there. I'm happy to take questions if there are any, or I'll be around. Sorry about that. Ooh, I can hear myself. Right, one of the things that you mentioned when you were talking about active feedback: there's this other thing, that certain people tend to, let's say you have a movie rating scale of one to five stars. For some people, three might be bad and five is just what they like. And then some others might be, let's say, stingier in terms of what they give out. There needs to be a step for normalization before you actually process this. Wouldn't you have to do something like that? Yeah, that's not something I've thought about, but that certainly makes sense.
So to normalize the ratings on a per-user scale so that you're, but then does that, so it sounds sensible, and it probably is sensible. I wonder how it would work for people who really just are enthusiastic about everything, though. Well, yeah, precisely. That's why, I mean, modeling that distribution is probably going to be, how exactly do you want to have a function? Because especially when you have, if you want to try and look at it at a product level, then that's going to cause an issue as well, because, you see where I'm going with this? Yeah, yeah, right. So we could, yeah, I understand. So yes, this kind of setup that I talked about just took in data and then made predictions, and we didn't think about pre-processing that data to make it more suitable for making predictions. That's a really good point. Thank you. I was wondering, how well does MinHash work when you have either very few ratings or a lot of ratings? Because I've been thinking that when you have very few, it's hard to compare, but when you have a lot of them, the minimum of a minimum of a minimum gets very small, and maybe the comparisons are no longer meaningful. So how do you deal with this? Sure, so certainly when you don't have many ratings at all, you're really going to struggle with MinHash. There's not much you can do there. I think it is much better suited to the situation where your data is implicit and it's sort of streaming in; you've usually got tons of it when it is implicit. It's certainly true that in the limit, as somebody listens to every artist out there ever, then all the MinHash signatures are just going to converge to the minimum of one, for the sake of argument. But in practice, the MinHash signatures themselves are much longer than five entries, they're sort of thousands of entries long, and the hash functions themselves map to such a large range of values that it's rare that we get collisions between artists.
So it's not something that we've run into, though we're not running this in production, so it's certainly something to keep an eye out for. When you're recommending on the finer scales, if you're looking at what people listened to in a week or a month, then again, it's probably not going to become an issue. But certainly, if you're curating someone's music history over years and years and making comparisons, then it's more of a concern. Okay, thanks. All right. Okay, it looks like that's it. I'll be around if anyone has more questions.