So I am Aashay Tamane; I'm working as a staff data scientist with Swiggy. However, today's talk is from a slightly different domain: it's not about food, but about fashion. The work here was done for a leading fashion e-commerce portal, so I just want to give a shout-out to my colleagues, Sagar Arora and Deepak Warrior. To give some background: as I said, this was done at India's leading fashion e-commerce portal, where a lot of apparel, footwear and accessories are sold, which means things like t-shirts, shirts, jeans and dresses. Just some terminology: going forward, I'll be referring to t-shirts, shirts and those kinds of categories as product categories, and when I say product, it means an individual product within those categories. So what are the challenges? As you can imagine, the product catalog is large and dynamic. At the time of this work, there were about four lakh products in the men's category and about six lakh in the women's, and about 2,000 products get added every day. This is something unique about fashion: as opposed to other domains, the catalog is quite dynamic because fashion keeps changing. You can't keep showing the same set of products to people every day, because stuff gets outdated very quickly. So that's an added challenge in this kind of domain. Now, given this large dynamic catalog, you want users to find the products they want very quickly, because if not, users are going to churn or drop out of their sessions, and obviously your conversion rate will go down. So to help users find the most relevant products quickly, personalization is critical. Especially in a field where you have so many products to see, it's very easy to get lost, and if people don't find something interesting, they're just not going to buy it. Now, let's say I wanted to build a very simple recommendation system. What would I do?
So imagine a user who started by looking at, say, a women's kurta, then looked at a top, and then went to flats. A very typical approach, to keep it simple, would be to just take the most similar product for every item they browsed and show those. Any ideas why that could go wrong? If I just did that, any thoughts? Similar to what you just said, one challenge is this: if the session had been cohesive, say someone browsed only t-shirts, it would have been very simple, because I could show similar t-shirts and that would have worked. But what do you do in this case, where the person is switching the product category itself? They started with a women's kurta, then switched to tops, and then actually went to flats. Do I give more weight to the recent items? Do I weight everything equally? What do I show to the person? A lot of such challenges come up, and it quickly becomes a very slippery slope if you start building rules on top of this. So we want a more standard approach where we don't have to hand-code such rules, and the model automatically tells us what kind of products should be shown to the user next. Just to emphasize the switching part, which I will refer to as the context switch: here's another session, as you can see. It starts with shirts, has shorts, has a different type of casual shirt, and then finally shorts again. So there's a lot of switching happening. In fact, we see that 65% of the millions of daily sessions have more than one product category. In terms of purchase data as well, in 41% of sessions the actual purchased product is quite different from the first product the person started with. Which means that people start with something and then buy something else.
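To make the naive approach concrete, here is a minimal sketch of "for each browsed item, show its most similar product". All product names and embedding values are made up for illustration; a real catalog would use learned embeddings over lakhs of products.

```python
import math

# Toy product embeddings (made up; a real system would learn these).
catalog = {
    "kurta_a": [0.9, 0.1], "kurta_b": [0.85, 0.15],
    "top_a":   [0.4, 0.6], "top_b":   [0.45, 0.55],
    "flat_a":  [0.1, 0.9], "flat_b":  [0.12, 0.88],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def most_similar(product):
    """Most similar catalog item to `product`, excluding itself."""
    return max((p for p in catalog if p != product),
               key=lambda p: cosine(catalog[product], catalog[p]))

def naive_recs(session):
    """One recommendation per browsed item: the approach critiqued above.
    It ignores order and context switches entirely."""
    return [most_similar(p) for p in session]

print(naive_recs(["kurta_a", "top_a", "flat_a"]))
# ['kurta_b', 'top_b', 'flat_b']
```

The weakness the talk points out is visible here: the session's switches carry signal, but this recommender treats each click independently.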
If you want an analogy with the offline shopping world, think of going to a mall: a lot of people do window shopping, where even within a store you might start with one product, and it's not necessarily the first item you looked at that you'll buy. This has a very strong connect with the online world, where people also tend to do a lot of window shopping. So let's come to the problem definition. We've established that there's a lack of a well-defined need in fashion e-commerce. We've seen that user behavior is impulsive, in the sense of "I see something I like, I'll just go there," and hence harder to predict. And what is recommendation? I'm trying to predict what the person may like in the near future. So given this background, what we really want to do is: given a user session p1 to pn-1, where each p is an individual product, predict the next product the user might want to look at. That's the problem definition. The challenges, as we already saw: the catalog is large and dynamic, new styles get added daily and have no historical data, and on top of this there are a lot of switches happening. So how do you solve all of these problems together? We use a combination of what we call product groups, which I'll talk about in a minute, and sequential modeling to capture the context switches. I think this will get clearer as we get into the talk. So let's start with: what are product groups? At a high level, we want to handle the sparsity of items, so we build something like a clustering on top of products. Think of a product group, in very simple language, as a group of similar products, all of which should be replaceable with each other.
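As a toy picture of what "a group of mutually replaceable products" means, the sketch below clusters products whose attribute vectors are close. This greedy leader clustering is purely illustrative, with made-up attributes; it is not the pipeline the talk describes.

```python
def group_products(products, threshold=0.25):
    """Greedy leader clustering: a product joins the first group whose
    leader's attribute vector is within `threshold` (Euclidean distance),
    otherwise it starts a new group. A toy stand-in for product groups."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    groups = []  # list of (leader_vector, [member_names])
    for name, vec in products.items():
        for leader, members in groups:
            if dist(vec, leader) <= threshold:
                members.append(name)
                break
        else:
            groups.append((vec, [name]))
    return [members for _, members in groups]

# Attributes: (price bucket, formality), scaled to [0, 1]; values made up.
products = {
    "casual_shirt_1": [0.2, 0.3],
    "casual_shirt_2": [0.25, 0.35],
    "premium_shirt":  [0.9, 0.8],
}
print(group_products(products))
# [['casual_shirt_1', 'casual_shirt_2'], ['premium_shirt']]
```

Note how the premium shirt lands in its own group: within a group, items should be interchangeable in style and price, which is exactly the property the talk asks for.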
As you can see in the example here, there are a lot of similar-looking shirts, not just in terms of the color but more in terms of the style. So you would look not only at the color but also at the brand proposition and at the MRP of the product itself, so you won't find something priced at, say, 8,000 in the same product group as value items. You want a cohesive product group that captures similar products well in one group. Now, I won't go into all the details of this, and I'll put the relevant links at the end of the talk, but I'll give a gist of how we are making these product groups. A very simple approach, as you can imagine: if every single item had well-defined attributes, so I knew exactly what the fit is, what the color is, how it looks on the arms and so on, I could do something like clustering on those attributes. But the problem is that since these are manually cataloged items, it's not that straightforward: a lot of items have these tags misplaced, some of them are wrong, and they don't really capture the style element to the full extent.
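The talk's remedy for noisy catalog tags is to learn from user sessions. As a much-simplified stand-in for the skip-gram training the work actually uses, the sketch below builds product vectors from session co-occurrence counts, so products browsed together end up with similar vectors even when their catalog tags disagree. Sessions and names are invented for illustration.

```python
from collections import defaultdict

sessions = [
    ["slim_shirt_a", "slim_shirt_b", "slim_jeans"],
    ["slim_shirt_a", "slim_shirt_b"],
    ["ethnic_kurta", "ethnic_dupatta"],
]

# Co-occurrence counts: how often two products appear in the same session.
cooc = defaultdict(lambda: defaultdict(int))
for s in sessions:
    for a in s:
        for b in s:
            if a != b:
                cooc[a][b] += 1

vocab = sorted({p for s in sessions for p in s})

def session_vector(product):
    """Row of the co-occurrence matrix, used as a crude embedding."""
    return [cooc[product][v] for v in vocab]

def overlap(u, v):
    """Unnormalised dot product: shared browsing context."""
    return sum(x * y for x, y in zip(u, v))

va = session_vector("slim_shirt_a")
vb = session_vector("slim_shirt_b")
vk = session_vector("ethnic_kurta")
# The two shirts share browsing context; the kurta shares none of it.
print(overlap(va, vb) > overlap(va, vk))  # True
```

Skip-gram embeddings refine this same co-occurrence signal into dense low-dimensional vectors; the point here is only that session data adds style information the catalog tags miss.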
So what we do is augment that with user sessions. We have a lot of historical user sessions; imagine each session as a document. I think most of you would have used Word2Vec by now: imagine each session as a document and each product attribute as a word in that document, with a bunch of products browsed in that session, and then we use skip-gram word embeddings to encode each product attribute. That attribute embedding now captures the session element as well, and hence it is much richer than if you just used the individual catalog attributes. Again, I'll put the links at the end, so there are papers in which you can look at the details of the implementation. Just to show another example of this: if you look at the styles, although we are not explicitly using visual features, the user sessions actually augment those features, because people tend to look at similar stuff. Though there are context switches, there is also a cohesive element in the way they browse. So it turned out that using these features extracted from user sessions actually adds to the richness of the product groups. Now we make a slight change to our problem definition. Initially, given a user session of products, we wanted to predict the next product. Now I just replace each product with its corresponding product group, and the problem becomes: given a user session with some product groups seen, what is the next product group? What this allows us to deal with is sparsity. Let's say tomorrow a new kind of product comes in. It's easy to map it to the relevant product group using its attributes, because I know that mapping, and then I don't need to do anything else, because I already have a model built at the product group level and I can use the same model. Whereas if I had to build this model at the individual product level,
as I said, sparsity at the item level would be extremely high: even for existing items, you won't find many sessions containing the exact same product, but you can find plenty at the product group level. Let's talk about baselines. We don't want to jump directly into a very complex model; the complexity should justify itself. As we saw initially, we looked at a similarity-based approach where I could just recommend similar stuff. So what if I just did that: replace each product with its product group and show the most frequent product groups the person has browsed? This would be the first-cut version of my model, a majority-voting-based approach. There could be another approach: imagine a graph where each node is a product group, and given all the historical data, I just need to find the transitions between product groups. For example, if I know that most people who see men's t-shirts then go to men's shorts, I can have a probability associated with that transition and use it very simply to recommend the next set of items. What's the problem here? The problem is that the entire sequence of the user journey is not captured. As you saw, people look at a lot of stuff, and that can have a bearing on what they see next: I might start a session with t-shirts and go to shorts, but that also increases my probability of switching back to a t-shirt later on, and you want to capture that. If I don't capture it, I might just look at the shorts and recommend accordingly. So that's the major challenge we address in our approach. A traditional machine learning approach would be to use something like an HMM. For people who are familiar with HMMs, we have a bunch of hidden states like X1, X2, X3, and from each hidden state there are probabilities of emitting the actual observations. In our case, the observations would be nothing but the sessions themselves. So we won't go into
details of the HMM part; I'll just give the intuition. One very interesting thing we saw when we built this: we wanted to understand what the hidden states are. Can we analyze the emission probabilities and see what a hidden state means? For example, we saw that in a given hidden state, the high-probability emissions were shorts, sunglasses, hats and the like. What that could mean is that the user is probably looking for something for a beach or summer trip, and if I can say that the user is in that state, I can predict accordingly. A more modern approach would be to use an RNN, which is probably most frequently used now. It's similarly trying to capture the sequential modeling: given a bunch of items, we want to predict the next item. The product groups are one-hot encoded here, each sequence of product groups is passed through the RNN, and we have millions of samples to train with, so that's not an issue at all. Now, we know that RNNs suffer from the vanishing gradient problem, as people who have worked with RNNs will know, and hence there are two common remedies: LSTM or GRU. In our approach we used a GRU. Coming to the gist of it: we have a bunch of sequences of product groups, we feed them to the GRU, and we are trying to predict the groups the user might look at next. How do we evaluate this? I already have historical data, so I cut off a session at some point and predict the next products in that session. Coming to the evaluation metrics: people who have worked with recommendation will know MRR and NDCG, which are standard metrics. If you don't know what they are, don't worry, just look them up offline; essentially, the higher these metrics, the better the model is at
recommending. Now we compare the baseline approaches, the PG graph and the majority-voting approach, with the GRU approach, both in terms of mean reciprocal rank and NDCG, and we see that the GRU performs better, especially as K increases. K is nothing but how many items you are recommending: 3 items, 4 items, 5 items. And we see a trend there: if you plot MRR against session length, the model performs much better than the baselines as sessions get longer, because there the context becomes more and more important; what I browsed in the session matters more. For shorter sessions it may not be that important, because with, say, 3 clicks there isn't much context switching that can happen. That's why session length becomes very important. We see this behavior for both men and women, just to check whether there's any gender bias and to make sure the model works equally well for both. So this is how we modeled it. We also wanted to see how the model works under the hood, just to get an idea of what it's doing. What we tried is artificially generating a few sessions from the GRU, and we wanted to observe whether there are any trends the model is capturing, because if you sample from the model and can see some trends, that means it has understood some aspect of your data. In this particular generated session, you can see that it started off with westernwear and then switched to Indianwear. This is something we also observe in our data: if someone keeps looking at a lot of westernwear and can't find something relevant, they
actually switch to Indianwear, and many people might be more comfortable with Indianwear than westernwear, so they ultimately switch to that and end up buying Indianwear. Similarly, if you look at the men's collection, this is quite interesting. Roadster and Mast & Harbour are value brands; if you look at the prices below, they are in the 400-500 range. Then in the session there's an ambitious switch to Levi's, which is around 15,000, and if you look at that switch, it's quite abrupt. This is something we've actually seen in a lot of sessions: people start with value brands, they have that ambition of switching to a more premium product, but then ultimately maybe they don't want to go with it, and they come back to the value brands. So you'll see a switch back to Roadster and Breakbounce after that, where the value proposition is pretty similar to what was browsed before. What this means: imagine a person whose session has a lot of value-brand products and one Levi's item. If you used something like a similarity approach, and let's say Levi's was the last item they looked at, you would recommend all Levi's-like items: very premium items the user may not even be interested in, because it was just an ambitious click. But if I use an approach that captures the entire context, it's highly likely that when I recommend, I'll recommend fewer Levi's items and more Roadster and Mast & Harbour items. The context becomes a very important thing here. So to conclude: as we saw, context changes are common in fashion e-commerce, and even if I remove the word fashion, you can find analogies in any domain. In any portal, in any journey where the user is going through products, it's very unlikely
that the user has a very high intent for one specific product in mind; they might actually do a lot of switches. In all those cases you could look at similar approaches, and in such journeys, sequential modeling with a GRU performs better; that's what we observe. So that's my time, thank you. So, a quick question. Right, so here we were just looking at the items in the session itself. The intention was that many times, historically you might be interested in a particular brand, but the session itself has a completely different trend. You might have bought a lot of, say, Ferrari t-shirts, but in this session you're looking at Levi's or something else, and that becomes even more important than the historical data. Sure, sure. So if there are different cohorts where we see that people behave differently, what you could do is build different models for each segment of users. Let's say there's a segment that only looks at premium shirts; in that case, maybe the transition we saw wouldn't happen, and there would just be a transition from Levi's to something else, maybe Levi's to Ferrari. But there could be another segment which is more value-based, and for them there could be a different model. So it totally depends on the user data that you have: whether all your users behave similarly, or whether they form different cohorts. That's like the next step. Sure. No, not yet in our work, but that's something that could be looked at. Right, right. No, we didn't explore that. In essence, we wanted to make sure that if you're taking a slightly more complex approach than the baselines, there should be a substantial improvement, because many times what happens is you do something complex
and it can't even beat the basics. But you're right, if there's something like item2vec, we can try it out and compare how it does on MRR and NDCG. So I have one question here: when you're comparing with the baselines, did you compare with another baseline which is sequential but doesn't use a learned model, for example some smoothing function on similarity with respect to the most recently interacted items? Yeah, so basically when I talk about the product group graph, there are two versions of it that we tried. One version just looks at the last item in the sequence and the transitions from it. The other version looks at the transitions from each item in the sequence, then adds up the probabilities and normalizes. So yes, we tried that, but the GRU just does better than all those baselines. I haven't reported all of those results here, but I'll put up the link to the paper, as I said, so you can go into more details of that as well.
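For reference, the two metrics used throughout the comparisons, MRR and NDCG, can be computed as below. This is a minimal version assuming binary relevance with a single true "next" item per session; the recommendation lists and ground truth are made up.

```python
import math

def mrr(recommended, actual_next):
    """Mean reciprocal rank over (ranked recommendations, true next item) pairs.
    A session contributes 1/rank of the true item, or 0 if it is missing."""
    total = 0.0
    for recs, truth in zip(recommended, actual_next):
        total += next((1.0 / (i + 1) for i, r in enumerate(recs) if r == truth), 0.0)
    return total / len(actual_next)

def ndcg_at_k(recs, truth, k):
    """Binary-relevance NDCG@k with one relevant item (the true next group).
    DCG discounts hits by log2(rank + 1); the ideal DCG puts the hit at rank 1."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, r in enumerate(recs[:k]) if r == truth)
    idcg = 1.0
    return dcg / idcg

recs = [["shorts", "tshirts", "jeans"], ["kurtas", "tops"]]
truth = ["tshirts", "kurtas"]
print(mrr(recs, truth))                 # (1/2 + 1/1) / 2 = 0.75
print(ndcg_at_k(recs[0], truth[0], 3))  # 1/log2(3), about 0.631
```

Higher is better for both, which is why the curves in the talk are read as "GRU above the baselines, and the gap widens with K and with session length".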