customers. So an obvious question we asked ourselves was: can data science help?

How did this all start? Just a bit of a history lesson. Like many innovations attributed to wars, one turning point for the fashion industry came around the American Civil War. Before the war, most apparel was made to measure. But the war called for mass recruitment of soldiers, and therefore mass production of soldiers' uniforms, and that was the starting point of ready-made apparel.

Of course, while fixing one problem, this created another: the problem of fit, which was especially glaring for women's apparel. This led to a wave of sizing standards, which came up during the period from 1958 to 1983 and beyond. Unfortunately, none of these standards lasted. The primary reason they failed was that the samples on which they were based were simply not representative of the actual population of women. So it happened that manufacturers shunned these standards and defined their own sizing instead. This was the beginning of fit models, which entered around the same period: each manufacturer had its own fit models, and apparel and other products were tailored to them. That was the beginning of all the chaos.

So essentially, two things make fit so difficult. First, there is a lack of standardization across brands for the same size: brands design their clothes differently, using their own sizing standards. Second, there is the demon of vanity sizing: brands often downsize the labels on their products just to appeal to their customers. As a result, no two sizes feel the same across brands. Here is a simple example.
We have two formal shirts, one by Phosphorus and the other by Raymond, and you can see that although the size numbers are the same, the actual dimensions vary a lot.

There are a bunch of existing solutions. In offline retail, for example, people have tried to address this with mannequins or real models. Online, there are virtual fitting rooms, and at the other extreme, 360-degree body scanners that take complete measurements of your customers. All of these solutions are based on measuring up the customers and then mapping those measurements to product measurements in order to recommend products. Similar approaches have been attempted online, based either on visualization or on recommending the right-size product from measurements entered by the customer. But all of them require customers to explicitly enter their body measurements into the online portal. This is a laborious process, it leads to a poor experience for the users, and of course it is not scalable.

Our approach, on the other hand, is based on two primary assumptions. First, we assume that from the catalog data we have access to the actual physical sizes of the products. Second, we assume access to sales data, that is, to products the customer already owns. The idea is: can we use the products a customer already owns as a proxy for his or her body measurements? If we can, we can then map those to product sizes. A successful sale to a customer, for example, is a signal that the sizes of those products fit the customer.
On the other hand, if a product leads to a return or exchange with a size-related reason, that is an indication of incorrect fit.

Here is the overall design of our approach. We have three primary data sources: the catalog size-chart information (where, as I said, we assume access to the actual product dimensions), the sales data, and the returns data. The pipeline comprises four key steps. The first step is clustering: given the catalog data, we segment it into groups of products of similar size, which gives us a clustered set of products. The next two steps map the products and the customers into this common space of clusters: a product-cluster closeness scoring and a customer-cluster support scoring. The fourth step is to come up with a combined score from these two matrices and use it to recommend the right-size products to our customers. I'll run through each of these next.

First, the current state of affairs. Bob here made a past purchase of a Franco Leone shoe in size nine. Based on that experience, when making his current purchase he selects a size-nine shoe again, a Converse this time. But it turns out that the earlier size he chose was based on the UK sizing standard, while the current shoe uses the US standard. So of course it doesn't fit, and he has a poor experience.

On to clustering. The first step is to take the actual size data from our catalog and segment the products using a clustering mechanism. This gives us clusters of products, each cluster comprising products that are similar in size. We do this separately for different categories of products.
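To make the clustering step concrete, here is a minimal sketch of PAM-style k-medoids over product size vectors. This is not the production implementation (the talk uses CLARA, a sampling-based k-medoids for large data), and the toy size values are my own illustration.

```python
import numpy as np

def k_medoids(X, k, n_iter=50, seed=0):
    """Minimal PAM-style k-medoids: cluster products by their
    actual size measurements (rows of X)."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances between all products.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    medoids = rng.choice(len(X), size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)  # assign to nearest medoid
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.where(labels == c)[0]
            if len(members) == 0:
                continue
            # New medoid = the member minimizing total distance to its cluster.
            within = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    return medoids, labels

# Toy example: four shoes described by (length_cm, width_cm);
# two tight size groups should fall into two clusters.
X = np.array([[26.0, 9.5], [26.2, 9.6], [28.0, 10.4], [28.1, 10.5]])
medoids, labels = k_medoids(X, k=2)
```

CLARA's trick, for scale, is to run this kind of medoid search on repeated samples of the data rather than the full pairwise distance matrix.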
At the end of this, we get clusters of products that are similar in size. We used CLARA, an implementation of k-medoids designed for large data: it takes samples of the data for clustering and works very well for large volumes. K was tuned empirically based on the inter-cluster and intra-cluster distances between products.

Once we have the product segmentation, the next step is to compute the product-cluster closeness. This is a simple step: we score each product-cluster pair based on the product's distance from that cluster, normalized by the maximum distance between any two clusters. This gives us a two-dimensional product-by-cluster matrix, in which products of the same physical size tend to score for the same cluster. So you see the Franco Leone size-nine shoe coming up in cluster C2, and the Converse size-ten shoe also getting a high score for cluster C2.

The other thing we compute is the customer-cluster support, based on the sales and returns data. We map every product a customer has bought to its corresponding cluster, which gives us the matrix on the left: for each customer and cluster, the number of purchases he or she has made from that cluster. From this we compute a score of how much support that customer has for each cluster, using a simple smoothing technique to adjust for the sparseness of the data. The result is a two-dimensional customer-by-cluster matrix, where each row holds one customer's support score for each of the clusters.
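The two matrices might be built roughly as follows. This is a sketch under my own assumptions: the medoid vectors, the counts, and the additive smoothing constant `alpha` are illustrative stand-ins for the sparsity adjustment mentioned in the talk, not the production formulas.

```python
import numpy as np

# Hypothetical setup: 3 cluster medoids and 4 products, each described
# by actual dimensions (length_cm, width_cm).
medoid_sizes  = np.array([[25.0, 9.0], [27.0, 10.0], [29.0, 11.0]])
product_sizes = np.array([[25.1, 9.1], [27.2, 10.1], [28.9, 10.9], [26.8, 9.9]])

# Product-cluster closeness: distance of each product to each medoid,
# normalized by the maximum inter-medoid distance, turned into a similarity.
d_prod = np.linalg.norm(product_sizes[:, None, :] - medoid_sizes[None, :, :], axis=-1)
max_d = np.linalg.norm(medoid_sizes[:, None, :] - medoid_sizes[None, :, :], axis=-1).max()
closeness = np.clip(1.0 - d_prod / max_d, 0.0, 1.0)  # products x clusters

# Customer-cluster support from sales/returns counts: a sale in a cluster is
# positive evidence of fit; a size-related return is negative evidence.
sales   = np.array([[3, 1, 0], [0, 0, 4]])   # customers x clusters
returns = np.array([[0, 1, 0], [0, 0, 1]])
alpha = 1.0  # additive smoothing so clusters with little data get a neutral prior
support = (sales + alpha) / (sales + returns + 2 * alpha)  # customers x clusters
```

With these numbers, customer 0's support is highest for cluster C0 (three clean sales), while the cluster with one sale and one size-related return falls back toward the neutral prior of 0.5.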
The last step is to compute a combined score from the product-cluster closeness and the customer-cluster support. Given a new SKU that a customer is trying to buy (where each size variant counts as a distinct product), we take all the products belonging to that SKU, extract the corresponding rows from the product-cluster matrix, and do an element-wise multiplication with the customer's row of the customer-cluster support matrix. That gives us a final score for each product-cluster pair. Taking the row-wise maximum of the resulting matrix gives the product, that is, the size, we should recommend to that customer.

What this lets us do is recommend the right size to our customers. So for Bob, who tried to purchase a Converse SKU and, based on his past experience, selected size nine, we can intervene and recommend the size-ten SKU based on his past purchases.

So far we have evaluated this approach on the shoes category. We took all the data up to a certain time T and used the next two months of data for evaluating the model. On average, we see encouraging accuracy: about 88% of the time we are able to recommend the right size to our customers. One interesting observation we did have is that users often buy for their friends and relatives, which poses an interesting challenge for recommending the right size. But, as I said, since we have a segmentation-based approach, we know each customer's support for each cluster. So even if a customer buys products belonging to different clusters, he or she will get a non-zero support score for each of those clusters.
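The combined-scoring step can be sketched in a few lines. The matrices below are made-up illustrations: rows of `closeness` are the size variants of the SKU being browsed (say UK 8 / UK 9 / UK 10 of the same Converse shoe), and `support` is one customer's row of the support matrix.

```python
import numpy as np

# Hypothetical product-cluster closeness for the 3 size variants of one SKU
# over 3 clusters (variants x clusters).
closeness = np.array([[0.9, 0.3, 0.0],
                      [0.4, 0.8, 0.2],
                      [0.0, 0.3, 0.9]])

# This customer's support scores over the same clusters; his past purchases
# concentrate in the third cluster.
support = np.array([0.1, 0.2, 0.9])

# Element-wise multiply, then take the row-wise max: each variant's best
# (closeness * support) over the clusters; recommend the argmax variant.
combined = closeness * support       # variants x clusters
scores = combined.max(axis=1)        # best score per size variant
recommended = int(np.argmax(scores))  # index of the size to recommend
```

Here the third variant wins, because it sits closest to the cluster where the customer's past purchases provide the strongest support.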
We can use this to identify the different profiles a customer has, so that during a fresh purchase we can intervene, probe the customer as to which profile the current purchase corresponds to, and use that additional input to make a better recommendation. All right, that's what I had. Thank you so much for listening. I can take any questions.

Q: Hi, this is Shakti here. In your talk, you said that you have access to ground truth. Suppose there are two companies who claim to be selling two different sizes but are probably selling the same-size shirt: Phosphorus says 41, Raymond says 49, but both are, let's say, actually a 45. You have access to the truth anyway, so what stops you from creating your own baseline, mapping all those numbers onto it, and using that to predict? For example, if UK size nine maps to US size nine, then when someone chooses a UK size you can just map it to the corresponding US size, because you can always do that mapping.

A: Actually, it's not that straightforward. If you look at the actual dimensions, say the shoulder length and the length of the product, they vary a lot even within a nominal size. So it's not as simple as UK size nine mapping to a US size ten. In one case the actual length might be, say, 30 centimeters; in another it might be 30.2 or even 31 centimeters. So the actual size can vary a lot even for the same standard size.

Q: The first question I had, you partly answered: we have people ordering for somebody else, and I think your answer was that in the future you will have profiles which the user would have to select.
Is that correct?

A: Yes. Currently we don't have that additional input: when a user is buying a certain product, we don't know who he is buying it for.

Q: I think that is a major gap, because at least in our country there tends to be one technically savvy person who orders for everybody else.

A: Yes, in maybe 50% of the data we see, there are cases where people buy for others and have multiple profiles. We see a user buying from cluster C1 who has also bought products from some other cluster. But since we have access to these different segments of products, we know that this person has multiple profiles. So when he is making a fresh purchase, we can intervene and say: based on your past purchases, you seem to have these multiple profiles; if you tell us which profile you are buying for, that can be an additional input.

Q: That relates to the other question I had. The UI you showed would just display the recommended size, without any indication that it is a recommendation based on analysis. Don't you think that is going to be a problem? If I believe my size is eight and you put up ten, I'll say: come on, the system has a problem, and just go for eight anyway. Do you have any way of addressing that, where instead of showing UK or US standards you say: based on your previous purchases, this is the correct size you should be using?

A: Yes, I agree. It's not going to be a hard intervention; it's going to be a soft recommendation, and the way it's presented matters.

Q: My question is around how you are currently differentiating based on the type of size, UK size versus US size.
Even within a product segment the size differs, say, if you are buying a casual shoe versus floaters versus sports shoes. Are you planning to have any clustering of that type as well?

A: No, the clustering is not really based on the standards. As I mentioned, we rely on the actual dimensions of the shoe, or of any product. For a shoe, for example, we have access to its length and width; for a shirt, the shoulder length and the actual length of the product. That is what we use for clustering the products, not the nominal standard size.

Q: Hi, Ashish here, one question in reference to the previous one. Suppose I use a Raymond shirt of size 42, and I'm going for a brand that is unfamiliar to me. Or I use an Adidas size-nine shoe and I want to buy Nike. Can there be reference data for the customer, something like: if you use size nine of Adidas and you go for this other brand, this is what fits you? I'm talking from the point of view of the customer, bottom-up, because right now it's driven from the company toward the customer, top-down. Suppose I use a Raymond 42 and find it comfortable, and I go for some other brand, say Allen Solly. I'm not sure whether the 42 will fit me. Can the data help me choose the right size for Allen Solly?

A: Yes. Since we are using the actual dimensions of the product, it might turn out that in that brand some other size, say a 44 shirt, maps to the same cluster, purely based on the actual dimensions of that product's size attributes.

Q: But most of your analysis is based on historical data. What if I'm a first-time customer?
A: For a first-time customer, yes, we do need additional input. For example, we can probe him for what's there in his wardrobe and use that information to do the mapping.

Q: Okay, sure. I have one more question.

Host: Excuse me, can we take it offline? Otherwise it will spill over.

Q: That's okay, sorry. Yes.