I am a senior data scientist in IBM's AI services division, also called CBDS (Cognitive Business and Decision Support), and I will present some of the recent work we have done over the last year or so with some of our clients; this is also an IBM asset. This work was presented at an ACM conference in January, where we won the best paper award in the demo track. First of all, I must acknowledge my team and my colleagues, without whom this would not have been possible.

Let me quickly show one of the ways in which we have put a recommendation system behind a chatbot. There is a user, Freya, and she wants to use a conversational agent, a chatbot, to buy clothes for herself, so she says that she wants a blouse. Of course, you can write it in more descriptive text, but I will keep it simple, and you see the recommendations coming up. This is very similar to an implementation we had done a couple of years back for a client of ours, Lotte, although the engine behind it was different; that one won the client the Best Customer Experience award at the World Retail Congress, which is known as the Oscars of retail.

What we are seeing is that I have taken two users here, and the recommendations for them are different. But if you were asked to describe any of these fashion products, you would see there is a lot of detail in each one, and the question is how one captures that amount of detail. As we heard this morning, we should build this up from different layers of abstraction.

So what is a recommendation system? Very quickly: it recommends products to users based on, among other things, their past purchases, and it is at the core of many e-commerce platforms, most famously Netflix and Amazon; Netflix ran a competition on this a few years back. There is often confusion about how recommendation systems relate to other marketing techniques. Advertisement is mass messaging via a medium, which could be TV, newspaper, and so on. Campaigns, which have been around for 15-20 years or even more, are where you identify a group of customers who are susceptible, who have a high propensity to purchase something, and push certain products towards them. Recommendation systems, in contrast, are personalized. The core concept behind the personalization is that if two people with very similar profiles come in, so that they fall in the same segment, their recommendations could still be different.

The underlying principle that makes recommendation systems work is that similar people buy similar products. There is no reason this principle must be true, and I cannot prove it, but a lot of work in social and behavioral studies has observed it empirically; if it were not true, no recommendation system would work.

Among traditional recommendation systems, there are two classic collaborative filtering approaches, and collaborative filtering is still very commonly used in production systems even today. User-based collaborative filtering says that if there is a user, and there is someone similar to that user, and both of them like products A and B, then if the similar user also likes product C, this user is likely to like product C as well.
Item-based collaborative filtering rests on two products being similar: if a user likes a certain set of products, then another product similar to those is something they are likely to like as well. That is the principle of item-based collaborative filtering.

I do not want to scare you with the math, but I want to bring out something important here. On the slide is the predicted rating for a certain user i and a certain product j, and what we are doing is taking the products that are similar to j under some neighborhood, taking the known ratings for them, and computing a weighted sum using the product similarity. The known ratings could be derived from actual purchases, from online clickstream behavior, from reviews, and so on. Let us not bother about the full equation; the core piece is the product similarity term. We have to understand when products are similar, and the question is: what do we mean by two products being similar? This could be in any domain. Traditionally, we take the product features and define a similarity metric on them; it could be a Euclidean metric, a Jaccard similarity, or a cosine similarity, but it is a similarity metric defined on the product features that are available.
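To make that concrete, here is a minimal sketch of the item-based prediction just described, in its standard form r̂(i, j) = Σ_{k∈N(j)} sim(j, k)·r(i, k) / Σ_{k∈N(j)} |sim(j, k)|. For simplicity it computes cosine similarity over rating columns; any of the metrics above, computed over product features, could be swapped in. The function and the neighborhood size k are my own illustrative choices, not from the talk.

```python
import numpy as np

def predict_rating(R: np.ndarray, i: int, j: int, k: int = 5) -> float:
    """Item-based CF: predict user i's rating for product j.

    R is a (num_users x num_products) matrix with 0 for unknown ratings.
    """
    # Cosine similarity between product j's column and every other column.
    norms = np.linalg.norm(R, axis=0) * np.linalg.norm(R[:, j]) + 1e-9
    sims = (R.T @ R[:, j]) / norms

    # Neighborhood N(j): the k most similar products that user i has rated.
    rated = np.flatnonzero(R[i] > 0)
    rated = rated[rated != j]
    neighbors = rated[np.argsort(sims[rated])[-k:]]

    # Weighted sum of known ratings, weighted by product similarity.
    w = sims[neighbors]
    return float(w @ R[i, neighbors] / (np.abs(w).sum() + 1e-9))
```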
The question is: is that sufficient? And this is not true only for the collaborative filtering I showed you; other algorithmic techniques such as content-based filtering and alternating least squares have the same concept under the hood. Even the deep learning models we keep talking about today, and here are two of them, neural collaborative filtering and wide-and-deep learning, both rest on describing a product by a set of features and then doing some computation on top of that. The question is how one actually does this, and what I am claiming, from my experience, is that most real-world data is both incomplete and inadequate. Incomplete we understand: we have all seen missing data, and there are various ways in which you can impute it or ignore it. However, as serious a problem as missing data is, perhaps equally if not more serious is that data is inadequate. Here is a screenshot from a very popular online website, and I am not complaining about that website, but the two t-shirts on the left are both 'abstract print': if you filter on t-shirts and choose 'abstract print' as the design, these are two shirts that come up; on the right are two t-shirts that come up under 'checks'. So these are apparently similar, but as each of us can see, they are in a way very different, and this information is not captured in the structured data features. And because the products are so different, I can also say that the customers who buy them are perhaps very different.

This problem is not unique to retail fashion. A very important area where we have seen it is training content: in a course catalogue you will have a course whose description says simply 'data science', and what is data science? Anything and everything can be data science. There may be descriptive text that explains the course, whether it is, say, Andrew Ng's machine learning course, but that is not there in the structured features available. The same issue is present in media and content, and in photographs and paintings; that third area is not something I have worked with, but it is one where this is very important.

So what we are saying is that we should look at a multimodal representation. A multimodal representation takes data with different modes of input: one mode might be visual, such as an image, and another verbal or textual, such as structured features. You take this multimodal input, build neural network layers, and create a shared representation, a representation that combines information from the different modes of data, in exactly the same way you would when looking at the shirt I am wearing: you would not only note that it is a full-sleeve shirt, you would also take in its pattern, its texture, and so on. There is a lot of inherent information that we combine as humans, and our attempt is to do the same with an AI system. The modes could be descriptive text, images, videos, speech, and so on.

What we use here is an autoencoder. As most of us would be aware, an autoencoder builds a hidden representation, a latent feature, such that the original can be reconstructed from it, and the way you train an autoencoder is by reducing the reconstruction loss. In a multimodal autoencoder, I take the data from the different modes of input and create a shared representation across the modes, such that I can regenerate each of the modes separately, and the loss I train on is a weighted sum of the losses for each of the modes. This representation can therefore rebuild each mode of the data to a high level of accuracy: I can rebuild the text, I can rebuild the image, and so on.

Why did we bring multimodality into recommendation systems? Because our consumer decision-making process is also based on taking visual cues in addition to textual cues, in the fashion domain for example, but elsewhere too; it is not by one mode of data that we take a decision. So if we have to replicate the idea that similar people like similar products, then I should consider that similarity as being across different modes of input.

Now to our architecture; let me walk through this diagram. We take an image and build an image encoding. What is an image encoding? Think of any convolutional neural network, an AlexNet or a VGG16, whatever you want, and look at the last layer before the softmax; that can serve as an image encoding, an embedding. Here, if I remember correctly, it was an Inception embedding. We then create a shared representation together with the textual data features we have; because they are categorical, they are converted into a one-hot encoding, so the input is rather large, 855 dimensions. We have a product shared representation and a user shared representation, and we take a dot product of the two, which is essentially scoring the user and the product in the same latent space, and we try to match this to the rating. We also add some complexity around ranking the products: if I know, for example, that Sujoy has liked this shirt, then the concordance of the predicted ratings should match the similarities to this shirt. So we take five different losses, loss one through loss five, take a weighted sum, where the weights are learned via a round of cross-validation, and predict the results.
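A rough sketch of the multimodal autoencoder piece, where the objective is a weighted sum of per-mode reconstruction losses, L = Σ_m α_m·L_m. Keras is my choice here for brevity, not necessarily what the team used; the layer sizes, the loss weights, and the 2048-dimensional image input (typical of an Inception-style embedding) are illustrative assumptions.

```python
from tensorflow.keras import layers, Model

IMG_DIM, TXT_DIM, LATENT = 2048, 855, 128  # image embedding, one-hot text, shared code

# Mode-specific encoders feeding one shared latent representation.
img_in = layers.Input(shape=(IMG_DIM,), name="image_embedding")
txt_in = layers.Input(shape=(TXT_DIM,), name="text_features")
h = layers.concatenate([layers.Dense(512, activation="relu")(img_in),
                        layers.Dense(256, activation="relu")(txt_in)])
shared = layers.Dense(LATENT, activation="relu", name="shared")(h)

# Mode-specific decoders reconstruct each mode from the shared code.
img_out = layers.Dense(IMG_DIM, name="image_recon")(
    layers.Dense(512, activation="relu")(shared))
txt_out = layers.Dense(TXT_DIM, activation="sigmoid", name="text_recon")(
    layers.Dense(256, activation="relu")(shared))

mmae = Model([img_in, txt_in], [img_out, txt_out])
# Total loss = weighted sum of the per-mode reconstruction losses.
mmae.compile(optimizer="adam",
             loss={"image_recon": "mse", "text_recon": "binary_crossentropy"},
             loss_weights={"image_recon": 1.0, "text_recon": 0.5})
```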
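And a compact sketch of the scoring piece: two shared representations matched by a dot product against the rating. The tower sizes, the user-feature dimension, and the single explicit loss (standing in for the five weighted losses) are my simplifications, not the production model.

```python
from tensorflow.keras import layers, Model

LATENT = 128

# Product tower: the multimodal shared representation described above.
prod_in = layers.Input(shape=(2048 + 855,), name="product_features")
prod_vec = layers.Dense(LATENT, activation="relu", name="product_repr")(prod_in)

# User tower: user features projected into the same latent space.
user_in = layers.Input(shape=(300,), name="user_features")
user_vec = layers.Dense(LATENT, activation="relu", name="user_repr")(user_in)

# Dot product in the shared latent space, trained to match the rating;
# the full system adds ranking-concordance and reconstruction losses,
# combined as a weighted sum with cross-validated weights.
rating_hat = layers.Dot(axes=1, name="rating")([user_vec, prod_vec])

model = Model([user_in, prod_in], rating_hat)
model.compile(optimizer="adam", loss="mse")
```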
Now let us look at some variants of the architecture, since we will present some numbers. We call model 1 text-only. Model 2 uses text and image for the product. Model 3 is the same, but the textual information is enhanced using the image: for example, I have got rid of textual colors like 'green' and 'blue' and replaced them with RGB values, and I have also filled in some missing data. In model 4 we have a convolutional autoencoder instead of the embedding, so we try to reconstruct the image itself.

If you look at the performance, first of all, the difference between collaborative filtering and the deep learning models is an order of magnitude, so there is no question that the deep learning models do better. Yes, we limited it to 500 neighboring users, or 500 neighboring products, but even if you increase that, it will not change much. As we go across the models, the fourth model, with the convolutional autoencoder, does slightly better, but not by much, whereas the third model performs almost twice as well on both the validation and the hold-out data sets. What we have implemented in production is model 3 and not model 4, because training model 4 is a big pain: it is a very large convolutional autoencoder, and training such a deep model is expensive. So we have found that a model which rebuilds the image embeddings, rather than the raw image, is advantageous both for online prediction and for training.

Let me also point out that this system has been deployed in production, and we get under-100-millisecond response times using a bit of engineering, nothing fancy, but it comes from thinking clearly about the problem: on a given day the products are not going to change and the users are not going to change, and if a brand-new user comes in, I cannot do any recommendation anyway. So the user list and the product list are known at the beginning of the day. At the beginning of the day, after training, I simply store these two pieces of information, and online I just need to do a dot product, and a dot product, as any of you know, takes only a few milliseconds. That is how we have been able to deploy this. One of the disadvantages, of course, is that for training we need GPUs, a lot of compute power, to do this well; we have heard this concern from some of our clients, and there we have had to use collaborative filtering based approaches instead.
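A minimal sketch of that serving path: precompute the day's user and product latent matrices offline, then answer each request with a single matrix-vector dot product and a partial sort. The file names and shapes are illustrative, not from the deployed system.

```python
import numpy as np

# Offline, at the start of the day: store the latent representations
# produced by the trained model for the known users and products.
U = np.load("user_latents.npy")      # (num_users, d)
P = np.load("product_latents.npy")   # (num_products, d)

def recommend(user_idx: int, top_k: int = 10) -> np.ndarray:
    """Online path: score all products for one user, return the best k."""
    scores = P @ U[user_idx]                        # predicted affinity per product
    top = np.argpartition(scores, -top_k)[-top_k:]  # unsorted top-k indices
    return top[np.argsort(scores[top])[::-1]]       # sorted best-first
```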
Let me mention some further work we are currently doing; none of these results are available publicly yet. We are working with descriptive text: along with any fashion product, you often have a design expert writing some text about it, 'pair this shirt with X and Y and Z', and we are trying to use that sort of information. We are also looking at training content recommendation using text as well as multimedia streams. By text I mean the detailed descriptive text you have; just take a course everyone here understands, Andrew Ng's machine learning course: it would be written up as 'delivered by Professor Andrew Ng, the foundational course on machine learning', and so on. We are using that detailed descriptive text in training content recommendation, both for internal use and for some of our clients. We have also been looking at advertising and paid content in news media: when you are on the NDTV site, you see a lot of other articles coming up, and most of them are of very poor quality; this is not a comment on NDTV or any particular site, but those recommendations are practically useless. We are trying to bring this into our work and build this sort of system, but here you actually have three different components, and it is very experimental as of now; we have made some progress, so maybe at a future conference we can present it. These are the references I have used, and with that I will end and open the floor to questions.

Yes; but what you look at for recommendation systems is individual past purchases, whereas when you look at an advertisement, the individual does not exist. That is the core difference: you need an individualized history of purchases, click-throughs and so on to build a recommendation system. So purchase behavior is one source; the other thing we have used a lot is clickstream behavior, page views, add-to-cart, wish lists, and also online reviews. All of this goes into the training of the recommendation system. Inference, as I said, is under 100 milliseconds; this is for an Indian retailer with 5 lakh (500,000) users, around 1,000 concurrent users during peak time, and 35,000 products, and we get under 100 milliseconds of latency. No, no GPUs for inference; that would be too expensive.

Yes, this is a very interesting problem, and not just hold-outs: the other problem is how, prior to deploying, you even know whether it will be good or not. Some of the work we have done, and are filing patents on, is basically to run simulations on the data with different combinations of the input and see whether similar products come up in the top choices or not. Yes, that would be a smaller intersection, but on that intersection we are trying to see, when I look at the similarity of the products, whether we are recommending similar products or not. This is actually a very deep question; not all of it is available for public consumption now, as we are filing some patents on it, but yes, we have been looking at this problem, and it is a very hard problem to solve in recommendation systems.

Yes, I know the individual: the user is identified through a login, so that is assumed. If the user is anonymous, then you cannot do personalization.
If you think of an app today, you are always logged in, and on Amazon or Flipkart you are always logged in; that is the reason they keep you logged in. Yes, we claim it will work for other industries, but we have deployed it in production only in retail. We believe it would work elsewhere, but the production deployments are only in retail, not groceries, but fashion and consumer goods: we have deployed it in fashion and consumer electronics, so consumer goods broadly. Sure, thank you.