Hi everyone, I'm Swaroop, currently working at Fidelity Investments, and I'll be talking about recommender systems. Just to get an idea, can you raise your hands if you're already aware of recommender systems? Good, so I can't bluff. The agenda: what recommender systems are, the most popular algorithms (collaborative filtering, content-based, and matrix factorization, with a few tweaks), and finally a few use cases from my experience.

Imagine yourself on an e-commerce website. Most e-commerce sites these days have millions of products in their catalog, and the company has to recommend a few relevant products out of those millions to you. That helps the company drive sales, because it's surfacing relevant products, and it helps the user too: less time on the website, get the relevant products and get out, so user satisfaction goes up.

So what's the challenge? You have millions of users, millions of products, each of us with our own unique taste, and the products themselves come in a huge number of varieties. At that scale you can't solve this with, say, a separate regression model for every product in the catalog, and certainly not manually. Applying machine learning techniques to this problem is what falls under recommender systems.

Mathematically, the problem looks like this: you have users on one side of a matrix and products on the other. Here we've drawn maybe a 10-by-7 matrix, but in practice you have millions of users and millions of products, and each user has purchased only a very few items, maybe 3, 5, or 10.
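To make the setup concrete, here is a minimal sketch of that matrix in Python with NumPy. The ratings are made-up toy values (the talk doesn't give actual data); `np.nan` marks the items a user hasn't rated, which is the vast majority of entries at real scale.

```python
import numpy as np

# Hypothetical toy ratings matrix: rows = users, columns = products.
# np.nan marks items the user has not rated; real matrices are
# millions by millions and extremely sparse.
R = np.array([
    [5.0, np.nan, 3.0, np.nan],
    [np.nan, 4.0, np.nan, 1.0],
    [2.0, np.nan, np.nan, 5.0],
])

observed = ~np.isnan(R)                  # mask of known ratings
sparsity = 1 - observed.sum() / R.size   # fraction of missing entries
print(f"known ratings: {observed.sum()}, sparsity: {sparsity:.2f}")
```

The recommendation task is exactly to predict the `np.nan` entries and then rank them per user.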
For the company to recommend items from its catalog, it needs to know what the user's ratings would be for all the items he hasn't bought: the million products in the catalog minus the few purchased ones. It can then sort the predicted ratings in descending order and send out a few customized marketing messages, through sidebar ads or whatever channel. So it's all about predicting the ratings for the million-minus-few unrated items.

As I said, you can't solve this with regression, and definitely not with data this sparse. Regression would have one advantage: you can interpret the coefficients you get. Here that's tough, because it's a purely mathematical problem: you take a sparse matrix and produce another, completely filled, huge matrix, so manually interpreting it is really hard.

Coming to the most popular recommender systems, you have collaborative filtering and content-based systems. Collaborative filtering is all about similarity. In user-based collaborative filtering, you cluster users who are similar to each other; if I haven't bought a product that the other users in my cluster have bought, you recommend that product to me. Likewise, item-based: you cluster the items, and if I've purchased a good number of items in a cluster, you recommend the other items in that cluster that I haven't bought. Think of groceries: purchases are very common across households, so if one household misses a few items that similar households buy, you can very well recommend them. Or a bed sheet and a pillow: they go together.
Coming to content-based recommender systems, here you consider the metadata of the items. For books, that could be the publisher, the year of publication, the genre, the number of pages, and so on. You use this item metadata to come up with recommendations.

Now let's solve a simple collaborative filtering problem. Assume you have users and products, with ratings on a scale of one to five. I've purchased a few items and rated them. You want to know which item to recommend to, say, user one, because you don't know his rating for item two, item four, or item five. In item-based collaborative filtering, you construct an item-item similarity matrix: the similarity of each item to every other item in the catalog. But you can't always compute it. Take item two: as you can see in its row, no user has rated both item two and item three, so there is no intersection to compute a similarity from. That's a form of the cold-start problem; ideally with more purchases this wouldn't happen, I've just made the matrix really small. Also notice the matrix is symmetric about its diagonal: the values below the diagonal mirror the values above it. For the similarity itself you could use cosine similarity, Euclidean distance, Pearson correlation, any metric of your choice. Once you have the similarity scores computed, you can fill in all the sparse values.
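A minimal sketch of that item-item similarity matrix, assuming cosine similarity and a made-up toy ratings matrix (not the speaker's slide data). Pairs of items with no common rater come out as `np.nan`, which is exactly the cold-start gap described above.

```python
import numpy as np

def item_similarity(R):
    """Cosine similarity between item columns, using only co-rated entries.

    R: users x items matrix with np.nan for missing ratings.
    Returns an items x items symmetric matrix; np.nan where two items
    share no common rater (the cold-start case).
    """
    n_items = R.shape[1]
    S = np.full((n_items, n_items), np.nan)
    for i in range(n_items):
        for j in range(n_items):
            both = ~np.isnan(R[:, i]) & ~np.isnan(R[:, j])
            if both.any():
                a, b = R[both, i], R[both, j]
                S[i, j] = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return S

# Toy data: item 1 and item 2 have no rater in common.
R = np.array([
    [5.0, np.nan, 4.0],
    [3.0, 2.0, np.nan],
    [np.nan, 4.0, np.nan],
])
S = item_similarity(R)
```

With similarities in hand, a missing rating is typically predicted as a similarity-weighted average of the user's known ratings.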
So you compute the missing values based on these similarities. User one has purchased product one and product three; you know how similar product one is to the other products, so you can compute what his predicted ratings would be for the rest of the catalog. User-based collaborative filtering works the same way: you compute a user-user similarity matrix, how similar I am to every other user in the database. Again you can see a few null values where there is no overlap and the similarity couldn't be computed: the cold-start problem again. There can also be a popularity-bias problem: a few people have very unique taste and are really tough to match. If every other user in the system rates an item low and this one guy rates it high, you can't help it, unless there's another user who exactly mimics his behavior.

Coming to content-based recommendations: as I said, you consider the metadata of the items, like the publisher and the year of release, and each user is represented by a profile built from the items he has rated. If I rate all of Rowling's books very high, my profile says, in effect, that I'm a Rowling fan, so it's as simple as recommending me any new Rowling release. The user profile is composed of item properties. Since every item also has a profile, for any new item coming into the system you can compute a simple cosine distance (or any other similarity metric) between the user profile and the item profile and find the user's affinity for it. So there's no cold-start problem for new items per se, and you can also provide a good, meaningful explanation for each recommendation.
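The profile idea above can be sketched in a few lines. Everything here is illustrative: the feature columns and ratings are invented, and the profile is built as a simple rating-weighted average of item feature vectors, one common choice among several.

```python
import numpy as np

# Hypothetical item metadata rows; columns might be indicator features
# such as [author_rowling, author_tolkien, genre_fantasy].
item_features = np.array([
    [1, 0, 1],   # a Rowling fantasy book
    [1, 0, 1],   # another Rowling fantasy book
    [0, 1, 1],   # a Tolkien fantasy book
], dtype=float)

user_ratings = np.array([5.0, 4.0, 1.0])  # this user's past ratings

# User profile = rating-weighted average of the rated items' profiles.
profile = user_ratings @ item_features / user_ratings.sum()

# Score a brand-new item (a new Rowling release) by cosine similarity
# between the user profile and the item profile.
new_item = np.array([1.0, 0.0, 1.0])
score = profile @ new_item / (np.linalg.norm(profile) * np.linalg.norm(new_item))
```

The high score for the new Rowling book is also the explanation: the user's profile is dominated by the Rowling/fantasy features he rated highly.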
If I'm recommending something to this user, I know why: he rated all the Rowling books very high, so I'm recommending him the newly published Rowling book. The catch is getting the metadata. It's not as easy as we think; in e-commerce the metadata is often really hard to obtain. We saw in the earlier talk the lengths teams go to, for example extracting image features. And as we know, the more features, the better.

So far we've considered either user-user similarities or item-item similarities. What if we consider the interactions between users and items together? Had the matrix not been sparse, had every user bought every item, you could have used an SVD and explained the interactions between users and items directly. Since the matrix is sparse, you use an alternating least squares (ALS) style approach instead. Simply put, you initialize two random factor matrices and optimize an objective function: the known ratings minus the corresponding entries of the reconstructed matrix. That's your error: you know a few ratings, you generate a completely filled matrix, and you minimize the root mean square error between them, for example with stochastic gradient descent, choosing a learning rate alpha and iterating toward the optimum.
One simple constraint: the number of latent factors can never exceed the number of users or the number of items, because of how you create the factors. If the sparse matrix is of size m by n, you create an m-by-k and a k-by-n matrix, so k should always be less than both m and n.

To make it richer, you can also consider user bias. Say I'm a Rajinikanth fan: irrespective of how good or bad the movie is, irrespective of its content, I tend to rate Rajinikanth movies higher. That's my particular bias. Likewise there is item bias: if a product is a hit in the market, it tends to get high ratings across the board. You'd want to separate these effects out of the ratings. You can also model a time-varying effect: on IMDb, newly released movies get very high ratings as soon as they are released, but the ratings decay over time. This family of models is called SVD++, and Netflix has used an ensemble of SVD++ and restricted Boltzmann machines, which does really well on accuracy.

Now that we've talked about these types of recommender systems, when would you use which? If your item catalog is really huge, you may not want matrix factorization, because the computations are heavy: as soon as a new item enters the matrix, you'd have to redo the entire factorization, which can take a good amount of time. In the real world you obviously have millions of users and millions of items, so maybe it's better to stick to collaborative filtering. When you have metadata of the items, which is actually rare in the real world, definitely use it:
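The bias idea can be sketched as a baseline model r̂(u, i) = μ + b_u + b_i, which a full SVD++-style model then extends with the latent interaction term. The ratings below are invented toy data, and the learning rate and regularization are illustrative.

```python
import numpy as np

# Hypothetical known ratings as (user, item, rating) triples.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (2, 1, 1.0)]
n_users, n_items = 3, 2

mu = np.mean([r for _, _, r in ratings])  # global average rating
b_u = np.zeros(n_users)                   # user bias (the "fan" effect)
b_i = np.zeros(n_items)                   # item bias (the "hit product" effect)

# SGD on the baseline model r_hat = mu + b_u + b_i, with L2 regularization.
alpha, reg = 0.05, 0.02
for _ in range(200):
    for u, i, r in ratings:
        err = r - (mu + b_u[u] + b_i[i])
        b_u[u] += alpha * (err - reg * b_u[u])
        b_i[i] += alpha * (err - reg * b_i[i])

# Baseline prediction for an unseen (user, item) pair; a full model
# would add the latent interaction p_u @ q_i on top of this.
pred = mu + b_u[2] + b_i[0]
```

Separating the biases out this way means the latent factors only have to explain the genuine user-item interaction, not the fact that some raters are harsh or some items are hits.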
Go for content-based recommendation systems in that case. And if accuracy is the key, maybe matrix factorization.

For the cold-start problem, let's assume a new product has entered your catalog and nobody has bought it yet. What do you do? You can cluster all your existing items and simply give the new product the same attribute values as the other items in its cluster. You can also combine methods into ensembles: even when the individual models' accuracies are fairly low, the ensemble model will usually have higher accuracy. And do some data preprocessing, for example removing each rater's average. You hear people complain that one instructor always scores all the students low, while another rates you high even if you do only fairly okay; subtracting each rater's average from their ratings removes that effect. As I said earlier, Netflix has used an ensemble of SVD++ and RBMs. And from my observation, it's very simple to produce reasonable recommendations, but it takes a really good effort to improve them.
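The mean-removal preprocessing mentioned above is a one-liner; the toy matrix here is invented to show a generous and a harsh rater.

```python
import numpy as np

R = np.array([
    [5.0, 4.0, np.nan],   # generous rater
    [2.0, 1.0, 2.0],      # harsh rater
])

# Subtract each rater's own mean (computed over their known ratings
# only) so harsh and generous raters become comparable before any
# similarity computation.
user_mean = np.nanmean(R, axis=1, keepdims=True)
R_centered = R - user_mean
```

After centering, every rater's known ratings average to zero, so a "4 from the harsh instructor" and a "4 from the generous one" no longer look the same.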
That is, making the recommendations truly great takes real work. Let me share a consulting project I worked on. A company was hosting a meet for all its executives, with seating arranged at tables, and it wanted to figure out who should sit at which table: which kinds of people would gel together. Maybe their assumption was that people from very diverse backgrounds wouldn't gel well; whatever was on their mind, they wanted the best seating arrangement. The data we got from the firm was just each employee's first name, last name, and company. We solved this by scraping skill-set data from LinkedIn, since most top executives keep their LinkedIn profile pages updated. The problem with the scraped skill sets is that the resulting matrix is sparse. Had it not been sparse, had I known all the attributes, all the skills, I could have clustered people easily. For example, one person lists Julia, SAS, R, Hive, and SQL; another says he knows Hive and DB (DB meaning database); a third says he knows Hive, SQL, and database. It's very intuitive that the second person probably knows SQL too, so we want that sparse entry filled in with a one, and since he never mentions SAS, that entry stays zero. Using matrix factorization we filled the entire matrix, then ran hierarchical clustering, and that's how we got the seating arrangement fixed.

The same idea applies elsewhere. You could recommend R packages based on the packages someone has already installed. Or in the financial domain: people usually maintain multiple credit cards and use each card for a particular spend category, so the numbers in the matrix are dollar values.
U1, U2, and so on are the users, and flight, food, and so on are the categories they spend money in. The values are dollar amounts, slightly masked here rather than normalized. For the credit card company to cross-sell, it has to identify how much each customer is spending in the categories it can't see; that's when it knows whom to cross-sell to. If you solve this by matrix factorization, the first matrix you see holds random values generated in the range of three to five, and then you keep minimizing the root mean square error. After several iterations at a particular learning rate alpha, the randomly initialized entries, such as the one for flight and user two, converge toward their actual values, and you reach a stage where the RMSE on the known entries is almost zero. Now you know what each customer is actually spending in the other categories, so the credit card company can go after the customers who are spending really high there. Based on their similarity to other users, considering the interactions, you can generate a marketing list, assume maybe a 10 percent lift, and that's it. Thanks, guys.