Hello, my name is Abhijith and I work at Julia Computing. Way back in 2012, I had just come back from Australia after completing my master's, and all I knew was a little bit of math and some MATLAB. I was having a hard time finding a job here. Luckily, thanks to Zainab, whom I was introduced to at Fifth Elephant, I have been doing well since then. The point is that Julia is very welcoming and friendly to everyone, both as a language and as a community, so I can happily recommend it to you. I want to keep this a beginner-to-intermediate level talk; I don't want to get too advanced.

Everybody knows what recommender systems are. They help with cross-selling, converting browsers into buyers, increasing engagement, and so on. The example I am taking is a movie recommender system: there are four movies and four users, and the question marks are the entries where a user hasn't seen the corresponding movie. I will be using this as a toy example.

Now, a little bit of background. Let's call this matrix the rating matrix R. It is usually very sparse, like 99% sparse. My thesis was on the Netflix problem, and the Netflix data set had around half a million users rating the movies; it was 99% sparse, so only 1% of the ratings were observed. You can imagine the scale of the problem. The idea is to factorize the rating matrix into a user matrix and a movie matrix; when you multiply U and M, you get the entire R back with the missing entries filled in. Then you can set a threshold, say a predicted rating above 4, and recommend the unseen movies that cross it. It's as simple as that.

A little math lesson here. This is the rating matrix, number of users by number of movies. You can factorize it into U and M, as I said, and you can observe there is a k here: the inner dimension has changed. A (users × k) matrix times a (k × movies) matrix gives you a (users × movies) matrix, and this is a classic example of dimensionality reduction. In the Netflix example the full dimension was around 20,000, but I will show you graphs where I got the best results for values of k around 50. I didn't need the rest; it was mostly noise.

There are two methods I have used: SVD, and a method called Alternating Least Squares (ALS). ALS does the same factorization, and how does it work? It is analogous to solving Ax = b, except that here both factors are unknown. We initialize M, the k × movies matrix, randomly, treat it as known and solve for U; then we treat U as known and solve for M, and we keep alternating until it converges. For example, say user 1 has seen movies 1, 2, 3, 5, 6 and 10. With M initialized randomly (or in some other way), finding that user's factor is an ordinary least squares problem over just the movies they have rated. You do that across all the users to get U, then you fix U and find M the same way, alternating back and forth.

Let me show you. First I construct the toy matrix; it is a sparse matrix, the same one you saw in the example, and the zeros are the entries we want to predict. What parameters do we need to give? We want to run for 30 iterations, and for the inner dimension, since this is a very small matrix, we may as well use all four. Now let's run it. There you have your predictions, and this turned out pretty good.
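To make that concrete, here is a minimal sketch of plain regularized ALS on a toy matrix like the one above, written in ordinary Julia. This is illustrative only, not the RecSys.jl API: the function name `als`, the regularization parameter `λ`, and the toy rating values are all my own choices.

```julia
using SparseArrays, LinearAlgebra

# Toy 4x4 rating matrix: zeros stand in for the "?" entries to predict.
R = sparse([5.0 4 0 1;
            4 0 3 1;
            1 1 0 4;
            0 1 4 5])

# Plain regularized ALS: alternate between solving for the user factors U
# with the movie factors M fixed, and vice versa. Each least-squares solve
# uses only the entries that were actually rated.
function als(R; k = 4, iters = 30, λ = 0.1)
    nusers, nmovies = size(R)
    U = rand(nusers, k)        # user factors, one row per user
    M = rand(k, nmovies)       # movie factors, one column per movie (random init)
    for _ in 1:iters
        for u in 1:nusers      # fix M, solve per user over their rated movies
            rated = findall(!iszero, R[u, :])
            A = M[:, rated]
            U[u, :] = (A * A' + λ * I) \ (A * Vector(R[u, rated]))
        end
        for m in 1:nmovies     # fix U, solve per movie over its raters
            raters = findall(!iszero, R[:, m])
            B = U[raters, :]
            M[:, m] = (B' * B + λ * I) \ (B' * Vector(R[raters, m]))
        end
    end
    U, M
end

U, M = als(R; k = 4, iters = 30)
U * M    # dense predictions: the former zeros now hold estimated ratings
```

Note that each per-user (and per-movie) solve is independent of the others, which is why ALS parallelizes so easily, a point that comes up again below.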
Looking at the predictions: we have predicted that if that user sees that movie, he is going to rate it a 4, so we wouldn't recommend that one. Alright, that was using ALS. Now, what happens if we use SVD? S here is a diagonal matrix, and if you observe, the SVD has exactly reconstructed the original matrix. It is of no use to us: the zeros are still zeros. Why does this happen? Because I used a full-rank SVD here. Now if we do the same thing with a reduced rank, let's try: the entry is not zero anymore. Well, it is still close to zero, but at a dimension of around 20,000, with say k = 50, you are going to get some pretty good results (there's a short sketch of this below).

So is SVD wrong? SVD is not wrong; SVD used at full rank is definitely wrong for this problem. What SVD does is factorize your input matrix into an orthonormal decomposition of the same rank, so all the singular vectors you get are orthogonal to each other. Just 50 of those vectors, out of 20,000, are enough to span the useful part of the space represented by your R matrix. That is how SVD works. So you go wrong when you use SVD in the full-rank sense; you should always reduce the rank. As for where to start: I started at 10 and went up incrementally.

As for what this factor analysis gives you, this is the decomposed matrix, for example. This array of numbers makes no sense on its own, but let's pick one of the columns, and if I sort it you can see a pattern. You might argue: how can you be sure that you can treat these eigenvectors as your basis vectors? Well, when you sort, there is some kind of pattern here. How many of you have heard of or seen The Band Wagon? Oh really? The majority say otherwise. How about Schindler's List? Titanic? Shall We Dance, on the other hand? There is a clear pattern; I can see similar movies grouping together. It looks a bit cherry-picked, but yeah, there is a pattern.

This is a plot I had for SVD; as I mentioned, this was on the Netflix problem. In ALS, when you decompose, the factors are not orthonormal like that, so you just pick a k, but you don't really overfit, and you get comparable results. It's not that SVD is bad; we were just using SVD in the wrong sense. This game is all about dimensions: the trick is to get the dimension right, and this is the right dimension. (Audience: What is the y-axis?) Pardon me? That's the error, the RMSE. It took around 10 seconds for 400,000 points and around 5 minutes for 20 million points. (Audience: Is it parallelized?) ALS is parallelizable; in fact, ALS is embarrassingly parallel, so it's very easy. There are some things we are currently working on, and you can always go to the package and contribute.

But I have one more very nice thing, thanks to Shashi. You may know a couple of these packages, I'm sure. We have come up with this nice UI where it shows you some random movies which you can rate. Let's rate a few randomly; I don't even know these movies. The interesting thing, since somebody was asking: if you observe, I have intentionally displayed this so you can see the values change. Let's say I have given these ratings, and I submit.
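Here is the SVD sketch promised above: a minimal Julia illustration of why a full-rank SVD reproduces the zeros exactly while a rank-k truncation fills them in. The choice k = 2 is arbitrary for a 4×4 toy; on Netflix-scale data the talk's k ≈ 50 applies. `R` is the toy matrix from the ALS sketch earlier.

```julia
using LinearAlgebra

A = Matrix(R)        # densify the toy rating matrix from the ALS sketch
F = svd(A)           # full rank: F.U * Diagonal(F.S) * F.Vt reproduces A
                     # exactly, zeros included, which is useless for prediction

k = 2                # keep only the top-k singular values
Rk = F.U[:, 1:k] * Diagonal(F.S[1:k]) * F.Vt[1:k, :]
# Rk is the best rank-k approximation of A; the former zero entries
# now carry nonzero values that serve as predictions
```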
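And a sketch of the sorted-column demo from the factor-analysis part: sort the movies by one latent factor and print them side by side. The `titles` vector is hypothetical, just the four films named in the talk, aligned with the columns of the toy matrix; on real data, similar movies end up adjacent in this ordering, which is the pattern the talk points at.

```julia
factor = 1                          # pick one latent dimension
scores = F.V[:, factor]             # one score per movie from the SVD above
titles = ["The Band Wagon", "Schindler's List", "Titanic", "Shall We Dance"]
for i in sortperm(scores)           # indices that sort the chosen column
    println(round(scores[i], digits = 3), "  ", titles[i])
end
```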
So, coming back to the demo: this is work in progress, but these are the recommendations based on the ratings I have given. This is like online learning. Somebody was asking about incremental learning; that is something you have to take care of yourself. We cannot generalize that in a package: you know your data and you know what model you are using, so you will have to fit it into your own code. Yeah, questions?

(Audience: How much time did it take you to code this? You already knew the math, so how much time did it take?) Very interesting question. On my first day at the office I was able to do the entire thing. I think we should show the code; it's just a single file or so, around 100 lines. It's unoptimized code, but it's fast enough, so we didn't bother. That's the package where all this code is posted.

(Audience: You mentioned things it can't do yet. There was a slide that said what you are currently working on.) Yes, on these things; this is the roadmap.

(Audience: Do you have principal components extraction or something like that?) I usually prefer working with these methods. On the 20-million data set, it takes 400 to 500 seconds to train the entire model, so it's not something we redo very frequently. Probably in a week's time, when you have accumulated a few new users, you retrain the model. So it's just 400 seconds, once a week.

(Audience: What's different in ALS compared to SVD? Is it a non-negative factorization?) Yes, it is a non-negative factorization. Thank you.