A very warm good afternoon to one and all. I am Harika, and this is my team: Pranav, Ajay and Aruna. We are from the Community Recommendation project of the Fundamental Research Group.

"Water, water everywhere, nor any drop to drink." Analogous to this: data, data everywhere, but not a thought to think. To save the user the problem of deciding which article to read, we have built a recommendation system. In real life we often ask our friends or our peers for recommendations when we want more content about a particular topic, or a book to read, or further references. A good recommendation system should likewise surface the material a user is actually interested in.

Our problem definition: we have m users and n articles, with ratings associated with some user-item pairs, and optionally timestamps. The recommendation system should output a list of n recommendations for each user, namely the top recommendations for that user's interests. The technologies used are Flask, Matplotlib, NumPy, pandas, Redis, SciPy, sh and TensorFlow.

There are two major approaches to building a recommendation system: content filtering and collaborative filtering. Content filtering is mainly concerned with metadata about the user and the item. For example, if a user is interested in a machine learning article published by some author XYZ, we recommend more machine learning articles published by that same author. With collaborative filtering, we do not need any domain knowledge: we look at past user-item interactions, find similar users, and recommend to the current user the articles recommended for those similar users. We followed collaborative filtering with matrix factorization, which Pranav will now explain.

Okay, so as she said, we basically follow collaborative filtering, and collaborative filtering can itself be divided into two methods. One is neighborhood-based, where you try to define similar users: if I want recommendations for you, I look at users who are similar to you in some sense and generate recommendations based on them. This involves cosine similarity and the techniques some other teams used. But the model we preferred is called the latent factor model. Here we take the three-dimensional data set we have, which consists of the users, the articles, and the ratings the users have given to those articles, and convert it into a higher-dimensional data set. You can think of this as the inverse of principal component analysis: instead of trying to reduce the dimensionality of the data, you expand the dimensions, hoping that the extra dimensions will help explain the ratings. In content filtering you build up large user profiles that take care of demographic information about the users, their tastes, and other things; the factors we generate are somewhat analogous to those profiles. But in content filtering someone has to manually sit and construct the factors, which is almost impossible for large communities. That is why we settled on this computer-generated factor model.
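To make the setup concrete, here is a minimal sketch, not from the presentation itself, of how the (user, article, rating) triples described above can be arranged into the user-item matrix that the latent factor model works on. The column names and sample values are illustrative assumptions:

```python
import pandas as pd

# Illustrative (user, article, rating) triples, matching the problem
# definition: m users, n articles, ratings for some user-item pairs.
ratings = pd.DataFrame({
    "user":    ["u1", "u1", "u2", "u3", "u3"],
    "article": ["a1", "a2", "a1", "a2", "a3"],
    "rating":  [5, 3, 4, 2, 5],
})

# Pivot into an m x n user-item matrix; unobserved pairs become NaN.
# The latent factor model's job is to fill in these missing entries.
R = ratings.pivot(index="user", columns="article", values="rating")
print(R)
```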
For the model, we use latent factors, which can be implemented through many approaches; the most popular involves matrix factorization. The basic idea is that you have this huge matrix whose rows are users and whose columns are items, and whose entries are the ratings the users have given those items. The problem, as in SVD, is to factorize this matrix, but here we do not impose any orthogonality constraint; we are happy with any factorization into two matrices. Our approach is very close to the one used by the tagging team; our model differs only slightly in that we also allow negative entries in our factorized matrices, while they constrain themselves to positive entries.

How it works is: you split the huge matrix into two smaller matrices, U and V, where U is (number of users) x (number of factors). A factor can be something like "comedy versus crime". In the user matrix, a positive value for that factor indicates the user likes comedy movies and a negative value that they prefer crime movies; similarly, in the item matrix, a positive value indicates a good comedy movie and a negative value a good crime movie.

We ran this matrix factorization model on an Amazon Music data set. The green values you can see here represent items that were purchased and then reviewed by the user. The blue items are the ones our model predicts the user will like, and the red items are the ones it predicts the user won't like. You can see there is one lonely rating of two here, and all the low predictions fall to its right, while the fives and the one four cluster near the positive predictions, so the known ratings are recovered well.
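As a rough illustration of the factorization just described, the following sketch minimizes squared error on the observed ratings with two unconstrained factor matrices U and V, so negative entries are allowed, in the spirit of the TensorFlow implementation mentioned later in the Q&A. This is a sketch under assumptions (synthetic data, illustrative hyperparameters), not the team's actual code:

```python
import numpy as np
import tensorflow as tf

num_users, num_items, num_factors = 100, 50, 10

# Observed (user, item, rating) triples; synthetic here for illustration.
rng = np.random.default_rng(0)
u_idx = rng.integers(0, num_users, 1000)
i_idx = rng.integers(0, num_items, 1000)
r_obs = rng.uniform(1, 5, 1000).astype(np.float32)

# Factor matrices: entries are unconstrained, so negative values are
# allowed (unlike non-negative matrix factorization).
U = tf.Variable(tf.random.normal([num_users, num_factors], stddev=0.1))
V = tf.Variable(tf.random.normal([num_items, num_factors], stddev=0.1))

opt = tf.keras.optimizers.Adam(0.05)
for step in range(200):
    with tf.GradientTape() as tape:
        # Predicted rating for each observed pair is the dot product
        # of the user's and the item's factor vectors.
        pred = tf.reduce_sum(tf.gather(U, u_idx) * tf.gather(V, i_idx), axis=1)
        # Squared error on observed entries plus L2 regularization.
        loss = tf.reduce_mean((pred - r_obs) ** 2) \
             + 0.01 * (tf.nn.l2_loss(U) + tf.nn.l2_loss(V))
    grads = tape.gradient(loss, [U, V])
    opt.apply_gradients(zip(grads, [U, V]))

# Full predicted rating matrix: R_hat = U @ V^T.
R_hat = U.numpy() @ V.numpy().T
```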
There is still one problem with this nice matrix factorization model: users are humans, not perfect machines. Sometimes a user is having a bad day and gives low ratings to every article; or there is an article supposedly written by very popular authors, so its ratings are biased upwards from what they should be. And that is not the only problem we might face: biases can also change with time. The classic example is someone who watches psychological thrillers; such a person could very well start watching crime movies next year. We also have sharp transient shocks to user preferences, such as seasonal variations: the World Cup is going on right now, so we might expect football-related articles to get a lot of views. None of this is captured by a naive matrix factorization model.

So we implemented a new model that tries to capture these time variations. It works as follows: we take our time span, which say consists of 60 days, and divide it into equally sized bins, say of five days each. Then we look at how preferences change from one five-day bin to the next; because any transient shocks should average themselves out, this captures only the long-run component of any drifting user bias. To address users having a bad day, we also look at each day within a bin and introduce a variable that soaks up all the variation in ratings for that day. This model is known as timeSVD++, and it gives some of the best results attainable among existing recommendation algorithms.

Here is a comparison: the blue bars are the naive matrix factorization model I was talking about, and the orange bars are the time-based model. First is the Netflix Prize data set, which contains about 100 million ratings; it is the data set that started the whole recommendation contest trend, and both the timeSVD model and the matrix factorization model were initially developed and tested on it. Then there is the Amazon Digital Music data set. On both of these data sets, the timeSVD model performs much better than the naive matrix factorization model. The third is a books data set, included for comparison purposes, but no timestamps were given for it, so we could not train our time-based model on it. Now Aruna will talk about the workflow.

This is the workflow of the recommendation system. Our recommendation system (RS) runs separately from the main collaborative community (CC) system, and all interaction between the two is done through POST and GET requests. First, a POST request is sent to the RS server; it contains the JSON data collected from the event logging system. The RS server starts training as a sub-process. Why a sub-process? Because if training ran in the main process, we could not serve any predictions while training; running it as a sub-process lets us serve predictions during training as well. After training, the model is saved in the Redis server for future reference. When a user visits an article, the CC makes a GET request to the RS server; through that request, the RS server accesses the model saved in Redis and uses it to make predictions. The RS server returns the recommendations predicted by the saved model, and the CC finally displays them to the user through the API.
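A minimal sketch of the workflow just described, assuming the RS server is a Flask app and the trained model is pickled into Redis. The endpoint paths, Redis key names, and the train_model.py script are illustrative assumptions, not the team's actual code:

```python
import pickle
import subprocess

import redis
from flask import Flask, jsonify, request

app = Flask(__name__)
store = redis.Redis()  # assumed: Redis running on localhost:6379

@app.route("/train", methods=["POST"])
def train():
    # JSON collected from the event logging system arrives via POST.
    # Training is launched as a sub-process so that predictions can
    # still be served from the previously saved model while it runs.
    store.set("training_data", request.get_data())
    subprocess.Popen(["python", "train_model.py"])  # hypothetical trainer
    return jsonify({"status": "training started"})

@app.route("/recommend/<user>", methods=["GET"])
def recommend(user):
    # The collaborative community (CC) issues a GET request; the RS
    # server loads the model last saved in Redis by the trainer and
    # returns that user's top-n recommendations.
    model = pickle.loads(store.get("saved_model"))  # assumed key
    return jsonify({"user": user, "articles": model.top_n(user, n=10)})
```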
The next part will be continued by Ajay.

For the recommendation system we are using two APIs, a Django API and a Flask API. The Django API sits in the collaborative system; it takes the username as input and makes a GET request to the Flask API. The predictions are done by the Flask API, which returns the IDs of the top-rated articles recommended for that user to the Django API. The Django API then collects the corresponding article titles and returns the IDs along with the titles to the front-end page (a sketch of this handoff appears below, after the first round of questions). This is the front-end design for authenticated users, and this one is for anonymous users. As for future work: we presently use only article views, and we need to integrate uploads, downloads, and social media signals as well, plus more efficient memory utilization and parallelization to address timing issues. These are the references. Thank you.

So what are the parameters for recommending an article in collaborative communities?

Article views.

How can views be a parameter for recommending some other article?

Because if I know you viewed an article, that tells me something: you don't just randomly go and view articles. You look at the title, and only if the title looks interesting do you view the article. So even just a view tells us something about what sort of article you might like.

Can you go to that demo page? Go just two pages back. So I am reading cloud computing; how and why would I read data structures, or DBMS, or computer networks?

We just randomly generated some articles; we didn't generate a large, realistic data set of articles.

Aren't you considering tags?

We don't want to consider tags because, supposing you are reading articles on maths, it is plausible that you might be interested in CS articles too, or in biology articles. If you just look at tags, you run into two problems: one, you will miss out on these sorts of cross-topic connections, and two, you are depending on someone to actually assign tags in a reasoned and well-behaved fashion.

So you see the importance of the recommendation system here; an ontology would help in making the system better. And, as Phatak Sir mentioned, personalized material should be the next big thing.

Actually we are focusing on that: in our factorization, when we split the matrix into the user matrix, we have a separate row for every user.

Before you go into the detail, let me tell you one thing. If you want to recommend something to me personally, you can't do that unless you know me, correct? Your system is not trying to know the user, correct? These results are very good, but the parameters are not there. Do you know the user's age, whether they are male or female, what they are exploring? There are many parameters that should be considered, at least education, so that you can decide the level of articles to recommend; when somebody is on a machine learning page, there is no point in recommending data structures, which is a very basic thing for someone reading about machine learning. And how much time was spent on the page? Suppose I opened an article but didn't like it and closed it; that doesn't mean I have been reading it. Only if I spent time on each part would it be fair to conclude whether I actually got something out of it or opened it just by chance, before recommending more of the same. So not only like/dislike: the feedback system that the interns under my mentorship had developed would be a good addition, where marking an article as easy, difficult, boring, or confusing helps in identifying more personally suited articles for the person. Thank you.
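Here is the sketch of the Django-Flask handoff promised above: the Django side takes the username, forwards it to the Flask RS server, and maps the returned article IDs to titles. The URL, the Article model, and the field names are assumptions for illustration, not the team's actual code:

```python
import requests
from django.http import JsonResponse

from articles.models import Article  # hypothetical CC article model

def recommendations(request):
    # The Django API takes the username and makes a GET request to the
    # Flask API, which runs the predictions and returns top article IDs.
    user = request.user.username
    resp = requests.get(f"http://localhost:5000/recommend/{user}")  # assumed RS URL
    ids = resp.json()["articles"]

    # Django then collects the corresponding article titles and returns
    # both IDs and titles to the front-end page.
    titles = {a.id: a.title for a in Article.objects.filter(id__in=ids)}
    return JsonResponse({"ids": ids, "titles": [titles.get(i, "") for i in ids]})
```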
So you have used a combination of collaborative filtering and matrix factorization?

No, matrix factorization is a sub-technique within collaborative filtering; we have used matrix factorization.

So you have not used the neighborhood-based methods?

No.

Okay, just a doubt about neighborhood-based methods: they are based either on content or on users. For content, like the movie example you gave, it makes sense to cluster movies based on parameters like the director or whatever else you mentioned. But based on users, how do you cluster? Once you have some clusters it makes sense, but when you don't have any, when does a new user get added to a cluster?

If you have a completely new user, you are right: we don't know anything about them, so we just assume it's an average user. But suppose you have a user who is viewing mainly computer science articles; then you can cluster that user along with other people who are viewing computer science articles.

So that means you will be using some information provided by the user?

No, the user doesn't provide any information; we just look at the articles the user is viewing. If you and I view mainly the same articles, then we are in the same cluster; if there is no overlap between the articles we have viewed, then we are in different clusters.

Okay, so isn't that clustering a neighborhood-based method based on content?

There is a slight difference. In neighborhood-based models you have to explicitly specify the dimension along which you build your neighborhood, so you might need a tag that marks something as a computer science article or a maths article. Here we are learning the dimensions automatically: we do not have any tags saying what sort of articles they are, so we are even inferring the article type automatically.

Okay, fine. Just one more thing: you also mentioned TensorFlow at some point. Have you actually implemented that?

We have two methods. The time-based method has not yet been implemented in TensorFlow; parallelizing it with TensorFlow is future work. But the other one, naive matrix factorization, has been implemented using TensorFlow; we actually used it.

What GPU did you use?

We did not have a GPU, so we just used the CPU-based version of TensorFlow.

About the data sets you mentioned, the graph you showed: is this the result of your own experiment?

Yes, our own experiment.

How did you manage to get the Amazon and Netflix data sets?

That's in the references. The Amazon data set is from, I think, references 8 and 9: those people scraped all of Amazon and have uploaded the data set on their website, categorized into music reviews, automobile reviews, and things like that.

Good, thank you.
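As an aside to the clustering discussion above, here is a minimal sketch of what "same articles viewed, same cluster" could look like as a similarity computation. Jaccard overlap is used here purely for illustration; the neighborhood-based methods mentioned earlier would typically use something like cosine similarity, and the article IDs are hypothetical:

```python
def jaccard(viewed_a: set, viewed_b: set) -> float:
    """Overlap between two users' viewed-article sets: 1.0 means they
    viewed exactly the same articles, 0.0 means no overlap at all."""
    if not viewed_a and not viewed_b:
        return 0.0
    return len(viewed_a & viewed_b) / len(viewed_a | viewed_b)

# Two users who view mainly the same articles land in the same cluster;
# users with no overlap land in different clusters.
print(jaccard({"ml-intro", "svd", "nmf"}, {"ml-intro", "svd"}))  # ~0.67
print(jaccard({"ml-intro", "svd"}, {"biology-101"}))             # 0.0
```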