Hello everybody. Am I audible? Yes, okay. I am Tuhin, and I work at Imple Labs as a data scientist. We specialize in building personalized recommendation systems, and today I am going to talk about some of the cutting-edge techniques we are experimenting with and are in the process of productionizing. As the title says, we will be talking about a hybrid recommendation system built using a probabilistic graphical model; we will come to what a hybrid recommendation system means and what exactly a probabilistic graphical model is. When we talk about recommendation systems, the first names that come to mind are Hotstar, Netflix, Amazon. But here we are going to keep things very general, and for simplicity we can take the example of news media recommendations, so that what we discuss makes concrete sense. These are the main steps we will be covering in this particular talk. So, the first question that comes to mind is: what is a hybrid recommendation system and why do we need it? In recommendation systems, two genres of recommenders are very popular: one is content-based and the other is collaborative filtering. A content-based recommender suggests items similar to what you have already consumed, for example similar videos in a news media setting. Say I am interested in political news or news on social issues. With content-based recommendation, the news recommendations I get will be almost entirely related to political and social issues. So what happens is over-specialization.
I might also be interested in sports news or entertainment news, but since those are not in my previous history and are not similar to anything I watched before, they will never get recommended to me. That is a big problem with content-based recommendation. The other problem is that it looks only at a single user's history; it does not look at what other users have watched. There may be users similar to me who watched something else that I would be interested in, and content-based recommendation cannot capture that. In collaborative filtering, user-to-user interactions are taken into consideration, but the problem there is that if a particular content is unpopular, meaning it has not been viewed by enough users, it will never come out as a recommendation even if it is a good match for a particular user. The other problem is the cold-start problem: say a user is very new to the system. We do not know enough about that user, what he or she has watched or likes, so we cannot be confident about what to recommend. These are the main drawbacks of the two popular genres of recommendation. In a hybrid recommendation system we combine the positive aspects of both. What we are trying to do is use a probabilistic graphical model to build a hybrid recommender that takes the essence of both content-based and collaborative filtering into account and builds a better model. So, what is a PGM, a probabilistic graphical model?
It is basically a directed acyclic graph in which each node depends only on its parents, and based on that we specify the transition probability from a particular node to its child. That is what it is in a nutshell. So, let us see how a probabilistic graphical model makes inferences. On the left-hand side is a simple graph with three nodes a, b and c, where a and b are the parents of c; that means c depends on a and b, while a and b have no parents and are independent. From this graph we can create the conditional probability tables associated with the individual nodes, and we estimate them from the co-occurrence pattern of a, b and c in the data. If we can build these conditional probability tables, we can resolve any kind of query. For example, suppose I want to know the probability of b = 1 given c = 1, that is, the probability that a particular video was watched given that another video was watched. Queries like that can be solved using Bayes' rule. As you can see, the joint probability distribution of a, b and c factorizes as the product of the individual probabilities: P(a, b, c) = P(a) · P(b) · P(c | a, b), because c depends on a and b while a and b are independent; that is why it is written in that fashion. If we can write that kind of factorization for any graph and we have the conditional probability tables associated with the individual nodes, we can calculate any query, for example the probability of a given that b was not watched and c was watched. All of that is possible with this simple model. Now, what are the benefits of using a probabilistic graphical model? Here, as we can see, a, b and c are interdependent.
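To make the three-node example concrete, here is a minimal stdlib-only sketch of exactly that inference, computing P(b = 1 | c = 1) by brute-force enumeration over the joint distribution. The CPT numbers are invented for illustration; in the talk they would be estimated from the co-occurrence pattern of a, b and c.

```python
import itertools

# Hypothetical CPTs for the a, b, c example: a and b are root nodes,
# c depends on both.  All probabilities are made up for illustration.
p_a = {0: 0.7, 1: 0.3}          # P(a)
p_b = {0: 0.6, 1: 0.4}          # P(b)
p_c = {                          # P(c = 1 | a, b), indexed by (a, b)
    (0, 0): 0.1, (0, 1): 0.5,
    (1, 0): 0.4, (1, 1): 0.9,
}

def joint(a, b, c):
    """P(a, b, c) = P(a) * P(b) * P(c | a, b)."""
    pc1 = p_c[(a, b)]
    return p_a[a] * p_b[b] * (pc1 if c == 1 else 1 - pc1)

def query(target, evidence):
    """P(target = 1 | evidence) by enumerating all 2**3 joint states."""
    num = den = 0.0
    for a, b, c in itertools.product((0, 1), repeat=3):
        state = {"a": a, "b": b, "c": c}
        if any(state[k] != v for k, v in evidence.items()):
            continue  # state inconsistent with the observed evidence
        p = joint(a, b, c)
        den += p
        if state[target] == 1:
            num += p
    return num / den

# "Was video b watched, given that video c was watched?"
print(round(query("b", {"c": 1}), 4))  # -> 0.6851 for these made-up CPTs
```

Brute-force enumeration is exponential in the number of nodes, which is fine for a toy graph; libraries like pomegranate use smarter inference for the full-size model.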
That interdependence brings in the essence of content-based filtering. On top of that, we also consider the co-occurrence of the individual nodes, which brings in the essence of collaborative filtering. Both things go into a single graph where each node depends on its parents and each edge represents the probability with which the other node will come out as a recommendation. That is the essence of the probabilistic graphical model, and that is how we use content-based and collaborative filtering together to build a hybrid recommendation system. For example, in this case a and b could be topics, say political news and sports news, and c could be something related to both genres, say a story about Narendra Modi giving an award to Virat Kohli. This is just a small section of the big graph, shown only to explain how the calculations are made; in reality it is a big graph where individual nodes have multiple parents and individual parents have multiple children, and the probability of being chosen for the recommendation set depends on all of that. Yes, to your question: this particular graph is not dependent on the user; it represents only the content. Given the different genres and the videos and contents, we build a topic model, create one big graph, and it is independent of which users are viewing what. So even if a user is totally new to the system, a default recommendation can be given based on the viewing pattern of the overall population.
So, when a person comes for the first time, you do not have any viewing pattern for that person; yes, exactly. This probabilistic graphical model can be at a global level, or it can be at a user-segment level as well: individual user segments can have individual probabilistic graphical models. When a new user comes, we can place him into a particular segment based on his demographics or interests, and then recommend from the probabilistic graphical model corresponding to that segment. The approach we followed in this case was, first of all, to find the relationships between the different genres. For example, in the news domain we have videos related to politics, videos related to sports, and videos related to entertainment, and on the face of it those genres are independent of each other: each video has only a single parent, and we cannot infer whether the genres are related or not. So the first step is to find the relationships between genres or categories based on the content of the individual videos. Using the textual metadata we run LDA, latent Dirichlet allocation, a topic modeling technique, and the graph we create from it is an unweighted dependency graph, which is why we call it the UDG. Once we build that graph structure, then, based on the viewing patterns of individual users, we create the conditional probability tables for the individual nodes. And once we do that, then, as in the simple example we showed, we can resolve any kind of query. So, let us see how the UDG, the unweighted dependency graph, is created.
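The segment-assignment step for a brand-new user can be sketched very simply. The talk does not specify the mechanism, so the following is one plausible illustration, assuming nearest-centroid matching on a handful of declared-interest features; segment names, features and numbers are all hypothetical.

```python
# Hypothetical per-segment centroids over (politics, sports, entertainment)
# affinity scores captured at sign-up.  A cold-start user, with no viewing
# history, is mapped to the closest segment and served that segment's PGM.
segment_centroids = {
    "seg_politics": (0.9, 0.1, 0.2),
    "seg_sports":   (0.1, 0.9, 0.3),
    "seg_entmt":    (0.2, 0.2, 0.9),
}

def assign_segment(profile):
    """Pick the segment whose centroid is closest in squared Euclidean distance."""
    def dist2(centroid):
        return sum((p - q) ** 2 for p, q in zip(profile, centroid))
    return min(segment_centroids, key=lambda s: dist2(segment_centroids[s]))

# A new user who declared a strong interest in sports at sign-up:
print(assign_segment((0.2, 0.8, 0.1)))  # -> seg_sports
```

Any clustering of the user pool (the talk mentions, say, 10 segments) would work the same way: cluster once offline, then route each new user to a cluster and to that cluster's graphical model.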
For a particular content, say a news video, we extract the metadata; in our case it was laid out like this, where this part is the description, a piece of free text. So fields like the genre were structured data, while the description text was unstructured. What we did was run topic modeling on the description part and extract different topics from it, and then, given the genre, the video ID and the topics we found, we try to find connections between them and create a dependency graph. Let us see how this looks. On the left-hand side are the explicit connections already present in the data: the lower nodes, the children, are the video IDs or content IDs, and the parents are the genres, like sports or politics. Once we apply the algorithm to find the unweighted dependency graph, on the right-hand side we introduce a second level, the topics, which are the output of the LDA model. Now we can see that topic 2 and topic 3 are common to genre 1 and genre 2. Previously genre 1 and genre 2 were totally independent, with no relationship between them, but from the graph generation we find they are not totally independent; there are some similarities. We want to exploit that relationship: two videos that were unrelated before are now related through the shared topics. Using this structure, once we have the users' viewing patterns, we can train a probabilistic graphical model, which we will come to in the later slides, and answer queries like: given that content 2 and content 4 were watched, what is the probability that content 5 should also be recommended?
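The genre → topic → video layering described above can be sketched as a small parent-map builder. The rows and topic assignments below are invented; in the talk the topics come from the LDA model run on each video's description.

```python
# Hypothetical (video_id, genre, LDA topics) rows.
rows = [
    ("v1", "genre1", ["t2"]),
    ("v2", "genre1", ["t2", "t3"]),
    ("v3", "genre2", ["t3"]),
    ("v4", "genre2", ["t4"]),
]

def build_udg(rows):
    """Return {node: set(parents)} for the three-level unweighted dependency graph:
    genres are roots, topics sit in the middle, videos are leaves."""
    parents = {}
    for video, genre, topics in rows:
        for topic in topics:
            # Each topic becomes an intermediate node between genre and video.
            parents.setdefault(topic, set()).add(genre)
            parents.setdefault(video, set()).add(topic)
    return parents

udg = build_udg(rows)
# t3 is shared by genre1 (via v2) and genre2 (via v3), which is exactly the
# cross-genre link that a flat genre -> video graph cannot express.
print(udg["t3"])
```

This is the "UDG" structure; the conditional probability tables attached to each node are then estimated separately from the viewing data.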
In terms of real-life implementation, we can see it at a very high level like this: there are multiple users who watched multiple news videos, and say John watched political news and sports news; then what should we recommend? That is the simple question we are trying to answer here. Before we come to the implementation, let me take you through two kinds of approaches that are pretty popular. One is vertically scalable, which is good for data on the order of 20K to 50K nodes; vertically scalable means that if we add more CPUs and increase the RAM, the performance obviously gets better. For a horizontally scalable solution we can build on Edward or PyMC3, which are probabilistic programming frameworks, make it work across different virtual machines, and serve the recommendations in real time as well. A simple prototype is available on GitHub; you will find the link here. I will just take you through the code to give you an understanding of how this works in real life. Let me show the notebook; it is available at the link shown. To the question of how often we refresh: it depends on the business case. Two factors come into the picture: how frequently the content is updating, that is, how much new content I am getting, and how much the user viewership is changing over time. In our case we do it at a daily level, because we are serving a very popular news outlet and getting huge traffic, and in news media you would realize that new content comes up every day. That is why the graph needs to be updated every day, and along with it the viewership also changes every day, so we rebuild it on a daily basis.
So that becomes the time-based component that needs to be added to this particular model, or I should say the trending videos that are coming in. We can create a combination of the recency of a news item and the number of view counts a particular video is getting, and ensemble that with the PGM model to produce the final output. I am not sure why the browser is not showing; let me check the system preferences. Yes. So, this notebook is available on GitHub; you can download it and run it, the data is there, and the dependencies are also pretty simple. We use the pomegranate library to implement this probabilistic graphical model; we tried pgmpy as well in the initial version and then shifted to pomegranate. First we load the video data, the news media data, where you can see the individual attributes: we have the video ID and the category name, where the category name is the genre, and the short description, story text and title together form the description. What we do is concatenate these three columns to get the description. Then we also load the co-occurrence matrix, the viewing pattern of different users, which contains the information about which videos have been watched by which user; that matrix is also an input to the model. As you can see, there are 10 unique genres, which is structured data, and the description part is formed by concatenating the three columns as I said. We clean this description column and run multiple LDA models on it, and as you can see, LDA takes the number of topics as one of its parameters, so it is very important to find the optimal number of topics.
What we do is a very small experiment, which I kept simple: we train five LDA models with the number of topics ranging from 8 to 11, and out of those the model with the lowest perplexity is the best model. You can see from the plot that the LDA model with a topic count of 11 is the best of them. Once we select the best LDA model, we use it to find the topics of the individual videos. After we apply the best LDA model to the description part, we get a column, topic list, where the topics are listed by their indexes. In an LDA model we do not have a naming convention for topics; each topic is just a collection of keywords, so we refer to individual topics only by their keywords or topic IDs. That poses no problem for the probabilistic graphical model, because each topic simply acts as an individual node, and a node does not need any semantic meaning or name. Once we have the topics, the video IDs and the category names, we have the three levels, and as explained in the presentation we can generate the graphical model, which we call the UDG. In the next section we write three functions that produce the structures pomegranate needs to consume in order to train the model: the UDG itself, which is the structure; the node list; and the parent dict, which for each node gives the list of its parents. In this data frame you could see the video ID, topic list and category name; previously the entertainment news genre was the direct parent of the first video ID.
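The model-selection step above, picking the topic count with the lowest perplexity, boils down to a small piece of arithmetic. Here is a stdlib-only sketch; the held-out log-likelihoods and token counts are invented, whereas in practice they come from the fitted LDA models (e.g. gensim reports a per-word log-likelihood bound).

```python
import math

# Hypothetical held-out scores per candidate topic count:
# num_topics -> (total log-likelihood of held-out corpus, token count)
held_out = {
    8:  (-61000.0, 10000),
    9:  (-60200.0, 10000),
    10: (-59800.0, 10000),
    11: (-59100.0, 10000),
}

def perplexity(log_likelihood, num_tokens):
    """perplexity = exp(-log_likelihood / num_tokens); lower is better."""
    return math.exp(-log_likelihood / num_tokens)

scores = {k: perplexity(*v) for k, v in held_out.items()}
best_k = min(scores, key=scores.get)
print(best_k)  # -> 11 for these made-up numbers, matching the talk's plot
```

The point is only the selection rule: perplexity is a monotone transform of the negative average log-likelihood, so "lowest perplexity" and "highest held-out likelihood per token" choose the same model.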
Now what happens is that the topic with ID 2 comes in as a second layer and becomes an intermediate node between the category and the video ID; that is how it connects the individual pieces, so the video will have two parents. For example, for the second index, exactly. As you can see here, Indian news and entertainment news share the same topic, the first topic itself; that is how we find relationships between individual genres as well as between individual videos, and we can exploit that while querying the probabilistic graphical model. Once we have that, it is a simple call to pomegranate to train the model, and it takes around 27 seconds for the roughly 5500 videos we tested here. Now coming to the prediction part. As in the example we showed, here too we are saying that these two video IDs have been watched, which is why their values are set to 1, and asking which videos we should recommend. Once we set this evidence and use the model to predict on the observations, we get a result dict containing the video IDs along with their probabilities, which represent the probability with which each should be recommended or not. Yes, he also raised the same question: this graphical model is specific to individual user segments. Say we take the user pool and find, for example, 10 segments in it; then for those 10 segments we have disjoint sets of users, and for each set of users we have a different PGM model.
Now, when a new user comes who has no viewing pattern or history, he will in any case fall into one of those segments based on the metadata or whatever information we capture about users, and once he falls into a particular segment, we refer to the probabilistic graphical model corresponding to that user segment. Exactly, and we can always show him a default recommendation; in this particular case the default recommendation would be based on the population itself. In terms of the observations, I can say that I have not watched anything and send an empty dictionary, and it will still send me the default recommendation that is already there. That is the beauty of this recommendation model. Here we are saying that the value corresponding to each of these video IDs is 1. Now, if a particular user explicitly says he does not want to watch a video, say with a thumbs down, we set the value for that video ID to 0, and that case is also taken care of here. That is pretty much the demo part; you can try it out and send me your feedback as well. Let me come to the next part: the performance of pomegranate as we measured it. Since we are building a production-level system, it is very important to check whether the model scales with the data and the viewing patterns or not. We benchmarked the model up to 400,000 nodes, and what you can see is that pomegranate is good for low- and medium-traffic scenarios: up to 40K or 50K nodes it works pretty well. The training time does not really matter; only the prediction time matters, and you can see that for 50K nodes the prediction takes around 26 seconds.
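The evidence semantics just described (1 for watched, 0 for an explicit thumbs-down, an empty dict for a cold-start user) can be mirrored in a tiny stdlib-only sketch of the prediction call. The network and all probabilities are invented; the real model uses pomegranate on the full UDG.

```python
import itertools

# Toy network: one topic node t1 with two content children c1, c2.
nodes = ["t1", "c1", "c2"]
parents = {"t1": (), "c1": ("t1",), "c2": ("t1",)}
cpt = {                       # P(node = 1 | parent values), all values made up
    "t1": {(): 0.5},
    "c1": {(0,): 0.1, (1,): 0.8},
    "c2": {(0,): 0.2, (1,): 0.7},
}

def joint(state):
    p = 1.0
    for n in nodes:
        p1 = cpt[n][tuple(state[q] for q in parents[n])]
        p *= p1 if state[n] == 1 else 1 - p1
    return p

def predict(evidence):
    """Return {node: P(node = 1 | evidence)} for every unobserved node.
    evidence maps node -> 1 (watched) or 0 (thumbs-down); {} gives the prior."""
    targets = [n for n in nodes if n not in evidence]
    num = {n: 0.0 for n in targets}
    den = 0.0
    for values in itertools.product((0, 1), repeat=len(targets)):
        state = dict(evidence, **dict(zip(targets, values)))
        p = joint(state)
        den += p
        for n in targets:
            if state[n] == 1:
                num[n] += p
    return {n: num[n] / den for n in targets}

print(predict({"c1": 1}))  # c1 watched: c2's probability rises via the shared topic
print(predict({}))         # empty evidence: the prior, i.e. the default recommendation
```

Sorting the returned dict by probability gives the recommendation ranking; a thumbs-down (`{"c1": 0}`) pushes related content down in exactly the same mechanism.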
So, for cases where response time is not that important, like email notifications, it might be useful even at very high data volumes, but for real-time recommendations it might not be the best fit. This benchmarking was done on an 8-CPU machine with 16 GB of memory. Now, coming to the summary. When we build a conditional probability table, its size depends on how many parents the node has. If a particular node has two binary parents, the conditional probability table has 2^2 = 4 parent configurations; in the same way, if a node has 32 parents, that is 2^32 configurations, which gives an out-of-memory problem. So we need to be smart while creating the unweighted dependency graph: if we have that kind of scenario, we need to break the graph down into multiple levels. Multiple levels in the probabilistic graphical model are not a problem, but a two-layer model where one child has 100 parents will fail anyway. So we need to be very careful while creating the ontology tree, that is, the UDG. Coming to the second point, it is very important that we maintain a directed acyclic graph. If a cycle is present in a particular graph, that can still be addressed: there is something called loopy belief propagation, which is also part of the pomegranate library, where with a maximum number of iterations we can say that after that many iterations we should stop and take the inference as it stands. That is something we can try. And coming to the last point: scalability.
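The CPT blow-up and the super-topic remedy above come down to simple arithmetic, which can be checked in a few lines. The grouping of 32 parents into 4 super-topics of 8 is an illustrative choice, not a prescription from the talk.

```python
# A binary node with k binary parents needs one CPT row per parent
# configuration, i.e. 2**k rows.
def cpt_rows(num_parents):
    return 2 ** num_parents

print(cpt_rows(2))    # 4 rows: trivially manageable
print(cpt_rows(32))   # 4294967296 rows: out of memory, as described in the talk

# Aggregating 32 topic parents into 4 super-topic nodes of 8 topics each:
# each super-topic has a 2**8-row table, and the child now has only 2**4 rows.
total = 4 * cpt_rows(8) + cpt_rows(4)
print(total)  # -> 1040 rows in place of 2**32
```

Trading one wide layer for two narrower ones keeps every table exponential only in its own (small) parent count, which is why the extra level costs nothing to the model but saves the memory.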
So, pomegranate, as we already discussed, is a vertically scalable solution, but we can use Edward, which runs on TensorFlow, and with variational inference techniques it is quite scalable and we can get real-time predictions out of it. That is it from me; any questions? Yes. So normally, in our case, we tried to keep it to not more than 6 parents per node. We found that out the hard way, actually: we ran into the out-of-memory problem because we had not handled that case before, and when it broke we debugged and found that this was the issue. And no, we are not going to lose any information. What we do is create a super topic out of the individual topics: we aggregate, say, 10 or 20 topics into a super topic, and that super topic is connected to the genres. That increases the number of levels and decreases the number of parents. Yes, it is the output of the LDA that we combine. Okay guys, any other questions? Yeah. So, when we talk about a recommendation looking good or bad, we need a definite way to measure that, and we do that with A/B testing to find out whether the model is working properly or not. In this particular case we are currently utilizing around 1.8 million viewership records, and the total number of videos we are handling is about 25,000, and for that we are getting pretty good results. If you ask for a particular number, I would say around 10K contents and at least 50,000 viewership records would be good for generalizing, that is, for finding the priors for the individual nodes. Thank you.