Thank you for having me. I come from the Department of Software and Information Systems Engineering at Ben-Gurion University, the youngest university in Israel, and in this talk I am going to review the developments we have had in the field of recommender systems. The field started almost 20 years ago, gaining momentum in parallel with the idea of the Google matrix and PageRank introduced by the Google founders. Just to make sure we are all on the same page, let's look at what a recommender system is. The idea is to help users who do not have the time or the competence to go over the millions of items that e-commerce sites now offer. In its simplest form, a recommender system provides a personalized ranked list of items. Perhaps the first large-scale example is Amazon, which almost 20 years ago used a very simple pattern that recommends items based on other users: you have probably all seen that when you are looking at a certain item you get a recommendation of the form "people who bought this item also bought the following items." But recommender systems can be found in many other websites. For instance, when you are watching YouTube, once a video is over you automatically get the next video, and that is chosen by a recommender system. Even when you search in Google: if you look for a certain term, for instance Isaac Newton, you get related search terms such as Albert Einstein and so on. So recommender systems are very popular and can be very effective at increasing companies' sales. Here are some statistics from e-commerce websites: at Amazon, 35% of purchases come from recommendations, at Alibaba 20%. If you look at the video domain the percentage is even higher: at YouTube 70% of the clicks come from recommendations, and at Netflix it is 75%.

To give you a short history of the rise of recommender systems: the term was coined in the early 90s, but at the beginning it was part of the information retrieval community, with a very small number of papers actually dealing with recommender systems. The turning point came in 1997, when Professor Paul Resnick published a special issue of the Communications of the ACM on recommender systems, and since then we can see a very clear increase in the number of publications in the field. Another important milestone was the Netflix competition, which I will talk about later; it generated a lot of publicity for the field, and a year later a dedicated ACM conference called ACM RecSys was established, which has since attracted many people working on recommender systems. You can see this in the number of publications in each of those years. So there are many recommendation models that can be used.
I will mainly talk about collaborative filtering, which can be viewed as a bipartite graph problem that we are trying to solve, but other models exist as well: content-based models, knowledge-based techniques, community-based techniques, context-aware recommender systems, and some companies use a hybrid solution that combines several different approaches. Probably the most common one is collaborative filtering, and its idea is very simple: we try to predict the opinion of a user about a new item based on the user's previous opinions on other items and the opinions of similar users about that item. Collaborative filtering can be defined in different ways, depending on the input we are given. It can be rating data, where a user explicitly rates an item; this was, for example, the setting in the Netflix Prize. But in many real systems you get what we call event data, where a user clicks an item, purchases an item, adds it to a cart, and so on. When we talk about data we should also differentiate between explicit feedback, where a user provides a rating or a like/dislike, and implicit feedback, such as when a user clicks on an item: it does not actually say that the user likes the item, only that he was interested in it at that moment. To define the collaborative filtering task we should also look at the goals of the system. One of them can be rating prediction, as in the Netflix Prize, but there are other goals, for instance purchase prediction or top-N recommendation and many others. So the actual setting of the problem indicates how we are going to solve it. In any case, we are given a matrix with the users along one dimension and the items along the other, and the most important thing about that matrix is that it is going to be very sparse: we have many items, many users, and usually each user provides feedback for only a very small number of items, so most of the matrix is empty.
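As a small illustration of such a user-item matrix, here is a minimal sketch in Python that builds a sparse matrix from (user, item, rating) triples; the IDs and ratings are illustrative, and SciPy is an assumed tool, not something mentioned in the talk.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Illustrative rating triples (user_id, item_id, rating); in real systems the
# same structure holds event data (clicks, add-to-cart, purchases) instead.
ratings = [(0, 0, 5.0), (0, 3, 3.0), (1, 1, 4.0), (2, 0, 2.0), (2, 4, 5.0)]

users, items, values = zip(*ratings)
R = csr_matrix((values, (users, items)), shape=(3, 5))

print(R.toarray())
# Only 5 of 15 cells are observed here; in real systems density is often below 1%.
print("density:", R.nnz / (R.shape[0] * R.shape[1]))
```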
Now, from a graph point of view, what we have here is a bipartite graph: on one side we have the users, on the other side the items, in this case movies, and the edges hold the rating that the user gave to an item. We usually define some loss function to minimize; the most popular one is the root mean squared error (RMSE), which compares the actual rating with the rating predicted by our system.

There are several techniques for solving this. In the 90s the most popular one was the nearest-neighbors approach. It is a very simple way to solve the problem, but it is still very effective: with a few short lines of code you can get very effective recommendations. This was the main approach used in the 90s. Then, starting from around 2003, matrix factorization became very popular for solving collaborative filtering tasks. And of course in the last few years you have all heard about deep learning; I will talk about that too. It is actually a way to generalize the matrix factorization methods, and it gives even better results.

If we look at the first approach, nearest neighbors, we can solve it either with a user-to-user model, where we create a similarity matrix between users based on their opinions on different items, or with an item-to-item model, where we compare every pair of items based on the opinions users gave about them. Nearest neighbors can be applied in two different ways. The first is to use a predefined similarity measure such as Pearson correlation, Hamming distance, or cosine similarity, which is very easy to implement. Another approach, which gives even better results, is to learn the similarity via optimization; the objective in our case, as we said, is the minimization of the overall root mean squared error.

To give an example with the Hamming distance: we are given a user with the feedback he gave on different items, and we would like to predict whether he would like item number two, indicated here in red. We take all the users in the database who rated item number two, calculate the Hamming distance to each of them, take the most similar user, and use that user's rating as our prediction for the current user. This is of course a very simple approach; usually we do not use only the single nearest user but combine the top-k nearest users.

A better approach is to learn the similarity. To do that, we first define a baseline prediction for the rating, which is what you see here: it contains the average rating, the user's offset from the average, and the offset of that specific item. We then build a model by taking this baseline and adding to it, for each other item the user rated, the offset of the actual rating from that item's baseline, multiplied by a weight that plays the role of the similarity between the two items. These weights w_ij are the variables that need to be found by optimization, for instance by solving the minimization problem you see here at the bottom; usually stochastic gradient descent is used to find these w's. So that was the nearest-neighbors approach.
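To make the learned-similarity model concrete, this is the standard way such a neighborhood model is usually written; it is a hedged reconstruction of what the slide most likely showed, not a transcript of it. Here $\mu$ is the global average rating, $b_u$ and $b_i$ are the user and item offsets, $R(u)$ is the set of items rated by user $u$, $\mathcal{K}$ is the set of observed ratings, and the weights $w_{ij}$ are learned, typically by stochastic gradient descent with a regularization term $\lambda$:

$$b_{ui} = \mu + b_u + b_i, \qquad \hat{r}_{ui} = b_{ui} + \sum_{j \in R(u)} w_{ij}\,\bigl(r_{uj} - b_{uj}\bigr)$$

$$\min_{w}\ \sum_{(u,i) \in \mathcal{K}} \bigl(r_{ui} - \hat{r}_{ui}\bigr)^{2} \;+\; \lambda \sum_{i,j} w_{ij}^{2}$$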
The second approach, which gained popularity around 2003, is singular value decomposition, or matrix factorization. To understand why it became so popular we need to look at the Netflix Prize, which most of you have probably heard about. It started in 2006, when Netflix offered a prize of one million dollars, which is a lot of money. They released a training set of 100 million ratings coming from almost half a million users and about 18,000 movies, and a qualifying set that was divided into two halves: a test set used for determining the winner of the competition, and a quiz set used to compute the leaderboard, which was updated continuously. The goal was, in my opinion, quite modest: improve on their existing algorithm by 10%, which does not sound like a very tough task; it amounts to reducing the root mean squared error from about 0.95 to about 0.86. Yet it took the many competing teams about three years to reach that threshold. After almost three years, the first team, BellKor's Pragmatic Chaos, succeeded, as you can see here, in passing the 10% improvement threshold. At that time, when I saw it, I called my friend Yehuda Koren, one of the members of BellKor, to congratulate him, but 20 minutes later another team, called The Ensemble, also broke the threshold. Only 20 minutes separated these two teams; both reached the threshold almost at the same time, there was a bit of a dispute, but eventually the first team got the one-million-dollar prize.

Several lessons were learned from the Netflix Prize. First of all, a competition is a very good way for companies to outsource their challenges; you probably all know Kaggle, which was purchased by Google, I think about a year ago, where companies can submit their tasks and many people try to solve them. That is one good thing for companies. Another thing is that Netflix got very good public relations out of it, since many people dealt with the Netflix Prize, and it is a very good way to hire top talent, the people who manage to solve the task. Based on this competition, SVD became the method of choice in collaborative filtering, and from that moment practically every paper had to use SVD as part of the solution, simply because the Netflix Prize made the method so popular. Another important lesson is that if you want to win a competition you need to ensemble different models together and average them, and that regularization is crucial in order to avoid overfitting the model. Another result from the competition was that, if you have enough data, content features such as the genre or the actors of a movie were not found to be useful, which is very surprising: you have information about the movies and about the users, yet if you want very high accuracy you do not need to use it for the prediction. Again, this is true only if you have enough rating data. A final lesson is that methods developed as part of a competition are usually not practical in real life, because they are too complicated to run on a daily basis.

Anyway, the basic idea of SVD is to take this bipartite graph and introduce some latent factors in the middle that both the users and the items can be related to.
Mathematically, this means that we take the original rating matrix and decompose it into two matrices such that by multiplying them we can reproduce the original ratings and also predict new ratings. Essentially we embed both the users and the items into the same latent space, a new space that we discover with this model. There are several ways to do the factorization that have become popular in collaborative filtering: we can use the classic SVD; we can use low-rank factorization, which I think is the most popular one in collaborative filtering; or we can use a codebook decomposition, where U and V are cluster-membership matrices, and I will show later how it can be used for transfer learning between different domains. In any case the problem is very simple: we have two matrices U and V that need to be found, and some loss function, usually the mean squared error (again, not because it is always the right measure, but because it is easy to optimize and is a good proxy for the real measures we are interested in), and most people use the simplest optimization methods, such as stochastic gradient descent; a small code sketch of this appears at the end of this passage.

When we try to solve collaborative filtering there are three main issues that need to be addressed. The first, already mentioned, is the sparseness of the matrix: most of the matrix is empty. The second is the long tail: many items located in the long tail have only a few ratings. And we also have the cold-start problem, where certain users or certain items do not have enough data for us to make a good prediction about them. We need to address these issues, and one way we did so was transfer learning. In classic machine learning we create a model from scratch for each source we have: if we are making recommendations for movies we create a model for movies, if we are making recommendations for books we create a different model for books, and so on. With transfer learning we can take several source domains, build their models, extract some of the knowledge, and use it to solve a new target domain. The nice thing is that you can do transfer learning without sharing anything between the two domains: we share neither the users nor the items, yet there are patterns that can be found in both the source and the target.

How can we do that? For transfer learning we can use the codebook factorization, where we have a membership matrix for the users, a membership matrix for the items, and the codebook in the middle, and based on these two membership matrices we can produce a prediction for the rating. The idea of codebook transfer is the following: we take the original rating matrix, in both the source and the target domain, rearrange the rows and columns of both, and from that rearrangement we extract the codebook, where every cell of the codebook is the average of all the corresponding cells in the original matrix. Hopefully, we then find that parts of the codebook can be shared with the second, target, domain.
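Here is the promised sketch of the low-rank factorization trained with stochastic gradient descent; it is a minimal didactic version in Python with illustrative dimensions and hyperparameters, not the speaker's actual implementation.

```python
import numpy as np

def factorize(ratings, n_users, n_items, k=16, lr=0.01, reg=0.05, epochs=50, seed=0):
    """Low-rank factorization R ~ U @ V.T, trained by SGD on the observed ratings.
    `ratings` is a list of (user_index, item_index, rating) triples."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((n_users, k))   # user latent factors
    V = 0.1 * rng.standard_normal((n_items, k))   # item latent factors
    for _ in range(epochs):
        for idx in rng.permutation(len(ratings)):
            u, i, r = ratings[idx]
            pu, qi = U[u].copy(), V[i].copy()
            err = r - pu @ qi                      # error on this observed rating
            U[u] += lr * (err * qi - reg * pu)     # gradient step with L2 regularization
            V[i] += lr * (err * pu - reg * qi)     # (regularization was a key Netflix lesson)
    return U, V

# Tiny illustrative example: predict user 2's rating for item 0 as U[2] @ V[0].
triples = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 1, 2.0), (2, 1, 2.5)]
U, V = factorize(triples, n_users=3, n_items=2)
print(round(float(U[2] @ V[0]), 2))
```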
Why does this make sense? To get some intuition, look at the following codebook matrix and consider a certain user. This user does not like to give compliments: he usually gives ratings of one or two and never gives a rating of four or five. This type of user can be found in different domains: in games, in movies, in books; it is a type of user. Or look at a certain item: it may be a controversial item, where some people really like it and other people really dislike it, and again this is a pattern that can be found in different domains. So the idea of codebook transfer is that you need less data to reuse existing patterns than to discover them from scratch.

One of the methods we developed we call TALMUD; Talmud in Hebrew means learning. It tries to do transfer learning from multiple source domains to a target domain. Our objective was, as always, the mean squared error: we have the actual ratings from the target domain and we try to predict them based on several codebooks coming from different source domains. For each source domain we need to find the relevant membership matrices U and V, and we have a coefficient alpha that represents the relatedness of that source domain to the target domain. To solve this we use a fairly simple algorithm. We begin by finding the codebook of each source domain using a co-clustering algorithm. Once we have done that, we start to find the user and item membership matrices and the alpha coefficients: we first freeze the item membership matrices and the alphas and solve for the user memberships; then we freeze the user memberships and the alphas and find the item membership matrices; and finally we solve for the alpha vector. We repeat this until we get a good result.

One thing we found out is that it is not a good idea to use all the source domains at once to build the prediction model. A better way, as in a simple regression task, is to add the source domains gradually: we begin with the best source domain, then add the next one, and so on, until we see that we can no longer improve the results. The simplest way to do that is to take the training data and split it into training and validation, where the training part is used to find the membership matrices and the alpha coefficients, while the validation part indicates whether a given source domain is useful for the target domain. We ran some tests on this: we took some very popular public data sets with relatively many ratings, such as Netflix, MovieLens, and so on, and tested on target domains that were relatively small, in other words domains where you do not have enough data. We found that the method we developed reduced the loss in all the cases when compared to methods that were popular at the time. Another thing we found with this algorithm is what we call the curse of sources, in analogy to the curse of dimensionality: when you add more sources, the training error keeps going down, but the test error goes down only until a certain point and then starts to increase. So we add sources only as long as a new source does not increase the error; this is a very important aspect of the algorithm.
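Below is a minimal sketch of the codebook idea, for a single source domain only; the real TALMUD method combines several source codebooks with learned relatedness weights alpha and uses a proper co-clustering algorithm, so treat this as a didactic approximation with assumed details (a dense source matrix and simple alternating reassignment).

```python
import numpy as np

def fit_codebook(R, k=4, l=4, n_iter=20, seed=0):
    """Build a k x l codebook from a dense source rating matrix R by simple
    alternating co-clustering (assumption: R is fully observed)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    u = rng.integers(0, k, n_users)    # user-cluster assignments
    v = rng.integers(0, l, n_items)    # item-cluster assignments
    B = np.full((k, l), R.mean())
    for _ in range(n_iter):
        # Each codebook cell is the mean rating of its user-cluster x item-cluster block.
        B = np.array([[R[np.ix_(u == a, v == b)].mean()
                       if np.any(u == a) and np.any(v == b) else R.mean()
                       for b in range(l)] for a in range(k)])
        # Reassign each user / item to the cluster that minimizes squared error.
        u = np.array([np.argmin([((R[i] - B[a, v]) ** 2).sum() for a in range(k)])
                      for i in range(n_users)])
        v = np.array([np.argmin([((R[:, j] - B[u, b]) ** 2).sum() for b in range(l)])
                      for j in range(n_items)])
    return B

def transfer(B, R_target, mask, n_iter=20, seed=0):
    """Fill a sparse target matrix by aligning its users and items to the codebook.
    mask[i, j] is True where a target rating is observed."""
    rng = np.random.default_rng(seed)
    k, l = B.shape
    n_users, n_items = R_target.shape
    u = rng.integers(0, k, n_users)
    v = rng.integers(0, l, n_items)
    for _ in range(n_iter):
        u = np.array([np.argmin([(((R_target[i] - B[a, v]) * mask[i]) ** 2).sum()
                                 for a in range(k)]) for i in range(n_users)])
        v = np.array([np.argmin([(((R_target[:, j] - B[u, b]) * mask[:, j]) ** 2).sum()
                                 for b in range(l)]) for j in range(n_items)])
    return B[u][:, v]   # predicted full target matrix (codebook cell per user/item pair)
```

Usage would be: learn B from a dense source domain, then call transfer with the sparse target ratings and their observation mask to predict the missing cells.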
So that was the era of SVD, but then a few years ago the idea of deep learning arrived and brought changes to our field, collaborative filtering, as well. First of all, given a computational-graph framework, which many systems such as TensorFlow provide, SVD itself can be implemented in that framework. To do so we represent both the user and the item as one-hot vectors and convert each of them to an embedding vector; we need to find the matrices that map the one-hot vectors to those embedding vectors. Once we have them, a dot-product operation gives the predicted rating, and with this computational graph we can find the best representation of both the user and the item.

But the fact that we can now solve any computational graph very efficiently led to the following architecture, which starts almost the same way: we have the user one-hot vector and the item one-hot vector and we embed both, but in this case, instead of a dot product, we simply concatenate the user vector and the item vector. This also allows us to use embedding vectors of different sizes for the users and the items. Then we add several layers on top, which is why it is called deep learning, and we get the prediction. The amazing thing is the following: a blogger published a very nice post showing how to match the Netflix Prize result with only a few lines of code, using Keras on top of the TensorFlow framework, and as you can see the blogger did not even try to do hyperparameter optimization; he used a very simple setup with the default values and still matched the best results of the Netflix Prize from ten years earlier. This demonstrates the power of deep learning in collaborative filtering tasks; a minimal sketch of such a model is shown just after this passage.

Another thing we can do with deep learning is item2vec, in which we embed the items into a new space and can then answer different questions based on that embedding. What we want is for item similarity to be a proxy for vector similarity, or perhaps we should put it the other way around: vector similarity should represent item similarity. We learn these vectors from user sessions, inspired by word2vec, which LeCun presented on the first day: instead of words we have items, the catalog numbers of the items, and instead of sentences we have user sessions. When a user comes to the system he usually looks not at one item but at several, and we can use the sequence of items in the session as the analogue of a sentence. To create this embedding we can use the analogue of the continuous-bag-of-words model, a continuous bag of items. In the example here, I am not sure you can see it well, we take a window of size two: in total we have a session of five items, where the first two items and the last two items are used as input, and we try to predict the third item, the one in the middle, from that context.
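Here is the minimal sketch referred to above of such a deep model, written in Keras on top of TensorFlow in the spirit of that blog post but not its actual code; the catalog dimensions and layer sizes are hypothetical.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical catalog sizes, roughly Netflix-scale; adjust to your data.
n_users, n_items = 480_000, 18_000

user_in = keras.Input(shape=(1,), dtype="int32", name="user_id")
item_in = keras.Input(shape=(1,), dtype="int32", name="item_id")

# Separate embeddings for users and items; they can even have different sizes,
# because we concatenate instead of taking a dot product.
u = layers.Flatten()(layers.Embedding(n_users, 32)(user_in))
v = layers.Flatten()(layers.Embedding(n_items, 16)(item_in))

x = layers.Concatenate()([u, v])
for width in (64, 32):                       # a few dense layers make it "deep"
    x = layers.Dense(width, activation="relu")(x)
pred = layers.Dense(1)(x)                    # predicted rating

model = keras.Model([user_in, item_in], pred)
model.compile(optimizer="adam", loss="mse")  # MSE as an easy-to-optimize proxy for RMSE
# model.fit([user_ids, item_ids], ratings, batch_size=1024, epochs=5)
```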
Returning to the continuous bag of items: we again start with the one-hot encoding of each item and try to find two matrices. The first converts the one-hot vector into an embedded representation of the item, and the second does the opposite, converting from the embedded representation back into a one-hot representation over the target item. Again we solve this as an optimization problem, usually with stochastic gradient descent, and find the two matrices. Based on them we can represent each item as an embedded vector, aggregate the context vectors, and use the second matrix to predict which item is missing between the context items; a small item2vec sketch built on user sessions appears at the end of this passage.

Some interesting results come out of this very simple item2vec approach, and keep in mind that the algorithm was given no information about item titles or descriptions. The most trivial result is that when you query with a certain item, in this case a particular Galaxy S7 model, you get similar items, other models of the same S7 family. Even more interesting, we can answer analogies, as with words: if you take iPhone 5c minus iPhone 4s plus Galaxy S5, what do you get? Can you guess? Galaxy S6, because you are removing the old generation and adding the new one, so you get the new Galaxy model. The reason you get this nice result is that each item has other items related to it: some are common to the iPhone models, like the Apple EarPods here, which are related to both the iPhone 5 and the iPhone 4; on the other hand there is a charger cable common to the Galaxy S5 and S6; at the same time you can find items related to the new models, like the nano SIM, which is related to both the iPhone 5 and the Galaxy S6, and items related to the old models, like the micro SIM. Because of these related items in the catalog, you can get these nice analogy results from the data.

Let me try to sum up what we have been doing in the last two years. First of all, the recommender systems community is pretty good at producing highly accurate results, but most of the predictions or recommendations we give are very trivial, and we are still trying to find ways to provide better results, either through diversity or through serendipity in the recommended items, so that we give additional value to the user and not only trivial recommendations. Another thing we have been working on in the last three years is how to incorporate price into the recommender system. Every user has some price sensitivity, and it really depends on the user and the target item: as a user I might be very price-sensitive when buying a laptop but not when buying a cellular phone. So we need to learn these sensitivity patterns for each user and make the recommendations based on them; we did some work with eBay on that issue. Another thing we are working on is how to explain the results of a recommendation, so that the user can have more trust in the recommended items he automatically gets from the system.
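Going back to item2vec, here is a minimal sketch of learning item embeddings from user sessions, using gensim's word2vec implementation with items playing the role of words and sessions the role of sentences; the item IDs, parameters, and the CBOW choice are illustrative, and the gensim 4.x API is assumed.

```python
from gensim.models import Word2Vec

# Illustrative user sessions: each session is a sequence of catalog item IDs
# (hypothetical IDs; a real catalog would contain thousands of items).
sessions = [
    ["iphone_5c", "lightning_cable", "earpods", "iphone_case"],
    ["galaxy_s6", "micro_usb_cable", "nano_sim", "galaxy_case"],
    ["iphone_4s", "earpods", "micro_sim", "charger"],
]

# sg=0 selects the CBOW-style training described in the talk (predict the middle
# item from its window); window=2 mirrors the example with two items on each side.
model = Word2Vec(sentences=sessions, vector_size=32, window=2,
                 min_count=1, sg=0, epochs=50)

# Nearest neighbours in the embedded space act as "similar items".
print(model.wv.most_similar("iphone_5c", topn=3))

# Analogy queries such as iphone_5c - iphone_4s + galaxy_s5 ~ galaxy_s6 only make
# sense with a much larger catalog; with real session data they would look like:
# model.wv.most_similar(positive=["iphone_5c", "galaxy_s5"], negative=["iphone_4s"])
```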
Continuing the list of things we are working on: another important topic is how to counteract the following fact. If we look back about ten years, many websites did not yet have a recommender system, so all the data that was collected was organic user traffic. Nowadays they already have recommender systems, so most of the users' activity is not organic but is affected by the fact that a recommender system is in place. You need a way to understand what the organic behavior of the user is and what came from the already deployed recommender system, and it is a very difficult task to do this decomposition. The last thing we are working on is how to use a recommender system in a domain that needs background knowledge. Let me give some examples. We have a project on AutoML, automated machine learning, where we try to recommend the best algorithm for a given problem using background knowledge collected from the many papers published in the field of machine learning: given the description of the problem and its data, we try to automatically find the best algorithm to run on it. It is a way to automate part of the data science process. Another example is a SNOMED-based recommender system that recommends a medical procedure based on your current medical record; again, this is a domain that needs a lot of background knowledge to make the recommendation, and it is much harder than recommending movies or games. So that's it; I am happy to take questions.

I have two questions. The first is just a technical point about using deep learning for singular value decomposition: it is not the best way to do that, is it?

Indeed; I actually just wanted to show that it can be done with a computational graph. It is definitely not the best way to do it, but it shows that a computational graph can generalize the idea of SVD. If you are asking about practice, nobody really does it that way.

That was kind of my question, but I was wondering, even for very, very large matrices, whether it turns out to be a reasonable approach.

Which one, deep learning for SVD? If you are asking whether it is reasonable, I am not sure, even for a large matrix, that this is the best way to deal with it. Probably writing specific code for SVD will be better, because there are tricks that general computational-graph frameworks do not take into consideration. The slide was just meant to show that it is possible. By the way, if you try it you will see that it takes a very long time to get the results, so it is not the best way to solve it.

But there is a lot of research; it is hard to parallelize SVD, whereas you can parallelize the deep learning.

Right, but still, at least in my experience, computational graphs are not the best solution for SVD.

OK, may I ask my second question? I was struck by how your codebook approach to matrix factorization reminded me of the discussion in a previous talk about using deep learning for translation: they were really trying to find an underlying commonality, something common to all languages, and then map all languages onto it. It reminded me of your codebook.

Yes, you are absolutely right.
I wondered whether you have thought about doing something like the codebook approach, but with a deep learning recommender system.

We are actually doing that now; I have a PhD student who is trying to do exactly that with deep learning, and we are getting very encouraging results. You are absolutely right.

Since the Netflix Prize stopped in 2009, do you know, with the evolution of new algorithms, what kind of accuracy we can achieve now on the same data?

First of all, just to explain, regarding the qualifying data set, let me go back to that slide; here it is. What we can do now is only test on the quiz set, because the test set is unknown, but it is a very good estimate of the real results: we saw all along that in the leaderboard there is a good correlation between the quiz set and the test set that was never published by Netflix. You can still take the SVD developed by Yehuda Koren and his co-authors and use it, but as I have shown, other methods like deep learning can do it much faster, and the improvement now goes above 10 percent.

When you do transfer learning, do you do only domain adaptation? In transfer learning we have both multi-task learning and domain adaptation, and I see that you only did the latter.

Yes, in that case we did only domain adaptation, you are right.

Thank you very much. How do you deal with new products, or incoming products? If a user buys a brand-new product, you cannot propose it to anyone.

This is what I call the cold-start problem, for both users and items. If you have a truly new user without any interactions, the best you can do is what we call popularity combined with demographics, meaning take the most popular items among people coming from a certain country; that is probably the first prediction I can give you. After I have several interactions I can use collaborative filtering to provide recommendations. The idea of transfer learning is actually a very good way to handle this: the codebook transfer we showed can be done not only between domains but also within a domain, taking an area that is relatively dense and transferring to an area that is relatively sparse. You can do the same trick within a domain, and it is a very effective way to deal with both cold-start users and cold-start items. But again, you can do that only after you have some interactions; without any interaction it is very, very hard to make a prediction.

I was shocked by the figures you gave at the beginning, the 75 percent at Netflix.

I was also surprised, but let me give you some intuition about why it works. For instance, on YouTube, when you are watching a video, the chance that you will watch the next video that is automatically played is very high; in my case I usually watch a video, get the recommendation, and go with it.

My question is the following: given that these recommender systems take as input knowledge that comes from user choices, if the user choices are themselves recommended by the system, then there are issues; maybe you end up
in something like a bubble. Are there ways to detect this kind of phenomenon?

Yes, this is actually item number four on my list: the fact that you no longer observe the organic behavior of the user, because it is affected by the current recommender system. It is really not a simple issue. I actually asked companies like Netflix how they try to address it. One way is that you do not only get recommendations from the system; you also get some random recommendations, just as a way to learn more about you. It is essentially the idea of active learning: you try to learn something, and even if you know an item is not the best choice right now, you still suggest it. But it is a very difficult challenge, and I am not sure there are good ways of detecting that users are taking choices that are not actually the best for them.

For example, a best seller may become a best seller just because of the recommendations.

You are absolutely right. Again, I am not aware of a good solution for that problem, but it is an issue, a bit like fake news.

I was wondering, for the newer generation that grew up with recommender systems, whether their organic choices still make sense.

They still make sense, of course, I am sure about that. Some websites try to combine organic surfing of the website with the recommendations, in order to encourage more organic behavior from the user, but I am not sure they have really succeeded in solving this issue. Organic means that the user decided to go to that item himself; it was not recommended by the system, it was his own decision. When a recommender system always gives you a choice, sometimes you take it even if it was not really your own opinion.

It is my understanding that Netflix these days, and other companies like Salesforce, are not using things like the user's ratings of past movies but have shifted entirely to behavioral data: what you have watched, how much of it, and so on.

Exactly, the events, what we call event data, because you have much more event data than rating data; you are absolutely right.

And in your research, are you able to use some of that data?

Yes, sure. Again, because of the Netflix Prize people usually treat the ratings as the most important signal, but I totally agree with you: you have much more event-based data. One way to use it is to give a different weight to different kinds of events; for instance, a purchase event is more important than adding an item to the cart or viewing the item page, and you can learn the effect of each kind of event as a way to predict the purchase of an item. We do that all the time. By the way, the codebook idea can be used here as well: instead of different source domains you have different kinds of events, one matrix for purchase events, one matrix for item-page views, and so on. Usually we try to predict the purchase matrix, which is relatively sparse, based on all the other matrices, such as page views, which are much denser, and you can apply
this idea of transfer learning between two different types of events.

Maybe I can end with just a remark. Last night's speaker, who gave a talk in Paris, said something about how human contact is really important and that a face-to-face meeting is much better than a Skype meeting. It makes me wonder, in the context of recommender systems, whether all the little bookstores with experts who could personally recommend your next book might not be better than Amazon's automatic recommendations.

By the way, there are community-based recommender systems, which look at what your friends, your actual friends, like, and make recommendations based on that. It is still not the most popular way to do it; again, it is very hard to beat collaborative filtering. When you run the tests you will always find that collaborative filtering is better than those other methods.

More questions? Again about organic versus induced purchases: one kind of data that is more organic than what you are using is searches, data about searches; in particular, if you start your session by searching for some item, that is essentially more organic than other signals.

I totally agree with you. By the way, Google does that, but unfortunately search data is usually something that only Google has, or other companies like Facebook; most e-commerce websites do not have enough search data to use for that purpose. This is the reason why a company without a search engine needs to rely on collaborative filtering.

I have one last question. You mentioned that some algorithms are good at winning challenges but not good in practice. So, in your opinion, which algorithms are for challenges and which are the practical ones?
I think that for an algorithm to be practical, first of all you need to be able to train it in a very efficient way; it cannot take weeks of computational power to create the model. This is actually what happened in the Netflix Prize: they used a very accurate model, but it took weeks to train. That is one thing. The other thing is that you also need to provide a recommendation in a timely manner, and again, the solution they used is good for overall prediction, but if you want to give a prediction for a specific user who is right now on your website, their method was not very effective. So in practical life you need to address that as well.

Just a question about the data sets. I am very surprised that recommenders in general do not ask you for feedback on the recommendations themselves; that aspect of building the data sets would give better data.

There are a few data sets with user feedback, where users were asked whether a recommendation is good or not. Even at our university we collected this kind of data, asking users whether a recommendation was good. But it is very difficult to collect such data from millions of users; it is much easier to take the event data, which you already have in any case. One thing we learned from collecting such data is that users are biased by the way you present the recommendation: if you tell a user "I recommend this because I am quite sure you will like it," the user tends to think the recommendation is good, even though, by the way, we had generated a random recommendation. So users are biased by the way you present the results, which was amazing to me.

If there are no more questions, you can go to lunch. Thank you very much again.