So before we start, I need to share some house rules. But before that, thank you everyone for coming. It's not the easiest location to get to, but I'm really happy that you made it here. If you're at the front, you'll see some stickers; take one, there's one for each seat. We don't have enough for everyone, so if you can sit in the front, that would be great. I heard the best seats are in the front, and in machine learning you want to be the best. So, house rules: everyone has a badge that you clip on, and this badge is very important. If you don't have a badge, security will ask you to leave, so please make sure you have it. Before you leave the office, you'll need to return the badge; put it on the desk outside and security will help you. If you need the restroom, when you leave this door, turn right, exit the first door, and it's all the way at the end; you'll see a restroom sign. To come back in, security will help you tap the door. Okay, for today's session we also managed to set up some online streaming and recording, so we can rewatch this. And without further ado, let me invite and thank Karthik for hosting this session today. During the session, challenge Karthik as best you can; I know he's very good at this. If you don't agree with him, challenge him, and ask a lot of questions; at the end of the session we have a meet-up with you all to give out some stuff. Okay? During the session, if you have questions, there are four mics in the middle. Just go over there, switch on the mic, and ask your question, and Karthik will take it from there. Okay? Let's invite Karthik, and let's give him a round of applause. Thank you.
I'll be sharing some knowledge about recommendation systems, and of course it will be with TensorFlow. So let's move on. About me: first, a brief description. Like I said, I'm Karthik. What do I do? I work for SAP Research; it's right across from this building. My day job is mostly machine learning. We work quite a lot with TensorFlow, and we build deep learning and machine learning models, mostly on natural language processing, some big data, some numeric data as well. So that's more or less about me. Prior to this, I was a graduate student at NTU; I graduated with a PhD, and now I'm working in industry in Singapore. So let's move on. I'll talk about the agenda first. First, we'll get into why recommender systems, what the types of recommender systems are, and then we'll build a recommender system ourselves and have a hands-on session. We'll also go through some inference and some training right on this machine; hopefully it should work well. And in case you miss anything, all the code, all the inference, all the trained models are available in a GitHub repository, so you can download it and try it yourself. All right. Before we get into recommender systems and what they're all about, I think we should first understand data analytics, so let's have a short Data Analytics 101 first. There are three main branches: descriptive, prescriptive, and predictive models. What descriptive does is create a summary of past data. It uses historic data to understand how the history behaved, and it gives you some information about that. Then there's prescriptive. What prescriptive does is try to find the best action possible.
So given some data, it tries to figure out what is the best action possible with that data. And finally, the predictive model. The predictive model is mostly about statistical learning; most of machine learning falls under the predictive model as well. It tries to statistically understand what the data is about and then predict the future. That's what predictive modeling does. Prescriptive modeling, on the other hand, basically gives a recommendation, and that is what we'll be talking about today. And the descriptive model tries to compute some aggregate of the data; it condenses the data into some summarized form. When do we use descriptive? Probably when we want to figure out what the data we have is about, to gain some knowledge about the data we already have. For prescriptive, we use it when we want some suggested actions that we can act on. And for predictive, as the name suggests, we want to make predictions about the future. So if we're talking about stock markets, we'd probably like to see the trend, whether to sell or to buy; that sort of learning comes under the predictive model. Some examples: correlation and averages are basically descriptive. For prescriptive, we have recommender systems; those come under the prescriptive model. And for predictive, sentiment analysis is one example. So with that as a precursor, let's move on to recommender systems, which come under the prescriptive model. What is a recommender system? That is the primary question we're trying to figure out first.
If you think about it, a recommender system basically tries to match a particular user to relevant items. The challenge here is that it's not just about understanding what the user wants; it's about trying to wow the user using historic data and similar users' data. That is what actually makes a recommender system a success. For instance, I was just talking to a friend here, and he was saying that some recommender systems are not that good. Take Netflix, for example. You would have watched some movies, and it will suggest really interesting series or movies based on what you actually watched. That's a standard, very simple example of a recommender system. How does it do that? It looks at your usage history. Maybe I watched Inception yesterday, so probably I could watch Interstellar today. That's a straightforward recommendation. And it's also based on people. Maybe a friend who has very similar interests to mine watched a deep learning video yesterday, while I watched a machine learning video today. The recommender system could come back and say, hey, why don't you check out this deep learning video that your friend watched? That's another example of how a recommender system can help us. In this case, the recommender system came up with a recommendation I had not even seen before. Maybe I'm more interested in machine learning but not deep learning; of course machine learning is the broader scope, it's just that deep learning is much more the thing right now, and maybe I'm more interested in classical techniques. But the system comes back and says: there's a video on deep learning, why don't you see it? And that's the sort of wow that recommender systems are trying to bring forward.
You'll definitely find these familiar. One very interesting example is Google Play. If you download games on Google Play, you'll see there's a category that says "based on your previous downloads." That's a very straightforward example of a recommender system. And of course, Netflix will suggest movies based on your watching history, maybe even based on other people in your household. Then there's Google Instant: another very interesting example, where they store some data and try to predict your search query based on your previous searches. Facebook, for example, does something similar. You might have added a friend, and that friend might have added another friend in the same circle, so it suggests another friend based on your friend's additions. That's another example. And with Spotify, if you listen to some genre of music, say I keep listening to rock all the time, or maybe I like Hans Zimmer and keep listening to his music, then Spotify comes back and tells me that Hans Zimmer has a new soundtrack, why don't I listen to it? That's the sort of recommendation I'm actually looking for, because I might not know that Hans Zimmer came up with a new soundtrack or scored the music for a new movie. These systems are very useful, because players like Amazon are actually trying to sell something. A recommender system can come forward and show me something new that I might have ignored or didn't even know about. With Spotify, I might actually want to buy that track, and this way Spotify gets some revenue from a recommendation for something I didn't even know existed. So that's a holistic view of what a recommender system can do and does.
Okay. So how can we go ahead and recommend? The first and most straightforward approach is content-based recommendation. Content-based recommendation tries to understand the data itself: it extracts features from one item and matches them against the features of another item. In this case, the content of the item plays a vital part, and some of the text in your profile can also play a part. It's purely dependent on content extraction, so if you have bad content analysis, you're stuck; that's the problem here. Moving on, the next type of recommendation is collaborative filtering. What collaborative filtering does is aggregate information across users. Like I said earlier, I might have a friend on Facebook who watched a video, and I also might like to watch that video, so Facebook surfaces the video at the top of my news feed. It tries to find similar user interests and suggest new content based on those interests, not on the content itself. If Facebook were to suggest something to me through content-based recommendation, it would have to go through the entire video content to make the suggestion, which is pretty expensive if you think about it. That's a difficulty with content-based recommendation. For text, maybe parsing is easier, but even then, human beings use slang and unnecessary, incorrect grammar, so content-based recommendation fails in a lot of cases. And that's when collaborative filtering came into view.
And finally, if you want to know the best approach, it's a hybrid approach. This is where Netflix is doing a good job, and where a lot of companies are doing a very nice job. If you think about it, Netflix is aggregating information from you as well as your siblings or your family, and it is also using the content. It takes the genre of the movie or the series you're watching and matches that against your history. So it's doing both: content-based recommendation as well as collaborative filtering. In a case where maybe my wife watched some movie, and because I am of course sharing the Netflix account, probably under separate profiles, Netflix would come back and say, why don't you also watch this? That's the sort of thing that happens with Netflix. Other large players are also moving towards a hybrid approach, and it's proven to be one of the best. But more often than not, it might not be feasible to start right away with a hybrid approach. So we start with either collaborative filtering or content-based recommendation, and then, once we accumulate enough data, we can think about moving towards a hybrid model. All right. So what is the need for recommendations? Think about how we live now: data is pervasive. We have IoT, we have data collected from cars, from IMUs, from every part of our lives. My phone is recording my walk and telling me how many calories I burned; my Fitbit is telling me I climbed 20 flights of stairs today. Data has become so pervasive, and there is an exponential explosion of data right now.
That is one of the biggest problems. Earlier the problem was too little data; now it has become the inverse problem, where we have too much data and we don't know how to use it. That is one of the reasons we need recommendation systems. Think about Amazon. Amazon might have, say, 100,000 items in a particular category. How does it promote or surface some new item that could actually be of use to you? This is how: it tries to figure out your interests, how you purchase, your purchase history, your purchase patterns, and then it suggests new items. There is also research going on about the timing of your purchases; apparently the evening is when you're most susceptible to buying something, and afternoons are also good. These are all ways of understanding how to give a recommendation. So that's the need for recommendation, and that's how we approach the problem. Let's talk about content-based filtering. We have four movies here: Interstellar, Inception, Gravity, and I think that's The Dark Knight Rises. Content-based filtering does something like this: it takes some features, like whether the movie is science fiction (probably the first three are and the last one is not), or whether it's directed by Christopher Nolan (then only Gravity is not). It figures these out and builds a matrix. So suppose I had watched Interstellar, I'm looking for a recommendation, and I'm just browsing through my Netflix to see which movies I could watch next.
Then what content-based filtering does is go through this matrix and figure out the distance, or how likely I am to watch each movie. Suppose the other three were the only movies in my list, which is highly unlikely. Then content-based filtering would figure out how much of a match the other movies are. Inception is almost a 100% match, so I'm highly likely to watch Inception; Gravity is the least likely, and The Dark Knight Rises is the second most likely. So the distance is basically a match between the content of what I watched and the new content. That is how it builds on the data. The real problem here is figuring out these features. If you do not know the features, or the features are not apparent, it can be very difficult to do this sort of filtering and recommendation. Another problem, similar to classical machine learning techniques, is that if you choose features that are not suitable for a particular problem, you come back to the same cycle: am I choosing the wrong feature, is it good, is it bad? Going back to evaluating whether a feature is correct is something content-based filtering has a big problem with. So this is basically about figuring out the potential features for the items. So how does collaborative filtering work? Let's take the same four movies, but now I have four friends: Eva, Dom, Koop, and Ryan. If you know these movies, you might recognize these character names. Eva has watched Interstellar but not Inception or The Dark Knight Rises. Sorry, these are likes, not watches.
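To make this concrete, here is a tiny sketch of that feature-matrix matching. This is my own illustration: the two binary features follow the slide, and cosine similarity stands in for whatever distance a real system would use.

```python
# Hypothetical content-based filtering sketch for the four movies above.
# Each movie maps to a feature vector: [is_sci_fi, directed_by_nolan].
import math

features = {
    "Interstellar":      [1, 1],
    "Inception":         [1, 1],
    "Gravity":           [1, 0],
    "Dark Knight Rises": [0, 1],
}

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of vector norms
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

watched = "Interstellar"
scores = {m: cosine(features[watched], features[m])
          for m in features if m != watched}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # Inception comes first: it matches Interstellar on every feature
```

With only these two features, Gravity and The Dark Knight Rises tie; with more features, as on the slide, the ordering separates further.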
So it's basically trying to figure this out: these four are my friends, and if they like some movie, how likely am I, being a friend of these four people, to watch, say, Inception? Or how likely am I to watch Gravity? That's what collaborative filtering tries to figure out. You have a set of friends, and you find new items based on what your friends have already liked. It's another way of doing the same recommendation, but in this case, you realize that all you need is the likes, an understanding of how these people appreciated some movie, rather than figuring out which features should be part of the recommendation itself. One interesting thing to realize is that on Facebook, when you do a like, that like is contributing to exactly this kind of data. Every time your friend posts a picture and you hit like, Facebook goes back and says, okay, you liked this picture, which means the content of this picture is useful or interesting to you, and it tries to give you new content that is very similar to the things you liked earlier. There was actually an experiment one guy did for 30 days: he liked every single item that appeared on his news feed, and at the end of 30 days, he realized his news feed was totally full of junk, completely useless. That's exactly the reason Facebook asks you to like things. The more of some kind of news you like, the more Facebook will come back and suggest new items, or rather new videos, that you're likely to watch.
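The "friends and likes" idea above can be sketched in a few lines. This is a toy illustration: the like data is made up, and Jaccard overlap stands in for whatever user-similarity measure a real system would use.

```python
# Toy user-based collaborative filtering, matching the four-friends example.
likes = {
    "Eva":  {"Interstellar", "Gravity"},
    "Dom":  {"Interstellar", "Inception"},
    "Koop": {"Inception", "Dark Knight Rises"},
    "Ryan": {"Interstellar", "Inception", "Gravity"},
}
me = {"Interstellar", "Inception"}  # what I have liked so far

def jaccard(a, b):
    # Similarity between two users = overlap of their liked sets
    return len(a & b) / len(a | b)

# Score each movie I haven't liked yet by the similarity of friends who liked it
scores = {}
for friend, movies in likes.items():
    sim = jaccard(me, movies)
    for movie in movies - me:
        scores[movie] = scores.get(movie, 0.0) + sim

best = max(scores, key=scores.get)
print(scores, best)  # Gravity scores highest: liked by the most similar friends
```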
And that's also one of the ways to get more sponsors and things like that. So that's how collaborative filtering works. It's based on users, on choices made by similar users. That's the keyword here: choices made by similar users is what collaborative filtering relies on. So what are the challenges? The first is: what if there are no users to start with? That's one of the primary problems with collaborative filtering. When I launch a system, I might not have enough users to start with, so I cannot give good recommendations. That's the cold start problem. One way or another, every company has to go through that stage, but eventually you get over it. Initially the recommendations won't be good; they might be essentially random. If you've worked with deep learning, it's like a neural network making random guesses right after you initialize the network. It's the same case here: at initialization, you give random suggestions, and over time the system learns how users have responded and learns the recommendations. Another problem: what if there are no user profiles that actually match mine? Suppose I have a very peculiar personality and I watch really unusual things that match none of the users on the system. That could be a problem for collaborative filtering, and it's a problem for most systems, but these are outliers that we can eventually get past. And the third problem: what happens when similar user profiles actually have completely disparate interests? Suppose my wife and I watch movies on the same Netflix account.
What happens if I watch something totally sci-fi and she watches something totally different, maybe animation, or maybe mystery? In this case, Netflix will try to suggest a new movie, but because the data is so confusing, it's going to suggest something in between the two. That's when recommendations kind of fail. Again, these are outliers, but they are challenges that every recommendation system has to go through; that's inevitable. Sharing the same account is one such problem. And the third problem is when two similar profiles have completely different interests: although collaborative filtering is trying to suggest items based on similar user profiles, because my interests do not actually match that user's, the recommendations are going to be totally useless to me. That's what the third point is discussing. So again, the potential fix here is to go for a hybrid approach, where we don't depend only on collaborative filtering, but instead also build a content-based recommendation, which is then topped off with the collaborative filtering to provide a hybrid recommendation system. All right, let's move on to what we are trying to build. In our system, if you've gone through the repository, we will try to predict ratings. This is the MovieLens dataset, where MovieLens is basically a project that aggregated movie ratings. In this case, we have close to one million ratings, from about 6,000 users over roughly 3,900 movies. The ratings for all of these are available.
There is also a larger dataset of 20 million ratings, and an even larger one, available for us to test, but we will take the one million dataset. The problem is to predict the rating given by a user u for a particular item i, where item i is the movie. We have close to 6,000 users who have rated roughly 3,900 movies on a rating scale of one to five. We will use RMSE between the true ratings and the predictions, see how this pans out, look at possible ways to improve it, and take it from there. So this is our base network implementation, all in TensorFlow. I'll give a brief introduction to TensorFlow in case this is the first time you're hearing about it, but let me first discuss this neural network. Basically, we have a user ID and an item ID; both of these are at the bottom. We get some embeddings for these user IDs and item IDs, and finally we use these embeddings to compute the SVD-style prediction. The classical approach for recommendations is to create a matrix and then do a decomposition of that matrix. But that's the classical form; what happens if we have millions and millions of records? That's when we want something distributed, a TensorFlow-like system, and that's where TensorFlow proves immensely useful. First we generate the embeddings for the users and for the items, and subsequently we use an SVD-style factorization on this data. Since we are dealing with a dataset of one million records, we'll definitely use TensorFlow. The loss is then computed and backpropagated through the network. So this is a very simple neural network; we'll start with it so that it's easy to follow.
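The RMSE metric just mentioned is simply the square root of the mean squared difference between true ratings and predicted ratings; a minimal sketch with made-up numbers:

```python
# Root-mean-square error between true ratings and predictions
import math

def rmse(truth, pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth))

print(rmse([5, 3, 4], [4, 3, 5]))  # -> about 0.816
```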
And subsequently we will, if possible, build on top of this, see how to deploy it, and get some inference. First of all, I'll discuss what TensorFlow is and why we need it, and then we'll go to the live demo. For recommendation, the conventional approach is factorization: something like an SVD, which tries to figure out the latent features underlying the users and the items. But what happens with large datasets? The challenge is that we cannot do an SVD on, say, a million-by-million matrix. That is one of the reasons a toolkit like TensorFlow, and deep learning neural networks generally, can be very useful, and it's what we'll do with TensorFlow here. TensorFlow is basically a general computation framework, but invariably, or more so, it's suited to deep learning setups. It makes most tasks simpler; things like differentiation, which you could work through in theory, are very well implemented in practice in TensorFlow. There's a new version almost every six weeks, so it's practically very difficult to keep up with the API changes, but the computations get better with every single release. One of the good things is that there are quite a lot of optimizers: SGD and its variants are all implemented natively in TensorFlow. It's also possible to compute on clusters. In case you have a cluster, with GPUs distributed across different machines, TensorFlow is very useful for that too; it's probably one of the best deep learning toolkits for distributed training on a cluster. There are a few more toolkits, like Deeplearning4j, that also do that.
But right now, at the computational level, TensorFlow does a very good job with distributed training as well. And of course it allows GPU acceleration; we'll see how it does this. TensorFlow was released in November 2015, so it's been close to a year and maybe three months, but it's already probably the number one deep learning toolkit. If you look at the number of GitHub contributions and GitHub stars, TensorFlow is right now number one. In terms of computation as well, the TensorFlow team has done a very good job making TensorFlow faster than most of the competing deep learning toolkits. Natively, everything is written in C++ with bindings in Python, and with the latest release there are experimental Go and Java APIs, which could be very useful in case people do not want to switch from Java to Python. But everything else is in Python; the Go and Java APIs are still experimental, so we might have to wait for the actual release. It is, of course, one of the most popular toolkits: as of 6:30 today, it had 45,000 GitHub stars, up from about 31,000 just three months back. So it's probably the most popular deep learning toolkit in the market right now. The good thing is that it's under an Apache 2.0 license, which means that on a commercial scale we don't have to worry about litigation, patents, and things like that. You can also train on multiple GPUs on the same machine, or across multiple machines over a network. And it can natively be deployed on Android and iOS. One more thing: the current version is 0.12.1, which is also what I'm using right now.
But I think version 1.0, release candidate 1, is already out. I presume it will be released at the TensorFlow Dev Summit next week, so we'll have to wait and watch what they release; it's quite exciting to see what they might do next week. There is native GPU acceleration; you have to use CUDA for the GPU acceleration, and cuDNN as well if you want better deep learning primitives. And it finally runs natively on Windows: with version 0.12 it runs natively on Windows, and prior to that it was anyway working on Linux machines. The current version is 0.12, but there are some breaking API changes coming with TensorFlow version 1.0. So if you're brand new to TensorFlow, welcome; if you wait for a week, you might actually get a better API from next week onwards, but it's still a good time to start with TensorFlow. That's the link for downloading TensorFlow and compiling it from source. Now some basics: what does TensorFlow do, and how does it work? TensorFlow is basically a compute graph. It creates a directed graph and then does computations on that graph; as simple as that. It's mainly used to define machine learning computations, but it's really a general-purpose mathematical toolkit if you think about it; in general, people use it for machine learning and mostly deep learning purposes. And, of course, it natively supports deep learning models. This is a typical deep learning compute graph run on TensorFlow; a very straightforward image of a compute graph.
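To make the "directed graph of computations" idea concrete, here is a tiny framework-free toy in plain Python (this is my own illustration, not TensorFlow's actual API): each node holds an operation and its inputs, and evaluating a node walks the graph, much like a session run walks the TensorFlow graph.

```python
# Toy compute graph: each node is (op, inputs). Evaluating a node first
# evaluates its inputs, mimicking how a graph framework runs computations.
import operator

class Node:
    def __init__(self, op, *inputs):
        self.op, self.inputs = op, inputs

    def eval(self):
        vals = [i.eval() if isinstance(i, Node) else i for i in self.inputs]
        return self.op(*vals)

# Build the graph for (2 + 3) * 4; nothing is computed at build time
a = Node(operator.add, 2, 3)
b = Node(operator.mul, a, 4)

# The computation only runs when we ask for the result
print(b.eval())  # -> 20
```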
If you think about how it manages device-agnostic computation: the TensorFlow core engine sits right on top of the devices, so if you want to compute certain things, you can just place them onto a particular device and run them there. In one of his talks, Jeff Dean says you could do part of the computation on, say, a mobile device if it has the compute, and something more demanding could be done on a GPU. This device-agnostic computation is one of TensorFlow's powerful features, and it's very interesting. And right now there are Python and C++ front-ends, with experimental Go and Java APIs as well. Okay, let's go to the live demo: let's just try out a neural network and see how it does. First, all the code is available on GitHub. In the repository, you'll find the training script for the recommendation system. Let me just zoom in; I think that's good. First of all, we will use the MovieLens dataset. Like I said earlier, there are ratings for approximately 3,900 movies from 6,040 users. It was put out by the MovieLens project; anyway, it's the dataset that matters. And you can get the dataset right from this link here. Let's go through the code; I think it will be useful in case you have issues, so we can go through it together, and I'll run it right here in front. Hopefully it should all work fine. The first three imports are basically for data I/O operations.
We import a deque, and a next, which is basically for iterations. And we have a reader, which also does some data processing: it reads from the TSV, the tab-separated file, and converts that to a pandas data frame. And yeah, this worked. The next thing is that we set a random seed. We do this so we can replicate the experiments again and again and get the same results, so we set it to 42, or any number, as long as you keep it constant. This ensures that every single time you run the experiment, you will most likely get the same results, because otherwise there's a different random initialization each time. Instead, you give the seed, so it uses the same seed for the initialization. This way you can ensure that you can replicate the results every single time you run. The first parameters we set are unum and inum: like I said earlier, these are the number of users and the number of items that we have. And we set a batch size of 1000. I'll talk about what the batch size, the dimensions, and the max epochs are doing later on. First of all, I'll talk about the data, so let's run through this. So here it's basically getting ratings.dat. The ratings file is of the format I showed here: user ID, item ID, the rating itself, and a timestamp. What we're doing here is loading this entire file and then splitting it in terms of IDs and ratings: every ID is mapped to a user, every ID is mapped to a rating, every ID is mapped to a movie. Yeah, that's correct. So now what happens?
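The ratings.dat format described here (user ID, item ID, rating, timestamp, separated by `::`) can be parsed with a few lines of plain Python. The sample rows below are illustrative stand-ins, not rows from the real dataset, and `read_ratings` is a hypothetical helper, not the repository's code.

```python
# Minimal sketch of parsing MovieLens-style ratings.dat rows:
# user_id::movie_id::rating::timestamp
from io import StringIO

SAMPLE = """1::1193::5::978300760
1::661::3::978302109
2::1357::5::978298709
"""

def read_ratings(fh):
    rows = []
    for line in fh:
        user, item, rate, ts = line.strip().split("::")
        rows.append({"user": int(user), "item": int(item),
                     "rate": float(rate), "st": int(ts)})
    return rows

ratings = read_ratings(StringIO(SAMPLE))
print(ratings[0])  # {'user': 1, 'item': 1193, 'rate': 5.0, 'st': 978300760}
```

In the demo this job is done by the reader that builds a pandas data frame; the sketch just shows what the file format contains.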
The next thing is that we split the data into train and test. In this case, we draw some random indices and use 90% of the data for training and 10% for testing, or validation, and this gives back the train and the test sets. And here is the actual neural network, so let me go through this part first. Okay, so this is what I was talking about: here, with tf.device, I'm placing all these computations on a particular device. Because my Mac does not have a GPU, I'm placing them on the CPU, but if you have a GPU, you could effectively run some computations on the GPU. In this block we initialize some variables; TensorFlow uses variables for weights so that we can save them for later use. And like I shared in the neural network diagram earlier, we have a weight for each user ID and each item ID, and then we have some biases for those. Subsequently, we build an SVD-style model with regularization. First we compute the embedding for the users and the embedding for the items, and then we use those embeddings together to compute the prediction. So first we have the global bias, a user bias, and an item bias, and then an embedding lookup, which gets us the embeddings for the particular users and items. And finally, we use those to do the inference: the inference is basically the product of the user and item embeddings, which we reduce-sum to get the prediction, and we add an L2 regularization on those embeddings. Then we compute a loss based on this operation. And finally, in this case, yeah, let me run this as well, we use the cost here: the cost is basically an L2 loss plus the regularization penalty.
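The prediction the model computes (a global bias, per-user and per-item biases, plus the dot product of the user and item embeddings) can be sketched in NumPy. This is not the demo's TensorFlow code; the initialization values and scales here are made-up assumptions for illustration.

```python
# NumPy sketch of the rating prediction: global bias + user bias
# + item bias + dot(user embedding, item embedding).
import numpy as np

rng = np.random.default_rng(42)
num_users, num_items, dim = 6040, 3900, 15

w_user = rng.normal(0, 0.02, (num_users, dim))   # user embedding table
w_item = rng.normal(0, 0.02, (num_items, dim))   # item embedding table
b_user = np.zeros(num_users)
b_item = np.zeros(num_items)
b_global = 3.5                                   # e.g. roughly the mean rating

def predict(user_ids, item_ids):
    # embedding_lookup is just row indexing; the interaction term is a
    # per-example dot product (what reduce_sum over the product gives you)
    u, v = w_user[user_ids], w_item[item_ids]
    return b_global + b_user[user_ids] + b_item[item_ids] + (u * v).sum(axis=1)

print(predict(np.array([0, 1]), np.array([10, 20])))
```

Before training, the embeddings are small random values, so every prediction sits near the global bias; training moves the biases and embeddings so the dot product captures user-item interactions.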
And we use the Follow-The-Regularized-Leader (FTRL) optimizer, which is actually good in this case. We could use some other optimizer as well, but this works best, so you could try different optimizers here. So if you see, what TensorFlow has been doing is setting up the entire compute, setting up the entire model. Up to now, it has not done anything with respect to the data; it has basically just initialized everything and set up for the final compute. That's what happens in the cell after this one. So let me just go through, where is my mouse, yeah. So here, this is where I'm actually getting the data. All along, I've just been getting the variables and the neural network initialized. But, okay, so something's gone wrong: batch size is not defined. Okay, no problem, we can just initialize it here. Yeah. So basically something was not initialized; that's okay. The good thing with IPython is that it allows you to keep Python running in the background, so when it throws an error you can still go back, fix it, and continue. That's the beauty of IPython; it's one of the things I love about IPython. So in this case, you can see that there are in total 1 million records, but we've only used 900,000 of that for the training itself and 100,000 for testing or validation. And you might be wondering what the samples per batch is: every time, we take a small batch and train the neural network on it, based on the batch size. With 900,000 training records and a batch size of 1,000, that gives 900 batches per epoch. And let's go through what the data itself is. If you see what df_train is, it's basically a pandas data frame, so we can refer to it by the user column.
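The batch bookkeeping just described works out as follows:

```python
# The demo's split and batch arithmetic: 1,000,000 ratings split 90/10,
# with a batch size of 1000.
total = 1_000_000
train_size = int(total * 0.9)        # 900,000 records for training
test_size = total - train_size       # 100,000 for testing/validation
batch_size = 1000
batches_per_epoch = train_size // batch_size

print(train_size, test_size, batches_per_epoch)  # 900000 100000 900
```

So the "900" that appears in the demo output is the number of batches the network consumes per epoch, not the number of samples per batch.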
And head basically takes the first five records. If you see, it's by index, and these are user IDs. So that's the train and the test. If you go down further, you can see the top five for the item; the item map, of course, is over here. I'm just running it so that I can initialize all the data again. So all of these are of dtype int32. By default, things might be stored with a larger type, but to reduce the memory footprint, we keep them as int32 here. And this is the item column: all of these are again integers, mapped back to the ID. And finally the rating: the rating is for that particular movie, and it was done by that particular user. So here we have some ratings, again the top five. And finally, okay, let me go up and initialize this again. Okay, that's good, I think, yeah. Okay, so it initialized. So in this case, what we've done is create placeholders. In TensorFlow, for you to throw data in at runtime, you create a placeholder into which you feed batches of data. So here we have a user batch, an item batch, and a rate batch; all of these come from the data we read from the ratings file. The first part is a shuffle iterator: every time training happens, we want the data in random order so that the neural network does not try to memorize the data itself, so we shuffle the data while training. While testing, we want the data in sequence, so we go for an epoch iterator, and we get the data from the data frame. So we have the placeholders ready, and the model is also ready; this is initialized. And up until now, even now, we've not computed anything. TensorFlow is basically just sitting there.
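The two iterators described (a shuffling one for training, an in-order one per epoch for testing) can be sketched in plain Python. These are illustrative stand-ins, not the repository's actual iterator classes.

```python
# Sketch of the two data iterators: endless shuffled batches for training,
# and a single in-order pass for evaluation.
import random

def shuffle_batches(rows, batch_size, seed=42):
    rnd = random.Random(seed)
    idx = list(range(len(rows)))
    while True:                       # endless, like a training shuffle iterator
        rnd.shuffle(idx)
        # drop the ragged tail of each pass, as batch iterators often do
        for start in range(0, len(idx) - batch_size + 1, batch_size):
            yield [rows[i] for i in idx[start:start + batch_size]]

def epoch_batches(rows, batch_size):
    # one ordered pass over the data: an "epoch" iterator
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]

data = list(range(10))
train_iter = shuffle_batches(data, 3)
print(next(train_iter))                          # a random batch of 3
print(list(epoch_batches(data, 4)))              # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

The training iterator never terminates by itself, which is why the demo's loop decides how many batches (epochs) to pull from it.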
It has created the compute graph. It has not even initialized the variables yet; it has created locations for it to do the initialization. So when you go to the next cell, here you see that there is a saver. The saver is for you to save the variables to a local file, in case you want to restore them later: if you want to retrain again, or do something like fine-tuning or transfer learning, or run it for some time, come back, and run it some more. Things like that can be done with a saver. The saver is basically just like pickle in Python: it saves all the variables to a local file, and then it allows you to restore them and train again or do inference. So that's the first line here. The second line is the global variable initializer. What the global variable initializer does is, like I said earlier, all the variables that we've created will be initialized for the first time here when we run this. And once this is done, we create a session. A session is where everything runs in TensorFlow. We use a with block here because it's easier to maintain with one: otherwise it's like a file open and a file close, where once you open it, you have to close it, and if you don't close it, you get things like resource leaks and memory leakage. So we have a with block here, and in the with block, we use the default session to run the initialization. So the first line we run here is the initialization. Let me run this while I talk; it will actually do the training. What it's doing now is initializing everything and getting the data batch by batch.
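The pickle analogy for the saver can be made literal. This sketch uses plain pickle on a stand-in dict of "variables", not tf.train.Saver itself; the file path and variable names are made up.

```python
# The saver, per the analogy above, behaves like pickle: it serializes the
# model's variables so training can be resumed or reused for inference.
import pickle, tempfile, os

weights = {"w_user": [[0.1, 0.2]], "b_global": 3.5}   # stand-in variables

path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:          # analogous to saver.save(sess, path)
    pickle.dump(weights, f)

with open(path, "rb") as f:          # analogous to saver.restore(sess, path)
    restored = pickle.load(f)

print(restored == weights)  # True
```

The real saver writes checkpoint files rather than a single pickle, but the role is the same: persist variables, restore them later.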
So while training, every time it gets a batch, it randomly shuffles the data and feeds it into the neural network. The sess.run here is what actually feeds the data, and if you see the feed_dict, it's basically the dictionary that is fed at runtime: every time, it's the user batch, the item batch, and the rate batch. These are the placeholders that you saw earlier, the ones that were fed into the neural network, and they're used to iterate and update the weights. So here, the feed_dict is getting the users, the items, and the rates, and it's predicting, computing the losses, and getting the results here. Once you do that, you compute the error and report what the error is. If you go down here: an epoch is when the neural network has seen the entire data once. So in this case, we have 900,000 samples, so one epoch is the neural network seeing all of that data once, and we're running it, I think, for 50 epochs here. So it should have completed by now. Yeah, it has completed. So let's go through what it has done. What it does is, every time it completes an epoch, it does an evaluation of the loss: it's basically the error between the predicted ratings and the actual ratings. That's how it computes the loss. The train error is the error on the training data itself; it's basically computing the error against the data it trained on. The validation error is on data the model has never seen: the data we split 90-10, it uses that held-out data to compute the validation loss. And if you see, it's actually pretty fast.
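The epoch-by-epoch training just described can be sketched in NumPy with plain SGD on a small factorization. The demo uses TensorFlow and the FTRL optimizer; everything below (sizes, learning rate, toy ratings, and the choice of SGD) is an illustrative assumption, with shuffling omitted for brevity.

```python
# NumPy sketch of the training loop: per example, predict the rating,
# compute the error, and nudge the user/item factors to reduce it.
import numpy as np

rng = np.random.default_rng(42)
n_users, n_items, dim, lr = 50, 40, 5, 0.05
P = rng.normal(0, 0.1, (n_users, dim))   # user factors
Q = rng.normal(0, 0.1, (n_items, dim))   # item factors

# toy (user, item, rating) triples standing in for the 900,000 real ones
data = [(rng.integers(n_users), rng.integers(n_items), rng.integers(1, 6))
        for _ in range(2000)]

def rmse():
    err = [r - P[u] @ Q[i] for u, i, r in data]
    return float(np.sqrt(np.mean(np.square(err))))

before = rmse()
for epoch in range(5):                   # one epoch = one pass over the data
    for u, i, r in data:
        e = r - P[u] @ Q[i]              # prediction error for this example
        # simultaneous SGD step on both factor rows
        P[u], Q[i] = P[u] + lr * e * Q[i], Q[i] + lr * e * P[u]

after = rmse()
print(before > after)  # True: the training error goes down epoch by epoch
```

This is the same shape of loop as the demo's: iterate batches, feed examples, measure the error after each epoch, and watch it fall.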
So on this Mac, it's taking less than two seconds per epoch, and that covers the whole training set. So what we do is we keep running this, and in this case, if you go through it, after some time it kind of plateaus: the validation error does not go below about 0.862. That is when we want to see what's happening. The training loss is still going down, but the validation loss has plateaued. So let's go down and see the validation error here. If you see, the training error is actually still going down, so if I ran this for maybe another 100 epochs or so, it might go down even further, but the validation error might not. This is where we need to understand how to improve the neural network: is the data perhaps not sufficient, do we need to add a regularizer, things like that. We need to understand that and incorporate it. But for our demo here, we will stop over here, and we will try to do some inference with this model. I think I did not include that here, so I'll run the training here. In this case, here we have the inference as well; I think I've included this too. So let me stop this, and yeah, let me run through this. While it's running, I'll just go through all this again. What we've done here is basically the same thing; the only thing extra here is the inference. In practice, what we would do is run all the training on a GPU cluster so that the training is faster, and the inference could be done even on a CPU; it doesn't matter. So here the training has already started; it's already at 9 epochs, so let it run. In general, we would do the training on a GPU cluster, like I said, but in this case I'm still running it on a CPU. But most problems are not so simple; we may not have a straightforward user, item, and recommendation.
In this case, what this will try to do is: given a user and an item, it will try to predict the rating. That's what we're predicting here, and the validation error is the error in doing that prediction for a given user and item. So it's still going on. So it takes about... okay, I think it's done. So let me talk about this. What happens is that after 25 epochs, I think I just ran this for 25 epochs for the test, it saves the model. Like I said, the saver saves the entire session's variables into the model here. So if I go to the model directory, it should be under save: here you have a checkpoint, and the model data, index, and meta files. If you had a long training run, what you would do is run TensorBoard and watch the validation and training loss. But since we've already computed the loss and are more or less done with this, we'll look at the inference here. The inference here is for a batch, and I'm showing only the first 10 items because the batch is 100,000 items. So if you see, the prediction is actually very close, and this is based purely on only 25 epochs. Just on 25 epochs, it could do quite well: given a user and an item, we can predict quite close to the actual rating. If you see, there's one where the actual is five and the prediction is 4.9 already. That's the level of training that we can get in such a short span, given decent compute. So TensorFlow allows you to feed it in batch by batch and get the evaluation done straightforwardly. But what if I have a very poor or very old laptop, or I want to do the compute on the cloud?
So that's when we go for Google Cloud, and we will come back to that. What happens with Google Cloud is that we can set it all up there. I think when you start, you get $300 of credit, and the cloud infrastructure is pretty good; it allows you to do distributed training. We can do the same thing: we can do the training locally and set up the entire cloud pipeline, using storage and everything, on Google Cloud ML. We basically fire up the console, and once we do that, we have to enable billing. Initially, don't worry: it's only for three months, and you have $300 to test all the models and things like that. It's actually quite fun to find out what Google Cloud can do and what it cannot do. At times it's a bit difficult to set up the data and then do the computations, and at times it's slower than the CPU that you have. But still, if you have some cloud infrastructure and you would like to serve inference from the cloud, then this is a good way to go. So once you have a trained model like we have right now, we can put the model on Google Cloud, so let me just show how that looks. So this is where, the minute you finish your training... I think the TF model is not visible. So once I finish training, I come here and then I put the same model over here; let me go to the console. This is the console, typically an Ubuntu console, and it gets the connection first. Yeah, and then it has a very similar shell, so if you see, if I do an ls, I have all this data over here. So once you set up the storage, you go here and put all the data in this place here.
So you create a bucket, and once you have a bucket, you put all the actual data that you want into the bucket. You can use the same data and the same TensorFlow code that you ran locally to run on the cloud, and you can get the inference from that itself. So let me just see how that looks. Yeah, so in this case, we have the same data over here, and another capability of Google Cloud is that you can run TensorBoard as well, and then you can run a Jupyter Notebook. So all of that is also possible with Google Cloud on this infrastructure. But currently I'm not able to run the ML part because of some billing issues; otherwise, you can basically put up all the data and use the same inference that we did in the Jupyter Notebook here on Google Cloud. So we can get the same results, and we can wrap the request in JSON, post it, and get the response from Google Cloud. So instead of figuring out how to set up a complete server by yourself, Google Cloud allows you to do that part, and all you have to do is train this locally and put it up there. So yeah, that's the part here. Let me try to figure this out; if I'm able to set this up, then maybe I will let you know, but if you have any questions, you can ask me in the meanwhile. So, once you're finished with the model, you choose "create version", and versions allow you to seamlessly move from one version to another without having any downtime. It's something similar to servables in TensorFlow Serving. It creates different versions, and you can provide different versions to different people, and the Python API allows you to pull a particular version and get the results from different versions.
So basically, I can have the same algorithm doing different jobs for different people or different API calls. All that is possible with the versioning. And yeah, that's more or less my talk. So if you have any questions, you can ask me; maybe you can come to the question mic. Hello. Hi. Yes, thank you for sharing. I have one question about the part you covered earlier in the Jupyter Notebook. This method is using SVD; can you explain a little bit more about that method? Because to me, it looks like just a matrix multiplication and some addition. So why do you call it SVD? So, SVD is basically singular value decomposition. What we're doing here is a type of SVD: we're trying to get a U Sigma V-transpose kind of decomposition. It's done in a very similar way here, but it's not calculating the eigenvectors and things like that. But right there, you're just doing matrix multiplication. Okay, so this is a very simplified implementation. We want to do the same factorization, but with the user and item data directly, so you get an inference based on a very simple matrix multiplication. If you were to do a full SVD itself, it might be too computationally expensive, so instead we resort to a simpler way of computing the same factorization. Is that clear? Yes. Thank you. Hello. Hi. One question: this is done via TensorFlow, but there are some other alternatives for this, like Apache Spark. So did you do any comparison on performance? Okay, so of course we can do it with Spark, but the point is, it's not really a comparison between TensorFlow and Spark. Spark is purely a distributed computing framework, right? But with TensorFlow, you have the availability of various tools, things like different optimizers, or if you want to go deeper.
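The distinction in this answer, a true U Sigma V-transpose decomposition versus the simpler learned factorization, can be illustrated with NumPy's exact SVD on a tiny dense matrix (the matrix values are made up for the example):

```python
# A true SVD factorizes R = U @ Sigma @ V^T exactly. On a huge, mostly
# missing ratings matrix this is expensive, which is why the demo learns
# user/item factors directly instead.
import numpy as np

R = np.array([[5., 3., 1.],
              [4., 3., 1.],
              [1., 1., 5.]])

U, s, Vt = np.linalg.svd(R, full_matrices=False)
R_rebuilt = U @ np.diag(s) @ Vt          # reproduces R exactly
print(np.allclose(R, R_rebuilt))         # True

# Keeping only the top 2 singular values gives a low-rank approximation,
# the same spirit as learning small user and item embedding vectors.
R_rank2 = U[:, :2] @ np.diag(s[:2]) @ Vt[:2]
print(float(np.abs(R - R_rank2).max()))  # error of the rank-2 model
```

The learned factorization in the demo skips computing singular vectors entirely and just optimizes low-dimensional factors against the observed ratings, which is why it looks like plain matrix multiplication.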
So one thing that is actually interesting: what we have here is basically a simple embedding and computing the loss, and like I said, the loss actually plateaus at 0.86. But if you have more parameters, if you can go deeper, if you have a deeper neural network, then you can get better features and better results. So the point is that you can do more computation. Compared with Spark, TensorFlow is actually good in terms of this kind of computation. So I don't think there's really a comparison with Spark, because these are completely different things. Understand. My question is, did you try to build the model using different tools and see which model works better in terms of the RMSE? Okay, so no, I didn't do that comparison there, but if you have a deeper neural network, it should definitely perform better. So yeah, to answer your question, there was no comparison; this is to introduce how to build a recommendation system in a very straightforward way with TensorFlow. Another question I have is, you mentioned the hybrid solution. Could you combine that with... I mean, in MovieLens there are just the ratings; there's no content there. Yes. Did you try any hybrid solution? So with this dataset, it might not be possible, but with other datasets, we could do that. For example, if we have something like the Netflix data, the Netflix challenge, then maybe we could do more. For content-based, what we might have to do is include word embeddings as well: things like the similarity between genres, maybe, or other features might be used. But in this case, this is a very straightforward approach for collaborative filtering. Okay, thanks. Thank you. Hi. I have a very connected, similar question.
It was about the different models and when you use which one. So in this particular example, where would you use a conventional collaborative model, and where would you think deep learning would be more useful on top of that? Would it depend on the data, the number of features, where, how, and why? Okay, that's a useful question, thank you. So the question is, when do you use which model? To answer that: it depends, and the computation is what matters most. If you have a lot of data, then typically you can use a deep neural network, but if you have very, very small data, say I have only 1,000 records, then what will happen is a deep neural network might overfit. So in general, when you have lots of data, say a million records, like I showed here with the MovieLens dataset, you can go for something like a neural network. But classical techniques still work well. In most cases, classical techniques are actually good, because data is the problem there: although a lot of data exists, we often do not have access to that data to do the computation. So in those cases, we still stick to the classical approaches and get the results from those. But in general, we can do well. There are actually good papers right now on producing data: things like generative adversarial networks can generate new data based on earlier data. It's similar with neural dialogue generation: I could do a Turing test with a computer, and the computer could still pass as a human being. Things like that are possible, but only with data can you do such things. Without data, it might not be possible.
But if you have data, then you still have to run an experiment and figure out if you need a deep neural network, or if a smaller network is good enough. So things like going through the validation error, checking whether the training error is going down, and if it is not going down after further epochs, maybe trying a deeper neural network. Things like that come into play. So yeah, I hope that answers the question. Yeah. Hi, thank you. My name is Shanti Ram. I have a very simple question. I understand that TensorFlow provides a large number of optimizers built into the framework itself. Is it possible for someone to extend it and implement their own new optimizer in TensorFlow? If yes, then because of the distributed computing in the underlying runtime, do you need to go into C++ and do more complex stuff, or is it possible to do it in Python itself? So the question is, is it easy to build an optimizer yourself, or some fundamental operation yourself? To answer that: yes, it is possible. And all of these are C++ bindings, so if you want a basic operation, C++ would still be the way to go. But you can still write it in Python; the problem is that it might not be as efficient as the underlying implementation. So suppose I implement SGD again, thinking I have a better, improved approach to the same thing, say something like an Adam optimizer. Then what will happen is that the built-in code would still perform better in terms of computation and efficiency, because it's an open-source implementation that has been continuously improved. So yes, to answer your question, you would have to do it in C++ for performance. It's up to you: you can do it in Python if you're okay with the overhead. Hi. Hi.
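The point about writing your own optimizer can be sketched without TensorFlow at all: an optimizer is just a rule that turns gradients into parameter updates. Below is a plain-Python momentum SGD applied to a toy quadratic; the function names and hyperparameters are made up for the sketch, and a real TensorFlow op would be implemented in C++ for efficiency, as the answer says.

```python
# A custom optimizer is an update rule: state (here, velocity) plus a
# function that applies gradients to parameters.
def make_momentum_sgd(lr=0.1, momentum=0.5):
    velocity = {}
    def apply_gradients(params, grads):
        for name, g in grads.items():
            v = momentum * velocity.get(name, 0.0) - lr * g
            velocity[name] = v
            params[name] += v
    return apply_gradients

# minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3)
params = {"x": 0.0}
step = make_momentum_sgd()
for _ in range(100):
    step(params, {"x": 2.0 * (params["x"] - 3.0)})

print(round(params["x"], 3))  # 3.0
```

In TensorFlow you would plug the same kind of rule into the graph; the trade-off raised in the answer is that a Python-level implementation carries interpreter overhead that the built-in C++ optimizers avoid.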
Hi, maybe I'm just wondering, because when you see the code, it seems like kind of a black box to an ordinary programmer like us. So is it possible for us to go even deeper, to see how the algorithm works? How does it come up with the training, what are the repetitions it went through, and finally, how to test it? Maybe you could demo some of that. Yes, it is possible. So that is one of the topics... In order to understand what a neural network is doing, we typically try to see the activations. Once we have the trained model, to go deeper into what it's doing, we look at the activations themselves. But for a programmer to understand what it's doing, it's better to have the underlying machine learning knowledge first. Although, all of us here were beginners once, so it's nothing to worry about; you can always start. But it's better to understand what the neural network is doing before getting into it, because if you get into it without understanding it, you might not understand what a hyperparameter is doing, what a stride is, what a kernel is. Things like that would become too much to take in; things like dropout would become too confusing if you don't understand the underlying concepts. So if you're starting from scratch, there is so much actually happening, and a lot of resources available. You can still go deeper and understand it; it's not a problem at all. So what approach would you say is good if I want to manually add, say, a seasonal recommendation which I know is not part of my data? Say at Christmas I want to recommend a movie to a lot of users. Is this something that can be done within the model, or would you recommend something else? So in the movies dataset, there is a timestamp.
So what you're asking for is basically using the timestamp to improve the recommendation, right? Yes: I know that it's Christmas time, I know that in the Christmas season a lot of people watch these movies, and I want to put that in my recommendations, but I know my data doesn't cover it. Okay. So in that case, you might still have to retrain on new data, because your model has only seen the training data it was given. If you give it something completely new, it might recommend something completely unrelated. So to answer your question, you might have to retrain the model completely again. What about ensembling models, or if I don't have data to train it, would you recommend generating data somehow? Yes. So generating data somehow is a very different matter. With images it's actually possible: there's a line of work called generative adversarial networks, and they've shown they can generate new images that are completely, or to an extent, indistinguishable to human beings from the original data. For text, there is a new paper from the NLP group at Stanford that actually generates dialogues: neural dialogue generation using reinforcement learning. So GANs are a very interesting topic. But if you want to generate discrete data yourself, while it has been somewhat addressed for dialogue generation, for discrete data like stocks, or in your case the seasonal data which is not available, it is still a problem; you cannot directly generate data in that sense. Okay. One more question: which is your favorite cheapest-airline-ticket recommendation system? Thanks. Okay, so for the cheapest airline recommendation... I don't know, there are too many. When I want to book a ticket, I generally don't stick to one.
I would generally have an alert on Google Flights, on Kayak, on Skyscanner, so many things. But in general, I found Google Flights to be good. I don't know. Thank you. Any more questions? We have 30 more minutes. Thank you. I have a simple question about linear regression. Have you actually tried to implement a linear regression? Most of the time, we have to decide how many degrees or how many coefficients there are. Is it possible to let the algorithm define how many coefficients and how many degrees it has to be, rather than us fixing it ourselves? Yeah, so that's the classic problem: if you have, say, two-dimensional data and you're trying to do a linear regression, and I fit three coefficients to linear data, then typically I would be overfitting. So what you're talking about is basically finding the hyperparameters, and there are standard ways of doing that. There is a Bayesian method, which tries to avoid a brute-force search, and there is brute force itself, of course, which is straightforward. But if you have very high dimensions, then it might be very difficult to figure it out with a conventional brute-force approach: it might be computationally too expensive, and the time taken is too long. But it's possible. Yes, there are approximate methods to figure out the hyperparameters. What you're talking about is the number of coefficients, for example, but there are other hyperparameters as well: the batch size maybe, or the learning rate, or the momentum; things like that can also be tuned. So for all those hyperparameters, there are algorithms that allow us to figure them out, so you can check those out. Sure, I can maybe share that offline. Yeah. Thanks for sharing. Just a question: you mentioned that you can use TensorFlow to train a hybrid recommendation model, right?
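The brute-force idea from this answer, trying each model size and keeping the one with the lowest validation error, can be sketched as a search over polynomial degree. The data is synthetic and the degree range is an arbitrary assumption; this is the grid-search end of the spectrum, not the Bayesian method mentioned.

```python
# Grid search over the number of coefficients: fit each polynomial degree
# on a training split and pick the degree with the lowest validation error.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 60)
y = 2.0 * x + 0.5 + rng.normal(0, 0.05, x.size)   # data that is truly linear

# hold out every other point for validation
x_tr, y_tr = x[::2], y[::2]
x_va, y_va = x[1::2], y[1::2]

def val_error(degree):
    coeffs = np.polyfit(x_tr, y_tr, degree)        # fit on the training half only
    return float(np.mean((np.polyval(coeffs, x_va) - y_va) ** 2))

errors = {d: val_error(d) for d in range(1, 8)}
best = min(errors, key=errors.get)
print(best, round(errors[best], 5))
```

Because the data is linear, degree 1 should sit at or near the minimum, while higher degrees mostly fit noise; with many hyperparameters this exhaustive loop becomes too expensive, which is where the approximate methods from the answer come in.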
But do you have implementations of these use cases, and where can we find examples?

Right off the bat, I don't have an implementation. But if you think about it, like I said earlier, it depends on the content, on what sort of content you're looking at. For example, if I'm suggesting images and building a hybrid of image content plus collaborative filtering, then I have to understand the content of the images. So you have to define the problem before you get into developing a hybrid approach, or any approach, first of all. It is possible, but right now I don't have one, because it becomes too computationally expensive to run even on a laptop or a server, and I don't think I have the data for it. But it is possible.

Say, in a prediction problem where we involve multiple sources of data rather than a single source, does doing a hybrid involve manual engineering of the content?

It's not manual engineering per se in that case. Even when you're doing a hybrid approach, suppose my movie content is actually images; then I could train a separate neural network that understands the content. I could use, say, the Inception network to understand what the image contains: if there is a person in it, I could say there are three people in a car. So I could extract the content from the image, combine it with something like a word2vec representation, and train a neural network on that part for the content; for the collaborative part, I could use the user-item approach; and I could combine the two to still do a recommendation.
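A minimal sketch of that kind of combination might look like the following. This is an illustration only, not code from the session: the vectors, item names, and the blending weight `alpha` are toy stand-ins for embeddings that would normally be learned (the collaborative part) or produced by a pretrained content model such as Inception or word2vec (the content part).

```python
# Hybrid recommendation score: blend a collaborative-filtering (CF)
# dot product with a content-based cosine similarity.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    na = dot(a, a) ** 0.5
    nb = dot(b, b) ** 0.5
    return dot(a, b) / (na * nb) if na and nb else 0.0

def hybrid_score(user_vec, item_vec, user_profile, item_content, alpha=0.7):
    """alpha weights the collaborative part; (1 - alpha) the content part."""
    cf = dot(user_vec, item_vec)                  # collaborative signal
    content = cosine(user_profile, item_content)  # content signal
    return alpha * cf + (1 - alpha) * content

# Toy data: one user, two candidate items.
user_vec = [0.9, 0.1]            # "learned" CF embedding for the user
user_profile = [1.0, 0.0, 1.0]   # aggregated content features the user liked
items = {
    "item_a": ([0.8, 0.2], [1.0, 0.1, 0.9]),  # (CF embedding, content vector)
    "item_b": ([0.1, 0.9], [0.0, 1.0, 0.0]),
}

scores = {name: hybrid_score(user_vec, cf_vec, user_profile, content_vec)
          for name, (cf_vec, content_vec) in items.items()}
best = max(scores, key=scores.get)
```

Here `item_a` wins because it scores well on both signals; in a real hybrid system both vector families would come out of trained models rather than being written by hand.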
So it's not handcrafted feature engineering in that sense, but you still have to choose your sources, and you still need to train a neural network to do that sort of inference, to understand what the content is. If you have an image, for example, you still have to understand what the image contains, so you need a trained model for that part and a trained model for the collaborative part, and you combine the two, like a joint training. Then yes, it is possible.

Do you know whether TensorFlow supports live data streams? By live data streams, you mean images? Live streaming data, such as telemetry.

Okay, so it is possible, yes. As long as you're doing inference, it is still possible, because in the end what TensorFlow does is initialize the entire graph, and then it comes down to the latency of the input, the computation, and the output. With images, for example, with a GPU and an Inception model, we can do inference faster than real time. And if we can do that with images, then with telemetry data, which is typically just a matrix or a number, it is definitely possible. Yes, it's surely possible.

Hi. Basically your recommendations are only as good as your data, but we know that recommendations change over time; people have different preferences. So does TensorFlow provide data aging, like discounting older data?

Okay. So TensorFlow itself is only a compute engine, so it's not about TensorFlow.
You could build the same model on any deep learning framework, any mathematical tool, and get the same results. But to answer your question: you want to include something like a decay, a weightage, for the data. You could still do that. What we basically do is something called incremental training: you have a model that you keep training over time. When you have new data, you initialize the same neural network with the existing weights, the weights you trained earlier, and then you train with the new data. So even if the new data is not as large as what you trained on earlier, you can still train the neural network to improve, given the new data. That's incremental training.

Hi. How do you finally deploy this TensorFlow model? In a Python server, using Flask to create a REST API?

Yeah, effectively you still have a wrapper like Flask. Every time you need an inference, you have these jobs scheduled, and then you run a job for every single inference. Typically, if it's images, you batch them. If you cannot batch, then you still keep the model initialized on the compute node and get the inference based on a schedule you're running. So it's something like a queue server, like RabbitMQ: you submit some data that needs to be run through the model, you get the inference, and you store it separately.

I'm actually quite new to TensorFlow. What I want to ask is: if we train a model, is it possible to use transfer learning to make another model that has similar features?
For example, if I have a model for movies and I want another model for books, can I use the movie model instead of running the whole training on the book data again?

Yes, it is possible. Transfer learning is one of the best things about deep learning. If you look at images, for example, ILSVRC had close to 1,000 categories and a million images, and you can use a model trained on that to learn completely new categories. From those 1,000 categories I can do cats versus dogs, or humans versus cars. Things like that can be done with TensorFlow, yes. And TensorFlow comes with a retraining script that lets you do this, so you could do that, yes.

Maybe a last question, anyone?

Actually, I'm new to TensorFlow and machine learning. I just want to ask: how do you assign the weights? Suppose I've got a lot of data from different websites. Machine learning basically works on numbers, because you train your models by assigning weights and numbers. So how do you assign the weights and numbers? On the Google AI site there are a lot of projects working on numbers; for musical instruments, a person plays some piano and some notes are generated; they convert all the files into numbers and, based on those numbers, it plays the following notes. So I can't understand how you assign the numbers here either.

So typically you don't assign the weights. During the first iteration, when the graph is initialized, the weights are initialized at random. If you look at our run, the first training error is 2.4647, and that's because of the random initialization: the first time, because it's random, the model produces an output that is basically an equal probability over all the classes it's going to predict.
Now what happens is, once it gets a loss, it backpropagates the error, and that's how it refines the neural network. The weights are not initialized to a particular value; they are randomly initialized. If you scroll up here, you can see the weights use a truncated-normal initializer; let me highlight that. That's how you initialize the weights. You don't initialize them to a particular value, because that would completely defeat the purpose. From there you backpropagate the error, and over time the error goes down, because the weights are being learned and the model produces better results. Every epoch the loss goes down, and effectively the accuracy improves as well. Hopefully that's what is happening over here: every epoch there is a loss reduction.

So this is basically recommendations for movies? Yes. My main question is: in movies you have a list of names and you have the users' ratings. The ratings are numbers, but the list of movies is just strings. Do you assign the strings a random number?

No. If you look at the data itself, it's basically mapping an ID to a user, an item, and a rating. The item here is a movie and the user is a particular ID. So it's not taking some random number; it's the rating that the user gave for that particular movie. What we're doing is splitting this into two separate data frames, with the rating as another data frame, so we can use it as a loss: when we use the user and the item to predict the rating, we compute the error and propagate it back. That's what we are doing here.
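That loop, randomly initialized weights, a predicted rating per (user, item) pair, an error that is propagated back, and a loss that falls every epoch, can be sketched in plain Python. This is a toy illustration of the idea, not the session's code; the ratings, sizes, and learning rate are made up:

```python
import random

random.seed(0)

# Toy (user_id, item_id, rating) triples, like the data frames described above.
ratings = [(0, 0, 5.0), (0, 1, 1.0), (1, 0, 4.0),
           (1, 2, 2.0), (2, 1, 1.0), (2, 2, 5.0)]
n_users, n_items, k = 3, 3, 2   # k latent factors per user/item
lr = 0.05

# Weights are NOT set to particular values: they start as small random
# numbers, mirroring the (truncated) normal initialization in the talk.
U = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
V = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]

def predict(u, i):
    """Predicted rating = dot product of user and item embeddings."""
    return sum(U[u][f] * V[i][f] for f in range(k))

losses = []
for epoch in range(200):
    total = 0.0
    for u, i, r in ratings:
        err = predict(u, i) - r          # error on this observed rating
        total += err * err
        for f in range(k):               # propagate the error back:
            gu = err * V[i][f]           # gradient w.r.t. the user factor
            gv = err * U[u][f]           # gradient w.r.t. the item factor
            U[u][f] -= lr * gu
            V[i][f] -= lr * gv
    losses.append(total / len(ratings))
```

The first epoch's loss is large because the random weights predict roughly nothing; after repeated gradient steps `losses[-1]` is far below `losses[0]`, which is the "loss goes down every epoch" behavior referred to above.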
Every single time there is a compute step, we are predicting what the rating would have been for this user and that particular movie. That's what we are trying to predict here.

You are passing this device argument here and there. How will this impact the program itself?

The good thing with TensorFlow is that when you do all this, it's basically like compiling. In Java, when you compile, it goes through all the variables and figures out if something is wrong. Here it's not figuring out whether something is wrong; it's setting up the compute graph, setting itself up for all the data. So effectively it doesn't impact anything, because everything is set up before you even pass it the data. Once you pass the data, since it already knows this is the data it wants to process, it shouldn't matter; it knows how to take it from one device and put it back, so it should not impact performance at all.

cpu:0 means the first core? You are forcing the program to use the first device?

Yes. If you have multiple devices, say multiple GPUs, you can make a particular operation run on a particular GPU. It doesn't need any code change, right? No code change. One caveat here is that certain ops, like embedding lookups, do not work on GPU, so you still have to run those on CPU. If you go through the API, you can figure out which computations can be done on GPU and which on CPU. But in general it's device agnostic: this ran on CPU, and if I had a GPU I could just as well have run it on GPU.

Thank you, everyone. Let's thank Karthik for this session. Okay.