Good evening. I'm Karthik, and I'll be presenting an introduction to recommendation systems, built, of course, with TensorFlow. First, briefly about me: I work for SAP Research, right across from this building. My day job is mostly machine learning; we work a lot with TensorFlow, building deep learning and machine learning models, mostly for natural language processing, plus some big data and numerical work. Before this, I was a graduate student at NTU, where I finished my PhD, and I now work in industry in Singapore. So, let's move on to the agenda. First, we'll cover why recommender systems, then the types of recommender systems, and then we'll build one ourselves, with a hands-on session going through both training and inference right on this machine. Hopefully it all works. And in case you miss anything, all the code and the trained models are available in a GitHub repository, so you can download it and try it yourself. All right. Before we get into recommender systems and what they're all about, I think we should first do a short data analytics one-on-one. There are three main branches: descriptive, prescriptive, and predictive. Descriptive analytics creates a summary of past data: it uses historic data to understand what happened and gives you information about how things behaved in the past.
Prescriptive analytics tries to find the best possible action: given some data, it figures out what you should do next. And finally, predictive modeling, which is mostly statistical learning; most of machine learning falls under this branch. It tries to statistically understand the data and then predict the future. So, prescriptive modeling gives you a recommendation, and that is what we'll be talking about today, while descriptive modeling condenses the data into some aggregate form. When do we use each? Descriptive, when we want to understand the data we already have; prescriptive, when we want concrete suggestions or actions we can act on; and predictive, as the name suggests, when we want to predict the future. If we're talking about stock markets, for example, we'd want to see the trend to decide whether to buy or sell; that kind of problem falls under the predictive model. As examples: correlation and averages are descriptive; recommender systems fall squarely under the prescriptive model; and sentiment analysis is one example of a predictive model. So, with that as a precursor, let's move on to recommender systems, which come under the prescriptive branch. What is a recommender system?
That's the first question we need to answer. A recommender system basically tries to match a particular user to relevant items. The challenge is not just understanding what the user already wants; it's about wowing the user, using historic data and data from similar users. That's what makes a recommender system a success. For instance, I was just talking to a friend here who was saying that some recommender systems aren't very good. Take Netflix: you watch some movies, and it suggests genuinely interesting series or movies based on what you actually watched. That's a standard, very simple example of a recommender system. How does it do that? It looks at your usage history: maybe I watched Inception yesterday, so I could watch Interstellar today. That's a straightforward recommendation. It also looks at people: maybe a friend with very similar interests watched a deep learning video yesterday, and I watched a machine learning video today, so the system comes back and says, hey, why don't you check out this deep learning video your friend watched? That's another way a recommender system can help. In that case, I didn't expect the system to come up with a new recommendation like that, something I hadn't even seen before. Maybe I'm interested in machine learning broadly but not in deep learning specifically; it's just that deep learning is much more the thing right now.
Maybe I'm more interested in classical techniques, but the system comes back and says: there's a video on deep learning, why don't you watch it? That's the sort of wow that recommender systems are trying to deliver. You'll definitely find these examples familiar. One very interesting one is Google Play: if you download games, you'll see a category that says "based on your previous downloads". That's a very direct example of a recommender system. Netflix, of course, suggests movies based on your watching history, maybe even your household's. Google Instant is another interesting example: it stores some data and tries to predict your search query based on your previous searches. Facebook does something similar: you might add a friend, and that friend might have added someone in the same circle, so Facebook suggests another friend based on your friend's additions. And Spotify: if I keep listening to some genre, say I keep listening to Hans Zimmer, Spotify comes back and tells me Hans Zimmer has a new soundtrack, why don't you listen to it? That's exactly the sort of recommendation I'm looking for, because I might not know that Hans Zimmer came out with a new soundtrack or scored the music for a new movie. These systems are very useful commercially too, because players like Amazon are actually trying to sell something.
In that case, a recommender system can surface something new that I might have ignored or not even known about. Spotify is another example: I might actually buy that track, and Spotify earns revenue from a recommendation I didn't even know I wanted. So that's a holistic view of what a recommender system can do. Okay, so how do we actually recommend? The first straightforward approach is content-based recommendation. Content-based recommendation tries to understand the data itself: it extracts features from one item and matches them against the features of other items. Here the content of the item plays the vital part, and your profile, some of the text in your profile, can play a part too. It's purely dependent on content extraction, so if your content analysis is bad, you're stuck. That's the problem here. Moving on, the next type of recommendation is collaborative filtering. Collaborative filtering aggregates information across users. Like I said earlier, a friend on Facebook might have watched a video that I'd also like to watch, so Facebook bumps that video to the top of my news feed. It tries to find users with similar interests and suggests new content based on those interests, not on the content itself.
If Facebook wanted to suggest that video to me through content-based recommendation, it would have to go through the entire video content to make the suggestion, which is pretty expensive if you think about it. That's a real difficulty with content-based recommendation. For text, parsing is easier, but even then, human beings use lingo and incorrect grammar, so content-based recommendation fails in a lot of cases. That's when collaborative filtering came into view. And if you want to know the best approach, it's a hybrid one, and this is where Netflix and a lot of other companies are doing a very nice job. Think about what Netflix does: it aggregates information from you as well as your family, and it also uses the content, taking the genre of the movie or series you're watching and matching it against your history. So it does both content-based recommendation and collaborative filtering. Say my wife watches some movie on the Netflix account we share; with separate profiles, Netflix can still use that signal and suggest it to me as well. Other large players are also moving toward hybrid approaches, and that's proven to be one of the best. But more often than not, it's not feasible to start off right away with a hybrid approach.
So we start with either collaborative filtering or content-based recommendation, and once we've accumulated enough data, we can think about moving toward something like a hybrid model. All right, so what is the need for recommendations? Think about how we live now: data is pervasive. We have IoT, data collected from cars, from IMUs, from every part of your life. My phone records my walks and tells me how many calories I've burned; my Fitbit tells me I've climbed 20 flights of stairs today. Data has become so pervasive that there's an exponential explosion of it. Scarcity used to be the problem; now it's the inverse problem: we have too much data and we don't know how to use it. That's one of the main reasons we need recommendation systems. Think about Amazon, which might have, say, 100,000 items in a single category. How does it promote or surface a new item that could actually be of use to you? It figures out your interests, your purchase history, your purchase patterns, and then suggests new items. There's also research going on about the timing of purchases: evenings are apparently when people are most susceptible to buying something, and afternoons are apparently good too. These are all ways of understanding how to deliver a recommendation. Okay, so let's talk about content-based filtering first. We have four movies here.
We have Interstellar, Inception, Gravity, and, I think, The Dark Knight Rises. Content-based filtering does something like this: it takes some features. Is it science fiction? The first three are, and the last one is not. Is it directed by Christopher Nolan? Then only Gravity is not. It builds a matrix out of these features. Now suppose I've watched Interstellar and I'm browsing through Netflix to see what I could watch next. Content-based filtering goes through this matrix and figures out a distance, or how likely I am to watch each movie. Suppose the other three were the only movies in my catalog, which is highly unlikely; then content-based filtering computes how much of a match each one is. Inception is almost a 100 percent match, so I'm highly likely to watch it; Gravity is the least likely, and The Dark Knight Rises is second. The distance is basically a match between the content of what I watched and the content of the new item. The real problem here is figuring out these features in the first place. If the features aren't apparent, it can be very difficult to do this sort of filtering. Another problem, just as in classical machine learning, is that if you choose features that aren't suitable for the problem, you come back to the same cycle.
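To make that matching concrete, here's a minimal sketch of the feature-matrix idea in plain Python. The two binary features mirror the slide; the vectors themselves are illustrative, not from any real catalog.

```python
# Binary feature vectors per movie: [is_sci_fi, directed_by_nolan]
movies = {
    "Interstellar":          [1, 1],
    "Inception":             [1, 1],
    "Gravity":               [1, 0],
    "The Dark Knight Rises": [0, 1],
}

def match_score(a, b):
    """Fraction of features two movies share (1.0 = perfect match)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

watched = "Interstellar"
scores = {title: match_score(movies[watched], feats)
          for title, feats in movies.items() if title != watched}
ranked = sorted(scores, key=scores.get, reverse=True)
# Inception shares every feature with Interstellar, so it ranks first.
```

In a real system the features are learned or extracted from the content itself, which is exactly the hard part the talk describes.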
You go back to asking, am I choosing the wrong feature, is it good, is it bad, and re-evaluating your feature set. That's the big problem content-based filtering has. So how does collaborative filtering work? Let's take the same four movies, but now I have four friends: Eva, Dom, Coop, and Ryan. If you know these movies, you'll recognize the character names. Say Eva has liked Interstellar but not Inception or The Dark Knight Rises, and so on; the matrix here records likes, not features. The question becomes: these four are my friends, and if they like some movie, how likely am I, as a friend of these four people, to watch Inception, or Gravity? That's what collaborative filtering tries to figure out: you take a set of friends and infer new recommendations based on what they've already liked. It's another way of doing the same recommendation, but notice that all you need is the likes, the signal of how these people appreciated a movie, rather than figuring out which features should be part of the recommendation itself. One interesting consequence: on Facebook, every like you make contributes to exactly this kind of matrix. Every time a friend posts a picture and you hit like, that's a data point.
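That friends-and-likes scoring can be sketched in a few lines of Python. The friends and what they like here are made up to mirror the slide: each friend's likes are weighted by how much they overlap with mine.

```python
# Toy collaborative filtering over "likes" (all data illustrative).
likes = {
    "Eva":  {"Interstellar", "Gravity"},
    "Dom":  {"Interstellar", "Inception"},
    "Coop": {"Inception", "The Dark Knight Rises"},
    "Ryan": {"Gravity", "The Dark Knight Rises"},
}
me = {"Interstellar"}  # what I have already liked

scores = {}
for friend, liked in likes.items():
    weight = len(me & liked)          # how much this friend agrees with me
    for movie in liked - me:          # movies I haven't seen yet
        scores[movie] = scores.get(movie, 0) + weight
# Only friends who share my taste (Eva, Dom) contribute positive weight.
```

Note that no movie features appear anywhere; only user behavior drives the scores, which is the defining trait of collaborative filtering.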
Facebook goes back and says: you liked this picture, which means its content is useful or interesting to you, and it tries to surface new content very similar to what you liked before. There was actually an experiment where one guy liked every single item he saw on his news feed for 30 days, and at the end of it his feed was completely useless, total junk. That's exactly why Facebook asks you to like things: the more news you like, the more Facebook comes back and suggests new stories or videos you're likely to watch, which is also one of the ways to attract sponsors. So that's how collaborative filtering works: it's based on users and the choices made by similar users. That's the keyword here: choices made by similar users. So, like I said, what are the challenges? The first is: what if there are no users to start with? That's one of the primary problems with collaborative filtering. Whenever I launch a new system, I might not have enough users, so I can't give good recommendations. That's the cold start problem, and one way or another every company has to go through that stage, but eventually you get over it. Initially the recommendations will be close to random. If you've worked with deep learning, you know what happens.
It's like a freshly initialized neural network making random guesses; here it's the same: at initialization you give random suggestions, and over time the system learns how users respond and learns the recommendations. Another problem: what if no user profiles actually match mine? Suppose I have a very peculiar personality and watch really particular things that none of the other users on the system watch. That's a problem for collaborative filtering, and indeed for most systems, but these are outliers we can eventually get past. The third problem is: what happens when similar user profiles have completely disparate interests? Suppose my wife and I watch movies on the same Netflix account, and I watch pure sci-fi while she watches animation, or mystery. Netflix will try to suggest a new movie, but because the data is so mixed, it ends up suggesting something in between, and that's when recommendations fail. Again, these are edge cases, but they're challenges every recommendation system inevitably has to face. And the shared-account issue is really the same third problem: if two apparently similar profiles hide completely different interests, what then?
Although collaborative filtering suggests items based on similar user profiles, if my interests don't actually match that user's, the recommendations are going to be totally useless to me; that's what the third point is about. So again, the potential fix is a hybrid approach: don't depend on collaborative filtering alone, but also build a content-based recommender, combine the two, and provide a hybrid recommendation system. All right, let's move on to what we're going to build. If you've gone through the repository, we'll try to predict ratings on the MovieLens dataset; MovieLens is a project that aggregates movie ratings. We have close to one million ratings, from about 6,000 users over roughly 3,900 movies. There's also a larger dataset of 20 million ratings, and even larger ones, available for us to test, but we'll take the one million dataset. The problem is to predict the rating given by a user u for a particular item i, where item i is the movie. So close to 6,000 users have rated roughly 3,900 movies on a scale of one to five, and we'll use the RMSE between the true ratings and the predictions as our metric. We'll see how this pans out, look at possible ways to improve it, and take it from there. This is our base network implementation, all in TensorFlow; I'll give a brief introduction to TensorFlow in case this is the first time you're hearing about it.
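For reference, the RMSE metric is just the square root of the mean squared error between true and predicted ratings. A minimal sketch with toy numbers (not MovieLens data):

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length rating lists."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
                     / len(y_true))

true_ratings = [5, 3, 4, 1]
predicted    = [4.5, 3.5, 4.0, 2.0]
error = rmse(true_ratings, predicted)  # about 0.61 on this toy example
```

Lower is better; on a one-to-five scale, an RMSE under 1.0 already means predictions are usually within one star.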
But let me first discuss this neural network. At the bottom we have a user ID and an item ID. We look up embeddings for these user and item IDs, and then use those embeddings to compute an SVD-style prediction. The classical approach to recommendation is to build the full ratings matrix and factorize it, but what happens when we have millions and millions of records? That's when we want something distributed, like TensorFlow, and that's where it proves immensely useful. So first we generate the embeddings for the users and the items, and then we use an SVD-style factorization on top of them. Since we're dealing with one million records, we'll definitely use TensorFlow. The loss is then computed and backpropagated through the network. It's a very simple neural network, so it's easy to follow, and subsequently, if possible, we'll build on top of it, deploy it, and run inference. First, though, I'll discuss what TensorFlow is and why we need it, and then we'll go to the live demo. For recommendation, the conventional approach is factorization: something like an SVD that uncovers the latent features underlying the users and the items. But for large datasets, we cannot do an SVD on, say, a million-by-million matrix. That's one of the reasons a toolkit like TensorFlow, built for deep learning at scale, is so useful.
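To show what "learning embeddings that factorize the ratings" means, here's a tiny NumPy sketch, separate from the talk's TensorFlow code. The ratings are toy numbers, and the hyperparameters are choices of mine; the idea is simply that the dot product of a user embedding and an item embedding should approximate the observed rating.

```python
import numpy as np

rng = np.random.default_rng(42)
n_users, n_items, dim = 5, 4, 3
# Toy observed ratings as (user, item, rating) triples.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (2, 2, 2.0), (3, 3, 4.0), (4, 1, 1.0)]

U = rng.normal(scale=0.1, size=(n_users, dim))  # user embeddings
V = rng.normal(scale=0.1, size=(n_items, dim))  # item embeddings
lr, reg = 0.05, 0.01

for epoch in range(200):
    for u, i, r in ratings:
        err = r - U[u] @ V[i]          # prediction error on this rating
        u_old = U[u].copy()
        # SGD step on squared error with a small L2 regularizer.
        U[u] += lr * (err * V[i] - reg * U[u])
        V[i] += lr * (err * u_old - reg * V[i])

final_err = max(abs(r - U[u] @ V[i]) for u, i, r in ratings)
```

TensorFlow does the same thing at scale: the embeddings become variables, the loss becomes a graph op, and the gradient updates are derived automatically.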
TensorFlow is basically a general computation framework, but it's particularly suited to deep learning setups. It makes most tasks simpler: differentiation, for example, is something you could work through in theory, but it's very well implemented in TensorFlow. There's a new release roughly every six weeks, so it's practically difficult to keep up with the API changes, but the computations get better with every single release. One of the good things is the range of optimizers: SGD and its variants are all implemented natively in TensorFlow. It can also compute on clusters: if you have GPUs distributed across different machines, TensorFlow is very useful even for that. It's probably the best deep learning toolkit for distributed training on a cluster; a few others, like Deeplearning4j, do this too, but at the computational level TensorFlow does a very good job of distributed training as well. And of course it allows GPU acceleration; we'll see how. TensorFlow was released in November 2015, so it's been about a year and three months, but it's already probably the number one deep learning toolkit: by GitHub contributions and GitHub stars it's at number one, and in terms of computation the TensorFlow team has made convolutions and other ops faster than most competing deep learning toolkits.
Natively, everything is written in C++ with bindings in Python, and the latest release adds experimental Go and Java APIs. Those could be useful for people who don't want to switch from Java to Python, but since they're still experimental we'll have to wait for a stable release. Again, it's one of the most popular toolkits: as of 6:30 today it had 45,000 GitHub stars, up from about 31,000 just three months ago. The Apache 2.0 license is a big plus: at commercial scale you don't have to worry about litigation or patents. You can train on multiple GPUs in the same machine, or across multiple machines on a network, and it can be deployed natively on Android and iOS. The current version is 0.12.1, which is what I'm using right now, but version 1.0 release candidate 1 is already out; I presume it will be released at the TensorFlow Dev Summit next week, so we'll have to wait and watch what they announce. It's quite exciting. There's native GPU acceleration, for which you need CUDA, and cuDNN if you want the better deep learning kernels. It also finally runs natively on Windows as of version 0.12; before that it ran on Linux machines. Note that there are breaking API changes coming with TensorFlow version 1.0. So, if you're brand new to TensorFlow, welcome.
If you wait a week, you might get to start with a better API, but it's still a good time to start with TensorFlow, and that's the link if you want to download TensorFlow and compile it from source. Now, some basics: how does TensorFlow work? TensorFlow is basically a compute graph: it creates a directed graph and then runs computations on that graph, as simple as that. It's designed for defining machine learning computations, though if you think about it, it's really a general-purpose mathematical toolkit; in practice people use it for machine learning and mostly deep learning. It natively supports deep learning models, and this slide shows a typical deep learning compute graph running on TensorFlow. As for how it manages device-agnostic computation: the TensorFlow core engine sits right on top of the devices, so if you want to compute certain things, you can pin those ops onto a particular device. In one of his talks, Jeff Dean says you could run one computation on a mobile device, if it has the compute, while something more demanding runs on a GPU. That device-agnostic computation is one of TensorFlow's most powerful and interesting features. Right now there are Python and C++ front-ends, with the experimental Go and Java APIs as well. Okay, let's try out the neural network and see how it does. First of all, all the code is available on GitHub, so let me pull that up.
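Before we jump into the repository, the build-a-graph-then-run-it model can be illustrated without TensorFlow at all. This toy sketch (classes and names are mine, purely for illustration) first describes operations as nodes in a directed graph, and only computes anything when the graph is evaluated:

```python
class Node:
    """An operation node: applies `op` to the results of its input nodes."""
    def __init__(self, op, *inputs):
        self.op, self.inputs = op, inputs
    def eval(self):
        return self.op(*(n.eval() for n in self.inputs))

class Const(Node):
    """A leaf node holding a constant value."""
    def __init__(self, value):
        self.value = value
    def eval(self):
        return self.value

# Build the graph first; no computation happens yet.
a, b = Const(2.0), Const(3.0)
c = Node(lambda x, y: x * y, a, b)   # c = a * b
d = Node(lambda x, y: x + y, c, b)   # d = c + b
result = d.eval()                     # only now is anything computed: 9.0
```

TensorFlow's graph is the same idea at scale, with the crucial additions that each op can be pinned to a device and gradients can be derived automatically from the graph structure.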
In the repository you'll find the recommender-system training notebook. Let me just zoom in. Yeah, I think that's good. First of all, we'll use the MovieLens dataset. Like I said earlier, there are ratings for approximately 3,900 movies from about 6,040 users. It doesn't matter who publishes it; it's the dataset that matters, and you can get it right from this link here. Let's go through the code together, which will help in case you hit issues, and I'll run it right here; hopefully it all works fine. The first few imports are basically data I/O operations: a queue for feeding data, iteration helpers, and a readers module that does the data processing. It reads the delimited ratings file and converts it into a pandas DataFrame. The next thing is the random seed. We set a seed so we can replicate the experiment again and again and get the same results; we set it to 42, or any number, as long as it's constant. Since every run involves random initialization, fixing the seed means the same initialization is used each time, which ensures you can replicate the results on every single run. The first parameters we set are unum and inum.
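On the data format: the MovieLens 1M ratings.dat file is "::"-delimited, one rating per line, as UserID::MovieID::Rating::Timestamp. A stdlib-only parse of a few lines in that format (no pandas needed for a sketch):

```python
# Sample lines in ratings.dat's UserID::MovieID::Rating::Timestamp format.
sample = """1::1193::5::978300760
1::661::3::978302109
2::1357::5::978298709"""

rows = []
for line in sample.splitlines():
    user, item, rating, ts = line.split("::")
    rows.append((int(user), int(item), float(rating), int(ts)))
# Each row now maps a user ID and movie ID to a rating and a timestamp.
```

The repository's readers module does the equivalent over the full one-million-row file and hands back a pandas DataFrame.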
So, like I said earlier, these are the number of users and the number of items, and we set a batch size of 1,000. I'll talk about what the batch size, the dimensions, and the max epochs do later; first, the data. Here it's reading ratings.dat. The ratings file has the format I showed: user ID, item ID, the rating itself, and a timestamp. We load the entire file and split it into users, items, and ratings, so every record maps a user ID and a movie ID to a rating. The next thing is splitting the data into train and test: we draw random indices and take 90 percent for training and 10 percent for testing, or validation, which gives us back the train and test sets. Then comes the actual neural network; let me go through this part first. This is what I was talking about: here you see tf.device, so I'm placing all this compute on a particular device. Because my Mac does not have a GPU, I'm placing it on the CPU, but if you have a GPU you could run these computations there. Next we initialize some variables: TensorFlow stores weights as variables so that we can save them for later use. Like I showed in the network earlier, we have weights for the user IDs and the item IDs, and then biases for each. Subsequently, we use an SVD-style model with regularization.
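The 90/10 split itself can be sketched without any framework: fix the seed, shuffle the indices, and take the first 90 percent for training. The function name and dummy data here are illustrative:

```python
import random

def train_test_split(records, train_frac=0.9, seed=42):
    """Shuffle indices with a fixed seed, then split into train and test."""
    rng = random.Random(seed)            # fixed seed => reproducible split
    idx = list(range(len(records)))
    rng.shuffle(idx)
    cut = int(len(records) * train_frac)
    train = [records[i] for i in idx[:cut]]
    test  = [records[i] for i in idx[cut:]]
    return train, test

data = [(u, u % 5 + 1) for u in range(100)]   # 100 dummy (user, rating) rows
train, test = train_test_split(data)
print(len(train), len(test))  # 90 10
```

With the same seed, every run of this split returns exactly the same partition, which is what makes the later train/validation error numbers comparable across runs.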
First, we compute the embeddings for the users and the items, and then combine those embeddings to compute the SVD-style prediction. So we have the global bias, a user bias, and an item bias, and then an embedding lookup, which fetches the embeddings for the particular users and items. Finally, we use those for the inference: the inference is basically the product of the user and item embeddings, reduced with a sum, plus an L2 regularization term, and we compute a loss from that. The cost is the L2 loss plus the penalty, and we use the follow-the-regularized-leader (FTRL) optimizer, which works well here. You could try other optimizers as well; this one just works best in this case. If you look at what TensorFlow is doing, it's just setting up the entire compute, the entire model. Until now it has not touched the data; it has only initialized everything and set up for the final compute. That happens in the cell after this one. So here is where I actually get the data; until now I've just had the variables and the network initialized. Okay, batch_size is not defined, so it's debug time. That's fine; we can just initialize it here.
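Numerically, the inference just described is: the global bias, plus the user and item biases, plus the dot product of the two embeddings, with an L2 penalty on the embeddings added to the loss. A plain-Python sketch with made-up toy numbers (not the notebook's actual variables):

```python
def predict(mu, b_u, b_i, p_u, q_i):
    """r_hat = global bias + user bias + item bias + <user emb, item emb>."""
    return mu + b_u + b_i + sum(pu * qi for pu, qi in zip(p_u, q_i))

def l2_cost(rating, r_hat, p_u, q_i, lam=0.05):
    """Squared error plus an L2 penalty on both embeddings (the regularizer)."""
    penalty = lam * (sum(x * x for x in p_u) + sum(x * x for x in q_i))
    return (rating - r_hat) ** 2 + penalty

p_u, q_i = [0.1, 0.3], [0.2, -0.1]          # toy 2-dimensional embeddings
r_hat = predict(3.5, 0.2, -0.1, p_u, q_i)   # 3.5 + 0.2 - 0.1 + (0.02 - 0.03)
print(round(r_hat, 3))  # 3.59
```

In the real model the embedding dimension is much larger than 2, and the optimizer adjusts mu, the biases, and both embedding tables to shrink this cost over the training batches.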
So basically something was not initialized; that's okay. The good thing with IPython is that it keeps Python running in the background, so when something throws an error you can fix it in between and carry on. That's the beauty of IPython, and one of the things I love about it. In this case, you can see there are one million records in total, but we use only 900,000 of them for the training itself, and 100,000 for testing, or validation. You might wonder what samples-per-batch is: every training step we take a small batch, based on the batch size, and train the network on just that batch. Let's look at the data itself. df_train is a pandas DataFrame, so we can refer to it by column, and head gives the first five records. These are user IDs, by index, for the train and test sets. Further down you can see the top five for the item column. I'm re-running this so I can initialize all the data again. All of these are of type int32: by default things are initialized as float, but to reduce the memory footprint we keep them as int32 here. The items are again integers, mapped back to IDs, and finally the ratings, each one given by a particular user for a particular movie. Again, these are the top five. Okay, let me go up and initialize this again. Yeah, that's good, I think.
Okay, it initialized. What we've done here is create placeholders. In TensorFlow, to feed data at runtime, you create placeholders and then feed batches of data into them. Here we have a user batch, an item batch, and a rate batch, all from the data we read from the ratings file. Then we create the shuffle iterator: every time we train, we want the data in random order so the network doesn't just memorize the sequence, so we shuffle while training; while testing we want the data in sequence, so we use an epoch iterator instead, and both pull data from the DataFrame. So the placeholders are ready and the model is ready. Up until now, TensorFlow still hasn't done anything: it has created the compute graph, but it hasn't even initialized the variables yet; it has only created the slots where that initialization will happen. In the next cell you see a saver. The saver lets you save the variables to a local file, in case you want to restore them later: to retrain, to fine-tune, to do transfer learning, or just to train for a while, stop, and come back to train some more. The saver is basically like pickle in Python: it saves all the variables to a local file and lets you restore them to train again or do inference. That's the first line here. The second line is the global variable initializer.
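The shuffle iterator's job can be shown in a few lines: reshuffle the training rows at the start of each epoch and hand them out batch by batch. This is a generic sketch, not the repository's actual iterator class:

```python
import random

def shuffle_batches(rows, batch_size, seed=42):
    """Yield the rows in a fresh random order, batch by batch (one epoch)."""
    rng = random.Random(seed)
    order = list(range(len(rows)))
    rng.shuffle(order)                 # random order => nothing to memorize
    for start in range(0, len(order), batch_size):
        yield [rows[i] for i in order[start:start + batch_size]]

rows = list(range(10))
batches = list(shuffle_batches(rows, batch_size=4))
print([len(b) for b in batches])  # [4, 4, 2]
```

The test-time epoch iterator is the same loop without the shuffle, so every validation pass sees the rows in the same fixed order.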
What the global variable initializer does is, like I said earlier, initialize all the variables we've created, the first time we run this. Once that's done, we create a session. A session is where everything runs in TensorFlow. We use a with block here because it's easier to manage: otherwise it's like opening a file; once you open it, you have to close it, and if you don't, you get things like resource leaks and memory leakage. So inside the with block we use the default session, and the first thing we run is the initialization. Let me run this while I talk; it will do the training. What it's doing now is taking the data batch by batch: every training step it randomly shuffles the data and feeds a batch into the neural network. The sess.run call is what feeds the data, and the feed_dict is the dictionary that's fed at runtime: every step it's the user batch, the items, and the rates, which map to the placeholders you saw earlier. Those are what get used to iterate and update the weights. It predicts, computes the errors and the losses, and returns the results. Once you do that, you compute the error and report what it is. Going down here: an epoch is when the neural network has seen the entire dataset once. That's one epoch.
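Stripped of TensorFlow, what each sess.run training step does is: take a batch, compute the prediction error, and nudge the weights against it. Here is a tiny gradient-descent loop on just the bias part of the model, using plain SGD rather than FTRL, purely for illustration on made-up data:

```python
# Fit mu, b_u, b_i by SGD on squared error: a stripped-down stand-in for
# what each sess.run(train_op, feed_dict=...) step performs.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 1, 2.0)]  # (user, item, r)
mu, b_u, b_i, lr = 0.0, [0.0, 0.0], [0.0, 0.0], 0.1

def loss():
    return sum((r - (mu + b_u[u] + b_i[i])) ** 2 for u, i, r in ratings)

before = loss()
for _ in range(200):                      # 200 passes over the tiny "batch"
    for u, i, r in ratings:
        err = r - (mu + b_u[u] + b_i[i])  # prediction error for this example
        mu     += lr * err                # gradient step on each parameter
        b_u[u] += lr * err
        b_i[i] += lr * err
print(loss() < before)  # True: the error has gone down
```

The real model also updates the embedding rows for the users and items in each batch, and FTRL applies its own per-coordinate learning-rate schedule, but the shape of the loop is the same.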
In this case we have 900,000 training samples, so one epoch is the network seeing all of that data once, and we're running it for 50 epochs here. It should have completed by now; yes, it has. Let's go through what it did. Every time it completes an epoch, it evaluates the loss: the error between the predictions and the actual data. The train error is the error against the training data itself, and the validation error is against data the network has never seen, the 10 percent we split off. And if you see, it's actually pretty fast: on this Mac it takes less than two seconds per epoch, over 900,000 rows of data. If you look through the run, after some time it plateaus: the validation error does not go below about 0.862. That is when we want to look at what is happening. The training loss is still going down, but the validation loss has plateaued. If I ran this for another 100 epochs or so, the training error might keep dropping, but the validation error would not. This is where we need to understand how to improve the network: maybe the data is not sufficient, maybe we need a regularizer, things like that; we need to diagnose it and incorporate the fix. But for our demo we'll stop here and do some inference with this model.
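The two numbers being tracked each epoch are just root-mean-square errors: one over the training rows, one over the held-out rows. As a standalone sketch, with made-up prediction/truth pairs:

```python
from math import sqrt

def rmse(pairs):
    """Root-mean-square error over (predicted, actual) pairs."""
    return sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))

train_pairs = [(4.1, 4), (2.9, 3), (5.0, 5)]   # made-up predictions vs truth
valid_pairs = [(3.4, 4), (2.1, 3)]
print(round(rmse(train_pairs), 3), round(rmse(valid_pairs), 3))  # 0.082 0.765
```

A falling train RMSE with a flat validation RMSE, the pattern described above, is the standard signal of the model starting to fit noise in the training set rather than structure that generalizes.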
So, I think I did not include the data here; I'll run the training cell again. In this version we have the inference as well; I think I've committed this too. Let me stop this and run through it. While it's running, I'll go through it again: what we've done is the same thing, and the only extra part is the inference. In practice, we would do all the training on a GPU cluster so it's faster, and the inference could even run on a CPU; it doesn't matter. It's already at nine epochs, so let it run. In general, like I said, we'd train on a GPU cluster, but here I'm still on a CPU, and most real problems are not this simple; we may not have a straightforward user-item setup. What this model does is: given a user and an item, it predicts the rating. The validation error is the error in that prediction for a given user and item. Okay, I think it's done. After 25 epochs, which is what I ran for this test, it saves the model: like I said, the saver saves the entire session's variables into the model directory. If I look under save, there's a checkpoint file, the model data, the model index, and the model meta files. For a large training run, you would fire up TensorBoard and watch the validation and training loss there, but since we've already computed the loss, we're more or less done with this.
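The pickle analogy for the saver can be made literal: checkpointing amounts to serializing the parameter values to disk and restoring them later. TensorFlow's Saver writes its own multi-file checkpoint format, not a pickle; this sketch just round-trips a plain dict of hypothetical weights to show the save/restore idea:

```python
import os
import pickle
import tempfile

# Hypothetical trained parameters (names are illustrative, not the notebook's).
weights = {"w_user": [0.1, 0.2], "w_item": [0.3, 0.4], "bias_global": 3.5}

# "Save": serialize all variables to a local file, like saver.save(sess, path).
path = os.path.join(tempfile.mkdtemp(), "model.ckpt")
with open(path, "wb") as f:
    pickle.dump(weights, f)

# "Restore": load them back to resume training or run inference,
# like saver.restore(sess, path).
with open(path, "rb") as f:
    restored = pickle.load(f)
print(restored == weights)  # True
```

The practical payoff is exactly what the talk describes: train for a while, stop, and later restore the same parameter values for fine-tuning, transfer learning, or serving.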
Now let's look at the inference. The inference here is over the test batch, and I'm showing only the first 10 items, because the batch is 100,000 items. You can see the predictions are very close, and this is after only 25 epochs; even with just 25 epochs it does quite well. Given a user and an item, we can predict quite close to the actual rating: here the actual is five, and the prediction is already 4.9. That's the level of training we can reach in such a short span when we have decent compute. TensorFlow lets you feed batch by batch and get the evaluation done straightforwardly. But what if I have a very old laptop, or I want to do the compute in the cloud? That's when we go to Google Cloud. With Google Cloud, when you start you get $300 of credit, and the cloud infrastructure is pretty good; it allows distributed training. We can do the training locally and set up the storage and everything on Google Cloud ML. We fire up the console, and once we do that, we have to enable billing; don't worry, it's only for three months, and you have $300 to test models and so on. It's actually quite fun to find out what Google Cloud can and cannot do. At times it's a bit difficult to stage the data and run the computations, and at times it's slower than the CPU you have, but if you have cloud infrastructure you'd like to serve inference from, this is a good way to go.
So, once you have a trained model like we have now, you can put the model on Google Cloud; let me show how that looks. This is your cloud storage: the minute you finish training, you come here and upload the saved model (I think the TF model is not visible right now, okay). Let me also fire up the console. It's typically an Ubuntu-style console; it gets the connection first, and then it's a very familiar shell; if I do an ls, I have all this data here. Once you set up the storage, you create a bucket, and once you have a bucket, you put all the files you want into it; from there you can run the same TensorFlow code you ran locally on the cloud, and get the inference from it. Let me just see how that looks. In this case we have the same data here, and there's another capability with Google Cloud: you can run TensorBoard and a Jupyter Notebook on this infrastructure as well. Currently I'm not able to run the ML job because of some billing issues, but otherwise you can upload all the data and do the same inference we did locally, from TensorBoard or the Jupyter Notebook, here on Google Cloud. You get the same results, and you can wrap the request in JSON, send it up, and get the response back from Google Cloud.
So, instead of figuring out how to set up a complete server by yourself, Google Cloud does that part for you; all you have to do is train locally and put the model up there. Let me try to figure this out; if I'm able to set it up, I'll let you know. In the meanwhile, if you have any questions, you can ask me. Once you're finished with a model, you choose "create version", and versioning lets you move seamlessly from one version to another without any downtime. It creates different versions, you can serve different versions to different people, and the Python API lets you pull a specific version and get results from it. So I can have the same algorithm doing different jobs for different people or different API calls; all of that is possible with versioning. And that's more or less my talk. If you have any questions, please come to the question mic.

Audience: Hi, thank you for sharing. One question, going back to your notebook: this method uses SVD. Can you explain a little more about it? To me it looks like a matrix multiplication and some addition, so why do you call it SVD?

Speaker: SVD is singular value decomposition. What we're doing here is a type of SVD: we're effectively fitting the U-Sigma-V decomposition, but without calculating the eigenvectors and so on.

Audience: Right, but my question is, why does your code look like it's just doing matrix multiplication?

Speaker: Okay, so this is a very simple implementation.
We want to do the same factorization, but working with the user and item data directly, so you get the inference from a very simple matrix multiplication. Computing a full SVD itself would be too computationally expensive, so instead we resort to a simpler way of computing the same factorization: you have the vectors, so you can just multiply them.

Audience: Okay, thank you. Next question: this is done with TensorFlow, but there are other alternatives, for example Spark. Did you do any comparison on performance?

Speaker: Okay, so of course we could do it with Spark, but the point is this is not a comparison between TensorFlow and Spark. Spark is purely a distributed computing framework, whereas with TensorFlow you have various tools available, like different optimizers, or the ability to go deeper. One interesting thing: this is just a simple embedding model computing a loss, and the loss goes to about 0.86. If you add more parameters, if you go deeper with a deeper neural network, you can get better features and better results. So the point is that you can do more computation. With Spark it might be possible, but TensorFlow is good at exactly this kind of computation; I don't think there's a real comparison between TensorFlow and Spark, because they're completely different things.

Audience: I understand. My question is, did you try to build the model in different tools and see which works better in terms of RMSE?

Speaker: Okay, so no, I didn't do that comparison, but with a deeper neural network it should definitely perform better.
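In equations, the SVD-style factorization discussed in the first question can be written as follows; the symbols correspond to the global bias, the user/item biases, and the two embedding vectors in the code, with the diagonal factor folded into the embeddings:

```latex
% Low-rank approximation of the rating matrix, SVD-style:
R \approx U \Sigma V^{\top}, \qquad R \in \mathbb{R}^{m \times n}

% What the code actually fits (folding \Sigma into the factors):
\hat{r}_{ui} = \mu + b_u + b_i + \mathbf{p}_u^{\top}\mathbf{q}_i

% minimizing the regularized squared error over observed ratings:
\min_{\mathbf{p},\, \mathbf{q},\, b}
  \sum_{(u,i)} \bigl(r_{ui} - \hat{r}_{ui}\bigr)^{2}
  + \lambda \bigl(\lVert \mathbf{p}_u \rVert^{2} + \lVert \mathbf{q}_i \rVert^{2}\bigr)
```

This is why the code looks like "matrix multiplication plus some addition": the dot product of the two learned embedding vectors plays the role of the low-rank product, without ever computing eigenvectors explicitly.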
So, yeah, to answer your question: no, there's no comparison there, but this talk is about introducing how to build a recommendation system in a very straightforward way with TensorFlow.

Audience: Okay. Another question I have: you mentioned hybrid solutions. With MovieLens there are just the ratings, no content. Did you try any hybrid solution?

Speaker: With this dataset it might not be possible, but with other datasets we could. For example, with something like the Netflix challenge data, we could do more. For content-based filtering, we would have to include word embeddings as well: things like similarity between genres, or other features could be used. But in this case, this is a very straightforward approach to collaborative filtering.

Audience: Okay, thanks. Hi, I have a very similar, connected question, about the different models and when to use which one. In this particular example, where would you use a conventional collaborative model, and where would deep learning be more useful on top? Does it depend on the data, the number of features, and how and why?

Speaker: Okay, so the question is when to use which model. To answer that: it depends; the computation is what matters most. If you have a lot of data, you can typically use a deep neural network. But if you have very, very small data, say only 1,000 records, a deep neural network will likely overfit, so in general you'd try to get more data. When you have lots of data, say a million records like the MovieLens dataset I showed here, you can go for something like a neural network. But classical techniques still work well.
In most cases, classical techniques are good precisely because data is the bottleneck: although a lot of data exists, we often don't have access to it for the computation, so in those cases we stick to the classical approaches and get our results from those. But in general we can do well. There are good papers right now on producing data: generative adversarial networks can generate new data based on earlier data, and there's neural dialogue generation, where a computer could hold a conversation well enough to pass a Turing test. Things like that are possible, but only with data; without data it's not. And even with data you still have to experiment and figure out whether you need a deep neural network, whether the network is good enough: look at the validation error, check whether the training error is going down; if not, run more epochs; if it still isn't, maybe use a deeper network. All of that comes into play. So, I hope I answered the question.

Audience: Hi, Karthik. My name is Shankaram. I have a very simple question. TensorFlow provides a large number of optimizers built in. Is it possible for someone to extend it and implement their own optimizer in TensorFlow? If yes, then because of the distributed runtime underneath, do you need to go into C++ and do more complex work, or is it possible with Python itself?

Speaker: Okay, so the question is whether it's easy to build an optimizer, or some fundamental operation, yourself. To answer that: yes, it is possible.
And all of these are C++ bindings, so for a basic operation, C++ is still the way to go. You can still write it in Python; the problem is that it might not be as efficient as the underlying implementation. Suppose I reimplement SGD, thinking I have a better, improved approach, say something like an Adam optimizer: the built-in code would still perform better in terms of computation and efficiency, because it's an open-source implementation that's continuously improved. So, to answer your question: yes, you would do it in C++, but it's up to you; you can do it in Python if you're okay with the overhead.

Audience: Hi. I'm just wondering, when you see the code, it seems like a kind of black box. Is it possible for us to go deeper, to see how the algorithm works, how it comes up with the training, what computations it went through, and finally how to test it? Maybe you could demo some of that.

Speaker: Yes, it is possible. To understand what a neural network is doing, we typically look at the activations: once we have the trained model, to really go deeper into what it's doing, we inspect the activations themselves. But for a programmer to understand what it's doing, it's better to have the underlying machine learning knowledge first.
All of us here have been programmers at some point, so you can always start, but it's better to understand what the neural network is doing before getting into it. Otherwise you won't understand what a hyperparameter does, what a stride is, what a kernel is; there become too many things to grasp, and things like dropout become too confusing if you don't understand the underlying concepts. If you're starting from scratch, there's a lot happening but also a lot of resources available, so you can still go deeper and understand it; it's not a problem at all.

Audience: What approach would you say is good if I want to add in a manual, seasonal recommendation that I know is not part of my data? Say at Christmas I want to recommend certain movies to a lot of users. Is this something that can be done within the model, or would you recommend something else?

Speaker: In the MovieLens dataset there is a timestamp; so you're asking about using the timestamp to improve the recommendation, right?

Audience: A manual recommendation: I know it's Christmas, and I know a lot of people watch these movies in the Christmas season, and I want that in my recommendations, but I know my data doesn't cover it.

Speaker: Okay. In that case you would still have to retrain, because the model has only seen the data it was trained on. If you have something completely new, it might recommend something completely unrelated, so to answer your question, you might have to retrain on the data completely again.

Audience: Would you recommend doing something outside the model, say an ensemble, if I don't have data to train it, or would you recommend generating data somehow?

Speaker: Generating data somehow is very difficult.
With images it's actually possible: there's the Generative Adversarial Networks paper, and they've shown they can generate new images that are indistinguishable, or nearly indistinguishable, from the original data to human beings. For text, there's a new paper from the Stanford NLP group that generates dialogues: neural dialogue generation using reinforcement learning. GANs are a very interesting topic if you want to generate data yourself. Dialogue has been more or less addressed with neural dialogue generation, but for discrete data, like stocks, or in your case data that simply isn't available, it is still a problem; you cannot directly generate data in that sense.

Audience: Okay. Which is your favorite cheapest-airline-ticket recommendation system? Thanks.

Speaker: Okay, so, the cheapest airline recommendation: there are too many. When I want to book a ticket, I generally don't stick to one; I'd have an alert on Google Flights, on Kayak, on Skyscanner, many things. But in general, I've found Google Flights to be good. Thank you.

Host: Any more questions? We still have a few minutes.

Audience: Okay, I have a simple question about linear regression. Have you actually tried to implement linear regression? Most of the time, we have to decide how many degrees or how many coefficients there are. Is it possible to let the algorithm determine how many coefficients and degrees it has to be, rather than fixing them ourselves?

Speaker: Yeah, things like that.
That's the classic problem: if you have, say, two-dimensional data and you fit three coefficients to linear data, you'd typically be overfitting. What you're talking about is finding the hyperparameters, and there are standard ways of doing that. There's brute force, of course, which is straightforward, but in very high dimensions a brute-force approach may be computationally too expensive and take too long; and there are Bayesian methods. So yes, it's possible: there are approximate methods to find the hyperparameters. The number of coefficients is one example, but there are other parameters as well, for example the batch size, the learning rate, the momentum; those can be learned too. There are algorithms that let us figure out all those hyperparameters. I can maybe share references offline.

Audience: Thanks for sharing. Just a question: you mentioned that you think it's possible to train a hybrid model, right? Do you have implementations of such cases, and where can we find examples?

Speaker: Right off the bat, I don't have an implementation, but if you think about it, like I said earlier, content-based depends on the content, on what sort of content you're looking at. For example, if I'm suggesting images, with a hybrid of images plus collaborative filtering, then I'd have to understand the content of the images. So you have to define the problem before you get into developing a hybrid approach, or any approach, first of all.
But yes, it is possible. I don't have one right now because it would become too computationally expensive to run even on a laptop or a server, and with the type of data I have right now, I don't think I have what's needed for that, but it is possible. Say, in the recommendation problem, we involve multiple sources of data rather than a single source, a hybrid of sources. So you want to do hybrid, but does that necessarily involve manual feature engineering of the content? It's not manual feature engineering per se in that case. Even when you're doing a hybrid approach, say I have movie posters, which are images, what I could do is train a separate neural network that understands the content. I could use something like the Inception network to understand what the image contains: if there are three people in it with a car, things like that. I could extract the content from the image, combine it with, say, a word2vec representation, and train a neural network on that part for the content; and for the collaborative part, I could use the user-item kind of approach, and then combine the two to still do a recommendation. So it's not handcrafted feature engineering in that sense, but you still have to choose your sources. You still need to train a neural network to do that sort of inference, to understand what the content is. If you have an image, for example, you still have to understand what the image contains, right? So you still need a trained model for that, and a trained model for the collaborative part, and you combine the two, like joint training. Then yes, it is possible.
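The hybrid idea described here, a content vector (say, from a pretrained image model) projected into the same latent space as the collaborative embeddings, can be sketched roughly as follows. This is a minimal NumPy illustration with made-up names and dimensions, not the talk's actual implementation; in practice the content vectors would come from a model like Inception, and all of these arrays would be trained jointly.

```python
import numpy as np

rng = np.random.default_rng(0)

n_users, n_items = 5, 4
k = 8             # latent dimension shared by the collaborative part
content_dim = 16  # e.g. an image-feature vector from a pretrained CNN

# Collaborative part: learned user and item embeddings.
user_emb = rng.normal(0, 0.1, (n_users, k))
item_emb = rng.normal(0, 0.1, (n_items, k))

# Content part: a learned projection from content features
# into the same latent space as the collaborative embeddings.
W = rng.normal(0, 0.1, (k, content_dim))
item_content = rng.normal(0, 1.0, (n_items, content_dim))  # precomputed features

def score(u, i):
    # Hybrid item representation: collaborative embedding plus projected content.
    item_vec = item_emb[i] + W @ item_content[i]
    return user_emb[u] @ item_vec

s = score(0, 2)  # relevance score for one user-item pair
```

The squared error between such scores and observed ratings would then be backpropagated through all three learned arrays, which is the "joint training" the answer refers to.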
Do you know whether TensorFlow supports live data streams? By live data streams, you mean images? Live streaming data. Okay. So, it is possible, yes, as long as you're doing it for inference, because in the end what TensorFlow does is initialize the entire graph once, and then it's just a question of the latency of the input, the computation, and the output. If you have a trained model that can do inference fast enough, say an Inception model on a GPU, with images we can actually run faster than real time. And if we can do that with images, then with telemetry data, which is typically just a matrix or a number, it is definitely possible. Yes, it's surely possible. Hi. Basically, your recommendations are only as good as your data, but we know that recommendations change over time; people's preferences change. So does TensorFlow provide discounting, data aging or discounting of your data? Okay. So, TensorFlow itself is only a compute engine, so it's not really about TensorFlow: you could build the same model and get the same results with any deep learning framework or mathematical tool. The point is, to answer your question, you want to include something like a decay, a weighting, for the data. You could still do that. What we basically do is something called incremental training: you have a model that you keep training over time. When you have new data, you initialize the same neural network with the existing weights, the weights you trained earlier, and then you train with the new data.
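The incremental-training idea just described, restore the previously trained weights and continue training on only the new data, looks roughly like this. A minimal sketch with a plain linear model and gradient descent in NumPy; in TensorFlow you would restore a checkpoint rather than pass an array, and the data here is synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)

def sgd_fit(X, y, w, lr=0.1, epochs=200):
    # Plain gradient descent on mean squared error,
    # starting from the given weights w (warm start).
    w = w.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

true_w = np.array([2.0, -1.0])
X_old = rng.normal(size=(100, 2))
y_old = X_old @ true_w + rng.normal(0, 0.05, 100)

# First round: train from a random initialization on the old data.
w = sgd_fit(X_old, y_old, rng.normal(size=2))

# Later: new data arrives. Initialize from the existing weights
# and train only on the new batch, instead of retraining from scratch.
X_new = rng.normal(size=(20, 2))
y_new = X_new @ true_w + rng.normal(0, 0.05, 20)
w = sgd_fit(X_new, y_new, w, epochs=50)
```

The second call is much cheaper than the first because the model starts close to a good solution; a decay or weighting scheme, as the question asked, would simply weight the old and new samples differently in the loss.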
So, what happens is, even if your new data is not as large as what you trained on earlier, you can still train the neural network to improve, given the new data. What do you call this kind of training? It's incremental training. Yeah. I'd like to ask, how do you finally deploy this TensorFlow model, in a Python server using Flask, creating REST APIs? Yeah, so effectively you still have a wrapper, something like Flask. Every time you need an inference, you have these jobs scheduled, and then you run a job for every single inference. Typically, if it's images, you batch them; if you cannot batch, then you still keep the model initialized on the compute node and serve inferences on the schedule you're running, something like a queue server, like RabbitMQ. You basically submit some data to be run through the model, you get the inference back, and you store it separately. Sorry, I'm actually quite new to TensorFlow. What I want to ask is: if we train a model, is it possible to use transfer learning to make another model with similar features? For example, if I have a model for movies and want another model for books, instead of running the whole training on the book data again, can I use the movie model? Yes, it is possible. Transfer learning is one of the best things about deep learning. If you look at images, for example, ILSVRC had close to 1,000 categories and a million images, and you can use a model trained on that to learn completely new categories. From those 1,000 classes, I can do cats versus dogs, or train humans versus cars. Things like that can be done with TensorFlow, yes.
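For the movies-to-books question, one rough way to warm-start is to carry over what is shared between the domains (the user embeddings, i.e. learned user tastes) and re-initialize what is domain-specific (the item embeddings). This is a hypothetical NumPy sketch with made-up sizes; in TensorFlow you would restore only the shared variables from the movie model's checkpoint.

```python
import numpy as np

rng = np.random.default_rng(2)
k = 8  # latent dimension

# Trained movie model (assume these arrays came out of a finished training run).
movie_user_emb = rng.normal(0, 0.1, (100, k))  # 100 users
movie_item_emb = rng.normal(0, 0.1, (500, k))  # 500 movies

# New book model: transfer the user embeddings (same user base,
# already-learned tastes), and initialize the book embeddings fresh.
book_user_emb = movie_user_emb.copy()
book_item_emb = rng.normal(0, 0.1, (300, k))   # 300 books, random init
```

Fine-tuning would then mostly update `book_item_emb`, so the book model converges with far less data than training everything from scratch, which is the same principle as retraining the top of an ImageNet-trained network for new categories.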
In TensorFlow, there is a binary that allows you to do this retraining, so you could do that, yes. Maybe a last question, anyone? Actually, I'm new to TensorFlow and machine learning. I just want to ask: how do you assign the weights? Suppose I get a lot of data from different websites. The thing is, machine learning basically works on numbers, because you train your models by assigning weights and numbers. So how do you assign the weights and numbers? I went to a Google AI project site, and there are a lot of projects working on numbers, like musical instruments, where a piano and some random strings are played; there, they convert all the files into numbers, and based on those numbers it plays. I can't understand how you assign the numbers. So, typically, you don't assign the weights. During the first iteration, when the graph is initialized, the weights are initialized at random. If you look at the first training error here, 2.4647, that's because of the random initialization. From there onwards: the first time, because it's random, the network produces an output that is basically an equal probability over all the classes it's going to predict. Then, once it computes a loss, it backpropagates the error, and that is how the neural network gets refined. So the weights are not initialized to a particular value; they are randomly initialized. If you scroll up here, you can see that the weights use a truncated normal initializer. Let me highlight that. That's the truncated normal initializer; that's how you initialize the weights.
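The truncated normal initializer mentioned here draws from a normal distribution and re-draws any value more than two standard deviations from the mean, which keeps extreme initial weights out of the network; TensorFlow's `tf.truncated_normal_initializer` does this internally. A small NumPy sketch of that behavior:

```python
import numpy as np

def truncated_normal(shape, stddev=0.1, rng=None):
    # Sample from N(0, stddev^2) and resample anything whose magnitude
    # exceeds two standard deviations, mimicking tf.truncated_normal.
    if rng is None:
        rng = np.random.default_rng()
    w = rng.normal(0.0, stddev, size=shape)
    mask = np.abs(w) > 2 * stddev
    while mask.any():
        w[mask] = rng.normal(0.0, stddev, size=mask.sum())
        mask = np.abs(w) > 2 * stddev
    return w

# Example: initialize a 64x32 weight matrix.
W = truncated_normal((64, 32), stddev=0.1, rng=np.random.default_rng(0))
```

Every entry of `W` is guaranteed to lie within two standard deviations of zero, so no single weight starts out disproportionately large.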
So, you don't initialize them to a particular value, because that would completely defeat the purpose. Once you have this, you backpropagate the error, and over time you can see the error actually goes down; that's because the weights are being learned, and the network produces better results over time. Every epoch the loss goes down, and effectively the accuracy improves as well. That's what is happening over here: every iteration, there is a reduction in the loss. And this here is basically the recommendations for the movies. Yes. My main question is: suppose in movies you have a list of names, and then you have ratings from the users. The ratings are numbers, but the list of movies is just strings. So do you assign the strings a random number? No, no. What happens here is, if you look at the data itself, it's basically mapping an ID to a user, an item, and a rating. The item here is a movie, and the user is a particular ID. So, to your question, it's not taking some random number; it's the rating that the user gave for that particular movie. What we are doing is splitting this into two separate data frames, with the rating as another data frame, so that we can use it for the loss. When we use the user and the item to predict the rating, we compute the error and propagate it back. That's what we are doing here. Every single compute step, we are predicting what the rating would have been for this user and that particular movie. That's what we are trying to predict. Yeah. Hi, my question is: there is this device code, CPU, and you are passing this device here and there. Yes. How does this impact the performance? So, the good thing with TensorFlow is that when you're doing all this, it's basically like compiling.
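What the answer describes, randomly initialized user and item embeddings whose dot product predicts the rating, with the error backpropagated to refine both, is the classic matrix-factorization training loop. Here is a self-contained NumPy sketch on synthetic (user, item, rating) triples; the sizes and learning rate are made up, and a real run would use TensorFlow's optimizers over the data frames described above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 20, 15, 4

# Synthetic (user_id, item_id, rating) triples, MovieLens-style.
users = rng.integers(0, n_users, 200)
items = rng.integers(0, n_items, 200)
true_u = rng.normal(0, 1, (n_users, k))
true_v = rng.normal(0, 1, (n_items, k))
ratings = np.sum(true_u[users] * true_v[items], axis=1)

# Randomly initialized embeddings, refined by backpropagating the error.
U = rng.normal(0, 0.1, (n_users, k))
V = rng.normal(0, 0.1, (n_items, k))

def run_epoch(lr=0.02):
    total = 0.0
    for u, i, r in zip(users, items, ratings):
        pred = U[u] @ V[i]        # predicted rating for this user/item pair
        err = pred - r
        total += err ** 2
        U[u] -= lr * err * V[i]   # gradient step on the user embedding
        V[i] -= lr * err * U[u]   # gradient step on the item embedding
    return total / len(ratings)

losses = [run_epoch() for _ in range(100)]
```

The per-epoch loss falls as the embeddings are learned, which is exactly the loss reduction visible in the training output the speaker is pointing at.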
So, in Java, when you do the compilation, it goes through all the variables and figures out if something is wrong. In this case, it's not checking whether something is wrong; it's setting up the compute graph, preparing itself before any data arrives. So effectively, it doesn't impact anything, because everything is set up before you even pass in the data. Once you pass the data, because it already knows what it wants to process and how to move it from one device to another, it should not impact performance at all. Does CPU colon zero mean one core? Are you forcing the program to use one core? Yes, the first core. And if you have multiple GPUs, you can make a particular operation run on a particular GPU. That doesn't need any code change, right? No, no code change. One caveat here is that certain ops, like embeddings, do not work on GPU, so you still have to run those on CPU. If you go through the API, you can figure out which computations can be done on GPU and which on CPU. But yes, in general, it's device agnostic: I'm running this on the CPU, but if I had a GPU, I could just as well have run it on the GPU. Yeah. Thank you, everyone. Let's thank Karthik for this session. Okay, just before we move off, we have a small lucky draw, because I requested some prizes.