Hello everyone, I am Priyanka from Walmart Labs, Bangalore. I work as a senior data scientist with the computational advertising team there, and I'll be talking about how we do user response prediction at scale.

First off, what does an advertising ecosystem look like? There is a user who visits an advertiser's site, say walmart.com, and does different kinds of activities on the site. They could look at item pages, some category pages, maybe add a few items to cart. Then usually the users will eventually go away and start browsing other websites on the internet. At this point, as advertisers, we want to re-engage the user and get them back onto our site. So we take part in auctions and submit bids to these auctions, and these bids are computed at the user level. We do user-level re-targeting because, as advertisers, we do not want to bombard users with ads; we only want to show relevant ads to users who are actually interested in our products.

So how do we go about doing that? This is where response prediction comes in. We want to predict: if shown an impression, will the user actually end up clicking on our ad? Or if the user clicks on our ad, will they actually end up purchasing from our site? These are the click-through rate and the conversion rate respectively. Once we have some estimate of the user's intent, we go about bidding in these auctions. The two key aspects of this problem are the data, which is the real essence of the problem, and the algorithms that will be trained over this data. I'll be talking about all of these components as we go further.

So we have some idea of what response prediction is. Now, how do we formulate this problem? I have a user for whom I want to predict whether they are going to purchase from my site some time from now. The first part is defining the features themselves. For that, we'll have a look-back window, say a 30-day look-back window, and over these 30 days I'll collect all the possible features I can about this user. These could be interaction signals: I know the kind of products the user was interested in, the categories the user was browsing, and maybe also the add-to-carts. User history can be really important too: if I know this is a user who has already purchased from my site, I know they are a loyal customer and might purchase again. Item signals can also matter: for example, if I know I am price competitive on a certain item, then I am the best choice for the user, and that user might actually end up purchasing from my site. Even contextual signals like day of week and time of day can be really important.

So now we've built the user features. The next part is getting the labels themselves. Again, we define a prediction interval: say I want to predict whether the user is going to purchase from my site in the next seven days. If the purchase actually happens, the label is plus one, and if it does not happen, the label is minus one. At the end of this exercise I have a dataset with user features, say X, and labels, say Y, and this is a binary classification setting where the label is either plus one or minus one. So now that I have a dataset, the next problem is to train models on it.
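As a rough illustration of this formulation, here is a minimal sketch of the look-back/prediction-window construction in Spark. This is not the actual Walmart pipeline; the input DataFrames `activity(userId, dt, eventType)` and `purchases(userId, dt)`, the dates, and the aggregate names are all hypothetical, and a SparkSession `spark` is assumed to be in scope.

```scala
import org.apache.spark.sql.functions._
import spark.implicits._   // assumes an existing SparkSession named `spark`

val featureEnd   = to_date(lit("2018-03-01"))   // boundary between look-back and prediction windows
val featureStart = date_sub(featureEnd, 30)     // 30-day look-back window
val labelEnd     = date_add(featureEnd, 7)      // 7-day prediction interval

// Interaction signals aggregated per user over the look-back window.
val features = activity
  .filter($"dt" >= featureStart && $"dt" < featureEnd)
  .groupBy("userId")
  .agg(
    sum(when($"eventType" === "item_view", 1).otherwise(0)).as("itemViews"),
    sum(when($"eventType" === "add_to_cart", 1).otherwise(0)).as("addToCarts"))

// Label: +1 if the user purchases within the prediction interval, -1 otherwise.
val purchasers = purchases
  .filter($"dt" >= featureEnd && $"dt" < labelEnd)
  .select("userId").distinct()
  .withColumn("label", lit(1.0))

val dataset = features
  .join(purchasers, Seq("userId"), "left")
  .na.fill(Map("label" -> -1.0))
```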
This is how the pipeline looks at Walmart right now. It is a pretty standard pipeline. We have lots of inputs coming into a Spark pipeline: these inputs are essentially the site history of the users, attributes of the products the users have been browsing, and also the campaign history. All of this goes into a Spark pipeline, which first aggregates the data, does some preprocessing over it, and adds an ML classifier on top of it. This ML classifier could be, say, XGBoost, random forest, logistic regression, etc. Finally, there is a hyperparameter tuning layer, which essentially ensures that I learn the best model out of the data that I have. Once I have the best model, I just publish that model. (A rough Spark ML sketch of this pipeline appears at the end of this part.)

Why are we using Spark here? Two reasons. First, Spark is a distributed processing platform and we have lots of data, and the two work really well together, so we are able to do all the data aggregation, processing, and modeling really fast. Secondly, Spark allows us to store not just the ML classifier but this whole pipeline as a model, which essentially means that when I take this model from offline to online, I do not have to repeat a lot of these steps. I do not have to repeat the aggregation, the preprocessing, and so on; all of that is already taken care of. Just give raw data to this pipeline and out come the scores.

So now we have a good offline model, and the next logical step is to take it and deploy it online. The most important thing to understand here is that model scores are not equal to bids. Model scores are just probabilities which tell us how likely the user is to purchase from our site, whereas bids are actual dollar values which I will end up paying if I win an impression. So how do we go about converting these model scores to bids? The first step along the way is calibrating the scores, because these scores will be used to compute the bids that finally take part in the auction, which means there is actual money riding on them. We want the predicted probabilities to be as close to the true probabilities as possible. Multiple methods can be used here, for example isotonic regression, Platt scaling, etc.

Now that I have well-calibrated scores, the next part is to scale these scores into the final bids that take part in the auction. Multiple factors play a role here, including what kind of inventory I am bidding for: is it mobile inventory or desktop inventory? Mobile inventory is usually much cheaper than desktop inventory, so for 30 cents you might get a click on mobile but not on desktop. The objective of the campaign is also really important. Say I have been running a revenue campaign to get as much revenue as possible, and suddenly a marketer comes up and tells me they want more footfall in a certain category. So my problem is now to get more footfall in the electronics category while this revenue campaign is also going on. That means I want more clicks in a certain category, which means I will need to increase the bids in that category, but the user intent has not changed: the user scores remain the same and yet I suddenly have to bid more. The scaling function needs to take that into account. Publisher signals can also be really important here.
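Coming back to the Spark pipeline described at the start of this part, here is a minimal sketch of what it could look like with Spark ML. The column names are the hypothetical ones from the previous sketch, the real pipeline has many more preprocessing stages, and Spark ML expects labels in {0, 1}, so the -1/+1 labels would be remapped first.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

// Preprocessing stage: assemble raw feature columns into a single vector.
val assembler = new VectorAssembler()
  .setInputCols(Array("itemViews", "addToCarts"))
  .setOutputCol("features")

// The ML classifier stage (could equally be GBTs, logistic regression, etc.).
val rf = new RandomForestClassifier()
  .setLabelCol("label")
  .setFeaturesCol("features")

val pipeline = new Pipeline().setStages(Array(assembler, rf))

// Hyperparameter tuning layer: pick the best model by AUC via cross-validation.
val grid = new ParamGridBuilder()
  .addGrid(rf.numTrees, Array(100, 300))
  .addGrid(rf.maxDepth, Array(5, 8))
  .build()

val cv = new CrossValidator()
  .setEstimator(pipeline)
  .setEvaluator(new BinaryClassificationEvaluator().setLabelCol("label"))
  .setEstimatorParamMaps(grid)
  .setNumFolds(3)

// The fitted artifact bundles preprocessing + classifier, so online scoring
// takes raw columns directly with no re-implementation of the offline steps.
val model = cv.fit(trainingData)   // trainingData: the labeled dataset above, labels remapped to {0, 1}
model.write.overwrite().save("/models/user-response")
```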
Coming back to publisher signals: if I know a certain inventory or creative has worked well for me in the past, I would like to exploit that information and maybe bid more on that sort of inventory, which I know works well for me. So this is the point where I have been able to build a model which finally gives me bids that can take part in the auction, and not just scores.

What are the other challenges I might face when deploying it online? The first challenge along the way is that we are dealing with real-time processing. As soon as a person does any activity on our site, we want to be able to target them off our site: they might visit any other website, and we want to be there to show our ads to them. So our models cannot act as a bottleneck; all the processing needs to happen really fast. Also, the model we trained was a batch model: all of the data was sitting right there and we were training a model over it. Now, in the online setting, we are just throwing streaming data at this model. Will this integration really work? We have faced issues here. When we deployed random forest for the first time, we saw a lot of delays in the pipeline. When we tried XGBoost for the first time, we saw a lot of memory leaks. We deployed solutions around this and eventually things started working, but this is also a potential point where things can break.

Now over to A/B testing. This is the real holy grail. A lot of models seem to promise quite a bit in offline testing, but a model only works if it proves its worth in A/B testing. You always need to A/B test: have only a segment of the population served by the new model, and if it really works, scale up the percentage of the segment that is going over to the new model. At this point, I would also like to call out a win that we had last year. Walmart ran an A/B test against a really big third-party advertising player. The goal of the campaign was to get as many new customers as possible, and Walmart's in-house campaigns won by a significant margin. This was a really big win for Walmart.

So we've been discussing a very standard binary classification setting, and lots of classifiers already exist for it. What sets our problem apart is the kind of data and domain we are dealing with, so I'm going to share a few nuances of what we've encountered. I'll start with the purchase funnel itself. Millions of users visit our website each day, and most of them just look at the home page, a category page, a search page, and then they go away. These are very light-intent visits: people do not do any other activity, they maybe visit one or two of these pages and then they just go away. Some of them actually end up looking at an item page, which means they have more intent; they are interested in a concrete item. Even then, a lot of them go away. Few of them actually add things to cart, which means they are looking to purchase, and even fewer finally complete the purchase. So if you look at the funnel, as we go down it the data gets sparser and sparser, and this is an important nuance to keep in mind while modeling the problem. Also, how users interact across different devices can be quite different. On desktop, for example, we see that very few people add items to cart, but a lot of them end up converting.
Whereas on mobile, we see a lot of casual browsing traffic: people keep adding things to cart, but very few of them actually end up converting. So the kind of traffic we have across these two devices, even for the same user, is very different, and that is also something to keep in mind while modeling.

This bit is about how the labels themselves should be formed. Are you trying to build a conversion model or a click model? A good conversion model might not be a good click model and vice versa. For example, we've seen that a certain segment of users tend to be really good clickers: if your goal is to get clicks, just go to them and they'll get you so many clicks onto your site, but they do not tend to convert. So you need to be really sure about what it is you're trying to model, and this is a very important design decision that needs to be taken well beforehand.

Also, the setting we've discussed till now seems very ideal for an advertiser. It seems like, as an advertiser, I know everything the user has done on my site, which is not really true, because in a realistic setting users have multiple touch points with an advertiser. Users have multiple devices and multiple browsers through which they interact with an advertiser, and what I see as an advertiser is multiple partial views of a user. The same user might look like an average shopper on desktop but just a casual browser on mobile. Had I known the whole story, that this was the same user across these two devices, I would have known that this user is definitely going to purchase and is just looking to add a few more items to their cart. But because I do not have that information, I do not know that these two users are the same, and what ends up happening is that I have a lot of incomplete data in the system.

There are also a lot of noise sources in the system. Suppose a user has a low connection speed: their device might not be able to send out a few signals, and I might lose out on those signals. Cookie churn is a really big problem too. About 65% of cookies are deleted monthly, which means that even for the same device and user, after a certain point I will not be able to track them. So the data in the system is very noisy and also very incomplete. To put things into perspective, Criteo did a study claiming that about 31% of transactions involve two or more devices. Also, if we look at a user-centric view of activity as compared to a device-centric view, we see about a 40% increase in conversion rates. So incomplete data is a real problem, but only 5% of advertisers have a complete consolidated view of their users; the other 95% do not. There exist some probabilistic methods to stitch user profiles across devices, but complete consolidation remains an open problem.

This brings me to the optimizations we are working on to actually deal with this problem of noise and data incompleteness. The current classifiers used in user response prediction assume that the data is precisely known, which, as we just saw, is not really the case. What we propose is to characterize the uncertainty in the data, and this leads to robust classifiers which are immune to data perturbation. How we characterize this uncertainty is using the principles of robust optimization.
And this results in two algorithms: robust factorization machines and robust field-aware factorization machines. This is a paper accepted at WWW 2018, and my co-author Surbhi Punjabi is also sitting here in the audience. Instead of me just telling you what the solution looks like, we'll build it together today. I'll first discuss the state of the art: what factorization machines are, what field-aware factorization machines are, what robustness really means, and how we incorporate this robustness into these highly expressive algorithms to obtain their robust variants.

So let's start with the state of the art. This is a binary classification setting where we have user features and we want to predict the label as plus one or minus one. For a family of classifiers, the optimization function looks something like this: we are trying to minimize a loss over a vector w that we are trying to learn, and this loss has two components. One is the empirical error and the other is the regularization penalty. The empirical error tries to ensure that the predictions are as close to the true labels as possible, and the regularization penalty regulates the complexity of the classifier; essentially we are trying to learn as simple a classifier as possible. The phi function here is going to be really important. It is a sort of transformation function: it defines how the features interact with each other, and it takes the help of the w vector we are trying to learn. How we define this phi results in many kinds of classifiers, and we'll see a lot of them today. So just keep in mind that this phi function is going to be really important, and we are going to play with it throughout this talk.

Let's start with logistic regression. This is a classical, very famous algorithm, and what it says is: let's learn a vector w of length d, where d is the number of features. The phi function it defines is just a linear interaction of these features, and the interactions are weighed by the vector w we are trying to learn. This is a very good phi function, and it is very scalable because the number of parameters is just of order d, since the w vector is of order d. It is also very interpretable: if you want to understand how important a feature is, just look at its corresponding value in the w vector and you'll have some sense of its importance. The problem with logistic regression is that it does not capture pairwise interaction effects. What that means is, say a certain category is only browsed on mobile and never on desktop; this sort of pairwise feature interaction will never be captured in a logistic regression model.

This is where Poly2 comes in. It says: let's try to capture these pairwise interactions between features, and we'll now use a matrix W for that. This matrix captures the pairwise interactions: for any two features j and k, the importance of their interaction is captured by the (j, k)-th entry of the matrix. Now we are learning on the order of d squared parameters, because this matrix W is of order d squared, and the phi function looks something like this: for all possible feature pairs j and k we take x_j times x_k, and the importance of this interaction is given by the (j, k)-th entry of the matrix. So all is good.
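Written out in my own notation (the exact slides may differ), the pieces described so far look roughly like this:

```latex
% General form: empirical error + regularization penalty
\min_{\mathbf{w}} \; \sum_{i=1}^{n} \ell\big(y_i, \phi(\mathbf{x}_i; \mathbf{w})\big) \;+\; \lambda \, \Omega(\mathbf{w})

% Logistic regression: linear interactions only, O(d) parameters
\phi_{\text{LR}}(\mathbf{x}) = w_0 + \sum_{j=1}^{d} w_j x_j

% Poly2: explicit pairwise interactions via a matrix W, O(d^2) parameters
\phi_{\text{Poly2}}(\mathbf{x}) = w_0 + \sum_{j=1}^{d} w_j x_j + \sum_{j=1}^{d} \sum_{k=j+1}^{d} W_{jk} \, x_j x_k
```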
We are able to capture all of the order-2 interactions and we have a new phi for this. But the problem here is twofold. First, we are trying to learn on the order of d squared parameters. d is usually on the order of millions in response prediction, especially in the advertising domain, so that means we are trying to learn on the order of a million squared parameters, which is definitely not feasible. On top of that, this parameter matrix W is going to be highly sparse.

Let's try to understand why that happens. In advertising response prediction we have lots of categorical variables, and they are called fields. Publisher could be a field, brand is a field, device is a field, and each of these categorical variables can take millions of values. Publisher could be CNN, Vogue, and a million other publishers; brand could be Nike, Adidas, and millions of others; device could be desktop, mobile, iPad, etc. But when we want to use these fields to train models, we first convert them into features by one-hot encoding them. One-hot encoding means that if, for a certain impression, the publisher is say CNN, then the CNN feature gets an entry of one and Vogue and every other publisher get an entry of zero. So we one-hot encode these into features, and CNN would never really co-occur with any other publisher; essentially CNN, which is one feature, doesn't really interact with millions of other features. That means those entries never get learned, and therefore this parameter matrix is going to be really sparse.

This is where factorization machines come in. This was a seminal idea proposed by Steffen Rendle in 2010, and what he says is: let's just learn a latent vector of dimension p per feature, and this latent vector will capture every interaction that this feature can have with any other feature. Let's see how that looks. Now, for each of the d features we have a latent vector of dimension p that we are trying to learn, so because we have d features, we are essentially learning a matrix of order d times p. And if we want to capture the interaction between any two features j and k, how do we go about it? We just take the latent vectors of these features, say v_j corresponding to feature j and v_k corresponding to feature k, and we take the dot product of these vectors, which gives us the interaction between j and k.

A quick primer on the dot product. Say these are the two latent vectors I'm taking the dot product of. I first element-wise multiply the two vectors, which gives me a new vector: 0 times 5 is 0, 1 times 2 is 2, 4 times 2 is 8, and 0 times 8 is again 0. Once I have this new vector, I just sum up all its elements: 0 plus 2 plus 8 plus 0 gives me 10. This is the final weight of the interaction between features j and k.

So just revising: we have on the order of d times p parameters that we are trying to learn, and p here is much, much less than d. If d was on the order of millions, p is just on the order of tens or maybe hundreds. That's it. So we've reduced this parameter matrix by quite a bit, and the feature-feature affinity is now given by the inner product of the latent vectors we are trying to learn. The phi function now looks something like this.
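Roughly, in my notation (the slide's exact form may differ):

```latex
% Factorization machines: pairwise weight = inner product of latent vectors,
% O(d p) parameters, with p much smaller than d
\phi_{\text{FM}}(\mathbf{x}) = w_0 + \sum_{j=1}^{d} w_j x_j
  + \sum_{j=1}^{d} \sum_{k=j+1}^{d} \langle \mathbf{v}_j, \mathbf{v}_k \rangle \, x_j x_k
```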
So we have linear interactions, just like we had in logistic regression, and we also have pairwise interactions, where each pairwise term x_j times x_k is weighed by the inner product of the corresponding latent vectors. This is good: we have a highly expressive model now, and the number of parameters is just of order d times p.

Now let's move to an even more expressive model, which is field-aware factorization machines. Let's recall how the fields work. We had publisher, brand, device, etc. as fields, and we one-hot encoded them to get the features which were finally used for training the models. But when we train models using these features, we forget the fact that a lot of them used to belong to the same field. This is what field-aware factorization machines aim to fix: they keep this information intact. How they do it is by learning a latent vector for each feature and field combination. So instead of just a latent vector per feature, we now have a latent vector per feature-field combination. Let's see how that looks. Say there are q fields and d features; for each feature and field we have a latent vector of dimension p, which means we are essentially trying to learn a parameter tensor of order d times p times q. Now if we want to capture the interaction between any two features, say Nike and Vogue, we again take two latent vectors and take a dot product over them, but these latent vectors are now the one corresponding to Nike and the field of Vogue (Vogue's field is publisher, so Nike and publisher), and the one corresponding to Vogue and the field of Nike (Nike is a brand, so Vogue and brand). Take a dot product and you have the interaction weight between these two features. So on the order of d times p times q parameters are being learned here, and the phi function evolves accordingly: again we have linear interactions and pairwise interactions, it's just that the weight of each pairwise interaction has changed. It is now given by the dot product of the latent vector of one feature with the field of the other feature, and vice versa.

This is an even more powerful algorithm compared to factorization machines; the only problem is that the number of parameters being learned is even larger. But these have become quite popular. Both FMs and FFMs are popular because they've not only won Kaggle competitions but have also done really well in production settings: AdRoll has a blog post and Criteo has a paper about it, which you can go through.

So now that we have some idea about FMs and FFMs, let's try to introduce robustness into these algorithms. Robustness essentially has two key ideas. The first is uncertainty: we need to define the uncertainty associated with data points. The second is redefining the optimization function itself. Let's first define the uncertainty. It looks something like this: say this is how my data points looked earlier and this was the classifier I was learning. When I introduce uncertainty over the data points, that essentially means placing hyper-rectangular manifolds over them, and now each data point can reside anywhere within its hyper-rectangular manifold. So we've defined uncertainty, and we can see what it looks like. Now let's look at what the optimization actually wants to do.
Robust optimization seeks to learn a classifier that is feasible and near-optimal even under the worst-case realization of this uncertainty. What that means shows up in how the optimization problem is framed. Look at the optimization of a general classifier: we have a loss-minimization form, where we are minimizing a loss over a vector w that we are trying to learn; we have seen this form earlier as well. But robust optimization has a minimax form: each data point has an uncertainty associated with it, and we first maximize the loss over the uncertainty and then minimize that worst-case loss with respect to w.

In our paper, we use box-type, or interval, uncertainty, which essentially means the uncertainty of each feature is independent of every other feature: if I have a certain uncertainty over one feature, it doesn't impact the other features at all. If you recall, FMs had linear and pairwise interactions, so we define an uncertainty mu over the linear interactions and an uncertainty sigma over the pairwise interactions, and we introduce these uncertainties into the phi. So now we have a new phi function for robust FM: in the linear interactions we have introduced this linear uncertainty mu, and in the pairwise interactions we have introduced the uncertainty sigma. A new phi essentially means a new algorithm altogether.

Now let's look at the optimization problem. We have the robust minimax formulation, because this is robust optimization, and we introduce the new phi we just defined. So this is the robust optimization formulation for FMs. But most of the solvers we have, like gradient descent and so on, solve only a pure minimization form or a pure maximization form, whereas we have a minimax form. So what we do next is reduce this minimax form to a pure minimization form. We do that by upper-bounding the loss with respect to the uncertainty: we get a worst-case loss in terms of the uncertainty, and the uncertainty terms then go away. Now we just have a normal minimization form, and we can run any solver over it and get the solution. (I'll put a schematic version of this in symbols at the end of this part.) Similarly, for robust field-aware factorization machines, we again define a new phi with linear and pairwise uncertainties, again get a robust minimax formulation, and then reduce it to a pure minimization form. And we have robust field-aware factorization machines.

Now over to the experiments that we ran. We used three real-world datasets, from Criteo and Avazu. The datasets were for click-through rate prediction and conversion rate prediction, and we provide a Spark/Scala-based implementation. The code is open-sourced and the link is available in the paper; you can check it out. The results were very promising: we see a significant reduction in log loss when there is noise in the data. If there is no noise in the data and you still use the robust formulations, we see a slight hit in performance, but in noisy settings it is definitely something you could use. Also, RFMs and RFFMs are generic predictors; they are not restricted to the computational advertising domain, and we've shown this in the paper as well.
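As promised, here is a schematic, in my own notation, of the formulations from this part; f(k) denotes the field of feature k, and the exact uncertainty sets, the modified phi with mu and sigma, and the worst-case upper bound are all spelled out in the WWW 2018 paper.

```latex
% FFM scoring function (the field-aware phi discussed earlier)
\phi_{\text{FFM}}(\mathbf{x}) = w_0 + \sum_{j=1}^{d} w_j x_j
  + \sum_{j=1}^{d} \sum_{k=j+1}^{d} \langle \mathbf{v}_{j, f(k)}, \mathbf{v}_{k, f(j)} \rangle \, x_j x_k

% Standard loss minimization vs. the robust minimax formulation
\min_{\mathbf{w}} \; \mathcal{L}(\mathbf{w}; \mathbf{X}, \mathbf{y})
\quad\longrightarrow\quad
\min_{\mathbf{w}} \; \max_{\tilde{\mathbf{X}} \in \, \mathcal{U}(\mathbf{X})} \; \mathcal{L}(\mathbf{w}; \tilde{\mathbf{X}}, \mathbf{y})

% Box (interval) uncertainty: each feature varies independently within its own interval
\mathcal{U}(\mathbf{x}) = \big\{ \tilde{\mathbf{x}} \;:\; |\tilde{x}_j - x_j| \le \delta_j \;\; \forall j \big\}
```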
Beyond advertising, we've actually run this on a credit card fraud detection dataset as well, and there too we get similar results.

Now over to the key learnings we've had over this period. Firstly, data is supremely important. There are so many layers to it; you just keep peeling them off, and you'll have something more to learn each day and keep improving your model. Secondly, keep your goals really high, but start small: you do not want to be obsolete by the time you've finished a model, and you need to keep iterating, because you learn a lot of things through the process. Many things will actually not work out, which is also a good learning. Also, A/B tests are the real litmus test: a model works only if it has proved its worth in an A/B test. Finally, I think innovation is extremely important, because each of us, in our own small way, is trying to solve a new problem, and if we innovate, we add not only to our own understanding but also to the understanding of the community in general. Thank you.

Let me repeat the question. I had mentioned predicted probability versus true probability; what does that really mean? A model gives out a probability: in classification, the model essentially says how likely a plus 1 or a minus 1 is. That is the predicted probability of the model. Then we have the true probability in the sense that, once we have the distribution of predicted probabilities from the model, what we end up doing is distributing these predicted probabilities into a few buckets, and for each bucket we look at the true probability according to the data we have: for all the samples that fall in a certain bucket, what is the true fraction of 1s and 0s? That is the truth, and we check whether the predicted probability is close to that truth or not. Thank you.

Hello? Yeah. So my question is: you have built a model, for example, after looking at past data. How are you monitoring day-to-day how it is doing? Sometimes the predictions will drift, because there will be noise or new scenarios coming up, because your user is there for some reason and you don't know exactly what is going on. You talked about some uncertainties between variables and things like that, but you have to monitor this day-to-day: new uncertainties come up which were not seen by the model. How are you actually doing this on a regular basis?

So we retrain the models from time to time, exactly because of this: there are new distributions, and we don't know why those distributions are changing. So we keep repeating the modeling exercise from time to time in order to latch on to the new distribution, the new pattern that we are seeing. That is what we do right now.

Do you do it on some kind of monthly basis? Sir, follow-ups offline please. She'll be available.

Yeah. Hello. So can you please go back to the field-aware factorization slide? Yes. So your fields are like publisher, brand, device, right? Right. And your features are like CNN, Vogue, Nike, Adidas. Correct. I only have a small query: you are fixing your number of features to d; your features are always d?
For example, will publisher have d features, brand also d, devices also d? Or do all of these together total to d? So once I have this whole set of features, the total order is d; the summation of the d_i is d, basically. Summation of? The d_i: you have different fields with d_1, d_2, and so on, and their sum is d. Oh, so those are the fields. Yes, there are q fields, so you could maybe call those q_1, q_2, q_3. Yeah. So that is how it's finally done.

Yeah. Hello. So I wanted to understand: you showed results where the algorithm was performing better on data which is shifted, or, let's say, unexpected. So first, how did you identify that this data is noisy or unexpected? How did you differentiate between expected and unexpected data? Right. So we took this data, and for training we took a sample of it and trained both the factorization machines and the robust factorization machines over it, and for testing we introduced some noise into the data. We modeled that noise and looked at how the performance of these two classifiers changes as we increase the noise levels. The detailed experiments are in the paper; I'm not showing them here because that would take too long.

Hi. Thanks for the presentation. I have a question: after the A/B test, how are you incorporating the results to improve the model further? Yeah. So essentially, with an A/B test there is an idea that you test, and you only come to know whether it worked or not. Maybe somebody has better ideas about this, but if it did not work, you then need to go back and figure out what to change; I think that is where intuition plays a part. You need to figure out what would have gone wrong and where you could have improved, and then start again.

Hi. So you have used factorization machines to capture your feature interactions, right? Right. Did you get a chance to experiment, or have you come across any study, on what happens if you replace factorization machines with embeddings or autoencoders? Because they also, in a way, try to do the same thing: instead of linear interactions, they capture more nonlinear interactions. Can you repeat the question? So you are using factorization machines just to capture the interactions among the features. Correct. We could do this using embeddings or autoencoders as well, because these are sparse one-hot vectors, right? You want to capture interactions among them and get lower-dimensional projections, basically. I think they are a bit different in the sense that this finally remains fairly linear: even if there are pairwise interactions, it is still fairly linear, whereas autoencoders or embeddings are not so linear, I'd say. So they might be more expressive in that sense. Please connect with her offline. She'll be available; she'll be attending as well.

Hey, hi. Thank you for the great talk. My question is regarding the offline training. Once you have A/B tested, you are happy with the model, and you have deployed it, you retrain offline maybe once a day, twice a day, whatever frequency you decide on. In that case, do you tune your hyperparameters regularly, or, once you have settled on a set of hyperparameters that you're happy with and that have proven themselves in A/B testing,
do you just let that be, fetch new data, run it through the same model again, and publish it? We retrain with hyperparameter tuning as well. We keep changing the hyperparameters and checking; it's not like the hyperparameters remain fixed. At most, the boundaries of the search range that we're comfortable with stay roughly the same, but yes, we check with different hyperparameters.

Hi Priyanka. Thanks for the talk. I actually have a question about a topic you brought up earlier in the talk, about how you make your model run in near real time. Can I ask three or four fairly abstract questions: do you have pre-computed features? What sort of databases do you keep? Could you tell me a little bit more about that? Yeah, we keep a lot of pre-computed features for that. For example, a lot of the item attributes, and even users' previous history and things like that, are pre-computed and kept in the right places. We have a Spark Streaming solution, so some of the data we just store as files on the same machines across the cluster. For some of the data, we have Cassandra stores: for example, all the user information that arrives in real time is stored in Cassandra. We've defined aggregates in Cassandra itself, so before being pushed into Cassandra, a lot of the aggregation happens as soon as any activity happens; the data is aggregated and then put into Cassandra, so that in real time there are very few computations we have to do for the model itself.

Okay, can we have the people who want to ask questions raise their hands? How many more are there? Okay, there are quite a few more. We're out of time. Let's give a round of applause to Priyanka once again. She'll be available; you can connect with her offline.