I think we can go ahead and get started. Well, good afternoon, everyone. I'm Manav Shraff, from HP Global Analytics, and I have my colleagues Subhashish Mishra and Biswajit Pal along with me to talk about some of the exciting work that we are doing in the customer analytics area and how it is relevant for all of us. So the topic of our discussion for today is "I know what you are going to do next summer." Pretty exciting, right? Can we really know that? To a certain extent, yes. We have taken certain advanced statistical techniques and married them with some important business problems that manifest themselves in the customer intelligence, customer analytics, customer retention kind of domain, where we really try to predict who will purchase from us next and when he will purchase. And by being able to make those predictions, we are able to target our customers better with the right product choices at the right time, with the right offers, and so on and so forth. So the agenda for the next 25 or 30 minutes is this: I'll first talk about the customer analytics paradigm, that is, how does the work that we have done, and it's pretty technical stuff, really fit into the entire business perspective, right? I'll talk about that first, and then I'll hand it over to my colleague Subhashish to talk about some of the techniques around the Bayesian hierarchical model. It gets pretty deep into the techniques that we have used to marry the business problem with an analytical solution, and then how we adapted our original model to be big-data relevant. And then we'll conclude with the next steps we are taking and open up for questions for the audience in the last five to seven minutes, right? So let me just start by showing you some numbers. Research shows that it is six to seven times costlier to acquire a new customer than to retain an existing one. It's not a surprising thing. How many of you would have not expected that? A show of hands, please, right? So, no brainer; mostly everyone knows that. Research also shows that if companies could increase customer retention by an incremental 5% or so, it has a significant impact on overall profitability, to the tune of 25 to 95%. What I'm trying to drive at here is that customer retention is a critical activity for our organizations. It's really imperative to drive profitable growth and create a brand perception in the customer's mind which is everlasting. And so from HP's perspective, we take customer retention as one of our most important activities in the customer intelligence and customer analytics domain. The customer is at the core of everything we do. So what is the customer analytics paradigm at HP? Really, our key goal, if you ask any of us who are customer analytics practitioners, is to minimize customer information asymmetry. What I mean by that is there is an inherent gap between what I know about my customer versus what I want to know, and this gap always exists. But how can I minimize this gap by continuously creating a 360-degree, holistic view of a customer? What we do at HP is we combine all of the customer data, which is customer transaction information and every touchpoint's information with our customer. For example, when a customer calls our service center because he has a problem with a laptop or a printer, we collect that information. If a customer goes to hp.com and browses there, or buys a product there, we have his web browsing information.
If a customer buys a warranty from us, we have that information. And when a customer calls us to say something is not working, we have that information as well. And if a customer tells us what he would like hp.com to do better, we have that information too. So when we integrate all this information and bring it together to create a single view, we really understand what a customer is doing today, right? The next layer that we add to it is that we also try to gather external information about our customers: what they are saying in the social blogs and social media, and reports that we buy from syndicated information providers to understand the market better, right? Once we augment this with our base-level data, which is our internal customer transaction and demographic information and so on and so forth, we really have the ability to answer questions like why a customer is doing what he's doing. And really the crux of the matter is the deep "why" insight, you know: how do we then take all this information together to predict what a customer will do next? When we are able to do that, we are able to have meaningful conversations with our customers rather than random interactions, right? So if I know that someone bought a laptop three years back from HP, he's ready for a refresh in the next two or three months, because the refresh rate of a laptop is about three years. And at that point in time, if I offer him a laptop product, it will probably make a lot of sense to him and resonate with him. But if someone bought a laptop last month and I start sending him offers to buy more laptops this month, do you think that'll make sense, right? So really the imperative is: how do we put so much relevant information about our customers to use that we can make those intelligent decisions on how we interact with them? So everything that we do in the customer analytics, customer intelligence paradigm is with three key goals in mind. Number one, to improve our customer retention and cross-sell metrics. Number two, to improve our marketing effectiveness; we spend millions of dollars on marketing, so how can we generate more sales with less? And third, to create a brand perception about HP in our customers' minds which is beyond reason. Those are the three imperatives that each one of us thinks about as customer analytics practitioners at HP. Having said that, for today's discussion we'll focus on the first piece, which is customer retention, and how we have married the business problem of improving customer retention with some advanced statistical techniques to create a significant business impact in the marketing that we do to our customers. Right, at this point I'll invite my colleague, Subhashish, to talk about the technical aspects, the statistics and the brain power that was used to really marry this business problem with the statistical outcome. Am I audible? Okay, audible, folks? So let me jump right into the heart of the analytical problem here. And to do that, let us consider the typical time points in the transaction history of a consumer. So he, of course, starts off, and this is his first major milestone, with his first purchase, let's say at a time point T0 there, okay? He goes on to make multiple purchases, as exemplified by, say, points T1, T2, et cetera. Now, the last record that we have of him making a purchase is at a time point Tx.
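As a purely hypothetical illustration of that timeline, here is a small R sketch that derives each consumer's first purchase (T0), last purchase (Tx) and number of transactions from a raw transaction log. The data frame and column names are invented for the example; they are not HP's actual schema.

```r
# Hypothetical transaction log: one row per purchase, days counted from the
# start of the observation window
transactions <- data.frame(
  customer_id  = c(1, 1, 1, 2, 2, 3),
  purchase_day = c(10, 95, 300, 40, 180, 220)
)

# Summarise per customer: first purchase (T0), last purchase (Tx), frequency
summary_by_customer <- do.call(rbind, lapply(
  split(transactions, transactions$customer_id),
  function(d) data.frame(customer_id    = d$customer_id[1],
                         first_purchase = min(d$purchase_day),   # T0
                         last_purchase  = max(d$purchase_day),   # Tx
                         frequency      = nrow(d))               # number of purchases
))
summary_by_customer
```

These three per-customer summaries are, as the speakers note later, the only inputs the model needs.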
The exact question that we seek to answer here is: what are his chances, that is, the probability, of making a purchase between Tx and another time point, capital T, in the future? Okay, now, to do that, we start off by modeling some of the more important consumer transaction decisions as probability distributions, okay? Most importantly, we model the number of times that a customer, let's say j, transacts as a Poisson distribution with rate parameter lambda j. So lambda j, like I said, is a rate parameter, which means that if this guy transacts 10 times in a year and my unit of time is days, then his rate would be 10 by 365, okay? Secondly, we model his chances of dropping off after a transaction as a binomial distribution with parameter pj. So for our purposes, his chance of churn, that is, his chance of no longer being an HP customer, is p, okay? And so on and so forth, but these are the two major cornerstone assumptions of the analysis. Having said that, let us assume that we have a good estimate of p and lambda for a particular consumer. In that case, his chances of making a purchase between Tx and the time point in the future, capital T, can be very well represented by the expression that we see there: (1 - p) * (1 - e^(-lambda * (T - Tx))). And how? Okay, let's step back a bit. What we said is that his chance of churn is p, right? So, elementary probability: one minus p gives me the complement of that event, that is, the chance that he's still around, he's still my customer, he's going to make a purchase, right? Now, I assume that the number of times he purchases follows a Poisson distribution with rate parameter lambda j. In that case, let's not work it out here, but his chance of making zero purchases, that he does not purchase at all, is given by the right-hand-side term, e^(-lambda * (T - Tx)), okay? Again, this is the chance of making zero purchases, so one minus that gives me the complement event, that he makes at least one purchase, okay? Which means this entire expression tells me the chances that he has still survived, he's still my loyal customer, and he'll make at least one purchase. Simple tricks, really, all right? In that case, this whole expression might very well be used as the scoring equation for the repeat purchase phenomenon. Essentially, this means that we'll need good representative values of lambda and p for a customer, and we can feed them into this equation and get a repeat purchase score for the time span under consideration. Now, how do we do that? That's what we're going to consider next, okay?
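To make the scoring equation concrete, here is a minimal sketch in R (the tool the presenters mention later). The function name and the example values of p, lambda and the time points are illustrative assumptions, not figures from the talk.

```r
# Minimal sketch of the repeat purchase scoring equation:
# P(repeat purchase in (Tx, T]) = (1 - p) * (1 - exp(-lambda * (T - Tx)))
repeat_purchase_prob <- function(p, lambda, t_x, t_future) {
  # (1 - p): chance the customer has not churned and is still "alive"
  # (1 - exp(-lambda * (t_future - t_x))): chance of at least one purchase
  # from a Poisson process with rate lambda over the interval (t_x, t_future]
  (1 - p) * (1 - exp(-lambda * (t_future - t_x)))
}

# Illustrative example: churn probability 0.3, purchase rate 10 per 365 days,
# last purchase on day 300, scoring horizon up to day 365
repeat_purchase_prob(p = 0.3, lambda = 10 / 365, t_x = 300, t_future = 365)
```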
Now, if we do have these individual-level probability assumptions, like I said, a Poisson distribution for transactions and a binomial for the chances of dropping out, we can formulate what's called the individual-level likelihood function for a particular consumer, and via that, the complete likelihood function for all the consumers, and hence the data itself. Now, this is exactly where the classical paradigm of statistics stops: you choose values of the important parameters so as to maximize what's called the likelihood function, and that gets you the parameters of interest. What we have done further is to exploit the fact that there is heterogeneity across consumers, and that's really the beauty of the Bayesian paradigm. So what we have assumed is that lambda and p vary across the population as a gamma distribution and a beta distribution, respectively. In the Bayesian paradigm that we're talking about, these are called the prior distributions of lambda and p, all right? Plugging these prior distributions into my likelihood function of the data gets me what's called the complete posterior distribution of the parameters of interest, in this case lambda and p. So this is a kind of joint probability distribution of the lambdas and p's that we have considered. In our case, this complete posterior distribution is in a complicated form and is really very difficult to sample from directly. So what we did was use something called a Markov chain Monte Carlo simulation scheme, a Gibbs sampler essentially, which is a simple form of MCMC sampling scheme, to draw representative values of lambda and p. Now, of course, we have used convergence diagnostics to check that the posterior distributions have converged well. And it's also important to point out at this juncture that the necessary data for this analysis is just three very basic transaction variables: the consumer's first purchase time, the consumer's last purchase time, and the number of times he has transacted, nothing beyond that. So that speaks to the power of this approach. Now, once we run the simulation scheme, we get a whole range of lambda and p values; that's what it throws up at the end of the day. What we need, essentially, is to take representative values of lambda and p for a particular consumer. Of course, we can take a mean or a median that gives us some kind of central estimate of lambda and p for that consumer. And once we have that, we can feed it back into the equation on the earlier slide, which gives me the repeat purchase propensity score for the consumer in the time period Tx to T under consideration. Now, this sampling scheme that we had designed, this entire algorithm, was devised in an open source tool. I'm sure many of you are aware of it by this time. It's called R, which is a really cool statistical computing tool: it's free, it has loads of packages, and it can do really cutting-edge analysis. Now, once done with this, we figured that it's a fairly accurate approach. When we used it to identify repeat prospects, it did really well, and it was implemented in open source, which was, again, very cool. But it had a slight problem in terms of the computation time it used to take. You see, the simulations that we are talking about are really very computationally intensive, and they take a whole lot of time to run. To give you a hint of it, it takes about 1,800 minutes to score about a million consumers. Now, for a company like HP, which has databases running into hundreds of millions of customers, that's a problem; it makes this unscalable in some ways. At this juncture, we figured that we needed to modify the approach a bit to make it more suitable for the big data analytics scenario.
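To give a flavor of what such a sampling scheme can look like, here is a heavily simplified R sketch for a single customer. It is not the presenters' actual Gibbs sampler: it uses a plain random-walk Metropolis step, a BG/NBD-style individual likelihood that is merely consistent with the Poisson-purchase and churn-after-purchase assumptions described above, and illustrative prior hyperparameters.

```r
# Simplified sketch (not the talk's exact sampler): posterior draws of lambda
# and p for one customer with x purchases, last purchase at t_x, observed for
# T_cal time units, under Gamma and Beta priors. Hyperparameters are assumed.
log_post <- function(lambda, p, x, t_x, T_cal,
                     r = 1, alpha = 20, a = 1, b = 3) {
  if (lambda <= 0 || p <= 0 || p >= 1) return(-Inf)
  # BG/NBD-style individual likelihood: either still alive at T_cal, or
  # churned immediately after the x-th (last) purchase at t_x
  lik <- (1 - p)^x * lambda^x * exp(-lambda * T_cal) +
    (x > 0) * p * (1 - p)^(x - 1) * lambda^x * exp(-lambda * t_x)
  log(lik) +
    dgamma(lambda, shape = r, rate = alpha, log = TRUE) +
    dbeta(p, shape1 = a, shape2 = b, log = TRUE)
}

mh_draws <- function(x, t_x, T_cal, n_iter = 5000) {
  draws <- matrix(NA_real_, n_iter, 2, dimnames = list(NULL, c("lambda", "p")))
  lambda <- 0.05; p <- 0.2                       # arbitrary starting values
  for (i in seq_len(n_iter)) {
    lam_new <- lambda + rnorm(1, 0, 0.01)        # random-walk proposals;
    p_new   <- p + rnorm(1, 0, 0.05)             # out-of-range values get -Inf
    log_ratio <- log_post(lam_new, p_new, x, t_x, T_cal) -
      log_post(lambda, p, x, t_x, T_cal)
    if (log(runif(1)) < log_ratio) { lambda <- lam_new; p <- p_new }
    draws[i, ] <- c(lambda, p)
  }
  draws
}

draws <- mh_draws(x = 4, t_x = 300, T_cal = 365)  # 4 purchases, last on day 300
colMeans(draws[-(1:1000), ])                      # central estimates after burn-in
```

The posterior means taken after burn-in play the role of the "representative values" of lambda and p mentioned above, which then go into the scoring equation.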
And this is exactly the modification that my colleague, Biswajit, is going to talk about next. So, over to you, Biswajit. Hello. Yeah, hi everybody. So, I will take the last part of the session. So, as Subhashish pointed out, this is one of the drawbacks of the Bayesian hierarchical model: it takes a long time to score a large database. And in our scenario, the database is for around 100 million customers and it's a transactional database, so you can understand how much data it has. So, this is the point where we thought that maybe a good approximation would be a good idea. And by approximation, what we thought is this: a good scoring algorithm has the property that if two customers have the same values for the explanatory variables, they will have a similar score. If we use that kind of scoring algorithm for the approximation, then it should do a good job for us. So, in that scenario, we thought that a sufficiently parameterized regression model might serve our purpose. So, what we built was a regression framework. Now, just to recap, as Subhashish pointed out, in the Bayesian hierarchical model we estimate two parameters, one is the probability of churn, that is P, and the other is the rate of purchase, which is lambda, using three basic transaction variables: the first purchase, the last purchase, and the frequency of purchase. So, we used a similar approach for the regression methodology as well. We built a separate model, one for P and one for lambda, using those same explanatory variables which we had used in the Bayesian hierarchical method. In terms of modeling technique, we started off with ordinary least squares regression. Now, it gave a good fit, but the problem came when we actually started comparing the error between the actual values and the predicted values, and there we did not get the result we were expecting for a good approximation. So, then we moved to a technique called the generalized additive model, or GAM. What GAM does, and how it is different from linear regression, is that GAM captures both the linear as well as the non-linear aspects of the data. The non-linear aspect gets captured by a smooth polynomial function called a spline. Now, when we tested the GAM on our database, the result was really good: the error between the actual and the predicted values was just around one percent, which is what we were expecting for a good approximation. But there was a slight issue, I won't say in terms of execution, but in terms of presenting these equations to the business. The spline function that sits inside the GAM does not come out as a simple explicit equation, so when you present it to the business, it's a little difficult to explain to them. So, what we thought is that because a spline is a polynomial function, if we use a polynomial regression instead of using the output of the GAM, it might give a similar result and it will be much easier to explain to the business. So, we actually went ahead and did a polynomial regression, and it worked in a similar way to the GAM: the error was around one percent. So, essentially, what we did is that we approximated the whole Bayesian method using a polynomial regression equation.
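As a rough illustration of this approximation step, here is a hedged R sketch. It assumes training and scoring data frames (called rfm_train and rfm_all here, names invented for the example) that hold the three transaction variables, with the training sample also carrying the per-customer Bayesian estimate of lambda; the model for P would be built the same way.

```r
# Sketch of the regression-based approximation, under assumed data frames
# rfm_train / rfm_all with columns first_purchase, last_purchase, frequency
# and (for the training sample) lambda_bayes from the hierarchical model.
library(mgcv)

# Generalized additive model: smooth spline terms capture the non-linearity
gam_fit <- gam(lambda_bayes ~ s(first_purchase) + s(last_purchase) + s(frequency),
               data = rfm_train)

# Polynomial regression: an explicit equation that is easier to hand to the
# business; degree 3 is an illustrative choice, not the talk's actual degree
poly_fit <- lm(lambda_bayes ~ poly(first_purchase, 3) + poly(last_purchase, 3) +
                 poly(frequency, 3),
               data = rfm_train)

# Compare approximation error on the training sample
mean(abs(predict(poly_fit) - rfm_train$lambda_bayes))

# Score the full customer base cheaply with the fitted polynomial
rfm_all$lambda_hat <- predict(poly_fit, newdata = rfm_all)
```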
And what are the advantages of it? The immediate advantage was scalability, which was the main issue with the Bayesian model. For scoring 10 million customers, the Bayesian model was taking around three days of time; that got reduced to within an hour in our case using the polynomial regression. And secondly, in terms of accuracy also, when we tested both the models on our internal database, both models capture around 60 percent of the converters in our top three deciles. So, are you aware of the lift chart? How many of you are aware of the lift chart? I'll just explain it. In this lift chart, the diagonal indicates the conversion rate when we do random targeting, and the curves above it are the conversion rate when we use the regression framework or any other kind of modeling technique. So, if you concentrate on the top three deciles, what we are essentially saying is that with random targeting we capture 30 percent here. In comparison, we capture around 60 percent if we use either one of the methods, that is, regression or Bayesian. So, essentially, we are giving a 2x lift to the business in terms of the conversion rate, which is a pretty good accomplishment.
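A minimal sketch of that decile capture calculation, assuming a scored holdout sample with a model score and a 0/1 conversion flag (the column names are invented for the example):

```r
# Rank customers by score, cut into ten deciles, and report what share of the
# actual converters is captured cumulatively by the top deciles
lift_table <- function(score, converted) {
  ord <- order(score, decreasing = TRUE)
  decile <- ceiling(10 * seq_along(ord) / length(ord))
  captured <- tapply(converted[ord], decile, sum)
  data.frame(decile = 1:10,
             converters = as.vector(captured),
             cum_capture_pct = round(100 * cumsum(captured) / sum(converted), 1))
}

# Usage (hypothetical scored holdout): lift_table(holdout$score, holdout$converted)
# Random targeting would capture roughly 30% of converters in the top three
# deciles; a useful model pushes that figure substantially higher.
```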
So, this is what we did in terms of the approximation part. Essentially, the next question that comes is how this whole framework will be used by the business. So, how will the business use it? This is our customer database, where the transaction data lies. What they will do is pick up the three main transaction variables which we use, that is, the timing of the first purchase, the timing of the last purchase, and the frequency of purchase. Then there is the whole repeat purchase model; this will be a sort of black box for them, but in terms of explaining it: we do the Bayesian hierarchical method, we estimate P and lambda, then we use those as inputs to our regression-based approximation, get the polynomial-equation values for P and lambda, plug them into the repeat purchase equation, and get the repeat purchase probability. Now, when we do this for the entire database, we will have a repeat propensity score for the entire database. And after that, marketing can use it in various ways. It can do effective marketing, and what it helps with essentially is that we are targeting the right customers at the right time. So, we are not bombarding emails or direct mail to customers who are not going to purchase from us right now, and this actually leads to a good customer experience. And essentially this whole framework has various advantages. The first one is accuracy. In terms of accuracy, we already mentioned that the top three deciles contain about 65% of the converters. And when the business used it in their marketing campaigns, they got a significant lift in terms of incremental sales. When they used it in last year's holiday season campaign, the incremental sales were about 25% of the total sales, both for the direct mail and the email campaigns. The second thing is scalability. In terms of scalability, currently, with this algorithm we score around 100 million customers in less than half a day's time. So, in four or five hours, we can score 100 million customers. And lastly, the diversity. If you recollect, our whole algorithm was based on three basic transaction variables: first purchase, last purchase, and frequency of purchase. These are three basic variables which are available to any marketer, across any industry. So, anybody can use this algorithm in order to score their customers and get the repeat purchase propensity. And also, this particular model can be used with various other models to solve various important marketing problems, and one of them we are currently trying is customer lifetime value. So, what are we doing, and what is customer lifetime value? Right now, we have assigned a propensity score to each customer. But if we can actually tie a monetary value to each customer, that will help the business segment them into two or three groups and then target them accordingly. So, what we are doing here: currently, we have the repeat propensity score with us, so we know which customers have the likelihood to convert. We also have the rate of purchase, which is an intermediate output. So, if we can devise an algorithm that will actually predict what the next most likely product will be, then we can actually score the customer lifetime value with it, and that's what we are trying. So, to wrap up the whole session, what we did here is that we used some advanced analytical methods, and also some simple ones as well, in order to solve a critical business problem and then generate some incremental revenue for the business. And I think that's all for the session, and now we'll be opening up for questions. Just one or two questions. Yeah, thanks for the presentation. My question is about the slide that you showed where you said it's about 26% holiday incremental sales. I would like to understand, from the statistical analysis that you did, how was the attribution for this 26% done? Did the business come back and tell you, look, I have in principle taken and implemented your statistical analysis and therefore this is the result, or is it derived from your end? That's it. So, what we typically do, you know, like when a campaign goes out, it has various segments in the target. A few segments are picked up from our repeat purchase model, and for a few they use some business rules. What we typically measure is that the business-rule segments we consider as a no-model kind of scenario; basically, these are the lists for which the business hasn't used our model, and then we basically generate a test-and-control kind of scenario here and then calculate the incremental sales. Yeah. It's the standard champion-challenger methodology. Standard champion-challenger? I have a question which is very typical when we do Bayesian analysis or Bayesian hierarchical modeling. You've shown the posterior probability and the prior probability. What about the cases, which is very true for product companies like HP or electronics companies, which bring out a very new product which was not there in the market till yesterday, and there is no prior probability of people buying it or liking it or not liking it? So, that event has never occurred; this particular product, like a particular kind of gadget, has never been purchased before. So, how have you taken care of that kind of scenario? Typically, we have not included such products in this case. These are products that have been there, and it's just modeling some of the consumer decisions, more like a purchase. We have not modeled the introduction of a product in the Bayesian framework out here. So, for some of the new product introductions, we have a slightly different type of modeling technique that we use. So, this only captures information where we have existing products, and typically when we launch a new product, it is typically within a particular category. For example, it's a high-price category, right? And we basically launch a product within a high-price category. So, from some of the past transactions we'll know exactly who are more likely to buy our high-price products.
That will solve some piece of the problem, but we also have some very specific models to target the pieces where we have new product introductions. So, Subhashish, this question is on the hierarchical Bayesian model you presented. One assumption with the standard Poisson is IID, right? I mean, independent and identically distributed. And I assume this model is used on a lot of HP products, right, from toners to printers and everything of that sort. The non-identical part is explained when you say different customers are given different lambdas and things like that. But there is a time series aspect to it. Correct. And that assumption is not really laid out. For example, you're treating every time period similarly. So, that's an issue. So, are you trying to say that the observations across the past have to be autocorrelated in some way, that they have to have a correlation structure? Is that what you're trying to say? Yeah. So, that time aspect was something that we had not considered. But then, in the repeat purchase phenomenon itself, you know, what we're getting at is a kind of distribution, a kind of function of this past pattern. Where is the mic? Can you hear me? Yeah. So, essentially, we have not considered the autocorrelation patterns as such. But then, the posteriors from which we finally get the sample are a function of these past transaction behaviors, right? So, in that way, I suppose it's getting reflected. But explicitly, as in autocorrelation functions inside the posterior, no, that has not been done. Okay. So, my question is that you have considered three variables, which are pretty simple, like, you know, the last purchase, the first purchase, and the number of purchases. But is it so simple? Because there are things like seasonality, and, you know, you have a product ecosystem and there are internal dependencies, like, I don't buy a cartridge unless and until I buy a printer. So, how do you incorporate those dependencies into your model? Can you repeat that again? So, one way, perhaps, is to specifically devise a prior in some way, devise a prior so as to incorporate that. But I suppose it'll be pretty complicated. That's one thing that we are working on, to incorporate that in the prior. Another way, if you really want to incorporate these variables, would be to maybe build a fully parameterized model. So, just maybe build something like a regression-like model, let's say, because these are variables, right? Essentially, something like a logistic regression or survival modeling, things like that, a parametric model, a traditional, classical-approach model sort of a thing. That could take care of it. Having said that, what you have seen is that by way of this Bayesian approach, using only these three variables, we get a decent enough result. Yes, I'm sure that the more enriched your model is in terms of information, the better result you'll get. And if you can make it strong enough to incorporate these variables into the prior distribution or into the framework, I'm sure we'll get far better lifts that way. Can I hear you? Correct. No, so the final posteriors that we get, the lambdas and Ps that we talked about. I'm talking about the final campaign, the selection of, say, 100 customers. So you're saying, while targeting, do we use those things?
No, actually, you know, typically in our targeting scenario, suppose the business wants to target, say, 100 customers. It's not that all the customers are targeted using our model. So suppose 50 go through our model and 50 they hold out for, say, some business rule or something like that. And at the end of the day, they compare both the segments and they finally work out how much incremental lift is coming. And that's the way it is designed. We'll break for tea right now and be back at 4 p.m. Thank you, guys.