All right, so thanks, everyone. Thanks for coming to see my talk on customer lifetime value. And what better way to start the Customer First stream than finding out exactly how valuable our customers are to us, and why we want to find out how valuable they are?

About me: as Cronin said, I work at Ding. I'm a senior data scientist there. The problems that I mostly deal with are customer lifetime value estimation and churn prediction, and as you'll see, these two problems are quite tightly interlinked. If you ever want to top up your friends' or family's phones from abroad, please go to our website, type in their phone number, and they'll get their top-up.

So I'll start this talk by answering why we care about customer lifetime value and what exactly it is, and then I'll touch on three big papers on this topic. The first one is a paper from 2005 that takes a distribution-fitting approach, and the last two papers use more modern machine learning approaches to try to approximate the lifetime value of customers.

But before we go on, we need to define customer lifetime value. What is it? It's abbreviated as either CLV, CLTV, or LTV, depending on who you ask; I tend to err on the LTV side. So LTV is the net present value of all future cash flows generated by a given customer. Why do we care about this? Because we need to make sure that the acquisition cost of a customer is always lower than their lifetime value. Furthermore, if we spend more money on keeping them, in retention costs, then we need to make sure that those retention costs, added to the acquisition cost, still stay below their customer lifetime value. Because in the end, the total value of a company is ultimately determined by the sum of the individual lifetime values of its customers.

For those of you who aren't acquainted with marketing lingo, there is also the issue of churn. A churned customer is a customer who has stopped using our product. So once a customer has churned, we can safely assume that they won't come back unless we reacquire them.

And why do we actually want to use LTV? Well, customer LTV is not an aggregate metric, and for most companies, different customers generate different amounts of revenue. Sometimes those differences can span orders of magnitude, as we'll see further down the line. And since not all customers are equally profitable, while we might be very happy spending five euros on a customer that generates 50 euros in revenue for us, we'd be less happy spending five euros to acquire a customer that would only spend one euro on our website. We're also generally interested in targeting the best customers, because the cost of retention is generally far lower than the cost of acquisition.

Another issue that we need to deal with is the L, the lifetime, in LTV. Now, quite often it makes more sense not to worry about the amount of revenue that customers will be generating 10 years down the line, unless you're a bank and you're selling mortgages, but mostly to think about the revenue they will generate in the near future and to try to predict that near future as well as possible. That way we can align our LTV definition with marketing cycles or quarterly budget planning and so on and so forth.

There are two main business contexts under which modern online businesses operate. One of them is a contractual setting: for instance, something like Spotify or Netflix, where users pay a monthly fee, generally on the same day of the month, to access some services. And the other one is a non-contractual context: users show up on an online retailer's website whenever they have a need to buy stuff. This presentation will mostly deal with non-contractual settings.

So our first attempt at predicting customer lifetime value was made by Fader, the guy who formulated the definition from earlier on. He managed to predict customer lifetime value using only three features: recency, how recently a customer was on the website; frequency, how many purchases they made in a finite amount of time; and monetary value, the total amount of revenue that they generated on our website.

And they started by looking at the user journeys. For instance, the first user has probably churned. Why? Because they had a quick succession of purchases on our website and then haven't purchased in a long time. The second user is a bit more of a seasonal buyer, so they're most likely not churned, since their last purchase was relatively close to the present day. So how do we incorporate all this information, and how do we manage to extrapolate it to users that are harder to guess, like the ones below?

We start with two assumptions. We assume that the act of purchasing is a Poisson-like arrival process, where the time between consecutive purchases is exponentially distributed. And the second assumption we make is that the probability of a user still being active, not churned, decays exponentially from the time of their last purchase. So if I purchased yesterday, the odds that I'm still active on your website are pretty high; if my last purchase was three years ago, the odds that I have churned are quite high. And from this, we can get the probability of a user being alive and the expected number of transactions that they will make within a holdout period. We can add the monetary value information that we have to that and come up with LTV estimates.
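If you want to try this baseline yourself, here is a minimal sketch using the open-source `lifetimes` Python library, one implementation of the Pareto/NBD model described above; it even ships a summary of the CDNow data set we're about to see. The 39-week holdout horizon is an illustrative choice on my part, not something prescribed by the model:

```python
# A minimal Pareto/NBD fit with the open-source `lifetimes` library.
from lifetimes import ParetoNBDFitter
from lifetimes.datasets import load_cdnow_summary

# One row per customer: frequency (number of repeat purchases), recency
# (customer age at last purchase) and T (total customer age), in weeks.
summary = load_cdnow_summary(index_col=[0])

model = ParetoNBDFitter(penalizer_coef=0.0)
model.fit(summary["frequency"], summary["recency"], summary["T"])

# Probability that each customer is still "alive" (has not churned).
p_alive = model.conditional_probability_alive(
    summary["frequency"], summary["recency"], summary["T"]
)

# Expected number of transactions per customer over a 39-week holdout.
expected_purchases = model.conditional_expected_number_of_purchases_up_to_time(
    39, summary["frequency"], summary["recency"], summary["T"]
)
```

Multiplying the expected transaction counts by a per-customer estimate of spend per transaction then gives the LTV estimates mentioned above.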
Now, the authors of that paper applied their algorithm to a data set from CDNow, a company that's now defunct, and they got pretty good distribution estimates. However, this is tricky: it's relatively easy to fit a distribution and call it a predictor. When they plotted a scatterplot of their predicted versus actual values, the picture is a bit more sobering. You can see that we massively over-predict some low-value users and massively under-predict some high-value users. And this can be quite problematic, given that high-value users really generate the bulk of revenue on most websites. However, they predict better than random, since their correlations are actually quite good.

So why do we care about those high-value users? We all know about the 80-20 rule, where the top x percent of people can generate the bulk of the revenue. These are two independent data sets: one is from CDNow, that defunct company, and one is from a recent Kaggle competition about predicting Google Store revenues. And we see that the Google Store revenue is actually more skewed, so fewer high-value users generate more of the revenue. If only one customer generated all the revenue, then the plot would be a blue rectangle. And what generates this is a Pareto, or more precisely Pareto-like (not an exact Pareto), distribution of customer revenue, which can be quite skewed.

And how do we deal with this? How do we deal with every hard problem? We just break it down into more accessible subdomains, and that should make us think of trees. And indeed, trees are probably the best way of targeting this problem. But instead of using one tree, as Dr. Elder said before, why not use several trees and have them vote on this?

So Groupon published a paper in 2016 where they built a two-stage random forest model. At first, they segmented their users into one-time users, power users, et cetera. And for each of those segments, they predicted which customers would come back: for each customer, they would predict whether they would come back or not, and for those that would come back, they went on to predict their lifetime value. There were 40 hand-crafted features, and the most predictive one was the email engagement score. In the end, they wanted to obtain a picture where they could see the predicted purchase value for the next quarter for a customer versus the previous purchase value for that customer, and see whether users are gaining value, losing value, or at risk of churning, and target marketing communications accordingly.

Their results were quite good. And the thing I want you to take away from that not-very-presentation-friendly table is basically that their errors go down the more active their users are, which intuitively makes sense: the more time a user spends on a website, the more they establish a pattern of usage, and the easier their behavior becomes to predict.
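To make the two-stage idea concrete, here is a rough sketch of how such a model might be wired up in scikit-learn. This is my own illustration of the general approach rather than Groupon's actual pipeline, and the feature matrix and labels are synthetic placeholders standing in for their 40 hand-crafted features:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Synthetic placeholders: one row per customer, 40 behavioural features
# (email engagement, recency, frequency, ...), a "came back" label, and
# next-quarter spend, which is zero for customers who never returned.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 40))
returned = rng.integers(0, 2, size=10_000)
spend = np.where(returned == 1, rng.lognormal(3.0, 1.0, 10_000), 0.0)

# Stage 1: classify whether each customer will come back next quarter.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X, returned)

# Stage 2: regress future spend, trained only on customers who returned.
reg = RandomForestRegressor(n_estimators=200, random_state=0)
reg.fit(X[returned == 1], spend[returned == 1])

# Combine: expected LTV = P(return) * E[spend | return].
ltv_estimate = clf.predict_proba(X)[:, 1] * reg.predict(X)
```

Combining the two stages multiplicatively is one common convention; you could equally hard-threshold the first stage and only score the customers the classifier expects to return, which is closer to the segment-then-predict description in the paper.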
ASOS followed up on this work. And again, Dr. Elder stole some thunder from my presentation when he talked about leaking data from the future. ASOS actually has a diagram, the one in the middle, that beautifully shows the way they create a model that predicts the future: they take a present time slice and create the labels from it, they go backwards in time to create the features, and then, at prediction time, they have their model take features created on the most recently available data and predict the future labels. They also tried to replace their random forest churn model with a neural network. But even though their data hinted that, had they kept adding neurons, they would have beaten the random forest in terms of area under the receiver operating characteristic curve, the cost of adding all those neurons and training those models would just not have been worth it.

So, in conclusion: the shape of the revenue distribution means that, in most cases, trying to predict it is not particularly easy. If you want a good baseline, try Pareto/NBD. Random forest models will probably be your best choice. And neural networks can improve predictions, but that comes at a significant cost. All the papers are available online, and there are libraries for Pareto/NBD as well. Thank you.

Thanks very much, Eric. An excellent illustration of some of the concepts we heard today from, as you said, Dr. Elder and others. How do you manage all the data that comes in from all these online transactions? Is there a large data source that you need to curate and manage at Ding?

It's large enough. I guess we just need to be careful to have it as well labeled as possible. And also, we can subsample it when it becomes overwhelming.

Yeah, yeah. Very good. Sounds like you've got a good handle on churn. Excellent. Lifetime value is a very important concept. Thank you. Thank you very much, Eric. Excellent. Great.