Hi, I am Subashish and it is a pleasure to be here for this talk, which to me is really a brief field guide to the very vast discipline that is causality and causal inference. You see, causal questions have fascinated and puzzled us humans in equal measure for over two millennia, right from the early work of the grand old man Aristotle in ancient Greece, some 300-plus years BC, to work happening at the frontiers of statistics and machine learning as we speak. As I am on my own journey down this rabbit hole, my humble attempt in today's session is really to give you a whiff of this discipline, hopefully enough to whet your appetite for further self-exploration. By the way, in case you are wondering why we have this curious cat here, I promise to get to that in the next couple of slides. A quick question to get started with the session, hopefully not too obtrusive for the audience here: how many here are really fond of chocolates, a quick raise of hands? All right, we are all in good company in that case. So it was very interesting for me, at least, to note that chocolate consumption has a very strong interlinkage with, well, Nobel Prize winning chances. Beat that, right? And this is based on a study that came out in the New England Journal of Medicine in 2012. What they do is plot chocolate consumption on the x-axis, across some 22 countries, and Nobel Prize winners per 10 million population on the y-axis, and you note that as chocolate consumption increases, Nobel Prize winners also increase; it is almost a linear pattern, with a correlation coefficient of about 0.8, which, jokes apart, is pretty strong as linear correlations go. Now, how many in this room strongly believe that chocolate consumption can incrementally influence your chances of winning a Nobel Prize? That few? Okay, we have a room full of skeptics, good for science. But why not? Maybe chocolate consumption gets my creative juices flowing, gets me intellectually stimulated. Not convincing enough? Okay, a skeptical audience again. So if there is a fallacy lurking behind the scenes, what do you think is happening? Would anybody like to take a crack at that? Yes, please, sir. Okay, very nice, very nice. All right, we have one more here, go on. Okay, so what is perhaps really happening is that there is a bunch of factors that could affect both of these variables. One obvious one is the economic state of the nation concerned. Citizens of more well-off countries might tend to partake in more luxury goods like chocolates; those countries also tend to have better infrastructure and more investment in education and scientific R&D, which may produce more Nobel laureates. And this lurking factor influencing both variables gives you this very interesting, strange result. But, sorry to disappoint the chocoholics, consuming more chocolate won't get you a Nobel Prize. It is correlation, not causation. This, in fact, is one major reason for such results: a lurking third factor that affects two variables leads to what we call spurious correlation. I am sure you remember your Stats 101 professor telling you that correlation is not always equal to causation, and this is one of the reasons why you get such iffy results. Well, if correlation is not equal to causation, then what is causation, and how do we get to causal effects?
Now, I did not want to come up with a stiff academic definition of sorts, but rather wanted to motivate it using a couple of common causal questions, and I'll focus on one here. Notice what it is trying to get at: it is trying to answer what is the effect of an intervention, the introduction of a mobile app, on a metric of interest, which is churn for an e-tailer. And this is the general anatomy of any causal question: you have an intervention, or a treatment as we would call it in causal analysis, and you are trying to find out its effect on an end metric of interest. If you look at the questions all along, that is the generic framework they always follow. The other thing, and forgive me for playing the devil's advocate here, is that you'd notice these questions come from various walks of life: public policy, marketing, medical sciences, epidemiology, economics and so on. That is because causal questions are really ubiquitous; I am not joking, if you scrutinize carefully, causal questions are lurking in the nooks and crannies of your life and then burst out at you. So wrapping our heads around causal thinking will effectively help you beyond analytics, in your daily life as well. But in terms of analytics, why are we really concerned about causality and causal inference? This is something many of you may be familiar with: the standard analytics maturity model that Gartner came out with, which shows the varying complexity of different types of analysis and the value they can unlock. When you start out, you do something very basic like descriptive analytics, and working solely on correlation will do just fine, even up to the point where you work with predictive analytics. Standard machine learning algorithms do not differentiate between causation and correlation, and yet you get fantastic results, right? The point is that when you move to the final frontier of prescriptive analytics, where you are really trying to prescribe what should be done to move the business needle, what your recommendation should be, that is where there has to be strong feedback from diagnostic analytics, which really is the causal component of it. It tells you why something is happening and to what extent it is happening. Now, to concretize: assume there is a causal inferencing framework, you are running a marketing campaign, and through that framework you understand that email notifications lead to a lift in sales. Once you have this causal wisdom in place, you can actually act on it: you can recommend leveraging email notifications to drive up your sales, and that is the power of causality. Well, I am hoping that gives a broad-brush introduction, in a nutshell, to what causality is and why we need it. Now, causal inference is really what gets you to the deduction of these causal effects. There is more than one analytical framework possible for getting to causal effects; we'll be focusing on just one today, called the potential outcome model, or the PO model. Like any statistical framework, it is based on a bunch of assumptions, and we'll be focusing on one particular assumption today. The more such assumptions are met, the better your chances of getting to robust results.
Now, it is always good to put faces to names, and these gentlemen that you see here, forgive the pixelation, are the pioneers of the potential outcome model, starting from Jerzy Neyman in the 1920s; that is when this entire thing began. It died down a bit and was reinvigorated in the 1970s by Donald Rubin, and in fact this model is alternately called Rubin's causal model in his honor, and then his associates Paul Rosenbaum and Paul Holland took up the mantle. Well, to illustrate the basic essence of the potential outcome model, let's take a relatable example, nothing esoteric, nothing fancy. We have this person who is suffering from a headache and is contemplating popping an aspirin to get relief. So in this instance, his treatment really has two options: either he pops the aspirin, that is, he gets treated, or he does not pop the aspirin, he does not take the treatment. Corresponding to each such treatment option, there are hypothetical potential outcomes. What really happens if he takes an aspirin? Either he gets relief or he does not. Likewise for not popping the aspirin. The realized outcome can only be one of two things: either he gets relief, that is, his headache is gone within the next two hours, say, or he does not, his headache persists beyond the next two hours. He pops the pill, and what we notice is this: the headache is gone. Forgetting causal rigor for a moment, would our conclusion be that the aspirin caused this remission? How many say yes, raise of hands? How many say no? A divided house, all right. So proper causal thinking would probably tell you that we really cannot say. And this is where counterfactual thinking comes into play. The counterfactual in this case is what would have happened had he not popped the aspirin. Now, had he not popped the aspirin and not experienced relief, vis-a-vis popping the aspirin and experiencing relief, then there is a difference in outcomes, and that is when you can possibly say there is a causal effect of taking the aspirin. But herein lies the classic fundamental challenge of causality: you cannot have a person in two states simultaneously at the same time. This is not Schrödinger's cat; you cannot be dead and alive at once. Likewise, you cannot have a person pop and not pop the pill at the same time. And this is neatly illustrated here, I don't know if it is visible, but what we are really trying to show is that we are decomposing the observed outcome. The treatment option is either 0 or 1, as we spoke of, so if you plug in either of them, you'd realize there is only one outcome you observe; the other outcome is unobserved, it is latent. You cannot observe it, you can only get one outcome. So we are stuck. It seems that getting causal reads at the individual level, at least, is an impossible business. What do we do? We do what we always do in statistics: we expand the net and we think about a sample. So we move from individual effects to a sample; we look over a sample and try to get to an average read over that sample. And what would that require us to do? Rather than complicated maths, I want to show this graphically. What we need is a sample of folks who are suffering from headaches: have them pop an aspirin, have them also not pop an aspirin at the same time, and observe the percentage experiencing relief in each scenario.
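Just to make that individual-level decomposition concrete before we scale up, here is a tiny sketch in R with entirely made-up values; the variable names are mine, not from the slide.

    # Potential-outcomes decomposition for one person: the observed outcome is
    # y_obs = treat * y1 + (1 - treat) * y0, and only one of y1 / y0 is ever realised.
    treat <- 1         # 1 = popped the aspirin, 0 = did not
    y1    <- 1         # potential outcome if treated (1 = relief); observed because treat == 1
    y0    <- NA        # potential outcome if untreated; latent, the counterfactual
    y_obs <- if (treat == 1) y1 else y0   # we only ever see one of the two
    ite   <- y1 - y0   # the individual treatment effect is NA: the fundamental problem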
You take the difference of those two relief percentages, and that will be a causal effect. But wow, that is again impossible; remember the fundamental challenge of causality, this cannot happen simultaneously at the same time. What happens in the practical, realistic world is something like this: there will be a bunch of folks who pop the aspirin and a bunch of folks who do not. You get the rates of relief in each and take the difference. But the problem is you cannot really call that causal. And why is that so? Because these two sets are not composed of the same set of people. They are different, heterogeneous; they could differ in terms of age, gender, medical conditions and so on. So even if you observe a difference in the rates of relief, who knows whether it is due to administering the aspirin or due to those pre-existing differences. The one thing I want to drive home here is that when you are trying to get to any kind of causal effect, it will always be a comparison between, in this case, the treated and the non-treated set. This goes by different nomenclature, exposed and non-exposed, treated and non-treated, test and control, and so on, but it will basically be that. Keep this in mind, because it will be relevant in the next couple of slides. So we are back to square one; it seems we really cannot get to causal effects. Well, not so fast. There is something called the randomized control trial, or what we popularly call A-B testing in the industry. What A-B testing does is take the set of folks and randomly split them into two parts. When it does this random splitting, this randomization, what it really ensures is that the folks in these two subsets are roughly similar; they are pretty comparable in terms of their overall characteristics. Which means that now, if you take the percentage experiencing relief in each and compute the difference, you come close to what the ideal state would have given you (there is a small simulation of this in a moment). And A-B tests, in the truest sense, really are the gold standard when it comes to getting to causal reads. But, but, there is a big but: A-B testing is not always possible. Many reasons why. First of all, due to practical considerations: at times you, as a poor analyst, will be tasked with analyzing something for which no A-B test was done. It is a retrospective study; you cannot turn back time, you do not have a time machine, and you have to work with non-experimental data. At times, there will be multiple A-B tests running over the population, which makes it very difficult to tease out the effect of one particular A-B test. At times, the cohorting can be non-random. In this picture we have assumed the cohorting is perfectly random, so there is homogeneity between the two sets, but things can go extremely awry; there have been cases where they have. More importantly, though, A-B testing may be undesirable at times. You cannot randomly assign people to smoke and not smoke to tease out whether they get cancer or not. That is plainly unethical.
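Before we get into the observational world, here is a toy simulation of why that random split levels the field; all the numbers and effect sizes are invented purely for illustration.

    # Randomisation makes the two arms comparable on pre-existing characteristics,
    # so the naive difference in relief rates is (approximately) the causal effect.
    set.seed(42)
    n      <- 10000
    age    <- rnorm(n, mean = 40, sd = 10)              # a pre-existing characteristic
    treat  <- rbinom(n, 1, 0.5)                         # random 50/50 split, independent of age
    relief <- rbinom(n, 1, plogis(-1 + 1.0 * treat))    # aspirin genuinely helps in this toy world
    tapply(age, treat, mean)                            # ~40 in both arms: comparable groups
    mean(relief[treat == 1]) - mean(relief[treat == 0]) # naive difference ~ true effect here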
In practical terms, I was reading a recent article by the head of data science at Coursera, Emily Glassberg Sands I think, and she made a very cogent point. Coursera has these certificates; essentially, if you want a certificate from them, you need to purchase it, you need to pay for it. And Coursera, as a policy, does not do any kind of A-B testing on this pricing. They actually rely on quasi-experimental or observational methods, some of which we will get to cover later, to understand the effect of price differences and what the optimal prices to set could be. Long story short, in many instances you, my dear analyst, will be left with just observational data. And observational data has its own set of pesky challenges. The most prominent one is called confounding. Now, before getting into any technical artifacts, what does the word confounding conjure up in your mind, in plain English? Anybody want to take a stab at that? Confusing? Okay, okay. All right. Yes, confusing, befuddling, obfuscating and so on; something that makes things fuzzy. And confounding here is exactly that. So let me take an example. Assume we are trying to find out the effect of exposure to an online ad for a brand on purchasing the brand online. Now, there is this theory, and it is pretty well established, that the more active a person is online, the more likely he is to get exposed to that ad, and also the more likely he is to buy the product online. So again, you see there is this one variable influencing both the treatment and the outcome. So even if you just want to tease out the sole effect of digital ad exposure on buying online, you really cannot; there is an obfuscation happening. By the way, we have met confounders before: the one we spoke of for Nobel Prize winning, the economic state of the country, was a bit of a confounder. There is this other example from the medical sciences, and it really reinforces the same thing: say you come out with an experimental treatment. What may happen is that the folks who try the treatment are also the sicker ones, so the state of sickness affects both trying out the treatment and, unfortunately, the mortality rate. So again, it is difficult to tease out the effect of trying the treatment on mortality; there is a confounder here in the form of sickness. So, synopsizing what we have thus far: A-B tests are the gold standard, though A-B testing, the randomized control trial, is not always possible. We will be left with observational data at times. Observational data has this principal problem of confounding, which really means there are pre-existing differences between the treated and the non-treated side. I think you get the drift now. Our principal task in causal inferencing, and it is one crucial thing we always have to do, is to level the field. We have to take away these pre-existing differences, mitigate them as much as possible, so as to restore a sense of balance between the treated and the non-treated group, thus making them extremely comparable. That, you know, is the main idea behind causal inference.
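Here is a toy simulation of the online-ad confounder we just described, and a preview of the "leveling the field" idea we will formalize in a moment; the numbers, the 10-hour cut-off and the zero true effect are all invented for illustration.

    # Hours online drives both ad exposure and purchase, while the ad itself has
    # zero true effect in this toy world, so the naive "lift" is pure confounding.
    set.seed(7)
    n       <- 50000
    hours   <- rexp(n, rate = 1/8)                       # weekly hours spent online
    exposed <- rbinom(n, 1, plogis(-3 + 0.3 * hours))    # heavier users see the ad more often
    bought  <- rbinom(n, 1, plogis(-4 + 0.3 * hours))    # ...and buy more, ad or no ad
    mean(bought[exposed == 1]) - mean(bought[exposed == 0])  # naive lift: clearly positive
    heavy <- hours > 10                                      # condition on the confounder
    mean(bought[exposed == 1 & heavy]) -
      mean(bought[exposed == 0 & heavy])                     # shrinks towards zero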
Now, this is what is encapsulated by the major causal assumptions. I only have time to cover one, perhaps the major one, which is called exchangeability, and I will hark back to the example we just talked about. Assume that, via research, you get to know that users who surf beyond 10 hours a week are the ones more susceptible to viewing the digital ad. So what you can do, again getting back to the point that to get to any kind of causal effect you need to compare the treated, in this case the exposed customers, with the non-exposed customers, is to keep only those customers who have been online more than 10 hours in a week. Effectively what you are doing is leveling the field: you are controlling for the difference, you are taking away that source of pre-treatment bias. And that is basically what exchangeability hinges on. It says that provided you know what the confounder is, and here we assume we know it, you condition on it, and you condition on it in such a manner that the treated and the non-treated sets become exchangeable; there is no pre-existing difference between the two sets. That is why we call this exchangeability. One quick thing: here we have assumed this is the only confounder, which may not be the case. It could be many things, gender, geographical location, propensity for the brand, and so on and so forth. The more exhaustive and complete your list of confounders is, the better your chances of satisfying such assumptions and getting to more robust causal reads. Now, such causal principles are brought to life by different techniques for teasing out causal effects. There is a whole bunch of them, well beyond what you see on this slide; I will be focusing, though, on two slightly connected techniques called covariate-based matching and propensity-score matching. And rather than talking about them in theory, I will try to cover them through a simulated case study; hopefully that makes it slightly clearer. Let us define the problem first. There is this retailer who has come out with a same-day delivery program: you order something, food, groceries, et cetera, and it is delivered to your doorstep the same day. Now, the business hypothesis is that customers who subscribe to this program, because of the superior customer experience they get, will tend to become more engaged, more loyal to the retailer, and in the aftermath of subscription will tend to make more purchases; that is the manifestation of the loyalty. The big question is, does this play out, and what is the magnitude of the effect? Now, assume the program was launched in month M and we are at month M plus six at the moment; say it was launched in February and we are in August. M, the month of joining, will be our point of reference throughout this discussion. What we start off with investigating, of course, is who the customers are who subscribed to this program in that given month; there are about 5,000 customers who subscribed. Now, the key question, remember, is whether they are more engaged in the aftermath, and more engaged is always a relative question: are they more engaged with respect to customers who did not subscribe? Engagement here is something we define in terms of certain key retail behavior metrics, and there are three we will use. One is the total number of transactions made in the six-month post period, the second is spend per transaction, and the third is the number of items bought per transaction. So we have these three metrics in the post six-month phase for the converted customer set.
Now, to get to any kind of causal effect, we know by this time that we need to compare the converters and the non-converters, which means we also have a set of customers who did not convert, about 100,000 of them, a random set from the overall population of non-converters, and the same metrics for them in the post six-month period. We also expect that there may very likely be pre-existing differences between the customers who convert and those who do not, which is why we also have the same three variables, transactions, spend and items, for the six months before the program was launched. Overall, then, we have a data set with about 5,000 converters and 100,000 non-converters over these eight-odd variables. What we notice when we drill a bit deeper, trying to understand the differences between the converters and non-converters, is interesting. Converters, again, are those who subscribed to this program in the given month. So this is a density plot, a fancy histogram of sorts if you like, and the one you see here is for the total number of transactions. The curve in green is for the subscribers, the converters; the one in pink is for the non-converters. You will note that, on average, the converters make more transactions than the non-converters. In the pre six-month phase, before this program was launched at all, they made about 35 transactions, whereas customers who did not subscribe made about 30. The same story plays out for spend per transaction: customers who converted tend to spend more than customers who did not, which means there already is a pre-existing bias of sorts; subscribers are typically already among the better customers. We are saying hello to confounding again: the more engaged customers are more likely to subscribe to same-day delivery, and they are also more likely to buy more in the aftermath. So again it becomes very difficult to disentangle the effect of the program on incremental purchases if we do not control for these pre-existing differences, and these pre-existing differences in engagement are manifested in the levels of the three variables we spoke of: transactions, spend and items per transaction. Can I take that question after this? Okay. So the crux of it is: how do we really get to the effect of this program on incremental purchase behavior? This is what covariate matching can help you do. Covariates are the statistical term for what we nowadays call features in machine learning. And the way it works is something like this. Assume we have this one customer who converted, who subscribed in the given month; these are his salient behavioral characteristics, the three variables we have spoken of, in the previous six-month phase. And assume for now that we do not have 100,000 non-converters but just three, again with their salient behavioral characteristics in the previous six months. Now, if we have to control for engagedness in the pre six-month phase, what we need to do is find a customer who is very similar to this person in terms of these three metrics; that way we will have controlled for the difference that already exists.
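For concreteness in the sketches that follow, here is a purely hypothetical stand-in for such a data set; the shapes, the built-in lift of about two transactions, and every number in it are invented.

    # Simulated stand-in: 5,000 converters, 100,000 non-converters, pre-period covariates,
    # a conversion flag, and a post-period transaction count that carries over pre-period
    # behaviour plus a "true" lift of ~2 transactions for subscribers.
    set.seed(123)
    make_pre <- function(n, mu_txn, mu_spend, mu_items) {
      data.frame(pre_txn   = rnorm(n, mu_txn, 8),
                 pre_spend = rnorm(n, mu_spend, 12),
                 pre_items = rnorm(n, mu_items, 2))
    }
    converters    <- cbind(make_pre(5000,   35, 60, 8), converted = 1)
    nonconverters <- cbind(make_pre(100000, 30, 52, 7), converted = 0)
    dat <- rbind(converters, nonconverters)
    dat$post_txn <- dat$pre_txn + rnorm(nrow(dat), 0, 3) + 2 * dat$converted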
One simple way to find such a clone is to calculate the distances between this converted customer and each of the non-converted customers; here we have taken a very simple Euclidean distance. Then you follow a greedy procedure and find the closest clone of this customer, his nearest neighbor. So what happens then? These two customers, one from the converter set and one from the non-converter set, are now very close in terms of these three metrics; they are almost exchangeable, one can swap for the other. We have essentially satisfied exchangeability, and we have controlled for the pre-existing differences. Now, some quick pointers on the finer nuances before we get into the depths of this. One, I think you realize that this relies a lot on the measure of distance. We talked about calculating a Euclidean distance, and that is not always a great thing to do: there could be instances where your variables are on different scales, one could be in dollars, another in inches, one could be height, another something else entirely. It is best to use something that normalizes for that. What is used in the literature, and typically in the industry, is the Mahalanobis distance, which really scales the distance by the variance-covariance matrix. In fact, what is also used at times is the robust variant, robust Mahalanobis distance, which is a bit more resistant to outliers. The other thing is calipers. We talked about finding the distance between a customer in the converted set and one in the non-converted set, but what if that distance is too wide? That is not a good thing, right? Would you really want to include such a customer in your analysis when the match is not good? A caliper essentially is a distance threshold: if the distance for a matched pair is beyond a certain limit, do not include the pair in the analysis. Now, whether you want to include it or not is a fairly subjective question. Come to think of it, if you leave it out, you will probably have more accurate results, less bias that way; but if you leave it out, you also have a smaller sample. So it is a classic bias-variance kind of trade-off that you need to take into consideration. The other thing is that we spoke of one-to-one matching, but there is nothing stopping you from doing one-to-many. In fact, if you do one-to-many, you have more samples and your variance is reduced; but the flip side is that as you move from the nearest neighbor to the second and third nearest, the accuracy of the match reduces slightly. So again, a classic bias-variance kind of trade-off.
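Putting those pieces together, here is a minimal sketch of covariate-based nearest-neighbor matching, assuming the MatchIt package and the hypothetical dat simulated earlier; the column names are mine.

    # Greedy 1:1 nearest-neighbour matching on the pre-period covariates,
    # using Mahalanobis distance instead of raw Euclidean distance.
    library(MatchIt)
    m_out <- matchit(converted ~ pre_txn + pre_spend + pre_items,
                     data     = dat,
                     method   = "nearest",        # greedy nearest-neighbour matching
                     distance = "mahalanobis",    # scales by the variance-covariance matrix
                     ratio    = 1)                # 1-to-1; increase for 1-to-many matching
    matched <- match.data(m_out)                  # each converter plus its closest clone
    summary(m_out)                                # balance report, incl. standardized mean differences
    # A caliper can also be supplied here to drop pairs that end up too far apart.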
Okay, so we spoke of the fact, and we have kept speaking of the fact, that the fundamental notion is that there is a treated and a non-treated set, in this case a converter and a non-converter set, and there are pre-existing differences; there is fundamentally an imbalance between the two sets, and we need to mitigate it. The first step for us, then, is to assess the extent of this imbalance. Let us take an example: assume we are trying to find out the extent of imbalance between transactions for subscribers and transactions for non-subscribers. A classic way to do that would be hypothesis testing, which we are all familiar with from basic inference theory: you test for the equality of means, and so on. The only problem is that the large sample sizes such studies involve, courtesy of the large control group, typically the large group of non-subscribers, often mean that even small differences in means get amplified, and you may end up rejecting the null hypothesis even when it is true. So Type I errors can happen quite rampantly. What is really done in this case is that we resort to a more robust measure called the standardized mean difference. The standardized mean difference does something quite simple: it looks at the two groups, again transactions for subscribers and transactions for non-subscribers, takes the difference in means between the two, and divides that by the square root of the pooled variance of the two groups; so you are standardizing. The other thing you will notice is that it is bereft of any artifact of the sample size, which a hypothesis test statistic typically involves, so it does not suffer from that defect. In general practice, the absolute value, the modulus, of this is taken. Obviously, if it is zero, you have essentially perfect balance. There are some rough rules of thumb from field studies and experimental results: typically, if it is less than 0.1 you are good, do not worry about it; if this metric is more than 0.2, that is when you start worrying, things are not in good shape. And in fact, in our instance, if you look at the three metrics we are concerned about, the SMDs are quite high: they are all above 0.6 and go up to about one for transactions. So it is a bad state. We have now assessed that there is a severe amount of imbalance in these three metrics of interest; there is a pre-existing difference. Now matching can be used to mitigate that. Assume we have implemented matching; then what really happens is interesting. The SMDs actually fall to between 0.02 and 0.03 for each of these variables, a drastic fall from where they were in the pre-matching state. This is the density plot for transactions, and you will note there is now a good amount of overlap between the treated and the non-treated set. So it is fair to say we have brought in a semblance of harmony; we have almost taken away that element of pre-treatment difference. And what we have now is a clone customer for each customer in the subscriber set: like I told you, we have 5,000 subscribers, and one clone from the non-subscriber set for each of them, so 5,000 pairs of customers. Assume that what we are interested in is finding out whether transactions get affected: do we notice any tangible change in engagement in terms of the number of transactions in the six months post subscription? Because that is the thing we are concerned about, what happens in the aftermath of the subscription program. We can do something simple: tabulate the transactions for each customer in each pair, calculate the differences, and take the average of those differences, which is what we have done. The average turns out to be a bit over two transactions. So that is an incremental two transactions over six months for the set that subscribed, over the set that did not, which works out to roughly four incremental transactions over a year.
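Here are minimal sketches of the two calculations just described, the standardized mean difference and the matched-pair lift in post-period transactions, again using the hypothetical dat and the MatchIt output from the earlier sketches.

    # Standardized mean difference for one covariate: |mean_t - mean_c| divided by the
    # square root of the average of the two group variances; note there is no sample
    # size n in it, unlike a t-statistic.
    smd <- function(x, treat) {
      m1 <- mean(x[treat == 1]); m0 <- mean(x[treat == 0])
      v1 <- var(x[treat == 1]);  v0 <- var(x[treat == 0])
      abs(m1 - m0) / sqrt((v1 + v0) / 2)
    }
    smd(dat$pre_txn, dat$converted)          # pre-matching: large (rules of thumb: <0.1 fine, >0.2 worrying)
    smd(matched$pre_txn, matched$converted)  # post-matching: should collapse towards zero

    # Matched-pair estimate of the lift in post-period transactions: with 1:1 matching,
    # the difference of group means over the matched sample equals the average of the
    # within-pair differences; it should land near the ~2 built into the simulated data.
    with(matched, mean(post_txn[converted == 1]) - mean(post_txn[converted == 0]))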
And if that incremental couple of transactions is expanded over a wide customer base, it has quite a bit of an effect; it is quite a number of transactions. You could also do significance testing, though take that with a pinch of salt, p-values have taken a lot of flak of late, legitimately so; it turns out to be quite significant if you run a simple significance test on the differences. We also do the same thing for spend per transaction, and the interesting thing we notice is that there is a very small delta for spend per transaction. So what is probably happening is that customers are more engaged in the sense that they make more transactions, but they are not spending more money per transaction. Still, that means more engagement overall: your overall spend increases, given that your transactions are increasing. Now, all that is fine. What creates a problem for covariate matching is when your feature space explodes, when you are dealing with too many features. First, computing such nearest neighbors becomes computationally expensive. Second, measures of distance themselves become fuzzy in high-dimensional spaces, the curse of dimensionality. So the thought is this: so far we were taking these three-dimensional vectors and calculating distances between them; what if we could instead condense them into a scalar, just a number, and calculate simple differences? Then we would not suffer from such effects. And that is the gist of what we are going into next, called propensity score matching. What propensity score matching really does is predict the chances of a customer being part of the subscriber group, and it produces a score for that. How does it do that? We have data on the customers who joined in the given month, so we have a flag for whether a customer joined or not, a one-zero variable of sorts, and we have predictors, the behavioral variables of interest. You can see this lends itself very well to a classification modeling framework: you can use the predictors to predict the chances of a customer subscribing to the program, and you get a score, a scalar, which you can now use to find the customers who are nearest to each other. We are avoiding calculating multivariate distances entirely. And this is what we do. What I have used here is something very simple, a generalized linear model, a binary logistic regression, on the same data you are by now familiar with. As an output, the model spits out a score between zero and one for each customer's chance of subscribing, and you can again calculate simple differences between customers in the subscriber set and the non-subscriber set, get a close clone, and the same process repeats. A practical tip: given that the probability scores are confined between zero and one, it at times becomes very difficult to find close clones; the variance is small, everything is very closely clustered. So what generally happens in practice is that you transform the score from the probability space to the logit space. The variance is larger there, it ranges from minus infinity to plus infinity, and it is quite easy, and cleaner, to find close clones that way.
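A minimal sketch of that propensity-score route, assuming the Matching package and the hypothetical dat from before; the model specification and column names are illustrative only.

    # 1) Model the chance of subscribing, 2) move the score to the logit scale,
    # 3) nearest-neighbour match on that single number instead of a 3-D distance.
    ps_model <- glm(converted ~ pre_txn + pre_spend + pre_items,
                    data = dat, family = binomial())
    ps       <- predict(ps_model, type = "response")  # propensity score in (0, 1)
    logit_ps <- qlogis(ps)                            # logit transform: spreads the scores out

    library(Matching)
    mm <- Match(Y = dat$post_txn, Tr = dat$converted, X = logit_ps,
                M = 1, estimand = "ATT")              # 1:1 nearest neighbour on the score
    mm$est                                            # matched lift in post-period transactions (~2 here)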
So we follow the same process, and in the aftermath of this exercise we again notice that there is a sense of balance in the data now: the SMDs are in the range of 0.02 to 0.03, similar to what we had noticed for covariate-based matching. We also do the same analysis to understand whether there is a treatment effect, a higher engagement for the customers who subscribed, and I have shown this here only for transactions. You notice again that the delta is about two point something, so it harks back to what we had noticed for covariate-based matching. Now, this is not the only setting where you can use matching; matching has various use cases. It is very often used in the advertising industry, especially for understanding the ROI of online campaigns. It is often used in multi-touch attribution models to understand the ROI of different digital investments. My favorite instance, though, is where Heineken used it to understand the effect of cooler replacements in stores that sell their beer, to see whether it leads to any kind of sales uplift for them. By the way, all the references for these are given here, so do feel free to go through and share. In terms of tooling, the techniques we spoke of primarily come from a statistical school of thought, so no surprise that they are slightly better covered in R. Python has all you need to build this up, but for now it is better covered in R. In R itself, I have primarily used the Matching and MatchIt packages, but there are more options available. I would like to speak about one more thing, called sensitivity analysis, which I could not cover because of the time limit, but I have put a tiny sketch of it a little further down. What really happens is this: once you are done with matching, see, what we have assumed here is that there is this set of confounders, the three variables we spoke of; but there could always be unknown unknowns, confounders we are not aware of, unknown sources of bias, and if you do not account for them, things could change drastically: the magnitude of your results, even the direction of your results, can change. So what may be needed at times is to stress-test your model, to understand how resistant your results are to such hidden biases. That is what sensitivity analysis does, and it is very important to do it, perhaps at the end of the analysis, so definitely go through the materials when you can. By the way, as I alluded to, what we have covered on matching is definitely not all of it; there is much, much more to matching than just this, and there is a companion GitHub repository I have created with more materials, documents, the source code et cetera, and the data set for this. I will share that with you, so please do go through it; hopefully it will help you get started on your journey in matching.
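For a flavor of what that stress test might look like in R, here is a heavily hedged sketch, assuming a Rosenbaum-bounds style sensitivity analysis via the rbounds package; the matched outcome vectors here are made up.

    # Rosenbaum sensitivity analysis: how strong would a hidden confounder have to be
    # (odds ratio Gamma on treatment assignment) before the matched result could be
    # explained away? y_t / y_c are invented post-period transactions for matched pairs.
    library(rbounds)
    y_t <- c(42, 37, 51, 40, 38, 45, 33, 47)   # treated (subscriber) outcomes, invented
    y_c <- c(39, 36, 48, 41, 35, 42, 30, 44)   # their matched controls, invented
    psens(y_t, y_c, Gamma = 2, GammaInc = 0.25)
    # reports p-value bounds for Gamma = 1, 1.25, ..., 2; the Gamma at which the upper
    # bound crosses 0.05 is a rough gauge of robustness to unmeasured confounding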
Now, a quick thing: does this mean that matching is the silver bullet for getting to all our causal effects? No, absolutely not, and there is this very interesting case study that Facebook came out with on exactly that. By the way, this quote is by Cochran, who is one of the founding fathers of modern statistics. So this is the very interesting study that Facebook did. They had about 14 online campaigns that they had run for which they had actually used randomized control trials, which means they had gold-standard reads on the efficacy of those campaigns via RCTs. What they wanted to do was use observational methods and see how closely these gold-standard numbers could be approximated. Now, Facebook has a unique and enormous wealth of data on its users, I think that is no surprise to anyone, and that is one key thing. The second key thing is that, if you go through the paper, and again the reference is given, each of the campaigns covered somewhere between 2 million and 14 million users, which means an enormous sample size and tremendous statistical power for the analysis. And they used a fair number of sophisticated observational methods, even beyond what we covered in the scope of this session; they used both of the ones we did, and more. Despite all that, what they saw is that in about 50% of the instances, the observational methods overstated the efficacy by a factor of three. And why is that so? It is basically because of what we spoke of: there could be unknown unknowns, unobserved sources of bias that were not accounted for, even in such large-scale experiments. So what this is, is a sobering tale; what it tells us is that you need to be very sure about your data-generating process, what is influencing what and how, and be sure about those structures. Well, that was a cautionary tale, but I really wanted to end this session with something I find incredibly hopeful, which is the work happening at the confluence of causal inference and machine learning. There has been a lot of attention of late on the idea that to push the frontiers of deep learning and artificial intelligence, we need to teach machines to differentiate between correlation and causation. Much of it has come from the call to action by Judea Pearl, a founding figure of the probabilistic school of thought in causal reasoning, and his very famous book, The Book of Why, very interesting and definitely worth going through. There was already work happening in bits and pieces at the intersection with machine learning: there is a group that Professor van der Laan leads at UC Berkeley that works on something called targeted learning, which is almost machine learning brought into causal inferencing. Of late, if you have been following the major AI conferences such as ICLR, you will have noticed that along with ethics, governance and security, causality is one of the major themes that has emerged. In fact, at the recent ICLR conference that happened in New Orleans in May this year, there was a very interesting presentation by the well-known researcher Léon Bottou, who works at Facebook. He proposed coming up with a system that tells you what the invariant properties of the overall system are, which means he is trying to get to a measure that calls out spurious correlation and gets to causal effects. It is beyond the scope of this discussion, but I have given the link to it; the way they do it is very interesting. And in general, I think we are headed into very interesting times in causal inferencing and very exciting times ahead overall. Thank you for your time. Thanks, sir. We only have time for one question. All right, yes, sir. You can connect with Subashish offline afterwards. Hi, really nice talk. So in the example you quoted, what would have been the gold standard? I am not clear, because we were still doing A-B testing of a sort, it is just that the A-B testing was in terms of time. So what would have been the gold standard in that case?
An A-B test would perhaps have been that you actually control who gets to subscribe: you essentially randomize between customers, send some customers into the subscription and keep some other customers out of it. So even in that case, we would have to do the matching, right? Even if you are doing that, we would have to do the matching. So what I got is that the flow was: the gold standard is this, and if you cannot follow the gold standard, if you just have observational data, you follow matching, which I don't think is entirely true. Even with the gold standard, you would have to do the matching to ensure that the groups are not divergent, that they are the same, yeah. Is that correct? That's correct. If you can do A-B testing, you don't need to do matching at all. Did I get that question correctly? So I think what I was saying is that even if you are... Can we take it offline? We can't do it here for lack of time. We can take it offline, I can explain that to you, yeah. All right, thank you.