Hello everyone and welcome to another episode of Code Emporium, where we are going to talk about A/B testing concepts. A/B testing is a concept that is largely overlooked in data science job descriptions. When people think of data science, they think all we do is modeling, but we don't actually do modeling all the time, even though sometimes we would like to. Testing is super important, and I thought in this video I could impart some knowledge to you about what A/B testing actually entails in the industry. So whether you are trying to apply for a job, or you're already a data scientist in the industry and want to learn more, or you're just super curious about A/B testing in general, this video is for you. In this video, we are going to talk about five data science concepts that you should probably know.

Before we get started, please do give this video a like, because it helps me spread this video to other people who, just like you, want to learn about testing too. More likes means more impressions, which means more views, which means more knowledge shared, which is great for everyone. And if you like content like this, please do subscribe. We also have a Discord server linked in the description below, so do check it out after this video and join the community of geeks, because we're going to have a lot of fun here.

Starting hot at concept number one: A/B testing itself. That's right. I think the best way to explain A/B testing is to start with an example. Let's say that you're running an e-commerce platform and business looks pretty good. You get a lot of purchases on your site, which is amazing. But then you start doing some data analysis and you see that a lot of the time people add items to a cart but don't actually complete the purchase. You think there's a lot of money that is probably just going out the window because of this, and you would be right in thinking so. But how do we actually solve this problem?

One of your coworkers says, hey, why don't we send emails to people who have added items to their cart but haven't completed their purchases? We can give them a lead time of, say, three days: if they haven't completed a purchase in three days, we'll send them an email saying, hey, you have something in your cart, why don't you finish that transaction, because I think you need this product. Now, they could have just forgotten about the product: they put it in their cart and forgot it. Or they might genuinely have lost interest in the product itself. But you don't know, and that's why we are conducting a test here.

As for how you would conduct this test at a very high level: let's say you have millions of users. You take only a subset of these users for the test and split them into two groups, one group that gets no email at all, and another group that does get the emails. Then we start sending the emails to that one group and compare the two groups' purchase conversions over time. If the purchase conversion of the group that receives emails is indeed greater than that of the group that doesn't, well, you have your win there. You're able to capture that opportunity by simply sending out emails, maybe periodically, or just a couple of days after a person has added something to their cart and left it there.
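To make that concrete, here's a minimal Python sketch of how you might set up such a split. Everything in it is an assumption for illustration: the user IDs, the 10% test sample, the 50/50 split, and the simulated purchase outcomes are all made up, not real platform data.

```python
import random

random.seed(42)  # reproducible illustration

# Hypothetical user IDs; in practice these come from your user database.
all_users = [f"user_{i}" for i in range(1_000_000)]

# Take a subset of the population for the test (10% here, an assumption),
# then split it in half: one group gets no email, the other gets the
# reminder email. random.sample already returns users in random order.
test_users = random.sample(all_users, k=100_000)
no_email_group = test_users[:50_000]   # receives no email
email_group = test_users[50_000:]      # receives the reminder email

# Simulated purchase outcomes, purely for illustration (5% vs. 6%).
purchased = {u: random.random() < 0.05 for u in no_email_group}
purchased.update({u: random.random() < 0.06 for u in email_group})

def conversion_rate(group):
    """Fraction of the group that completed a purchase."""
    return sum(purchased[u] for u in group) / len(group)

print(f"no-email conversion: {conversion_rate(no_email_group):.3%}")
print(f"email conversion:    {conversion_rate(email_group):.3%}")
```

In a real test, the outcomes would come from your purchase logs rather than a random number generator, and you'd compare the two rates with one of the statistical frameworks we'll get to in concepts four and five.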
This kind of A/B test is known as a holdout test, and it is super common on e-commerce platforms and pretty widely successful too. It's called a holdout test because we are holding out from sending emails to a group of users: one of the groups doesn't receive emails at all. It's a very common test, pretty easy to conduct, and it can drive very high value for your company.

This is a good segue into concept number two: SUTVA. SUTVA stands for the Stable Unit Treatment Value Assumption. It's the fundamental assumption that needs to hold true when you are conducting an A/B test. Reading off the Wikipedia page here, it states that the potential outcome or observation on one unit should be unaffected by the particular assignment of treatments to the other units. What this is essentially saying is: let's say I am a user in an A/B test, and I have a buddy Sam who is also a user in the test, and both of us receive emails. My probability of making a purchase should be completely independent of whether or not Sam decides to also make a purchase. In email campaigns where users are randomly selected into a holdout test, this usually isn't much of a concern, because the assumption can generally be taken to hold. But depending on your problem, the situation can be different, and you might want to make sure the assumption actually holds. If it doesn't, the output of your test may be biased in favor of one group or the other. And if it's biased, the results might be inaccurate or just unreliable. So you want to make sure this assumption holds before conducting your test.

Let's go into some more detail on concept number three, which is sampling and distributions. A question you might ask here is: why don't we just send the email to all of our users? Why are we only sending it to a small subset of them? Well, let's say we have millions of users on our platform, and we do send the email. What if that email has no effect? Or even worse, what if the email has a negative effect on users? If we were to send it to that many users, we would observe that negative effect many times over, which can be catastrophic depending on the situation and the problem. So instead, we try to get a subset of the total user base that is representative of all users. This subset is what we call a sample, and the total set of users is the population. Essentially, we are trying to get a good sample that represents the population.

Another term that we've thrown around is distribution, which is essentially a curve describing how probable the different values we might observe for these users or samples are. Sometimes data can be modeled as a normal distribution, or a log-normal distribution, or one of the many other types of distributions out there. Since the population is typically large, we don't know its distribution. So we take a subset of users and analyze their distribution: the sample's distribution. Using the sample's distribution, we conduct experiments, reach verdicts, and make decisions. The idea here is that the sample's distribution is representative of the total population's distribution, so we're able to make appropriate decisions that are representative of the millions of users on your platform. It's typically these samples and sampling distributions that are used in A/B tests.
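As a rough illustration of that idea, here's a short sketch using numpy. The log-normal "population" of a million values is completely made up; the point is just that a decent random sample's distribution tracks the population's.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up population: one value per user, drawn from a log-normal
# distribution purely for illustration.
population = rng.lognormal(mean=3.0, sigma=0.5, size=1_000_000)

# Draw a random sample of 10,000 users from that population.
sample = rng.choice(population, size=10_000, replace=False)

# A representative sample's distribution should track the population's,
# so their summary statistics land close to each other.
print(f"population: mean={population.mean():.2f}, std={population.std():.2f}")
print(f"sample:     mean={sample.mean():.2f}, std={sample.std():.2f}")
```

In a real test you obviously can't "check against the population" like this; the whole point is that you trust a well-drawn sample to stand in for it.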
Coming to our next topic of interest, concept number four: hypothesis testing. Hypothesis testing is essentially a framework for implementing an A/B test. We state a hypothesis, the null hypothesis, and use sample data to try to reject it. In our current e-commerce setup, the null hypothesis would be something like: sending emails does not affect purchase conversion. The way we would typically implement this is to take a subsample of our population, a small group, and split it into two groups. We then have a control group, which does not receive any emails, and a test group, which does receive those emails. We conduct the experiment, monitor purchase conversion for these two groups, and use the results to either reject or fail to reject the initial hypothesis. In the end, we want to reach a verdict on how ridiculous or plausible our initial hypothesis was, and that verdict leads to an actionable decision. A lot of the interview questions I see typically revolve around hypothesis testing, so I do think it's super important and super useful to know.

Coming to topic number five, our final topic: Bayesian testing. Like hypothesis testing, Bayesian testing is another framework for implementing an A/B test. Bayesian testing relies on the famous Bayes rule: P(A | B) = P(B | A) * P(A) / P(B). In this formulation, P is a probability distribution, A is purchase conversion, and B is the sampled, observed data we see when conducting the experiment of comparing users who receive emails and users who don't. The way to break this formulation down is that we have some prior knowledge of what purchase conversion looks like for users who don't receive emails. We then conduct the experiment and update our beliefs about how users behave if they were to receive emails. So we have a prior and a posterior: the prior is our belief before seeing the data, and the posterior is our belief after seeing the data. If you understand these symbols and their inherent meaning as prior and posterior, I think it becomes much easier to remember this formulation without spending too much time on it.

Bayesian testing is something I increasingly use in my work, and it is super fun to know and also very useful. That said, I'd expect only fundamental, foundational questions to come up in an actual interview, and a lot of interview questions typically hinge on hypothesis testing, since that's more widely adopted in academia and, I guess, what students generally learn first. If you're interested in more details on Bayesian testing, I have a couple of videos where I walk through examples of conducting Bayesian tests on actual Kaggle data, and I highly recommend you check those out.
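To tie concepts four and five together, here's a minimal sketch of both frameworks on the same made-up results, using numpy and scipy. The conversion counts are invented for illustration. The first half runs a two-proportion z-test, one common way to implement the hypothesis-testing framework; the second half does a Beta-Binomial update, one common way to implement the Bayesian framework.

```python
import numpy as np
from scipy import stats

# Made-up results: conversions out of 50,000 users per group.
control_conv, control_n = 2500, 50_000   # no email
treat_conv, treat_n = 2800, 50_000       # received email

# --- Hypothesis testing: two-proportion z-test ---
# Null hypothesis: sending emails does not affect purchase conversion.
p_pool = (control_conv + treat_conv) / (control_n + treat_n)
se = np.sqrt(p_pool * (1 - p_pool) * (1 / control_n + 1 / treat_n))
z = (treat_conv / treat_n - control_conv / control_n) / se
p_value = 2 * stats.norm.sf(abs(z))  # two-sided p-value
print(f"z = {z:.2f}, p-value = {p_value:.4f}")

# --- Bayesian testing: Beta-Binomial update ---
# A flat Beta(1, 1) prior updated with the observed data gives a
# Beta(1 + conversions, 1 + non-conversions) posterior per group.
post_control = stats.beta(1 + control_conv, 1 + control_n - control_conv)
post_treat = stats.beta(1 + treat_conv, 1 + treat_n - treat_conv)

# Estimate the probability that the email group's true conversion rate
# is higher by sampling from the two posteriors.
rng = np.random.default_rng(0)
draws_c = post_control.rvs(size=100_000, random_state=rng)
draws_t = post_treat.rvs(size=100_000, random_state=rng)
print(f"P(email beats no email) = {(draws_t > draws_c).mean():.3f}")
```

If the p-value falls below your chosen significance level (commonly 0.05), you'd reject the hypothesis that emails have no effect. The Bayesian side instead gives you a direct statement like "there's roughly a 99% chance the email group converts better," which is often easier to act on.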
And that's all I have for you today. Thank you all so much for watching until the end. Please do give this video a like to spread the word around, and please do subscribe if you like content like this. Please join us on Discord, because we're going to be having so many chats and conversations; I want to get to know you, and you want to get to know me. It's going to be super fun, and I will see you next time. Bye.