Alright, hello everyone, and welcome to episode 100 of Code Emporium. I officially have 100 videos, including the five shorts I posted, so it's been a long road, and we finally hit 100. To commemorate it — nothing crazy — we're doing a video on Bayesian A/B testing. I thought I'd get another A/B testing video out there; it's been almost two years since my last one, and Bayesian testing is cool, so why not make a video on it? In this video I'll start from the basics, like how we segregate data into control and treatment groups, and then segue into why Bayesian testing specifically is so useful, using a real dataset and a lot of code.

We'll start with the experiment definition, the way you would if you were actually working at a company. I have an A/B test dataset from Kaggle that doesn't really come with a description, so I'm making one up and we'll work from there. Let's say you work at an e-commerce platform with a home page, and you and your software engineers decide to change it a bit to see if you can boost purchase conversion, so users are more likely to purchase products. Your control group gets the old web page, and your treatment group gets the new one. Typically in an experiment we make sure the control and treatment users are segregated from each other: users in the control only ever see the old page, and users in the treatment only ever see the new page you designed.

The metric we want to track here is purchase conversion: the number of conversions divided by the number of exposed users. An exposed user is a user who sees your old or new web page for the first time; that's the moment they enter the experiment and are bucketed as either treatment or control, and it happens during the experiment window. Of those users, how many actually made a purchase? Say that during the experiment, 10 users were shown the old web page and three of them made a purchase within seven days. That means the purchase conversion rate is 3 out of 10, or 30%. That's the main metric we'll use to decide whether or not to ship the new web page.

Now, some general tips I've listed on what to think about before even starting the test. First, how do you think the experiment will fare? My hunch is that since we've only changed the web page, and purchase conversion is a very downstream metric, chances are we won't move the needle much just by changing a couple of things on the home page. The second question: will we have actionable results?
That is, suppose we do find that purchase conversion is higher for the treatment group than for the control group. Should we ship it? And what about the situation my hunch points to, where we don't see much of a difference at all? What do we do then? What's an acceptable tolerance at which we say "go with the new web page," or "don't go with the new page," or pursue some third avenue in between? Try to think through all of those contingencies and scenarios, and communicate them with the stakeholders overseeing the test with you.

Now, I've put this whole experiment definition together as a story on my own — like I mentioned, all I really have is a Kaggle dataset. But in reality you'd want to do all the steps I just described. For the purposes of this video, though, the experiment has already been run and the data collected. So what does the data look like? I'm reading the Kaggle dataset in as a CSV and printing out a couple of rows. One row corresponds to one exposed user. For example, a row might say a user was bucketed into the treatment group at a given timestamp, and because they were bucketed into the treatment group, they were shown the new page we created. That timestamp becomes the exposure timestamp. The converted column is a binary variable: it's initialized to zero, because at the moment you're exposed you haven't made a purchase yet, and if the user makes a purchase within seven days of their exposure timestamp, the zero is flipped to a one to mark them as converted. That's how the whole dataset was built, and I've explained it in text as well — everything I'm saying here is also written up as notes in the notebook I'll share, so if you can't follow along you can replay the video with the notebook open.

Looking at some big-picture stats: we have about 290,000 users in this experiment, collected over the course of roughly three weeks, comparing the old page (the control) against the new page (the treatment). The two groups are split 50/50, so half the users see the old page and half see the new page. Simple enough.
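As a rough sketch of that first look at the data — assuming the common ab_data.csv file name and column layout (user_id, timestamp, group, landing_page, converted), which may differ slightly from the exact notebook — it might go something like this:

```python
import pandas as pd

# Load the Kaggle A/B test data; one row per exposed user
df = pd.read_csv("ab_data.csv", parse_dates=["timestamp"])

print(df.head(2))                      # inspect a couple of rows
print(len(df))                         # ~290,000 exposures in this dataset
print(df["group"].value_counts())      # roughly a 50/50 control/treatment split
```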
Now, there are some cleanup operations we need to perform on this data first, in the data-processing phase. For example, user 746755 was bucketed into both groups — they saw both the new page and the old page — and that's not what we want to see; it's just weird data. One option is to keep only each such user's first exposure; another is to remove these users altogether. If you go with the first option, you can still end up with a user bucketed into control yet shown the new page, which doesn't make sense either. When I looked at the data, the number of users bucketed into both treatment and control was 3,894 out of almost 300,000 — about 1% of users. If it's only 1%, dropping them isn't going to hurt anybody, and we have plenty of data anyway, so I simply remove them, which is what the next bit of code does.

In the following cell, I also want to simulate how you'd actually watch this test while running it. We've been handed three or four weeks of data all at once, but you're not going to set up a test, ignore it for the whole duration, and only look at the end — you need to monitor how it's going. So I add a week column, with weekly cutoffs after weeks one, two, three, and four, so we can filter later and see what results we'd have seen at each point during the experiment. With that column added: week one has about 83,000 exposed users, weeks two and three about 91,000 each, and week four about 20,000, because the data stops partway through week four.
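A minimal sketch of that cleanup and the week column might look like the following. The exact column names, and the idea of deriving weeks from the earliest timestamp, are my assumptions rather than necessarily how the notebook does it:

```python
# Flag rows where the bucket and the page shown disagree (control shown the
# new page, treatment shown the old page), plus users who appear in both
# buckets (e.g. user 746755) -- together about 1% of the data -- and drop them.
mismatched = (
    ((df["group"] == "control") & (df["landing_page"] == "new_page"))
    | ((df["group"] == "treatment") & (df["landing_page"] == "old_page"))
)
multi_bucketed = df.groupby("user_id")["group"].transform("nunique") > 1
df = df[~(mismatched | multi_bucketed)].copy()

# Tag every exposure with its experiment week so we can "replay" the test
# week by week instead of only reading it at the end.
start = df["timestamp"].min()
df["week"] = (df["timestamp"] - start).dt.days // 7 + 1
print(df["week"].value_counts().sort_index())   # ~83k, 91k, 91k, 20k exposures
```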
Now we get to the nitty-gritty, meaty part: the actual experiment itself. We'll toggle between the frequentist approach and the Bayesian approach for deciding whether the differences we see in our metric are significant.

Let me walk through the frequentist approach first. I compute the purchase conversion statistics for both groups over all four weeks: 11.8% for the treatment group and 12.0% for the control group, a small lift of about 0.14% in favor of the control. To make a decision the frequentist way, we want a p-value. Purchase conversion is built from a categorical variable — each user either converted or didn't — so we can use something like the chi-squared test of independence. The null hypothesis is that bucket (control vs. treatment) and conversion are independent; the alternative is that they are not. Already, to me, whether I reject or fail to reject that null hypothesis, the next step and even the interpretation aren't very clear. Suppose we do reject the null: bucket and conversion are not independent. There's no clearly actionable next step in that statement. With a lot of these frequentist approaches, the hypotheses we end up with are either weak or just not that relevant to the actual task at hand, which makes them hard to explain to someone else and hard to internalize ourselves. But I'll do it anyway, for the sake of comparison.

For the chi-squared test we need a contingency table: a small 2×2 matrix of how many users converted and how many didn't, in each of the treatment and control groups. From that table we can compare the expected counts against the observed counts, and compute the chi-squared value (the test statistic) and the p-value. We get a p-value of 0.228. That's above the 0.05 level of significance we typically use in A/B tests, so we cannot reject the null hypothesis. What it's literally saying is: there is a 22.8% probability that a chi-squared statistic more extreme than the 1.128 we observed would occur purely by chance. You have to internalize that statement, process it, and it's still muddy what we should do next. Instead of these arbitrary hypotheses, it would be much nicer to have statements that directly reference the metric at hand, purchase conversion: I want to know how confident we are that the 0.14% difference we're seeing is a real difference. That's the kind of probability we're intuitively accustomed to, but it's not something the frequentist approach gives us — and it's exactly what the Bayesian approach will.
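Continuing with the cleaned dataframe from above, here's roughly what that looks like as a sketch using scipy's chi2_contingency (the printed values are from the transcript's run of the notebook):

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Conversion rates per bucket over the full experiment
print(df.groupby("group")["converted"].mean())   # ~0.120 control vs. ~0.118 treatment

# 2x2 contingency table: converted vs. not converted, for control vs. treatment
contingency = pd.crosstab(df["group"], df["converted"])

# Plain Pearson chi-squared (no Yates correction) for the independence hypothesis
chi2, p_value, dof, expected = chi2_contingency(contingency, correction=False)
print(chi2, p_value)   # transcript's run: chi2 ~ 1.128, p ~ 0.228 -> cannot reject H0 at 0.05
```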
I put a little figure together to outline what's going on. In the frequentist approach, we run the experiment with control and treatment buckets and a hypothesis — in this case the chi-squared hypothesis of independence — and in the end we get two scalar values, a test statistic and a p-value, on the basis of which we either reject or fail to reject the null hypothesis. Which, again, when you explain it to a stakeholder, they'll ask, "what does that mean?" The Bayesian way of testing is quite different, because we're not dealing with just two point estimates: the output of a Bayesian test is two distributions. It follows the logic of Bayes' theorem. We start with a prior distribution, which captures how we believe the metric behaves before the experiment; then we perform the experiment to better inform that prior, injecting the information we observed into it; and we come out with a posterior distribution for each of control and treatment. Those are full distributions, and with two distributions you can do whatever you want. In particular, we can make statements like "we are X% confident that the lift is Y%," which is a far more actionable and interpretable statement — for yourself, for stakeholders, for anyone around you — than anything phrased with reference to a null hypothesis.

So let's take the Bayesian approach. First, we need a prior. Where does it come from? I'm using pre-experiment data: we have about four weeks of data, so I determine the priors from the first week and then treat weeks two through four as the actual experiment. That's just the easiest way to get prior information here; at an actual company you wouldn't need to burn a week of the test, because you'd already have historical data to use as a prior. So I extract just the first week of control-group data and use those rows as prior samples. The prior is a distribution of the metric we care about — purchase conversion rate, a number between zero and one, conversions over exposures. To build it, I repeatedly take a random sample of 1,000 rows from the prior data and compute the mean of the converted column (its zeros and ones), which effectively gives the conversion rate for that sample. Repeating this 10,000 times gives a list of 10,000 sampled purchase conversion rates. From those points we create a distribution, and I'm choosing to model it as a beta distribution: I fit the 10,000 points to a beta. Why a beta? Because the prior has to be a distribution of the metric, and the metric is probability-like — purchase conversion lives between zero and one — and that kind of curve is exactly what the beta distribution describes. When fitting, I pin the location to zero and the scale to one, which keeps the support on [0, 1]; it also converges faster, and scipy was throwing warnings otherwise. The fit gives us the prior parameters, alpha-zero and beta-zero.
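Here's a sketch of that prior construction. The resampling details — sampling with replacement, and reading the "converges faster, no warnings" remark as scipy's floc=0/fscale=1 arguments — are my interpretation of what the notebook describes, so treat them as assumptions:

```python
from scipy import stats

# Week 1 of control data stands in for the historical, pre-experiment data
prior_data = df[(df["week"] == 1) & (df["group"] == "control")]

# Bootstrap 10,000 plausible conversion rates: each is the mean of `converted`
# over a random sample of 1,000 exposures
prior_samples = [prior_data["converted"].sample(1000, replace=True).mean()
                 for _ in range(10_000)]

# Fit a beta distribution to those rates; fixing loc=0 and scale=1 pins the
# support to [0, 1] (and avoids the convergence warnings mentioned above)
prior_alpha, prior_beta, _, _ = stats.beta.fit(prior_samples, floc=0, fscale=1)
print(prior_alpha, prior_beta)   # these are the alpha-zero and beta-zero priors
```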
To get a feel for the beta distribution, there's a really nice blog post linked at the bottom of the notebook that explains it with baseball. Briefly: suppose you want to model players' batting rates. The number of at-bats is the denominator and the number of hits is the numerator, so you get a hit rate — much like purchase conversion here. As a prior, you'd fit a beta distribution to the average hit rates across all players. Then, when a particular player comes up to bat, they accumulate some hits and some misses, and you use those hits and misses to update your beliefs, which yields a new beta distribution: the posterior. If your prior parameters are alpha-zero and beta-zero, determined from pre-experiment data, you update alpha with the hits and beta with the misses, because alpha is proportional to the number of hits and beta is proportional to the number of misses. The math behind this formulation is worked out in that blog post if you want to dig deeper. In the context of our experiment, alpha corresponds to the exposed users who converted, beta to the exposed users who did not convert, and the counts we'd have gathered during the experiment play the role of the hits and misses from the game. That's how we update our beliefs.

So: we've fitted our beta distribution and we have our prior. Now let's actually conduct the experiment for some time — say, just the first week of it. In the code, num_weeks = 2 means we've run through week two, and since week one only built the prior, that's one week of actual experiment. These are the results we see: purchase conversion is 11.8% for the treatment and 11.9% for the control, a tiny negative difference of some hundredths of a percent. What we do now is update our prior beliefs with these observations: alpha gets updated with the conversions, and beta with the misses, the non-conversions.
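In code, the conjugate update is just two additions per group. The num_weeks convention and the variable names here are mine, sketched to match the walkthrough:

```python
from scipy import stats

num_weeks = 2   # run through week 2; week 1 only built the prior
experiment = df[(df["week"] > 1) & (df["week"] <= num_weeks)]

posteriors = {}
for group in ["control", "treatment"]:
    rows = experiment[experiment["group"] == group]
    conversions = rows["converted"].sum()     # the "hits"
    misses = len(rows) - conversions          # the "misses" (non-conversions)
    # Beta-binomial conjugacy: alpha grows with hits, beta with misses
    posteriors[group] = stats.beta(prior_alpha + conversions,
                                   prior_beta + misses)
```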
With the parameters updated — and since alpha and beta are exactly the parameters of a beta distribution — we create new distributions from them: the posteriors. The posterior is essentially the prior with adjusted values, and we now have two posterior distributions, one for treatment and one for control. With two distributions in hand, we can randomly sample values from each to compare them. Using rvs (random variate sampling), I draw a thousand samples from the control beta and a thousand from the treatment beta, and check in how many cases the treatment draw is greater than the control draw. The mean of that indicator is a probability: the number of samples where treatment beats control, divided by the total number of samples, which is essentially the probability that the treatment is greater than the control. Here we get a value of about 30%. So at this point, the probability that the treatment is ahead — by that tiny difference of 0.009 — is about 30%; it behaves something like a confidence level. I'm also plotting the mean and variance of the posteriors so we can see what the distributions look like. In a moment we'll change num_weeks from 2 to 3 to 4 so you can see how the experiment performs week over week; pay attention to how the variance changes, and how this probability changes.
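Continuing from the posteriors above, a sketch of that comparison — including the kind of threshold question that comes up later, like the probability of at least a 2% relative lift:

```python
# Monte Carlo comparison of the two posteriors
control_samples = posteriors["control"].rvs(1000)
treatment_samples = posteriors["treatment"].rvs(1000)

# Fraction of draws where treatment beats control = P(treatment > control)
p_treatment_better = (treatment_samples > control_samples).mean()
print(p_treatment_better)        # ~0.30 after one week in the transcript's run

# With full distributions, bespoke questions are one line away, e.g.
# P(relative lift over control is at least 2%)
lift = (treatment_samples - control_samples) / control_samples
print((lift > 0.02).mean())
```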
So that was one week of the experiment. Now let's run it for two weeks, meaning num_weeks is 3 and it covers weeks one through three. Rerunning the cells, we get updated numbers — still not very different — and recreate, or rather update, the posteriors. The probability that the treatment beats the control has now dropped from about 30% to about 18%. In other words, we're a little over 80% confident that the control is greater than the treatment, by that same minuscule amount of about 0.01. The more extreme this probability gets, the better for you and your experiment: we simply have more data and we're better informed. What's happening underneath is that the two posterior distributions for control and treatment are getting more defined and more distinct. You can see it because when I reran the cell, the variance dropped by half, from two-point-something to one-point-something. Just one more week of data made us noticeably more confident in our results, because we're now sampling from two distributions that are more distinct from each other.

So even though the difference we're seeing is very small, we're fairly confident in what the experiment is telling us — confident enough that you could almost call the experiment here, since we're becoming quite sure the control beats the treatment. But let's see what happens if we run it for another week. What do you think happens? We said the control keeps looking stronger, so we should expect the treatment's probability to keep going down — and down it goes, even lower. We're now about 85% confident that the control is greater than the treatment, by a difference of about 0.012%. And the variance? It should decrease again, and it does. The distributions keep getting narrower, so it becomes even easier to distinguish them, which tells you there's real power in this experiment. Better yet, we're quantifying these probabilities with respect to the lift itself, not with respect to some arbitrary hypothesis we set up. And we can make other statements too, like the probability of at least a 2% lift — as in the sketch above — in case you have a specific benchmark threshold above which you'd replace the web page and below which you wouldn't. You can do all sorts of things with distributions.

As a closing thought, here are the two advantages of Bayesian over frequentist testing that I think matter most. First, the results are more interpretable — not only for you to look at and understand, but also to explain to stakeholders, compared to "we rejected the null hypothesis at the 0.05 level of statistical significance," which lands really poorly. Second, we can interpret results at any point during the experiment. With hypothesis testing, you typically have to wait for statistical significance, and peeking at results mid-experiment is problematic because they lose their meaning. With Bayesian testing, as we just saw while stepping num_weeks from 2 to 3 to 4, you can read off this probability — a confidence level, in essence — at any point in time, so you always have valid results to interpret. Those are the major advantages I see of Bayesian over frequentist. I'm still fairly new to this myself; I'm trying to use it at work when I run tests, whether to check that my workflows are behaving as expected or to genuinely detect a significant boost. But I hope you keep all these tips in mind.
Thank you so much for the support — 100 videos is amazing, and we're almost at 50,000 subscribers. If you haven't subscribed, please do, and please hit that like button. I'll share this resource in the description down below. If you have any comments or questions, or if certain things were just unclear — I know it's a slightly shaky concept even for me — feel free to throw them in the comments below. I might not respond to all of them, but I definitely read all of them. Thank you so much again, and I will see you all in the next one. See ya. Bye bye.