Hello and welcome back to the lecture series. In this lesson, we're going to talk about the more traditional statistical side of hypothesis testing. Normally, when you look at a statistics textbook, it won't teach you how to conduct these randomization procedures; instead, it'll focus on the statistical theory. So the first part of this lecture will focus on significance.

We're here in Google Colab, and in order to explore significance, we're going to look at a deck of cards. We're going to use a proportion: our proportion is the number of red cards divided by the total number of cards. In theory, it should be 50/50, since a standard deck of cards is half red, half black. Our alternative hypothesis is that it's more than 50%, in effect that we've got a deck that isn't fair.

So we're going to go ahead and get started. I'm going to type in x = np.array, and this is how we're going to create an array. Before we get started, I have a deck of cards here, and I'm just going to draw from the top and record whether each card is red or black. The first card is a five of diamonds; that is red. Second card, six of diamonds, also red. Six of hearts. Nine of diamonds. We need to have a comma there. Nine of hearts. And we'll do one more: four of diamonds. We'll go ahead and stop there, and we're going to use this data to look at how significance levels and significance testing can change based on the data that we have.

With that, we can go ahead and get started. We have this data here that I drew out of my deck of cards, and the first six draws we had were all red data points. So we can go ahead and set up our hypothesis test. This is a single-proportion test, so we need to know the sample size, which is just the length of our x array, and we need to know how many iterations we're going to do, which is capital N.
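A minimal sketch of this setup in code might look like the following. The variable names x, n, and N follow the lecture; recording each draw as the string "R" for red (versus "B" for black) is my assumption about how the data was typed in:

```python
import numpy as np

# The six recorded draws, all red ("R" = red, "B" = black).
x = np.array(["R", "R", "R", "R", "R", "R"])

n = len(x)   # sample size (little n)
N = 10_000   # number of simulation iterations (capital N)

print(n, N)
```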
Then we need to conduct the binomial simulation, where we specify our sample size, our null proportion, and the number of iterations. So we can run this... it looks like it's still connecting. I need to run my libraries here: we've got NumPy, pandas, plotnine, and, later in this lecture series, we'll get into scipy.stats. So run this. This sets up our binomial data set, and then we need to actually calculate our proportions. We can say p hat is just the counts divided by the sample size. And our data p hat comes from the actual data set: we take the length of x where x equals "R", and then we divide that by little n. So we've got our data here.

As we have done in previous lectures, we can go ahead and visualize this. We can create a data frame, so we say pd.DataFrame of just our sampling distribution of proportions, and I'll go ahead and rename the column to just say p hat. Then we can do our ggplot. We're going to plot our p hat data frame, and we're going to do a dot plot, so geom_dotplot. We have our aes, where we just need to provide the x value, which is p hat, and we can set the dot size to something relatively small. Then we can add our vertical line, where the x intercept is set to data p hat, the color is set to blue, and the line type is set to dashed. And then we can run that.

We can look at this, and we can already see that the majority of our data is certainly below our sample statistic, where 100% of our draws came up red. So we can already sort of guess that our p value is going to be quite low, but let's see how low it actually is. We can print the p value, and this is just the length of the p hat data frame where the p hat column is greater than or equal to our sample proportion, divided by the number of iterations that we did.
And so we can see this p value is 0.021, so we reject the null hypothesis in favor of the alternative that the proportion of reds is actually higher than 50%.

Well, let's say, for example, that I only did this three or four times; say we just drew three cards. We can go ahead and rerun all of these cells, and we can see how our proportions are spreading out; they're not as normally distributed as we would expect. Then we run this, and suddenly our p value is greater than 0.05, which means we fail to reject the null hypothesis; we can't conclude that more than 50% of the deck is red.

If I go back here and we continue to draw cards: I just drew a ten of diamonds, an eight of hearts, and then a queen of hearts. So now we have even more reds; we're up to nine reds drawn. Run through all of this, and we can see it's becoming a little bit more normally distributed in the plot. Now our p value is 0: 100% of the simulated data is below our sample statistic. We've got as low as we can go, which provides really strong evidence that the deck I'm drawing from is not actually a fair deck; there are many more reds than we would get in a fair, well-shuffled deck. And if we continue to increase the number of draws, and they continue to be red, this p value will stay pinned at zero.

This is really good evidence of how the significance of our hypothesis test can change depending on how large your data set is, but also on where you set your significance level. If we go back to when we had six red draws and run through these, our p value is 0.016, which at a typical significance level of 0.05 would lead us to reject the null hypothesis. But sometimes people will use a 99% confidence level, which corresponds to a significance level of 0.01, and if we followed that significance level, we would fail to reject the null hypothesis, just barely.
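To see the sample-size effect numerically, we can re-run the same one-sided simulation at a few sample sizes, assuming, as in the lecture, that every draw came up red, so the observed proportion is 1.0 in each case. This is a sketch, not the lecture's exact notebook; the simulated p-values will vary slightly from run to run:

```python
import numpy as np

N = 100_000  # more iterations than the lecture, to stabilize the estimates

# Assume every draw was red, so the observed proportion is 1.0 at each n.
# Exact one-sided p-values are 0.5**n: 0.125, ~0.0156, ~0.00195.
results = {}
for n in [3, 6, 9]:
    counts = np.random.binomial(n, 0.5, N)   # fair-deck simulation
    results[n] = np.mean(counts / n >= 1.0)
    print(f"n = {n}: p-value = {results[n]:.4f}")
```

With three all-red draws the p-value sits around 0.125, well above any common significance level; with six it straddles the 0.05 and 0.01 cutoffs; with nine it is effectively zero, matching the behavior described above.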
And so this is why determining that significance level ahead of time is really important: sticking to it and acknowledging which level you're working at, but also making sure that you get a large enough sample size so that you feel confident in your final statistical conclusion.