Hi, in this video, we're going to go over how to do a hypothesis test for a single proportion using randomization distributions in Python. We're going to continue with the Texas cold snap example from the power crisis in 2021, and we're still going to focus on just wind, because this is a single proportion, so we have to look at a single sample. The proportion we're going to look at is the proportion of time that wind generation fell below the forecast peak capacity. Recall that this forecast peak capacity really reflects the demand on the system, so we're looking at the proportion of time that, essentially, supply was not meeting demand during this crisis. Again, just for wind here.

Okay, so let's get to it. Our hypotheses for this single proportion: the null is that the forecast peak capacity was met at least half the time, and the alternative is that the proportion of time below the peak capacity is greater than 0.5. That is, more than half the time, wind was not meeting that capacity level. So that's our failure criterion in the alternative. We're testing to see whether we reject the null or fail to reject it; again, we never accept the alternative hypothesis.

So let's start off. The first thing to do, always, in step one, is to calculate our sample statistic. What is our actual proportion from the data? So let's calculate wind p-hat. Our p-hat from the wind sample is the number of elements in our wind vector that fall below the capacity, divided by little n, little n being the sample size. Recall that in the hypothesis test for the single mean, we extracted the generation for wind into a single vector called wind, so this wind object is a series of the generation in gigawatts. We calculated n back in the hypothesis test for the single mean as well.
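The sample-statistic step just described might look like the sketch below. This is not the lesson's actual code: the real `wind` series comes from the Texas cold snap dataset, so a made-up stand-in is used here, and the 6.1 GW capacity value is taken from the transcript.

```python
import numpy as np
import pandas as pd

# Stand-in for the wind generation series (GW) extracted in the
# single-mean video; the real data come from the Texas cold snap dataset.
rng = np.random.default_rng(42)
wind = pd.Series(rng.uniform(3.0, 8.0, size=500))

capacity = 6.1                            # forecast peak capacity (GW)
n = len(wind)                             # little n: the sample size
wind_p_hat = (wind < capacity).sum() / n  # proportion of time below capacity
print(wind_p_hat)
```

With the real data, this prints the 0.64 discussed next; with the stand-in series it will be a different proportion.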
So n is effectively the length of wind, and wind is the generation series for just the wind source. Let's see what our sample statistic is. Print this out: 0.64. So 64% of the time, the actual generation fell below the forecasted peak capacity of 6.1 gigawatts. Then the question is: well, is that a lot? Is that statistically significantly different from half the time? That's what this hypothesis test is going to answer.

For a randomization distribution, this is going to be a bit simpler than what we did with the mean. We're not going to use random choice. Instead, our key function here is going to be np.random.binomial. We're going to use the binomial distribution to simulate conditions under the null hypothesis. If the null hypothesis were true, then half the time the generation would be at or above 6.1, and half the time below; that's essentially what's going on in the null. In any given sample those proportions might come out a little different, but let's see.

How can we simulate from the binomial distribution? Well, the binomial distribution is essentially doing a series of coin flips. So we can flip a coin at every time step during our two-day period of the Texas cold snap, and if it's heads, we'll say that wind falls below the capacity; if it's tails, it falls above. That's essentially what's going on with this binomial distribution. So let's put this into action in Python. We'll establish our little n as the length of wind; we already have that established above, but we'll reproduce it here. And again, we can always specify a capital N; we'll keep it the same as what we had for the hypothesis test of the single mean. What np.random.binomial is going to do is return the counts of successes, so it's going to return, essentially, the numerator in this proportion.
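The coin-flip simulation just described can be sketched as follows. The values of `n` and `N` here are hypothetical stand-ins; in the lesson, `n` comes from the wind series and `N` was set in the single-mean video.

```python
import numpy as np

np.random.seed(0)  # for reproducibility; the lesson doesn't set a seed

n = 500    # hypothetical little n: number of observations in the sample
N = 1000   # capital N: number of randomized trials

# Under the null, each observation falls below capacity with probability 0.5,
# like a fair coin flip. np.random.binomial returns the number of "successes"
# (times below capacity) out of n flips, repeated N times.
counts = np.random.binomial(n, 0.5, size=N)
print(counts[:5])
```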
That is, the number of times that wind falls below 6.1. What we need to give it is our sample size, since we want to do n different flips, or trials; our proportion, which is going to be 0.5, our null value, taken straight from the null hypothesis up here; and our size, which is going to be N. So we're going to do little n trials, capital N number of times. Let's go ahead and run that, and we'll see what's contained in counts. We see that we have a collection of a thousand counts, a thousand numbers of successes. And if we want to get our p-hats from these, our simulated p-hat values will just be the counts divided by, in this case, little n. So there it is in proportion format. These are our p-hats. What we're looking at right now, this series of numbers, is our randomization distribution.

Let's visualize this. We'll set a p-hat data frame equal to pd.DataFrame; we're just going to transform p-hat into a data frame here and give it a column name, which we'll also call p-hat. Then we'll use our good friend ggplot to visualize this. I'm going to use a different visualization than a histogram; we did a histogram last time for the single mean. Let's do a dot plot now and see what it looks like. I'll explain why I'm using a dot plot here once we take a look at it. And as we had before, let's also put our sample statistic, wind p-hat, as a vertical dashed line at that x-intercept value; the color again will be red, and the line type can be dashed. So there we go. There's our randomization distribution. This is the dot plot: instead of having bars, we have a collection of dots stacked on top of one another.
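Putting the simulated p-hats and the dot plot together might look like the sketch below. The plotting uses plotnine (a Python ggplot implementation, which the lesson appears to be using); since it may not be installed everywhere, the plot step is guarded. The sample size, binwidth, and the 0.64 statistic are illustrative values taken from or assumed around the transcript.

```python
import numpy as np
import pandas as pd

np.random.seed(1)
n, N = 500, 1000
counts = np.random.binomial(n, 0.5, size=N)
p_hat = counts / n       # simulated p-hats: the randomization distribution

wind_p_hat = 0.64        # the sample statistic from the transcript

p_hat_df = pd.DataFrame({"p_hat": p_hat})

try:
    from plotnine import ggplot, aes, geom_dotplot, geom_vline
    plot = (
        ggplot(p_hat_df, aes(x="p_hat"))
        + geom_dotplot(binwidth=0.005)
        + geom_vline(xintercept=wind_p_hat, color="red", linetype="dashed")
    )
except ImportError:
    plot = None  # plotnine not installed; skip the visualization
```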
The reason why I like doing this for randomization distributions is that each dot represents one randomization, a single randomized sample. So we can see the number of randomized trials that fall within each bin, each small range of p-hat, which is nice: it gives us a sense of the scale and how many trials have been run. Again, this vertical red dashed line is our true sample statistic. This is what the data say. And similar to what we saw with the hypothesis test for the single mean, it falls outside of our randomization distribution; in this case, it falls above it. But let's go back and reference our alternative hypothesis: that p is greater than 0.5. Here we see no randomized trial that is at least as great as this red dashed line, so this red dashed line is pretty extreme. Just by inspection, we know beforehand that we can reject the null hypothesis.

We can formalize this into a p-value. Let me just copy and paste our code for printing the p-value. Our p-value here is not going to be based on the randomized means; instead, p-hat is the series that contains our randomization distribution, and it's not the wind mean this time, but rather wind p-hat. And we've got to flip the comparison around: it's greater-than this time. But still, our p-value is 0. So there's zero probability of getting a randomized trial at least as great as our sample statistic if the null hypothesis is true. Again, this randomization distribution reflects the condition under which the null hypothesis is true, and shows what sort of variability we would see in our data, vis-a-vis our p-hat, if that null hypothesis were true.

So what should we write for a concluding statement? We can say: since the p-value of zero is less than 0.05, we reject the null hypothesis that wind met or exceeded the forecast peak capacity half the time.
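The p-value calculation just described can be sketched like this; the randomization distribution is regenerated with the same hypothetical `n` and `N` as above, and `wind_p_hat` is the 0.64 from the transcript.

```python
import numpy as np

np.random.seed(2)
n, N = 500, 1000
p_hat = np.random.binomial(n, 0.5, size=N) / n  # randomization distribution

wind_p_hat = 0.64  # observed sample statistic

# One-sided p-value: the fraction of randomized p-hats at least as great
# as the observed statistic (the direction comes from H_a: p > 0.5).
p_value = (p_hat >= wind_p_hat).mean()
print(f"p-value: {p_value}")
```

Note that with a finite number of randomized trials, a reported p-value of 0 really means "less than 1/N", here less than 1 in 1000.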
The key phrase here is rejecting the null hypothesis. So we reject the null hypothesis, and we provide the reason: our p-value is less than our default significance level of 0.05. Okay, so that is our test for a single proportion.