Hi, in this video we're going to do a hypothesis test for two samples, still using wind and natural gas from the ERCOT data set and looking at percent deficit. But instead of comparing means, we're now going to compare proportions. Our proportion of interest is the number of observations, in this case percent deficit values, that fall below zero percent, divided by the total number of observations. So let's go ahead and take a look.

To begin with, our null hypothesis is that there is no difference between the proportion of percent deficit values below zero percent for gas versus wind; in other words, wind and gas perform essentially the same. The alternative is that gas has a higher proportion of failure: gas falls below zero percent deficit more of the time than wind does. This alternative is formulated based on the visualization that we developed before, right here, looking at these curves of percent deficit. Natural gas, we see, is below zero percent a hundred percent of the time; wind is not. So we know just by looking at this that natural gas has a higher proportion. But let's actually calculate it.

As always, step one is to find the sample statistic. Our sample statistic here is the difference in proportions, not a single proportion. So first we'll find the proportion for wind, which means we need the number of percent deficit observations that fall below zero percent. From genpiv, we extract all the rows where the fuel type is equal to wind and the percent deficit is less than zero. So all the cases that meet both criteria: the fuel is wind, and the percent deficit is less than zero. Then we divide this by the total number of wind observations. And we'll do the same thing for natural gas, except our fuel here is not wind, it is natural gas.
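As a rough sketch of the proportion calculation just described, something like the following would work in pandas. The data frame name `genpiv` follows the video as transcribed; the column names `fuel` and `pct_deficit`, the helper `prop_below_zero`, and the ten toy rows are assumptions made up for illustration, not the ERCOT data.

```python
import pandas as pd

# Toy stand-in for the video's data frame. Column names and values are
# illustrative assumptions, not the real ERCOT observations.
genpiv = pd.DataFrame({
    "fuel": ["wind"] * 5 + ["gas"] * 5,
    "pct_deficit": [-12.0, 4.5, -3.0, 8.0, -20.0,
                    -30.0, -25.0, -41.0, -18.0, -22.0],
})

def prop_below_zero(df, fuel):
    """Share of one fuel's observations with percent deficit below zero."""
    rows = df[df["fuel"] == fuel]                # all observations for this fuel
    n_below = (rows["pct_deficit"] < 0).sum()    # cases meeting both criteria
    return n_below / len(rows)                   # divide by that fuel's total

wind_p_hat = prop_below_zero(genpiv, "wind")
gas_p_hat = prop_below_zero(genpiv, "gas")
print("Proportion of wind below zero:", wind_p_hat)
print("Proportion of gas below zero:", gas_p_hat)
```

The numerator counts rows that satisfy both conditions (the fuel matches and the deficit is negative), while the denominator is the number of rows for that fuel only, not the whole data frame.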
For gas we still want the number of cases that have percent deficit less than zero. And then finally, our sample statistic is the difference between these two proportions. We need to preserve the same order that we have in our hypotheses above, so it's gas p-hat minus wind p-hat. To save a little bit of time, I'm going to grab the print statements that we had above so we can inspect our results: proportion of gas, proportion of wind, and difference of proportions. The proportion of gas that's below zero percent deficit is one, of course, and for wind it's 64 percent. We saw this before with the single proportion: the proportion of the time that generation was below the forecasted peak capacity was 64 percent, the same value that we have here. And of course the difference is about 36 percent.

So now the randomization distribution. The question is, okay, there is a difference in the proportions, but is it a statistically significant difference? In other words, is gas actually different from wind, or is it just a matter of random chance in the samples?

The following code relies on NumPy. We've already imported it previously, but just make sure that it's in there. Again, we will copy genpiv to a new data frame called sim, so that we can do our randomization on sim and not mess up our original data frame genpiv. Capital N will be our default value of a thousand, and little n, our sample size, is the number of rows that we have in sim. We need an empty vector again to store our results in, and then we write our for loop, for j in range N, so we're going to loop through a thousand times, or capital N times, and every time we will do our reallocation.
Actually, I can go up to our test for comparison of means, borrow that code, and bring it back down here to edit a little bit, because we're still going to do our reallocation. We're still going to scramble the percent deficit values, just in that column, across the groups of wind versus natural gas. Except instead of finding the mean here, we'll find our proportions again. So we're just calculating a different statistic, is all. We'll have our wind p-hat, except to distinguish it from the one above we'll call it sim, and we'll call the results vector prop diff and fill in the jth value. And instead of genpiv, we'll get this from sim, right after we scramble the percent deficit. Otherwise it's the same operation; we're still finding the same proportion. Maybe a search and replace would have saved a little bit of time here, but it doesn't hurt to do it this way. Then we make sure to use our simulated gas p-hat and our simulated wind p-hat.

Okay, so that should be all good there. We'll do our randomization, and then we can visualize. Again, I'll go back up and grab the visualization code that we've used before and edit it a little bit. Instead of x bar, we'll change this to prop diff; this will be our proportion, and we will have the differences of proportions. Bring that same name down here: our data frame we're calling prop diff df, and then of course our sample statistic, which we called sam diff. And what do we get?

There we go, here's our dot plot. Again, the randomization distribution shows what sorts of differences in proportions we would see if the null hypothesis were true, the null hypothesis effectively saying that there's no difference between wind and natural gas.
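The randomization loop walked through above might look roughly like the sketch below. This is a reconstruction under assumptions, not the notebook's actual code: the column names `fuel` and `pct_deficit`, the helper `diff_in_props`, the variable name `samp_diff` (what the video calls "sam diff"), and the toy data are all made up for illustration. The toy data is built by hand so that wind is below zero 64 percent of the time and gas 100 percent of the time, mirroring the proportions quoted in the video.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)   # seeded only so the sketch is repeatable

# Toy stand-in for genpiv (assumed column names, made-up values):
# 50 wind rows with 32 below zero, 50 gas rows all below zero.
genpiv = pd.DataFrame({
    "fuel": ["wind"] * 50 + ["gas"] * 50,
    "pct_deficit": [-5.0] * 32 + [5.0] * 18 + [-30.0] * 50,
})

def diff_in_props(df):
    """gas p-hat minus wind p-hat, in the same order as the hypotheses."""
    below = df["pct_deficit"] < 0
    gas_p_hat = below[df["fuel"] == "gas"].mean()
    wind_p_hat = below[df["fuel"] == "wind"].mean()
    return gas_p_hat - wind_p_hat

samp_diff = diff_in_props(genpiv)   # observed sample statistic

sim = genpiv.copy()                 # scramble the copy, not the original
N = 1000                            # number of simulated reallocations
n = len(sim)                        # sample size: number of rows in sim
prop_diff = np.empty(N)             # empty vector to store the results

for j in range(N):
    # Reallocation: shuffle the deficit values across both fuel groups,
    # leaving the fuel labels where they are.
    sim["pct_deficit"] = rng.choice(
        genpiv["pct_deficit"].to_numpy(), size=n, replace=False
    )
    prop_diff[j] = diff_in_props(sim)

print("Observed difference:", samp_diff)
print("Simulated range:", prop_diff.min(), "to", prop_diff.max())
```

Sampling all `n` values without replacement is just one way to shuffle the column; `rng.permutation` would do the same job. Because only the deficit values move, each pass answers the question "what difference would we see if fuel type had nothing to do with falling below zero?"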
So even if that null hypothesis were true, we could see differences on the order of plus or minus about 12 percent here. However, what we actually saw with the data was about positive 36 percent, so extremely far outside of the realm of possibility here. You can assume that the p-value is going to be zero just by looking at this, but let's go ahead and formally calculate it.

It's not x bar anymore, it's prop diff, and we called our sample difference sam diff. Let's go back up and look at our alternative: our alternative was greater than, so we need to change this to be greater than. And we see that there is no element of the randomization distribution that's at least as great as this red dashed line. So lo and behold, our p-value is zero.

Again, our concluding statement would be something along the lines of: since the p-value is 0.0, below the significance level of 0.05, we can reject the null hypothesis. That's the critical statement there, reject the null hypothesis that there is no difference of proportions between gas and wind. In other words, there is evidence that gas underperformed relative to capacity more often than wind.

Okay, so that's our example of a difference of proportions test using a randomization distribution. Thank you.
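For reference, the right-tailed p-value step described in this video can be sketched as below. The function name `right_tail_p_value` and the six simulated values are hypothetical, chosen only to echo the "plus or minus about 12 percent" spread mentioned above; they are not the video's output.

```python
import numpy as np

def right_tail_p_value(prop_diff, samp_diff):
    """Fraction of simulated differences at least as large as the observed one."""
    return np.mean(np.asarray(prop_diff) >= samp_diff)

# Illustrative values only: a handful of simulated differences under the null
prop_diff = np.array([-0.12, -0.04, 0.0, 0.04, 0.08, 0.12])
samp_diff = 0.36     # observed gas p-hat minus wind p-hat
alpha = 0.05         # significance level

p_value = right_tail_p_value(prop_diff, samp_diff)
reject_null = p_value < alpha
print("p-value:", p_value, "reject null:", reject_null)
```

The comparison is `>=` because the alternative is "greater than". Note that a computed p-value of 0.0 from N simulations is probably best read as "smaller than about 1 in N" rather than literally zero.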