But in this video, we're going to go over how to do a difference of means test, or a comparison of means, for two samples. So let's get to it.

We're going to continue using the percent deficit that we calculated in the last video. Our hypothesis test for these two samples, in the comparison of means or difference of means test, is going to be this: the null is that the mean for gas minus the mean for wind is equal to zero, or in other words, that these two means are equal. The alternative is that the mean for natural gas is less than that for wind, or in other words, that the difference between the two is less than zero.

The reason I set it up this way is that, just by looking at the visualization we developed in the last video, natural gas is consistently below 0% deficit, so I have the suspicion that its mean will be less than that for wind. So I set up the hypothesis based on that. But let's go ahead and calculate our sample statistic, and we can confirm whether the means of the data conform to that.

First, let's pull out our wind sample. Recall that the data frame we're working with is gen_piv, and we've calculated percent deficit in gen_piv, so we want to pull out the values of that variable. So it's gen_piv.loc, and anywhere gen_piv has a fuel value that's equal to wind, we want to pull out the associated percent deficit values. That'll be our wind sample, samp_wind. Same thing for natural gas: we'll just replace wind with gas, and that's samp_gas.

So we pull out those two samples, and then our sample difference is going to be samp_gas.mean(), the mean of samp_gas, minus samp_wind.mean(). So our sample statistic is the difference between the mean of the gas percent deficit and the mean of the wind percent deficit. Note that the order is important here: it's got to match what we have in the hypothesis test. That's really critical. Let's print some values here.
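As a rough sketch of the step just described: the real gen_piv comes from the previous video, so the block below builds a tiny made-up stand-in. The column names `fuel` and `perc_deficit`, the labels "wind" and "gas", and every number in the toy data are assumptions for illustration only.

```python
import pandas as pd

# Toy stand-in for gen_piv. Column names and values are assumptions,
# not the real generation data used in the video.
gen_piv = pd.DataFrame({
    "fuel": ["wind"] * 4 + ["gas"] * 4,
    "perc_deficit": [-10.0, -20.0, -5.0, -25.0, -30.0, -20.0, -35.0, -25.0],
})

# Pull out the percent deficit values for each fuel group.
samp_wind = gen_piv.loc[gen_piv["fuel"] == "wind", "perc_deficit"]
samp_gas = gen_piv.loc[gen_piv["fuel"] == "gas", "perc_deficit"]

# Sample statistic: mean(gas) - mean(wind), same order as in the hypotheses.
samp_diff = samp_gas.mean() - samp_wind.mean()

print("mean gas:", samp_gas.mean())
print("mean wind:", samp_wind.mean())
print("difference:", samp_diff)
```

Keeping gas first and wind second matters because the alternative hypothesis is written as gas minus wind being less than zero.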
So let's print our mean gas value. Let's also print our mean wind value; this is just for inspection. And then finally, the difference between those. We see that the average percent deficit for the gas sample is negative 28 percent, and the average percent deficit for the wind sample is negative 15.5 percent. So indeed, from the samples, gas is less than wind, and this justifies having the hypothesis set up this way. And the difference is indeed negative, or less than zero, so again that justifies this setup.

But still the question remains: is this negative twelve and a half percent statistically significantly below zero? In other words, is this completely out of the realm of possibility if the null hypothesis were true? That's what we want to test with the randomization distribution and the resulting hypothesis test.

So, moving on to the randomization distribution. We've already imported numpy, but let's do it again anyway, just as a reminder that numpy is needed for this. One thing we're going to do here is copy the genpiv data frame over to something called sim. So we're going to make a new data frame, sim, that's just an identical copy of genpiv. The reason is that we're going to be scrambling up some of the values there, and we don't want to mess up our original data.

Again, just a reminder: capital N will be a thousand, as we'll consistently use through these hypothesis tests, although you could do more when you're doing this on your own. Our sample size, n, is going to be the number of rows in genpiv, or in sim here. So this is the sample size of wind plus the sample size of gas. The shape attribute gives first the number of rows, then the number of columns; by putting the index here, we're extracting the number of rows.

And then, as we did in the test for single means, we want to store our resulting statistics from
the randomization distribution in some variable. So we'll have x_bar_diff. This is just going to be an empty variable with capital N, 1000, blank spaces in it, and as we go through this for loop, we will fill in those blanks.

From our sim data frame, what we're going to do is scramble up the percent deficit column. So we're going to overwrite the percent deficit column, and we're going to use random choice to do the scrambling. We saw random choice first in the hypothesis test for a single mean, where we were drawing with replacement from the wind shift variable. Here, we're going to draw from our original percent deficit column, and we want to use size equals n, so the same sample size that we have originally. And here we don't want to do this with replacement, so we set replace equals False.

The reason is that, if we go back up to our data frame, really we just want to scramble up these values and reorder them in the same column. We don't want to repeat any values. We really just want to reallocate these values across the two groups, wind and natural gas, and do that at random. So some will stay within wind, some will stay within natural gas, but we're going to switch up the order and potentially regroup some of them. Because under our null condition, if there isn't any difference between these two groups, then it wouldn't really matter which group the observations fall in. So that's the philosophy under the null hypothesis here.

So once we do that scrambling, we just need to execute our calculation of the statistic again.
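To make the scrambling step concrete, here is a minimal sketch of the setup and a single scramble. It uses a made-up stand-in for gen_piv, and the column names `fuel` and `perc_deficit` are assumptions. The point it illustrates is that drawing n values with replace=False from a column of n values is just a random reordering: every original value appears exactly once.

```python
import numpy as np
import pandas as pd

# Toy stand-in for gen_piv (assumed column names, made-up values).
gen_piv = pd.DataFrame({
    "fuel": ["wind"] * 4 + ["gas"] * 4,
    "perc_deficit": [-10.0, -20.0, -5.0, -25.0, -30.0, -20.0, -35.0, -25.0],
})

sim = gen_piv.copy()        # scramble the copy so the original stays intact
N = 1000                    # number of simulated statistics
n = sim.shape[0]            # number of rows: wind sample size + gas sample size
x_bar_diff = np.empty(N)    # N blank slots, to be filled in by the for loop

# One scramble: draw all n values without replacement and overwrite the column.
sim["perc_deficit"] = np.random.choice(
    gen_piv["perc_deficit"].to_numpy(), size=n, replace=False
)

# Same values as before, only reallocated across the wind and gas rows.
print(sorted(sim["perc_deficit"]) == sorted(gen_piv["perc_deficit"]))
```

With replace=True, some values could show up twice and others not at all, which would be resampling rather than reallocating observations between the two groups.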
So I'm going to copy and paste everything from step one into here, and really we just need to replace all these samp names with, say, sim. So we'll have our simulated wind sample, and it's coming from sim, so we replace gen_piv with sim, after we've scrambled up percent deficit. And then the difference will be the difference in these means. So the same steps that we had before in calculating the sample statistic from the data, we'll use after we've done the randomization and scrambled up the percent deficit column. Oh, sorry, I should call this x_bar_diff, and of course we want to store this difference.

So we can go ahead and run our randomization distribution and then visualize it. I'm going to go back up and copy and paste our visualization from our test of a single proportion, and bring it back down here. It's the same structure, except we have different names for things: we still want to turn our randomization distribution into a data frame, and our x intercept should be our sample statistic, samp_diff.

So we can go ahead and visualize that. Under our null condition, if the null hypothesis is true and there really wasn't any difference between the two groups, we would see a difference of means in percent deficit that ranges from about negative 10 to about positive 10. So a pretty big swing. But nevertheless our actual sample statistic, what we observed in reality from our data, is quite a bit less than that and falls outside of that range.

So what do you think our p-value is? If you're thinking zero, you've got it, but we can go ahead and calculate it here. Again, just replace our p hat with this x_bar_diff, the series, not the data frame; our samp_diff goes where the sample p hat was; and the direction of the inequality should be less than, because that's our alternative hypothesis. And we have zero here.
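Putting the steps together, the randomization loop and the p-value calculation might look roughly like the sketch below. It runs on a made-up stand-in for gen_piv with assumed column names `fuel` and `perc_deficit`, so the p-value it prints will not match the zero seen in the video; the seed is only there to make the toy run repeatable.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: fixed seed for a repeatable illustration

# Toy stand-in for gen_piv (assumed column names, made-up values).
gen_piv = pd.DataFrame({
    "fuel": ["wind"] * 4 + ["gas"] * 4,
    "perc_deficit": [-10.0, -20.0, -5.0, -25.0, -30.0, -20.0, -35.0, -25.0],
})

# Observed sample statistic: mean(gas) - mean(wind).
samp_wind = gen_piv.loc[gen_piv["fuel"] == "wind", "perc_deficit"]
samp_gas = gen_piv.loc[gen_piv["fuel"] == "gas", "perc_deficit"]
samp_diff = samp_gas.mean() - samp_wind.mean()

sim = gen_piv.copy()
N = 1000
n = sim.shape[0]
x_bar_diff = np.empty(N)

for i in range(N):
    # Scramble percent deficit across the two fuel groups.
    sim["perc_deficit"] = np.random.choice(
        gen_piv["perc_deficit"].to_numpy(), size=n, replace=False
    )
    # Same calculation as for the sample statistic, but on the scrambled data.
    sim_wind = sim.loc[sim["fuel"] == "wind", "perc_deficit"]
    sim_gas = sim.loc[sim["fuel"] == "gas", "perc_deficit"]
    x_bar_diff[i] = sim_gas.mean() - sim_wind.mean()

# Left-tailed p-value: share of simulated differences below the observed one.
p_value = np.mean(x_bar_diff < samp_diff)
print("p-value:", p_value)
```

The inequality points left because the alternative hypothesis says the gas mean is less than the wind mean; a right-tailed or two-sided alternative would need a different comparison.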
So, our concluding statement. Since the p-value is very low, 0.0, and less than the significance level of 0.05 (that's a default value), we can reject the null hypothesis that there is no difference between the means of the two groups, the two groups being percent deficit of gas versus percent deficit of wind.

Again, the critical language here is rejecting the null hypothesis because our p-value is less than 0.05. But it's always good to contextualize that result in the actual problem. So what do you learn from rejecting the null hypothesis? Well, we know then that there actually is a difference between these two groups, so there is evidence that natural gas underperformed more than wind. Okay, so that's our test for a difference of means.