Hi. For the remainder of the lessons and videos under hypothesis testing, we're gonna look at two-sample comparisons: comparing the wind sample that we have here, this red bumpy curve, versus the natural gas sample, this blue bumpy curve. Now, before we get into any actual hypothesis tests, I want to talk a little bit about an additional variable that we're gonna calculate out of this. So let's get to it.

The reason for calculating an additional variable here is to have something by which we can compare wind versus natural gas on an even basis. We can see clearly here that natural gas is producing a lot more energy than wind, and it also has a much larger forecasted peak capacity than wind does. These differences arise because natural gas is a much larger system in the state of Texas than wind is. In other words, Texas relies way more on natural gas energy than on wind energy. But because of that, we're kind of comparing apples to oranges here, and we wanna level the playing field and come up with a metric by which we can fairly compare wind versus natural gas and directly answer the question: which one has underperformed more?

If we just looked at generation alone, we'd say wind is underperforming simply because it's not making as much energy as natural gas is. But that's not fair, because there isn't as much wind energy to begin with in the state of Texas as there is natural gas. So in order to facilitate a fair comparison between wind and natural gas, we're gonna look at the percent deficit. This percent deficit is the generation, so that bumpy curve, minus the forecasted peak capacity, divided by the forecasted peak capacity.
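Written out as a formula, the percent deficit just described looks like this. The multiplication by 100% is the conversion to a percentage, and the symbol names are only shorthand for the quantities named in the talk.

```latex
\[
\text{percent deficit}
  = \frac{\text{generation} - \text{forecasted peak capacity}}
         {\text{forecasted peak capacity}} \times 100\%
\]
```

As a purely illustrative case, a source generating a hypothetical 5.0 GW against a forecasted peak capacity of 6.1 GW would have a percent deficit of (5.0 - 6.1) / 6.1 × 100%, or roughly -18%. A negative value means generation is below the forecasted peak capacity.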
So in other words, this is basically looking at how much each energy source has gone above or below that dashed line, in proportion to that capacity level. So how are we gonna calculate this? We'll just make a couple of new variables here. Recall that the data object we're working with is genPIV; that's our data frame. We're gonna use the .loc indexer to define some new values. Anywhere the fuel is equal to wind, we're gonna make a new variable called capacity and set it equal to 6.1. Then we're gonna do the same thing for natural gas, except under this new capacity variable we're gonna set it to 48.4. These are the forecasted peak capacities, in gigawatts, that we talked about in the first video.

With those in place, we can calculate a new variable called deficit. That will simply be our generation in gigawatts minus this capacity. So that's our deficit; that's the numerator of the percent deficit. And then to get this into a percent deficit, we take deficit, with the capital D, divided by the capacity, times 100 to get a percentage. So let's see what this looks like. There we go, here's our percent deficit. In many instances it's negative, which means generation is below the forecasted peak capacity. I think some of the rows that are omitted here for wind would be positive.

Let's visualize it. To start off, let's go back up to our previous visualization and borrow from that. We'll still do a line plot, and we'll still have the same colors, except we don't wanna plot generation. We wanna plot our percent deficit, and we'll color by fuel type; we'll still use blue and red. Now that we're looking at percent deficit, these capacities aren't as relevant anymore, so let's delete those. Instead, what we wanna compare to is 0%, right? Our 0% deficit.
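The capacity, deficit, and percent deficit columns described above can be sketched in pandas roughly as follows. This is a minimal sketch, not the notebook's actual code: the small genPIV frame, its values, the column names (fuel, generation_gw, capacity, deficit, pct_deficit), and the exact fuel labels are assumptions made for illustration. Only the .loc pattern and the two capacities, 6.1 and 48.4 GW, come from the talk.

```python
import pandas as pd

# Hypothetical stand-in for the genPIV data frame; the real one has more rows
# and may use different column names and fuel labels.
genPIV = pd.DataFrame({
    "fuel": ["wind", "wind", "natural gas", "natural gas"],
    "generation_gw": [4.0, 7.0, 40.0, 44.0],  # made-up generation values
})

# Forecasted peak capacities in gigawatts, assigned per fuel with .loc.
genPIV.loc[genPIV["fuel"] == "wind", "capacity"] = 6.1
genPIV.loc[genPIV["fuel"] == "natural gas", "capacity"] = 48.4

# Deficit: generation minus capacity (negative means below capacity).
genPIV["deficit"] = genPIV["generation_gw"] - genPIV["capacity"]

# Percent deficit: deficit as a share of capacity, times 100.
genPIV["pct_deficit"] = genPIV["deficit"] / genPIV["capacity"] * 100

print(genPIV)
```

Dividing by each fuel's own capacity is what puts the two sources on the same scale, so a wind row and a natural gas row can be compared directly even though their generation differs by roughly an order of magnitude.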
Let's draw this as a black dashed line. We'll do it in black for both because, regardless of whether it's wind or natural gas, we wanna compare it to 0% deficit. And there we go. Red is wind, blue is natural gas, and the black dashed line is at 0% deficit. We see that natural gas is always in the negative, and wind is sometimes, perhaps most of the time, well, 64% of the time, as we learned from the test of single proportions, below zero. Okay? So that's what we're going to use, our percent deficit, in the remaining two-sample tests: the comparison of means, both paired and unpaired, and the comparison of proportions.
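A plot along the lines described could be sketched like this. It uses plain matplotlib, which may not be the plotting library used in the video, and the data frame, its column names (hour, fuel, pct_deficit), and its values are all hypothetical. The red and blue colors and the black dashed reference line at 0% follow the description above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical percent-deficit data; values are made up for illustration.
df = pd.DataFrame({
    "hour": [0, 1, 2, 0, 1, 2],
    "fuel": ["wind"] * 3 + ["natural gas"] * 3,
    "pct_deficit": [-30.0, 12.0, -5.0, -20.0, -15.0, -10.0],
})

colors = {"wind": "red", "natural gas": "blue"}

fig, ax = plt.subplots()
# One line per fuel type, colored as in the earlier generation plot.
for fuel, group in df.groupby("fuel"):
    ax.plot(group["hour"], group["pct_deficit"], color=colors[fuel], label=fuel)

# Single shared reference line at 0% deficit, black and dashed.
ax.axhline(0, color="black", linestyle="--")

ax.set_xlabel("hour")
ax.set_ylabel("percent deficit (%)")
ax.legend()
```

A single reference line works for both fuels here because the percent deficit has already been scaled by each fuel's own capacity.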