Hello there, this is a video covering module 8, two-sample hypothesis testing. Previously we looked at only one population, meaning one population proportion, and we ran a hypothesis test to test a claim about its value. Well, now we're going to look at two population proportions and compare them, and we'll also look at two population means and compare them. So first we'll start off with two proportions. To use the method we're about to use, the following requirements must hold. You must have proportions from two simple random samples that are independent. And for each of the two samples, the number of successes is at least five and the number of failures is at least five. So you must have at least five successes and at least five failures in each sample. To conduct a hypothesis test for two proportions, we will definitely use Google Sheets. We'll go to the data list tab, and we'll go to the region that says two-proportion confidence interval p-value. And we'll literally type in summary statistics for the two samples we're using. So for population one, p sub one represents the population proportion for group one, n sub one is the sample size of group one, x sub one is the number of successes, meaning the observations in the sample that have a particular characteristic, p hat sub one is the sample proportion for group one, and q hat sub one is one minus p hat sub one. All of those little subscripts of one mean this is the information for population one. And then the corresponding notation with subscripts of two represents the information for population two. So subscripts are key for organizing the data that you have. Now for the null hypothesis. Remember, the null hypothesis H naught always uses equal to. So we say the proportion for group one is equal to the proportion for group two.
And where this p sub one equals p sub two comes from is, remember, if we subtract two things and we get a difference of zero, that means there is no difference. That means the two quantities are equal to each other. That's where p sub one equals p sub two comes from. Also, it's always important to write p sub one on the left to avoid errors when you're inputting your data into the spreadsheet or whatever technology you're using. And then we'll look at a p-value and compare it to alpha. If the p-value is less than or equal to alpha, then we reject the null hypothesis. So I have 89 undergraduates that were randomly assigned to two groups. They were given a dollar in two different forms, then given a choice to keep the money or buy gum or mints. The claim is that money in large denominations is less likely to be spent than an equal amount of money in smaller denominations. We're going to test the claim at the 0.05 significance level. So I have group one, which is the group that was given a dollar bill, and then I have group two, which is the group that was given smaller denominations, four quarters in this case. So when you look at your hypotheses, your null hypothesis is that the proportion in group one that spent the money is equal to the proportion in group two that spent the money. And then your alternative hypothesis is that the proportion that spent the money in group one is less than the proportion that spent the money in group two. With the larger denomination, spending is less likely, so group one's proportion is less than group two's proportion. And of course our claim is the alternative hypothesis. So literally all I have to do is go to Google Sheets and input the number of observations from group one, which is 12, and the sample size from group one, which is 46.
And then for group two I'll enter 27 for the number of observations, x2, and then the sample size of 43. And the only other thing that you'll have to input is the alternative hypothesis sign, which is less than. So let's enter those values now in Google Sheets and let's get that p-value. So we're going to the data list tab, to the two-proportion confidence interval p-value region. For group one, the people that got the dollar bill, you have 12 out of 46 people that spent the money. In group two you had 27 out of 43 people. Change the alternative hypothesis sign to less than, since group one's proportion is less than group two's proportion. For a hypothesis test, the only output you care about is the p-value. p-values are typically given to four decimal places, so my p-value is equal to 0.0002. So how does that compare to alpha, our significance level of 0.05? Well, 0.0002 is clearly less than 0.05, so we reject the null hypothesis. So the null hypothesis is rejected. So long. Since that's rejected, all eyes now point to our alternative hypothesis, to our claim, and we say the following. There is sufficient evidence to support the claim that money in large denominations is less likely to be spent than an equal amount of money in smaller denominations. You can also do confidence intervals for two proportions. It's literally going to be the same process, except you now have to enter the confidence level into Google Sheets. For a two-tailed test, the confidence level is always 1 minus alpha. For a left-tailed or a right-tailed test, the confidence level is 1 minus 2 times alpha, because the interval has two tails, each with area alpha, and you're focused on the middle percentage of sample proportion values in this case. So like I said, in Google Sheets we'll go to the same spot, and then we'll have to make sure we enter the confidence level.
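As a cross-check on the spreadsheet result for the dollar bill example, here is a minimal Python sketch of a two-proportion z test. It assumes the spreadsheet uses the usual pooled-proportion z statistic, which the video does not state; the function name `two_proportion_z_test` and its `alternative` argument are made up for this illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2, alternative="less"):
    """Pooled two-proportion z test (hypothetical helper, not the spreadsheet's code).

    alternative is the sign in H1 for p1 versus p2:
    "less", "greater", or "not equal".
    """
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Under H0: p1 = p2, so both samples are pooled into one estimate.
    p_pool = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se

    cdf = NormalDist().cdf(z)
    if alternative == "less":
        p_value = cdf
    elif alternative == "greater":
        p_value = 1 - cdf
    elif alternative == "not equal":
        p_value = 2 * min(cdf, 1 - cdf)
    else:
        raise ValueError("alternative must be 'less', 'greater', or 'not equal'")
    return z, p_value


# Group one: dollar bill (12 of 46 spent). Group two: four quarters (27 of 43 spent).
z, p_value = two_proportion_z_test(12, 46, 27, 43, alternative="less")
alpha = 0.05
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("Reject H0" if p_value <= alpha else "Fail to reject H0")
```

Rounded to four decimal places, the p-value agrees with the 0.0002 read from the spreadsheet, so the null hypothesis is rejected at the 0.05 level.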
So construct a 90% confidence interval estimate of the difference between the two population proportions in the previous example. They went ahead and just gave us the confidence level, and that's the easiest way to solve these types of questions. When they give you the confidence level directly, it's as easy as literally typing it into Google Sheets. So the confidence level is 90%, or 0.9, or 0.90, however you want to write it. In Google Sheets, all you have to do is go to confidence level and type in 0.9. Look at your lower limit, look at your upper limit, and you'll see that you have, we'll do three decimal places, negative 0.528 and negative 0.206. That's kind of cool. So the interpretation is, with confidence level 90%, the difference between the proportions of the two groups is between negative 0.528 and negative 0.206. Also, logically, 0 is not in the interval, so a difference likely exists. If two things have a difference of 0, then there is no difference, like 9 minus 9 is 0. But if 0 is not in the interval, a difference likely exists. So now we're going to go through and do a hypothesis test and a confidence interval together and see what we can conclude. Among 450 workers surveyed in a poll, 40% said it was seriously unethical to monitor employee email. Among 120 senior-level managers, 30% said that it was seriously unethical to monitor employee email. So we have two groups here, the workers and the managers. Consider the claim, and that's a hypothesis test keyword there, claim, that for those saying that monitoring email is seriously unethical, the proportion of workers is the same as the proportion of managers. We're going to use a 0.01 significance level. So in this example, I just need to identify who's group one and who's group two first. Group one, let's let those be the workers, and group two, let those be our managers.
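The 90% interval found above for the dollar bill example can be reproduced with a short Python sketch. It assumes the spreadsheet builds the interval from the unpooled standard error and a normal critical value, which is the common textbook approach but is not stated in the video; `two_proportion_ci` is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence):
    """Confidence interval for p1 - p2 (hypothetical helper, unpooled standard error)."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # No pooling here: the interval does not assume p1 = p2.
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    # Critical value that leaves (1 - confidence) / 2 in each tail.
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p1_hat - p2_hat
    margin = z_crit * se
    return diff - margin, diff + margin


# Dollar bill group (12 of 46) versus four quarters group (27 of 43), 90% level.
lower, upper = two_proportion_ci(12, 46, 27, 43, confidence=0.90)
print(f"({lower:.3f}, {upper:.3f})")

# Zero outside the interval suggests the two proportions differ.
print("0 in interval:", lower <= 0 <= upper)
```

Rounded to three decimal places, the limits match the negative 0.528 and negative 0.206 read from the spreadsheet, and zero falls outside the interval.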
All right, so for the hypothesis test, the first thing to note would be the hypotheses. The null is always that the first group's proportion is equal to the second group's proportion. And in this case, that's actually the claim that was mentioned in the question. Remember, the alternative is always the exact opposite, so the proportions are not equal to each other. All right, so your goal is to type the number of observations for group one and the sample size of group one into Google Sheets. For group one, 40% said it was seriously unethical. Well, how do you find 40% of 450? Our sample size is 450, and we need to find 40% of that. 450 times 0.4 is going to give you 180. For group two, you're going to have 120 times 0.3, because 30% of the managers said it was seriously unethical, and 120 times 0.3 is 36. And we already know the sample size for group two. For the managers, it was 120. There's the information you need for Google Sheets. Now for our interval, this is a two-tailed test. Why is it two-tailed? Because the alternative hypothesis uses not equal to. Therefore, my confidence level will be 1 minus my significance level. 1 minus 0.01 is 0.99. That's another value we'll type into Google Sheets. Let's go to Google Sheets now and type in this information. Remember, we're on the Daedalus tab in the two-proportion confidence interval p-value region. For my first group, 180 out of 450 said that monitoring email was unethical. In group two, my manager group, we had 36 out of 120. Our alternative hypothesis sign was not equal to, and the confidence level was 0.99. Note you have a p-value here, 0.0448. And note that you have your confidence interval lower limit and upper limit, negative 0.023 and 0.223. So let's write those answers down now. Our p-value is equal to 0.0448. We're going to compare that to alpha.
So 0.0448, let's compare it to the significance level of 0.01. 0.0448 is actually greater than 0.01. It's not under that limbo bar, so we fail to reject the null hypothesis. So we fail to reject our claim in this case. We'll write our statement in just a moment. And for the confidence interval, we got negative 0.023 and 0.223, which means zero is in the interval. So there is not sufficient evidence of a difference in the population proportions. That's the direct interpretation of the interval in terms of answering what you can conclude. So now let's write the hypothesis test summary statement and the confidence interval summary statement. All right, for the hypothesis test, there is not sufficient evidence to warrant rejection of the claim that the proportion of workers who say monitoring email is seriously unethical is the same as the proportion of managers. The structure of that statement can be found in the module seven videos. It's the table with four rows, and it shows you the sentence structure based on what you do to the null hypothesis and where your claim is located. And for the confidence interval, with confidence level 99%, the difference between the proportions of workers and managers who say monitoring email is seriously unethical is between negative 0.023 and 0.223. If you want to write these down, feel free to pause the video. Lastly, looking at seat belts, a simple random sample of front seat occupants involved in car crashes is obtained. Among 2,823 occupants not wearing a seat belt, 31 were killed, and among 7,765 occupants wearing seat belts, only 16 were killed. Let's use a 0.05 significance level to test the claim that seat belts are effective in reducing fatalities. All right, so let's look at the hypothesis test, after I identify who's my group one and who's my group two. Group one is going to be those that were not wearing a seat belt.
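For the email monitoring example just finished, the setup steps (turning percentages into counts and picking the confidence level from alpha) can be sketched in Python along with the two results. The helper names `count_from_percent` and `confidence_level` are invented for this illustration, and the pooled test and unpooled interval are assumptions about what the spreadsheet computes.

```python
from math import sqrt
from statistics import NormalDist


def count_from_percent(percent, n):
    """Turn a reported percentage into a whole-number count of successes."""
    return round(percent / 100 * n)


def confidence_level(alpha, two_tailed):
    """1 - alpha for a two-tailed test, 1 - 2*alpha for a one-tailed test."""
    return 1 - alpha if two_tailed else 1 - 2 * alpha


# Group one: workers (40% of 450). Group two: managers (30% of 120).
x1, n1 = count_from_percent(40, 450), 450
x2, n2 = count_from_percent(30, 120), 120
alpha = 0.01
level = confidence_level(alpha, two_tailed=True)  # H1 uses "not equal to"

p1_hat, p2_hat = x1 / n1, x2 / n2

# Two-tailed p-value from the pooled z statistic (assumed method).
p_pool = (x1 + x2) / (n1 + n2)
z = (p1_hat - p2_hat) / sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Confidence interval from the unpooled standard error (assumed method).
se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
margin = NormalDist().inv_cdf(1 - (1 - level) / 2) * se
lower, upper = (p1_hat - p2_hat) - margin, (p1_hat - p2_hat) + margin

reject = p_value <= alpha
zero_inside = lower <= 0 <= upper
print(f"x1 = {x1}, x2 = {x2}, confidence level = {level:.2f}")
print(f"p-value = {p_value:.4f}, reject H0: {reject}")
print(f"interval = ({lower:.3f}, {upper:.3f}), 0 in interval: {zero_inside}")
```

Both checks point the same way here: the p-value is above 0.01 and zero sits inside the 99% interval, so the claim of equal proportions is not rejected.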
So it's the proportion of deaths among those that were not wearing a seat belt, and group two will be those that wore a seat belt. Literally, I just labeled them in the order in which the question presented the groups. So for my test, I'm looking at my claim that seat belts are effective in reducing fatalities. What that would mean is the following. If seat belts were effective, then the proportion of people that died while not wearing a seat belt would be greater than the proportion that died while wearing a seat belt. So the no-seat-belt death proportion will be more than the seat-belt-wearer death proportion. This actually goes with my alternative hypothesis, because it contains greater than. And remember, the null is always going to be equal to, and you always write group one on the left to avoid mixing up your data and typing it in the wrong spots for the wrong group. All right, so my claim is going to be the alternative hypothesis. That's what they described in the question. Now, for Google Sheets, we need x1 and n1, x2 and n2. In group one, out of 2,823 occupants, 31 were killed. Those are my non-seat-belt wearers. In group two, out of 7,765 people, we had 16 that were killed, and your H1 sign is greater than. We'll type that into Google Sheets. All right, so now my confidence interval. This is a one-tailed test. Therefore my confidence level will be 1 minus 2 times the significance level, 1 minus 2 times 0.05. So it's going to be 0.9. That's what we need for Google Sheets. I've listed all the important information. Let's now type it into the Google Sheets spreadsheet. All right, so in my box I'm going to type in that I had 31 out of 2,823 people in group one that died, and then I had 16 out of 7,765 in group two. My sign should be greater than, and my confidence level is going to be 0.90 or 0.9. Note that your p-value is basically zero, and then you have the lower bound and upper bound of your confidence interval.
So we'll write these bits of information here. My p-value is basically zero, and when you compare zero to the significance level alpha, it's definitely less than alpha. So that means we do reject the null hypothesis. We can cross out the null hypothesis, and all eyes now point to our claim. There is evidence to support our claim. We'll write the statement in a minute. Then in terms of your confidence interval, you had 0.006 and 0.012. Zero is not in the interval, so there is a difference in the proportions. So here is the hypothesis test statement and the confidence interval statement. There is sufficient evidence to support the claim that seat belts are effective in reducing fatalities. And the confidence interval statement is, with confidence level 90%, the difference between the proportion of deaths among those not wearing seat belts and those wearing seat belts is between 0.006 and 0.012. Remember, when zero is not in the interval, that suggests that there is a difference in the proportions. So anyway, that's all I have for now. Thanks for watching.
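To tie the whole procedure together for the seat belt example, here is a minimal Python sketch that first checks the requirement of at least five successes and five failures in each sample, then runs the right-tailed test and the matching 90% interval. The function name `meets_requirements` is hypothetical, and the pooled test and unpooled interval are assumed rather than taken from the spreadsheet.

```python
from math import sqrt
from statistics import NormalDist


def meets_requirements(x, n, minimum=5):
    """At least `minimum` successes and at least `minimum` failures in one sample."""
    return x >= minimum and (n - x) >= minimum


# Group one: no seat belt (31 killed of 2,823). Group two: seat belt (16 killed of 7,765).
x1, n1 = 31, 2823
x2, n2 = 16, 7765
alpha = 0.05
requirements_ok = meets_requirements(x1, n1) and meets_requirements(x2, n2)

p1_hat, p2_hat = x1 / n1, x2 / n2
diff = p1_hat - p2_hat

# Right-tailed test, H1: p1 > p2, using the pooled standard error (assumed method).
p_pool = (x1 + x2) / (n1 + n2)
z = diff / sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
p_value = 1 - NormalDist().cdf(z)

# One-tailed test, so the matching confidence level is 1 - 2 * alpha = 0.90.
level = 1 - 2 * alpha
se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
margin = NormalDist().inv_cdf(1 - (1 - level) / 2) * se
lower, upper = diff - margin, diff + margin

print("Requirements met:", requirements_ok)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print(f"{level:.0%} interval = ({lower:.3f}, {upper:.3f})")
print("Reject H0" if p_value <= alpha else "Fail to reject H0")
```

The p-value prints as 0.0000 to four decimal places, which is the "basically zero" seen in the spreadsheet, and the rounded limits match 0.006 and 0.012.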