Welcome to our lecture on the two-sample Z-test for the difference between two proportions. When do you want to use this sort of statistical test? Any time you want to compare proportions, or something that in effect is a proportion, like a rate. You might want to compare the rates of defective computer chips supplied by two different companies, because you're going to use one company's chips in the tablet computers you manufacture. You might want to compare death rates for heart transplant surgery at two different hospitals, or graduation rates for two high schools in the same area so you can choose where to send your children. These are the kinds of problems you'll be doing in this lecture. We're going to use Z to determine whether the observed difference between the two proportions is significant or just sampling error. Now, to use Z we need a fairly large sample. There is no t version of this test, so forget about a t-test here. The samples have to be relatively large, and the rules are at the bottom of the slide: for example, N1 times P1 should be greater than 5. This Z-test is actually an approximation, which, by the way, is why we can't switch to the t when the sample is small: that would be an approximation to an approximation, and we don't want to go that far. In the formula on the slide, the numerator is the sample proportion from group one minus the sample proportion from group two. On the right-hand side you see how to get each sample proportion: X1 over N1 and X2 over N2, where X is the number of successes, or hits. It's whatever you're counting and taking the rate of, like the number of defectives out of the total number of parts. And N is the sample size of the individual group: N1 is the sample size for group one, and N2 is the sample size for group two.
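The sample proportions and the large-sample rule can be checked in a few lines of Python. This is a minimal sketch: the transcript quotes only the N1 times P1 rule, so the version below assumes the usual textbook form, in which n·p̂ and n·(1 − p̂) must both exceed 5 in each group (textbooks differ on the exact cutoff, and some use 10). The function names `sample_proportion` and `large_sample_ok` are hypothetical, and the second usage call uses made-up counts purely for illustration.

```python
def sample_proportion(x: int, n: int) -> float:
    """Sample proportion p-hat = x / n, where x counts the 'successes'."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    return x / n


def large_sample_ok(x1: int, n1: int, x2: int, n2: int, cutoff: float = 5) -> bool:
    """Assumed rule: n*p-hat and n*(1 - p-hat) must exceed `cutoff` in both groups."""
    for x, n in ((x1, n1), (x2, n2)):
        p_hat = sample_proportion(x, n)
        # n*p_hat equals the success count x; n*(1 - p_hat) equals the failure count n - x.
        if n * p_hat <= cutoff or n * (1 - p_hat) <= cutoff:
            return False
    return True


if __name__ == "__main__":
    # Liver-transplant deaths from the hospital example: 77 of 100 and 120 of 200.
    print(large_sample_ok(77, 100, 120, 200))  # True
    # Hypothetical counts: only 2 "successes" in group one, so Z is not advised.
    print(large_sample_ok(2, 20, 10, 40))      # False
```

Checking failures as well as successes matters: a group with 98 successes out of 100 passes the N times P rule easily but has only 2 failures.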
Now notice that the formula for Z requires, under the square root sign in the denominator, a P bar. That's the pooled estimate of the population proportion, and it's basically the sample proportion you would get if you had not split your data into two groups. If you combine both groups, you get the pooled proportion by taking X1 plus X2 divided by N1 plus N2, and that gives you the pooled estimate of the population proportion. Our first problem compares death rates at two hospitals. We've been looking at liver transplants in two similar hospitals in similar areas, and we notice that in hospital A, 77 out of 100 people died within six months of getting a liver transplant. In hospital B, it's 120 out of 200. Clearly you're comparing two proportions. Remember, H0 says the two proportions are the same. That's why we pool: under H0 we're saying there's no difference. The question is, are the death rates for the two hospitals statistically different? We're going to test that at alpha equals 0.05. The sample size is large enough: there are 300 people combined, and each hospital has plenty of both deaths and survivors, which allows us to use the Z approximation. Now we're ready to do the problem. H0 is that there's no difference between the two proportions, P1 equals P2, or if you wish, P1 minus P2 equals 0, and the alternative, H1, is that P1 is not equal to P2. So it's a two-tailed test: the alpha of 0.05 is split in two, so you have 0.025 in the right tail and 0.025 in the left tail. Now we look at the two proportions. PS1: 77 out of 100 died, that's 0.77. PS2 is 120 out of 200, that's 0.60. So really we're comparing 77% with 60%. Now we need that P bar, the pooled proportion.
Remember, under H0 there's no difference, so we're playing devil's advocate with a straw man: under H0 we can combine the groups, so we combine the two to get P bar. That's 77 plus 120 divided by 100 plus 200, or 197 over 300, which is 0.657. You'll see that 0.657 in the denominator. In the numerator we have 0.77 minus 0.60, which is 0.17, a difference of 17 percentage points. Then you have this big expression under the square root, which is easier than it looks; just remember to do the part in parentheses first. First do 1 over 100 plus 1 over 200. Then multiply whatever you get by 0.343 and by 0.657, take the square root, and you get 0.058. Your Z value, which is an approximate Z, is 2.93. Clearly it's in the rejection region: anything more than 1.96 would be in the rejection region, so even 1.97 or 1.98 would qualify, and 2.93 is well into it. We reject H0: the probability of getting this sample evidence if H0 is true is less than 5%. In problem two, we're comparing two unemployment rates, the rate for county A and the rate for county B. Remember, a rate is just a proportion, and when we compare two proportions, we're asking: are they really the same? Are these just two different samples taken from the same population, or are the rates statistically different? We're going to test at alpha equals 0.05, just to keep things simple. You already know how to do this. Here we've solved the problem with the two-sample hypothesis test. The null hypothesis, again, is that the proportion for group one equals the proportion for group two. The alternative hypothesis, which we accept if we reject the null hypothesis, is that the two proportions are different. The sample proportion for group one, 100 out of 400, was 0.25; that was the unemployment rate in group one. In group two, 44 out of 200, it was 0.22; that was the unemployment rate in group two.
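Both problems so far can be checked with a short sketch of the pooled two-sample Z statistic, Z = (p̂1 − p̂2) / sqrt(p̄(1 − p̄)(1/n1 + 1/n2)), using only the Python standard library (`statistics.NormalDist` needs Python 3.8 or later). The helper names are hypothetical. Carrying full precision gives about 2.92 for the hospitals rather than the lecture's 2.93, which comes from rounding the denominator to 0.058 before dividing; for the counties it gives about 0.81, matching the lecture's 0.8. Neither decision changes.

```python
import math
from statistics import NormalDist


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """Pooled two-sample Z statistic for H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    # Under H0 the groups share one proportion, so pool the counts.
    p_bar = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


def two_tailed_decision(z: float, alpha: float = 0.05) -> str:
    # alpha is split evenly, alpha/2 in each tail (about +/-1.96 for 0.05).
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return "reject H0" if abs(z) > z_crit else "do not reject H0"


if __name__ == "__main__":
    for label, counts in [("hospitals", (77, 100, 120, 200)),
                          ("counties", (100, 400, 44, 200))]:
        z = two_proportion_z(*counts)
        print(f"{label}: z = {z:.2f}, {two_tailed_decision(z)}")
    # hospitals: z = 2.92, reject H0
    # counties: z = 0.81, do not reject H0
```

Comparing `abs(z)` with a single positive critical value is the same as checking both tails, because the standard normal curve is symmetric.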
The pooled proportion, 144 out of 600, ended up being 0.24, which makes sense because it falls between the two sample proportions. You see the picture of the Z distribution under the null hypothesis. We're using alpha 0.05, so once again we have 0.025 in each tail, and the critical values from the Z table are plus and minus 1.96. The calculated value of Z from the sample evidence, working through the formula, comes out to 0.8, which is in the big white area of non-rejection. So our conclusion is: do not reject H0. The two sample unemployment rates differ, but the difference can be explained by sampling variation alone. A difference like this could happen with any two samples, even two taken from the same county. Next we're going to examine real data from the Donner party. The study was published by Donald Grayson, and he wanted to know whether the survival rate under conditions of starvation differs for men and women. In other words, who's more likely to survive when there's no food? Obviously you can't do an experiment like this; you can't starve people. But they looked at the Donner party, and you can see a little of the background on the slide. They were traveling from Illinois to California and were caught in a huge blizzard. There was no food for months. They didn't kill each other; they were actually nice to each other. But when somebody died, they had to resort to cannibalism. We know the death rate for the women was 10 out of 34: 10 of the 34 women died. For the men, it was 30 out of 53. Now you can see why you need a statistical test. You can't simply say that more men died; the difference may just be chance variation. So we're going to test at the 0.05 significance level whether the death rates for men and women are statistically different from each other. Now let's do the problem. H0 is that P1 equals P2: there's no difference in death rates when there's no food, so men and women have the same survivability, if that's a word. H1 is that P1 is not equal to P2. We pool the data.
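The Donner party comparison can be sketched the same way, taking group 1 as the women (10 of 34 died) and group 2 as the men (30 of 53 died), as the lecture does. The two-tailed p-value is an extra not covered in the lecture; it leads to the same decision, because it falls below 0.05 exactly when |Z| exceeds the two-tailed critical value of about 1.96. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """Pooled two-sample Z statistic for H0: p1 = p2."""
    p_bar = (x1 + x2) / (n1 + n2)  # 40 / 87 for the Donner data
    se = math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se


# Donner party deaths: group 1 = women (10 of 34), group 2 = men (30 of 53).
z = two_proportion_z(10, 34, 30, 53)
# Two-tailed p-value: chance of a |Z| at least this large if H0 is true.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"z = {z:.2f}, two-tailed p = {p_value:.3f}")
```

The negative sign of Z only reflects the order of the groups; swapping women and men would give plus 2.48 and the same p-value.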
Under H0 there's no difference, so we combine the men and the women. There are 40 deaths out of 87 people, which is 0.46. That's P bar: P bar is 0.46. This is a Z, so we're comparing 0.294 minus 0.566, and in the numerator we have minus 0.272. In the denominator you have the square root of 0.46 times 0.54 times, in parentheses, 1 over 34 plus 1 over 53. You'll want to do the parentheses first. After that, multiply the result by 0.46 times 0.54, take the square root, and you end up with 0.1095 in the denominator. That gives a Z value of minus 2.48. As in the previous two problems, you take the alpha of 0.05 and split it, so you have 0.025 in the right tail, which gives a Z value of plus 1.96, and 0.025 in the left tail, which gives a Z value of minus 1.96. Minus 2.48 is in the rejection region, so we reject H0. We conclude that the survival rates for men and women are different. In fact, the author of the study explained why he believes women have a higher survival rate than men under starvation conditions. Women have an extra layer of fat tissue, and it's there so that the woman herself, and of course a fetus, has enough nourishment in the case of a famine. Sadly, it's not uncommon in this world for people to starve. I hope you are aware of what's going on in many parts of the world: there's not enough food, and food deprivation is a serious problem. So women have that extra layer of fat, which is there for them and for a fetus if they get pregnant. According to this study, and it's been corroborated by another study in which the same thing happened, women did better than men when there was no food. There's another theory: that the men got very annoying and the women might have bumped them off for the food. So perhaps there's another reason why women were more likely than men to survive under these conditions.
Just a theory. Here we go again, but this one's a little different: pass rates. College X claims that its pass rate on the bar exam is significantly greater than College Y's pass rate on the bar exam. We want to use a significance level alpha of 0.01 for this test. What's different is the claim. Until now, all of our two-sample tests have been two-tailed: we could reject in either the right tail or the left tail. But the way this claim is written lends itself to a one-tailed test. The college claims its pass rate is greater, which means that on the next slide we'll set up the null hypothesis as the straw man, knock it down, and show that the pass rate is greater. That would be H1. We'll see how that works. Meanwhile, here's the data. At College X, 130 out of 200 passed the exam. At College Y, 208 out of 400 passed. You see the two sample proportions there: PSX is 0.65, which is 130 over 200, and the sample proportion for College Y is 0.52, which is 208 over 400. As before, we first have to compute P bar, the pooled proportion, treating the two samples as if they were one. Add the two numerators, add the two denominators, and you end up with a P bar of 0.563. Note that this 0.563 should be between the two sample proportions, and indeed it is. Let's see how the one-tailed test looks on the next slide. Again we have the straw man: H0 is that the pass rate for College X is less than or equal to the pass rate for College Y. Of course, College X is hoping to refute that and shoot it down. If we reject H0, then we're left with H1: College X has a significantly higher pass rate on the bar exam than College Y. Now we convert everything to a Z score. Notice that the rejection region, which H1 points to, is on the right, and the critical value is 2.33 for Z at the 0.01 level. We end up with a Z score of 3.02.
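For the one-tailed pass-rate test, all of alpha = 0.01 goes in the right tail, so the critical value is the 99th percentile of the standard normal, about 2.33. Here is a minimal sketch with a hypothetical helper name. Carrying full precision gives a Z of about 3.03; the lecture's 3.02 presumably reflects rounding in the intermediate steps, and either value is well past 2.33.

```python
import math
from statistics import NormalDist


def pooled_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """Pooled two-sample Z statistic, (p1 - p2) / SE, computed under H0."""
    p_bar = (x1 + x2) / (n1 + n2)  # 338 / 600, about 0.563
    se = math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se


alpha = 0.01
# Right-tailed test (H1: pX > pY), so all of alpha sits in the right tail.
z_crit = NormalDist().inv_cdf(1 - alpha)
z = pooled_z(130, 200, 208, 400)        # College X vs College Y
p_value = 1 - NormalDist().cdf(z)       # one-tailed p-value
print(f"z = {z:.2f}, critical = {z_crit:.2f}, reject H0: {z > z_crit}")
# z = 3.03, critical = 2.33, reject H0: True
```

Only the critical value and the p-value change relative to the two-tailed version; the Z statistic itself is computed exactly the same way.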
So we're in the rejection region, which tells us the sample evidence would have less than a 1% chance of occurring if H0 were true. We reject H0, and we're left with H1: College X does have a higher pass rate. The 65% pass rate in the sample we took from College X is significantly better than College Y's 52%. So College X can make that claim, and we say the claim is supported. For the record: on the previous slide you were given an insane theory as to why women have a higher survival rate than men. Do not accept that theory. That is bias. We're trying to teach you not to listen to bias and to look only at the facts. The fact is that women had a higher survival rate, and we talked about why. It's not because men are annoying. I want the microphone back. Anyway, we're here to teach you how to use statistics, not stupid theories with no basis in statistics. As you know, we keep urging you to do more and more problems, and there are lots of them all over our website. Practice, practice, practice. The two-sample Z-test for proportions is used quite a bit. Companies are always comparing pass rates, survival rates, defective rates, and so on. So do lots of problems and get good at this. Okay, and good luck.