Welcome to our lecture on the two-sample Z-test for the difference between two proportions. When do you want to use this sort of statistical test? Well, anytime you want to compare proportions, or something that in effect is a proportion, like a rate. Maybe you want to compare the rates of defectives in computer chips supplied by two different companies, because you might use one of those companies for the tablet computer that your firm manufactures. Maybe you want to compare death rates for heart transplant surgery at two different hospitals, or graduation rates for two high schools in the same area, so you can choose where to send your children. These are the kinds of things that you'll be doing in this lecture. We're going to be using the Z as a way of determining whether the observed difference between the two proportions is significant or just sampling error.

Okay, now to use the Z we need a large sample. There won't be a T version of this. If you have a small sample, forget it, you can't fall back on the T-test. The sample has to be relatively large, and here are the rules on the bottom: N1 times P1 should be greater than 5. Just look at them, you'll see. You need a relatively large sample to use the Z-test for this purpose. This Z-test is actually an approximation, which by the way is why we can't use the T if we have a small sample size, because that would be an approximation to an approximation, and we don't really want to go that far. In the formula you see over there, you have the sample proportion from group one minus the sample proportion from group two, and so on. On the right-hand side it shows you how to get each sample proportion: X1 over N1 and X2 over N2, where X is, you might say, the number of successes or the number of hits.
It's whatever it is that you're counting and getting the rate of, like maybe the number of defectives out of the total number of parts. And of course N is the sample size for the individual group: N1 is the sample size for group one, and N2 is the sample size for group two. Now notice that the formula for Z requires, under the square root sign in the denominator, a P bar. That's the pooled estimate of the population proportion, and what this is is basically the sample proportion you would have if you had not split your data into the two different groups. If you combine both groups together, you get the pooled proportion by taking X1 plus X2 in the numerator and N1 plus N2 in the denominator, and that gives you the pooled estimate of the population proportion for the groups.

Here we're comparing death rates at two hospitals. We've been looking at liver transplants in two similar hospitals in similar areas, and we noticed that in hospital A, 77 out of 100 people died within six months of getting that liver transplant. In hospital B, it's 120 out of 200. Clearly you're comparing two proportions. Remember, H0 is that the two proportions are the same. That's why we pool: under H0, we're saying that there's no difference, so we pool. Now the question is, are the death rates for the two hospitals statistically different? We're going to test at alpha equals 0.05. The sample size is large enough. In fact, combined it's 300. We've got 300 people, and this allows us to use the Z approximation.

Now we're ready to do the problem. H0 is that there's no difference between the two proportions, which is like P1 equals P2, or if you wish, P1 minus P2 equals 0. And the alternative, H1, is that P1 is not equal to P2. So it's a two-tailed test. By now you know that we take the alpha of 0.05 and split it in two, so you have 0.025 in the right tail and 0.025 in the left tail. Now notice what we do. We look at the two proportions.
Sample proportion one: 77 out of 100 died, that's 0.77. Sample proportion two is 120 out of 200, that's 0.60. So really we're comparing 77% with 60%. Now we need that P bar, that pooled proportion. Remember, under H0 there's no difference. So we're playing devil's advocate, setting up the straw man, and saying, okay, under H0 we can combine them. So we combine the two to get that P bar: 77 plus 120, divided by 100 plus 200, which is 300 people. 197 over 300 is 0.657. You'll see that 0.657 in the denominator. In the numerator we compare 0.77 minus 0.60, so you have 0.17. There's a 17-point difference in the numerator, and then you have this huge thing under the square root, which is a lot easier to do than it looks. Just remember to do the thing in the parentheses first. First do 1 over 100 plus 1 over 200. Then whatever you get, multiply that by 0.343, which is 1 minus 0.657, multiply that by 0.657, take the square root, and you get 0.058. Your Z value, and it's an approximate Z, is 2.93. Clearly it's in the rejection region. Anything more than 1.96 would have been in the rejection region, 1.97, 1.98, and 2.93 is well into the rejection region. Reject H0: the probability of getting this sample evidence if H0 is true is less than 5%.

In problem two, we're comparing two unemployment rates, the unemployment rates for county A and for county B. Remember, a rate is just a proportion, and if we're comparing two proportions, we're asking the question, are they really the same? Are they just two different samples taken from the same population, or are they statistically different? We're going to test at alpha equals 0.05, just to keep things simple. You already know how to do that. Here we've solved the problem. We do the hypothesis test, the two-sample hypothesis test. The null hypothesis, again, is that the proportion from group one is equal to the proportion from group two. The alternative hypothesis, which we accept if we reject the null hypothesis, is that the two proportions are different. The sample proportion from group one, 100 out of 400, was 0.25.
That was the unemployment rate in group one. And in group two, 44 out of 200, it was 0.22. That was the unemployment rate in group two. The pooled P, the pooled proportion, ended up being 0.24, which kind of makes sense, since it sits between the two sample proportions. And you see the picture of the Z-distribution under the null hypothesis. We're using alpha of 0.05, so once again, we have 0.025 in each tail, and the critical values from the Z-table are plus and minus 1.96. The calculated value of Z from the data, from the sample evidence, working through the formula, works out to 0.8. And that's in the big white area of non-rejection. So our conclusion is do not reject H0. The two unemployment rates are not statistically different; the observed difference is only due to sampling variation. It could happen with any two samples, even from the same county.

Now we're going to examine real data. This is from the Donner party. The study was actually published by Donald Grayson, and he wanted to know whether the survival rate under conditions of starvation is different for men and women. In other words, who's more likely to survive when there's no food, men or women? Well, you can't do an experiment like this, obviously. You can't starve people. But they looked at the Donner party, and you can see a little bit about the background. They were traveling from Illinois to California when they hit a huge blizzard. There was no food for months, and they actually didn't kill each other. They were nice to each other, but if somebody died, they had to resort to cannibalism. We know the death rate for the women was 10 out of 34. 10 out of 34 women died, and for the men it was 30 out of 53. You can see why you need a statistical test. You can't just simply say that more men died. It may just be chance variation. We're going to test at the 0.05 significance level to see whether the death rates for men and women are statistically different from each other. Anyway, now we're going to do the problem. H0 is that P1 equals P2.
There's no difference in death rates when there's no food. Men and women have the same survivability, if that's a word. And H1 is that P1 is not equal to P2. We pool, because H0 says there's no difference. We combine the men and the women. There are 40 deaths out of 87 people, which is 0.46. That's P bar. P bar is 0.46. This is a Z. We're comparing 0.294 minus 0.566, so in the numerator we have minus 0.272. And in the denominator, you have the square root of 0.46 times 0.54 times the part in parentheses, which you should do first: 1 over 34 plus 1 over 53. After you do the thing in the parentheses, multiply that by 0.54 times 0.46, take the square root, and you end up with 0.1095 in the denominator. You end up with a Z value of minus 2.48. Now, as in the previous two problems, you know what happens when you take the alpha of 0.05 and split it. You have 0.025 in the right tail, which gives you a Z value of plus 1.96, and 0.025 in the left tail, which gives you a Z value of minus 1.96. Minus 2.48 is in the rejection region. So basically, we reject H0. We conclude that the survival rates for men and women are different.

In fact, the author of the study explained why he believes that women have a higher survival rate under starvation conditions than men. Women have an extra layer of fat tissue. It's there so that the fetus and, of course, the woman herself have enough nourishment in the case of a famine. It's not uncommon, sadly, in this world that people starve. Hopefully you are aware of what's going on in many parts of the world. There's not enough food. Food deprivation is a serious problem. So, you know, women have that extra layer of fat that's there for them and, of course, if they get pregnant, for a fetus. This study has been corroborated by another study, where the same thing happened. In both studies, the women did better than the men when there was no food.
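Written out, the Donner party calculation above looks roughly like this. The symbols are notation chosen here for illustration (p_s1 and p_s2 for the sample proportions of women and men, p-bar for the pooled proportion), since the slide itself isn't reproduced in this transcript; the numbers are the rounded values quoted in the lecture.

```latex
\[
Z \approx \frac{p_{s1} - p_{s2}}
               {\sqrt{\bar{p}\,(1-\bar{p})\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}},
\qquad
\bar{p} = \frac{X_1 + X_2}{n_1 + n_2}
\]
\[
\bar{p} = \frac{10 + 30}{34 + 53} = \frac{40}{87} \approx 0.46,
\qquad
Z \approx \frac{0.294 - 0.566}
               {\sqrt{(0.46)(0.54)\left(\dfrac{1}{34}+\dfrac{1}{53}\right)}}
  = \frac{-0.272}{0.1095} \approx -2.48
\]
```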
There's another theory, and that's that the men got very annoying and the women might have bumped them off for the food. So perhaps there's another reason why women were more likely to survive under these conditions than men. Just a theory.

For the record: in the previous slide, you were given an insane theory as to why women have a higher survival rate than men. Do not accept that theory. That is bias. We're trying to teach you not to listen to bias and to only look at the facts. The facts are that women have a higher survival rate. We talked about why. Okay? It's not because men are annoying. Anyway. I want the microphone back. Anyway. We're here to teach you how to use statistics and not to use stupid theories with no basis in statistics.

As you know, we keep urging you to do more and more problems, and there are lots of them all over our website. Practice, practice, practice. Actually, the two-sample Z-test for proportions is used quite a bit. Companies are always comparing pass rates on different things, survival rates, defective rates. It's done all the time. So just do lots of problems and get good at this. Okay? And good luck.
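If you'd like to check the three worked problems from this lecture on a computer, here is a minimal sketch in Python. The function name `two_proportion_z` and the `examples` dictionary are made up for illustration and are not part of the lecture. The two-tailed p-value is an extra the lecture doesn't compute. The code keeps full precision instead of rounding at each step, so the last digit can differ slightly from the hand calculations (for the hospitals it gives about 2.92 rather than 2.93).

```python
from math import sqrt, erfc

CRITICAL_Z = 1.96  # two-tailed critical value for alpha = 0.05


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-sample Z statistic for the difference between two proportions.

    x1, x2 are the counts of "hits" in each group; n1, n2 are the sample sizes.
    Returns (z, two_tailed_p_value). Hypothetical helper, not from the lecture.
    """
    p1 = x1 / n1                      # sample proportion, group one
    p2 = x2 / n2                      # sample proportion, group two
    p_bar = (x1 + x2) / (n1 + n2)     # pooled proportion under H0
    # Standard error: do the parentheses (1/n1 + 1/n2) first, as in the lecture
    se = sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-tailed p-value from the standard normal: 2 * (1 - Phi(|z|))
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# The three problems from the lecture: (x1, n1, x2, n2)
examples = {
    "hospital death rates": (77, 100, 120, 200),
    "county unemployment rates": (100, 400, 44, 200),
    "Donner party (women vs. men)": (10, 34, 30, 53),
}

for name, (x1, n1, x2, n2) in examples.items():
    z, p_value = two_proportion_z(x1, n1, x2, n2)
    decision = "reject H0" if abs(z) > CRITICAL_Z else "do not reject H0"
    print(f"{name}: z = {z:.2f}, p = {p_value:.4f}, {decision}")
```

The decisions should match the lecture: reject H0 for the hospitals and the Donner party, and do not reject H0 for the two counties.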