Welcome to our lecture on statistical inferences about the population proportion P. Remember way back when we talked about the central limit theorem and the sampling distribution of the sample mean? Well, sometimes we want to make inferences about other population parameters, and for that we have to know the sampling distribution. In this lecture, we're using the sample proportion as an estimator, to make inferences about the population proportion. So we need to know something about the sampling distribution of a proportion. We'll see more on the next slide, but all we need to know at this point is this: when n, the sample size, times P, the population proportion, and its complement, n times 1 minus P, are both at least five, greater than or equal to five, then the z-distribution is a very good approximation to the sampling distribution of the proportion. Therefore, we can use the formula for z to convert the sample proportion, a random variable, to a z-statistic.

P s is what we're using to denote the sample proportion. Capital P is what we're using for the population proportion. And this is as good a time as any to note that, depending on the book or notes you're using, some people use other notation for the population proportion. You may be using the Greek letter pi as the population parameter. But we're going to be consistently using capital P.

So like all formulas for z, when we're using a z-statistic, we take the random variable minus its expected value, divided by its standard deviation. Those of you who have studied the sampling distribution of a proportion know this already. We're not going to study it in much depth. There's a little hint of it on the next slide.
But basically the formula for converting to z is P s minus P, divided by the square root of P, the population proportion, times 1 minus P, divided by n. All of that is under the square root. And as always in the formula for z, which population parameter are we using? The one under the null hypothesis. In this case, when we say we're taking P s, the sample proportion, minus P, well, we're using the sample proportion in order to make inferences about P. So we don't know P. We don't know the population proportion. But we're doing a hypothesis test. We have a hypothesis. And that's the value of P that we use over here.

Now let's look at the formula for a confidence interval estimator of P, with confidence 1 minus alpha, for instance 90% or 95%. The sample statistic, like before, is smack in the middle of the interval. That's the sample proportion P s. And then plus and minus the half-width of the confidence interval. In other words, we take P s minus something on one side and plus something on the other side. And that's our interval estimator for the population proportion P. What do we use? First is the value from the z-table that gives us our level of confidence. And then we multiply by the measure of variation. And this time, under the square root sign, we have P s, the sample proportion, times 1 minus P s, divided by n.

So here's something to ponder. Take a minute and think about it. This is the first time you see two formulas that are very similar, just sort of algebraically turned around, one for a hypothesis test and one for a confidence interval. And yet they use different values. The formula for computing the z-statistic uses the population parameter P under the null hypothesis. The formula for a confidence interval estimator just uses the sample proportion. You don't see P anywhere in there. Why is that? Think about it for a minute. I'll give you 10 seconds. Now I'll give you the answer.
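
For reference while you think about it, here are the two formulas just described in words, written out side by side. This is only a transcription into LaTeX of what was said; the symbol Z_{\alpha/2} for "the value from the z-table that gives us our level of confidence" is our notation and an assumption, since the lecture does not name it.

```latex
% Test statistic: the hypothesized P (from the null hypothesis) appears
% in both the numerator and the standard error.
Z = \frac{P_s - P}{\sqrt{\dfrac{P\,(1 - P)}{n}}}

% (1 - \alpha) confidence interval estimator: only the sample proportion
% P_s appears; the unknown P is nowhere in the formula.
P_s \pm Z_{\alpha/2}\,\sqrt{\frac{P_s\,(1 - P_s)}{n}}
```
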
Well, if we're doing a confidence interval estimator, we're trying to estimate the population parameter. That means we are not making any assumptions about it. We have no claims. We have no hypotheses. We don't know anything. We don't even know enough to make a guess about P. All we have is the data from the sample. So we use the sample proportion all the way through.

Okay, so here's what we were referring to before: the sampling distribution of a proportion. A proportion is binomially distributed. There are two states, success and not success, with probabilities P and one minus P. That's very obviously something that's going to follow the binomial distribution. However, you can see from the picture that as n gets large, and n times P and n times one minus P are both at least five, the distribution gets very close to a bell-shaped, normal type of distribution. And z becomes a very good approximation for the sampling distribution of a proportion.

And this is a good example to see when we use the test of a proportion. A politician claims that 70% (and that really is a proportion, it's not a mean) of the people in her district are Democrats. You want to know whether that's true or not, so you take a sample. The sample proportion, which we call P little s, is 50 out of 100, which is 50%. Now the question is, is that 50% far enough away from the claim of 70%? It's a 20-point difference, but it could be sampling error. And certainly everyone in the class would agree that if I make a claim of 70% and, out of 100 people, you get 69, that's probably sampling error. The same goes for 71 or 68. But here it's 50 out of 100. So that's really what we're going to do. We're going to take the sample evidence, convert it into a z-score, and then from the z-score we can see the probability: is it within the 95% or not?
Now again, you're familiar with the z-value 1.96: the area between minus 1.96 and plus 1.96 under the z-distribution is 95%. So we have that little cushion. We're willing to accept, up to a certain point, that it's sampling error. A little bit of deviation from the 70%, which would mean 70 out of 100 people, we would accept as sampling error. But if it's too far away, let's say we're beyond minus 1.96 or beyond plus 1.96, then we will call you a liar, and we'll say the politician is lying.

We use the formula you have on the sheet to convert the sample evidence, but notice we're using the claim. You claim that P, the population proportion, is 0.70. We play devil's advocate: you're right, let's say it is 0.70. So we put that into the formula. We have the 0.70 there. We look at the sample evidence, and now this will give us a probability, in effect. So when we finish, we get a z-score of minus 4.36. That's definitely in the rejection region. And we know the probability of getting a z-score of minus 4.36 is a lot less than 5%. If you really want to figure out what the probability is, you can take the area from minus 4.36 to negative infinity, double it, and you'll see the probability of getting sample evidence this extreme if H0 is true. But our conclusion is the politician is a liar. If they're claiming 70%, we shouldn't have gotten a sample of 50 out of 100. P little s of 0.50 is too far away from 70%.

Can't you conclude that the politician is a liar because her mouth is moving? Yeah, I think I would agree with that, too.

Okay, now we're constructing a confidence interval. Now remember, with a confidence interval, you don't have any claims. You're just working with the sample evidence, which was 50 out of 100, or 0.50. So all you can work with is that 0.50. We use the same z-value of 1.96 for 95% confidence.
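
As a rough check on the arithmetic in the test of the 70% claim above, here is a minimal Python sketch. It uses only the numbers from the example (50 out of 100, claimed P of 0.70) and the standard library's `NormalDist` for the normal areas. The function name `proportion_z` and the variable names are ours, not from the lecture. It also carries out the "double the tail area" step to get the two-tailed probability.

```python
import math
from statistics import NormalDist


def proportion_z(successes, n, p0):
    """z-statistic for a sample proportion, using the hypothesized P (p0)."""
    p_s = successes / n                      # sample proportion
    std_error = math.sqrt(p0 * (1 - p0) / n) # measure of variation under H0
    return (p_s - p0) / std_error


std_normal = NormalDist()                    # mean 0, standard deviation 1

z = proportion_z(50, 100, 0.70)              # about -4.36
z_crit = std_normal.inv_cdf(0.975)           # about 1.96 for a two-tailed 5% test
reject_h0 = abs(z) > z_crit

# Area from z out to negative infinity, doubled for the two tails.
p_value = 2 * std_normal.cdf(-abs(z))

print(f"z = {z:.2f}, critical value = {z_crit:.2f}, reject H0: {reject_h0}")
print(f"two-tailed probability = {p_value:.6f}")
```

The z-score comes out at about minus 4.36, well beyond minus 1.96, and the doubled tail area is far below 5%, which matches the conclusion in the lecture.
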
And let me do the plus and the minus on both sides. So we do 0.50 plus and minus the z-value of 1.96 times the square root of 0.50 times 0.50 over 100. And our sampling error, or margin of error if you want to call it that, is 0.10. So our conclusion is we're 95% confident that somewhere in the range of 40% to 60% (remember, 0.40 is 40% and 0.60 is 60%) we have the true proportion of Democrats in this district. So I can tell you right away that 70% is not reasonable. Neither is 30%. But up to 60% could be reasonable.

Here's another problem, very similar to the previous one. You may even wish to take a moment, stop the audio, and do the problem on your own first. But for now, let's move on. A politician claims that exactly 90% of the American public favors legalizing drugs. A survey of 100 people shows that only 79 are in favor of drug legalization. Test at alpha equal to 0.05.

Well, the first thing you do, just like in any other hypothesis test, is state your null and alternate hypotheses, H0 and H1. H0 is the claim. The claim is that exactly 90% of the American public favors legalizing drugs. So H0 is that P, the population proportion, is equal to 0.9. The alternate hypothesis, H1, is that it's not. If we reject H0, we don't really have to specify whether that's because it was too large or because it was too small. Either way, if we reject the null hypothesis, all we're saying is that 0.9, or 90%, is wrong.

One more step first: looking up the critical values from the Z distribution. Alpha is 0.05. We take that 0.05 and split it equally between the two tails, because this is a two-tailed test. We're going to reject if we find a value that's too high, and we're going to reject if we find a value that's too low. That's 2.5% and 2.5%, and for that tail probability the Z table gives critical values of plus and minus 1.96. The calculated value of Z, we calculate from the data.
The data tells us that 79% of the sample was in favor of drug legalization. So that's 0.79 minus 0.9. And we're using the hypothesized population proportion, 0.9, in the denominator, in the measure of variation. That's 0.9 times 0.1 divided by 100, all of that under a square root sign. So what you get is negative 0.11 in the numerator and 0.03 in the denominator, or negative 3.67, which is way out on the left side in the region of rejection, and so we reject the null hypothesis at alpha equal to 0.05. As for the p-value, we haven't computed it, but if we did, it would be way less than 0.05.

I just want to take this opportunity to point out one more thing. Remember, whenever you do a hypothesis test, you need four pieces. You need the hypotheses, because you can't reject or not reject something that isn't there. You need the critical values from the distribution of the test statistic. You need the calculated value of the test statistic. And you need your conclusion: reject or not?

I'm looking at a problem of legalizing killing politicians. Why? Can that be the next problem? How about killing spouses?

Part B is to construct a two-sided 95% confidence interval estimator for the population proportion. All this involves is putting numbers into the formula. The sample proportion, 0.79, is smack in the middle, plus and minus the margin of error, which turns out to be 0.08, and that gives an interval from 0.71 to 0.87. We have 95% confidence that this interval does contain the true population proportion of people who are in favor of legalizing murdering... legalizing drugs. Just for the record, I'm not in favor of this law that my wife wants to pass making it legal to kill spouses. I'm against that law.

In any event, here's a problem with defective widgets. We've been doing two-tailed tests so far. We did two problems with two-tailed tests. Let's do one with a one-tailed test.
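
The drug legalization problem above lists four pieces of a hypothesis test and then a confidence interval, so here is a short Python sketch that lays them out in that order. It is only an illustration using the numbers from the problem (79 out of 100, claimed P of 0.9, alpha of 0.05); the variable names are ours, and `NormalDist` from the standard library stands in for the z-table.

```python
import math
from statistics import NormalDist

n = 100
p_s = 79 / n          # sample proportion
p0 = 0.90             # 1. hypotheses: H0: P = 0.90, H1: P != 0.90
alpha = 0.05

# 2. Critical values: alpha split across two tails, about +/- 1.96.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# 3. Calculated z: uses the hypothesized P in the measure of variation.
z_calc = (p_s - p0) / math.sqrt(p0 * (1 - p0) / n)

# 4. Conclusion: reject H0 if z_calc falls in either tail.
reject = abs(z_calc) > z_crit

# Part B: the confidence interval uses the sample proportion throughout.
margin = z_crit * math.sqrt(p_s * (1 - p_s) / n)
ci = (p_s - margin, p_s + margin)

print(f"z = {z_calc:.2f}, critical values = +/-{z_crit:.2f}, reject H0: {reject}")
print(f"95% CI: {ci[0]:.2f} to {ci[1]:.2f} (margin of error {margin:.2f})")
```

Note how the two square roots differ: the test uses 0.9 times 0.1, while the interval uses 0.79 times 0.21, which is exactly the distinction made earlier between the two formulas.
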
The company claims that no more than 8% of its widgets are defective. And you know it by now. You hear the words "no more than." That's a clue. "No more than" means it's going to be a one-tailed test. All right? Now, you take a sample of 100. Ideally, there should be no more than 8 defectives. We found 10. Okay. It could be sampling error, or it could be that the company's been lying. So we have P s is 10 over 100, or 0.10. The claim was 8%. There are two parts to this problem. First, we're going to test the claim at an alpha of 0.05. And part B says no claims were made; we just want to construct a two-sided 95% confidence interval.

Well, here we're doing the hypothesis test. So H0 is that P (remember, capital P is the population proportion; some books call it pi, we're calling it capital P) is less than or equal to 0.08. Anything at or below 0.08 is fine. The claim was no more than 8%. H1 is that P is more than 8%. Okay? We have the sample evidence, P s. The sample is 10 out of 100, which is 0.10. Now, this is a one-tailed test, so the critical value is on the right. Remember the clue: H1 always points to the tail where the critical value goes. If you put the full 0.05 in the right tail, you have a critical value for Z of 1.645.

So now we convert the sample evidence into a Z-score. We do 0.10 minus 0.08 over the square root of 0.08 times 0.92 over 100. That's the formula. And you get 0.020 over 0.027, or 0.74. So, in effect, you've taken the sample evidence and converted it into a Z-score of 0.74. Notice it's not in the rejection region. It would have to be more than 1.645 to be in the rejection region. If it were a value of 2 or 2.5, that would be in the rejection region. 0.74 is not enough for us to reject your claim. So we can't reject H0. We don't reject it, because the probability of getting this sample evidence is more than 5%.
So basically the conclusion is that maybe the claim is accurate and we're just looking at sampling error. 10 out of 100 is not enough of a deviation from 8%. On the other hand, I think you wouldn't even need statistics for this: if I claim that no more than 8% of my widgets are defective and you took a sample of 100, and instead of finding no more than 8 out of 100, let's say you found around 45 defective out of 100, I think you would all know that that would not be sampling error. How about all of them, 100 out of 100? That's enough to not buy anything from this company. You certainly know that it's almost impossible to get that kind of sample evidence if the claim were true.

Now here in Part B, no claims are made. All we have to work with is the sample evidence. We know 10 out of 100 widgets made by this company are defective. So here we're using the z-value of 1.96 for a 95% confidence interval. We use 0.10 plus or minus 1.96 times the square root of 0.10 times 0.9 over 100. That whole term on the right of the plus and minus sign works out to 0.06. That's called the margin of error, or the sampling error. So we go from 0.10 minus 0.06, which is 0.04 or 4%, to 0.10 plus 0.06, which is 0.16 or 16%. Somewhere between 4% and 16% is where you'll find the true population proportion of defectives.

Now the obvious question is, we have a pretty wide interval here. What can we do to narrow it down? Even if you're not a statistician, you should know that the reason it's so wide is that you only took a sample of 100. Maybe you should take a bigger sample. That's one of the things we know: if you take a larger sample, that'll make your confidence interval narrower. Your margin of error won't be as large.

This wraps up what we call one-sample tests. We're looking at one sample. Remember, with one-sample tests, think of them as three cases.
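
Before moving on to the three cases, the widget problem above can be sketched in Python as follows. It uses the numbers from the problem (10 defective out of 100, claimed P of no more than 0.08, alpha of 0.05). The helper `margin_of_error` and the larger sample sizes 400 and 1600 are hypothetical, added only to illustrate the point about narrowing the interval, and they assume the sample proportion stays at 0.10.

```python
import math
from statistics import NormalDist


def margin_of_error(p_s, n, confidence=0.95):
    """Half-width of the confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(p_s * (1 - p_s) / n)


n = 100
p_s = 10 / n          # sample proportion of defectives
p0 = 0.08             # H0: P <= 0.08, H1: P > 0.08
alpha = 0.05

# Part A: one-tailed test, the whole alpha goes in the right tail.
z_crit = NormalDist().inv_cdf(1 - alpha)             # about 1.645
z_calc = (p_s - p0) / math.sqrt(p0 * (1 - p0) / n)   # about 0.74
reject = z_calc > z_crit

# Part B: two-sided 95% confidence interval from the sample alone.
m = margin_of_error(p_s, n)
ci = (p_s - m, p_s + m)

print(f"z = {z_calc:.2f}, critical value = {z_crit:.3f}, reject H0: {reject}")
print(f"95% CI: {ci[0]:.2f} to {ci[1]:.2f}")

# Hypothetical larger samples with the same sample proportion.
for size in (100, 400, 1600):
    print(f"n = {size}: margin of error = {margin_of_error(p_s, size):.3f}")
```

Because n sits under the square root, quadrupling the sample size halves the margin of error, so narrowing the interval gets expensive quickly.
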
One case is where we can use z: when you know sigma, or when n is quite large. Some books have other rules. But our approach is, if n is very large, we're going to use z. If you know the population standard deviation, you can use z. So for large samples we're going to use z. If the samples are small, and never mind exactly how small, we're going to use t. Remember, then we use the t-distribution. And the third case is when you're dealing with a proportion, a binomial. We saw a couple of examples before, where I say a certain percentage are Democrats, or a certain percentage are defective. Or they say that at least 30% of the students who take statistics will fail. That's a joke. We don't accept 30%. But with things like that, you're going to use z as an approximation to the binomial distribution. So we have three cases, and all three have been covered.

If you want a word clue for when you're looking at a problem and you want to know whether it's a mean problem or a proportion problem, whether we're making inferences about the mean or the proportion, a very good clue is: does the problem tell you the standard deviation? Because, as you saw from the formula, the standard deviation for the distribution of the proportion has the proportion built into it. You don't have a separate standard deviation.

So here's a problem. You tell me what you'd be using. Let's say that at least 40% of students taking the CPA exam at our college will pass. You hear a percentage, right? It's got to be z. I'm not going to give you a standard deviation, and eventually I'll just give you the numbers. I'll say, you know, 40 out of 100. Some number like that means it's a problem involving proportions.
On the other hand, if I talk about a score, if I say something about the SAT scores at this college and I give you a mean and a standard deviation, then you're working with z or t, depending on the sample size.