In this lecture, we move on to making inferences about a different population parameter. Up until now, the inferences, estimation and hypothesis testing, were always about the population mean mu. In this lecture, we're going to be making statistical inferences about the population proportion. Welcome to the lecture.
Remember way back when we talked about the central limit theorem and the sampling distribution of the sample mean? Well, sometimes we want to make inferences about other population parameters, and for that we have to know the sampling distribution. In this lecture, we're using the sample proportion as an estimator of the population proportion, so we need to know something about the sampling distribution of a proportion. We'll see more on the next slide, but all we need to know at this point is that as long as n, the sample size, times P, the population proportion, and its complement, n times 1 minus P, are both at least 5, then the z-distribution is a very good approximation to the sampling distribution of the proportion. Therefore, we can use a formula for z to convert the sample proportion, a random variable, to a z-statistic.
P s is what we're using to denote the sample proportion. Capital P is what we're using for the population proportion. This is a good time to note that, depending on the book or the notes you're using, you may see other notation for the population proportion. Some people use the Greek letter pi for the population parameter. But we're going to be consistently using capital P. So like all formulas for z, when we're using a z-statistic, we take the random variable minus its expected value, divided by its standard deviation. And we know these from the study of the sampling distribution of a proportion.
We're not going to study it in much depth. You'll see a little hint of it on the next slide. But basically, the formula for converting to z is P s minus P, divided by the square root of P, the population proportion, times 1 minus P, divided by n. All of that is under the square root. And as always in the formula for z, what population parameter are we using? The one under the null hypothesis. We're using the sample proportion P s in order to make inferences about P, so we don't know P. We don't know the population proportion. But we're doing a hypothesis test. We have a hypothesis, and that's the value of P that we use in the formula.
Now let's look at the formula for a confidence interval estimator of P with confidence 1 minus alpha, for instance 90 percent or 95 percent. The sample statistic, like before, is smack in the middle of the interval. That's the sample proportion P s. Then we go plus and minus the half-width of the confidence interval. In other words, we take P s minus something on one side and P s plus something on the other side, and that's our interval estimator for the population proportion P. What do we use? The first thing is the value from the z table that gives us our level of confidence. Then we multiply by the measure of variation, and this time, under the square root sign, we have P s, the sample proportion, times 1 minus P s, divided by n.
So here's something to ponder. Take a minute and think about it. This is the first time you see two formulas that are very similar, just algebraically turned around, one for the hypothesis test and one for the confidence interval, and yet they use different values. The formula for computing the z-statistic uses the population parameter P under the null hypothesis. The formula for the confidence interval estimator just uses the sample proportion. You don't see P anywhere in there. Why is that? Think about it for a minute.
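Written out side by side, the two formulas just described look like this. This is only a sketch in common textbook notation: the name P_0 for the value of P under the null hypothesis and the subscript in z_{\alpha/2} for the two-sided table value are notation assumed here, not taken from the slides.

```latex
% Test statistic: uses the hypothesized population proportion P_0
z = \frac{P_s - P_0}{\sqrt{\dfrac{P_0\,(1 - P_0)}{n}}}

% Confidence interval estimator with confidence 1 - \alpha:
% uses only the sample proportion P_s
P_s \;\pm\; z_{\alpha/2}\,\sqrt{\dfrac{P_s\,(1 - P_s)}{n}}
```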
I'll give you 10 seconds, and then I'll give you the answer. Well, if we're doing a confidence interval estimator, we're trying to estimate the population parameter. That means we are not making any assumptions about it. We have no claims. We have no hypotheses. We don't even know enough to make a guess about P. All we have is the data from the sample. So we use the sample proportion all the way through.
Okay. So here's what we were referring to before. A sample proportion really comes from a binomial distribution. There are two states, success and not success, with probabilities P and 1 minus P. That's very obviously something that's going to follow the binomial distribution. However, you can see from the picture that as n gets large, and n times P and n times 1 minus P are both at least 5, the distribution gets very close to a bell-shaped normal distribution, and z becomes a very good approximation for the sampling distribution of a proportion.
Okay. This is a good example to see when we use the test of a proportion. A politician claims that 70 percent, and whenever you see a percent, that's a proportion, it's not a mean, 70 percent of the people in her district are Democrats. You want to see if that's true or not, so you take a sample. We call the sample proportion P s, and the sample is 50 out of 100, which is 50 percent. Now the question is, is that 50 percent far enough away from the claim of 70 percent? That's a 20 percent difference, but it could be sampling error. Certainly everyone in the class would agree that if I make a claim of 70 percent and, out of 100 people, you had gotten 69, that's probably sampling error. The same goes for 71 or 68. But here it's 50 out of 100. So that's really what we're going to do. We're going to take the sample evidence, convert it into a z-score, and then from the z-score we can see the probability.
And we can see if it's less than 5 percent or not. Now again, the z-value you're familiar with is 1.96: between minus 1.96 and plus 1.96, the z-distribution gives you 95 percent of the area. So we're willing to accept, up to a certain point, that it's sampling error. A little bit of a deviation from the 70 percent, which would mean 70 out of 100 people, we would accept as sampling error. But if it's too far away, let's say we're beyond minus 1.96 or beyond plus 1.96, then we will call you a liar and say the politician is lying.
Well, you have the formula on the sheet, and we plug in the sample evidence, but notice we're using the claim. You claim that P, the population proportion, is 0.70, so we play devil's advocate. You're right. Let's say it is 0.70. So we put the 0.70 into the formula. We look at the sample evidence, and this will give us a probability in effect. When we finish, we get a z-score of minus 4.36. That's definitely in the rejection region, and we know the probability of getting a z-score of minus 4.36 is a lot less than 5 percent. If you really want to figure out the probability, you can take the area from minus 4.36 out to negative infinity, then double it, and you'll see the probability of getting sample evidence this extreme if H0 is true. But our conclusion is the politician is a liar. If they're claiming 70 percent, we shouldn't have gotten a sample of 50 out of 100. P s of 0.50 is too far away from 70 percent. Can you conclude that the politician is a liar because her mouth is moving? Yeah, I think I would agree with that too.
Now let's look at constructing a confidence interval. Remember, with a confidence interval, you don't have any claims. You're just working with the sample evidence, which was 50 out of 100, or 0.50.
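As a quick check of the hypothesis test for the politician's claim, here is a minimal Python sketch. It assumes the normal approximation described in the lecture and uses the standard library's `statistics.NormalDist`; the function name `z_for_proportion` is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_for_proportion(successes, n, p0):
    """z statistic for a sample proportion, using the hypothesized p0."""
    p_s = successes / n                      # sample proportion
    std_error = sqrt(p0 * (1 - p0) / n)      # variation under H0
    return (p_s - p0) / std_error

# Politician example: claim P = 0.70, sample is 50 out of 100.
z = z_for_proportion(50, 100, 0.70)

# Two-tailed test at alpha = 0.05: critical values are about +/- 1.96.
critical = NormalDist().inv_cdf(0.975)
reject = abs(z) > critical

# p-value: area from z out to negative infinity, then doubled.
p_value = 2 * NormalDist().cdf(-abs(z))

print(f"z = {z:.2f}, critical = +/-{critical:.2f}, reject H0: {reject}")
print(f"two-tailed p-value = {p_value:.6f}")
```

The z-score comes out at about minus 4.36, as in the lecture, and the p-value is far below 0.05.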
So all you can work with is that 0.50. We use the same z-value of 1.96 for 95 percent confidence, and we go plus and minus on both sides. So we do 0.50 plus and minus the z-value of 1.96 times the square root of 0.50 times 0.50 over 100. And our sampling error, or margin of error if you want to call it that, is 0.10. So our conclusion, with 95 percent confidence, is that somewhere in the range of 40 percent to 60 percent, remember 0.40 is 40 percent and 0.60 is 60 percent, we have the true proportion of Democrats in this district. So let me tell you right away that 70 percent is not reasonable. Neither is 30 percent. But up to 60 percent could be reasonable.
Here's another problem, very similar to the previous one. You may even wish to take a moment, stop the audio, and do the problem on your own first. But for now, let's move on. A politician claims that exactly 90 percent of the American public favors legalizing drugs. A survey of 100 people shows that only 79 are in favor of drug legalization. Test at alpha equal to 0.05. Well, the first thing you do, just like in any other hypothesis test, is set up your null and alternate hypotheses, H0 and H1. H0 is the claim. The claim is that exactly 90 percent of the American public favors legalizing drugs. So H0 is that claim, that P, the population proportion, is equal to 0.9. The alternate hypothesis, H1, is that it's not. If we reject H0, we don't really have to specify whether that's because it was too large or because it was too small. Either way, if we reject the null hypothesis, all we're saying is that 0.9, or 90 percent, is wrong. From the data, we compute... I'm sorry, one more step first: looking at the critical values from the z-distribution. Alpha is 0.05. We take that 0.05 and break it up equally into the two tails because this is a two-tail test. We're going to reject.
If we find a value that's too high, we reject, and if we find a value that's too low, we reject: 2.5 percent in each tail. For that tail probability, the z table gives us critical values of plus and minus 1.96. The calculated value of z, we compute from the data. The data tells us that 79 percent of the sample was in favor of drug legalization. So that's 0.79 minus 0.9 in the numerator. And we're using the hypothesized population proportion, 0.9, in the denominator, in the measure of variation. That's 0.9 times 0.1 divided by 100, all of that under a square root sign. So what you get is negative 0.11 in the numerator and 0.03 in the denominator, or negative 3.67, which is way out on the left side in the region of rejection. And so we reject the null hypothesis at alpha equal to 0.05. As for the p-value, we haven't computed it, but if we did, it would be way less than 0.05.
I just want to take this opportunity to point out one more thing. Remember, whenever you do a hypothesis test, you need four pieces. You need the hypotheses, because you can't reject or not reject something that isn't there. You need the critical values from the distribution of the test statistic. You need the calculated value of the test statistic. And you need your conclusion. Reject or not?
I don't think there's a problem about legalizing killing politicians, is there? Can that be the next problem? How about killing spouses? All right. It might be... I guess we're going to get an email.
Part B is to construct a two-sided 95% confidence interval estimator for the population proportion. All this involves is putting numbers into the formula. The sample proportion, 0.79, is smack in the middle, plus and minus the margin of error, which turns out to be 0.08, and we get an interval from 0.71 to 0.87. We have 95% confidence that this interval does contain the true population proportion of people who are in favor of legalizing murdering... what are we legalizing? Legalizing drugs.
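Both parts of the drug legalization problem can be checked with a few lines of Python. This is a sketch under the lecture's normal approximation; `proportion_ci` is a hypothetical helper name, and the table value 1.96 is obtained from `statistics.NormalDist` rather than looked up.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Two-sided confidence interval for a proportion (normal approximation)."""
    p_s = successes / n
    z_table = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z_table * sqrt(p_s * (1 - p_s) / n)              # uses P s, not P
    return p_s - margin, p_s + margin, margin

# Part A: test H0: P = 0.90 against H1: P != 0.90, with 79 out of 100 in favor.
p0, n, successes = 0.90, 100, 79
p_s = successes / n
z = (p_s - p0) / sqrt(p0 * (1 - p0) / n)   # hypothesized P in the denominator
reject = abs(z) > NormalDist().inv_cdf(0.975)

# Part B: 95% confidence interval, no claim involved.
low, high, margin = proportion_ci(successes, n)

print(f"z = {z:.2f}, reject H0: {reject}")
print(f"margin = {margin:.2f}, interval = ({low:.2f}, {high:.2f})")
```

Note that the test divides by a standard error built from the hypothesized 0.90, while the interval uses only the sample proportion 0.79, which is exactly the difference between the two formulas discussed earlier.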
Just for the record, I'm not in favor of this law my wife wants to pass making it legal to kill spouses. I'm against that law. But in any event, here's a problem with defective widgets. We've been doing two-tailed tests so far. We did two problems with two tails. Let's do one with a one-tailed test. The company claims that no more than 8% of its widgets are defective. And you know by now, when you hear the words "no more than", that's a clue. It means it's going to be a one-tailed test. All right. Now, you take a sample of 100. Ideally, there should be 8 defectives. We found 10. Okay. It could be sampling error, or it could be that the company's been lying. So we have P s is 10 over 100, or 0.10. The claim was 8%. Now, there are two parts to this problem. First, we're going to test the claim at an alpha of 0.05. And part B says no claims were made, and we just want to construct a two-sided 95% confidence interval.
Well, here we're doing the hypothesis test. So H0 is that P, and remember, capital P is the population proportion, some books call it pi, is less than or equal to 0.08. Anything at or below 0.08 is fine. The claim was no more than 8%. H1 is that P is more than 8%. Okay. We have the sample evidence, P s: 10 out of 100 is 0.10. Now, this is a one-tailed test, so the critical value is on the right. Remember the clue: H1 always points toward the rejection region. If you put the full 0.05 in the right tail, you have a critical value of z of 1.645. All right. So now we convert the sample evidence into a z-score. We do 0.10 minus 0.08 over the square root of 0.08 times 0.92 over 100. That's the formula. And you get 0.020 over 0.027, or 0.74. In effect, you've taken the sample evidence and converted it into a z-score of 0.74. Notice it's not in the rejection region.
It would have to be more than 1.645 to be in the rejection region. If we had a value of 2 or 2.5, that's in the rejection region. 0.74 is not enough for us to reject your claim. So we can't reject H0. We don't reject it, because the probability of getting this sample evidence is more than 5%. So basically, the conclusion is that maybe the claim is accurate and we're just looking at sampling error. 10 out of 100 is not enough of a deviation from 8%. On the other hand, I don't think you would need to know statistics for this: if I claim that no more than 8% of my widgets are defective, and you took a sample of 100, and instead of finding no more than 8 out of 100, let's say around 45 were defective out of 100, I think you would all know that that would not be sampling error. How about if the whole lot were defective? 100 out of 100. That would be enough not to buy anything from this company. You certainly know that it's almost impossible to get that kind of sample evidence if the claim is true.
Now, here in Part B, no claims are made. All we have to work with is the sample evidence. We know 10 out of the 100 widgets sampled from this company are defective. So here, we're using the z-value of 1.96 for a 95% confidence interval. We use 0.10 plus or minus 1.96 times the square root of 0.10 times 0.90 over 100. That whole term on the right of the plus and minus works out to 0.06. That's called the margin of error, or the sampling error. So we know that somewhere between 0.10 minus 0.06, which is 0.04 or 4%, and 0.10 plus 0.06, which is 0.16 or 16%, that's where you'll find the true population proportion of defectives. It's somewhere between 4% and 16%. Now, the obvious question is, with a pretty wide interval here, what can we do to narrow it down? And even if you're not a statistician, you should know that the reason it's so wide is that you only took a sample of 100. Maybe you should take a bigger sample.
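The widget problem and the point about sample size can be sketched in Python as follows. The one-tailed test and the n = 100 margin use the lecture's numbers; the larger sample sizes (400 and 1,600) are hypothetical and assume the sample proportion stays at 0.10, purely to show how the margin of error shrinks. The helper name `margin_of_error` is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(p_s, n, confidence=0.95):
    """Half-width of the two-sided confidence interval for a proportion."""
    z_table = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_table * sqrt(p_s * (1 - p_s) / n)

# Part A: H0: P <= 0.08, H1: P > 0.08, one-tailed test at alpha = 0.05.
p0, p_s, n = 0.08, 0.10, 100
z = (p_s - p0) / sqrt(p0 * (1 - p0) / n)
critical = NormalDist().inv_cdf(0.95)      # full 0.05 in the right tail, about 1.645
reject = z > critical
print(f"z = {z:.2f}, critical = {critical:.3f}, reject H0: {reject}")

# Part B: margin of error for the sample of 100, then for
# hypothetical larger samples with the same sample proportion.
for size in (100, 400, 1600):
    m = margin_of_error(p_s, size)
    print(f"n = {size:5d}: {p_s:.2f} +/- {m:.3f}")
```

Because n sits under the square root, quadrupling the sample size only cuts the margin of error in half.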
And that's one of the things we know: if you take a larger sample, that will make your confidence interval narrower. Your margin of error won't be as large.
Suppose a researcher claims that at least 40% of students taking online statistics courses cheat on the exams, which are online. You're going to test the claim, right? So you sample 1,000 students randomly, n equals 1,000, and you find that 520 out of 1,000 admit to cheating on their online exams. Okay, now we're going to test at an alpha of 0.05. So we start off: H0 is that P is greater than or equal to 0.40, and H1 is that P is less than 0.40. We know this is a one-tail test because the claim was "at least". Notice the rejection region is on the left, we're using the z-distribution, and the critical value is minus 1.645. Now when you calculate the sample proportion, you get 0.52. That's it, you don't have to do any more. The sample evidence supports the claim. When the sample evidence supports the claim, you don't have to calculate z. You say right away that you can't reject H0, because the sample evidence supported it. The only reason we do the test is for when the sample is below 0.40, even slightly. Suppose it had been 0.39: you don't want to reject the claim if it turns out that it's just sampling error. Okay, so that's what statistics is all about. When you're doing these tests, you want to make sure you're not rejecting a claim on the basis of something that's really just sampling error. But here the sample statistic, in this case the sample proportion, supports the claim. That's it, you're finished. So you can't reject. Okay, so again, do not do the calculation. There's nothing to do. You cannot reject H0 if the sample evidence supports the claim.
Thank you for attending this lecture. In the context of the course, this lecture wraps up the section on one-sample statistical inference. Then we move on to two-sample tests and more, maybe.
And naturally, as you know, if you want to make sure that you retain this information, do as many practice problems as you can find. Do the homework. There are a lot of problems on our website. Do every problem that you can get a hold of and it will do you good. You'll be okay.