In this lecture, we continue the introduction to statistical inference. Previously we covered estimation; now we introduce the concept of testing hypotheses. We're testing some kind of hypothesis, or claim, and claims are always about parameters. Let's say a company makes a claim about its product. The example we have here: a frozen yogurt company says its yogurt has no more than 90 calories per cup. Now, they're not talking about one particular cup of yogurt. They're talking about a parameter, mu, the population mean: the claim is that the mean calories per cup is no more than 90. We're going to test that claim using the same kind of sample evidence we saw in estimation. Remember, the sample evidence is n, X bar, and s. So we take a sample, say n = 100 cups, and get the sample mean. Now, if the sample mean is 90 calories or less, that's it. We can't accuse the company of lying; their claim was no more than 90 calories per cup, and the sample evidence supports that. You don't need statistics then; everything is done for you. What happens if it's more than 90? The question is, how much more? Is it slightly more, or a lot more? In statistics, we do a test so we can ascertain whether the company has basically lied about its yogurt. We'd all agree that if the difference is tiny, say 90.1 calories, they're probably not lying. That could be sampling error; they're telling the truth.
What happens if it's really more, a lot more? Suppose it's 500 calories a cup. Then even a non-statistician would say there's no way this yogurt has no more than 90 calories per cup. Our sample of n = 100 found an average of 500; it's clear the company has not been telling us the truth. So this is essentially the basis of hypothesis testing. There's going to be some kind of claim, and it's always about a parameter; we've been looking at claims about mu. We're going to take a sample, get the sample evidence, the mean and the standard deviation based on n, and see whether the claim is reasonable or not. Now let's learn some of the language of hypothesis testing. As we said, there's a claim about a parameter. The actual claim we call the null hypothesis, H sub zero, written H0. That's where we put the hypothesized parameter value, which will eventually be compared with the sample value. So we have H0, in this case about mu. Then we have H1, the alternate hypothesis. That's accepted only if we reject H0, and that decision is based on looking at the sample evidence. Either we reject H0, essentially telling the company they're lying, or we don't reject it, which we call "accepting" it in quotes, because we don't technically accept H0; we just don't have the evidence to reject it. So we have a claim about a parameter as H0, an alternate hypothesis in case H0 gets rejected, and the sample evidence used to make the decision. Let's take a look at what can happen when we test a hypothesis. We have something called the state of nature, the columns: either the null hypothesis really is true, or it really is false. We don't know which. Why don't we know?
Because we only collected a sample. All we know is the sample evidence; we don't know the population parameter. The rows, on the other hand, show the decision that we make. As a result of the hypothesis test, either we reject the null hypothesis or we don't. So there are two possible decisions, and two possible states of the null hypothesis. Let's take a look. If the null hypothesis is true and we don't reject it, that's good: we made the correct decision. On the other side, if the null hypothesis is false and we do reject it, that's also a correct decision. The other two cells represent the possible errors, which we call alpha and beta, or type one error and type two error. If the null hypothesis is false but we end up not rejecting it because the evidence doesn't let us, we've made a beta error, an error of type two. If the null hypothesis is true and we reject it anyway, we've made an alpha error, an error of type one. If you were wondering whether this alpha is the same alpha we saw in confidence interval estimation, where the level of confidence was called one minus alpha, it is indeed exactly the same quantity. An alpha error of 0.05, the probability of rejecting the null hypothesis even though it's true, is equivalent to using a confidence level of 95%, one minus alpha, in a confidence interval. There is a trade-off between these two types of errors, the alpha error and the beta error. We would love to say: error is a bad thing, I want all my errors to be very, very small, can I get both alpha and beta down to zero? Of course, as much as we would love to, we can't.
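The alpha error has a concrete frequency interpretation, and we can check it with a quick simulation. Below is a sketch in Python (the population numbers are made up for illustration, not from the lecture): if H0 really is true and we reject whenever Z falls outside plus or minus 1.96, we should commit a type one error close to 5% of the time.

```python
import random
from statistics import NormalDist, mean, stdev

random.seed(1)
z_crit = NormalDist().inv_cdf(0.975)      # two-tail critical value for alpha = 0.05

mu0, sigma, n = 90.0, 10.0, 100           # hypothetical population where H0 is true
trials, rejections = 2000, 0
for _ in range(trials):
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    z = (mean(sample) - mu0) / (stdev(sample) / n ** 0.5)
    if abs(z) > z_crit:                   # rejecting a true H0: a type one error
        rejections += 1

print(rejections / trials)                # close to alpha = 0.05
```

Tightening `z_crit` (say, to the 0.01 critical value of 2.575) would drive this rejection rate down, but, as the lecture explains next, only at the cost of accepting false null hypotheses more often.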
This is a trade-off very much like the one we saw in the estimation lecture between the level of confidence, which was related to alpha, and the size of the interval. You can't keep increasing the confidence, which you'd like, while also narrowing the interval, which you'd also like. In the same way, we can't keep reducing the alpha error and the beta error at the same time. It doesn't work that way; they work counter to each other. Let's think about it. With the alpha error, we're rejecting the null hypothesis when we should not, when it's true. As we push the probability of that error down, lower and lower, we're reducing the chance of rejecting at all, which automatically raises the beta error, the error we make when we accept the null hypothesis even though it's false. Now you understand the trade-off between the alpha and the beta error. Remember, the alpha error is the error of rejection: you've rejected H0 when it's true. The beta error is the error of acceptance: you've accepted H0 when H0 is false. The legal system understands this trade-off very well. You can have a legal system where it's extremely difficult to convict criminals. We're so afraid of incarcerating innocent people that we make it very hard: you've got to be super certain. The judge says, I want absolute certainty before I put anyone in prison. Well, that system will be committing another kind of error. You've made it so hard that no one gets convicted; you don't convict anyone. On the other hand, you can try the other approach.
You make it very easy to convict. Well, guess what happens? A lot of innocent people end up behind bars. So what do you do? You have the error of acceptance and the error of rejection: putting innocent people in prison versus making it impossible to put anyone in prison. Our legal system compromises. We try to keep the alpha error low, but we also worry about the beta error. Our legal system doesn't require complete certainty; we don't say beyond the shadow of a doubt. But we do want beyond a reasonable doubt. Again, we're trading off the alpha error against the beta error. The key point is that you can't make both zero; it's impossible, because they're two different kinds of errors pulling in opposite directions. We can see this trade-off between the alpha and beta errors in quality control. Suppose a company buys computer chips for its smartphones, 50,000 at a time. What a smart company does, no pun intended, is take a sample of, say, 100 chips and decide on the basis of that sample whether to reject the entire shipment. The huge shipment comes in, they take a random sample of 100 chips, and they make a decision. Now, if they reject on the basis of even one defective chip in the sample of 100, they could end up rejecting a lot of good shipments. On the other hand, if the firm is too liberal, saying a couple of defective chips here and there, what do we care, it's probably sampling error, and makes it very easy to accept the shipment, then they're going to make the error of acceptance. This is why government and industry generally work with an alpha of 0.05.
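Here's a quick sketch of why the strict one-defective rule backfires. The 1% defect rate below is my own illustrative number, not from the lecture; the point is just that a zero-tolerance sample rejects most perfectly acceptable shipments.

```python
# Suppose a shipment with 1% defective chips is acceptable (assumed rate),
# and we reject whenever the sample of 100 contains even one defective chip.
p_defect = 0.01                       # defect rate of an acceptable shipment
n = 100                               # sample size
p_reject = 1 - (1 - p_defect) ** n    # P(at least one defective in the sample)
print(round(p_reject, 3))             # about 0.634: ~63% of good shipments rejected
```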
They accept that they'll make errors occasionally, but they don't want to end up rejecting shipment after shipment. Because again, when you're talking about quality control, there are always going to be a few defective chips; they have a standard because they know they can't have zero. So notice the tension between the acceptance error and the rejection error: there's a trade-off. We try to keep the alpha error low, but we don't make it zero, because we don't want the beta error to shoot up. If the alpha error goes down too much, the beta error goes up a lot. So we typically keep the alpha error at 0.05. Let me give you the steps in hypothesis testing. Most people don't need the steps written out, but if you do, here they are. Step one, formulate H0 and H1. H0, again, is the null hypothesis; it's about a parameter. H1 is the alternate hypothesis. For example, H0 might be that mu is 12.7 years, and H1 that mu is not. Step two, specify the level of significance, the alpha, to be used. Generally the government likes 0.05; once in a while you'll see 0.01, and very rarely 0.10. You need a level of significance to decide when you're going to reject H0. Step three, select the test statistic. We're going to be learning about Z, but later on we'll learn about a t test; you're not limited to Z. You might use a t distribution, and you may learn about the F distribution. Rule of thumb: with large samples, when you're testing the mean, you use Z; with very small samples, you might have to use t. Step four, establish the critical value or values of the test statistic needed to reject H0, and draw the picture. You must draw that picture. Step five, take the sample evidence and turn it into a test statistic: the computed value of the test statistic.
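The computational part of those steps can be sketched as one small Python function for a large-sample Z test of a mean. This is a sketch only; the function name, argument names, and defaults are my own choices, not part of the lecture.

```python
from statistics import NormalDist

def z_test_mean(x_bar, s, n, mu0, alpha=0.05, tail="two"):
    """Steps 3-5 of the recipe: compute the test statistic,
    find the critical value, and decide whether to reject H0."""
    z = (x_bar - mu0) / (s / n ** 0.5)          # computed value of the test statistic
    if tail == "two":                            # H1: mu != mu0
        z_crit = NormalDist().inv_cdf(1 - alpha / 2)
        reject = abs(z) > z_crit
    elif tail == "left":                         # H1: mu < mu0
        z_crit = -NormalDist().inv_cdf(1 - alpha)
        reject = z < z_crit
    else:                                        # "right": H1: mu > mu0
        z_crit = NormalDist().inv_cdf(1 - alpha)
        reject = z > z_crit
    return z, z_crit, reject

# The yogurt setup from earlier: claim mu <= 90, sample of n = 100.
# The x_bar and s values here are made up just to exercise the function.
print(z_test_mean(92.0, 8.0, 100, 90.0, tail="right"))  # z = 2.5 > 1.645, reject
```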
So when you're all finished, you'll write Z equals some number, or t if you're working with small samples. That's the computed, or actual, value of the test statistic. Based on this, you're going to make a decision: reject H0 or don't reject H0. And again, this is all done by drawing the diagram. You have what we call critical values, and you see where the sample evidence falls. Does it fall into what we call the acceptance region, or into the rejection region? Let's see how we set up a hypothesis test. A company claims that its soda vending machines deliver, on average, exactly eight ounces of soda. That's a claim about mu, the population parameter of the production process of the vending machines. Why does it have to be exact, within certain limits? Well, if the machines really deliver way too much soda on average, for one thing it could overflow the cup, and for another we're wasting product. What if the machines deliver too little, much less than the eight-ounce average they're supposed to deliver to the customers? We're shortchanging our customers, and we're going to have issues with public relations and government watchdog agencies. So we don't want it off in either direction. We want the true population parameter to be what it's supposed to be. We have our claim, eight ounces, and that's what we use for the null hypothesis: mu, the population average amount of soda delivered, is eight ounces. What's the alternate hypothesis? That it's not eight ounces. How could it be not eight ounces? Well, look at the picture. Either it can be very much larger than the hypothesized mean, and you see the red region of rejection on the right side, or it could be much smaller than the hypothesized mean.
And then it would fall into the region of rejection on the left side. So you can see why you have to draw a picture. We're going to see more problems like this, don't worry, but in this case all we're doing is formulating it to show you what a two-tail test looks like. In this problem, we reject either in the right tail or the left tail: if you're too far from the hypothesized mean on the high side, or too far on the low side. Both of those tails contain the region of rejection, and it's called a two-tail test. Suppose we're testing at an alpha level of 0.01. That means we're willing to reject 1% of the time if the null hypothesis is true, and what you see in front of you is a picture of the null hypothesis being true. We take that alpha, split it in half, put half a percent in one tail and half a percent in the other, and look in the Z table to see what Z values split the distribution that way: beyond what Z value on the right does the region of rejection start, and beyond what Z value on the left. With 0.005 in each tail, and we have seen this before in the estimation lecture and the normal distribution lecture, we end up with Z values of plus 2.575 and minus 2.575. We'll do the rest of the problem eventually, but for now we just want to show you how to set these problems up. So we just saw an example of setting up a hypothesis testing problem with a two-tail test. You can see that one of the first things we have to do is decide whether we have a one-tail test or a two-tail test, and that's going to be reflected in H0 and H1, the null and alternate hypotheses.
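You can confirm that plus and minus 2.575 figure without a printed table. Here's a sketch using Python's standard library (the table value 2.575 is a common rounding; the exact value is closer to 2.576):

```python
from statistics import NormalDist

alpha = 0.01
# Split alpha in half: 0.005 in each tail, so 0.995 of the area lies
# below the upper critical value.
upper = NormalDist().inv_cdf(1 - alpha / 2)
lower = -upper
print(round(lower, 3), round(upper, 3))  # -2.576 2.576 (tables often show 2.575)
```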
So it's one of the very first things you have to do, because formulating H0 and H1, if you remember, was step one in hypothesis testing. With a two-tail test, as we saw, the null hypothesis is that the parameter equals a certain value, and the alternate hypothesis is simply that it's not equal. We don't have to say what different value we think it is; all we say, if we reject the null hypothesis, is that it's not that value. That's why with a two-tail test you split the alpha: I can reject because it's larger than it should be, or I can reject because it's smaller than it should be. With a one-tail test, as you'll see, we reject on only one side of the distribution, either the high side or the low side, but not both, and in that case our alpha region is all in one tail. Let's look at the difference between one-tail and two-tail tests of hypothesis. We already saw that with a two-tail test, one of the key words in the word problem is probably going to be "exactly." Not necessarily, sometimes you have to figure it out, but certainly if the word exactly is there, like a company claiming that its pharmaceutical product has exactly one milligram of aspirin. You really don't want too little, because it won't be effective; you don't want too much, because it could be dangerous. Obviously, we're not going to get an average of exactly 1.00 milligrams of aspirin, there's going to be some wiggle room, but the claim is that the mean mu of the distribution is exactly one milligram. How far away would we have to be, on the high side or the low side, to reject this hypothesis, to reject this claim? With a one-tail test, here's an example: a company claims that its raisin bran has at least 100 raisins in each box. And of course, again, we're talking about an average. Another thing to notice is that we're assuming more raisins is a good thing.
We're assuming that if somebody runs the experiment, collects the data, and finds that on average there are more than 100 raisins per box, that's okay; no one's going to have a problem with that, and they're not going to reject. Imagine you're doing quality control to decide whether to accept a shipment: you'd accept it. The problem only arises if you find that the sample mean is less than 100. We know that since we're working with the normal distribution, there's always a positive probability that the sample we took really did come from the distribution with the mean stated in the null hypothesis. But we're asking: is that reasonable? Could this distribution be true, with the different value we observed due just to sampling error? Or did something happen to the machinery, so the production process no longer turns out boxes with at least 100 raisins? That's really the key question: is it sampling error we're observing, or is there a significant difference from what H0 says? All right, here are some more examples. We're going to look at both two-tail and one-tail tests, different problems, to see how the null hypothesis gets set up. This is purely about the setup. In the first problem, a company claims that its bolts have a circumference of exactly 12.5 inches. If these bolts are there to, say, connect the wing to an airplane, that's kind of important. You don't want it too large, and you don't want it too small, because the wing will fall off. That's pretty bad. So in this case the null hypothesis is that mu is 12.5 inches, exactly what it's supposed to be. The alternate hypothesis, which we accept if we reject the null hypothesis, is that mu is not 12.5 inches.
The second problem says a company claims that its slice of bread has exactly two grams of fiber. Again, from the problem you get the feeling that you'll reject if the sample average is so much higher than 2 that it can't be sampling error, and you'll reject if it's much lower than 2. The null hypothesis is that mu is exactly two grams; the alternate hypothesis is that it's not. And remember, the alternate hypothesis is what we accept if we reject H0. Now let's see an example of a one-tail test. A company claims that its batteries have an average life of at least 500 hours. When you see words like "at least" or "at most," you know you're dealing with a one-tail test. They said at least 500 hours, which means anything above 500, whether 600, 700, or 800, is all good. The only problem is if it's below 500; then you want to know whether it's sampling error or not. So the rejection region goes to the left, into the left tail. Here's how we set it up: H0 is mu greater than or equal to 500 hours; H1 is mu less than 500. Notice that H1 always points to where the rejection region goes. So if you're testing at an alpha of 0.05, the entire 0.05 now has to be in the left tail. You don't split it up, because it's a one-tail test. The only problem is on the left, when you're below 500; if you're too far below for it to be sampling error, we reject H0. So again, notice that hint: H1 always points to where the rejection region should be. And by the way, for the critical value, check your 0-to-Z table. If you want 5% in the tail, then between 0 and Z you have 45%, or 0.4500. Look up 0.4500, 45% of the area, in your 0-to-Z table, and you find the value is, in this case, minus 1.645. Here's another one-tail test. A company claims that its overpriced bottled water has no more than 1 microgram of benzene, which is poisonous. How do you formulate that?
Remember, the company is claiming no more than 1 microgram. So we write H0: mu is less than or equal to 1 microgram. H1, which we're left with if we reject H0, is that mu is greater than 1 microgram, which is the problematic case, more than 1 microgram of benzene. We're going to test at the 0.05 level. It's a one-tail test, so the entire 0.05 goes in one tail, in this case the right tail; remember, H1 points to it. It's only a problem when you have too much benzene, not too little. If you have, say, a trillionth of a microgram, or zero benzene, the government is even happier. Anything on the left is good; it's on the right side where you have problems. So since we're testing at 0.05, the entire 0.05 is in the right tail, and the critical value is plus 1.645. Here's an example of a two-tail test, and this time we'll do the whole problem. A pharmaceutical company claims that each of its pills contains exactly 20 milligrams of Coumadin, a blood thinner. You can see why it's a two-tail test: too much, you'll kill the person; too little, it won't work. So you want exactly 20 milligrams of Coumadin. You take a sample of 64 pills, and here's your sample evidence: based on your sample of 64, X bar is 20.50 milligrams and the standard deviation is 0.80 milligrams. The question is, should the company's claim be rejected? Test at alpha equals 0.05. Notice how the steps are set up. First, formulate the hypotheses. You realize it's a two-tail test; it's not "at least" or "at most," it's "exactly." So H0 is that mu equals exactly 20 milligrams; H1 is mu not equal to 20 milligrams. Now we choose the test statistic and find the critical values. We're testing at alpha 0.05, but this is a two-tail test, so we cut the alpha in two. Half of 0.05 is 0.025: put 0.025 in the right tail and 0.025 in the left tail.
And again, we've seen this a number of times from the 0-to-Z table: with 0.025 in each tail, you have 0.4750 in the middle part between 0 and Z, and the critical value for Z is 1.96. Those are called the critical values. Anything between minus 1.96 and plus 1.96, we accept; that white region is the acceptance region. If you fall in the right tail, you reject, and if you fall in the left tail, you reject. Remember, you're rejecting whether there's too much Coumadin or too little, so you need two rejection regions; that's why we cut the alpha in half. If you fall into that shaded red area, that's the rejection region. So a plus 2 or a plus 3 or a plus 4 rejects on the right side, too much; a minus 2 or a minus 3 or a minus 4 rejects on the left side, too little. Now we take the sample evidence, turn it into a Z value, and see where that Z value lands: in the acceptance region, meaning it could very well be sampling error, or in the rejection region. Here's where we convert the sample evidence into a Z score: 20.50 minus 20, over 0.80, your standard deviation, divided by the square root of n. What we end up with is 0.50 over 0.10, which equals 5. So now we have a Z score of 5. What does that tell us? We're way into the rejection region, so we reject H0. Notice, we would have rejected at 1.97 or 1.98, let alone at 2, 3, or 4, and we're at 5. That's basically telling us that the probability of getting this kind of sample evidence, if the claim is true, is very low. It's very unlikely. This sample evidence is not what you expect if mu is 20 milligrams. We've just shown you how to do a hypothesis test, but let's look at this data another way. Remember, we're always working with the sample evidence.
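Here is the whole Coumadin calculation as a short Python sketch, mirroring the numbers in the example:

```python
from statistics import NormalDist

# Sample evidence from the example: n = 64 pills, x_bar = 20.50 mg, s = 0.80 mg
n, x_bar, s = 64, 20.50, 0.80
mu0, alpha = 20.0, 0.05                       # H0: mu = 20 mg (two-tail test)

se = s / n ** 0.5                             # standard error: 0.80 / 8 = 0.10
z = (x_bar - mu0) / se                        # 0.50 / 0.10 = 5.0
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96

print(round(z, 2), round(z_crit, 2), abs(z) > z_crit)  # 5.0 1.96 True -> reject H0
```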
With a hypothesis test, we test a claim using the sample evidence. But suppose no claim is made, and all you want to do is estimation, constructing a confidence interval. Watch how we do almost the same thing from a different perspective. We take the sample evidence and construct a 95% confidence interval: X bar, 20.50 milligrams, plus or minus 1.96, because that's the two-sided value for 95% confidence, times the standard error of the mean, which we computed before as s over the square root of n, which was 0.10. Now we've constructed a 95% confidence interval, and based on that we would say: mu is a fixed value, but with 95% confidence it's somewhere between 20.304 milligrams and 20.696 milligrams. And notice, just on the basis of that, we would have known that 20 is not in the interval. So you can see that hypothesis testing and confidence intervals are basically two sides of the same coin, because both rely on the sample evidence. If a claim is made about a parameter, you do a hypothesis test, because you're going to test that claim; and if you don't test it, in some cases the government will. If no claim is made, and all you want is to use sample evidence to estimate a parameter, maybe even to determine what claims could be made in the future, then you construct a confidence interval. In both cases you're relying on the sample evidence. So these are really two ways of looking at things. As far as this course is concerned: if no claim was made, you're just constructing a confidence interval that you're 90, 95, 99, or whatever percent sure contains the parameter.
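The same interval in Python, as a sketch (using inv_cdf to get the 1.96 rather than a printed table):

```python
from statistics import NormalDist

# Same sample evidence as the Coumadin test: n = 64, x_bar = 20.50 mg, s = 0.80 mg
n, x_bar, s = 64, 20.50, 0.80
se = s / n ** 0.5                    # standard error of the mean: 0.10 mg
z = NormalDist().inv_cdf(0.975)      # about 1.96 for 95% confidence

lo, hi = x_bar - z * se, x_bar + z * se
print(round(lo, 3), round(hi, 3))    # 20.304 20.696 -> 20 lies outside the interval
```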
But once you hear somebody making a claim about a parameter, you take the sample evidence to test the claim, to see whether the claim makes any sense and is reasonable. Here's an example of a one-tail test. The company claims, and notice that word "claim": the minute you hear it, you know it's going to be a hypothesis test. They're claiming that their LED bulbs will last at least 8,000 hours. So you, or the government, or somebody is going to test the claim with sample evidence. You sample 100 bulbs and find that X bar, the sample mean, is 7,800 hours, and the sample standard deviation is 800 hours. Should we reject the company's claim? We'll test at an alpha of 0.05. First we write down H0 and H1. H0, the claim, is mu greater than or equal to 8,000 hours. Again, notice, anything more than that is fine; no one's going to get upset if they claim 8,000 hours and the bulbs last 14,000 hours. Good news. It's on the left side where there's a problem, and that's why H1 is mu less than 8,000 hours, and why we need a statistical test that's all on the left. It's a one-tail test. So the entire alpha of 5% goes to the left, which means between 0 and Z you have 45%, 0.4500, and your critical value, since you're on the left, is minus 1.645. That's called the critical value. We reject if we fall into the rejection region shaded in red: anything less than minus 1.645, like minus 2, minus 3, minus 4, is a rejection. Anything well to the right of that, say plus 4, and you don't even have to do a test; you're definitely okay. But between 0 and minus 1.645, you don't reject; that could be sampling error. Now we take the sample evidence and turn it into a Z score: 7,800 minus 8,000, which is minus 200, over the standard error of the mean, 800 over the square root of 100, which is 80.
Minus 200 over 80 is minus 2.50. That computed, or calculated, Z value is basically your sample evidence converted into a Z score. That Z score, minus 2.50, is in the rejection region, so we reject H0. The probability of getting this kind of sample evidence, a mean of 7,800 based on a sample of 100 with a standard deviation of 800, if the claim is true and mu really is at least 8,000 hours, is less than 5%. You can actually figure it out; certainly if you do it by computer, the computer will tell you the probability, but it's going to be less than 5%. There are a lot more problems on our website. Do as many problems as you can find. As always: practice, practice, practice. The more practice you get doing the problems, the better off you'll be on exams.
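As a final sketch, here is the LED-bulb test in Python, including the exact probability a computer would report (the lecture only bounds it as "less than 5%"):

```python
from statistics import NormalDist

# Sample evidence from the example: n = 100 bulbs, x_bar = 7800 h, s = 800 h
n, x_bar, s = 100, 7800.0, 800.0
mu0, alpha = 8000.0, 0.05                   # H0: mu >= 8000; H1: mu < 8000 (left tail)

se = s / n ** 0.5                           # standard error: 800 / 10 = 80
z = (x_bar - mu0) / se                      # -200 / 80 = -2.50
z_crit = -NormalDist().inv_cdf(1 - alpha)   # about -1.645
p_value = NormalDist().cdf(z)               # P(Z <= -2.50)

print(z, round(z_crit, 3), round(p_value, 4))  # -2.5 -1.645 0.0062 -> reject H0
```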