Welcome to this review session for statistical inference. We're going to be learning about inference. Inference covers estimation, which is what we're talking about now, confidence interval estimation, and hypothesis testing, which we'll switch over to later on. Both work with the sample evidence. Now, the sample evidence will consist of a sample size N, a sample mean X bar, and a sample standard deviation S. Using that sample evidence, we can construct the confidence interval, and that's called estimation. We're going to call it confidence interval estimation. Here's our first problem. A major tire manufacturer is interested in estimating the average life of its tires. So you want to estimate an average; you want to estimate mu. What do we do? We take a random sample of 144 tires. That's N, the sample size, 144. The sample of 144 tires was analyzed with the following results. X bar, the sample mean, was 40,000 miles. So in the sample of 144 tires, there was an average life of 40,000 miles. S, the sample standard deviation, was 12,000 miles. And here's what the problem is asking you to do. Construct a 95% two-sided confidence interval estimator for the true average tire life, in other words, for mu. And the question that the problem is asking you to answer is, what is the upper limit of this confidence interval? Here's the answer. First, what you don't see here is that you look at your formula sheet and get the formula for a confidence interval estimator of mu when you have a large sample size. 144 is a large sample size. And so we're using the Z distribution; we're getting our values for the 95% confidence from the Z table. In fact, you see the picture there of the Z distribution. Two and a half percent in each tail, which is the same as saying 0.025.
The 95% is in the center of the distribution, and we want a confidence interval estimator that covers the central 95% of the distribution. When you look something up in the Z table, you know from the picture at the top of the Z table, because we usually use the 0-to-Z table, that the shaded area is the area next to the mean. That's why it's called the 0-to-Z table. Well, if you're covering the central 95% of the distribution, the area under the curve between the mean and the Z value is 0.475 on one side and 0.475 on the other side. And when you go into the Z table and look around in the middle, trying to find the area under the curve, the probability, that's closest to 0.4750, you actually do find it. And it's associated with a Z value of 1.96. So that's plus 1.96 on the right side, minus 1.96 on the left side. And you can see that the tail probabilities are 2.5% on each side, which is the same as saying 0.025. Now we plug our numbers into the formula. X bar is 40,000, plus and minus. There's the 1.96 from the Z table. And then you have a measure of variation: S, 12,000, divided by the square root of N, the square root of 144. What do you get? 40,000 in the middle. That's X bar. And like every confidence interval, we have our statistic in the middle, with a little something on one side and a little something on the other side in order to get our interval. This little something is called the margin of error. You take the table value times the measure of variation, and that's the margin of error. Sometimes it's also called the half-width of the confidence interval, because that's exactly what it is: half the size of the interval. So what's the margin of error if anyone asks you? 1,960. The interval we come up with then goes from 38,040 miles all the way up to 41,960 miles.
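If you like to check this kind of arithmetic with a few lines of code, here is a minimal sketch of the tire-life interval. The variable names are mine; the numbers are the ones from the problem.

```python
# Large-sample (z-based) confidence interval for the mean tire life.
import math

n = 144            # sample size
x_bar = 40_000.0   # sample mean, miles
s = 12_000.0       # sample standard deviation, miles
z = 1.96           # z value covering the central 95% of the distribution

# Margin of error, the "half-width" of the interval
margin_of_error = z * s / math.sqrt(n)
lower = x_bar - margin_of_error
upper = x_bar + margin_of_error

print(round(margin_of_error, 2))        # 1960.0
print(round(lower, 2), round(upper, 2)) # 38040.0 41960.0
```

The same three lines of arithmetic work for any z-based interval for a mean; only the four inputs change.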
And what we say about this is that the probability that this interval estimator, the one that goes from 38,040 to 41,960, really does contain the true mean of the population is 0.95. And now we have to go answer the question. There was a question on the previous slide: what's the upper limit of the confidence interval estimator? And there you go. It's right there on the slide, on the right-hand side of the interval, 41,960 miles. That's the answer to the question. Look at this problem now. A major computer chip manufacturer asked you to estimate the average life of its chips. They took a sample of 25 computer chips. Now, right away, you realize 25 is a small sample. Okay, so they took this 25-chip sample. X bar, the sample mean, is 8.50 years. S, the sample standard deviation, is one and a half years. And you're asked, first, to construct a 95% two-sided confidence interval for the average life of the chips. And the question is very specific: what is the lower limit of the confidence interval in this problem? Notice we're constructing a 95% confidence interval, and this is a small sample. We lose one degree of freedom; it's a one-sample case. So we're working with T24. And if you want 0.025 on the right and 0.025 on the left, that's 2.5% by the way, you look in the T24 row for the critical value. It's 2.0639 on the right and minus 2.0639 on the left. So that's the value we're going to use. That's T24. So we do 8.5 plus and minus 2.0639 times 1.50, that's the standard deviation, divided by the square root of n, the square root of 25. Notice that the margin of error is 0.62. Okay? Well, you also notice that 2.0639 is not so far from 1.96, the Z value. But we can't use Z; it's a small sample. You only have 24 degrees of freedom. T infinity, that's Z: Z is T with infinite degrees of freedom.
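The same check works for the chip interval, swapping the z value for the t value. A minimal sketch, assuming the table value 2.0639 for T24 (variable names are mine):

```python
# Small-sample (t-based) confidence interval for the mean chip life.
import math

n = 25         # sample size (small, so we use t, not z)
x_bar = 8.50   # sample mean, years
s = 1.50       # sample standard deviation, years
t_24 = 2.0639  # t table value, 24 degrees of freedom, 0.025 in each tail
               # note it is close to z's 1.96; t approaches z as df grows

margin_of_error = t_24 * s / math.sqrt(n)
lower = x_bar - margin_of_error
upper = x_bar + margin_of_error

print(round(margin_of_error, 2))        # 0.62
print(round(lower, 2), round(upper, 2)) # 7.88 9.12
```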
We'll use it once we get to 50 or 60 degrees of freedom. Anyway, getting back to the margin of error, 0.62. 8.5 plus 0.62 is 9.12 years. 8.5 minus 0.62 is 7.88 years. To answer the question, the lower limit of the confidence interval is 7.88 years. All right. You've been hired by a firm to construct a 90%, 90% this time, two-sided confidence interval estimator for the following production data. N, the sample size, is 400 parts. Out of those 400 parts, the number of defectives was 20, 20 parts. And the problem asks you to construct a 90% two-sided confidence interval estimator of the true proportion of defectives. We know what the sample evidence is. We want an estimate of the population proportion. And in addition, we're being asked to answer the question, what is the upper limit of this confidence interval expressed as a percentage? Notice, I want to point out to you, if you're looking for a mean and a standard deviation, you're out of luck, because this is a different type of problem, and it's exactly that clue that, if nothing else, will tell you you're making inferences about the population proportion using the sample proportion. So this is not about means; it's about proportions. With proportions, we don't have any choice of distribution. We can only use the Z distribution. And so you see the Z distribution laid out there for a 90% confidence interval estimator. We want the central 90% of the distribution, 45% on one side of the mean, 45% on the other side of the mean. Each of the tails has 0.05, 5%. When you look that up in the Z table, you look for 0.4500 in the middle of the table as a probability. And the closest you can come is exactly halfway between a Z value of 1.64 and 1.65. And so here we split the difference and call the Z value plus and minus 1.645. The sample proportion, 20 out of 400, is 0.05.
The formula for a confidence interval estimator of a proportion you have in front of you; you'd find it on your formula sheet. The sample statistic is 0.05; that's in the middle of the interval. The margin of error is, again, the value from the Z table that gives us 90% confidence times the measure of variation, and it ends up being 0.018. That's the margin of error. That margin of error goes on one side of the sample proportion, and the same half-width goes on the other side of the sample proportion. 0.05 is in the middle. You add 0.018 to it, you get an upper limit of 0.068. You subtract 0.018 from it, and you get a lower limit of 0.032. The probability that this interval between 0.032 and 0.068 really does contain the true population proportion is 90%. To answer the question, what's the upper limit of this confidence interval? The upper limit is 0.068, but if we express it as a percentage, it's 6.8%. Now, how else are we going to use the sample evidence? Back to the sample evidence; we'll do something else with the sample evidence. We're going to test a claim. It's called hypothesis testing. Somebody's going to make a claim about a parameter. For example, a company might say that the average life of its refrigerators is at least 10 years. That's a claim not about one particular refrigerator; it's a claim about a parameter, and we can test that claim with the sample evidence and see if the claim is reasonable. This is what hypothesis testing is all about: testing a claim. Let's take a look at some of the things that can happen when we test a hypothesis. You'll see that we have something we call the state of nature, the columns. Either the null hypothesis really is true or it really is false. We don't know. Why don't we know? Because we only collected a sample. All we know is the sample evidence. We don't know the population parameter. On the other hand, the rows: we can look at the decision that we make.
As a result of the hypothesis test, either we will reject the null hypothesis or we won't reject the null hypothesis. So there are two possible outcomes of our decision, and there are also two possible things that could be true about the null hypothesis. Let's take a look. If the null hypothesis is true and we don't reject it, that's good. We made the correct decision. On the other side, if the null hypothesis is false and we do reject it, that's also good. We made another correct decision. The other two cells represent the possible errors that can occur. We call those errors alpha and beta, or sometimes type 1 error and type 2 error. If the null hypothesis is false but we end up not rejecting it, we say the evidence doesn't let us reject this null hypothesis, then we made a boo-boo, and it's called a beta error, or an error of type 2. If the null hypothesis is true and we reject it anyway, then we've made an alpha error; that's an error of type 1. If you were wondering whether this alpha is the same alpha that we looked at when we talked about confidence interval estimation, where the level of confidence was called 1 minus alpha, it is indeed exactly the same quantity. It's exactly the same alpha. If we have an alpha error of 0.05, meaning the probability of the error we make when we reject the null hypothesis even though it's true, that's equivalent, if we were doing a confidence interval, to using a confidence level of 95%, 1 minus alpha. Suppose a company called Kedem Technology claims that the Kedem laptop will last at least 10 years, and you want to test the company's claim, and you take a random sample of 100 laptops. So you set up H0: mu is greater than or equal to 10 years, and H1: mu is less than 10 years. Now, we're not going to go through the test, because for that we'd need the sample mean and sample standard deviation; we know n, but we don't have X bar or S. But suppose you've concluded: do not reject H0.
Now suppose we know the true life, and the true life of a Kedem laptop is 9.5 years. What error, if any, have you committed? Think about it. The true life, mu, is 9.5, and you did not reject H0. Remember, not rejecting H0 means you, in quotes, accepted H0. So what kind of error did you make? Well, the answer is you made a type 2 error, because you, again in quotes, accepted. You accepted H0, right? Not rejecting means you accepted it, in quotes. So you accepted that the life is at least 10 years, but the reality is that it's 9.5. You accepted H0 when it's false, and that's called a type 2, or beta, error. What happens, suppose, if you reject H0 based on your sample evidence? You reject H0, and the true life, let's say we know the true life of a Kedem laptop, is 10.8 years. What error, if any, have you committed? Well, the answer is you made a type 1, alpha, error. Why? You rejected H0; that was your decision based on the sample evidence. You rejected H0, which means you said that, no, these laptops don't last at least 10 years, but they do. They actually last 10.8 years. That's called a type 1 error: rejecting H0 when it is true. Here's our first hypothesis testing problem. A company claims that a slice of its Keto cheesecake has no more than 30 calories on the average. Test this claim at the 0.05 level of significance. Alpha is 0.05. The evidence from the data is in the table in front of you. N, the sample size, is 81.
X bar, the sample mean, is 32 calories. S, the sample standard deviation, is 3 calories. You do want to test the claim, and that's not just implied; the problem is specifically asking you to test the claim. But there's a question that's being asked, too. The question is this: the calculated value of the test statistic is... I just want to remind you that when you do a hypothesis test, there are four things that you need to do. You've seen this in the lectures. You've seen this in the problems, in the homework, in the do-it-nows. But I'm going to remind you here. For a hypothesis test, you need the hypotheses, H0 and H1. You need the decision rule, which means you want to know the test statistic, you want to know your critical values for that test statistic, and you want a rule that says: if the calculated value is greater than, or less than, the critical value from the table, reject. Then you need the calculated value of the test statistic, which you get from the evidence, from the data. And then you look at everything you have done, and the final piece in this four-part problem is your conclusion: do you reject the null hypothesis, or do you not reject the null hypothesis? Let's see how to do this problem. We've learned several different hypothesis tests. Every topic, every chapter, seems to be a different type of hypothesis test. But something that we've been pointing out to you all along, and I hope by now you've noticed on your own, is that they're all really pretty much the same. One little thing changes in moving from one type of hypothesis test to the next. So the question is: I have a problem. I know it's a hypothesis test because it says, test this claim, test this hypothesis. But I need to know what type of hypothesis test it is, so I know how to go about solving the problem. Well, here's a list for you of what you're looking for. What kinds of hypothesis tests have we learned?
Okay, well, let's see. One question you have to ask is, what's the parameter? We've learned to test hypotheses about mu, the population mean, and about p, the population proportion. So those are your only choices. How do I know this? Because that's all we've learned. Second, what is the appropriate test statistic to use? What distributions are we using in order to come up with our decision rule? What have we learned? Only the z-test and the t-test. So you know it's going to be one of those two. Third, are we doing a one-tail test or a two-tail test? With a two-tail test, we'll reject if the data is too far away on either side, the high side or the low side. With a one-tail test, we're only rejecting on one side of the distribution. We'll see examples of both. And then finally, maybe we should even do this first, how many groups are there? How many samples? How many populations? If you have a test with one sample taken from one population, that's a one-sample test. If you have a hypothesis test with two samples, each one taken from a different population, that's a two-sample test. And you know what? Those are your choices, because that's all we learned: one-group tests and two-group tests. Alright, so here's the problem again, looking at it in the context of: what can I do? What have I learned that I can apply to this problem? What is the parameter being tested? Well, it must be mu, because I have an X bar there from my sample, and the company's claim that you're testing is that the cheesecake has no more than 30 calories on the average. So we're making inferences about the population average, about mu. What is the appropriate test statistic to use? Well, if we're making inferences about mu, we can either use Z or T. And it'll depend on, number one, whether the sample size is large enough, and number two, whether we know sigma, the population standard deviation.
Well, as you know by now, we almost never know sigma. And indeed, in this case, that's true too. But a sample size of 81 is fairly large. And for us, in this course, it's large enough that you can use the Z statistic. Is this a one-tail test or a two-tail test? That might be a little bit more tricky. Let's take a look at the description again. A slice of cheesecake has no more than 30 calories on average. So where are you going to reject? Are you going to reject this claim if you find out that the slice of cheesecake really has very few calories, like 10? Probably not, because you're looking to reject the claim, and the claim is that there's an upper cap of 30 calories. No more than, so less than or equal to. So you'll only reject if you find that your data, the sample evidence, is too high, too far above this claim. And then finally, how many samples? Just one. We're only looking at one sample, one type of cheesecake, a sample of size 81. All right, so here's the solution. Remember, your hypothesis test has four pieces. For the first one, you write down your null and alternate hypotheses. Your null hypothesis is that mu really is less than or equal to 30 calories. That's the claim: at most 30 calories. The alternate hypothesis, the one that you're going to accept if you reject the null hypothesis, is that mu is something greater than 30 calories. There's no reason in the world that you have to say exactly what it is. You're just stating that if you reject the null hypothesis, the alternate hypothesis is that mu is greater than 30 calories. When you look at the picture for setting up the decision rule, we're only rejecting on the right side. Why? Because we only reject if the data shows the evidence is that mu should be something larger than 30 calories. A nice little trick: if you look at H1, there's kind of a right arrow.
The greater-than symbol looks like an arrow pointing to the right. And H1 is covered by the region of rejection, because when we reject H0, we accept H1. So that's a clue that the shaded-in area, the region of rejection, should be on the right side of the distribution. It has to match: H1 and the region of rejection have to match. Since alpha is 0.05, that 0.05 is all on one side. If we're looking things up in the Z-table, as it looks like we are, to use the 0-to-Z table you take one half of the distribution, 0.5, subtract 0.05, and you get 0.45 for the area in the middle that's attached to the mean. You look that up in the middle of the 0-to-Z table, and you find, again, like we did before, that the Z-value is exactly halfway between 1.64 and 1.65. Wouldn't it be nice if someone had set up a Z-table in the same format as the T-table, where all we need to do is enter the table with a tail probability? In this case, we wouldn't even need degrees of freedom. And in fact, you can do that. A neat little trick is to look at the T-table. The last line on every T-table is always T with degrees of freedom infinity, and those values are exactly equal to the values from the Z-distribution. So it might be easier sometimes to use the T-table even when you know you're looking for a value from the Z-distribution. At any rate, we have the decision rule. If our calculated value of the Z-statistic is greater than 1.645, we'll reject the null hypothesis. If it's not greater than 1.645, we will not reject the null hypothesis. And we use the formula, again from your formula sheet, for Z: X bar minus mu, divided by S over the square root of N. And you end up with a calculated Z value of 6. Wow, 6. That's a whopping Z value. It's way over to the right, way into the region of rejection. And so we reject the null hypothesis; we reject H0.
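The whole cheesecake test, decision rule and all, fits in a few lines of code. A minimal sketch using the numbers above (the variable names are mine):

```python
# One-sample z test: H0: mu <= 30, H1: mu > 30, alpha = 0.05.
import math

n = 81            # sample size
x_bar = 32.0      # sample mean, calories
s = 3.0           # sample standard deviation, calories
mu_0 = 30.0       # the claimed ceiling under H0
z_critical = 1.645  # right-tail critical value for alpha = 0.05

# Calculated test statistic: (x_bar - mu_0) / (s / sqrt(n))
z_calc = (x_bar - mu_0) / (s / math.sqrt(n))

# Decision rule: reject H0 if the calculated z exceeds the critical value
reject_h0 = z_calc > z_critical

print(round(z_calc, 2))  # 6.0
print(reject_h0)         # True
```

Only the inputs and the direction of the comparison change from one z test to the next; that is the sense in which all these tests are "really pretty much the same."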
The average number of calories per slice is greater than the claim. It's not less than or equal to 30 calories. What's the answer to the question? The question was, what's the calculated value of the test statistic, and the answer is 6, 6.00. In this next problem, the company makes a claim, and claims are always about a parameter, a population parameter. The claim here is that their three-dimensional printers have a life of at least 10 years on average, at least 10 years. We're going to test the claim at the 0.01 level of significance. Alpha, the probability of the alpha error, is 0.01; we also call that the level of significance. And we're going to use the data in the table below. N is 28; right away you see a small sample. X bar, the sample mean, is 9.1 years. And S, the sample standard deviation, is 1.6 years. And you want to know the calculated value of the test statistic. What is it? You're going to decide whether to reject or not reject based on that. Here are the choices you're going to have to make, so think about it. Is it mu or p? A z-test or a t-test? One tail or two tails? Are we looking at one sample or two samples? The next slide will tell you what to do, but common sense will help you. Well, let's look at the data. N is 28; X bar, the sample mean, is 9.1; S is 1.6. What parameter are we testing? Mu. It's definitely about a mean, a population mean. What is the appropriate test statistic when N is 28 and you don't know sigma? The t-test. Since the claim was that the life is at least 10 years, and the minute you see the words at least or at most it's going to be a one-tailed test, this is a one-tailed test because of the words at least. How many samples are we looking at? There's one sample. We just look at one particular sample. Let's solve the problem. H0 is that mu is greater than or equal to 10.0 years. Anything more than 10 is good; that's a longer life. They said at least 10 years. H1 is what you've got when you reject H0: that mu is less than 10 years. That's bad news; you've rejected their claim.
Now we have to take the alpha of 0.01, and it's one-tailed. Now, which tail? It's the left. Remember, H1 points the way: the less-than sign points to the left. That's where you're going to put the rejection region. We need to put 0.01 in the tail, and this is a T27. In the T27 row, look at the 0.01 tail. Since it's on the left, it's got to be a negative number. If you go to the table, you'll see 2.4727, but make sure you put a minus; it's on the left. Anyway, that's called the critical value, minus 2.4727. Anything more negative than that, like minus 2.8, minus 3, minus 4, minus 5, all the way to minus infinity, you reject. Anything to the right of minus 2.4727, you will not reject; you, in quotes, accept. Okay, here's where we turn the sample evidence into, basically, a T-value: 9.1 minus 10, over 1.6 over the square root of 28. So you have minus 0.9 in the numerator; the denominator, with a little rounding, is 0.302. You end up with a T-value, and this is called the calculated test statistic. It's a T-value, T27: minus 2.98. It's in the rejection region. We reject the company's claim. The average life of their printers, the 3D printers, is not 10 or more years. It's less than 10. That's your conclusion: we reject H0, and the calculated value of the test statistic is minus 2.98. New problem. A company claims that at least 45% of people who take its CPA preparation course will pass the exam. We have a random sample of people who took the course, a sample of 200 people, with the following results: out of those, 68 passed the CPA exam. There are a few things you notice right away; we'll look at them soon. You notice the words at least. You notice 45%. You notice there's no mean or standard deviation, and we'll get to that. We want to test this claim at the 0.05 level of significance, so alpha is 0.05. And we want to answer the question: the calculated value of the test statistic is what? We'll see how to do that. Here we have this slide again.
It's actually, if you've noticed, the exact same slide repeated for every hypothesis test. So I'm not going to go through it very differently now than we did before. I just want to point out, if you were wondering, yes, it is exactly the same. It's just a reminder of what we need to know in order to solve these problems. Here's the problem again, in the context of looking at those questions and figuring out what it is. And clearly, this is a one-sample, one-tailed test about the proportion. We're making inferences here about a claim about the population proportion. What's the parameter being tested? P, the population proportion. What is the appropriate test statistic? Z, because when we're working with proportions, we can't use T. We have no choices, which makes your choice easy. Is this a one-tailed test or a two-tailed test? It's one-tailed, and it's very easy to figure that out, because the words at least are right there. You have a directional clue in the wording of the problem. How many samples? Only one. One sample, one population. And here's the solution. The null hypothesis is that the population proportion is greater than or equal to 0.45. The alternate hypothesis is that it's less than 0.45, which matches the picture with the region of rejection on the left: 0.05 in the tail. The value from the Z distribution, negative 1.645, cuts the distribution into the region of rejection, the part that's shaded red, and the rest of it, where you're not rejecting. From the data, the sample proportion, 68 over 200, is 0.34. The calculated value of the Z statistic is negative 3.14, which is more negative, further into the region of rejection, than the critical value of negative 1.645. So we have no choice: reject the null hypothesis. And to answer the question, the calculated value of the test statistic is negative 3.14. Now look at the data below. We're looking at two ice cream companies. Let me repeat that: two companies.
Right away, I think you should suspect these are two samples. We take a sample of Breyers ice cream, and the sample size is 250. Then we take a sample of 100 containers of ice cream made by the Bluebell Ice Cream Company. And notice we see a slight difference: 100 milligrams versus 96 milligrams. We're not sure if that difference is significant, and that's what we're going to test at the 0.05 level. Is there a difference in the calcium content? Okay. You've seen this a million times already. What parameter is being tested, mu or P? What's the test statistic, Z or T? One-tail or two-tail test? And how many samples? Anyway, let's look at the problem again. For Breyers ice cream, 100 milligrams is the mean calcium content. For Bluebell, the mean is 96. Okay, so we know what we're doing. We're testing two means to see if they're different. We're going to use a two-sample Z test, and it's going to be a two-tail test; with two samples, it's two-tail. And finally, there are two groups, two samples. Anyway, we're looking at H0: Mu1 equals Mu2. That's how it's going to be with two samples: that there's no difference. Mu1 equals Mu2 is the same as saying Mu1 minus Mu2 equals zero. No difference between the two ice creams. H1 is that there is a difference: Mu1 is not equal to Mu2. We use the two-sample Z test formula for large samples, right? So it's 100 minus 96, over the square root of 5 squared over 250 plus 3 squared over 100. It works out to 4 over the square root of 0.19, which is 4 over 0.436. We get an incredibly high Z value of 9.17. Now look at the rejection regions. You've seen these numbers before: 1.96, this is for a Z test. To reject on the right we need at least 1.96; that's 0.025 on the right. Or less than minus 1.96; that's 0.025 on the left. Any number higher than 1.96 or lower than minus 1.96, you're going to reject. Now you've got 9.17, and 9.17 is a lot more than 1.96. So you're going to reject, because that's a very large calculated value.
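This two-sample z computation can also be sketched in code; a minimal check, assuming the summary statistics above (variable names are mine):

```python
# Two-sample z test for a difference in mean calcium content.
import math

n1, x_bar1, s1 = 250, 100.0, 5.0  # Breyers: size, mean (mg), std dev
n2, x_bar2, s2 = 100, 96.0, 3.0   # Bluebell: size, mean (mg), std dev

# Standard error: sqrt(s1^2/n1 + s2^2/n2) = sqrt(0.19), about 0.436
se = math.sqrt(s1**2 / n1 + s2**2 / n2)

# Calculated test statistic (9.17 in the walkthrough, which rounded
# the denominator to 0.436; carrying full precision gives about 9.18)
z_calc = (x_bar1 - x_bar2) / se

# Two-tail decision rule at alpha = 0.05
reject_h0 = abs(z_calc) > 1.96

print(round(z_calc, 1))  # 9.2
print(reject_h0)         # True
```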
The test statistic is super high, showing you that this difference is almost impossible to be due to chance. It's a definite difference. So we reject H0 and conclude that these two ice creams are indeed different. New problem. Given the data that you see in front of you, test the hypothesis that the average job engagement scores at two companies are the same. Are the two companies the same, or are they different? We want to use an alpha of 0.05 and the information in the table there. You've got an average from the data of 7.9 for ABC company and 6.9 for XYZ company. You've got standard deviations of 1.2 and 1.1, and you've got the sample sizes, 14 and 16. We'll see what to do with that in a minute. Then the question is, what's the calculated value of the test statistic? Here we are again. The same questions you ask for every single hypothesis test, you ask again. This is how you approach a problem. So clearly what we have here is a two-sample t-test. The sample sizes are small, too small for anybody, even for me. And we don't have the population standard deviation, sigma. What's the parameter being tested over here? It's mu. What's the appropriate test statistic? As I just said, we looked at the sample sizes, and we see we have to use the t-distribution. Remember one more thing, by the way: when you use the t-distribution in place of the z, you're also assuming, if you don't already have this information, that the underlying distribution the sample data came from is a normal distribution. If it's not and you have a small sample size, you have to take an advanced course and apply a non-parametric statistical technique, and we haven't learned that and we're not learning that. Another assumption that we're going to be making in a test like this is that the two population variances are the same.
There are ways to test that assumption, but in this course we're not going to do that; you will see it on your Excel printout when you do the problems using Excel. This property is called homoscedasticity, and that means equal variance. That's the assumption that we're making: equal variance. Continuing with our questions about what kind of hypothesis test we're doing: is this a one-tailed or a two-tailed test? Well, nice for you: if you know you're doing a two-group test, which is the next question, then you already know the answer, because I'm never going to ask you a one-tailed test for a two-group hypothesis test. So that narrows down your problems considerably, I think. Here's the solution. Our null and alternate hypotheses: the null hypothesis is that the two population means are the same; the alternate hypothesis is that they're different. We're using a T28, 28 degrees of freedom. We lose one degree of freedom for every sample, so it's 30 minus 2, which is 28. For an alpha of 5%, it's a two-tailed test, 0.025 in each tail. From the T distribution we see that the critical values for this problem are plus and minus 2.0484. Now, the calculations are a little bit hairier for this problem, a two-sample T, than anything you've had till now. You first have to get the pooled variance, and then you insert the pooled variance into the calculated value of the T statistic. It's just following a formula; it's not terrible, but it's a hairier formula than the ones you've been using till now. The calculated value of the T statistic for this problem is 2.38. It is in the region of rejection; it's greater than the critical value of 2.0484. And so we reject the null hypothesis: the two means are not the same. And to answer the question, the calculated value of the test statistic is 2.38. This next problem involves HP, the well-known computer company, and Dell. We want to know if the defect rates are different or the same.
Anyway, for HP we see 14 defects out of 200 tablets; for Dell it was 15 defects out of 300 tablets. And you're asked for the calculated value of the test statistic and whether you're going to reject or not reject. But now you know what to do. You can decide: is it a proportion or is it a mean? Is it a t-test or a z-test? Is it a one-tailed test or a two-tailed test? Are we looking at one sample or two samples? Think about it for a moment. Well, by now you should figure it out: this is two proportions. We're looking at 14 over 200 versus 15 over 300. The parameter we're looking at is p, two proportions, so we're going to be using z. We have 200 and 300, nice-sized samples. It's going to be a two-tailed test; all the two-sample tests are going to be two-tailed. And it's two samples; basically we're comparing two proportions. So, we're comparing two proportions. HP: their proportion of defects is 14 out of 200, that's 0.07, 7% defects. Dell: it was 15 out of 300, that's 0.05. We look at a difference of 0.07 versus 0.05, and the total sample size is 500. As I mentioned, these are two-sample tests for proportions. Now notice, to use the formula you have to get that p-bar, kind of like a pooled p, because under H0 it's one group. So put it together, pretend it's one group. If it's one group, you have 29 defects out of 500. That's called p-bar, and that's 0.058, 5.8%. That goes in the formula. So again, H0: p1 equals p2. H1: p1 is not equal to p2, there's a difference. Since we're testing at the 0.05 level, we're splitting it up, 0.025 on the right and 0.025 on the left. It's z, and by now you know that plus 1.96 is the critical value on the right and minus 1.96 is the critical value on the left. And now we turn the sample evidence, the two proportions, into a z score. Basically, z equals 0.07 minus 0.05 over a square root. Inside the square root you have 0.058 times 0.942 (those two numbers, p-bar and 1 minus p-bar, add up to 1) times the sum of the reciprocals of the sample sizes, 1 over 200 plus 1 over 300. You end up with 0.02 over the square root of about 0.000455, which is 0.02 over roughly 0.021, and this all works out to about 0.95. 0.95 is not in the rejection region, so the calculated value of the test statistic is 0.95 and you're not rejecting H0. Again, you'd reject H0 if it were higher than plus 1.96, and 0.95 is not higher. So even though it looks like the rates are different, this can be attributed to sampling error, and we say the difference between the two proportions is not significant. It's a non-significant difference, and the calculated value of the test statistic, z in this case, is 0.95. Anyway, we hope you enjoyed this review for the exam. We only looked at inference here, but we tried to explain how to solve a problem in inference. As you know, if you want to get good at this, do lots of problems. Remember, always decide: is it one sample or two samples? Is it going to be z or t? Are we looking at means or at proportions? It's a very simple method we've given you. Do lots of problems, you'll get good at this, and you'll have a lot of success in statistics and on the final. Good luck!
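One last side note: the two-proportion z calculation from the HP-versus-Dell problem can be checked the same way. A minimal Python sketch follows; carrying full precision gives about 0.94 rather than the table-rounded 0.95 quoted above, and either way the statistic is nowhere near the 1.96 cutoff, so the conclusion is the same.

```python
from math import sqrt

# Sample evidence from the defect-rate problem
x1, n1 = 14, 200   # HP: 14 defects out of 200 tablets
x2, n2 = 15, 300   # Dell: 15 defects out of 300 tablets

p1, p2 = x1 / n1, x2 / n2       # 0.07 and 0.05
pbar = (x1 + x2) / (n1 + n2)    # pooled p-bar under H0: 29/500 = 0.058

# Calculated z statistic
z = (p1 - p2) / sqrt(pbar * (1 - pbar) * (1 / n1 + 1 / n2))

z_crit = 1.96   # from the z table: alpha = 0.05, two-tailed
print(round(z, 2))        # 0.94
print(abs(z) > z_crit)    # False, so do not reject H0
```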