Welcome to our lecture on the Student's t-distribution. If you recall the central limit theorem, it tells us what happens when n is large: 30, 50, certainly anything larger. But what do you do when n is small? Again, if you know sigma, the population standard deviation, the parameter, it doesn't matter. But if you don't know sigma, you're going to use s, the sample standard deviation, to estimate it; and if on top of that you have a small sample, say 25, now what are you going to do? Well, this problem came up a long time ago, and William Gosset developed the t-distribution. He published under the pseudonym "Student," and you'll see why: because he was an employee, he couldn't publish under his own name. So he developed something called the Student's t-distribution. Essentially, it's for small samples when you don't know sigma. As you can see, we'll still continue to use the z-statistic if n is large, and we'll still continue to use the z-statistic if we know sigma. The problem, if you look at the two-dimensional table, is the cell where sigma is unknown and the sample is small. What do we do then? Well, if the population the data came from is normally distributed, we can use the Student's t-statistic. And what if not? Are we up the creek? Well, as far as this course goes, yes. What you would have to do is take another, more advanced class in statistics and learn about nonparametric statistical methods. Let's talk about the t-distribution itself. It looks exactly like the normal distribution, except it has more spread. The mean, median, and mode are the same; it's symmetric; it goes from minus infinity to plus infinity. But here's the big difference: there's only one z. The z is a normal distribution with a mean of zero and a standard deviation of one.
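The decision just described, z when n is large or sigma is known, t for a small normal sample with unknown sigma, and nonparametric methods otherwise, can be sketched as a small Python helper. This is my own illustration, not something from the lecture; the function name and the cutoff of 30 are assumptions (30 is the usual rule of thumb).

```python
def which_statistic(n, sigma_known, population_normal, large_n=30):
    """Pick the test statistic per the lecture's two-way table.
    large_n = 30 is the usual rule-of-thumb cutoff for 'large'."""
    if sigma_known or n >= large_n:
        return "z"                      # known sigma, or CLT kicks in: use z
    if population_normal:
        return "t"                      # small n, unknown sigma, normal data
    return "nonparametric methods"      # beyond the scope of this course
```

For instance, a sample of 16 from a normal population with unknown sigma falls in the "t" cell of the table.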
But there are many t's: a t with one degree of freedom, a t with two degrees of freedom, a t with three, a t with four, five, six, seven, and so on. What happens is, as you have more and more degrees of freedom, the t starts resembling the z. Now, what do we mean by degrees of freedom? All you have to know for this course is n minus one, the sample size minus one. In the next slide, you'll see why we lose one degree of freedom; there's essentially a mathematical reason, so you don't have to go crazy over this. Just remember: in a one-sample study, you lose one degree of freedom. So if your sample size is 25, you're going to be dealing with t 24. If your sample size is, let's say, 18, then you've got to work with t 17. So again, when you look at the t-distribution, you have to know how many degrees of freedom you're talking about, because t 10 is not t 11, t 11 is not t 12, and t 12 is not t 13. And as you can see, the formulas for using the t-statistic are exactly the same as the formulas you used with the z-statistic, except that here we're using s as a point estimator for sigma, because sigma is unknown; that's why we're using t in the first place. You'll find these formulas on your formula sheet as well. On this slide, we explain a little bit about why you lose a degree of freedom. Remember, in the one-sample case, you're going to be working with n minus one degrees of freedom. From a mathematical point of view, every time you use a sample estimator where you should have used a parameter, you lose a degree of freedom. That's the reason we divide by n minus one to calculate the sample standard deviation: because we don't know mu, we use x bar in its place. Remember, the formula was the sum of (x minus x bar) squared, over n minus one. When we divide by n minus one, we account for the degree of freedom we lose.
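The two formulas just described, s with its n minus one divisor and the one-sample t-statistic, can be sketched in Python; the function names are mine, but the formulas are exactly the ones on the formula sheet.

```python
import math

def sample_std(data):
    """Sample standard deviation: divide by n - 1, not n, because
    x bar stands in for the unknown mu. Losing that degree of freedom
    is what keeps s from being a biased estimator of sigma."""
    n = len(data)
    x_bar = sum(data) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))

def t_statistic(x_bar, mu_0, s, n):
    """One-sample t: identical in form to the z-statistic, with s
    estimating the unknown sigma. Compare against t with n - 1
    degrees of freedom."""
    return (x_bar - mu_0) / (s / math.sqrt(n))
```

With a sample of size 25 you would compare this statistic against the t-distribution with 24 degrees of freedom.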
Otherwise, s would be a biased estimator of sigma. And that's all the mathematics you have to know for this course: if I ask you why you lose a degree of freedom, simply say there's a mathematical reason for it, and it has to do with bias. That's sufficient. So again, in these one-sample cases, you lose one degree of freedom. If your sample size is, let's say, 24, you have to work with t 23; that's the distribution you need to use. Here's an example where we use t. A consulting firm claims their consultants earn exactly $260 an hour. You want to test their claim, but you could only get a sample of 16 consultants, and you find that the sample mean x bar is $200 and s is $96. You want to test the claim at an alpha of 0.05 level, and I'm going to assume that the population follows a normal distribution. So let's go through the steps. Step one: as in all these statistical tests, you need H0 and H1. Well, H0 is that mu equals 260; that's the claim. And H1? Remember, this is a two-tailed test because they use the word "exactly," so H1 is that mu is not 260. In any event, now step two: you have alpha equal to 0.05. Since this is a two-tailed test, you take the alpha and cut it in half, so you have 0.025, that's 2.5%, in the right tail, and 0.025, 2.5%, in the left tail. And you can't use z for this. You have a sample size of 16, and as I mentioned, you lose one degree of freedom, so you have to work with t 15. So you go to the table, look in the column for 0.025 and the row for 15 degrees of freedom (n minus one; 16 minus 1 is 15), and you find the critical values are 2.1315 on the right and minus 2.1315 in the left tail. So you're close to the 1.96; you're not that far away. And again, t 15, as I mentioned, is not the same as t 16 or t 17. But for t 15, your critical values are plus and minus 2.1315. Notice that you're paying a bit of a price.
Because if you were using z, if you knew sigma and could use z, you'd have critical values of 1.96 and minus 1.96. It's a bit harder to reject now, because you need a number that's more than 2.1315 on the right or less than minus 2.1315 on the left. That's the price you pay for the small sample and not knowing sigma. This slide shows how we found the 2.1315. You need the column; we want the 0.025. You need the degrees of freedom; that's the row, 15. And notice, where they intersect, there's the 2.1315. Very easy to use this table. And here's where we do the mathematics. t 15 equals 200 minus 260 over the standard error of the mean, which is 96 over the square root of 16. So the numerator becomes minus 60; the denominator, the standard error of the mean, is 24; and we get a value of minus 2.50. So in step six, we decide to reject H0. Why do we reject? Because minus 2.50 is further to the left than the minus 2.1315 critical value, so we're in the rejection region; it's more negative. The probability of getting a value that extreme, if H0 is true, is less than 5%. This sample evidence is not what you'd expect to see if H0 were true: H0 says 260, and 200 is relatively far away. That's what this is telling you, so we reject H0. Now, if you recall, we've said this a few times: there are two ways to do inference. One way is to test a hypothesis when a claim was made. But maybe nobody made any claims, and you just want to take the sample evidence and construct a confidence interval. You can do that too. And now with t, you're going to have to use the critical value of the t, which is 2.1315; you can't use 1.96. So notice the price you pay now: you're going to have a wider interval. In any event, here's what you do. You take the formula: x bar, which is 200.
That's the sample mean, plus or minus 2.1315 times the standard error of the mean, 96 over the square root of 16. So 2.1315 times 96 over the square root of 16 gives you a margin of sampling error of $51.16. We do 200 plus 51.16, then 200 minus 51.16, and now we have a 95% confidence interval that goes from 148.84 all the way to 251.16. That's your confidence interval. We're 95% confident that somewhere in that range, from 148.84 to 251.16, is the true population mean. Notice, by the way, what's not in there: 260. Claims of more than 251.16 would not be reasonable. So if somebody wants to make a claim based on this evidence and says, let's claim 260 or 270, the answer is no: the highest you'd be willing to go is 251.16. Here's another example. A company claims that its soup vending machines deliver, on average, exactly four ounces of soup. The company statistician takes a sample of n equals 25 and finds a sample mean x bar of 3.97 ounces and a sample standard deviation s of 0.04 ounces. Part A is to test the claim at alpha equals 0.02, and part B is to construct a two-sided confidence interval estimator. Here's part A, the hypothesis test. Notice that we're using an alpha level of 0.02; that's just to make sure you get practice using nontraditional alphas for the region of rejection. The null hypothesis is the claim: mu really is exactly 4.0 ounces. The alternative hypothesis, the one we accept if we reject H0, is that mu is not equal to four ounces; it's either something greater than four ounces or something smaller than four ounces. So the region of rejection is split in two: 1% on the high side, on the right, and 1% on the low side, on the left. This is a t 24, a t-distribution with 24 degrees of freedom; that's one less than n. And when you look that up in the table, with a tail probability of 0.01 and 24 degrees of freedom, you'll find 2.4922.
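Before moving on, the earlier consultant example, both the test and the interval, can be sketched end to end in Python. The variable names are mine; 2.1315 is the table value for t 15 with 0.025 in each tail, exactly as read off the slides.

```python
import math

# Consultant example: H0: mu = 260, H1: mu != 260 (two-tailed), alpha = 0.05.
x_bar, mu_0, s, n = 200, 260, 96, 16
t_crit = 2.1315                        # from the table: t(15), 0.025 per tail

se = s / math.sqrt(n)                  # standard error: 96 / 4 = 24
t = (x_bar - mu_0) / se                # (200 - 260) / 24 = -2.50
reject_h0 = abs(t) > t_crit            # True: -2.50 lies beyond -2.1315

margin = t_crit * se                   # margin of sampling error, about 51.16
ci = (x_bar - margin, x_bar + margin)  # the 95% CI, roughly (148.84, 251.16)
```

Note that 260 is outside the interval, which matches the decision to reject H0.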
So that's plus 2.4922 on the right and minus 2.4922 on the left; those are the critical values. Once we get the computed value, we compare and see: are we in the region of rejection, or in the middle, in the region where we don't reject? The computed value, you can see the formula right there, uses x bar minus the hypothesized mean divided by the standard error of the mean, and you end up with a value of minus 3.75. That's well into the region of rejection at this alpha level, and the conclusion is: reject the null hypothesis. Now suppose we don't have a claim, we just have the sample data, and we would like a 98% confidence interval estimator of the true population mean mu. We use the formula for a confidence interval estimator at 98% confidence; that's the same as an alpha of 0.02, and we already looked that number up in the table. So we take the sample mean, 3.97, plus and minus the critical value from the t-distribution, 2.4922, times the standard error of the mean, and we end up with an interval that goes from 3.95 ounces on the low side to 3.99 ounces on the high side. We have 98% confidence that this interval really does contain the true population mean mu. And again, we could note, although this isn't really the way we do a hypothesis test, that hypothesis testing and estimation are in essence just two sides of the same coin. If we were to look at this and ask, is 4, the claim, in there? The answer would be no, it's not. So if I used a 98% confidence interval, because that's the level of confidence I want, the claim of 4.0 would have to be rejected. Okay, we have a different kind of t-test here; you'll see why it's a bit different. A school claims that the average reading score of its students is at least 70. At least: so anything more than 70 is fine. If the sample mean were, say, 75, you wouldn't have to do any test at all. The problem is only in one direction.
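The soup example we just finished, sketched the same way (variable names mine; 2.4922 is the table value for t 24 with 0.01 in each tail, as quoted in the lecture):

```python
import math

# Soup example: H0: mu = 4.0 oz, H1: mu != 4.0 oz (two-tailed), alpha = 0.02.
x_bar, mu_0, s, n = 3.97, 4.0, 0.04, 25
t_crit = 2.4922                        # t(24), 0.01 in each tail

se = s / math.sqrt(n)                  # 0.04 / 5 = 0.008
t = (x_bar - mu_0) / se                # -0.03 / 0.008 = -3.75: reject H0

margin = t_crit * se                   # about 0.02 oz
ci = (x_bar - margin, x_bar + margin)  # roughly (3.95, 3.99); 4.0 is outside
```

Again the two sides of the coin agree: the test rejects mu = 4.0, and the 98% interval fails to contain 4.0.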
Below 70, you want to make sure you're not just looking at sampling error. So we took a random sample of 16 students to test the claim, and we found that the sample mean was 68 with a standard deviation of 9. In part A, we're going to test the claim at an alpha of 0.05. In part B, we're going to say no claim was made, nobody made any claims, and simply, based on the sample evidence, construct a two-sided 95% confidence interval for mu. Okay, so here's how we set it up. We're testing the claim, so H0 is that mu is greater than or equal to 70. H1, what you're left with if you reject H0, is that mu is less than 70; then the school is in trouble. Part A: the formula you know already. t 15 equals 68 minus 70 divided by the standard error of the mean, 9 over the square root of 16. The numerator is minus 2; the denominator is 2.25; and we get minus 0.89. That's our computed t value. Notice it's not in the rejection region. If it were something like minus 2 or minus 3, we'd be in the rejection region, but it's not; we're quite close to zero. So we cannot reject H0. We have no evidence to reject H0, and the probability is greater than 5% that this could happen by chance. This could very well be sampling error, so we do not reject H0. In part B, here's what we're doing: no one made any claims, nothing. You have sample evidence, and you just want to construct a two-sided confidence interval. A confidence interval here is always going to be two-sided; with no claims, don't worry about one side, one tail, two tails, anything like that. So what do we have? Our sample evidence is the sample mean of 68, and we want to construct a two-sided 95% confidence interval. If you go to the table, you'll find the t 15 value; remember, t 15 means n minus one degrees of freedom. You'll find the critical value is 2.1315.
So we do 68 plus or minus 2.1315 times the standard error of the mean, which is s over the square root of n, that's 9 over 4, or 2.25. 2.1315 times 2.25 is about 4.8. That 4.8 is the margin of error, what's called the margin of sampling error. 68 plus 4.8 brings you all the way up to 72.8; 68 minus 4.8 gets you down to 63.2. This is your 95% confidence interval: anywhere from 63.2 to 72.8. So by the way, on this evidence the school is safe; as far as the interval is concerned, the mean could even be as high as 72.8. That's a very wide interval, but again, that's the price you pay for working with a small sample when you don't know sigma. If you don't want the interval to be too wide, take a larger sample. Here's the next example. Apparently, van der Lea Industries produces and distributes bottled water; highly overpriced bottled water, but nevertheless. Part A says: let's test the claim that we'll find at most one part per million of benzene in the water. We don't want benzene in our water, so it's a good thing if we find that it really is limited. Part B again says: suppose there was no claim; what do I do with the sample evidence? Can I estimate mu? For the sample data, we collected 25 randomly selected bottles of water and found an x bar value of 1.16 parts per million and a standard deviation of 0.2 ppm. Okay, here we're testing the claim. First, set up the hypotheses. The null hypothesis is that mu is less than or equal to one part per million of benzene; that's how "at most" looks. At most is the same as less than or equal. The alternative hypothesis is that the null hypothesis is incorrect; the alternative hypothesis then must be that mu is something greater than one part per million of benzene. Remember, when we draw the picture, the region of rejection is on the same side as the alternative hypothesis, so it's on the right.
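The reading-score example above, both the one-tailed test and the two-sided interval, can be sketched the same way. One assumption on my part: the lecture only says the statistic is not in the rejection region, so the one-tail critical value of minus 1.7531 (t 15, 0.05 in the left tail) is my own table lookup, not a number from the slides.

```python
import math

# Reading-score example: H0: mu >= 70, H1: mu < 70 (one-tailed), alpha = 0.05.
x_bar, mu_0, s, n = 68, 70, 9, 16

se = s / math.sqrt(n)                  # 9 / 4 = 2.25
t = (x_bar - mu_0) / se                # -2 / 2.25, about -0.89
t_crit = -1.7531                       # assumed table value: t(15), left tail
reject_h0 = t < t_crit                 # False: -0.89 is nowhere near -1.75

# Part B: the two-sided 95% CI uses the two-tailed value 2.1315.
margin = 2.1315 * se                   # about 4.8
ci = (x_bar - margin, x_bar + margin)  # roughly (63.2, 72.8)
```

So the test fails to reject, and the interval comfortably contains 70, which is the same story told two ways.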
If the sample data give us a test statistic that's too far over to the right, into the region of rejection, that means there was too much benzene in the water. With n equals 25, we have 24 degrees of freedom. We don't know sigma, so we're working with a t-distribution with 24 degrees of freedom. The table value, the critical value with all of the alpha of 0.05 in that one tail, is 1.7109. The calculated value of the t-statistic comes out to four exactly, and that's way out in the region of rejection, so we reject the null hypothesis. Same problem for part B: no claim was made, no one made any claims, and we just want to construct a confidence interval based on the sample evidence. Well, we take the 1.16; that's your sample evidence, your sample mean. Then plus and minus 2.0639; that's your critical value for the two-sided t, splitting the 5% into 0.025 and 0.025, so we need 2.0639 from the t 24 table. See the diagram right there on the right. We multiply by the standard error of the mean, which is 0.04, so the margin of error is 2.0639 times 0.04. When we construct the confidence interval, it runs from 1.077 parts per million up to 1.243 parts per million of this benzene impurity. Since the whole interval sits above one part per million, I'd say they're in trouble; but essentially, this is the way you construct the confidence interval. The only way to learn statistics is to do lots and lots of problems, and again, you'll see the methodology is always the same: a certain method we use over and over. Just keep doing problems; you'll learn this very well, and you'll understand a very important concept, the basic concept you're learning in the inference part of the course. When you take a sample, there has to be some kind of margin of error, the sampling error.
Just remember that: your sample statistic, say the x bar, is not mu. It's an estimate of mu.
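As one last worked recap, here is the benzene example in the same style of sketch; 1.7109 and 2.0639 are the t 24 table values quoted in the lecture, and the variable names are mine.

```python
import math

# Benzene example: H0: mu <= 1 ppm, H1: mu > 1 ppm (one-tailed), alpha = 0.05.
x_bar, mu_0, s, n = 1.16, 1.0, 0.2, 25

se = s / math.sqrt(n)                  # 0.2 / 5 = 0.04
t = (x_bar - mu_0) / se                # 0.16 / 0.04 = 4.0: beyond 1.7109,
reject_h0 = t > 1.7109                 # so we reject H0

# Two-sided 95% CI uses 2.0639 (0.025 in each tail of t(24)).
margin = 2.0639 * se                   # about 0.083 ppm
ci = (x_bar - margin, x_bar + margin)  # roughly (1.077, 1.243), all above 1 ppm
```

The entire interval lies above one part per million, matching the rejection of the "at most 1 ppm" claim.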