Welcome to our lecture on the two-sample t-test. You're going to learn how to make inferences about means from two separate populations. We've already learned to use the z-statistic for inferences about the means of two groups; now we're going to learn to do the same thing with the t-statistic. When do we use z and when do we use t? The rules are exactly the same as what we learned in one-sample statistical inference, making inferences using one sample from one population. If you want to review, go back to that lecture; the relevant material is most likely in the one-sample t-test lecture. To summarize: if you know sigma for the two groups, sigma 1 and sigma 2, you have no problem, just use the z. You can also use the z if you have a large sample size in both groups, n1 and n2. And again, as before, what does large mean? It might have to do with the policy of the decision maker, and in this case the decision maker is your instructor if you're taking the class in a formal setting. The problem arises if you don't know the population standard deviations, and usually you don't, and if your sample sizes are just not large enough. For whatever reason, you were not able to collect enough data, which happens very often. Data is very expensive, and sometimes in the process of collecting data you're actually destroying the objects, so it happens quite frequently. So what do we do? We use the t-distribution as an approximation to the z. But again, as before, we have the additional caveat that in order to use the t, we also have to know that the underlying populations are normally distributed. If we don't know that, often what we're doing is making that assumption. We're saying: sigma 1 and sigma 2 are not known, my sample sizes n1 and n2 are small, and I'm going to assume that the underlying populations are normally distributed.
There are situations where we know for sure we can't assume that, because we have some knowledge from previous research about the shape of the distribution. Then we have to work with other methods, non-parametric or distribution-free methods, which are not a topic for this class. You'll notice that the formula we use to get the calculated value of the test statistic, a t with n1 plus n2 minus 2 degrees of freedom, has another value in it that has to be computed first: the pooled variance. Otherwise, things look familiar. We have x1 bar minus x2 bar, the difference between the two sample means, divided by the square root of the pooled variance times (1 over n1 plus 1 over n2). So before you compute the calculated value of t, you have an interim value: you have to get the pooled variance first. The pooled variance is really an average variance. But since n1 and n2 don't have to be the same size, it's a weighted average, weighted by sample size. That's really all it is. If n1 and n2 are the same size, you're simply getting the plain average of the two sample variances. You can see how we calculate s squared pooled: we're weighting by degrees of freedom. So n1 minus 1 divided by the total degrees of freedom is the weight for the sample variance of group 1, and n2 minus 1 divided by the total degrees of freedom is the weight for the sample variance of group 2.

Now we want to introduce yet another assumption we're going to be making, and not testing, in this class. It's called homoscedasticity. What we're saying is that for us to be able to do this test, the two variances really have to be the same: we're assuming that the two population variances are really equal. Now, like anything else, we could test for this; the parameter would be sigma or sigma squared.
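As a minimal sketch of these two formulas (Python is not part of the lecture; the function names are mine), the pooled variance and the calculated t can be written as:

```python
import math

def pooled_variance(n1, s1_sq, n2, s2_sq):
    """Average of the two sample variances, weighted by degrees of freedom."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

def two_sample_t(x1_bar, x2_bar, n1, s1_sq, n2, s2_sq):
    """Calculated t with n1 + n2 - 2 degrees of freedom (equal variances assumed)."""
    sp_sq = pooled_variance(n1, s1_sq, n2, s2_sq)
    se = math.sqrt(sp_sq * (1 / n1 + 1 / n2))  # standard error of the difference
    return (x1_bar - x2_bar) / se

# With equal sample sizes the pooled variance is just the plain average:
sp = pooled_variance(10, 4.0, 10, 6.0)  # gives 5.0
```

Note how the weights (n1 - 1) and (n2 - 1) make the pooled variance lean toward the group with the larger sample, exactly as described above.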
We'd be testing the null hypothesis that sigma squared 1, the variance of group 1, is equal to sigma squared 2, the variance of group 2. We're not doing that in this course. So the only thing you can do right now, in order to finish the problems and do the exercises, is to assume that if you had done the test, the variances would have come out equal. One thing that would be nice, though it doesn't always happen, is for the exercises to say explicitly that we assume the two variances really are the same. On the other hand, if you're doing a two-sample t-test problem and there's a follow-up question, "What assumptions did you have to make in order to do this problem?", you have a few now. You're assuming normally distributed populations. You're assuming that the variances of the two groups are actually equal; you don't know what they are, and you have different sample variances, but you're assuming that the difference in the sample variances is due to sampling variation rather than to the two population variances actually being unequal. If we were going to learn how to test for homoscedasticity, it would be with something called an F-test. Certainly any statistical software you use will do it, and you will learn about it in other courses.

In problem 1, we're comparing the reading scores of two groups, men and women. We note that the women's average score is 84, compared to 80 for the men. That's a four-point difference. The question is whether that four-point difference is significant or may simply be due to sampling error. So we're going to test whether it is significant. Notice the sample sizes are 16 and 15. Even under H0, where you say there's no difference, you've got a total sample size of 16 plus 15, which is 31; we'll see in a moment that's 29 degrees of freedom. You cannot use z since you don't know sigma, so you're going to have to use t, a two-sample t-test.
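The F-test just mentioned is not course material, but purely as an illustrative sketch of what statistical software does behind the scenes, the ratio-of-variances version can be written like this (the function name is mine; scipy is assumed to be available):

```python
from scipy import stats

def f_test_equal_variances(s1_sq, n1, s2_sq, n2):
    """Sketch of a two-sided F-test for H0: sigma1^2 == sigma2^2.
    Not covered in this course; shown only for illustration."""
    # Put the larger sample variance on top so that F >= 1
    if s1_sq >= s2_sq:
        f, df1, df2 = s1_sq / s2_sq, n1 - 1, n2 - 1
    else:
        f, df1, df2 = s2_sq / s1_sq, n2 - 1, n1 - 1
    p = 2 * stats.f.sf(f, df1, df2)  # double the upper-tail area for a two-sided test
    return f, min(p, 1.0)

# Problem 1's sample variances, 256 and 400, are not significantly different,
# so the homoscedasticity assumption looks reasonable there:
f, p = f_test_equal_variances(256, 16, 400, 15)
```

This is only meant to show why "assume the variances are equal" is a testable claim; in this course you simply state the assumption.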
H0 is that mu 1 equals mu 2, which is the same as saying no difference. H1 is that mu 1 is not equal to mu 2. This is a t with 29 degrees of freedom; we lose two degrees of freedom because it's n1 plus n2 minus 2. It's a mathematical adjustment, in effect. Now look at the critical values for t with 29 degrees of freedom. It's a two-tail test, so you have 0.025 in the right tail and 0.025 in the left tail. And notice the critical values are not 1.96 and minus 1.96, which they would be if this were a z. Since it's a t with 29 degrees of freedom, which is not quite a z yet, the critical values are 2.0452 and negative 2.0452. These numbers come from the t-table: look for 0.025 in the tail, look at 29 degrees of freedom, and you'll see the value 2.0452. As a check, go to infinity; t at infinity is z, and you'll note the value is the familiar 1.96.

When you're using the t-test, you first have to calculate s squared pooled. Under H0 there's no difference, so conceptually you can make the data into one group: if you took the raw data and put it into one group, you'd have 31 numbers, of course, and from that you'd be getting s squared pooled. But if you don't have the raw data and you just have s1 squared and s2 squared, use the formula: n1 minus 1, which is 15, times s1 squared of 256, plus n2 minus 1, which is 14, times s2 squared of 400, divided by 29. This is essentially a weighted average of the two variances, group 1 and group 2, and you end up with 325.5. And notice it's roughly in the middle between s1 squared of 256 and s2 squared of 400. Now you can do the calculation of the t with 29 degrees of freedom. It's 80 minus 84, which is minus 4; that's the difference between the two groups. In the denominator, first do 1 over 16 plus 1 over 15, then multiply by 325.5 and take the square root. That's the square root of 42.04, which is 6.48.
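The arithmetic for problem 1 can be checked with a few lines (a sketch only; Python is not part of the lecture):

```python
import math

# Problem 1: men (group 1) x̄ = 80, s² = 256, n = 16; women (group 2) x̄ = 84, s² = 400, n = 15
n1, x1_bar, s1_sq = 16, 80, 256
n2, x2_bar, s2_sq = 15, 84, 400

df = n1 + n2 - 2                                      # 29 degrees of freedom
sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df    # pooled variance
se = math.sqrt(sp_sq * (1 / n1 + 1 / n2))             # standard error of the difference
t_calc = (x1_bar - x2_bar) / se

print(round(sp_sq, 1), round(t_calc, 2))              # prints 325.5 -0.62
```

Both numbers match the slide: the pooled variance is 325.5 and the calculated t is -0.62.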
Your t with 29 degrees of freedom, which is basically your sample evidence, is negative 0.62. Clearly you're not in the rejection region, and you have no evidence to reject H0. In simple English, there's no significant difference between men and women on these test scores.

In problem 2, we want to know if the salaries are different in two companies; actually, this is the daily wage. We take a sample of 30 people, 10 from company 1 and 20 from company 2. Notice that the average daily pay in company 1 was 210, but in company 2 it was 175. Before you can make any conclusion, like saying company 1 pays more than company 2 based on your sample evidence, you're going to have to test for significance. You're noticing a $35 difference, but that could very well be sampling error. H0 is mu 1 equals mu 2; H1 is mu 1 not equal to mu 2. This is a two-tail test, and we're testing at an alpha of 0.01. We cut the 0.01 in half, so we have 0.005 in the right tail and 0.005 in the left tail. This is a t with 28 degrees of freedom. Get the critical value off the table: look at 28 degrees of freedom and 0.005 in the tail, and you'll find that the critical value is 2.7633. It's symmetric, so on the left it's negative 2.7633. Those are the critical values. The next step is to get s squared pooled. We take n1 minus 1, which is 9, times 625; 625, of course, is s1 squared. Then we take n2 minus 1, which is 19, since 20 minus 1 is 19, times s2 squared of 400. Divide by 28, the degrees of freedom, and you get a weighted average of 472.3. That's called s squared pooled. We put that into the formula: t with 28 degrees of freedom equals 210 minus 175, the $35 difference that we observed, divided by the standard error of the difference. For the standard error, first do 1 over n1 plus 1 over n2, that is, 1/10 plus 1/20. Calculate that first, then multiply by 472.3, and take the square root of 70.845, which is 8.42. So it's 35 over 8.42, and you get a t value of 4.16.
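Problem 2's numbers can be verified the same way (again just a sketch in Python, not part of the lecture):

```python
import math

# Problem 2: company 1 x̄ = 210, s² = 625, n = 10; company 2 x̄ = 175, s² = 400, n = 20
n1, x1_bar, s1_sq = 10, 210, 625
n2, x2_bar, s2_sq = 20, 175, 400

df = n1 + n2 - 2                                      # 28 degrees of freedom
sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df    # pooled variance ≈ 472.3
se = math.sqrt(sp_sq * (1 / n1 + 1 / n2))             # standard error ≈ 8.42
t_calc = (x1_bar - x2_bar) / se                       # ≈ 4.16, beyond 2.7633, so reject H0
```

With unequal sample sizes the pooled variance (472.3) is pulled toward company 2's variance of 400, since that group contributes 19 of the 28 degrees of freedom.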
That's your calculated t; that's essentially your sample evidence. We turned the sample evidence into a t value of 4.16, and we indeed find it's in the rejection region. Our conclusion is that the $35 difference between the two companies is a significant difference: company 1 pays more than company 2.

In the next problem, we're looking at two suppliers of concrete beams, and the metric we're looking at is the strength of the beams. We want to know if the two suppliers are basically equivalent or if we should purchase from one rather than the other. Strength is measured in pounds per square inch (PSI) of pressure. What we want to know is: is there a significant difference between the beams supplied by supplier A and supplier B with regard to strength? We see that the two sample averages are 5,000 PSI versus 4,975 PSI. That's a 25 PSI difference. Is that difference real, reflecting the fact that the two suppliers really do make concrete beams with different strengths? Or is it just sampling error that could have happened due to the randomness inherent in the world? We're going to do this test at an alpha of 0.05, and we'll do that on the next slide. The null hypothesis is that mu 1 is equal to mu 2; there's no difference. The alternate hypothesis is that the two population means really are different. We're using a t with 20 degrees of freedom: the two sample sizes together, minus 2, is where we get the 20 from. With an alpha of 0.05, it's split equally, since it's a two-tail test: 0.025 in one tail, 0.025 in the other tail. So when you look this up in the t-table to get the critical values you see here, you're looking at the row for 20 degrees of freedom and the column for 0.025 tail probability, and you find that the critical values are plus and minus 2.086. That's going to be your decision rule for the test.
If your calculated value of the statistic is in the red shaded area, you'll reject the null hypothesis; you'll say it's too far away from the mean of zero. If it's in the white unshaded area, you'll say, well, I can't reject; it could be. How do we get the calculated value? Again, the first thing you need is the pooled variance. It's kind of like an average variance weighted by degrees of freedom, and here it's 2995. Then we use the formula to get the calculated value of the t statistic. You see the difference between the two averages, 25, and you end up dividing by 23.4. I'll leave it to you to work through that formula on your own and make sure my work is correct. You end up with a calculated t value of 1.07. Aha, that's in the unshaded area. It wasn't beyond 2.086 on the one side, and it wasn't beyond negative 2.086 on the other side. So the conclusion is: do not reject H0. There is no statistically significant difference between the mean strengths, in pounds per square inch of pressure, of the beams from the two suppliers. The difference we observed, which was actually a pretty small one, was just due to random variation.

We're also going to use Microsoft Excel to solve two-sample t-tests. One of the reasons is, as you can see, the calculations are doable but formidable, so we rely more on statistical software. We don't want to use pencil, paper, and calculator for problems like this if we don't have to. Remember, you can figure this out in many different ways; there are a lot of online resources for using MS Excel to solve statistics problems. We have some very simple instructions, in text form and in video form, on the handouts page of our website. Please feel free to use them, and let us know how they work for you. We're going to use Excel to answer the question of whether men and women spend the same amount on wine. It's well known that men spend a lot more on beer than women.
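The slide for the beam problem doesn't restate the two sample sizes in this excerpt, but 20 degrees of freedom with two equal-sized groups implies n1 = n2 = 11. Under that assumption, a quick Python sketch of the calculation (using the slide's pooled variance of 2995 directly) looks like this:

```python
import math

# Beam problem (sketch): df = 20 implies n1 + n2 = 22; we assume n1 = n2 = 11 here.
n1 = n2 = 11
x1_bar, x2_bar = 5000, 4975          # sample mean strengths in PSI
sp_sq = 2995                         # pooled variance, taken from the slide

se = math.sqrt(sp_sq * (1 / n1 + 1 / n2))   # ≈ 23.3, the slide's rounded 23.4
t_calc = (x1_bar - x2_bar) / se             # ≈ 1.07, inside ±2.086, so do not reject H0
```

The small discrepancy between 23.3 and the slide's 23.4 comes from rounding the pooled variance.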
But how about wine? The researcher took a sample of 34 people; 17 happened to be women and 17 were men. They found that the average amount spent on wine, presumably per year, by women was $437.47, while the average amount spent by men was $552.94. The raw data is on the right; you can see it. Again, this represents the spending on wine of the 34 men and women. The real question here is: is this difference statistically significant? We're going to solve it using Excel. Again, we have the raw data here; we're not just reporting the mean and the standard deviation, we're showing you the raw data. Anyway, this is the Excel printout; you can do it yourself and you'll see. Now, Excel calls the groups variable 1 and variable 2. There's a way to put in labels, but we know that variable 1 was the women, that was group 1, and variable 2 was the men. The first row shows you the means. You don't need so many decimal places, but I want you to see the printout as it appears. So we see 437.47, and for variable 2, the men, it was 552.94. We see the variances. We see the observations, 17 and 17. We see the pooled variance, so you don't have to do arithmetic; it's right there. Notice it's between the two variances, kind of a weighted average: 101,002.8493. H0 is that there's no difference: mu 1 equals mu 2, which is another way of saying mu 1 minus mu 2 is zero. That's the default. So H0 is mu 1 equals mu 2, no difference. Notice the degrees of freedom are 32: 17 plus 17 minus 2. And your calculated t, if you do all the arithmetic, your computed t, is minus 1.059, which we'll round to minus 1.06. Now, we did it as a two-tail test, so you can ignore the one-tail rows; we always do this as a two-tail test. So let's look at the two-tail results. Now, you see the critical values? We get these off the table, though the table doesn't have as many decimal places.
If you looked at our table for t with 32 degrees of freedom, you'd see the critical value is 2.0369; Excel has many more decimal places, so it shows the critical values as plus 2.036931619 on the right side and minus 2.036931619 on the left. This is where we have 0.025 in each tail; we're doing this at the 0.05 level, so 0.025 in the right tail. So clearly the t-stat is not in the rejection region: it hasn't gone beyond minus 2.03693. If you draw it, you'll see it's in what we call the white area, the unshaded area. But Excel does a better job than just that. The way we were doing it, we see whether the calculated t is in the rejection region or not. Excel actually gives you the probability of getting the sample evidence, and it shows you that probability is not less than 0.05: it is actually 0.297399288. Let's just call it about 0.30, a 30% chance. We'll see in a moment what that represents. But you know you're not below 0.05: you've got about a 30% chance of getting this sample evidence, or an even bigger difference, if H0 is true. It's telling you the likelihood of getting the sample evidence if H0 is true. Anyway, we clipped a little piece of the output. This is really all you look at if you're a statistician: the p-value, the probability of getting the sample evidence or something more extreme if H0 is true. You see the value there as 0.297; let's just call it 0.30. Now, you don't have to be a mathematician to know that 0.30 is not less than 0.05. There's a 30% chance of getting this sample evidence if H0 is true, so it's not so unusual. That's why we have no evidence to reject H0; it could very well be sampling error. So we don't reject when we see that probability there. And that's based, of course, on the calculated t that we got: the calculated t was not in the rejection region, so the probability is 0.30.
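If you'd like to check Excel's printout with other software: this excerpt doesn't show the two individual sample variances, but with equal group sizes the pooled variance is just their plain average, so feeding Excel's pooled value in for both groups reproduces the same equal-variance t and p. A sketch using scipy (assuming it's available) would be:

```python
from math import sqrt
from scipy import stats

# Women (group 1) vs. men (group 2), n = 17 each; Excel's pooled variance was 101002.8493.
# The individual sample variances aren't shown in this excerpt; with equal n the pooled
# variance is their plain average, so using it for both groups gives the same t and p.
sd = sqrt(101002.8493)
t_stat, p_two_tail = stats.ttest_ind_from_stats(437.47, sd, 17,
                                                552.94, sd, 17,
                                                equal_var=True)
print(round(t_stat, 2), round(p_two_tail, 3))   # ≈ -1.06 and ≈ 0.297, matching Excel
```

This mirrors Excel's "t-Test: Two-Sample Assuming Equal Variances" tool: same t statistic, same two-tail p-value.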
So basically, you know right away, just by looking at that probability, that since it's more than 0.05 you don't have a significant difference. And there's your conclusion on the bottom: there's no statistically significant difference between men and women in how much they spend on wine. If you look at the printout and you want to know why your calculated t statistic, turning the sample evidence into a t, was minus 1.06 (I'm rounding), why was it a negative number? You made the women group 1, and the women spend less, so you get a negative number. If you reversed it and made the men the first variable, it would be positive. It doesn't make a difference, because the distribution is symmetric, so you still won't be rejecting. Okay, now the question is: what would the calculated t value have to be for us to reject? Well, the printout gives you the critical value for the two-tail test, 2.0369316. So to reject H0, you need a calculated t value of either more than 2.0369 (say 2.04, 2.07, 2.10, anything more than 2.0369), or, if you're rejecting on the negative side, a value less than negative 2.0369. So, for example, at negative 2.5 you'd be rejecting, and you'd see that the probability is less than 0.05.

In this next problem, we're looking at job satisfaction scores, and we took a random sample of white and non-white employees. Again, there's the raw data. The higher the value, the more satisfied you are; if your score is a zero, that means you're not happy. We see the raw data here, and notice that among whites the scores appear to be a little higher, but we're not sure yet. That's why we want a statistical test. So, is there a difference between the two groups, white and non-white employees, with regard to job satisfaction?
We're going to test at an alpha of 0.05, and H0, again, is mu 1 equals mu 2, which is another way of saying mu 1 minus mu 2 is zero: no difference. It's going to be a two-tail test. Well, let's look at the numbers. Really, the probability is what a statistician would look at first. Look at the probability in the next-to-last row, which says P(T<=t) two-tail; we're only interested in the two-tail value because we're doing a two-tail test. Notice that probability is 0.001 and change. Basically, it's significant at the 0.05 level: a probability of about 0.001 is roughly 1 in 1,000, and 1 in 1,000 is less than 0.05. So we know it's significant; we know the difference between those two means is significant. Now we can look at the means, because we know they're different. Variable 1 is the white employees; their job satisfaction on average, and I'm rounding, is 6.17. The non-white employees' job satisfaction, again rounding, is 3.56. So we have a difference of 6.17 versus 3.56, and we know it's a significant difference; it's not all explained by chance. We see the variances, 4.735 and about 5.085. And we notice there are 18 employees in group 1, the white employees, and 18 non-white employees, for a total sample size of 36. We lose 2 degrees of freedom; notice the degrees of freedom are 34. There's your pooled variance, 4.91, and again it's sort of in the middle between 4.735 and 5.0849; that's why it's called the pooled variance. And then the calculated t is 3.535. That's turning the sample evidence into a t statistic. So the t-stat is 3.535, and you see right away it's in the rejection region. How do I know? Because the t critical two-tail, the last row on the printout, is 2.032. Anything more than that, we're going to reject.
If you're rejecting on the left side, it's anything less than negative 2.032. In this case, we clearly reject, and I didn't even need to look at the critical values. Once I saw that probability, the P(T<=t) two-tail of about 0.001, I knew right away we have a significant difference. So in simple English, if you look at this printout, you see that the two groups, the white and non-white employees, do not have the same job satisfaction. There is a significant difference. The difference between job satisfaction of 6.17 versus 3.56, the latter of course being the non-white employees, is real, and we should investigate why the non-white employees are unhappy at this company. This essentially repeats what I've told you: the white employees have more job satisfaction than the non-white employees. Now the company might say, well, it's only a sample of 36, and there are thousands of people working at the company. But you don't buy that; the whole point of the statistics is to make sure you're not just looking at chance or sampling error. And the printout tells us clearly this is not what we'd expect under H0. We did a two-tail test, and the probability of getting this kind of sample evidence, or something more extreme, if there is no difference between whites and non-whites, is 0.0012. In other words, if white and non-white employees really feel the same, with the same average job satisfaction score, there's only a 12 out of 10,000 chance of getting this sample evidence. That is not what we'd expect to happen under H0. Testing at the 0.05 level, we are going to reject. Basically, in simple English, you tell your boss the difference is significant; it's not explainable by sampling error or chance. There is a difference: white employees seem to have higher job satisfaction than non-white employees, and as I said before, you're going to try to investigate why this is true.
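Using the rounded numbers read off the printout, the job-satisfaction t can be double-checked with a short Python sketch (the printout's 3.535 comes from the unrounded means, so we land slightly lower, at about 3.53):

```python
import math

# White employees (group 1): x̄ ≈ 6.17, s² ≈ 4.735, n = 18
# Non-white employees (group 2): x̄ ≈ 3.56, s² ≈ 5.0849, n = 18
n1 = n2 = 18
x1_bar, x2_bar = 6.17, 3.56
s1_sq, s2_sq = 4.735, 5.0849

df = n1 + n2 - 2                                      # 34 degrees of freedom
sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df    # pooled variance ≈ 4.91
t_calc = (x1_bar - x2_bar) / math.sqrt(sp_sq * (1 / n1 + 1 / n2))
# t_calc ≈ 3.53, well beyond the critical value 2.032, so reject H0
```

With equal group sizes the pooled variance is exactly the average of 4.735 and 5.0849, which is why it sits in the middle of the two.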
Anyway, this slide just summarizes what you tell your boss. You found a difference of 2.61; remember, this is a 0-to-10 scale, so 2.61 is quite a bit on a 0-to-10 scale. The likelihood of getting this difference by chance is a lot less than 0.05; in fact, it's 0.0012. We reject H0, we tell the boss the two groups are not the same when it comes to job satisfaction, and we should try to investigate. In simple English, we reject H0, and we're convinced that this is not sampling error; there's a serious difference between the job satisfaction scores of white and non-white employees. Thank you for attending our lecture. As always, once you learn the material, find as many problems as you can and practice, practice, practice.