Welcome to our lecture on the two-sample t-test. You're going to learn how to make inferences about means from two separate populations. We've already learned to use the z-statistic for inferences about the means of two groups; now we're going to learn to do the same thing with the t-statistic. When do we use z and when do we use t? The rules are exactly the same as what we learned in one-sample statistical inference, making inferences using one sample from one population. If you want to review, go back to that lecture; it's probably the one-sample t-test lecture. To summarize: if you know sigma for the two groups, sigma 1 and sigma 2, you have no problem; just use z. If you have a large sample size in both groups, n1 and n2, use z as well. And again, as before, what does "large" mean? It may come down to the policy of the decision maker; in this case, the decision maker is your instructor if you're taking the class in a formal setting. The problem arises if you don't know the populations' standard deviations, and usually you don't, and your sample size is just not large enough. For whatever reason, you were not able to collect enough data, which happens very often: data is very expensive, and sometimes the process of collecting data actually destroys the objects being tested. So what do we do? We would like to use the t-distribution as an approximation to the z. But, as before, there is an additional caveat: in order to use the t, we also have to know that the underlying populations are normally distributed. If we don't know that, often what we're doing is making that assumption. We're saying: sigma 1 and sigma 2 are not known, my sample sizes n1 and n2 are small, and I'm going to assume that the underlying populations are normally distributed.
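As an aside, the decision rules just described can be sketched as a small helper function. This is only an illustration of the logic; the function name and the cutoff for "large" (30 here) are placeholders of my own, since, as noted, what counts as "large" is really up to the decision maker.

```python
def choose_statistic(sigmas_known: bool, n1: int, n2: int,
                     assume_normal: bool, large: int = 30) -> str:
    """Pick the test statistic per the rules above.

    `large` is a placeholder cutoff -- what counts as "large"
    is up to the decision maker (here, your instructor).
    """
    if sigmas_known:                    # sigma 1 and sigma 2 known: use z
        return "z"
    if n1 >= large and n2 >= large:     # both samples large enough: use z
        return "z"
    if assume_normal:                   # small samples, assumed-normal populations: use t
        return "t"
    return "nonparametric"              # otherwise: distribution-free methods
```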
There are situations where we know for sure we can't assume that, because we have some knowledge from previous research about the shape of the distribution. Then we have to work with other methods, non-parametric or distribution-free methods, which are not a topic for this class. You'll notice that the formula we use to get the calculated value of the test statistic, a t with n1 plus n2 minus 2 degrees of freedom, has another value in it that has to be computed first: the pooled variance. Otherwise, things look familiar. We have x1 bar minus x2 bar, the difference between the two sample means, divided by the square root of the pooled variance times (1 over n1 plus 1 over n2). So before you compute the calculated value of t, you have an interim value: you have to get the pooled variance first. The pooled variance is really an average variance, but since n1 and n2 don't have to be the same size, it's a weighted average by sample size. That's really all it is. If n1 and n2 are the same size, then you're simply getting the average of the two sample variances. And you can see how we calculate s-squared pooled: we're weighting by degrees of freedom. So n1 minus 1 divided by the total degrees of freedom is the weight for the variance of group 1, and n2 minus 1 divided by the total degrees of freedom is the weight for the sample variance of group 2. Now we want to introduce yet another assumption we're going to be making, and not testing, in this class. It's called homoscedasticity. For this test to be valid, the two population variances really have to be the same, so we're assuming that the two population variances are equal. Now, like anything else, we could test for this; the parameter would be sigma, or sigma squared.
The null hypothesis would be that sigma squared 1, the variance of group 1, is equal to sigma squared 2, the variance of group 2. We're not doing that in this course. So the only thing you can do right now, in order to finish the problems and do the exercises, is to assume that if you had done the test, the variances would have come out the same. It would be nice, though it doesn't always happen, if the exercises you do would say, "we assume that the two variances really are the same." On the other hand, if you're doing a problem and there's a follow-up question, "what assumptions did you have to make in order to do this problem?", and it's a two-sample t-test, well, you have a few now. You're assuming normally distributed populations. You're assuming that the variances of the two groups are actually equal: you don't know what they are, and you have different sample variances, but you're assuming that the difference in the sample variances is due to sampling variation rather than to the two population variances actually being unequal. If we were going to learn how to test for homoscedasticity, it would be something called an F-test. Certainly, any statistical software you use will do it, and you will learn about this in other courses. In problem 1, we're comparing the reading scores of two groups, men and women, and we note that the women's average score is 84, compared to 80 for the men. That's a four-point difference. The question is whether that four-point difference is significant, or might simply be sampling error. So we're going to test whether this is significant. Notice the sample sizes are 16 and 15. Even under H0, where you say there's no difference, you've got a total sample size of 16 plus 15, or 31; we'll see in a moment that's 29 degrees of freedom. You cannot use z, since you don't know sigma. So you're going to have to use t, a two-sample t-test.
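The two formulas described above, the pooled variance and the two-sample t statistic, can be written out in a few lines of Python. This is just a sketch of the arithmetic; the function names are mine.

```python
from math import sqrt

def pooled_variance(n1, s1_sq, n2, s2_sq):
    """Weighted average of the two sample variances, weighted by degrees of freedom."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

def two_sample_t(x1_bar, x2_bar, n1, s1_sq, n2, s2_sq):
    """Calculated t with n1 + n2 - 2 degrees of freedom."""
    sp2 = pooled_variance(n1, s1_sq, n2, s2_sq)
    return (x1_bar - x2_bar) / sqrt(sp2 * (1 / n1 + 1 / n2))
```

Note that when n1 equals n2, pooled_variance reduces to the plain average of the two sample variances, exactly as described above.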
H0 is that mu 1 equals mu 2, which is the same as saying there's no difference. H1 is that mu 1 is not equal to mu 2. This is a t with 29 degrees of freedom; we lose two degrees of freedom because it's n1 plus n2 minus 2, a mathematical adjustment in effect. Now look at the critical values: it's a t with 29 degrees of freedom, and since it's a two-tail test, you have 0.025 in the right tail and 0.025 in the left tail. Notice the critical values are not 1.96 and minus 1.96, which they would be if this were a z. Since it's t with 29 degrees of freedom, which is not quite a z yet, the critical values are 2.0452 and negative 2.0452. These numbers come from the t-table: you look for 0.025 in the tail at 29 degrees of freedom, and you'll see the value 2.0452. As a check, go to infinity: t with infinite degrees of freedom is z, and you'll see the familiar 1.96 and minus 1.96. When you're using the t-test, you first have to calculate s-squared pooled. The idea is that under H0 there's no difference, so you can treat the data as one group. If you had the raw data and put it into one group, you'd have 31 numbers, and you could get s-squared pooled directly. But if you don't have the raw data, and you just have s1 squared and s2 squared, use the formula: n1 minus 1, which is 15, times s1 squared of 256, plus n2 minus 1, which is 14, times s2 squared of 400, all divided by 29. This is essentially a weighted average of the two variances, group 1 and group 2. You end up with 325.5, and notice it's right about in the middle, between s1 squared of 256 and s2 squared of 400. Now you can do the calculation of the t with 29 degrees of freedom. It's 80 minus 84, so minus 4; that's the difference between the two sample means. In the denominator, first do 1 over 16 plus 1 over 15, then multiply by 325.5 and take the square root: the square root of 42.04, which is 6.48.
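If you want to check the problem 1 arithmetic, a few lines will reproduce it from the summary statistics alone:

```python
from math import sqrt

# problem 1 summary statistics: men are group 1, women are group 2
n1, x1_bar, s1_sq = 16, 80.0, 256.0
n2, x2_bar, s2_sq = 15, 84.0, 400.0

sp2 = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)  # pooled variance
se = sqrt(sp2 * (1 / n1 + 1 / n2))                           # standard error of the difference
t_calc = (x1_bar - x2_bar) / se

print(round(sp2, 1), round(se, 2), round(t_calc, 2))  # 325.5 6.48 -0.62
```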
Your t with 29 degrees of freedom, your sample evidence basically, is negative 0.62. Clearly you're not in the rejection region, and you have no evidence to reject H0. In simple English, there's no significant difference between men and women on these test scores. In problem 2, we want to know if the salaries are different in two companies; actually, this is the daily wage. We take a sample of 30 people: 10 from company 1, 20 from company 2. Notice that the average daily pay in company 1 was 210, while in company 2 it was 175. Before you can make any conclusion, say that company 1 pays more than company 2 based on your sample evidence, you're going to have to test for significance. You're looking at a $35 difference, but that could very well be sampling error. H0 is mu 1 equals mu 2; H1 is mu 1 not equal to mu 2; this is a two-tail test. We're testing at an alpha of 0.01, and we cut the 0.01 in half, so we have 0.005 in the right tail and 0.005 in the left tail. This is a t with 28 degrees of freedom. To get the critical value for t28 off the table, look at 28 degrees of freedom and 0.005 in the tail, and you'll find that the critical value is 2.7633. It's symmetric, so on the left it's negative 2.7633. Those are the critical values. The next step is to get s-squared pooled. We take n1 minus 1, which is 9, times 625; 625, of course, is s1 squared. Then we take n2 minus 1, which is 19 (20 minus 1), times s2 squared of 400, and divide by 28, the degrees of freedom. You get a weighted average of 472.3. That's called s-squared pooled. We put that into the formula: t28 equals 210 minus 175, the $35 difference that we observed, divided by the standard error for the difference. First do 1 over n1 plus 1 over n2, that is, one tenth plus one twentieth; then multiply by 472.3 and take the square root. The square root of 70.845 is 8.42, so it's 35 over 8.42.
We get a t28 value of 4.16. That's your calculated t, essentially your sample evidence. We turn the sample evidence into a t value of 4.16, and we indeed find it's in the rejection region. Our conclusion is that the $35 difference between the two companies is a significant difference, so company 1 pays more than company 2. In the next problem, we're looking at two suppliers of concrete beams, and the metric we're looking at is the strength of the beams. We want to know if the two suppliers are basically equivalent, or if we should purchase from one rather than the other. Strength is measured in pounds per square inch of pressure. What we want to know is: is there a significant difference between the beams supplied by supplier A and supplier B with regard to strength? We see that the two sample averages are 5,000 pounds per square inch versus 4,975 pounds per square inch. That's a 25 PSI difference. Is that difference real, reflecting the fact that the two suppliers really do make concrete beams with different strengths? Or is it just sampling error, something that could have happened due to the randomness inherent in the world? We're going to do this test at an alpha of 0.05, and we'll do that on the next slide. The null hypothesis is that mu 1 is equal to mu 2: there's no difference. The alternate hypothesis is that the two population means really are different. We're using a t with 20 degrees of freedom; sample size 1 plus sample size 2 minus 2 is where the 20 comes from. With an alpha of 0.05 split equally, it's a two-tail test, so we have 0.025 in one tail and 0.025 in the other tail. When you look this up in the t-table to get the critical values you see here, you're looking at the row for 20 degrees of freedom and the column for 0.025 tail probability, and you find that the critical values are plus and minus 2.086. That's going to be your decision rule for the test.
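Instead of the t-table, you can also get these critical values from software. Here's a sketch using scipy (assuming scipy is available):

```python
from scipy.stats import t

alpha = 0.05
# two-tail test: alpha / 2 = 0.025 in each tail
print(round(t.ppf(1 - alpha / 2, df=20), 3))     # 2.086, the critical value for t20
print(round(t.ppf(1 - alpha / 2, df=29), 4))     # 2.0452, used in problem 1
print(round(t.ppf(1 - alpha / 2, df=10**9), 2))  # 1.96 -- t approaches z as df grows
```

The last line is the "go to infinity" check from problem 1: with huge degrees of freedom, the t critical value collapses to the familiar z value of 1.96.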
If your calculated value of the statistic is in the red shaded area, you'll reject the null hypothesis; you'll say it's too far away from the mean of zero. If it's in the white, unshaded area, you'll say, well, I can't reject; it could be. How do we get the calculated value? Again, the first thing you need is the pooled variance. It's kind of like an average variance, weighted by degrees of freedom, and here it's 2995. Then we use the formula to get the calculated value of the t statistic. You see the difference between the two averages, 25, and you end up dividing by 23.4. I'll leave it to you to work through the formula on your own and check that my work is correct. You end up with a calculated t20 value of 1.07. That's in the unshaded area: it wasn't beyond 2.086 on the one side, and it wasn't beyond negative 2.086 on the other side. So the conclusion is: do not reject H0. There is no statistically significant difference between the mean strengths, in pounds per square inch of pressure, of the beams from the two suppliers. The difference we observed, which was actually a pretty small difference, was just due to random variation. We're also going to use Microsoft Excel to solve two-sample t-tests. One of the reasons is, as you can see, the calculations are doable but formidable, so we rely more on statistical software. We don't want to use pencil, paper, and calculator for problems like this if we don't have to. Remember, you can figure this out in many different ways; there are a lot of online resources for using MS Excel to solve statistics problems. We have some very simple instructions, in text form and in video form, on the handouts page of our website. Please feel free to use them, and let us know how they work for you. We're going to use Excel to answer the question of whether men and women spend the same amount on wine. It's well known that men spend a lot more on beer than women.
But how about wine? The researcher took a sample of 34 people; 17 happened to be women and 17 were men. They found that the average amount spent on wine in a year by women was $437.47, while the average amount spent by men was $552.94. The raw data is on the right; you can see it. This represents the wine-spending habits of those 34 men and women. The real question here is: is this difference statistically significant? We're going to solve it using Excel. Again, we have the raw data here; we're not just reporting the mean and the standard deviation. Anyway, this is the Excel printout; you can do it yourself and you'll see. Now, Excel calls the groups variable 1 and variable 2 (there's a way to put in labels). We know that variable 1 was the women; that was group 1. Variable 2 is the men. The first row shows you the means. You don't need so many decimal places, but I want you to see the printout as it appears. So we see 437.47 for variable 1, the women; for variable 2, the men, it was 552.94. We see the variances, and the observations, 17 and 17. Then the pooled variance, so you don't have to do the arithmetic; it's right there. Notice it's between the two variances, kind of a weighted average: 101,002.8493. H0 is that there's no difference; mu 1 equals mu 2 is another way of saying mu 1 minus mu 2 is zero, and a hypothesized mean difference of zero is the default. Notice the degrees of freedom are 32: 17 plus 17 minus 2. And there's your calculated t, or computed t, if you do all the arithmetic: minus 1.059, which we'll round to minus 1.06. Now, we did this as a two-tail test, so you can ignore the next two rows; those are one-tail, and we always do this as a two-tail test. So let's look at the two-tail results. Now you see the critical values, which we could also get off the table.
Our table doesn't have so many decimal places: for t32 it shows 2.0369, while Excel shows many more. So Excel is giving you the critical values: on the right side, plus 2.036931619, and in the left tail, minus 2.036931619. This is with 0.025 in each tail, doing the test at the 0.05 level. So clearly the t-stat is not in the rejection region; it hasn't gone beyond minus 2.0369. If you draw it, you'll see it's in what we call the white, unshaded area. But Excel does a better job than just that. The way we were doing it, we check whether the calculated t is in the rejection region or not. Excel actually gives you the probability of getting the sample evidence, and it shows you that probability is not less than 0.05; it's actually 0.297399288. Let's just call it about 0.30, a 30% chance. We'll see in a moment what that represents, but you know you're not below 0.05: you've got a 30% chance of getting this sample evidence, or an even bigger difference, if H0 is true. It's telling you the likelihood of getting the sample evidence if H0 is true. Anyway, we clipped a little piece of the output; this is really all you look at if you're a statistician. You look at the p, the probability of getting the sample evidence or something more extreme if H0 is true, and you see the value there is 0.297. Let's just call it 0.30. You don't have to be a mathematician to know 0.30 is not less than 0.05. There's a 30% chance of getting this sample evidence if H0 is true, so it's not so unusual. That's why we have no evidence to reject; it could very well be sampling error. So we don't reject when we see that probability there. And that's based, of course, on the calculated t that we got: the calculated t was not in the rejection region, so the probability is 0.30.
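The two-tail p-value Excel reports can be recovered from just the calculated t and the degrees of freedom. A sketch with scipy (assuming it's available):

```python
from scipy.stats import t

t_calc = -1.059          # calculated t from the wine printout (rounded)
df = 17 + 17 - 2         # 32 degrees of freedom

p_two_tail = 2 * t.sf(abs(t_calc), df)   # area in both tails beyond |t|
print(round(p_two_tail, 3))              # about 0.297, matching the printout
```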
So basically, you know right away, just by looking at that probability, that since it's more than 0.05, you don't have a significant difference. And there's your conclusion on the bottom: there's no statistically significant difference between men and women in how much they spend on wine. If you look at the printout and wonder why your calculated t statistic, after all the mathematics of turning the sample evidence into a t, was minus 1.06 (I'm rounding), why it was a negative number: it's only because you made the women first. You put them in group 1, and the women spend less, so you get a negative number. If you reversed it and made the men the first variable, it would be positive. It doesn't make a difference, because it's symmetrical, so you still won't be rejecting. Now, the question is what the calculated t value would have to be for us to reject H0. Well, the printout gives you the critical value for a two-tail test, and it is 2.0369316. So to reject H0, you need a calculated t value of either more than 2.0369 (say 2.04, 2.07, 2.10, anything beyond it), or, if you're rejecting on the negative side, a value less than negative 2.0369, the critical value on the left. So, for example, at negative 2.5 you'd be rejecting, and you'd see that the probability is less than 0.05. This is problem 2: we're looking at job satisfaction scores, and we took a random sample of white and non-white employees. Again, this is the raw data. The higher the value, the more satisfied you are; if your score is a zero, that means you're not happy. We see the raw data here, and notice that among whites the scores appear to be a little higher, but we're not sure yet; that's why we want a statistical test. So: is there a difference between the two groups, white and non-white employees, with regard to job satisfaction?
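When you have the raw data columns like this, the whole test is a single call to scipy's ttest_ind, with equal_var=True for the pooled (equal-variance) version we've been doing. The arrays below are hypothetical stand-ins, not the actual scores from the slide:

```python
from scipy.stats import ttest_ind

# hypothetical stand-in scores (0-10 satisfaction scale) -- not the slide's data
group_1 = [7, 5, 8, 6, 7, 4, 9, 6]
group_2 = [4, 3, 5, 2, 6, 3, 4, 2]

result = ttest_ind(group_1, group_2, equal_var=True)  # pooled-variance two-sample t-test
print(result.statistic, result.pvalue)                # calculated t and two-tail p-value
```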
We're going to test at an alpha of 0.05, and H0, again, is mu 1 equals mu 2, which is another way of saying mu 1 minus mu 2 is zero: no difference. It's going to be a two-tail test. Well, let's look at the numbers. Really, you look at the probability first; that's what a statistician would probably look at first. It's next to the last row, which says probability of capital T less than or equal to lowercase t, two-tail. You're only interested in two-tail, because we're doing a two-tail test, and notice that probability is 0.0012, roughly 1 in 1,000, which is less than 0.05. So we know it's significant at the 0.05 level; we know the difference between those two means is significant. Now we can look at the means, because we know they're different. Variable 1 is the white employees; their job satisfaction, on average, is 6.17. The non-white employees' job satisfaction, and again I'm rounding, is 3.56. So we have a difference of 6.17 versus 3.56, and we know it's a significant difference, not explainable by chance. We see the variances, 4.735 and, rounded, 5.085. We notice there are 18 employees in group 1, the white employees, and 18 non-white employees, for a total sample size of 36. We lose 2 degrees of freedom; notice the degrees of freedom are 34. There's your pooled variance, 4.91, and that's again sort of in the middle between 4.735 and 5.0849; that's why it's called the pooled variance. Then the calculated t: 3.535. That's the calculation, turning the sample evidence into a t statistic. So the t-stat is 3.535, and you see right away it's in the rejection region. How do I know? Because the t critical two-tail, the last row on the printout, is 2.032, and anything more than that we're going to reject.
If you were rejecting on the left side, it would be anything less than negative 2.032. But in this case, we clearly reject on the right. And I didn't even need to look at the critical values: once I saw that probability, and again, it's the probability of capital T less than or equal to lowercase t, two-tail, which is 0.0012, I knew right away I have a significant difference. In simple English, looking at this printout, the two groups, white and non-white employees, do not have the same job satisfaction; there is a significant difference. The difference between a job satisfaction of 6.17 and 3.56, the latter being the non-white employees, is real, and we should investigate why the non-white employees are unhappy at this company. This essentially repeats what I've told you: white employees have more job satisfaction than non-white employees. Now, the company might say, well, it's only a sample of 36, and they have thousands of people working at the company, but you don't buy that. That's the whole point of the statistics: it's to make sure you're not looking at chance or sampling error. And the printout tells us clearly this is not what's supposed to happen under H0. We did a two-tail test, and the probability of getting this sample evidence, or something more extreme, if there is no difference between whites and non-whites, is 0.0012. In other words, if white and non-white employees feel the same job satisfaction, there are only 12 chances in 10,000 of getting this sample evidence. This is not what's supposed to happen. At the 0.05 level, we're going to reject, and basically, in simple English, you tell your boss the difference is significant. It's not explainable by sampling error or chance. There is a difference: white employees seem to have higher job satisfaction than non-white employees. And again, as I said before, you're going to try to investigate why this is true.
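You can reproduce the key printout numbers from the summary statistics alone. The inputs below are the rounded values read off the slide, so the last digits may differ slightly from Excel's:

```python
from math import sqrt
from scipy.stats import t

# rounded printout values for the job-satisfaction problem
n1, x1_bar, s1_sq = 18, 6.17, 4.735    # white employees
n2, x2_bar, s2_sq = 18, 3.56, 5.0849   # non-white employees

df = n1 + n2 - 2                                         # 34
sp2 = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df         # pooled variance, about 4.91
t_calc = (x1_bar - x2_bar) / sqrt(sp2 * (1/n1 + 1/n2))   # about 3.53
p = 2 * t.sf(t_calc, df)                                 # about 0.0012

print(round(sp2, 2), round(t_calc, 2), round(p, 4))
```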
Anyway, this slide just summarizes. Basically, what you tell your boss is that you found a difference of 2.61 on a 0-10 scale, and 2.61 is quite a bit on a 0-10 scale. The likelihood of getting this difference by chance is a lot less than 0.05; it's 0.0012. We reject H0, we tell the boss the two groups are not the same when it comes to job satisfaction, and we should try to investigate. So, in simple English, we reject H0, and we're convinced that this is not sampling error but a serious difference between the job satisfaction scores of white and non-white employees. Until now, we've been doing two-tail tests, where we'll reject either on the high side or the low side. Here's an example of a one-tail problem. Does a chip made by company X have a longer life than the chip made by company Y? Company X thinks it does, so let's see what happens. Here's the data: a sample of 16 from company X, a sample of 14 from company Y, two sample standard deviations, and two sample averages: 6.8 years for company X, 6.1 years for company Y. We're using a significance level of 0.05, and of course we're assuming equal variances, along with all the other assumptions we have to make in order to use the t-distribution. Note how we set up H0 and H1. H0, our straw man, says that the average life of company X's chips is less than or equal to that of company Y's. Of course, the company has been claiming the opposite; they want to reject that.
H1 is saying that company X's chips have a longer average life, greater than company Y's. Again, notice the rejection region is where H1 is pointing: on the right. Since we have 16 plus 14 minus 2, that is, 28 degrees of freedom, and 0.05 in the right tail, the critical value is 1.7011; again, we look at a t with 28 degrees of freedom. You can see how we calculated s-squared pooled; remember, it's kind of like a weighted average of the variances of the two groups, company X and company Y. We see that s-squared pooled is 0.731, and we end up with a t28 value of 2.24, which is in the rejection region. The probability is less than 0.05, and our conclusion is that we reject H0. We shut the straw man down, and it turns out the computer chips made by company X do have a longer life than those manufactured by company Y. The next problem is looking at a pharmaceutical company testing a new weight-loss drug. They're claiming that people will lose weight if they take this drug with their meals. We take a random sample of 10 people, and notice you have the before weight and the after weight. So these are not two independent samples; this is called either a paired two-sample t-test or a matched t-test. Just remember, it's not 20 people: we're looking at 10 people and looking at change, and you can see what the change was. The first subject started at 130 pounds; the second subject went from 125 to 120. We're going to test this for significance. Notice already that even though this problem is in a lecture about two samples, this is not really two samples at all. It's matched: one sample with two metrics, two values taken from each subject, one the before weight and one, later, the after weight. So if we compute the difference, which could be a loss or could be a gain, that's one variable, and then everything reduces to what we learned earlier in the regular lecture on one-sample
testing. But it's considered part of this lecture; it's paired or matched testing. You can see how the data table works out. You can do this in Excel by hand, or you can have Excel do it for you, and you can find some problems that are worked out. What do we have? Before and after: we take the after minus the before; that's the difference. The average difference you can see in the box on the top right: the differences add up to negative 40; divide that by 10, and the average difference in weight is a loss of four pounds. That sounds good for the company, right? To get the standard deviation, you use the formula for the standard deviation of the differences, which you can see laid out there in terms of d. We add up all the squared deviations, getting 80, divide by n minus 1, and take the square root of all of that, so the standard deviation is 2.98 pounds. Now we use this to do the hypothesis test, and you can see it looks exactly like a one-sample test. Our null hypothesis is that the mean of the differences is zero; in other words, the null hypothesis is no difference: using the drug, the before and after weights are not significantly different. The alternate hypothesis, H1, is that there is a difference. We'll reject either way: either if the weight-loss drug works in its intended way, or if it works in the opposite of its intended way. So in this case, we're going to reject the null hypothesis if we find a change that is too large in either direction. You see the t formula laid out and worked out; it ends up being negative 4.25. That's a very large value, way into the rejection region on the left side, so yes, we definitely reject the null hypothesis. The conclusion is that the average weight loss of 4 pounds is statistically significant at the alpha equals 0.05 level, and even more, you can see the p-value is 0.0022. Thank you for attending our lecture. As always, once you learn the material, find as many problems as you can and practice, practice, practice.
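As a final exercise, the paired-test arithmetic from the weight-loss example can be reproduced from the summary numbers, with scipy supplying the p-value. Small rounding differences from the slide's negative 4.25 are expected, since the standard deviation gets rounded along the way:

```python
from math import sqrt
from scipy.stats import t

# weight-loss example: 10 subjects, differences (after - before)
# sum to -40, and the squared deviations about the mean sum to 80
n = 10
d_bar = -40 / n                    # mean difference: a 4-pound loss
s_d = sqrt(80 / (n - 1))           # standard deviation of differences, about 2.98
t_calc = d_bar / (s_d / sqrt(n))   # about -4.24
p = 2 * t.sf(abs(t_calc), n - 1)   # about 0.0022, matching the slide

print(round(s_d, 2), round(t_calc, 2), round(p, 4))
```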