Hello, everybody. Guess what? It's time for statistics again. Chapter 8.2: estimating mu when the population standard deviation is unknown. It's Monica Wahee, your library college lecturer, here to talk to you more about confidence intervals. Now, at the end of this lecture, you should be able to state one way in which the z distribution is like the t distribution, and one way in which they are different. You should also be able to demonstrate how to find t sub c in the t table. You should be able to state how to calculate degrees of freedom. And you should also be able to explain how to calculate a confidence interval when the population standard deviation is not known.
Alright, so here's what we're going to go through. First, I'm going to describe to you how and why the t distribution was invented, and give you the important backstory on why this came about. Then I'm going to show you how to use the t table when calculating confidence intervals when the population standard deviation is unknown. Now, if you see there, that's a tea table, but that's not the kind of t table we're going to use. However, this tea table is probably better than our t table, because it looks really yummy, doesn't it?
Okay, the reason I'm using a very serious tone of voice is because this is a serious issue. What happens if you can't use the z table? So I'm going to tell you about Student's dilemma. Now, Student wasn't really a student. Okay, that's just a nickname for a guy who was having a political issue, so he couldn't use his real name, right? I'll get to that in a moment. But first, I just want to set the stage for this political issue. Remember back in the last lecture: when you know the population standard deviation, you also have to consider some assumptions about x before you make your confidence intervals. Okay, and these are the ones up here. You know, this was from the last lecture, but I want to call your attention to two things.
One is, obviously, you have to know the population standard deviation. That's one of the things you really have to have before you use that z table. Also, if you're not sure of x's distribution, you really should get a sample of at least 30. And if you know it's not distributed normally, like it's skewed, you really need a lot more. But you need at least 30.
Well, those two requirements can limit you a lot, right? Because what if the population standard deviation is unknown? Like, for instance, let's say you're studying a new topic. Now we have e-cigarettes, and they're kind of new. Maybe we don't know population standard deviations about e-cigarettes. Or maybe we're starting to study a particular subgroup of people that we didn't really understand, like we're doing more studies on transgender people right now. Okay, maybe we don't have a population standard deviation for that subgroup yet, because we're only now starting to do better studies. You know, that can be an issue. And also, what if you just plain old have a small sample? What if you just can't get a lot of people in your study?
Well, there are ways to work around it. So first of all, if you don't have a population standard deviation, you need to use some sort of estimate for it. And you can use a kind of lower quality estimate, which is your sample standard deviation, right? I mean, a sample standard deviation is only going to be made out of whoever's in your sample. So it's not the greatest estimate of the population standard deviation, but it'll do. You could use that. But it also means, if you're going to use your sample standard deviation, that we kind of violate the whole central limit theorem, right? I mean, we need a super large sample or else we're violating the central limit theorem. So this is kind of what the problem looks like. And so here is that guy I was talking about, the one who had to hide behind his nickname, which was Student.
Okay, so in 1908, William Sealy Gosset had this problem. That's him on the slide. He worked at Guinness, you know, the beer maker. And they use barley, I guess, to make the beer. He needed to test samples of barley where he did not know the population standard deviation. And he also couldn't test large samples. So he had this problem, and it's actually a common problem. What he did to solve it is he came up with his own distribution. I don't know why he called it the t distribution, but that's what he did.
But Guinness had a rule at the time that nobody working there could publish anything, no matter what. And he, like, begged his boss, and his boss would not let him publish. So he secretly published it, and they nicknamed him Student, probably because he was always doing calculations and stuff and acting like a student. So he published this new t distribution under his nickname, Student. And now it's called Student's t distribution. Although some people get really political about it, and they say, we have to call it Gosset's distribution, he was robbed of his whatever. You know, but we can call it Student's. I'm sure he's fine with that. He seemed like a really modest guy, actually.
Okay, so let's talk about Student's t distribution. What I've got on the slide is actually the z distribution. But I wanted you to kind of think about that, so you could see what the differences are between the z and the t distribution. So, like the z distribution, the t has its own table. Don't go to the z table if you want to look up t. You've got to go to the t table. And unlike z, t is different depending on the size of n. So remember, with z it doesn't really matter how big your n is, how many x's you have. You can go look it up in the z table, and it just doesn't matter. On the t table, you actually have to know this thing called degrees of freedom.
And how you calculate the degrees of freedom is you go n minus one. So obviously, you can't get the degrees of freedom unless you know what n is. And if you don't get the degrees of freedom calculated, then you don't know how to find what you need in the t table. So that's sort of the key to the t distribution that's different from the z distribution.
So, for example, if you have degrees of freedom of three, the curve looks kind of like the z distribution, but it's flatter. And see how the tails are a little thicker, like that. Now I'm going to show you another example. Okay, here's degrees of freedom of five. So if your n was six, and the degrees of freedom was five, see how the curve gets stretched up more, and the tails are not as thick. So as you increase your sample size, here, let's look at 10, it keeps going up. And eventually it's going to meet the z. But that's why you need to know your degrees of freedom, because you're going to get a different number for t depending on which degrees of freedom you have.
So I'm just going to review these properties of Student's t distribution. Like z, the t distribution is symmetrical around zero. You may not have noticed that from what I was just showing you, but yeah, that's similar. But unlike z, the t distribution depends on the degrees of freedom, which depends on the sample size, since the degrees of freedom is n minus one. And like z, the distribution is bell shaped. But unlike z, the tails are thicker, right, like the thing gets kind of stretched out. And as the degrees of freedom and the sample size increase, the t distribution approaches the z distribution. You saw how, as I increased the degrees of freedom, that curve kept going up until eventually it would meet the z. And also, for the t distribution, like the z distribution, the total area under the t curve is always one.
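To make the shape comparison concrete, here is a minimal Python sketch (not something shown in the lecture) that evaluates the standard t density formula next to the standard normal density. The function names `t_pdf`, `normal_pdf`, and `area_under_t` are made up for illustration, and the area check is only a numerical approximation.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    # Work in logs so large df values do not overflow the gamma function.
    log_coeff = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_coeff - (df + 1) / 2 * math.log1p(x * x / df))

def normal_pdf(x):
    """Density of the standard normal (z) distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def area_under_t(df, limit=200.0, steps=40000):
    """Trapezoid-rule estimate of the area under the t curve on [-limit, limit]."""
    h = 2 * limit / steps
    total = 0.5 * (t_pdf(-limit, df) + t_pdf(limit, df))
    for i in range(1, steps):
        total += t_pdf(-limit + i * h, df)
    return total * h

# Height at the center (x = 0) and out in the tail (x = 3) for a few df values.
for df in (3, 5, 10):
    print("df", df, "center", round(t_pdf(0, df), 4), "tail", round(t_pdf(3, df), 4))
print("z    ", "center", round(normal_pdf(0), 4), "tail", round(normal_pdf(3), 4))

# The total area should come out very close to one, even for small df.
print("area for df=3:", round(area_under_t(3), 3))
```

Running it should show the center of the curve rising toward the z curve as the degrees of freedom go up, while the tail values shrink toward the z tail.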
So we're dealing with the same kind of area and probability situation. Alright, now that I've introduced you to the t distribution, I'm going to explain to you how to find the confidence interval when the population standard deviation is unknown, and you get a new formula. Okay, let's just go over what the slide says. These are the steps, and then I'll go through an example.
So step one is make sure that you've calculated your x bar and your s, right? You have a sample of size n, so you need to know n. And you need to calculate the x bar, which is pretty easy, but you also need to calculate s, which is not that easy, right? Remember that from chapter three. So you've got to get all that together and get that ready. That's step one. Then step two is you've got to select your confidence level, c. And like I was just saying, depending on the confidence level and also the degrees of freedom, you look up t sub c in the t table. I'll demonstrate how to do that. After you do that, you have to calculate your margin of error. Only you have to use this formula, because we don't have z. You'll notice it looks a lot like the z one, only instead of z sub c you have t sub c, and instead of the population standard deviation you have your sample standard deviation, because we don't have the population standard deviation. So we're going to use this estimate from the sample. Once you get your E, you're familiar with step four and step five, right? Because in step four, to get the lower limit, you subtract E, the margin of error, from x bar. And in step five, to get the upper limit, you add E to x bar.
And so I'm going to demonstrate this. And then I'll also be able to show you how to deal with the t table. Okay, so hello, time for class, right? Only we're in a different college. So the population standard deviation is unknown. See, we're studying a different subgroup.
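Before working the example, here are steps two through five written out in symbols. The slide itself isn't part of this transcript, so this is a reconstruction from the spoken description, using the names from the lecture (x bar, s, n, t sub c, E) and d.f. for degrees of freedom.

```latex
\begin{align*}
  d.f. &= n - 1 \\
  E &= t_c \,\frac{s}{\sqrt{n}} \\
  \bar{x} - E &< \mu < \bar{x} + E
\end{align*}
```

The only differences from the z version described in the lecture are that t sub c replaces z sub c and the sample standard deviation s replaces the population standard deviation.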
But we want to know their mu, and we're going to do our best. So first, let's obtain their x bar, let's calculate their s, and let's count up the n. So I'm pretending that at the new, different college, where we don't know the population standard deviation, they did their test and they got an x bar of 68, which is a little bit better than we were doing before. And I made up that the sample standard deviation was 10, and that we studied 30 people, so n equals 30. So we did step one. We pulled together our x bar, our s, and our n.
Okay, now we need to decide on a c, right? So I just picked 90%. We also need to calculate our degrees of freedom. And so that was really hard, right? That's n minus one, so 29. That's probably the easiest thing you'll learn in this class, degrees of freedom. I just love that, at least something's easy. Okay, but now we're going to do something kind of hard. We're going to look up t sub c in the t table. So before we go to the t table, I just want you to remember that we've selected 90% for the confidence level, and our degrees of freedom are 29, because we need those ingredients before we go to the t table and find our t sub c.
Okay, here we are at the t table. Please notice that the t table is on page A10, that's appendix page 10. Also notice that it's only one page. Okay. Now, I know this doesn't really look a lot like the z table, but don't worry, I'll help you understand. First, we're going to look over here, and notice this d.f. This is degrees of freedom, right? And they go down this way. So remember the one I told you to remember, the one we're looking up, which is 29. This is where we're going to look for it, but the 29 isn't here yet. Now, you'll notice over here, it says c for confidence level. And then you'll notice these two things under here. Ignore these two rows under here right now, and just pay attention to c. Because you know what c is. Basically, it's our confidence level, right?
And remember, I told you to remember 90%, because that's the one we picked. As we go across here, sure enough, here's 90%. Right? If you'd done 95%, it'd be over here. Here's 99. But this is 90. Okay, so we looked up our c here. And now we're going to go down the d.f. column, our degrees of freedom, and we're going to find the number there. So let's scroll down. This is the fourth one over, right? So I've got to remember that as we scroll down. Do do do do. Okay, here's 29. So the fourth one over is the one we want, which is 1.699.
So as you can see, it matters what your degrees of freedom are. If we'd had, for instance, degrees of freedom of 20, we'd have a bigger t, meaning that would make our E, our margin of error, bigger. And that would mean that our confidence interval got wider. That's the penalty we pay for having a smaller sample. But here at 29, we get a little discount. We get a little smaller number, 1.699. And as you go further down, you'll see that this just keeps getting smaller and smaller. You'll also notice that they start kind of cheating over here. After 30, they jump to 35, 40, 45, 50, because it's hard to fit all of this on one table. You'll find different tables, and sometimes they'll go out further. But one of the things I want you to notice is that down here, it says 120, and then it says infinity. And at infinity, like over here under 95%, it's 1.96. What's that? That's z, right? So this is sort of the table version of what I showed you in the diagram: if you get enough degrees of freedom, or sample size, then you end up just having z.
So this is really to solve the problem that Student had, where you have a small sample and it's compounded by the fact that you don't have a population standard deviation. So just to remind you, when you go to the t table, look for c and choose your confidence level, and then go down and find your degrees of freedom, which for us was 29. And here we go.
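If you don't have the table handy, the same lookup can be approximated numerically. What follows is a rough sketch in plain Python, not the method used in the lecture: it integrates the standard t density with Simpson's rule and then bisects to find the t value whose central area equals c. The function names are hypothetical, and the search range of 0 to 100 is an assumption that comfortably covers the cases discussed here.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coeff = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_coeff - (df + 1) / 2 * math.log1p(x * x / df))

def central_area(t, df, steps=2000):
    """Area under the t curve between -t and +t, by Simpson's rule (steps is even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # Simpson gives the area from 0 to t; the curve is symmetric, so double it.
    return 2 * total * h / 3

def t_critical(c, df):
    """Find t sub c: the value with central area c, by bisection on [0, 100]."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if central_area(mid, df) < c:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_critical(0.90, 29), 3))    # about 1.699, the value read from the table
print(round(t_critical(0.90, 20), 3))    # larger, since the sample is smaller
print(round(t_critical(0.95, 1000), 3))  # already close to the z value of 1.96
```

A statistics library would normally do this for you; the point of the sketch is only that the table entries come from the area under the t curve for the given degrees of freedom.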
And now hold that thought. Remember this number, 1.699. All right, we're back. We looked up t sub c. That was a lot of work. So here it is: t sub c is 1.699. Okay, we did step two. Now we can move on to step three, because we have all our ingredients. We're going to calculate the margin of error using this formula. So here we are calculating E. We took our t sub c, and we multiplied it by our standard error, which is 10 over the square root of 30, because 10 is our sample standard deviation and 30 was our n. That's how you make the standard error, right? So once we did that, we got 3.1 for our margin of error. Okay, great.
And now it's all smooth sailing, right? We're used to steps four and five, but let's just do it out here. So I added that margin of error to our list. That's 3.1. We subtracted 3.1 from our x bar, which is 68, and we added it to our x bar, to get our lower and upper limits, 64.9 and 71.1. So this is how you say it. We wanted to know, for the class from a different college, what is their mu? And you say, I am 90% confident that the mu of the class from a different college is between 64.9 and 71.1. That's how you say it.
So just to recap, if you know the population standard deviation, like in the last lecture, then you have a very robust measurement of variation, right? Because it's based on the whole population. This allows you to use the z table, as long as your x is normal, or as long as you have a lot of sample, right? So if your x is normal, you don't need as much sample. But if it's not, or if it's iffy, you need a lot of sample. But if you have that population standard deviation, then that makes all the difference. At the same confidence level, z sub c is smaller than t sub c, which means that your margin of error is going to be smaller if you can use the z table. So your confidence interval will be narrower if you can use the z table. And that means you have a more precise estimate of mu.
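The arithmetic from the worked example can be checked with a few lines of Python. This is just a sketch using the numbers from the lecture (x bar of 68, s of 10, n of 30, and t sub c of 1.699 from the table); the function name `t_confidence_interval` is made up for illustration.

```python
import math

def t_confidence_interval(x_bar, s, n, t_c):
    """Return (E, lower limit, upper limit) for mu when sigma is unknown."""
    standard_error = s / math.sqrt(n)   # sample standard deviation over root n
    margin = t_c * standard_error       # step three: E = t_c * s / sqrt(n)
    return margin, x_bar - margin, x_bar + margin  # steps four and five

margin, lower, upper = t_confidence_interval(x_bar=68, s=10, n=30, t_c=1.699)
print(round(margin, 1), round(lower, 1), round(upper, 1))  # 3.1 64.9 71.1
```

Note that t sub c is passed in rather than computed, which mirrors the lecture: you look it up in the t table first, using your confidence level and n minus one degrees of freedom.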
So the narrower your confidence interval is, the more you kind of know where mu might be. And you like that. So using the z table is great, and that's what's great about having a population standard deviation. But unfortunately, that's not always the case. Like with Student, he had all these issues. And sometimes in healthcare, we have these issues because we can't get large samples of certain kinds of patients or certain populations. And also, sometimes we just don't know their population standard deviation, and there's not going to be any way we can figure it out.
So if you don't know the population standard deviation, then you really just don't have a very good measurement of variation in your data. But as long as your x is normally distributed, you can use the sample standard deviation as a stand-in instead. And if you are lucky enough to have a sample size bigger than 100, you can use the z table, or, like, above that 120, you know, where it's infinity, because they're the same at high numbers. But if you have a sample size smaller than 100, you'd probably be safer using the t table. And if it's smaller than 30, you definitely need to be using the t table, because it really matters at those low numbers.
So in conclusion, Student's t table is really useful to us. And it was really useful to him, and to everybody else who needs to deal with smaller samples and with not having a population standard deviation. You can actually make a confidence interval, and you can actually estimate mu anyway, even with these problems. And then I concluded this lecture with a demonstration of how to calculate a confidence interval using the sample standard deviation and the t table. Now, I just want to point out something: this guy was at Guinness. Why didn't he name it a beer table?