Hello, and welcome back to statistics chapter 8.1, estimating mu when the population standard deviation is known. This is Monica Wahee, your library college lecturer, bringing you more statistics. So let me just cut to the chase. What this lecture is about is confidence intervals. Okay, so we're going to be talking about confidence intervals. Let's just be honest here. So at the end of this lecture, you should be able to state the formula for the margin of error. You should be able to explain how to choose Z sub C (that's how you pronounce that, Z sub C) when making a confidence interval. You should be able to state one of the assumptions about x that needs to be met in order to make confidence intervals. You should be able to interpret what a confidence interval means. And finally, you should be able to demonstrate how to calculate sample size. So here's what we're going to talk about. First, I'm going to review a few key points from chapter seven. Now, I know you're kind of sick of chapter seven, but I'm just going to pull out a few key points that I want you to think about during chapter eight, because I think it'll help you understand chapter eight. Next, I'm going to explain what confidence intervals actually are. We're going to go through quite a bit of theory, so you understand what we're doing when we're making a confidence interval. Next, I'm going to demonstrate how to calculate confidence intervals. And then I'm going to demonstrate how to calculate n, or sample size, if you want to know how big your sample should be for a certain confidence interval. So the first one, how to make confidence intervals, that's after you've already gotten your sample and measured it. The second one, the demonstration of how to calculate sample size, that's before you get your sample, if you want to know how many you need. Alright, so here's chapter seven, please humor me. I'm just going to pick out a few ideas that I want you to think about.
So first of all, remember, from way back, the slides I showed you, there were all these different types of inferences. And one of them was called estimation, and I said we're going to practice this in chapter eight. Well, I wasn't lying. Here we are. So what's estimation? Estimation is where we estimate the value of a population parameter using a sample. Sounds like a logical thing to do in statistics. In this situation with the confidence intervals, what we're doing is we're trying to estimate mu using a sample. Because mu is obviously a population parameter. And if you just have a sample, you don't know what mu is, but we try to estimate it from our sample. And so that's what chapter eight's really about. Okay, next, remember 7.1, right, remember the empirical rule? And remember, I asked you all these questions about the probability of selecting an X above or below a cut point, or the percentage of data between certain cut points. There was one question though I want you to think about. And that was this question about the middle 68%. So the question was, what are the cut points for the middle 68% of the data? And that literally meant that it had to be 34% on one side of mu and 34% on the other side of mu. And then I said, well, what are the cut points? What are the X's that mark that on this particular distribution? And 51 was the lower limit and 80 was the upper limit. So I just want to point out here that this means that the probability of selecting an X between 51 and 80 with this particular distribution was 68%. So technically, that's kind of like saying, if I select an X from this distribution, I am 68% confident it'll be between 51 and 80. That's another way of saying this. Okay, now we're going to skip ahead to chapters 7.2 and 7.3. Now remember what we did there? We had already learned about the empirical rule. Now we figured out how to calculate X's for Z's that were in between the cut points of the empirical rule.
So most questions focused on the probability of selecting an X below or above a cut point that wasn't on the cut points of the empirical rule. So that was most of what you were doing in 7.2 and 7.3. However, there was one question I gave you on the slides about a middle 20%. So I'm bringing that one up. So this was the question: what scores mark the middle 20% of the data, right? So it'd be like 10% on one side and 10% on the other. And so what we did was we looked up what the probabilities were and what the Z's were, and we got the middle 20% of the data. And so the middle 20% of the data was 61.9 through 69.1. So I just wanted you to notice that that also could be said this way. It's like the probability of me selecting an X between 61.9 and 69.1 was 20%. Or I could say it this way: I'm 20% confident that if I select an X from this distribution, it will be between 61.9 and 69.1. So if you've ever wondered how you can feel a certain percent confident, that's how you do it. You just say it that way. You take a middle chunk like this, you look at the probability, and you say, I'm that percent confident that the X I select will be between these two cut points. Then I want to remind you what we did after that, which is we went into chapters 7.4 and 7.5. And this time, we weren't looking at the probability of picking just one X. We were looking at the probability of picking a sample, or a set of X's, that gave us an X bar that was between certain cut points, right? So this one was, what is the probability of me selecting a sample of 36 students with an X bar between 60 and 65? That was the way I was doing this question. And again, we answered the question, right?
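That chapter 7 question can be checked with a short Python sketch. It assumes the class scores have a mu of 65.5 and a population standard deviation of 14.5 (the values quoted later in this lecture for this class), and the function name `prob_x_bar_between` is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def prob_x_bar_between(low, high, mu, sigma, n):
    """Probability that the mean of a sample of size n falls between low and high."""
    se = sigma / sqrt(n)                # standard error of x bar
    sampling_dist = NormalDist(mu, se)  # sampling distribution of x bar
    return sampling_dist.cdf(high) - sampling_dist.cdf(low)

# Sample of 36 students, x bar between 60 and 65 (mu and sigma are assumed as above)
p = prob_x_bar_between(60, 65, mu=65.5, sigma=14.5, n=36)
print(f"{p:.0%}")
```

Under those assumptions, the result rounds to the same percentage the lecture quotes for this question.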
The probability of selecting a sample of 36, so that the X bar is between 60 and 65, is 41%, which is another way of saying, I am 41% confident that if I take a sample of 36 people from this class, their X bar will be between 60 and 65. So that's how you feel 41% confident. Because that's what I always had trouble with in statistics. I'm like, I don't know how to feel that way. Well, this is how you feel that way. But I wanted to point out this is not the perfect example. Because what we're going to be doing with confidence intervals is going to be centered around mu, which is at a z of zero. This one is lopsided, it's centered somewhere else, because this was from a different lecture, right? But what we're talking about is, what if you wanted to find the X bar lower and upper limits for the middle 90% or the middle 95% or the middle 99%? What you would be saying is, okay, well, between this lower limit and this upper limit, I'm 90% confident that the mu is going to be in there. That's what you're trying to do with the confidence intervals. Remember the central limit theorem? The confidence intervals work with the central limit theorem. And so I'm sorry, more theory. So remember how we had that neighborhood of people, and we were taking samples of five of them and measuring their BMI, and we were trying to figure out what we would get if we got the X bars for every possible sample of five from that group, right? And of course, we got literally millions of X bars. Okay, and then we made a histogram of all those possible X bars, because each sample of five represented a sample we could have gotten if we'd just done one study. So in real life, we usually only get one sample. So if we get to look at all the different samples we could have gotten, we wonder, well, which one are we going to get? Where is our X bar going to be in the sample that we pick, because we're only going to pick one.
And mostly we're upset because we're not sure how far it's going to be from mu. So if you look at this histogram, there's a bunch of samples where the mean BMI in the sample, the X bar, was less than 20. And there's also a bunch where it was more than 40. And that's probably not very representative, because you see where most of the X bars are, between 25 and 30, right? So we get worried if we think about, what if we get a sample that's on one of those tails? What do we do about that? So as you can see, just from three different samples, you get wildly different X bars. So I could have gotten any one of these. So how do I know, right? Like, I feel uncomfortable because I know that that X bar might not be close to the mu. Well, the first thing you need to know is don't get too excited about the X bar. Remember sampling error, how you're going to get an X bar and it's probably not going to be mu. Well, we know that. And we can handle it in statistics. In fact, that's how we handle it: with confidence intervals. What we just say is, okay, X bar's there. But we want to create a range or an interval around X bar with a lower limit and an upper limit. And then just say, okay, fine, we know X bar isn't exactly right. But within this range, most of the X bars I could have gotten, like if I got other samples, most of their X bars would have fallen between this lower limit and this upper limit, in this interval. But you can't actually say all. And I'll tell you why you can't say all. It's because if you came to me, and you asked me what your grade was going to be, and I said, well, I can 100% assure you it'll be something between A and F, that's totally useless, right? And so if you said, well, I want to be 100% certain that between this lower limit and this upper limit, I've got my BMI, I'd say, okay, great, we'll make the lower limit zero, which nobody can have, and the upper limit 100, which nobody can have. There, now I'm 100% confident it's in there.
So that's why you don't want all. You want a probability. You want to know if like 90% or 95% are in there. Because otherwise, for 100%, you could just take two huge numbers, you know, one hugely low and one hugely high, and say they're all in there. And that just doesn't get you anywhere. So what I did was I superimposed the Z distribution, the normal distribution, on top of that histogram of the X bars. And I just drew lines for where the middle 95% would be, because I just know that it's between negative 1.96 and positive 1.96. Okay, and so technically, if you wanted your 95% confidence interval, you'd have to find the X bar at negative 1.96 and the X bar at 1.96. And then that would be your lower limit and your upper limit of the middle 95% of the X bars. And so you'd say, okay, I'm 95% confident that the mu is somewhere in there, because most of the X bars are in there. So I'm going to keep going with this example. So imagine you just took 100 samples of any size from this population, right? So maybe I'll take 100 samples of, like, five, we can stick with five. You get 100 x bars, okay, which is not what we normally do. But I just want to give you this example. You get 100 x bars, right? If you figured out the upper and lower limits for the middle 95% of the sampling distribution, you'd find that about 95 out of 100 of your x bars were actually between those limits. And how I know this really works is that I use Excel and I can make up people. So I can make up a population of people with BMIs of all different sizes. Then I can take 100 samples of five from those fake people I made up in Excel. And then I can take the X bars and I can actually see this. This really works really well. That's why people are so convinced of it. But of course, you've got to realize that there are about five that don't fall in there. So in real life, when you're doing a study, you know, you always ask yourself, am I getting one of the weird ones that fell outside or not?
But you try to be 95% confident, if you're using a 95% confidence interval, that you're getting one of the ones in the middle. So I was picking on 95% because I memorized this 1.96. So if you want the middle 95%, the Z is always the same. It's always 1.96 on either side of mu, right? Negative 1.96, positive 1.96. It's just always the same, right? You just have to look it up. So people like me memorize that one, because 95% is totally the most popular one in healthcare. Everybody uses 95. In the olden days, we used to use 99% more than we do now. But 95% has always been popular. These other ones aren't really that popular, but I just showed them to you for demonstration purposes. I want to pull out some points about vocabulary. So you're going to hear me say these terms. You're going to hear me say critical value Z, and also Z sub C, which is written like the Z with a little C under it. Now I remember when my professor first said Z sub C, I had no idea what he was saying. I thought he was saying Z C. So I was like, what is Z C? Well, it's Z sub C. And you'll see that sometimes, whenever you have a letter and then a little letter that's a little below it, it's sub, right? And then also, critical value Z is the same thing as Z sub C. So if you look at the table, you'll see that for level of confidence, we use C to stand for that. So if I'm like, I want a level of confidence of 95%, I could say C equals 95%. Right. But if I do that, I'm forcing Z sub C to be 1.96. They go together. So if I pick a C of 80%, then my Z sub C, or my critical value Z, is 1.28. And you know what it means by critical value, like it's critical, you know, that's the cut point where you're going outside the confidence level. So it's critical. That's why they call it critical value Z. So the Z you pick, or really the C you pick, determines the width of the interval around x bar. So remember where 1.96 was? Well, that's for 95%. If you pick 99%, you'd be talking 2.58.
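The pairing between C and Z sub C can also be computed instead of looked up in a table. Here is a minimal Python sketch, assuming the standard library's `statistics.NormalDist`; the function name `critical_z` is made up for illustration.

```python
from statistics import NormalDist

def critical_z(c):
    """Return z sub c: the z with a middle area of c under the standard normal curve."""
    # A middle area of c leaves (1 - c) / 2 in each tail,
    # so the upper cut point has a cumulative area of (1 + c) / 2.
    return NormalDist().inv_cdf((1 + c) / 2)

for c in (0.80, 0.95, 0.99):
    print(f"C = {c:.0%}  z sub c = {critical_z(c):.2f}")
```

Rounded to two decimals, these come out to the 1.28, 1.96 and 2.58 mentioned in the lecture.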
Remember, like, three is kind of big, that's kind of out there in z-land, right, like that's close to the edge. So what happens is you get this wider confidence interval. And remember, you're going to get a grade somewhere between A and F. Well, the wider it is, the less useful it is, because the less precise it is. You don't really know exactly where it is. But in any case, whatever confidence level you pick, at the end, the generic thing you're getting ready to say is, and I'm using 95% as an example, I'm 95% confident the mu is between the lower limit and the upper limit. But you never actually say that. You have to interpret it with the thing you did. Like, if I was doing blood pressure, I could say I'm 95% confident the population mean blood pressure, SBP, is between whatever my confidence interval was, like 110 mmHg and 120 mmHg. So that's how you would actually say it. And don't worry about it. I will give you a demonstration. So just to remind you, I'm kind of going over this theory in different ways so it sinks in. A confidence interval for mu is an interval that we've computed from sample data, where the level of confidence, or C, this percent, refers to the probability of having the resulting interval actually contain the value of mu. So when I say I'm 95% confident, I'm 95% confident this interval has mu in it. Another way of saying it is, the level of confidence is the proportion of confidence intervals calculated from random samples of size n that actually contain mu. So remember when I said, let's say I took 100 samples? If I made a 95% confidence interval from each one, I would expect that about 95 out of 100, or 95%, of those intervals would have mu in them. So again, don't get too excited about x bar. When you get it, don't go, oh my gosh, it's high or low. Just remember, it's a point estimate, meaning a one-time estimate of a population parameter.
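The "about 95 out of 100 intervals contain mu" idea can be tried out with made-up data, much like the Excel experiment described earlier. This is a minimal Python sketch, assuming a normally distributed population with a made-up mu of 27 and a made-up population standard deviation of 5 (neither number comes from the lecture); the function name `coverage` is also just for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def coverage(n_samples=100, n=5, mu=27.0, sigma=5.0, c=0.95, seed=1):
    """Fraction of confidence intervals (sigma known) that contain the true mu."""
    rng = random.Random(seed)                 # seeded so the run is repeatable
    z_c = NormalDist().inv_cdf((1 + c) / 2)   # critical value for confidence level c
    e = z_c * sigma / sqrt(n)                 # margin of error
    hits = 0
    for _ in range(n_samples):
        # Draw one simple random sample of size n and compute its x bar
        x_bar = mean([rng.gauss(mu, sigma) for _ in range(n)])
        # Check whether this sample's interval captures mu
        if x_bar - e <= mu <= x_bar + e:
            hits += 1
    return hits / n_samples

print(coverage())                 # 100 samples of five
print(coverage(n_samples=5000))   # more samples, closer to 0.95
```

With only 100 samples the fraction bounces around a bit from run to run (change `seed` to see that); with more samples it settles near the chosen confidence level.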
There can be others. Like, if you'd gotten a different sample, you'd have gotten a different x bar. Okay, so just don't get too excited about it. Because remember sampling error, right? Your x bar is probably not going to be your mu. So when using x bar as a point estimate for mu, you have to deal with the concept of the margin of error. Some books abbreviate it ME, but the one we're using just uses E. And that's the absolute difference, meaning it doesn't matter if it's plus or minus, the absolute difference between the x bar you got and the mu. So I'm just going to give you an example of what I mean by E. So if we had gotten an x bar of 60 from these students, and we magically knew mu was 65.5, our margin of error, or E, would have been 5.5, because that was the difference. If our x bar had been 90, and our mu was really 65.5, then our margin of error is 24.5, because that's the difference between 90 and 65.5. And again, it's the absolute difference. You always make it positive. I just wanted to remind you what the word margin means. See, when you type a report and you have margins, like you're supposed to have a margin of, you know, one inch or whatever, it means a little space. And that's what you're doing, you're making a little space of error, meaning somewhere in this space is where your mu is. So that's why it's called a margin of error. All right, now you can't just go around making confidence intervals without thinking about it. So first, I just want to go over the assumptions about x that you have to consider before making confidence intervals, x being the thing that you're measuring, that you're making your distribution out of, right? So first, x is out there in the world and you're drawing a simple random sample of size n from a population of x. Notice this is a simple random sample. This is not a biased sample. Remember how, if you get a convenience sample, it's probably pretty biased.
You can't do the confidence interval thing if you're getting a biased sample. So you're trying to get a simple random sample of a particular size from a population of x values that are hopefully normally distributed. Because as you can see, under number three, if x itself has a normal distribution, then we know that x bar will have a normal distribution no matter how big our sample is. So we like it if we're drawing from a population of x values that has a normal distribution. Oh, and also, number two says we have to know the value of the population standard deviation. So if we're doing that, and especially if we meet number three, that x itself has a normal distribution, we're really excited about making our confidence interval, because we know it'll work really well. But we might be in the situation you see in number four, where we aren't sure about x's distribution. Remember, we only have a sample. We don't have a lot of data from the population to even figure that out. So if we're not sure, we should get a sample of at least 30. But as you can see by number five, if it's definitely not normal, like it's skewed, like remember how we looked at some parameters from hospitals, right, that are totally skewed. So if I was sampling those hospitals, I'd be shooting for a higher sample size, like 50 or 100, because we're taking a simple random sample of size n, and n's got to be pretty big if you've got skewed data and you want to make a confidence interval. So basically, what you're thinking about is your x, and whether you got a simple random sample, and whether you have a population standard deviation lying around you can use. And then if x itself has a normal distribution, you don't need to worry about your sample size. If you're not sure, you want at least 30. And if you definitely know it's not normal, you want a large amount. Now, for a confidence level C, the critical value Z sub C is the Z score such that the area under the standard normal curve between negative Z sub C and positive Z sub C is equal to C.
And if you're like, you're just repeating yourself, I am kind of repeating myself. But I'm doing it in different ways because I know this is a really difficult concept. And so sometimes people need to hear it differently. But this is one way of saying it, okay? Now I'm going to go on to talk about how to calculate a confidence interval. And trust me, it has something to do with all this theory. So here are the simple steps, right? So step one is you make sure you have your x bar, right, because you took a sample, so you'd better have an x bar. And of course, you're going to have an n, because you took a sample and you know how many people you measured. But then you also need to make sure you have the standard deviation for your population lying around, because you need those ingredients for this. Next, in step two, you have to pick C. And then that connects to whatever Z sub C goes with it. So you see the table on the right, you have to select your C and the Z sub C that comes with it. So you're going to be given your x bar, your population standard deviation and your n. And then you've got to get picky about C and Z sub C. But once you've made those decisions, in step three you calculate E, or the margin of error, using this formula: E equals Z sub C times the population standard deviation, divided by the square root of n. So see that there, beautiful formula, right? Then once you get that E, that margin of error, you have to get the lower and upper limits for your interval. So in step four, to get the lower limit, you subtract E from your x bar, right? So that's going to force x bar to be in the middle, because step five is you add E to your x bar to get the upper limit. So I'm going to just walk you through these steps with an example. But that's all you have to do. It's almost like the theory was harder than actually doing it. I did want to do a shout out to this, because the first time I learned this stuff, I could not get over this. I was like, okay, I get that we're trying to estimate mu from a sample.
But if we have a population standard deviation, why don't we have the population mean? Like, we obviously had to measure everybody to get the population standard deviation. Why don't we just look at the mean at the same time? Like, why are we even doing this? So that was like a really difficult conceptual problem. But then I worked for a very short time at the Department of Public Health in the state of Massachusetts. And what I realized was, when you're doing population-level health, the mu can and often should change. But the variation, which is measured by the standard deviation, tends to stay the same. So imagine that we studied, at the Department of Public Health, a community where the mu blood pressure was just way too high. We'd study it, we'd find this really high mu, and we'd also find the population standard deviation of blood pressure. Then let's say we did work to try and lower the mu blood pressure. And we wanted to see what it was like a few years later, after we did all that work. Well, we can't measure the whole population again. We'll just take a sample of them. But we can use the old population standard deviation, because that generally doesn't really change that much. It's just the mu that's going to change. So that's like this huge explanation, because I had this huge conceptual problem with this. And hopefully that will solve your problem, if you have this problem. And if you didn't have this problem, you probably just sleep better at night than I do. And just to illustrate this, this slide was available online, so I took it. It's just an example of one of the findings from this article on total knee arthroplasty, which sounds perfectly painful. But in any case, as you can see, this is BMI, you know, since we're on the subject. Their mean BMI changed as time went on. It went up. However, I can't show this to you here, but in the paper, you'll see that they had a standard deviation.
And that standard deviation really didn't change much. So let's pretend you measured everybody, the whole population, in 2003 on the slide. We're just pretending. Then if you wanted to see what was going on in 2007, you could use the old population standard deviation from 2003, and just apply it to 2007. But as you can see, if you took a sample of those people in 2007, you'd probably get kind of a higher x bar, because there's a higher mu. Alright. Another thing I want to get into before I actually demonstrate this is that, for you standard error fans, you might have noticed that in the margin of error calculation, the standard error is hiding. So remember, I taught you to prep-cook your standard error, because you use it a lot, and then you don't have to deal with thinking about the square root of n being underneath something. So the formula for the margin of error could technically be written like this: E equals Z sub C times the standard error. So if you're doing a quiz or something, you want to keep track of your SE, your standard error, and reuse it if questions are asked about different levels of confidence, because only your Z sub C would be changing. Your SE would be the same. So this is just a shout out to standard error fans. Okay, now I'm going to walk you through actually calculating a confidence interval when the population standard deviation is known, to give you an example. So here's our example. So remember the class of students that I was telling you about, where we had such a hard test that nobody got 100%? So in that class of students, we studied the population standard deviation and we found it was 14.5, right? But now it's a new term, right? Same college, new term. And we have a new class of students. So we want to know if their mu is different. Like, maybe the teacher got a little better. And so maybe their mu is a little higher. Who knows.
Okay, so step one is make sure you have your x bar, your population standard deviation and your n available. So we're going to take a sample of 100, just because it's easier that way. And I'm going to say we got an x bar of 63, right? That's not a very good score on a 100 point test. But it's a hard test, right? And we have our old population standard deviation to use. So we've done step one. Okay, so we have our x bar, our population standard deviation and our n. Now let's do step two, where we have to pick C, which is the confidence level. And that forces you to use the Z sub C at that confidence level. So just as a demonstration, I'm going to pick 99% for C. So the Z sub C is 2.58. But I could have picked some other one. I just picked 99% because I felt like it. All right. Step three is to calculate the margin of error, or E. So remember, our formula's up there. And standard error fans, you know, there's a standard error in there. So I just first put up the numbers we need. Our x bar is 63, remember, our population standard deviation is 14.5, our n is 100, and our C, our confidence level, is 99%. So our Z sub C is 2.58. So we just plug it into the equation. And we get that E is 3.7. Okay, so that's step three, we got E, which is 3.7. Now step four, you want to get the lower limit of the confidence interval, so you subtract E from x bar. So our x bar was 63. And we got our new E, look, I put it in red, of 3.7. And then we subtract, and we get 59.3. So that's our lower limit. And see, I put that on the slide now. Now step five, you add the E, which is 3.7, to the x bar, which is 63. And you get 66.7. So you get your upper limit. And then what do you say? You say, I am 99% confident the mu of the new class of students is between 59.3, which is the lower limit, and 66.7, which is the upper limit. And if we just think logically about that, that's about like 59 to 67.
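The five steps from this example can be sketched in a few lines of Python. This is only an illustration using the numbers from the lecture (x bar of 63, population standard deviation of 14.5, n of 100, Z sub C of 2.58); the function name `confidence_interval` is made up.

```python
from math import sqrt

def confidence_interval(x_bar, sigma, n, z_c):
    """Return (lower limit, upper limit, E) for mu when sigma is known."""
    se = sigma / sqrt(n)   # standard error, the part that stays the same for any C
    e = z_c * se           # step 3: margin of error, E = z_c * sigma / sqrt(n)
    lower = x_bar - e      # step 4: lower limit
    upper = x_bar + e      # step 5: upper limit
    return lower, upper, e

lower, upper, e = confidence_interval(x_bar=63, sigma=14.5, n=100, z_c=2.58)
print(f"E = {e:.1f}, interval = {lower:.1f} to {upper:.1f}")
```

Because only `z_c` changes between confidence levels, calling the function again with 1.96 instead of 2.58 reuses the same standard error and simply gives a narrower interval.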
So if I was talking to you about your class, and you're like, oh my gosh, it was so bad, the mean was 63, and you also tell me that the 99% confidence interval is 59 to almost 67, it really tells me everybody did pretty badly in that class. Like, if 99% of the samples you could have taken would have gotten an x bar in between those two numbers, people are not doing well in that class. All right. So I just gave you one demonstration of how to calculate a confidence interval, because it's actually a lot simpler than the other stuff we were doing. Like, you just plug in the numbers and you do the interpretation. It's pretty easy. But sometimes you're not trying to estimate mu yet. In other cases, you're actually trying to figure out how many people to have in your study in order to estimate mu. So I'm going to show you how to do that, if that's your problem. So I started by just putting up the steps to calculate sample size when the population standard deviation is known. So I just want you to look at the formula. It's actually the same formula, just rearranged. So it's solved for n, right? Remember that square root of n, which is underneath in the standard error? Well, how that gets dealt with is, you see on the right side of the equation, you have to square things, right? So n equals Z sub C times the population standard deviation, divided by E, all squared. So basically, if you just look at that formula, you realize in order to calculate n, you need a Z sub C, you need a population standard deviation, and you need an E. So here's how we're going to go about it. Step one is make sure you have your population standard deviation around and available, because you're going to need it for this. Next, you have to make a decision, you have to pick C. So this is before you're doing your study. So you're saying, okay, how much confidence do I want when I'm done with my study, right?
If you want 99% confidence, you're going to end up needing more sample, right? Because then you'd end up putting that 2.58 in for Z sub C up in that equation, and because it's on the top, it'll inflate the number. If you don't need to be so confident, like if you look at 95%, you get a discount, that's only 1.96. Then you don't need as many. So it depends on how much money and effort you want to spend. So you have to pick C, so you can get your Z sub C, so you can put that in the formula. Then next, you have to also pick how big of an E you want. So remember, that's how big of a margin you want. If you pick a big margin, then you won't get a very precise estimate, right? Like, you won't get upper and lower limits that are kind of close together, so you can kind of know where mu is, right? So you've got to also make a political decision of how big of an E you want. And then finally, plug all that into the formula and you calculate the sample size, or n. So I'm coming up with a new question. How many students do I need if I want a 95% confidence interval, and I want my E to be five? That's the way I start the question. Okay, how many students do I need if I want this and that? Well, what does that literally mean? Let's just make a list. First of all, we're going to use our population standard deviation, which is 14.5. So I threw that in there. Right. And then because I wanted a 95% confidence interval, our C is 95%. And then we're stuck with our Z sub C, which is 1.96, like I was showing you on the last table we were looking at. And I decided I wanted my E to be five. So I put that in there. So that's steps one, two and three. So step four is calculate. So let's calculate. Alright, so what do we do? We plug it into the equation. So we have Z sub C, which is 1.96, times the population standard deviation. We have the product of that divided by our E, which is five. And then we take that whole thing to the second power, and we get 32.3. So that's our n, right?
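The sample size calculation just described can be sketched in Python as well. It uses the numbers from the lecture (population standard deviation of 14.5, Z sub C of 1.96, E of 5), where the squared quantity comes out to about 32.3 before it is rounded up to a whole person; the function name `sample_size` is made up for illustration.

```python
from math import ceil

def sample_size(sigma, z_c, e):
    """n = (z_c * sigma / E) squared, rounded up to a whole person."""
    n_exact = (z_c * sigma / e) ** 2   # about 32.3 for the lecture's numbers
    return ceil(n_exact)               # always round up, never down

print(sample_size(sigma=14.5, z_c=1.96, e=5))   # 95% confidence
print(sample_size(sigma=14.5, z_c=2.58, e=5))   # 99% confidence needs more people
```

Running it with 2.58 instead of 1.96 shows the "discount" for choosing 95% over 99%: the same margin of error takes noticeably fewer students.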
But you need to always round up, due to not being able to have like 0.3 of a person. So sometimes people ask me, well, do you round down, because it's under 0.5? I'm like, no, because you need at least 32.3 people, and you can't have 0.3 of a person. So you always have to round up. So if you get a little piece of a person, you have to take the whole person. It's just the way people work. So therefore your answer is actually 33. So if you wanted a 95% confidence interval, and your population standard deviation was 14.5, and you wanted your margin of error to be five, you would need to get a sample size of 33 people. So we finally made it to the end of this lecture. In conclusion, I started by reviewing some slides from chapter seven. I know you're kind of done with that. But I reviewed them because they related to the subject of confidence intervals. Next, I explained Z sub C, and the level of confidence, and the whole concept of x bar as a point estimate. Then I finally went into some application, right? Enough of the theory. Let's actually construct a confidence interval when the population standard deviation is known. And so I did one of those for you. And then we did an example of how to calculate sample size, in case you are getting ready to do a study and you need to figure out how many people you need, based on the population standard deviation you have and your choices of confidence level and of the size of E. And so if you thought you were 95% confident or 99% confident, here is someone who is 100% confident. That is a very confident peacock on that slide.