Hello, we're now going to look at calculating a confidence interval for estimating a population mean. To start off, anytime you build a confidence interval, you need a point estimate. Well, I have news for you: the sample mean, x bar, is the best point estimate for the population mean, mu, believe it or not.

Next, we have to introduce a new type of distribution, but don't worry, it's still very similar to the normal distribution. The Student's t distribution is a continuous distribution with the following characteristics. It is symmetric about its mean. It's more spread out and flatter than the standard normal distribution. And as the sample size n gets larger, the t distribution approaches the standard normal distribution. Then we use something called degrees of freedom to determine which t distribution we have, and the degrees of freedom is the sample size minus 1. You don't really need to worry about what degrees of freedom are. You just need to know that it's a component that's used when doing calculations with the t distribution.

So when do we use the t distribution? Well, we use it when the population standard deviation is not known but the data appear to be bell shaped. That's when we use the t distribution.

So for estimating a population mean, here is the error bound, or margin of error, formula when sigma, the population standard deviation, is not known. And obviously, if we're estimating a population mean, it's likely we don't know anything about the population standard deviation. The formula is: the error is the critical value t sub alpha over 2 (instead of the standard normal distribution, we use the t distribution) times the sample standard deviation divided by the square root of the sample size. You might recognize this standard deviation divided by the square root of the sample size. It's called the standard error. It's from the central limit theorem.
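Written out in symbols, the margin of error just described looks like this. It is a sketch in standard textbook notation: E, s, and n match the names used in this lesson, and df is an abbreviation assumed here for degrees of freedom.

```latex
E = t_{\alpha/2} \cdot \frac{s}{\sqrt{n}}, \qquad \text{df} = n - 1
```

Here s over the square root of n is the standard error, and t sub alpha over 2 is the critical value from the t distribution with n minus 1 degrees of freedom.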
The standard error is that adjusted standard deviation. So once again, t sub alpha over 2 is called the critical value. Alpha over 2 and the degrees of freedom are used to find that critical value.

So here's some notation that you'll see coming up. Mu (remember, m-u, mu) is the population mean. x bar is the sample mean. s is the sample standard deviation. n is the number of sample values, or the sample size. Big E is the margin of error, or the error bound. And t sub alpha over 2 is that critical t value separating an area of alpha over 2 in the right tail of the t distribution.

So here's a comparison of the t distribution with the standard normal distribution. With a small number of degrees of freedom (which, remember, is the sample size minus 1), look at the shape of the t distribution. It's kind of shallow. It doesn't have a really tall peak on it. But if you up the degrees of freedom to 9, which means a sample size of 10, notice the peak gets taller and your tails get thinner. And then when your degrees of freedom go to infinity, meaning your sample size is extremely large, notice you have your peak and then your tails are very, very tiny over here, which is what happens in the standard normal distribution. So that's showing you that the t distribution approaches the standard normal distribution as the sample size increases to a very large number.

So I'm first going to show you how to use a t table to calculate the critical value for a t distribution. You'll calculate alpha over 2, which is the area in one tail, and then you'll locate the corresponding column. So you'll find, say, an area in one tail of 0.05. Then you'll identify the degrees of freedom, n minus 1, and find that along the appropriate row. The critical value is at the intersection of this row and column. I'll also show you how to use Google Sheets, using the compute tab, to calculate that critical value. Although honestly, we don't really do a bunch of hand calculations with the t distribution.
You might like the table method better. I don't know. I'll show you both, though.

Calculate the critical value for each of the following. First, the confidence level is 95% and the sample size is 9. That means alpha is 1 minus the confidence level, which is 0.05, which means alpha over 2, the area in one of the tails, is 0.05 over 2, or 0.025. And then degrees of freedom, sometimes abbreviated as df, is n minus 1: 9 minus 1, which is 8. All right, so my area in one tail is 0.025 and my degrees of freedom is 8. So where do this row and this column intersect? Right at 2.31. So my critical value is 2.31.

What if I use Google Sheets? All right, so my degrees of freedom is still going to be 8. And then my area to the left of the value I'm trying to find will be 1 minus alpha over 2. That's 1 minus 0.05 over 2, which is 0.975. That's what you would type into Google Sheets. In the compute tab there's a region called the t region, and you're going to focus your attention there. So where does that 1 minus alpha over 2 come from? It comes from the fact that 95% is the dead center of your bell curve, so each of your tails has to have 2.5%. That critical t value cuts off your right tail, so the area to the left is 97.5%, or 0.975.

So let's go to our Google Sheets spreadsheet, to the compute tab. We'll zoom in a little bit here. All right, all you have to do is type in degrees of freedom of 8 and area to the left of 0.975, which I already had there for you. Remember, this is the t distribution region, and you get about 2.31. So, same answer: 2.31.

Let's try this again. The confidence level is 90% and n equals 10.
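Before working through that one, here's a way to cross-check a critical value without a table or the spreadsheet. This is a minimal Python sketch using only the standard library, not the course's Google Sheets method. The function names (`t_pdf`, `t_cdf`, `t_critical`) are made up for illustration. It approximates the area under the t curve with Simpson's rule and then finds the cutoff by bisection.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps_per_unit=400):
    """Area to the left of t: Simpson's rule on [0, |t|] plus symmetry."""
    a = abs(t)
    if a == 0:
        return 0.5
    n = max(2, 2 * math.ceil(a * steps_per_unit / 2))  # even number of slices
    h = a / n
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half_area = total * h / 3
    return 0.5 + half_area if t > 0 else 0.5 - half_area

def t_critical(confidence, n):
    """Critical value t_{alpha/2} for a sample of size n (df = n - 1)."""
    df = n - 1
    area_left = 1 - (1 - confidence) / 2   # same input as the spreadsheet
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < area_left:       # widen until the answer is bracketed
        hi *= 2
    for _ in range(60):                    # bisection on the bracket
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < area_left:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_critical(0.95, 9), 2))  # prints 2.31
```

That matches the first table lookup, and the same function can be used to check the example that follows.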
So let's calculate the degrees of freedom first this time: n minus 1, which is 9. All right, and then we have to find our alpha over 2. Well, alpha is always 1 minus the confidence level, which is 0.10, which means alpha over 2, the area in one tail, is 0.05. All right, so area in one tail is 0.05 and degrees of freedom is 9. Where do the row and the column intersect? It looks like 1.83. So the critical value is 1.83.

If you were to use Google Sheets again, I won't walk you through the whole process, but I'll tell you what the input is. Degrees of freedom is still 9. And then area to the left will be 1 minus alpha over 2. That's 1 minus 0.05, so the area to the left would be 0.95. You can try it out if you want to. It'll give you a critical value, once again, of 1.83, guaranteed or your money back.

All right, so to build a confidence interval for the estimate of mu, the population mean, when the population standard deviation is not known, you literally take that point estimate x bar and subtract the error, and you take the point estimate x bar and add the error. Mu will be somewhere between those two values with the designated confidence level. You can also write it in plus-or-minus form, and in interval form as well. It's just like we did for a population proportion, except instead of p hat we're dealing with x bar, and the formula for the error is slightly different.

All right, so here's the procedure for constructing a confidence interval. Verify the requirements are met: that means the population is normally distributed, or, if that's not the case, you really need to have a sample size greater than 30. You'll use n minus 1 degrees of freedom.
You'll find the critical value corresponding to the desired confidence level. You'll evaluate the margin of error. You'll calculate the lower bound and upper bound of your confidence interval, and you will round accordingly. We're going to use technology to do all this for us for this portion.

All right, so we will use Google Sheets to calculate the whole confidence interval. We will be using the data list tab, and instead of typing in three pieces of information like we did for the population proportion, we're actually going to have to type in four: sample mean, sample standard deviation, sample size, and confidence level. So hopefully it's not too painful typing in that extra number.

So, a common claim is that garlic lowers cholesterol levels. In a test of the effectiveness of garlic, 49 subjects were treated with doses of raw garlic, and their cholesterol levels were measured before and after the treatment. The changes in their levels of LDL cholesterol have a mean of 0.4 and a standard deviation of 21. Use the sample statistics of n equals 49, x bar equals 0.4, and s equals 21 to construct a 95% confidence interval estimate of the mean net change in LDL cholesterol after the garlic treatment. What does the interval suggest about the effectiveness of garlic in reducing LDL cholesterol?

So I have good news for you here. See these sample statistics? They are literally what you are going to type into Google Sheets. All right, in Google Sheets you are going to go to the data list tab. We're spending a lot of time in the data list tab. The region you're going to go to is the one-variable confidence interval and p-value region, and that's for the t distribution. You need to input the following information: x bar is 0.4, s is 21, n is 49, and your confidence level is 0.95, or you could type in 95 with the percent sign, whatever you want to do. So let's check it out.
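As a cross-check on the spreadsheet, the same sample statistics can be run through a short Python sketch. This is only an illustration, not how the Google Sheets template works internally. The function name `t_interval` is hypothetical, and the critical value 2.0106 (95% confidence, 48 degrees of freedom) is supplied by hand as an assumed lookup from software or a detailed t table rather than computed.

```python
import math

def t_interval(xbar, s, n, t_crit):
    """Confidence interval for a mean when sigma is unknown.

    t_crit is the critical value t_{alpha/2} for n - 1 degrees of freedom.
    It must be looked up separately; this sketch does not compute it.
    """
    standard_error = s / math.sqrt(n)
    margin = t_crit * standard_error      # E = t * s / sqrt(n)
    return xbar - margin, xbar + margin

# Garlic example: n = 49, so df = 48. The value 2.0106 is an assumed lookup
# for 95% confidence with 48 degrees of freedom.
lower, upper = t_interval(xbar=0.4, s=21, n=49, t_crit=2.0106)
print(round(lower, 2), round(upper, 2))
```

The function follows the procedure step by step: standard error, then margin of error, then the lower and upper bounds.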
Let's see what happens. All right, so we're going to go to the data list tab, and then we have this region, the one-variable confidence interval and p-value for the t distribution. There's one for the z distribution that we'll talk about momentarily. x bar in this case is 0.4, your standard deviation is 21, your sample size is going to be 49, and confidence level is the only other thing you need. It's going to give you the lower bound and upper bound of your confidence interval: negative 5.63 and 6.43.

So remember, we're trying to find the estimate of the true mean net change in LDL cholesterol after the garlic treatment. The interpretation of this confidence interval is as follows: with 95% confidence, the mean net change in LDL cholesterol after garlic treatment is between negative 5.63 and 6.43. We have to have that sentence to go with the confidence interval.

So what can we conclude about the effectiveness of garlic in reducing LDL cholesterol? This interval is capturing the change in LDL cholesterol levels. Well, guess what? Zero is in the interval. What does it mean when the difference between two things is zero? It means there is no difference. So since zero is in the interval, a mean change of zero is plausible, which means there's likely no difference, and garlic likely does not reduce LDL cholesterol. So sad.

So now we're also going to look at finding a sample size for estimating a population mean. To find the sample size for various requirements, you take your critical value, multiply it by the population standard deviation, divide by the margin of error, and then square the result. And you may say to yourself, wait a minute, why in the world? We don't know the population standard deviation.
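In symbols, the sample size formula just described can be written as follows. The notation is assumed rather than taken from the slides: z sub alpha over 2 stands for the critical value (written with z because the worked example uses the standard normal value 1.96), sigma is the population standard deviation, and E is the margin of error.

```latex
n = \left( \frac{z_{\alpha/2} \cdot \sigma}{E} \right)^{2}
```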
Well, when we're predicting sample size, we like to use an approximation for that population standard deviation. Otherwise our life could turn into a living nightmare, if it's not already that way. So let's talk a little bit more about this formula. The value for n that you find always has to be rounded up. And if you don't know the value of sigma, the population standard deviation, which often happens, you can use the estimate that sigma is about the range divided by 4. The range is the maximum value minus the minimum value.

So assume we want to estimate the mean IQ score for the population of statistics students. How many statistics students must be randomly selected for IQ tests if we want 95% confidence that the sample mean is within 3 IQ points of the population mean? There's our error. And we know that scores range from 60 to 120. We know that for a 95% confidence level, the critical value is 1.96. The error they want is within 3 points. And the standard deviation is the range divided by 4. We're using the estimation rule here: 120 minus 60, over 4. 60 over 4 is 15, and you plug these into the formula. So I have 1.96 times 15, divided by 3, and this is all squared, of course. That gives you 29.4 over 3, squared, which gives you 9.8 squared, and we get 96.04. You're like, can I just get away with picking 96 people? And to that I say no. Regardless, you always round up. So we have to use a sample size of 97 in this case.

All right, the last thing we're going to do is learn how to calculate a confidence interval for estimating a population mean where the population standard deviation is known. We're still going to have our point estimate.
We're still going to have our error. But for our error, we can actually use a z score rather than the t distribution. So we're allowed to use the standard normal distribution just like we normally would. No more t distribution. That's because the population standard deviation is known. In Google Sheets we'll do the same type of thing, except we'll go to the z distribution area of the data list tab.

So let's do our example. People have died in boat and aircraft accidents due to an inaccurate estimate of the mean weight of men. Over the past several years, the mean weight of men has increased, so we need to update our estimate of that mean so boats, aircraft, elevators, and other such devices do not become dangerously overloaded. Using the weights of men from a random sample, we obtain these sample statistics: n is 40 and x bar is 172.55 pounds. And we know from other sources that the population of weights of men has a standard deviation of 26 pounds. So there's that population standard deviation that we know, potentially from a previous study.

So, find the best point estimate of the mean weight of the population of all men. Well, that's literally 172.55. Construct a 95% confidence interval. I think we'll use Google Sheets here. Don't mind if I do. So you'll type in x bar. You'll type in the standard deviation. You'll also type in n, which is 40, and you'll type in the confidence level, which is 95%. So on the data list tab, if you scroll down just a little bit, you'll see the z distribution area. You'll type in x bar, your sigma, your sample size, and then your confidence level, and you'll have your interval of 164.49 to 180.61. So your confidence interval is going to be (164.49, 180.61). The interpretation is as follows: with 95% confidence, the mean weight of men is between 164.49 pounds and 180.61 pounds.

So what do the results suggest about the mean weight of 166.3 pounds?
That's the weight that was used to determine the safe passenger capacity of water vessels way back in 1960. Is 166.3 a safe average weight to assume for men? Where does 166.3 lie in this confidence interval? Well, I would say 166.3 is almost out of the interval. In particular, it's near the lower end of the interval. Since 166.3 is almost out of the interval, using this weight may not be safe. They may need to pick a higher weight. Also, the point estimate is 172.55, suggesting a true mean weight above 166.3. I mean, it's only one sample that we looked at here. But if you look at a variety of samples and you notice that all of them have means above 166.3, well, I think it's likely that you would need to increase the average weight that you use for safety guidelines, because we don't want unsafe vessels.

Just real briefly, remember that if sigma is known when you're estimating a population mean, you use the standard normal distribution. And when sigma is not known, and your population is normally distributed (meaning, you know, it's bell shaped) or your sample size is greater than 30, you would use the t distribution. But anyway, that's all I have for now. Thanks for watching.
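For anyone following along without the spreadsheet, the two z-based calculations from this lesson can be sketched in Python using only the standard library. The function names `z_critical`, `z_interval`, and `sample_size` are hypothetical, and `statistics.NormalDist().inv_cdf` stands in for the z table. Treat this as an illustration, not the spreadsheet's actual implementation.

```python
import math
from statistics import NormalDist

def z_critical(confidence):
    """Critical value z_{alpha/2}: the area to its left is 1 - alpha/2."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def z_interval(xbar, sigma, n, confidence):
    """Confidence interval for a mean when sigma is known."""
    margin = z_critical(confidence) * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

def sample_size(confidence, sigma, error):
    """Smallest n giving the requested margin of error, always rounded up."""
    return math.ceil((z_critical(confidence) * sigma / error) ** 2)

# Weights of men: x bar = 172.55, sigma = 26, n = 40, 95% confidence.
low, high = z_interval(172.55, 26, 40, 0.95)
print(round(low, 2), round(high, 2))

# IQ example: sigma estimated as range / 4 = (120 - 60) / 4 = 15, error = 3.
print(sample_size(0.95, (120 - 60) / 4, 3))
```

The sketch uses the unrounded z value instead of 1.96, which does not change the rounded results in either example.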