 lecture is an introduction to statistical inference in which we start looking at estimation, and hypothesis testing will be in another lecture. As you know, statistical inference involves making inferences drawing conclusions about population parameters based on sample statistics. We could be doing estimation where we're looking at the sample, getting the sample statistic like say X bar, and then using it to draw conclusions about the value of the population parameter mu. The other activity we could be doing in statistical inference is testing hypotheses where we collect a sample and we have a hypothesis about the parameter and we use the sample evidence in order to test this hypothesis. I think most of us are familiar with the notion of using one value to estimate something that we don't know. We'd call that a point estimator. There's a problem though in using something like X bar, which we know from the last lecture is a random variable with its own distribution, to estimate the population parameter mu. After all, what's the probability that we're right? What's the probability that the X bar value we happen to get from the one sample out of many that we happen to collect really is equal to the single value, the true parameter mu, the mean of the population. There's only one mean mu, but there's a lot of different possible X bar values that we could have gotten, one for every one of the different samples that we could have collected. The probability that a random variable, X bar in this case, is equal to a particular value, mu in this case, is always zero with a continuous random variable. That's why when we're doing probabilities from the normal distribution, we had to get probabilities in an interval. We had trouble constructing probability questions with a continuous random variable like a variable from the normal distribution. Well, that's exactly what's going to happen over here. We're going to use intervals here as well. 
We're going to say, well, I can't use X bar as a point estimate of mu because X bar is just one value that I happened to get from the one sample that I happened to collect. But I can take X bar, add a little something on the right side, take a little something off on the left side, make it an interval, and then say, well, this interval is my best guess. It's my best estimate for the true population mean mu. And I'll try to come up with a certain level of confidence, which is really just a probability that the interval really does contain the true population mean mu. When you think about it, you'll realize that this is actually a much more accurate estimator than the point estimator, which only seems accurate. I'm just remembering, many, many years ago, when the other Professor Friedman and I were dating, he used to somehow get away with saying, I'll meet you at such and such a time at such and such a place, give or take 20 minutes. Well, you could never be late. If you use an interval like that, you can't go wrong. 3 p.m., give or take 20 minutes: if you're there at 3:02, okay. But if you say specifically, I'm going to meet you at 3:02, and then you come at 3:03 or 3:05, well, you're just out of luck. You're late. And that's why an interval is always better than a point estimator. And in fact, it's always more accurate than a point estimator. And that's an interesting thing to consider. How wide should the interval be? Well, you're going to see that it depends on how much confidence you want in the estimate. For example, suppose you want a confidence interval estimate for the mean income of a college graduate. Well, if you want 100% confidence, there's only one way to get 100% confidence. You've got to take a census. Otherwise, you're going to have to work with less than 100%. There is another way, kind of, if you don't mind getting fired: you can tell your boss, 100% confidence? Sure, minus infinity to plus infinity.
That's 100% confidence. Or in the case of income, zero to infinity. That's a good way to get fired. And you sound like a wise guy, but the reality is that's the only way to get 100% confidence if you're not willing to take a census. And notice what happens. 95% confidence interval: well, you're 95% sure that that interval, in this case right here between 35,000 and 41,000, has the mu in there. Remember, mu doesn't move. It's a fixed number. It's the true population mean. If you're willing to live with 90% confidence, notice, 36,000 and 40,000. 80% confidence, between 37,500 and 38,500. Okay, so basically, this is a very important concept. The wider the interval, the greater the confidence you'll have. Again, that's why minus infinity to plus infinity gives you 100% confidence. And notice you get 0% confidence if you give a particular number. That's why we never say X bar is mu. X bar is a random variable. So I have zero confidence if I say, well, the mean income of a college graduate is $38,000. That's called a point estimate. That gives you zero confidence. We need an interval. And again, the wider the interval, the more confidence you'll have. In this slide, you're going to learn how to construct a confidence interval estimator. You're going to see it called CIE. Okay, so we're constructing a CIE for mu. Look at the formula: X bar plus and minus Z alpha times S over the square root of N. By the way, the sample evidence in these kinds of problems is going to be X bar, the sample mean; S, the sample standard deviation; and N, your sample size. You took a sample, so there's an N. Remember, N, X bar, and S, that's going to be called the sample evidence. Ideally we should have sigma. We're going to use S instead of sigma. What's that Z alpha? That comes from the Z table. Okay, and that will depend on how much confidence you want. So we're going to learn how to do that. The confidence intervals we're going to be learning in this course are called two-sided confidence intervals.
We're not going to say it every time, but keep in mind these are two-sided confidence interval estimators. This is why we take the alpha and split it in half. This is the way you'll make sure to get the right Z value. All right, so we're just going to call the Z value Z alpha. Really, the alpha has been split because it's on two sides. Again, this course assumes that we're working with two-sided confidence interval estimators. Okay, now what's in the middle there? The big white space. That's one minus alpha. So let's take a problem. If you want 90% confidence, okay, that means the 10% where you don't have confidence, that's going to relate to your alpha error. We put 5% in the right tail, 5% in the left tail. If you're working with 95% confidence, one minus alpha, that's the confidence, 95%, then the alpha of 5% is again split into two tails. So you have 2.5%, 0.025, in the right tail and 0.025 in the left tail. If you wanted to work with 80% confidence, okay, that means the central part would be 80%, and then you have 10% in the right tail and 10% in the left tail. So one minus alpha is essentially your confidence, and that alpha, which we're going to learn more about, that's just kind of an error, it's an uncertainty. That's your lack of confidence, in a way. You work for a company that makes smart TVs. Your boss says, take a sample of 100 of these TVs, test them and everything, and they want to know with certainty the exact life of the smart TV that they manufacture. Remember, you took a sample of 100 TVs. So what is the exact life of a smart TV made by your company? Again, the sample evidence is always N, 100; X bar, which turns out to be 11.50 years, that's the sample mean; and the sample standard deviation, which is 2.50 years. So what is the exact life based on this sample evidence of 100? But your boss said with certainty. The boss wants certainty. So what are you going to tell your boss?
Well, here's what you're going to tell your boss, because your boss wants 100% confidence, and that's called certainty. The only answer you can give is, well, minus infinity to plus infinity. Actually, you could have also told your boss, you can't do it from a sample of 100, you need to take a census, and that means test every single TV, which would also get you fired. You say, okay, we're going to test every single TV we made, and the boss will say, that'll destroy each one, I can't sell them, you're fired. So these are two answers that are going to get you fired: minus infinity to plus infinity, and, well, let's test every single TV. The whole goal is to use a sample. That's what inference is about. Taking sample statistics, trying to draw conclusions about the population parameters. Now we're going to share the right way to do it, or what we call the better answer. Okay, you tell the boss, 100% confidence, can't do that. I don't want to get fired. So the boss says, okay, we'll work with 95% confidence. Okay, now remember, you have the sample evidence: N is 100, X bar is 11.50 years, and S is 2.50 years, and you want 95% confidence. 95% confidence, again, the 95% is in the middle there, right? So what do you have in each of the tails? It's a two-sided confidence interval. Take your alpha, 0.05, and divide it by two. So you have 0.025, which is 2.5%, in the right tail and 0.025, which is 2.5%, in the left tail. And then looking at your zero-to-Z table, if you have 2.5% in the right tail, that means you have 47.5% from zero to Z, from zero to 1.96. Of course, you have the same thing from zero to minus 1.96. You have 0.4750. If you look it up in the table, you'll find the Z value that corresponds to 95% confidence is 1.96. That's your Z value.
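If you'd like to check the table lookup without a printed table, here is a minimal sketch in Python. It is not part of the lecture: the function name z_for_confidence is made up for illustration, and it leans on the standard library's NormalDist, which works with the area to the left of z rather than the zero-to-Z area the table uses.

```python
from statistics import NormalDist


def z_for_confidence(confidence: float) -> float:
    """Two-sided critical z value for a confidence level such as 0.95.

    Hypothetical helper for illustration; the lecture reads this
    value off a zero-to-Z table instead.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    alpha = 1.0 - confidence   # the part we are NOT confident about
    tail = alpha / 2.0         # two-sided: half of alpha goes in each tail
    # inv_cdf expects the area to the LEFT of z, i.e. everything
    # except the right tail.
    return NormalDist().inv_cdf(1.0 - tail)


for level in (0.80, 0.90, 0.95, 0.99):
    tail = (1.0 - level) / 2.0
    z = z_for_confidence(level)
    print(f"{level:.0%} confidence: {tail:.3f} in each tail, "
          f"{0.5 - tail:.4f} from zero to z, z = {z:.3f}")
```

For 95% this should agree with the table value of 1.96. For 99% the computed value is closer to 2.576, while the lecture uses the rounded table value 2.575. The difference doesn't show up in the two-decimal answers worked in this lecture.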
So now, using the formula: X bar, 11.5, plus and minus that Z value, 1.96, times the standard error of the mean, which is really sigma over the square root of n. But we're using S, which you can do when you have a large sample. Some say 30, some say 50, but at 100 everyone agrees you can use S instead of sigma. So you have 2.50 over the square root of 100, which is 0.25. Notice 1.96 times 0.25 is 0.49. This is years, of course, 0.49 years. That is called the margin of error. That's a very important term. Make sure you memorize that term. Margin of error. That number that comes after the plus and minus, which is basically half the width of the interval, that's your margin of error. Okay. So you do 11.50 plus 0.49, and your upper value of the confidence interval is 11.99 years. You do 11.50 minus 0.49, and your lower value is 11.01 years. Okay. Now you see the confidence interval. Any value between 11.01 and 11.99, any value in there. Remember, there's only one mu and it's fixed in place. It doesn't move. It's not a random variable. It's the true mean. You don't know what it is, but with 95% confidence, that interval has the mu in there. You don't know what it is. You never will know, because you didn't take a census. But in there, between 11.01 and 11.99, you are 95% sure lies the true mean, the mu. Now we're going to summarize what you've just been taught. First, we have 95% confidence that the interval from 11.01 years to 11.99 years contains the true population parameter mu. Again, the reason you don't know what it is is that the only way to know the true population parameter mu is through a census. We're doing this through a sample. We have 95% confidence that that interval contains it. Or another way to put this: in 95 out of 100 samples, that's where your 95% confidence comes from, the population mean would be contained in intervals constructed using this procedure, with the same n and the same alpha. Please remember, the population parameter mu is fixed.
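Written out as an equation, the smart TV calculation looks roughly like this. The subscript alpha over 2 is an assumption about notation: the lecture just says Z alpha, with the understanding that alpha has been split between the two tails.

```latex
\[
\bar{x} \pm z_{\alpha/2}\,\frac{s}{\sqrt{n}}
  = 11.50 \pm 1.96 \cdot \frac{2.50}{\sqrt{100}}
  = 11.50 \pm 1.96 \cdot 0.25
  = 11.50 \pm 0.49
\]
\[
\text{lower limit} = 11.50 - 0.49 = 11.01 \text{ years},
\qquad
\text{upper limit} = 11.50 + 0.49 = 11.99 \text{ years}
\]
```

The 0.49 after the plus and minus is the margin of error, and the limits are statements about the interval, not about mu, which stays fixed.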
It's not a random variable. x bar is the random variable. That's why somebody else could take a sample of 100 smart TVs and get a different x bar. We can have thousands of people taking samples of 100 TVs and getting different x bars. It's incorrect to say there's a 95% chance the population mean will fall in this interval. The population mean doesn't fall anywhere. It's fixed. The only way to get at it exactly is through a census. We won't know it exactly, but we have a good way of getting a good idea of what it is. Another exercise. We're looking at this in a slightly different way. We have sample evidence, a sample of 100 refrigerators, looking at the lifetime of a refrigerator. The x bar value, the average of the sample, was 18 years, and the standard deviation from the sample was four years. What we want to know is, can we construct a confidence interval estimator of mu with various different levels of confidence? The idea of this exercise is to see what happens to the interval, the CIE, when we're using different levels of confidence but keeping everything else the same. This is more or less an exercise just to see how this formula reacts when we're using different table values. You see the formula on top. We're just repeating it. The x bar is in the middle. On either side is the z value from the table multiplied by the standard error of the mean. That gives you the half-width of the interval, or what's called the margin of error. We want to do this five times: once with 100% confidence, and then with 99%, 95%, 90%, and 68%. Let's see what happens to this interval and why we should care about it. I'll remind you that we're using s in this formula, the sample standard deviation. We're using s as a point estimator for sigma, the true population standard deviation. When we're estimating one parameter, we really don't want to use an estimate of another parameter, but pretty much all the time, or almost all the time, that's what we have to work with.
As long as our sample size is large enough, we're okay doing that. Remember, we're constructing a confidence interval estimator for mu with x bar in the middle. What we're doing in the formula is substituting s as a point estimator for sigma. That's because, the way we compute it, with n minus 1 in the denominator, s squared, the variance of the sample, is an unbiased estimator of sigma squared. The expected value of s squared is sigma squared, the variance of the population. You can see the two formulas there and how we work with them and how we substitute one for the other. This is just a brief digression to remind you that the formula is supposed to have sigma in it, and if we have it, we use it. If we don't, we will very likely be using s to replace it. We're going to talk more about this later on in other lectures. On this slide, we see the solutions to all of the parts, a, b, c, d, and e. You already know that if you want 100% confidence, you need to take a census. If you don't take a census, all you could say is negative infinity to plus infinity, which is useless. That's one of the things we want to learn here: you not only want to be able to construct an interval estimator with a certain level of confidence, you want something that's relevant, that's important, that gives you information. Saying that with 100% confidence I can estimate a parameter as going from negative infinity to plus infinity gives you zero information, and it has zero importance and zero relevance. But that's what you get if you're using sample data and you want 100% confidence. Now, if you just reduce your confidence level a little bit, you have something else to talk about. You can work with it. In part b, 99% confidence, you want the central 99% of your distribution, so you look at the z table. With alpha of 0.01, half a percent in each tail, the z value is plus and minus 2.575. We put that into the formula for the confidence interval.
The average x bar is in the middle, x bar plus and minus, there's the z value 2.575. We multiply it by the standard error of the mean four divided by the square root of 100. You end up with 18 plus and minus 1.03. Again, 1.03 is the margin of error and your confidence interval estimate at 99% confidence goes from a lower limit of 16.97 years to an upper limit of 19.03 years. That's a little bit better than negative infinity to plus infinity. It's still a little bit wide. The interval is a little bit wide. It gives you more information, b gives you more information than a for sure, but it still would be better if it were narrower. What do we need to sacrifice in order to make the interval a little bit narrower? Well, let's go take a look at part c. 95% plugging all the values into the formula. The only thing different is the z value that we get from the table. Looking at the central 95% of the distribution, we need a z value of plus and minus 1.96 and we end up with a confidence interval of 17.22 years at the low side to an upper limit of 18.78 years. That's pretty good. That's probably as narrow as you need, whatever the purpose is for this exercise. 95% confidence gives you not only 95% confidence, which isn't bad, but it gives you a fairly good narrow-ish interval. What happens if we're willing to reduce our confidence even further? 90% confidence, we end up with an interval that goes from 17.34 to 18.66. We went down in confidence from 95 to 90. We didn't really improve the relevance of the interval too much. It doesn't really help as much, we might as well stick with 95%. If we want to go down to 68% confidence, how do we get 68%? We go to the z table, we want the central 68% of the distribution. That means that 1 minus 0.68 is 0.32. That's split into the two tails and the z value is 1. And so it's plus and minus 1, 18 plus and minus 1 times the standard error of the mean. 
We end up with 18 plus and minus 0.4, where 0.4 is the half-width of the confidence interval, or the margin of error. And at 68% confidence, we have an interval that goes from 17.6 to 18.4, but with only 68% confidence. So this exercise is just to show you how things change when the only thing we're changing is the level of confidence and the value we get from the z table in order to imbue the interval with that level of confidence. We're going to talk now about how we have to balance confidence and width in a confidence interval estimate. Okay, so how do we keep the same level of confidence and somehow make the CIE narrower? Let's analyze the formula. You see the formula? X bar, that's the sample mean, that's always the center. Okay, here's your margin of error: plus and minus z alpha times sigma over the square root of n. And you know that the more confidence you want, the higher the value of z you've got to use. We saw that for 99% confidence we used 2.575, and for 95% confidence we used plus and minus 1.96. In any case, 100% confidence needed infinity, minus infinity to plus infinity. Okay, so the higher the value of z, the larger the half-width of the interval. Or I can say that the more confidence you want, the wider you're going to make the whole interval. Okay, the larger the sample size, the smaller the half-width, because we're always dividing by the square root of n. So obviously, if you take a larger sample, you won't have such a wide interval. There's a trade-off. Take a larger sample, you'll have more certainty, so your confidence interval will be narrower. But of course, it costs money. I mean, obviously, if you take a sample of 1,000, you'll be more certain of the results. You'll have a narrower interval than if you take a sample of, say, 64. But basically, this is something you should know. If you want a narrower interval, and of course you don't want it to be wide, take a larger sample.
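Here is a small sketch, in Python, of the two levers just described: the z value and the sample size. It reuses the refrigerator numbers (x bar of 18 years, s of 4 years, n of 100). The function name half_width is made up for illustration, and the 90% table value of 1.645 is an assumption, since the lecture only states the resulting interval for that part.

```python
import math


def half_width(z: float, s: float, n: int) -> float:
    """Margin of error: z times the standard error of the mean, s / sqrt(n)."""
    return z * s / math.sqrt(n)


# Lever 1: change only the confidence level (s = 4 years, n = 100).
# 1.645 for 90% is the usual table value, not a number quoted in the lecture.
table_z = {"99%": 2.575, "95%": 1.96, "90%": 1.645, "68%": 1.0}
for level, z in table_z.items():
    e = half_width(z, s=4.0, n=100)
    print(f"{level}: 18 +/- {e:.2f} -> {18 - e:.2f} to {18 + e:.2f} years")

# Lever 2: keep 95% confidence and s = 4, change only the sample size.
for n in (64, 100, 1000):
    print(f"n = {n}: half-width = {half_width(1.96, 4.0, n):.2f} years")
```

Notice that because n sits under a square root, going from 64 to 1,000, roughly 16 times the sample, only cuts the half-width to about a quarter. That is part of why the larger sample costs money.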
Now, there are some tricks that are not taught in this course, like stratification and various kinds of sampling tools, that can help reduce the standard deviation for you. And that, of course, would make the confidence intervals narrower. But again, we're not going to do it in this course. Just bear in mind, there are ways to do this, and you might learn about them in other courses. But the simplest concept that we've learned is that if you want to narrow the width of your confidence interval, just take a larger sample. And we'll learn about this trade-off. This problem involves the life of a tennis racket. Okay? A company wants to estimate the average life of its tennis racket. It's very common for a company to do this, because they want to advertise how durable their product is. Okay? So, you know what the sample evidence is: N, X bar, and S. Okay? N is 81. They took a random sample of 81 tennis rackets and put the information into a spreadsheet, like Excel maybe. The X bar was 7.50 years, and S was 1.10 years. And now you're asked to construct a 95% confidence interval estimator of mu. Okay? It's very simple. The hardest thing in all these problems is just figuring out which Z to use. Remember, you want 95% confidence, so the 95% is in the middle. Right? And that means, in each of the tails, you're going to have 0.025. Okay? 0.025 in each tail. And that leaves you 0.4750 in the fat part of the curve on one side of zero and 0.4750 on the other side. And of course the Z value then is 1.96. You need plus and minus. Remember, these are two-sided confidence intervals. So, you want plus and minus 1.96. So, you do 7.50, that's your X bar, plus and minus. Now you take the 1.96, your Z value, times 1.10 over the square root of 81. That's the standard error of the mean. That value, 1.96 times 1.10 over the square root of 81, which by the way is 9, comes out to 0.24.
And of course that's years. That's your margin of error. Make sure you know that term, margin of error. So, you have plus and minus 0.24. You add 0.24 to 7.50. That's 7.74 years. Then you subtract it and you get 7.26 years. And there is your confidence interval estimator. The CIE goes from 7.26 to 7.74. You're 95% sure that somewhere in that interval is the true mu, which you cannot get from a statistic. But you could have gotten it if you were willing to test millions of rackets and take a census. This problem involves the life of digital copiers. Okay, you work for a company, they manufacture digital copiers, and they want to know the average life of their machines. They take a random sample of 121 copiers, and here's the information you're getting from the sample evidence. Remember, N is 121. They find out that X bar is 6.10 years. That's the sample mean. And the sample standard deviation is 1.85 years. And now you're asked to construct a 99% CIE for the population mean mu. Okay, so 99%. Remember, these are all two-sided CIEs. You want a two-sided confidence interval estimator. Okay, if you want 99% in the middle, you divide that in half. So you have 0.4950 to the right of zero and 0.4950 to the left of it. And for the tails, you're taking the 1%, that's your alpha, right? 1 minus 0.99 is 0.01. Cut 0.01 in half and you have 0.005 in the right tail and 0.005 in the left tail. And now you go to your zero-to-Z table. And guess what? You find the Z value is 2.575. Okay, so plus and minus 2.575 will give you 99% of the area under the Z distribution. Okay, so now you take that value, 2.575, times S over the square root of N, which is called the standard error of the mean. So that's 1.85, that's S, over the square root of 121, which by the way is 11. Okay, and I need X bar plus and minus that margin of error. So we do 6.10 plus and minus 2.575 times 1.85 over the square root of 121, which is 11. Your margin of error is 0.43.
That's your margin of error, 0.43 years. Okay, so you do 6.10 plus 0.43, which is 6.53 years, and then you do 6.10 minus 0.43, which is 5.67 years, and then you have your confidence interval. Simple as that. This problem is looking at the life of a baseball. Okay, a company wants to estimate the average life of its baseballs. So you're working for this company, and you take a random sample of 64 baseballs, and then you put all the results in the spreadsheet, and then you get the X bar and the S of the 64. Your sample evidence now is N is 64, X bar is 3.35 months under normal usage, and your standard deviation is 0.88 months. And now you've been told to construct an 80% confidence interval estimate for mu. 80%. Remember, these are all two-sided intervals. You've got to cut everything in half. So you have 0.40 and 0.40 in the middle. If it's 80% confidence, that's 1 minus alpha, so 20% is the alpha, and you've got to split that in half. So you have 0.10 in the right tail and 0.10 in the left tail. Okay, now you need a Z value. Which Z value will give you 0.4000 of the area under the curve from 0 to Z? Okay, looking at your zero-to-Z table, you find that Z value is approximately 1.28. Okay, so now you have 3.35, that's your sample mean, plus and minus. Now that value after the plus and minus, as you know, is called the margin of error. It's 1.28 times the standard error of the mean, that's Z times S over the square root of N: 1.28 times 0.88 over the square root of 64. 0.88 is the S, and your N was 64, so the square root of N is 8. And you do that arithmetic and you find out your margin of error is 0.14. That's the margin of error. 3.35 plus 0.14 is 3.49 months, and 3.35 minus 0.14 is 3.21 months. And now with 80% confidence, you can say that this interval that goes from 3.21 all the way up to 3.49 months contains your mu. You're 80% certain that somewhere in that interval, fixed in place, is your mu.
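The three problems just worked, tennis rackets, digital copiers, and baseballs, all follow the same recipe, so they can be checked with one short Python sketch. The names SampleEvidence and confidence_interval are invented for illustration, and the z values are the table values quoted in the lecture rather than computed ones.

```python
import math
from typing import NamedTuple


class SampleEvidence(NamedTuple):
    """The sample evidence for one problem, plus the table z value."""
    label: str
    n: int        # sample size
    x_bar: float  # sample mean
    s: float      # sample standard deviation
    z: float      # zero-to-Z table value for the requested confidence


def confidence_interval(ev: SampleEvidence) -> tuple[float, float, float]:
    """Return (lower limit, upper limit, margin of error)."""
    margin = ev.z * ev.s / math.sqrt(ev.n)
    return ev.x_bar - margin, ev.x_bar + margin, margin


problems = [
    SampleEvidence("tennis rackets, 95% (years)", 81, 7.50, 1.10, 1.96),
    SampleEvidence("digital copiers, 99% (years)", 121, 6.10, 1.85, 2.575),
    SampleEvidence("baseballs, 80% (months)", 64, 3.35, 0.88, 1.28),
]

for ev in problems:
    low, high, margin = confidence_interval(ev)
    print(f"{ev.label}: margin of error {margin:.2f}, "
          f"interval {low:.2f} to {high:.2f}")
```

Rounded to two decimals, the output should agree with the margins of error worked by hand above: 0.24, 0.43, and 0.14.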
Here's a graphic just to illustrate the point that the random variable we're looking at and that we're working with is X bar and even the interval based on X bar, the confidence interval estimator. We take a different sample, we get a different interval, we get another sample, get another interval. In this particular example, we happen to know, but we don't know because we don't have a census, that let's say mu is 6.4, 6.40. But depending on the sample that we take and the X bar value we compute and the interval we get out of it, we're not necessarily going to know this, but we will work with a certain level of confidence and do the best we can. If we were going to let's say construct a 90% confidence interval estimator of mu, some of the times the interval will cover, will contain mu, some of the times it won't. If this is working the way it's supposed to work, out of let's say a thousand such experiments, 900 of them should contain mu, should contain 6.4. With only 10 it's kind of hard to be exact, but that's all we have here in this problem for you. But just take a look at the graphic for a minute. Mu of 6.40 doesn't move. Some of the intervals are shifted over to the right, some of them shifted over to the left, some of them fairly over the middle, fairly residing, standing with a foot on either side of 6.4. But sometimes we're off. We have to get used to the fact we're operating under uncertainty. And in fact, out of these 10 confidence interval estimators, three of them did not contain 6.4. We do the best we can. We use the z-table, the values from the z-table, to imbue our results with a particular level of confidence. But we always have to remember there's that little bit that we're allowing that could be wrong. And we have to be very, very humble. And remember, sometimes we could be wrong. We could be doing, do the experiment again, replicate it, get different results. 
Either results that seem better or the results that seem worse, but it doesn't necessarily mean that the first experiment was wrong. Let's see if we can summarize some of the key points in this lecture in estimating from a sample to a population. When we're working with a sample, that's the whole point of statistical inference, we can never have 100% certainty. That's a key point, kept hammering it over and over again, and it's worth saying it again. If you want 100% certainty, you have to take a census. You're not going to get that from a sample. What can you get from a sample? You can state the confidence that you have that the estimator that you created, the interval estimator, contains the true parameter. That's basically what you can do. And we saw various problems with various levels of confidence to see how that works. The traditional level of confidence, if all else, if it's not stated, is 95% confidence. But whatever problems are given you, it'll tell you, the problem will say what confidence to use, and you should be able to use the Z-table to come up with any level of confidence. And of course, the more confidence, the wider the interval, the lower the confidence, the narrower the interval. That's the trade-off. As always, do lots and lots and lots of problems. It's going to stand you in good stead. It'll help you understand the material, and it'll help you on the exams, because when you look at a problem, if you've done a lot of practice, you'll be able to quickly recognize it as far as which type of problem it is, and you'll be able to remember how to solve the problem, and you'll be able to do it faster, because we all know in an exam situation, you don't want to have to go back and forth and think about stuff and look it up in your notes or in your textbook.
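One way to practice the idea behind the graphic with the ten intervals is to simulate it. This is only a sketch under assumptions that are not in the lecture: the population is taken to be normal with mu of 6.40 and a made-up sigma of 1.5, each experiment draws a sample of 100, and 1.645 is the usual 90% table value. The function name coverage_rate is hypothetical.

```python
import math
import random
import statistics


def coverage_rate(mu: float, sigma: float, n: int, z: float,
                  experiments: int, seed: int = 0) -> float:
    """Fraction of simulated intervals x_bar +/- z * s / sqrt(n) that contain mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(experiments):
        # A new sample gives a new x bar, a new s, and so a new interval.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = statistics.fmean(sample)
        s = statistics.stdev(sample)   # sample standard deviation (n - 1)
        margin = z * s / math.sqrt(n)
        # mu never moves; only the interval does.
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / experiments


rate = coverage_rate(mu=6.40, sigma=1.5, n=100, z=1.645, experiments=1000)
print(f"{rate:.1%} of the 1000 intervals contained mu = 6.40")
```

With 1,000 experiments the fraction should land near 90%, though not exactly on it. If you set experiments to 10, you can see how easily you get something like seven out of ten.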