We'll continue in statistics and inference by discussing estimation. As opposed to hypothesis testing, estimation is designed to actually give you a number, a value, not just a yes/no, go/no-go decision: it gives you an estimate for the parameter you're trying to get at. I like to think of it as a new angle, looking at the same question from a different direction. The most common approach is the confidence interval. The important thing to remember is that this is still an inferential procedure: you're still using sample data to draw conclusions about a larger group or population. The difference is that instead of coming up with a yes or no, you focus on likely values for the population value. Most versions of estimation are closely related to hypothesis testing, sometimes seen as the flip side of the same coin; we'll see how that works in later videos.

You can estimate nearly any sample statistic, and there are a few different versions. There are parametric versions of estimation and bootstrap versions (I've got the boots here), where you repeatedly resample at random from your data to get an idea of the variability. There are also what are called central versus noncentral confidence intervals, but we're not going to deal with those.

There are three general steps to this. First, you choose a confidence level. It can't be zero and it can't be 100%; choose something in between, and 95% is the most common. What the interval does is give you a range of numbers, a high and a low, and the higher your confidence level, the more confident you want to be, the wider the range between your high and your low estimates is going to be. There's a fundamental trade-off in what's happening here. It's the trade-off between accuracy, which means you're on target, or more specifically, that your interval contains the true population value.
And the idea is that accuracy leads you to the correct inference. The trade-off is between accuracy and what's called precision, which in this context means a narrow interval, a small range of likely values. What's important to emphasize is that precision is independent of accuracy: you can have one without the other, or neither, or both.

Let me show you how this works with a little hypothetical situation. I've got a variable that goes from maybe 10 to 90, and I've drawn a thick black line at 50. If you think of this in terms of percentages and political polls, it makes a very big difference whether you're to the left or the right of 50%. I've also drawn a dotted vertical line at 55 to mark our theoretical true population value. Then I have a distribution that shows possible values based on our sample data. This one is not accurate, because it's centered on the wrong thing; it's centered on 45 instead of 55. And it's not precise, because it's spread way out, from maybe 10 up to almost 80. In this situation, the data is really no help at all.

Here's another one. This one is accurate, because it's centered on the true value. That's nice. But it's still really spread out, and about 40% of the values are going to be on the other side of 50%, which might lead you to the wrong conclusion. So that's a problem. Now here's the nightmare situation: a very, very precise estimate that's not accurate. It's wrong, and it gives you a false sense of security about what's going on; you're going to totally blow it every time. The ideal situation is this: you have an accurate estimate, where the distribution of sample values is centered really close to the true population value, and it's precise, really tightly knit. About 95% of it is on the correct side of 50, and that's good.
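Before moving on to interpretation, it may help to see the mechanics of the two approaches mentioned earlier, parametric and bootstrap. Here is a minimal sketch in Python; the data, seed, and t critical value are illustrative assumptions, not from the video:

```python
import random
import statistics

random.seed(1)
# Hypothetical sample of 30 observations (made up for illustration)
sample = [random.gauss(55, 10) for _ in range(30)]
n = len(sample)
mean = statistics.mean(sample)
sd = statistics.stdev(sample)

# Parametric 95% CI for the mean: mean +/- t * (sd / sqrt(n)).
# 2.045 is (approximately) the t critical value for 29 degrees of freedom.
se = sd / n ** 0.5
t_crit = 2.045
param_lo, param_hi = mean - t_crit * se, mean + t_crit * se

# Bootstrap percentile 95% CI: resample with replacement many times,
# then take the 2.5th and 97.5th percentiles of the resampled means.
boot_means = sorted(
    statistics.mean(random.choices(sample, k=n)) for _ in range(2000)
)
boot_lo, boot_hi = boot_means[int(0.025 * 2000)], boot_means[int(0.975 * 2000)]

print(f"parametric: ({param_lo:.1f}, {param_hi:.1f})")
print(f"bootstrap:  ({boot_lo:.1f}, {boot_hi:.1f})")
```

With a reasonably sized sample, the two intervals typically come out close to each other; the bootstrap version just trades the distributional assumption for computation.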
If you want to see all four of them at once: we have the precise ones on the bottom, the imprecise ones on the top, the accurate ones on the right, the inaccurate ones on the left. That's one way of comparing them. But no matter what you do, you have to interpret the confidence interval. The statistically accurate way, which involves very little interpretation, is this: you would say, "the 95% confidence interval for the mean is 5.8 to 7.2." That's just taking the output from your computer and sticking it into sentence form. The colloquial interpretation goes like this: "there's a 95% chance that the population mean is between 5.8 and 7.2." Well, in most statistical procedures, specifically frequentist as opposed to Bayesian ones, you can't say that. That phrasing implies the population mean is something that shifts around, and that's not how frequentists see it: the population mean is fixed, and it's the intervals that vary from sample to sample. A better interpretation is this: "95% of confidence intervals from randomly selected samples will contain the population mean."

I can show you this really easily with a little demonstration. I randomly generated data from a population with a mean of 55, took 20 different samples, computed the confidence interval for each sample, and charted the high and the low. The question is, did each interval include the true population value? You can see that of these 20 intervals, 19 included it. Some of them barely made it: sample number one on the far left barely made it, sample number eight doesn't look like it made it, and sample 20 on the far right barely made it on the other end. Only one of them missed it completely, and that's sample number two, there on the left. Now, it's not always exactly one out of 20. I actually had to run the simulation about eight times, because it kept giving me zero or two or three misses, and I ran it until I got exactly one. But one miss in 20 is what you would expect on average. So let's say a few things about this.
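A coverage demonstration like this one is easy to reproduce. Here is a sketch that draws 20 samples from a population with mean 55 and counts how many 95% intervals contain the truth; the standard deviation, sample size, seed, and t critical value are assumptions for illustration, and with a different seed the count will bounce around 19 of 20, just as described:

```python
import random
import statistics

random.seed(42)
TRUE_MEAN, SD, N = 55, 10, 30
T_CRIT = 2.045  # approx. t critical value for 95% CI, df = 29

hits = 0
for _ in range(20):
    sample = [random.gauss(TRUE_MEAN, SD) for _ in range(N)]
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / N ** 0.5
    lo, hi = m - T_CRIT * se, m + T_CRIT * se
    if lo <= TRUE_MEAN <= hi:  # does this interval cover the true mean?
        hits += 1

print(f"{hits} of 20 intervals contain the true mean")
```

On average 95% of such intervals cover the true mean, so any one run of 20 may show zero, one, two, or occasionally more misses.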
There are several things that affect the width of a confidence interval. The first is the confidence level, or CL: higher confidence levels create wider intervals. The more certain you have to be, the bigger the range you're going to give to cover your bases. Second is the standard deviation: larger standard deviations create wider intervals. If the thing you're studying is inherently variable, then of course your estimate of the range is going to be more variable as well. Finally, there's n, the sample size, and this one goes the other way: larger sample sizes create narrower intervals. The more observations you have, the more precise and more reliable things tend to be.

I can show you each of these graphically. Here we have a bunch of confidence intervals where I'm simply changing the confidence level, from 0.50 on the left side up to 0.999, and you can see the interval gets much wider as the level increases. Next are standard deviations: as the sample standard deviation increases from 1 to 16, the interval gets a lot wider. Then we have sample size going from just 2 up to 512, doubling at each step, and you can see how the interval gets more and more precise as we go.

So, to sum up our discussion of estimation: confidence intervals, which are the most common version of estimation, focus on the population parameter, and the variation in the data is explicitly included in that estimate. You can also argue that they're more informative, because they not only tell you which population values are likely, they give you a sense of the variability of the data itself. And that's one reason people argue that confidence intervals should nearly always be included in any statistical analysis.
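The three width effects above can be checked numerically. Using the normal-approximation width of a confidence interval, 2 · z · sd / √n (the z values 1.96 and 2.576 for 95% and 99% are the only assumed constants), a quick sketch:

```python
# Width of a normal-approximation confidence interval: 2 * z * sd / sqrt(n)
def ci_width(z: float, sd: float, n: int) -> float:
    return 2 * z * sd / n ** 0.5

# Higher confidence level (larger z) -> wider interval (99% vs 95%)
assert ci_width(2.576, 10, 25) > ci_width(1.96, 10, 25)
# Larger standard deviation -> wider interval
assert ci_width(1.96, 16, 25) > ci_width(1.96, 1, 25)
# Larger sample size -> narrower interval
assert ci_width(1.96, 10, 512) < ci_width(1.96, 10, 2)
# Width shrinks with sqrt(n): quadrupling n halves the width
assert abs(ci_width(1.96, 10, 100) - ci_width(1.96, 10, 25) / 2) < 1e-9
```

The square-root relationship in the last line is why doubling n at each step, as in the sample-size demonstration, narrows the interval by a constant factor of √2 each time rather than halving it.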