As we discuss statistics and data science, one of the really big topics is going to be inference, and I'll begin that with a general discussion of inferential statistics. But I'd like to start, unusually, with a joke. You may have seen this before. It says: there are two kinds of people in the world. One, those who can extrapolate from incomplete data. And that's the end, of course, because the other group is the people who can't. But let's talk about extrapolating from incomplete data, or inferring from incomplete data. The first thing you need to know is the difference between populations and samples. A population represents all of the data, every possible case in your group of interest. It might be everybody who is a commercial pilot, it might be whatever. But it represents every case in the group that you're interested in. And the thing about the population is it just is what it is. It has its values, it has its mean and standard deviation, and you are trying to figure out what those are, because you generally use those in doing your analysis. On the other hand, samples, instead of being all of the data, are just some of the data. And the trick is they're sampled with error. You sample one group and calculate the mean; it's not going to be the same if you do it a second time. And it's that variability in sampling that makes inference a little tricky. Now, with inference there are two very general approaches. There's testing, which is short for hypothesis testing, and maybe you've had some experience with this. This is where you assume that a null hypothesis of no effect is true, you get your data, and you calculate the probability of getting the sample data that you have if the null hypothesis is true. And if that probability is small, usually less than 5%, then you reject the null hypothesis, which says really nothing's happening, and you infer that there is a difference in the population. 
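To make that testing logic concrete, here is a minimal sketch, not an example from this lecture: testing whether a coin is fair after observing 60 heads in 100 flips, using an exact binomial tail probability as the p-value.

```python
from math import comb

def binomial_tail_p(n, k_observed, p_null=0.5):
    """P(X >= k_observed) when X ~ Binomial(n, p_null), i.e. the chance of a
    result at least this extreme if the null hypothesis is true."""
    return sum(
        comb(n, k) * p_null**k * (1 - p_null)**(n - k)
        for k in range(k_observed, n + 1)
    )

# Null hypothesis: the coin is fair (p = 0.5). Observed: 60 heads in 100 flips.
p_value = binomial_tail_p(100, 60)
print(f"p-value = {p_value:.4f}")  # about 0.028: less than 5%, so reject the null
```

The p-value comes out around 0.028, below the usual 5% cutoff, so under this logic you would reject the null hypothesis and infer the coin is biased.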
The other most common version is estimation, which is, for instance, characterizing confidence intervals. That's not the only version of estimation, but it's the most common. This is where you use sample data to estimate a population parameter value directly. So you use the sample mean to try to infer what the population mean is. You have to choose a confidence level, you have to calculate your values, and you get high and low bounds for your estimate that work with a certain level of confidence. Now, what makes both of these tricky is the basic concept of sampling error. I have a colleague who demonstrates this with colored M&Ms: what percentage are red? You get them out of the bags and you count. Now, let's talk about this with a population of numbers. I'm going to give you a hypothetical population of the numbers one through ten, and I'm going to sample from those numbers randomly with replacement. That means I pull a number out, it might be one, and I put it back, so I might get the one again. Sampling with replacement may sound a little weird, but it's really helpful for the mathematics behind inference. And here are the samples I got; I actually did this with software. First sample: three, one, five, and seven. Interestingly, that's almost all odd numbers. My second sample: four, four, three, six, and ten. So you see I got the four twice, and I didn't get the one, the two, the five, the seven, the eight, or the nine. Third sample: three ones, a ten, and a nine, so we're way out at the ends. And my fourth sample: three, nine, two, six, and five. All of these were drawn at random from the exact same population, but you see that the samples are very different. That's the sampling variability, or the sampling error, and that's what makes inference a little trickier. Let's say again why the sampling variability matters. 
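The lecture doesn't say which software was used, but the same demonstration is easy to sketch in Python: draw several samples with replacement from the numbers one through ten and watch the samples, and their means, vary.

```python
import random
import statistics

random.seed(42)  # fix the seed so the demonstration is reproducible

population = list(range(1, 11))  # the numbers 1 through 10

# Draw four samples of five values each. random.choices() samples
# WITH replacement, so the same number can appear more than once.
for i in range(4):
    sample = random.choices(population, k=5)
    print(f"sample {i + 1}: {sample}  mean = {statistics.mean(sample):.1f}")
```

Every sample comes from the identical population, yet each run of the loop prints different values and a different mean; that spread across samples is the sampling variability.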
It's because inferential methods like testing and estimation try to see past the random sampling variation to get a clear picture of the underlying population. So, in sum, let's say this about inferential statistics: you sample your data from the larger population, and as you try to interpret it, you have to adjust for error. There are a few different ways of doing that, and the most common approaches are testing, or hypothesis testing, and estimation of parameter values.
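The estimation approach summarized here can also be sketched briefly. This is a minimal illustration under stated assumptions, not a formula from the lecture: it uses a normal approximation (a t-based interval would be more accurate for small samples) and a made-up sample of ten values.

```python
import math
import statistics

def confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    mean = statistics.mean(sample)
    sem = statistics.stdev(sample) / math.sqrt(n)   # standard error of the mean
    z = statistics.NormalDist().inv_cdf((1 + confidence) / 2)  # ~1.96 for 95%
    return mean - z * sem, mean + z * sem

# A hypothetical sample drawn from some larger population
sample = [3, 1, 5, 7, 4, 6, 10, 2, 9, 5]
low, high = confidence_interval(sample)
print(f"sample mean = {statistics.mean(sample):.1f}, 95% CI = ({low:.2f}, {high:.2f})")
```

You choose the confidence level, calculate from the sample, and get low and high bounds: the interval is your estimate of where the population mean lies, adjusted for sampling error.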