Hi, I'm Mike Marin. Today we'll discuss the concept of a sampling distribution of the mean. In many fields of study we collect data. These data are used to calculate an estimate, which we use to understand our world. Usually we collect only one random sample and then use it to estimate something, say the mean of a much larger population. But what if we had collected another random sample? Wouldn't the mean be different? Wouldn't our estimate of the population mean change? So how can we know that our one sample can be trusted to represent a much larger population? And how can we know that our work as statisticians is precise? The concept of a sampling distribution helps us to understand how we can use one sample to make statements about the mean of an entire population. A sampling distribution is an important concept in the statistical sciences. It's foundational and it's crucial that you learn it. But what is it really? A sampling distribution is the probability distribution of a given statistic based on a random sample. It describes the what if of all the possible estimates we could have ended up with. Still not sure what I'm talking about? Stick with me. Today, to help us understand these concepts, we will be estimating the average length, or mean length, of fish that live in this lake. The thing we are interested in estimating today, which we will call a parameter, is the population mean. Our variable of interest is the length of a fish. We have our random sample of 25 fish from this lake here, and we will begin to measure the length of each individual fish. Now, in order to get the true population mean of all the fish in this lake, we'd have to measure every single fish. And that could be thousands, and we don't have that kind of time today. So today, I'm going to trust the skills of a special friend of mine. Neptune, can you please help us calculate the length of every single fish in this lake? That's close!
The important thing to learn here is that we must understand how samples behave when we know, well, everything, the truth. We can do this using probability theory. First, we're going to build up an understanding of how samples behave when we know the true values for the entire population. Then, we'll be able to learn how to use a sample to try and make statements about the entire population in the real world, when we don't actually know the truth, without help from Neptune. We do this using statistical inference. So, we now know the mean length and the standard deviation of the length. And this probability distribution shows the lengths of all fish in this lake, every single fish. So, we know that this population has a mean length of 40 centimeters and a standard deviation of 10 centimeters. Thanks for your help, Neptune! We now know what we can do when we have the power of Neptune at our disposal, but what if we don't? What if we're back to being regular statisticians? What if we can't count each and every fish? We will now collect a random sample of 25 fish, measure their lengths, and calculate the mean of the 25 lengths. We will call this our sample mean, or estimate of the population mean. It's important to remember that this sample mean is just one of many we could have gotten just by chance. We would have a different sample mean each time we collect data, because each set of fish is slightly different from the others. Look at this histogram. It's starting to show not 1, not 2, not 3, not 4, but all of the possible sample means we could have ended up with. Wouldn't it be great if we could see a histogram of all the possible sample means without having to collect sets of 25 fish over and over again forever? What would happen if we ended up with a different sample of 25 fish? You'll be pleased to hear that there is a way to know approximately what the shape of this distribution would look like without repeating the sampling process thousands of times.
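For anyone who wants to try the "collect sets of 25 fish over and over" idea on a computer, here is a minimal sketch in Python. The mean of 40 cm, the standard deviation of 10 cm, and the sample size of 25 come from the video. The normal shape of the fish lengths, the number of repeated samples, and all function names are assumptions made only for this illustration.

```python
import random
import statistics

POP_MEAN = 40.0   # cm, population mean length stated in the video
POP_SD = 10.0     # cm, population standard deviation stated in the video
SAMPLE_SIZE = 25  # fish per sample, as in the video


def draw_sample_mean(rng, n=SAMPLE_SIZE):
    """Return the mean length of one random sample of n fish.

    Assumption: individual lengths follow a normal distribution.
    """
    lengths = [rng.gauss(POP_MEAN, POP_SD) for _ in range(n)]
    return statistics.fmean(lengths)


def simulate_sample_means(num_samples=10_000, seed=1):
    """Repeat the sampling process num_samples times (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [draw_sample_mean(rng) for _ in range(num_samples)]


sample_means = simulate_sample_means()

# Each sample gives a slightly different estimate of the population mean.
print("first five sample means:", [round(m, 1) for m in sample_means[:5]])
print("average of all sample means:", round(statistics.fmean(sample_means), 2))
print("spread (sd) of sample means:", round(statistics.stdev(sample_means), 2))
```

A histogram of `sample_means` is a simulated stand-in for the histogram of all possible sample means described above. It only approximates that distribution, because it uses a finite number of repeated samples.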
Thanks to Neptune, we now know that the true mean length of fish in the lake is 40 cm and the standard deviation is 10 cm. And we know that the distribution of the lengths of all fish looks like this. Then what's going to happen is we're going to go out with our fishermen and we're going to take a random sample of 25 fish from this population. And when we take that sample of 25 fish, we expect our sample mean to be equal to 40 cm, the true population mean, although we know that it's not going to be exactly equal to 40 due to what we call sampling variability. Now imagine all the thousands of random samples we could have taken. Each mean deviates slightly because not all samples have a mean that equals Neptune's true mean of 40. The standard deviation of all possible means is called the standard error. And we can work that out. The standard error is the standard deviation of the lengths of individual fish divided by the square root of the sample size. That is 10 cm divided by the square root of 25, so our standard error is 2 cm. And this tells us, typically, how far the mean length of a sample of 25 fish deviates from the true mean. So typically, we're going to end up with a sample mean about 2 cm away from the true mean of 40. Even without Neptune, we have an idea of how close our estimate will be to the true value. The sampling distribution of the mean will be approximately normally distributed under certain conditions. In fact, an old rule of thumb tells us that approximately 95% of the sample means we could get will stay within about 2 standard errors of the true mean, and be between 36 and 44 cm. That's most of them. Not bad for a simple guy with only a paper cutout of a fisherman to help him out. Now, check this out. The sampling distribution, the distribution of all the possible means we could get, is approximately normal, or bell-shaped. OK, so we think of the sample mean that we're going to get as one of many possible sample means we could get. And we can think of pulling it out of this distribution here.
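The standard error arithmetic and the 95% rule of thumb can be checked with a few lines of Python. This is a sketch under the assumption that the sampling distribution is exactly normal, which the video describes as only approximately true. The variable names are chosen for this example.

```python
import math
from statistics import NormalDist

pop_mean = 40.0  # cm, true mean from the video
pop_sd = 10.0    # cm, true standard deviation from the video
n = 25           # sample size from the video

# Standard error = population standard deviation / sqrt(sample size)
standard_error = pop_sd / math.sqrt(n)

# Rule of thumb: about 95% of sample means fall within 2 standard errors.
lower = pop_mean - 2 * standard_error
upper = pop_mean + 2 * standard_error

# Assumption: treat the sampling distribution as exactly normal
# with mean pop_mean and standard deviation standard_error.
sampling_dist = NormalDist(mu=pop_mean, sigma=standard_error)
share_within = sampling_dist.cdf(upper) - sampling_dist.cdf(lower)

print(f"standard error: {standard_error} cm")
print(f"range: {lower} to {upper} cm")
print(f"share of sample means in that range: {share_within:.4f}")
```

Under that normal assumption the share comes out slightly above 95%, which is why the rule of thumb says "about 2" standard errors rather than exactly 2.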
Even if this here, the population of all individual fish lengths, is not normally distributed, like in a different lake with an unusual proportion of large sea creatures, this here, the set of all possible means, would still be approximately normal and bell-shaped, as long as our sample size is large. Now that's pretty cool. Now, you may be thinking to yourself, why is Mike spending so much time pretending and talking about what ifs today? Well, the reason is that in everyday life, we generally don't know the true values for an entire population, and we must use a sample to try and make statements about the population we're sampling from. Learning how samples behave when we do know the true values for the entire population allows us to build the theory necessary to make statements about a population when we do not know the true values for the population. In the videos to follow, we will discuss how the sampling distribution of the mean can be used to help us make statements about a population, as we learn how to construct a confidence interval and test claims using hypothesis tests. Don't forget to check out the statistics visualizations that accompany this video. You can find the link in the description. Thanks for watching!
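The claim about a lake whose fish lengths are not normally distributed can also be explored with a small simulation. The sketch below is only an illustration: the right-skewed (exponential) population with a mean of 40 cm, the sample size of 100, the number of repeated samples, and the helper names are all assumptions, not values from the video.

```python
import random
import statistics


def skewness(values):
    """Simple skewness measure: about 0 for a symmetric, bell-shaped set."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])


rng = random.Random(7)   # fixed seed so the run is repeatable
MEAN_LENGTH = 40.0       # cm, hypothetical mean for the skewed lake
SAMPLE_SIZE = 100        # a "large" sample, chosen for illustration
NUM_SAMPLES = 5_000      # number of repeated samples


def skewed_length():
    """One fish length from a right-skewed (exponential) population."""
    return rng.expovariate(1 / MEAN_LENGTH)


# Individual fish lengths: strongly skewed, not bell-shaped.
individual_lengths = [skewed_length() for _ in range(NUM_SAMPLES)]

# Means of repeated samples: much closer to symmetric.
sample_means = [
    statistics.fmean([skewed_length() for _ in range(SAMPLE_SIZE)])
    for _ in range(NUM_SAMPLES)
]

skew_individual = skewness(individual_lengths)
skew_means = skewness(sample_means)
sd_means = statistics.stdev(sample_means)

print("skewness of individual lengths:", round(skew_individual, 2))
print("skewness of sample means:", round(skew_means, 2))
print("sd of sample means:", round(sd_means, 2))
```

In this setup the individual lengths are heavily skewed, while the sample means are far closer to symmetric. Their spread is close to the population standard deviation divided by the square root of the sample size, which is 40 / 10 = 4 cm for this hypothetical lake.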