This is a video about the sampling distribution of a statistic. First of all, what's a statistic? Well, suppose we've taken a random sample. For example, we may be interested in the distribution of children in families, and so we have a population of all families, and we take a sample of a few families and look at how many children there are. A statistic is anything we can calculate using only the numbers in the sample. So, for example, if we add together all the numbers in the sample, then that's a statistic. Or if we look at the maximum number in the sample, then that's also a statistic. If we subtract two from each of the numbers and then add up all the answers, we get a statistic. And if we look at the sample mean, the mean of the numbers in the sample, which you get by adding them all together and dividing by how many there are, you get a statistic. And if we look at the sample variance, which is defined like this, you get a statistic. So a statistic is anything you can calculate using only the numbers in the sample.
Okay, this raises the question, what isn't a statistic? Well, the population mean isn't a statistic, and nor is the population variance, because these are parameters of the overall population, which you can't know just by looking at the numbers in a random sample. Likewise, calculations that involve the mean and the standard deviation of the population aren't statistics. So if you take the sample mean, subtract the population mean and divide by the population standard deviation, that isn't a statistic. And in general, any calculation which you can only do knowing parameters of the population, like mu and sigma, isn't a statistic. So none of the things on this page is a statistic. In general, a statistic is something you can calculate using only the numbers in a random sample. If a calculation requires you to know population parameters, like mu and sigma, then it can't be a statistic.
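To make the definition concrete, here is a minimal Python sketch that computes each of the statistics just mentioned from one sample. The sample values are made up for illustration, and the sample variance uses the n - 1 definition, which is an assumption, since the formula shown in the video isn't reproduced in this transcript.

```python
import statistics

# Hypothetical sample: the number of children in each of six sampled families.
sample = [2, 0, 3, 1, 2, 4]

total = sum(sample)                         # add together all the numbers
largest = max(sample)                       # the maximum number in the sample
shifted_total = sum(x - 2 for x in sample)  # subtract 2 from each, then add up
sample_mean = statistics.mean(sample)       # total divided by how many there are
# Assumption: the n - 1 definition of the sample variance.
sample_variance = statistics.variance(sample)

print(total, largest, shifted_total, sample_mean, sample_variance)

# By contrast, (sample_mean - mu) / sigma cannot be computed here:
# mu and sigma are population parameters, not numbers in the sample.
```

Every quantity above is computed from `sample` alone, which is exactly what makes it a statistic.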
Okay, in this context, it's important to understand the difference between things like the population mean and the sample mean. So the population mean and the population variance are parameters of the overall population. They're facts. The sample mean and the sample variance are numbers that you obtain by doing a calculation with the numbers in a sample. And they're random, because their value will depend on the particular sample that you choose. In general, the sample mean and the sample variance will be close to the population mean and the population variance, and they can be used to estimate them. But almost all of the time, they'll be slightly different. So the population mean and variance are facts, and the sample mean and sample variance are random variables, which can be used to approximate the population mean and population variance.
Okay, now the main point of this video is to work out some sampling distributions. And first of all, you need to know what those are. A sampling distribution is simply the probability distribution of a statistic. So let's move on straight away and look at some examples.
My first example is to do with Roman coins. And you may know that there were all sorts of Roman coins, but I'm just going to concentrate on two types. First of all, the denarius, which was a silver coin, and secondly the sestertius, which was a brass coin. Let's imagine that a hoard of Roman coins is found containing only denarii and sestertii in the ratio 1 to 2. And you also need to know that if a sestertius is worth one unit, then a denarius is worth four units. First of all, let's find the population mean and population variance of the value of the coins. Well, in order to do this, we need to know the probability distribution for the population. And the population consists entirely of the numbers 1 and 4, because 1 is the value of the sestertius and 4 is the value of the denarius.
And we know that two-thirds of the coins are the sestertii, the lower-valued coins, so the probability of getting a 1 is two-thirds, and the probability of getting a 4 is one-third. Now the population mean is defined as the expected value of a coin. And we work that out by multiplying each possible value by the associated probability and adding up the answers. So the expected value of the coin will be 1 times two-thirds plus 4 times a third, which gives us the answer 2, so that's the population mean. The population variance, Var(X), is equal to the expected value of the squared value of the coin, take away the square of the mean. So that's going to be equal to the squares of the possible values times the probabilities, take away the square of the mean, which is 1 squared times two-thirds plus 4 squared times a third, take away 2 squared. That's 6 take away 4, and that also happens to give us the answer 2. So in this case, both the population mean and the population variance are equal to 2.
Okay, now let's move on and find the sampling distributions of some statistics. Suppose that three coins are chosen from the hoard. We'll need to work out all the possible samples. So one possibility is that we get three sestertii, which we can write 1, 1, 1. The next possibility is that we get two sestertii, and then afterwards a denarius, and we can write that 1, 1, 4. And we can carry on listing the possibilities, bearing in mind that we should write them in a systematic order to make sure that we don't miss any out. So the next possibility is that first we get a sestertius, then a denarius, and thirdly a sestertius, which is 1, 4, 1, and so on: 1, 4, 4; 4, 1, 1; 4, 1, 4; 4, 4, 1; and 4, 4, 4.
Okay, having done this, we can find the sampling distribution for the mean value of the coins. That's the sampling distribution for a sample mean. Okay, well here are the possible samples. And let's divide those into categories.
The first category will have one mean, the next category will have a different mean, and so on. In fact, the sample mean for the first category will be 1 plus 1 plus 1, all over 3, which is obviously 1. The sample mean for the next category will be 1 plus 1 plus 4, all over 3, which is 2. For the third category, it's 1 plus 4 plus 4, all over 3, which is 3. And finally, when you get three denarii in a row, the sample mean will be 4 plus 4 plus 4, all over 3, which is obviously 4.
Okay, now we know the possible values for our sample mean. We need to know the probability that we get each of those values. So first of all, what's the probability that we get a sample mean of 1? Well, that happens if we get three sestertii in a row. So the probability will be two-thirds cubed, and that's 8 over 27. What about the probability that we get a sample mean of 2? Well, that's more complicated. First of all, there are three ways of getting a sample mean of 2, as you can see. And the probability of getting it in each way is going to be the square of two-thirds times one-third. And that's because we need to get two sestertii and one denarius. So the probability of getting a sample mean of 2 is 3 times the square of two-thirds times one-third, which is 12 over 27. We can find the probability of getting a sample mean of 3 in a similar way. That's going to be 3 times two-thirds times the square of one-third, because again, there are three ways of getting it. We need to get one sestertius and two denarii. And that probability is equal to 6 over 27. Finally, the probability of getting a sample mean of 4 is the same as the probability of getting three denarii in a row, which is the cube of one-third, which is 1 over 27.
Okay, so now we've found all the different possible values of the sample mean, and we also know the probability of obtaining each value. So that means we can write down the probability distribution. Here it is. It shows the possible values of the sample mean and the probability of obtaining each possible value.
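The same table can be reproduced by brute force. The sketch below lists every ordered sample of three coin values, multiplies the probabilities, and groups the samples by their mean. The function name `sampling_distribution` is just an illustrative choice, and the code assumes the three draws are independent, that is, the hoard is large enough that taking a coin out doesn't change the proportions.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Population from the coin example: value 1 with probability 2/3,
# value 4 with probability 1/3.
pmf = {1: Fraction(2, 3), 4: Fraction(1, 3)}

def sampling_distribution(statistic, pmf, n):
    """Exact distribution of `statistic` over all ordered samples of size n.

    Assumes the n draws are independent and identically distributed.
    """
    dist = defaultdict(Fraction)  # Fraction() is 0, so missing keys start at 0
    for sample in product(pmf, repeat=n):
        prob = Fraction(1)
        for value in sample:
            prob *= pmf[value]
        dist[statistic(sample)] += prob
    return dict(dist)

def sample_mean(sample):
    return Fraction(sum(sample), len(sample))

mean_dist = sampling_distribution(sample_mean, pmf, 3)
for value in sorted(mean_dist):
    print(value, mean_dist[value])
```

Python prints fractions in lowest terms, so 12 over 27 appears as 4/9 and 6 over 27 as 2/9, but the probabilities agree with the ones worked out by hand.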
And this is, in fact, the sampling distribution for the sample mean. So that's the answer to the question.
Okay, now let's work out the sampling distribution for the median value of the coins. We need to look at the possible samples again. In the first category, the median is 1. That's obvious, because all the values are 1. In the second category, the median is also 1, because if you take any of those samples and write them in order, you always get 1, 1, 4. And so the middle value is 1. In the third category, the median is 4, because this time if you write them in order, you get 1, 4, 4, and the middle value is 4. And obviously in the final category, the median is 4. So now we can see that there are two possible values of the median, 1 and 4. Let's work out the probability of each value. Well, the probability that the median is equal to 1 will be the sum of the probabilities we worked out earlier. The probability for the first category is 8 over 27, and the probability for the second category is 12 over 27. So the total probability, i.e. the probability of getting a median of 1, will be 20 over 27. In the same way, the probability that the median is 4 is going to be 6 over 27 plus 1 over 27, which is 7 over 27.
Okay, so now we know the possible values of the median, and we also know the probability of getting each value. So again, we can write these in a table, showing the possible values of our random variable, and the probability that it has each value. And this is the sampling distribution for the median.
Okay, let's look at one more example quickly, and this is going to have to do with travel by Eurostar. Suppose that 80% of travellers between London and Paris make their journey by Eurostar, and we look at a random sample of 10 people who've travelled between London and Paris this year. Let's define the random variables x1, x2, and so on up to x10, i.e. one random variable for each passenger, as follows.
Let's say that each xi is equal to 1 if the i-th person made their last journey by Eurostar, and zero otherwise. So what we've got here is 10 random variables, and each one is equal to 1 if the person travelled by Eurostar, and zero if they didn't. Well, the question is, what's the sampling distribution of the sum of these random variables? Well, this looks very complicated, but in fact it isn't, it's a bit of a trick. If you think about it, what you get by adding up these random variables is simply the number of people who travelled by Eurostar. And so what we've got here is a binomially distributed random variable, because we're looking at the number of successes in a sequence of trials, the trials being the people, and success being that they travelled by Eurostar. So the sampling distribution here is the binomial distribution with parameters 10 and 0.8: 10 because that's the number of trials, the number of people we're interested in, and 0.8 because that's the probability of success, success being that they travelled by Eurostar. Now, do remember this, because you see questions like this fairly often.
Okay, well now we know the sampling distribution, we can ask questions about probabilities. For example, we can ask the probability that the sum of our random variables is greater than or equal to 9. Well, we can do this by adding together the probabilities that it's 9 and that it's 10. And the probability that it's equal to 9, according to the binomial formula, is 10 choose 9 times 0.8 to the power of 9 times 0.2 to the power of 1, which is equal to 10 times 0.8 to the power of 9 times 0.2. And the probability that it's equal to 10 would be 10 choose 10 times 0.8 to the power of 10 times 0.2 to the power of 0, which is simply 0.8 to the power of 10. So the probability that the sum is greater than or equal to 9 will be 10 times 0.8 to the power of 9 times 0.2 plus 0.8 to the power of 10, which turns out to be 0.376 to three significant figures.
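If you want to check that arithmetic, a short Python sketch of the binomial formula will do it. The helper name `binomial_pmf` is just an illustrative choice; `math.comb` gives the "n choose k" term.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.8  # 10 travellers, each using Eurostar with probability 0.8

# P(sum >= 9) = P(sum = 9) + P(sum = 10)
prob_at_least_9 = binomial_pmf(9, n, p) + binomial_pmf(10, n, p)
print(round(prob_at_least_9, 3))  # prints 0.376
```

The same function gives the probability of any other value of the sum, so the whole sampling distribution can be tabulated by looping k from 0 to 10.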
Okay, that's nearly the end of my video on the sampling distribution of a statistic. What you have to be very careful to understand is that on the one hand you have a population with facts like the mean and the standard deviation. And on the other, when you take a random sample of numbers from the population, you can do a variety of calculations with these numbers to produce all sorts of statistics. So statistics are randomly produced. They're random variables, which depend on the particular values that happen to be in your sample. And because they're random, they will have a probability distribution associated with them. The exact probability distribution that you get will depend on the calculation that you do with the numbers in your random sample, and also on the distribution of numbers in the original population. So each statistic will have its own probability distribution, depending both on the nature of the statistic and the underlying population. The probability distribution is called the sampling distribution, and there's lots of fantastic maths to do in working out what the sampling distribution of any particular statistic is. Okay, that is the end of my video. Thank you very much for watching. I hope you found it useful.