Suppose each member of a population either has some feature (type A) or doesn't (everyone else). Taking a sample of size N and determining how many are type A is a hypergeometric experiment. But if the population is large enough, it's approximately binomial, and if the sample is large enough, it can be approximated by a normal distribution.

For example, suppose that 15 million people support Proposition A, while 14 million do not. Let's model the number of voters, in a sample of N equals 100, who support Proposition A. Now, the empirical probability that the first voter chosen supports Proposition A is 15 million out of 29 million, or about 0.5172. This probability will change a little bit, because if we draw a supporter of A, then the number of supporters of A drops to 14,999,999, and the total number of voters also drops. But it doesn't change much over our N equals 100 voters, so it's approximately binomial, and we can approximate this using a normal distribution.

So we compute our mean and standard deviation, and this is a borderline case: our standard deviation is just a little bit under 5, but we'll go with it. And so we have a normal distribution with mean 51.72 and standard deviation 4.997, and so 95% of the time the number of supporters of A will be within 1.96 standard deviations of the mean, or between about 41.93 and 61.52.

That's the probability side. What about the statistics side? Suppose our sample of 100 voters included x supporters of A. Could we use the same interval to give a 95% confidence interval for the fraction of voters who support A? And the answer is no. Here's the problem we run into.

Suppose you poll 100 persons and find 61 of them support Proposition A. Let's try to find a 95% confidence interval for the number of respondents that you would expect to support Proposition A. So if we assume our proportion is 61 out of 100, then the number of respondents who support Proposition A should be approximately normal with mean 61 and standard deviation equal to the square root of 100 times 0.61 times 0.39, or about 4.877.
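Both normal models used so far, the one built from the known 15 million to 14 million split and the one built from the observed 61 out of 100, come from the same two binomial formulas: the mean is n times p, and the standard deviation is the square root of n times p times 1 minus p. Here is a minimal sketch of that calculation. The function names `normal_approx` and `central_range` are made up for illustration and are not from the lecture.

```python
import math

def normal_approx(n, p):
    """Mean and standard deviation of the normal approximation to Binomial(n, p)."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return mean, sd

def central_range(mean, sd, c=1.96):
    """Values within c standard deviations of the mean (c = 1.96 covers about 95%)."""
    return mean - c * sd, mean + c * sd

# Probability side: the population split is known (15 million of 29 million).
p_known = 15_000_000 / 29_000_000
mean_known, sd_known = normal_approx(100, p_known)
low_known, high_known = central_range(mean_known, sd_known)

# Statistics side: only the sample is known, so 61/100 stands in for p.
mean_obs, sd_obs = normal_approx(100, 61 / 100)

print(f"known split: mean {mean_known:.2f}, sd {sd_known:.3f}, "
      f"95% range {low_known:.2f} to {high_known:.2f}")
print(f"observed 61/100: mean {mean_obs:.2f}, sd {sd_obs:.3f}")
```

The only difference between the two calls is where p comes from: a known population in the first case, the sample itself in the second.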
And our 95% confidence interval will be everything within 1.96 standard deviations. Now we could claim the percentage of supporters of A is between 51.44% and 70.56%, but we'd be wrong. Maybe.

So here's where the problem arises. Our data seems to support that up to 70% of the population supports Proposition A. But if 70% of the population supported A, the number of respondents who support A could be approximated using a normal distribution with mean 70 and standard deviation equal to the square root of 100 times 0.70 times 0.30, or about 4.58. And in that case, 95% of the time we'd see between about 61.02 and 78.98 supporters, and our observed value of 61 falls just outside that range. In other words, if the claim we were making were true, we wouldn't see the data we're getting.

Now we should take into account the shifting of the confidence interval. But in practice, we ignore it and use the observed empirical probability as the probability of success. And since we're going to use the empirical probability as our probability, we can report our result in terms of that value. This is the population proportion, but we often just call it the percentage of respondents.

So here's how that works. Suppose 61 of 100 respondents said they supported Proposition A. Find a 95% confidence interval for the population proportion of supporters of A. We already determined we'd expect to see between 51.44 and 70.56 persons who supported A. And since we polled 100 persons, we'd expect to see between 0.5144 and 0.7056 of the population supporting A.

So let's put this all together. Suppose we poll n persons and find x of them support Proposition A. The confidence interval for the number will be x plus or minus some multiple c of the standard deviation, the square root of n times p times 1 minus p, where we use for our empirical probability p the number who supported A divided by the sample size, x over n. The confidence interval for the proportion can be found by dividing by the sample size, which gives us x over n, plus or minus c times the square root of p times 1 minus p, over n. And if we wanted to, we can turn this into a result for the standard deviation for a proportion: the square root of p times 1 minus p, over n. But remember, don't memorize formulas, understand concepts.
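The objection raised above, that a population with 70% support would rarely produce 61 supporters out of 100, can be checked with the same normal approximation. This is only a sketch: the helper name `central_range` is hypothetical, and the check uses 70% as the claimed proportion, as in the discussion above.

```python
import math

def central_range(n, p, c=1.96):
    """Counts within c standard deviations of the mean, using the
    normal approximation to Binomial(n, p)."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return mean - c * sd, mean + c * sd

n, observed = 100, 61

# Interval built from the observed proportion 61/100.
ci_low, ci_high = central_range(n, observed / n)

# Counts we would expect 95% of the time if the true proportion were 70%.
low_70, high_70 = central_range(n, 0.70)

# Is the observed count a typical outcome under the 70% claim?
observed_is_typical = low_70 <= observed <= high_70

print(f"interval from the data: {ci_low:.2f} to {ci_high:.2f}")
print(f"if p = 0.70: {low_70:.2f} to {high_70:.2f}; "
      f"61 inside? {observed_is_typical}")
```

The interval moves because both the mean and the standard deviation depend on p. That is the shifting the lecture mentions and then, following common practice, ignores.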
So you're better off ignoring the formula. For example, suppose a poll of 8500 voters is taken to determine whether they would vote for an abusive misogynist endorsed by a con artist, and 4087 said they would. Let's find a 95% confidence interval for the fraction of voters who would vote for an abuser endorsed by a con artist.

We can model this using a binomial distribution where our probability of success is our empirical probability, 4087 out of 8500, and the number of trials is 8500. And we can model that using a normal distribution with mean 8500 times 4087 over 8500, which is 4087, and standard deviation equal to the square root of 8500 times 4087 over 8500 times 4413 over 8500, which is about 46. And so the number of supporters out of 8500 can be modeled using a normal distribution with mean 4087 and standard deviation 46. Now our 95% confidence interval will be all values within plus or minus 1.96 standard deviations, or between about 3996.84 and 4177.16. And since we polled 8500 persons, the confidence interval for the proportion would be from 3996.84 over 8500 to 4177.16 over 8500, and as a proportion we would probably express this as a percentage, say between 47.02% and 49.15%.

Alternatively, since our confidence interval is everything within plus or minus 1.96 standard deviations of the mean, the number of supporters is going to be in the interval 4087 plus or minus 90.16. So our proportion is going to be 4087 over 8500, plus or minus 90.16 over 8500. And again, we can convert this into a percentage. We could say this is 48.1% plus or minus 1.1%, where we've rounded our values to the nearest tenth of a percent.

Or we can use a formula. The sample proportion is 4087 over 8500, or 48.08%. We compute the standard deviation of the proportion from the formula, the square root of 0.4808 times 0.5192 over 8500, which is about 0.0054, and the 95% confidence interval will be plus or minus 1.96 standard deviations. And this gives us our 47% to 49.1%. And again, since 1.96 standard deviations is about 0.011, we might report our result as 48.1% with a 1.1% margin of error.
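The steps in this example (model the count, take 1.96 standard deviations either side, then divide by the sample size) can be sketched in a few lines. This is one way to organize the arithmetic, not the lecture's own code, and the function name `proportion_interval` is invented for illustration.

```python
import math

def proportion_interval(x, n, c=1.96):
    """Confidence interval for a proportion from x successes in n trials.

    Uses the observed proportion x/n as the probability of success and the
    normal approximation to the binomial, as described in the text.
    """
    p_hat = x / n                                   # empirical probability
    sd_count = math.sqrt(n * p_hat * (1 - p_hat))   # sd of the number of successes
    margin = c * sd_count / n                       # convert count margin to proportion
    return p_hat, sd_count, margin, (p_hat - margin, p_hat + margin)

p_hat, sd_count, margin, (low, high) = proportion_interval(4087, 8500)

print(f"sd of the count: {sd_count:.2f}")
print(f"proportion: {p_hat:.1%} plus or minus {margin:.1%}")
print(f"interval: {low:.2%} to {high:.2%}")
```

Dividing the count margin by n is the same as using the proportion's standard deviation directly, so either route in the example should lead to the same reported margin of error, up to rounding.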