Hi there, and welcome to module six, where we will discuss confidence intervals. A confidence interval is an interval, with a lower limit and an upper limit, that estimates the value of a population mean or a population proportion. The lower limit and upper limit together create the interval, and the hope is that the true value of the mean or the proportion is located in that interval.

This is our first dive into what is called inferential statistics, which is basically using sample data to make an estimate about a population parameter. In this course we are specifically going to look at the population proportion and the population mean. Those are the two parameters we're concerned with. So in this module we're going to use sample statistics to make an interval estimating the value of a population parameter.

Our first stop will be estimating a population proportion, that is, the proportion or percentage of a population that has a specific characteristic. To start, a point estimate is a single value, or point, used to approximate a population proportion. It's basically a sample proportion. The sample proportion is our point estimate, and it's the starting point for building our interval: we start with that value, then build the interval around it. The notation for the sample proportion is p hat. Some books use the notation p prime, but p hat is the main notation used in this course, and it is the best point estimate for the population proportion p. So we use p hat to predict p.

A research firm conducted a survey of 1007 adults and found that 85% of them know what Twitter is. What is the best point estimate? The best point estimate is your sample proportion, p hat. What is p hat in this case?
Out of your sample, what proportion of adults know what Twitter is? Well, 85%, so we'll write 0.85. That's all they're asking for. That's the starting point for the interval that will be used to estimate the true population proportion.

What about a survey of those in favor of voting for a certain candidate in an upcoming election? It showed that 523 out of 839 people were voting for candidate A. What is the best point estimate in this case? We're finding our sample proportion here: what proportion of adults in the sample were in favor of voting for candidate A? They don't outright give you the percentage or the proportion; you have to make it. That's 523 people out of 839, which is about 0.62. That would be our point estimate, our sample proportion, in this case.

So the confidence interval that we're making starts with a sample proportion, a point estimate. A confidence interval is a range, or interval, of values used to estimate the true value of a population parameter, as I previously mentioned. The confidence interval is sometimes abbreviated with the letters CI. Every confidence interval has both a significance level and a confidence level. The significance level, which is denoted by the Greek letter alpha, is the chance of the population parameter not lying in the confidence interval. The confidence level is the probability, 1 minus alpha, that the confidence interval actually does contain the population parameter. For instance, you could have a 95% confidence level. That means there's a 95% chance that the population proportion is contained in that interval, and there's a 1 minus 0.95, or 5%, chance that the interval does not contain the population parameter. It's important to note that alpha, your significance level, is equal to 1 minus the confidence level.
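The two calculations so far, the point estimate and the significance level, can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function names are hypothetical and not part of the course materials.

```python
def point_estimate(successes, n):
    """Sample proportion p hat: the count with the characteristic over the sample size."""
    return successes / n


def significance_level(confidence_level):
    """Alpha is 1 minus the confidence level (both written as decimals)."""
    return 1 - confidence_level


# Candidate A survey from the lecture: 523 out of 839 people.
p_hat = point_estimate(523, 839)
print(round(p_hat, 2))  # 0.62

# A 95% confidence level leaves alpha = 0.05.
alpha = significance_level(0.95)
print(round(alpha, 2))  # 0.05
```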
A confidence interval can be expressed in a couple of forms. You could write your proportion p in the middle, with the lower bound of your interval, 0.828, on the left and the upper bound, 0.872, on the right, and less-than signs separating the proportion from the numbers: 0.828 is less than p is less than 0.872. This is called inequality notation, or tri-linear inequality form. Or you could write your interval as the lower bound and upper bound separated by a comma and enclosed within parentheses, (0.828, 0.872). That's interval notation. So there are two different forms in which we can write our intervals. An example of a written interpretation of the confidence interval displayed on the screen is: with 95% confidence, the population proportion is between 0.828 and 0.872. That's how you interpret in words a confidence interval that you find.

I already said that we start developing a confidence interval with a sample statistic called a point estimate. Because there's error involved whenever you look at a sample, we have to create a margin of error, an error bound, around that sample statistic. The first step is to find something called a critical value. It is a z score, meaning it's a value along the x-axis of the standard normal bell curve, that can be used to distinguish between sample statistics that are likely and unlikely to occur. The z score separating the right tail region is commonly denoted by z sub alpha over 2. So whenever you see the critical value notation z sub alpha over 2, which is used to help calculate the error of the confidence interval we're developing, it is read as the z score whose area to the right is alpha over 2. Confidence intervals have both a positive and a negative critical value. We basically just focus on the positive critical value when we do our calculation.
Think about it this way. If you have a confidence level of 95 percent, that means you're trying to center over the middle 95 percent of your mean values, or your proportion values in this case. That leaves five percent to divide between the two tails. So alpha is five percent, and alpha over two, or five over two, is two and a half percent. It really helps to draw the picture here. If you're finding a 95 percent confidence interval, you have this outer right-hand region that has two and a half percent, and you have this left-hand region that also has two and a half percent. Our goal is to find the data value whose area to the right is alpha over two. We'll use technology to find this data value along the x-axis, and the picture will help you find it. We'll actually use Google Sheets to find this critical value.

So the first stop here is to find the positive critical value that corresponds to a 92 percent confidence level. Remember what I said: the key is to always draw a picture. So I'm going to draw my bell curve. We're focusing on the middle 92 percent, and I want to find the positive critical value. That's the right-hand cutoff value that separates the right tail from the middle 92 percent of the possible proportion values. If I have 92 percent, what does that leave for the two tails? It leaves eight, and that's your alpha. So alpha over two, or eight over two, leaves you with four percent. My two tails each have an area of four percent. The picture really puts it in perspective. You've got your middle 92 percent, which is what you're focusing on for your confidence interval, and then you have four percent left on either side. There's a strong symmetry here. So to find your critical value we'll do just what we did in the previous module.
We'll go to Google Sheets and type in a mean of zero and a sigma of one, because we're putting everything in terms of a standard normal distribution. Then we need the area to the left, because to find a value along the x-axis of the bell curve you use the area to the left. What is the area to the left? Well, 92 plus 4 is 96 percent, also known as 0.96. So these are the three numbers we need for Google Sheets: zero, one, and 0.96. We'll go to the compute tab. We're just looking for the critical value, so in the normal region we type zero, we type one, and then for the left tail area we type 0.96. You only need to type in three things, and you'll see you get 1.75. So 1.75 is the positive critical value.

Let's do that again. Let's now find the positive critical value associated with an 85 percent confidence level. We have our bell curve. What percentage is going to be in the middle of the bell curve? That's 85 percent. What's left for our two tails, our alpha? The remaining percentage is 15. What's alpha over two? 15 over two is 7.5 percent left over for each of the tails. So we're looking for the positive critical value, the value whose area to the right is 7.5 percent. But remember, to find the critical value in Google Sheets you need to type the following information in on the compute tab: mu is zero, sigma is one, and then your area to the left. What is your area to the left in this case? The area to the left of the value you're finding is 85 plus 7.5, so that's 92.5 percent, and as a decimal that's 0.925. That's going to tell me what z sub alpha over two is. In Google Sheets we just need to type the area to the left as 0.925, since zero and one are already in place for us. To two decimal places you get 1.44, so 1.44 is the critical value there.
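If you want to check these critical values outside of Google Sheets, the same three inputs (mu of zero, sigma of one, and the area to the left) can be fed to the inverse normal function in Python's standard library. This is a minimal sketch, not the course's spreadsheet; the function name `positive_critical_value` is made up for illustration.

```python
from statistics import NormalDist


def positive_critical_value(confidence_level):
    """Return z sub alpha over 2 for a confidence level given as a decimal."""
    alpha = 1 - confidence_level
    # Area to the left of the cutoff = middle area plus the left tail.
    area_to_left = confidence_level + alpha / 2
    # Standard normal distribution: mu = 0, sigma = 1.
    return NormalDist(mu=0, sigma=1).inv_cdf(area_to_left)


# 92% confidence level: area to the left is 0.96.
print(round(positive_critical_value(0.92), 2))  # 1.75

# 85% confidence level: area to the left is 0.925.
print(round(positive_critical_value(0.85), 2))  # 1.44
```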
So this is how you actually find the critical value. If they ever ask you to find the critical value for a confidence interval, please draw your picture, put your confidence level in the middle, and then find the remaining percentages that go into the two tails. Then use the Google Sheets document, where mu is zero and sigma is one, and type in the area to the left. I like to use pictures rather than fancy formulas.

The good news is that most of the time in this course you'll have confidence levels of 90 percent, 95 percent, and 99 percent, which means the corresponding significance levels, alpha, will be 0.10, 0.05, and 0.01. You don't have to draw the picture every time to find the critical value associated with each of these confidence levels. For a 90 percent confidence level the critical value is 1.645, for a 95 percent confidence level, which is very common, the critical value is 1.96, and for a 99 percent confidence level it is 2.575. Write these common critical values down, because they can definitely save you time.

So we start off our confidence interval creation with a point estimate, which is the sample proportion in this case, and then we have to find what is called the error bound, or the margin of error, because there's always some sort of error involved whenever we're estimating. Sometimes it's denoted by EBP, or just with the capital letter E, and it's the maximum likely difference between the observed sample proportion p hat and the true value of the population proportion p. The error bound is the critical value times the square root of p hat times q hat over the sample size n, and it's important to note that q hat is one minus p hat. That is the error bound formula. Fortunately or unfortunately, if you're ever asked to give the error bound, you must calculate it by hand. Technology is not going to tell you what it is. If they ever ask explicitly for the error bound
itself, you must calculate it by hand. In the error bound formula, p is the population proportion, p hat is the sample proportion, q hat is one minus p hat, n is the sample size, E or EBP is the margin of error, and z sub alpha over two, remember, is the critical value: the z score separating an area of alpha over two in the right tail of the standard normal distribution.

So how do you find a confidence interval, and actually get your final answer? If you take your point estimate, your sample proportion, and subtract the error, that gives you the lower bound. If you take your sample proportion and add the error, that gives you the upper bound. And p, the true proportion, is sandwiched somewhere in between. Remember, this is the inequality form. Another way to represent the interval is to write p hat plus or minus E, p hat plus or minus the error. And the other way is interval notation: you have p hat minus E, comma, p hat plus E. That's how you express an interval in interval notation. So we have inequality form, on the bottom we have interval form, or interval notation, and in the middle there's really no fancy notation; I just call it plus or minus notation.

That being said, we can express the confidence interval 0.333 is less than p is less than 0.555 in plus or minus form. To do this, note that whenever they give you a confidence interval in inequality form, you can find the sample proportion by adding the right bound to the left bound and dividing by two. So that's 0.555 plus 0.333, divided by two, which is 0.888 divided by two, or 0.444. That is my p hat in this case. To find your error bound when they give you the actual confidence interval, you take the right bound, 0.555, subtract the left bound, 0.333, and divide by two, which gives 0.111. So if we want to put this in plus or minus form, p hat plus or minus E, that gives you 0.444 plus or minus 0.111. That's how you would express the confidence interval in plus or minus
form. That's all you would write.

So as of right now you've been exposed to a lot of different components of confidence intervals, including the sample proportion, which is a point estimate. We've talked briefly about critical values, which are a part of the overall error bound, and we've talked about various ways to express confidence intervals, along with different confidence levels. In our next video we'll put all this together and create our own confidence intervals. That's all for now. Thanks for watching.
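As a recap of the arithmetic in this video, here is a small Python sketch of the error bound formula and of converting between the interval forms. The function names are hypothetical. Pairing the Twitter survey numbers (p hat of 0.85, n of 1007) with the 95 percent critical value of 1.96 is an assumption made for illustration; it happens to reproduce the 0.828 to 0.872 interval quoted earlier.

```python
from math import sqrt


def error_bound(z_critical, p_hat, n):
    """E = z sub alpha over 2 times the square root of (p hat * q hat / n)."""
    q_hat = 1 - p_hat
    return z_critical * sqrt(p_hat * q_hat / n)


def interval_from_estimate(p_hat, e):
    """Interval notation: (p hat - E, p hat + E)."""
    return (p_hat - e, p_hat + e)


def estimate_from_interval(lower, upper):
    """Recover p hat (the midpoint) and E (half the width) from the bounds."""
    p_hat = (upper + lower) / 2
    e = (upper - lower) / 2
    return p_hat, e


# Assumed pairing: Twitter survey with a 95% confidence level (z = 1.96).
e = error_bound(1.96, 0.85, 1007)
lower, upper = interval_from_estimate(0.85, e)
print(round(lower, 3), round(upper, 3))  # 0.828 0.872

# The interval 0.333 < p < 0.555 in plus or minus form.
p_hat, e2 = estimate_from_interval(0.333, 0.555)
print(round(p_hat, 3), round(e2, 3))  # 0.444 0.111
```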