Hello, welcome back. In today's lecture we will focus on confidence intervals. We have previously looked at point estimates for the population mean: we compute a single value, the sample mean, and project it as an estimate of the population parameter. However, this population mean is an unknown parameter, and the estimate is, as the name suggests, only a guess based on the sample we happened to choose. Suppose I take another sample; I will get another value of the sample mean, and since these two point estimates differ, we really do not know which of them is better. Both are random samples whose elements are independent, yet they may, and usually do, give different values. So which one of them is closer to the truth, closer to the population mean? This is the question we will address in today's lecture. Rather than giving a point estimate, it would be nice if we could give an interval estimate for the population mean. We do not really have to do anything beyond collecting the random sample and finding the sample mean and the sample variance; using these, we can construct the confidence interval, the interval estimate for the population mean. So what does the term confidence interval mean? That is what we are going to look at in today's lecture. We have all travelled by train, and sometimes we may have gone to remote locations where trains are infrequent; in such places we would really like to be at the station on time so that we catch the train and reach home without delay. So we may ask the local people what time the train is expected, and a person, depending on his experience or knowledge, may give an interval for the arrival of the train.
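As an aside, the point that two equally valid random samples give two different point estimates can be illustrated with a short Python sketch. This is my own illustration, not part of the lecture; the population mean of 65 and standard deviation of 10 are made-up values standing in for, say, exam marks.

```python
import random
import statistics

random.seed(0)

# A hypothetical population of exam marks: normally distributed with
# (unknown-to-the-analyst) mean 65 and standard deviation 10.
POP_MEAN, POP_SD = 65, 10

def draw_sample(n):
    """Draw a random sample of size n from the population."""
    return [random.gauss(POP_MEAN, POP_SD) for _ in range(n)]

# Two independent random samples give two different point estimates
# of the same unknown population mean.
sample_a = draw_sample(25)
sample_b = draw_sample(25)
print(statistics.mean(sample_a))
print(statistics.mean(sample_b))
```

Both printed values are legitimate point estimates of the same unknown mean, and without further machinery we cannot say which is closer to it; that is exactly the gap the confidence interval fills.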
So the train may have been running through that place for the last 30 or 40 years, so there is a kind of population mean for the arrival time of the train at the station, but nobody has logged the exact arrival times over those 40 years. So nobody really knows the average or mean arrival time of the train. When asked what the arrival time is, one person may say the train usually comes to the station at, let us say, 2:30, or that it may come between 2:20 and 2:40. Another person may say that, to be on the safe side so that you do not miss the train, you may assume the train comes between 2 and 3. The wider the interval given, the safer you are in actually catching the train: we expect the larger interval of 2 to 3 to capture the mean arrival time of the train. But nobody really wants to reach the station too far in advance; it would be very boring to wait there. So we would like a precise interval. Suppose somebody says the train is going to come between 2:20 and 2:40; that is acceptable. Some person may confidently proclaim that it will come between 2:25 and 2:35. This helps us plan our journey better, but at the same time, if the interval becomes very narrow, there is a danger of missing the train. For example, trusting the claim of 2:25 to 2:35 p.m., we may reach the station around 2:24 and be told that the train has left. So we have more confidence when somebody says the train is going to come between 2 and 3, but that is a vague estimate, not a precise one; and if we try to make the range of arrival very precise, there is a danger that the train might already have left. So how do we construct an interval in which we have high confidence and which is also precise? That is what we are going to look at in this course.
The notation and the basic ideas are based on the prescribed textbook for the course, the one written by Montgomery and Runger. So far we have looked only at point estimates of the population parameter. The idea I am going to describe now applies not only to the population mean mu but also to other parameters; for those we would have to work out the corresponding probability distributions, for example the distribution of the variance or the standard deviation. Here we will apply the concept of a confidence interval to the population mean mu, based on the random sample mean. Even if we draw a random sample and get an unbiased estimate of the population mean, it is only an estimate. Another random sample may give a completely different estimate, and which of them is correct? In other words, which of them is closer to the population mean mu? Since mu is unknown, we really do not know which random sample gave the sample mean closer to mu; we are in the dark about which of the x bar values to believe. So rather than giving a point estimate, it makes more sense to give an interval estimate. It makes little practical sense to take many random samples, and the nice thing about this concept is that we will construct a suitable interval around the population parameter mu based on the information provided by a single sample. We want to be reasonably sure that the upper and lower bounds of the interval we construct do actually encompass the population mean, and we want to know how wide this interval must be. We also have to quantify the "reasonably sure" criterion, since one person's reasonably sure may differ from another person's. When the interval is wider, we are more confident that we have encompassed the population mean through our interval.
So we have a sample mean and we construct a confidence interval using it. If we choose to make this interval quite wide, then we are reasonably sure, again using that term, that we have encompassed the population mean. As I said earlier, if we say the train is going to come between 2 p.m. and 3 p.m., we are making sure the passenger is not going to miss the train. The broader the interval becomes, the more sure we become that it will encompass the population mean. However, there is no sense in making the interval very wide; we cannot play it completely safe and say the train is going to come sometime tomorrow afternoon and you are better off waiting there from 12 noon. As the interval becomes wider and wider, its practical utility reduces. So this is the key point: we have only one random sample with us, but that sample comprises n entities, and we can use these n entities to get the sample mean and the sample variance. The data may be the marks scored in a particular exam, or the heights or weights of the people we have queried. So we have a collection of n attributes or data points, and we can use them to find the sample mean and the sample variance. Using this information, we can give both a point estimate and an interval estimate; we can construct the so-called confidence interval using the information contained in a single sample. So we define the interval estimate as a range of values around the population parameter, with suitable upper and lower bounds, that reasonably possibly contains the population mean.
We do not claim that the interval estimate we have constructed, with its upper and lower bounds, will certainly contain the population parameter mu. No, we have not made that statement; we only say that it reasonably, possibly contains the population mean. An interval estimate for a population parameter is termed a confidence interval. We develop a certain confidence that the interval estimate does indeed contain the true population parameter, but it is still possible that a given interval does not surround it. The moment I start using the word "likely", we are introducing an element of uncertainty; we are implying that there is a probability associated with this interval. If we construct the interval with high confidence, then it is less likely, the probability is small, that the interval fails to contain the population parameter. There is a typo on the slide, which I will just correct here: we cannot guarantee that a particular confidence interval does indeed contain the population parameter. Ideally we would consider infinitely many intervals, which is not possible; so let us assume that 100 intervals are good enough, and suppose we construct them in such a manner that 95 of them encompass the population parameter. This 100 may be 1000 if you want, but I have used 100 so that the count reads directly as a percentage. The idea is that out of a large number of intervals constructed this way, about 95% should encompass the population parameter; 95 is the usual number we use. If we decide to increase the confidence level, more of the intervals will surround the population parameter. We will carry out the discussion under the assumption that the population variance sigma squared is known.
This is an assumption: sigma squared is also a population parameter and, just like mu, we usually do not know it. But for the purpose of discussion, let us take sigma squared as somehow known; later on we will see how to handle situations where sigma squared is also unknown. So a random sample comprising random variables x1, x2, and so on up to xn has a sample mean x bar. Assume that the random variables come from a normal distribution. We are thus making two assumptions here: first, sigma squared is known; second, the random variables come from a normal distribution. We know that a linear combination of independent random variables is also a random variable, and the random sample has been chosen so that its elements are independent of each other. If they are taken from a normal distribution, then the resulting linear combination of the random variables is also normally distributed; it is important to note that x1, x2, and so on are independent of one another. The sample mean is obtained by adding the random variables and dividing by the total number, so it is a linear combination, and hence the sampling distribution of the means is also normal. We know the properties of this sampling distribution: it is centered around the population parameter mu, and its variance is sigma squared by n, where n is the sample size. Even though n does not appear in the population probability distribution function, it plays an important role: the sample size n influences both the confidence level of the interval and the precision of the interval we are constructing. We will see more on this shortly. Now, x is coming from a normal population.
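The two properties just stated, that the sampling distribution of the mean is centered at mu with variance sigma squared by n, can be checked empirically. Here is a small simulation sketch of my own (the values mu = 50, sigma = 12, n = 36 are arbitrary choices for illustration):

```python
import random
import statistics

random.seed(1)
mu, sigma, n = 50, 12, 36  # assumed population parameters and sample size

# Draw many samples of size n and record each sample mean.
sample_means = [
    statistics.mean(random.gauss(mu, sigma) for _ in range(n))
    for _ in range(20_000)
]

# The sampling distribution of the mean is centred at mu with
# variance sigma**2 / n (here 144 / 36 = 4).
print(statistics.mean(sample_means))
print(statistics.variance(sample_means))
```

The first printed value comes out close to 50 and the second close to 4, matching mu and sigma squared by n respectively.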
A linear combination of the random variables gives the sample mean, and this sample mean also has a normal distribution, with mean mu and variance sigma squared by n. Different samples will give different x bar values, and since we are now interested in probabilities, it is helpful to normalize them. We use the standard normal variable z, which we know follows a normal distribution with mean 0 and variance 1. To normalize x bar, we convert it into z using the transformation z = (x bar minus mu) divided by (sigma by root n). Sigma, the square root of sigma squared, is the standard deviation of the population; since sigma squared is known, sigma is also known. This z follows the standard normal distribution, the one with mean 0 and variance 1. So what is the form of the interval we want? We want a lower limit and an upper limit for the population parameter mu, which we express as l less than or equal to mu less than or equal to u. These limits will be different for different random samples: depending on the random sample you draw, l and u get identified or estimated, so we cannot predict a priori what values they will take. The random samples are based on random variables, and we know that any combination of random variables is also a random variable; the sample mean is a random variable, and if we construct the bounds for mu based on random samples, then the intervals we construct are random as well. So L and U will represent the random variables corresponding to the lower limit and upper limit respectively.
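The normalization just described can also be checked numerically: if we standardize many sample means with z = (x bar minus mu) / (sigma by root n), the resulting z values should behave like a standard normal variable. Again, this is my own sketch with made-up parameter values, not from the lecture:

```python
import math
import random
import statistics

random.seed(2)
mu, sigma, n = 100, 15, 25        # assumed known population parameters
se = sigma / math.sqrt(n)         # sigma / root n

# Standardise many sample means: z = (x_bar - mu) / (sigma / sqrt(n)).
zs = [
    (statistics.mean(random.gauss(mu, sigma) for _ in range(n)) - mu) / se
    for _ in range(10_000)
]

# z should behave like a standard normal variable: mean 0, variance 1.
print(statistics.mean(zs))
print(statistics.stdev(zs))
```

The printed mean is close to 0 and the printed standard deviation close to 1, regardless of the particular mu, sigma, and n chosen; that is what lets us use a single standard normal table for every sample.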
What I am trying to say is that the intervals may take different bounds depending on the random sample chosen. The random sample is a random variable, so the interval we construct from it is also random: different samples have different sample means, the random variables x1, x2, and so on up to xn take different values, and the intervals we construct behave in a random fashion. Since these intervals are bounded by their endpoints, the endpoints may take different values and are themselves random variables, which we denote L and U. With this background, let us define the random variables L and U such that the following condition is obeyed: P(L less than or equal to mu less than or equal to U) = 1 minus alpha. We know that the sampling distribution of the means has a probability distribution function, and in this particular case we are assuming it is a normal distribution. So we have a normal curve available to us that gives the distribution of the x bar values, and using that curve we define a lower bound and an upper bound for mu such that P(L less than or equal to mu less than or equal to U) = 1 minus alpha. You may ask where this mu is coming from. Please recollect that the sampling distribution of the means has mean value mu, the population parameter: the sample means x bar are spread around mu, which is why mu sits at the center of the sampling distribution, and using the normal curve associated with this distribution of sample means we can define P(L less than or equal to mu less than or equal to U) = 1 minus alpha.
Alpha is a fractional value, bounded between 0 and 1. So we are constructing the interval estimate around the population parameter mu, assuming that sigma squared is known and that the sampling distribution of the means is normal, and defining L and U such that P(L less than or equal to mu less than or equal to U) = 1 minus alpha, where alpha lies between 0 and 1. This means that the confidence interval constructed contains the population mean with probability 1 minus alpha. So if alpha is 0.1, then 1 minus alpha is 0.9, and the probability that the constructed interval contains the population mean is 0.9. We were talking about 95% confidence intervals or confidence bands: to get 95% confidence, we set alpha to 0.05, so that 1 minus alpha becomes 0.95. Reiterating, there is a 1 minus alpha probability that the confidence interval constructed from the sample drawn does indeed contain the population parameter mu; and since there is no unique sample mean, there is no unique confidence interval. What is 1 minus alpha really? Montgomery and Runger have an interesting discussion regarding this: they say that the probability here is of the frequency type. Once we have a particular interval in hand, it either contains the population mean mu or it does not; when mu is present within the confidence interval, it is certainly present.
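For concreteness, the z value corresponding to a chosen alpha can be computed directly rather than read from a table. The sketch below uses Python's standard-library `statistics.NormalDist`; it is my own illustration, not from the lecture:

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def z_alpha_over_2(alpha):
    """Upper 100*alpha/2 percentage point of the standard normal."""
    return std_normal.inv_cdf(1 - alpha / 2)

# Common confidence levels and their z values:
# alpha = 0.10 -> about 1.645 (90% confidence)
# alpha = 0.05 -> about 1.960 (95% confidence)
# alpha = 0.01 -> about 2.576 (99% confidence)
for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(z_alpha_over_2(alpha), 3))
```

Notice that a higher confidence (smaller alpha) demands a larger z value, which will translate into a wider interval, exactly the confidence-versus-precision trade-off from the train example.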
If mu is not present in the constructed interval, it is certainly not present; it is certainly absent. So what is this 1 minus alpha probability really? 1 minus alpha is the fraction of the confidence intervals we may construct from samples of the population that will actually contain the population parameter mu. It is quite simple. Summarizing, we have P(L less than or equal to mu less than or equal to U) = 1 minus alpha, where alpha lies between 0 and 1; out of a large number of intervals constructed from random sample means, (1 minus alpha) times 100 percent of the interval estimates will contain the population mean. After we draw the sample, we get the sample mean x bar (and, if needed, the sample standard deviation s), and using this information we can identify the values of l and u in l less than or equal to mu less than or equal to u. The lower bound L and upper bound U are called the lower and upper confidence limits, and 1 minus alpha is called the confidence coefficient. These terminologies are important: when you want to communicate your findings in papers, in conferences, or even in group meetings, you should use the standard terminology. Now we do a bit of mathematical manipulation. We have P(L less than or equal to mu less than or equal to U) = 1 minus alpha, and we may rewrite this as P((x bar minus U)/(sigma by root n) less than or equal to z less than or equal to (x bar minus L)/(sigma by root n)) = 1 minus alpha. Going from one step to the other looks a bit difficult, but in reality it is quite simple: we put a negative sign on each term, and since we are multiplying by a negative, the inequality signs get reversed.
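The frequency interpretation of 1 minus alpha can be verified by simulation: build many intervals and count how many actually contain mu. This is my own sketch (arbitrary values mu = 70, sigma = 8, n = 30), using the interval form derived shortly, x bar plus or minus z times sigma by root n:

```python
import math
import random
import statistics

random.seed(3)
mu, sigma, n, alpha = 70, 8, 30, 0.05
z = 1.960                                  # z_{alpha/2} for alpha = 0.05
half_width = z * sigma / math.sqrt(n)

trials, covered = 2_000, 0
for _ in range(trials):
    x_bar = statistics.mean(random.gauss(mu, sigma) for _ in range(n))
    # Does the interval x_bar +/- z * sigma / sqrt(n) contain mu?
    if x_bar - half_width <= mu <= x_bar + half_width:
        covered += 1

print(covered / trials)   # close to 1 - alpha = 0.95
```

Each individual interval either contains mu or it does not; what equals 0.95 is the long-run fraction of intervals that do, which is exactly the frequency-type probability Montgomery and Runger describe.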
So we have minus U less than or equal to minus mu less than or equal to minus L; then we add x bar so that we get x bar minus U less than or equal to x bar minus mu less than or equal to x bar minus L; and then we divide by sigma by root n. Let me demonstrate this on the board. We start from the standard definition P(L less than or equal to mu less than or equal to U) = 1 minus alpha. The moment I put a negative sign, the inequality signs reverse: U comes to the left and L goes to the right, giving minus U less than or equal to minus mu less than or equal to minus L. Adding x bar to all the terms gives x bar minus U less than or equal to x bar minus mu less than or equal to x bar minus L; the probability still remains 1 minus alpha, there is no change in that. Dividing by sigma by root n then gives (x bar minus U)/(sigma by root n) less than or equal to z less than or equal to (x bar minus L)/(sigma by root n). The purpose of doing this is to reach the standard normal form. We have assumed that sigma is known and that x bar follows a normal distribution; since different x bars have different normal distributions, we normalize them so that they reduce to the standard normal form. That is why the z term appears, and since we are dealing with probabilities, we can now use the standard normal tables to get the values. This is an inverse problem in the sense that 1 minus alpha is given to you: what should the lower and upper bounds on z be such that the probability equals 1 minus alpha? So we have P((x bar minus U)/(sigma by root n) less than or equal to z less than or equal to (x bar minus L)/(sigma by root n)) = 1 minus alpha, and we will call the quantities on either side of z minus z alpha by 2 and plus z alpha by 2 respectively.
So P(minus z alpha by 2 less than or equal to (x bar minus mu)/(sigma by root n) less than or equal to z alpha by 2) = 1 minus alpha. The terminology is again important: z alpha by 2 represents the upper 100 alpha by 2 percentage point of the standard normal distribution. Here z alpha by 2 equals (x bar minus L)/(sigma by root n), and its value is chosen in a certain manner which I will soon demonstrate; for the present we simply define this group as z alpha by 2 and term it the upper 100 alpha by 2 percentage point of the standard normal distribution. Looking back, we defined the statistic x bar, itself a random variable, and transformed it into a standard normal variable according to z = (x bar minus mu)/(sigma by root n). We assumed that the sample x1, x2, and so on up to xn was drawn from a normal population and that the elements were independent of each other, and hence the sampling distribution is also normal; for the linear combination to be normal, it is important that the sample constituents are independent, so that their mutual pairwise covariances vanish. This theoretical derivation and interpretation we have already seen in our earlier lectures. So we define the interval in a suitable manner and then do some rearrangement. We have P(minus z alpha by 2 less than or equal to (x bar minus mu)/(sigma by root n) less than or equal to plus z alpha by 2) = 1 minus alpha. We multiply all sides by sigma by root n, subtract x bar, and finally multiply through by minus 1, which reverses the inequalities; a bit of simple mathematical manipulation. Carrying this out, we get P(x bar minus z alpha by 2 sigma by root n less than or equal to mu less than or equal to x bar plus z alpha by 2 sigma by root n) = 1 minus alpha.
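Putting the pieces together, the interval for a single drawn sample is x bar plus or minus z alpha by 2 times sigma by root n. Here is a minimal sketch of that computation, assuming (as in the lecture) that sigma is known; the population values are made up for illustration:

```python
import math
import random
import statistics

random.seed(4)
mu, sigma, n = 65, 10, 40   # sigma assumed known, as in the lecture

sample = [random.gauss(mu, sigma) for _ in range(n)]
x_bar = statistics.mean(sample)

z = 1.960                   # z_{alpha/2} for a 95% interval (alpha = 0.05)
margin = z * sigma / math.sqrt(n)

lower, upper = x_bar - margin, x_bar + margin
print(f"95% CI for mu: ({lower:.2f}, {upper:.2f})")
```

Note that with sigma known, the width of the interval, 2 z alpha by 2 sigma by root n, depends only on alpha and n, not on the particular sample: increasing n tightens the interval, while decreasing alpha widens it.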
So I would request you to work out this rearrangement yourself and see whether you arrive at the same final form. At this point we will take a small break, and once we come back we will see how to present the confidence interval in its final form. We will also see what must be done so that the confidence interval we have developed is also a precise interval: we need to look not only at the confidence provided by the interval we have constructed but also at its precision. So we will meet shortly.