by 5, which is 0.04. So, we see that the variance here is smaller than that of the population, and most of the values of x bar are within 0.12 units of the population mean, 0.12 being 3 times the SD of x bar, that is, 3 times 0.04. As I had indicated earlier, had I not known the population mean, and had I got my sample mean x bar as, say, 5.5, then I could be quite confident in saying that the population mean is between 5.38 and 5.62. What have we done here? This is the sampling distribution of x bar, and I know that most of the values of x bar must lie in and around 5.4, which is the population mean. When I say most of the values, I mean about 99.7 percent of them, because the area under this curve within 3 SD of the mean is 0.997. So, if I get a value of x bar of 5.5, I can be quite confident in saying that the population mean is within 3 SD of this value. To 5.5 I have added 0.12 to get 5.62, and I have subtracted 0.12 to get 5.38. The population mean, 5.4, is therefore expected to lie between these two values, and an interval built in this way is going to capture it 99.7 percent of the time. So, this brings us formally to confidence intervals.
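As a quick check of the arithmetic above, here is a minimal sketch in Python. The population SD of 0.2 and the sample size of 25 are assumptions inferred from "by 5, which is 0.04"; the variable names are illustrative only.

```python
import math

# Assumed inputs: population SD 0.2 and n = 25 (so sigma / sqrt(n) = 0.2 / 5),
# with the observed sample mean 5.5 used in the lecture.
sigma = 0.2
n = 25
x_bar = 5.5

sd_x_bar = sigma / math.sqrt(n)   # SD of the sampling distribution of x bar
half_width = 3 * sd_x_bar         # "3 SD" on either side of x bar
lower = x_bar - half_width
upper = x_bar + half_width

print(f"SD of x bar   = {sd_x_bar:.2f}")
print(f"3-SD interval = ({lower:.2f}, {upper:.2f})")
```

With these inputs the interval comes out as 5.38 to 5.62, matching the two values quoted in the lecture.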
So, as an alternative to reporting a single value, which is what a point estimate does, we calculate and report an entire interval of plausible values for the parameter, which is called the confidence interval. A confidence interval is always calculated by first selecting a confidence level, which is a measure of the degree of reliability of the interval in capturing the true population mean mu. In the earlier problem we said that we are 99.7 percent confident that the true population mean lies between those two values, so that value of 99.7 percent is the confidence level. Now, one basic question that one may ask is this: it is desirable to have an interval with high confidence, so why not always make the confidence level very high? For example, suppose I want a confidence interval for the mean height of this class. I take a sample of size 5, that is, I pick 5 individuals, measure their heights and find the sample mean based on these 5 observations. From here I can find a confidence interval with any degree of confidence. Suppose you demand a confidence level of 99.99999 percent. Well, then my interval is going to be something like 4 feet to 6 feet. So, the mean height of the students in this class is between 4 and 6 feet, and I am sure that the true value lies in that interval. So, what is the drawback of increasing the level of confidence? Your confidence interval becomes very wide, and that is not much information. However, suppose you say you need an interval that captures the true population mean with only 95 percent confidence.
In that case I can give you a confidence interval of the type: the mean height of this class is between 5 feet 2 inches and 5 feet 6 inches. That is a much narrower interval, and my error in making the statement would be of the order of 5 percent if I am giving this as a 95 percent confidence interval. So, there is a trade-off: if you want to increase the level of confidence, the width of the confidence interval is going to increase. A confidence level of 95 percent implies that 95 percent of all samples would give an interval that includes mu, the population mean, and only 5 percent of all samples would yield an erroneous interval. Let us take an example. Consider a random sample x 1, x 2, and so on up to x n from a population with mean mu and variance sigma square. If the population from which we are drawing the sample is normal, then x bar follows a normal distribution with mean mu and variance sigma square by n. The other option is that we do not say whether the population is normal, but we say that the sample size n is large. In this case x bar is approximately normal with mean mu and variance sigma square by n, and this second case follows from the central limit theorem. So, in either of these two cases, x bar minus mu, divided by sigma by root n, follows, exactly or approximately, the standard normal distribution with mean 0 and variance 1. If I draw the picture of the standard normal curve and ask for the two values that enclose a central area of 0.95, those are minus 1.96 and 1.96. We know that this result holds for the standard normal curve.
So, since x bar minus mu, divided by sigma by root n, follows the standard normal distribution, the probability that it lies between minus 1.96 and 1.96 is 0.95. From here, can we make a probability statement about mu? Multiplying through by sigma by root n, this means that the probability that minus 1.96 sigma by root n is less than or equal to x bar minus mu, which is less than or equal to 1.96 sigma by root n, is 0.95. Rearranging for mu, we can finally write that the probability that x bar minus 1.96 sigma by root n is less than or equal to mu, which is less than or equal to x bar plus 1.96 sigma by root n, is 0.95. This last probability statement indicates that the true population mean mu is captured between these two limits 95 percent of the time. These two limits are called the lower limit, x bar minus 1.96 sigma by root n, and the upper limit, x bar plus 1.96 sigma by root n. So, we can say that with a probability of 0.95 the selected sample will be such that the value of mu is captured between the lower and upper limits. From here we see that we have literally derived the confidence interval, which consists of these two sample quantities, the lower and the upper limit, and in this case it is the 95 percent confidence interval for mu when sigma, the population SD, is known. If, after observing x 1, x 2, up to x n, we compute the observed sample mean x bar, then a 95 percent confidence interval for the population mean mu is given by x bar minus 1.96 sigma by root n and x bar plus 1.96 sigma by root n. These give the lower and the upper limits of the confidence interval.
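The statement "mu is captured 95 percent of the time" can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population is taken to be normal with mu = 5.4 and sigma = 0.2, the sample size is 25, and the function name `coverage`, the seed and the number of repetitions are hypothetical choices.

```python
import math
import random

def coverage(mu, sigma, n, reps, z=1.96, seed=0):
    """Fraction of simulated samples whose interval
    x_bar +/- z * sigma / sqrt(n) contains the true mean mu."""
    rng = random.Random(seed)
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        # Draw one sample of size n and compute its mean.
        x_bar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / reps

# Assumed population: normal with mean 5.4 and SD 0.2, samples of size 25.
frac = coverage(mu=5.4, sigma=0.2, n=25, reps=20000)
print(f"fraction of intervals capturing mu: {frac:.3f}")
```

The printed fraction should come out close to 0.95. The point of the simulation is that the randomness sits in the interval (through x bar), not in mu.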
Now, where in this confidence interval has the value of 95 percent been used? Can anybody guess? It has been used through the value 1.96, because we know that the central area of the standard normal curve between minus 1.96 and plus 1.96 is 0.95. So, let us take an example. Given that the population SD is 0.2, find a 95 percent confidence interval for the population mean height when the sample mean based on a sample of size 25 is 5.45. In the formula we have just plugged in the value of the sample mean, which is 5.45, the value of the SD, and the sample size 25, and that gives us the required 95 percent confidence interval, 5.37 to 5.53. So, we can say with a confidence of 95 percent that the true population mean lies between these two values. Now, given this confidence interval, 5.37 and 5.53, can one tell what the point estimate of the population mean mu is? We know that the point estimate of mu is the sample mean x bar, but how do we find x bar if we are given only the lower and the upper limits of a confidence interval? Observe that the lower and the upper limits lie on the two sides of x bar and are equidistant from it. So, in order to find x bar given the lower limit and the upper limit, you simply need to take the mean of the two limits. That gives you x bar, and therefore the point estimate of the population mean.
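The worked example can be reproduced in a few lines. This is a minimal sketch; the helper name `z_interval` is an assumption for illustration, while the numbers 5.45, 0.2, 25 and 1.96 are those used in the lecture.

```python
import math

def z_interval(x_bar, sigma, n, z=1.96):
    """Interval x_bar +/- z * sigma / sqrt(n) for a known population SD."""
    half_width = z * sigma / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

lower, upper = z_interval(x_bar=5.45, sigma=0.2, n=25)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")

# The limits are equidistant from x bar, so their mean recovers
# the point estimate.
point_estimate = (lower + upper) / 2
print(f"point estimate: {point_estimate:.2f}")
```

The limits round to 5.37 and 5.53, and averaging them gives back 5.45.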
Now, we concentrated here only on a 95 percent confidence interval, in the sense that we are confident that 95 percent of the time the interval will capture the true population mean. However, we can consider other levels of confidence. We already know that the z curve looks like this, and the area between minus z alpha by 2 and z alpha by 2 is 1 minus alpha, with an area of alpha by 2 in each tail. For the 95 percent confidence interval the central area was 0.95, so the remainder was split into two halves, 0.025 in one tail and 0.025 in the other, so that the total area is 1. So, the probability that the standard normal random variable z lies between minus z alpha by 2 and plus z alpha by 2 is 1 minus alpha, and this allows us to work out the 100 into 1 minus alpha percent confidence interval for the mean mu of a normal population when the value of sigma is known. It is given by x bar minus z alpha by 2 times sigma by root n and x bar plus z alpha by 2 times sigma by root n. The value of z alpha by 2 can be worked out given the value of alpha. If you take alpha to be 0.05, that gives you a 95 percent confidence interval, and the value of z alpha by 2 when alpha is 0.05 is 1.96. So, let us work out the 99 percent confidence interval for the same problem: the population SD is 0.2, and the sample mean based on n equal to 25 is 5.45. The only thing we need to be careful about is that, since it is asking for a 99 percent confidence interval, we have to get hold of the correct value of z alpha by 2. Here the central area is 0.99, so the area in each tail is 0.005. We then look at the standard normal tables, and for alpha equal to 0.01 the value of z alpha by 2 is 2.58.
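Instead of a printed table, the value of z alpha by 2 can be obtained from the standard normal quantile function. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper names `z_critical` and `z_interval` are assumptions for illustration.

```python
import math
from statistics import NormalDist

def z_critical(alpha):
    """z_{alpha/2}: the point with area alpha/2 to its right under the z curve."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def z_interval(x_bar, sigma, n, alpha):
    """100(1 - alpha)% interval for mu when sigma is known."""
    half_width = z_critical(alpha) * sigma / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}  z_alpha/2 = {z_critical(alpha):.3f}")

# Same problem as in the lecture: sigma = 0.2, n = 25, x bar = 5.45.
lo99, hi99 = z_interval(5.45, 0.2, 25, alpha=0.01)
print(f"99% CI: ({lo99:.2f}, {hi99:.2f})")
```

The unrounded critical value for alpha equal to 0.01 is about 2.576, which the table rounds to 2.58; either value gives limits that round to 5.35 and 5.55 in this problem.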
So, we have simply substituted this value of z alpha by 2 into the formula, and that gives us the required confidence interval, which is 5.35 to 5.55. So, the true population mean mu is captured within this confidence interval, 5.35 and 5.55, and we are confident to the extent of 99 percent that this is correct. We keep an error of only 1 percent of going wrong in saying that the true population mean is captured within these two limits. You can clearly see that this confidence interval has lower limit 5.35 and upper limit 5.55. Compared with the earlier one, is it shorter or wider? The 95 percent interval was 5.37 to 5.53, whereas for the 99 percent interval we have a lower limit a bit lower than 5.37 and an upper limit a bit higher than 5.53. In other words, the width of this confidence interval is larger. So, the trade-off is that you are more confident in making the statement, but at the cost of a wider confidence interval. This brings us to finding out what the sample size should be in order to achieve a desired level of confidence and a desired width of the confidence interval. Suppose I am looking at the 100 into 1 minus alpha percent confidence interval, which we know is x bar minus z alpha by 2 sigma by root n and x bar plus z alpha by 2 sigma by root n. What is the width of this confidence interval? It is the difference between the upper limit and the lower limit, which works out to twice z alpha by 2 sigma by root n. Similarly, we have the term margin of error, and the margin of error is nothing but half of the width.
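The width expression can be turned around to give a sample size: setting w equal to twice z alpha by 2 sigma by root n and solving gives n equal to the square of 2 z alpha by 2 sigma by w. The sketch below assumes that one rounds up to the next whole number; the target width of 0.1 is a hypothetical value chosen only for illustration, and the function names are not from the lecture.

```python
import math
from statistics import NormalDist

def ci_width(sigma, n, alpha):
    """Width 2 * z_{alpha/2} * sigma / sqrt(n) of the known-sigma interval."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * z * sigma / math.sqrt(n)

def required_sample_size(sigma, width, alpha):
    """Smallest n whose interval is no wider than `width`:
    n = (2 * z_{alpha/2} * sigma / width) ** 2, rounded up."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((2 * z * sigma / width) ** 2)

# Width and margin of error for the lecture's 95% example (sigma 0.2, n 25).
w95 = ci_width(sigma=0.2, n=25, alpha=0.05)
print(f"width = {w95:.4f}, margin of error = {w95 / 2:.4f}")

# Hypothetical requirement: a 95% interval no wider than 0.1.
n_needed = required_sample_size(sigma=0.2, width=0.1, alpha=0.05)
print(f"required n = {n_needed}")
```

Rounding up matters here: the unrounded value is a little over 61, and a sample of 61 would give an interval slightly wider than 0.1, so 62 is the smallest sample size that meets the requirement.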
So, for a desired width w and a desired confidence level, say 100 into 1 minus alpha percent, setting w equal to twice z alpha by 2 sigma by root n we see that root n is 2 z alpha by 2 sigma by w. In other words, for the desired width and the desired confidence level we can indicate what the sample size should be in order to satisfy that requirement. So, the general formula for the sample size n that is necessary to ensure a confidence interval of width w at that confidence level is the square of twice z alpha by 2 sigma by w. So, you can always work out the required sample size. So far we have worked out the confidence interval when sigma, the population standard deviation, is known. Now, let us look at the situation where the population of interest is normal, so that x 1, x 2, up to x n constitute a random sample from a normal distribution with both mu and sigma unknown. Since the distribution of the population is normal, the distribution of the sample mean is also normal. However, we cannot use that normal distribution to find the confidence interval, because the confidence interval we have learned so far requires the value of the population SD sigma, and here we do not know it. This requires the use of another distribution, called the t distribution. When x bar is the mean of a random sample of size little n from a normal distribution with mean mu, we define the random variable capital T as x bar minus mu, divided by s by root n, where s is the sample standard deviation, which you have learned earlier. The quantity s by root n is called the standard error of x bar.
Now, this statistic T has a probability distribution called the t distribution with n minus 1 degrees of freedom. This statistic T will therefore be used to find the confidence interval and, as you can see, it does not involve the value of sigma, which is unknown in this case. So, what are the properties of the t distribution? Let t v denote the density curve for v degrees of freedom. First, each t v curve is bell shaped and centered at 0, so in this respect it is similar to the z curve. However, each t curve is more spread out than the standard normal z curve. So, if this is the t curve, then the standard normal curve would look like this: it is more concentrated in and around 0, whereas the t curve has more area in the tails. As the degrees of freedom v increase, the spread of the corresponding t curve decreases; in other words, it slowly moves towards the shape of the normal curve. Finally, as the degrees of freedom tend to infinity, the sequence of t curves approaches the standard normal curve. In other words, the z curve can be regarded as a t curve with infinite degrees of freedom, so for large degrees of freedom the t is practically the z. Next, the t critical values. We have already discussed critical values for the normal and the chi square distributions. For t, the critical value is written t alpha v, which is the number on the measurement axis for which the area under the t curve with v degrees of freedom to the right of it is alpha. So, let us draw the picture to see what t alpha v is: it is this value on the measurement axis such that the area to the right of it is alpha.
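These properties of the t curve, and the definition of the critical value, can be explored numerically. The Python standard library has no t distribution, so the sketch below evaluates the t density directly and finds the upper-tail area by Simpson's rule and the critical value by bisection. This is one possible numerical approach, not the method used in the lecture; the function names are assumptions, and the routine is only intended for moderate degrees of freedom and 0 < alpha < 0.5.

```python
import math
from statistics import NormalDist

def t_pdf(x, v):
    """Density of the t distribution with v degrees of freedom."""
    log_c = (math.lgamma((v + 1) / 2) - math.lgamma(v / 2)
             - 0.5 * math.log(v * math.pi))
    return math.exp(log_c - (v + 1) / 2 * math.log1p(x * x / v))

def t_upper_tail(x, v, steps=2000):
    """Area to the right of x >= 0: 0.5 minus the Simpson's-rule
    integral of the density over [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, v) + t_pdf(x, v)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, v)
    return 0.5 - total * h / 3

def t_critical(alpha, v):
    """t_{alpha, v}: the point with area alpha to its right, by bisection."""
    lo, hi = 0.0, 1.0
    while t_upper_tail(hi, v) > alpha:
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, v) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Heavier tails: at x = 3 the t density (5 df) exceeds the z density.
print(t_pdf(3.0, 5), NormalDist().pdf(3.0))

# Critical values shrink towards the z value as the degrees of freedom grow.
for v in (5, 15, 30, 1000):
    print(f"v = {v:4d}  t_0.025,v = {t_critical(0.025, v):.3f}")
print(f"z_0.025 = {NormalDist().inv_cdf(0.975):.3f}")
```

For alpha equal to 0.025 and 15 degrees of freedom this gives approximately 2.131, which is the kind of value one would read from a t table.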
Now, these t critical values can again be found from the tables. For example, this is the standard table which is there in your books, and it gives you the t critical values. Suppose I want to find the t critical value for alpha equal to 0.025 with 15 degrees of freedom. In that case we have to look at the row for 15 degrees of freedom, for which the table uses the symbol little n, and at the column for the value of alpha, which here is 0.025. The required answer is 2.131, which tells me that the area to the right of 2.131 under the t curve with 15 degrees of freedom is 0.025. So, t 0.025, 15 is 2.131. We have these t critical values from this table, and you can read them off for various values of alpha. We are going to use this t distribution for finding the confidence interval for small samples from a normal population with unknown sigma. So, let X bar and S be the sample mean and sample standard deviation computed from a random sample from a normal population with mean mu. The 100 into 1 minus alpha percent confidence interval is then given by the lower limit X bar minus the t critical value corresponding to alpha by 2 with n minus 1 degrees of freedom times S by root n, and the corresponding upper limit, where we replace the minus sign by the plus sign. The derivation of this confidence interval can be worked out in exactly the same way as we did for the case where we used the normal distribution. Here we know that X bar minus mu, divided by S by root n, follows a t distribution, and if I set the probability that it lies between minus and plus the t critical value equal to 1 minus alpha, we can work out this confidence interval. So, let us consider this example. Consider a population with mean mu. A random sample of size 16 is drawn from the population. The sample observations X 1, X 2, and so on up to X 16 give the following values.
So, we are given summation X i, with i varying from 1 through 16, and summation X i square, and from here we can work out the sample mean and the sample standard deviation. Since we have n equal to 16, there are 15 degrees of freedom, and for 15 degrees of freedom and alpha by 2 equal to 0.025 we already saw that the t critical value is 2.131. This allows us to work out the 95 percent confidence interval for the population mean mu as this. So, what we have done here is to use the t critical value instead of the z critical value. One important thing which is missing here, and which should be questioned, is this: when you say 'consider a population with mean mu', in order to use this confidence interval we have to specify that the population is normal. If the population is normal, then when sigma is not known this result follows from the t distribution. However, when we have a large sample, we need not explicitly say that the population from which we are drawing the sample is normal, even though we do not know the population SD sigma. So, large sample confidence intervals for a population mean when sigma is not known: if n is sufficiently large, the standardized variable Z, which is X bar minus mu, divided by S by root n, has approximately a standard normal distribution. This implies that X bar plus or minus z alpha by 2 times S by root n is a large sample confidence interval for mu with level of confidence 100 into 1 minus alpha percent. Here again, when we say n is sufficiently large, we stick to the thumb rule of n being greater than or equal to 30. So, we can use this large sample confidence interval when sigma is not known by replacing sigma by the sample standard deviation and then using the result for the standard normal curve. So, let us consider this example. Consider the population of weights of ninth grade students in KV IIT.
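Both the small-sample t interval and the large-sample interval start from the two sums, summation X i and summation X i square. The sketch below shows that computation. The summary values in it are hypothetical, made up purely for illustration, and are not the values shown on the lecture slides; the function names are likewise assumptions. The t critical value 2.131 is the table value quoted in the lecture.

```python
import math
from statistics import NormalDist

def mean_and_sd_from_sums(n, sum_x, sum_x2):
    """Sample mean and sample SD from n, sum of x and sum of x squared."""
    mean = sum_x / n
    variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)
    return mean, math.sqrt(variance)

def mean_interval(n, sum_x, sum_x2, critical_value):
    """x_bar +/- critical_value * s / sqrt(n); the caller supplies either
    a t critical value (small normal sample) or a z critical value (large n)."""
    mean, s = mean_and_sd_from_sums(n, sum_x, sum_x2)
    half_width = critical_value * s / math.sqrt(n)
    return mean - half_width, mean + half_width

# Hypothetical small sample, n = 16 (NOT the lecture's data): t interval
# using t_{0.025, 15} = 2.131 from the table.
mean_small, s_small = mean_and_sd_from_sums(16, 88.0, 485.5)
t_lo, t_hi = mean_interval(16, 88.0, 485.5, critical_value=2.131)
print(f"n=16: x bar={mean_small:.3f}, s={s_small:.4f}, CI=({t_lo:.3f}, {t_hi:.3f})")

# Hypothetical large sample, n = 48 (NOT the lecture's data): z interval
# with sigma replaced by s.
z = NormalDist().inv_cdf(0.975)
mean_large, s_large = mean_and_sd_from_sums(48, 2400.0, 121175.0)
z_lo, z_hi = mean_interval(48, 2400.0, 121175.0, critical_value=z)
print(f"n=48: x bar={mean_large:.3f}, s={s_large:.4f}, CI=({z_lo:.3f}, {z_hi:.3f})")
```

Passing the critical value in as an argument keeps the two cases visibly parallel: the only difference between them is which table the multiplier comes from.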
A random sample of size 48 is drawn from the population with mean mu. The sample observations X 1, X 2, and so on up to X 48 give the following values: we have summation X i and summation X i square, and from here we work out X bar and the sample SD. The 95 percent confidence interval for the population mean mu is then obtained from the same formula that we had for the confidence interval in the normal case with n large. The only thing is that here we do not know sigma, so instead of sigma we use the sample standard deviation to get the 95 percent confidence interval. Similarly, we have large sample confidence bounds for mu: the 100 into 1 minus alpha percent upper confidence bound is given by this, and the 100 into 1 minus alpha percent lower confidence bound is given by this. In other words, through the upper confidence bound we make the statement that mu does not exceed a certain value, with level of confidence 100 into 1 minus alpha percent, and similarly for the lower confidence bound. Now let us look at confidence intervals for the variance and the standard deviation of a normal population. If X 1, X 2, up to X n is a random sample from a distribution with mean mu and variance sigma square, then the sample variance S square is this quantity, and this statistic has an expected value of sigma square. Furthermore, the statistic n minus 1 times S square by sigma square follows a chi square distribution with n minus 1 degrees of freedom, provided the X i are i.i.d. normal. So, if my random sample is from a normal population, then n minus 1 times S square by sigma square follows a chi square distribution with n minus 1 degrees of freedom.
In other words, for a random sample X 1, X 2, up to X n from a normal distribution with mean mu and variance sigma square, the random variable n minus 1 times S square by sigma square, which is nothing but summation of x i minus x bar whole square, divided by sigma square, has a chi square probability distribution with n minus 1 degrees of freedom. Further, we have the following result on independence: if X 1, X 2, up to X n is a random sample from a normal distribution with mean mu and variance sigma square, then it can be shown that the sample mean X bar and the sample variance S square are independent random variables. That is an important result. We already know that X bar follows a normal distribution, and from the result just indicated n minus 1 times S square by sigma square follows a chi square distribution with n minus 1 degrees of freedom, but the more important thing here is that X bar and S square are independent when X 1, X 2, up to X n are i.i.d. normal. We had already discussed the chi square critical value, chi square alpha v, which is the value on the measurement axis such that the area to the right of it under the chi square curve with v degrees of freedom is alpha, and we need this piece of information to work out the confidence interval for the population variance sigma square. Since we know that n minus 1 times S square by sigma square follows chi square with v equal to n minus 1 degrees of freedom, we can make a probability statement about it. In the picture, this point is chi square alpha by 2, v, and the area to its right is alpha by 2. Suppose I also have this value, chi square 1 minus alpha by 2, v, where 1 minus alpha by 2 is the area to the right of this critical value, so that the area to its left is alpha by 2. That gives the area between the two critical values as 1 minus alpha.
So, therefore, we have that the probability that the statistic n minus 1 times S square by sigma square lies between chi square 1 minus alpha by 2, v and chi square alpha by 2, v is 1 minus alpha. From here, by rearranging, the probability that n minus 1 times S square by chi square alpha by 2, v is less than or equal to sigma square, which is less than or equal to n minus 1 times S square by chi square 1 minus alpha by 2, v, is 1 minus alpha. Clearly, this leads to the 100 into 1 minus alpha percent confidence interval for sigma square: the first quantity is your lower limit and the second is your upper limit. So, to summarize, the 100 into 1 minus alpha percent confidence interval for the variance sigma square of a normal population has lower limit n minus 1 times S square by chi square alpha by 2, n minus 1 and upper limit n minus 1 times S square by chi square 1 minus alpha by 2, n minus 1. In order to find the confidence interval for sigma, we simply take the square roots of these two limits. So, before we end today: we already saw this problem, which was to pick which of these three curves is most suitable to indicate the sampling distribution of the sample mean based on sample sizes 50, 75 and 100. Which one of these three represents the sampling distribution curve for a sample of size 50? The answer is the one which has the largest spread, which is b, because the spread decreases as the sample size increases. The other two correspond to 75 and 100. So, which one corresponds to 100: a, b or c? It is c.
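Returning to the confidence interval for sigma square summarized above, the two chi square critical values can also be computed rather than read from a table. The Python standard library has no chi square distribution, so this sketch evaluates the chi square CDF through the series for the regularized lower incomplete gamma function and finds critical values by bisection. This numerical route is an assumption of the sketch, not something from the lecture; the function names and the sample values n = 16 and S square = 0.1 are hypothetical.

```python
import math

def chi2_cdf(x, v):
    """P(chi-square with v df <= x), via the series for the
    regularized lower incomplete gamma function P(v/2, x/2)."""
    if x <= 0:
        return 0.0
    a, y = v / 2, x / 2
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= y / (a + k)
        total += term
        k += 1
    return total * math.exp(-y + a * math.log(y) - math.lgamma(a))

def chi2_critical(alpha, v):
    """Value with area alpha to its right under the chi-square curve
    with v degrees of freedom, found by bisection."""
    lo, hi = 0.0, float(v)
    while 1 - chi2_cdf(hi, v) > alpha:
        hi *= 2
    for _ in range(80):
        mid = (lo + hi) / 2
        if 1 - chi2_cdf(mid, v) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def variance_interval(n, s2, alpha):
    """100(1 - alpha)% interval for sigma^2 of a normal population."""
    v = n - 1
    lower = v * s2 / chi2_critical(alpha / 2, v)
    upper = v * s2 / chi2_critical(1 - alpha / 2, v)
    return lower, upper

# Hypothetical sample: n = 16 with sample variance 0.1.
var_lo, var_hi = variance_interval(n=16, s2=0.1, alpha=0.05)
print(f"95% CI for sigma^2: ({var_lo:.4f}, {var_hi:.4f})")
print(f"95% CI for sigma:   ({math.sqrt(var_lo):.4f}, {math.sqrt(var_hi):.4f})")
```

Note that the larger critical value goes into the lower limit and the smaller one into the upper limit, and that, unlike the intervals for the mean, this interval is not symmetric about S square.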
So, a psychologist wishes to estimate the mean IQ of students of one college. He picks a random sample of 49 students from the college, measures each student's IQ, and obtains the following 95 percent confidence interval for the population mean: the lower limit is 115 and the upper limit is 125. The first question is to find the margin of error. What is the margin of error here? It is half the width of the confidence interval, which in this case is 5. Next, give the point estimate of the population mean. That is going to be 120, which is nothing but the mean of the two values 115 and 125. I think I will stop here; we are already overdue. We have quiz two, not the mid-sem, this coming Wednesday, here and in the IC rooms, depending on where you were for the mid-sem examination. So, be there; the exam starts sharp at 8.30. Any other questions that you may have for me?