Alright, we are now looking at the sampling distributions of the mean. We started originally with the random variable x, then we started looking at the sample mean x bar. We collected the random variables into one group or set and we created the sample mean x bar = (x1 + x2 + ... + xn)/n, and this is also a random variable; it also has a distribution. But what is the type of the distribution of the random variable formed by the sample mean? We are unaware of it, but rather than working with the random variable x, as I said earlier, we are now going to work more with random samples. We are going to work with a collection of random variables, and we are going to look at the sample means and use them to draw certain inferences. So we should also know the distribution of the sample means, what population they follow. The question is, we do not know about the population itself. We do not know about the original population; we do not know whether it is normal or gamma or Weibull or what the type of the distribution is; we do not know its parameters. All we have are the random samples, and they themselves form another distribution. Fortunately for us, the central limit theorem comes to our rescue. What is the central limit theorem? That is going to be the focus for the next half an hour or so. Since the parent population's probability distribution is usually not known, we also cannot say directly what the sampling distribution of the sample statistic is. The central limit theorem simplifies matters a lot by stating that even if the original probability distribution of the population is not normal, i.e. it is not Gaussian, the distribution of the sample mean tends towards normality provided the sample size is large, say greater than 30. If a random sample of reasonably large size is picked from a non-normal population, the sampling distribution of the mean is approximately normal. 
So the important thing is that the sampling distribution of the mean tends towards normality provided the sample size is reasonably large. Further, even for smaller samples, the distribution is still approximately normal if the parent population distribution does not deviate too much from normality. So it is indeed fortunate that we have the central limit theorem. Now let us make the formal statement of the central limit theorem. Let x1, x2, ..., xn be a random sample of size n taken from any, not necessarily normal, population with mean mu and variance sigma squared. Let x bar be the sample mean. The limiting form of the distribution of z = (x bar minus mu) / (sigma by root n), as n tends to infinity, is the standard normal distribution. Since it is a standard normal distribution, we are using the symbol z. What we are doing is creating a new random variable z by defining it as (x bar minus mu) divided by (sigma by root n). Here x bar is the sample mean, mu is the population mean, sigma is the population standard deviation and n is the sample size. When n tends to a large number, this random variable tends towards a standard normal distribution. Please recall that the standard normal distribution has mean 0 and variance of unity. Let us take some interesting examples, the first one involving the roll of 2 dice. A good deal of probability theory has been built from games of chance. We will be demonstrating the central limit theorem with a couple of simple examples. So we have the outcomes tabulated here. 
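The standardization described above can be checked numerically. The following is a minimal sketch, not from the lecture: it draws many sample means from a clearly non-normal parent population (here an exponential distribution, chosen arbitrarily for illustration) and standardizes each one as z = (x bar minus mu) / (sigma by root n). By the central limit theorem, for n greater than 30 the z values should have mean close to 0 and variance close to 1.

```python
import random
import statistics

def standardized_sample_means(n, trials, lam=1.0, seed=0):
    """Draw `trials` samples of size n from Exp(lam) and return the
    standardized sample means z = (x_bar - mu) / (sigma / sqrt(n))."""
    rng = random.Random(seed)
    mu = 1.0 / lam       # population mean of the exponential distribution
    sigma = 1.0 / lam    # population standard deviation of Exp(lam)
    zs = []
    for _ in range(trials):
        sample = [rng.expovariate(lam) for _ in range(n)]
        x_bar = statistics.fmean(sample)
        zs.append((x_bar - mu) / (sigma / n ** 0.5))
    return zs

zs = standardized_sample_means(n=40, trials=5000)
# Should be close to 0 and 1 respectively, per the central limit theorem.
print(round(statistics.fmean(zs), 2), round(statistics.pvariance(zs), 2))
```

The parent population here is strongly skewed, yet with n = 40 the standardized sample means already behave like a standard normal variable.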
All the possible outcomes are tabulated here. You have 1 and 1; that means the first die is showing 1 and the second die is also showing 1. The average of 1 and 1 would be 1, and the number of such outcomes is only 1. When the dice show the numbers 1 and 2, the first die may show 1 and the second die may show 2, or the first die may show 2 and the second die may show 1. So there are 2 possible outcomes, and that is why the number 2 has been put. The average of 1 and 2, in either order, would be 1.5. Similarly for the other cases. For example, an average of 3.5 may be formed by the combinations (1, 6), (2, 5), (3, 4), (4, 3), (5, 2) and (6, 1). So there are 6 ways of getting the average 3.5. Similarly, we can do this for the other averages. You cannot get an average of 1.33 with 2 dice, or an average of 5.25 with 2 dice; we can get only these as the possible outcomes. The counts are recorded here, and they add up to 36. There are 6 ways in which the first die can land and 6 ways in which the second, independent die can land, so you have 6 x 6, which is 36 possible outcomes. The probabilities are calculated as the count divided by the total number: 1 by 36, 2 by 36 and so on. These are the probability values and they sum up to 1. This can be represented on a graph: we can plot the probability versus x bar. When you do, you see a kind of hat. It is definitely not a bell shaped curve, but more of a hat shaped curve. We are talking about discrete probability outcomes, so we can directly mark the probability against each outcome. For the average of 1, the probability is 1 by 36, which is about 0.028, and that is what you have here. 
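The 2-dice table above can be reproduced by brute-force enumeration. A short sketch, assuming fair six-sided dice:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of rolling 2 dice and
# tabulate how often each sample mean x_bar occurs.
counts = Counter((a + b) / 2 for a, b in product(range(1, 7), repeat=2))

# Probability of each mean = count / 36, kept exact with Fraction.
probs = {mean: Fraction(c, 36) for mean, c in counts.items()}

print(counts[3.5])          # 6 ways: (1,6), (2,5), (3,4), (4,3), (5,2), (6,1)
print(counts[1.0])          # only 1 way: (1,1)
print(sum(probs.values()))  # the probabilities add up to 1
```

Plotting `probs` against the means gives exactly the hat-shaped curve described in the lecture.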
So the probabilities are marked against each of the averages that are possible. Let us see what is going to happen when you have 3 dice. It becomes slightly more cumbersome. You have more possible values of the mean. The discrete probability distribution tends towards a continuous one, or appears continuous, as you increase the number of dice. You can now get more possibilities for the mean. You can get 1. You can also get a sum of 4 on the 3 dice, and the mean value would be 4 divided by 3, which is 1.333. How can you get the sum 4? The dice must show the numbers 1, 1 and 2, and the 3 dice can roll in such a way as to show these in 3 different orders: (1, 2, 1), (1, 1, 2) and (2, 1, 1). So there are 3 ways in which the sum of 4 may arise. Similarly, when you look at the sum of 5, it can come from 2, 2, 1 or from 1, 1, 3. You can get 2, 2, 1 in 3 different ways and 1, 1, 3 in 3 different ways. The average is 5 divided by 3, which is 1.667. When you do this for all the possible cases, you can see that the averages range from 1 to 6, and there is a finer division of the interval between 1 and 6 because you are using 3 dice. So you look at the frequency of occurrence: a sum of 3, or a mean of 1, can occur in only one way; a sum of 4, or a mean of 1.333, can occur in 3 ways. Like that, you can see the number of occurrences for all these sums, or equivalently for all these means. The counts can be recorded in this table: one way in which a sum of 3 on the 3 dice, or an average of 1, can arise; 3 ways in which a sum of 4, or a mean of 1.333, can arise; and so on for all the possibilities. 
Since you are talking about 3 dice, the number of possible outcomes is 6 x 6 x 6, which is 216. So you have 216 here. The probabilities are obtained by dividing: 1 by 216, 3 by 216, 6 by 216 and so on, and they add up to 1. Now, even with 3 dice, you are taking an average based on n equal to 3; the sample size is equal to 3. You can see that the distribution is tending towards normality. It is appearing more bell shaped. For n equal to 2 you had a hat shape, and now you are getting a slightly broader peak. So if the sample size is large, the sampling distribution of the means is normal even if the original population is not normal. If the parent population is normal, the sampling distribution is also normal even for small n. There are 2 distinct cases. In the first, the parent population is not normal but the sample size is large; the resulting distribution of the sample means is normal. In the second case, the parent population itself is normal; so even if you take a small sample from such a population and look at the distribution of the sample means, you will find that the sampling distribution of the means is also normal even for small n. For small sample sizes, the sampling distribution of the means is approximately normal provided the parent population does not exhibit a great deviation from normality. Even if the parent population was not normal, if it deviated only slightly from normal and you have a small sample size, the sampling distribution in such a case would also tend to be approximately normal. We were looking at the variance of x1 plus x2. We were also looking at the variance of x1 minus x2, the expected value of x1 plus x2 and the expected value of x1 minus x2. The reason for doing that is that in our statistical applications we may wish to compare sample statistics taken from 2 independent normal populations. Let us say that we are taking the sample statistics from 2 independent normal populations. 
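The 3-dice counts quoted above can be verified with the same enumeration idea, a sketch assuming fair dice:

```python
from collections import Counter
from itertools import product

# Enumerate all 6 * 6 * 6 = 216 outcomes of rolling 3 dice and count
# how often each mean (rounded to 3 decimals for use as a key) occurs.
outcomes = list(product(range(1, 7), repeat=3))
counts = Counter(round(sum(o) / 3, 3) for o in outcomes)

print(len(outcomes))            # 216 possible outcomes
print(counts[1.0])              # a mean of 1 (sum 3) arises in only 1 way
print(counts[round(4 / 3, 3)])  # a mean of 1.333 (sum 4) arises in 3 ways
```

Plotting these counts shows the broader, more bell-shaped peak the lecture describes for n = 3, compared with the hat shape for n = 2.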
Both the populations from where the samples are taken and the sample statistics are calculated are normal. Let us say that the 2 normal populations have different parameters: mu1, sigma1 for the first population and mu2, sigma2 for the second population. mu1 is different, or may be different, from mu2, and sigma1 may be different from sigma2; that is what I meant by 2 populations belonging to the same type but having different parameters. We know by now that a linear function of the random variables from these 2 independent populations also has a normal distribution, because the original random variables were from normal distributions themselves. Before we go to the most general case, let us assume that sigma is known, so it is only the value of mu that we do not know. Now let us consider a linear function of the independent sample statistics. Let us define the linear function as x1 bar minus x2 bar. x1 and x2 are random variables; x1 bar and x2 bar are also random variables; so x1 bar minus x2 bar is also a random variable, and it has a probability distribution. What is the mean mu of the distribution of x1 bar minus x2 bar? This can be written as the expected value of (x1 bar minus x2 bar), which is the expected value of x1 bar minus the expected value of x2 bar. By now you should be familiar with this; that is why I am not giving you the steps. That can be written as mu of x1 bar minus mu of x2 bar: the mean of the first sampling distribution of the mean minus the mean of the second sampling distribution of the mean. So this is interesting and this is important. x1 bar minus x2 bar is a random variable; it has a probability distribution and it will have its variance. What is the variance of the distribution formed by the difference of the two sample means, x1 bar minus x2 bar? That would be sigma x1 bar squared plus sigma x2 bar squared. We saw that the variance of x1 minus x2 is equal to the variance of x1 plus the variance of x2. 
x1 and x2 can be any random variables. In the present case x1 is x1 bar and x2 is x2 bar. Do not look at x1 and x1 bar as very different quantities: x1 is a random variable, and x1 bar is also a random variable; x2 is a random variable, and x2 bar is also a random variable. So when you are trying to find the variance of the difference of any two random variables, it is still the sum of the variances of the two random variables in question, provided the two random variables are independent, and that is why we are talking about two independent populations. So we have sigma x1 bar squared plus sigma x2 bar squared. Now we have to ask ourselves, what is sigma x1 bar squared? What is the variance of the probability distribution formed by x1 bar? What is the variance of the sampling distribution of x1 bar? The variance of the sampling distribution of x1 bar would be sigma1 squared by n1. The variance of the sampling distribution of the mean x2 bar is given by sigma2 squared by n2. Sigma1 squared is the variance of the first population and sigma2 squared is the variance of the second population. Sigma1 squared by n1 is the variance of the sampling distribution of the mean corresponding to x1 bar, and sigma2 squared by n2 is the variance of the sampling distribution of the mean corresponding to x2 bar. n1 is the sample size for x1 bar and n2 is the sample size for x2 bar. This is a very important concept. I request you to think it over, understand it, and try to write down these properties on a paper after thinking about the concepts, to see whether you have understood. Otherwise, go through the lectures again and see where you did not understand. So if the two parent populations are normal in addition to being independent, then the resulting distribution formed by the difference of x1 bar and x2 bar would also be normal, and its mean would be mu of x1 bar minus mu of x2 bar. What is mu of x1 bar? 
What is the mean of the sampling distribution of x1 bar? In other words, what is the expected value of x1 bar? We know by now it should be mu1. Similarly, the expected value of x2 bar, or the mean of the distribution formed by x2 bar, would be mu2. So you will have mu1 minus mu2. Similarly, the variance of the distribution formed by the difference of the two sample means x1 bar and x2 bar would be sigma x1 bar squared plus sigma x2 bar squared, which is nothing but sigma1 squared by n1 plus sigma2 squared by n2. If the two populations are not normally distributed, then what can you say about the resulting sampling distribution? It would depend upon the sample size. If you assume that the populations from which the random samples were drawn are not very deviant from the normal distribution, then for sample sizes greater than 30 the two independent sampling distributions are approximately normal, and a linear combination of them would also behave approximately normally. What we are doing here is quite important. We are now talking of the difference between two sample means x1 bar and x2 bar, which have been taken from two different populations 1 and 2. Please do not confuse this with x1 bar and x2 bar being taken from the same population. We are talking about two different populations; we take samples from these two populations and represent the sample means by x1 bar and x2 bar. Now we look at the resulting distribution we get based on the difference between the two sample means, and what we observe is that if the sample sizes are greater than 30 in both cases, the sample taken from the first population has size greater than 30. 
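The two results above, E(x1 bar minus x2 bar) = mu1 minus mu2 and Var(x1 bar minus x2 bar) = sigma1 squared by n1 plus sigma2 squared by n2, can be checked by simulation. A sketch with arbitrarily chosen population parameters (the specific numbers are illustrative, not from the lecture):

```python
import random
import statistics

rng = random.Random(1)
mu1, sigma1, n1 = 10.0, 2.0, 40   # first normal population and sample size
mu2, sigma2, n2 = 7.0, 3.0, 50    # second normal population and sample size

# Repeatedly draw one sample from each population and record x1_bar - x2_bar.
diffs = []
for _ in range(10000):
    x1_bar = statistics.fmean(rng.gauss(mu1, sigma1) for _ in range(n1))
    x2_bar = statistics.fmean(rng.gauss(mu2, sigma2) for _ in range(n2))
    diffs.append(x1_bar - x2_bar)

theory_mean = mu1 - mu2                        # 3.0
theory_var = sigma1**2 / n1 + sigma2**2 / n2   # 4/40 + 9/50 = 0.28
print(round(statistics.fmean(diffs), 2))       # close to theory_mean
print(round(statistics.pvariance(diffs), 2))   # close to theory_var
```

The simulated mean and variance of the differences agree closely with mu1 minus mu2 and sigma1 squared by n1 plus sigma2 squared by n2.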
The sample taken from the second population also has size greater than 30, and according to the central limit theorem, the two independent sampling distributions would behave normally, and hence a linear combination of them would also behave approximately normally. So, according to the central limit theorem, since the sample size was greater than 30, x1 bar would behave in a normal manner, the sampling distribution of x2 bar would also behave in a normal fashion, and the linear combination of them, here x1 bar minus x2 bar, would also behave approximately normally. Consider two independent populations with parameters mu1, sigma1 and mu2, sigma2. Let x1 bar and x2 bar be the sample means of two independent random samples of sizes n1 and n2 drawn from these two populations. So now we are going to define a new random variable based on the difference between the two sample means. These are two independent random samples drawn from two different populations with means mu1 and mu2 and variances sigma1 squared and sigma2 squared. Now we have the sample means x1 bar and x2 bar and we take their difference. Then we subtract mu1 minus mu2 from the quantity x1 bar minus x2 bar. Also note that the expected value of x1 bar would be mu1 and the expected value of x2 bar would be mu2. As far as the original populations as well as the sampling distributions go, their means are identical: the mean of the sampling distribution of the means is equal to the population mean. But the same thing is not true of the variance. The sampling distribution of the means has a variance of sigma squared by n, where n is the sample size. So as far as the variance is concerned, the sample size comes into play. So we know that the variance of x1 bar is equal to sigma1 squared by n1. 
The variance of x2 bar, that is, the variance of the sampling distribution of the means for x2 bar, would be sigma2 squared by n2. The variance of x1 bar minus x2 bar is equal to sigma1 squared by n1 plus sigma2 squared by n2. So when we make this combination, we are not arbitrarily choosing mu1 and mu2, and we are not arbitrarily choosing sigma1 squared by n1 and sigma2 squared by n2. You may recollect that the standard normal variable was defined as z = (x minus mu) by sigma, where mu was the mean and sigma was the standard deviation of the population from where x was chosen. So we take x1 bar minus x2 bar, and we look at the corresponding distribution's mean, which is mu1 minus mu2, and its variance, sigma1 squared by n1 plus sigma2 squared by n2; the square root of that becomes the standard deviation. So we define a standard normal variable, because by the central limit theorem x1 bar minus x2 bar behaves approximately normally owing to the large sample sizes. Because of the large sample size for x1 bar and the large sample size for x2 bar, both of them, according to the central limit theorem, tend to exhibit normal behaviour, and the linear combination of the 2 random variables x1 bar and x2 bar also tends towards normal behaviour. So we create a standard normal variable z for this particular situation, given by z = ((x1 bar minus x2 bar) minus (mu1 minus mu2)) divided by the square root of (sigma1 squared by n1 plus sigma2 squared by n2). If the 2 populations are normal, which is what we are looking at right now, then this holds irrespective of the sample size; you are not constrained by a small sample size. In such a situation, ((x1 bar minus x2 bar) minus (mu1 minus mu2)) divided by the square root of (sigma1 squared by n1 plus sigma2 squared by n2) will be a standard normal variable. In the previous case, the original populations 1 and 2 were not normally distributed, but large samples were chosen. 
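The two-sample standard normal variable defined above is easy to compute directly. A minimal sketch; the numerical values in the example call are hypothetical, chosen only to illustrate the formula:

```python
import math

def two_sample_z(x1_bar, x2_bar, mu1, mu2, sigma1, sigma2, n1, n2):
    """z = ((x1_bar - x2_bar) - (mu1 - mu2)) / sqrt(sigma1^2/n1 + sigma2^2/n2),
    valid when both populations are normal, or when n1, n2 > 30 by the CLT."""
    se = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return ((x1_bar - x2_bar) - (mu1 - mu2)) / se

# Hypothetical example: the two population means are assumed equal
# (mu1 - mu2 = 0), and the observed difference in sample means is 1.0.
z = two_sample_z(x1_bar=21.0, x2_bar=20.0, mu1=20.0, mu2=20.0,
                 sigma1=2.0, sigma2=2.5, n1=36, n2=49)
print(round(z, 2))
```

With these numbers the standard error is sqrt(4/36 + 6.25/49), about 0.489, giving z of about 2.05, which could then be compared against standard normal tables.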
So the sampling distribution of the difference in means also behaved normally, and for large sample sizes, n1 greater than 30 and n2 greater than 30, we had the standard normal variable. In the easier case, where both the populations come from normal distributions, even if the sample sizes for both the sample means x1 bar and x2 bar are small, the resulting sampling distribution of x1 bar minus x2 bar would still be normal, because the parent populations were themselves normal. This is what I am summarizing here. If you have a large sample size, greater than 30, and the parent distribution is also normal, then for the statistic x bar the sampling distribution is normal with mean mu and variance sigma squared by n. If the sample size is small and the parent distribution is normal, it does not matter; the resulting sampling distribution of the mean would still be normal. If you have a large sample, greater than 30, and the parent distribution is different from normal, there is nothing to worry about: the central limit theorem helps us, and the sampling distribution of the mean would be normal with mean mu and variance sigma squared by n. The population mean is equal to the mean of the sampling distribution, but while the population variance is sigma squared, the sampling distribution variance is sigma squared by n. So for a large sample size with a parent distribution different from normal, the resulting distribution of the sample means has mean mu and variance sigma squared by n, and owing to the central limit theorem it is normal. If you have a small sample size, less than 30, and the parent distribution deviates only slightly from normal, then also you can assume that the sampling distribution of the mean is approximately normal with mean mu and variance sigma squared by n. Now let us look at the desirable properties of point estimators. 
We have seen that we are estimating the population parameters mu and sigma squared by using sample statistics. We are using the sample mean x bar and the sample standard deviation s to get good point estimates of the population mean mu and population standard deviation sigma. We are talking about good point estimators, and we will qualify them even further by calling them unbiased point estimators. The sample mean and sample variance give us estimates of the population mean and variance respectively. They are not meant to give us estimates of the sampling distribution's parameters. We are talking about samples taken from a population; the samples have been taken to get an idea about the population parameters. We are not using the sample estimators to find the sampling distribution parameters. This is an important difference which we should be aware of. We are using the sample estimators x bar and s squared to learn about mu and sigma squared of the original population, not to get estimates of the sampling distribution properties. Once mu and sigma squared have been estimated, that information is helpful for us. How? We will see some examples in the future. The sample mean is expected to give us the population mean mu, and the sample variance is expected to give us the population variance sigma squared; remember, not the sampling distribution variance sigma squared by n. You can say, fine, I know the sample size n and I know the sigma squared estimated from the sample variance. However, conceptually we are querying the population through the random sample, and in most cases we have only one random sample. For unbiased point estimators, the expected value of x bar is equal to mu and the expected value of s squared is equal to sigma squared. 
What this means is that the expected value of x bar, that is, the mean of the sampling distribution of the means, is equal to mu, and the expected value of s squared is equal to sigma squared. Sigma squared is the population variance, and the expected value that s squared takes is also equal to sigma squared. We can prove these. As I have already told you, the sample mean and sample variance are determined only from the available sample data. Sometimes only one sample may be available, and it may also be small in size. So whatever we have, we have to make do with and draw the appropriate estimates. This is very interesting. We have to prove that the expected value of s squared is equal to sigma squared. I am just substituting the definition for s squared here. Since 1 by (n minus 1) is a constant, we can take it out, and we essentially have the expected value of the sum from i = 1 to n of (xi minus x bar) squared. We have already seen that this sum reduces to the sum from i = 1 to n of xi squared, minus n x bar squared. I request you to carry out the calculations on a paper on your own; if you are stuck, please look at some of the earlier examples we have covered. The expected value of (x bar minus mu) squared is the variance of x bar, which we get as sigma squared by n. We also know that the expected value of x bar squared is equal to sigma squared by n plus mu squared. Previously, in one of the first example problems, we saw that the expected value of x squared was sigma squared plus mu squared. The same concept I am applying for the expected value of x bar squared: instead of sigma squared, which was the variance of x, I am using sigma squared by n, which is the variance of x bar; the mean of x was mu and the mean of x bar is also mu. So the expected value of x bar squared is equal to sigma squared by n plus mu squared. Hence, in 1 by (n minus 1) times the expected value of the sum of xi squared, the expected value of the sum of xi squared can be written as n into (sigma squared plus mu squared). 
Since all the random variables are identically distributed, for each of these xi squared we write sigma squared plus mu squared and then add it up from i = 1 to n: sigma squared plus mu squared, n times, which becomes n into (sigma squared plus mu squared). Then we write the expected value of n into x bar squared. We use the previous result, that the expected value of x bar squared is equal to sigma squared by n plus mu squared, and plug it in, so we have 1 by (n minus 1) into [n into (sigma squared plus mu squared) minus n into (sigma squared by n plus mu squared)]. In the second term, the n cancels the 1 by n, leaving sigma squared, and the n mu squared cancels the other n mu squared, so in the bracket we get (n minus 1) sigma squared. That (n minus 1) then cancels with the 1 by (n minus 1) outside, and we get sigma squared. So defining the sample variance in terms of n minus 1 makes it possible for the sample variance s squared to be an unbiased estimator of the population variance sigma squared. If we had n in our definition of the sample variance, the expected value of s squared would have been different, not the same as the population variance sigma squared. Just by making the definition properly, in terms of the degrees of freedom given as n minus 1 for the sample variance, we can see that the expected value of s squared is sigma squared itself. Hence s squared is an unbiased estimator of the population variance sigma squared. The bias of a point estimator is given by the expected value of the estimator minus the actual population parameter theta. We want the bias to be 0; we want the expected value of the point estimator to be theta itself, so that we get theta minus theta equal to 0 and the bias disappears. We use the sample mean x bar as the point estimator for the population mean mu. 
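The role of n minus 1 in the definition can be seen numerically. A minimal sketch, not from the lecture, using a normal population with known variance (the parameters are chosen arbitrarily): averaging the n-1 definition over many small samples recovers sigma squared, while the n definition systematically underestimates it by the factor (n minus 1) by n.

```python
import random
import statistics

rng = random.Random(0)
mu, sigma, n, trials = 0.0, 2.0, 5, 20000   # true variance sigma^2 = 4

s2_unbiased, s2_biased = [], []
for _ in range(trials):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    s2_unbiased.append(statistics.variance(sample))   # divides by n - 1
    s2_biased.append(statistics.pvariance(sample))    # divides by n

# E[s^2] with n-1 is sigma^2; with n it is (n-1)/n * sigma^2 = 0.8 * 4 = 3.2.
print(round(statistics.fmean(s2_unbiased), 1))  # ~ 4
print(round(statistics.fmean(s2_biased), 1))    # ~ 3.2
```

This is exactly the bias the derivation above eliminates: the n-1 denominator makes the expected value of s squared equal to sigma squared itself.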
The expected value of x bar was mu, and theta was also mu; mu minus mu is equal to 0, so the bias is 0. We can confidently proclaim that the sample mean is an unbiased estimator of the population mean mu. Similarly, we saw that the expected value of s squared is equal to sigma squared, so we can proclaim that the sample variance s squared is an unbiased estimator of the population variance sigma squared. So, concluding, we have seen the point estimation process. We were looking at random samples. The sample mean also behaves as a random variable; it exhibits a full-fledged probability distribution. The complication was that we do not know the population parameters mu and sigma, and we do not know the nature of the population, whether it is normal, lognormal or Weibull. But even with so many uncertainties, by carefully choosing a sample and by using sample statistics like the sample mean and sample variance, we were able to generate estimates of the population parameters mu and sigma squared respectively. And we were also able to show that these sample statistics, x bar and s squared, are unbiased estimators of the two population parameters. We also talked about the central limit theorem, and the central limit theorem is a boon to us because if we choose an adequately large sample size, say n greater than 30, the sampling distribution of the mean behaves in a normal fashion even if the original distribution does not belong to the normal classification. So we have covered quite a lot of important ground here, and this definitely forms the basis for our design of experiments and analysis of statistical data. I would request you to revise the portions up to this point and be clear with the concepts. You do not have to remember the formulae or the rules. It is important for you to understand the concepts and assimilate them; then the remaining part of the course will be not only easy but also enjoyable. 
You will be able to directly relate what we have covered up to this point to what you are learning from now on. Thank you.