As Salaamu Alaikum, welcome to lecture number 33 of the course on statistics and probability. Students, you will recall that in the last lecture I discussed with you the sampling distribution of p hat, and later we talked about the sampling distribution of x 1 bar minus x 2 bar. Today I will continue with the topic of the sampling distribution of x 1 bar minus x 2 bar, and I will begin with an interesting example. As you now see on the screen, suppose that car batteries produced by company A have a mean life of 4.3 years with a standard deviation of 0.6 years. A similar battery produced by company B has a mean life of 4.0 years and a standard deviation of 0.4 years. What is the probability that a random sample of 49 batteries from company A will have a mean life of at least 0.5 years more than the mean life of a sample of 36 batteries from company B? Students, you will appreciate that this is quite an interesting and realistic problem, because in a real-life situation people can be interested in exactly this kind of information. So, how do we proceed in solving this question? As you now see on the screen, the first thing is to identify the various pieces of information that we have. For population A, we have mu 1 equal to 4.3 years, sigma 1 equal to 0.6 years, and the sample size n 1 is 49. On the other hand, for population B, mu 2 is equal to 4.0 years, sigma 2 is 0.4 years, and the sample size is n 2 equal to 36. Now, both of the sample sizes are large enough for us to assume that the sampling distribution of x 1 bar minus x 2 bar is approximately normal; therefore, we can apply this particular distribution in order to solve the problem. To apply the normal distribution, students, you will of course remember that we have to convert our variable, which in this case will be x 1 bar minus x 2 bar.
We have to convert our variable to z, and in order to do that, of course, I first need to compute the mean and the standard deviation of my sampling distribution of x 1 bar minus x 2 bar. So, as you now see on the screen, mu x 1 bar minus x 2 bar is equal to mu 1 minus mu 2, that is, 4.3 minus 4.0, which is 0.3 years; and the standard deviation sigma x 1 bar minus x 2 bar is equal to the square root of sigma 1 square over n 1 plus sigma 2 square over n 2, and substituting the available values we obtain sigma x 1 bar minus x 2 bar equal to 0.1086 years. Thus, our variable z, given by x 1 bar minus x 2 bar minus the quantity mu 1 minus mu 2, all divided by the square root of sigma 1 square over n 1 plus sigma 2 square over n 2, comes out to be x 1 bar minus x 2 bar minus 0.3, divided by 0.1086. Now, the question arises: in this z formula, what value of x 1 bar minus x 2 bar should we substitute? Students, since we want the probability that the difference between the two means is at least 0.5 years, we have to substitute x 1 bar minus x 2 bar equal to 0.5; the resulting z value then marks off the area that will give us the required probability. So, as you now see on the screen, when we substitute x 1 bar minus x 2 bar equal to 0.5 in the formula for z, we obtain z equal to 1.84. As you see on the slide, the area to the right of z equal to 1.84 is the required probability. In order to compute this area, we first find the area between z equal to 0 and z equal to 1.84, and consulting the area table, we find that this area is equal to 0.4671. Now, we need not the area between 0 and 1.84, but the area between 1.84 and infinity; therefore, we subtract 0.4671 from 0.5, which is the area under the standard normal distribution between z equal to 0 and plus infinity. Performing this subtraction, we obtain 0.0329.
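The whole battery calculation can be checked numerically. The following is a sketch in Python, using only the standard library; the normal CDF is written with the error function, so no statistics package is assumed.

```python
import math

# A sketch of the battery-life calculation (company A vs company B).
mu1, sigma1, n1 = 4.3, 0.6, 49   # company A: mean life, sd, sample size
mu2, sigma2, n2 = 4.0, 0.4, 36   # company B

# Mean and standard error of the sampling distribution of x1bar - x2bar
mean_diff = mu1 - mu2                                   # 0.3 years
se_diff = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)    # about 0.1086 years

# P(x1bar - x2bar >= 0.5), using the standard normal CDF via erf
z = (0.5 - mean_diff) / se_diff                         # about 1.84
p = 1 - 0.5 * (1 + math.erf(z / math.sqrt(2)))          # upper-tail area

print(round(se_diff, 4), round(z, 2), round(p, 3))      # prints 0.1086 1.84 0.033
```

The computed tail area is about 0.033, matching the table-based answer of 0.0329 up to the rounding of z.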
In other words, the probability is 3.29 percent that, in our samples, the first one of size 49 and the other one of size 36, the difference between the sample means x 1 bar and x 2 bar, that is, the difference in the mean lifetimes of the batteries, is at least 0.5 years. I said that this probability is 3.29 percent; actually, it is a very small probability. Therefore, if someone were to claim, on the basis of samples like these, that the batteries of company A last half a year or more longer than those of company B, we should realize that observing such a large difference is quite unlikely, and we would not draw that conclusion on the basis of this information. After all, 3.29 percent is a small quantity. And students, in this particular problem we have the mu 1 and mu 2 values available: mu 1 is equal to 4.3 and mu 2 is equal to 4.0, so the real difference is 0.3 years. In reality, the difference in the average lifetimes of the two types of batteries is 0.3 years, and it is given this reality that we have computed the probability for our two samples. Let us now proceed to the sampling distribution of p 1 hat minus p 2 hat, the differences between the proportions of successes in two samples drawn independently from two populations. As you now see on the slide, suppose there are two binomial populations with proportions of successes p 1 and p 2 respectively. Let independent random samples of sizes n 1 and n 2 be drawn from the respective populations, and let the differences p 1 hat minus p 2 hat between the proportions of successes of all possible pairs of samples be computed. Then a probability distribution of the differences p 1 hat minus p 2 hat can be obtained, and such a probability distribution is called the sampling distribution of the differences between sample proportions.
So, you have noted that whenever we are in a situation where each outcome in the population is categorized as a success or a failure, we say that we are dealing with a binomial population, and we denote the proportion of successes in the population by small p. And if we have two such populations, then obviously we denote the proportions in the two respective populations by p 1 and p 2. So, let me now illustrate the sampling distribution of p 1 hat minus p 2 hat with the help of an example. It is claimed that 30 percent of the households in community A and 20 percent of the households in community B have at least one teenager. A simple random sample of 100 households from each community yields the following results: p hat A is equal to 0.34 and p hat B is equal to 0.13. What is the probability of observing a difference this large or larger if the claims are true? In order to solve this question, we assume that if the claims are true, then the sampling distribution of p hat A minus p hat B is approximately normally distributed. We do so because both sample sizes are large enough for us to apply the normal approximation to the binomial distribution. Having assumed that our sampling distribution is approximately normal, we once again have the same situation as before: we will have to convert our variable p 1 hat minus p 2 hat into z, and only then can we find any area that we want. So, in order to convert p 1 hat minus p 2 hat into z, what are we to do? First of all, we need to find mu p 1 hat minus p 2 hat and the standard deviation of p 1 hat minus p 2 hat. And as you now see on the screen, we have the two important formulae: mu p 1 hat minus p 2 hat is equal to p 1 minus p 2, and sigma square p 1 hat minus p 2 hat is equal to p 1 q 1 over n 1 plus p 2 q 2 over n 2.
Without going into the mathematical derivations of these formulae, we would like to apply them in this particular example. Of course, we are keeping in mind that A stands for population number 1 and B stands for population number 2, so we are quite clear that we could also have written p 1 hat minus p 2 hat. Applying the values that we have in this particular example, mu p A hat minus p B hat comes out to be 0.30 minus 0.20, that is, 0.10. Also, sigma square p A hat minus p B hat comes out to be 0.30 into 0.70 over 100 plus 0.20 into 0.80 over 100, and solving this expression, the variance of the sampling distribution comes out to be 0.0037. Substituting these values in the formula for z, we obtain z equal to p A hat minus p B hat minus 0.10, the whole thing divided by the square root of 0.0037. Now, the question arises: in this formula, what value of p A hat minus p B hat should we substitute? Students, you remember that according to the statement of the question, in the two samples of size 100 taken from these two populations, the proportion of households with at least one teenager was 0.34, that is, 34 percent, in the first sample, and only 13 percent in the second sample. And our question is: if our assumption about the populations is true, namely that 30 percent of the households in the first community and 20 percent in the second community have at least one teenager, then what is the chance that the difference in our samples is this large or larger? This means that we have to compute the area to the right of the difference between these two sample proportions, that is, p A hat minus p B hat equal to 0.34 minus 0.13. So, as you now see on the screen, this is exactly what we do: p A hat minus p B hat comes out to be 0.34 minus 0.13, equal to 0.21. And when I substitute this value in the formula for z, my z value comes out to be 0.11 divided by 0.0608, that is, 1.81.
In order to compute the area to the right of z equal to 1.81, we first look up the area table and find the area between z equal to 0 and z equal to 1.81. This area comes out to be 0.4649; hence the area to the right of z equal to 1.81 comes out to be 0.5 minus 0.4649, and that is equal to 0.0351. You have seen that in this example the probability has come out to be a very small number, 0.0351, that is, 3.51 percent, even less than 4 percent. Students, I would like to encourage you to try to interpret this result that you have obtained. What does such a small probability mean in this example? Does it mean that at least one of the two claims is wrong, or does it not? You should think about it, because it is important for you not just to solve a question in a mechanical way, but to be able to interpret the result with reference to the problem at hand. The next point that I would like to convey to you now, students, is that all the sampling distributions that I have discussed with you, the sampling distributions of x bar, p hat, x 1 bar minus x 2 bar and p 1 hat minus p 2 hat, and all the formulae that I have presented, are with reference to the simplest technique of sampling, and that is simple random sampling. That is, we are not considering stratified random sampling, cluster sampling or any other technique of random sampling; we are restricting ourselves to simple random sampling. Maybe you remember that I have discussed this with you before: when you have one population and you draw one single sample out of that whole population by the lottery method, that is called simple random sampling. We do this when we can say that our population is homogeneous. This is very important: homogeneous means that the elements of the population are similar to each other. If we can say that they are similar with respect to the variable that we are interested in, then we say that the population is homogeneous.
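Returning to the household example, the two-proportion calculation can be reproduced in the same way; again a sketch using only the Python standard library, under the assumption that the claimed proportions are true.

```python
import math

# A sketch of the two-proportion calculation, assuming the claimed
# proportions p1 = 0.30 (community A) and p2 = 0.20 (community B) are true.
p1, p2, n1, n2 = 0.30, 0.20, 100, 100

mean_diff = p1 - p2                                     # 0.10
var_diff = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2      # 0.0037
se_diff = math.sqrt(var_diff)

obs = 0.34 - 0.13                                       # observed difference, 0.21
z = (obs - mean_diff) / se_diff
p_value = 1 - 0.5 * (1 + math.erf(z / math.sqrt(2)))    # upper-tail area

print(round(z, 2), round(p_value, 3))                   # prints 1.81 0.035
```

With the exact standard error, z works out to about 1.81 and the tail area to about 0.035.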
Stratified sampling, on the other hand, is used when the population is heterogeneous. We will assume that we are restricting ourselves to the simplest way of random sampling: our population is homogeneous and we are drawing one simple random sample. Alright, there is another point that I would like to discuss with you now. You remember that I told you that the standard deviation of our sampling distribution is called the standard error, and that I would discuss later on why it is called the standard error. I think it is about time that we have this discussion. Students, the standard error represents the standard deviation of the sampling distribution of X bar. If we are trying to estimate mu, then X bar is the estimate; we have constructed the sampling distribution of X bar, and shortly we will talk about estimation. Students, any one value of X bar differs from mu by some amount, and graphically that difference is a horizontal distance; we can say that it is the sampling error. Of course, this term was introduced in the beginning of the course; I am sure you remember that we discussed that the difference between a sample statistic and the corresponding population parameter is called the sampling error. Now I am repeating it again. What is the visual picture of any sampling error? The difference between X bar and mu is the horizontal distance between X bar and mu, and in our sampling distribution, students, there are many values of X bar.
So, corresponding to the various different values of X bar, some of these distances will be small and some will be large, depending on whether any particular X bar is close to mu or far away from mu. So, as you now see on the slide, we have a picture of this type in which some of those distances are small and some are large. When I compute the standard deviation of the sampling distribution of X bar, I hope you will realize from our previous discussions regarding the standard deviation that I will get a distance which is neither extremely small nor extremely large; I will get a distance of intermediate size. So, students, we can interpret this in the following way: the standard deviation of our sampling distribution gives us a standard value of the sampling error, a standard value of the distance between an X bar value and mu. That is why we say that this is the standard error of this particular distribution. Now that we have had an adequate discussion of the basic properties of sampling distributions, it is time for us to move to the very important concept of estimation. As I explained earlier, estimation itself can be divided into two parts: we can talk about point estimation and interval estimation. As you now see on the screen, point estimation of a population parameter provides, as an estimate, a single value calculated from the sample that is likely to be close in magnitude to the unknown parameter. The formal definition I have presented to you is very simple; if we describe it in simple words, it is so easy that you will say that there is nothing in it.
Look, if I want to estimate the population mean mu, but I do not have the resources to collect data from the entire population, then what will I do? I will draw a random sample and I will find x bar. And x bar is a single value which is going to act as an estimate of mu. Similarly, if my interest is not in mu but in sigma square, that is, the variance of the population, then what should I do? I will compute the variance of the sample, s square. And obviously, s square is one single value which is going to act as an estimate of sigma square. Now, there is an interesting point here. Two words appear in this discussion: estimator and estimate. What is the difference between the two? As you now see on the screen, an estimate is a numerical value of the unknown parameter, obtained by applying a rule or a formula, called an estimator, to a sample x1, x2, and so on up to xn of size n taken from a population. Once again, it is very simple; there is no need to be confused. Students, I had said that x bar will act as an estimate of mu. So x bar as a formula, summation x over n, is the estimator, and when I compute it for any particular sample and suppose I get x bar equal to 3.7, this numerical value 3.7 is the estimate. As you now see on the slide, if x1, x2, and so on up to xn is a random sample of size n from a population having mean mu, then capital X bar equal to sigma x over n is an estimator of mu, and small x bar, the numerical value of capital X bar, is an estimate, and of course we can call it a point estimate of mu. Why the term point estimate? In the graph of our sampling distribution of x bar, the x bar values lie along the horizontal axis; a particular x bar is one point on that axis, and thus we say point estimate. And students, generally speaking, as you now see on the screen, the Greek letter theta is customarily used to denote an unknown parameter, which could be the mean, median, proportion, standard deviation or any other such parameter.
On the other hand, an estimator of theta is commonly denoted by theta hat, and sometimes it is denoted by t. It is important to note that an estimator is always a statistic, a quantity computed from the sample; thus it is a function of the sample observations, and hence it is a random variable, because the sample observations are likely to vary from sample to sample. In other words, an estimator is a random variable and it has a probability distribution. It is the same distribution that we have been discussing, called the sampling distribution of that particular statistic. So, the point to be noted is that in repeated sampling our estimator is a random variable, but in any one particular situation its computed value is the estimate of the corresponding parameter. Students, having given you the basic definition and concept of point estimation, I would now like to share with you some important properties which are regarded as desirable qualities of a good point estimator. As you now see on the slide, three of the main desirable qualities are unbiasedness, consistency and efficiency, and I would like to take them up one by one. An estimator is defined to be unbiased if the statistic used as an estimator has its expected value equal to the true value of the population parameter being estimated. In other words, if theta hat is an estimator of a parameter theta, then theta hat will be called an unbiased estimator of theta if E of theta hat is equal to theta. On the other hand, if the expected value of theta hat is not equal to theta, then theta hat is said to be a biased estimator. This is very easy to understand. We have already considered this property: mu x bar is equal to mu, the population mean. Students, this is exactly what we are talking about here. Mu x bar is equal to mu means that the expected value of x bar is equal to mu.
After all, an expected value is the mean of something, and mu x bar is the mean of the sample means, in other words the expected value of x bar. So if we are saying that the expected value of x bar is equal to mu, then you can see that the condition just stated is fulfilled, and we can say that x bar is an unbiased estimator of mu. And there is a very important interpretation here which will make the concept very clear. When I say that x bar is an unbiased estimator of mu, what I mean is that the sampling distribution of x bar is centered at the population parameter mu. As you now see on the slide, the center of the distribution of x bar is at the point where the population value mu lies. If, on the contrary, we had a situation where the sample statistic was not an unbiased estimator of the corresponding parameter, then the picture would be such that the mean of the sampling distribution of our statistic would lie either to the left of the population parameter or to the right of it. But if the unbiasedness property prevails, then the sampling distribution of my statistic is centered at the population parameter, not to the left, not to the right, but exactly on the parameter value. Let us apply this concept to the example of the ministry of transport test to which all cars are required to be submitted. As you now see on the slide, our example was that we are examining the case of an annual ministry of transport test to which all cars, irrespective of age, have to be submitted. The test looks for faulty brakes, steering, lights and suspension, and it is discovered after the first year that approximately the same number of cars have 0, 1, 2, 3 or 4 faults. Students, you will recall that, based on the information just presented, we can say that our population is uniformly distributed, because all values of x, that is, 0, 1, 2, 3 and 4, are equiprobable, and hence the probabilities are 1 by 5, 1 by 5, and so on.
After that, you will remember that we considered a very small sample of size 2 from this population of cars, and we noticed that we had many possible samples. How many? You remember: capital N raised to the power small n, that is, 5 raised to the power 2, or 25 samples. After that, you will remember that we found x bar for every one of those possible samples and then constructed the sampling distribution of x bar. So now, what should happen is that the mean of my sampling distribution, the expected value of x bar, should be exactly equal to the population parameter, the population mean. Students, you will remember that when we computed the mean of our population, the answer was 2; and when we found the mean of our sampling distribution, which you now see on the screen and in which the x bar values are 0.0, 0.5, 1.0 and so on and the probabilities are 1 by 25, 2 by 25, 3 by 25 and so on, the expected value of x bar given this particular probability distribution came out to be exactly equal to 2. So you can see that the property of unbiasedness is fulfilled. This is exactly what I said earlier: if you plot the graph, the mean value 2 is at exactly the same point where you have mu, the mean of the population, because mu is also equal to 2. Whether we increase the sample size to n equal to 3, n equal to 4 or any other sample size, students, this property remains valid in the case of x bar and mu. Regardless of the sample size, we always find that the sample mean x bar is an unbiased estimator of mu. Now, the question is: what is the situation for estimators of other parameters? You can do the same kind of computations in those cases too, but I will only present a summary of the various situations that you could have. As you now see on the screen, the sample median x tilde is an unbiased estimator of mu if the population is normally distributed.
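Before continuing with the summary, the 25-sample enumeration described above can be verified directly. This sketch, using only the Python standard library, also computes the standard deviation of the x bar distribution, that is, its standard error.

```python
from itertools import product

# Enumerating the ministry-of-transport example: a uniform population
# {0, 1, 2, 3, 4} and all 5**2 = 25 possible samples of size 2.
population = [0, 1, 2, 3, 4]
mu = sum(population) / len(population)                  # population mean = 2.0

samples = list(product(population, repeat=2))           # all 25 ordered samples
sample_means = [sum(s) / 2 for s in samples]

# Mean and standard deviation (standard error) of the x bar distribution
mu_xbar = sum(sample_means) / len(sample_means)
se_xbar = (sum((m - mu_xbar) ** 2 for m in sample_means)
           / len(sample_means)) ** 0.5

print(len(samples), mu, mu_xbar, se_xbar)               # prints 25 2.0 2.0 1.0
```

The mean of the 25 sample means equals the population mean 2, illustrating unbiasedness, and the standard error comes out to sigma over the square root of n, which here is 1.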
In other words, if the random variable x is normally distributed, then the expected value of x tilde is equal to mu. As far as p, the proportion of successes in the population, and p hat, the proportion of successes in the sample that we draw from that population, are concerned, we find that the expected value of p hat is equal to the expected value of x over n, where x of course represents the number of successes in our sample of size n. Applying a little bit of algebra, the constant n comes out of the expectation sign, and so we have E of p hat equal to the expected value of x divided by n. But we have already learnt that the expected value of x in the case of a binomial distribution is equal to n p; we do remember that the mean of a binomial distribution is n p. So, substituting n p for the expected value of x, we obtain the expected value of p hat equal to n p over n, and that is equal to p, the proportion of successes in the population. So you have seen that a simple derivation has proved for us that the sample proportion p hat is an unbiased estimator of p. Is the sample variance capital S square also an unbiased estimator of the corresponding parameter sigma square? This is not the case, because it can be mathematically proved that the expected value of capital S square is not equal to sigma square. Now, this particular fact brings us to another important point, students: the difference between the expected value of our estimator and the true value of the parameter is called the amount of bias. The bias is positive if the expected value of our estimator is greater than the parameter value, and negative if the expected value of our estimator is less than the parameter value.
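The negative bias of capital S square can be seen concretely by exact enumeration, reusing the same uniform population of the ministry of transport example; the n minus 1 version shown alongside is the standard correction discussed next.

```python
from itertools import product

# Checking the bias of the sample variance by exact enumeration over the
# uniform population {0, 1, 2, 3, 4} with all 25 samples of size 2.
population = [0, 1, 2, 3, 4]
mu = sum(population) / len(population)
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)   # 2.0

def variance(sample, divisor):
    """Sum of squared deviations from the sample mean, over the given divisor."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample) / divisor

samples = list(product(population, repeat=2))
# Expected value of capital S square (divisor n = 2) and small s square (n - 1 = 1)
e_S2 = sum(variance(s, 2) for s in samples) / len(samples)
e_s2 = sum(variance(s, 1) for s in samples) / len(samples)

print(sigma2, e_S2, e_s2)   # prints 2.0 1.0 2.0, so the bias of S square is -1.0
```

With divisor n the expectation is 1, below the true variance 2, a negative bias; with divisor n minus 1 the expectation equals sigma square exactly.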
It is very interesting that the sample variance is not an unbiased estimator of the population variance. But obviously, if unbiasedness is a desirable property, then we would like to define our variance formula in such a way that the unbiasedness property is fulfilled. So, students, we do modify the formula of capital S square and define the sample variance in a slightly different form, so that this new version becomes an unbiased estimator of the population variance. As you now see on the screen, we define small s square equal to summation x minus x bar whole square, divided by n minus 1, and it can be mathematically proved that this new formula is indeed an unbiased estimator of sigma square. In other words, the expected value of small s square is equal to sigma square. Students, what we have done is a slight modification in the variance formula: we divide the numerator by n minus 1 instead of n. This does not change the basic idea. We are actually trying to measure the spread of the data set, and the concept of the spread of the data set is more closely related to the numerator than to the denominator. After all, what is the numerator? The sum of the squared deviations of the observations from the mean. So, once we have measured the deviations, if we then divide their sum by n minus 1, the basic concept remains the same. Before we proceed to the next desirable quality of a point estimator, students, I want to talk about why we are saying that unbiasedness is a desirable property. For example, if you talk about mu and about x bar, which is to act as an estimator of mu, some of the x bar values are going to be greater than mu, and some of the x bar values are less than mu. What we are saying is that our procedure is such that, on the average, x bar coincides with mu. We cannot guarantee that any particular x bar is going to be equal to mu.
Some of the x bars, I repeat, are less than mu and some of the x bars are greater than mu, but if x bar is an unbiased estimator of mu, then on the average these values balance out at mu. This is the concept of unbiasedness, and that is why it is regarded as a desirable and important property in point estimation. What is the next property that I would like to discuss with you, students? It is called consistency, and as you now see on the slide, an estimator theta hat is said to be a consistent estimator of the parameter theta if, for any arbitrarily small positive quantity epsilon, the limit of the probability that the absolute difference between theta hat and theta is less than or equal to epsilon, as n tends to infinity, is equal to 1. Students, let us look at this expression and this statement step by step, in a methodical way; there is no need to be intimidated by it. What do we have inside the probability? The event that the absolute difference between theta hat and theta is at most epsilon, in other words, the event that the estimator is very close to the parameter. And it is this probability that should tend to 1 as the sample size n tends to infinity. Now, this is the mathematical way of expressing it. In very easy words, what we want to say is that as you increase the sample size, the probability increases that our estimator and our parameter differ very little, that is, that our estimator is a very good estimator of the parameter. So, if this happens in the case of any estimator, then we say that that particular estimator is a consistent estimator of the corresponding parameter. Students, the reason I am having this discussion with you is that consistency is a large-sample property; for a small sample we simply cannot make this statement. If the condition holds as the sample size grows larger and larger, then we say that our estimator is a consistent estimator of the corresponding parameter. One point to note is that a consistent estimator may or may not be unbiased.
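This large-sample behaviour can be illustrated by a small simulation; the population, the epsilon, the sample sizes and the number of trials below are arbitrary choices made purely for illustration.

```python
import random

# Illustrating consistency of x bar: as n grows, P(|xbar - mu| <= eps)
# should approach 1. Uniform population on {0,...,4}, so mu = 2.
random.seed(1)
population = [0, 1, 2, 3, 4]
mu, eps, trials = 2.0, 0.25, 1000

def prob_close(n):
    """Estimate P(|xbar - mu| <= eps) for samples of size n by simulation."""
    hits = 0
    for _ in range(trials):
        xbar = sum(random.choice(population) for _ in range(n)) / n
        hits += abs(xbar - mu) <= eps
    return hits / trials

probs = [prob_close(n) for n in (10, 100, 1000)]
print(probs)   # the estimated probabilities increase toward 1
```

As n goes from 10 to 1000, the proportion of samples whose mean falls within epsilon of mu climbs toward 1, which is exactly the limit statement in the definition.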
That is, consistency is one thing and unbiasedness is another, and it is possible that an estimator which is consistent is also unbiased; but it is equally possible that an estimator is consistent without being unbiased. Let us consider a few examples that you now see on the screen. The sample mean x bar, which is an unbiased estimator of mu, is also a consistent estimator of mu. The sample proportion p hat is also a consistent estimator of the parameter p. The median is not a consistent estimator of mu if the population has a skewed distribution. And the sample variance capital S square, although it is a biased estimator, is a consistent estimator of the population variance sigma square. Generally speaking, it can be proved that a statistic whose bias, if any, and standard error both decrease to zero as the sample size increases will be a consistent estimator. In today's lecture, I discussed with you the basic concept of point estimation, and then we went on to some desirable qualities of a point estimator. I discussed unbiasedness and consistency, and in the next lecture I will be talking about efficiency. After that, we will discuss briefly the various methods of point estimation. Until next time, best of luck and Allah Hafiz.