the course on dealing with materials data. Presently we are going through the sessions on parametric estimation. In earlier sessions we described point estimators such as the maximum likelihood estimator and the method of moments estimator. In the previous session we discussed the interval estimator as a presentation of data together with an error, and we covered two cases. One is the case of a normal population whose standard deviation sigma is known; there we can use the normal distribution to construct an interval estimator of the population mean mu. When the variance is unknown, we make use of the t distribution to construct the interval estimator for mu. In the present session I will first explain, somewhat loosely, what interval estimation is in general; in particular we will take the case of an interval estimator of the variance when the population is normal, we will briefly discuss one-sided interval estimators, and then we will talk about unbiased estimators, outliers and robust estimators. As I said, this general approach is just to give you a feel for what is being done when you construct an interval estimator. Because we started with an example of data representation, it is necessary to have a general understanding of what is happening in this case. So let us consider that we have a sample of size n from a population with distribution f(theta), where theta is an unknown parameter, and we want to find an interval estimator of theta with some confidence level 1 - alpha. What we want to find are three things: two numbers a and b, and a statistic d(x1, x2, ..., xn) that is a function of the data alone, such that the probability that a < d < b equals 1 - alpha. This statistic d will be some kind of estimator of theta, or a function of such an estimator.
So, in this situation suppose the probability density of the statistic d takes a skewed form. I have very deliberately drawn a skewed density function, because the symmetric cases, the normal distribution and the t distribution, were already discussed in the previous session. We are now looking for a and b such that the area under the density curve between a and b is 1 - alpha; equivalently, the area beyond b is alpha/2 and the area below a is alpha/2. The chi-square is one such skewed distribution. In that case a is the chi-square value with n degrees of freedom at probability alpha/2, because, once again, remember that the tables record the probability that the random variable is smaller than a given value, and here that probability is alpha/2; so I take a = chi-square(n, alpha/2). Likewise b = chi-square(n, 1 - alpha/2), because the area below b is 1 - alpha/2. I want to make sure these points are understood correctly, so let me clarify the notation. When I write chi-square with n degrees of freedom, I mean a distribution. But when I write chi-square(n, alpha), I mean a value such that the probability that the random variable T is less than chi-square(n, alpha) equals alpha. This is true not just for the chi-square but for the t distribution and others as well: whenever we say chi-square with n degrees of freedom, we mean the distribution.
But when we write chi-square(n, alpha), we mean a point, a value such that the probability that T is less than chi-square(n, alpha) equals alpha. So in this case, as I explained, the area between a and b is 1 - alpha, the area below a is alpha/2, and the area above b is alpha/2. The first point a is straightforward: the probability of a value less than a is alpha/2, so in our notation a = chi-square(n, alpha/2), with alpha/2 in place of alpha. For b, however, the probability that T is less than b is not alpha/2; it is the probability that T is greater than b that equals alpha/2. So you have to take the full cumulative probability below b, which is (1 - alpha) + alpha/2 = 1 - alpha/2, and therefore b = chi-square(n, 1 - alpha/2). I hope this is clear now, and this is how you look into the table, because tables generally give these cumulative values; and since this distribution is not symmetric, you have to look up each value accordingly. Now let us go to the interval estimator of the population variance. We have a random sample of size n from a normal distribution with mean mu and variance sigma^2, and we want to find an interval estimator of sigma^2, which we assume is unknown. Remember I said you have to find a statistic which is an estimator, or a function of an estimator, of theta. Here we take the sample variance S^2; this is my D, the d(x1, x2, ..., xn) of the previous notation.
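To make the table lookup concrete, here is a small sketch, not part of the lecture, assuming Python with SciPy: the cutoff values a = chi-square(n, alpha/2) and b = chi-square(n, 1 - alpha/2) are exactly the inverse-CDF (percent point) values that a chi-square table lists.

```python
from scipy import stats

n = 10          # degrees of freedom (an example value)
alpha = 0.05    # so 1 - alpha = 0.95 confidence

# a is the point with lower-tail area alpha/2;
# b is the point with lower-tail area 1 - alpha/2
a = stats.chi2.ppf(alpha / 2, df=n)
b = stats.chi2.ppf(1 - alpha / 2, df=n)

# The area under the density between a and b is 1 - alpha
area = stats.chi2.cdf(b, df=n) - stats.chi2.cdf(a, df=n)
print(a, b, area)
```

Note that `ppf` is the inverse of the cumulative distribution function, which is precisely why b has to be looked up at 1 - alpha/2 and not at alpha/2.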
So I take this statistic and form (n - 1)S^2 / sigma^2, and it follows a chi-square distribution with n - 1 degrees of freedom; please refer to the previous discussions, where we have covered this enough times. Therefore the value (n - 1)s^2 / sigma^2 lies between the a and b that we found on the previous slide: chi-square(n - 1, alpha/2) < (n - 1)s^2 / sigma^2 < chi-square(n - 1, 1 - alpha/2). Now, to find the interval estimate for sigma^2, you take the reciprocal of each term, which reverses the inequalities, so b goes into the denominator of the lower limit and a into the denominator of the upper limit: (n - 1)s^2 / chi-square(n - 1, 1 - alpha/2) < sigma^2 < (n - 1)s^2 / chi-square(n - 1, alpha/2). This event has probability 1 - alpha, and therefore this provides the interval estimator of sigma^2, the population variance of a normal population. Now, so far we have considered two-sided estimators, meaning the estimator is bounded on both sides: the probability that a < d < b equals 1 - alpha. Suppose instead we have only a one-sided estimator, that is, the probability that a < d is 1 - alpha, or the probability that d < b is 1 - alpha. This situation arises when you know that a certain variable is always greater than some number, or when you are not interested in bounds on both sides but only want to know whether the quantity is above a certain value, or below a certain value.
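The variance interval just derived can be sketched numerically; this is my own illustration in Python with NumPy and SciPy, with made-up sample parameters, not something from the lecture.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=30)   # example sample, true sigma^2 = 4

n = len(x)
s2 = np.var(x, ddof=1)        # sample variance, n - 1 in the denominator
alpha = 0.05

# (n - 1) s^2 / sigma^2 ~ chi-square with n - 1 degrees of freedom,
# so after inverting, b goes to the lower limit and a to the upper limit
lower = (n - 1) * s2 / stats.chi2.ppf(1 - alpha / 2, df=n - 1)
upper = (n - 1) * s2 / stats.chi2.ppf(alpha / 2, df=n - 1)
print(lower, upper)
```

Notice that the interval is not symmetric about s^2, exactly because the chi-square density is skewed.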
In this case one-sided interval estimators are needed, and I do not think we need to calculate each one, because for every case that we calculated, the one-sided interval estimator can also be derived using the same statistic and the probability density function of that statistic. For example, for the mean mu of a normal(mu, sigma^2) population when sigma^2 is known, you look for a such that the probability that a < (x-bar - mu) / (sigma / sqrt(n)) is 1 - alpha; since this quantity follows the standard normal N(0, 1), you can look up the value of a in the normal table. Similarly, if you are in the t-distribution case, you follow the same method. If it is a chi-square distribution with n degrees of freedom, then you have a skewed distribution, and you look for an a such that the area above a is 1 - alpha, which means you look at the table entry where the lower-tail probability is alpha; similarly, for the other side you look at the entry where the upper tail is alpha. So instead of alpha/2 on each of the two sides, you will have alpha on one side in the first case and alpha on the other side in the second case. I am not going into the details, but it can easily be worked out in the same way as we have done for the two-sided interval. Now we will move to what are called unbiased estimators. This is a quality, an evaluation, of a point estimator; so from interval estimators we are back to point estimators, and we will first define what is called an unbiased estimator. Let x1, x2, ..., xn be a random sample from a distribution with parameter theta, theta unknown, and let the statistic d(x1, x2, ..., xn) be an estimator of theta.
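As a sketch of the one-sided case, again my own Python illustration with assumed numbers rather than the lecture's: for the mean of a normal population with sigma known, a one-sided lower confidence bound puts all of alpha in a single tail, so it uses the normal point at 1 - alpha instead of 1 - alpha/2.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sigma = 2.0                        # population sigma, assumed known
x = rng.normal(loc=10.0, scale=sigma, size=25)
xbar, n, alpha = x.mean(), len(x), 0.05

# One-sided: all of alpha sits in one tail, so z_{1-alpha} replaces
# the two-sided z_{1-alpha/2}
z = stats.norm.ppf(1 - alpha)      # about 1.645 for alpha = 0.05
lower_bound = xbar - z * sigma / np.sqrt(n)
print(lower_bound)                 # mu exceeds this with confidence 1 - alpha
```

Compare 1.645 with the two-sided 1.96: the one-sided bound is tighter on the side you care about, at the cost of saying nothing about the other side.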
Then the bias of the estimator is defined as the expected value of the statistic minus the parameter value theta, and if this bias is 0 the estimator is called an unbiased estimator. If you recall from our past lessons, the sample mean is an unbiased estimator of the population mean mu, and the sample variance is an unbiased estimator of the population variance sigma^2. But remember that the maximum likelihood estimator of sigma^2, which is (1/n) times the sum of (xi - x-bar)^2, is not an unbiased estimator of sigma^2. The MLE of sigma^2, which we generally call sigma^2-hat, is not unbiased; it is the sample variance, with n - 1 in the denominator, that is the unbiased estimator of sigma^2. There is another kind of estimator we would like to talk about, called a robust estimator, but before that we need to understand outliers. What is an outlier? Outliers are the extreme values in the data. You must have made some histogram plots; suppose in one of them you find a few points lying far away from the rest. These are called outliers in the data: they do not fall into the shape that the histogram otherwise takes. So let x1, x2, ..., xn be a random sample from a population with distribution f, mean mu and variance sigma^2. I am not calling it normal; I am calling it a general distribution f which has population mean mu and population variance sigma^2. Generally, data falling outside the interval from x-bar - 2s to x-bar + 2s, that is, the sample mean minus and plus two sample standard deviations, is identified as an outlier. This is a general thumb rule; it is only a thumb rule, and there is no proof to it.
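The bias of the MLE of sigma^2 is easy to see by simulation; this is a sketch of mine in Python with NumPy, with arbitrary example values, not part of the lecture. For a normal population, the expected value of the 1/n estimator is (n - 1)/n times sigma^2, while the sample variance averages to sigma^2 itself.

```python
import numpy as np

rng = np.random.default_rng(42)
sigma2 = 4.0                  # true population variance (example value)
n, reps = 5, 200_000          # small n makes the bias visible

# reps independent samples of size n from N(0, sigma2)
x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
mle_mean = x.var(axis=1, ddof=0).mean()       # (1/n) sum (xi - xbar)^2
unbiased_mean = x.var(axis=1, ddof=1).mean()  # (1/(n-1)) sum (xi - xbar)^2

# Theory: E[MLE] = (n-1)/n * sigma^2 = 3.2 here, E[S^2] = sigma^2 = 4.0
print(mle_mean, unbiased_mean)
```

The `ddof` argument of `numpy.var` is exactly this choice of denominator: `ddof=0` gives the MLE, `ddof=1` the unbiased sample variance.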
But generally, whatever falls outside x-bar plus or minus 2s is identified as an outlier. If you assume normality of the distribution f, then the probability that x lies in the interval x-bar - 2s < x < x-bar + 2s is approximately 95%. However, before identifying anything as an outlier, it is advisable to measure the value again to make sure whether it is an outlier or not. It may be a measurement error, but we also have to realize that when you are experimenting and a truly very different value comes up, you may have to sit back and think whether it really indicates some new phenomenon, or some theory you have left out. So it is not good practice to simply throw away the outlier data and consider only the good, consistent data. It is a good idea to know which points are the outliers, and it is very important to report what you are going to do with them when you do the analysis of the data. Now I come to the definition of a robust estimator, which is why I have been talking about outliers. An estimator D is called robust if it is not affected by outliers; that is, the value of D does not change significantly if there are extreme values in the data. You must be thinking we have heard this before. When we were talking about measures of central tendency, we said that the arithmetic average is sensitive to extreme values while the median of the data is not, and this is exactly what we want to say here. So if we have a sample, the mean of the population can also be estimated by the median of the sample. Remember what the median is: if you have an odd number of data points, it is exactly the middle value of the ordered data; if you have an even number, you take the two middle values and their average gives you the median.
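Both points, the x-bar plus or minus 2s thumb rule and the robustness of the median, can be sketched on a toy data set; this is my own illustration in Python with NumPy, with invented numbers, not the lecture's data.

```python
import numpy as np

data = np.array([4.8, 5.1, 4.9, 5.2, 5.0, 4.7, 5.3, 50.0])  # one extreme value

# Thumb rule: points outside xbar +/- 2s are flagged as outliers
xbar, s = data.mean(), data.std(ddof=1)
flagged = data[np.abs(data - xbar) > 2 * s]
print(flagged)                            # the 50.0 gets flagged

# Robustness: the outlier drags the mean far away from the bulk of
# the data, while the median barely moves
clean = data[np.abs(data - xbar) <= 2 * s]
print(clean.mean(), data.mean())          # mean shifts from about 5 to above 10
print(np.median(clean), np.median(data))  # median stays near 5 either way
```

This is exactly the sensitivity-to-extreme-values contrast from the descriptive statistics lectures, now phrased as robustness of the estimator.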
So if you have n data points x1, x2, ..., xn and order them from the smallest to the largest, the middle value gives you the median, and that is also an estimator of the mean value of the population. As we know, the median is robust against extreme values, and therefore the median is a robust estimator of the population mean, while the average, the sample mean, is not a robust estimator of the population mean. This is how a robust estimator is defined. Remember that when you find the median of a sample you make no assumption about the shape of the population distribution, and therefore the median also represents what is known as a non-parametric estimator. Just for your information: non-parametric estimators are estimators that are based on the ordered values of the sample. So what we would like to say here is that the mean of the population can also be estimated using the median of the sample, and the median is a robust estimator because it is not affected by the extreme values, the outliers, of the data, while the mean is. So, though the sample mean is an unbiased estimator of the population mean, it is not a robust estimator; the robust estimator of the population mean is the median. So let us summarize. We introduced methods to arrive at an interval estimator when the population distribution is not symmetric. We explained it in very general terms, but immediately gave the specific example of the chi-square distribution. We also explained how to find what are known as cutoff values; I did not use this term earlier, so let me write it down here. When you are trying to find values a and b such that the probability that a < d(x1, x2, ..., xn) < b is 1 - alpha, then a and b are called cutoff values, and we look for them in the respective distribution tables.
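The ordered-sample definition of the median can be written out directly; this short sketch (mine, in Python, with a helper name `median_by_ordering` that I am introducing for illustration) follows the odd/even rule just described.

```python
import numpy as np

def median_by_ordering(x):
    """Median from the ordered sample, following the odd/even rule."""
    x = np.sort(np.asarray(x, dtype=float))   # order smallest to largest
    n = len(x)
    if n % 2 == 1:
        return x[n // 2]                      # middle value for odd n
    return (x[n // 2 - 1] + x[n // 2]) / 2.0  # average of the two middle values

print(median_by_ordering([3.0, 1.0, 2.0]))        # odd n: the middle value
print(median_by_ordering([4.0, 1.0, 3.0, 2.0]))   # even n: average of middle two
```

Because it uses only the ordering of the sample, never the shape of the distribution, this is precisely what makes the median a non-parametric estimator.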
So, we showed how to look for a and b and how to calculate them from the table. We derived the interval estimator for the population variance of a normal population; it comes from the chi-square distribution, because the sample variance is taken as the statistic, and from there the interval estimator of the population variance is derived. We introduced the concept of the one-sided interval estimator; working it out would be a mere repetition of what we did for two-sided intervals, so we have not repeated it here. We discussed what is called the bias of an estimator and what is known as an unbiased estimator. We stated that the sample mean is an unbiased estimator of the population mean and that the sample variance is an unbiased estimator of the population variance, but that the maximum likelihood estimator of the population variance under normality is not an unbiased estimator of the variance of the normal population. We discussed outliers briefly. We emphasized that outliers are the extreme values in the data; they should be noted and they should be studied. There should not be an automatic decision every time to remove the outliers. No, because at times outliers represent measurement errors, so we may have to conduct the experiment again to make sure that this is actually the value we are getting; or an outlier may be an indicator of a new theory or a new phenomenon which we had never expected out of the experiment. After this brief introduction to outliers, we introduced what is known as a robust estimator. Robust estimators are those which are not affected by outliers, and we showed that the sample mean is not a robust estimator of the population mean: though it is unbiased, it is not robust. The median is a robust estimator of the population mean, because it does not get affected by outliers, as we discussed at the very beginning of the descriptive statistics lectures, and we also said that the median is a kind of non-parametric estimator of the population mean. Thank you.