Welcome to the course on dealing with materials data. In the present sessions we are considering the parametric estimation of population parameters. Let us recall that we discussed in detail where parametric estimation is needed: the need arises when you have an unknown population. We discussed point estimators in the previous few sessions, in which we talked about the maximum likelihood estimator (MLE) of a parameter and also about the method of moments (MOM) estimator of a parameter. In the case of the MLE we showed that sometimes, when you try to find a maximum likelihood estimator, you may have to find the solution numerically. In the case of the method of moments estimator we found that one has to be careful, because at times it can give an inconsistent result. We showed this by estimating the parameters a and b of a continuous uniform distribution from one particular sample of size 5.
Now we are coming to the next session, in which we would like to talk about interval estimation. Interval estimation is not an unknown area to you. It is used very regularly in the representation of scientific data that we collect through experiments. So first we will start by understanding data representation with error, or with accuracy, whichever word you use. Error is more negative and accuracy is more positive, but they talk about the same thing. We will take data with error as an example of interval estimation. Then we will do interval estimation assuming a normal distribution, and then we will see how it leads to a t distribution when you want an interval estimator and the population variance is also unknown. So let us begin.
Here, on one part of the slide, I have data on the coefficient of linear thermal expansion versus temperature in kelvin. The temperatures are given straightforwardly, while the coefficient of linear thermal expansion has two components. For example, here it reads minus 0.38 plus or minus 0.05. The first component, minus 0.38, is called the value, and the second component, plus or minus 0.05, is called the range. The value is the average coefficient of linear thermal expansion that was obtained. The range represents the one sigma limits of the data: if you take sigma as the standard deviation of the data, then the range generally represents the plus or minus one sigma limits. What this representation says is that your data at a temperature of 77 kelvin lie between minus 0.38 minus 0.05 and minus 0.38 plus 0.05. I have described this more carefully in the next slide, so let us go through it.
Take the case of 889 kelvin. At 889 kelvin the coefficient of thermal expansion is 2.02 plus or minus 0.14. In statistical terms it means that at 889 kelvin about 68 percent of your data would fall in the range 1.88 to 2.16. This is called an interval estimate of the linear thermal expansion at 889 kelvin. Remember that if you talk only about 2.02 it is a point estimator; when you add a range to it, it becomes an interval estimator. So why do we need an interval estimator? Well, in this case, if we say that the thermal expansion at 889 kelvin is 2.02, this is what I got when I did my experiment. Suppose someone else does the experiment: can you guarantee that it will come to 2.02? No.
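As a minimal sketch of how a value quoted with a range is read, the snippet below turns the 889 kelvin reading from the slide into its lower and upper limits and checks the area of a normal curve within one sigma. The function name `one_sigma_interval` is only an illustrative choice, and the normality of the data is an assumption, as it is in the lecture.

```python
from statistics import NormalDist

def one_sigma_interval(value, spread):
    """Return (lower, upper) for a result quoted as value plus or minus spread."""
    return value - spread, value + spread

# Reading at 889 K quoted in the lecture: 2.02 plus or minus 0.14
low, high = one_sigma_interval(2.02, 0.14)

# Area under a normal curve within one standard deviation of the mean
coverage = NormalDist().cdf(1.0) - NormalDist().cdf(-1.0)

print(f"interval: {low:.2f} to {high:.2f}")
print(f"area within one sigma: {coverage:.4f}")
```

The printed area is where the figure of about 68 percent comes from.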
This is true for any experiment, and therefore we say that a point estimator, say the sample mean for the population mean, only indicates that the value of the sample mean is close to the population mean. It gives you an idea of where the population mean lies, but not how close it is or how often it will be that close. What values the population mean can take: these questions are answered through the interval estimator.
So again we start with an example, and we go with the normal distribution. Let X1, X2, X3, up to Xn be n random measurements from an experiment, and assume that these measurements come from a normal population with mean mu and standard deviation sigma. We know that the expected value of the sample mean is mu and that the variance of the sample mean is sigma square by n. Therefore the standard normal variate Z, which is X bar minus mu divided by its standard deviation sigma over square root n, is distributed as normal 0, 1. From the normal table we can then say that the probability that Z lies between minus 1.96 and plus 1.96 is 95 percent. If you replace Z by the formula X bar minus mu over sigma over square root n, this simplifies to say that the mean mu lies between the sample mean minus 1.96 sigma over square root n and the sample mean plus 1.96 sigma over square root n, and this probability is 0.95. It means that an interval calculated in this way will contain mu 95 percent of the time. This is how it is calculated.
Now where does this 1.96 come from? Let us try to understand this. Here I have the standard normal probability density plot. What we already know, and what we used in the previous case, is that the area under this curve between minus 1 and plus 1 represents approximately 68 percent of the data. The probability is 0.68.
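The 95 percent interval for a known sigma can be sketched as follows. This is an illustrative example, not part of the lecture: the numbers passed in are hypothetical, and the small simulation is only there to show what "95 percent of the time" means when the experiment is repeated many times.

```python
import math
import random

def z_interval(sample_mean, sigma, n, z=1.96):
    """95% interval for mu when sigma is known (z = 1.96 as in the lecture)."""
    half_width = z * sigma / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

def coverage(mu, sigma, n, trials, seed=0):
    """Fraction of simulated samples whose interval contains the true mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        low, high = z_interval(sum(sample) / n, sigma, n)
        if low < mu < high:
            hits += 1
    return hits / trials

# Hypothetical inputs: five readings averaging 2.02, known sigma of 0.14
low, high = z_interval(2.02, 0.14, 5)
print(f"95% interval: {low:.3f} to {high:.3f}")
print(f"simulated coverage: {coverage(2.0, 0.14, 5, 4000):.3f}")
```

The simulated coverage should come out close to 0.95, though it varies slightly with the seed and the number of trials.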
If you take the limits minus 2 and plus 2, the whole area under the curve between them is approximately 95 percent. It is not a very good approximation: the area is actually a little more than 95 percent, about 95.45 percent. And if you take the limits minus 3 and plus 3, you can see that it covers almost the whole data, because the tails are very thin here. It covers 99.73 percent of the data. This is something we have done in the past, and I thought we had better recall it.
Now let us consider the present case. Normal probability tables generally come with a small graph that shows which area is tabulated. I have explained this to you in the past. Suppose this is the standard normal probability density function. The table lists a value t against the probability under the curve that X is less than t, where X is distributed normal 0, 1. Sometimes you have to be careful, and this applies to software as well as to printed tables. Zero divides the curve exactly into two halves: the probability on this side is 0.5 and the probability on the other side is also 0.5. So sometimes the table lists t against the probability that X lies between 0 and t, where X is normal 0, 1. You have to be careful about what you see in the tables.
So the question is, where did 1.96 come from? Let us start afresh. You want 95 percent of the area under the curve in the centre, so let us say that this is the area which we would like to be 0.95. If the table lists the probability that X is less than t, then we do not directly have the value of t here. What we realize is that the two tail areas together make up the rest: this tail area is 1 minus 0.95 divided by 2, and this tail area is also 1 minus 0.95 divided by 2.
Here 1 is the total area under the curve, and you take out the central 0.95. The curve is symmetric, so both tail areas are the same, and that is why I have divided by 2. Each tail area comes to 0.025. Now let me change the colour of the pen to green. The area under the curve shown in green, which is all the area below this particular point, is going to be 0.975. It is this point that we are looking for, and its value turns out to be 1.96. This is how the value 1.96 is calculated: you want 0.95 in the central area.
So let us move on. In data representation, as in the example of linear thermal expansion, you have a plus or minus sigma value. What I mean in this case is that sigma represents the standard deviation of the distribution. Here we would like the standard normal variate Z to lie between minus 1 and plus 1, and the probability of that is approximately 0.68. Remember that the standard deviation of the sample mean is sigma over square root n. So the statement becomes: the probability that mu lies between X bar minus sigma over square root n and X bar plus sigma over square root n is approximately 0.68. This is why the linear thermal expansion data is given in this format. Note that this is also referred to as data accuracy.
I hope you have understood. Let us go through it once again so that this concept is clear. In interval estimation, let us assume that the n data points come from a normal population with mean mu and standard deviation sigma.
Then we know that Z, which is the standardised or normalised variate of X bar, is X bar minus mu divided by sigma over square root n, because X bar itself is a normal variate with mean mu and variance sigma square over n. This standardised variate is distributed as a standard normal distribution with mean 0 and variance 1.
Then I explained how the number 1.96 comes about, given the way the table is laid out. Suppose the table lists t against the probability that X is less than t. We found that if you want 95 percent of the data in the centre, you will have 0.025, that is 2.5 percent, of the data at each of the two tail ends. If you add the lower tail end to the central 0.95, you get the green area, which is the tabulated area. So in the table you have to look for a probability equal to 0.975, and that t value comes to 1.96. Please remember, if you are using R to calculate this value, to read the help. By default its distribution functions work with the lower-tail probability, that is the probability that X is less than or equal to t, whereas some printed tables ignore the constant half and list only the probability between 0 and t. So please make sure how your value is calculated, and accordingly pick up the value from either the table or the software.
Then, in data representation, I said earlier that the range is the plus or minus 1 sigma limit. This sigma can be confusing, so here I am clarifying that by sigma I mean the standard deviation of the distribution. In our case we are considering the standard deviation of the sample mean, which turns out to be sigma over square root n. Therefore the linear thermal expansion is given as x bar plus or minus sigma over square root n. Note that this is also referred to as data accuracy.
What happens when sigma is unknown, that is, when the standard deviation is unknown?
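To recap the table lookup described above, here is a small sketch using Python's standard library. The helper names `lower_tail`, `zero_to_t` and `central_area` are illustrative; they stand for the two table layouts mentioned in the lecture and for the area between minus k and plus k.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def lower_tail(t):
    """Table layout 1: P(X < t), all the area to the left of t."""
    return Z.cdf(t)

def zero_to_t(t):
    """Table layout 2: P(0 < X < t), which leaves out the constant half."""
    return Z.cdf(t) - 0.5

def central_area(k):
    """Area between -k and +k."""
    return Z.cdf(k) - Z.cdf(-k)

confidence = 0.95
tail = (1 - confidence) / 2   # area in each tail
z = Z.inv_cdf(1 - tail)       # point with area 0.975 to its left

print(f"each tail: {tail:.3f}, z: {z:.2f}")
print(f"layout 1 at z: {lower_tail(z):.3f}, layout 2 at z: {zero_to_t(z):.3f}")
for k in (1, 2, 3):
    print(f"area within {k} sigma: {central_area(k):.4f}")
```

With the first layout you look up 0.975 in the table; with the second layout you look up 0.475. Both lead to the same point.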
You see, in the previous case we assumed that the mean is not known, and we gave the interval estimate of the mean using the standard normal variate Z. What if sigma is also unknown? You are all familiar with what we do in that case. The standard normal variate Z is defined as x bar minus mu over sigma over square root n. The sigma in it gets replaced by the sample standard deviation s. And now you know why we learnt the t distribution. The t variate, x bar minus mu over the sample standard deviation divided by square root n, is distributed as a t distribution with n minus 1 degrees of freedom. Remember that n minus 1 times s square over sigma square is distributed as chi-square with n minus 1 degrees of freedom. The t variate is the standard normal variate, x bar minus mu over sigma over square root n, divided by s over sigma, which is the square root of that chi-square variate divided by its degrees of freedom. The sigma cancels, and you are left with x bar minus mu over s over square root n. Please recall that we have done this in the past; you can go through the previous slides and confirm it.
Thus this is a t distribution with n minus 1 degrees of freedom, and we would like to find a and b such that the probability of a less than t less than b is 1 minus alpha, where 1 minus alpha indicates the confidence level.
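A short simulation can illustrate why the normal cutoff no longer works once sigma is replaced by the sample standard deviation. This is only a sketch under assumed settings (standard normal data, samples of size 5, 4000 repetitions), and the function names are illustrative.

```python
import math
import random
from statistics import mean, stdev

def t_statistic(sample, mu):
    """(x bar - mu) / (s / sqrt(n)), with s the sample standard deviation."""
    n = len(sample)
    return (mean(sample) - mu) / (stdev(sample) / math.sqrt(n))

def fraction_beyond(cutoff, n, trials, seed=0):
    """Fraction of simulated t statistics with absolute value above cutoff."""
    rng = random.Random(seed)
    count = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(t_statistic(sample, 0.0)) > cutoff:
            count += 1
    return count / trials

# With sigma known, 5% of Z values fall outside +/-1.96.
# With s in place of sigma and n = 5, noticeably more than 5% do.
print(f"fraction beyond 1.96: {fraction_beyond(1.96, 5, 4000):.3f}")
```

The fraction comes out well above 0.05, which is why the limits have to be taken from the t distribution with n minus 1 degrees of freedom rather than from the normal table.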
Why we call it 1 minus alpha you will know when we go through the session on hypothesis testing; to remain consistent with all the explanations I am calling it 1 minus alpha. The t distribution is symmetric around 0, and therefore this simplifies to saying that the probability of minus a less than t less than a is 1 minus alpha. In other words, if you want two values a and b such that the area between them is 1 minus alpha, the two values are the same distance from 0 on opposite sides, one negative and one positive, and so I call them minus a and a. Maybe I should have called them minus b and b, but it is okay; I think you have understood it.
So you want an a such that the probability of minus a less than t less than a is 1 minus alpha. It means that the probability of t less than minus a plus the probability of t greater than a is alpha. Due to symmetry, this also says that 2 times the probability of t less than minus a is alpha, and therefore the probability of t less than minus a is alpha by 2.
Let us try to understand this through the graph of t, because these are very important points. Just as in the normal case, say this is the density function of the t distribution and this is 0. We are looking for minus a and plus a such that the area between them is 1 minus alpha. Let me use a different colour. If I add up the area in this tail and the area in this tail, it will be alpha: alpha is equal to the probability of t less than minus a plus the probability of t greater than a. These two areas are also equal, so this one is alpha by 2 and that one alone is also alpha by 2. Therefore the probability of t less than minus a is alpha by 2.
Again you have to look into the t probability tables. They follow the same procedure as the normal tables, and there also you will have to check how the probabilities are calculated: is it all the area under the curve up to the point, or is it calculated by leaving out the half and taking only the area from 0 to the point? Depending on that, you decide what your value should be. If the table gives lower-tail probabilities, then minus a is the tabulated point with n minus 1 degrees of freedom at probability alpha by 2, and a is its negative. If the table gives upper-tail probabilities, then a itself is the tabulated point at alpha by 2. I am going to write a as t with n minus 1 degrees of freedom at alpha by 2, and you have to find this t value from the tables.
Let us repeat. The probability of minus a less than t less than a is 1 minus alpha. Therefore the probabilities at the two tail ends together add up to alpha, and one tail end alone adds up to only alpha by 2. So the t value that we need to find is t at alpha by 2. You have to make sure which way your table is laid out and accordingly pick up the value of t.
Finally, when sigma is unknown, we get the following. The probability that mu lies between x bar minus s over square root n times t, and x bar plus s over square root n times t, is 1 minus alpha. Here s is the sample standard deviation and t is the t value with n minus 1 degrees of freedom at alpha by 2. If 1 minus alpha is 0.95 and you take n as 5, so that n minus 1 is 4, then you are looking for t at 0.025 with 4 degrees of freedom, because you are looking for alpha by 2. This is the value shown on the slide, and the 95 percent accuracy of the data can be given in this format. So when sigma is unknown you use the t distribution to estimate the interval in which the mean value of the population would lie.
So let us quickly summarize. We analyzed data representation with error, or with accuracy, and we found that this is the same as what we call interval estimation in statistics. In fact the representation of data with error is derived from statistics, so we started from that and said that this is what interval estimation is. We derived the interval estimate of the population mean for a normal population when the variance is known, using the standard normal distribution, and we derived the interval estimate of the population mean for a normal population when the variance is unknown
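The interval for unknown sigma can be sketched as below for the case n = 5 discussed above. The sample values are hypothetical and chosen only for illustration. The critical value 2.776 is the standard t-table entry for 4 degrees of freedom with 0.025 in the upper tail; it is hard-coded here because the Python standard library has no t distribution.

```python
import math
from statistics import mean, stdev

# Standard t-table value: 4 degrees of freedom, 0.025 in the upper tail
T_CRIT_4DF_95 = 2.776

def t_interval(sample, t_crit):
    """Interval x bar +/- t_crit * s / sqrt(n) for mu when sigma is unknown."""
    n = len(sample)
    half_width = t_crit * stdev(sample) / math.sqrt(n)
    centre = mean(sample)
    return centre - half_width, centre + half_width

# Hypothetical sample of five measurements
sample = [1.9, 2.1, 2.0, 2.2, 1.9]
low, high = t_interval(sample, T_CRIT_4DF_95)
print(f"95% interval for mu: {low:.3f} to {high:.3f}")
```

For a different sample size or confidence level, the critical value has to be looked up again with the matching degrees of freedom and alpha by 2.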