 Welcome back. We will continue with hypothesis testing. We were looking at an example on the type 1 error, in which a judge passes a decision on the hiring practices of a firm. The reference for this example is the book by Walpole, Myers, Myers and Ye, Probability and Statistics for Engineers and Scientists, 8th edition, published by Pearson Education in 2007. It has a large number of illustrative examples. There are also several other books which you may want to refer to, and you can work through the problems from whichever you are comfortable with. Coming back to the other illustration, where we were looking at the mean impurity obtained from the samples, we want to set the critical value at an impurity level such that the probability of the sample mean exceeding it is 0.05. This means that the sample mean has to have an impurity of 0.2233 ppm or higher for the shipment to be rejected. Next we come to the normal distribution plot generated with the help of Minitab. This is the sampling distribution of the means. The mean value of this distribution is 0.2 ppm. This mean value is also the mean of the population, and the acceptance and rejection regions are shown in this diagram. It can be seen that the region below 0.223 ppm is the acceptance region and the region above 0.223 ppm is the rejection region. 0.223 ppm is the critical value. It is important to note that the area under the curve beyond the critical value of 0.223 ppm is 0.05. So this is a low probability, and only when this value of 0.223 ppm is exceeded do we reject the null hypothesis. The null hypothesis claims that the samples are coming from a population of mean impurity 0.2 ppm. We have seen the one-sided or right tailed test. When do we perform the two-tailed test? We may restate the query in the following manner. 
Instead of asking what is the probability that the sample taken from the shipment with mean impurity of 0.2 ppm could actually have a mean impurity of 0.215 ppm or higher. We rephrase the statement in the following manner. Actually what we are trying to say is in the first sampling exercise, we are getting a sample mean of 0.215 ppm. Perhaps this 0.215 ppm is deviating from the stated mean value of 0.2 ppm by a certain value. Obviously it is 0.015 ppm. This is a positive deviation. We can also have a negative deviation of minus 0.015 ppm. So if we are giving chances for both positive deviations and negative deviations to occur, then we have to conduct a two-tailed test. Let us see how to do it. So the question we rephrase is what is the probability that the sample taken from the population varies from the population mean by as much as 0.015 ppm or higher. This means the sample can have a mean impurity of 0.185 ppm or lower or 0.215 ppm or higher. So when we take the sample, we have to see what is the probability that the mean impurity may be 0.215 ppm or higher or 0.185 ppm or lower. So we have to find what is the probability of the sample mean being greater than or equal to 0.215 ppm or less than or equal to 0.185 ppm. Because of the symmetry of the normal distribution curve, the probability of the random sample mean exceeding 0.215 ppm is also equal to the probability of the random sample mean falling below 0.185 ppm. So it is enough if we find one of these two probabilities and the resulting value is multiplied by 2. So here we have the normal distribution. Again this represents the distribution of the sample means which is centered around the population mean value of 0.2 ppm. Remember the population mean value is also equal to the random sampling distribution mean value. Here we saw using the right tailed test the probability to be 0.144. So the probability of x bar falling below 0.185 ppm is also 0.144. 
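The two tail probabilities quoted here can be checked numerically. The sketch below is only an illustration: this part of the lecture does not restate sigma or n, so the values sigma = 0.1 ppm and n = 50 are assumptions, chosen because they give a standard error that reproduces the quoted figure of about 0.144. The names MU0, STD_ERROR and tail_probabilities are likewise made up for this example.

```python
from statistics import NormalDist

MU0 = 0.2                      # hypothesized population mean impurity in ppm (from the lecture)
STD_ERROR = 0.1 / 50 ** 0.5    # ASSUMPTION: sigma = 0.1 ppm and n = 50, not stated in this passage

# Sampling distribution of the mean, centered on the hypothesized population mean
sampling_dist = NormalDist(mu=MU0, sigma=STD_ERROR)


def tail_probabilities(deviation):
    """Return (lower tail, upper tail, sum) for a deviation of +/- `deviation` ppm from MU0."""
    upper = 1.0 - sampling_dist.cdf(MU0 + deviation)   # P(x_bar >= MU0 + deviation)
    lower = sampling_dist.cdf(MU0 - deviation)         # P(x_bar <= MU0 - deviation)
    return lower, upper, lower + upper


lower, upper, two_tailed = tail_probabilities(0.015)
print(f"P(x_bar <= 0.185) = {lower:.4f}")
print(f"P(x_bar >= 0.215) = {upper:.4f}")
print(f"two-tailed probability = {two_tailed:.4f}")
```

Because the normal curve is symmetric, the two tails come out equal, so computing one tail and doubling it gives the same two-tailed value.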
Since we are now accounting for negative deviations and also positive deviations, we have to add these two probabilities and that would come to about 0.29. When we make a two tailed test, please remember that the alternate hypothesis is stated as h1 mu is not equal to mu0. Mu0 is 0.2 ppm. So we are accounting for either mu greater than mu0 or mu less than mu0. H0 was mu is equal to mu0. H1 is mu is not equal to mu0. In which case we are accounting for the mean being actually greater than the proposed mean of 0.2 ppm or being lower than the proposed mean of 0.2 ppm. Mu greater than mu0 or mu less than mu0. So we generalize by saying mu is not equal to mu0. So even after defining the critical and acceptance regions, we may go wrong by incorrectly rejecting H0. What is the error associated with this eventuality? Incorrectly rejecting H0 is the type 1 error. The probability levels associated with the shaded region of the sampling distribution beyond the critical values are related to the type 1 error. Simply type 1 error corresponds to the shaded region here. Since this is 0.1444, this is also 0.1444, the alpha value will be about 0.29, 0.2888. So it will be about 0.29. That is rather high value of type 1 error. So we can also interpret this shaded region as alpha by 2. This shaded region may also be interpreted as a probability value of alpha by 2. In the case of the one-tailed test, we have written alpha is equal to 0.05. That is also the type 1 error. The error in wrongly rejecting the null hypothesis. So restating, the probability of making the type 1 error is termed as alpha. Alpha is denoted by various names. It is termed as the significance level or the alpha error or the size of the test. I feel more comfortable with the term significance level. What is the relation between confidence interval and hypothesis testing? The confidence interval we may recall is the identification of the bounds for the population parameter mu in this case. 
The hypothesis testing speculates on a certain value of the population mean. Both the confidence interval and the hypothesis testing approaches base their results on the random sample taken. Only one random sample is taken, and usually the sample mean and the sample variance are calculated. Let us now continue with the discussion on the relationship between the confidence interval and hypothesis testing approaches. Only one random sample is taken and we have one value of the sample mean and of the sample variance or standard deviation. We are using the same sample to construct the confidence interval and also to carry out the hypothesis test. Hence, the conclusion given by the confidence interval should match the conclusion made from the hypothesis test. I will show the calculations in the following slides, but before we get to that, it is important to see qualitatively what is meant by a confidence bound and what decision can be taken based on it. So just going to the board, I have sketched the different samples and their bounds. The null hypothesis is H0: mu is equal to mu0, which is 0.2 ppm. That is the postulated, speculated or hypothesized population parameter value. The alternate hypothesis is H1: mu greater than mu0. We got a sample mean of 0.215 ppm, and whenever we get a sample mean which is higher than the hypothesized population mean, we construct a lower bound on mu. Note that the lower bound is on mu and not on x bar. How do you find the lower bound on mu? It is x bar minus z alpha sigma by root n. Alpha is the chosen level of significance. Sigma is the standard deviation, which is assumed to be known. n is the sample size. x bar is the sample mean we have obtained. In this particular case, it is 0.215 ppm. So when we fix the lower bound on mu, we write it as mu greater than or equal to 0.1917 ppm. This value came from this calculation. x bar was 0.215 and z alpha is 1.645. 
Sigma was given, root n is also known. So we get mu greater than or equal to 0.1917 ppm. So it can be seen that 0.2 ppm is falling within the bound. In other words, since mu is greater than or equal to 0.1917 ppm, the bound includes the hypothesized population mean of 0.2 ppm. So we can indeed say that this random sample could have been taken from a population with a mean impurity of 0.2 ppm. On the other hand, and note that we cannot choose a random sample but have to take one, if we had taken a random sample with a mean of 0.3 ppm and constructed the lower bound on mu, then the value would have been here. And you can see that this interval does not include the hypothesized population mean of 0.2 ppm. Then we can say that this random sample could not have come from a population with mean impurity of 0.2 ppm. Let us see what would have happened if we had a sample mean which was lower than 0.2 ppm. Then we could have stated the alternate hypothesis as mu less than mu0. Then what we have to do is the reverse of what we did earlier. We now have to find the upper bound on mu. So mu should be less than or equal to the upper bound. We are only constructing one-sided confidence bounds on mu. So x bar is falling here, and we construct the upper bound on mu using the relation x bar plus z alpha sigma by root n, and then we see whether this upper bound on mu is such that it includes the hypothesized value of 0.2 ppm. For this case, the confidence bound includes it. So we can say that this sample mean could indeed have come from a population of mean 0.2 ppm. On the other hand, and I again want to repeat that we cannot choose a random sample but have to take one, suppose we had taken a random sample whose mean value was quite low, and the upper bound we construct on mu does not include the hypothesized value of 0.2 ppm. 
Then we can say that this random sample could not have come from a population with a mean impurity level of 0.2 ppm. So what we have seen now is how to relate the hypothesis testing and the confidence interval procedures. As I discussed on the board, we can put the lower bound on mu as x bar minus z alpha sigma by root n, so mu will be greater than or equal to 0.215 minus 1.6449 into sigma by root n, and we get a lower bound of 0.1917 ppm. That is what I showed on the board a few minutes ago. Since mu is greater than or equal to 0.1917 ppm, the speculated population mean value of 0.2 ppm is included in this interval. So we can accept the null hypothesis, or we can also say that the sample could indeed have come from a population of mean impurity 0.2 ppm. So when we are looking at the 95% confidence bound, the correct interpretation is that the one-sided interval built from the sample mean must contain the hypothesized population mean. That is, the lower end of the interval must be below the hypothesized value for mu: x bar minus z alpha sigma by root n should be less than or equal to the speculated mean value mu naught. We got the answer as 0.1917, which falls below the hypothesized population mean value of 0.2. Hence the population mean of 0.2 ppm is included in the confidence bound. There are different types of hypothesis tests. We can state the alternate hypothesis as mu greater than mu naught, just as we did in the previous illustration. If we want to play it safe and also be fair, we can always question the assumption that the random samples picked from a lot will always exceed the required mean impurity. There may be some random samples which fall below the mean impurity, which is good. So we have to account for the case where the mean impurity may be greater than mu naught or less than mu naught. 
So we then go for the two sided test where the h1 is given as mu not equal to mu naught. It also depends upon the problem. We are really not interested in looking at samples which are having mean impurities lower than the stipulated mean value of 0.2 ppm. So then it is well and good. We are more worried about cases where the mean impurity exceeds 0.2 ppm. Then we have to check. So it is better in such critical cases to go in for the alternate hypothesis mu greater than mu naught. If you are going through the confidence interval approach then set the lower bound, one sided lower bound on mu. Suppose we are not happy with the manufacturer who is consistently sending mean impurity levels of greater than the stipulated mean and so either the same manufacturer or another manufacturer claims because of process improvements he can send specimens which will come from a population of lower mean impurity. So we take the new set of shipments and take a random sample and find the average impurity. If the average impurity is only slightly lower than 0.2 ppm we are rather skeptical. If the mean impurity is coming to be only slightly lower than the stipulated setting of 0.2 ppm we are a bit skeptical and then we have to make the hypothesis statements as mu is equal to mu naught for H naught and for H1 mu is less than mu naught. The new shipment is coming from a population of mean impurity levels lower than 0.2 ppm. So this is how we state the hypothesis problem and draw meaningful conclusions. As an exercise the engineer in a particular company is testing specimens of metals for their mean tensile strength okay. Will he be interested on the lower or upper bound of the tensile strength to avoid failure. Another example is the environmental protection agency is concerned whether a plant is dumping harmful effluents into the river. 
Will it be interested in postulating as the alternate hypothesis H1 mu less than mu naught or mu greater than mu naught, where mu naught is the acceptable norm for mean effluent levels? In this case mu not equal to mu naught does not make any sense. You have to then see which of the two, H1 mu less than mu naught or mu greater than mu naught, is the correct alternate hypothesis to be considered by the environmental protection agency. Of course it will be taking samples and then analyzing them to find the concentration of the harmful effluents. If it finds the effluents to be above the stipulated protection limit, mu greater than mu naught would be the correct alternate hypothesis. The company will say that this is only a random fluctuation: our company is following all the measures required to minimize the pollutant concentrations in the effluents, so it is only a minor aberration. But the environmental protection agency is saying no, we are not convinced; the mean impurity levels in your effluents are above the stipulated norm. So you have to decide whether you will go for mu greater than mu naught or mu less than mu naught for your alternate hypothesis. To summarize, if the sample mean is greater than the postulated population mean mu naught, find the lower bound on the population mean based on the sample data and see if it includes mu naught at the specified confidence level. In such cases, if you are going in for one tailed hypothesis testing, choose H1 mu greater than mu naught. The critical region will be on the right hand side tail of the probability distribution. If the sample mean is lower than the postulated population mean mu naught, find the upper bound on the population mean based on the sample data and see if it includes mu naught at the specified confidence level. If going in for one tailed hypothesis testing, choose H1 mu less than mu naught. The critical region will then be on the left hand side tail of the probability distribution. 
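The summary above can be written as a small sketch. It is only an illustration of the rule as stated, not a general-purpose routine: the function name one_sided_check and its return format are made up for this example, and the standard error again uses the assumed values sigma = 0.1 ppm and n = 50, which are not restated in this part of the lecture but reproduce the lower bound of about 0.1917 ppm.

```python
from statistics import NormalDist


def one_sided_check(x_bar, mu0, std_error, alpha=0.05):
    """Build the one-sided confidence bound on mu and report whether it excludes mu0.

    std_error is sigma / sqrt(n), with sigma assumed known (z-based bound).
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)   # about 1.645 for alpha = 0.05
    margin = z_alpha * std_error

    if x_bar >= mu0:
        # Sample mean above mu0: H1 is mu > mu0, critical region in the right tail,
        # so construct the lower bound on mu.
        bound = x_bar - margin
        return {"alternative": "mu > mu0", "bound_type": "lower",
                "bound": bound, "reject_h0": bound > mu0}

    # Sample mean below mu0: H1 is mu < mu0, critical region in the left tail,
    # so construct the upper bound on mu.
    bound = x_bar + margin
    return {"alternative": "mu < mu0", "bound_type": "upper",
            "bound": bound, "reject_h0": bound < mu0}


STD_ERROR = 0.1 / 50 ** 0.5   # ASSUMPTION: sigma = 0.1 ppm, n = 50

observed = one_sided_check(0.215, 0.2, STD_ERROR)   # the sample mean from the lecture
high = one_sided_check(0.3, 0.2, STD_ERROR)         # the 0.3 ppm case sketched on the board
low = one_sided_check(0.19, 0.2, STD_ERROR)         # hypothetical sample mean just below 0.2 ppm

for label, result in (("0.215", observed), ("0.3", high), ("0.19", low)):
    print(label, result["alternative"], result["bound_type"],
          round(result["bound"], 4), "reject H0:", result["reject_h0"])
```

Checking whether the bound excludes mu naught is the same comparison as checking whether x bar lies beyond the critical value mu naught plus or minus z alpha sigma by root n, which is why the confidence bound and the one tailed test give the same conclusion.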
So we are coming to the end of the hypothesis testing and also to the end of part one of our course on statistics for experimentalists. Now the ground work has been done to implement these tools in the design of experiments and analysis of experimental data. I request you to go through these lectures once again and refresh the basic concepts. Please remember that all these statistical tools are mainly meant to help us to analyze the experimental data in a proper manner. If the tools are very complex and difficult to implement then they would lose their practical value. These tools are very simple and easy to use. Our job is made easier now in the present times because of the availability of spreadsheets having statistical data and also numerous statistical software such as Minitab. The important thing is to understand what we are doing and how to interpret the numbers which are made available to us through various means. And once we have looked at the numbers we should be in a proper position to communicate these in an unambiguous manner. So in the next lecture we will be starting with the analysis of experiments involving only one variable. We will see you then. Thank you.