We are going to do a few interesting problems in this lecture, looking at examples involving the t distribution, the chi-square distribution and the Fisher f distribution. We saw that the t distribution and the standard normal distribution have some similarities but are also quite different, and the first example demonstrates these differences. The t distribution and the standard normal distribution are both centered at the origin; they are both symmetric and unimodal, with a single maximum. Now, the question is quite simple. Compare the two distributions, namely the t distribution and the standard normal distribution, when the degrees of freedom for the t distribution is 4. Then, for different degrees of freedom, compare the probability of –2 < z < 2 with the probability of –2 < t < 2. The first question is really asking you about the shape of the two distributions. The standard normal distribution is shown by the green curve, and it has a mean of 0 and a standard deviation of 1. When you look at the t distribution with 4 degrees of freedom, you can see that it is broader: there is more probability packed into the tail regions, and the peak is shorter when compared to the standard normal distribution. So, for different degrees of freedom, compare the probability of –2 < z < 2 with the probability of –2 < t < 2. In this table, the first column runs from 4 to 972 degrees of freedom through 12, 36, 108 and 324. It can be seen that the degrees of freedom are tripled at each step: 4 × 3 = 12, 12 × 3 = 36, 36 × 3 = 108, 108 × 3 = 324, 324 × 3 = 972. The standard normal distribution is independent of the degrees of freedom; degrees of freedom is not a parameter here, and hence the probability of –2 < z < 2 is 0.954 throughout.
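The table comparison above can be reproduced approximately with a short simulation. This is only a sketch using the Python standard library (no statistics package): t variates are built from their definition as a standard normal divided by the square root of a scaled chi-square, and the probabilities are Monte Carlo estimates, so they match the table values only to a couple of decimal places.

```python
import math
import random

random.seed(0)
N = 50_000  # Monte Carlo draws per estimate

def t_variate(dof):
    """One t-distributed draw: Z / sqrt(chi-square(dof) / dof)."""
    z = random.gauss(0.0, 1.0)
    chi2 = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(dof))
    return z / math.sqrt(chi2 / dof)

# Standard normal: P(-2 < Z < 2), independent of any degrees of freedom
p_z = sum(-2.0 < random.gauss(0.0, 1.0) < 2.0 for _ in range(N)) / N
print(f"P(-2 < Z < 2)          ~ {p_z:.3f}")   # close to 0.954

# t distribution: the probability rises toward 0.954 as dof grows
results = {}
for dof in (4, 12, 36):
    results[dof] = sum(-2.0 < t_variate(dof) < 2.0 for _ in range(N)) / N
    print(f"P(-2 < T < 2), dof={dof:>2} ~ {results[dof]:.3f}")
```

For 4 degrees of freedom the estimate comes out near the table value of 0.884, and the estimates climb toward the normal value as the degrees of freedom triple.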
If you look at the t distribution, the probability value for 4 degrees of freedom is less than for the standard normal distribution. When the degrees of freedom increase from 4 to 12, the probability also increases, from 0.884 to 0.931, and at very large degrees of freedom the standard normal probability and the t distribution probability are pretty much the same. This drives home the point that the t distribution approaches the standard normal distribution as the degrees of freedom tend towards infinity. Let us look at the second example. We have seen this example before, but now let us look at another version of it. It is known from historical data that the yields of power from a nuclear reactor supplied by the XYZ company are normally distributed. So the power is the random variable, and it is normally distributed. The reactor supplied by this company is operated in several power plants around the world. Of course, this example is strictly fictitious. The population standard deviation based on the process design specification is subject to dispute and is not to be used. We saw why in the previous example set: if the standard deviation given by the company is presumed to be quite high, the industries may feel that the supplying company is hiding behind this large population standard deviation and so is getting away with supplying less power. So it is decided not to use that particular value of sigma, and it boils down to a situation with unknown standard deviation. The average output of power from 6 random measurements taken at a plant using this reactor is 2 gigawatts; the sample mean is hence 2 gigawatts, and based on the 6 measurements the sample standard deviation is 0.63 gigawatts. 0.63 gigawatts is quite a large fraction of the mean value, about 31.5%, which is pretty high.
The XYZ company which supplies these nuclear reactors guarantees an average power output of 2.3 gigawatts from its reactors for a given set of operating conditions. Now the question is: can the client accept the company's claim that this lower yield of 2 gigawatts is likely due to random fluctuations? That is, is the sample indeed taken from a probability distribution centered around 2.3 gigawatts, such that there is a high probability of picking, from this sampling distribution of the means, a sample showing only 2 gigawatts? Since the sample size is small and the population variance is unavailable, we do a t test. Of course, the parent population is normally distributed; that information is given to us. So the conditions for doing the t test are satisfied, and the degrees of freedom is 6 − 1, which is 5. We have to find the probability that the average power can be less than or equal to 2 gigawatts even though the population mean is 2.3 gigawatts. The population mean is 2.3 gigawatts, the sample mean is 2 gigawatts, the sample standard deviation is 0.63 gigawatts and the sample size is only 6. So we do a t test here. We define the t variable as x bar − mu divided by s by root n. There is a typo here on the slide: the standard normal form does not apply, because we are not talking about the z random variable but the t random variable; that takes care of the typo. So t is x bar − mu divided by s by root n, which is (2 − 2.3) divided by (0.63 divided by root 6); do not put 5 here, 5 is the degrees of freedom, but the sample size is n, which is 6, and that number comes to −1.166. So the probability of x bar less than or equal to 2 is equivalent to the probability of t less than or equal to −1.166, and that probability is 0.148. It is not easy to find this probability from the chart, because 0.05, 0.1, 0.25 and so on are the standard alpha values given in the t distribution tables. So how did I find this 0.148? I used a spreadsheet to do it.
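The t statistic and the tail probability from this example can be checked with a few lines of code. This is a sketch using only the standard library: the statistic is direct arithmetic, and the tail probability is a Monte Carlo estimate built from the definition of the t variate, so it agrees with the spreadsheet value only approximately.

```python
import math
import random

# Data from the example: sample mean 2 GW, claimed mean 2.3 GW, s = 0.63 GW, n = 6
x_bar, mu, s, n = 2.0, 2.3, 0.63, 6
t_stat = (x_bar - mu) / (s / math.sqrt(n))
print(f"t = {t_stat:.3f}")                    # about -1.166

# Monte Carlo estimate of P(T <= t_stat) with n - 1 = 5 degrees of freedom
random.seed(1)
N = 200_000
dof = n - 1

def t_variate():
    z = random.gauss(0.0, 1.0)
    chi2 = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(dof))
    return z / math.sqrt(chi2 / dof)

p_low = sum(t_variate() <= t_stat for _ in range(N)) / N
print(f"P(T <= {t_stat:.3f}) ~ {p_low:.3f}")  # about 0.148
```

Note that the sample size n goes inside the square root, while n − 1 sets the degrees of freedom of the reference distribution, exactly as emphasized in the lecture.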
It will be a good idea for you to become familiar with spreadsheets for calculating these statistical probabilities. There are also online calculators, though I really have not checked into those. If you have statistical software with you, that is very good; you can use Minitab, for example, to find the probability values. Otherwise, online probability calculators may give you the t probabilities, the chi-square probabilities and so on. It is always good to have an independent check for your calculations, so that you can double-check that your reported probability values are correct. So the implication is that the probability of the sample mean being lower than or equal to 2 gigawatts is rather high, at 0.148. The supplier can say: the probability of the reactor supplied by me providing a mean output of less than or equal to 2 gigawatts is rather high, at about 0.15; this is a high probability, so you have to go with my reactor, and you cannot really contradict my statement that the average power output is 2.3 gigawatts. Looking at the Minitab plot, we have the t distribution drawn for 5 degrees of freedom, and when I locate the t value of −1.166, the area to its left, in the left tail region, is 0.148. Now, the second part of the question is to construct a 95% confidence upper bound for the average power generated, using the sample data. What we are really trying to see here is this: I have a pretty low sample average power output of 2 gigawatts, and we have to construct a 95% confidence upper bound. Usually we were looking at the 95% confidence interval involving both a lower limit and an upper limit; now we are talking about a 95% upper bound only. So we have to report the 95% confidence bound as mu less than or equal to a certain upper limit value. Let us see how to construct this.
The definition for the upper bound, mu less than or equal to x bar plus t alpha, n − 1 into s by root n, is first given. Then we take the probability of mu less than or equal to x bar plus t alpha, n − 1 into s by root n, and that is equal to 1 − alpha. Here we do not put alpha by 2, because we are talking about a one-sided bound; and since we are taking 1 − alpha to be 0.95, the alpha value will of course be 0.05. With this definition we can express the upper bound on mu, the population mean power output, as mu less than or equal to x bar plus t alpha, n − 1 into s by root n. Now a sample has been taken: the sample standard deviation is known to us, x bar is known to us, and the sample size is known to us, so this limit can easily be found. For alpha equal to 0.05 we get t 0.05, 5 as 2.015. So mu is less than or equal to 2 plus 2.015 into 0.63 by root 6, and we get mu less than or equal to 2.52 gigawatts. So, based on the x bar value of 2 gigawatts, we put an upper bound on mu, and that comes to 2.52 gigawatts. What do you interpret from this result? If our acceptance or tolerance or penalizing criterion is based on 0.05, that means that if we can say the probability of the occurrence of the low power is below 0.05, then we can reject the company's claim. We are slowly moving into hypothesis testing, and things will become clearer when we do that. If the population mean is 2.52 gigawatts, the probability of the random sample mean taking on values less than or equal to 2 gigawatts will be 0.05. So only if the power guaranteed by the company were 2.52 gigawatts could the 2 gigawatts be considered unacceptable with our probability limit of 0.05. Only when mu is 2.52 gigawatts will the probability of the sample average power falling at or below 2 gigawatts be 0.05. If the guaranteed power output is lower than 2.52 gigawatts, then our observed power output of 2 gigawatts or lower will obviously have a probability higher than 0.05.
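The upper bound is simple arithmetic once the critical value is in hand. In this sketch the value t 0.05, 5 = 2.015 is taken from the t table as stated in the lecture; the same formula, rearranged around the claimed 2.3 gigawatts, also gives the threshold sample mean discussed a little further on.

```python
import math

x_bar, s, n = 2.0, 0.63, 6
t_crit = 2.015  # t_{0.05, 5}, one-sided critical value from the t table

# 95% upper confidence bound: mu <= x_bar + t_crit * s / sqrt(n)
upper = x_bar + t_crit * s / math.sqrt(n)
print(f"95% upper confidence bound on mu: {upper:.2f} GW")   # about 2.52

# Sample mean whose upper bound would just exclude the claimed 2.3 GW
x_bar_limit = 2.3 - t_crit * s / math.sqrt(n)
print(f"sample mean that just excludes 2.3 GW: {x_bar_limit:.2f} GW")  # about 1.78
```

So with x bar = 2 the bound of 2.52 gigawatts comfortably contains the claim of 2.3 gigawatts, and only a sample mean at or below about 1.78 gigawatts would have excluded it.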
So if our tolerance is a probability of 0.05, then up to a guarantee of 2.52 gigawatts we have to accept power yields of 2 gigawatts or lower. This might be difficult for some of you to understand at this point, but if you think about it you will appreciate what I just said. The same arguments will be presented in hypothesis testing, and we can look at it then; both the confidence interval approach and hypothesis tests give you the same final conclusion. So it is also good to think about the confidence interval approach to decision making. We have to see whether the guaranteed power output of the supplier falls within this upper bound of 2.52 gigawatts. Obviously, the supplier is making a claim of 2.3 gigawatts, which is lower than 2.52 gigawatts. If the supplier had guaranteed a power of 2.6 or 2.7 gigawatts, then the probability of the observed random sample mean of 2 gigawatts or lower being taken from a population with mean 2.6 or 2.7 gigawatts would be lower than 0.05. Repeating the statement: suppose the supplier makes a claim or guarantee of 2.75 gigawatts, and your random sample has taken an average value of only 2 gigawatts. The probability of the sample mean taking on values of 2 gigawatts or lower from a population with mean 2.75 gigawatts will definitely be lower than 0.05, so that sample mean could not be considered to be coming from a population with a mean of 2.75 gigawatts. If the population mean were 2.52 gigawatts, that is, if that were the guarantee given by the supplier, then the probability of the sample mean falling at or below 2 gigawatts would be 0.05. But the supplier is making a guarantee of only 2.3 gigawatts, so the probability of a random sample taking values less than or equal to 2 gigawatts is higher than 0.05. On this basis we have to accept that the deviation from the reported or guaranteed mean value is only because of random fluctuations.
So, if we put the 2.3 gigawatts as the upper bound, what should the sample mean have been to lead to a probability lower than 0.05? That is an interesting point. We are now shifting the upper bound from 2.52 gigawatts to 2.3 gigawatts; then what should the sample mean x bar have been? In other words, what is the largest sample mean whose upper bound will just not include the supplier's claim of 2.3 gigawatts? This should be lower than 2.3 gigawatts. We know the value of t alpha, n − 1, that is, t 0.05, 5, which is 2.015, and we know the standard deviation s and the sample size; it works out that x bar should be less than 1.78 gigawatts. Only if we had observed the average power output from the samples to be 1.78 gigawatts or lower could we question the claim of 2.3 gigawatts, saying: my sample mean is as low as 1.78 gigawatts, and the probability of the sample mean falling at or below 1.78 gigawatts, if the population mean is 2.3 gigawatts, is less than 0.05; so what you are saying is not correct, this cannot be because of random variations, something is faulty with your reactor. But we were actually getting 2 gigawatts, and the probability then was 0.148, or about 0.15, which is a very high probability. Only if the samples had shown a mean of less than 1.78 gigawatts could we really tell the reactor supplier: look, your supplied reactor is not performing up to its stated performance. So, concluding: the 95% confidence upper bound will just exclude the supplier's claimed mean of 2.3 gigawatts only if the sample mean had been as low as 1.78 gigawatts. Let us now go to the third example. Normally, when any professional goes out for site visits, field tests, conferences or even vacations, they usually take a laptop which has most features like spreadsheets, PowerPoint and so on.
But let us imagine a situation, say about 20 years back: an environmental engineer carries out some field measurements, performing a t test on 9 samples from a polluted lake, and he obtains a modulus t value of 2.306. He has forgotten to take the t tables and has only the f tables with him. So how he can find the probabilities using the f tables is an interesting problem; essentially we have to find the relationship between the t distribution and the f distribution, and it is rather elegant. We know that the t random variable is given by x bar − mu divided by s by root n. This may be rewritten as t equal to x bar − mu, the whole divided by sigma by root n, multiplied by sigma by s. The sigma in the denominator and the sigma in the multiplier cancel out, and essentially you are left with s by root n in the denominator, which is our original definition of the t random variable. Now we can square both sides of the above equation, and we get t squared equal to x bar − mu divided by sigma by root n, the whole squared, into sigma by s, the whole squared. The first factor is the standard normal variable defined for the sample mean: the sample mean has mean mu and standard deviation sigma by root n, so when you write x bar − mu divided by sigma by root n you get z. We also assume that the parent population from which the samples have been taken is normal; that is the implicit assumption made when we use the t test. So the distribution of the sample means is also a normal distribution, and we are able to standardize it in this fashion. Thus we get t squared equal to z squared into sigma by s, the whole squared. Now z squared is a single standard normal variable which has been squared, and that is a chi-square random variable with one degree of freedom. So we have a chi-square random variable with one degree of freedom; and if you look at s by sigma, the whole squared, I can write it as n − 1 into s squared by sigma squared, divided by n − 1; of course the n − 1 cancels between numerator and denominator.
We write it in this form so that n − 1 into s squared by sigma squared may be related to a chi-square distribution with n − 1 degrees of freedom; that is the definition of the chi-square random variable. So we have s by sigma, the whole squared, as chi squared n − 1 divided by n − 1. We can therefore write t squared in terms of two chi-square random variables: t squared is chi squared 1, corresponding to z squared, divided by chi squared n − 1 divided by n − 1, which corresponds to s squared by sigma squared. So we showed that t squared may be written as z squared divided by s squared by sigma squared, where z squared is expressed through a chi-square distribution with 1 degree of freedom and s squared by sigma squared through a chi-square distribution with n − 1 degrees of freedom. The ratio of two chi-square random variables, each divided by its degrees of freedom, with 1 degree of freedom in the numerator and n − 1 degrees of freedom in the denominator, may be expressed as an f random variable with 1 and n − 1 as the parameters. So we can show that t squared equals f 1, n − 1. Here t is 2.306, so t squared is 2.306 squared, which equals f 1, 9 − 1. How did we get this 9? The sample size is 9, so we have 9 − 1 = 8 as the denominator degrees of freedom. This leads to the probability of f greater than 5.317, the upper tail probability, being 0.05. If you want to cross-check later when the t tables are available, you can find the probability of t greater than 2.306 plus the probability of t less than −2.306 at 8 degrees of freedom, and if you add these two you will get 0.05 again. When a test involves both sides of the t distribution, it is termed a two-tailed test. Since the modulus of t is 2.306, t may take a value of −2.306 or +2.306, and the probability calculation should involve the probability of t greater than 2.306 plus the probability of t less than −2.306.
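The identity t squared = f 1, n − 1 can be checked numerically. In this sketch, standard library only, an F(1, 8) variate is built directly from its definition as a ratio of scaled chi-squares, and its Monte Carlo upper-tail probability beyond t squared is compared with the two-tailed t probability of about 0.05 quoted in the example.

```python
import random

t_val = 2.306
f_val = t_val ** 2
print(f"t^2 = {f_val:.3f}")        # about 5.318, the F(1, 8) value

random.seed(2)
N = 200_000

def chi2_draw(dof):
    """One chi-square draw: sum of dof squared standard normals."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(dof))

# F(1, 8) variate = (chi2_1 / 1) / (chi2_8 / 8); count the upper tail beyond t^2
p_f = sum(chi2_draw(1) / (chi2_draw(8) / 8) > f_val for _ in range(N)) / N
print(f"P(F(1,8) > t^2) ~ {p_f:.3f}")   # about 0.05
```

The upper-tail F probability matches the sum of the two t tails, which is the whole trick: squaring folds both tails of the t distribution into the single upper tail of the F distribution.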
So if you compute this, you will get 0.05, which is the same probability value obtained from the f distribution. Let us look at the next example. I do not know how many of you are cricket fans or how many of you even know cricket; anyway, that is a separate story altogether. Let us now look at the actual example; even those of you who do not follow cricket can just try to identify the main parameters and then work out the problem. People who find certain examples outside their field or knowledge domain should not feel intimidated. Coming to example 4, the title of the example is Eagle Eye, and it talks about cricket. There may be example problems which are not in your field of research or specialization, but in such cases I request the students and viewers of this course not to feel intimidated. The more important thing is to correctly identify the parameters: make sure that you have written down the degrees of freedom correctly, that is number one, and know which is mu and which is x bar, which is sigma squared and which is s squared, which is a sample statistic and which is a population parameter. That is very important. Once you have written these down correctly, you can use the formulae to get the final t value or chi-square value or f value, and then you have to make sure that you estimate the probability values correctly. The problem statement goes like this: Eagle Eye is used in cricket to track the trajectory of the ball. The equipment has been tested rigorously on many overseas cricket pitches over 5 years. After a large number of tests, it uses the standard deviation sigma in the bounce of a ball pitched at a good length as 50 centimeters in its tracking calculations. Please note that I am not talking about the average bounce of the cricket ball on overseas pitches.
I am only talking about the variability in the bounce of the cricket ball, in terms of its standard deviation, and even that standard deviation is quite high: for a ball pitched at a good length, the variability in the bounce, expressed as a standard deviation, is 50 centimeters. We really do not know the average height of the bounce of a ball pitched at a good length; obviously it must be greater than 50 centimeters. Now this Eagle Eye tracker is brought to India and tested in 5 independent trials, maybe at the 5 major cricketing centers in the country, and based on the 5 trials the measured standard deviation in the bounce for a ball pitched at a good length is only 25.74 centimeters. Probably the Indian pitches have a more consistent bounce, and hence the standard deviation in the bounce of a cricket ball pitched at a good length is smaller, at 25.74 centimeters. As far as this problem statement is concerned, we have a sigma value of 50 centimeters, and the measured standard deviation, which obviously is s from the random sample, is only 25.74 centimeters. So can the Eagle Eye be used reliably to track the ball to give LBW decisions on Indian pitches? The LBW, or leg before wicket, decision in cricket is usually made on the trajectory of the ball after pitching. Anyway, let us not go too much further into the details. For the solution, we have to assume that the observed standard deviation in the bounce of the cricket ball came from a population with standard deviation 50 centimeters, and we have to find the probability of observing a standard deviation of 25.74 centimeters or lower when the population standard deviation is 50 centimeters. In other words, if I take a random sample from a population with a standard deviation of 50 centimeters, what is the probability of the sample statistic taking a value of 25.74 centimeters or lower? So sigma is 50 centimeters, s is 25.74 centimeters and n is 5.
So the chi-square distribution with 4 degrees of freedom is used, and the chi-square statistic is n − 1 into s squared divided by sigma squared: n − 1 is 5 − 1, s squared is 25.74 squared and sigma squared is 50 squared. Here sigma is known. This chi-square value comes to 1.06, so we have to find the probability of the chi-square random variable taking on values of 1.06 or lower. The probability of chi squared less than 1.06 comes to 1 − 0.9, which is 0.1. So there is a 10% chance that the sample with a standard deviation of 25.74 centimeters in the bounce could indeed have come from a population with a standard deviation of 50 centimeters in the bounce. Whether this is a low probability or a high probability is up to the decision makers. Normally we specify the alpha value to be 0.05; here we have got 0.10. So whether this probability value of 0.1 is low or high is left to the administrators of the sport; we will just report the value and then move on. Plotting this using Minitab, we can see that the value taken by the chi-square random variable was 1.06, and the probability below it is 0.099, or pretty much 0.1. Please note that the degrees of freedom is 4, and this is a chi-square plot; you can see that it is skewed, with a long right tail. Going to the next example: well, I do not know how many of you live in cities which experience a lot of power cuts during summer. According to this problem statement, in some suburbs power cuts during the summer months are quite common. In one such suburb there was a complete blackout, and complaints about the duration of the power cut were quite variable. When there are a lot of power outages or shutdowns, different localities will experience different lengths of power failure, and so the complaints about the duration of the power cut were quite variable.
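The Eagle Eye calculation can be sketched in a few lines, again with the standard library only: the statistic is direct arithmetic, and the lower-tail probability of a chi-square with 4 degrees of freedom is a Monte Carlo estimate, so it matches the quoted 0.099 only approximately.

```python
import random

# Data from the example: sigma = 50 cm, s = 25.74 cm, n = 5 trials
sigma, s, n = 50.0, 25.74, 5
chi2_stat = (n - 1) * s ** 2 / sigma ** 2
print(f"chi-square statistic = {chi2_stat:.2f}")   # about 1.06

random.seed(3)
N = 200_000

def chi2_draw(dof):
    """One chi-square draw: sum of dof squared standard normals."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(dof))

# Lower-tail probability P(chi2_4 <= 1.06)
p_low = sum(chi2_draw(n - 1) <= chi2_stat for _ in range(N)) / N
print(f"P(chi2_4 <= {chi2_stat:.2f}) ~ {p_low:.3f}")   # about 0.10
```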
The electricity board conducted a survey of 25 randomly chosen families from various locations and found that the mean duration of the power cut was 12 hours and the sample variance was 4 hours squared. So the sample variance is 4 hours squared, the sample size is 25, and the associated degrees of freedom is 25 − 1, which is 24. The sample mean is of course 12 hours, but we will not really use it. If actual data had been given on the durations of the power cuts, we could have used those data, along with the sample mean of 12 hours, to find the sample variance; but in this case the sample variance is directly given to us, so we do not use the sample mean. Moving on, what do we have to do? We have to construct a 98% confidence interval on the variance, assuming that the population of power cut durations in the suburb is normally distributed. Again, this example is completely fictitious. Let us see how to construct the 98% confidence interval. We have 100 into (1 − alpha) equal to 98%, or 1 − alpha equal to 0.98, so alpha is 1 − 0.98, which is 0.02, and alpha by 2 is 0.01. We can use the chi-square confidence interval to find the upper and lower bounds for sigma squared: n − 1 into s squared divided by chi squared alpha by 2, n − 1, less than or equal to sigma squared, less than or equal to n − 1 into s squared divided by chi squared 1 − alpha by 2, n − 1. The sample size was 25, so n − 1 is 24, s squared is 4 hours squared, and that is divided by chi squared 0.01, 24, which we can see from the tables is 42.98. Similarly, on the other side we take 24 into 4 divided by chi squared 1 − alpha by 2, n − 1; here we use 1 − alpha by 2, please note, so we have to find chi squared 1 − 0.01, that is, chi squared 0.99, 24. Reading the numbers from the tables, we get 2.235 less than or equal to sigma squared less than or equal to 8.8398, in hours squared. The standard deviation is obtained by taking the square root, which gives nearly 1.5 to 3.
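The interval arithmetic above can be laid out as a short sketch. The two chi-square critical values are taken from the tables as in the lecture (the value 10.856 for chi squared 0.99, 24 is an assumed table reading, so the upper limit may differ from the lecture's 8.8398 in the last decimal places).

```python
import math

s2, n = 4.0, 25            # sample variance (hours^2), sample size
chi2_upper_tail = 42.98    # chi2_{0.01, 24} from the table
chi2_lower_tail = 10.856   # chi2_{0.99, 24}, assumed table value

# 98% CI on the variance: (n-1)s^2 / chi2_{0.01,24} <= sigma^2 <= (n-1)s^2 / chi2_{0.99,24}
var_lo = (n - 1) * s2 / chi2_upper_tail
var_hi = (n - 1) * s2 / chi2_lower_tail
print(f"{var_lo:.3f} <= sigma^2 <= {var_hi:.3f}  (hours^2)")   # about 2.234 to 8.843

# Standard deviation interval by taking square roots
sd_lo, sd_hi = math.sqrt(var_lo), math.sqrt(var_hi)
print(f"{sd_lo:.2f} <= sigma <= {sd_hi:.2f}  (hours)")         # about 1.49 to 2.97
```

Note that the larger chi-square value produces the lower variance limit and vice versa, which is why the tail labels swap sides in the formula.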
So the standard deviation of the power cut durations falls between about 1.5 hours and 3 hours. Let us go to example number 6: how will you estimate probabilities involving the chi-square distribution using the f distribution tables? Earlier we saw how to relate the t distribution to the f distribution; now we are trying to relate the chi-square distribution to the f distribution. If we take the denominator degrees of freedom in the f distribution to be very high, tending to infinity, then a simplification is possible. When the sample size is that high, we are pretty much sampling the entire population, and the sample variance s squared will tend towards the population variance sigma squared. When you take a larger and larger sample, it is as if you are sampling the entire population, or as if you are finding sigma squared itself directly. So s squared approaches sigma squared, and that helps us simplify a few things. When the denominator degrees of freedom tends to infinity, we have s2 squared tending to sigma 2 squared. What happens then? We know that the f random variable is given by s1 squared by sigma 1 squared, divided by s2 squared by sigma 2 squared. Since s2 squared approaches sigma 2 squared, the ratio s2 squared by sigma 2 squared becomes 1, and we are left only with s1 squared by sigma 1 squared. That we represent as f with the numerator degrees of freedom, dof1, and infinity. There is a typo here which I will correct: dof1 is the numerator degrees of freedom, so the typo is corrected. So f dof1, infinity tends to s1 squared by sigma 1 squared only. We may write this as dof1 into s1 squared, divided by dof1 into sigma 1 squared. We also know, by definition, that chi squared alpha, dof1 is dof1 into s1 squared by sigma 1 squared. So this term becomes chi squared alpha, dof1 divided by dof1, and f alpha, dof1, infinity may be written down as chi squared alpha, dof1 divided by dof1. It is very simple.
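This limiting relation can be checked numerically. In this sketch the F-table reading f 0.95, 5, infinity = 1 divided by f 0.05, infinity, 5, taken as about 0.229, is an assumed table value; multiplying it by the numerator degrees of freedom should recover the chi-square critical value, and a Monte Carlo estimate of the chi-square upper tail confirms it, standard library only.

```python
import random

f_95_5_inf = 0.229         # f_{0.95, 5, inf} = 1 / f_{0.05, inf, 5}, assumed table value
chi2_95_5 = 5 * f_95_5_inf # chi2_{alpha, dof1} = dof1 * f_{alpha, dof1, inf}
print(f"chi2_(0.95, 5) ~ {chi2_95_5:.3f}")   # about 1.145

# Monte Carlo check: the upper tail of chi-square(5) beyond 1.145 should be ~0.95
random.seed(4)
N = 200_000

def chi2_draw(dof):
    """One chi-square draw: sum of dof squared standard normals."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(dof))

p_upper = sum(chi2_draw(5) > chi2_95_5 for _ in range(N)) / N
print(f"P(chi2_5 > {chi2_95_5:.3f}) ~ {p_upper:.3f}")   # about 0.95
```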
You may want to work it out on a piece of paper. Suppose we want to find f 0.95, 5, infinity. We have to find chi squared 0.95, 5 divided by 5; just look at how the numerator degrees of freedom, 5, is used in each place. Now f 0.95, 5, infinity is 1 divided by f 0.05, infinity, 5, which is 0.229, and hence chi squared 0.95, 5 is 0.229 into 5, which comes to 1.145. Independently, you can use the chi-square distribution chart to verify that the value of the chi-square random variable which has an upper tail probability of 0.95 for 5 degrees of freedom is 1.145. So these are the distribution tables; when we look at the chi-square table, we want a probability of 0.95 with 5 degrees of freedom, and you can see that the corresponding value is 1.145, matching the value obtained using the f distribution. This completes our discussion on the t, chi-square and f distributions. We have seen a few illustrative examples, and we also showed that these distributions may be elegantly related to each other. There are several textbooks available on probability and statistics. Essentially, we have only covered the distributions and the random variables so far, but these are most important in the design and analysis of experiments. Without this background it will be impossible for you to really appreciate the various results that are reported in design of experiments. To restate it in a different way: if you have a good appreciation and understanding of the normal distribution, the t distribution, the chi-square distribution and the f distribution, you will have a firm grip on the concepts involved in design of experiments. Now, there is an important bridge linking these distributions with the design of experiments, and that bridge is hypothesis testing, which will form the basis for our next lecture or maybe a couple of lectures.
What I would like to emphasize at this point is: please try to solve as many problems as possible, and it is not enough just to get the answer correct; also try to understand what the answer is telling you, and how you will interpret and apply the result in real-world situations. We will continue with hypothesis testing in the next lecture. Thank you.