Hello everyone, in today's lecture we will be looking at the F distribution. The reference book for this topic is the one written by Montgomery and Runger. Let me give you a brief introduction. So far we have been looking at the sample mean and the distribution of sample means. The variance of the population is also an important parameter, and we often want to compare two variances and make decisions based on them. This is where the F distribution is applied: we compare the ratio of two variances in order to infer whether they are comparable to one another or whether one is much different from the other. The F distribution, developed by Fisher, is widely used for this purpose. What do the assumptions mean? The two populations from which the variances are measured for comparison are both normally distributed. The population means mu1 and mu2 and the standard deviations sigma1 and sigma2 are not known. Let us assume that we have taken two random samples of sizes N1 and N2. The sizes need not be equal; in other words, N1 need not be equal to N2. The random variable F is defined as the ratio of two independent chi square random variables CD1 and CD2, each scaled by its associated degrees of freedom. So the numerator is CD1 divided by M1, where M1 is the degrees of freedom associated with the first chi square random variable; the second chi square random variable CD2 is likewise scaled by its associated degrees of freedom M2. So we define the F random variable as (CD1/M1) divided by (CD2/M2). The random variable F is non-negative and the distribution is skewed to the right, so the probability density function when plotted does not give a symmetric curve; it gives a curve that is skewed to the right. Even though it is quite similar to the chi square distribution in shape, the two parameters M1 and M2 help to tweak the shape of the distribution.
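The definition above can be checked numerically. The lecture itself works from charts, but as a minimal sketch (assuming Python with numpy and scipy are available), we can build F as a ratio of two scaled chi square variables and compare it with the standard F distribution:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
m1, m2 = 6, 12          # degrees of freedom of the two chi square variables
n = 200_000

# Draw two independent chi square samples and form F = (CD1/M1) / (CD2/M2).
chi1 = rng.chisquare(m1, size=n)
chi2 = rng.chisquare(m2, size=n)
f_samples = (chi1 / m1) / (chi2 / m2)

# The empirical mean should be close to the theoretical F mean M2/(M2-2).
print(f_samples.mean())          # close to 12/10 = 1.2
print(stats.f.mean(m1, m2))      # exactly 1.2
```

The simulated ratio and `scipy.stats.f` describe the same right-skewed, non-negative distribution, which is exactly the construction given above.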
We have seen that both in the case of the t distribution and in the case of the chi square distribution, the degrees of freedom k was the parameter. In the F distribution we have two parameters, M1 and M2. If you recollect, the first chi square random variable with its associated degrees of freedom M1 was divided by the second chi square random variable with its associated degrees of freedom M2. So we can say the F distribution has two parameters M1 and M2, where M1 and M2 are the degrees of freedom of the first and second chi square random variables respectively. These two parameters may be changed to change the shape of the distribution. In certain cases you may want to fit a probability distribution to your experimental data to see from which family of populations your data is more likely to come. So when you have an experimental trend you want to fit a distribution to it, and if you have two parameters you have more flexibility, more possibilities of fitting the curve nicely to the experimental data points. So M1 and M2 help to tweak the shape of the distribution. As usual we will show the mathematical form of the probability density function, and that represents the F distribution with M1 degrees of freedom in the numerator and M2 degrees of freedom in the denominator. So we have M1 as the numerator degrees of freedom and M2 as the denominator degrees of freedom. Do not try to take M2 to the top and M1 to the bottom and say M2 is the numerator degrees of freedom; it is not like that. You are focusing on the first chi square random variable, which has M1 degrees of freedom and is present in the numerator, and similarly the second chi square random variable is present in the denominator, and so we talk of M2 degrees of freedom in the denominator.
So this is an impressive or difficult-looking probability density function, depending on how you want to look at it. Fortunately or unfortunately, we will not really be using this expression in our probability calculations; we will rather be using the probability charts for that purpose. However, it is useful to see the shape of the distribution and also its mathematical form. The gamma function appears again; in connection with the chi square distribution I gave a brief introduction to the gamma function. Here you have gamma of (M1 + M2)/2. M1 and M2 must be positive integers; they may be 3 and 2, for example, or 5 and 7, or 8 and 5. So the argument (M1 + M2)/2 may be a non-integer such as 6.5 or 4.5, but gamma function values for such positive non-integer arguments do exist. Obviously you cannot have a degree of freedom of 0; each degree of freedom should be at least 1. Another important thing to note is the independent variable x: it is present twice, once in the numerator and once in the denominator, and it can take only positive values. The Fisher distribution describes ratios of two variances, and the variances themselves are positive quantities, so x takes only positive values, ranging from 0 to infinity. The mean of the F distribution is given by M2/(M2 - 2), and the variance is given by 2 M2^2 (M1 + M2 - 2) / [M1 (M2 - 2)^2 (M2 - 4)]. To ensure that these parameters do not blow up, we have to make sure that the degrees of freedom M2 is greater than 2 as far as the mean is concerned, and greater than 4 for the variance.
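The density, mean, and variance formulas above can be written out and checked against scipy. This is a sketch for illustration (the function name `f_pdf` and the sample point x = 1.3 are my choices, not from the lecture):

```python
from math import gamma
from scipy import stats

m1, m2 = 5, 7   # numerator and denominator degrees of freedom

def f_pdf(x, m1, m2):
    """F density written out from the lecture's formula; note the gamma
    function is evaluated at (M1+M2)/2, which may be a half-integer."""
    c = gamma((m1 + m2) / 2) / (gamma(m1 / 2) * gamma(m2 / 2))
    c *= (m1 / m2) ** (m1 / 2)
    return c * x ** (m1 / 2 - 1) * (1 + m1 * x / m2) ** (-(m1 + m2) / 2)

x = 1.3
print(f_pdf(x, m1, m2))          # matches scipy.stats.f.pdf(x, m1, m2)

# Mean M2/(M2-2) needs M2 > 2; the variance formula needs M2 > 4.
mean = m2 / (m2 - 2)
var = 2 * m2**2 * (m1 + m2 - 2) / (m1 * (m2 - 2) ** 2 * (m2 - 4))
print(mean, var)                 # 1.4 and about 2.613 for (5, 7)
```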
Now we are going to look at the percentage point of the F distribution. We have already seen the percentage point for the standard normal variable, the t random variable, and the chi square random variable; on similar lines we define the percentage point for the F random variable. The percentage point of the F distribution is denoted as f of alpha, M1, M2, with numerator degrees of freedom M1 and denominator degrees of freedom M2. Alpha is the level of significance; we were using it to define the confidence interval, and alpha typically takes values of 0.01, 0.025, 0.05 and so on. The sequence is: alpha must come first, then the numerator degrees of freedom, and then the denominator degrees of freedom. The percentage point is defined such that the probability of the F random variable being greater than f of alpha, M1, M2 is equal to alpha; equivalently, with f of alpha, M1, M2 as the lower limit and infinity as the upper limit, the integral of f(x) dx over that interval equals alpha. So we define a number f of alpha, M1, M2 such that when it is used as the lower limit and the probability density function we saw a couple of slides back is integrated from there to infinity, we get alpha. It is a kind of inverse problem: what is the f of alpha, M1, M2 such that the probability of the Fisher random variable exceeding this number is alpha? We always state percentage point values for the upper tail of the F distribution; the percentage points in the lower tail are given by f of 1-alpha, M1, M2. You have a distribution, you locate a point anywhere between 0 and infinity, and the area under the curve beyond f of alpha, M1, M2 is alpha; that constitutes the upper tail region. Since the total probability is equal to 1, the area under the curve below f of alpha, M1, M2 is 1-alpha.
So the probability is 1-alpha in the lower tail region if it is alpha in the upper tail region. Usually f of alpha values are reported; if we want the lower tail value f of 1-alpha, M1, M2, it can be shown that f of 1-alpha, M1, M2 is equal to 1 divided by f of alpha, M2, M1. What we have to note here is that 1-alpha changes to alpha when we go from the left side to the right side, and the sequence of the degrees of freedom also gets interchanged: it is M1, M2 originally, and then it becomes M2, M1. We identified the number f of alpha, M1, M2 such that the probability of F greater than f of alpha, M1, M2 is equal to alpha. What does this mean? It means the probability of (chi squared M1 divided by M1) whole divided by (chi squared M2 divided by M2) being less than or equal to f of alpha, M1, M2 is equal to 1-alpha. The first expression defines the upper tail region; in the second, where I substitute the definition of F in terms of the two chi square random variables, I have the probability of (chi squared M1 by M1) divided by (chi squared M2 by M2) being less than or equal to f of alpha, M1, M2, and that is equal to 1-alpha, the lower tail probability or area under the curve. We can cross multiply: we take chi squared M2 divided by M2 to the top and chi squared M1 divided by M1 below; then the inequality sign will change, and on the other side you will have 1 divided by f of alpha, M1, M2. Let us see what happens. This is what we will get; if you want to check it, please do it separately after pausing.
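Both the defining property of the percentage point and the reciprocal relation stated above can be checked numerically. A sketch assuming scipy is available (note that scipy's `ppf` takes the lower-tail probability, so the upper alpha point is `ppf(1 - alpha)`):

```python
from scipy import stats

alpha, m1, m2 = 0.05, 6, 12

# Upper-tail percentage point f_{alpha, M1, M2}.
f_upper = stats.f.ppf(1 - alpha, m1, m2)

# Defining property: P(F > f_{alpha, M1, M2}) = alpha.
print(stats.f.sf(f_upper, m1, m2))       # 0.05

# Lower-tail point via the reciprocal relation, with the degrees of
# freedom interchanged: f_{1-alpha, M1, M2} = 1 / f_{alpha, M2, M1}.
f_lower = stats.f.ppf(alpha, m1, m2)
print(f_lower)
print(1 / stats.f.ppf(1 - alpha, m2, m1))   # same value
```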
So here we have the probability of (chi squared M2 by M2) divided by (chi squared M1 by M1) being greater than or equal to 1 divided by f of alpha, M1, M2, and that is equal to 1-alpha. The subscripts M2 and M1 correspond to the degrees of freedom associated with the chi squared random variables. You have to be a bit careful here, because M1 and M2 are the numerator and denominator degrees of freedom when you define the upper tail probability, and even though you now get chi squared M2 by M2 over chi squared M1 by M1, please do not lose track of the degrees of freedom. So this ratio was shown to be greater than or equal to 1 by f of alpha, M1, M2 with probability 1-alpha. But this ratio is itself an F random variable, so this also means that the probability of this random variable being greater than or equal to f of 1-alpha, M2, M1 is equal to 1-alpha; here too we are defining f of 1-alpha, M2, M1 such that the upper tail probability is equal to 1-alpha. Both statements concern the same F random variable, and in both the probability is 1-alpha, hence the two threshold values must be equal. In other words, f of 1-alpha, M2, M1 equals 1 divided by f of alpha, M1, M2. Quite a nice derivation. Now let us look at the ratio of variances. Let us take two random samples, and we will denote the random variables as xij, with i standing for the random sample index and j standing for the elements of random sample i. So i takes the values 1 and 2, and j runs from 1 to m for the first sample and from 1 to n for the second. Let x11, x12, and so on up to x1m be the members of a random sample from a population with mean mu1 and variance sigma1 squared. If you want, you can say that we have taken this random sample comprising m entities from population 1.
If you choose population 2 and take a random sample of size n from it, we call the sampled random variables x21 (second population, first random variable), x22, and so on up to x2n; here n is the sample size taken from the second population, and the second population has mean mu2 and variance sigma2 squared. The sample variances may be computed in addition to the sample means. Now we are focusing more on the sample variances; we do not really use the means of the random samples in our analysis here. This is in contrast to the analysis we were doing for the distribution of the means. For example, when we set up the confidence interval we used, in certain situations, the sample variance s squared; but in our F distribution analysis we are not talking that much about x bar, the random sample mean, but mostly about s1 squared and s2 squared. In fact, so far in our discussion x1 bar and x2 bar did not figure at all. So we have s1 squared and s2 squared as the sample variances from the two random samples we have chosen, of sizes m and n, and the associated degrees of freedom are m-1 and n-1. We chop off 1 from the degrees of freedom because not all the squared deviations from the mean are independent; only m-1 and n-1 independent squared deviations exist. Of course the sample mean is used implicitly, because when we calculate the sample variances s1 squared and s2 squared we use x1 bar and x2 bar; so implicitly the sample mean is necessary, but explicitly it does not appear in these calculations. The two populations are also assumed to be independent. Since the F random variable is the ratio of two chi square random variables, each scaled by its associated degrees of freedom, the numerator is (m-1) s1 squared by sigma1 squared, which is the first chi square random variable, divided by its degrees of freedom m-1. Next we go to the denominator.
The chi square random variable in the denominator is (n-1) s2 squared by sigma2 squared, and we scale it by the degrees of freedom associated with it, namely n-1. Obviously the m-1 factors cancel and the n-1 factors cancel, so we are simply left with the simple and compact form (s1 squared by sigma1 squared) divided by (s2 squared by sigma2 squared). Now we can develop confidence intervals on the ratio of two variances: what are the upper and lower limits that bound the ratio? Remember, neither sigma1 squared nor sigma2 squared is known; they are the first and second population variances respectively. We do not know them, and so we have to consider the confidence interval for the ratio of the two unknown variances. In the usual manner we require that the probability of f of 1-alpha/2, m-1, n-1 less than or equal to (s1 squared by sigma1 squared) divided by (s2 squared by sigma2 squared) less than or equal to f of alpha/2, m-1, n-1 is equal to 1-alpha. So we have to identify two limits, the lower limit and the upper limit, such that the probability of the F random variable lying between them is 1-alpha. The 100(1-alpha) percent confidence interval on the ratio of the variances sigma2 squared by sigma1 squared is: f of 1-alpha/2, m-1, n-1 times s2 squared by s1 squared, less than or equal to sigma2 squared by sigma1 squared, less than or equal to f of alpha/2, m-1, n-1 times s2 squared by s1 squared. So this is the confidence interval for the ratio of the two variances. The important thing to note is whether you are using (sigma1 by sigma2) whole squared or (sigma2 by sigma1) whole squared; depending on which is present, the order of the degrees of freedom in the limits is also affected.
So you have to be a bit careful whenever you are setting up the confidence interval for the ratio of two variances; you have to make sure that the correct numerator and denominator degrees of freedom are present in the upper and lower limits. If I simplify the probability statement, sigma2 squared goes to the numerator and sigma1 squared comes to the denominator, and then I multiply throughout by s2 squared by s1 squared so that the sample variances cancel out of the middle term. That gives: f of 1-alpha/2, m-1, n-1 times s2 squared by s1 squared, less than or equal to sigma2 squared by sigma1 squared, less than or equal to f of alpha/2, m-1, n-1 times s2 squared by s1 squared, with probability 1-alpha. So the confidence interval is defined, after the samples are taken and s1 squared and s2 squared are known, in the fashion shown here. Once the two samples have been taken we can compute their sample means and sample variances, and then we can define the confidence interval around sigma2 squared by sigma1 squared. Here s1 squared and s2 squared are the sample variances of samples of sizes m and n taken from two independent normal distributions with unknown variances sigma1 squared and sigma2 squared; again we have to assume that the parent populations are normal. We define f of 1-alpha/2, m-1, n-1 and f of alpha/2, m-1, n-1 as the lower and upper alpha/2 percentage points of the F distribution with m-1 and n-1 degrees of freedom respectively. For the lower point f of 1-alpha/2, m-1, n-1, the area under the curve below it is alpha/2; that is why it is called the lower tail point, and the probability above it is then 1-alpha/2.
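The interval above can be computed in a few lines. This is a sketch with made-up sample data (the sample sizes, means, and standard deviations are assumptions for illustration only):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample1 = rng.normal(10.0, 2.0, size=16)   # m = 16 observations
sample2 = rng.normal(12.0, 3.0, size=21)   # n = 21 observations

m, n = len(sample1), len(sample2)
s1_sq = np.var(sample1, ddof=1)   # sample variances, m-1 and n-1 df
s2_sq = np.var(sample2, ddof=1)

alpha = 0.05
# f_{1-alpha/2, m-1, n-1} is the lower point (lower-tail area alpha/2),
# f_{alpha/2, m-1, n-1} the upper point; both multiply s2^2/s1^2.
lower = stats.f.ppf(alpha / 2, m - 1, n - 1) * s2_sq / s1_sq
upper = stats.f.ppf(1 - alpha / 2, m - 1, n - 1) * s2_sq / s1_sq
print(lower, upper)   # 95% confidence interval on sigma2^2 / sigma1^2
```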
So we are identifying two numbers, f of 1-alpha/2, m-1, n-1 and f of alpha/2, m-1, n-1, the alpha/2 percentage points of the F distribution with m-1 and n-1 degrees of freedom respectively. Now let us take a quick look at the F distribution tables. A colorful table is presented before you; please read the text at the bottom. The denominator degrees of freedom go vertically and are shown in red; the numerator degrees of freedom go horizontally and are shown in blue. What we have to do is look at the parameter given here, alpha equal to 0.1; that means the probability value, more specifically the upper tail probability, is 0.1. Then, if the numerator degrees of freedom is 4 and the denominator degrees of freedom is 5, I go across and down, and the required f value is 3.52. So the probability of the F random variable being greater than 3.52 is 0.1. The next table gives an alpha value of 0.05; earlier we saw alpha equal to 0.1. So there is more and more information to be presented. Earlier, for the standard normal distribution, we had only one chart or table: you were given the z values and you could read out the probability. When you went to the t distribution, you had one additional parameter, the degrees of freedom; you had to identify a probability value, identify a degree of freedom, and then find the corresponding t value. The same thing happened with the chi square distribution. But with the F distribution you have both numerator and denominator degrees of freedom, so you need more tables or charts: one chart for each probability value.
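Instead of one chart per alpha value, the same entries can be reproduced programmatically; a sketch assuming scipy is available (remember `ppf` takes the lower-tail probability, so the alpha = 0.1 upper-tail point is `ppf(0.9)`):

```python
from scipy import stats

# Table entry for alpha = 0.1, numerator df = 4, denominator df = 5.
f_val = stats.f.ppf(1 - 0.1, 4, 5)
print(round(f_val, 2))    # 3.52, the value read from the chart

# Check the direction of the lookup: P(F > 3.52) should be 0.1.
print(stats.f.sf(f_val, 4, 5))
```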
So here you have alpha equal to 0.05, with the numerator degrees of freedom and the denominator degrees of freedom. Once you know the alpha value, the numerator degrees of freedom, and the denominator degrees of freedom, you can find the corresponding f value. What is the f value that corresponds to a probability of 0.05 with 6 and 3 degrees of freedom? I locate 6, the numerator degrees of freedom, then I locate 3, the denominator degrees of freedom, and I get 8.94. Similarly, other f values for other degrees of freedom may be read off. We also have another table where the probability is different: alpha equal to 0.01. What is the f value that gives a probability of 0.01 if the degrees of freedom are 7 and 8? With 7 in the numerator and 8 in the denominator, the corresponding f value is 6.18. Now let us look at a few problems. If two independent random samples of sizes N1 equal to 7 and N2 equal to 13 are taken from a single normal population, what is the probability that the variance of the first sample will be at least 3 times as large as that of the second sample? We define the F random variable as (s1 squared by sigma1 squared) divided by (s2 squared by sigma1 squared), because we are talking about the same population; the same sigma1 squared appears in the numerator and the denominator, so it cancels out, and we are left with s1 squared by s2 squared. What is the probability that this ratio will be greater than 3? The probability of F greater than 3, with 7-1 = 6 numerator degrees of freedom and 13-1 = 12 denominator degrees of freedom, can be shown to be 0.05.
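This first worked problem is a one-liner in code; a sketch assuming scipy is available, where `sf` gives the upper-tail probability directly:

```python
from scipy import stats

# Two samples of sizes 7 and 13 from the same normal population; the
# probability that s1^2 is at least 3 times s2^2 is P(F_{6,12} > 3).
n1, n2 = 7, 13
p = stats.f.sf(3, n1 - 1, n2 - 1)
print(round(p, 5))    # about 0.05, as read from the alpha = 0.05 chart
```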
Again, the probability of s1 squared by s2 squared greater than 3 equals the probability of the F random variable with 6 numerator degrees of freedom and 12 denominator degrees of freedom exceeding 3, which is 0.04981, or more or less 0.05. Next: find the value of f of 0.95 for M1 equal to 10 and M2 equal to 20. In the previous case we were given the sample sizes, so we had to work out M1 as 6 and M2 as 12; now we are directly given the degrees of freedom themselves, 10 and 20, as the numerator and denominator degrees of freedom. So f of 0.95, 10, 20 equals 1 divided by f of 0.05, 20, 10, and f of 0.05, 20, 10 may be found from the table: for alpha equal to 0.05, with numerator degrees of freedom 20 and denominator degrees of freedom 10, the value is 2.77. Then 1 divided by 2.77 is approximately 1 by 3, about 0.33; since it is 1 by 2.77 rather than 1 by 3 it will be slightly higher, and the value is 0.3605. With this we have completed the brief discussion on the F distribution. We have done the t distribution, the chi square distribution, and the F distribution. What we can now do is look at a few example problems and see how these distributions may be applied. So we will take a small break here and then continue with the example problems.
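Before moving on, the second worked example above can also be verified in code; a sketch assuming scipy is available, computing the lower percentage point both directly and via the reciprocal relation:

```python
from scipy import stats

# f_{0.95, 10, 20}: the point with upper-tail probability 0.95,
# i.e. lower-tail probability 0.05, so ppf(0.05, 10, 20) directly.
direct = stats.f.ppf(0.05, 10, 20)

# Via the reciprocal relation: 1 / f_{0.05, 20, 10} = 1 / 2.77.
via_reciprocal = 1 / stats.f.ppf(0.95, 20, 10)

print(round(direct, 4))           # about 0.3605, as in the lecture
print(round(via_reciprocal, 4))   # same value
```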