Hello, welcome back. In today's lecture, we will be looking at two important distributions. The first is the t-distribution and the next is the chi-squared distribution. Both these distributions are extensively applied in design of experiments. Before this, the most popular distribution we studied was the normal distribution. The normal distribution has several desirable traits. By now we know that it is symmetric, its mean, median and mode coincide, and it is obeyed by many sampling distributions of the mean. Another nice thing about the normal distribution is that the parameters of the distribution, mu and sigma, coincide with the mean and standard deviation. We were also looking at the sampling distribution of the mean for a large sample size, and we were considering the situation where the population parameter sigma, namely the standard deviation, was not known. Hence it was recommended that we substitute the sample statistic, namely the sample standard deviation S, for sigma. With a large sample, the central limit theorem ensured that the distribution of the mean was approximately normal, and we could substitute S for sigma. The natural question that arises is: what will happen if I have a small sample size? In many practical situations, it may not be possible to have a large sample size. A large sample size had many desirable properties. It increased the precision while maintaining the level of confidence. It also enabled non-normal populations to be considered in our analysis. However, we are now going to look at the case where the sample size is small and the population standard deviation sigma is not known. The t-distribution, sometimes also referred to as Student's t-distribution, is applied when the sample size n is small. How small is small is the query that immediately comes to us. Let us say that you have a small sample size of 6, 10 or 15.
Only when the sample size exceeds about 40 can we exploit the central limit theorem and use S instead of sigma. But when the sample size is small, we have to use the t-distribution. So, let us now look at the properties of the t-distribution. The important assumption made is that the population from which the sample was drawn is a normal population. In other words, the distribution of the population members is normal. This is a very important assumption. However, it is not a very binding or restrictive assumption, in the sense that many populations behave close to the normal type. So, this is not a very serious assumption: small deviations from normality for the parent probability distribution are permitted. However, if the deviation from normality is quite significant, then we have to go for nonparametric tests. These are quite interesting but beyond the scope of our current discussion. The reasons why we are doing the t-distribution and the chi-squared distribution are, as I said earlier, that they are extensively used in design of experiments methodologies, and also that when you look at any standard statistical analysis output, you will find the confidence intervals noted and the t values given. The chi-squared distribution is the forerunner of the Fisher F-distribution that we will see in the next lecture. Let us now focus on the t-distribution. The t random variable is defined as shown in the next slide: t is given by (x bar - mu)/(s/root n). Well, you may think you have seen something like this before. Indeed, we all have: we have seen the standard normal variable being defined as z = (x bar - mu)/(sigma/root n). So, there is an important difference. z was given in terms of (x bar - mu)/(sigma/root n), whereas t is defined in terms of (x bar - mu)/(s/root n). This is enough to create the difference between the normal distribution and the t-distribution.
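As a quick illustration of the definition above, here is a minimal Python sketch (assuming NumPy is available) that computes the t statistic t = (x bar - mu)/(s/root n) for a small sample. The data values and the hypothesized mu are purely illustrative, not from the lecture:

```python
import math
import numpy as np

sample = np.array([10.2, 9.8, 10.5, 10.1, 9.9, 10.4])  # hypothetical data, n = 6
mu_hypothesized = 10.0   # postulated population mean (we never know the true mu)

n = sample.size
x_bar = sample.mean()
s = sample.std(ddof=1)   # sample standard deviation: divide by n - 1, not n

# The t statistic, following t with n - 1 degrees of freedom
t_stat = (x_bar - mu_hypothesized) / (s / math.sqrt(n))
print(n - 1, round(t_stat, 3))
```

The only change from the z statistic is that s replaces sigma in the denominator; that single substitution is what moves us from the normal distribution to the t-distribution.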
The t-distribution is also not a universal distribution. We know that the normal distribution had parameters mu and sigma only, so when you know mu and sigma, that pretty much described the normal distribution. When you are looking at the t-distribution, besides the centering, we also have another parameter called the degrees of freedom. When you normalize the random variable x bar by subtracting the population parameter mu from it and dividing by s/root n, the resulting random variable has a probability distribution which is centered at 0. This was the case also with the standard normal variable. However, the shape of the probability distribution for the t random variable depends upon the sample size. So, when you have different sample sizes, the shape of the t-distribution also changes. Hence, the sample size enters through the degrees of freedom. In fact, we say that this t random variable follows the t-distribution with n - 1 degrees of freedom. Again, you might have seen this n - 1 earlier. When you calculated the sample variance from the random sample data, you divided the sum of the squared deviations by n - 1. Hence, the same n - 1 also figures as the degrees of freedom when describing the t-distribution. Well, we will not really be using the probability density function of the t-distribution; it is beyond our present scope and objectives to play around with it. For completeness, I am just giving its form: f(x) = Gamma((k + 1)/2) divided by [square root of (pi k) times Gamma(k/2)], times (1 + x squared/k) to the power -(k + 1)/2. So, what is this k? k is nothing but the degrees of freedom. Since the t-distribution also has other applications, we do not use n - 1 in naming the degrees of freedom; we use a general parameter k. But for our applications in statistics, we can set k equal to n - 1.
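Although we will not work with the density analytically, the gamma-function form above is easy to check numerically. The sketch below (assuming SciPy is available) writes out the density exactly as stated and compares it with SciPy's built-in implementation:

```python
from math import gamma, sqrt, pi
from scipy import stats

def t_pdf(x, k):
    """Density of the t-distribution with k degrees of freedom,
    written out from the gamma-function form in the lecture."""
    return (gamma((k + 1) / 2)
            / (sqrt(pi * k) * gamma(k / 2))
            * (1 + x**2 / k) ** (-(k + 1) / 2))

# Agreement with SciPy's implementation at a few points for k = 3
for x in (-2.0, 0.0, 1.5):
    assert abs(t_pdf(x, 3) - stats.t.pdf(x, df=3)) < 1e-12

print(round(t_pdf(0.0, 3), 4))  # density at the center for k = 3
```

Note the negative exponent -(k + 1)/2: that is what makes the density fall off in the tails so that the total area is 1.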
But for general purposes, the parameter k is the degrees of freedom associated with the t-distribution. Another important thing is that the mean and variance are not directly present as parameters of the t-distribution, the way they were in the normal distribution. In fact, the mean and variance of the t-distribution are 0 and k/(k - 2) for k greater than 2, respectively. What it means is the mean is 0 and the variance is k/(k - 2). Of course, k cannot be equal to 2: for the variance to exist, the degrees of freedom must be greater than 2, so the minimum is 3. Let us look at the t-distribution plots. Here they are. You can see that several graphs have been drawn here, and all these graphs are centered at the mean 0. The reason for that is that we have centered the t-distribution at the origin by making the transformation t = (x bar - mu)/(s/root n). This is what we have done when defining the t random variable: after you subtract mu, the distribution gets centered at the origin. Mu is the mean of the population, the average value of the population. Please note that we do not know the value of mu. So, the question naturally arises: in order to compute the t random variable, you have the sample mean x bar, you have the sample standard deviation s and you have the sample size n, but you do not know mu. Then how will you have a value for the t random variable? Many of our statistical analyses in the future will involve testing of means. So, we speculate or hypothesize on the value of the population mean mu. We are given a value of mu which is assumed or postulated or hypothesized or speculated, and that value we can substitute here. We will see more on this in future classes. So, now we are having different curves, whereas in the normal distribution we had only one curve, and a person looking at these curves cursorily or superficially will say, oh, this looks like a normal distribution.
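The mean 0 and variance k/(k - 2) can be confirmed directly. A small sketch, assuming SciPy is available (the choice k = 5 is just an example):

```python
from scipy import stats

k = 5  # hypothetical degrees of freedom; must exceed 2 for the variance to exist
mean, var = stats.t.stats(df=k, moments='mv')

print(float(mean), float(var))  # mean 0, variance k/(k - 2) = 5/3
```

Try decreasing k toward 3 and the variance k/(k - 2) grows, which is another way of seeing that low degrees of freedom spread more probability away from the center.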
In fact, the t-distribution bears similarities with the standard normal distribution, because the mean is 0 and the distribution is symmetric. 2 and -2 are equidistant from the origin, so f(2) will be equal to f(-2). That is what is meant by the symmetric nature: the area enclosed by the curve from 0 to, let us say again, 2 will be equal to the area enclosed by the curve when x varies from 0 to -2. From minus infinity to 0, the area under the curve is 0.5, and from 0 to plus infinity, the area under the curve is again 0.5, so that the total area under the curve is equal to 1. That is the primary requirement of a probability distribution function. Now, where did these curves come from? These curves represent different values of k. One nice thing about the t-distribution is that it remains symmetric even for low values of k. There are a few other distributions, which we will see presently, that are no longer symmetric when the k value or the degrees of freedom decreases. In this case, for k equal to 3 you have this brown curve, for k equal to 7 you have the green curve, the blue curve is for k equal to 11 and the dark brown curve is for k equal to 20. So, it can be seen that the curve becomes taller and the spread decreases when the degrees of freedom increases. If you can see this portion, I will highlight it: you can see the brown curve slightly above the dark brown curve. You cannot see much from this figure, so let us go to the next figure, which will make things clearer. The dashed brown curve corresponds to k equal to 3. We are having a thicker tail for the t-distribution, and when the k value increased, the curve became taller at the center but thinner at the tail. So, this is for k equal to 20 and this is for k equal to 3.
So, you can see that when the degrees of freedom is less, more probability is packed by the t-distribution into the tail portion. To summarize, as the k value increased, the distribution became taller at the center, and when the k value is small, the t-distribution is shorter at the center but thicker in the tail regions. These curves can be easily generated using a spreadsheet. What you have to do is use the probability density function, with x as the independent variable. You can take a very large value of x and a very small value of x; the range of x is from minus infinity to plus infinity. A small typo was there; I will just correct it: it should be from minus infinity to plus infinity. And k is the parameter. You can specify a value of k; you can say k is equal to 3, 4, whatever you want. Make sure the k value is specified to be greater than 2, then define this function in the spreadsheet and vary the value of x from, let us say, a very small number like minus 20 up to plus 20. The values of the probability density function f(x) reduce quite fast: you can see that even at k equal to 3, if you go to minus 6 or plus 6, the value is pretty close to 0. Another thing you may notice is that the curve changes its shape more and more slowly as the k value increases, and beyond a certain, very high value of k, there would not be much change in the shape of the curve. You can see that from k equal to 3 to k equal to 7 there is a big change, whereas from k equal to 11 to k equal to 20 the change is not much. So, as I said earlier, the t-distribution resembles the normal distribution with 0 mean: both are unimodal, symmetric about the origin and have a maximum value at the origin. The t-distribution is heavier in the tail portions and packs more probability in the tail region when compared to the normal distribution.
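Instead of a spreadsheet, the same family of curves can be tabulated in a few lines of Python (assuming NumPy and SciPy are available; the grid and the k values 3, 7, 11, 20 mirror the plots described above):

```python
import numpy as np
from scipy import stats

x = np.linspace(-6, 6, 241)      # a grid wide enough that f(x) is near 0 at the ends
for k in (3, 7, 11, 20):
    y = stats.t.pdf(x, df=k)
    # peak height at the center, and upper-tail probability beyond 3
    print(k, round(y.max(), 4), round(stats.t.sf(3, df=k), 4))
```

Running this shows exactly the pattern in the figures: the peak (y.max()) rises with k while the tail probability P(T > 3) shrinks, i.e. low degrees of freedom means a shorter center and thicker tails.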
Very interestingly, as k tends to infinity, the t-distribution tends towards the normal distribution. Many distributions share this attribute: in the extreme, the different distributions approach normality. k tending to infinity is a mathematical criterion, but when k exceeds, let us say, 40 or 50, the values are pretty much close to those given by the normal distribution. The degrees of freedom we use with the t-distribution also matches the degrees of freedom associated with the sample standard deviation. Whenever we use the sampling distribution of the mean for small samples, we have to use both the sample mean x bar and the sample standard deviation S. The sample standard deviation is associated with n - 1 degrees of freedom; the reason for this we saw earlier in our discussions on random variables and exploratory data analysis. We have n - 1 degrees of freedom for the sample variance because not all the n deviations about the sample mean are independent; only n - 1 of them are independent. So, the sample standard deviation or the sample variance is based on n - 1 degrees of freedom. When we use the t-distribution, we also have k degrees of freedom, and when we use the t-distribution for the sampling distribution of the mean, we use S in the definition of the t random variable. So, k becomes equal to n - 1, where n is the sample size. Whenever we use the t-distribution for sampling distribution of the mean calculations with small sample sizes, we use n - 1 as the degrees of freedom for the calculation of probability values. Now, where are these distributions used? In many practical situations, we have to find the probability of a random variable falling between 2 values, or being less than a particular value, or being greater than a particular value.
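The convergence to the normal distribution is easy to watch numerically. A short sketch, assuming SciPy is available, compares the upper-tail probability beyond 1.96 for growing k with the normal value:

```python
from scipy import stats

# P(T > 1.96) for the t-distribution approaches the normal tail as k grows
for k in (3, 10, 40, 200):
    print(k, round(stats.t.sf(1.96, df=k), 4))

print('normal', round(stats.norm.sf(1.96), 4))  # the limiting value, about 0.025
```

By k around 40 or 50 the t tail probability is already within a few thousandths of the normal one, which is exactly why large-sample work can fall back on the z tables.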
So, we have to find the probability. We cannot use the normal distribution all the time; it depends on what distribution the random variable of our interest is following. Now, when we talk about the distribution of the sample mean x bar for the case where the population parameter sigma is not known, the sample size n is small and the parent population follows the normal distribution, then we have to use the t probability distribution. So, whenever we are considering the probabilities of x bar lying between 2 values, or greater than a particular value, or less than a particular value, the probabilities have to be calculated using the t-distribution. So, how do we get the probability values? If z alpha is a calculated value of the standard normal variable, we know that the probability of the standard normal random variable being less than z alpha is equal to 1 - alpha, so that the area beyond z alpha was alpha. This is what we used in the case of the normal distribution. Now, we define t alpha, k such that for the specified k degrees of freedom, the area above this value in the t-distribution curve is alpha. So, now we are having t alpha and k; k is the degrees of freedom. Let alpha be a specified value. We have seen earlier that in the generation of the 95% confidence intervals, the alpha value was given as 0.05. It may take values like 0.01 and 0.1 and so on, but alpha equal to 0.05 is quite frequent or usual. Now, suppose we have the same alpha. What is the probability that the t random variable takes a value greater than t alpha, k? We define the subscript alpha in such a way that the probability of t greater than t alpha, k is equal to alpha.
So, we define the subscript alpha in such a way that the t random variable exceeding t alpha, k has a probability of alpha. Just note this definition; we will be looking at the t-distribution curve to understand this further. So, t alpha, k is an upper-tail 100 alpha percentage point of the t-distribution with k degrees of freedom. This statistical terminology is fascinating and also important. It is like grammar: we have to use the correct terminology. So, again I repeat: t alpha, k, which is used in the probability calculation, is represented here in general terms but is in fact a numerical value. So, t alpha, k is an upper-tail 100 alpha percentage point of the t-distribution with k degrees of freedom. What it means is that whenever we are calculating the probability, once the value of alpha is specified, we look for the area under the curve beyond t alpha, k. In the cumulative normal distribution probability chart involving the standard normal variable, whenever a particular z value was specified, the probability tables gave the area under the curve below that value of z. But in the t-distribution, we are talking about the upper tail. So, whenever the value of alpha is specified, we are looking at the probability in the tail region beyond the value of t alpha, k, and the area under the curve beyond t alpha, k is alpha itself. We can therefore say that the area covered by the distribution below the value of t alpha, k is equal to 1 - alpha. The total area under the curve for any probability distribution is, by definition, equal to 1. So, summarizing: the probability of the t random variable with a specified degrees of freedom k taking on a value greater than t alpha, k is equal to alpha. This is the definition.
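The upper-tail percentage point can be computed rather than read off a chart. A minimal sketch, assuming SciPy is available (alpha = 0.025 and k = 10 are just example values):

```python
from scipy import stats

alpha, k = 0.025, 10
t_alpha = stats.t.isf(alpha, df=k)  # upper-tail 100*alpha percentage point t_{alpha,k}

print(round(t_alpha, 3))
# By construction, the area under the curve beyond t_alpha is alpha itself
assert abs(stats.t.sf(t_alpha, df=k) - alpha) < 1e-12
```

Here isf is the inverse survival function, which inverts the upper-tail area directly, so there is no need to convert to the cumulative probability 1 - alpha first.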
Now, since the t-distribution is symmetric about the origin, it may be easily shown that the probability of t less than t 1-alpha, k is equal to alpha as well. We are exploiting the symmetry of the distribution about the origin. So, if you locate a point t 1-alpha, k on the x axis of the t-distribution, the area below t 1-alpha, k will also be equal to alpha. It may also be seen that t 1-alpha, k is equal to minus t alpha, k, because the t-distribution is centered around the origin: one half of it corresponds to negative x axis values and the remaining half to positive x axis values. So, let us see how t 1-alpha, k equals minus t alpha, k. Let us now look at this diagram. This is the t-distribution for a specified degrees of freedom, k equal to 3. This curve again may be generated using a spreadsheet. Now, let us fix t alpha, k, which is a number, at close to 2; let me even make it 2. Well, there is a slight discrepancy, but let us assume that t alpha, k is equal to 2. So, the upper-tail probability will be alpha, because this is some chosen value of alpha, and so the area covered by the upper tail is alpha. Now, if you want the lower-tail probability also to be alpha, then you should locate t 1-alpha, k. The reason is that the moment you put t 1-alpha, k here, the area above it is going to be 1 - alpha, and if the upper tail is 1 - alpha, then the lower tail would be alpha. What is the relation between t 1-alpha, k and t alpha, k? Since t 1-alpha, k is located on the left hand side of the t-distribution, it would be a negative value, but due to the symmetry of the t-distribution, t 1-alpha, k will be located at the same distance from the origin as t alpha, k. So, we can say that t alpha, k is equal to minus t 1-alpha, k, and equally that t 1-alpha, k is equal to minus t alpha, k.
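This symmetry relation t 1-alpha, k = -t alpha, k can be verified numerically. A small sketch, assuming SciPy is available (alpha = 0.05 and k = 3 match the diagram being discussed):

```python
from scipy import stats

alpha, k = 0.05, 3
upper = stats.t.ppf(1 - alpha, df=k)  # t_{alpha,k}: area alpha lies above this point
lower = stats.t.ppf(alpha, df=k)      # t_{1-alpha,k}: area alpha lies below this point

print(round(upper, 3), round(lower, 3))
assert abs(lower + upper) < 1e-6      # t_{1-alpha,k} = -t_{alpha,k}
```

ppf is the percent point function (inverse of the cumulative distribution), so ppf(1 - alpha) gives the point with area alpha in the upper tail and ppf(alpha) gives its mirror image on the negative axis.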
So, this follows from the graph. Now, let us look at a typical probability chart for the t-distribution, a probability table if you want to term it that way. The degrees of freedom are given here and the probability value is specified here. It is slightly different from the standard normal distribution table. We have different probability values: 0.4, 0.25, 0.1, 0.05, 0.025 and so on, up to 0.0005. These probabilities are all upper-tail values, and the t value corresponding to each probability is given in these columns. So, for an upper-tail probability of 0.05, the t value for 3 degrees of freedom is 2.353, and the t value for 4 degrees of freedom is 2.132. If my degrees of freedom for a given application is 4 and I want to find the t value that will have a probability in the upper tail of 0.05, then the required t value is 2.132. So, for a given, fixed probability, as the degrees of freedom change, the t values also change. What is interesting to note here is that, for packing a specific probability in the tail portion, the value of t decreases as the degrees of freedom increases. What it means is that as the degrees of freedom increases, the tail becomes thin, and when the tail becomes thin, you have to come closer to the origin to pack the same area. So, for a given probability, the t value decreases with increasing degrees of freedom. It is very interesting to relate the figure with the data given in the table. By now you should know the z value corresponding to a probability of 0.05: in the cumulative distribution we talk about a probability of 0.95 in the lower tail, while in the t-distribution we talk about a probability of 0.05 in the upper tail. Let us see what happens to the t value when the degrees of freedom increases.
So, we are now looking at the 0.05 column: the t value comes to 1.697 at 30 degrees of freedom, it is about 1.68 at 40, and at 600 it is 1.66. So, the change is not much, and at infinity we are having 1.645. 1.645 is a very popular number, and there is not much difference between the t value and the z value, the z value being the standard normal distribution variable value. Corresponding to an upper-tail probability of 0.05, or a lower-tail probability of 0.95, the z value is 1.645, and the t value for a very large number of degrees of freedom is close to that. So, we can see that the t-distribution tends towards the normal distribution with increasing degrees of freedom. Another famous probability is 0.025. You are usually talking about the 95% confidence level, and since you are talking about the upper tail and the lower tail, we have a probability of 0.025 in the right hand tail and 0.025 in the left hand tail. The z value corresponding to a 0.975 cumulative probability in the lower tail, or 0.025 in the upper tail, is 1.96. So, for the standard normal variable we have 1.96, whereas for a t-distribution with an upper-tail probability of 0.025 the table gives a t value of 1.984. So, we can also generate confidence intervals with the t-distribution, taking the small sample size. For the normal distribution we defined the confidence interval in this fashion: we were interested in obtaining the lower bound and the upper bound of the standardized sample mean such that the probability value was 1 - alpha. That is, we were interested in finding the lower bound -z alpha/2 and the upper bound +z alpha/2 such that the probability of -z alpha/2 < (x bar - mu)/(sigma/root n) < z alpha/2 was 1 - alpha.
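The table entries quoted above can be reproduced directly. A short sketch, assuming SciPy is available, prints the upper-tail 0.05 points for a few degrees of freedom alongside the limiting z values:

```python
from scipy import stats

# t values for upper-tail probability 0.05 shrink toward z as df grows
for k in (3, 4, 30):
    print(k, round(stats.t.isf(0.05, df=k), 3))   # 2.353, 2.132, 1.697

# The normal-distribution limits quoted in the lecture
print('z(0.05) ', round(stats.norm.isf(0.05), 3))   # 1.645
print('z(0.025)', round(stats.norm.isf(0.025), 3))  # 1.96
```

This is a handy way to check any row of a printed t table, and it makes the convergence toward 1.645 and 1.96 concrete.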
Now we are taking a random sample from a normal distribution, but the catch is that the population variance is not known. So, as earlier, we use the sample value s. What do we do for the construction of the confidence interval? Not surprisingly, we again use the sample standard deviation s in constructing the confidence interval based on the t-distribution. We have: probability of -t alpha/2 <= (x bar - mu)/(s/root n) <= t alpha/2 equals 1 - alpha. The question is how to find -t alpha/2 and +t alpha/2 such that the probability of the standardized t random variable (x bar - mu)/(s/root n) lying between these 2 bounds will be 1 - alpha. The t random variable, we know, is defined as (x bar - mu)/(s/root n). So, we are having -t alpha/2, n-1 and t alpha/2, n-1 here, and you may also recall that -t alpha/2 for n - 1 degrees of freedom is nothing but t 1-alpha/2, n-1 because of the symmetry properties of the t-distribution. This is the basic definition for constructing the confidence interval. We can manipulate either side of the inequalities to eventually get: probability of x bar - t alpha/2, n-1 s/root n <= mu <= x bar + t alpha/2, n-1 s/root n equals 1 - alpha. Please note the use of capital S and capital X bar in the representation of the sample standard deviation and the sample mean respectively. From this equation we can write the confidence interval for the population mean as x bar - t alpha/2, n-1 s/root n <= mu <= x bar + t alpha/2, n-1 s/root n. So, you are having a small sample, the sample is taken from an assumed normal distribution, and then we not only want a point estimate, we also want an interval estimate, and for constructing the interval estimate we define the t random variable.
The t random variable in turn is defined in terms of the sample standard deviation s, because the population standard deviation is not known. So, we have to define t as (x bar - mu)/(s/root n), and the degrees of freedom for the t-distribution matches the degrees of freedom used in the calculation of the sample standard deviation. Once these are known, we can easily construct the confidence interval, and the confidence interval is given by this expression. This is referred to as the 100(1 - alpha) percent confidence interval for the population mean mu. This is very important: t alpha/2, n-1 is referred to as the upper alpha/2 times 100 percentage point of the t-distribution with n - 1 degrees of freedom. There is a typo I will just correct: it should be percentage point, not points; it is a single point. So, this completes our discussion on the t-distribution. We use it for small samples, the parent population is normal, and the parent population standard deviation sigma is not known. We will continue shortly.
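The whole confidence-interval construction can be put together in a few lines. A minimal sketch, assuming NumPy and SciPy are available; the sample data here are hypothetical, chosen only to illustrate the formula x bar plus or minus t alpha/2, n-1 times s/root n:

```python
import math
import numpy as np
from scipy import stats

# Hypothetical small sample, assumed drawn from a normal population
sample = np.array([4.9, 5.1, 5.3, 4.8, 5.0, 5.2, 5.1, 4.7])
alpha = 0.05                                # for a 95% confidence interval

n = sample.size
x_bar = sample.mean()
s = sample.std(ddof=1)                      # sample standard deviation, n - 1 df
t_crit = stats.t.isf(alpha / 2, df=n - 1)   # t_{alpha/2, n-1}

half_width = t_crit * s / math.sqrt(n)
lower, upper = x_bar - half_width, x_bar + half_width
print(round(lower, 3), round(upper, 3))
```

SciPy also offers stats.t.interval(1 - alpha, df=n - 1, loc=x_bar, scale=s/math.sqrt(n)) as a one-call alternative; writing it out as above keeps each piece of the formula visible.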