In the previous lecture I introduced the normal distribution, but I did not actually show how the normal distribution arises. So first of all, let us look at the historical development of the normal distribution and why it has come to be placed as one of the most important distributions in the theory of statistics. If we look at the origins, the mathematician Gauss was probably the first to derive the density function of the normal distribution, while he was studying planetary observations. He derived it as the distribution of the error; that is why it is also called the error distribution, and the function e^{-z²/2} which we use is also connected with the error function. But then there is one of the most important results, called the central limit theorem. Initially it was obtained as a limiting form of the binomial distribution or the Poisson distribution, but later on it was found to be a much more general limiting result. So let me state some of these main developments.

The first one, attributed to De Moivre and Laplace, I will for short call the Poisson central limit theorem. The result is this: let X have a Poisson distribution with parameter λ, and consider Y = (X - λ)/√λ. This is the standardized form, the reason being that in the Poisson distribution the mean and the variance are both equal to λ. Let us look at the moment generating function of Y:

M_Y(t) = E(e^{tY}) = E(e^{t(X - λ)/√λ}) = e^{-√λ t} M_X(t/√λ).

The moment generating function of the Poisson distribution is known to us, so evaluating it at t/√λ,

M_Y(t) = e^{-√λ t} e^{λ(e^{t/√λ} - 1)}.

We can do some simplification here. The exponent is -√λ t + λ(e^{t/√λ} - 1); expanding e^{t/√λ} = 1 + t/√λ + t²/(2λ) + …, the 1 cancels, the term √λ t cancels against -√λ t, and if we take the limit as λ → ∞, all the remaining terms vanish except t²/2. So

M_Y(t) → e^{t²/2} as λ → ∞,

which is the moment generating function of N(0, 1). So we conclude that the distribution of Y converges to N(0, 1) as λ → ∞.

This is one of the first manifestations, or you can say origins, of the normal distribution, because what we are looking at is the number of occurrences of a Poisson process in an interval of length t. That is a discrete random variable, but as λ becomes large, meaning the rate is higher and there are more and more occurrences, it can be approximated by a continuous distribution. What we are saying is that after standardization, (X - λ)/√λ is approximately N(0, 1). That means, roughly speaking, X has a normal distribution with mean λ and variance λ.
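To make this concrete, here is a minimal simulation sketch (my own illustration, not part of the lecture; the rate λ = 400 and the sample size are arbitrary choices): for a large λ, the standardized Poisson variable (X - λ)/√λ should be close to N(0, 1) in both moments and distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
lam = 400.0                      # a large Poisson rate (illustrative choice)
x = rng.poisson(lam, size=100_000)
y = (x - lam) / np.sqrt(lam)     # standardized Poisson: (X - lambda)/sqrt(lambda)

print(y.mean(), y.var())         # should be close to 0 and 1

# Kolmogorov-Smirnov distance to the standard normal CDF: small => close
print(stats.kstest(y, "norm").statistic)
```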
Now let us look at another one, the central limit theorem for the binomial distribution, which is the De Moivre-Laplace theorem. Let X follow the binomial(n, p) distribution, and with q = 1 - p consider Z = (X - np)/√(npq). Again consider the moment generating function of Z:

M_Z(t) = E(e^{t(X - np)/√(npq)}) = e^{-tnp/√(npq)} M_X(t/√(npq)).

The moment generating function of the binomial is (q + p e^t)^n, so we conclude

M_Z(t) = e^{-tnp/√(npq)} (q + p e^{t/√(npq)})^n.

Writing q as 1 - p, the bracket becomes 1 + p(e^{t/√(npq)} - 1) = 1 + p(t/√(npq) + t²/(2npq) + …), that is, 1 plus terms in higher powers of t and of n. If we take the limit as n → ∞, the linear term cancels against the factor e^{-tnp/√(npq)}, the quadratic terms combine to give t²/2, and all higher-order terms vanish. So

M_Z(t) → e^{t²/2} as n → ∞.

So we have the second central limit theorem: the standardized binomial converges to N(0, 1) as n → ∞. We have seen earlier that if n → ∞ and p → 0 such that np → λ, then the binomial converges to the Poisson; but if we simply let n → ∞ with p fixed, then the binomial distribution can be approximated by a normal distribution.

From here we actually get the more general central limit theorem. Let X₁, X₂, … be a sequence of independent and identically distributed random variables with mean μ and variance σ². Let X̄ₙ be the mean of the first n variables. Then the distribution of √n(X̄ₙ - μ)/σ converges to N(0, 1) as n → ∞. In fact, this is one of the basic central limit theorems; here I have made the assumption that the random variables are independent and identically distributed, with mean μ and variance σ².

What is the significance of this result? What I am saying here is that if I consider the mean of the observations, then after standardization it will be approximately normal, no matter what the original distribution is. An alternative version can be written in terms of the sum: if Sₙ is the sum of the first n observations, then the central limit theorem says that (Sₙ - nμ)/(√n σ) converges in distribution to Z, where Z follows N(0, 1), as n → ∞. So whether we consider the sample sum or the sample mean, the limiting distribution is normal, no matter what the original distribution is. Of course, we must have the existence of the mean and the variance.
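As a quick numerical illustration of the general statement (again my own sketch, with the exponential distribution as an arbitrary non-normal parent), simulate many sample means and standardize them; the result should already be close to N(0, 1) for moderate n.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, reps = 50, 20_000
mu, sigma = 1.0, 1.0                   # exponential(1) has mean 1, variance 1

samples = rng.exponential(scale=1.0, size=(reps, n))
xbar = samples.mean(axis=1)
z = np.sqrt(n) * (xbar - mu) / sigma   # CLT standardization of the sample mean

print(z.mean(), z.var())                  # roughly 0 and 1
print(stats.kstest(z, "norm").statistic)  # small KS distance to N(0, 1)
```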
Later on, generalizations of this result were made to sequences of random variables which may be non-identically distributed. That means you may have means μᵢ and variances σᵢ², and you consider a suitable version by replacing μ with the average of the first n means, and similarly for the variances. Such versions do exist; of course, the conditions become slightly more stringent: rather than just the variance, we require something more than the second moment to exist. There are even further generalizations in which the assumption of independence has also been relaxed. However, in this course I will not state the full versions of the central limit theorem; those who are interested may look at books on limit distributions or advanced probability theory, for example the book by Kai Lai Chung, or the book by Kingman and Taylor, where all these results are given.

Now, this is the result which places the normal distribution at the centre of the theory of statistics: no matter what original distribution you start with, if you consider the mean of n random variables, then it has a limiting distribution which is normal. What is the significance of this? In most practical problems, consider for example the problem of measurement. This is exactly how Gauss arrived at it, because he was considering measurements of astronomical distances and many other planetary observations. In place of one observation you take the observation several times to account for the error, and then you take the average of those observations; rather than an individual observation, you consider the average of the measurements taken several times. So as n becomes large, this converges to the normal distribution.

This is one of the practical aspects. For example, consider the performance of a student in an examination. The question paper consists of several questions; it may have 30 questions, or 50 questions. The score of the student is the total performance over all the questions, so the assumption of n being large can be applied, and if we assume that the student's ability in answering the questions is similar across questions, then the scores of students can be considered to be approximately normally distributed. Similar things happen in almost all areas of human life: human abilities, the height of a person, the distance a person can travel in an hour, and so on; many of these have been found to follow the normal distribution.

Related to the central limit theorem there are some other limit results, which in general we call laws of large numbers. I will mention the simplest version, for independent and identically distributed random variables.
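Going back to the non-identically-distributed generalization mentioned above, here is a small sketch of the idea (my own construction; the particular family of uniforms is an arbitrary choice for which the stronger moment conditions clearly hold): independent uniforms with different ranges, once centred by the sum of their means and scaled by the square root of the summed variances, still give an approximately normal sum.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 200, 20_000

# Independent but NOT identically distributed: Uniform(0, b_i) with varying b_i
b = 1.0 + (np.arange(n) % 5)       # widths cycling through 1, 2, 3, 4, 5
means = b / 2.0                    # mean of Uniform(0, b) is b/2
variances = b**2 / 12.0            # variance of Uniform(0, b) is b^2/12

u = rng.uniform(0.0, b, size=(reps, n))
s = u.sum(axis=1)
z = (s - means.sum()) / np.sqrt(variances.sum())

print(z.mean(), z.var())           # close to 0 and 1 despite non-identical terms
```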
Before stating them, let me briefly discuss in what sense we are talking about convergence here. So far the convergence is clear in the following sense: I have shown that the MGF converges, which means that the CDF of the standardized quantity converges to the CDF of N(0, 1); in the binomial case the CDF of (X - np)/√(npq) converges to the CDF of N(0, 1), and similarly for the Poisson. So we are talking about something like convergence in distribution. Likewise I can introduce some more notions of convergence; I will introduce them in brief form, and based on them I will talk about the laws of large numbers. I will not go deep into the concept of convergence of random variables here; those who are interested may read an advanced text on probability theory, as I mentioned a little while ago. In particular, there are four types of convergence.

The first is called almost sure convergence. A sequence of random variables Xₙ (of course, we assume the probability space is the same for all of them) is said to converge almost surely (in short, a.s.) to a random variable X if

P({ω : Xₙ(ω) → X(ω)}) = 1.

The notation for almost sure convergence is Xₙ → X almost surely.

Then you have convergence in mean, or rather in r-th mean: we say a sequence of random variables Xₙ converges to X in r-th mean if E|Xₙ - X|ʳ → 0 as n → ∞.

Then we have convergence in probability. A sequence of random variables Xₙ is said to converge to a random variable X in probability if, for every ε > 0,

P(|Xₙ - X| > ε) → 0 as n → ∞.

In notational terms we write Xₙ → X in probability; sometimes a capital P and sometimes a small p is written over the arrow.

Finally there is convergence in distribution, which we actually used in the central limit theorem; it is also called convergence in law. Let Xₙ have CDF Fₙ and let X have CDF F. We say the sequence Xₙ converges in distribution to X if Fₙ(x) → F(x) at every continuity point x of F; that is, x ranges over the points at which F is continuous. The notation is Xₙ → X in distribution, or we say Xₙ converges to X in law.

The first thing one should ask is: what is the relation between these various types of convergence? Without going into proofs, let me mention the facts. Convergence almost surely implies convergence in probability; convergence in r-th mean implies convergence in probability; and of course convergence in probability implies convergence in distribution. Neither almost sure convergence nor convergence in r-th mean implies the other without additional conditions: in general, convergence almost surely does not imply convergence in r-th mean, and vice versa. Convergence in probability does not imply convergence almost surely; convergence in probability does not imply convergence in r-th mean; and convergence in distribution does not imply convergence in probability.
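Convergence in probability is easy to probe numerically. A small sketch (mine, with arbitrary ε and sample sizes): estimate P(|X̄ₙ - μ| > ε) by simulation for increasing n and watch it shrink towards 0.

```python
import numpy as np

rng = np.random.default_rng(3)
mu, eps, reps = 0.5, 0.05, 1_000    # Uniform(0, 1) has mean 0.5

for n in (10, 100, 1_000, 10_000):
    xbar = rng.uniform(0.0, 1.0, size=(reps, n)).mean(axis=1)
    # empirical estimate of P(|Xbar_n - mu| > eps); decreases towards 0
    print(n, np.mean(np.abs(xbar - mu) > eps))
```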
So we can describe these relations, the flow of convergence, in the form of a funnel. At the top you have convergence almost surely and convergence in r-th mean; if I pour a liquid into the funnel, it flows down to convergence in probability, and below that to convergence in law (in distribution). Convergence almost surely implies convergence in probability, and convergence in probability implies convergence in law; convergence in r-th mean implies convergence in probability, but the two at the top do not imply each other. Of course, under certain conditions convergence almost surely will imply convergence in r-th mean, and vice versa. Similarly, if I impose some condition on convergence in probability, it will imply convergence in r-th mean or convergence almost surely; and if I put some conditions on the random variables, then convergence in law may also imply convergence in probability.

Now, the purpose of giving all this is to state the laws of large numbers. Just as we had the central limit theorem, we have a strong law of large numbers. Let X₁, X₂, … be a sequence of independent and identically distributed random variables with mean μ. Then

(1/n) Σ Xᵢ, i = 1 to n, converges to μ almost surely.

That means, in the long run, the mean of the observations converges to its actual, unknown, original mean. Similarly, there is what is called the weak law of large numbers: under the conditions given above, (1/n) Σ Xᵢ converges to μ in probability. The names weak and strong are simply related to the stronger convergence in one case and the weaker convergence in the other, but both are true. Generalizations of these results also exist, for example to non-i.i.d. random variables, that is, variables which are independent but not identically distributed, or which are dependent, and so on; I have stated it in the simplest form.

Now, what is the practical meaning of this? The practical meaning is that as we take observations repeatedly, the average performance, or average measurement, or average yield, or average height, and so on, will converge to the true value of the mean. These results are the useful aids in sampling, a concept I will come to in a little while: when we do sampling, we consider the sample mean, and what we are concluding here is that the sample mean becomes almost equal to the population mean as n becomes large. That is what allows us to use statistics in the practical sphere.
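The strong law is easy to visualize with a sketch like the following (my own illustration, using die rolls so that μ = 3.5 is known exactly): the running mean of a single i.i.d. sequence settles down to μ.

```python
import numpy as np

rng = np.random.default_rng(4)
mu = 3.5                                   # true mean of a fair die
rolls = rng.integers(1, 7, size=100_000)   # one long i.i.d. sequence of rolls
running_mean = np.cumsum(rolls) / np.arange(1, rolls.size + 1)

# the running mean hugs mu ever more tightly as n grows
for n in (10, 100, 1_000, 100_000):
    print(n, running_mean[n - 1])
```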
Let me come to those concepts now; but first, to wind up this particular section, let me mention a few things. We have considered certain continuous distributions. Initially I started with the distributions arising as the waiting times of occurrences. Waiting times have one important interpretation: they can be considered as distributions representing the lives of systems and of components. For example, if you consider a mechanical, electrical, or electronic system, or any type of organism, and look at the first failure, or the r-th failure, and so on, then those distributions can be modelled by the exponential and gamma distributions, etc. We also considered things in terms of the failure rate, which led to distributions like the Weibull distribution and the extreme value distributions. Then I considered one of the simpler ones, the uniform distribution, and then I introduced the normal distribution. I have now established that it is one of the most important distributions in the theory of statistics because of the law of averages: if I consider the average of the observations, then it has approximately a normal distribution. We have also seen the laws of large numbers.

That is not to say that there are no other important distributions; there is a very large number of continuous distributions one can think of. For example, in the normal distribution the tails go to 0 rapidly, because of the factor e^{-(x - μ)²/(2σ²)}: as x → +∞ or -∞, the curve goes to 0 very fast. But there may be situations where you do not require that. For example, in place of a squared exponent you may have only a linear one; that gives you the double exponential distribution, which is also called the Laplace distribution. Let me write down the density:

f(x) = (1/(2σ)) e^{-|x - μ|/σ}, -∞ < x < ∞.

Here you can see that the tails are flatter than those of a normal distribution; of course, the mean is again μ, the median is μ, and the peak, that is the mode, is also at μ. The name double exponential comes from the fact that in the usual exponential distribution you have only one side, whereas here you have both sides.

You can think of an even flatter version: in place of an exponential function you have only polynomial decay, that is, even flatter tails. For example, you have the Cauchy distribution. The simplest form has density 1/(π(1 + x²)); a general version is

f(x) = (1/(πσ)) · 1/(1 + ((x - μ)/σ)²), -∞ < x < ∞, -∞ < μ < ∞, σ > 0.

This is applicable to phenomena where the decay towards +∞ or -∞ is quite slow. For example, you may consider the decay of radioactivity from, say, nuclear fallout; as you know, that is a very prolonged process, and similar things occur in various chemical degradations, where the time taken to complete the process may be very large. In fact, I gave the Cauchy distribution as an example earlier: here the mean itself does not exist. The distribution is symmetric, but the mean does not exist, which means the higher-order moments also do not exist; the median is μ.

Similarly, we have distributions such as the beta distribution and the log-normal distribution; basically, there is a whole family of distributions that can be described using various functions. So I will stop this discussion here.
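To see how much heavier these tails are, here is a small sketch (mine; the cut-off points are arbitrary) comparing upper-tail probabilities of the standard normal, Laplace, and Cauchy distributions with scipy.

```python
from scipy import stats

# P(X > x) for the standard normal, Laplace and Cauchy distributions
for x in (2, 4, 6):
    print(x,
          stats.norm.sf(x),     # decays like exp(-x^2/2): fastest
          stats.laplace.sf(x),  # decays like exp(-x): flatter
          stats.cauchy.sf(x))   # decays like 1/x: flattest; mean undefined
```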
Let us move to another concept, that of sampling; that means we move to the use of probability theory for making inferences. To start with an extremely simple problem, suppose we want to estimate the average expenditure on, say, medical care by the people of a state, or by the people of a country. Now, what does one have to do for this study? One option is to take the data from each household of the country, but this is not a very practical approach, because in a similar way one may be looking at expenditure on education, on entertainment, on travel, and so on. If one does a complete enumeration of the population for each of these, it becomes a horrendous task and the study cannot practically be done: for a large geographical area, a country or a state, you will not be able to conduct it in a reasonable time frame, and the resources required will be huge. So what one suggests is that one can use a sample.

The theory of sampling I will be covering at another point of time; right now I am introducing it from the point of view of distributions, that is, we look at the distributions that arise in sampling. Why is sampling justified? Because of the law of large numbers and the central limit theorem: in the long run the sample mean acts as the population mean, the distribution of the sample mean after suitable normalization converges to a normal distribution, and so on. These are the properties which allow us to use sampling.

So let me briefly set up the framework. Let X₁, X₂, …, Xₙ be a random sample from a population with some distribution, say F, possibly depending on a parameter θ, written F_θ; θ may be a vector or a scalar, θ = (θ₁, θ₂, …, θ_k) with k ≥ 1. You have already seen examples: in the binomial distribution you have two parameters, n and p; in the Poisson distribution you have one parameter, λ; in the gamma distribution you have parameters r and λ; in the exponential distribution you have parameter λ; and so on. When I say this is a random sample from F, I am basically saying that X₁, X₂, …, Xₙ are independent and identically distributed with distribution F.

Now I consider a function T(X₁, X₂, …, Xₙ). This is called a statistic. For example, X̄; or S² = (1/(n - 1)) Σ (Xᵢ - X̄)²; or the range, that is, the maximum minus the minimum, X_(n) - X_(1), where X_(1) is the minimum of the observations and X_(n) is the maximum; and so on. These are all examples of statistics. The distribution of a statistic is called a sampling distribution.
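In code, a statistic is literally just a function of the sample. A minimal sketch (my own; the normal sample is only for illustration) computing the three examples above:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(loc=10.0, scale=2.0, size=25)  # a random sample, n = 25

xbar = x.mean()                   # sample mean
s2 = x.var(ddof=1)                # S^2 with the 1/(n - 1) divisor
sample_range = x.max() - x.min()  # range = X_(n) - X_(1)

print(xbar, s2, sample_range)
```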
Now, by the central limit theorem we can say that the normal distribution is itself a sampling distribution, because we obtain it as the limiting, or asymptotic, distribution of the sample mean: the asymptotic distribution of the sample mean, under certain conditions, is normal. So the normal distribution is a sampling distribution. Recall also the linearity property of the normal distribution that I gave you, and the additive properties of some distributions that I discussed: if you add independent geometric random variables, where the probability p of success in an individual trial is constant, the sum becomes negative binomial; the sum of exponentials is gamma; and so on.

A similar property is true for the normal distributions. Above, the distribution is only asymptotically normal, but if the original distributions are normal, then the sum is exactly normal. So let me state the general linearity property of normal distributions. Let X₁, X₂, …, Xₙ be independent with Xᵢ ~ N(μᵢ, σᵢ²) for i = 1, …, n. Then, for a general linear combination Y = Σ (aᵢXᵢ + bᵢ),

Y ~ N( Σ (aᵢμᵢ + bᵢ), Σ aᵢ²σᵢ² ).

In particular, if I take the mean, its expectation is the average of μ₁, μ₂, …, μₙ, and its variance is the sum of the variances divided by n². So if X₁, X₂, …, Xₙ follow N(μ, σ²), that is, if they are independent and identically distributed, then X̄ ~ N(μ, σ²/n). So the normal distribution is a sampling distribution in the finite-sample sense also: above it was asymptotically a sampling distribution, but here, with a fixed sample size, it is exactly a sampling distribution.

Now let me introduce some other sampling distributions which arise in the study of the distributions of various statistics. Consider first what is known as the chi-square distribution. Let W be a continuous random variable; it is said to have the chi-square distribution with n degrees of freedom (the parameter of the chi-square distribution is called the degrees of freedom) if its pdf is

f(w) = (1/(2^{n/2} Γ(n/2))) e^{-w/2} w^{n/2 - 1}, w > 0,

where n is positive. If you look at it carefully, this is actually nothing but a gamma distribution, with r = n/2 and λ = 1/2. So it is not really a new distribution, but I am introducing it under the separate name chi-square because I will show it as a sampling distribution. Notationally we write W ~ χ²(n).

Let us see it as a sampling distribution. Suppose X ~ N(0, 1), and consider Y = X². We can derive the distribution very easily; in fact, let me demonstrate it here. The density of X is (1/√(2π)) e^{-x²/2}. Since x ↦ x² is a 2-to-1 transformation, the density of Y comes out as

f(y) = (1/(2^{1/2} Γ(1/2))) e^{-y/2} y^{1/2 - 1}, y > 0,

using Γ(1/2) = √π, so that 2^{1/2} Γ(1/2) = √(2π). That is, Y follows the chi-square distribution with 1 degree of freedom, which is again a gamma. And in the gamma family we know that if λ is common, the additive property holds. So if X₁, X₂, …, Xₙ are independent and identically distributed N(0, 1) random variables, then Σ Xᵢ² follows the chi-square distribution with n degrees of freedom. That is one derivation of the chi-square distribution as a sampling distribution: it arises as the distribution of the sum of squares of n observations from a standard normal distribution.
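A quick sketch (mine; n = 8 is arbitrary) verifying this numerically: the sum of squares of n i.i.d. standard normals should have mean n, variance 2n, and the χ²(n) distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, reps = 8, 50_000

z = rng.standard_normal(size=(reps, n))
w = (z**2).sum(axis=1)        # sum of squares of n iid N(0, 1) variables

print(w.mean(), w.var())      # should be near n and 2n (here 8 and 16)
print(stats.kstest(w, "chi2", args=(n,)).statistic)  # small => close to chi2(n)
```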
But we can also derive it from a general normal distribution. If X₁, X₂, …, Xₙ follow N(μ, σ²), recall S² = (1/(n - 1)) Σ (Xᵢ - X̄)². Then (n - 1)S²/σ² follows the chi-square distribution with n - 1 degrees of freedom. So this shows the chi-square as a sampling distribution of the sample variance also; moreover, X̄ and S² are independently distributed. For more details about the derivations of these results you may look at the NPTEL lectures on probability and statistics, and also at the books which I have mentioned in the references. I will not go into too much detail for each of these distributions; let us just look at the properties.

If you plot the density function (of course, the shape will depend on n), it is concentrated on the positive axis and is positively skewed. In fact, let us look at the coefficients. We have

E(χ²(n)) = n, the degrees of freedom,
Var(χ²(n)) = 2n, twice the degrees of freedom,

and the moment generating function is (1 - 2t)^{-n/2} for t < 1/2. If we look at the measures of skewness and kurtosis, the third central moment is μ₃ = 8n, which is of course positive, and the coefficient of skewness is √(8/n). This goes to 0 as n becomes large; that means the distribution approaches symmetry for large n, which is natural because I am obtaining the chi-square as the distribution of a sum. Indeed, by the central limit theorem, if we call the sum U = Σ Xᵢ², then (U - n)/√(2n) converges to N(0, 1) as n → ∞. Similarly, the fourth central moment is μ₄ = 12n(n + 4), and the excess kurtosis is 12/n, which is positive but also goes to 0 as n → ∞.

Then, regarding the calculation of probabilities: in the normal distribution, all the probabilities could be calculated through the tabulated standard normal curve. For the chi-square distribution, the probabilities involve an incomplete gamma function, and tabulating the chi-square CDF for various values of n would be too cumbersome. So, to consolidate, or you can say to make it compact, what is tabulated is of the following form: if the upper-tail probability is α, the corresponding point is called χ²(n, α), that is,

P(W > χ²(n, α)) = α.

For different values of n and α, these percentile points, called the upper 100α% points of the χ²(n) distribution, are given in almost all statistical books and tables. For example, for n = 10 you will find χ²(10, 0.95) = 3.94 and χ²(10, 0.90) = 4.865, and so on; note that an upper-tail probability of 0.95 corresponds to a lower-tail probability of 1 - α = 0.05.
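Instead of printed tables, these percentile points can be read off numerically. A sketch using scipy's inverse survival function (the two table values quoted above can be checked this way):

```python
from scipy import stats

# upper 100*alpha % point: P(W > chi2(n, alpha)) = alpha
print(stats.chi2.isf(0.95, df=10))  # ~3.940  = chi2(10, 0.95)
print(stats.chi2.isf(0.90, df=10))  # ~4.865  = chi2(10, 0.90)
print(stats.chi2.isf(0.05, df=10))  # ~18.307 = the usual upper 5% point
```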
So the tables of χ²(n, α) are given for different values of n and α. As I have mentioned, when n becomes very large the tables are not required, because the distribution of (U - n)/√(2n) can then be approximated by the normal distribution; generally the books tabulate up to n = 30, or sometimes up to 60 or so.

So we have shown the chi-square as a sampling distribution when sampling from a normal distribution. Let us look at some further sampling distributions; the next one we call Student's t distribution. Let X ~ N(0, 1) and Y ~ χ²(n), with X and Y independent. Then if I define

T = X / √(Y/n),

this is said to have Student's t distribution with n degrees of freedom, and we write T ~ t(n). The name Student is because of the statistician W. S. Gosset, who published it in 1908; he worked in a brewery and published under the pseudonym 'Student', and that is why it is called Student's t distribution.

One can easily derive the density function of T; the exact form is

f(t) = (1/(√n B(n/2, 1/2))) (1 + t²/n)^{-(n + 1)/2}, -∞ < t < ∞.

As you can see, this is a symmetric distribution about 0, and it has some important properties: the expectation is 0, and the variance is Var(T) = n/(n - 2), which you can see converges to 1 as n → ∞. If we look at the kurtosis, the excess kurtosis here is 6/(n - 4), which is positive. Actually, this distribution closely resembles the normal distribution, and in fact one can prove that as n → ∞ the pdf of the t distribution converges to φ(t), the standard normal density. Usually for n ≥ 30 the approximation is quite good, and that is why the tables of the t distribution are generally given only up to n = 30. Because of the symmetry, we consider the point t(n, α) with P(T > t(n, α)) = α; for different values of n and α, the values t(n, α), the upper 100α% points of the t distribution, are tabulated.

Now, to look at it more as a sampling distribution: if X₁, X₂, …, Xₙ follow N(μ, σ²), then X̄ ~ N(μ, σ²/n), and therefore √n(X̄ - μ)/σ ~ N(0, 1). At the same time, (n - 1)S²/σ² follows χ²(n - 1), and these two are independent, as I mentioned earlier. So if I consider

√n(X̄ - μ)/σ divided by √( ((n - 1)S²/σ²) / (n - 1) ),

then this follows the t distribution with n - 1 degrees of freedom; and if you simplify, you get √n(X̄ - μ)/S. So note the difference: √n(X̄ - μ)/σ is N(0, 1), while √n(X̄ - μ)/S is t with n - 1 degrees of freedom. So this is a sampling distribution.
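A sketch (my own choice of parameters) confirming both facts at once: with normal data, √n(X̄ - μ)/σ behaves like N(0, 1), while replacing σ by the sample S gives exactly t(n - 1).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
mu, sigma, n, reps = 5.0, 2.0, 10, 50_000

x = rng.normal(mu, sigma, size=(reps, n))
xbar = x.mean(axis=1)
s = x.std(axis=1, ddof=1)               # sample standard deviation S

z = np.sqrt(n) * (xbar - mu) / sigma    # exactly N(0, 1)
t = np.sqrt(n) * (xbar - mu) / s        # exactly t(n - 1)

print(stats.kstest(z, "norm").statistic)              # small
print(stats.kstest(t, "t", args=(n - 1,)).statistic)  # small
```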
Towards the end, let me define one more distribution, which is called the F distribution. Let W₁ and W₂ be independent χ²(m) and χ²(n) random variables, and define

F = (W₁/m) / (W₂/n).

Then F is said to follow the F distribution with (m, n) degrees of freedom: one degrees-of-freedom parameter for the numerator chi-square variable and one for the denominator chi-square variable. One can again write down the density function; using the usual distribution theory one derives

f(x) = ( (m/n)^{m/2} x^{m/2 - 1} ) / ( B(m/2, n/2) (1 + (m/n)x)^{(m + n)/2} ), x > 0.

You can easily see that this is a skewed distribution; the shape of course varies with the values of m and n, but the various forms are all skewed. Just as before, if the upper-tail probability is α, the corresponding point is called F(m, n, α), that is, P(F > F(m, n, α)) = α. Another thing to notice is that 1/F is also an F variable, with (n, m) degrees of freedom; therefore we have the relation

1/F(m, n, α) = F(n, m, 1 - α).

For different values of m, n, and α, the values F(m, n, α) have been tabulated; but since this is a three-dimensional table, you will find the percentile points of the F distribution only for selected values of α.

So I have given the important sampling distributions. Just to end, suppose I take a random sample of size m from N(μ₁, σ₁²) and define S₁² = (1/(m - 1)) Σ (Xᵢ - X̄)², and another random sample of size n from N(μ₂, σ₂²) with S₂² = (1/(n - 1)) Σ (Yⱼ - Ȳ)². Then

(S₁²/σ₁²) / (S₂²/σ₂²) ~ F(m - 1, n - 1).

So this is also a sampling distribution (a small numerical check follows at the end of this section). I have derived the normal, chi-square, t, and F distributions as sampling distributions when we are sampling from normal populations; but the normal distribution itself is a sampling distribution in a more general sense, because it is also the asymptotic sampling distribution of the sample mean from any population with finite variance, provided the random variables are i.i.d. And the conditions can be relaxed as well: the identically-distributed part can be relaxed, or the independence part can be relaxed. Therefore, in a more general sense, the normal distribution is a sampling distribution.

These sampling distributions are very useful when we do inference, that is, when we construct confidence intervals for means and variances, and when we consider the testing of hypotheses for means and variances, etc. In the next module of this course we will cover various aspects of this. You can look at the problem sets on this module of probability and distribution theory, which are available on the website; it will be very useful to work through those problems. With this I complete this section.
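As promised, here is a small sketch (my own choice of sample sizes and variances) checking the variance-ratio result and the reciprocal relation between F percentile points.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
m, n, reps = 6, 11, 50_000

# (S1^2/sigma1^2) / (S2^2/sigma2^2) from two independent normal samples
s1sq = rng.normal(0.0, 3.0, size=(reps, m)).var(axis=1, ddof=1) / 9.0
s2sq = rng.normal(0.0, 2.0, size=(reps, n)).var(axis=1, ddof=1) / 4.0
f = s1sq / s2sq                                   # should be F(m - 1, n - 1)

print(stats.kstest(f, "f", args=(m - 1, n - 1)).statistic)  # small

# reciprocal relation: 1/F(m, n, alpha) = F(n, m, 1 - alpha)
alpha = 0.05
print(1.0 / stats.f.isf(alpha, m - 1, n - 1),
      stats.f.isf(1.0 - alpha, n - 1, m - 1))     # the two values agree
```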