Now I will discuss the sample variance-covariance matrix $S$. If we derive its distribution, it is a matrix distribution called the Wishart distribution. You can view the Wishart distribution as a generalization of the chi-square distribution: in the univariate case the sample variance has a chi-square distribution. In fact, we wrote it in the form $\sum_i (x_i - \bar{x})^2/\sigma^2 \sim \chi^2_{n-1}$. Now we consider all the components of the sample dispersion matrix, that is, $\sum_i (x_{1i} - \bar{x}_1)^2$, $\sum_i (x_{1i} - \bar{x}_1)(x_{2i} - \bar{x}_2)$, and so on. What is the joint distribution of these?

So let us define the Wishart distribution. Let $u_1, u_2, \ldots, u_k$ be independent $N_p(\mu_j, \Sigma)$, $j = 1, \ldots, k$. Then $S = \sum_{j=1}^{k} u_j u_j'$ is said to follow a Wishart distribution with $k$ degrees of freedom, and we write $S \sim W_p(k, \Sigma, M)$; here $S$ is $p \times p$ and $M = (\mu_1, \ldots, \mu_k)'$ is the noncentrality matrix, of order $k \times p$. When $M$ is the null matrix, $S$ is said to have a central Wishart distribution. The density of $W_p(k, \Sigma, M)$ exists if $k \geq p$. For $p = 1$, $W_1(k, \sigma^2)$ is just $\sigma^2 \chi^2_k$; similarly, the noncentral Wishart reduces to a noncentral chi-square for $p = 1$.

The density function of the Wishart distribution is quite complicated, so before discussing it we first look at its properties, as we did for the multivariate normal distribution. So, some important properties of the Wishart distribution.

Property 1. If $S \sim W_p(k, \Sigma)$ (I am not writing $M$ here because both the central and noncentral cases are covered) and $\ell$ is a fixed vector in the $p$-dimensional space, then $\ell' S \ell \sim \sigma_\ell^2 \chi^2_k$, where I define $\sigma_\ell^2 = \ell' \Sigma \ell$. And if $S$ is central Wishart, then $\ell' S \ell$ is a central chi-square.

For the proof we make use of the noncentral chi-square. Write $S = \sum_{j=1}^{k} u_j u_j'$, where $u_1, \ldots, u_k$ are independent multivariate normals, because that was the setup we considered. Then $\ell' S \ell = \sum_j \ell' u_j u_j' \ell = \sum_{j=1}^{k} (\ell' u_j)^2$, and the $\ell' u_j$ are independently distributed $N_1(\ell' \mu_j, \sigma_\ell^2)$. So $\ell' S \ell \sim \sigma_\ell^2 \chi^2_k$. The noncentrality parameter comes out as we discussed in the previous lecture: if $x \sim N_p(\mu, I)$, then $x'x$ has a noncentral chi-square distribution with $p$ degrees of freedom and noncentrality parameter $\sum_i \mu_i^2/2$. The $\ell' u_j$ here are univariate normals; after dividing through by $\sigma_\ell^2$, the result follows, with noncentrality parameter $\sum_j (\ell'\mu_j)^2/(2\sigma_\ell^2)$. Now, if the original Wishart is central, then $M$ is the null matrix; writing $y = (\ell' u_1, \ldots, \ell' u_k)'$, this implies $E(y) = M\ell = 0$, which implies that the chi-square is central.
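To make Property 1 concrete, here is a minimal simulation sketch (my own illustration, not part of the lecture; all parameter values are arbitrary): it builds $S = \sum_j u_j u_j'$ from central multivariate normals and compares $\ell' S \ell / \sigma_\ell^2$ with the $\chi^2_k$ distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
p, k, reps = 3, 10, 5000
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
ell = np.array([1.0, -1.0, 2.0])
sigma_ell2 = ell @ Sigma @ ell           # sigma_l^2 = l' Sigma l

draws = np.empty(reps)
for i in range(reps):
    # k independent N_p(0, Sigma) rows; S = sum_j u_j u_j' = U'U
    U = rng.multivariate_normal(np.zeros(p), Sigma, size=k)
    S = U.T @ U                          # central Wishart W_p(k, Sigma)
    draws[i] = ell @ S @ ell / sigma_ell2

# l'Sl / sigma_l^2 should be chi-square with k degrees of freedom
print(stats.kstest(draws, stats.chi2(df=k).cdf))  # large p-value expected
```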
So we have shown a direct correspondence between the Wishart and the chi-square distribution. Just as for the multivariate normal, where every linear combination is univariate normal, here in place of a linear combination it is a quadratic form: $S$ is a positive definite matrix, I am considering $\ell' S \ell$, so this is a quadratic form, and the quadratic form has a chi-square distribution.

Let us look at the second property. Once again let $u_j \sim N_p(\mu_j, \Sigma)$, $j = 1, \ldots, n$, be independent, and let $U$ be the matrix whose rows are $u_1', \ldots, u_n'$. Then, for a symmetric matrix $A$, $U'AU$ has a Wishart distribution $W_p(r, \Sigma)$ if and only if $y'Ay \sim \sigma_\ell^2 \chi^2_r$ for every $\ell$, where $y = U\ell$. I will skip the proof because it involves a lot of terms, and I do not want to make this course extremely theoretical.

Let us move to further properties. $U'A_1U$ and $U'A_2U$ are independent Wisharts if and only if $\ell' U'A_1U \ell$ and $\ell' U'A_2U \ell$ are independent chi-squares for every $\ell$. Further, $U'b$ and $U'AU$ are independent $N_p$ and Wishart if and only if $y'b = \ell' U'b$ and $y'Ay = \ell' U'AU \ell$ are independent $N_1$ and chi-square for every $\ell$. This relation is similar in nature to the result that, in sampling from a univariate normal distribution, the sample mean and the sample variance are independently distributed.

So now let us talk about the joint distribution of the sample mean and the sample covariance matrix. Let $u_1, \ldots, u_n$ be a random sample from the $N_p(\mu, \Sigma)$ distribution. Then $\ell' u_1, \ldots, \ell' u_n$, for any $p$-dimensional vector $\ell$, is a random sample from $N_1(\ell'\mu, \ell'\Sigma\ell)$. The sample mean is $\frac{1}{n}\sum_i \ell' u_i = \ell'\bar{u}$, and the sample sum of squares is $\sum_{i=1}^{n} (\ell' u_i - \ell'\bar{u})^2 = \ell' \big[\sum_i (u_i - \bar{u})(u_i - \bar{u})'\big] \ell = \ell' S \ell$. From the distribution theory of the univariate normal, the sample mean and sample variance are independently distributed; further, $\ell'\bar{u} \sim N_1(\ell'\mu, \ell'\Sigma\ell/n)$ and $\ell' S \ell \sim (\ell'\Sigma\ell)\,\chi^2_{n-1}$. Since these characterizations are if-and-only-if statements, $\bar{u}$ is multivariate normal and $S$ is Wishart; by Property 3, $\bar{u}$ and $S$ are independently distributed, with $\bar{u} \sim N_p(\mu, \Sigma/n)$ and $S \sim W_p(n-1, \Sigma)$, a central Wishart distribution.

Now, like the additive property of the chi-square distribution, the Wishart also has an additive property. Let $S_1$ and $S_2$ be independent $W_p(k_1, \Sigma)$ and $W_p(k_2, \Sigma)$; then $S_1 + S_2 \sim W_p(k_1 + k_2, \Sigma)$. Once again we can prove this by considering $\ell' S_1 \ell$ and $\ell' S_2 \ell$: they are central chi-squares, chi-squares are additive, and so the degrees of freedom become $k_1 + k_2$.
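To see these quantities in code, here is a small sketch (mine; the parameter values are arbitrary): it computes $\bar{u}$ and $S = \sum_i (u_i - \bar{u})(u_i - \bar{u})'$ from a simulated $N_p(\mu, \Sigma)$ sample; by the result above these are independent, with $\bar{u} \sim N_p(\mu, \Sigma/n)$ and $S \sim W_p(n-1, \Sigma)$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, mu = 50, np.array([1.0, 2.0])
Sigma = np.array([[1.0, 0.4],
                  [0.4, 2.0]])

X = rng.multivariate_normal(mu, Sigma, size=n)   # rows u_1', ..., u_n'
u_bar = X.mean(axis=0)                           # ~ N_p(mu, Sigma/n)
centered = X - u_bar
S = centered.T @ centered                        # ~ W_p(n-1, Sigma), indep. of u_bar

print("u_bar:", u_bar)
print("S/(n-1) (unbiased estimate of Sigma):\n", S / (n - 1))
```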
In the case of the multivariate normal distribution we had considered linear combinations: if $x \sim N_p$ and $B$ is a $q \times p$ matrix, then $Bx$ has an $N_q$ distribution. A similar thing is true for the Wishart; we can call it linearity: if $S \sim W_p(k, \Sigma)$ and $B$ is a $q \times p$ matrix, then $BSB' \sim W_q(k, B\Sigma B')$. The result follows from the definition of the Wishart distribution.

Let us talk about the density part here. Let $S \sim W_p(k, \Sigma)$ and denote the density of $S$ by $w_p(S; k, \Sigma)$. Define $S^* = CSC'$, where $C$ is nonsingular. Then, by a transformation, the density of $S^*$ is given by $|C|^{-(p+1)}\, w_p(C^{-1} S^* (C')^{-1}; k, \Sigma)$.

Let $(s^{ij}) = S^{-1}$ and $(\sigma^{ij}) = \Sigma^{-1}$. Now, if $S$ has the Wishart distribution, then $\sigma^{pp}/s^{pp} \sim \chi^2_{k-p+1}$, and this is independent of $s_{ij}$, $i, j = 1, \ldots, p-1$. At the same time, $\ell'\Sigma^{-1}\ell \,/\, \ell' S^{-1}\ell \sim \chi^2_{k-p+1}$ for any $\ell \neq 0$.

In the case of the multivariate normal we had seen the conditional distributions; a similar thing is true for the Wishart. Many of these properties I am stating without proof because the proofs are quite involved; we should know the results. So, conditional distribution of components: suppose $S \sim W_p(k, \Sigma)$ and $S$ is partitioned as $\begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix}$, where the blocks correspond to $r$ and $s$ components, and $\Sigma$ is partitioned conformably. Then $S_{22} - S_{21} S_{11}^{-1} S_{12} \sim W_s\!\big(k - r,\; \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}\big)$.

One more representation, the decomposition of the Wishart determinant, is given by the following. If $S \sim W_p(k, \Sigma)$ with $|\Sigma| \neq 0$, then $|S|/|\Sigma|$ is distributed as a product of $p$ independent central chi-square variables with degrees of freedom $k-p+1, \ldots, k-1, k$. And if $S_i \sim W_p(k_i, \Sigma)$, $i = 1, 2$, with $S_1$ and $S_2$ independent and $k_1 \geq p$, then $\Lambda = |S_1|/|S_1 + S_2|$ is distributed as a product of $p$ independent beta variables with parameters $\big(\frac{k_1-p+1}{2}, \frac{k_2}{2}\big), \big(\frac{k_1-p+2}{2}, \frac{k_2}{2}\big), \ldots, \big(\frac{k_1}{2}, \frac{k_2}{2}\big)$. In case $k_2 = 1$, the product of beta variables is the same as a single beta with parameters $\big(\frac{k_1-p+1}{2}, \frac{p}{2}\big)$. This distribution is denoted by $\Lambda(p, k_1, k_2)$. These distributions are used in the study of correlation coefficients etc., to which I am not paying too much attention at this point.

Now we move to another distribution which is extremely useful. Here we have introduced the Wishart distribution as a generalization of the chi-square distribution and looked at some of its properties. In testing for the variance-covariance matrix of a multivariate normal distribution we can make use of $S$, and the tests or other inferences will be based on the Wishart distribution.
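As an illustration of the determinant decomposition (my own simulation sketch; $\Sigma$, $p$, $k$ are arbitrary choices), we can compare Monte Carlo draws of $|S|/|\Sigma|$ with products of independent chi-squares with degrees of freedom $k-p+1, \ldots, k$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
p, k, reps = 3, 12, 4000
A = rng.standard_normal((p, p))
Sigma = A @ A.T + p * np.eye(p)          # arbitrary positive definite Sigma
det_Sigma = np.linalg.det(Sigma)

# left side: |S|/|Sigma| with S = U'U, rows of U iid N_p(0, Sigma)
lhs = np.empty(reps)
for i in range(reps):
    U = rng.multivariate_normal(np.zeros(p), Sigma, size=k)
    lhs[i] = np.linalg.det(U.T @ U) / det_Sigma

# right side: product of independent chi2 with df k-p+1, ..., k
dfs = np.arange(k - p + 1, k + 1)
rhs = np.prod([stats.chi2.rvs(df, size=reps, random_state=rng) for df in dfs],
              axis=0)

# the two samples should come from the same distribution
print(stats.ks_2samp(lhs, rhs))          # large p-value expected
```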
Let us also recall the concept of the t distribution in the univariate case. The t distribution came in when we considered inference on the mean with the variance unknown: we divide by the estimate $s$ of $\sigma$, and the resulting statistic has a t distribution. A similar concept exists when we consider inference on the mean vector of a multivariate normal distribution with the variance-covariance matrix unknown. So, as a generalization of the t distribution, we consider Hotelling's $T^2$ distribution.

Let $S \sim W_p(k, \Sigma)$ and $d \sim N_p(\delta, c^{-1}\Sigma)$, and suppose $S$ and $d$ are independent. Then Hotelling's generalized $T^2$ statistic is defined as $T^2 = c\,k\, d' S^{-1} d$. We can write this as

$T^2 = c\,k\, \frac{d' S^{-1} d}{d' \Sigma^{-1} d}\; d' \Sigma^{-1} d.$

Now look at the ratio term. By the property we derived a little earlier, namely $\ell'\Sigma^{-1}\ell \,/\, \ell' S^{-1}\ell \sim \chi^2_{k-p+1}$, the reverse of the ratio written above, that is $d'\Sigma^{-1}d \,/\, d'S^{-1}d$, has a $\chi^2_{k-p+1}$ distribution for given $d$, and since this conditional distribution does not involve $d$, it is also the unconditional distribution of $d'\Sigma^{-1}d \,/\, d'S^{-1}d$, independent of $d$. Now $d$ is multivariate normal, so $c\, d'\Sigma^{-1} d$ has a noncentral chi-square distribution $\chi^2_p$ with noncentrality governed by $c\tau^2$, where $\tau^2 = \delta'\Sigma^{-1}\delta$. Hence

$\frac{T^2}{k} = \frac{\chi^2_p(c\tau^2)}{\chi^2_{k-p+1}},$

a ratio of independent chi-squares. So this is exactly the situation of the noncentral F distribution which I introduced in the previous class: a central chi-square in the denominator and a noncentral chi-square in the numerator give a noncentral F. Basically, we write

$\frac{k-p+1}{p}\,\frac{T^2}{k} \sim F_{p,\,k-p+1}\ (\text{noncentral}),$

and if $\delta = 0$ we have a central F.

Let us consider an alternative representation. With $T^2 = c\,k\, d'S^{-1}d$, we can write

$\Big(1 + \frac{T^2}{k}\Big)^{-1} = \frac{1}{1 + c\, d'S^{-1}d} = \frac{|S|}{|S + c\, dd'|}.$

To prove this, consider the partitioned matrix $\begin{pmatrix} S & -c\,d \\ d' & 1 \end{pmatrix}$, where $S$ is $p \times p$, $-c\,d$ is $p \times 1$, and $d'$ is of course $1 \times p$. Evaluating its determinant by the two Schur complements gives $|S + c\, dd'| = |S|\,(1 + c\, d'S^{-1}d)$. Now $c\, dd' \sim W_p(1, \Sigma)$ when $\delta = 0$. So Hotelling's $T^2$, after a monotone transformation, is a special case of $\Lambda = |S_1|/|S_1 + S_2|$ with $k_2 = 1$: as we have already proved, $\big(1 + T^2/k\big)^{-1}$ then has a beta distribution with parameters $\frac{k-p+1}{2}$ and $\frac{p}{2}$.
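To preview how this is used (a sketch of my own, not the lecture's example; the data are simulated and all names are arbitrary): for a one-sample test of $H_0: \mu = \mu_0$ we take $d = \bar{u} - \mu_0$, $c = n$, $k = n-1$, and convert $T^2$ to an F statistic as above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, p = 40, 3
mu_true = np.array([0.5, 1.0, -0.2])
Sigma = np.array([[1.0, 0.3, 0.1],
                  [0.3, 1.0, 0.2],
                  [0.1, 0.2, 1.0]])
X = rng.multivariate_normal(mu_true, Sigma, size=n)

mu0 = np.zeros(p)                        # H0: mu = mu0
d = X.mean(axis=0) - mu0                 # d ~ N_p(delta, Sigma/n), so c = n
S_hat = np.cov(X, rowvar=False)          # S/(n-1), with k = n-1 df
T2 = n * d @ np.linalg.solve(S_hat, d)   # Hotelling's T^2

# ((k-p+1)/p) * T^2/k ~ F_{p, k-p+1} = F_{p, n-p} under H0
F = (n - p) / (p * (n - 1)) * T2
pval = stats.f.sf(F, p, n - p)
print(f"T^2 = {T2:.3f}, F = {F:.3f}, p-value = {pval:.4f}")
```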
So we have actually found the distribution which generalizes Student's t. I will not give the derivation of the density of the Wishart distribution; we simply give the expression. We have the following result: if the $p \times k$ matrix $U'$ has a density of the form $f(U'U)$, then the density of $S = U'U$ is proportional to $f(S)\,|S|^{(k-p-1)/2}$. Writing only the final expression, the density of $S \sim W_p(k, \Sigma)$ is

$w_p(S; k, \Sigma) = \frac{1}{c(p,k)}\,|\Sigma|^{-k/2}\,|S|^{(k-p-1)/2}\, e^{-\frac{1}{2}\operatorname{tr}(\Sigma^{-1}S)},$

where $c(p,k)$ is some normalizing constant. Many times we consider the generalized variance, that is $|S|$; the distribution of $|S|$ can also be obtained in terms of the determinant decomposition above.

We also define the sample correlation coefficient etc.; let me express these in terms of the Wishart. Consider the two-dimensional case, where $S \sim W_2(k, \Sigma)$. Then $r = s_{12}/\sqrt{s_{11}s_{22}}$ is the sample correlation coefficient; it is actually the maximum likelihood estimator of $\rho = \sigma_{12}/\sqrt{\sigma_{11}\sigma_{22}}$. The distribution of $r$, or rather of a function of it, can be determined; the density of $r^2$ is given by

$f_{r^2}(u) = \frac{(1-\rho^2)^{k/2}}{\Gamma(k/2)\,\Gamma\!\big(\frac{k-1}{2}\big)}\,(1-u)^{(k-3)/2} \sum_{l=0}^{\infty} \frac{\rho^{2l}\,\Gamma(k/2+l)^2}{l!\,\Gamma(l+\tfrac{1}{2})}\, u^{l-1/2}, \qquad 0 < u < 1,$

and the density of $r$ can be obtained from here. We also have the asymptotic distribution of $r$, which can be used for inference purposes:

$\frac{\sqrt{k}\,(r - \rho)}{1 - \rho^2} \;\to\; Z \sim N(0, 1) \quad \text{as } k \to \infty,$

that is, as $n \to \infty$; here $k = n - 1$. We also define Fisher's $z = \frac{1}{2}\log\frac{1+r}{1-r}$, and if we consider $\xi = \frac{1}{2}\log\frac{1+\rho}{1-\rho}$, then $\sqrt{n}\,(z - \xi)$ also converges to $N(0,1)$ as $n \to \infty$. So for testing $H_0: \rho = \rho_0$ against $H_1: \rho \neq \rho_0$ we can reject when $\sqrt{n}\,|z - \xi_0| > z_{\alpha/2}$; sometimes $\sqrt{n-3}$ is found to be a better approximation. (A code sketch of this test appears just after this passage.)

Next we define the multiple correlation coefficient, which is used in multivariate analysis. In the case of two variables we discussed the Karl Pearson coefficient of correlation; similarly, in the case of several variables we define the multiple correlation coefficient. So let me define that here. I have avoided deriving the distributions of the various quantities in this multivariate portion; those who are interested can look at the book An Introduction to Multivariate Statistical Analysis by T. W. Anderson, the chapter on multivariate analysis in the book Linear Statistical Inference and Its Applications by C. R. Rao, and some other books also, for example M. S. Srivastava's book on multivariate analysis; they cover this distribution theory, which I am not considering here to save time.

So let $X$ be a random variable and let $Y$ be a random vector. Then the multiple correlation coefficient between $X$ and $Y$ is defined to be the maximum of the correlation coefficients between $X$ and all linear combinations $a'Y$ of $Y$.
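Before moving on, here is a minimal sketch of the Fisher z test just described (my own construction; the simulated data, sample size, and $\rho_0$ are arbitrary choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, rho_true = 100, 0.6
Sigma = np.array([[1.0, rho_true],
                  [rho_true, 1.0]])
X = rng.multivariate_normal([0.0, 0.0], Sigma, size=n)

r = np.corrcoef(X, rowvar=False)[0, 1]   # sample correlation coefficient

rho0 = 0.5                               # H0: rho = rho0
z = 0.5 * np.log((1 + r) / (1 - r))      # Fisher's z
xi0 = 0.5 * np.log((1 + rho0) / (1 - rho0))
stat = np.sqrt(n - 3) * (z - xi0)        # sqrt(n-3) variant of the test
pval = 2 * stats.norm.sf(abs(stat))
print(f"r = {r:.3f}, statistic = {stat:.3f}, p-value = {pval:.4f}")
```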
If $X$ and $Y$ are both one-dimensional, then the multiple correlation coefficient between $X$ and $Y$ is $\max(\rho, -\rho) = |\rho_{XY}|$. Now consider $X = (Y, Z')'$, where $Y$ is one-dimensional and $Z$ is $(p-1)$-dimensional, so $X$ is a $p \times 1$ vector, and we want to define the multiple correlation coefficient here. We partition the dispersion matrix as $\begin{pmatrix} \sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}$, where $\sigma_{11}$ is a one-dimensional scalar and $\Sigma_{22}$ is $(p-1) \times (p-1)$. Consider the maximum of the squared correlation coefficient between $Y$ and $a'Z$, where $a$ is a $(p-1)$-dimensional vector:

$\max_a \operatorname{corr}^2(Y, a'Z) = \max_a \frac{(a'\Sigma_{21})^2}{\sigma_{11}\; a'\Sigma_{22} a}.$

We can substitute $b = \Sigma_{22}^{1/2} a$; then this becomes $\max_b \dfrac{\big(b'\Sigma_{22}^{-1/2}\Sigma_{21}\big)^2}{\sigma_{11}\; b'b}$. Now we can apply the Cauchy-Schwarz inequality: the quantity is at most $\dfrac{b'b \;\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}}{\sigma_{11}\; b'b}$, and the $b'b$ terms cancel, so the bound becomes free of $b$. This upper bound is actually attained when $a = \Sigma_{22}^{-1}\Sigma_{21}$. So we get

$\rho^2\big(Y, \Sigma_{12}\Sigma_{22}^{-1}Z\big) = \frac{\big(\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\big)^2}{\sigma_{11}\;\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{22}\Sigma_{22}^{-1}\Sigma_{21}};$

$\Sigma_{22}^{-1}\Sigma_{22}$ is the identity, one factor cancels, and we get simply $\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}/\sigma_{11}$. This we call $\rho_m^2$, so $\rho_m = \big(\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}/\sigma_{11}\big)^{1/2}$.

Now, a maximum likelihood estimator of $\rho_m^2$ is simply $R^2$, calculated from the sample analogue of this. Later on we will show that in multiple regression analysis this $R^2$ is used as the coefficient of determination, an important indicator of the goodness of the fitted regression model. The distribution of $R^2$ can also be obtained by the distribution theory discussed earlier; I will give only the final results. Actually, we can see that

$\frac{R^2}{1 - R^2} = \frac{S_{12} S_{22}^{-1} S_{21}}{s_{11} - S_{12} S_{22}^{-1} S_{21}}.$

If $S \sim W_p(k, \Sigma)$, this can be written as $Z/\chi^2_{k-p+1}$, where the conditional distribution of $Z$ given $\chi^2_k$ is $\chi^2_{p-1}$ with noncentrality parameter $\dfrac{\rho_m^2}{2(1-\rho_m^2)}\chi^2_k$. So if $\rho_m = 0$, then $\Sigma_{12} = 0$, and this implies that

$\frac{R^2}{1 - R^2}\cdot\frac{k-p+1}{p-1} \sim F_{p-1,\; k-p+1}.$

So the distribution of the multiple correlation coefficient, after a transformation, is shown to be F when $\rho_m = 0$; this is used for testing hypotheses regarding the multiple correlation coefficient.
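A small sketch (mine; simulated data, arbitrary dimensions) of computing $R^2$ from the partitioned Wishart matrix and carrying out the F test of $H_0: \rho_m = 0$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, p = 60, 4                              # first coordinate is Y, the rest is Z
A = rng.standard_normal((p, p))
Sigma = A @ A.T + p * np.eye(p)
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

S = (n - 1) * np.cov(X, rowvar=False)     # Wishart matrix, k = n - 1
s11, S12, S22 = S[0, 0], S[0, 1:], S[1:, 1:]

R2 = S12 @ np.linalg.solve(S22, S12) / s11          # sample multiple corr.^2
k = n - 1
F = (R2 / (1 - R2)) * (k - p + 1) / (p - 1)         # F_{p-1, k-p+1} if rho_m = 0
pval = stats.f.sf(F, p - 1, k - p + 1)
print(f"R^2 = {R2:.3f}, F = {F:.3f}, p-value = {pval:.4f}")
```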
Likewise, we can also talk about partial correlation coefficients. Suppose $x$ is a $p \times 1$ vector with $E(x) = 0$ and dispersion matrix $\Sigma$. Then $E(x_1 \mid x_2, \ldots, x_p)$ is called the regression of $x_1$ on $x_2, \ldots, x_p$. Here $x_1$ is known as the dependent variable (we will discuss this in detail when we do regression, but right now let me just introduce it for the purpose of definition) and $x_2, \ldots, x_p$ are called the independent variables; the regression is used to predict $x_1$ from $x_2, \ldots, x_p$. If we consider the correlation between $x_1$ and $x_2$ keeping $x_3, \ldots, x_p$ fixed, it is called a partial correlation coefficient. In terms of the elements $(\sigma^{ij}) = \Sigma^{-1}$, we have, for example,

$\rho_{12 \cdot 3 \cdots p} = \frac{-\sigma^{12}}{\sqrt{\sigma^{11}\,\sigma^{22}}},$

and one can obtain the sample partial correlation coefficient from here by considering $-s^{12}/\sqrt{s^{11} s^{22}}$, where $(s^{ij}) = S^{-1}$.

I will conclude today's lecture by giving some exercises on the calculation of these coefficients and also on testing; a code sketch for getting started follows at the end.

Exercise 1. $S/n$ is given by

$\frac{S}{n} = \begin{pmatrix} 95.29 & 52.86 & 69.66 & 46.11 \\ 52.86 & 54.36 & 51.31 & 30.05 \\ 69.66 & 51.31 & 100.81 & 56.54 \\ 46.11 & 30.05 & 56.54 & 45.02 \end{pmatrix}.$

Find $R^2$; letting $R^2 = \xi$, test the hypothesis $H_0: \rho_m^2 = [\xi]$ (the integer part of $\xi$) against $H_1: \rho_m^2 \neq [\xi]$. Also find the partial correlation coefficients.

Exercise 2. Consider data on the performance of students on two tests. The performance measures $(x_1, x_2)$ for the ten students are

$(1.8, 0.8),\ (2.7, -1.5),\ (0.0, -0.3),\ (4.2, -1.3),\ (5.2, 0.0),\ (4.2, 3.2),\ (5.3, 3.9),\ (1.5, 0.7),\ (4.7, 0.1),\ (3.3, 2.2).$

Find the MLEs of $\mu$ and $\Sigma$ assuming $(x_1, x_2)' \sim N_2(\mu, \Sigma)$. Find $\rho$ and test $H_0: \rho = 0.8$ against $H_1: \rho \neq 0.8$ using the asymptotic test.

In the next lecture, I will introduce the use of this Hotelling's $T^2$ etc. for testing for the mean of a multivariate normal distribution and for comparing the means of two multivariate normal distributions. We will also consider the problems of classification of observations. These things I will be covering in the next lecture.
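Finally, for the two exercises above, here is a hedged starting sketch (my own; it only wires together the formulas from this lecture, and the matrix entries and data pairs are the values as reconstructed above, so they should be checked against the original source):

```python
import numpy as np
from scipy import stats

# Exercise 1: S/n as reconstructed above (4 x 4, symmetric)
S_over_n = np.array([[95.29, 52.86,  69.66, 46.11],
                     [52.86, 54.36,  51.31, 30.05],
                     [69.66, 51.31, 100.81, 56.54],
                     [46.11, 30.05,  56.54, 45.02]])
s11, S12, S22 = S_over_n[0, 0], S_over_n[0, 1:], S_over_n[1:, 1:]
R2 = S12 @ np.linalg.solve(S22, S12) / s11   # multiple corr.^2 of x1 on the rest
print(f"R^2 = {R2:.4f}")                     # the F test then needs k = n - 1

# partial correlations r_{ij.rest} = -s^{ij} / sqrt(s^{ii} s^{jj}) via the inverse
P = np.linalg.inv(S_over_n)
partial = -P / np.sqrt(np.outer(np.diag(P), np.diag(P)))
print("r_{12.34} =", round(partial[0, 1], 4))

# Exercise 2: reconstructed (x1, x2) pairs; Fisher z test of H0: rho = 0.8
X = np.array([[1.8, 0.8], [2.7, -1.5], [0.0, -0.3], [4.2, -1.3], [5.2, 0.0],
              [4.2, 3.2], [5.3, 3.9], [1.5, 0.7], [4.7, 0.1], [3.3, 2.2]])
n = len(X)
r = np.corrcoef(X, rowvar=False)[0, 1]
z = 0.5 * np.log((1 + r) / (1 - r))
xi0 = 0.5 * np.log((1 + 0.8) / (1 - 0.8))
stat = np.sqrt(n) * (z - xi0)
print(f"r = {r:.3f}, sqrt(n)(z - xi0) = {stat:.3f}, "
      f"p-value = {2 * stats.norm.sf(abs(stat)):.4f}")
```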