In the last lectures, I introduced the multivariate version of the chi-square distribution, which we called the Wishart distribution, and the multivariate version of Student's t distribution, which we called Hotelling's T-square distribution. We also saw some other distributions such as the non-central chi-square, non-central t and non-central F, and I showed a couple of applications where they arise. Basically, these distributions are used when we consider testing in a multivariate normal population.

If you remember the earlier lectures on testing of hypotheses, we introduced tests for the parameters of a normal population. For example, for testing the mean we considered x1, x2, ..., xn, a random sample from N(μ, σ²), and tested for μ; we also considered testing for the variance. We also considered two-sample problems: with x1, x2, ..., xm a random sample from N(μ1, σ1²) and y1, y2, ..., yn a random sample from another normal population N(μ2, σ2²), we tested equality of means, equality of means and variances, and so on. So we considered various testing situations: testing for μ when σ² is known, testing for μ when σ² is unknown, and testing for σ² when μ is known or unknown. We also found confidence intervals for the parameters in these situations. In the two-sample problems we considered testing μ1 ≤ μ2, μ1 > μ2 and so on, and similarly σ1² = σ2², σ1² < σ2² and so on. We have seen that these tests are based on the normal, chi-square, t and F distributions, and the confidence intervals in all these situations also depended on these distributions. So now we consider the multivariate analogue: testing and confidence intervals for the parameters of a multivariate normal population.

Let me introduce the one-sample problem first. Let x1, x2, ..., xn be a random sample from an N_p(μ, Σ) population, and suppose first that Σ is known. Consider testing for μ: we want to test whether the mean vector μ is equal to a known vector μ0, that is H0: μ = μ0 against H1: μ ≠ μ0. Consider the structure of the sufficient statistic x̄ here: x̄ ~ N_p(μ, (1/n)Σ). Based on this, √n(x̄ − μ0) ~ N_p(0, Σ) when μ = μ0, and then √n Σ^{−1/2}(x̄ − μ0) ~ N_p(0, I). We are assuming here that Σ is known and positive definite, so that its inverse is defined, and we have already discussed in detail how to define the matrix Σ^{−1/2}: we take the spectral decomposition Σ = P D P′, where D is a diagonal matrix, and then Σ^{1/2} = P D^{1/2} P′ and Σ^{−1/2} = P D^{−1/2} P′. All of these can be determined for a positive definite matrix.
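To make this construction concrete, here is a minimal numpy sketch of Σ^{−1/2} via the spectral decomposition just described; the function name and the matrix values are mine, for illustration only.

```python
import numpy as np

def inv_sqrt_pd(sigma):
    # spectral decomposition Sigma = P D P' (eigh is for symmetric matrices)
    d, P = np.linalg.eigh(sigma)
    # Sigma^{-1/2} = P D^{-1/2} P'
    return P @ np.diag(d ** -0.5) @ P.T

# illustrative positive definite matrix (made-up values)
sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
root = inv_sqrt_pd(sigma)
print(np.allclose(root @ sigma @ root, np.eye(2)))  # True: Sigma^{-1/2} Sigma Sigma^{-1/2} = I
```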
Now, the components of y = √n Σ^{−1/2}(x̄ − μ0) are independent standard normal random variables. Writing y′ = (y1, y2, ..., yp) as a row vector (there are p components because we are dealing with the p-dimensional normal distribution), y′y = Σ_{i=1}^{p} yᵢ² follows a chi-square distribution on p degrees of freedom. So let us write out y′y:

y′y = (√n Σ^{−1/2}(x̄ − μ0))′ (√n Σ^{−1/2}(x̄ − μ0)) = n (x̄ − μ0)′ Σ^{−1} (x̄ − μ0) ~ χ²_p.

Note that we obtained this distribution under the assumption μ = μ0. So the test for H0: μ = μ0 against H1: μ ≠ μ0 is: reject H0 if this value, call it W0, satisfies W0 > χ²_{p,α} at significance level α. If instead μ ≠ μ0, then the distribution of W0 is non-central chi-square with non-centrality parameter n(μ − μ0)′Σ^{−1}(μ − μ0).

Based on this we can also construct a 100(1 − α)% confidence region for μ. If we consider W = n(x̄ − μ)′Σ^{−1}(x̄ − μ), then W ~ χ²_p, so P(W ≤ χ²_{p,α}) = 1 − α, because χ²_{p,α} is the point to the right of which the probability is α, and so the probability of the remaining portion is 1 − α. Now consider the set of those μ's for which this inequality is satisfied:

P( μ ∈ R^p : n(x̄ − μ)′Σ^{−1}(x̄ − μ) ≤ χ²_{p,α} ) = 1 − α.

This set is a p-dimensional ellipsoidal region in R^p, and it is called the 100(1 − α)% confidence region; basically it is the interior and the boundary of an ellipsoid.
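A small sketch of this test in Python, assuming the rows of x are the sample vectors x1, ..., xn and using scipy for the chi-square percentage point; the function name is mine.

```python
import numpy as np
from scipy.stats import chi2

def one_sample_test_known_sigma(x, mu0, sigma, alpha=0.05):
    n, p = x.shape
    d = x.mean(axis=0) - mu0                    # x-bar - mu_0
    w0 = n * d @ np.linalg.solve(sigma, d)      # n (x-bar - mu0)' Sigma^{-1} (x-bar - mu0)
    crit = chi2.ppf(1 - alpha, df=p)            # upper-alpha point chi2_{p,alpha}
    return w0, crit, w0 > crit                  # reject H0 when the last entry is True
```

The same W0, swept over μ instead of μ0, traces out the confidence ellipsoid described above.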
For example, in two dimensions we get an ellipse in the (μ1, μ2)-plane centred at the point (x̄1, x̄2). Let us take the special case p = 2 with Σ diagonal, say Σ = diag(σ1², σ2²). Then the region becomes

n (x̄1 − μ1, x̄2 − μ2) diag(1/σ1², 1/σ2²) (x̄1 − μ1, x̄2 − μ2)′ ≤ χ²_{2,α},

and this quantity is easily calculated: multiplying out,

n(x̄1 − μ1)²/σ1² + n(x̄2 − μ2)²/σ2² ≤ χ²_{2,α}.

It is easy to see what the ellipse is. The centre is (x̄1, x̄2), and dividing through by χ²_{2,α}/n,

(x̄1 − μ1)²/(σ1² χ²_{2,α}/n) + (x̄2 − μ2)²/(σ2² χ²_{2,α}/n) ≤ 1.

So this is the interior of an ellipse with centre (x̄1, x̄2) and semi-axes a and b, where a² = σ1² χ²_{2,α}/n and b² = σ2² χ²_{2,α}/n; if σ1 ≥ σ2 the major axis has length 2a and the minor axis has length 2b. You can easily plot the region and see how the ellipse looks.

So we are able to solve this one-sample problem when the variance-covariance matrix is assumed known, making use of the central chi-square distribution. When the variance-covariance matrix is known we can also write down the confidence region and the test for equality of means in the two-sample problem. So let x1, x2, ..., xm be a random sample from an N_p(μ1, Σ) distribution, and let y1, y2, ..., yn be another, independent random sample from an N_p(μ2, Σ) population; here again I am assuming Σ is known and positive definite. Then x̄ ~ N_p(μ1, (1/m)Σ) and ȳ ~ N_p(μ2, (1/n)Σ). Let us work out the distribution theory: x̄ − ȳ ~ N_p(μ1 − μ2, (1/m + 1/n)Σ). Put ν = μ1 − μ2 and u = x̄ − ȳ; since 1/m + 1/n = (m + n)/(mn), we can write

(mn/(m + n)) (u − ν)′ Σ^{−1} (u − ν) ~ χ²_p.

Therefore this can be used for drawing inference on μ1 − μ2. For example, suppose we want to test H0: μ1 = μ2 against H1: μ1 ≠ μ2. Under H0,

(mn/(m + n)) u′ Σ^{−1} u ~ χ²_p,

so the test is: reject H0 if (mn/(m + n)) u′Σ^{−1}u ≥ χ²_{p,α}. We can also construct a 100(1 − α)% confidence region, which again will be an ellipsoid, for ν = μ1 − μ2:

P( ν ∈ R^p : (mn/(m + n)) (u − ν)′ Σ^{−1} (u − ν) ≤ χ²_{p,α} ) = 1 − α,

and this region in the p-dimensional Euclidean space is the 100(1 − α)% confidence region for μ1 − μ2.
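The two-sample test with known Σ can be sketched the same way; again the function name is mine, and numpy/scipy are assumed.

```python
import numpy as np
from scipy.stats import chi2

def two_sample_test_known_sigma(x, y, sigma, alpha=0.05):
    m, p = x.shape
    n = y.shape[0]
    u = x.mean(axis=0) - y.mean(axis=0)                   # u = x-bar - y-bar
    w = (m * n / (m + n)) * u @ np.linalg.solve(sigma, u) # (mn/(m+n)) u' Sigma^{-1} u
    return w, w >= chi2.ppf(1 - alpha, df=p)              # reject H0: mu1 = mu2 when True
```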
One can also draw inferences on linear functions of μ1 and μ2, including linear combinations of their components. For example, for ξ = c1μ1 + c2μ2 we consider c1x̄ + c2ȳ, which has the N_p(c1μ1 + c2μ2, (c1²/m + c2²/n)Σ) distribution. Based on this we can again construct a test for ξ = ξ0 and a confidence interval or confidence region for ξ, again in terms of the central χ²_p distribution.

Actually, the initial idea behind using a term like (x̄ − μ)′Σ^{−1}(x̄ − μ) is hidden in the Mahalanobis D² statistic: Mahalanobis suggested using D² = (μ1 − μ2)′Σ^{−1}(μ1 − μ2) as a measure of what we call divergence, or basically distance, between two populations.

Let us also consider the general situation, via the likelihood ratio test. Again x1, x2, ..., xn is a random sample from N_p(μ, Σ); as usual we assume n > p, and we test H0: μ = μ0 against H1: μ ≠ μ0, now with Σ unknown. The likelihood ratio criterion involves the likelihood function

L(μ, Σ) = (2π)^{−pn/2} |Σ|^{−n/2} exp{ −(1/2) Σ_{i=1}^{n} (xᵢ − μ)′ Σ^{−1} (xᵢ − μ) }.

Under H0 we have μ = μ0, and L is maximized at Σ̂0 = (1/n) Σ_{i=1}^{n} (xᵢ − μ0)(xᵢ − μ0)′. So the maximum of the likelihood function under H0 is

L̂0 = (2π)^{−pn/2} |Σ̂0|^{−n/2} exp{ −(1/2) Σ_{i=1}^{n} (xᵢ − μ0)′ Σ̂0^{−1} (xᵢ − μ0) }.

Now look at the term in the exponent. It is a scalar, so we can write it as a trace: Σᵢ tr( Σ̂0^{−1} (xᵢ − μ0)(xᵢ − μ0)′ ). Taking the sum inside, this becomes tr( Σ̂0^{−1} Σᵢ (xᵢ − μ0)(xᵢ − μ0)′ ), but the sum here is nothing but nΣ̂0, so the trace is tr( Σ̂0^{−1} Σ̂0 ) · n = np. Hence

L̂0 = (2π)^{−pn/2} |Σ̂0|^{−n/2} e^{−np/2}.

Under the full parameter space, where μ ∈ R^p and Σ is any p × p positive definite matrix, maximizing L gives μ̂ = x̄ and Σ̂ = (1/n) Σ_{i=1}^{n} (xᵢ − x̄)(xᵢ − x̄)′, and by the same trace calculation

L̂ = (2π)^{−pn/2} |Σ̂|^{−n/2} e^{−np/2},

so the exponential part is the same in both.
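A minimal numpy sketch of these two maximizations (lr_stat is my name for it); the determinant ratio it returns is exactly the quantity λ^{2/n} that appears in the likelihood ratio derived next.

```python
import numpy as np

def lr_stat(x, mu0):
    """Returns lambda^{2/n} = |Sigma_hat| / |Sigma0_hat|."""
    n = x.shape[0]
    c0 = x - mu0                      # rows x_i - mu_0
    c1 = x - x.mean(axis=0)           # rows x_i - x-bar
    sigma0_hat = c0.T @ c0 / n        # MLE of Sigma under H0 (mu fixed at mu_0)
    sigma_hat = c1.T @ c1 / n         # unrestricted MLE of Sigma
    return np.linalg.det(sigma_hat) / np.linalg.det(sigma0_hat)
```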
So the likelihood ratio that we consider is λ = L̂0 / L̂. If you look at L̂0 and L̂, two terms are common and cancel out; you are left with only |Σ̂0| and |Σ̂|, each raised to the power −n/2. So we get

λ = ( |Σ̂| / |Σ̂0| )^{n/2},   that is,   λ^{2/n} = |Σ̂| / |Σ̂0|.

Now write A = Σ_{i=1}^{n} (xᵢ − x̄)(xᵢ − x̄)′ = nΣ̂; then nΣ̂0 = A + n(x̄ − μ0)(x̄ − μ0)′, so

λ^{2/n} = |A| / |A + n(x̄ − μ0)(x̄ − μ0)′| = 1 / ( 1 + n(x̄ − μ0)′ A^{−1} (x̄ − μ0) ) = 1 / ( 1 + T²/(n − 1) ),

where T² = n(x̄ − μ0)′ S^{−1} (x̄ − μ0) with S = A/(n − 1). This 1/(1 + T²/k) form, with k = n − 1, is the term I introduced in the last class, coming from Hotelling's T² distribution. So the likelihood ratio test, reject H0 when λ ≤ λ0, that is when 1/(1 + T²/(n − 1)) ≤ c0, is equivalent to: reject H0 when T² ≥ T0². If we take the significance level to be α, we choose T0² = ((n − 1)p/(n − p)) F_{p, n−p, α}; this value is called the percentage point of the Hotelling T² distribution.

Here one computational problem arises: given the data, you need to evaluate T² = n(x̄ − μ0)′S^{−1}(x̄ − μ0), which involves the inverse of S; this may be a quite complicated exercise, for example when p = 4 or p = 5. But one can do it by numerical techniques, treating it as the solution of a system of simultaneous linear equations. Let me present the method: to compute T² we need not find S^{−1} directly; instead we take b as the solution vector of the system of linear equations A b = x̄ − μ0, and then T² = n(n − 1)(x̄ − μ0)′b (equivalently, solve S b = x̄ − μ0 and take T² = n(x̄ − μ0)′b). One can use a numerical technique like Gaussian elimination with back substitution to solve the system.

There is another interpretation of T². Let me first state a lemma, which is from Anderson: let x be a p × 1 vector and B a non-singular matrix of order p × p; then x′B^{−1}x is the only non-zero root of the equation |x x′ − λB| = 0. Using this, T²/(n − 1) is the non-zero root of the equation

|n(x̄ − μ0)(x̄ − μ0)′ − λ(n − 1)S| = 0.
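Returning to the computational remark above: a minimal numpy sketch of T² computed by solving the linear system rather than forming an explicit inverse (np.linalg.solve performs Gaussian elimination, an LU factorization, internally; the function name is mine).

```python
import numpy as np

def hotelling_t2(x, mu0):
    n = x.shape[0]
    d = x.mean(axis=0) - mu0                 # x-bar - mu_0
    c = x - x.mean(axis=0)
    A = c.T @ c                              # A = sum (x_i - x-bar)(x_i - x-bar)'
    b = np.linalg.solve(A, d)                # solve A b = x-bar - mu_0
    return n * (n - 1) * d @ b               # T^2 = n(n-1) (x-bar - mu0)' b
```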
Similarly, we have the 100(1 − α)% confidence ellipsoid for μ: the set of all μ's in R^p satisfying

n(x̄ − μ)′ S^{−1} (x̄ − μ) ≤ T²_{p, n−1, α}.

As we interpreted earlier, this is an ellipsoid in the higher-dimensional space.

One can also, as I mentioned a little earlier, consider linear combinations; we considered c1μ1 + c2μ2 and so on. In fact we can consider more than one at a time, that is, simultaneous confidence intervals for all linear combinations of a mean vector. Let me briefly mention that here. We first have the following result: for a positive definite matrix S,

(γ′y)² ≤ (γ′Sγ)(y′S^{−1}y).

For the proof, let b = γ′y/(γ′Sγ), which is a scalar, and consider (y − bSγ)′ S^{−1} (y − bSγ) ≥ 0. Expanding this, we get y′S^{−1}y − (γ′y)²/(γ′Sγ) ≥ 0, which gives the result. Basically you can consider this as a generalization of the Cauchy-Schwarz inequality to this setting.

Now substitute y = x̄ − μ: we get

|γ′(x̄ − μ)| ≤ ( γ′Sγ · (x̄ − μ)′S^{−1}(x̄ − μ) )^{1/2} ≤ ( γ′Sγ · T²_{p, n−1, α}/n )^{1/2},

where the last bound, using the distribution of n(x̄ − μ)′S^{−1}(x̄ − μ), holds with probability 1 − α. So all linear combinations simultaneously satisfy inequalities of the form

|γ′x̄ − γ′μ| ≤ (γ′Sγ)^{1/2} ( T²_{p, n−1, α}/n )^{1/2}.
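A small sketch of these simultaneous intervals for a given γ, using the relation T²_{p,n−1,α} = ((n−1)p/(n−p)) F_{p,n−p,α} from earlier; the function name is mine.

```python
import numpy as np
from scipy.stats import f

def simultaneous_ci(x, gamma, alpha=0.05):
    n, p = x.shape
    xbar = x.mean(axis=0)
    S = np.cov(x, rowvar=False)                           # sample covariance, divisor n - 1
    # T^2_{p, n-1, alpha} = (n-1)p/(n-p) * F_{p, n-p, alpha}
    t2_crit = (n - 1) * p / (n - p) * f.ppf(1 - alpha, p, n - p)
    half = np.sqrt(gamma @ S @ gamma * t2_crit / n)       # half-width of the interval
    return gamma @ xbar - half, gamma @ xbar + half
```

These intervals hold simultaneously for every γ with overall coverage 1 − α, which is why they are wider than the one-at-a-time t intervals.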
So we have considered the one-sample problem for μ when Σ is known, and we have also resolved it when Σ is unknown; and we have considered the two-sample problem when Σ is known. Now let us consider the two-sample problem when Σ is unknown. Here, as before, there are two cases: one in which the variance-covariance matrix is assumed common to the two populations, and another in which it is not, and the procedures are different, as you had seen in the univariate problem. Let us take the common-but-unknown case first. I am slightly modifying the notation, writing everything in terms of y, because we will later consider the generalization to several samples as well. So let y_{11}, y_{21}, ..., y_{n₁1} be a random sample from N_p(μ1, Σ), independent and identically distributed, and similarly let y_{12}, y_{22}, ..., y_{n₂2} be a random sample from N_p(μ2, Σ). So the two populations share the same Σ, which is common but unknown, and as before we will test equality of the mean vectors.

Let me mention that we are considering only testing problems about equality here. In the univariate case we had seen other kinds of testing problems too, like μ1 ≤ μ2 or μ1 > μ2, but I am not giving those procedures here. In fact, with vector inequalities there can be various cases: the first component may be less, the second component may be more, the third may be equal, so you can have various kinds of hypothesis testing problems. Some of the popular ones are the so-called ordered alternatives, in which the concept of isotonic regression is used; there are many research papers currently available on that topic, both for the known and the unknown variance cases. In this particular course we discuss only the basic one, where equality is being tested.

So we construct Hotelling's T² here. The sample mean vectors satisfy ȳᵢ ~ N_p(μᵢ, (1/nᵢ)Σ) for i = 1, 2, so for the difference, under H0: μ1 = μ2,

√( n1 n2 / (n1 + n2) ) (ȳ1 − ȳ2) ~ N_p(0, Σ).

Now Σ is unknown, so we make use of S: from the first sample we get the dispersion matrix S1, from the second S2, and we pool them. So we define the sample dispersion matrices

Sᵢ = Σ_{j=1}^{nᵢ} (y_{ji} − ȳᵢ)(y_{ji} − ȳᵢ)′,   i = 1, 2.

Then S1 + S2 has the same distribution as Σ_{k=1}^{n1+n2−2} z_k z_k′, where the z_k are independent N_p(0, Σ). We define S = (S1 + S2)/(n1 + n2 − 2), and based on this, Hotelling's

T² = ( n1 n2 / (n1 + n2) ) (ȳ1 − ȳ2)′ S^{−1} (ȳ1 − ȳ2),

which has the Hotelling T² distribution on n1 + n2 − 2 degrees of freedom. Using the representation in terms of F, the rejection region at significance level α is

T² > ( (n1 + n2 − 2)p / (n1 + n2 − p − 1) ) F_{p, n1+n2−p−1, α}.

We can make use of this for constructing the confidence region for μ1 − μ2 as well: it is the set of all p-dimensional vectors ξ satisfying

(ȳ1 − ȳ2 − ξ)′ S^{−1} (ȳ1 − ȳ2 − ξ) ≤ ( (n1 + n2)/(n1 n2) ) T²_{p, n1+n2−2, α}.

This is the 100(1 − α)% confidence ellipsoid for μ1 − μ2. Of course, the bound on the right is also equal to

( (n1 + n2)/(n1 n2) ) · ( (n1 + n2 − 2)p / (n1 + n2 − p − 1) ) F_{p, n1+n2−p−1, α},

so one can evaluate it using tables of the F distribution. Similarly, the simultaneous confidence intervals can be written as

|γ′(ȳ1 − ȳ2) − γ′ξ| ≤ (γ′Sγ)^{1/2} ( (n1 + n2)/(n1 n2) · T²_{p, n1+n2−2, α} )^{1/2}.
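The whole two-sample procedure fits in a few lines; a sketch, assuming the rows of y1 and y2 are the observation vectors (the function name is mine).

```python
import numpy as np
from scipy.stats import f

def two_sample_t2(y1, y2, alpha=0.05):
    n1, p = y1.shape
    n2 = y2.shape[0]
    d = y1.mean(axis=0) - y2.mean(axis=0)
    # pooled dispersion: S = (S1 + S2) / (n1 + n2 - 2)
    S1 = (n1 - 1) * np.cov(y1, rowvar=False)
    S2 = (n2 - 1) * np.cov(y2, rowvar=False)
    S = (S1 + S2) / (n1 + n2 - 2)
    t2 = (n1 * n2 / (n1 + n2)) * d @ np.linalg.solve(S, d)
    # critical value via the F representation of Hotelling's T^2
    crit = (n1 + n2 - 2) * p / (n1 + n2 - p - 1) * f.ppf(1 - alpha, p, n1 + n2 - p - 1)
    return t2, crit, t2 > crit
```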
One of the classical examples is given in Fisher's 1936 paper, in which he considered four variables: x1 the sepal length, x2 the sepal width, x3 the petal length and x4 the petal width. I have taken this data from the book of Anderson. 50 observations were taken on each of two populations, iris versicolor and iris setosa. The summarized data: the sample mean vector based on the 50 observations on iris versicolor plants is x̄1 = (5.936, 2.770, 4.260, 1.326)′, and that based on the 50 observations on iris setosa plants is x̄2 = (5.006, 3.428, 1.462, 0.246)′; here n1 + n2 − 2 = 98, and the matrix S is given there, so I am not writing it here. The value of T²/98 turned out to be 26.334, so (T²/98)(95/4) = 625.5, which is highly significant: F_{4,95} at level 0.01 is only about 3.52. So naturally the hypothesis H0: μ1 = μ2 is certainly rejected.

The simultaneous confidence intervals for the component differences μ_{i1} − μ_{i2}, i = 1, ..., 4, have also been obtained there: 0.930 ± 0.337, −0.658 ± 0.265, 2.798 ± 0.270 and 1.080 ± 0.121. You can see that 0 does not belong to any of these intervals; in fact most are quite far from 0, and only the second may be a little closer to 0. So naturally you can say that the means of the two populations are quite different.

As I mentioned, one may also consider linear combinations. For example, with k samples we may test H0: Σ_{i=1}^{k} βᵢμᵢ = μ* against H1: Σ_{i=1}^{k} βᵢμᵢ ≠ μ*, where β1, β2, ..., β_k are given scalars and μ* is a given vector. Then we construct the statistic

T² = C^{−1} ( Σᵢ βᵢx̄ᵢ − μ* )′ S^{−1} ( Σᵢ βᵢx̄ᵢ − μ* ),

where x̄ᵢ = (1/nᵢ) Σ_{j=1}^{nᵢ} x_{ji}, S = ( 1/Σᵢ(nᵢ − 1) ) Σᵢ Σⱼ (x_{ji} − x̄ᵢ)(x_{ji} − x̄ᵢ)′ and C = Σᵢ βᵢ²/nᵢ. Then T² follows Hotelling's T² distribution on Σᵢ(nᵢ − 1) = Σᵢnᵢ − k degrees of freedom, so we reject H0 when this value is greater than T²_{p, Σnᵢ−k, α}.

In the next lecture, I will consider a problem which is based on symmetry. We will also consider the case when Σ1 and Σ2 are not assumed to be equal. This case is again like the univariate one, where we had only approximate procedures; in the multivariate case, however, exact procedures can be constructed, but then there is a compromise: we may have to ignore some of the observations. I will discuss this problem in detail in the following lecture.
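As a quick numerical check of the iris example discussed above, one can rerun the computation on the copy of Fisher's iris data bundled with scikit-learn; this assumes, as is usual, that that copy matches the data in Anderson's book.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
x1 = iris.data[iris.target == 1]   # iris versicolor, 50 observations on 4 variables
x2 = iris.data[iris.target == 0]   # iris setosa, 50 observations on 4 variables
n1, p = x1.shape
n2 = x2.shape[0]
d = x1.mean(axis=0) - x2.mean(axis=0)
S = ((n1 - 1) * np.cov(x1, rowvar=False)
     + (n2 - 1) * np.cov(x2, rowvar=False)) / (n1 + n2 - 2)
t2 = (n1 * n2 / (n1 + n2)) * d @ np.linalg.solve(S, d)
print(t2 / 98)            # should be close to the 26.334 quoted above
print(t2 / 98 * 95 / 4)   # the corresponding F value, about 625.5
```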