In this section of the course, we will introduce methods of multivariate statistics. We have seen in the first section, on random variables, that we can also discuss bivariate and multivariate distributions. That means, for example, you consider the data on a patient when the patient goes for some diagnostic test. The concerned physician may record various characteristics: x1 could be, for example, the age, x2 the weight, x3 say the systolic blood pressure, x4 the diastolic blood pressure, x5 say the blood glucose level, etcetera. So, in general, for a particular patient you have data on, say, 5 variables. We can write it as a row vector, in which case I will put a transpose, or we can consider a column vector (x1, x2, ..., x5). In particular, we have discussed the bivariate normal distribution, and also some specific problems on joint distributions, in the course on probability and statistics. The primary thing to notice here is that there may exist some correlation among these variables. Now, in the case of univariate distributions we established the normal distribution as one of the important, or you can say most frequently used, distributions. The reason was the central limit theorem: whenever we consider averages or sums, the distribution of the sums or the means can be approximated by the normal distribution. In a similar way we also have a multivariate central limit theorem, which I may mention briefly, and that brings the multivariate normal distribution into focus. A multivariate normal distribution can be considered as an extension of the bivariate normal distribution, and we will introduce the concept here. So, we will first study the multivariate normal distribution and then, in particular, certain distributions which are used for inference. For example, in the univariate case you had the chi-square distribution, the t distribution, etcetera, which were related to the normal distribution. Similarly, in the multivariate case there will be certain distributions, such as the Wishart distribution or Hotelling's T-squared distribution, which will be used for inference purposes. So, in this particular section of the course we will introduce various multivariate distributions which are related to the multivariate normal distribution.

Let me start with the theory of multivariate distributions. The first result, which is attributed to Cramér and Wold, is the following: let X be a random vector of order p; that means we are assuming X is a mapping from the sample space into R^p. Then the distribution of X is known if and only if the distribution of every linear combination, say t'X, is known. So, basically, the characterization of a multivariate distribution can be done in terms of its linear combinations, but of course we have to consider every linear combination. The proof is based on a characteristic function approach. Let us consider the characteristic function of X = (X1, X2, ..., Xp)' at the point t = (t1, t2, ..., tp)'. Using the notation phi_X(t), it is equal to the expectation of e to the power i times the summation, over j from 1 to p, of t_j X_j, which can also be written as the expectation of e to the power i t'X.
Now, if I give this the name v, then this is equal to the expectation of e to the power i v. Let us call this expression (1). Expression (1) can then be viewed as the characteristic function of the random variable v = t'X evaluated at the point 1. So, if the distribution of v = t'X is known for all t belonging to the p-dimensional Euclidean space, then the characteristic function of v is known at the point 1 for every t in R^p, which, using (1), implies that the characteristic function of X is known, and hence that the distribution of X is known. Conversely, assume that the distribution of X is known. Then phi_X at the point s t, where s is a real scalar, becomes the expectation of e to the power i s times the summation of t_j X_j, that is, the expectation of e to the power i s v, which is the characteristic function of v at the point s. So, the characteristic function of v is known at every s, and hence the distribution of v is known, for all t.

This theorem is actually a characterization theorem: it says that the distribution of a random vector can be described in terms of its linear combinations, provided the distributions of all the linear combinations are known. Conversely, if the distribution of a random vector is known, then all its linear combinations have known distributions. In fact, we use this characterization for introducing the multivariate normal distribution; later on we will see equivalent versions, but we will find it quite convenient to introduce the multivariate normal distribution through this. If you remember, for the bivariate normal distribution we proved, in the course on probability and statistics, that if (x, y) has a bivariate normal distribution then every linear combination a x + b y has a univariate normal distribution, and conversely, if for every a and b the combination a x + b y has a univariate normal distribution, then (x, y) has a bivariate normal distribution. So, you can see that the Cramér-Wold theorem is the general version of that result, and we use the same idea to define the multivariate normal distribution.
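For reference, the two directions of the argument above can be written compactly as a pair of identities; this is only a restatement of the steps just described, with s denoting a real scalar:

\[
\varphi_X(t) \;=\; E\!\left[e^{\,i\,t'X}\right] \;=\; \varphi_{t'X}(1),
\qquad
\varphi_X(s\,t) \;=\; E\!\left[e^{\,i\,s\,(t'X)}\right] \;=\; \varphi_{t'X}(s),
\qquad t \in \mathbb{R}^p,\ s \in \mathbb{R}.
\]

So knowing the distribution of t'X for every t determines \varphi_X, and knowing the distribution of X determines \varphi_{t'X}(s) for every s and t.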
So, we state the definition of the multivariate normal distribution: a random vector X is said to have a p-variate normal distribution if every linear combination of its components has a univariate normal distribution, and we write X ~ N_p. As a remark, let me mention that to take care of the case t = 0, the zero vector, where t'X is identically 0, we also consider the distribution degenerate at 0 to be a normal distribution.

Now, let me introduce some multivariate notation. Let us consider the p-dimensional vector x = (x1, x2, ..., xp)', a p by 1 vector, and another vector y of dimension q, say y = (y1, y2, ..., yq)'. Suppose x and y are random vectors; then the mean vectors are defined componentwise: the vector of the expectations of x1 through xp, which we may call mu1, mu2, ..., mup, that is, the vector mu; and similarly for y, say nu1, nu2, ..., nuq, that is, the vector nu. Now, in the case of one variable we have the variance; for a random vector we will have a variance-covariance matrix. The variance-covariance matrix, or dispersion matrix, is defined as D(x): in the diagonal we have the variances of the components, and in the off-diagonal positions we have the covariances, covariance between x1 and x2, and so on, up to covariance between x1 and xp. The (i, j) and (j, i) entries are the same, so D(x) is a symmetric matrix. It also has another representation: D(x) is equal to the expectation of (x - mu)(x - mu)', where the first factor is a column vector and the second a row vector, so the product is exactly this matrix. Next, the covariance matrix between two vectors x and y, written C(x, y), consists of all the covariances: covariance between x1 and y1, covariance between x1 and y2, and so on up to covariance between x1 and yq in the first row; then covariance between x2 and y1, and so on; down to covariance between xp and yq. This is a p by q matrix, and C(y, x) is its transpose.

If we consider a linear transformation of a vector, suppose A is an r by p matrix and, say, B is an s by q matrix. Then the expectation of A x is A times the expectation of x, and the dispersion matrix of A x is A times the dispersion matrix of x times A transpose. This can be proved easily. Write A with rows a1', ..., ar', so that A x is the vector with components a1'x, ..., ar'x. Then the expectation of A x is taken componentwise: it is a1' times the expectation of x, and so on up to ar' times the expectation of x, which is just A times the expectation of x. In a similar way we can handle the dispersion matrix of A x: it is the expectation of (A x - A mu)(A x - A mu)', which is A times the expectation of (x - mu)(x - mu)' times A transpose, that is, A times the dispersion matrix of x times A transpose. If I use the notation D(x) = Sigma, then D(A x) = A Sigma A'. More generally, if x1, x2, ..., xk are p-dimensional random vectors and A1, A2, ..., Ak are r by p matrices, then the expectation of the sum over j from 1 to k of Aj xj is the sum of Aj times the expectation of xj, and the dispersion matrix of the sum of Aj xj equals the sum over j of Aj D(xj) Aj' plus the double summation, over i not equal to j, of Ai C(xi, xj) Aj', where C(xi, xj) is the covariance matrix between xi and xj. So, these are certain linearity properties of random vectors; a small numerical check of the transformation rules is sketched below.
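The following short numpy sketch (my own illustration, not part of the lecture; the particular mu, Sigma and A are arbitrary choices) checks the two transformation rules E(Ax) = A mu and D(Ax) = A Sigma A' by simulation.

```python
# A minimal numerical sketch of the linearity properties above:
# E[AX] = A E[X] and D(AX) = A D(X) A'.  mu, Sigma and A are illustrative.
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([1.0, -2.0, 0.5])                    # assumed mean vector (p = 3)
Sigma = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])                # assumed dispersion matrix
A = np.array([[1.0, 2.0, 0.0],
              [0.0, -1.0, 3.0]])                   # an r x p matrix, r = 2

# Draw many samples of X ~ N_3(mu, Sigma) and transform them by A.
X = rng.multivariate_normal(mu, Sigma, size=200_000)
AX = X @ A.T

print("sample mean of AX :", AX.mean(axis=0))
print("A @ mu            :", A @ mu)
print("sample cov of AX  :\n", np.cov(AX, rowvar=False))
print("A Sigma A'        :\n", A @ Sigma @ A.T)
```

The sample mean and sample covariance of AX should agree with A mu and A Sigma A' up to Monte Carlo error.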
Now, we go back to our definition of the multivariate normal distribution. We defined that if every linear combination has a univariate normal distribution, then X has a p-dimensional, or p-variate, multivariate normal distribution. So, let us look at various properties of this distribution. Assume that X follows N_p. The first thing is that each component Xi follows N_1, for i = 1 to p, by the definition of the multivariate normal, because each Xi is itself a linear combination: choose t = e_i, where e_i is the vector with 1 at the i-th place and 0 elsewhere, so that e_i'X = Xi. Now, since Xi has a univariate normal distribution, it has a mean and a variance; hence the expectation of Xi, equal to mu_i, and the variance of Xi, equal to sigma_i squared, exist for i = 1 to p. Also, the absolute value of the covariance between Xi and Xj is less than or equal to the square root of the variance of Xi times the variance of Xj, that is, sigma_i sigma_j; this implies that the covariance between Xi and Xj also exists, say sigma_ij. So, mu' = (mu1, mu2, ..., mup) and Sigma, with diagonal entries sigma_1 squared, ..., sigma_p squared and off-diagonal entries sigma_12, ..., sigma_1p, ..., sigma_2p, etcetera, exist; that is, the mean vector of X and the dispersion matrix of X exist. So, starting from the assumption that X has a multivariate normal distribution, its mean vector and dispersion matrix are certainly well defined, and we will use the notation X ~ N_p(mu, Sigma).

Now, further, let us consider t belonging to R^p and take v = t'X. By definition v follows a univariate normal distribution; also, the expectation of v equals t' times the expectation of X, that is, t'mu, and the variance of v = t'X equals t'Sigma t. So, what we are proving here is that v follows the normal distribution with mean t'mu and variance t'Sigma t. Thus, if X has a multivariate normal distribution, we are able to identify its mean and dispersion, and at the same time we are able to identify completely the distribution of any linear combination. In terms of the linear combinations one can now write down the characteristic function. So, next we find the characteristic function of X: phi_X(t) equals the expectation of e to the power i t'X, which can be considered as the characteristic function of v = t'X at the point 1. Since v has a normal distribution with mean t'mu and variance t'Sigma t, its characteristic function at a point s is e to the power (i t'mu times s minus half t'Sigma t times s squared); putting s = 1, we get phi_X(t) = e to the power (i t'mu minus half t'Sigma t). So, this is the characteristic function of X at a general point t. If we go by the definition, the multivariate normal was characterized in terms of its linear combinations; assuming a multivariate normal distribution, we have identified its mean vector, its dispersion matrix, the distribution of every linear combination, and also the characteristic function.

Now we do the converse: assuming the distributions of the linear combinations, let us look at the distribution of the random vector itself. Conversely, assume that t'X follows normal with mean t'mu and variance t'Sigma t for every t in the p-dimensional Euclidean space. Then the characteristic function of t'X at the point 1 is e to the power (i t'mu minus half t'Sigma t); but, according to the definition, this equals the expectation of e to the power i t'X, which is nothing but the characteristic function of X at t. This implies that X follows N_p(mu, Sigma).
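As a quick illustration (my own sketch, with arbitrary mu, Sigma and t, assuming numpy and scipy are available), one can sample from N_p(mu, Sigma) and check that the projection t'X behaves like a univariate normal with mean t'mu and variance t'Sigma t.

```python
# Illustrative check: for X ~ N_p(mu, Sigma) and a fixed t, the linear
# combination t'X should be N(t'mu, t'Sigma t).  All numbers are arbitrary.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[1.0, 0.4, 0.0],
                  [0.4, 2.0, 0.3],
                  [0.0, 0.3, 0.5]])
t = np.array([0.7, -1.2, 2.0])

X = rng.multivariate_normal(mu, Sigma, size=100_000)
v = X @ t                                   # the linear combination t'X

print("empirical mean of t'X :", v.mean(), " theory:", t @ mu)
print("empirical var  of t'X :", v.var(),  " theory:", t @ Sigma @ t)
# Normality check of the projection against N(t'mu, t'Sigma t);
# the Kolmogorov-Smirnov p-value should not be small.
print(stats.kstest(v, "norm", args=(t @ mu, np.sqrt(t @ Sigma @ t))))
```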
So, the converse result is also now established: if every linear combination has a univariate normal distribution with the stated mean and variance, then X has the multivariate normal distribution with mean vector mu and variance-covariance matrix Sigma.

Now, let us consider the independence criterion. If the variance-covariance matrix is diagonal, with diagonal entries sigma_1 squared, ..., sigma_p squared, let us write down phi_X(t) = e to the power (i t'mu minus half t'Sigma t). In this case t'mu is the summation of t_j mu_j and t'Sigma t is the summation of t_j squared sigma_j squared, so phi_X(t) = e to the power (i times the sum, over j from 1 to p, of t_j mu_j, minus half times the sum of t_j squared sigma_j squared). This I can write as the product, over j from 1 to p, of e to the power (i t_j mu_j minus half t_j squared sigma_j squared), and each factor is the characteristic function of X_j at the point t_j. Since the joint characteristic function factorizes, this implies that X1, X2, ..., Xp are independently distributed. As in the case of the bivariate normal distribution, where we had seen that independence is equivalent to the covariance between the two variables being 0, in the multivariate case you likewise have a generalization: independence of the components is equivalent to all the covariance terms being 0, and consequently to all the correlations between the components being 0. It is an if and only if condition: if the components are independent then all the correlations between them are 0, and conversely, if all the correlations are 0, then X1, X2, ..., Xp are independent normal random variables.

Now, this result can be generalized to a decomposed vector. For example, suppose I decompose x into x^(1) and x^(2), where x^(1) has r components and x^(2) has p minus r components. The corresponding decomposition of Sigma is into blocks Sigma_11, Sigma_12, Sigma_21, Sigma_22, with r and p minus r rows and columns respectively, and similarly mu decomposes into mu^(1) and mu^(2). Consider the characteristic function of x, e to the power (i t'mu minus half t'Sigma t), and split t as (t_1, t_2), with r components in t_1 and p minus r components in t_2. Then I can write it as e to the power (i (t_1', t_2')(mu^(1), mu^(2)) minus half (t_1', t_2') Sigma (t_1, t_2)), with Sigma in its block form. Now, if Sigma_12 = 0, which implies that Sigma_21 is also 0 because it is the transpose, then this becomes e to the power (i t_1'mu^(1) plus i t_2'mu^(2) minus half t_1'Sigma_11 t_1 minus half t_2'Sigma_22 t_2), which is the characteristic function of x^(1) at t_1 times the characteristic function of x^(2) at t_2; that is, x^(1) and x^(2) are independent. So, the correlations being 0 implying independence is true in the multivariate situation also, where I consider sub-vectors of the p-dimensional vector: here one is an r-dimensional vector and the other a (p minus r)-dimensional vector, and if all the covariances between the components of x^(1) and the components of x^(2), which sit in Sigma_12 and Sigma_21, vanish, then x^(1) and x^(2) are independent. So, this result is also true in general; a small sampling illustration is given below.
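Here is a small simulation sketch of the block case (my own illustration; the split and the numbers are arbitrary choices): with Sigma_12 = 0, the sub-vectors of a multivariate normal are independent, so probabilities of joint events factorize.

```python
# Illustrative sketch: Sigma_12 = 0 makes the sub-vectors X^(1) and X^(2)
# of a multivariate normal independent.  Here p = 3 split as r = 2 and p - r = 1.
import numpy as np

rng = np.random.default_rng(2)

mu = np.array([0.0, 0.0, 1.0])
Sigma = np.array([[1.0, 0.5, 0.0],
                  [0.5, 2.0, 0.0],
                  [0.0, 0.0, 1.5]])     # Sigma_12 = Sigma_21' = 0

X = rng.multivariate_normal(mu, Sigma, size=500_000)
X1, X2 = X[:, :2], X[:, 2]

# Empirical cross-covariance between the blocks should be near zero.
print("cross-covariance:", np.cov(X.T)[:2, 2])

# For normal vectors, zero cross-covariance means independence, so joint
# probabilities of events about X^(1) and X^(2) should factorize.
A = (X1[:, 0] > 0.5) & (X1[:, 1] < 0.0)          # an event about X^(1)
B = X2 > 1.0                                      # an event about X^(2)
print("P(A and B) :", np.mean(A & B))
print("P(A) P(B)  :", np.mean(A) * np.mean(B))
```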
Now, in place of two subsets I can consider any number of subsets. Let us write x as (x^(1), x^(2), ..., x^(m)), where x^(1) has r_1 components, x^(2) has r_2 components, and so on, x^(m) has r_m components, with the sum of r_i over i = 1 to m equal to p. The corresponding decompositions are mu = (mu^(1), mu^(2), ..., mu^(m)) and, for Sigma, the block matrix with blocks Sigma_11, Sigma_12, ..., Sigma_1m in the first row of blocks, Sigma_21, Sigma_22, ..., Sigma_2m in the second, and so on down to Sigma_m1, Sigma_m2, ..., Sigma_mm, where the block Sigma_ij is of order r_i by r_j. Here also you have the result that if Sigma_ij vanishes for all i not equal to j, then x^(1), x^(2), ..., x^(m) are independently distributed multivariate normal vectors. I am not writing out the proof here; it follows along the same lines: write the characteristic function of x, decompose t into m pieces t_1, t_2, ..., t_m with the corresponding decompositions of mu and Sigma, and under the independence condition it becomes the product of the characteristic functions of x^(1), x^(2), ..., x^(m) at the corresponding pieces.

So, now we prove the existence of a multivariate normal distribution; let me state it in the form of a theorem: there exists a random vector X such that phi_X(t) = e to the power (i t'mu minus half t'Sigma t), where Sigma is a real symmetric positive semi-definite matrix. Since Sigma is real symmetric, we decompose it: Sigma = Gamma D Gamma', where D is the diagonal matrix containing the eigenvalues lambda_1, lambda_2, ..., lambda_p of Sigma, and Gamma is an orthogonal matrix whose columns are the corresponding eigenvectors. Recall why a dispersion matrix is positive semi-definite; the proof is quite simple: for any vector a, a'Sigma a = a' (expectation of (x - mu)(x - mu)') a, which I can write as the expectation of the square of a'(x - mu), and this is greater than or equal to 0. So Sigma is positive semi-definite, and the eigenvalues lambda_i are greater than or equal to 0. Hence D and its square root are well defined: define D to the power half as the diagonal matrix with entries the square roots of lambda_1, ..., lambda_p; then D to the power half times D to the power half equals D, which is why we call it a square-root matrix. So we can write Sigma = Gamma D to the power half D to the power half Gamma', that is, Sigma = B_0 B_0' with B_0 = Gamma D to the power half. Let the rank of Sigma be m; then m of the eigenvalues are positive, say lambda_1, lambda_2, ..., lambda_m are positive and lambda_{m+1}, ..., lambda_p are 0. I am assuming here that the first m are positive and the remaining are 0; in general this may not be the case, because a zero eigenvalue may occur somewhere in between, but we can always arrange them in such an order, interchanging the components of x and correspondingly the rows and columns of Sigma if necessary, so that lambda_1, lambda_2, ..., lambda_m are positive and the remaining are 0.
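The spectral decomposition and the square-root matrix can be formed numerically; the sketch below (my own illustration using numpy, with an arbitrary rank-2 positive semi-definite matrix) builds B_0 = Gamma D^(1/2) and checks that B_0 B_0' recovers Sigma.

```python
# Minimal sketch of the decomposition used above: Sigma = Gamma D Gamma',
# B_0 = Gamma D^(1/2), so that Sigma = B_0 B_0'.  The matrix is a singular
# (rank 2) example chosen only for illustration.
import numpy as np

# Rank-deficient positive semi-definite matrix (third row/column = sum of first two).
Sigma = np.array([[2.0, 1.0, 3.0],
                  [1.0, 2.0, 3.0],
                  [3.0, 3.0, 6.0]])

eigvals, Gamma = np.linalg.eigh(Sigma)        # eigenvalues ascending, Gamma orthogonal
# Reorder so the positive eigenvalues come first, as in the lecture.
order = np.argsort(eigvals)[::-1]
eigvals, Gamma = eigvals[order], Gamma[:, order]
eigvals = np.clip(eigvals, 0.0, None)         # guard against tiny negative round-off

D_half = np.diag(np.sqrt(eigvals))
B0 = Gamma @ D_half                           # B0 B0' = Gamma D Gamma' = Sigma

print("rank of Sigma       :", np.linalg.matrix_rank(Sigma))
print("max |B0 B0' - Sigma|:", np.abs(B0 @ B0.T - Sigma).max())
```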
Let us write Gamma column-wise as (gamma_1, gamma_2, ..., gamma_p). Then B_0 = Gamma D to the power half takes the form (b_1, b_2, ..., b_m, 0, ..., 0), still a p by p matrix, because the last p minus m eigenvalues are 0. So, b_1, b_2, ..., b_m are linearly independent column vectors. If I now consider t'Sigma t, it equals t' times (b_1, ..., b_m, 0, ..., 0) times the matrix with rows b_1', ..., b_m' and zero rows, times t; so it is simply the summation, over j = 1 to m, of (t'b_j) squared. Now look at the target characteristic function: e to the power (i t'mu minus half t'Sigma t) equals e to the power (i t'mu minus half times the sum of (t'b_j) squared). So, let us take z_j distributed as normal (0, 1) for j = 1 to m, with z_1, z_2, ..., z_m independent; that is, a collection of independent and identically distributed standard normal variables. Further define y_j = b_j z_j, so y_j is a p by 1 random vector with mean vector 0 and dispersion matrix b_j b_j'. Its characteristic function is phi_{y_j}(t) = expectation of e to the power (i t'y_j) = expectation of e to the power (i (t'b_j) z_j), and since z_j is standard normal this equals e to the power (minus half (t'b_j) squared). Now let us define X = mu + b_1 z_1 + b_2 z_2 + ... + b_m z_m, which we can write compactly as mu + B Z, where Z is the vector (z_1, z_2, ..., z_m)' and B is the matrix (b_1, b_2, ..., b_m). Consider the characteristic function of X: it is the expectation of e to the power (i t'X), that is, the expectation of e to the power (i t'mu + i (t'b_1 z_1 + ... + t'b_m z_m)), which, by the independence of the z_j, equals e to the power (i t'mu) times the product over j of the expectations of e to the power (i (t'b_j) z_j), and this equals e to the power (i t'mu minus half times the sum over j of (t'b_j) squared), that is, e to the power (i t'mu minus half t'Sigma t). A numerical sketch of this construction is given below.
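A short simulation sketch of this construction (my own illustration, reusing the singular Sigma from the previous sketch and an arbitrary mu): generate independent standard normals Z and form X = mu + B Z; the sample mean and covariance should approach mu and Sigma.

```python
# Sketch of the construction X = mu + B Z for a possibly singular Sigma:
# from Sigma = B B' and independent standard normals Z, the vector
# X = mu + B Z has mean mu and dispersion matrix Sigma.
import numpy as np

rng = np.random.default_rng(3)

mu = np.array([1.0, 2.0, 3.0])
Sigma = np.array([[2.0, 1.0, 3.0],
                  [1.0, 2.0, 3.0],
                  [3.0, 3.0, 6.0]])          # rank m = 2 < p = 3

eigvals, Gamma = np.linalg.eigh(Sigma)
keep = eigvals > 1e-10                        # the m positive eigenvalues
B = Gamma[:, keep] * np.sqrt(eigvals[keep])   # p x m matrix (b_1, ..., b_m)

m = B.shape[1]
Z = rng.standard_normal((200_000, m))         # i.i.d. N(0, 1) entries
X = mu + Z @ B.T                              # each row is mu + B z

print("sample mean      :", X.mean(axis=0))
print("sample covariance:\n", np.cov(X, rowvar=False))
print("target Sigma     :\n", Sigma)
```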
So, let us summarize what we have done. Suppose there is a vector mu and a real symmetric matrix Sigma which we assume to be at least positive semi-definite. Based on the spectral decomposition Sigma = Gamma D Gamma', where Gamma is an orthogonal matrix and D is the diagonal matrix of eigenvalues, we arrange the eigenvalues with the non-zero ones first and the zero ones afterwards, together with the corresponding eigenvectors. Multiplying out, B_0 = Gamma D to the power half has its last columns equal to zero, because they are multiplied by the zero entries of D to the power half, and b_1, b_2, ..., b_m are linearly independent column vectors, so t'Sigma t becomes simply the sum of squares of the scalars t'b_j. Now, e to the power (i t'mu minus half t'Sigma t), which we wanted to show is the characteristic function of some random vector, has the decomposition e to the power (i t'mu minus half times the summation of (t'b_j) squared), and each factor is of the form e to the power (minus half sigma squared s squared), the characteristic function of a mean-zero normal. So I take independent standard normal variables z_1, ..., z_m and define y_j = b_j z_j: each y_j is a p by 1 vector, each of its components is a normal variable with mean 0, its dispersion matrix is b_j b_j', and its characteristic function, since the mean term is 0, is simply e to the power (minus half (t'b_j) squared). Then, using the given mu, the vectors b_1, b_2, ..., b_m we found, and the standard normal variables z_1, z_2, ..., z_m, I define the random vector X = mu + b_1 z_1 + ... + b_m z_m, that is, mu + B Z in compact notation. Its characteristic function, after using this decomposition and the characteristic functions of the y_j, turns out to be exactly e to the power (i t'mu minus half t'Sigma t). So, we are able to construct a random vector whose characteristic function is exactly the characteristic function of a multivariate normal distribution. Basically it means that, given a p-dimensional vector mu and a p by p positive semi-definite matrix Sigma, we can define a random vector which has a p-dimensional multivariate normal distribution with that mean vector and that variance-covariance matrix. So, this establishes the characterizing property and the existence of the multivariate normal distribution.

One thing to note here: we considered the rank to be equal to m, and m can be less than p or equal to p. If the rank of Sigma, that is m, is less than p, we call the multivariate normal distribution a singular distribution. You can also associate this with the Sigma matrix itself: Sigma inverse will exist if Sigma is non-singular, and if it is singular then Sigma inverse will not exist. Later on, when we consider the density, this point will be important. In the next lecture I will consider some further properties of the multivariate normal distribution, and we will look at the estimation of the parameters, etcetera, for this particular distribution.