So, we continue our discussion of the multivariate normal distribution and its properties. We have seen various characterizing properties, which also helped us give an alternative definition of the multivariate normal distribution. Now we want to see its connections with the chi-square distribution, as in the univariate normal case. For that purpose I stated the Fisher–Cochran theorem, and a lemma giving a necessary and sufficient condition: if Y is a vector of independent standard normal random variables, then the quadratic form Y′AY follows a chi-square distribution with k degrees of freedom if and only if A is idempotent of rank k. We know that Y′Y is chi-square with degrees of freedom equal to the dimension of Y; the lemma tells us that the same holds for Y′AY whenever A is idempotent. Now let us consider a further result. If X follows N_p(μ, Σ), consider Q = (X − μ)′A(X − μ). Then Q follows chi-square with k degrees of freedom if and only if ΣAΣAΣ = ΣAΣ, and in that case k = trace(AΣ). Let us look at the proof. If you remember the representation obtained as a necessary and sufficient condition for multivariate normality, we can write X = μ + BZ, where Z is a vector of m independent standard normal random variables. So consider the decomposition Σ = BB′, with rank(B) = m, which is also the rank of Σ. Then Q = (X − μ)′A(X − μ) = Z′B′ABZ = Z′CZ, where we write C = B′AB.
Now we apply the earlier result: for a vector Z of independent standard normal random variables, Z′CZ follows chi-square k if and only if C is idempotent, and then k = trace(C) = rank(C). So Q follows chi-square k if and only if C = B′AB is idempotent and k = trace(C). Now, C idempotent means B′AB B′AB = B′AB; since BB′ = Σ, this is B′(AΣA − A)B = 0. Pre-multiplying by B and post-multiplying by B′ gives Σ(AΣA − A)Σ = 0, that is, ΣAΣAΣ = ΣAΣ. Why is this an equivalence? Because B has full column rank, B′B is non-singular; pre-multiplying Σ(AΣA − A)Σ = BB′(AΣA − A)BB′ = 0 by (B′B)⁻¹B′ and post-multiplying by B(B′B)⁻¹ recovers B′(AΣA − A)B = 0. Next, k = trace(C) = trace(B′AB) = trace(ABB′) = trace(AΣ), using the fact that trace(CD) = trace(DC). As a remark: if Σ is non-singular, we can multiply the condition by Σ⁻¹ on both sides to get AΣA = A; in other words, Σ is then a generalized inverse of A. We will discuss the non-central chi-square distribution in detail later. For now, let me talk about certain characterizations of the multivariate normal distribution.
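Before moving to the characterizations, the quadratic-form criterion is easy to check numerically. The sketch below is my own illustration in Python (numpy assumed; the Σ and μ are made up): it takes A = Σ⁻¹, for which ΣAΣAΣ = ΣAΣ holds, so Q = (X − μ)′A(X − μ) should be chi-square with k = tr(AΣ) = p degrees of freedom.

```python
import numpy as np

# Made-up positive definite Sigma (p = 3) and the choice A = Sigma^{-1}.
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 0.0],
                  [0.5, 0.0, 2.0]])
A = np.linalg.inv(Sigma)

# The necessary and sufficient condition: Sigma A Sigma A Sigma = Sigma A Sigma.
assert np.allclose(Sigma @ A @ Sigma @ A @ Sigma, Sigma @ A @ Sigma)

# Degrees of freedom k = trace(A Sigma); here it equals p = 3.
k = float(np.trace(A @ Sigma))
print(round(k))  # 3

# Monte Carlo check: a chi-square_k variable has mean k.
rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0, 0.5])
X = rng.multivariate_normal(mu, Sigma, size=200_000)
Q = np.einsum("ni,ij,nj->n", X - mu, A, X - mu)
print(abs(Q.mean() - k) < 0.05)  # True, up to simulation error
```

With Σ non-singular, A = Σ⁻¹ is just the classical case (X − μ)′Σ⁻¹(X − μ) ~ χ²_p; the same check can be run for any A satisfying the condition.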
First characterization: let X1 and X2 be independent p-dimensional random vectors such that X = X1 + X2 follows N_p; then X1 and X2 are also N_p. Proof: consider a linear combination of the components of X, that is, l′X = l′X1 + l′X2. Since X1 and X2 are independent, l′X1 and l′X2 are also independent. Now recall Cramér's decomposition theorem for the univariate normal distribution: if X1 and X2 are independent random variables such that X1 + X2 is univariate normal, then each of X1 and X2 is univariate normal. From this known characterization we conclude that l′X1 and l′X2 are univariate normal. Since l ∈ R^p was chosen arbitrarily, X1 and X2 are N_p distributed random vectors. A second characterization is a generalization of this, which let me state in full. Let X1, X2, …, Xn be p-dimensional independent random vectors. Consider the linear combinations Y1 = Σ_{i=1}^n b_i X_i and Y2 = Σ_{i=1}^n c_i X_i, where the b_i and c_i are scalars, and write b = (b1, b2, …, bn)′ and c = (c1, c2, …, cn)′. Then we have the following: (a) if the X_i are i.i.d. N_p and b′c = 0, then Y1 and Y2 are independent; and (b) if Y1 and Y2 are independent, then X_i follows N_p for every i such that b_i c_i ≠ 0 — and the X_i need not be identically distributed. Proof: stack Y1 and Y2 into the vector Y = (Y1′, Y2′)′, which is 2p-dimensional. A linear combination t′Y equals t1′Y1 + t2′Y2, where t = (t1′, t2′)′.
For part (a), assume the X_i are i.i.d. multivariate normal. Then t′Y = t1′ Σ b_i X_i + t2′ Σ c_i X_i = Σ_i (b_i t1 + c_i t2)′ X_i, which is a sum over i of linear combinations of the components of the independent X_i. So t′Y is univariate normal for every 2p-dimensional t, and hence Y has an N_2p distribution, the 2p-dimensional multivariate normal. Now consider the covariance matrix between Y1 and Y2: Cov(Y1, Y2) = Cov(Σ b_i X_i, Σ c_i X_i). Since the X_i are independent, the cross terms Cov(X_i, X_j) for i ≠ j are null, and this reduces to b1 c1 D(X1) + b2 c2 D(X2) + … + b_n c_n D(X_n), where D denotes the dispersion matrix. Writing D(X_i) = Σ, this is nothing but (b′c)Σ. If we now assume b′c = 0, the covariance is null, and since Y is jointly normal, Y1 and Y2 are independent. So part (a) is proved: if the X_i are i.i.d. p-dimensional multivariate normal and b1 c1 + b2 c2 + … + b_n c_n = 0, then Y1 and Y2 are independent. In particular, X1 − X2 and X1 + X2 are independent. Or suppose I take 2X1 − X2 + X3 and X2 + X3; they are also independent, because here the product is 2·0 + (−1)·1 + 1·1.
So that is 0. If I consider X1 + X2 + X3 and −2X1 + X2 + X3, the product is −2 + 1 + 1 = 0, so they are also independent. Like that we can construct independent linear combinations. Now part (b): if Y1 and Y2 are independent, then the X_i must be N_p for the relevant indices. Here we use the Darmois–Skitovich theorem: let X1, X2, …, Xn be independent univariate random variables; if Σ_{i=1}^n b_i X_i and Σ_{i=1}^n c_i X_i are independent, then X_i follows a univariate normal distribution whenever b_i c_i ≠ 0, and can be arbitrarily distributed otherwise. So consider l′Y1 = Σ b_i (l′X_i) and l′Y2 = Σ c_i (l′X_i), and apply the Darmois–Skitovich theorem: l′X_i follows N_1 whenever b_i c_i ≠ 0. Since l is an arbitrary vector in p-dimensional space, we conclude that X_i follows N_p whenever b_i c_i ≠ 0. If you look at the statement, this is again a very powerful result: if two linear combinations of independent p-dimensional random vectors are independent, then every vector that genuinely appears in both combinations has a p-dimensional normal distribution; the condition b_i c_i ≠ 0 just says the corresponding term is present in both. A third characterization is based on the decomposition that we also gave as an alternative definition of the multivariate normal distribution. Consider two representations, Y = μ1 + B1 Z1 and Y = μ2 + B2 Z2.
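Going back to part (a) for a moment: the orthogonality check b′c = 0 and the resulting independence can be verified numerically. This is a sketch of my own (numpy assumed; the mean and covariance of the X_i are made up), using the combinations Y1 = 2X1 − X2 + X3 and Y2 = X2 + X3 from the example above.

```python
import numpy as np

# Coefficient vectors of Y1 = 2*X1 - X2 + X3 and Y2 = X2 + X3.
b = np.array([2.0, -1.0, 1.0])
c = np.array([0.0, 1.0, 1.0])
print(b @ c)  # 0.0, so Y1 and Y2 should be independent

# Simulate i.i.d. N_2(mu, Sigma) vectors X1, X2, X3 (100000 replications).
rng = np.random.default_rng(1)
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
X = rng.multivariate_normal(mu, Sigma, size=(100_000, 3))  # shape (n, 3, 2)

Y1 = np.einsum("i,nij->nj", b, X)
Y2 = np.einsum("i,nij->nj", c, X)

# Cross-covariance Cov(Y1, Y2) = (b'c) * Sigma, which is null here.
cross = (Y1 - Y1.mean(0)).T @ (Y2 - Y2.mean(0)) / len(Y1)
print(np.abs(cross).max() < 0.1)  # True, up to simulation error
```

Changing c so that b′c ≠ 0 (say c = (1, 1, 1)′) makes the sample cross-covariance visibly non-zero.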
Suppose Y = μ1 + B1 Z1 = μ2 + B2 Z2 are two representations of a p-dimensional random vector in terms of vectors Z1 and Z2 of non-degenerate independent random variables, where B1 and B2 are p × m matrices with rank(B1) = rank(B2) = m. Assume also that no column of B1 is a multiple of a column of B2. Then Y follows N_p. Notice that I am using the representation given as the alternative definition of the multivariate normal distribution, but there Z1 and Z2 were vectors of i.i.d. standard normal variables; here they are simply vectors of non-degenerate independent random variables, and just by putting a condition on B1 and B2 we conclude that Y must have a multivariate normal distribution. So this too is a very powerful characterization. (1) If m = p, then B1 and B2 are non-singular, and we can write B1 = B2 (B2⁻¹ B1) = B2 Q, where Q = B2⁻¹ B1 is non-singular. (2) Now assume m < p. Let c be a vector orthogonal to the columns of B1, and write c′Y = c′(μ1 + B1 Z1) = c′μ1 + c′B1 Z1 = c′μ1, since c′B1 = 0; so c′Y is simply the constant c′μ1. On the other hand, c′Y = c′μ2 + c′B2 Z2. So c′B2 Z2 = c′μ1 − c′μ2, a constant — that is, c′B2 Z2 is a degenerate random variable. But we assumed Z2 is a vector of non-degenerate independent random variables, so this contradicts our assumption unless c′B2 = 0. And c′B2 = 0 is equivalent to saying that c is orthogonal to the columns of B2.
So I started with a vector c orthogonal to the columns of B1 and proved that c is orthogonal to the columns of B2. This means the orthogonal complement of the column space of B1 is a subspace of the orthogonal complement of the column space of B2. Now repeat the argument with B1 and B2 interchanged: start with c orthogonal to the columns of B2, so that c′Y = c′μ2; writing c′Y = c′μ1 + c′B1 Z1 then forces c′B1 = 0, and I get the same statement the other way round. So the orthogonal complement of the column space of B2 is a subspace of that of B1, the two complements coincide, and hence the column spaces of B1 and B2 are the same. This means there exists a non-singular matrix Q such that B1 = B2 Q. So in case (1), m = p, I found such a Q directly, and in case (2), m < p, I again obtain a non-singular Q with B1 = B2 Q. Thus there always exists a non-singular matrix Q such that B1 = B2 Q. Now we make use of this. Write Y − μ2 = B2 Z2; this implies (B2′B2)⁻¹ B2′ (Y − μ2) = (B2′B2)⁻¹ B2′ B2 Z2 = Z2. Similarly, Y − μ1 = B1 Z1 = B2 Q Z1, which implies (B2′B2)⁻¹ B2′ (Y − μ1) = Q Z1. So Z2 and Q Z1 are obtained from Y by the same linear map, up to a location change: one is applied to Y − μ2 and the other to Y − μ1.
So μ2 is the translation in one case, and Q Z1 corresponds to Y − μ1 in the other. In particular, since the components of Z2 are independent, the components of Q Z1 are independent. Now, the condition that no column of B1 is a multiple of a column of B2 implies that every column of Q contains at least two non-zero elements: if the j-th column of Q had a single non-zero entry, the j-th column of B1 = B2 Q would be a multiple of a column of B2. Each Z1j therefore appears with non-zero coefficient in at least two of the independent components of Q Z1, so by the Darmois–Skitovich theorem each Z1j, j = 1, …, m, is univariate normal. Writing Z1 = (Z11, Z12, …, Z1m)′, it follows that Y = μ1 + B1 Z1 follows N_p. So these are the three characterizations; now we move on to the actual density function. If you remember, in the one- and two-dimensional cases we always define a distribution and then talk about its probability mass function or probability density function. For the p-dimensional normal distribution I have not yet defined the density function. One major reason is this: in higher dimensions the variance–covariance matrix Σ is in general only positive semi-definite. If Σ is positive definite it has full rank; but if the rank is p − 1, or p − 2, or in general m < p, then there are exact linear relationships among the variables, and in that case the density exists only on a subspace, not on the full p-dimensional space. That is why I gave the definition of the multivariate normal distribution in terms of its linear combinations, and then via the alternative representation μ + BZ, where Z is a vector of m independent univariate standard normal random variables and m is the rank of Σ.
So I was able to define the distribution through an alternative characterization, without having to write the density function. Now I will write the density for the full-rank case, and the representation just mentioned is exactly what we use to derive it. So, the probability density function of a multivariate normal distribution. Let X follow N_p(μ, Σ) with full rank, rank(Σ) = p. Then we can write X = μ + BZ, where B is p × p, Z is a vector of i.i.d. N(0, 1) random variables, BB′ = Σ, and Z = B⁻¹(X − μ). With independent standard normal components, the joint pdf of Z = (Z1, Z2, …, Zp)′ is f(z) = (2π)^(−p/2) exp(−½ Σ z_i²) = (2π)^(−p/2) exp(−½ z′z). (Usually we write small letters for the value of the random variable; here, for the sake of convenience, I am using capitals for the random vector itself.) Substituting z = B⁻¹(x − μ), the exponent becomes −½ (x − μ)′ (B⁻¹)′ B⁻¹ (x − μ). Now, since BB′ = Σ, we have Σ⁻¹ = (BB′)⁻¹ = (B′)⁻¹ B⁻¹ = (B⁻¹)′ B⁻¹.
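As an aside before finishing the derivation: the representation X = μ + BZ with BB′ = Σ is exactly how multivariate normal samples are generated in practice, with B taken as the Cholesky factor of Σ. A small sketch of mine (numpy assumed; μ and Σ are made up):

```python
import numpy as np

mu = np.array([1.0, -1.0, 2.0])
Sigma = np.array([[4.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])

# Gram decomposition Sigma = B B' via the Cholesky factor (full-rank case).
B = np.linalg.cholesky(Sigma)
assert np.allclose(B @ B.T, Sigma)

# X = mu + B Z, with Z a vector of i.i.d. N(0, 1) variables.
rng = np.random.default_rng(2)
Z = rng.standard_normal((3, 200_000))
X = mu[:, None] + B @ Z

print(np.allclose(X.mean(axis=1), mu, atol=0.05))  # True
print(np.allclose(np.cov(X), Sigma, atol=0.1))     # True
```

The sample mean and sample covariance recover μ and Σ up to simulation error, confirming that X = μ + BZ indeed has the intended first two moments.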
Using this, the exponent is simply −½ (x − μ)′ Σ⁻¹ (x − μ). To obtain the density of X from the density of Z we must calculate the Jacobian of the transformation z = B⁻¹(x − μ), which is |det(B⁻¹)| = |det(B)|⁻¹ = det(Σ)^(−1/2). So the pdf of X is f(x) = (2π)^(−p/2) det(Σ)^(−1/2) exp(−½ (x − μ)′ Σ⁻¹ (x − μ)), where x ∈ R^p, μ ∈ R^p, and Σ is a p × p positive definite matrix. When rank(Σ) = k < p, the multivariate normal distribution is called a singular distribution, and the density is defined only on a subspace. Suppose B, of order p × k, is a matrix of orthonormal column vectors spanning the column space of Σ, and N, of order p × (p − k), has rank p − k with N′Σ = 0. For U following the singular N_p(μ, Σ), consider the transformation U → (X′, Z′)′, where X = B′U and Z = N′U. Then E(Z) = N′μ and D(Z) = N′ΣN = 0, so Z = N′μ with probability 1 — that is the degeneracy — while E(X) = B′μ and D(X) = B′ΣB, so X follows N_k(B′μ, B′ΣB). Moreover, det(B′ΣB) equals the product of the non-zero eigenvalues of Σ, so B′ΣB is non-singular.
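Returning to the full-rank density for a moment: the formula can be checked directly against a library implementation. In this sketch of mine I code the formula by hand and compare it with scipy.stats.multivariate_normal (scipy assumed available); the particular μ, Σ, and evaluation point are made up.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mvn_pdf(x, mu, Sigma):
    """(2 pi)^(-p/2) det(Sigma)^(-1/2) exp(-(x - mu)' Sigma^{-1} (x - mu) / 2)."""
    p = len(mu)
    d = x - mu
    quad = d @ np.linalg.solve(Sigma, d)  # (x - mu)' Sigma^{-1} (x - mu)
    return float(np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** p * np.linalg.det(Sigma)))

mu = np.array([4.0, 3.0])
Sigma = np.array([[3.0, 1.0], [1.0, 2.0]])
x = np.array([3.0, 2.0])

print(np.isclose(mvn_pdf(x, mu, Sigma),
                 multivariate_normal(mean=mu, cov=Sigma).pdf(x)))  # True
```

Using `np.linalg.solve` instead of forming Σ⁻¹ explicitly is the numerically preferred way to evaluate the quadratic form.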
So X has density f(x) = (2π)^(−k/2) det(B′ΣB)^(−1/2) exp(−½ (x − B′μ)′ (B′ΣB)⁻¹ (x − B′μ)). The degenerate part and this density together describe the singular distribution. If you substitute x = B′u, then (x − B′μ)′(B′ΣB)⁻¹(x − B′μ) = (u − μ)′ B(B′ΣB)⁻¹B′ (u − μ) = (u − μ)′ Σ⁻ (u − μ) for some choice of generalized inverse Σ⁻ of Σ. So the density is (2π)^(−k/2) (λ1 λ2 ⋯ λk)^(−1/2) exp(−½ (u − μ)′ Σ⁻ (u − μ)), where λ1, …, λk are the non-zero eigenvalues of Σ. It is a density on a subspace, not on the full space, when Σ is not of full rank. Now, before going to estimation, let us consider one or two applications of the conditional distributions, linear combinations, et cetera. Here is an example of a multivariate normal distribution. Take μ = (4, 3, 2, 1)′ and Σ = [[3, 0, 2, 2], [0, 1, 1, 0], [2, 1, 9, −2], [2, 0, −2, 4]], and let X follow N_4(μ, Σ). Partition X = (x1, x2, x3, x4)′ as X = (X1′, X2′)′, where X1 = (x1, x2)′ and X2 = (x3, x4)′ are each 2-dimensional. Let us find the conditional distribution of X2 given X1 = (3, 2)′. We have already discussed the conditional distribution of one component given the other: it is N_2 with mean μ2 + Σ21 Σ11⁻¹ (x1 − μ1), where μ = (μ1′, μ2′)′ and Σ is partitioned as [[Σ11, Σ12], [Σ21, Σ22]]. Here Σ11 = [[3, 0], [0, 1]], Σ12 = [[2, 2], [1, 0]], Σ21 = [[2, 1], [2, 0]], and Σ22 = [[9, −2], [−2, 4]]. So let us calculate these terms.
The conditional mean is μ2 + Σ21 Σ11⁻¹ (x1 − μ1), where Σ21 = [[2, 1], [2, 0]], Σ11⁻¹ = [[1/3, 0], [0, 1]], and x1 − μ1 = (3, 2)′ − (4, 3)′ = (−1, −1)′; this gives (2, 1)′ + (−5/3, −2/3)′ = (1/3, 1/3)′. The conditional dispersion is Σ22 − Σ21 Σ11⁻¹ Σ12 = [[9, −2], [−2, 4]] − [[2, 1], [2, 0]] [[1/3, 0], [0, 1]] [[2, 2], [1, 0]] = [[20/3, −10/3], [−10/3, 8/3]]. So X2 | X1 = (3, 2)′ follows N_2((1/3, 1/3)′, [[20/3, −10/3], [−10/3, 8/3]]). So I am able to obtain the conditional distribution of X2 given a particular value of X1, and it is quite interesting: the unconditional mean of X2 was (2, 1)′, but given X1 = (3, 2)′ — a value below the mean (4, 3)′ of X1 — the mean of X2 drops to (1/3, 1/3)′, and there is likewise a substantial change in the variance–covariance terms. Staying with the same example, let a = (1, 2) and b = [[1, −2], [2, −1]]. What is the distribution of aX1? Since a is a row vector, aX1 is a scalar, and it is normal with mean aμ1 = 10 and variance aΣ11a′ = 7. Similarly, bX2 is a 2-dimensional vector following N_2(bμ2, bΣ22b′); calculating, bμ2 = (0, 3)′ and bΣ22b′ = [[33, 36], [36, 48]]. Finally, the covariance between aX1 and bX2 is aΣ12b′ = (0, 6).
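The arithmetic in this example is easy to verify mechanically; the sketch below (numpy assumed) recomputes the conditional moments and the distributions of aX1 and bX2 from the partitioned Σ.

```python
import numpy as np

mu = np.array([4.0, 3.0, 2.0, 1.0])
Sigma = np.array([[3.0, 0.0, 2.0, 2.0],
                  [0.0, 1.0, 1.0, 0.0],
                  [2.0, 1.0, 9.0, -2.0],
                  [2.0, 0.0, -2.0, 4.0]])

mu1, mu2 = mu[:2], mu[2:]
S11, S12 = Sigma[:2, :2], Sigma[:2, 2:]
S21, S22 = Sigma[2:, :2], Sigma[2:, 2:]

# Conditional distribution of X2 given X1 = (3, 2)'.
x1 = np.array([3.0, 2.0])
cond_mean = mu2 + S21 @ np.linalg.solve(S11, x1 - mu1)
cond_cov = S22 - S21 @ np.linalg.solve(S11, S12)
print(cond_mean * 3)  # conditional mean (1/3, 1/3)', scaled by 3
print(cond_cov * 3)   # 3 * [[20/3, -10/3], [-10/3, 8/3]]

# Linear combinations a X1 and b X2.
a = np.array([1.0, 2.0])
b = np.array([[1.0, -2.0], [2.0, -1.0]])
print(a @ mu1, a @ S11 @ a)  # mean 10.0, variance 7.0
print(b @ mu2)               # [0. 3.]
print(b @ S22 @ b.T)         # [[33. 36.] [36. 48.]]
print(a @ S12 @ b.T)         # [0. 6.]
```

Note the use of `np.linalg.solve(S11, ...)` rather than an explicit Σ11⁻¹, which is the standard way to apply the inverse of a partition.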
So in this example I have shown you a direct application of the distribution theory of the multivariate normal distribution. Let me give one more exercise. Consider Σ = [[4, 1, 2], [1, 9, −3], [2, −3, 25]], and let ρ be the correlation matrix of X — its entries are correlations between the components, not covariances. The exercise: find V^(1/2), the diagonal matrix whose diagonal entries are the standard deviations; find ρ; and show that V^(1/2) ρ V^(1/2) = Σ for this particular case. This kind of manipulation works because of the positive semi-definiteness of the variance–covariance matrix: Σ has a spectral decomposition, a Gram decomposition Σ = BB′, et cetera, so a lot of nice properties are available. Here V^(1/2) = diag(2, 3, 5), the standard deviations, and V^(−1/2) = diag(1/2, 1/3, 1/5), so ρ = V^(−1/2) Σ V^(−1/2) = [[1, 1/6, 1/5], [1/6, 1, −1/5], [1/5, −1/5, 1]]. Now, I mentioned earlier the uniqueness of the Σ21 Σ11⁻¹ term, and there is a practical problem connected with computing inverses: sometimes the inverse will not exist, or a tiny perturbation of the matrix changes the inverse drastically — an example of an unstable, ill-conditioned matrix. Let me give one such example. Take A = [[4, 4.001], [4.001, 4.002]] and B = [[4, 4.001], [4.001, 4.002001]].
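Before working through the instability example, the correlation exercise above can be checked numerically (a sketch of mine, numpy assumed):

```python
import numpy as np

Sigma = np.array([[4.0, 1.0, 2.0],
                  [1.0, 9.0, -3.0],
                  [2.0, -3.0, 25.0]])

# V^{1/2}: diagonal matrix of standard deviations, here diag(2, 3, 5).
sd = np.sqrt(np.diag(Sigma))
V_half = np.diag(sd)
V_inv_half = np.diag(1.0 / sd)

rho = V_inv_half @ Sigma @ V_inv_half
print(rho[0, 1], rho[0, 2], rho[1, 2])  # 1/6, 1/5, -1/5

# The claim of the exercise: V^{1/2} rho V^{1/2} recovers Sigma.
print(np.allclose(V_half @ rho @ V_half, Sigma))  # True
```

The same two lines work for any positive definite Σ, since the standard deviations on the diagonal are always strictly positive.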
You can see that three entries of A and B are exactly the same; the fourth I have modified only by adding 0.000001. Now look at the inverses: A⁻¹ = −10⁶ [[4.002, −4.001], [−4.001, 4]], while B⁻¹ = (10⁶/3) [[4.002001, −4.001], [−4.001, 4]]. So there is a dramatic change: the determinant of A is −10⁻⁶, whereas the determinant of B is 3 × 10⁻⁶, and consequently A⁻¹ ≈ −3 B⁻¹. If you look at A and B there is hardly any difference — three entries are identical and the fourth changes only in the sixth decimal place — yet the inverses differ substantially. This is an example of an unstable system. The reason is that the columns of A are almost linearly dependent, so a small change in the value of one entry makes a huge change in the inverse. In the next class I will be talking about the estimation of the parameters of a multivariate normal distribution. We will also discuss the non-central chi-square distribution, because that concept arises here and will be used in finding the distributions of the relevant statistics. That I will take up in the next class.
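As a closing footnote, the A and B example above is easy to reproduce (a sketch of mine, numpy assumed):

```python
import numpy as np

A = np.array([[4.0, 4.001], [4.001, 4.002]])
B = np.array([[4.0, 4.001], [4.001, 4.002001]])  # one entry perturbed by 1e-6

# The determinants flip sign and change by a factor of 3.
print(np.isclose(np.linalg.det(A), -1e-6))  # True
print(np.isclose(np.linalg.det(B), 3e-6))   # True

A_inv = np.linalg.inv(A)
B_inv = np.linalg.inv(B)
# A 1e-6 change in one entry rescales and flips the whole inverse.
print(np.allclose(A_inv, -3.0 * B_inv, rtol=1e-3))  # True

# The condition number quantifies how unstable inversion of A is.
print(np.linalg.cond(A))  # of the order of 10^7
```

The large condition number is the practical warning sign: it says that relative errors in the entries of A can be amplified by a factor of about 10⁷ in A⁻¹.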