Yesterday, we introduced the multivariate normal distribution. This is a p-dimensional distribution, and let me recall the definition. The definition was in terms of linear combinations: we say that a random vector X has a p-variate normal distribution if every linear combination of its components has a univariate normal distribution, and the notation was X ~ N_p. Now, as a consequence of this definition we proved certain properties. For example, we showed that if X has a multivariate normal distribution then its mean vector μ and variance-covariance matrix Σ exist, and therefore we refined our notation to X ~ N_p(μ, Σ). That means, when I state that X has a multivariate normal distribution, at the same time we have its mean vector μ, a p-dimensional vector in R^p, and Σ, a p × p matrix. The nature of this matrix is that it is real symmetric, but at the same time, because it is a variance-covariance matrix, it is also positive semi-definite. We actually showed this through the definition of positive semi-definiteness: we considered a′Σa and showed that it is non-negative. We were also able to find the characteristic function, which is of the form exp(i t′μ − (1/2) t′Σt). Using this we derived the distribution of linear combinations, and we proved the independence result: if Σ is a diagonal matrix, then the components are independent. We also considered a decomposition of the full random vector into sub-vectors and proved results about the components. Finally, we proved that given μ and Σ we can always construct a random vector whose distribution is N_p with mean vector μ and variance-covariance matrix Σ. For this we applied a decomposition approach: a real symmetric matrix Σ can be decomposed as Σ = Γ D Γ′, where Γ is orthogonal and D is a diagonal matrix consisting of the eigenvalues of Σ, and positive semi-definiteness guarantees that those eigenvalues are non-negative.

Now let me start with an example which shows this decomposition. Let X ~ N₂(μ, Σ) with mean vector μ = (2, 1)′ and Σ = [3 1; 1 3]. This Σ is in fact positive definite, so this is a bivariate normal distribution. Let us find the eigenvalues of Σ. We apply the standard procedure: set the determinant of [3 − λ, 1; 1, 3 − λ] equal to 0. This gives (3 − λ)² − 1 = 0, that is, λ² − 6λ + 8 = 0, so (λ − 2)(λ − 4) = 0. This implies λ₁ = 2 and λ₂ = 4; these are the eigenvalues of Σ.
Next let us find the corresponding eigenvectors. For λ₁ = 2 we solve (Σ − 2I)x = 0, that is, [3 − 2, 1; 1, 3 − 2]x = 0. Both equations read x₁ + x₂ = 0, which implies x₁ = −x₂, so after normalizing (dividing by √2) we get the eigenvector ν₁ = (1, −1)′/√2. Similarly, for λ₂ = 4 we get −x₁ + x₂ = 0 and x₁ − x₂ = 0, that is, x₁ = x₂, and the normalized eigenvector is ν₂ = (1, 1)′/√2. Based on this we can take P = (1/√2)[1 1; −1 1], with the eigenvectors as columns, and we can check that Σ = P D P′, where D = diag(2, 4) is the diagonal matrix consisting of the eigenvalues. Taking square roots of the diagonal entries gives D^(1/2) = diag(√2, 2). This is the way the calculation of the B matrix was done for the existence proof I showed yesterday: we considered the decomposition Σ = P D^(1/2) D^(1/2) P′, which I called B₀B₀′ with B₀ = P D^(1/2). So here B₀ = (1/√2)[1 1; −1 1] diag(√2, 2) = (1/√2)[√2 2; −√2 2] = [1 √2; −1 √2], and one can verify directly that B₀B₀′ = Σ. That means Σ can be written as B₀B₀′, and using this, if I write the columns of B₀ as B₁ = (1, −1)′ and B₂ = (√2, √2)′, then I form B₁Z₁ + B₂Z₂, where Z₁ and Z₂ are independent N(0, 1) random variables.

So, what I have shown here is this: given a mean vector and a variance-covariance matrix, which is positive definite here (in fact it can be positive semi-definite also), I find the eigenvalues, which are 2 and 4 respectively; corresponding to those I find the eigenvectors, (1, −1)′ multiplied by a constant and (1, 1)′ multiplied by a constant, and I take the normalized ones, so that the matrix P of these eigenvectors is an orthogonal matrix. The decomposition of Σ is then P D P′, where P is given above and D is the diagonal matrix with the eigenvalues on the diagonal. Based on this I define D^(1/2) by taking square roots of the diagonal entries, set B₀ = P D^(1/2), and with the columns B₁, B₂ of B₀ and independent standard normal random variables Z₁, Z₂, the combination B₁Z₁ + B₂Z₂ is a random vector with the required covariance structure. A short numerical check of this construction is sketched below.
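Here is a minimal numerical sketch of this decomposition in NumPy (the variable names are my own, not from the lecture); it verifies Σ = PDP′ and B₀B₀′ = Σ, and simulates X = μ + B₀Z:

```python
import numpy as np

# Covariance matrix and mean vector from the example
Sigma = np.array([[3.0, 1.0],
                  [1.0, 3.0]])
mu = np.array([2.0, 1.0])

# Spectral decomposition: eigh returns eigenvalues in ascending order and
# orthonormal eigenvectors as columns of P (possibly with flipped signs,
# which does not affect B0 B0')
eigvals, P = np.linalg.eigh(Sigma)          # eigvals = [2., 4.]
D_half = np.diag(np.sqrt(eigvals))          # D^(1/2) = diag(sqrt(2), 2)
B0 = P @ D_half

print(np.allclose(P @ np.diag(eigvals) @ P.T, Sigma))  # True: Sigma = P D P'
print(np.allclose(B0 @ B0.T, Sigma))                   # True: Sigma = B0 B0'

# Construct X = mu + B0 Z from independent standard normals
rng = np.random.default_rng(0)
Z = rng.standard_normal((2, 100_000))
X = mu[:, None] + B0 @ Z

print(X.mean(axis=1))   # approximately (2, 1)
print(np.cov(X))        # approximately [[3, 1], [1, 3]]
```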
Adding the mean vector μ = (2, 1)′ to that combination and defining X = μ + B₁Z₁ + B₂Z₂, this X has the two-dimensional normal distribution with the given mean vector μ and the given variance-covariance matrix Σ. So, this is an application of the theorem that I proved yesterday: we can always construct a normal distribution with a given mean vector and variance-covariance matrix.

Let us proceed further to some more properties of the multivariate normal distribution. This result I state in the form of a theorem: let X follow a multivariate normal distribution with mean vector μ and variance-covariance matrix Σ, where the rank of Σ is m. This holds if and only if X can be written as X = μ + BZ, where BB′ = Σ, rank(B) = m, and Z is an m × 1 vector of independent standard normal random variables. For the necessity part, that a multivariate normal X admits such a representation, see the construction in the previous steps. For the converse, suppose X = μ + BZ and consider t′X = t′μ + t′BZ. Since Z consists of independent standard normals, t′BZ is univariate normal, so t′X ~ N₁(t′μ, t′BB′t) = N₁(t′μ, t′Σt). Every linear combination of the components of X is therefore univariate normal, and this implies that X ~ N_p(μ, Σ).

Since this is a necessary and sufficient condition for the multivariate normal distribution, we can give an alternative definition in terms of this characterization. If you remember, the original definition that I gave was in terms of linear combinations only: a random vector X is said to have a p-variate normal distribution if every linear combination of its components has a univariate normal distribution. But by the result we have just proved, we are now able to give an alternative definition: a p-dimensional random vector X is said to have a multivariate normal distribution if it can be written as X = μ + BZ, where B is a p × m matrix of rank m and Z is an m × 1 vector of independent standard normal random variables. This definition uses the representation that I have proved here. So, basically you can see that we think of X as a linear transformation of univariate standard normals, and from there we construct the distribution. This is a constructive definition; the previous definition was a sort of characterization. A minimal sketch of this constructive definition, including the rank-deficient case where Σ is singular, is given below.
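The following sketch (my own illustration, with made-up numbers) takes p = 3 and m = 2, so that B has rank 2 and Σ = BB′ is singular; X = μ + BZ is still trivariate normal by the constructive definition, even though it has no density in R³, and every linear combination t′X behaves as a univariate normal with the right moments:

```python
import numpy as np

rng = np.random.default_rng(4)

# p = 3, m = 2: B has full column rank 2, but Sigma = BB' is singular,
# so X lives on a 2-dimensional plane in R^3 yet is still N_3(mu, Sigma)
mu = np.array([1.0, 0.0, -1.0])
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
Sigma = B @ B.T
print(np.linalg.matrix_rank(Sigma))     # 2 < 3: no density in R^3

Z = rng.standard_normal((2, 100_000))
X = mu[:, None] + B @ Z
print(X.mean(axis=1))                   # approximately mu
print(np.cov(X))                        # approximately Sigma

# Every linear combination t'X is univariate normal with variance t' Sigma t
t = np.array([0.3, -1.2, 0.7])
print((t @ X).var(), t @ Sigma @ t)     # the two values agree closely
```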
Now let us use these results. Suppose X ~ N_p(μ, Σ), let C be a q × p matrix, and define Y = CX, so that Y is q × 1. Consider linear combinations of Y, say l′Y. Then l′Y = l′CX, which I can call l₁′X where l₁ = C′l. If I look at the dimensions: l′ is 1 × q, C is q × p, and X is p × 1; so l₁ = C′l has dimension p × 1, because C′ is p × q and l is q × 1. So l′Y, written as l₁′X, is a linear combination of the components of X. Now recall yesterday's result on the distribution of a linear combination: if X ~ N_p(μ, Σ) and t is any p-dimensional vector, then V = t′X has the univariate normal distribution N₁(t′μ, t′Σt). Applying this, l₁′X ~ N₁(l₁′μ, l₁′Σl₁), with l₁′Σl₁ as the variance term. Now substitute l₁ = C′l everywhere; what does it mean? It means l′Y ~ N₁(l′Cμ, l′CΣC′l). So, if we define ν = Cμ and Σ* = CΣC′, then every l′Y, with l a q × 1 vector, is univariate normal with mean l′ν and variance l′Σ*l. By the definition of the multivariate normal, this implies that Y follows a q-dimensional normal distribution with mean vector ν and variance-covariance matrix Σ*, that is, Y = CX ~ N_q(Cμ, CΣC′). Thus a collection of linear combinations of the components of a multinormal random vector again has a multinormal distribution, with the required number of components. A quick numerical check of this linear-transformation result is given below.
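As a small check of Y = CX ~ N_q(Cμ, CΣC′), here is a simulation sketch; the particular C, μ, and Σ are my own choices, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(5)

mu = np.array([2.0, 1.0, 0.0])
A = rng.standard_normal((3, 3))
Sigma = A @ A.T                      # an arbitrary positive definite Sigma
C = np.array([[1.0, -1.0, 0.0],      # q = 2, p = 3
              [0.5,  0.5, 1.0]])

X = rng.multivariate_normal(mu, Sigma, size=200_000).T   # p x N sample
Y = C @ X                                                # q x N

print(Y.mean(axis=1), C @ mu)        # empirical mean vs C mu
print(np.cov(Y))                     # empirical covariance ...
print(C @ Sigma @ C.T)               # ... vs C Sigma C'
```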
Next let us consider conditional distributions. If we recall, if (X, Y) follows a bivariate normal distribution, then the conditional distributions of X given Y and of Y given X are univariate normal; X given Y is univariate because it has one dimension. Now I can decompose a random vector into two parts, each of which may itself be a random vector. So, let X = (X₁′, X₂′)′ be a p × 1 vector, where X₁ has r components and X₂ has p − r components. Simultaneously I decompose μ as (μ₁′, μ₂′)′ into r and p − r components, and the variance-covariance matrix as Σ = [Σ₁₁ Σ₁₂; Σ₂₁ Σ₂₂], where Σ₁₁ is r × r, Σ₂₂ is (p − r) × (p − r), and Σ₁₂ = Σ₂₁′ is r × (p − r). So basically X₁ ~ N_r(μ₁, Σ₁₁) and X₂ ~ N_{p−r}(μ₂, Σ₂₂). Now we want the conditional distribution of X₂ given X₁, and similarly of X₁ given X₂. For this we will need certain inverses, so let us first prove a result about the column spaces of Σ₁₂ and Σ₁₁.

Assume rank(Σ) = s. Then there exists a p × s matrix C of rank s such that CC′ = Σ; this existence I have shown you earlier in the previous discussion. Decompose C as C = (C₁′, C₂′)′, where C₁ is r × s and C₂ is (p − r) × s. With this decomposition, writing out CC′ blockwise gives C₁C₁′ = Σ₁₁ and C₂C₁′ = Σ₂₁. Now let y be a vector orthogonal to the columns of Σ₁₁, that is, Σ₁₁y = 0. This means C₁C₁′y = 0. Pre-multiplying by y′ gives y′C₁C₁′y = 0, which implies C₁′y = 0. This in turn implies C₂C₁′y = 0, that is, Σ₂₁y = 0. So y is orthogonal to the rows of Σ₂₁ (note that for Σ₁₁ itself, rows and columns are the same, because Σ₁₁ is symmetric). But the rows of Σ₂₁ are the columns of Σ₁₂, since Σ₂₁ is the transpose of Σ₁₂. So every vector orthogonal to the columns of Σ₁₁ is orthogonal to the columns of Σ₁₂, which implies that the column space of Σ₁₂ is a subspace of the column space of Σ₁₁. Hence there exists a matrix B such that Σ₂₁ = BΣ₁₁. So, if I take a generalized inverse of Σ₁₁, denoted Σ₁₁⁻, then Σ₂₁Σ₁₁⁻Σ₁₁ = BΣ₁₁Σ₁₁⁻Σ₁₁ = BΣ₁₁ = Σ₂₁. So, although a g-inverse is not unique, this product is unique, and it is this product that will be utilized in the derivation of the conditional distribution.

Let us define Z₁ = X₁ and Z₂ = X₂ − Σ₂₁Σ₁₁⁻X₁. Since X is multivariate normal and X₁, X₂ are its components, these are linear combinations, so Z₁ and Z₂ are also multivariate normal; Z₁ is r-dimensional and Z₂ has the dimension of X₂, namely p − r. So if I stack Z = (Z₁′, Z₂′)′, it is p-dimensional, and from the definition of N_p it can easily be shown that Z follows a p-variate normal distribution. Now consider the covariance matrix between Z₁ and Z₂: Cov(Z₁, Z₂) = Cov(X₁, X₂ − Σ₂₁Σ₁₁⁻X₁) = Σ₁₂ − Σ₁₁(Σ₁₁⁻)′Σ₁₂. Taking transposes in the identity Σ₂₁Σ₁₁⁻Σ₁₁ = Σ₂₁ gives Σ₁₁(Σ₁₁⁻)′Σ₁₂ = Σ₁₂, so this covariance is the null matrix. So Z₁ and Z₂ are statistically independent, and therefore the distribution of Z₂ and the conditional distribution of Z₂ given Z₁ are the same: if Z₁ and Z₂ are independent, conditioning on Z₁ has no effect. The distribution of Z₂ itself comes from the linear combination (−Σ₂₁Σ₁₁⁻, I)(X₁′, X₂′)′ of X₁ and X₂. A numerical illustration of the g-inverse invariance used here is sketched below.
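The invariance can be checked numerically. Below is a small sketch (my own construction, not from the lecture) with a singular Σ₁₁, using the Moore-Penrose pseudoinverse as one particular choice of g-inverse:

```python
import numpy as np

# Build a positive semi-definite Sigma = C C' whose top-left block
# Sigma_11 is singular, so an ordinary inverse does not exist
C = np.array([[1.0, 0.0],
              [1.0, 0.0],    # first two rows equal -> Sigma_11 singular
              [0.5, 1.0]])
Sigma = C @ C.T
S11, S12 = Sigma[:2, :2], Sigma[:2, 2:]
S21 = Sigma[2:, :2]

# The Moore-Penrose pseudoinverse is one particular g-inverse
S11_g = np.linalg.pinv(S11)

# Sigma_21 Sigma_11^- Sigma_11 = Sigma_21: the product is invariant
print(np.allclose(S21 @ S11_g @ S11, S21))        # True

# Cov(Z1, Z2) = Sigma_12 - Sigma_11 Sigma_11^- Sigma_12 = 0
print(np.allclose(S11 @ S11_g @ S12, S12))        # True
```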
Computing from the linear combination (−Σ₂₁Σ₁₁⁻, I)(X₁′, X₂′)′, the mean vector of Z₂ is straightforwardly μ₂ − Σ₂₁Σ₁₁⁻μ₁. For the dispersion matrix, let us derive it: D(Z₂) = D(X₂ − Σ₂₁Σ₁₁⁻X₁) = Σ₂₂ − 2 Σ₂₁Σ₁₁⁻Σ₁₂ + Σ₂₁Σ₁₁⁻Σ₁₁Σ₁₁⁻Σ₁₂, where the cross term appears twice. Now look at the last term: Σ₂₁Σ₁₁⁻Σ₁₁ is again Σ₂₁, so the last term equals Σ₂₁Σ₁₁⁻Σ₁₂ and cancels one of the two cross terms. We are left with D(Z₂) = Σ₂₂ − Σ₂₁Σ₁₁⁻Σ₁₂. So we are saying that Z₂ given Z₁ follows N_{p−r}(μ₂ − Σ₂₁Σ₁₁⁻μ₁, Σ₂₂ − Σ₂₁Σ₁₁⁻Σ₁₂). Now we substitute Z₂ in terms of X₂: bringing the term Σ₂₁Σ₁₁⁻X₁ to the other side, X₂ given X₁ = x₁ follows N_{p−r}(μ₂ + Σ₂₁Σ₁₁⁻(x₁ − μ₁), Σ₂₂ − Σ₂₁Σ₁₁⁻Σ₁₂); the variance term does not change.

Now, the fact that we have used, that the column space of Σ₁₂ is a subspace of the column space of Σ₁₁, follows from the decomposition CC′ = Σ, which is available for a positive definite matrix and remains true in the positive semi-definite case. If Σ were not positive semi-definite, this decomposition would not be available to us, and the statement about column spaces need not be true. So we are in fact making full use of the positive semi-definiteness of the variance-covariance matrix. Using this property we were able to write Σ₂₁ as BΣ₁₁, and due to that the product involving Σ₁₁⁻ has a unique value. Why was this required? Because it appears in the ultimate expressions for the mean and variance-covariance matrix of the conditional distribution: although the g-inverse has many representations, the terms we get by this calculation are unique. As a remark, let me write: for an arbitrary symmetric matrix P = [P₁₁ P₁₂; P₂₁ P₂₂], it is not always true that the column space of P₁₂ is a subspace of the column space of P₁₁; this holds only under the assumption that P is non-negative definite. Since Σ is a dispersion matrix, this fact holds here. A numerical sketch of the conditional-distribution formula is given below.
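Here is a minimal simulation sketch of this formula for the bivariate case r = 1 (so Σ₁₁ is a scalar and its g-inverse is just 1/σ₁₁), reusing μ and Σ from the earlier example; the selection-window width 0.05 is an arbitrary choice of mine:

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([2.0, 1.0])
Sigma = np.array([[3.0, 1.0],
                  [1.0, 3.0]])
s11, s12, s21, s22 = Sigma[0, 0], Sigma[0, 1], Sigma[1, 0], Sigma[1, 1]

# Conditional distribution of X2 given X1 = x1:
# mean mu2 + s21 s11^{-1} (x1 - mu1), variance s22 - s21 s11^{-1} s12
x1 = 3.0
cond_mean = mu[1] + s21 / s11 * (x1 - mu[0])   # 1 + (1/3)(3 - 2) = 4/3
cond_var = s22 - s21 * s12 / s11               # 3 - 1/3 = 8/3

# Monte Carlo check: keep only the draws whose X1 falls near x1
X = rng.multivariate_normal(mu, Sigma, size=500_000)
sel = np.abs(X[:, 0] - x1) < 0.05
print(cond_mean, X[sel, 1].mean())             # both close to 1.333
print(cond_var, X[sel, 1].var())               # both close to 2.667
```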
Next we prove the reproductive property of the multivariate normal distribution. If we remember, for independent univariate normal random variables, linear combinations again have a univariate normal distribution, with the means and variances combining accordingly. This type of property generalizes to the multivariate normal distribution; basically it is a linearity property. So, let X₁, X₂, …, X_n be independent multivariate normal vectors with X_i ~ N_p(μ_i, Σ_i) for i = 1, …, n. For constants a₁, a₂, …, a_n, not all 0, define U = Σ a_iX_i. Then U follows a multivariate normal distribution with mean vector Σ a_iμ_i and variance-covariance matrix Σ a_i²Σ_i. You can see this is a straightforward generalization of the result available for the univariate normal distribution: there the μ_i were scalars and the σ_i² were the variance terms, whereas here Σ_i has become a matrix. The proof is based on the definition, using linear functions. If I write L′U, then L′U = Σ a_iL′X_i. Define Y_i = L′X_i; then Y_i ~ N₁(L′μ_i, L′Σ_iL) for i = 1, …, n, and Y₁, Y₂, …, Y_n are statistically independent. This implies that Σ a_iY_i ~ N₁(Σ a_iL′μ_i, Σ a_i²L′Σ_iL), which we can write as N₁(L′(Σ a_iμ_i), L′(Σ a_i²Σ_i)L). And Σ a_iY_i is nothing but L′U. So, by the definition of the multivariate normal distribution, U ~ N_p(Σ a_iμ_i, Σ a_i²Σ_i).

Now we can consider sampling. Suppose X₁, X₂, …, X_n is a random sample from the N_p(μ, Σ) distribution, and consider the sample mean vector X̄ = (1/n) Σ X_i, i = 1 to n. Taking a_i = 1/n in the result above, X̄ ~ N_p(μ, (1/n)Σ). So we are able to obtain the distribution of the sample mean in sampling from a multivariate normal distribution.

There are some other results related to the multivariate normal distribution, especially useful for deriving the distributions of associated statistics. For example, if you remember, in the univariate case the sum of squares of independent standard normal random variables has a chi-square distribution. Similarly, quadratic forms related to the multinormal distribution also have chi-square distributions under certain conditions. So we have some results which I will just mention here. For example, let P be an idempotent matrix; then rank(P) = trace(P), and also rank(P) + rank(I − P) equals the dimension: P here is n × n, so rank(P) + rank(I − P) = n. Let us consider a simple illustration of this. Write P = B₁C₁, a rank factorization of the n × n matrix P, where B₁ is n × r, C₁ is r × n, and rank(B₁) = rank(C₁) = rank(P) = r. Let L be the left inverse of B₁ and R the right inverse of C₁. Consider L(B₁C₁)(B₁C₁)R: by idempotence this equals L(B₁C₁)R = I_r, while multiplying out directly gives (LB₁)(C₁B₁)(C₁R) = C₁B₁. So C₁B₁ = I_r, which implies r = rank(P) = trace(I_r) = trace(C₁B₁) = trace(B₁C₁) = trace(P). Now recall the definition of an idempotent matrix, P² = P. Then (I − P)² = I − 2P + P² = I − P, so I − P is also idempotent, which means rank(I − P) = trace(I − P). Hence rank(P) + rank(I − P) = trace(P) + trace(I − P) = trace(I) = n. A quick numerical check of this rank-trace identity is sketched below.
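As a minimal check (my own example, not from the lecture), take the projection (hat) matrix H = X(X′X)⁻¹X′, a standard idempotent matrix of rank r:

```python
import numpy as np

rng = np.random.default_rng(2)

# A standard example of an idempotent matrix: the projection (hat)
# matrix H = X (X'X)^{-1} X' onto the column space of X
n, r = 10, 3
Xmat = rng.standard_normal((n, r))
H = Xmat @ np.linalg.inv(Xmat.T @ Xmat) @ Xmat.T

print(np.allclose(H @ H, H))                       # idempotent: True
print(np.trace(H))                                 # 3.0 = rank(H)
print(np.linalg.matrix_rank(H))                    # 3
print(np.linalg.matrix_rank(np.eye(n) - H))        # n - r = 7
```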
This rank-trace result is useful in proving certain properties. The most important result in this direction is known as the Fisher-Cochran theorem; let me give the theorem in its full form. Let y_i ~ N(μ_i, 1), i = 1, …, n, be independent, write y for the vector (y₁, y₂, …, y_n)′, and define the quadratic forms Q_i = y′A_iy, where each A_i is assumed to be a real symmetric matrix with rank(A_i) = rank(Q_i) = n_i. (The rank of a quadratic form is by definition the rank of the matrix appearing in it; otherwise it has no significance as such.) Suppose y′y = Q₁ + Q₂ + … + Q_k. Then a necessary and sufficient condition that each Q_i follows χ²_{n_i}(λ_i) and that the Q_i are independent is that n = Σ n_i, i = 1 to k; in that case λ_i = (1/2) μ′A_iμ, where μ is the expectation of y, and Σ λ_i = (1/2) μ′μ. Here χ²_{n_i}(λ_i) is the non-central chi-square distribution. I will spend some time on the discussion of the non-central chi-square distribution also, because whatever chi-square distributions we have done so far are central chi-square distributions. If y_i is normal with mean μ_i, then (y_i − μ_i)² is chi-square with 1 degree of freedom, but y_i² itself has a non-central chi-square distribution with 1 degree of freedom and non-centrality parameter μ_i²/2; that is the quantity appearing here.

As a corollary of this you also have the following result. Let y₁, y₂, …, y_n be independent and identically distributed standard normal random variables. Then a necessary and sufficient condition for y′Ay ~ χ²_k is that A is idempotent and k = trace(A) = rank(A). A quick simulation check of this corollary is sketched at the end of this section. I will follow up this theorem with some further results on the connection between the multivariate normal distribution and the chi-square distribution, and I will also introduce the non-central chi-square distribution, since we have used it in this discussion. We also have some further characterizing properties of the multivariate normal distribution, so I may briefly describe those in the next lecture.
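Finally, the promised check of the corollary (a simulation sketch with my own choices of n and k, again using a projection matrix as the idempotent A):

```python
import numpy as np

rng = np.random.default_rng(3)

# Corollary check: for iid standard normal y and idempotent A of
# rank k, the quadratic form y'Ay should be chi-square with k df
n, k = 8, 3
B = rng.standard_normal((n, k))
A = B @ np.linalg.inv(B.T @ B) @ B.T        # idempotent, rank k

Y = rng.standard_normal((200_000, n))
Q = np.einsum('ij,jk,ik->i', Y, A, Y)       # y'Ay for each draw

print(Q.mean(), Q.var())                    # approx k = 3 and 2k = 6
```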