Today I am going to outline the topic of jointly Gaussian random variables. Because I have only about an hour and this is a very important and fairly detailed topic, I will skip proofs; I will only give you an introduction, and you can look up more material if you want details.

You know the one-dimensional Gaussian: its pdf is

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-(x-\mu)^2 / (2\sigma^2)},$$

a Gaussian with mean $\mu$ and variance $\sigma^2$. Today we want to discuss jointly Gaussian random variables, or Gaussian random vectors: we want to talk about a vector $(X_1, \ldots, X_n)$ of random variables, and what it means for such a vector to be a Gaussian random vector, or for $X_1, \ldots, X_n$ to be jointly Gaussian.

One thing I want to say right away: a random vector is not said to be a Gaussian random vector just because each of its entries is Gaussian. If I tell you that $X_1$ has a Gaussian marginal density, $X_2$ has a Gaussian marginal density, and so on, that is not enough to make it a Gaussian random vector; a Gaussian random vector is much more than each entry being marginally Gaussian. In a jointly Gaussian vector each entry will indeed have a Gaussian marginal, but that is not sufficient, and that is what we are going to talk about today.

Let us first talk about the bivariate Gaussian in two dimensions and then generalize to the multivariate Gaussian. We say that $X$ and $Y$, random variables living on the same probability space $(\Omega, \mathcal{F}, P)$, are standard bivariate Gaussian if the joint density is

$$f_{X,Y}(x,y) = \frac{1}{2\pi\sqrt{1-\rho^2}} \exp\left( -\frac{x^2 - 2\rho x y + y^2}{2(1-\rho^2)} \right),$$

where $\rho$ is a parameter that lies in $(-1, 1)$.

Here is a proposition I will not prove. First of all, you have to verify that this is indeed a valid pdf by showing it integrates to 1; the standard trick is to complete squares in the exponent. Then: (a) $X$ and $Y$ are each marginally distributed as $N(0,1)$; (b) the correlation coefficient between $X$ and $Y$ is equal to $\rho$; (c) the conditional density of $X$ given $Y = y$ is normal with mean $\rho y$ and variance $1 - \rho^2$.

So if you integrate one variable out, say $Y$, you get a standard Gaussian in $X$; and since the density is symmetric in $x$ and $y$, integrating $X$ out gives a standard Gaussian in $Y$. So $X$ and $Y$ are standard Gaussians marginally, but they are not independent.
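If you want to see these properties concretely, here is a minimal Monte Carlo sanity check, a sketch in Python/NumPy (the lecture suggests MATLAB for plotting; nothing here depends on the language). It uses the fact from property (c), applied with the roles of $X$ and $Y$ swapped, that drawing $X \sim N(0,1)$ and then $Y = \rho X + \sqrt{1-\rho^2}\,Z$ with independent $Z \sim N(0,1)$ reproduces the standard bivariate Gaussian.

```python
import numpy as np

rng = np.random.default_rng(0)
rho, n = 0.6, 200_000

# Given X = x, take Y ~ N(rho*x, 1 - rho^2); this two-step construction
# reproduces the standard bivariate Gaussian with parameter rho.
x = rng.standard_normal(n)
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)

print(x.mean(), x.var())          # ~ 0, ~ 1   (marginal N(0,1))
print(y.mean(), y.var())          # ~ 0, ~ 1   (marginal N(0,1))
print(np.corrcoef(x, y)[0, 1])    # ~ 0.6      (correlation coefficient = rho)
```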
If they were independent, the joint density would simply be the product of the two Gaussian densities; here it is a more complicated form. In fact, you can show that this number $\rho$ really is the correlation coefficient between $X$ and $Y$: explicitly compute $E[XY]$ by multiplying the density by $xy$ and integrating it out, and you get $\rho$, which for standard Gaussians is exactly the correlation coefficient. And the conditional density of $X$ given $Y$ you can compute by just dividing the joint density by the marginal density of $Y$, which is a standard Gaussian; from there you conclude that the conditional density is also Gaussian, with mean $\rho y$ and variance $1 - \rho^2$. These you can compute directly from the expression. So that is the standard bivariate Gaussian.

One remarkable fact about the bivariate Gaussian, and in fact about all multivariate Gaussians, is this: if $X$ and $Y$ are uncorrelated, then the correlation coefficient $\rho$ is 0, and if you put $\rho = 0$ in the density you get

$$\frac{1}{2\pi}\, e^{-(x^2+y^2)/2},$$

which is exactly the product of the marginals. So the moment you put $\rho = 0$, it immediately implies that the random variables are independent. Normally we know that independence implies the random variables are uncorrelated, but the converse is certainly not true: uncorrelated random variables are not necessarily independent. However, in the jointly Gaussian world, uncorrelated jointly Gaussian random variables are independent. That is one remarkable property that comes out right from here.

Another interesting thing you can figure out from the density is the conditional expectation of $X$ given $Y$. From the conditional density, $E[X \mid Y = y]$ is just the mean of that Gaussian, which is $\rho y$; so the conditional expectation $E[X \mid Y]$ is $\rho Y$. This means that if $X$ and $Y$ are jointly Gaussian, then $E[X \mid Y]$ is a linear function of $Y$. In signal processing language, this conditional expectation is the MMSE estimate, and it is a linear function of the observation ($Y$ is what you observe): the linear estimate is optimal in the jointly Gaussian world. This is something signal processing people use a lot.

That is just two dimensions. If you plot the density itself over the $(x, y)$ plane, you get a three-dimensional surface. If $\rho = 0$, so that they are independent, you get a nice bell curve that is symmetric in all directions; just plot it in MATLAB, it is very nice to see.
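As a quick illustration of the linear-MMSE fact, here is a hedged sketch (Python/NumPy; the sample sizes and the binning tolerance are my own choices): it estimates $E[X \mid Y \approx y_0]$ by averaging $X$ over samples whose $Y$ falls near $y_0$, and compares with $\rho y_0$.

```python
import numpy as np

rng = np.random.default_rng(1)
rho, n = 0.6, 500_000
y = rng.standard_normal(n)
x = rho * y + np.sqrt(1 - rho**2) * rng.standard_normal(n)  # X | Y=y ~ N(rho*y, 1-rho^2)

# Estimate E[X | Y near y0] by averaging x over samples with y close to y0.
for y0 in (-1.0, 0.5, 2.0):
    near = np.abs(y - y0) < 0.05
    print(y0, x[near].mean(), rho * y0)   # empirical conditional mean vs. rho * y0
```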
It will be symmetric in all directions: no matter which way you cut it, the slice looks like an $N(0,1)$ bell. But if $\rho$ is not 0, the distribution is skewed one way. If you plot this distribution for some $\rho \neq 0$, slice the pdf at various levels, and plot its level sets, as they are called, here is what you get. The densities are all zero mean, so if $\rho = 0$ you of course get concentric circles, which means the Gaussian is symmetric in all directions: the independent case. If $\rho$ is positive, you get concentric ellipses centered at the origin: the Gaussian looks sharp along one axis and quite spread out along the other, even though the variance along the $x$ direction and the $y$ direction are both equal to 1. That is the case $\rho > 0$, and $\rho > 0$ means that if $X$ is positive, $Y$ is more likely to be positive; that is why the ellipses tilt that way, with more mass along the first and third quadrants than along the second and fourth. Similarly, if $\rho < 0$, you get ellipses tilted the other way. The best way to see this is to actually generate the contour and surface plots in MATLAB.

More generally, if you want to talk about a non-standard bivariate Gaussian, the density is even messier. It was already messy even in the zero-mean, unit-variance case, and I do not know if I should even put it down, but here I go:

$$f_{X,Y}(x,y) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1-\rho^2}} \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \frac{(x-\mu_1)^2}{\sigma_1^2} + \frac{(y-\mu_2)^2}{\sigma_2^2} - 2\rho \,\frac{x-\mu_1}{\sigma_1} \cdot \frac{y-\mu_2}{\sigma_2} \right] \right\}.$$

I think I got that right; it is a big mess to write down explicitly. This density has means $\mu_1$ and $\mu_2$ for $X$ and $Y$, so $E[X] = \mu_1$ and $E[Y] = \mu_2$; $\rho$ as usual is the correlation coefficient; and $\sigma_1^2$ and $\sigma_2^2$ are the variances of $X$ and $Y$ respectively.

Thankfully, if you use matrix notation this becomes easier to write. Renaming $X, Y$ as $X_1, X_2$, the same expression can be written as

$$f_{X_1,X_2}(x_1, x_2) = \frac{1}{2\pi \sqrt{\det V}} \exp\left( -\frac{(\mathbf{x}-\boldsymbol{\mu})^T V^{-1} (\mathbf{x}-\boldsymbol{\mu})}{2} \right),$$

where the vector $\mathbf{x} = (x_1, x_2)^T$, the vector $\boldsymbol{\mu} = (\mu_1, \mu_2)^T$, and

$$V = \begin{pmatrix} \operatorname{Var}(X_1) & \operatorname{Cov}(X_1, X_2) \\ \operatorname{Cov}(X_1, X_2) & \operatorname{Var}(X_2) \end{pmatrix} = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix},$$

so the covariance $\operatorname{Cov}(X_1, X_2)$ is written in terms of $\rho$, $\sigma_1$ and $\sigma_2$.
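To actually draw those level sets, here is a small hedged sketch (Python with NumPy and Matplotlib; the lecture suggests MATLAB's contour and surface plots, and this is the same idea). It evaluates the matrix-form density on a grid and draws its contours for a few values of $\rho$.

```python
import numpy as np
import matplotlib.pyplot as plt

xs = np.linspace(-3, 3, 201)
X1, X2 = np.meshgrid(xs, xs)

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, rho in zip(axes, (0.0, 0.7, -0.7)):
    V = np.array([[1.0, rho], [rho, 1.0]])              # covariance matrix
    Vinv, detV = np.linalg.inv(V), np.linalg.det(V)
    d = np.stack([X1, X2], axis=-1)                     # zero mean, so d = x - mu
    quad = np.einsum('...i,ij,...j->...', d, Vinv, d)   # (x-mu)^T V^{-1} (x-mu)
    pdf = np.exp(-quad / 2) / (2 * np.pi * np.sqrt(detV))
    ax.contour(X1, X2, pdf)    # circles for rho = 0, tilted ellipses otherwise
    ax.set_title(f"rho = {rho}")
    ax.set_aspect("equal")
plt.show()
```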
You can explicitly check that these two expressions are equal. So $\boldsymbol{\mu}$ is the mean vector, the vector of means $(E[X_1], E[X_2])^T$, and $V$ is called the covariance matrix. It is nothing but $E[(\mathbf{X}-\boldsymbol{\mu})(\mathbf{X}-\boldsymbol{\mu})^T]$, where $\mathbf{X}$ is the column vector $(X_1, X_2)^T$; writing it out, the variances sit on the diagonal and $\operatorname{Cov}(X_1, X_2)$ sits on the off-diagonals.

For a more general joint Gaussian, the center of those ellipses is shifted to $(\mu_1, \mu_2)$, and the variances decide the spread of the ellipses along the axes. What you can also show is that the major and minor axes of the ellipse are determined by the eigenvectors of this covariance matrix. The covariance matrix is easily shown to be positive semidefinite, and positive definite in case it is nonsingular; for a positive definite matrix the eigenvectors define an orthonormal basis for the space, and these two orthonormal vectors define the axes of the ellipse. So the orientation of the ellipse is determined by the eigenvectors of this matrix.

The multivariate Gaussian in the $n$-dimensional case is defined exactly similarly: exactly the same formula, except now you have a vector of $n$ random variables, the mean vector is $n$ long, and the covariance matrix has $\operatorname{Var}(X_1), \operatorname{Var}(X_2), \ldots, \operatorname{Var}(X_n)$ on the diagonal and all the covariance terms off the diagonal.

Oh, by the way, before I go to the $n$-dimensional case: I made a remark in the beginning that $X_1$ and $X_2$ both being marginally Gaussian does not mean that $(X_1, X_2)$ is a bivariate Gaussian. I just want to give an explicit example showing that.

Example. Let $Y_1, Y_2$ be iid random variables distributed according to the pdf

$$f_Y(y) = \sqrt{\frac{2}{\pi}} \, e^{-y^2/2} \quad \text{for } y \geq 0, \qquad 0 \text{ otherwise}.$$

You can integrate this and see that it is a valid pdf. What does this pdf look like? Like a Gaussian with one side chopped off: it is as though the negative side of the Gaussian has been chopped off and all its mass put on the positive side. Actually, what you are doing is taking a standard Gaussian and taking its absolute value; that is the random variable this pdf describes. Now let

$$W = \begin{cases} +1 & \text{with probability } 1/2, \\ -1 & \text{with probability } 1/2, \end{cases}$$

be independent of $Y_1$ and $Y_2$, and define $X_1 = W Y_1$ and $X_2 = W Y_2$. I am constructing an example here. What will be the marginal density of $X_1$? For $X_1$, I generate $Y_1$ according to this distribution and then toss a fair coin that comes up $+1$ or $-1$ with probability half each; so $X_1$ equals $Y_1$ with probability half and $-Y_1$ with probability half.
So the density of $X_1$ is, with probability half, the right half-bell and, with probability half, its mirror image; averaging the two, you can show (tell me if I am making a mistake) that $X_1 \sim N(0,1)$, and similarly $X_2 \sim N(0,1)$ by the same argument. So both $X_1$ and $X_2$ are marginally standard Gaussian random variables. But the contention is that $(X_1, X_2)$ is not a bivariate Gaussian; in other words, they are not jointly Gaussian. Why? To show that something is a bivariate Gaussian, I need to show that it has a joint density of the bivariate Gaussian form, and clearly $X_1$ and $X_2$ cannot possibly have a density of that form.

Notice that the same $W$ feeds the signs of both $X_1$ and $X_2$. $Y_1$ and $Y_2$ are always non-negative, and $W$ decides the signs of the $X$'s, but the same $W$ feeds in for both. So if $W$ is $+1$ for $X_1$, it is also $+1$ for $X_2$, which means that if I look at $X_1$ and see that it is positive, I can immediately say that $X_2$ is also positive; there is no way $X_1$ and $X_2$ have different signs. So if you plot $x_1$ against $x_2$, the joint density of $(X_1, X_2)$ is confined to the first and third quadrants: $X_1$ alone is a standard Gaussian, $X_2$ alone is a standard Gaussian, but by construction the joint density is 0 in the second and fourth quadrants. In particular, $X_1$ and $X_2$ are certainly not independent. And because the density is confined to the first and third quadrants, the jointly Gaussian picture is impossible: if it were jointly Gaussian, you would have elliptical contours with nonzero mass everywhere, no matter what $\rho$ is. So this cannot possibly be a bivariate Gaussian.

(What about $\rho = \pm 1$? That is a degenerate case: perfectly correlated, with all the mass sitting on a line. We will also meet joint Gaussians in $n$ dimensions where the density exists only in a $k$-dimensional subspace; in that case you just treat it as a $k$-dimensional Gaussian. That is what you do normally.)

Is this example clear? It is very instructive, because it says that bivariate Gaussian means more than marginally Gaussian.

Now the $n$-dimensional multivariate case. There are three equivalent definitions of a multivariate Gaussian, or jointly Gaussian random variables.

Definition 1. A column vector $\mathbf{X}$ of $n$ random variables is said to be multivariate Gaussian ($X_1, \ldots, X_n$ are said to be jointly Gaussian) if it has pdf

$$f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^n \det V}} \exp\left( -\frac{(\mathbf{x}-\boldsymbol{\mu})^T V^{-1} (\mathbf{x}-\boldsymbol{\mu})}{2} \right),$$

where $\boldsymbol{\mu}$ is a real vector and $V$ is a positive definite matrix. We are just putting down the same formula in $n$ dimensions.
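Here is a hedged simulation of the counterexample (Python/NumPy; the construction is exactly the one above, the numerical checks are my own): the marginals behave like $N(0,1)$, yet the two coordinates always agree in sign.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

y1 = np.abs(rng.standard_normal(n))   # |N(0,1)| has exactly the chopped-off pdf above
y2 = np.abs(rng.standard_normal(n))
w = rng.choice([-1.0, 1.0], size=n)   # one shared coin flip per sample
x1, x2 = w * y1, w * y2

print(x1.mean(), x1.var())            # ~ 0 and ~ 1: marginally N(0,1)
print(np.mean(x1 * x2 > 0))           # 1.0: all mass in the first and third quadrants
```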
From here, you can show this is a valid pdf by making appropriate substitutions and integrating it out. What you can also show is that $\boldsymbol{\mu}$ is the vector of expectations $(E[X_1], E[X_2], \ldots)^T$ and $V$ is the covariance matrix: all the diagonal elements are the variances and the off-diagonal elements are the covariance terms $\operatorname{Cov}(X_i, X_j)$.

So this is one definition, which explicitly specifies the pdf. If you want to check whether some vector is a joint Gaussian vector, you go and see whether its density has this form for some positive definite $V$ and some vector $\boldsymbol{\mu}$. If it does not, it is not; if it does, it is. This is the most explicit specification of a multivariate Gaussian. However, it is very messy to work with; even in two dimensions, writing it out was a tremendous mess, and it leads to tedious integrals and tedious computations. So there are actually two more equivalent definitions which are commonly used, which I will also put down. The thing is, if I put down multiple definitions, I have to show that they are all equivalent: with three definitions, that is six implications altogether. I will not do that, but I am saying that you can do it.

Definition 2 (the first line is the same). $\mathbf{X}$ is said to be a Gaussian vector if it can be expressed as

$$\mathbf{X} = D\mathbf{W} + \boldsymbol{\mu},$$

where $\mathbf{W}$ is a vector of iid $N(0,1)$ random variables, $D$ is some matrix, and $\boldsymbol{\mu} \in \mathbb{R}^n$. This equivalent definition says that $\mathbf{X}$ is a multivariate Gaussian if it can be obtained by an affine transformation of an iid Gaussian vector. This $\mathbf{W}$ is something you can perfectly understand: $W_1, W_2, \ldots, W_n$ are all iid $N(0,1)$ random variables; you take that vector and perform this affine transformation on it. You can show the equivalence of these two definitions, and in fact you can show that $DD^T$ will be the covariance matrix and $\boldsymbol{\mu}$ the mean vector.

Definition 3 is probably the most cryptic, but probably also the most elegant. $\mathbf{X}$ is said to be a multivariate Gaussian if for every vector $\mathbf{a} \in \mathbb{R}^n$, $\mathbf{a}^T\mathbf{X}$ is a Gaussian random variable. This definition says that $\mathbf{X}$ is a Gaussian random vector if every linear combination $a_1 X_1 + a_2 X_2 + \cdots + a_n X_n$ of the $X_i$'s is Gaussian distributed. So if you find even one vector $\mathbf{a}$ for which $\mathbf{a}^T\mathbf{X}$ is not Gaussian distributed, then $\mathbf{X}$ is not a multivariate Gaussian. Perhaps take it as an exercise: in the example above, try to find some $a_1$ and $a_2$ for which $a_1 X_1 + a_2 X_2$ does not have a Gaussian distribution. It is fairly easy, I think, because $X_1$ and $X_2$ always have the same sign; a numerical check follows below.
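Here is a hedged sketch of that exercise (Python/NumPy, continuing the counterexample; the choice $a_1 = a_2 = 1$ is mine): $S = X_1 + X_2 = W(Y_1 + Y_2)$, so $|S| = Y_1 + Y_2 > 0$ almost surely and the density of $S$ vanishes at 0, which no Gaussian random variable can manage. Definition 3 therefore fails.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500_000
w = rng.choice([-1.0, 1.0], size=n)   # shared sign, as in the example above
x1 = w * np.abs(rng.standard_normal(n))
x2 = w * np.abs(rng.standard_normal(n))

# Try a = (1, 1): S = X1 + X2 = W * (Y1 + Y2), so S has no mass near 0.
s = x1 + x2
print(np.mean(np.abs(s) < 0.1))       # tiny (~ 0.003): density of S vanishes at 0
# A Gaussian with the same variance would put noticeable mass there:
g = rng.standard_normal(n) * s.std()
print(np.mean(np.abs(g) < 0.1))       # ~ 0.04, clearly different
```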
In particular, you can take $\mathbf{a} = \mathbf{0}$, and then $\mathbf{a}^T\mathbf{X}$ is the constant 0. So you basically have to accept the 0 random variable as a very special, degenerate case of a Gaussian, with 0 mean and 0 variance; subject to that understanding, this definition is fine. Great. So these three definitions are in fact equivalent. It can be proven; the proof is available in an MIT course, where I think lecture number 15 has this material and lecture number 16 has a proof of the equivalence of all three definitions.

Note also that if $\mathbf{a}$ has a 1 in the $i$-th position and 0 everywhere else, then $\mathbf{a}^T\mathbf{X} = X_i$, which then has to be Gaussian. So this definition clearly implies that the marginals are Gaussian, but it says more than that. Any other questions?

Just like in the two-dimensional case, this covariance matrix $V$, as I said, contains all the variances on the diagonal and the covariances on the off-diagonals. It is a positive semidefinite matrix in general, and its eigenvectors, which define an orthonormal basis for the space, will again tell you in which directions the Gaussian spreads out more and in which directions it is more compressed. This is in $n$ dimensions, so we cannot draw it, but the two-dimensional picture can be pushed forward.

Also, you can show that if the matrix $V$ has a diagonal structure, meaning all the covariances are 0, then $V^{-1}$ is simply $\operatorname{diag}(\sigma_1^{-2}, \sigma_2^{-2}, \ldots)$, and if you expand the density out, it actually factors into a product of the marginal pdfs. Which again means that a diagonal covariance matrix necessarily implies that the random variables are independent. So if the entries of a Gaussian vector are uncorrelated, then they are all independent; in general this is true only for Gaussian vectors.

Also, just like in the two-dimensional case, you can show that the conditional expectation is a linear function, which means the MMSE estimate is a linear function of your observation. "Linear filtering is optimal" is what the signal processing people like to say: linear filtering is optimal in the sense of MMSE for jointly Gaussian random variables.

There are a couple of other cool facts I should put down about jointly Gaussian random variables before I stop. As you see, I am not spending much time on any one thing; I am just pointing out some interesting facts. Let $\mathbf{X} = (X_1, \ldots, X_n)^T$ be a zero-mean multivariate Gaussian with covariance matrix $V$. I am throwing out the means because means are just a headache: if they are nonzero, just subtract the mean vector out. So this $\mathbf{X}$ is given to you, a multivariate Gaussian vector with some covariance matrix, and the idea is to generate uncorrelated, in fact independent, Gaussian random variables from it. The procedure is this: you take $W_1 = X_1$; then you take $W_2 = X_2 - E[X_2 \mid X_1]$, that is, you take $X_2$ and subtract out its estimate based on $X_1$, and you keep the error.
This is the estimation error in estimating $X_2$ from just looking at $X_1$, and you know that this error is orthogonal to $X_1$. But since they are jointly Gaussian, the error and $X_1$ are actually independent, not just uncorrelated. If you want, you can also normalize, dividing by the standard deviation to make it unit variance if you like. And you can proceed like this: $W_3 = X_3 - E[X_3 \mid X_1, X_2]$, where again I am estimating $X_3$ based on observing $X_1$ and $X_2$, and again this error is orthogonal to the subspace spanned by $X_1$ and $X_2$, and so on; I can keep doing this.

This is known variously as the Gram-Schmidt procedure, which is what some people call it; signal processing people call it whitening. It is called whitening because, I guess coming from the noise literature, you start with a "colored" vector, so to speak, with all these correlations, and you end up with $W$'s that are uncorrelated; if you also make them unit variance, it is as though the noise has equal components in all dimensions, hence "white." So this procedure is called whitening of the $X_i$'s. And it is a causal procedure: I look at $X_1$ and $X_2$ and then decide my next $W_3$, and so on.

One thing you should notice is that $E[X_3 \mid X_1, X_2]$ is a linear function of $X_1$ and $X_2$, because conditional expectations are linear in whatever you observe in the jointly Gaussian case. So $W_1$ is a linear function of $X_1$; $W_2$ is a linear function of $X_1$ and $X_2$; and so on. You have only a set of linear operations, so the whole operation can be written in matrix form as $\mathbf{W} = L\mathbf{X}$, and because of this structure $L$ is a lower triangular matrix, called the whitening matrix. So you can whiten this vector, and you can in fact show that the covariance matrix $V = E[\mathbf{X}\mathbf{X}^T]$ can be written as

$$V = L^{-1} B (L^{-1})^T = \left(L^{-1} B^{1/2}\right)\left(B^{1/2} (L^{-1})^T\right),$$

where $B$ is the diagonal covariance matrix of $\mathbf{W}$. The first factor is lower triangular and the second is upper triangular, its transpose; so this is like a triangular square root of your covariance matrix. This is called the Cholesky factorization, or Cholesky decomposition. Anyway, that is whitening, which is again common in signal processing.

Finally, I will stop with one more little nugget about the multivariate Gaussian. You know that for a univariate Gaussian, the $n$th moment is only a function of $\mu$ and $\sigma^2$; the distribution is completely specified by $\mu$ and $\sigma^2$. Similarly, if you have a Gaussian vector $(X_1, \ldots, X_n)$, all moments, like $E[X_1^4 X_2^{13}]$ or any moment of any order, can be figured out by just looking at the mean vector and the covariance matrix. The theorem that says this is called Wick's theorem.
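Here is a hedged whitening sketch (Python/NumPy; the covariance matrix $V$ is made up for illustration). Generating $\mathbf{X}$ as $C\mathbf{W}_0$ with iid standard normal $\mathbf{W}_0$ is itself an instance of Definition 2, with $D = C$; and in the normalized case $B = I$, the whitening matrix is simply $L = C^{-1}$, the inverse of the lower triangular Cholesky factor.

```python
import numpy as np

rng = np.random.default_rng(4)
V = np.array([[2.0, 0.6, 0.3],
              [0.6, 1.0, 0.2],
              [0.3, 0.2, 1.5]])           # illustrative covariance matrix

C = np.linalg.cholesky(V)    # V = C C^T, with C lower triangular
L = np.linalg.inv(C)         # whitening matrix: also lower triangular

X = C @ rng.standard_normal((3, 100_000))   # zero-mean Gaussian with covariance V
W = L @ X                                   # causal linear whitening: W = L X

print(np.cov(W).round(2))    # ~ identity: uncorrelated, unit-variance entries
```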
Let $(X_1, X_2, \ldots, X_n)$ be a zero-mean multivariate Gaussian (let us throw out the mean throughout, because it is just a headache). Then

$$E[Y_1 Y_2 \cdots Y_m] = 0 \quad \text{for } m \text{ odd},$$

and

$$E[Y_1 Y_2 \cdots Y_{2m}] = \sum_{\text{pairings}} \; \prod_{\text{pairs } (i,j)} E[Y_i Y_j],$$

where the sum runs over all ways of partitioning $Y_1, \ldots, Y_{2m}$ into pairs. The reason I switched from $X$'s to $Y$'s, and I did not do this by mistake, is that each of these $Y$'s can be any of the $X$'s, with repetitions allowed: for $Y_1$ you can put $X_{13}$, $Y_2$ can again be $X_{13}$ if you like, $Y_3$ can be $X_1$; you can assign them any way you want. If you have an odd product, because they are zero mean, you get 0; for the even case you get all possible pairwise products of these covariance terms. For example,

$$E[Y_1 Y_2 Y_3 Y_4] = E[Y_1 Y_2]E[Y_3 Y_4] + E[Y_1 Y_3]E[Y_2 Y_4] + E[Y_1 Y_4]E[Y_2 Y_3],$$

even with repetitions ($Y_1$ could be $X_1$ and $Y_3$ could also be $X_1$). With four factors, I have to take all possible pairings, three of them, and add them all up. Similarly for longer expressions: even if you have something like $E[X_1^2 X_2^4 X_3^6]$, you expand the whole thing out in terms of these covariance matrix entries, with each $Y$ being some $X_i$. So what Wick's theorem says is that if you are given the covariance matrix, namely these entries, you can go ahead and compute any moment of these joint Gaussians. This is called Wick's theorem; sometimes it is also called the Feynman diagram formula, because Feynman used it in calculations in his quantum field theory work. He did not invent it, it is much older than him, but he used it. So that is another nice nugget about joint Gaussians.

So that was my crash course on jointly Gaussian random variables.

Question from the class: for odd order, why is it 0? Can we interpret it geometrically? No, it is very simple. Say I have $Y_1 Y_2 Y_3$: even if some of these are repeated, with an odd number of factors there is no way to pair them all up, one factor is always left over, and since everything is zero-mean Gaussian the expectation is 0. In the even case it may be something like $Y_1^2 Y_2^2$, that is, $X_1^2 X_2^2$, in which case I have to write out all the pairings, $X_1 X_1 \cdot X_2 X_2$, $X_1 X_2 \cdot X_1 X_2$, and so on; all these possibilities have to be exhausted. That is what the formula does.
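To make the pairing rule concrete, here is a hedged Monte Carlo check (Python/NumPy; the covariance matrix is made up for illustration) of the fourth-moment formula with a repeated index, $E[X_1^2 X_2 X_3]$, i.e., $(Y_1, Y_2, Y_3, Y_4) = (X_1, X_1, X_2, X_3)$.

```python
import numpy as np

rng = np.random.default_rng(5)
V = np.array([[1.0, 0.5, 0.2],
              [0.5, 2.0, 0.3],
              [0.2, 0.3, 1.5]])           # illustrative covariance matrix
X = np.linalg.cholesky(V) @ rng.standard_normal((3, 1_000_000))

# E[X1^2 * X2 * X3]: Wick indices (Y1, Y2, Y3, Y4) = (X1, X1, X2, X3).
idx = (0, 0, 1, 2)
empirical = np.mean(X[idx[0]] * X[idx[1]] * X[idx[2]] * X[idx[3]])

# The three pairings of four factors: (12)(34), (13)(24), (14)(23).
pairings = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]
wick = sum(V[idx[a], idx[b]] * V[idx[c], idx[d]]
           for (a, b), (c, d) in pairings)

print(empirical, wick)   # should agree up to Monte Carlo error (~ 0.5 here)
```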