I seek refuge with Allah from the accursed Satan. In the name of Allah, the Most Gracious, the Most Merciful. "And Allah has revealed to you the Book and wisdom, and has taught you that which you did not know; and the favor of Allah upon you has been great." This is addressed to the Messenger of Allah, peace be upon him: Allah gave him the Book and wisdom and taught him what he did not know, and that is a very great favor of Allah. So any kind of knowledge that is given is from Allah, and it is a great blessing of Allah. Now, today we are going to cover more normal distribution theory, and as usual I am going to focus on the practical aspects. When a building is constructed, you have the engineering, which is taught over many years, you have the architects, and then you have the people who actually put the bricks together. I am more or less concerned with teaching the brick laying, but in this lecture we are going to do a lot of engineering also — a lot of theory — and I will point out the places where we learn about the real activity on the ground, which is the important part. Our lecture today is about bivariate normal distributions. If you take the material from the last lecture, which was univariate, together with this lecture, you could probably make a whole course out of it, because there is a lot of theory which, if you went on and explained everything in detail, would take a lot of time. I am just going to show you how to operate the steering wheel and how to put on the bricks, and not worry about how the machine is built, because we are short of time. OK, so we start with the basic definition. Any two random variables have a joint density function, a function which we can call f(x, y); sometimes I call it g(x, y) just to show that it is a joint density. The joint density has two properties: it has to be non-negative, and it has to integrate to one. The non-negativity is required for probabilities,
and the total probability has to be one, so the whole double integral has to come out to one. Now, the probability that (X, Y) belongs to a region A: this region A is any region in the xy-plane. The outcomes take place on the Euclidean plane with xy coordinates, so any region defines an event — will the random variable land inside this region? That is the question. The probability of that event is obtained by taking the double integral over A of f(x, y) dx dy. One important thing to note is that the function f can be greater than one, even though the integral has to be at most one, because the probability is obtained by integrating over a region. If you have a density concentrated in a small region — for example a random variable which takes values between zero and one half with uniform probability — then the density function has the value two at all points of that interval, because the value of the function gets multiplied by the length of the region, and two times one half comes out to one. So the function itself can be very high. It is very useful to think of the density as an infinitesimal probability. This is something which is not taught in calculus, but it is very useful to understand infinitesimals; otherwise one gets very confused in probability. You see, one fundamental problem with continuous variables is this. Consider a uniform random variable, with outcomes in the range zero to one and all values equally likely — that is the intuitive idea. But ask: what is the probability that X equals one half? Well, it is the integral of the density function over that single point, and what is that integral? Exactly — it is zero, because there is no width in that region, so no matter what the value of the function, the integral comes out to zero. Basically, the integral of a function
is roughly the size of the region over which you are integrating times the value of the function, and if the size of the region is zero, then no matter what the value of the function, the outcome is zero. So every single outcome has probability zero, and yet one of these outcomes is going to take place, because the whole space has probability one. This is a paradoxical situation — I have even written a paper about it. The simple way to think about it is to say that there is a probability at that point, namely f(x) dx, where dx is an infinitesimal. Now what is an infinitesimal? An infinitesimal is a number which is bigger than zero but smaller than 1/n for all n. Another way to describe it: it is the difference between 0.9999... repeating and one. These two are not exactly the same; there is a 0.000... with an infinite number of zeros and then a one at the end, and that difference is an infinitesimal. In regular calculus these things do not exist — infinitesimals are not allowed — but there is a way to do mathematics in which they do exist, and in probability it is convenient to think of infinitesimals as existing, because then every point has some probability: a very, very small probability, smaller than 1/n for every n, but still not zero. So we have probability f(x) dx at every point, and when you integrate, you add up all of these infinitesimal probabilities, and if you add up infinitely many infinitesimal probabilities, you get something positive. OK, so here is what a graph of a bivariate normal density looks like: it is symmetric, rises to a peak at the origin, and declines smoothly. What does this graph show? It is a graph over the plane; the outcomes are in the plane, and the probability of any region of the plane is the volume under this graph over that region. That is how to think about it intuitively.
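These two points — the density can exceed one, and single points carry only an infinitesimal probability f(x) dx — can be seen in a minimal numerical sketch. This assumes Python with numpy and is not part of the lecture material; the uniform-on-(0, 1/2) example is the one just described.

```python
import numpy as np

# A uniform random variable on (0, 1/2): its density is 2 on the support —
# a value greater than 1 — yet the total probability still comes out to 1.
def f(x):
    return np.where((x > 0) & (x < 0.5), 2.0, 0.0)

dx = 1e-5
xs = np.arange(0.0, 1.0, dx) + dx / 2   # midpoints of small cells
total = np.sum(f(xs) * dx)              # add up f(x)*dx over all the cells

# The "probability of one point" f(x)*dx shrinks with dx, toward zero.
print("f(0.25) =", float(f(0.25)), " total probability =", total)
print("f(0.25)*dx =", 2.0 * 1e-12, "when dx = 1e-12")
```

The density value 2 plays the role of the function being "very high", while the product f(x) dx plays the role of the infinitesimal point probability.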
Now, mathematically, a bivariate normal distribution can be written in vector form as follows. You have two variables X and Y, and we stack them one on top of the other. The bivariate normal has a mean vector mu and a covariance matrix Sigma. The mean vector mu has two components, mu_x and mu_y: mu_x is the mean of X and mu_y is the mean of Y. Then there is the covariance matrix, which has four entries: sigma²_x, sigma_xy, sigma_yx, and sigma²_y. You will see that sigma_xy and sigma_yx are the same, so these are really one entry. So the bivariate normal distribution has five parameters. mu_x and sigma²_x are the parameters for X — we already studied the univariate normal in the last lecture, so X has mean mu_x and variance sigma²_x, and Y has mean mu_y and variance sigma²_y. And there is one more parameter, sigma_xy (equal to sigma_yx): the covariance of X and Y. The two means and the two variances we already know from our study of the univariate normal, so the one additional parameter we have to study in this lecture is the covariance — what it is, what it means, and how we work with it. The formal definition of the density is given in this equation: f(x, y) = 1 / (2 pi sigma_x sigma_y sqrt(1 − rho²)) times the exponent of something complicated — but we will analyze it later and see that it is not so complicated. If you look at the exponent, it is useful to see that it contains ((x − mu_x)/sigma_x)², the standardization of x squared, plus ((y − mu_y)/sigma_y)², the standardization of y squared, minus 2 rho times the product of the standardized x and the standardized y, all divided by 2(1 − rho²). That is enough on the structure for now.
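The density formula just described can be written out term by term and sanity-checked by integrating it numerically over the plane. This is a sketch assuming Python with numpy; the five parameter values are arbitrary, chosen only for illustration.

```python
import numpy as np

# Hypothetical parameter values, chosen only for illustration.
mu_x, mu_y = 1.0, -2.0
sig_x, sig_y, rho = 2.0, 0.5, 0.7

def bvn_pdf(x, y):
    # The exponent term by term: squared standardizations of x and y,
    # minus the cross term 2*rho*zx*zy, all over 2*(1 - rho^2).
    zx = (x - mu_x) / sig_x
    zy = (y - mu_y) / sig_y
    q = (zx**2 - 2.0 * rho * zx * zy + zy**2) / (1.0 - rho**2)
    return np.exp(-q / 2.0) / (2.0 * np.pi * sig_x * sig_y * np.sqrt(1.0 - rho**2))

# Sanity check: the density must integrate to 1 over the whole plane.
dx, dy = 0.02, 0.005
xs = np.arange(mu_x - 8 * sig_x, mu_x + 8 * sig_x, dx)
ys = np.arange(mu_y - 8 * sig_y, mu_y + 8 * sig_y, dy)
X, Y = np.meshgrid(xs, ys)
total = bvn_pdf(X, Y).sum() * dx * dy
print("integral over the plane ~", total)
```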
Now we are going to look at the covariance and its properties. First, there is a general definition of the expected value of any function of two random variables — a standard generalization of expectation in one dimension: the expectation of h(X, Y) is the double integral of h(x, y) times the density. That is always the definition of expectation: the value of the function integrated against the density over the entire region. One useful technical term is the support: the support is the set of all values of x and y at which the density is non-zero — the area where the variable actually has some infinitesimal probability, as opposed to zero probability. If you have a positive random variable, then the support is only the positive numbers and there is no support on the negatives. That is just terminology. Now the covariance — this is the very important quantity, the heart of the bivariate normal density — is defined as the expected value of the product (X − EX)(Y − EY). Basically it says: center the X, center the Y, and calculate the expectation of the product. In integral form, it is the double integral of (x − EX)(y − EY) f(x, y) dx dy. One very important property of the covariance is that it is bilinear: it is linear in both variables separately. You can establish that using the properties of the integral if you analyze it a little, but I will not do that now. One important consequence concerns the covariance of aX + b and cY + d. First of all, the constants do not enter into the covariance: you take X − EX, you center X, and if you add a constant to X, that constant also gets added to the expectation, so when you subtract, it disappears. So the b and the d in Cov(aX + b, cY + d) do not matter: Cov(aX + b, cY + d) is the same as Cov(aX, cY). The b and the d drop out, and
then the a and the c can come out: Cov(aX, cY) is the same as a times c times Cov(X, Y). That is part of the bilinearity. Now, one manipulation which gives us a different, alternative expression for the covariance is this — just to show you how to operate with the covariance. We take the expected value of (X − EX)(Y − EY) and expand the product, getting four terms: XY − (EX)Y − X(EY) + (EX)(EY). When we take the expectation, I can split this into four expectations. The first is E[XY]. The other three terms are all equal to EX·EY, because EX is a constant, so E[(EX)Y] is just EX times EY; two of them enter with a minus sign and one with a plus. So you end up with Cov(X, Y) = E[XY] − EX·EY. That is the other formula for the covariance; we will do more work on covariance later. Now, one of the main objectives of this lecture is to calculate five densities. We start with the joint density of X and Y, and then there are two marginal densities and two conditional densities. The marginal density of X is basically: what is the density of X when X is considered all alone, as a separate random variable, instead of as part of a joint pair? This involves getting rid of the Y, and you get rid of the Y by integrating it out. The marginal density of X is obtained by taking the joint density f(x, y) and integrating with respect to y from minus infinity to plus infinity — or, equivalently, over all y in the support of Y, which comes to the same thing because outside the support the density is zero. Similarly, for the marginal density of Y you integrate out the x. Now when you do this calculation — and it is a very nice and elegant calculation, which I used to spend a lot of time on.
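The two covariance formulas above — the centered-product definition, the E[XY] − EX·EY form, and the bilinearity Cov(aX + b, cY + d) = ac·Cov(X, Y) — can be checked on simulated data. A sketch assuming Python with numpy; the dependent pair and the constants are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)
y = 0.6 * x + rng.normal(size=n)   # a dependent pair, just for illustration

# Definition: Cov(X,Y) = E[(X - EX)(Y - EY)]
cov_def = np.mean((x - x.mean()) * (y - y.mean()))
# Alternative form derived above: Cov(X,Y) = E[XY] - E[X]E[Y]
cov_alt = np.mean(x * y) - x.mean() * y.mean()

# Bilinearity: the constants b, d drop out and the factors a, c come out,
# Cov(aX + b, cY + d) = a*c*Cov(X, Y)
a, b, c, d = 2.0, 5.0, -3.0, 7.0
u, v = a * x + b, c * y + d
cov_lin = np.mean(u * v) - u.mean() * v.mean()
print(cov_def, cov_alt, cov_lin, a * c * cov_def)
```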
That is the engineering part, and you should learn how to do it. You find that the marginal density of X is univariate normal with mean mu_x and variance sigma²_x. You start with the joint density, you integrate, and you get the result that X is normal with mean mu_x and variance sigma²_x, and Y is normal with mean mu_y and variance sigma²_y. It is just a calculation: you take the joint density, and it is a little bit messy but routine, and I have given you materials separately which do all of these calculations. So that is one thing: you integrate and you get the marginals. Now the more important and more interesting thing is the conditional densities. There are two conditionals. There is the conditional density of X given Y: what is the density of X given that you know the value of Y? Actually, there is not one conditional density — you have to understand that there are infinitely many conditional densities, one for each possible value of Y, because Y the random variable is a pre-experimental concept; post-experimentally, one outcome of Y is observed, and then you ask: now that I know the value of Y, what do I know about X? If the two variables have some dependence — and that is the general case — the value of Y will give you some information about X. The marginal density of X is what you know about X without knowing anything about Y, and the conditional density of X given Y is what you know about X now that you know the value of Y. Usually these two things are different. And similarly there is the symmetric case of Y given X. Now, this is very easy to compute, because the joint density is equal to the conditional density of X given Y times the marginal density of Y — this is a standard factorization — and it is also equal to the conditional density of Y given X times the marginal density of X. These two are equivalent expressions.
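The marginalization claim — integrate the y out of the joint density and a univariate normal(mu_x, sigma²_x) is what remains — can be checked numerically rather than by the messy algebra. A sketch assuming Python with numpy, reusing the same made-up parameter values as before.

```python
import numpy as np

# Hypothetical parameters, only for illustration.
mu_x, mu_y = 1.0, -2.0
sig_x, sig_y, rho = 2.0, 0.5, 0.7

def joint(x, y):
    zx = (x - mu_x) / sig_x
    zy = (y - mu_y) / sig_y
    q = (zx**2 - 2 * rho * zx * zy + zy**2) / (1 - rho**2)
    return np.exp(-q / 2) / (2 * np.pi * sig_x * sig_y * np.sqrt(1 - rho**2))

def univariate(x, mu, sig):
    # The claimed marginal: a plain normal(mu, sig^2) density.
    return np.exp(-(((x - mu) / sig) ** 2) / 2) / (sig * np.sqrt(2 * np.pi))

# Integrate the y out of the joint density at a few fixed x values.
dy = 0.001
ys = np.arange(mu_y - 10 * sig_y, mu_y + 10 * sig_y, dy)
for x0 in (-1.0, 1.0, 3.0):
    marginal = joint(x0, ys).sum() * dy
    print(x0, marginal, univariate(x0, mu_x, sig_x))
```

At each x the numerical integral over y lands on the univariate normal density value, which is the content of the marginalization result.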
From this factorization it is very easy to calculate, because it is just division: we know the marginal density, so we take the joint density and divide by the marginal density. There is again a lot of messy algebra, but the result is very nice and elegant: both the conditional density of X given Y and the conditional density of Y given X are normal densities. There is a very nice way to express this result: if you standardize the variables, the results are clean; if you do not, the formulas look awkward and you have difficulty understanding where they come from. You can get to the non-standardized form very easily by simple algebra, so I look at the standardized expression. Basically, once you remember that X given Y is normal, all we need is the expected value of X given Y and the variance of X given Y; once we have the mean and the variance, we know what the normal density is. The expected value of the standardized X given Y is just rho times the standardized Y — very simple; rho is the correlation coefficient. The expected value of the standardization of X, (X − mu_x)/sigma_x, would be 0 if Y gave no information — if X and Y were independent — because a standardized variable has expected value 0. The correlation coefficient rho tells us how much relationship there is between Y and X: take the standardized Y, multiply it by rho, and that is the conditional expectation of the standardized X. Very simple and easy. When you translate this formula by algebra, you get E[X | Y] = mu_x + rho sigma_x (y − mu_y)/sigma_y. Now, this formula looks complicated, ugly, and asymmetric, but if you look at the previous formula, you see you get it just by multiplying out: the expected value of the standardization of X equals rho
times the standardization of Y; multiply both sides by sigma_x, add mu_x to both sides, and you get the second formula. It follows immediately from the relationship between the standardizations. Here I have made what might be called a notational mistake, in that I have used capital Y on both sides. When you take the conditional expectation, you can be talking about the observed value of Y: the conditional expectation is defined for all observed values, so it has both meanings. It can be thought of as conditioning on the random variable, in which case it is a function of the outcome of Y, or it can be considered for a fixed, particular value of y. So there is a little bit of ambiguity; it works both ways, but keep in mind that there are two different interpretations of this formula: one is the post-experimental interpretation, where an actual outcome of Y has been observed — it is true for that — and the other is the pre-experimental one — it is also true for that; the same formula carries both meanings. That is the expected value. Now what about the variance? The variance is interesting: the variance of the standardized X given Y is 1 − rho². This is very interesting, because if rho is 0 — zero correlation — then the variance of the standardization is 1, and that is what it should be, since a standardized variable always has variance 1. But if there is correlation, then Y gives you some information, and the information reduces the variance. That is to be expected: there is some fluctuation in X, but this fluctuation is reduced because Y is giving you some information about X. Basically, the variance is reduced by rho². If rho² is 1 — perfect correlation — then the variance is 0, and in that case the value of Y completely determines what X is. And if rho is 0, then the variance is
unchanged — the same as the marginal — and in between it varies linearly with rho². It is a very sensible, simple, easy-to-understand formula. From this, again, if we multiply by sigma²_x: in the variance of (1/sigma_x)(X − mu_x), you can factor out 1/sigma²_x, because the variance of a constant times X is the square of that constant times the variance of X, so the sigma² comes out on the other side, and subtracting the mean mu_x has no effect on the variance. So Var(X | Y) = (1 − rho²) sigma²_x. These are the conditional mean and variance of X, and from them we get the conditional distribution of X given Y: it is normal, with mean equal to E[X | Y] and variance equal to Var(X | Y). Written out, the formula is a little messy: the mean is mu_x + rho sigma_x (y − mu_y)/sigma_y — that is the mean of X given Y — and the variance is (1 − rho²) sigma²_x. So that is the conditional distribution. Similarly, by just interchanging X and Y, you get the conditional distribution of Y given X; it is completely symmetric, with exactly the same meaning and derivation. Now, one thing to note: if rho equals 0, then the conditional distribution of X given Y is the same as the marginal. The marginal has mean mu_x and variance sigma²_x, and if rho is 0 that is exactly what the conditional reduces to. That is very important to understand, because rho equal to 0 means that X and Y are independent: the knowledge of Y gives us no information about X. The marginal distribution of X — what you know about X without any knowledge of Y — is the same as what you know about X having knowledge of Y.
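The conditional mean and variance formulas can be checked by brute-force simulation: draw many (X, Y) pairs, keep only those where Y lands near a chosen value, and look at the mean and variance of the surviving X's. A sketch assuming Python with numpy; parameters and the conditioning value y0 are made up for illustration.

```python
import numpy as np

# Hypothetical parameters, only for illustration.
mu_x, mu_y = 1.0, -2.0
sig_x, sig_y, rho = 2.0, 0.5, 0.7

rng = np.random.default_rng(42)
cov = [[sig_x**2, rho * sig_x * sig_y],
       [rho * sig_x * sig_y, sig_y**2]]
x, y = rng.multivariate_normal([mu_x, mu_y], cov, size=2_000_000).T

# Condition on Y close to one particular observed value y0.
y0 = -1.5
sel = np.abs(y - y0) < 0.01
cond_mean = x[sel].mean()
cond_var = x[sel].var()

theory_mean = mu_x + rho * sig_x * (y0 - mu_y) / sig_y
theory_var = (1 - rho**2) * sig_x**2
print(cond_mean, theory_mean, cond_var, theory_var)
```

Changing y0 moves the conditional mean along the formula, while the conditional variance stays put — the point made below about homoskedasticity.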
Now — running a little bit ahead in this lecture — rho equal to 0 means that the covariance of X and Y is 0, and Cov(X, Y) = 0 then implies that X and Y are independent. This is a very deceptive result, and people are confused by it all the time — and it seems very likely that this confusion will persist — because it is a very special result: it only holds when X and Y have a joint bivariate normal density. In general it is not true that covariance 0 implies independence, but for the bivariate normal it does imply independence. In fact, what independence means is that the expected value of X^m Y^n equals the product of the m-th moment of X and the n-th moment of Y, for all integers m and n. This is a very strong requirement. E[XY] = EX·EY is just one of these infinitely many conditions. In the joint normal case, if just that one holds, then all of them hold; but in general you cannot conclude from any one of them holding that all of them will hold. So independence is a much stronger condition than zero covariance, but in the joint normal case the two are the same. OK, now some more description of the normal density. One very important special case is the standardized case, where both X and Y are standardized. Here I define Z_X to be the standardization of X — (X − EX) divided by the standard error of X, which is (X − mu_x)/sigma_x — and Z_Y to be the standardization of Y, (Y − mu_y)/sigma_y. Then what is the joint density of Z_X and Z_Y? It is given here, and it is a much simpler function: 1/(2 pi sqrt(1 − rho²)) — the sigma_x and sigma_y have disappeared, though the square root of 1 − rho² remains — times the exponent of minus (z_x² + z_y² − 2 rho z_x z_y) divided by 2(1 − rho²). This is the joint bivariate
normal density for standardized variables. Again, if you start with this density, you can integrate out z_x to get the marginal of Z_Y, integrate out z_y to get the marginal of Z_X, and also calculate the two conditional densities, and here they have a much easier form. Z_X and Z_Y are both normal(0, 1), because we have standardized them. The conditional density is interesting: the conditional mean of Z_X given Z_Y is rho times Z_Y — exactly what I said earlier — and the conditional variance is just 1 − rho². So the conditional distribution of Z_X given Z_Y — the standardization of X conditional on the standardization of Y — is normal, with mean rho times Z_Y and variance 1 − rho². After standardization, all the sigmas and all the mus disappear. One very important feature of the normal distribution is what is called homoskedasticity. In general, in a conditional distribution, both the mean and the variance will be functions of the value of Y you are conditioning on. But the normal distribution has the very special property that the conditional variance does not depend on the value of Y. For general joint distributions this will not be true — both mean and variance are affected by the conditioning. In the normal case the mean is affected by the conditioning, and the conditioning does affect the variance in the sense that it becomes smaller, but the variance does not depend on the actual value of Y which is observed. And again we have independence if and only if rho equals 0: when rho equals 0, the conditional mean becomes 0 and the conditional variance becomes 1, so the conditional is the same as the marginal. Now, what we are going to do next rests on the idea that you really know something if you know how to build it.
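Homoskedasticity can be seen directly in simulation: conditioning on different observed values of Z_Y shifts the conditional mean (by rho times the value) but leaves the conditional standard deviation at sqrt(1 − rho²) throughout. A sketch assuming Python with numpy; rho and the conditioning values are arbitrary.

```python
import numpy as np

rho = 0.6
rng = np.random.default_rng(7)
n = 2_000_000
zy = rng.normal(size=n)
zx = rho * zy + np.sqrt(1 - rho**2) * rng.normal(size=n)  # correlated pair

# Condition on several different observed values of zy: the conditional
# mean moves (it is rho * zy0), but the conditional sd stays sqrt(1-rho^2).
for zy0 in (-1.0, 0.0, 1.5):
    sel = np.abs(zy - zy0) < 0.02
    print(zy0, zx[sel].mean(), zx[sel].std())
print("theoretical sd:", np.sqrt(1 - rho**2))
```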
That is the concept. We know what a standard normal(0, 1) is — and how do we know it? We know it by generating it on the computer. When we generate these numbers on the computer, that is what a normal random variable is. So I can generate one random variable which is normal(0, 1), and I can generate a second which is an independent normal(0, 1), and I will show you how to get to the standard bivariate normal. First, when we have independent standard normals, the joint density is (1/2 pi) exp(−(x² + y²)/2), because the joint density is just the product of the two marginals — for independent random variables X and Y, the joint density is the product of their separate densities. This is a bivariate normal with mean vector 0 and covariance matrix equal to the identity. Let me explain the covariance matrix. The covariance matrix consists of four entries: one is the variance of X, another is the variance of Y, and the off-diagonal entry is the covariance of X and Y, which is the same as the covariance of Y and X — you can easily see that from the formula, because E[XY] is the same as E[YX]. One very important property of this covariance matrix is that its determinant is always greater than or equal to 0, typically greater than 0. This means that Cov(X, Y)², which is the minus term in the determinant, has to be less than or equal to Var(X) times Var(Y). This is called the Cauchy-Schwarz inequality; it has its own proof, which I will not go through. Basically it means the covariance matrix has a non-negative determinant — in fact, it has to be a positive semi-definite matrix.
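The covariance matrix of an independent standard normal pair, and the non-negative determinant guaranteed by the Cauchy-Schwarz inequality, can be checked on generated data. A sketch assuming Python with numpy.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100_000)
y = rng.normal(size=100_000)   # an independent standard normal pair

S = np.cov(x, y)               # sample covariance matrix, close to identity
det = np.linalg.det(S)
# Cauchy-Schwarz: cov(x,y)^2 <= var(x)*var(y), so the determinant is >= 0.
print(np.round(S, 3), " det =", det)
```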
Zero variance? Yes — a variable has zero variance if and only if it is not really random: it equals its mean with probability one, so it is just a constant. You can regard it as not random, because there is no random fluctuation, but it is useful to have it as a limit — as the variance goes to zero, the random variable converges to a constant. It is just a notational issue whether we call that variable random or not; the thing to understand is that when the variance is zero, there is no randomness left. OK, so now some more properties of the covariance. One formula for the covariance is E[XY] − EX·EY, and this has to be zero for independent variables, because for independent variables the expected value of a product is the product of the expected values. So the covariance is always zero whenever X and Y are independent, and this does not depend on normality; the converse — covariance zero implies independence — is true only for joint bivariate normals. Now, one important aspect of the covariance formula: what is the covariance of X with itself? If you just look at the formula, Cov(X, X) is E[X²] minus the square of the expected value, which is just the variance of X. That is greater than zero for every genuinely random variable, and zero only if the variable is not random. So the covariance of X with itself is the same as the variance of X, and if X and Y are independent, the covariance is zero. Now, the correlation between X and Y is, by definition, the covariance of the standardization of X with the standardization of Y — the standardization of X being, as usual, (X − EX) over the standard error of X, and the standardization of Y being (Y − EY) over the standard error of Y. From these formulas you can also see that the correlation is the covariance of X and Y divided by the product of the standard errors of X and Y.
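The two routes to the correlation — the covariance of the standardized variables, and the covariance divided by the product of the standard errors — agree, and can be seen to agree on data. A sketch assuming Python with numpy; the dependent pair is made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500_000
x = rng.normal(2.0, 3.0, size=n)
y = 0.5 * x + rng.normal(0.0, 2.0, size=n)  # dependent pair, true corr 0.6

# Correlation as the covariance of the standardized variables...
zx = (x - x.mean()) / x.std()
zy = (y - y.mean()) / y.std()
corr_via_z = np.mean(zx * zy)

# ...which is the same as cov(x, y) / (se(x) * se(y)).
corr_via_cov = np.cov(x, y)[0, 1] / (x.std() * y.std())
print(corr_via_z, corr_via_cov, np.corrcoef(x, y)[0, 1])
```

(The tiny discrepancy between the two sample versions comes from numpy's degrees-of-freedom conventions, and vanishes as n grows.)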
Now, some more properties of the covariance. Basically, covariance is what differentiates the bivariate case from the univariate case. What we are trying to do here is develop the formulas for this problem: I start with two independent normals — how can I convert them into two dependent normals? Because that is the general case that I want. Among all of these formulas there is lots of theory, but the brick layer — the mazdoor — needs to know: how do I build this variable? That is what we are trying to do. Amid the huge number of pages of work, there is this one page of how you actually put the brick on, and this is it. Cov(X, Y) is the same as Cov(X + a, Y + b): adding constants does not affect the covariance, which means we only need to think about the centered variables when calculating it. Now take X, Y to be independent standard normals. We start with these and we want to construct correlated variables with correlation rho — if I know how to do that, then I know how to lay the brick. Choose constants a and b such that a² + b² = 1, and define a new random variable Z = aX + bY, with X and Y independent standard normals. Then what is the variance of Z? X and Y are independent, so the variance of Z is the sum of the variances: Var(aX) + Var(bY) = a² Var(X) + b² Var(Y) = a² + b², because X and Y are standard — and that is one. So Z has variance one. Now what about the covariance between X and Z? That is Cov(X, aX + bY). Covariance is linear, so you can split it into two terms: Cov(X, aX) plus Cov(X, bY). The linearity means that the a and the
b can come out, so you have a times Cov(X, X), which is just the variance of X, plus b times Cov(X, Y), which is 0 because X and Y are independent. So this is just a — which means X and Z have covariance a. Now I have two variables, X and Z, which are both standard normal separately — Z has mean 0 and variance 1, and X has mean 0 and variance 1 — and the correlation of X and Z is a. So I have constructed two standard normals, each marginally normal(0, 1), correlated with correlation a. Taking a = rho, I know how to go from two independent standard normals to two correlated standard normals. This is for the brick layers. To put it together: take X, Y independent standard normal; define Z_X = X — this would be the standardization of X, but X is already standardized, so I do not need to do anything to it — and define Z_Y = rho X + bY, where b is the square root of 1 − rho². Then Z_X and Z_Y are both normal(0, 1). Z_X is of course X, so it is normal(0, 1). For Z_Y: if you take the expected value of Z_Y, that is rho times the expected value of X plus b times the expected value of Y, and both X and Y have mean 0, so the expected value of Z_Y has to be 0. And if you take the variance of Z_Y, that is rho² times the variance of X plus b² times the variance of Y, and if you calculate that, you will find it equals rho² + (1 − rho²) = 1. So the variance of Z_Y is 1 by construction, the mean is 0 by construction, and the correlation between Z_X and Z_Y is rho — that is why we constructed Z this way. Basically, X and Y are independent; I took Y, scaled it down a little, and added some portion of X to it in order to create correlation. This Z variable has a little bit of X and a little bit of Y: the part which is X is correlated with X perfectly, and the part which is not is
independent. So by adding those two I get a partly correlated variable, and how much weight I put on each changes the correlation from 0 to 1: if I put all the weight on Y there is 0 correlation, if I put all the weight on X there is perfect correlation, and in between I get all the in-between correlations. The joint density of Z_X and Z_Y is exactly as before — the standardized correlated density whose formula we have already seen; I have just duplicated it here. Now, this gives us two variables which are correlated with rho and have mean 0 and variance 1. Suppose I want mean mu_x and variance sigma²_x. That is very easy: the correlation is not going to change when I make a linear transformation in each variable separately, so I can adjust them to have whatever mean I like and whatever variance I like, and the correlation remains fixed in this operation, because the correlation is the covariance of the standardized variables — if I de-standardize them, it makes no difference to the correlation. So I define S = mu_s + sigma_s Z_X and T = mu_t + sigma_t Z_Y, and now S and T are jointly bivariate normal with correlation rho, covariance rho sigma_s sigma_t, and the marginals I want: mean mu_s and variance sigma²_s, and mean mu_t and variance sigma²_t. So this is the building recipe: I can build any bivariate normal density I want, starting from independent standard normals. First I take the independent standard normals X, Y; then I create standard normals which have correlation rho; then I adjust the mean of each to whatever I like, adding the mean I want to the zero, and multiply by a sigma to get the standard deviation I want. I do this to both variables separately, and this does not disturb the correlation between them.
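The full two-step construction — induce the correlation, then adjust each mean and variance — fits in a few lines. A sketch assuming Python with numpy; the five target parameter values are made up for illustration.

```python
import numpy as np

# Build a bivariate normal with all five parameters specified,
# starting from two independent standard normals.
mu_s, sig_s = 10.0, 4.0   # hypothetical targets for the first variable
mu_t, sig_t = -5.0, 0.5   # hypothetical targets for the second
rho = -0.8

rng = np.random.default_rng(11)
n = 1_000_000
x = rng.normal(size=n)    # independent N(0, 1)
y = rng.normal(size=n)    # independent N(0, 1)

# Step 1: induce the correlation.
zx = x
zy = rho * x + np.sqrt(1 - rho**2) * y   # still N(0,1), corr(zx, zy) = rho

# Step 2: adjust means and variances; this leaves the correlation alone.
s = mu_s + sig_s * zx
t = mu_t + sig_t * zy

print(s.mean(), s.std(), t.mean(), t.std(), np.corrcoef(s, t)[0, 1])
```

The sample moments land on all five specified parameters, which is the whole point of the recipe.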
general bivariate normal: any mean and any variance for the first variable, any mean and any variance for the second variable, and any rho that I like for the correlation between the two. So all five of the parameters are completely under my control. You tell me the five parameters, and I know how to build my two independent normals, and from those I can construct a bivariate normal according to all five parameters that you specify. So I know the complete construction procedure. This is really the most important thing, and unfortunately it is the one thing which is never taught; you can go through books and books and never learn that this is how you actually do it, because in engineering school they don't teach you how to go and build houses, they teach you the theory. So, for the bricklayers: there are two steps required to get from IID normals to the general bivariate normal. First you induce the correlation, and then you adjust the mean and the variance of both variables. Okay, the reverse process is also important: when you have a house, how do you deconstruct it, take it apart? Basically, starting from the general bivariate normal, I want to go back to the two independent standard normals. We know how to standardize one univariate normal, which means subtracting the mean and dividing by the standard deviation. But how do I standardize a general bivariate normal, where I now have five parameters? Again, this is a two-step process. First you standardize x and y separately: you define v = (x - E[x]) / sd(x) and w = (y - E[y]) / sd(y), and now v and w will be standardized normals, but they will have correlation rho. So v and w are standard normals, normal(0, 1), and the covariance of v and w is equal to rho, which is the same as the correlation of x and y, because the correlation is the covariance of the standardized variables; that is the definition of correlation. Now I want to go from v and w to independent normals z_v and z_w.
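Before moving on to the reverse process, the forward construction just described (induce correlation, then adjust mean and variance) can be sketched in code. The lecture works in Excel; this is a Python sketch, and the five parameter values, the seed, and the sample size are my own illustrative choices, not anything from the lecture:

```python
import math
import random

random.seed(0)

# The five parameters of the general bivariate normal (illustrative values).
rho, mu_s, sd_s, mu_t, sd_t = 0.6, 10.0, 2.0, -5.0, 3.0
b = math.sqrt(1 - rho ** 2)

pairs = []
for _ in range(200_000):
    x, y = random.gauss(0, 1), random.gauss(0, 1)  # two independent standard normals
    zx, zy = x, rho * x + b * y                    # step 1: induce correlation rho
    s, t = mu_s + sd_s * zx, mu_t + sd_t * zy      # step 2: set means and variances
    pairs.append((s, t))

n = len(pairs)
mean_s = sum(p[0] for p in pairs) / n
mean_t = sum(p[1] for p in pairs) / n
cov_st = sum((p[0] - mean_s) * (p[1] - mean_t) for p in pairs) / n
corr = cov_st / (sd_s * sd_t)  # sample correlation, using the true sds for brevity
```

With a large sample, mean_s and mean_t come out near 10 and -5, and corr near 0.6, confirming that the two-step recipe delivers all five parameters.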
These are going to be linear transformations of v and w which are independent; basically, I want to undo the dependence. Take one of the variables to be v (you could equally take it to be w, it's the same, but you have to pick one or the other; you can't do both). So I take z_v = v, and I build z_w from w - rho*v. Basically I am extracting; I know the structure, and this is the important thing: w is correlated with v because it has one part which is exactly v, just scaled, and it has one part which is independent of v. So if I take out the part which depends on v, what I am left with will be independent of v. So I take w and subtract rho*v from it; rho is exactly the correlation, so rho*v is exactly the part of w which is coming from v, and w - rho*v is going to be independent of v. That is very important. Now take the covariance of v with w - rho*v, applying the linearity property of covariance: it splits into two terms, Cov(v, w) minus rho times Cov(v, v). The first, Cov(v, w), is rho, because these are standardized, and Cov(v, v) is 1, so this becomes rho minus rho, which equals zero. And now the magic of the normal distribution is that if the covariance is zero, then the two variables are independent. One correction is needed to make z_w standard: look at its variance. The variance of w - rho*v is Var(w) + rho^2 * Var(v) - 2*rho*Cov(w, v) = 1 + rho^2 - 2*rho^2 = 1 - rho^2, which is not 1. So you have to define z_w = (w - rho*v) divided by the square root of 1 - rho^2; then the variance of z_w is 1, and the expected value of z_w is 0 by linearity. So z_v is normal(0, 1), z_w is normal(0, 1), and they are independent. Basically, I have shown you how you start from a general bivariate normal density and make a sequence of steps to reduce it to independent standard normals. This is a very common technique in working with general normal distributions. It is hard to handle the general normal distribution directly; we are looking at the bivariate case because it is easier to understand, and in the multivariate case we will do the same things, but it will be more complex. What you do is make a linear transformation to get to your standard normal, which has mean zero and variance one, or covariance matrix identity. All the calculations are very easy for standard normals, so you do all of the calculations there, and then, because of linearity, you can usually undo the transformation. So: first you start with a general normal; you reduce it to standard normal by a linear transformation; then you do all the calculations that you need to do; and then you undo the linear transformation to get back to your original framework and you get the original result. This is a very common process. Okay, now there is one more very important way to generate the bivariate normal; basically this whole lecture is about how you construct the general bivariate normal. I have shown you one way, based on the joint density; now I am going to show how to generate the normal based on the conditional density. So again our goal is the same: I am going to generate a general bivariate normal from my basic bricks, which are independent standard normals.
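The standardize-and-decorrelate steps just described can be checked numerically. A minimal Python sketch (the value of rho, the seed, and the sample size are my illustrative choices), which builds a correlated standard-normal pair and then undoes the dependence:

```python
import math
import random

random.seed(1)

rho = 0.7
b = math.sqrt(1 - rho ** 2)

# Build a correlated standard-normal pair (v, w), then decorrelate it.
zs = []
for _ in range(200_000):
    v = random.gauss(0, 1)
    w = rho * v + b * random.gauss(0, 1)   # Cov(v, w) = rho by construction
    z_v = v
    z_w = (w - rho * v) / b                # remove the part of w coming from v, then rescale
    zs.append((z_v, z_w))

n = len(zs)
var_zw = sum(zw * zw for _, zw in zs) / n  # should be close to 1
cov_z = sum(zv * zw for zv, zw in zs) / n  # should be close to 0
```

The division by sqrt(1 - rho^2) is exactly the correction discussed above: without it, w - rho*v has variance 1 - rho^2 rather than 1.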
I have one independent normal(0, 1), y, which I know how to generate on my computer. So I generate this and I get one actual value, an observed value of y, say 1.2. Now I know from the calculations that we have done earlier that the conditional density of x given y = y, in the general bivariate normal with correlation rho, is normal with mean equal to rho*y and variance equal to 1 - rho^2. So now I generate a z which is normal(0, 1) and independent of y. I want to change the mean to rho*y, so I just add rho*y; and I want to change the variance to 1 - rho^2, so I multiply z by the square root of 1 - rho^2. And that's it: if I set x = rho*y + sqrt(1 - rho^2) * z, then x given y has the conditional distribution normal with mean rho*y and variance 1 - rho^2. (Note: the variance is 1 - rho^2 itself; the standard error is the square root of 1 - rho^2.) Now the marginal distribution of x will be normal with mean 0 and variance 1, and the covariance between x and y, which here equals the correlation, will be rho, because y is standardized, mean 0 and variance 1, and x given y has the assumed conditional distribution, so the pre-experimental and post-experimental descriptions agree. Basically, I am now generating actual observed values of x and y. Suppose rho is 0.5 and the observed value of y is 1.2. Then I want an x which has mean equal to 0.6, which is half of the observed value, and the variance should be 1 - 0.5^2, that is, 0.75. So now I want a normal with mean 0.6 and variance 0.75. I have an independent normal(0, 1) which I generate on the computer; it might come out -3, but -3 is too large, so say -1.7. Then my formula tells me: take this -1.7, multiply it by the square root of 0.75, and add 0.6, and you will get a number, and that number will be your x, which has the desired conditional density given y. And the marginal density of this x is normal(0, 1); this is what we know from theory. This is where the engineering comes in, and this is what the bricklayer doesn't know. I can tell him: this is the recipe for your generating. But why it works, he would be completely lost; all that mathematics which shows why, given this conditional density and that marginal, the marginal of x will be normal(0, 1), is heavy-duty. But doing it is very easy; it's just a recipe. So both are normal(0, 1) and they have correlation rho, and now I want the general bivariate normal. These x and y are both normal(0, 1) with correlation rho, and the general case is very easy: you just change the mean to whatever you like and change the standard deviation to whatever you like. If you take s = a + b*x and t = c + d*y, the correlation between s and t will still be rho, the mean of s will now be a because you added a, the mean of t will be c, the standard error of s will be b, and the standard error of t will be d. So you can set the mean and the variance of s, set the mean and the variance of t, and the correlation is fixed at rho. And that's it: now you have generated a bivariate normal from two independent normals.
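The conditional-density construction above, with the lecture's numbers rho = 0.5, can be sketched in Python (the seed and sample size are my illustrative choices; the lecture itself does this in Excel):

```python
import math
import random

random.seed(2)

rho = 0.5
xs, ys = [], []
for _ in range(200_000):
    y = random.gauss(0, 1)                      # draw y ~ N(0, 1) first
    z = random.gauss(0, 1)                      # independent N(0, 1)
    x = rho * y + math.sqrt(1 - rho ** 2) * z   # x | y ~ N(rho*y, 1 - rho^2)
    xs.append(x)
    ys.append(y)

n = len(xs)
mean_x = sum(xs) / n                                 # marginal mean of x: about 0
var_x = sum(v * v for v in xs) / n - mean_x ** 2     # marginal variance: about 1
corr_xy = sum(xv * yv for xv, yv in zip(xs, ys)) / n # correlation: about rho
```

This is the theory made concrete: the conditional recipe produces an x whose marginal is still standard normal, with correlation rho to y.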
Now we are going to look at some properties of IID bivariate normal random variables, and one of them is spherical symmetry. For the joint density you are multiplying two univariate normal densities, each with a 1 over sqrt(2*pi), so the joint density has a 1/(2*pi), the two square roots multiplying to 2*pi. In general, in the n-dimensional case, the factor that you have is 1 over (2*pi)^(n/2), where n is the dimensionality of the normal distribution. So here, in the case of two dimensions, the exponent is 2/2 and the factor becomes 1/(2*pi); in one dimension it is (2*pi)^(1/2), the square root of 2*pi, and so on. And the exponent (I had forgotten the minus sign here) is e^(-(x^2 + y^2)/2), which is e^(-x^2/2) times e^(-y^2/2). One important thing to note is that the density is a function of the distance from the origin. We are talking about the Euclidean plane, and x^2 + y^2 is the square of the distance from the origin, so the density declines with that distance: because of the minus one half, it is maximum at zero, where it equals 1/(2*pi), and then e^(-r^2/2) keeps declining with r, the distance from the origin. It goes down to zero very fast, because e^(-r^2/2) is a very rapidly declining function, and that is why the normal density doesn't have outliers: there is not much probability left once you go a little far from the origin. Other densities, like the Cauchy density, behave like something of the order 1/(1 + r^2), or other inverse powers of r; those densities decline much more slowly, and with them you have a much better chance of finding something far from the origin. Now we get to some more technical stuff, which is the spherical symmetry. This is very important to understand about the standard normal distribution: it is spherically symmetric, meaning that all directions are the same. To understand this technically it is useful to do a little bit of mathematics, and that mathematics is to convert the bivariate normal to the polar form. The conversion to polar form is done
here. Now, it is useful to think of the infinitesimal density as f(x, y) dx dy, and I am going to transform the x, y, which are your coordinates in the Euclidean plane. If you make a little triangle from the origin to (x, y), you see that the distance from (0, 0) to (x, y) is r, where r^2 = x^2 + y^2. And if you look at the angle theta between the line to (x, y) and the x axis, you see that the vertical part, y, is just equal to r*sin(theta), and the horizontal part, x, is r*cos(theta). Now I want to get the density of r and theta, the polar coordinates of the normal, because this has a very important meaning, a very important use, and is very important to understand. Basically, dx dy (this is just the formula for change of variables with infinitesimals) is equal to J dr dtheta, where J is the Jacobian. The Jacobian is the determinant of the matrix of partial derivatives; if you just do a simple division you can see that J plays the role of (dx dy) / (dr dtheta), and that is exactly what it is: J is the determinant of this matrix of derivatives of (x, y) with respect to (r, theta). So you take x and differentiate it with respect to r and theta, giving partial x with respect to r and partial x with respect to theta, and you take y and differentiate it with respect to r and theta, and you get this 2-by-2 determinant. And if you just look at x = r*cos(theta) and y = r*sin(theta), you can do the differentiations: partial x with respect to r is just cos(theta); partial x with respect to theta is -r*sin(theta), because cosine differentiates to minus sine; partial y with respect to r is just sin(theta); and partial y with respect to theta is r*cos(theta). And when you take the determinant, it is r*cos^2(theta) + r*sin^2(theta), so you just get r. So the crucial formula here is that dx dy = r dr dtheta. Okay, so from that we get the following change of variables. If
you have f(x, y) dx dy, that is (1/(2*pi)) e^(-(x^2 + y^2)/2) dx dy (again, the minus sign was missing). Now we make the change of variables x = r*cos(theta), y = r*sin(theta); this implies that r = sqrt(x^2 + y^2) and theta = arctan(y/x), just inverting the two formulas. Then, by substitution, f(r, theta) dr dtheta = (r * e^(-r^2/2) dr) times ((1/(2*pi)) dtheta), and these are the two densities of r and theta. The density of r is r * e^(-r^2/2), and r ranges from 0 to infinity; the support is only on the positive side. Theta itself ranges from 0 to 2*pi, and its density is just 1/(2*pi), which means it is a uniform density: the angle is equally likely to be anywhere. Now, what happened here? At this point some people might be completely boggled; I remember that in one of my lectures someone said, this course is completely beyond me, it's not for me. This is the engineering part; these things are difficult, but it is not important to know them. What is important to know is the result, which I will show you; I am teaching the bricklaying part, which is the important part, how to put one brick on top of the other. So basically, what we understand from this is the following. Consider the distance from the origin: you take a normal distribution, you generate (x, y), a point in the plane; how far is that point going to be? Well, the square of the distance is x^2 + y^2. Now, one very important property of the normal is that the sum of squares of independent standard normals has a chi-squared distribution, so x^2 + y^2 is chi-squared with 2 degrees of freedom. Basically, the square of the distance from the origin is chi-squared with 2 degrees of freedom, and this property generalizes to n dimensions: when you have n independent standard normal random variables, the sum of their squares is chi-squared with n degrees of freedom.
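These two facts, that the squared distance is chi-squared with 2 degrees of freedom (which has mean 2) and that the angle is uniform on (0, 2*pi), can be checked with a quick Monte Carlo sketch in Python (seed and sample size are my illustrative choices):

```python
import math
import random

random.seed(3)

r2s, thetas = [], []
for _ in range(200_000):
    x, y = random.gauss(0, 1), random.gauss(0, 1)
    r2s.append(x * x + y * y)                        # squared distance from the origin
    thetas.append(math.atan2(y, x) % (2 * math.pi))  # angle, mapped into [0, 2*pi)

mean_r2 = sum(r2s) / len(r2s)   # chi-squared with 2 df has mean 2

# If the angle is uniform, about a quarter of the points land in each quadrant.
quarter = sum(1 for t in thetas if t < math.pi / 2) / len(thetas)
```

The estimated mean of the squared distance comes out near 2, and each quadrant captures about a quarter of the angles, exactly the distance-times-direction picture of the standard bivariate normal.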
So I know how far my normal point is from the origin; this is the key. How you derive that is part of the theory, but it doesn't really matter; all you need to know is that in n dimensions the square of the distance is chi-squared with n degrees of freedom. What about the angle? There are two things which determine where this normal point lands, and the beautiful thing about the normal is that the angle is equally likely to be anywhere. So if you take a circle, choose any direction on that circle such that all directions are equally likely, and choose the distance r according to the chi-squared recipe, you have a normal distribution. This is another way to think about the normal distribution: the distance is governed by the chi-squared, and the theta is uniform. That is a very important insight into the structure of the multivariate normal distribution. Now I am going to use this insight to construct random variables. I have given you two ways to construct the general bivariate normal; now I am going to give a third way, and this third way is also important. It is based on the spherical symmetry and on the fact that r^2 is chi-squared with 2 degrees of freedom. So now I want to show how to get from a uniform variable to a chi-squared 2. Chi-squared 2 happens to have a very nice property: it has a very nice integral, unlike the normal, which is impossible to integrate in closed form. Chi-squared 2 is easy to integrate, and it leads to a very simple formula. But first let me explain the so-called probability integral transformation. (A question from the last slide: the distances are from the origin? Yes. And what does theta mean? Theta is just the angle, so it takes values from 0 to 2*pi, measured in radians; in n dimensions the "angle" is a point on the unit sphere, and that is what is called the angle. Another question was whether, if the
value of r has increased beyond one, the "unit" idea is disturbed. It is not: you see, r can be anything; r can be zero, r can be 50, it can be 100. r^2 is built from standard normals, and a standard normal has mean zero and variance one; values beyond three are unlikely, but any value is possible, even six, with a one-in-a-trillion possibility, so one has no special standing in the normal distribution. When you have x^2 + y^2, x can be two and y can be two, which are values possible for a normal, and then x^2 will be four and y^2 will be four, so r^2 will be eight. Theta is confined to the circle, but r is not; r can take any value, just as the bivariate normal can be any point on the plane. The polar coordinates are just a different coordinate system for the same plane.) Okay, so now the probability integral transformation. We know how to generate uniform random variables on a computer, and we want to turn them into variables with other distributions. Suppose u is uniform(0, 1) and F is the cdf of the distribution we want; define x = F inverse of u, the inverse cdf applied to the uniform. The probability integral transformation is a way to use a random variable's own distribution function to transform that random
variable. It is just a mechanical calculation: look at the probability that X is less than or equal to x. X is by definition F inverse of u, so that is the probability that F inverse of u is less than or equal to x. Now apply F to both sides, and you get the probability that u is less than or equal to F(x), which, since u is uniform on (0, 1), is equal to F(x). It is a little bit magical; nothing complicated, but it still seems puzzling. Anyway, the result of this is that if you take any random variable X and transform it by its own cdf, you get a uniform random variable. Conversely, if you take a cdf and apply its inverse to a uniform random variable, you get a random variable with the desired distribution. This is called the probability integral transform, and it is a very common way to generate any random variable, because we know how to generate uniform random variables, but there is no direct easy method for a normal, or a chi-squared, or a gamma, or an exponential; there are thousands of different densities. Excel gives us RAND(), which is uniform between 0 and 1, and the probability integral transform says the following: take any random variable, find its cdf, calculate the inverse of that cdf, apply that to the uniform, and you have got your random variable. Very simple. So again the bricklayer part is very easy, this is what you do, and the proof is actually also quite easy, though it looks mysterious. So x = F inverse of u has cdf F; that is the probability integral transform: take any cdf F, calculate its inverse, apply it to a uniform variable, and you have got the variable that you want. Now, what are the consequences of this probability integral transformation? Excel has a built-in function called NORM.S.INV, which is the inverse of the cdf of the standard normal: if you give it a probability, it calculates the x at which the standard normal cdf reaches that probability.
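The probability integral transform is easiest to see with a distribution whose cdf inverts in closed form. Here is an illustrative Python sketch (my own example, not from the lecture) using the exponential distribution, where F(x) = 1 - e^(-x), so F inverse of u is -ln(1 - u):

```python
import math
import random

random.seed(4)

# Generate exponential(1) draws from uniforms via the inverse cdf.
draws = [-math.log(1 - random.random()) for _ in range(200_000)]
mean_exp = sum(draws) / len(draws)     # exponential(1) has mean 1

# Conversely, pushing each draw through its own cdf gives back a uniform.
us = [1 - math.exp(-d) for d in draws]
below_half = sum(1 for u in us if u < 0.5) / len(us)  # should be about 0.5
```

The sample mean comes out near 1, as an exponential(1) should, and after mapping the draws through their own cdf about half fall below 0.5, as a uniform should. The same logic with NORM.S.INV in place of -ln(1 - u) gives normal draws.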
distribution cdf reaches that probability. So what our probability integral transformation says is that if you take RAND(), which is uniform(0, 1), and apply NORM.S.INV to it, you will get a normal random variable. This is a very convenient way to generate normal random variables in Excel, because the other way that you already know, going into Data Analysis and asking it to generate, gives random variables which are fixed; they are not recomputed. Sometimes, for simulations, you want new random variables every time you run, and if you generate your random variables like this, then every time you do anything to the spreadsheet they will all be recomputed, so you can do your simulations with normal variables which are dynamic instead of static. But this is actually special to Excel, because Excel has the built-in inverse normal. By the way, the inverse normal is not a function which you can write down; there is no closed-form formula for it, it is computed numerically. In the earlier days of computing we couldn't use this method, because we didn't have any routine for the inverse normal cdf, and in many computer packages you still don't have it; if it is not built in, you can't use this approach. So the other way is the one based on the chi-squared, and that is the following. Okay, now we are going to do a little bit of integration again. The distribution function (cdf) of the distance r, whose density we derived as r * e^(-r^2/2), is by definition F(x) = the integral from 0 to x of r * e^(-r^2/2) dr. And this is a perfect integral, in the sense that the derivative of -e^(-r^2/2) is exactly r * e^(-r^2/2). This is standard calculus, the so-called fundamental theorem of calculus: if the derivative of f is f prime, then the integral of f prime is f; these two
are inverse operations. So, since the derivative of -e^(-r^2/2) is r * e^(-r^2/2), the integral of r * e^(-r^2/2) is that antiderivative, and we evaluate it at the two limits, 0 and x. At the top limit x it gives -e^(-x^2/2); at the bottom limit 0 it gives -1; so the difference can be written as F(x) = 1 - e^(-x^2/2). Now I want to calculate the inverse of that. Well, that is very easy: I set u = 1 - e^(-x^2/2) and solve for x; that is how you get the inverse function. It is just simple algebra: I pull the exponential to the other side, so e^(-x^2/2) = 1 - u; then I take the log, so -x^2/2 = log(1 - u); then I multiply both sides by -2, which gives x^2 = -2 log(1 - u); and then I take the square root and get x = sqrt(-2 log(1 - u)). Now there is a little trick which is not strictly needed: 1 - u and u have exactly the same distribution, because if u is uniform(0, 1), then so is 1 - u. So instead of 1 - u I can use u, which makes a simpler formula; if I use 1 - u there is no problem, that also works perfectly well. So the result is: if I take the square root of -2 times the natural log of u, then this has the distribution of the distance r, and its square, -2 log(u), has a chi-squared distribution with 2 degrees of freedom. So if I take RAND(), apply -2 log to it, and take the square root, I get the distance variable whose square is chi-squared with 2 degrees of freedom. (A question about the first line: which function is that? It is minus e^(-r^2/2), and the bar is the evaluation at the two limits, which should read r = 0 at the bottom and r = x at the top.)
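Putting the pieces together, a chi-squared distance via sqrt(-2 log u) and a uniform angle 2*pi*v, gives the classic recipe known as the Box-Muller method. A Python sketch (seed and sample size are my illustrative choices; in Excel the same formulas work with RAND()):

```python
import math
import random

random.seed(5)

xs, ys = [], []
for _ in range(100_000):
    u = 1.0 - random.random()            # in (0, 1], avoids log(0)
    v = random.random()
    r = math.sqrt(-2.0 * math.log(u))    # distance: square root of a chi-squared(2)
    theta = 2.0 * math.pi * v            # angle: uniform on [0, 2*pi)
    xs.append(r * math.cos(theta))
    ys.append(r * math.sin(theta))

n = len(xs)
mean_x = sum(xs) / n                                # should be near 0
var_x = sum(x * x for x in xs) / n - mean_x ** 2    # should be near 1
cov_xy = sum(x * y for x, y in zip(xs, ys)) / n     # should be near 0
```

The resulting (x, y) pairs behave like independent standard normals: mean near 0, variance near 1, covariance near 0, with no inverse normal cdf needed anywhere.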
So this is the result: if r = sqrt(-2 log u), then r^2 = -2 log u. Remember, r is the distance, and r^2 equals x^2 + y^2, so r^2 is the chi-squared variable and r itself is the square root of a chi-squared; it always has to be positive. So if you generate r = sqrt(-2 log u) and theta = 2*pi*v, where u and v are independent standard uniforms, then theta will be a uniform variable between 0 and 2*pi, the angle which is equally likely to be any possibility, and r will be your square root of a chi-squared. And so, if you set x = r*cos(theta) and y = r*sin(theta), you have your bivariate normal. This bivariate normal will have mean 0, variance 1, and correlation 0; this is the standard bivariate normal, and once you have these, you can generate any bivariate normal by using the tricks that we have already discussed. So this shows you how to generate a bivariate normal in Excel without using NORM.S.INV. NORM.S.INV is powerful, and you don't strictly need this trick anymore; in the olden times, when we didn't have access to such functions in Excel, we needed these kinds of tricks to get normals. The trick is no longer necessary now that we have NORM.S.INV, but it is still useful for understanding the structure of the normal, which is why I have explained it. The contents of this lecture and the previous lecture would be enough to teach a course in normal distribution theory, but I have been focusing on the practical bricklaying side, and those aspects are small and simple. In the previous lecture it was the standard normal and its properties, especially the central limit theorem; in this lecture we talked about how you generate the bivariate normal, what correlation means, the relations between variables, and some properties of covariance,
and basically how to construct it. So the bricklayer's version of these two lectures is: if you take a lot of random variables and add them together, you get normals; normals have very nice properties; and if you take two independent normals you can get the joint density, and from the joint density you can get the conditional density and the marginal densities. These are the crucial elements of what we need to do the Bayesian analysis for the normal-normal model. Now, we have had a lot of theory, so next time, although I was planning to do the multivariate case, that would really be too difficult; instead we will do Bayesian theory, real applied analysis: how you do the Bayesian analysis of a standard univariate linear regression model. We already know how to estimate y = alpha + beta*x + epsilon in the classical method, and now we have all the theory we need to show how this is done in the Bayesian methodology. So next time we will start on that.