What is multicollinearity, and what are the effects of, or the problems caused by, multicollinearity? The problem of multicollinearity exists when two or more regressor variables are strongly correlated or linearly dependent.

Suppose we wish to fit the model y = Xβ + ε. This is the matrix form of the multiple linear regression model, with k − 1 regressors and one response variable. We know that the least squares estimate of β is β̂ = (X′X)⁻¹X′y. Now, if X′X is singular, we cannot compute the inverse (X′X)⁻¹. Singular means the determinant of X′X is equal to 0, and this happens when at least one column of X is linearly dependent on the other columns. The ith column of the matrix X stands for the ith regressor variable, so the meaning is this: if one regressor is linearly dependent on the other regressors, then X′X is singular and we cannot compute the inverse of X′X.

Let me give one example. Can we use the data below to get a unique fit to the model y = β0 + β1x1 + β2x2 + β3x3 + ε?

  x1   x2   x3     y
   1   −2    4    81
   2   −7   11    88
   4    3    5    94
   7    1   13    95
   6    4    8    99
   8   −1   17   123

So in this example we have three regressor variables x1, x2, x3 and one response variable, and here is the data. Now, the question is whether we can use this data to get a unique fit to this model. Well, of course we can get a unique fit, meaning we can estimate the regression coefficients by least squares, β̂ = (X′X)⁻¹X′y, provided X′X is not singular. But if you observe, it is not difficult to check that these three regressors are not independent. Take the first observation: the x2 value is −2, the x3 value is 4 and the x1 value is 1, so x2 + x3 = 2, which is twice x1. Take the second observation: x2 + x3 = 4, which is 2 times x1. For the third observation also, x2 + x3 = 8, which is twice x1. So the relation between the regressors is x2 + x3 = 2x1, which is the same as saying that x1 can be expressed in terms of x2 and x3: x1 = x2/2 + x3/2. So here we really cannot estimate the regression coefficients, because X′X is singular. The regressors are not independent, and that is why the determinant of X′X is exactly equal to 0. This is an example that illustrates the definition of multicollinearity: in this particular data the problem of multicollinearity exists because, among the columns of X, the column for x1 can be written as a linear combination of the columns for x2 and x3.
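As a quick numerical check (not part of the lecture itself), here is a minimal Python sketch, assuming numpy, that builds the design matrix from the data above and confirms the singularity:

```python
import numpy as np

# Data from the example: regressors x1, x2, x3 and response y.
x1 = np.array([1, 2, 4, 7, 6, 8], dtype=float)
x2 = np.array([-2, -7, 3, 1, 4, -1], dtype=float)
x3 = np.array([4, 11, 5, 13, 8, 17], dtype=float)
y = np.array([81, 88, 94, 95, 99, 123], dtype=float)

# Design matrix: an intercept column plus the three regressors.
X = np.column_stack([np.ones_like(x1), x1, x2, x3])

# The exact linear dependence x2 + x3 = 2*x1 makes X'X singular.
print(np.allclose(x2 + x3, 2 * x1))   # True
print(np.linalg.det(X.T @ X))         # 0, up to floating-point error
print(np.linalg.matrix_rank(X))       # 3, not 4: (X'X)^(-1) does not exist
```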
So that is why X′X is singular and the problem of multicollinearity exists here.

Next we will talk about the effects of multicollinearity, or, I can also say, the problems due to multicollinearity. The first one says: strong multicollinearity between regressors results in large variances and covariances of the regression coefficients. I am now going to illustrate what this means. Let me consider the multiple linear regression model with two regressors,

Y_i = β0 + β1 X_i1 + β2 X_i2 + ε_i,  i = 1, …, n,

where i stands for the ith observation. The X matrix for this model consists of the columns

X =
  [ 1  X11  X12 ]
  [ 1  X21  X22 ]
  [ ⋮    ⋮    ⋮ ]
  [ 1  Xn1  Xn2 ]

Now we will talk about centering and scaling of the regression data. What we do is write

x_i1 = (X_i1 − X̄1)/√S11.

Let me explain this notation: X̄1 is, of course, the mean associated with the first regressor, X̄1 = (1/n) Σ_{i=1}^n X_i1, and S11 = Σ_{i=1}^n (X_i1 − X̄1)². Similarly, let

x_i2 = (X_i2 − X̄2)/√S22  and  y_i = (Y_i − Ȳ)/√Syy,

where X̄2 = (1/n) Σ_{i=1}^n X_i2 is the mean associated with the second regressor, S22 = Σ_{i=1}^n (X_i2 − X̄2)², Ȳ = (1/n) Σ_{i=1}^n Y_i, and Syy = Σ_{i=1}^n (Y_i − Ȳ)². This transformation is called centering and scaling of the regressor data.

So what I did here is this: my original data is X1, X2 and Y, and I am replacing them by small x1, small x2 and small y; the small letters are just notation for the transformed data. Now, one thing to observe is that while the mean associated with the first regressor in the original data is X̄1, the mean of the transformed data x_i1 is always 0. So here x̄1 = 0; similarly x̄2 = 0 and ȳ = 0. The other thing to observe is that Σ_{i=1}^n x_i1² = 1, and this is also true for the second regressor and the response: Σ_{i=1}^n x_i2² = 1 and Σ_{i=1}^n y_i² = 1. A small sketch verifying these two properties follows below.
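Here is a minimal sketch of this centering and scaling in Python (assuming numpy; the helper name unit_length_scale and the simulated data are mine, not from the lecture):

```python
import numpy as np

def unit_length_scale(v):
    # Center the variable and scale it to unit length:
    # (v_i - v_bar) / sqrt(S_vv), with S_vv = sum((v_i - v_bar)^2).
    v = np.asarray(v, dtype=float)
    c = v - v.mean()
    return c / np.sqrt(np.sum(c ** 2))

rng = np.random.default_rng(0)
X1 = rng.normal(loc=10.0, scale=3.0, size=20)   # hypothetical regressor data

x1 = unit_length_scale(X1)
print(np.isclose(x1.mean(), 0.0))         # True: the transformed mean is 0
print(np.isclose(np.sum(x1 ** 2), 1.0))   # True: unit sum of squares
```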
Now, for the original data we wanted to fit the model Y = β0 + β1X1 + β2X2 + ε, and for that model we know that the least squares estimates give β̂0 = Ȳ − β̂1X̄1 − β̂2X̄2. But if you fit the same model, y = β0 + β1x1 + β2x2 + ε, to the transformed data, it is not difficult to check that β̂0 is going to be 0 here, because β̂0 = ȳ − β̂1x̄1 − β̂2x̄2, and ȳ = 0, x̄1 = 0 and x̄2 = 0. So the intercept for the transformed data is always going to be 0, and that is why, for the scaled and centered data, we will omit the intercept β0 from the model. The model, assuming that x1, x2 and y are centered and scaled, is just

y_i = β1 x_i1 + β2 x_i2 + ε_i.

We omitted the intercept β0 because, for the centered and scaled data, we just checked that the intercept is always equal to 0.

Next we will write down the X matrix for this model:

X =
  [ x11  x12 ]
  [ x21  x22 ]
  [  ⋮    ⋮  ]
  [ xn1  xn2 ]

I am writing small letters, meaning this is for the transformed data. See that the column of 1s is not there; that column was associated with the intercept β0. Here x11 is nothing but (X11 − X̄1)/√S11, x12 is (X12 − X̄2)/√S22, and similarly x21 = (X21 − X̄1)/√S11, x22 = (X22 − X̄2)/√S22, down to xn1 = (Xn1 − X̄1)/√S11 and xn2 = (Xn2 − X̄2)/√S22. So this is the X matrix for the transformed data.

Now, what is the normal equation associated with this model? Basically, I am trying to find the least squares estimates of the two regression coefficients β1 and β2, and the normal equation is X′Xβ̂ = X′y. In matrix form,

y =
  [ (Y1 − Ȳ)/√Syy ]
  [ (Y2 − Ȳ)/√Syy ]
  [       ⋮       ]
  [ (Yn − Ȳ)/√Syy ]

So this is what the y vector is. Now, what is X′X? You can check that

X′X =
  [  1   r12 ]
  [ r21   1  ]

where r12 is nothing but the sample correlation between x1 and x2, and r12 and r21 are the same. The two diagonal elements are equal to 1 because of the fact that Σ x_i1² = 1 and Σ x_i2² = 1. Here β̂ has two elements, β̂1 and β̂2, and you can check that X′y = (r1y, r2y)′, where r1y is the sample correlation between the first regressor x1 and the response variable y, and r2y is the sample correlation between the second regressor and the response variable. So the normal equation we have is

  [  1   r12 ] [ β̂1 ]   [ r1y ]
  [ r12   1  ] [ β̂2 ] = [ r2y ]

What are the formulas for these correlations? In terms of the original data,

r1y = Σ_{i=1}^n (X_i1 − X̄1)(Y_i − Ȳ) / √(S11 Syy)

and

r12 = Σ_{i=1}^n (X_i1 − X̄1)(X_i2 − X̄2) / √(S11 S22).
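To make this concrete, here is a small sketch (my own simulated data, assuming numpy) showing that after centering and scaling, X′X is exactly the 2×2 sample correlation matrix, and solving the normal equations gives the estimates in correlation form:

```python
import numpy as np

def unit_length_scale(v):
    c = np.asarray(v, dtype=float) - np.mean(v)
    return c / np.sqrt(np.sum(c ** 2))

rng = np.random.default_rng(1)
n = 50
X1 = rng.normal(size=n)
X2 = 0.9 * X1 + 0.3 * rng.normal(size=n)   # correlated with X1
Y = 2.0 * X1 - 1.0 * X2 + rng.normal(size=n)

x1, x2, y = (unit_length_scale(v) for v in (X1, X2, Y))
X = np.column_stack([x1, x2])

# X'X is the correlation matrix [[1, r12], [r12, 1]]:
print(X.T @ X)
print(np.corrcoef(X1, X2)[0, 1])   # matches the off-diagonal element r12

# Solving X'X beta_hat = X'y, where X'y = (r1y, r2y)':
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)                    # (beta1_hat, beta2_hat)
```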
So you can check, with this X matrix for the transformed data, why X′X has this form and why the off-diagonal element is r12, the sample correlation between the regressors x1 and x2. Now, to get β̂1 and β̂2 we need to compute the inverse of this matrix, which is called the correlation matrix. So what we have is

X′X =
  [  1   r12 ]
  [ r12   1  ]

and you can check that the inverse is

(X′X)⁻¹ = 1/(1 − r12²) ·
  [   1    −r12 ]
  [ −r12     1  ]

It is not difficult to check that this is indeed the inverse of X′X. So the least squares estimator of the regression coefficients is

β̂ = (X′X)⁻¹X′y = 1/(1 − r12²) ·
  [   1    −r12 ] [ r1y ]
  [ −r12     1  ] [ r2y ]

and from here the estimates are

β̂1 = (r1y − r12 r2y)/(1 − r12²)  and  β̂2 = (r2y − r12 r1y)/(1 − r12²).

So this is the least squares estimate for the transformed data, in terms of the sample correlation coefficients.

Now, what I said is that if there is a problem of multicollinearity in the data, then it results in large variances and covariances of the regression coefficients. Right now we have the least squares estimators of β1 and β2. Next we will check the variance of β̂1, and we are going to prove that the variance of β̂1, and also the variance of β̂2, tends to infinity as |r12| tends to 1. That means, when there is strong multicollinearity between the regressors x1 and x2, the variances of β̂1 and β̂2 blow up. So we need to find Var(β̂1) and Var(β̂2) in terms of the sample correlation coefficient r12. We know that, in general,

Var(β̂j) = σ² [(X′X)⁻¹]_jj,

the jjth element of (X′X)⁻¹. From this formula, Var(β̂1) = σ² [(X′X)⁻¹]₁₁, and the (1, 1)th element is 1/(1 − r12²), so

Var(β̂1) = σ²/(1 − r12²).

Similarly, Var(β̂2) = σ² [(X′X)⁻¹]₂₂, and the (2, 2)th element is also 1/(1 − r12²), so Var(β̂2) = σ²/(1 − r12²) as well. Now, if there is strong multicollinearity between the regressors x1 and x2, then the correlation coefficient r12 will be large in magnitude, with |r12| close to 1, and as |r12| tends to 1, the variance of β̂1 tends to infinity.
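A two-line numerical illustration of the blow-up (σ² is just a constant multiplier, so only the factor 1/(1 − r12²) matters):

```python
# Var(beta_hat) grows like 1 / (1 - r12^2) as |r12| approaches 1.
for r12 in (0.0, 0.5, 0.9, 0.99, 0.999):
    print(f"r12 = {r12}:  1/(1 - r12^2) = {1.0 / (1.0 - r12 ** 2):.1f}")
```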
And |r12| will tend to 1 when the regressors x1 and x2 are strongly correlated, that is, when there is strong multicollinearity between x1 and x2. Similarly, the variance of β̂2 also tends to infinity as |r12| tends to 1. I also said that strong multicollinearity results in large covariances. So what is the covariance of β̂1 and β̂2? It is Cov(β̂1, β̂2) = σ² [(X′X)⁻¹]₁₂, and the (1, 2)th (or (2, 1)th) element is −r12/(1 − r12²). So

Cov(β̂1, β̂2) = −σ² r12/(1 − r12²),

and this tends to −∞ or +∞ according as r12 tends to +1 or −1. So, using two regressors in the multiple linear regression model, we have illustrated how strong multicollinearity results in large variances and covariances of the regression coefficients.

This illustration used two regressors, but the same is true if you have more than two regressors in the model; that is what we are going to mention now. Suppose you have more than two regressors in the multiple linear regression model; I am talking about the model with k − 1 regressors. Then it can be shown that the diagonal elements of the (X′X)⁻¹ matrix are

[(X′X)⁻¹]_jj = 1/(1 − R_j²),  j = 1, …, k − 1.

Of course, I need to define what this R_j² is. Before, we had r12²; now r12 has been replaced by R_j, where R_j² is the coefficient of multiple determination from the regression of x_j on the remaining k − 2 regressors. I think I need to explain this. We know what the coefficient of multiple determination is in a multiple linear regression model: R² = SS_Regression/SS_Total, and this quantity measures the proportion of variability in the response variable that is explained by the regressors. That is the R² when we fit a regression model between the response variable and the k − 1 regressor variables. But here R_j² is the coefficient of multiple determination from the regression of x_j on the remaining regressors. That means the multiple linear regression model is between x_j and the remaining k − 2 regressors: x_j is expressed in terms of x1, x2, …, x_{j−1}, x_{j+1}, …, x_{k−1}. If you can recall my first example today, there we checked that x1 = x2/2 + x3/2, and this was true for all the observations we had. The coefficient of multiple determination for that regression is basically R1², and there R1² is 100 percent, because x2 and x3 can explain 100 percent of the variability in x1.
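In practice one can compute R_j², and with it the diagonal element 1/(1 − R_j²) (this quantity is known as the variance inflation factor), by regressing x_j on the other columns. Here is a minimal sketch with simulated data; the helper r_squared_j is mine, not from the lecture:

```python
import numpy as np

def r_squared_j(X, j):
    # R_j^2: coefficient of multiple determination from regressing
    # column j of X on the remaining columns (with an intercept).
    xj = X[:, j]
    Z = np.column_stack([np.ones(X.shape[0]), np.delete(X, j, axis=1)])
    coef, *_ = np.linalg.lstsq(Z, xj, rcond=None)
    ss_res = np.sum((xj - Z @ coef) ** 2)
    ss_tot = np.sum((xj - xj.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(2)
n = 100
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
X3 = X1 + 0.05 * rng.normal(size=n)   # nearly a linear function of X1
X = np.column_stack([X1, X2, X3])

for j in range(X.shape[1]):
    R2 = r_squared_j(X, j)
    print(f"R_{j+1}^2 = {R2:.4f},  1/(1 - R_{j+1}^2) = {1.0 / (1.0 - R2):.1f}")
```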
So now I hope you understand what R_j² is: it is the coefficient of multiple determination for this regression model. You fit the model expressing x_j in terms of the remaining regressors, compute SS_Regression and SS_Total for that model, and you get R_j². And since 1/(1 − R_j²) is the jjth diagonal element, I can now say that when you have more than two regressors in the multiple linear regression model,

Var(β̂j) = σ² [(X′X)⁻¹]_jj = σ²/(1 − R_j²),

and this tends to infinity as R_j² tends to 1. Most of the time we write this in terms of percentage, so R_j² being 100 percent means R_j² tends to 1. And R_j² will tend to 1 if there is strong multicollinearity between the regressor x_j and any subset of the other remaining k − 2 regressors; in that case R_j² will be close to unity, and the variance of β̂j will tend to infinity.

So today we could manage to talk about only one effect of multicollinearity, that is, one problem due to multicollinearity: strong multicollinearity results in large variances and covariances of the regression coefficients. We illustrated this point both for the multiple linear regression model with two regressor variables and in general. There are several other effects of multicollinearity in the multiple linear regression model, and we will be talking about those problems in the next class. Thank you very much.