Hi, this is my second lecture on the topic called transformations and weighting to correct model inadequacy. In the simple linear regression model, and in the multiple linear regression model, we make some basic assumptions on the regression model: we assume that the error terms have mean 0 and constant variance, that they are uncorrelated, and also that they follow a normal distribution. Now, given a set of data, how do you know whether the data satisfy the basic assumptions we made for the regression fit? To check this, we have learned several techniques in the module called model adequacy checking, and in this regard the residual plot is an effective technique to test whether the model assumptions are satisfied or not. What you do there is fit a simple linear regression model, find the residuals, and plot the residuals against the estimated response. If the residuals are scattered roughly evenly about the line e = 0, the model is satisfactory and you can take the constant variance assumption to be correct. But if the residual plot looks like an outward-opening or inward-opening funnel, the constant variance assumption is not true; similarly, if the residual plot looks like a double bow, the constant variance assumption is again not correct.

So if you have data (xᵢ, yᵢ), you know how to check whether the data satisfy the basic assumptions or not. Suppose your data do not satisfy the basic assumptions we made in the linear regression fit; what we learn in the current topic is what to do in that situation. We talked about two techniques in the previous class, and here is the content of this topic: variance-stabilizing transformations, where we take a transformation of the response variable to correct the non-constant variance; transformations to linearize the model; and we are left with analytical methods to select a transformation, and generalized and weighted least squares. So today we will be talking about the generalized and weighted least squares techniques.

First, let me talk about weighted least squares. A linear regression model with non-constant variance can be fitted by the method of weighted least squares. Recall that the simple linear regression model is y = β₀ + β₁x + ε. What is ordinary least squares? Ordinary least squares is a technique to estimate the regression coefficients β₀ and β₁ by minimizing the quantity S = Σᵢ (yᵢ − β₀ − β₁xᵢ)². Here yᵢ is the observed response, β₀ + β₁xᵢ is the fitted response, and their difference is nothing but the i-th residual eᵢ, so S = Σᵢ eᵢ². So in the ordinary least squares technique we minimize the sum of squared differences between the observed and fitted response values to estimate the regression coefficients.
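To make this step concrete, here is a minimal sketch in Python (NumPy) of the ordinary least squares fit via the closed-form solution of this minimization; the data and the function name ols_simple are my own illustration, not from the lecture.

```python
import numpy as np

def ols_simple(x, y):
    """Ordinary least squares for y = b0 + b1*x:
       minimizes S = sum_i (y_i - b0 - b1*x_i)^2."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    # Closed-form minimizer: b1 = Sxy / Sxx, b0 = ybar - b1*xbar
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

# Illustrative (made-up) data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
b0, b1 = ols_simple(x, y)
residuals = y - (b0 + b1 * x)   # the e_i: plot these against fitted values to check the assumptions
print(b0, b1)
```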
Now, for weighted least squares, what we do is that instead of minimizing that quantity, we minimize the weighted least squares function S = Σᵢ wᵢ(yᵢ − β₀ − β₁xᵢ)², where wᵢ is the weight given to the i-th observation, and wᵢ is proportional to 1/σᵢ². See, here we are talking about a model with non-constant variance; that means yᵢ comes from a population with variance σᵢ². Then you find the normal equations to estimate β₀ and β₁ by weighted least squares. Setting the partial derivative of S with respect to β₀ equal to 0 gives Σ wᵢyᵢ = β̂₀ Σ wᵢ + β̂₁ Σ wᵢxᵢ; this is the first normal equation. The second normal equation is obtained by differentiating the weighted least squares function S with respect to β₁ and setting it equal to 0, which gives Σ wᵢxᵢyᵢ = β̂₀ Σ wᵢxᵢ + β̂₁ Σ wᵢxᵢ². So you have two normal equations and two unknowns β₀ and β₁, and by solving these two equations you will get the estimates β̂₀ and β̂₁ of the regression coefficients.

Now, I did not say anything about why the weight is proportional to 1/σᵢ², and the other concern here is that you are given just a data set (xᵢ, yᵢ) for i = 1 to n, nothing else, so you do not know the σᵢ² for your given set of data. I will come back to both points: first, why the weight is proportional to 1/σᵢ², and second, how to get the σᵢ². So far I have just given an idea of what weighted least squares is: it is very similar to ordinary least squares, but here we give weight wᵢ to the i-th observation, with wᵢ proportional to 1/σᵢ². Why the weight is proportional to 1/σᵢ² I am going to explain in this class itself.

Now, I need some prerequisites to talk about generalized least squares. The first one is the Gauss–Markov theorem. This theorem is stated for the regression model y = Xβ + ε, a multiple linear regression model with E(ε) = 0 and Var(ε) = σ²I. That means it satisfies the basic assumptions of the multiple linear regression model: the variance is constant, the mean is 0, and the errors are uncorrelated; you can see the covariance terms are all equal to 0 because this is the identity matrix. For a regression model satisfying these conditions, the theorem says that the least squares estimators are unbiased and have minimum variance when compared with all other unbiased estimators that are linear combinations of the observations yᵢ. We apply the ordinary least squares technique here, and we know the least squares estimate is β̂ = (X′X)⁻¹X′y. So what the Gauss–Markov theorem says is that this least squares estimator is unbiased and has the minimum variance compared to all other unbiased estimators
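As a sketch of how these two normal equations can be solved numerically, here is a small Python example; it assumes the weights wᵢ (equivalently the σᵢ) are already known, which is exactly the point I will come back to, and the data shown are hypothetical.

```python
import numpy as np

def wls_simple(x, y, w):
    """Weighted least squares for y = b0 + b1*x: solves the two normal equations
       sum(w*y)   = b0*sum(w)   + b1*sum(w*x)
       sum(w*x*y) = b0*sum(w*x) + b1*sum(w*x^2)."""
    x, y, w = (np.asarray(a, float) for a in (x, y, w))
    A = np.array([[w.sum(),        (w * x).sum()],
                  [(w * x).sum(),  (w * x**2).sum()]])
    b = np.array([(w * y).sum(), (w * x * y).sum()])
    b0, b1 = np.linalg.solve(A, b)
    return b0, b1

# Hypothetical data; weights taken proportional to 1/sigma_i^2 with sigma_i assumed known
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.2, 1.9, 3.2, 3.9])
sigma = np.array([0.5, 0.6, 0.9, 1.2])
print(wls_simple(x, y, 1.0 / sigma**2))
```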
that are linear in y. So the estimator obtained by the least squares technique is the best among all linear unbiased estimators; the least squares estimators are best linear unbiased estimators, called BLUE. So this is what the Gauss–Markov theorem is.

Next, let me talk a little bit about positive definite matrices; I hope you know about them, but still I will recall the definition. An n × n matrix M is positive definite if z′Mz > 0 for all non-zero vectors z ∈ ℝⁿ. To make this definition clear, I will give some examples of positive definite matrices.

Example 1: consider the identity matrix M = I₂ = [[1, 0], [0, 1]]. This matrix is positive definite, because if you take z = (z₀, z₁)′, then z′Mz = z₀² + z₁², and this is strictly greater than 0 because z₀ and z₁ are real and at least one of them is non-zero. So this is a positive definite matrix, and similarly you can prove that Iₙ, the identity matrix of order n, is also positive definite.

Example 2: consider the matrix M = [[2, −1, 0], [−1, 2, −1], [0, −1, 2]]. This matrix involves some negative entries, but it is positive definite. Take z = (z₁, z₂, z₃)′ from ℝ³; then you can check that z′Mz = z₁² + (z₁ − z₂)² + (z₂ − z₃)² + z₃², and this is strictly greater than 0 because z₁, z₂, z₃ are real and at least one of them is non-zero.

So you may think that if all the entries of a matrix are positive, then the matrix is positive definite. Example 3 shows this is not so: take M = [[1, 2], [2, 1]], all positive entries, but this is not positive definite, because if you take the non-zero vector z = (1, −1)′, you can check that z′Mz = −2, which is less than 0. So this is an example of a matrix which is not positive definite. A huge class of examples comes from covariance matrices: every non-singular covariance matrix is positive definite, and in fact every positive definite matrix is the covariance matrix of some random vector.

So next I will move to generalized least squares. What we will do here is develop weighted least squares for multiple regression. First, recall the multiple linear regression model: consider the model y = Xβ + ε with E(ε) = 0, but now with Var(ε) = σ²V, where σ²V cannot be written as σ²I. This is important, because in whatever we have done before for the multiple linear regression model, we have assumed E(ε) = 0 and Var(ε) = σ²I; σ²I means the constant variance assumption is true, that is, Var(εᵢ) = σ² for all i, and the errors are uncorrelated, which is why the off-diagonal elements are 0 in the identity matrix. But here this is not true: V is the covariance structure, and V is a positive definite matrix, possibly with non-zero off-diagonal terms.
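The three examples can also be checked numerically. One standard test (a sketch, not part of the lecture) is that a symmetric matrix is positive definite exactly when all its eigenvalues are strictly positive, or equivalently when a Cholesky factorization succeeds:

```python
import numpy as np

def is_positive_definite(M):
    """A symmetric matrix is positive definite iff all its eigenvalues are > 0,
       equivalently iff it admits a Cholesky factorization M = L L'."""
    try:
        np.linalg.cholesky(M)
        return True
    except np.linalg.LinAlgError:
        return False

I2 = np.eye(2)                                                  # Example 1: positive definite
M2 = np.array([[2., -1., 0.], [-1., 2., -1.], [0., -1., 2.]])   # Example 2: positive definite
M3 = np.array([[1., 2.], [2., 1.]])                             # Example 3: not positive definite
print(is_positive_definite(I2), is_positive_definite(M2), is_positive_definite(M3))

# For Example 3, the quadratic form at z = (1, -1) is indeed negative:
z = np.array([1., -1.])
print(z @ M3 @ z)   # -2.0
```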
So now we are trying to fit a multiple linear regression model y = Xβ + ε where E(ε) = 0, but the constant variance assumption is not true: here Var(ε) = σ²V. That means there is inequality among the variances, and also the εᵢ are not necessarily uncorrelated; they may be correlated here, which is why V can have non-zero off-diagonal terms. This happens when the observations y have unequal variances, or when the observations are correlated, or both. So here we have a violation of the basic assumptions we made before for the multiple linear regression model.

In either case, the conditions of the Gauss–Markov theorem are violated. So β̂ = (X′X)⁻¹X′y, the estimator of β obtained by ordinary least squares, is not the best linear unbiased estimator. Now, what we do here is that in this situation it is still possible to find the BLUE, the best linear unbiased estimator of β, for arbitrary positive definite V by a suitable linear transformation of the model.

So we have the multiple linear regression model y = Xβ + ε, and we take a linear transformation of this model: we multiply the model by a matrix G, obtaining the transformed model Gy = GXβ + Gε. Of course, you have to choose the right G. Now note that Var(Gε) = G Var(ε) G′ = σ² G V G′, because Var(ε) = σ²V. So what we are doing here is this: you are given the data X, y, and the given data do not satisfy the model assumptions; they have non-constant variance. Then we transform the data to GX, Gy, and our problem is to choose a correct G such that G V G′ is the identity. Therefore, if we choose G such that G V G′ = I, then the transformed data satisfy the Gauss–Markov conditions, and the BLUE, the best linear unbiased estimator of β, is obtained by the ordinary least squares estimator on the transformed data.

So my original data do not satisfy the basic assumptions: there is inequality in the variances, and the observations are correlated. We transform the data to GX and Gy, and we choose G in such a way that the transformed error Gε has variance σ²I. Once the transformed error has variance σ²I, the transformed model satisfies the conditions of the Gauss–Markov theorem, and that is why the BLUE of β can be obtained by the ordinary least squares estimator on the transformed data. Now, G V G′ = I is equivalent to V⁻¹ = G′G, which in turn is equivalent to V = G⁻¹(G′)⁻¹. So you have to choose a G in such a way that V = G⁻¹(G′)⁻¹.
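To see this transformation numerically, here is a sketch with a small hand-picked positive definite V (my own example; the construction of G used here, via the eigendecomposition of V, is exactly the one derived below):

```python
import numpy as np

# A small positive definite V with correlated errors (illustrative only)
V = np.array([[1.0, 0.5],
              [0.5, 2.0]])

# Build G from the eigendecomposition V = U Lam U' (U orthogonal):
# G = Lam^(-1/2) U'  satisfies  G V G' = I  and  G'G = V^(-1).
lam, U = np.linalg.eigh(V)
G = np.diag(lam ** -0.5) @ U.T

print(np.allclose(G @ V @ G.T, np.eye(2)))      # True: transformed errors have identity covariance
print(np.allclose(G.T @ G, np.linalg.inv(V)))   # True: G'G = V^(-1)
```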
Now, if we apply the ordinary least squares estimator to the transformed data, what we get is β̂ = (X′G′GX)⁻¹X′G′Gy. How did we get this? See, in the case of the ordinary least squares estimator, β̂ = (X′X)⁻¹X′y; what I am doing here is just replacing X by GX and y by Gy, because we are working on the transformed data GX and Gy. Since G′G = V⁻¹, in terms of V this can also be written as β̂ = (X′V⁻¹X)⁻¹X′V⁻¹y. So this is the BLUE obtained using the generalized least squares technique. And the variance is Var(β̂) = σ²(X′G′GX)⁻¹. This is the same form as before: in the case of ordinary least squares we had Var(β̂) = σ²(X′X)⁻¹, and here we are just replacing X by GX. In terms of V this can be written as Var(β̂) = σ²(X′V⁻¹X)⁻¹. So this is the generalized least squares estimator of the regression coefficients, and this is its variance.

Now, I still did not talk about how to get this G. We have V, and we have to choose G such that G V G′ = I, that is, such that V⁻¹ = G′G, where we know that V is a positive definite matrix. Since V is positive definite, it is always possible to find such a G by using the orthogonal (spectral) decomposition of V: the positive definite matrix V can be written as V = UΛU′, where Λ is the diagonal matrix of eigenvalues and U is the orthogonal matrix of eigenvectors. This implies V⁻¹ = UΛ⁻¹U′, and since we want to write this V⁻¹ as G′G, the choice for G is G = Λ^(−1/2)U′; indeed G′G = UΛ^(−1/2)Λ^(−1/2)U′ = UΛ⁻¹U′ = V⁻¹. So this is the choice for G.

Well, so what we have learned here is that if your given data do not satisfy the basic assumption of constant variance, if there is inequality in the variances, and also if the errors or observations are correlated, then Var(ε) cannot be written as σ²I; it is σ²V, where V is a positive definite matrix. From that V we can find G such that, if you take the transformation I just talked about, Gy = GXβ + Gε, the transformed data satisfy all the basic assumptions, and then you can apply the ordinary least squares technique to the transformed data. So this is what the generalized least squares technique is, and now we will show how we can get the weighted least squares technique as a particular case of generalized least squares. So we already know what generalized least squares is, and an important special case is weighted least squares.
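Putting the pieces together, here is a minimal GLS sketch on simulated data (all names and numbers are my own illustration): it computes β̂ both by ordinary least squares on the transformed data (GX, Gy) and by the closed form (X′V⁻¹X)⁻¹X′V⁻¹y, and the two routes agree.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])   # intercept + one regressor
beta_true = np.array([1.0, 2.0])

# A positive definite V with unequal variances and correlated neighbours (illustrative)
R = 0.3 ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))   # correlation 0.3^|i-j|
d = np.sqrt(np.linspace(1.0, 3.0, n))                              # unequal standard deviations
V = np.outer(d, d) * R

eps = rng.multivariate_normal(np.zeros(n), V)
y = X @ beta_true + eps

# G = Lam^(-1/2) U' from the eigendecomposition of V, so that G V G' = I
lam, U = np.linalg.eigh(V)
G = np.diag(lam ** -0.5) @ U.T

# Route 1: ordinary least squares on the transformed data (GX, Gy)
GX, Gy = G @ X, G @ y
beta_via_transform = np.linalg.solve(GX.T @ GX, GX.T @ Gy)

# Route 2: closed form beta_hat = (X' V^(-1) X)^(-1) X' V^(-1) y
Vinv = np.linalg.inv(V)
beta_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)

print(np.allclose(beta_via_transform, beta_gls))   # True: the two routes agree
print(beta_gls)
```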
So, at the beginning I talked about weighted least squares, and there I said that we find the regression coefficients by minimizing Σ wᵢeᵢ², where wᵢ is proportional to 1/σᵢ², and I promised to explain why the weight is proportional to 1/σᵢ². Now we know generalized least squares, and weighted least squares is a particular case of it: the case where the observations are uncorrelated but have unequal variances. Uncorrelated means the off-diagonal elements are equal to 0, so, absorbing the factor σ² into V, you can take Var(ε) = V = diag(σ₁², σ₂², …, σₙ²); all the covariance terms are 0 because the observations are uncorrelated. So this is the variance-covariance matrix, and this is my V. Now I want to find a G such that V⁻¹ = G′G, and here it is easy; you can very easily check that the choice is G = V^(−1/2) = diag(1/σ₁, 1/σ₂, …, 1/σₙ). If you choose this G, then G′G = diag(1/σ₁², 1/σ₂², …, 1/σₙ²), which is nothing but V⁻¹.

So now we know that β̂ = (X′V⁻¹X)⁻¹X′V⁻¹y, and if I write the weight matrix as W = V⁻¹, then this is nothing but β̂ = (X′WX)⁻¹X′Wy, and the variance of β̂ is (X′V⁻¹X)⁻¹ = (X′WX)⁻¹. So this explains the weights: in weighted least squares the weight matrix is nothing but V⁻¹, where V is this diagonal matrix, and then it is clear why wᵢ is proportional to 1/σᵢ².

So I have explained this part, but I still need to explain one more thing. I explained why the weight wᵢ is proportional to 1/σᵢ², but I will explain in my next class how to get the σᵢ². See, you are just given a data set (xᵢ, yᵢ), and from that data you identified that some of the basic assumptions, like constant variance and normality, are violated. To correct those assumptions, you go for weighted least squares; but to apply weighted least squares you need to know the σᵢ², and the σᵢ² are of course not given. So in the next class I will take an example which gives you just a set of observations xᵢ and yᵢ; we will first see that for those observations some basic assumptions are violated, and then we will apply the weighted least squares estimator to correct the model. Of course, for that we need to find the σᵢ², and we will talk about how to find the σᵢ² in the next lecture. Thank you very much.
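As a closing sketch of this special case (my own illustration, with the σᵢ assumed known; estimating them is the subject of the next lecture), weighted least squares as diagonal GLS looks like this:

```python
import numpy as np

def wls(X, y, sigma):
    """Weighted least squares as a special case of GLS:
       V = diag(sigma_i^2), W = V^(-1), beta_hat = (X'WX)^(-1) X'W y."""
    w = 1.0 / np.asarray(sigma, float) ** 2   # w_i proportional to 1/sigma_i^2
    XtW = X.T * w                             # same as X' @ diag(w), without forming diag(w)
    return np.linalg.solve(XtW @ X, XtW @ y)

# Hypothetical data with standard deviations sigma_i assumed known
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([2.0, 4.1, 5.8, 8.4, 9.7])
sigma = np.array([0.5, 0.5, 1.0, 1.5, 2.0])
print(wls(X, y, sigma))
```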