So, this is my third lecture on transformation and weighting to correct model inadequacy, and here is the content of this topic. We have already talked about the variance stabilizing transformation and transformations to linearize the model, and also about generalized and weighted least squares. So, today I want to give an example to illustrate the weighted least squares technique we talked about in the previous class, and then I am going to talk about analytical methods to select a transformation. So, let me repeat once more that in the simple linear regression model, or in the multiple linear regression model, we make several assumptions on the error terms, and given a set of observations (xᵢ, yᵢ) you do not know whether your data set satisfies those assumptions or not. You have learned several techniques to check whether your data set satisfies the model assumptions in a module called model adequacy checking. More specifically, you know that the residual plot, that is, the residuals plotted against the fitted response, is an effective technique for checking the model assumptions. Now, in this module what we are doing is this: suppose your data set does not satisfy the model assumptions; then how do we correct the model inadequacy? We have learned two techniques for this, the variance stabilizing transformation and generalized and weighted least squares. So, today what I will do is briefly repeat the weighted least squares technique and then give an example to illustrate it. A linear regression model with non-constant error variance can be fitted by the method of weighted least squares. This is a particular case of the generalized least squares technique: here the variances are not equal, but the observations are uncorrelated, which means the covariance terms are equal to 0. So, suppose you are given the data set (xᵢ, yᵢ) and you are trying to fit a simple linear regression model between x and y, say y = β₀ + β₁x + ε, and you know that Var(εᵢ) = σᵢ² is not the same for all i, that is, the variance is non-constant. In ordinary least squares we minimize S = Σᵢ (yᵢ − β̂₀ − β̂₁xᵢ)² to estimate β₀ and β₁, but here we give a weight wᵢ to the i-th observation, so the weighted least squares function is S = Σᵢ wᵢ(yᵢ − β̂₀ − β̂₁xᵢ)². What we studied in the previous lecture is that wᵢ is proportional to 1/σᵢ², and I already explained in the previous class why that is. Now, the main problem is that you are just given the observations xᵢ and yᵢ, so you do not know σᵢ² for the i-th observation. So, today I will illustrate how to estimate σᵢ² from a given set of observations (xᵢ, yᵢ). Let me take an example: this is the restaurant food sales data. We have 30 observations here; the response variable y is the income from food sales per month, and the regressor x is the advertising expense for the whole year.
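Before going into the example, here is a minimal sketch in Python of what minimizing this weighted objective looks like. The data generated here are hypothetical placeholders, not the restaurant data, and the variance function used is an assumption made just for the demonstration.

```python
import numpy as np

def wls_simple(x, y, w):
    """Fit y = b0 + b1*x + eps by weighted least squares:
    minimize S = sum_i w_i * (y_i - b0 - b1*x_i)^2 via the
    weighted normal equations (X'WX) beta = X'Wy."""
    X = np.column_stack([np.ones_like(x), x])   # design matrix [1, x]
    W = np.diag(w)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)  # (b0_hat, b1_hat)

# Hypothetical data whose variance grows with x, so w_i = 1/sigma_i^2
rng = np.random.default_rng(0)
x = np.linspace(1.0, 10.0, 30)
sigma2 = 0.5 * x                                # assumed variance function
y = 2.0 + 1.5 * x + rng.normal(scale=np.sqrt(sigma2))
print(wls_simple(x, y, w=1.0 / sigma2))         # close to (2.0, 1.5)
```

With wᵢ = 1 for every i this reduces to the ordinary least squares fit.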
So, to repeat, the response variable yᵢ stands for the income per month and the regressor variable xᵢ is the cost of advertising per year, and we are trying to find a relationship between these two variables. First, given (xᵢ, yᵢ) for i = 1 to 30, you fit a simple linear regression model y = β₀ + β₁x + ε using the ordinary least squares technique. You know how to estimate the parameters β₀ and β₁; that you have learned in the first module, on the simple linear regression model. So, here is the fitted model, and once you have the fitted model you compute the residuals eᵢ = yᵢ − ŷᵢ, where yᵢ is the observed response value and ŷᵢ is the fitted response value, and then you plot them. This is called the residual plot: the residuals are plotted against the fitted response ŷᵢ. Look at this plot here; it looks very similar to an outward-opening funnel. That means the constant variance assumption is violated here: the variance of y increases as y increases. So, this implies that the ordinary least squares fit is inappropriate. Of course, the ordinary least squares fit is the starting point, and from the residual plot you realize that it is inappropriate. Now, to correct this inequality of variances we will go for the weighted least squares technique, and to use it we need to know σᵢ², because in the weighted least squares technique we minimize S = Σᵢ wᵢeᵢ² = Σᵢ wᵢ(yᵢ − β̂₀ − β̂₁xᵢ)², so we need to know the weight wᵢ for the i-th observation, for i = 1 to 30 here. So, now I will talk about how, given a set of observations (xᵢ, yᵢ), to estimate σᵢ², the variance of the population from which the i-th observation is coming. Here is the data again, the income and the cost of advertising. Now, look at the data: you can see that these three x values are nearly equal, so we put them in one cluster; these two values are again nearly equal, so we put them in another cluster; these five regressor values are nearly equal, so they form one cluster; and so on. Now, for each cluster you compute the average x̄ of its x values, and the idea is that the points in a cluster are near enough to be treated as a single x value. Corresponding to that single x value we have several responses, three in this first cluster, and we compute the sample variance of those response values; I am sure you know what the sample variance is.
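The grouping step just described is easy to automate. Here is a minimal sketch, assuming a simple tolerance rule for deciding when x values are "nearly equal"; the function name and the tolerance parameter are my own illustrative choices, not part of the lecture.

```python
import numpy as np

def cluster_variances(x, y, tol=1.0):
    """Group observations whose sorted x values differ by at most
    `tol` from their neighbor, and return each cluster's mean x
    together with the sample variance of its y values. Clusters
    of size 1 are dropped: one response gives no variance estimate."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    xbars, s2s = [], []
    start = 0
    for i in range(1, len(x) + 1):
        # close the current cluster when the run of near-equal x ends
        if i == len(x) or x[i] - x[i - 1] > tol:
            if i - start >= 2:
                xbars.append(x[start:i].mean())
                s2s.append(y[start:i].var(ddof=1))  # sample variance
            start = i
    return np.array(xbars), np.array(s2s)
```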
So, here is the sample variance corresponding to the cluster of two observations, and here is the sample variance corresponding to the cluster of five observations, and so on; you compute the sample variances for all the clusters. Now, if you look at the x̄ values and the sample variances, you can see that the sample variance of the response variable y increases as x increases, and if you draw a scatter plot of the sample variance against x̄, that scatter plot will indicate a roughly linear relationship between the two. So, what we do then is fit a linear relationship between x̄ and the sample variance; the least squares fit gives ŝ_y² = −7376216 + 7819.77 x̄. So, what we are doing is finding the regressor values that are nearly equal, treating them as a single point, and computing the sample variance of the corresponding response values. You can think of it like this: take this particular observation, for example. If this is my advertising cost, then the income I get is coming from a population whose variance is estimated by this sample variance. We do that for all the clusters, then look at the relationship between the mean advertising cost and the sample variance of the response, and fit a linear relationship between the two. Why do we do that? Because now you can substitute each xᵢ value into this equation to get an estimate of the variance corresponding to yᵢ. You have 30 xᵢ values; you put in each xᵢ and you get the estimate σ̂ᵢ², which is the same as ŝ_yᵢ². So, once you have the estimates σ̂ᵢ² for i = 1 to 30, you can compute the weights wᵢ = 1/σ̂ᵢ². Here you can see that you estimate σᵢ² corresponding to this point and then take the reciprocal to get the weight. So, the weights are given here, and you are now in a position to apply the weighted least squares technique, because you know all the weights wᵢ for i = 1 to 30, and here is the model obtained using the weighted least squares technique. Now, to see whether this fit is an improvement over the previous one, again you take the fitted equation, compute the residuals and the fitted responses, and draw the residual plot. This is a plot of eᵢ against ŷᵢ, except that instead of the simple residual we take the weighted residual, that is, we multiply each residual by √wᵢ. So, here is the residual plot, and this is the line e = 0. The residual plot indicates that weighted least squares has improved the fit, because before weighted least squares it was an outward-opening funnel, but now the residuals are roughly evenly scattered about the line e = 0.
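Putting the whole procedure together, here is a minimal sketch that continues from `cluster_variances` above: fit the variance function, form the weights, run the weighted fit, and return the weighted residuals for the diagnostic plot. It assumes the fitted variance is positive at every xᵢ, which should be checked in practice.

```python
import numpy as np

def wls_with_estimated_weights(x, y, xbars, s2s):
    # 1. Fit the variance function s_y^2 = a + b * xbar by OLS
    A = np.column_stack([np.ones_like(xbars), xbars])
    a, b = np.linalg.lstsq(A, s2s, rcond=None)[0]
    # 2. Estimate sigma_i^2 at each x_i and form w_i = 1 / sigma_i^2
    sigma2_hat = a + b * x          # must be positive for every x_i
    w = 1.0 / sigma2_hat
    # 3. Weighted least squares fit of y = b0 + b1*x
    X = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    # 4. Weighted residuals sqrt(w_i)*e_i, to plot against fitted values
    resid_w = np.sqrt(w) * (y - X @ beta)
    return beta, resid_w
```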
So, this is how we apply the weighted least squares technique when you are given a set of data. The final message is this: given a set of data, you check whether it satisfies the model assumptions; if it does not, and you are willing to apply the weighted least squares technique, you estimate σᵢ², because the weight wᵢ is proportional to 1/σᵢ², and we just talked about how to estimate σᵢ². Then you fit the linear regression model using the weighted least squares technique, and finally, once you have the fitted model, you again draw the residual plot and check the improvement. Well, so much for the generalized least squares and weighted least squares techniques. Finally, we will talk about one more technique to correct model inadequacy. This technique is called the Box-Cox method, and it corrects model inadequacy by transforming the response variable. It says that a useful class of transformations is the power transformation: you transform y to y^λ, where λ is a parameter to be determined, whether it is 2, 0.5, or −2, for example. Now, the problem with this particular power transformation is that as λ approaches 0, y^λ approaches 1. That is meaningless, because then all the transformed responses equal 1 irrespective of the value of the regressor variable. So, this is a disadvantage of transforming y to y^λ, and one approach to solve this difficulty is to use instead the transformation w = (y^λ − 1)/λ for λ ≠ 0. As you know, this function tends to log y as λ tends to 0, so we define w = log y for λ = 0, and this solves the problem of all the responses transforming to 1 as λ tends to 0. But the problem even with this one is that as λ changes, the values of this function change very much, which makes it impractical to compare the regression models for different λ; of course, when λ is large the w values will be very large. So, we need to use a normalization factor here. The geometric mean of the response variable, denoted ẏ and equal to (y₁y₂⋯yₙ)^(1/n), is used for the normalization. So, what we do is transform y to v, where v = (y^λ − 1)/(λẏ^(λ−1)) for λ ≠ 0 and v = ẏ logₑ y for λ = 0. This is the transformation suggested by the Box-Cox method, and the question now is how to get the value of λ.
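Before turning to the choice of λ, here is a minimal sketch of the normalized transformation itself; the small cutoff used to detect λ = 0 is a numerical convenience of mine, not part of the method.

```python
import numpy as np

def boxcox_v(y, lam):
    """Normalized Box-Cox transform:
    v = (y^lam - 1) / (lam * ydot^(lam - 1))  for lam != 0,
    v = ydot * log(y)                          for lam == 0,
    where ydot is the geometric mean of y (all y_i must be > 0).
    Dividing by ydot^(lam - 1) keeps SS_Res comparable across lam."""
    ydot = np.exp(np.mean(np.log(y)))   # geometric mean of the responses
    if abs(lam) < 1e-12:                # treat as the lam -> 0 limit
        return ydot * np.log(y)
    return (y**lam - 1.0) / (lam * ydot**(lam - 1.0))
```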
So, given a set of observations (xᵢ, yᵢ), what you are doing is transforming all the response values y₁, y₂, …, yₙ to v₁, v₂, …, vₙ and fitting a linear model between v and x, v = Xβ + ε, by ordinary least squares for any specified value of λ. So, the Box-Cox method suggests a power transformation of the response variable y: you transform yᵢ to vᵢ for i = 1 to n and then fit a linear regression between the transformed variable v and the regressor variable x using ordinary least squares, for a specified value of λ. But I still have not talked about how to determine the value of λ. So, here is the method to estimate λ; it is the maximum likelihood method of estimation, and these are the steps. First, choose values of λ from the closed interval [−2, 2], extending the range later if necessary. Then, for each chosen λ, evaluate v and compute SS_Res(λ), the residual sum of squares for the regression model v = Xβ + ε fitted at that value of λ. The maximum likelihood estimator of λ is the value of λ for which SS_Res(λ) is minimum. So, you take different values of λ in this interval; for each one you fit the model between v and x, a simple linear regression model, or of course a multiple linear regression model if the number of regressors is more than one; you compute SS_Res for each λ; and you see for which λ the SS_Res is minimum. That value of λ is the maximum likelihood estimate of λ.
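Here is a minimal sketch of that recipe, reusing `boxcox_v` from the sketch above; the grid of 41 candidate λ values over [−2, 2] is an arbitrary choice made for illustration.

```python
import numpy as np

def boxcox_lambda(x, y, lams=np.linspace(-2.0, 2.0, 41)):
    """For each candidate lambda, transform y with boxcox_v, fit
    v = b0 + b1*x by ordinary least squares, and record SS_Res(lambda).
    The maximum likelihood estimate is the lambda minimizing SS_Res."""
    X = np.column_stack([np.ones_like(x), x])
    ss_res = []
    for lam in lams:
        v = boxcox_v(y, lam)
        beta = np.linalg.lstsq(X, v, rcond=None)[0]
        ss_res.append(np.sum((v - X @ beta) ** 2))
    return lams[int(np.argmin(ss_res))]
```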
Now, I give an example to illustrate the Box-Cox method. This is the electric utility data. We have 53 observations in total; the response variable y stands for the peak hour demand, and the regressor variable x stands for the energy usage per month. So, for the first family this is the energy usage in a particular month, and here is the peak hour demand, and we are interested in finding a relationship between the monthly usage and the peak hour demand. We have one regressor variable and the response variable here, and of course we will start with a simple linear regression model using the ordinary least squares technique; it does not matter at this stage whether the data set satisfies the basic assumptions or not. Here is the fitted model between y and x: ŷ = −0.8313 + 0.00368x. Of course, we need to check whether this fit is good, or in other words, whether the data set (xᵢ, yᵢ) for i = 1 to 53 satisfies the basic assumptions. For that, you compute the residuals, and with the fitted responses you draw the residual plot, which will tell you whether the data set satisfies the basic assumptions or not. Here, instead of eᵢ we have used the standardized residuals; it does not matter. So, the standardized residual tᵢ is plotted against the estimated response, and you can see that again this residual plot is an outward-opening funnel. That means the ordinary least squares fit is not appropriate, because the variance of y increases as y increases, so this data violates the constant variance assumption. The residual plot suggests that the error variance increases as the energy consumption increases. So, we cannot continue with the ordinary least squares fit; what we will do is apply the Box-Cox technique here. Now, we take λ from the interval [−2, 2]. First we start with λ = −2: you compute v for λ = −2, you transform your y to v, and you fit the model v = Xβ + ε between v and x; here is the corresponding residual sum of squares. You do this for the different λ values and compute the corresponding SS_Res. Here you can see that SS_Res is minimum for λ = 0.5, so the maximum likelihood estimate of λ is λ̂ = 0.5. So, the transformation we finally go for is this one: you transform y to v = (y^(1/2) − 1)/((1/2)ẏ^(−1/2)), where the exponent λ − 1 in the normalization is −1/2 and ẏ is, of course, the geometric mean of the response variable. This is the suggestion from the Box-Cox technique, and if you check the residual plot for the transformed data with λ = 1/2, you can see the residual plot after the transformation; this is the line e = 0. The fit has improved, because the standardized residuals are now roughly centered about the line e = 0. So, this is what the Box-Cox method is, and since we have some time left, we will talk about a problem. The problem is this: suppose we have n observations on variables x₁, x₂, …, x_k and y, where the xᵢ's are of course the regressor variables and y is the response variable. Suppose we are told that the observations yᵢ are uncorrelated, but the last observation has variance 4σ² rather than σ².
Then the problem is to find the best linear unbiased estimator (BLUE) of β using the weighted least squares technique, because this setup fits the weighted least squares assumptions: weighted least squares is a particular case of generalized least squares in which the observations are uncorrelated but the variances are non-constant, I mean unequal. So, what we have here is Var(yᵢ) = Var(εᵢ) = σ² for i = 1 to n − 1, and Var(yₙ) = Var(εₙ) = 4σ². What is the variance-covariance matrix here? The data are uncorrelated, so the covariance terms are all equal to 0, and Var(ε) = σ²V, where V = diag(1, 1, …, 1, 4). So, what we have to do is find the best linear unbiased estimator of β, the vector of regression coefficients. If you forget the formula for the best linear unbiased estimator of β, you can derive it, of course. Start with the model y = Xβ + ε, where Var(ε) is not of the form σ²I. In the generalized least squares technique we take a transformation of this model: you multiply by a matrix G, giving Gy = GXβ + Gε. Now Gy is the transformed data, and we need to choose a correct G. We want Var(Gε) = σ²GVG′ to be σ²I, which is equivalent to GVG′ = I, which is the same as V⁻¹ = G′G. So, we need to choose G such that V⁻¹ = G′G, and then β̂ = (X′G′GX)⁻¹X′G′Gy. I hope you understand this, because in the ordinary least squares model it is just β̂ = (X′X)⁻¹X′y; what I am doing is replacing X by GX and y by Gy, and then you get this formula. This estimator is the BLUE. Now, G′G here is nothing but V⁻¹, so the BLUE is finally β̂ = (X′V⁻¹X)⁻¹X′V⁻¹y, and I know my V, so I know V⁻¹: it is again a diagonal matrix with the diagonal elements inverted, V⁻¹ = diag(1, 1, …, 1, 1/4). So, this is the best linear unbiased estimator of β when you have this sort of structure for the variance of yᵢ.
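As a minimal sketch, this BLUE can be computed as follows for the particular V above; exploiting the diagonal structure avoids ever building the full n × n matrix.

```python
import numpy as np

def blue_last_obs(X, y):
    """BLUE of beta when Var(eps) = sigma^2 * V with
    V = diag(1, ..., 1, 4): computes
    beta_hat = (X' V^{-1} X)^{-1} X' V^{-1} y,
    where V^{-1} = diag(1, ..., 1, 1/4)."""
    v_inv = np.ones(len(y))
    v_inv[-1] = 0.25             # the last observation gets weight 1/4
    XtVinv = X.T * v_inv         # X' V^{-1} via broadcasting
    return np.linalg.solve(XtVinv @ X, XtVinv @ y)
```

Only the last observation is down-weighted; with `v_inv` all ones this reduces to the ordinary least squares estimator.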
So, that is all for today, and we have to stop now; just let me conclude the whole module once more. We know that in the simple linear regression model, or in the multiple linear regression model, there are some basic assumptions, and given a set of data you do not know whether your data set satisfies those basic assumptions or not. So, what you do is fit a simple linear regression model using the ordinary least squares technique and then compute the residuals. You go for the residual plot, which is a plot of the residuals against the fitted responses, and the residual plot will tell you whether your observations satisfy the basic assumptions or not. Well, if your data does not satisfy the basic assumptions, then what we have learned in this module is how to correct the model inadequacy using different techniques: the variance stabilizing transformation, the weighted least squares technique, the generalized least squares technique, and finally, transformation of the response variable using the Box-Cox method. So, that is all for today. Thank you.