We are in Module 5, on model adequacy checking. In this module we have already discussed the various types of residuals, namely the regular residuals, standardized residuals, studentized residuals, and PRESS residuals, and we have discussed two residual plots: the normal probability plot and the plot of residuals against the fitted values ŷ_i. Today we will discuss the plot of residuals against a regressor, then the partial residual plot, and finally the PRESS statistic.

First, the plot of residuals against a regressor. We are in the multiple linear regression setup, with a response variable y and several regressors: y = β_0 + β_1 x_1 + β_2 x_2 + … + β_{k-1} x_{k-1} + ε. Just as we plotted the residuals against ŷ_i, here we plot the residuals e_i against x_1, then against x_2, and so on. This plot is important for checking the relationship between the response variable y and a particular regressor, and its patterns are interpreted exactly as in the plot of residuals against ŷ_i, only with the regressor x_j on the horizontal axis instead of ŷ_i. A horizontal band containing the residuals is the desirable pattern and indicates a satisfactory model. An outward- or inward-opening funnel indicates nonconstant variance, and a double-bow pattern also indicates nonconstant variance. A nonlinear pattern indicates that the assumed relationship between the response variable y and x_j is not correct; in that case we should consider higher-order terms such as x_j² or x_j³, or a transformation of x_j such as 1/x_j or log x_j.

So what we learn is that the plot of residuals against a regressor is important for examining the relationship between the response variable and a particular regressor x_j. Its limitation is that it may not show the marginal effect of x_j on the response variable, given the other regressors in the model. You need to understand this point: we are simply plotting e_i against a particular regressor x_j, so the other regressors in the model are not taken into account.
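As a minimal sketch of how such plots can be produced, assuming simulated data, two regressors named x1 and x2, and an ordinary least-squares fit with numpy and matplotlib:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n = 100

# Simulated data: two regressors and a linear model (assumed for illustration)
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
y = 2.0 + 1.5 * x1 - 0.8 * x2 + rng.normal(0, 1, n)

# Fit the multiple linear regression y = b0 + b1*x1 + b2*x2 by least squares
X = np.column_stack([np.ones(n), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat

# Plot residuals against each regressor; a horizontal band is the desirable pattern
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, xj, name in zip(axes, [x1, x2], ["x1", "x2"]):
    ax.scatter(xj, resid)
    ax.axhline(0, color="gray", linestyle="--")
    ax.set_xlabel(name)
    ax.set_ylabel("residual")
plt.show()
```

If the model is adequate, both panels should show the residuals in a roughly horizontal band around zero, without funnels, double bows, or curvature.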
Consider first simple linear regression, where there is only one regressor: y = β_0 + β_1 x + ε. In that case there is no difference between the plot of e_i against ŷ_i and the plot of e_i against x, because ŷ_i is a linear function of x; whether you plot the residuals against x or against ŷ does not matter. In multiple linear regression, however, these two plots are not the same, because there is more than one regressor. As I said, the limitation of the plot of e_i against x_j is that it may not show the marginal effect of x_j given the other regressors in the model. That is why we need the partial residual plot.

Let me now explain the partial residual plot. It is an improvement over the plot of residuals against a regressor: it considers the marginal role of the regressor x_j given the other regressors that are already in the model. Let me first give the outline of the plot and then explain the logic behind it. In this plot, the response variable y and the regressor x_j are each regressed against the other regressors in the model, and the residuals are obtained from each regression. Since we are in multiple linear regression, we need to consider the marginal role of one regressor in the presence of the other regressors in the model. The technique is this: the response variable y is regressed on all the regressors except x_j, which is one regression model, and x_j is regressed on the remaining regressors, which is the second regression model. Once we have the fitted models for these two regressions, we obtain the corresponding residual values, and the plot of these two sets of residuals against each other shows the marginal role of the regressor x_j on the response variable in the presence of the other regressors.
The partial residual plot is a somewhat difficult concept, and I will try my best to explain it. I will take an example of a multiple linear regression model with two regressors, explain the technique first, and after that explain the logic behind the partial residual plot, that is, what I mean by the marginal role of x_j on the response variable in the presence of the other regressors in the model.

Illustration. Consider a multiple linear regression model with two regressors x_1 and x_2, so the model is y = β_0 + β_1 x_1 + β_2 x_2 + ε. Suppose we are interested in the marginal role of x_1 on the response variable y in the presence of the other regressor in the model. According to the technique, my aim is first to eliminate the effect of x_2 from the response variable y. To do that, I regress y on x_2; the residuals from that regression give me the part of the variability in y which is not explained by x_2, and then I want to see how much of that remaining variability x_1 can explain. This is the basic idea behind the partial residual plot: I eliminate from y the part of the variability that can be explained by x_2, and I see how much of the remaining variability can be explained by x_1.

So, to eliminate the effect of x_2 from y, that is, to isolate the part of the variability which is not explained by x_2, I first regress y on x_2 and compute the residuals. In the general case of many regressors this means regressing y on all the regressors except x_1 if we are interested in the marginal role of x_1; since we have only two regressors here, I regress y on x_2 only. Suppose the fitted model is ŷ_i = θ̂_0 + θ̂_1 x_i2.
So this is the fitted model between the response variable and x_2, and I want to introduce a notation here: I will write the fitted value as ŷ_i^(1), where the bracketed 1 means that the response variable has been regressed on all the regressors except x_1. Once we have this fitted model, we can compute the part of the variability in y which has not been explained by the second regressor, that is, the residual. Generally we write the residual as e_i = y_i − ŷ_i, but here I will write e_{i(1)}^(y) = y_i − ŷ_i^(1): y_i is the observed value, ŷ_i^(1) is the fitted value when y is regressed on x_2, the bracketed 1 indicates that x_1 has been left out, and the superscript y indicates that this is a residual of the response. It is a slightly complicated notation, but this is the residual obtained from this regression.

The partial residual plot technique also says that the regressor of interest, here x_1, is itself regressed on the remaining regressors. So I also regress x_1 on x_2; suppose the fitted simple linear regression model is x̂_i1 = α̂_0 + α̂_1 x_i2. This regression shows how much of the variability in x_1 can be explained by the other regressor; the regressors are sometimes not completely independent, there may be a little dependence between them, so we want to eliminate the contribution of the other regressors from the regressor we are interested in. Here the residual is e_{i(1)}^(x) = x_i1 − x̂_i1, where x_i1 is the observed value of the first regressor for the i-th observation.

The partial residual plot then plots e_{i(1)}^(y) against e_{i(1)}^(x). We will establish the relation between these two residuals shortly; in particular, if the assumed relationship is linear, we will see what pattern to expect in the plot. I hope you have understood at least the technique of the partial residual plot: if we are interested in the marginal role of x_j on the response y, we regress both y and x_j on the remaining regressors, so there are two regression models; we find the residuals corresponding to those two models, and we plot them against each other to get the partial residual plot. This example had only two regressors; next I will introduce the notation for the same concept in the general case with, say, k − 1 regressors instead of only two.
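A minimal sketch of this two-regressor construction, assuming simulated data and the variable names x1 and x2 in Python: y and x1 are each regressed on x2, and the two residual vectors are plotted against each other.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
n = 100

# Simulated two-regressor data (assumed for illustration)
x2 = rng.uniform(0, 5, n)
x1 = 0.5 * x2 + rng.normal(0, 1, n)        # x1 partly depends on x2
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(0, 1, n)

def residuals(response, regressor):
    """Residuals from a simple linear regression of `response` on `regressor`."""
    Z = np.column_stack([np.ones(len(regressor)), regressor])
    coef, *_ = np.linalg.lstsq(Z, response, rcond=None)
    return response - Z @ coef

# Regress y on x2 and x1 on x2; the two residual vectors form the partial residual plot
e_y_1 = residuals(y, x2)    # part of y not explained by x2
e_x_1 = residuals(x1, x2)   # part of x1 not explained by x2

plt.scatter(e_x_1, e_y_1)
plt.xlabel("residual of x1 on x2")
plt.ylabel("residual of y on x2")
plt.title("Partial residual plot for x1")
plt.show()
```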
Now some more notation for the general case: suppose there are k − 1 regressors instead of only two. The partial residual of y for x_j, for the i-th observation, is defined as e_{i(j)}^(y) = y_i − ŷ_i^(j), where ŷ_i^(j) is the prediction of y_i from a regression model using all regressors except x_j. In my previous example, e_{i(1)}^(y) was exactly this: the partial residual of y for x_1. The residual e_{i(j)}^(y) represents the variability in y_i that is not explained by a model that excludes the regressor x_j; it is the part of the variability in y_i which has not been explained by all the regressors other than x_j. In other words, the effect of all the other regressors except x_j has been removed from the response variable y, and we want to see how much of this remaining, unexplained variability can be explained by x_j alone. That is what we mean by the marginal role of x_j in explaining the variability in the response variable y. So this is the notation corresponding to the response variable y.

Similarly, x_j is regressed on the remaining regressors, and the partial residual of x_j, for the i-th observation and the j-th regressor, is defined as e_{i(j)}^(x) = x_ij − x̂_ij, where x̂_ij is the prediction of the regressor value x_ij from a regression of x_j on all the other regressor variables. The residual e_{i(j)}^(x) represents the variation in x_j that cannot be explained by the other regressors; this part is quite routine. So now we have understood the technique of the partial residual plot and we have some idea of the logic behind it: we fit two regression models and obtain two sets of residuals.
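For reference, the two partial residuals just defined can be written compactly as

```latex
e^{(y)}_{i(j)} = y_i - \hat{y}^{(j)}_i ,
\qquad
e^{(x)}_{i(j)} = x_{ij} - \hat{x}_{ij} ,
```

and the partial residual plot is simply the scatter plot of e_{i(j)}^(y) against e_{i(j)}^(x).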
And now our aim is to find the relation between these two residuals: whether they are linearly related or whether there is some other relationship between them. This part is a little difficult. We are in the multiple linear regression setup and we are interested in the marginal role of x_j on the response variable y in the presence of the other regressors. In matrix notation the model is y = Xβ + ε, where y is the vector (y_1, y_2, …, y_n)', X is an n × k matrix, β is the k × 1 vector (β_0, β_1, …, β_{k-1})', and ε is the vector (ε_1, ε_2, …, ε_n)'. The matrix X has a first column of ones and one column for each regressor; its i-th row is (1, x_i1, x_i2, …, x_i,k-1), so the rows correspond to the n observations and the columns correspond to the regressors (the column after the column of ones, for example, is associated with the first regressor x_1).

What I want to do is break this model into two parts. Let X_{(j)} be the matrix obtained from X by removing the column of the j-th regressor, and let β_{(j)} be the vector obtained from β by removing β_j, so it becomes (k − 1) × 1 instead of k × 1. Then I add back what I removed, so the model can be written as y = X_{(j)} β_{(j)} + x_j β_j + ε, where x_j is the removed j-th column and β_j is the removed coefficient. I hope you see this is just a rearrangement: what is removed from X and β is added back as a separate term.

We also know the hat matrix: for the full model, H = X(X'X)^{-1} X'. If I use only X_{(j)}, that is, if I remove the j-th regressor from the model, the corresponding hat matrix is H_{(j)} = X_{(j)} (X_{(j)}' X_{(j)})^{-1} X_{(j)}'. Now I left-multiply the model equation by (I − H_{(j)}). On the left-hand side I get (I − H_{(j)}) y, and on the right-hand side I get (I − H_{(j)}) X_{(j)} β_{(j)} + β_j (I − H_{(j)}) x_j + (I − H_{(j)}) ε. Recall also that in matrix notation the residual vector is e = (I − H) y.
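Restating the partitioned model and the left-multiplication compactly:

```latex
y = X_{(j)}\beta_{(j)} + x_j\beta_j + \varepsilon ,
\qquad
H_{(j)} = X_{(j)}\bigl(X_{(j)}'X_{(j)}\bigr)^{-1}X_{(j)}' ,
\\[4pt]
(I - H_{(j)})\,y
  = (I - H_{(j)})X_{(j)}\beta_{(j)}
  + \beta_j\,(I - H_{(j)})\,x_j
  + (I - H_{(j)})\,\varepsilon .
```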
Now, (I − H_{(j)}) y is the residual vector when y is regressed on all the regressors except x_j, so it is nothing but e_{(j)}^(y). The first term on the right-hand side is zero, because H_{(j)} X_{(j)} = X_{(j)}, so (I − H_{(j)}) X_{(j)} = X_{(j)} − X_{(j)} = 0. The term (I − H_{(j)}) x_j is, by our notation, the residual vector obtained when x_j is regressed on the remaining regressors, which is e_{(j)}^(x), and I will call (I − H_{(j)}) ε = ε*. So we get e_{(j)}^(y) = β_j e_{(j)}^(x) + ε*. This shows the relation between the two residuals: if the j-th regressor has a linear relationship with the response variable, then the two residuals also have a linear relationship, with the same regression coefficient. In other words, the partial residual plot should show a line with slope β_j. So we know how to compute the two residuals, and we have understood that the relationship between them is also linear: there will be a straight-line fit between these two residuals with slope β_j.

Now let me talk about the different patterns of the partial residual plot. The partial residual of y is plotted along the vertical axis and the partial residual of x_j along the horizontal axis. If the partial residuals are scattered around a line y = β_j x, you can conclude that there is a linear relationship between x_j and y, and the less the scatter, the stronger the linear relationship. This is how the marginal role of x_j on y shows up in the plot. The other pattern is a curvilinear band; this indicates that x_j is not linearly related to y, and that a higher-order term in x_j, or a transformation such as 1/x_j or log x_j, may be helpful. So this is how we find the marginal role of a regressor variable on the response variable y. I think I have to stop now. Thank you.
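As a minimal numerical check of this slope result, assuming simulated data with three regressors and taking x3 as the regressor of interest:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200

# Simulated data with three regressors (assumed for illustration); we study x3 (j = 3)
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 0.4 * x1 - 0.3 * x2 + rng.normal(size=n)
y = 1.0 + 0.5 * x1 + 2.0 * x2 - 1.5 * x3 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2, x3])   # full design matrix
X_minus_j = X[:, :3]                            # design matrix without x3

# Coefficient of x3 from the full multiple regression
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_j = beta_full[3]

# Partial residuals via the projection I - H_(j)
H_j = X_minus_j @ np.linalg.inv(X_minus_j.T @ X_minus_j) @ X_minus_j.T
e_y = (np.eye(n) - H_j) @ y     # residual of y on the other regressors
e_x = (np.eye(n) - H_j) @ x3    # residual of x3 on the other regressors

# Slope of the no-intercept fit of e_y on e_x
slope = (e_x @ e_y) / (e_x @ e_x)
print(beta_j, slope)            # the two values agree up to rounding
```

The two printed numbers coincide because the slope of the straight-line fit through the partial residuals reproduces the coefficient β_j from the full regression, which is exactly the relation derived above.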