This is my second lecture in Module 5, which is on model adequacy checking. Here is the content of this module: various types of residuals. In the previous class we talked about regular residuals, standardized residuals, and studentized residuals; today we will talk about the PRESS residuals, and after that about several types of residual plots, such as the normal probability plot and the plot of residuals against the fitted values ŷ_i. In the next class we will probably talk about the partial regression and partial residual plots.

Before I start on the PRESS residual, I want to repeat once more the objective of this module. Recall that in the simple linear regression model, and in the multiple linear regression model, we have assumed that the error term epsilon has zero mean, that it has constant variance, that the error terms are uncorrelated, and that they are normally distributed. What we are going to do in this module is present several methods to check these underlying assumptions on the error term epsilon. The methods depend primarily on the study of the residuals, because it is convenient to think of the residuals as the realized, or observed, values of epsilon. Since we are going to test assumptions on the error term, the tests are based on the residuals, and graphical analysis of the residuals is very effective for checking the underlying assumptions on epsilon.

Now I will talk about the PRESS residual, which is one kind of scaled residual. I have already talked about the regular residual and the other scaled residuals; today it is the PRESS residual, and it is a beautiful concept. The i-th PRESS residual, denoted e_(i), is defined as

e_(i) = y_i - ŷ_(i),

where y_i is the i-th observed value of the response and ŷ_(i) is the fitted value of the i-th response based on all observations except the i-th one. So this is not the regular residual: y_i is the observed response for the i-th case, while ŷ_(i) comes from a model fitted without the i-th observation.

Let me give some intuition for this special type of residual by referring to the previous example of an influential observation; suppose the influential observation is the i-th one. If I fit a model based on all the observations, including this influential observation, the fitted model may be pulled toward it, as sketched on the slide. The fitted value for the i-th observation from this full model is ŷ_i, and the difference between the observed value y_i and ŷ_i is small; that small difference is the i-th regular residual e_i.
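To make the leave-one-out idea concrete, here is a minimal Python sketch; the data, the index i, and all variable names are hypothetical and not the lecture's example. It fits the model once to all observations to get the regular residual e_i, and once without the i-th observation to get the PRESS residual e_(i).

```python
import numpy as np

# Hypothetical data: simple linear regression with one suspected influential point (the last one)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 20.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1, 30.0])
i = 5                                               # index of the suspected influential observation

X = np.column_stack([np.ones_like(x), x])           # design matrix with an intercept column

# Fit with all observations: regular residual e_i = y_i - yhat_i
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
e_i = y[i] - X[i] @ beta_full

# Fit with the i-th observation deleted: PRESS residual e_(i) = y_i - yhat_(i)
mask = np.arange(len(y)) != i
beta_loo, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
e_press_i = y[i] - X[i] @ beta_loo

print("regular residual:", e_i, "  PRESS residual:", e_press_i)
```

With an influential point like the one sketched on the slide, the printed PRESS residual is much larger in magnitude than the regular residual, because the full fit is pulled toward the point while the leave-one-out fit is not.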
Now, if this i-th observation is deleted from the data set, the fitted model looks quite different, and of course it is not influenced by the influential observation, because it is fitted to all the observations except the i-th one. The fitted value of the i-th response from this reduced model is ŷ_(i), and the revised residual, which I call e_(i), is the difference e_(i) = y_i - ŷ_(i). You can see from the sketch that the i-th PRESS residual is substantially larger than the regular residual e_i.

So what we do is: delete the i-th observation, fit the regression model to the remaining n - 1 observations, and predict the i-th response from that fit. It may therefore appear that to compute the first PRESS residual e_(1) you need to fit a model based on all observations except the first one, and since you do not know in advance which observation is influential, you would have to repeat this process n times: to get e_(2) you would fit the model to all observations except the second, and so on, fitting a model to n - 1 observations n times. However, it can be proved that you do not need to repeat the process n times; all n PRESS residuals can be calculated from the results of one single fit to all n observations. The result is

e_(i) = e_i / (1 - h_ii),

where e_i is the regular residual and h_ii is the i-th diagonal element of the hat matrix. The two quantities are exactly the same, and this is how we calculate the PRESS residuals e_(1), e_(2), ..., e_(n).

Now let me recall the example we considered in the last class. It is a multiple linear regression with two regressors x1 and x2 and response y, and we suspect that the ninth observation is an influential observation, or at least a leverage point, because its x1 value is much larger than the center of the x1 values and similarly its x2 value is much larger. The table on the slide lists, for each observation, the regular residual, the standardized residual, the studentized residual, and the PRESS residual. For the ninth observation the regular residual is 7.41, whereas the ninth PRESS residual is 14.788, so the PRESS residual is substantially larger than the regular residual. You can also observe that for the 22nd observation the regular residual is 3.6 whereas the PRESS residual is -6.05, so here too the PRESS residual is substantially larger in magnitude than the regular residual.
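Here is a minimal numpy sketch of the one-fit shortcut e_(i) = e_i / (1 - h_ii), on hypothetical simulated data rather than the 25-observation example from the last class; it also verifies the identity against an explicit leave-one-out fit for one observation.

```python
import numpy as np

def press_residuals(X, y):
    """PRESS residuals from a single least-squares fit: e_i / (1 - h_ii)."""
    H = X @ np.linalg.inv(X.T @ X) @ X.T           # hat matrix
    e = y - H @ y                                   # regular residuals
    return e / (1.0 - np.diag(H))

# Hypothetical data with two regressors and an intercept
rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=25), rng.normal(size=25)
y = 2.0 + 3.0 * x1 - 1.0 * x2 + rng.normal(scale=0.5, size=25)
X = np.column_stack([np.ones(25), x1, x2])

e_press = press_residuals(X, y)

# Check the shortcut against a brute-force leave-one-out fit for, say, observation i = 8
i = 8
mask = np.arange(25) != i
beta_loo, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
assert np.isclose(e_press[i], y[i] - X[i] @ beta_loo)
```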
What this example suggests is that if the i-th observation is an influential observation, or at least a leverage point, then there will be a substantial difference between the regular residual and the PRESS residual for that particular observation. That is all regarding the PRESS residual.

Next we move to residual plots. There are several residual plots we will talk about, and the first one is the normal probability plot. Before I discuss it, I want to mention again that the objective of this module is to check the underlying assumptions on the error term epsilon, and we are presenting different methods to check them; this is one of them. The assumption in question is that the epsilon_i follow a normal distribution with mean 0 and variance sigma^2 and are independent and identically distributed, so the normal probability plot checks whether the error terms really follow a normal distribution or not.

Here is the technique. Let e_1, e_2, ..., e_n be the n regular residuals, and let e_[1], e_[2], ..., e_[n] be the residuals ranked in increasing order. Given the data you fit the model, whether simple or multiple linear regression, obtain the residuals, and arrange them in increasing order. The normal probability plot then plots e_[i] against the cumulative probability P_i = (i - 1/2)/n, for i = 1, 2, ..., n. So it is a very simple technique: get the residuals, arrange them in increasing order, and plot e_[i] against P_i.

Now let us look at the different patterns such a plot may show. Figure (a) shows the ideal situation, where all the points lie approximately on a straight line; if your plot looks like this, you can assume that the error distribution is normal. In figure (b) the points are not on a single straight line, and this pattern indicates that the error distribution is heavy-tailed, so it is not really true that the errors are normal. Figure (c) indicates that the distribution of the error term is light-tailed; if such a pattern occurs, it is again not advisable to assume that the error distribution is normal. So the normal probability plot basically checks the normality assumption on the error terms epsilon.
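As a rough illustration of the construction just described, here is a small Python sketch using hypothetical simulated residuals. One assumption of mine: the ranked residuals are plotted against the standard normal quantile of P_i = (i - 1/2)/n rather than against P_i on a linear axis, which reproduces the "normal probability paper" scaling under which normally distributed residuals fall roughly on a straight line.

```python
import numpy as np
import matplotlib.pyplot as plt
from statistics import NormalDist

def normal_probability_plot(e):
    """Plot ranked residuals e_[i] against P_i = (i - 1/2)/n on a normal-quantile axis."""
    e_sorted = np.sort(e)                                    # e_[1] <= ... <= e_[n]
    n = len(e)
    p = (np.arange(1, n + 1) - 0.5) / n                      # cumulative probabilities P_i
    z = np.array([NormalDist().inv_cdf(pi) for pi in p])     # probability-paper scaling (assumption)
    plt.plot(e_sorted, z, "o")
    plt.xlabel("ranked residual")
    plt.ylabel("normal quantile of P_i")
    plt.show()

# Usage with hypothetical residuals: a roughly straight pattern suggests normal errors
normal_probability_plot(np.random.default_rng(1).normal(size=30))
```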
Next we talk about one more plot: the plot of the residuals against the fitted values, that is, a plot of e_i against ŷ_i. Given a set of data, you fit the model first, obtain the fitted values, compute the residuals, and then plot the residuals against the fitted values.

Now we need to understand what the different patterns of this plot indicate. If the plot shows the residuals scattered evenly about zero, you can conclude that the regression model is good: a good regression model produces residuals whose scatter is roughly constant with ŷ and centered about e = 0. In other words, if all the residuals are contained in a horizontal band centered about the line e = 0 (some residuals are positive, some are negative, as you have observed before), then there are no obvious model defects and the fit is satisfactory. This is the only pattern for which we say the fitted model is satisfactory.

Now look at the second case. Figure (b) indicates an outward-opening funnel pattern: the spread of the e_i increases with ŷ_i. This indicates non-constant variance of epsilon_i. We have assumed that the variance of epsilon is constant and equal to sigma^2, but if the plot of e_i against ŷ_i shows an outward-opening funnel, it indicates that the variance of epsilon increases as y increases, so it is not advisable to assume that the variance of epsilon is sigma^2. One thing I forgot to mention: instead of an outward-opening funnel it could also be an inward-opening funnel; that also indicates non-constant variance, with the variance of epsilon decreasing as y increases. So both cases, outward-opening and inward-opening funnels, indicate non-constant variance of epsilon.

The next pattern, again plotted about the line e = 0 against ŷ_i, is called the double bow. It also indicates non-constant variance, so again we cannot assume that the variance of epsilon is sigma^2; this pattern also violates the constant-variance assumption. The double-bow pattern often occurs when the response y is a proportion lying between 0 and 1.

The last one, figure (d), shows a non-linear pattern, which indicates that other regressor variables are needed in the model.
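Here is a short sketch of how such a residual-versus-fitted plot can be produced; the data and the fit are hypothetical, and with well-behaved simulated errors the points should stay in a horizontal band around e = 0 rather than show a funnel, double bow, or curve.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data and fit; we then plot the residuals e_i against the fitted values yhat_i
rng = np.random.default_rng(2)
x = rng.uniform(0.0, 10.0, size=50)
y = 1.0 + 2.0 * x + rng.normal(scale=1.0, size=50)          # constant-variance errors
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta
e = y - y_hat

plt.scatter(y_hat, e)
plt.axhline(0.0, linestyle="--")                             # the line e = 0
plt.xlabel("fitted value")
plt.ylabel("residual")
plt.show()
```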
Coming back to the non-linear pattern in figure (d): it indicates that the relationship between y and the regressor variable is not linear, and we need to introduce some non-linear term. The relation is not just the linear relation y = beta_0 + beta_1 x; you may need to include a square term or another higher-order term, such as x^2 or x^3, in the model, or you may need to take a transformation of the response variable y, such as log y or 1/y. We will talk about these issues again later on. The point is that a non-linear pattern indicates that the relationship between the response variable and the regressor variable is not linear and other terms are needed.

Now, there is a question you might be wondering about. We plotted the residual e_i against ŷ_i; why do we plot e_i against ŷ_i and not against y_i? The answer is that e_i and y_i are usually correlated, whereas e_i and ŷ_i are not correlated: there is no linear relationship between e_i and ŷ_i. Let me show what I mean.

First, suppose there is a linear relationship between e_i and y_i of the form e_i = beta_0 + beta_1 y_i + error, that is, a simple linear regression of the residual on the observed value y_i. By the same least squares technique as before (you can check with my first module), the least squares estimate of the slope is

beta_1 hat = S_ey / S_yy,

where S_ey = sum of (e_i - e bar)(y_i - y bar) and S_yy = sum of (y_i - y bar)^2. So I am trying to find out what type of relationship y_i and e_i have, and if it is linear, what the value of the slope is. Now S_ey = sum of e_i (y_i - y bar), which you can check without difficulty, and S_yy is nothing but SST. Moreover, y bar times the sum of the e_i is 0, because the residuals sum to zero, so

beta_1 hat = (sum of e_i y_i) / SST.

In matrix notation this is e'y / SST, or equivalently y'e / SST. Now e = (I - H) y, where H is the hat matrix, so beta_1 hat = y'(I - H) y / SST. Since H is an idempotent matrix, I - H is also idempotent, so I - H can be replaced by (I - H)(I - H), and

beta_1 hat = y'(I - H)'(I - H) y / SST = e'e / SST.
Now, e'e is nothing but SS_Res, so beta_1 hat = SS_Res / SST = 1 - SS_R / SST = 1 - R^2, where R^2, if you can recall, is the coefficient of multiple determination. So what we have proved is that there is a linear relationship between the residual and the observed value, and the slope is equal to 1 - R^2.

Next let me check whether there is a linear relationship between e_i and ŷ_i; we can prove that the slope is 0 in that case. Suppose the relation is e_i = beta_0 + beta_1 ŷ_i + error, again a simple linear regression. By least squares, beta_1 hat = S_eŷ / S_ŷŷ. I do not care about the denominator; let me work with the numerator S_eŷ, because if I can prove that it is equal to 0, then the slope is 0, which means there is no linear relationship between e_i and ŷ_i. Now S_eŷ = sum of (e_i - e bar)(ŷ_i - y bar), and you can check that this is nothing but the sum of e_i ŷ_i, which in matrix notation is e'ŷ. We know that e = (I - H) y and ŷ = H y, so e'ŷ = y'(I - H) H y = y'(H - H^2) y. Since H is an idempotent matrix, H^2 = H, so H - H^2 = 0 and e'ŷ = 0. This proves that there is no linear relationship between the residual and the fitted values.

So what I want to conclude is this. In the case of e_i and y_i we have found that beta_1 hat = 1 - R^2, so unless R^2 is equal to 1 there will be a positive slope of 1 - R^2 in the plot of e_i against y_i, even if there is nothing wrong with the model. Recall that I said that if all the residuals are contained in a horizontal band centered around e = 0, then the fitted model is satisfactory. But because of this theoretical relationship between the residual and y_i, it is very likely that the residuals will not lie in a horizontal band centered at e = 0; there will be a slope of 1 - R^2 whenever R^2 is not equal to 1. That is why it is very difficult to conclude anything if we plot the residuals against y_i instead of against ŷ_i, and since there is no linear relationship between e_i and ŷ_i, they are not correlated, and that is why we prefer plotting the residuals against ŷ_i. That is all for today. Thank you.
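As a post-lecture addendum, here is a small numpy sketch, with hypothetical data, that numerically checks the two slope results derived above: the least squares slope of e on y equals 1 - R^2, while the slope of e on ŷ is zero.

```python
import numpy as np

# Hypothetical data: multiple regression with two regressors and an intercept
rng = np.random.default_rng(3)
x1, x2 = rng.normal(size=40), rng.normal(size=40)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=40)
X = np.column_stack([np.ones(40), x1, x2])

H = X @ np.linalg.inv(X.T @ X) @ X.T                 # hat matrix
y_hat = H @ y
e = y - y_hat                                        # residuals, e = (I - H) y

sst = np.sum((y - y.mean()) ** 2)
r_squared = 1.0 - np.sum(e ** 2) / sst

# Least squares slopes of e on y and of e on y_hat
slope_e_on_y = np.sum(e * (y - y.mean())) / sst
slope_e_on_yhat = np.sum(e * (y_hat - y_hat.mean())) / np.sum((y_hat - y_hat.mean()) ** 2)

assert np.isclose(slope_e_on_y, 1.0 - r_squared)     # slope equals 1 - R^2
assert np.isclose(slope_e_on_yhat, 0.0, atol=1e-10)  # slope is zero: e and y_hat are uncorrelated
```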