This is my first lecture in the module called model adequacy checking. In this module I will first describe the problem: what is model adequacy checking? In simple linear regression, and also in multiple linear regression, we make some basic assumptions on the errors. We assume that the errors have zero mean, that they have constant variance, that they are uncorrelated, and that they are normally distributed. Let me write this formally. Recall the simple linear regression model: y = β0 + β1 x + ε, and for the ith observation, y_i = β0 + β1 x_i + ε_i, for i = 1 to n. For multiple linear regression, y_i = β0 + β1 x_i1 + ... + β_{k−1} x_{i,k−1} + ε_i. So this is a multiple linear regression model with k − 1 regressor variables.

The basic assumptions are: E(ε_i) = 0; the error terms have constant variance, Var(ε_i) = σ²; the errors are uncorrelated; and the errors are normally distributed. Altogether, I can write that the ε_i are independent and identically distributed as N(0, σ²).

In this module we will present several techniques to check whether these basic assumptions on the errors are correct or not. Gross violation of these assumptions may yield a model which is very unstable. So, given a set of data, we will learn how to check whether the data satisfy these basic assumptions, mainly by looking at several plots of the residuals.

First, what is a residual in a simple or multiple linear regression model? The ith residual is e_i = y_i − ŷ_i, where y_i is the ith observation and ŷ_i is the corresponding fitted value. These are called the regular residuals, and e_i measures the part of the variability in the response variable which is not explained by the model, because e_i is the difference between the observed response y_i and the fitted value. It is very convenient to treat e_i as the observed value of ε_i, because the assumptions we want to test are on the errors ε_i: we assume ε_i ~ N(0, σ²), independent and identically distributed. So the residuals e_i are treated as the observed values of the errors ε_i.

What do we know about the e_i? We assume that the errors ε_i are independent, but the residuals e_i are not independent: the n residuals carry only n − k degrees of freedom. I am talking about the multiple linear regression model with k − 1 regressors; there are k constraints involving the e_i.
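As a small numerical sketch (not part of the lecture; a minimal illustration using numpy with simulated data and illustrative variable names), the following fits a model with k − 1 = 2 regressors by least squares and verifies the k constraints X'e = 0 that tie the residuals together, which is why the n residuals have only n − k degrees of freedom:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 25, 3                     # n observations, k coefficients (intercept + 2 regressors)
X = np.column_stack([np.ones(n), rng.uniform(0, 30, n), rng.uniform(0, 1500, n)])
beta = np.array([2.0, 1.6, 0.01])
y = X @ beta + rng.normal(0, 3, n)            # errors i.i.d. N(0, sigma^2)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # least squares estimate
e = y - X @ beta_hat                          # regular residuals e_i = y_i - y_i_hat

# The k normal equations X'e = 0 are the k constraints on the residuals,
# so only n - k of the e_i can be chosen freely.
print(X.T @ e)                                # ~ [0, 0, 0] up to rounding error
```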
So, we cannot choose all the residuals e_i independently: we can choose n − k of them independently, and the remaining k have to be chosen in such a way that they satisfy those k constraints. As I said, since we are trying to check whether ε_i ~ N(0, σ²) i.i.d. is true, it is convenient to think of the residuals as the observed values of the errors, and plotting the residuals is an effective way to investigate how well the regression model fits the data and to check the model assumptions. We will learn about several residual plots in this module.

Before that, I want to introduce two definitions, the leverage point and the influential observation, because the two are connected. Consider a scatter plot of the observations (x_i, y_i). Suppose the point A has an unusual x coordinate compared with the rest of the observations: its x coordinate is much larger than the x coordinates of the remaining points. This point is an example of a leverage point; I will give numerical examples of a leverage point and an influential observation later. So a data point may have an unusual x coordinate, but note that this point still lies on the general trend of the observations: if you fit a line to the given data, A lies close to the fitted line.

Now I will talk about the influential observation. Again, consider a scatter plot of the observations (x_i, y_i), i = 1 to n. The point A here is called an influential observation. This point has a moderately unusual x coordinate, and its y value is also unusual. If the point is (x_A, y_A), then x_A is far from the center of the x coordinates and, similarly, y_A is far from the center of the y coordinates. So here A does not lie on the general trend of the data set. An observation of this type, which is a leverage point and at the same time off the general trend of the data, is called an influential observation, and an influential observation has a noticeable impact on the model coefficients.

Let me summarize the two definitions. A point is said to be a leverage point if it has an unusual x coordinate but may still lie on the general trend of the data. An observation is said to be an influential observation if it has an unusual x coordinate as well as a (moderately) unusual y coordinate, so it pulls the fitted model towards itself.

Next, in this lecture we will talk about several scaled residuals. I will start with the hat matrix, and then discuss the various types of residuals.
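To make the leverage/influence distinction concrete, here is a hedged sketch (my own illustration, not from the lecture): it adds to a baseline data set first a leverage point that follows the trend, then an influential point that does not, and compares the fitted coefficients.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 20)
y = 1.0 + 2.0 * x + rng.normal(0, 1, 20)

def fit_line(x, y):
    """Return (intercept, slope) of the least squares line."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.solve(X.T @ X, X.T @ y)

# Leverage point: unusual x coordinate, but on the general trend.
x_lev, y_lev = np.append(x, 25.0), np.append(y, 1.0 + 2.0 * 25.0)
# Influential observation: unusual x AND unusual y (off the trend).
x_inf, y_inf = np.append(x, 25.0), np.append(y, 10.0)

print(fit_line(x, y))          # baseline coefficients
print(fit_line(x_lev, y_lev))  # nearly unchanged: the leverage point follows the trend
print(fit_line(x_inf, y_inf))  # noticeably shifted: the influential point drags the line
```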
Let me recall the multiple linear regression model in matrix form. We write it as y = Xβ + ε, where y = (y_1, y_2, ..., y_n)' is the vector of the n responses, β = (β0, β1, ..., β_{k−1})' is the vector of coefficients, and ε = (ε_1, ε_2, ..., ε_n)'. The basic assumption we make here is that Var(ε) = σ² I_n.

We know that the least squares estimator of the regression coefficients is β̂ = (X'X)⁻¹X'y, provided X'X is non-singular. The fitted model is ŷ = Xβ̂; plugging in β̂, this is ŷ = X(X'X)⁻¹X'y = Hy, say, where H = X(X'X)⁻¹X'. This matrix is called the hat matrix, because it maps y to ŷ. Its elements are H = (h_ij), an n × n matrix with entries h_11, h_12, ..., h_1n in the first row, h_21, h_22, ..., h_2n in the second row, down to h_n1, h_n2, ..., h_nn. Since X is known, you can compute the elements of the hat matrix.

Now, several properties of the hat matrix. First, it can be verified that H is symmetric, that is, H' = H. Second, H is idempotent, that is, H² = H. Let me prove this: H² = HH = X(X'X)⁻¹X'X(X'X)⁻¹X', and the middle factor (X'X)⁻¹X'X cancels to the identity, so this equals X(X'X)⁻¹X' = H. So the two properties of the hat matrix are that it is symmetric and idempotent.

In matrix notation, the residual vector is e = y − ŷ = y − Hy = (I − H)y. Substituting y = Xβ + ε, this is e = (I − H)(Xβ + ε) = Xβ − HXβ + (I − H)ε. Now HX = X(X'X)⁻¹X'X = X, so HXβ = Xβ and the first two terms cancel, leaving e = (I − H)ε.

My aim in this lecture is to introduce several scaled residuals. Till now we know only the regular residual e_i. We will talk about the standardized residual, the studentized residual, and also the PRESS residual, which are useful for various purposes, and for that I need to find the variance-covariance matrix of the residuals.
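These hat matrix properties are easy to verify numerically. A minimal sketch (my addition, assuming a small simulated design matrix) checks symmetry, idempotency, and the identity HX = X used in the derivation of e = (I − H)ε:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])

H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix: maps y to y_hat

print(np.allclose(H, H.T))             # True: H is symmetric
print(np.allclose(H @ H, H))           # True: H is idempotent, H^2 = H
print(np.allclose(H @ X, X))           # True: HX = X, so HXb = Xb cancels in the derivation
```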
So, Var(ε_i) = σ², but although we treat e_i as the observed value of ε_i, the variance of e_i is not σ². Let me find the variance of the ith residual. What I have at this moment is e = (I − H)ε, so I can find the variance-covariance matrix of e: Var(e) = (I − H) σ²I (I − H)' = σ² (I − H)(I − H), using the symmetry of I − H. Now, since the hat matrix H is idempotent, you can check that I − H is also idempotent: (I − H)² = I − H. Hence Var(e) = σ²(I − H). This is the variance-covariance matrix of the residual vector e = (e_1, e_2, ..., e_n)'.

From here I can read off the variance of e_i: Var(e_i) = σ²(1 − h_ii), where h_ii is the ith diagonal element of the hat matrix H. Similarly, you can find the covariance between the ith and jth residuals: Cov(e_i, e_j) = −σ² h_ij (note the minus sign).

I have something more to say about the diagonal elements. What is h_ii? It is the ith diagonal element of H = X(X'X)⁻¹X'. Here X is the design matrix, with rows x_1', x_2', ..., x_n', and x_i' is associated with the ith observation. So the ith diagonal element is h_ii = x_i'(X'X)⁻¹x_i, where x_i' is the ith row of the X matrix. You can check that h_ii measures the distance of the ith observation from the center of the x coordinates, and it is not difficult to see that the h_ii lie between 0 and 1.

The message I want to give from this is: h_ii is the ith diagonal element of the hat matrix, and it measures the distance of the ith observation from the center of the x coordinates. Now recall the definition of a leverage point: a point with an unusual x coordinate. So it is quite obvious that h_ii is going to be large if the ith observation is a leverage point; from the h_ii we can get information about the leverage points. This is what I wanted to mention here.
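A short sketch of this leverage computation (my own illustration, not from the lecture; the 2k/n cutoff below is a commonly used rule of thumb, not something stated here): it computes h_ii = x_i'(X'X)⁻¹x_i for each observation and flags the one with the remote x coordinate.

```python
import numpy as np

def leverages(X):
    """Diagonal of the hat matrix: h_ii = x_i' (X'X)^{-1} x_i."""
    XtX_inv = np.linalg.inv(X.T @ X)
    return np.einsum('ij,jk,ik->i', X, XtX_inv, X)

rng = np.random.default_rng(3)
n = 20
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])
X[9, 1] = 30.0                          # give one observation an unusual x coordinate

h = leverages(X)
print(h.min() >= 0 and h.max() <= 1)    # each h_ii lies between 0 and 1
print(np.argmax(h), h[9])               # the remote point has the largest leverage
# Common rule of thumb (an assumption here): flag h_ii > 2k/n as high leverage.
print(np.where(h > 2 * X.shape[1] / n)[0])
```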
Next we move to the various types of residuals. Till now we know only one residual, e_i, the regular residual, and I will introduce three more scaled residuals in this lecture.

First, the studentized residual. We define r_i as e_i divided by its standard deviation. What we just computed is that the variance of e_i is not σ²: Var(e_i) = σ²(1 − h_ii). The standard deviation of e_i is the square root of this quantity, and since σ² is not known, we estimate it by MS_Res, the residual mean square. So the studentized residual is r_i = e_i / √(MS_Res (1 − h_ii)), that is, the ith residual divided by its estimated standard deviation. It is easy to observe that the studentized residuals have constant variance, Var(r_i) = 1, regardless of the location of x_i, whenever the form of the model is correct.

Next, the standardized residual, defined by d_i = e_i / √(MS_Res). Here, instead of using the actual variance σ²(1 − h_ii), we simply approximate the variance of the ith residual by MS_Res.

So now we know two scaled residuals, the standardized residual and the studentized residual. Both give almost similar information, but in some cases they differ. I will give one example from the book by Montgomery, where we have the values of the studentized and standardized residuals; we will compare them and see when the two are almost the same and when they differ.

This is the delivery time example from Montgomery's book. The first column is the observation number; there are 25 observations. The second column is the response variable, the delivery time y in minutes. There are two regressors: x_1, the number of cases, and x_2, the distance in feet. So this is a multiple linear regression model with two regressors and one response variable. You know how to fit a multiple linear regression model to these data; we discussed that in the previous module. Once you have the fitted model, you know the observed values and the fitted values of the response, and you can compute the residuals e_i = y_i − ŷ_i. I have the table of these regular residuals: the first column is the observation number, and the second column is e_i = y_i − ŷ_i.
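Here is a minimal sketch of how such a residual table can be computed (my addition; it uses simulated data with the same shape as the delivery time example, 25 observations and two regressors, not the actual data from Montgomery's book):

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 25, 3
X = np.column_stack([np.ones(n), rng.uniform(2, 30, n), rng.uniform(30, 1500, n)])
y = X @ np.array([2.3, 1.6, 0.01]) + rng.normal(0, 3, n)

H = X @ np.linalg.inv(X.T @ X) @ X.T
e = y - H @ y                            # regular residuals e_i = y_i - y_i_hat
ms_res = e @ e / (n - k)                 # MS_Res = SS_Res / (n - k)
h = np.diag(H)

d = e / np.sqrt(ms_res)                  # standardized residuals d_i
r = e / np.sqrt(ms_res * (1.0 - h))      # studentized residuals r_i
print(np.round(np.column_stack([e, d, r]), 2))   # table: e_i, d_i, r_i
```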
So, the second column gives the regular residuals e_i, the third column gives the standardized residuals d_i = e_i / √(MS_Res), and the fourth column gives the studentized residuals r_i.

Now let us look at these observations carefully. Referring back to the data table, note the ninth observation, which has unusual x coordinates: its x_1 value is 30, whereas the center of the x_1 values is quite small compared to 30, and its x_2 value is 1460, which is also quite large compared to the center of the x_2 values. So the ninth observation seems to be a leverage point or an influential observation. Let me check its residuals. The residual for the ninth observation is e_9 = 7.41, which is suspiciously large compared to the other residuals. The standardized residual for the ninth observation is d_9 = 2.27, and the studentized residual is r_9 = 3.21.

The observation I want to make here is that r_9 is substantially larger than d_9, whereas if you compare, say, r_8 and d_8, or r_7 and d_7, there is not much difference between the standardized and the studentized residual.

So my conclusion is this: the standardized residual and the studentized residual give almost the same information, but there will be a substantial difference between them if the associated observation is an influential observation or a leverage point. Otherwise they are almost equal. You should understand why this is so; let me give an outline. Recall that the studentized residual is r_i = e_i / √(MS_Res (1 − h_ii)) and the standardized residual is d_i = e_i / √(MS_Res). If the ith observation is an influential observation or a leverage point, then h_ii is going to be large; since h_ii lies between 0 and 1, a large h_ii means it is close to 1, so 1 − h_ii is small, the denominator is small, and hence r_i is large. In the standardized residual, on the other hand, h_ii is effectively treated as 0. So looking at the difference between the studentized residual and the standardized residual is one way to identify whether an observation is influential or a leverage point.
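A hedged sketch of this diagnostic idea (my own construction, not the lecture's data): since r_i / d_i = 1 / √(1 − h_ii) depends only on the leverage, the gap between the two scaled residuals points directly at a high-leverage, potentially influential observation. Here I plant one remote, off-trend point in position 9 (index 8), mimicking the ninth observation above:

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 25, 3
X = np.column_stack([np.ones(n), rng.uniform(2, 12, n), rng.uniform(30, 600, n)])
X[8] = [1.0, 30.0, 1460.0]            # remote x coordinates, like observation 9
y = X @ np.array([2.3, 1.6, 0.01]) + rng.normal(0, 3, n)
y[8] += 8.0                           # push it off the general trend as well

H = X @ np.linalg.inv(X.T @ X) @ X.T
e, h = y - H @ y, np.diag(H)
ms_res = e @ e / (n - k)
d = e / np.sqrt(ms_res)               # standardized residuals
r = e / np.sqrt(ms_res * (1.0 - h))   # studentized residuals

# The ratio r_i / d_i = 1 / sqrt(1 - h_ii) is near 1 for ordinary points
# and noticeably above 1 only for the high-leverage point at index 8.
print(np.round(r / d, 2))
```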
I have to stop now. Thank you.