This is the first lecture of module 6, which is on tests for influential observations, and here is the content of this module. First we will talk about the detection of a leverage point using h i i, where h i i is the i-th diagonal element of the hat matrix H. Then we will talk about the detection of influential observations using Cook's D statistic, then DFFITS, that is, the difference in fits, and DFBETAS, that is, the difference in the estimated beta values. So, the objective of this module is to present different techniques to detect influential observations. We learned about leverage points and influential observations in the previous module; let me just recall those things once more. Here is the definition of a leverage point. Suppose you have observations (x i, y i) for i equal to 1 to n, and here is the scatter plot of the observations. As you can see, the point A has an x coordinate that is unusual compared with the rest of the observations, so this point A is called a leverage point. It is not necessary that the point lie exactly on the fitted model; the essential feature is that the point follows the general trend of the observations but has an unusual x coordinate. Points of this type are called leverage points. Next we will talk about influential observations, and here is an example. Look at the point A in this second scatter plot of the given observations (x i, y i): this point has a moderately unusual x coordinate and also a moderately unusual y coordinate.
So, this point has a moderately unusual x coordinate and also does not lie on the general trend of the data; such a point is called an influential observation. What we learned is this: if a point has an unusual x coordinate but lies on the general trend of the data, the point is called a leverage point, and if a point has a moderately unusual x coordinate and also an unusual y coordinate, the point is called an influential observation. An influential observation has a significant effect on the estimated regression coefficients. So, what we will do in this module today is talk about several techniques to detect a leverage point and an influential point. First, a test for a leverage point. Let me again recall the multiple linear regression model in matrix form: y = X beta + epsilon, where beta is a vector of k regression coefficients. The least squares estimate of beta is beta hat = (X'X)^{-1} X'y, so the fitted model is y hat = X beta hat = X (X'X)^{-1} X'y, and this we can write as H y. We talked about all of this in the previous module; I am just recalling it here. So, H = X (X'X)^{-1} X' is called the hat matrix, and it is called the hat matrix because it maps y to y hat. This hat matrix plays an important role in identifying leverage points: in fact, the i-th diagonal element h i i of the hat matrix has an important role in checking whether the i-th observation is a leverage point or not.
So, let me write down what this h i i is. We have H = X (X'X)^{-1} X', and h i i is the i-th diagonal element of the hat matrix. X is an n x k matrix; write its n rows as x 1', x 2', ..., x n'. Then in terms of the i-th row x i, we can write h i i = x i' (X'X)^{-1} x i. Note that the middle factor (X'X)^{-1} is independent of i; whatever the value of i, it does not change. So, what does this measure? This h i i is a standardized measure of the distance of the i-th observation from the center of the x coordinates. Well, so what we have observed is that the i-th diagonal element h i i of the hat matrix measures the standardized distance of the i-th point from the center of the x coordinates, and we call a point a leverage point if it has an unusual x coordinate. So, obviously, if h i i is large for a particular i, that indicates the i-th point is a leverage point. The conclusion is the criterion for i to be a leverage point: a high h i i value indicates the i-th observation is a leverage point. How high is high? Let me compute the average value of h i i: h bar = (sum of h i i over i = 1 to n) / n, and the numerator is nothing but the trace of the H matrix, trace meaning the sum of the diagonal elements. So h bar = trace(H) / n, and for the hat matrix, trace(H) = rank(H) = k.
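Since the lecture works on the board, here is a minimal numerical sketch of this leverage test, assuming simulated data with an intercept and one regressor (everything in it is illustrative, not the lecture's example). It builds the hat matrix, extracts the diagonal h i i, and applies the common rule of thumb h i i > 2k/n:

```python
import numpy as np

# Illustrative sketch (simulated data, not the lecture's example):
# flag leverage points using the hat-matrix diagonal h_ii.
rng = np.random.default_rng(0)
n, k = 20, 2                       # n observations, k regression coefficients
x = rng.normal(0.0, 1.0, n)
x[0] = 8.0                         # give the first point an unusual x coordinate
X = np.column_stack([np.ones(n), x])

H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix H = X (X'X)^{-1} X'
h = np.diag(H)                         # h_ii: standardized distance from the center of x

cutoff = 2 * k / n                     # the average h_ii is k/n; rule of thumb is twice that
leverage_points = np.where(h > cutoff)[0]
print(leverage_points)                 # the first observation should be flagged
```

Note that `h.sum()` equals trace(H) = k, which is exactly the fact used above to get the average value k/n.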
So, the average value of h i i is k/n, and as a general rule, h i i > 2k/n indicates that the i-th observation is a possible leverage point. Next we will talk about techniques to detect influential observations. In the case of a leverage point, a point is a leverage point if it has an unusual x coordinate, and since h i i measures the distance of the i-th point from the center of the x coordinates, a large h i i value indicates a leverage point. But recall the definition of an influential observation: a point is an influential observation if it has a moderately unusual x coordinate as well as a moderately unusual y coordinate. So here we need to take care of both the x coordinate and the y coordinate, and Cook suggested a statistic to do this. So, next we will talk about Cook's statistic, denoted by D, for the detection of influential observations. It is a distance between two things: the Cook statistic for the i-th observation measures the distance between the fitted responses obtained using all the observations, that is, the usual fit, and the fitted responses obtained without using the i-th observation, meaning using all the observations except the i-th one. Let me write down the Cook statistic formally. The Cook statistic for the i-th observation is based on the difference between the vector of predicted responses y hat, obtained using all the observations, and the vector of predicted responses y hat (i).
Here y hat (i) is the vector of predicted responses obtained using all the observations except the i-th observation; it is obtained without that observation. Let me explain what the Cook statistic does using this example. Suppose you want to compute the Cook statistic for the observation A here; call it D A (in general, D i is the Cook statistic for the i-th observation). First you compute the Euclidean distance between y hat and y hat (A). y hat is the vector of fitted responses based on all the observations: if you consider all the observations, the model gets influenced by this observation A, and the fitted model may look like this line. Once you have this fitted model, you get y 1 hat, y 2 hat, ..., y n hat, the vector of fitted response values. Now compute y hat (A), the vector of fitted or predicted response values obtained without using observation A but using all the other observations: if I do not use this observation, the fitted model will look like this other line, and from it I get y 1 hat (A), y 2 hat (A), ..., y n hat (A). These are two vectors, and you find the Euclidean distance between them. This is what I mean when I say the Cook statistic measures the distance between the predicted responses obtained using all the observations and the predicted responses based on all the observations except the i-th one. So, this is how we get the Cook statistic for the i-th observation, and similarly you do it for all the observations, so you get D 1, D 2, ..., D n. Well.
So, this is how we get D i, the Cook statistic for the i-th observation: D i = (y hat - y hat (i))' (y hat - y hat (i)) / (k MS_Res); it does not matter which vector you write first in the difference. You need to understand that these are vectors: y hat = (y 1 hat, y 2 hat, ..., y n hat)' is the vector of fitted values obtained using all the data, and y hat (i) = (y 1 hat (i), y 2 hat (i), ..., y n hat (i))' is the vector of predicted values obtained using all the data except the i-th observation. Written out componentwise, the numerator is the sum over j of (y j hat (i) - y j hat)^2, and it is standardized by k MS_Res, which is not difficult to understand. Well, so this is the Cook statistic for the i-th observation, and it can also be treated as the (standardized) squared Euclidean distance between the vector of fitted values and the vector of fitted values when the i-th observation is deleted. And the rule to say that an observation is an influential observation is based on the value of D i: you need to calculate all of D 1 (the Cook distance for the first observation), D 2, and so on up to D n, and a value of D i much bigger than the others indicates that the i-th observation may be highly influential. This is not difficult to understand: D i is going to be large for an influential observation, because if i is not an influential observation, there is not going to be much difference between the fitted values using all the observations and the fitted values without considering the i-th observation. Next we will talk about two more statistics to detect influential observations. The next one is called DFFITS, also called the difference-in-fits statistic.
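The Cook statistic just described can be sketched numerically by brute force: refit the model without each observation in turn and compare the two vectors of fitted values. This is an illustrative sketch on simulated data (the planted outlier and all numbers are my assumptions, not the lecture's example):

```python
import numpy as np

# Illustrative sketch: Cook's D_i = ||yhat - yhat_(i)||^2 / (k * MS_Res),
# computed by refitting the model without each observation.
rng = np.random.default_rng(1)
n, k = 25, 2
x = rng.normal(0.0, 1.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, n)
x[0], y[0] = 4.0, -5.0             # plant one influential point well off the trend
X = np.column_stack([np.ones(n), x])

beta = np.linalg.lstsq(X, y, rcond=None)[0]
yhat = X @ beta                    # fitted responses using all observations
ms_res = np.sum((y - yhat) ** 2) / (n - k)

D = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    beta_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    yhat_i = X @ beta_i            # fitted responses without observation i
    D[i] = np.sum((yhat - yhat_i) ** 2) / (k * ms_res)

print(np.argmax(D))                # the planted point dominates the D_i values
```

As the lecture says, the rule is relative: you look for a D i much bigger than the others, which is exactly what the planted point produces here.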
So, this statistic investigates the deletion influence of the i-th observation on the fitted value. Cook's statistic also investigates the deletion influence of the i-th observation on the fitted values, but here the statistic is different. For the i-th observation it is defined as DFFITS i = (y i hat - y i hat (i)) / sqrt(MS_Res(i) h i i), so the difference is standardized by MS_Res(i) and h i i. Here you are only considering the difference at the i-th observation: y i hat is the fitted value of y i obtained using all the observations, and y i hat (i) is the fitted value of y i obtained without the use of the i-th observation, that is, using all the observations except the i-th one. I also need to introduce a new notation: MS_Res(i) is the residual mean square obtained without the use of the i-th observation. Generally, MS_Res estimates sigma^2, the variance of epsilon, which is unknown; MS_Res(i) is the same residual mean square, only obtained using all the observations except the i-th one. That is the difference between MS_Res and MS_Res(i). And of course, if the i-th observation is going to be an influential observation, then this quantity is large. Let me try to explain this using the example with point A. So, what is DFFITS for A?
As I said, it is y A hat, based on all the observations, minus y A hat (A), standardized as above. What is y A hat here? It comes from the fit using all the data: this height is y A hat, the point (x A, y A hat). And y A hat (A) comes from the fit using all the observations except observation A: that is the point (x A, y A hat (A)). The difference between these two is this gap, and this gap is going to be large if the point is influential; this is quite clear, because if you take any other point and compute DFFITS for it, it is not going to be so large. Well, now the testing criterion: a possible highly influential observation is indicated if |DFFITS i| is greater than 2 sqrt(k/n). I am not going into the details of how to obtain this critical value; it is enough to know that if the DFFITS statistic for some observation is greater than this quantity, then that observation can be treated as an influential observation. Next we will talk about one more statistic that also measures the influence of deleting the i-th observation from the data set. This one is called DFBETAS, the difference in betas. Instead of measuring the difference between two fitted values, it measures the difference between two estimated values of the regression coefficient beta j: what we compute here is how much the regression coefficient beta j hat changes if the i-th observation is deleted.
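The DFFITS computation can be sketched the same way, again on simulated data with a planted influential point (an illustrative assumption, not the lecture's example); the cutoff used is the standard 2 sqrt(k/n):

```python
import numpy as np

# Illustrative sketch: DFFITS_i = (y_i hat - y_i hat_(i)) / sqrt(MS_Res(i) * h_ii),
# flagged when |DFFITS_i| > 2 * sqrt(k / n).
rng = np.random.default_rng(2)
n, k = 25, 2
x = rng.normal(0.0, 1.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, n)
x[0], y[0] = 3.0, -2.0             # plant one influential point
X = np.column_stack([np.ones(n), x])

h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)      # hat-matrix diagonal
yhat = X @ np.linalg.lstsq(X, y, rcond=None)[0]    # fit using all observations

dffits = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    beta_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    fit_without_i = X[i] @ beta_i                  # y_i hat_(i)
    resid = y[keep] - X[keep] @ beta_i
    ms_res_i = np.sum(resid ** 2) / (n - 1 - k)    # MS_Res(i), without observation i
    dffits[i] = (yhat[i] - fit_without_i) / np.sqrt(ms_res_i * h[i])

flagged = np.where(np.abs(dffits) > 2 * np.sqrt(k / n))[0]
print(flagged)                     # the planted point is among the flagged indices
```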
So, instead of looking at the difference in fitted values, DFBETAS looks at the change in the regression coefficient beta j. For the i-th observation and the j-th regression coefficient, DFBETAS j,i = (beta j hat - beta j hat (i)) / sqrt(MS_Res(i) [(X'X)^{-1}]_{jj}), so it has been standardized using MS_Res(i). Let me say what each piece is: beta j hat is the least squares estimate of beta j obtained using all the observations; beta j hat (i) is the j-th regression coefficient computed without use of the i-th observation; and [(X'X)^{-1}]_{jj} is the j-th diagonal element of (X'X)^{-1}. This difference in betas, DFBETAS, is calculated for each i and each j: i runs from 1 to n, and j runs over the k regression coefficients, from 0 to k - 1. As a general rule, a possible highly influential observation is indicated if |DFBETAS j,i| is greater than 2/sqrt(n); again, I am not going into the details of how to obtain this critical point. So, these are the different techniques to detect an influential observation or a leverage point. Next we will talk about what to do once an influential observation is detected: should this influential observation be discarded or not? So, what do we do after detecting an influential observation in the given set of data? The question is: should an influential observation be discarded?
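Before turning to that question, here is a matching sketch of DFBETAS, again on simulated data with a planted point that drags the slope (illustrative assumptions, not the lecture's example); the cutoff is the 2/sqrt(n) rule from above:

```python
import numpy as np

# Illustrative sketch: DFBETAS_{j,i} =
# (beta_j hat - beta_j hat_(i)) / sqrt(MS_Res(i) * [(X'X)^{-1}]_{jj}).
rng = np.random.default_rng(3)
n, k = 25, 2
x = rng.normal(0.0, 1.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, n)
x[0], y[0] = 3.0, -2.0             # plant a point that drags the slope down
X = np.column_stack([np.ones(n), x])

XtX_inv = np.linalg.inv(X.T @ X)
beta = np.linalg.lstsq(X, y, rcond=None)[0]

dfbetas = np.empty((n, k))
for i in range(n):
    keep = np.arange(n) != i
    beta_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    resid = y[keep] - X[keep] @ beta_i
    ms_res_i = np.sum(resid ** 2) / (n - 1 - k)    # MS_Res(i), without observation i
    for j in range(k):
        dfbetas[i, j] = (beta[j] - beta_i[j]) / np.sqrt(ms_res_i * XtX_inv[j, j])

cutoff = 2 / np.sqrt(n)                            # rule of thumb: |DFBETAS| > 2/sqrt(n)
flagged_slope = np.where(np.abs(dfbetas[:, 1]) > cutoff)[0]   # influence on the slope
print(flagged_slope)               # the planted point is among the flagged indices
```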
So, here the recommendation is that we need to take care of the influential observation. First we need to check whether there is any error in measurement for that particular observation. If you see that there is an error in recording the observation, then discarding the observation is appropriate. Otherwise, if your analysis reveals that the observation is a valid observation, then there is no justification for discarding it; there is no discarding of a valid observation, and we only need to take special care of that influential observation. So, that is all about influential observations: how to detect them and what to do with an influential observation once it has been detected. Now, in the last module, because of the time constraint, I could not talk about one thing called the PRESS statistic. This is not a part of this module, but I have some time today, so I just want to talk about the PRESS statistic now and then I will stop. Note that this belongs to the previous module, on model adequacy checking. Let me just recall: the i-th PRESS residual is e (i) = y i - y i hat (i), where y i is the observed value and y i hat (i) is, as I already discussed, the fitted value of y i obtained without the use of the i-th observation. Large PRESS residuals are useful in identifying observations where the model does not fit the data well. If you recall the ninth observation in the delivery time data, e (9) was very large, something like 14.7.
So, this indicates that the fitted model does not fit the ninth observation well. Anyway, the PRESS statistic is PRESS = sum of e (i)^2 for i = 1 to n, which is nothing but the sum of (y i - y i hat (i))^2 for i = 1 to n, and as we learned in the previous module, this can also be written as the sum of (e i / (1 - h i i))^2. What this measures is how well a regression model performs in predicting new data; this is how the PRESS statistic is used. Just to recall the previous example, the delivery time data: there you can check that PRESS = sum of e (i)^2 for i = 1 to 25 = 457.4. And PRESS is the counterpart of SS_Res computed on all the data, which is the sum of e i^2 for i = 1 to 25 = 233.47. As I told you, for the ninth observation in that example e (9) was 14.7, so e (9)^2 is almost half of this PRESS value. The high PRESS value indicates that the model cannot perform very well in predicting new data, especially for data where the regressor values are large. I have the example with me: looking at the ninth observation, you can see that the x 1 and x 2 values are very large, and the e (9) value is very large. So, this indicates that the fitted model is not likely to predict new observations with large x 1 and x 2 values well. So, that is the PRESS statistic, which, as I said, is not a part of this module.
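The identity recalled above, that the i-th PRESS residual equals e i / (1 - h i i), can be checked numerically. This sketch uses simulated data (an illustrative assumption, not the delivery time data) and computes PRESS both ways:

```python
import numpy as np

# Illustrative sketch: PRESS computed by brute-force leave-one-out refits
# agrees with the closed form PRESS = sum_i (e_i / (1 - h_ii))^2.
rng = np.random.default_rng(4)
n, k = 25, 2
x = rng.normal(0.0, 1.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, n)
X = np.column_stack([np.ones(n), x])

h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)      # hat-matrix diagonal h_ii
e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]   # ordinary residuals e_i

press_closed = np.sum((e / (1.0 - h)) ** 2)        # closed form via h_ii

press_loo = 0.0                                    # brute force: refit without each i
for i in range(n):
    keep = np.arange(n) != i
    beta_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    press_loo += (y[i] - X[i] @ beta_i) ** 2       # PRESS residual e_(i), squared

print(press_closed, press_loo)                     # the two values agree
```

The closed form is the practical one: it needs only a single fit, while the brute-force loop refits the model n times.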
So, in this module we have learned how to detect an influential observation and, once the influential observation has been detected, what to do with that observation. I have to stop now. Thank you.