a simple linear regression model to the data, and once the model has been fitted we need to determine the goodness of the fit; we also need to test the statistical significance of the regression coefficients. One way to do this, as in the last lecture, is to test the hypothesis H₀: β₁ = 0 against the alternative hypothesis H₁: β₁ ≠ 0 using the test statistic t. Another way to approach this problem is the analysis of variance, so today we will basically be talking about ANOVA. The content of lecture 4 is analysis of variance (in abbreviation, ANOVA), and also we will be talking about the coefficient of determination.

Our model is y = β₀ + β₁x + ε. Given a set of data (xᵢ, yᵢ) for i = 1, …, n, we know how to fit a simple linear regression model to this data, namely ŷ = β̂₀ + β̂₁x. Once the model has been constructed, it is important to confirm the goodness of fit, that is, the statistical significance of the regression coefficients. In lecture 3 we tested H₀: β₁ = 0 against H₁: β₁ ≠ 0 using the test statistic

t = β̂₁ / √(MS_Res / S_xx),

which follows a t distribution with n − 2 degrees of freedom under H₀, and we reject H₀ at level of significance α if |t| > t_{α/2, n−2}.

Now, another approach to this problem is called the ANOVA technique, that is, analysis of variance. Given the set of data (xᵢ, yᵢ), the total variation in the response variable is Σᵢ₌₁ⁿ (yᵢ − ȳ)², where yᵢ − ȳ is the deviation of the i-th observation from the overall mean. So this is the total variation in the response variable; the question is how much of this variability is explained by the model, or the regressor variable, and how much of it is left unexplained.

Consider the identity yᵢ − ŷᵢ = (yᵢ − ȳ) − (ŷᵢ − ȳ). Here yᵢ − ŷᵢ is the deviation of the i-th observation from its predicted value, yᵢ − ȳ is the deviation of the i-th observation from the overall mean, and ŷᵢ − ȳ is the deviation of the i-th predicted value from the overall mean. This identity can be rewritten as yᵢ − ȳ = (ŷᵢ − ȳ) + (yᵢ − ŷᵢ): the deviation of the i-th observation from the overall mean equals the deviation of the i-th fitted value from the overall mean plus the residual eᵢ. So, for the i-th observation, the first term is the part of the deviation explained by the model, and the residual is the portion which remains unexplained.
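In display form — my transcription of the spoken equation into standard notation, with each term labeled as on the board — the identity reads:

```latex
% Decomposition of the i-th deviation from the overall mean (requires amsmath).
\[
\underbrace{y_i - \bar{y}}_{\text{total deviation}}
  \;=\;
\underbrace{\hat{y}_i - \bar{y}}_{\text{explained by regression}}
  \;+\;
\underbrace{y_i - \hat{y}_i}_{\text{residual } e_i}
\]
```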
So, how much of the deviation of the i-th observation from the overall mean is explained by the model, and how much of it remains unexplained? Let me draw a figure. Suppose, given a set of observations (xᵢ, yᵢ), this line is the fitted model, and my i-th observation is here, the point (xᵢ, yᵢ). The point on the fitted line at xᵢ is (xᵢ, ŷᵢ), and suppose the overall mean of the response variable is ȳ, drawn as a horizontal line. You see that the vertical distance yᵢ − ȳ is the deviation of the i-th observation from the overall mean. Part of this deviation, the distance ŷᵢ − ȳ, is explained by the regression model, and the remaining portion, yᵢ − ŷᵢ, is the unexplained part.

Now, if we square both sides of this equation and sum from 1 to n, we get

Σᵢ₌₁ⁿ (yᵢ − ȳ)² = Σᵢ₌₁ⁿ [(ŷᵢ − ȳ) + (yᵢ − ŷᵢ)]².

The left-hand side is the variation in the response variable, the variation in the data, and we want to split it into two parts: the part which is explained by the regressor variable and the part which is not. The part not explained by the regressor variable is SS_Res, and we want the model to be such that SS_Res is small, so that most of the variation in the observations is explained by the model. Expanding the right-hand side,

Σ(yᵢ − ȳ)² = Σ(ŷᵢ − ȳ)² + Σ(yᵢ − ŷᵢ)² + 2 Σ(ŷᵢ − ȳ)(yᵢ − ŷᵢ),

and I am going to prove that the cross-product term is equal to 0. You can check that ŷᵢ − ȳ = β̂₁(xᵢ − x̄), since ŷᵢ = β̂₀ + β̂₁xᵢ and β̂₀ = ȳ − β̂₁x̄; similarly, yᵢ − ŷᵢ = yᵢ − (β̂₀ + β̂₁xᵢ) = (yᵢ − ȳ) − β̂₁(xᵢ − x̄). If I now substitute these two expressions into the cross-product term, I get

Σ(ŷᵢ − ȳ)(yᵢ − ŷᵢ) = β̂₁ Σ(xᵢ − x̄)(yᵢ − ȳ) − β̂₁² Σ(xᵢ − x̄)² = β̂₁ S_xy − β̂₁² S_xx,

using the usual notation for these sums, and this is equal to 0 because β̂₁ = S_xy / S_xx. So what we proved is that the cross-product term is equal to 0, and we are left with

Σ(yᵢ − ȳ)² = Σ(ŷᵢ − ȳ)² + Σ(yᵢ − ŷᵢ)².
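As a quick numerical sanity check — not part of the lecture, just a minimal NumPy sketch with made-up illustrative data — one can verify both the decomposition and the vanishing cross-product term:

```python
import numpy as np

# Illustrative made-up data; any (x, y) sample works for this check.
x = np.array([1.0, 2.0, 4.0, 5.0, 7.0])
y = np.array([2.1, 2.9, 4.2, 5.1, 6.8])

# Least-squares fit of y = b0 + b1*x.
sxx = np.sum((x - x.mean()) ** 2)
sxy = np.sum((x - x.mean()) * (y - y.mean()))
b1 = sxy / sxx
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

ss_t = np.sum((y - y.mean()) ** 2)                # total variation
ss_reg = np.sum((y_hat - y.mean()) ** 2)          # explained by the model
ss_res = np.sum((y - y_hat) ** 2)                 # left unexplained
cross = np.sum((y_hat - y.mean()) * (y - y_hat))  # cross-product term

# SS_T == SS_Reg + SS_Res and the cross term vanishes (up to rounding).
print(np.isclose(ss_t, ss_reg + ss_res), np.isclose(cross, 0.0))
```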
The quantity on the left is denoted by SS_T, the total sum of squares; the first quantity on the right is denoted by SS_Reg, the sum of squares due to regression; and the remaining portion is called SS_Res, the residual sum of squares. So what we have is that SS_T = SS_Reg + SS_Res. This is the splitting of the total sum of squares into two parts: the total variation in y is split into the variation due to the regression and the residual sum of squares, that is, the variation which has not been explained by the regressor variable.

Now, SS_Reg = Σ(ŷᵢ − ȳ)², and we have just proved that ŷᵢ − ȳ = β̂₁(xᵢ − x̄), so squaring and summing over i gives SS_Reg = β̂₁² S_xx. Next, SS_Res = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)², and yᵢ − ŷᵢ is the i-th residual, so I can write this as Σ eᵢ². We have proved before that SS_Res/σ² follows a chi-square distribution, and the degrees of freedom is not n but n − 2, because the eᵢ's are not all independent: the residuals satisfy the constraint Σ eᵢ = 0 and the other constraint Σ eᵢxᵢ = 0 (the sketch below checks these two constraints numerically). Due to these two constraints, out of the n residuals you can choose n − 2 independently, and the remaining two have to be chosen in such a way that the constraints are satisfied. So we lose two degrees of freedom because of the two constraints, and that is why SS_Res/σ² follows χ² with n − 2 degrees of freedom; anyway, I have proved this before also.

Now, SS_T = Σᵢ₌₁ⁿ (yᵢ − ȳ)², the total sum of squares, has n − 1 degrees of freedom, because the quantities y₁ − ȳ, y₂ − ȳ, …, y_n − ȳ always satisfy Σᵢ₌₁ⁿ (yᵢ − ȳ) = 0; out of these n quantities, n − 1 can be chosen independently, and the n-th has to be chosen so that this constraint is satisfied. Also, SS_T = SS_Reg + SS_Res, and degrees of freedom have the additive property: the total degrees of freedom equal the degrees of freedom of regression plus the degrees of freedom of residual. The total is n − 1, and we know that SS_Res has n − 2 degrees of freedom, so the degrees of freedom of SS_Reg must equal 1.
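Here is that check — my illustration, not from the lecture: fit a line to simulated data and confirm that Σeᵢ = 0 and Σeᵢxᵢ = 0 hold up to floating-point error, which is exactly why only n − 2 residuals are free:

```python
import numpy as np

# Simulated data: true line 2 + 0.5*x plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=20)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=20)

b1, b0 = np.polyfit(x, y, 1)  # least-squares slope and intercept
e = y - (b0 + b1 * x)         # residuals

# The two linear constraints that cost SS_Res its two degrees of freedom:
print(np.sum(e))      # ~ 0
print(np.sum(e * x))  # ~ 0
```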
Now we will make the ANOVA table, with columns for the source of variation, the degrees of freedom, the sum of squares, and the mean square; the sources of variation are regression, residual, and total. Regression has 1 degree of freedom, the residual has n − 2, and the total has n − 1, and the corresponding sums of squares are SS_Reg, SS_Res, and SS_T. The mean square MS is obtained by dividing the SS by its degrees of freedom: here MS_Reg = SS_Reg/1, and similarly MS_Res = SS_Res/(n − 2).

Now, we have proved before that E[MS_Res] = σ², and it can be proved that E[MS_Reg] = σ² + β₁² S_xx. Also, we know that (n − 2) MS_Res/σ² follows χ² with n − 2 degrees of freedom, and it can be proved that MS_Reg/σ² follows χ² with 1 degree of freedom under H₀: β₁ = 0; moreover, these two quantities, which are functions of the random variables yᵢ, are independent. Now, a statistical theorem we have seen says: let X follow χ²_m and Y follow χ²_n, and let them be independent; then (X/m)/(Y/n) follows the F distribution with degrees of freedom m and n, denoted F_{m,n} — basically, the ratio of two chi-squares, each divided by its degrees of freedom, follows an F distribution. From this theorem we can now define, for our ANOVA table,

F = MS_Reg / MS_Res,

which follows F with degrees of freedom (1, n − 2) under H₀. So what we are going to do is compute F in the ANOVA table, and looking at the two expected values above it is intuitively clear that if F is large then it is likely that β₁ is not equal to 0, while if β₁ = 0 then this ratio is going to be close to 1. So, to test the hypothesis H₀: β₁ = 0, we compute F and reject H₀ if F > F_{α; 1, n−2}. This is another approach to test the hypothesis β₁ = 0; the same test we can do using the t distribution also, and basically the two are the same — I am going to prove that, but first let me give one example for this ANOVA table.

I am going to consider the same example, cost of advertisement (xᵢ) versus sales amount (yᵢ), with the data (1, 1), (2, 1), (3, 2), (4, 2), (5, 4). We know that the fitted model is ŷᵢ = −0.1 + 0.7xᵢ; before, we have also computed the eᵢ's, and we know that SS_Res for this data is equal to 1.1. What we still need to compute is the total variation in the data, that is, SS_T = Σ(yᵢ − ȳ)²; the sketch below reproduces all of these quantities from the raw data.
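For completeness, a minimal NumPy sketch — my own check, not the lecturer's code — that recomputes the fit and every entry needed for the ANOVA table from the raw data:

```python
import numpy as np

# The lecture's data: advertising cost x versus sales amount y.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 1.0, 2.0, 2.0, 4.0])
n = len(x)

sxx = np.sum((x - x.mean()) ** 2)                   # 10.0
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx  # 0.7
b0 = y.mean() - b1 * x.mean()                       # -0.1
y_hat = b0 + b1 * x

ss_t = np.sum((y - y.mean()) ** 2)  # 6.0
ss_reg = b1 ** 2 * sxx              # 4.9
ss_res = np.sum((y - y_hat) ** 2)   # 1.1

ms_reg = ss_reg / 1        # regression mean square
ms_res = ss_res / (n - 2)  # 0.3667
f_stat = ms_reg / ms_res   # ~13.36
print(ss_t, ss_reg, ss_res, f_stat)
```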
You can check that ȳ = 2 here, and it is not difficult to check that the total variation in the data is SS_T = 6. Also, SS_Reg = β̂₁² S_xx; we know that β̂₁ = 0.7, and one can check that S_xx = 10, so SS_Reg = 0.7² × 10 = 4.9. Here is the ANOVA table for this problem. The total degrees of freedom are n − 1 with n = 5, so 4; the residual degrees of freedom are n − 2 = 3; hence the degrees of freedom for regression equal 1. The MS values are the SS divided by the degrees of freedom, so MS_Reg = 4.9/1 = 4.9 and MS_Res = 1.1/3 ≈ 0.367, and the F value is MS_Reg/MS_Res = 4.9/0.367 ≈ 13.36:

Source       df   SS    MS      F
Regression    1   4.9   4.9     13.36
Residual      3   1.1   0.367
Total         4   6.0

You can see that for this problem the fitted model is really good, because the total variation in y is 6 and most of it has been explained by the regression: out of 6, 4.9 is the part of the variation explained by the regression model, and the portion which is not explained is only 1.1. Now, this F follows the F distribution with degrees of freedom (1, 3), and from the statistical table you find F_{0.05; 1, 3} ≈ 10.13. You can see that our computed F value, 13.36, is larger than the tabulated value, so H₀ is rejected at the level of significance α = 0.05. So basically we have got the same result as using the t test.

Now I am going to prove that these two tests are basically the same: whether you use the F test or the t test does not matter in the case of simple linear regression. Let me prove that F is nothing but t², the square of the t value. To test the hypothesis β₁ = 0 against the alternative β₁ ≠ 0, the t test uses the statistic

t = β̂₁ / √(MS_Res / S_xx).

Now compute t² = β̂₁² S_xx / MS_Res. The numerator, β̂₁² S_xx, is nothing but SS_Reg, which equals MS_Reg since it has 1 degree of freedom, and the denominator is MS_Res, so t² = MS_Reg/MS_Res = F. Accordingly, the square of a t variable with n − 2 degrees of freedom follows the F distribution with degrees of freedom (1, n − 2), and the critical values match in the same way: t²_{α/2, n−2} = F_{α; 1, n−2} (the sketch below checks this numerically). Just to verify this in our previous example: for testing this hypothesis we got F = 13.36.
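A short check with SciPy — my illustration, assuming scipy.stats is available — that the tabulated critical values satisfy this identity:

```python
from scipy import stats

alpha, n = 0.05, 5
f_crit = stats.f.ppf(1.0 - alpha, 1, n - 2)   # F_{0.05; 1, 3} ~ 10.13
t_crit = stats.t.ppf(1.0 - alpha / 2, n - 2)  # t_{0.025; 3} ~ 3.182
print(f_crit, t_crit ** 2)                    # t_crit**2 reproduces f_crit
```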
Now, if you go for the t statistic to test this hypothesis, we have β̂₁ = 0.7, MS_Res = 0.367, and S_xx = 10, so

t = 0.7 / √(0.367/10) ≈ 3.655,

and you can check that t² = 3.655² ≈ 13.36, which is equal to F. So whether you use the t statistic or the ANOVA approach, that is, the F statistic, to test this hypothesis, they are basically the same for the simple linear regression model; but once we are talking about multiple linear regression, we will need to follow the ANOVA approach only.

Next I will be talking about the coefficient of determination. It is denoted by R² and is defined as the ratio R² = SS_Reg/SS_T. You know there are several approaches to evaluate the performance of a fitted model, and this is one parameter which is used for that. This quantity is nothing but 1 − SS_Res/SS_T, because SS_T = SS_Reg + SS_Res, and obviously the range of R² is from 0 to 1: it can be at most 1, which happens when SS_Res = 0. This R² basically gives the proportion of variability that has been explained by the regression model, since R² = SS_Reg/SS_T. This is a very important parameter. R² is going to be equal to 1 if SS_Res = 0, and this will happen if the fitted model explains all the variability in y, that is, there is no part which remains unexplained by the model; for example, if the data points all lie exactly on a straight line, the fitted model passes through every point, and it is intuitively clear that SS_Res = 0, so R² = 1. Suppose in some example R² = 0.8 — maybe you have three data points lying very close to the fitted line, so intuitively R² is going to be very high. The meaning of R² = 0.8 is that approximately 80 percent of the total variability in y has been explained by the regression model, and the remaining 20 percent of the variability in y remains unexplained by the model. So the higher the R² value, the better the model is; for our advertising example, R² = 4.9/6 ≈ 0.82, and a quick numerical check follows below. That is all for today, and in the next class we will be talking more about R². Thank you very much.
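The quick numerical check of R² for the advertising example — a minimal NumPy sketch, not part of the lecture:

```python
import numpy as np

# R^2 for the advertising example: proportion of variation explained.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 1.0, 2.0, 2.0, 4.0])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

ss_res = np.sum((y - y_hat) ** 2)
ss_t = np.sum((y - y.mean()) ** 2)
r2 = 1.0 - ss_res / ss_t  # equivalently SS_Reg / SS_T = 4.9 / 6
print(r2)                 # ~0.817: about 82% of the variation explained
```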