simple linear regression. In the first lecture, we learned how to fit a simple linear regression model to a data set so that it fits the data best; that is, we learned how to fit a simple linear regression model using the least squares technique. In the second lecture, we studied the statistical properties of the regression coefficients β₀ hat and β₁ hat, and we observed that both β₀ hat and β₁ hat are unbiased estimators of β₀ and β₁ respectively. We also computed the variances of β₀ hat and β₁ hat, and we found that both variances involve σ². This σ² is the population variance, which is unknown. So what we need to do is estimate the population variance σ².

Here is the content of today's lecture. We are going to estimate the population variance σ²: we will give an unbiased estimator of σ². Next, we evaluate the performance of a fitted model: we will talk about confidence intervals and tests for the regression coefficients β₀ and β₁.

First, we talk about the estimation of σ². The estimate of σ² is obtained from the residual sum of squares, and in lecture 2 we proved that SS_res can be written in the form

SS_res = S_yy − (β₁ hat)² S_xx.

Our ultimate aim is to prove that SS_res/(n − 2) is an unbiased estimator of σ². So what we need to do is find the expected value of the residual sum of squares:

E[SS_res] = E[S_yy] − E[(β₁ hat)² S_xx].

First, let me find E[S_yy]. S_yy is nothing but Σ(y_i − ȳ)², which can be written as Σ y_i² − n ȳ², so

E[S_yy] = Σ E[y_i²] − n E[ȳ²].

Now, what is E[y_i]? Let me recall the model: y_i = β₀ + β₁ x_i + ε_i, where we assume E[ε_i] = 0. So E[y_i] = β₀ + β₁ x_i, since x_i is not a random variable. We also assume Var(ε_i) = σ², so the variance of y_i is also σ²: Var(y_i) = σ². Now, from the definition of the variance,

E[y_i²] = Var(y_i) + (E[y_i])² = σ² + (β₀ + β₁ x_i)².

Similarly, we can find E[ȳ²]:

E[ȳ²] = Var(ȳ) + (E[ȳ])² = σ²/n + (β₀ + β₁ x̄)²,

since Var(ȳ) = σ²/n and E[ȳ] = β₀ + β₁ x̄.
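As a quick numerical cross-check of these two moment identities, here is a minimal Monte Carlo sketch; it assumes numpy is available, and the values of β₀, β₁, σ and the design points x are my own illustrative choices, not from the lecture.

```python
import numpy as np

# Monte Carlo check of the identities used above:
#   E[y_i^2]  = sigma^2     + (beta0 + beta1 * x_i)^2
#   E[ybar^2] = sigma^2 / n + (beta0 + beta1 * xbar)^2
# beta0, beta1, sigma and x are illustrative assumptions.
rng = np.random.default_rng(0)
beta0, beta1, sigma = -0.1, 0.7, 1.0
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
n, reps = len(x), 200_000

# reps independent samples of (y_1, ..., y_n) from the model
y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=(reps, n))

print((y[:, 0] ** 2).mean(), sigma**2 + (beta0 + beta1 * x[0]) ** 2)
ybar = y.mean(axis=1)
print((ybar**2).mean(), sigma**2 / n + (beta0 + beta1 * x.mean()) ** 2)
```

Both printed pairs should agree to roughly two decimal places with this many replications.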
Now, basically what I will do is plug these values in:

E[S_yy] = Σ E[y_i²] − n E[ȳ²] = n σ² + Σ(β₀ + β₁ x_i)² − σ² − n(β₀ + β₁ x̄)².

A little bit of algebra will prove that this is nothing but (n − 1)σ² + β₁²(Σ x_i² − n x̄²), which is (n − 1)σ² + β₁² S_xx, where S_xx is the notation for Σ(x_i − x̄)². So what we have proved is that

E[S_yy] = (n − 1)σ² + β₁² S_xx.

Now, we want to compute the expected value of the residual sum of squares, and this involves E[S_yy] and E[(β₁ hat)² S_xx]. Next we compute the latter. Since S_xx is a constant, E[(β₁ hat)² S_xx] = S_xx E[(β₁ hat)²]. We know that E[β₁ hat] = β₁ (it is an unbiased estimator) and that Var(β₁ hat) = σ²/S_xx, so

E[(β₁ hat)²] = Var(β₁ hat) + (E[β₁ hat])² = σ²/S_xx + β₁²,

and therefore E[(β₁ hat)² S_xx] = σ² + β₁² S_xx.

Now we just need to plug these two values in:

E[SS_res] = (n − 1)σ² + β₁² S_xx − σ² − β₁² S_xx = (n − 2)σ².

So what we have proved is that E[SS_res/(n − 2)] = σ²; that means SS_res/(n − 2) is an unbiased estimator of σ². This SS_res/(n − 2) is also denoted by MS_res and is called the residual mean square. So we have ultimately found an unbiased estimator of σ², which is MS_res.

Next we will talk about the distribution of MS_res. What is SS_res? The residual sum of squares is nothing but Σ e_i², i from 1 to n, where e_i is the i-th residual, the difference between the observed response value and the predicted response value ŷ_i. It can be proved that E[e_i] = 0 (you can prove it), and it can also be proved that Var(e_i) = σ²; this e_i is nothing but the estimate of the i-th error term. Also, we have assumed that the ε_i follow N(0, σ²) and are independent, which implies that the observations y_i are also normal, with mean β₀ + β₁ x_i and variance σ². Now, see that this e_i is a linear combination of the y_i's.
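To see this unbiasedness numerically, here is a minimal simulation sketch, again assuming numpy; the parameter values are illustrative assumptions. It averages MS_res over many simulated data sets and compares the average with σ².

```python
import numpy as np

# Check that MS_res = SS_res / (n - 2) is unbiased for sigma^2.
rng = np.random.default_rng(0)
beta0, beta1, sigma = -0.1, 0.7, 1.0          # assumed true parameters
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
n, reps = len(x), 100_000

Sxx = ((x - x.mean()) ** 2).sum()
ms_res = np.empty(reps)
for r in range(reps):
    y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=n)
    b1 = ((x - x.mean()) * (y - y.mean())).sum() / Sxx   # least squares slope
    Syy = ((y - y.mean()) ** 2).sum()
    ms_res[r] = (Syy - b1**2 * Sxx) / (n - 2)            # SS_res / (n - 2)

print(ms_res.mean())   # close to sigma^2 = 1.0
```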
This ŷ_i is nothing but β₀ hat + β₁ hat x_i, and we proved that β₁ hat is a linear combination of the observations and that β₀ hat is also a linear combination of the observations. So the whole thing, this e_i, is a linear combination of the observations, which implies that e_i is normal, because a linear combination of normal variables is also normal. That is why e_i follows a normal distribution with mean 0 and variance σ², and from here we can say that e_i/σ follows the standard normal N(0, 1). And since e_i/σ is standard normal, e_i²/σ² follows χ² with 1 degree of freedom.

Now, SS_res is basically the sum of the e_i², i = 1 to n, but the distribution of SS_res/σ² is not χ²_n, because the e_i's are not all independent; these e_i's satisfy some constraints. We know that β₀ hat and β₁ hat are the least squares estimators of β₀ and β₁ respectively, and the residuals e_i = y_i − ŷ_i satisfy the constraint

e_1 + e_2 + ⋯ + e_n = 0,

which is what we proved before: the sum of the residuals is equal to 0. This is basically the first normal equation, which the residuals satisfy. The second normal equation is Σ e_i x_i = 0, that is,

e_1 x_1 + e_2 x_2 + ⋯ + e_n x_n = 0.

So what I want to establish here is that SS_res/σ² = Σ e_i²/σ² does not follow χ²_n; it follows χ² with n − 2 degrees of freedom, because there are only n − 2 degrees of freedom for the residuals. The e_i's are not all independent: you have the freedom of choosing n − 2 residuals independently, and then the remaining 2 residuals have to be chosen in such a way that they satisfy the two conditions Σ e_i = 0 and Σ e_i x_i = 0. That is why the distribution of SS_res/σ² is χ² with n − 2 degrees of freedom and not χ²_n: if there were no constraints on the e_i, it would follow χ²_n, but since we have 2 constraints on the residuals, it follows χ²_{n−2}.

So what we have proved is that SS_res/σ² follows χ²_{n−2}, which is equivalent to saying that (n − 2)MS_res/σ² follows χ²_{n−2}, because MS_res is nothing but SS_res/(n − 2). This is the result we proved, and it is very useful in testing of hypotheses. Next we move to the evaluation of the model.
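The degrees-of-freedom claim can also be checked by simulation. Here is a sketch (assuming numpy and SciPy; parameters are again illustrative) comparing the simulated SS_res/σ² with the χ²_{n−2} distribution; with n = 5 design points, the sample mean and variance should be near 3 and 6.

```python
import numpy as np
from scipy import stats

# Simulate SS_res / sigma^2 and compare with chi-square(n - 2).
rng = np.random.default_rng(1)
beta0, beta1, sigma = -0.1, 0.7, 1.0
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
n, reps = len(x), 50_000

Sxx = ((x - x.mean()) ** 2).sum()
samples = np.empty(reps)
for r in range(reps):
    y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=n)
    b1 = ((x - x.mean()) * (y - y.mean())).sum() / Sxx
    ss_res = ((y - y.mean()) ** 2).sum() - b1**2 * Sxx
    samples[r] = ss_res / sigma**2

# chi2(k) has mean k and variance 2k; here k = n - 2 = 3.
print(samples.mean(), samples.var())                       # ~3 and ~6
print(stats.kstest(samples, "chi2", args=(n - 2,)).pvalue)
```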
So, what we learned in the first lecture is that, given a set of data, a set of observations, we know how to estimate the regression coefficients; that means we have learned how to fit a regression model to the data. Once the linear model has been fitted, the next job is to confirm the goodness of the fit. So what we will do is test the significance of the regression coefficients β₀ and β₁.

First we will test the null hypothesis H₀: β₁ = 0 against the alternative hypothesis H₁: β₁ ≠ 0. What is the significance of this null hypothesis? If β₁ = 0, then the model becomes y = β₀ + ε; that means there is no linear relationship between the variables x and y. So if H₀ is accepted, we conclude that there is no linear relationship between the regressor variable and the response variable, and we say that x is of little value in explaining the variation in y: whatever the value of x, we can estimate the response y_i by ȳ. If H₀ is rejected, that means the two-sided alternative hypothesis β₁ ≠ 0 is accepted, and it says that there is a linear relationship between the regressor variable x and the response variable y, and that x is of value in explaining the variation in y. This is the significance of the two-sided alternative hypothesis.

Now, how do we perform this test? We have to compute the value of a test statistic. To test H₀: β₁ = 0 against the alternative H₁: β₁ ≠ 0, the test statistic here is based on β₁ hat, which is an estimator of β₁. We proved before that

β₁ hat = Σ(x_i − x̄) y_i / Σ(x_i − x̄)² = Σ c_i y_i,

that means β₁ hat is a linear combination of the observations y_i. And we know that y_i follows a normal distribution with mean β₀ + β₁ x_i and variance σ². So β₁ hat is a linear combination of normal variables, which implies that β₁ hat is also normal. We know the mean of β₁ hat: it is an unbiased estimator, so the mean is β₁; and we proved before that the variance of β₁ hat is σ²/S_xx. This is called the sampling distribution of β₁ hat, and we need it to find the critical value for the test.

Now, from here we can say that

z = (β₁ hat − β₁) / √(σ²/S_xx)

follows the standard normal N(0, 1). Under H₀, β₁ = 0, so the test statistic is z = β₁ hat / √(σ²/S_xx). This is the test statistic to test H₀: β₁ = 0 against H₁: β₁ ≠ 0. Now, usually the population variance σ² is not known. If σ² is known, we can use the z-test for H₀: β₁ = 0, and the critical region is: we reject H₀ if |z| > z_{α/2}.
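As a sketch of this known-σ² case, the z-test might look as follows; the helper name z_test_slope is hypothetical, and only standard numpy/SciPy calls are used.

```python
import numpy as np
from scipy import stats

def z_test_slope(x, y, sigma2, alpha=0.05):
    """Two-sided z-test of H0: beta1 = 0 when sigma^2 is known (a sketch)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    Sxx = ((x - x.mean()) ** 2).sum()
    b1 = ((x - x.mean()) * (y - y.mean())).sum() / Sxx   # least squares slope
    z = b1 / np.sqrt(sigma2 / Sxx)                       # N(0, 1) under H0
    z_crit = stats.norm.ppf(1 - alpha / 2)               # upper alpha/2 point
    return z, z_crit, abs(z) > z_crit                    # True means reject H0
```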
So, we reject the null hypothesis at level of significance α if |z| > z_{α/2}, where z_{α/2} is nothing but the upper α/2 percentage point of the standard normal distribution: under the standard normal pdf, z_{α/2} and −z_{α/2} are the two cut-off points. Intuitively, we reject the null hypothesis if β₁ hat is not close to 0; formally, we compute the test statistic value z, and if |z| > z_{α/2} we reject the null hypothesis, otherwise we accept it.

But usually σ² is not known; this is the practical case, and we cannot assume that σ² is known. We know that the unbiased estimator of σ² is MS_res, because we proved that E[MS_res] = E[SS_res/(n − 2)] = σ². The test statistic for H₀: β₁ = 0 was of the form β₁ hat / √(σ²/S_xx), but σ² is not known here; this is the more practical case. So what we do is replace σ² by its unbiased estimator MS_res, and we call the result t:

t = β₁ hat / √(MS_res/S_xx).

This is the test statistic to test the given hypothesis. Now, this does not follow a normal distribution, and we need to find its distribution. What we know is that β₁ hat is normal with mean β₁ and variance σ²/S_xx, so

(β₁ hat − β₁) / √(σ²/S_xx) follows N(0, 1),

and we also know that (n − 2)MS_res/σ², that means SS_res/σ², follows χ²_{n−2}, and it can be proved that these two are independent.

Now, there is one very standard result in sampling distributions: let X follow N(0, 1) and Y follow χ²_n, and let them be independent; then X/√(Y/n) follows the t distribution with n degrees of freedom. This is a very standard result; I am expecting that you know sampling distributions well, and now we are going to make use of this result here. Here my X is (β₁ hat − β₁)/√(σ²/S_xx), which is standard normal, and my Y is SS_res/σ², which follows χ²_{n−2}, so n − 2 plays the role of n. So X/√(Y/(n − 2)) follows t_{n−2}, and the σ² cancels out, giving

(β₁ hat − β₁) / √(MS_res/S_xx) follows t_{n−2}.

So, when we are testing the hypothesis β₁ = 0 against the alternative β₁ ≠ 0 and σ² is not known, basically what I did is replace σ² by MS_res, and we proved that this statistic follows t_{n−2}; the test statistic is t = β₁ hat / √(MS_res/S_xx).
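Putting this together, here is a sketch of the resulting t-test; the function name t_test_slope is hypothetical, and only standard numpy/SciPy calls are used.

```python
import numpy as np
from scipy import stats

def t_test_slope(x, y, alpha=0.05):
    """Two-sided t-test of H0: beta1 = 0 with sigma^2 unknown (a sketch)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    Sxx = ((x - x.mean()) ** 2).sum()
    b1 = ((x - x.mean()) * (y - y.mean())).sum() / Sxx   # beta1 hat
    b0 = y.mean() - b1 * x.mean()                        # beta0 hat
    ss_res = ((y - (b0 + b1 * x)) ** 2).sum()            # residual sum of squares
    ms_res = ss_res / (n - 2)                            # unbiased for sigma^2
    t = b1 / np.sqrt(ms_res / Sxx)                       # t(n - 2) under H0
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)        # upper alpha/2 point
    return t, t_crit, abs(t) > t_crit                    # True means reject H0
```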
So, under H₀, β₁ = 0, which gives the t in the sketch above. I am expecting that you know testing of hypotheses well; now we know the distribution of the test statistic, and this is a two-sided test. So we reject H₀: β₁ = 0 if |t| > t_{α/2, n−2}, where t_{α/2, n−2} is the upper α/2 percentage point of the t distribution with n − 2 degrees of freedom. This is the critical region: if the modulus of the t value is greater than this quantity, then we are going to reject the null hypothesis β₁ = 0. Rejecting the null hypothesis means accepting the alternative hypothesis β₁ ≠ 0, and we conclude that there is a linear relationship between the regressor variable and the response variable.

Now, let us recall the toy example: x is the cost, the money spent on advertising, and y is the sales amount. We already have a fitted relationship between x and y, and now we want to check whether the relationship is significant at the 0.05 level of significance, that is, α = 0.05, the probability of type 1 error. So let me set it up here. x, the money spent on advertisement, is the regressor variable, with values 1, 2, 3, 4, 5; y is the sales amount, with values 1, 1, 2, 2, 4. Using the least squares technique we have already estimated the regression coefficients, and the fitted model is ŷ = −0.1 + 0.7x; so β₁ hat is 0.7 and β₀ hat is −0.1. This is the fitted model.

Now, corresponding to x = 1, the predicted value of the response variable is 0.6: just put x = 1 in the fitted model, and that gives ŷ₁ = 0.6. Similarly, for x = 2 the predicted value is 1.3; for x = 3 the predicted response value is 2; for x = 4, ŷ is 2.7; for x = 5, ŷ is 3.4. These are ŷ₁, ŷ₂, ŷ₃, ŷ₄, ŷ₅.

Now let me compute the residuals e_i = y_i − ŷ_i, the observed response minus the predicted response. The values are 0.4, −0.3, 0, −0.7, 0.6. From here we can compute SS_res = Σ e_i², which comes out to be 1.1, and MS_res = SS_res/(n − 2); here n = 5, we have 5 observations, so n − 2 = 3, which gives MS_res = 1.1/3 = 0.3667.

Let me compute the test statistic value t = β₁ hat / √(MS_res/S_xx). We know β₁ hat = 0.7 and MS_res = 0.3667. What is S_xx? S_xx = Σ x_i² − n x̄² = 55 − 5 × 3² = 10; it is not difficult to compute, since you are given the x_i values. The t value then comes out to be 3.656.
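All of these numbers can be reproduced in a few lines; here is a cross-check sketch assuming numpy and SciPy are available.

```python
import numpy as np
from scipy import stats

# The toy example from the lecture.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # money spent on advertising
y = np.array([1.0, 1.0, 2.0, 2.0, 4.0])   # sales amount
n = len(x)

Sxx = ((x - x.mean()) ** 2).sum()                        # 10.0
b1 = ((x - x.mean()) * (y - y.mean())).sum() / Sxx       # 0.7
b0 = y.mean() - b1 * x.mean()                            # -0.1
e = y - (b0 + b1 * x)                                    # 0.4, -0.3, 0, -0.7, 0.6
ss_res = (e**2).sum()                                    # 1.1
ms_res = ss_res / (n - 2)                                # 0.3667
t = b1 / np.sqrt(ms_res / Sxx)                           # 3.656
t_crit = stats.t.ppf(0.975, df=n - 2)                    # 3.1824
print(t, t_crit, abs(t) > t_crit)                        # reject H0
```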
So, the observed t value is 3.656, and from the table you find the value of t_{α/2, n−2}; here α = 0.05, so α/2 = 0.025, and the degrees of freedom are n − 2 = 5 − 2 = 3. The critical value is 3.1824. Since |t| is greater than this quantity, we conclude that H₀, that is β₁ = 0, is rejected. That means we say that there is a linear relationship between the regressor variable and the response variable.

Here is the summary. We tested the hypothesis H₀: β₁ = 0 against H₁: β₁ ≠ 0 on the given observations; the level of significance is 0.05 and the degrees of freedom are 5 − 2 = 3. We computed the test statistic value, which is 3.656, and the critical value is t_{0.025, 3} = 3.1824. Since the observed value lies in the critical region, we reject the null hypothesis at level α and conclude that there is evidence of a linear relationship between the response variable and the regressor variable. We can stop here for today. Thank you.