So, this is my second tutorial class, and we will be solving some problems from simple linear regression, multiple linear regression, and also the ill-conditioning of the coefficient matrix X. Here is problem 3, because problems 1 and 2 we solved in the previous tutorial. It says that there are very few occasions where it makes sense to fit a model without an intercept β₀. If there is occasion to fit the no-intercept model Y = βX + ε to a set of data (x₁, y₁), (x₂, y₂), …, (xₙ, yₙ), then the least squares estimate of β is β̂ = Σ xᵢyᵢ / Σ xᵢ². So first we will check this part: what is the least squares estimate β̂ for a model without the intercept β₀. The second part is this: suppose you have a programmed calculator that will fit only the intercept model Y = β₀ + β₁X + ε, but you want to fit the no-intercept model. The question is, by adding one more fake data point, say (m x̄, m ȳ), where m is a function of n, and letting the calculator fit the intercept model, can you estimate β by using β̂₁? Let me explain the problem once more. We are given the data (x₁, y₁), …, (xₙ, yₙ) and we want to fit the no-intercept model Y = βX + ε, but our programmed calculator fits only the model with intercept, Y = β₀ + β₁X + ε. For that model we know the estimates: β̂₀ = ȳ − β̂₁x̄ and β̂₁ = Sxy / Sxx = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)².
So, we have a program to calculate these two quantities for a given data set, but we want to fit the no-intercept model, and the question claims that the least squares estimate for β is Σ xᵢyᵢ / Σ xᵢ². First let us check this. To find the least squares estimate of β, we minimize the function S(β) = Σ(yᵢ − ŷᵢ)², where the i-th residual is yᵢ − ŷᵢ = yᵢ − βxᵢ. Setting dS/dβ = 0 gives Σ(yᵢ − β̂xᵢ)xᵢ = 0, which implies β̂ = Σ xᵢyᵢ / Σ xᵢ². So we have proved that the least squares estimate for β is this quantity. Now the problem is to find this β̂ using the programmed calculator we have. Let me read the question once more: suppose we have a programmed calculator that will fit only the intercept model, but we want to fit the no-intercept model; by adding one more fake data point, can we estimate β by using β̂₁, that is, by using this program? At this moment we have the data (x₁, y₁), (x₂, y₂), …, (xₙ, yₙ). We add one more point, the (n+1)-th data point (m x̄, m ȳ), where m = n / ((n+1)^{1/2} − 1) = n/a, say, with a = (n+1)^{1/2} − 1, so that (a + 1)² = n + 1. What we want is to estimate β̂ = Σ xᵢyᵢ / Σ xᵢ², but we do not have a program for this model.
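The derivation above can be checked numerically. Here is a minimal sketch, assuming numpy and a small hypothetical data set, comparing the closed-form no-intercept estimate with a generic least squares solver:

```python
import numpy as np

# Hypothetical data for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Closed-form least squares estimate for the no-intercept model y = beta*x
beta_hat = np.sum(x * y) / np.sum(x ** 2)

# Cross-check with numpy's generic least squares solver
# (design matrix is the single column x, no intercept column)
beta_lstsq = np.linalg.lstsq(x.reshape(-1, 1), y, rcond=None)[0][0]

print(beta_hat, beta_lstsq)
```

The two values agree, confirming that minimizing Σ(yᵢ − βxᵢ)² gives β̂ = Σ xᵢyᵢ / Σ xᵢ².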
We have a program for the intercept model Y = β₀ + β₁X + ε, and the programmed calculator gives us β̂₁ = (Σ xᵢyᵢ − n x̄ȳ) / (Σ xᵢ² − n x̄²). Now we will apply this formula to the revised data, which I will call (u, v). For this new data set of n + 1 points, what is ū? ū = (n x̄ + m x̄)/(n + 1), where m x̄ is the new data point and m = n/a, so ū = (n x̄ + (n/a) x̄)/(n + 1) = n x̄ (a + 1) / (a (n + 1)). Since n + 1 = (a + 1)², this simplifies to ū = n x̄ / (a (a + 1)). That is ū for the data involving n + 1 points. Now let me compute S_uu. You know what S_xx is, and by definition S_uu = Σ uᵢ² − (n + 1)ū², where i runs from 1 to n + 1; the first n points are the xᵢ and the (n+1)-th point is m x̄. So S_uu = Σᵢ₌₁ⁿ xᵢ² + m²x̄² − (n + 1)ū² = Σ xᵢ² + (n²/a²)x̄² − (n + 1)ū². What is ū²? From the expression above, ū² = n²x̄² / (a²(a + 1)²), so (n + 1)ū² = (a + 1)² · n²x̄² / (a²(a + 1)²) = (n²/a²)x̄². These two terms cancel, and you are left with S_uu = Σᵢ₌₁ⁿ xᵢ². Similarly you can prove that S_uv = Σ xᵢyᵢ.
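The fake-point trick can also be verified numerically. A sketch under the same definitions (m = n / ((n+1)^{1/2} − 1)), with hypothetical data and numpy assumed:

```python
import numpy as np

# Hypothetical data for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

# No-intercept least squares estimate (the target quantity)
beta_hat = np.sum(x * y) / np.sum(x ** 2)

# Augment the data with the fake (n+1)-th point (m*xbar, m*ybar)
m = n / (np.sqrt(n + 1) - 1)
u = np.append(x, m * x.mean())
v = np.append(y, m * y.mean())

# Intercept-model slope on the augmented data: beta1_hat = S_uv / S_uu
Suu = np.sum((u - u.mean()) ** 2)
Suv = np.sum((u - u.mean()) * (v - v.mean()))
beta1_hat = Suv / Suu

print(beta_hat, beta1_hat)
```

As derived, S_uu collapses to Σ xᵢ² and S_uv to Σ xᵢyᵢ, so the intercept-model slope on the augmented data coincides with the no-intercept estimate.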
So, if you apply the programmed calculator to the new data set of n + 1 points (u, v), it will give you the estimate β̂₁ = S_uv / S_uu, and we have proved that S_uv = Σ xᵢyᵢ and S_uu = Σ xᵢ². So by adding a fake data point at the end, you can estimate the model without intercept using the programmed calculator for the model with intercept. That is what we proved just now. Now let me consider another problem; call it problem 4. It says: fit the model Y = β₀ + β₁X₁ + β₂X₂ + ε to the data given below. So here we have two regressors X₁ and X₂ and one response variable Y, and what you have to do is provide an ANOVA table (this is a quite straightforward problem) and perform the partial F-test of H₀: βᵢ = 0 against H₁: βᵢ ≠ 0 for i = 1, 2, given that the other variable is already in the model, that is, test the significance of the i-th regressor in the presence of the other regressor. The problem also asks us to comment on the relative contribution of the variables X₁ and X₂ depending on whether they enter the model first or second; I will come to this point later on. So it is a multiple linear regression problem involving two regressors, we are given the data for X₁, X₂, and Y, and we have to fit the model, that is, estimate the parameters β₀, β₁, and β₂.
So, once we are done with the estimation of the parameters, we can build the ANOVA table; after that we test the significance of the model as a whole (the global test); then we go for the partial F-tests to test the significance of each regressor in the presence of the other; and after that we compare the relative contributions of X₁ and X₂, that is, which one is more significant in explaining the variability in Y. So this is the given data for two regressors and one response variable, and we have to fit the model Y = β₀ + β₁X₁ + β₂X₂ + ε. We know how to fit this model. First we write down the X matrix: the first column corresponds to X₀, which is all ones, and then come the columns for X₁ and X₂. Once you have the X matrix, the estimator is β̂ = (X′X)⁻¹X′Y. You are given Y and you know X, so you can check that β̂₀ = 46/7, β̂₁ = 1, and β̂₂ = 2. So the fitted model is ŷ = 46/7 + x₁ + 2x₂. Now that we have the fitted model, we go for the ANOVA table, with columns for source, degrees of freedom, sum of squares, mean square, and the F value, and the sources are regression, residual, and total. What is SS_T? SS_T = Σ(yᵢ − ȳ)², and since you know the yᵢ values you can check that this is equal to 73.71. We also know the fitted values ŷᵢ as well as the original observed values.
So, from here you can compute the i-th residual eᵢ = yᵢ − ŷᵢ and then the residual sum of squares SS_Res = Σᵢ₌₁ⁿ eᵢ², and you can check that SS_Res = 1.71. We are then left with the regression sum of squares, and you can check that SS_R = 72.00. Now for the degrees of freedom: we have 7 observations, so SS_T has 6 degrees of freedom; the residual has 4 degrees of freedom, because there are 7 residuals subject to 3 restrictions from the 3 estimated parameters; and the regression has 2 degrees of freedom. Now you can compute the mean squares: MS_R = 72.00/2 = 36.00 and MS_Res = 1.71/4 ≈ 0.43. So the F statistic is 36.00/0.43 = 83.72. Here SS_T is the total variability in the response about its mean, SS_R is the part of the total variability explained by the model, and SS_Res is the part that remains unexplained. We can test the significance of the fitted model by testing H₀: β₁ = β₂ = 0, which says that the fitted model is not significant, against the alternative hypothesis that H₀ is not true, that is, that the fitted model is significant. You can already see that the model is very significant, because almost 98 percent of the total variability is explained by the fitted equation. How do we test this formally? Using the F statistic given here: the observed value is 83.72, and we compare it with the tabulated value. What are the degrees of freedom? F has (2, 4) degrees of freedom.
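The ANOVA quantities above can be assembled in a few lines. A sketch with hypothetical data (the problem's actual data table is not reproduced in this transcript, so the numbers below are for illustration only):

```python
import numpy as np

# Hypothetical (x1, x2, y) data; n = 7 observations, p = 3 parameters
x1 = np.array([2.0, 3.0, 2.5, 1.0, 4.0, 3.5, 1.5])
x2 = np.array([-2.1, -2.9, -2.4, -1.2, -3.8, -3.6, -1.4])
y = np.array([5.0, 4.1, 4.6, 6.2, 3.4, 3.3, 6.0])
n, p = len(y), 3

X = np.column_stack([np.ones(n), x1, x2])    # design matrix with intercept column
beta = np.linalg.lstsq(X, y, rcond=None)[0]  # beta_hat = (X'X)^{-1} X'y
y_hat = X @ beta

ss_t = np.sum((y - y.mean()) ** 2)           # total sum of squares, df = n - 1
ss_res = np.sum((y - y_hat) ** 2)            # residual sum of squares, df = n - p
ss_reg = ss_t - ss_res                       # regression sum of squares, df = p - 1

ms_reg = ss_reg / (p - 1)
ms_res = ss_res / (n - p)
f_global = ms_reg / ms_res                   # global F statistic, df = (p-1, n-p)
print(f_global)
```

The identity SS_T = SS_R + SS_Res holds by construction for a least squares fit with an intercept, which is what the ANOVA decomposition relies on.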
Now, you check the tabulated value F₀.₀₅,₂,₄ from the F table: it is 6.94. The observed value is greater than the tabulated value, which implies that H₀ is rejected. So the global test says that the fitted model is significant. Now, there are two regressor variables in the model. We will first test the significance of X₂ in the presence of X₁, that is, whether X₂ is significant when X₁ is already in the model, and then similarly we will test the significance of X₁ in the presence of X₂. We will do this using the partial F-test (we could also go for the t-test). So we test H₀: β₂ = 0 against H₁: β₂ ≠ 0, in the presence of X₁ in the model. How do we test this? The partial F statistic is F = [SS_R(full) − SS_R(restricted)] / r ÷ MS_Res, where r is the number of restrictions under H₀, here r = 1. The full model is Y = β₀ + β₁X₁ + β₂X₂ + ε, and the restricted model is the model under H₀, that is, Y = β₀ + β₁X₁ + ε. What is SS_R for the full model? From the ANOVA table it is 72, so we put 72 in the formula. To find SS_R for the restricted model, you have to fit a model with X₁ alone, and you can check that the fitted model is ŷ = 46/7 − (66/68)x₁. You are given the data x₁, x₂, y.
So, you can fit a model between X₁ and Y; you know how to do that, it is a simple linear regression. Once you have this fitted model, you can compute its regression sum of squares, which is 64.06; you can check this one. Now, the full model's SS_R has 2 degrees of freedom and the restricted one has 1 (I hope you know why), so the difference has 2 − 1 = 1 degree of freedom and you divide the numerator by 1. The MS_Res from the ANOVA table is 0.43. So F = (72 − 64.06)/1 ÷ 0.43 = 18.53, and this F statistic has (1, 4) degrees of freedom, 4 being the residual degrees of freedom. Now find the tabulated value F₀.₀₅,₁,₄ = 7.71, while the observed value is 18.53. So the test says that β₂ ≠ 0, that is, X₂ is significant, and H₀ is rejected at the 5 percent level of significance. So at the 5 percent level we can say that X₂ is significant in the presence of X₁ in the model. Now let us check whether it is significant at the 1 percent level of significance. Find the tabulated value F₀.₀₁,₁,₄; you can check that it is 21.20. The observed F value is less than this one, so here H₀ is accepted: at the 1 percent level of significance X₂ is not significant in the presence of X₁, whereas at the 5 percent level it is. That is the conclusion of this partial test: at least at the 5 percent level of significance we observed that β₂, or X₂, is significant in the presence of X₁. Now we will check the significance of X₁ in the presence of X₂, so we go for the partial test of H₀: β₁ = 0 against the alternative hypothesis H₁: β₁ ≠ 0.
So, here it is the same thing: the statistic for testing this hypothesis is F = [SS_R(full) − SS_R(restricted)] / 1 ÷ MS_Res. What is the restricted model here? It is Y = β₀ + β₂X₂ + ε, because the restricted model is the model under H₀. So again you have to fit a model with X₂ alone, and you can check that the fitted model is ŷ = 46/7 + (69/68)x₂. Once you have the fitted model, you can find its SS_R, which is 70.01; note that this is greater than the 64.06 we got for the model with X₁ alone. So F = (72 − 70.01)/1 ÷ 0.43 = 4.64, where 72 is SS_R for the full model and 0.43 is MS_Res. Now compare: the observed value is 4.64 and the tabulated value F₀.₀₅,₁,₄ = 7.71. The observed value is less than the tabulated value, which means H₀ is accepted. What is the meaning of this? It means that β₁ = 0 is accepted when X₂ is in the model. The implication is that if X₂ is there in the model, we do not need X₁: you can see that SS_R for the full model is 72 and for the model involving only X₂ it is 70.01, which is almost the full model. So if X₂ enters the model first, we do not need to include X₁; but if X₁ is in the model, then, as we have tested, β₂ is significant in the presence of X₁, so we include X₂. X₂ helps significantly.
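The two partial F-tests follow the same recipe, which can be sketched as a small function. This is a sketch with hypothetical simulated data, not the problem's data; `ss_reg` is a helper name introduced here for illustration:

```python
import numpy as np

# Hypothetical data for illustration (same shape as the problem: two regressors)
rng = np.random.default_rng(0)
x1 = rng.normal(size=12)
x2 = rng.normal(size=12)
y = 1.0 + 2.0 * x1 + 0.5 * x2 + rng.normal(scale=0.3, size=12)
n = len(y)

def ss_reg(X, y):
    """Regression sum of squares for a design matrix X with an intercept column."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return np.sum((X @ beta - y.mean()) ** 2)

ones = np.ones(n)
X_full = np.column_stack([ones, x1, x2])
X_red = np.column_stack([ones, x2])   # restricted model under H0: beta1 = 0

beta_full = np.linalg.lstsq(X_full, y, rcond=None)[0]
ms_res = np.sum((y - X_full @ beta_full) ** 2) / (n - 3)

# Partial F for x1 given x2: one restriction, so divide the SS_R drop by 1
f_partial = (ss_reg(X_full, y) - ss_reg(X_red, y)) / 1 / ms_res
print(f_partial)
```

Because the restricted model is nested inside the full model, the drop in SS_R is never negative, and the statistic is compared against F with (1, n − 3) degrees of freedom.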
So, this is the implication of the two partial tests, and let me also conclude that X₂ is clearly the more useful variable. Let me compute the coefficient of determination R² for each model. For the model involving X₂ alone, we observed that SS_R = 70.01, and the total variability in the response is SS_T = 73.71; that means 95 percent of the total variability is explained by X₂ alone, so X₂ explains 95 percent of the variability in Y about its mean. Now let me compute R² for X₁: with only X₁ in the model, SS_R = 64.06, and 64.06/73.71 is about 87 percent, so X₁ explains about 87 percent of the total variability in Y about the mean. And X₁ and X₂ together explain 72/73.71, that is, about 98 percent of the total variability in Y. So the conclusion is that X₂ is the more useful regressor variable than X₁, because X₂ alone can explain 95 percent of the total variability and X₂ is also significant in the presence of X₁, whereas X₁ explains about 87 percent of the total variability and is not significant in the presence of X₂. But one more thing: this problem is particularly interesting because of one more fact. If you look at the data, you can see that X₁ and X₂ are not independent; X₁ can almost be written in terms of X₂. In fact, you can check that X₁ + X₂ is almost equal to 0. So these two regressor variables are nearly linearly dependent.
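The R² comparisons above are just ratios of the sums of squares quoted in the worked example; a quick check of the arithmetic:

```python
# R^2 values from the sums of squares quoted in the worked example
ss_t = 73.71
r2_x2_only = 70.01 / ss_t   # model with x2 alone
r2_x1_only = 64.06 / ss_t   # model with x1 alone
r2_full = 72.00 / ss_t      # full model with x1 and x2

print(round(r2_x2_only, 2), round(r2_x1_only, 2), round(r2_full, 2))
```

This confirms that adding X₁ on top of X₂ raises R² only from about 0.95 to about 0.98, which is exactly why the partial F-test declares X₁ unnecessary once X₂ is in the model.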
So, this indicates that there could be multicollinearity; in fact, there is multicollinearity in this data, and that is why the test says that you do not need X₁ if X₂ is present in the model. It is enough to keep only X₂ in the model; both X₁ and X₂ are not required. Next we will consider another problem; let me call it problem 5. It asks: can we use the data below to get a unique fit to the model Y = β₀ + β₁X₁ + β₂X₂ + β₃X₃ + ε? So, can we fit this model uniquely using this data, that is the question. Look at the data carefully. The model has 4 parameters, and it is a multiple linear regression model with 3 regressors, so you can write it in the form Y = Xβ + ε, and we know how to estimate the regression coefficients uniquely: β̂ = (X′X)⁻¹X′Y. Then what is the problem here? Why does the question ask whether we can get a unique fit? Look at the X matrix: I have included the X₀ column here (all ones, corresponding to β₀), and then the columns X₁, X₂, and X₃. Can I simply compute (X′X)⁻¹ and get the estimate, or is there a problem? We need to check whether all these columns are linearly independent or not. Here I can see that X₁ + X₂ + X₃ = 0. That means the columns of this matrix are not independent, which implies that X′X is singular: the determinant of X′X is 0, so you cannot compute the inverse of the X′X matrix.
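The singularity argument is easy to demonstrate. A sketch with a hypothetical design matrix in which X₁ + X₂ + X₃ = 0 in every row, as in problem 5:

```python
import numpy as np

# Hypothetical regressors; x3 is forced to satisfy x1 + x2 + x3 = 0
x1 = np.array([1.0, 2.0, -1.0, 3.0, 0.5])
x2 = np.array([2.0, -1.0, 3.0, -2.0, 1.5])
x3 = -(x1 + x2)                            # enforce the linear dependence
X = np.column_stack([np.ones(5), x1, x2, x3])

XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))          # rank < 4: X'X is singular
print(np.linalg.det(XtX))                  # determinant is (numerically) zero
```

Since the rank of X′X is less than the number of parameters, (X′X)⁻¹ does not exist and the normal equations have infinitely many solutions rather than a unique β̂.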
So, that is the problem: you cannot compute β̂ uniquely here, and the ultimate answer to the question is no, we cannot use the data below to get a unique fit to this model. This problem is related to the ill-conditioning of the X matrix. So, today we considered three problems. The first (problem 3) was from the simple linear regression model. The second (problem 4) was a very standard problem in multiple linear regression, and it was very interesting: we had two regressors, and we observed that X₂ is significant in the presence of X₁, whereas X₁ is not significant in the presence of X₂, so X₂ is the more useful regressor variable than X₁ for the given data; finally, we observed that the two regressors are not independent, so there exists multicollinearity, which is why one variable is enough to explain the variability in the response. And the final problem (problem 5) was about the ill-conditioning of the coefficient matrix. In the next tutorial we will discuss some more problems. Thank you.