This is my third tutorial, and today we will be solving some randomly selected problems. Let me start with Problem 6 (I have solved five problems before this one). In the general linear regression setup with the intercept term β₀ in the model, show that the correlation between the residual vector e and the observed response y is √(1 − R²). This is a theoretical problem, but it has a practical implication that we will come to later. Here e is the residual vector, y is the observed response, and R² is the coefficient of multiple determination, R² = SS_Reg / SS_Total, which measures the proportion of the variability in y that is explained by the model.

So let me solve part (a), the correlation between e and y:

Corr(e, y) = Σ(eᵢ − ē)(yᵢ − ȳ) / √[ Σ(eᵢ − ē)² · Σ(yᵢ − ȳ)² ].

We have to prove that this equals √(1 − R²), which is not so easy. Start with the numerator, Σᵢ(eᵢ − ē)(yᵢ − ȳ), with i running from 1 to n. Can I write this as Σᵢ eᵢ(yᵢ − ȳ)? Yes, because ē = 0 whenever β₀ is in the model: if β₀ is in the model, then differentiating the least-squares function S with respect to β₀ gives Σ eᵢ = 0, and from there ē = 0.
So the numerator is Σᵢ eᵢ(yᵢ − ȳ), and since Σ eᵢ = 0 this reduces further to Σᵢ eᵢyᵢ. In vector notation this is e′y (or y′e). Now, can I write e′y as e′e? It is not trivial, so we need to check it. Let me start with e′e and prove that e′e = e′y. Before that, recall the multiple linear regression model y = Xβ + ε, where the parameter β is estimated by β̂ = (X′X)⁻¹X′y, and the fitted model is ŷ = Xβ̂. Plugging in the value of β̂,

ŷ = X(X′X)⁻¹X′y = Hy,

where H = X(X′X)⁻¹X′. This H is called the hat matrix, because it transforms y into ŷ. Now what is e? It is the observed value minus the fitted value, and since we just showed ŷ = Hy,

e = y − ŷ = y − Hy = (I − H)y,

where I is the identity matrix. Now what I want to prove is that e′e = e′y, and I have a formula for e in terms of the hat matrix. So let me start:

e′e = y′(I − H)′(I − H)y,

and it is known that the hat matrix is idempotent.
That means H² = H, and then I − H is also idempotent (and symmetric), so (I − H)′(I − H) can be replaced by just I − H. Therefore

e′e = y′(I − H)y = y′e,

since (I − H)y = e. So we have proved that e′e = y′e = e′y. Hence the numerator is e′e = Σ eᵢ², which is nothing but SS_Res. Let me write the correlation down once more:

Corr(e, y) = Σ(eᵢ − ē)(yᵢ − ȳ) / √[ Σ(eᵢ − ē)² · Σ(yᵢ − ȳ)² ].

We proved that the numerator equals e′e; since ē = 0, the first factor in the denominator is Σ eᵢ² = e′e as well, and the second factor is SS_Total. So

Corr(e, y) = e′e / √(e′e · SS_Total) = √(e′e / SS_Total) = √(SS_Res / SS_Total).

We are almost done: SS_Res = SS_Total − SS_Reg, so this is √(1 − SS_Reg/SS_Total) = √(1 − R²). So we have proved that the correlation between e and y is √(1 − R²), which solves part (a). And what is the implication of this result? The implication is that it is a mistake to attempt to detect a defective regression by plotting the residuals eᵢ against the observations yᵢ, as this plot always shows a slope.
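This identity is easy to verify numerically. Below is a minimal sketch in Python with NumPy; the data are made up purely for illustration (any dataset fitted with an intercept term will do). The sample correlation between the residuals and the observed response matches √(1 − R²):

```python
import numpy as np

# Made-up data for illustration; any regression with an intercept works.
rng = np.random.default_rng(0)
n = 50
x = rng.uniform(0, 10, n)
y = 2.0 + 1.5 * x + rng.normal(0.0, 2.0, n)

# Design matrix with an intercept column (the beta_0 term).
X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta_hat                      # residual vector

ss_total = np.sum((y - y.mean()) ** 2)
ss_res = np.sum(e ** 2)
r_sq = 1 - ss_res / ss_total              # coefficient of determination

corr_e_y = np.corrcoef(e, y)[0, 1]
print(corr_e_y, np.sqrt(1 - r_sq))        # the two values agree
```

Note that the residual mean is (numerically) zero here precisely because the intercept column is in the design matrix, which is the fact the proof leaned on.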
If you can recall, once we have a fitted model ŷ = Xβ̂, what we do is compute the residuals, and in the topic called model adequacy checking we talked about several residual plots, which serve to check whether the model assumptions are correct and also to check the goodness of the fit. In those residual plots we plot the residual eᵢ against the fitted value ŷᵢ, not against yᵢ. The reason is exactly the correlation we just proved: e and y always have correlation √(1 − R²), so there is always a theoretical slope in a plot of eᵢ against yᵢ, and that is why we do not plot eᵢ against yᵢ.

Now, the second part says that this slope is 1 − R². The meaning is that if you fit a linear relationship between e and y, the slope in that fit is going to be 1 − R². So let me fit a straight-line relation between e and y, say e = a + by. What I have to prove is that b = 1 − R²; this is part (b). We know the form of the least-squares slope: when we fit y = β₀ + β₁x, we get β̂₁ = S_xy/S_xx. Following the same pattern here,

b = Σ(eᵢ − ē)(yᵢ − ȳ) / Σ(yᵢ − ȳ)².

Since ē = 0, the numerator is e′y, which we just proved equals e′e, i.e., SS_Res; the denominator is SS_Total. So

b = SS_Res / SS_Total = (SS_Total − SS_Reg) / SS_Total.
So b = 1 − SS_Reg/SS_Total = 1 − R², and we have proved that the slope is 1 − R². The third part of the question asks us to show further that the correlation between e and ŷ is equal to 0. This is why, in the residual plot, we plot e against ŷ: their correlation is 0. Maybe I proved this during the lectures as well, but let us do it anyway. For part (c), start with the covariance numerator

Σ(eᵢ − ē)(ŷᵢ − ŷ̄).

Since ē = 0 and Σ eᵢ = 0, this reduces to Σ eᵢŷᵢ = e′ŷ. We know that ŷ = Hy and e = (I − H)y; we proved both just now. Plugging these in,

e′ŷ = y′(I − H)′Hy = y′(H − H²)y.

Now H is idempotent, so H² = H, and therefore e′ŷ = 0. So the covariance, which is the numerator of the correlation expression, is 0, and hence the correlation between e and ŷ is 0. We are done with the first problem.

Let me prove one more theoretical problem; call it Problem 7. This problem says: prove that the coefficient of determination is equal to the square of the correlation between y and ŷ.
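Parts (b) and (c) can be checked the same way. This sketch (again with made-up data) regresses e on y to recover the slope 1 − R², and confirms that e and ŷ are uncorrelated:

```python
import numpy as np

# Made-up data for illustration; any regression with an intercept works.
rng = np.random.default_rng(1)
n = 50
x = rng.uniform(0, 10, n)
y = 2.0 + 1.5 * x + rng.normal(0.0, 2.0, n)

X = np.column_stack([np.ones(n), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T       # hat matrix H = X (X'X)^-1 X'
y_hat = H @ y                              # fitted values
e = y - y_hat                              # e = (I - H) y

ss_total = np.sum((y - y.mean()) ** 2)
r_sq = 1 - np.sum(e ** 2) / ss_total

# (b) least-squares slope of e regressed on y
b = np.sum((e - e.mean()) * (y - y.mean())) / ss_total
print(b, 1 - r_sq)                         # equal: the slope is 1 - R^2

# (c) correlation between e and y_hat
print(np.corrcoef(e, y_hat)[0, 1])         # ~ 0 up to rounding
```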
So what we have to prove here is that [Corr(y, ŷ)]² = R², where R² = SS_Reg/SS_Total is again the coefficient of determination. It is nice to prove this one. Let me write down this correlation; I will use the notation r_{y,ŷ}:

r_{y,ŷ} = Σ(yᵢ − ȳ)(ŷᵢ − ŷ̄) / √[ Σ(yᵢ − ȳ)² · Σ(ŷᵢ − ŷ̄)² ].

I hope you see that ȳ = ŷ̄: one is the mean of the observed values and the other is the mean of the fitted values, and since Σ eᵢ = 0 for a model with intercept and eᵢ = yᵢ − ŷᵢ, we get Σ yᵢ = Σ ŷᵢ, and hence ȳ = ŷ̄. Now write yᵢ = ŷᵢ + eᵢ in the numerator. Replacing ŷ̄ by ȳ throughout, this gives

Σ(yᵢ − ȳ)(ŷᵢ − ȳ) = Σ(ŷᵢ − ȳ)² + Σ(ŷᵢ − ȳ)eᵢ,

and the denominator is √[ Σ(yᵢ − ȳ)² · Σ(ŷᵢ − ȳ)² ]. Now, what about the extra term Σ(ŷᵢ − ȳ)eᵢ; is it 0? Yes: this is the covariance between e and ŷ, and in the previous problem we proved that the correlation between e and ŷ is 0, which means their covariance is 0. So this term is equal to 0.
So what we are left with is

r_{y,ŷ} = Σ(ŷᵢ − ȳ)² / √[ Σ(yᵢ − ȳ)² · Σ(ŷᵢ − ȳ)² ] = √[ Σ(ŷᵢ − ȳ)² / Σ(yᵢ − ȳ)² ],

because the squared numerator cancels against one factor in the denominator. The numerator inside the root is SS_Reg and the denominator is SS_Total, so

r²_{y,ŷ} = SS_Reg / SS_Total = R².

This is what we wanted to prove.

Next we will consider a practical problem involving polynomial fitting using orthogonal polynomials. In this problem we will explain how to decide the order of the polynomial. The problem is this: a newborn baby was weighed weekly, and 20 such weights are shown below. In the first week the baby weighed 141 ounces, and similarly the data are given for all 20 weeks. If you draw a scatter plot of the data, I am sure you are going to see a nonlinear pattern. The task is: fit to the data, using orthogonal polynomials, a polynomial model of a degree justified by the accuracy of the figures. So the degree is not given; you have to decide the degree of the polynomial you are going to fit. We will follow the procedure for deciding the degree that we discussed while talking about polynomial fitting with orthogonal polynomials: you start with the linear model, then fit a polynomial of order 2 and test the significance of β₂, the coefficient of x².
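As an aside, the identity of Problem 7 is also easy to check numerically; a minimal sketch with made-up data:

```python
import numpy as np

# Made-up data for illustration; assumes a model with an intercept.
rng = np.random.default_rng(2)
n = 40
x = rng.uniform(0, 5, n)
y = 1.0 + 0.8 * x + rng.normal(0.0, 1.0, n)

X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ beta_hat

ss_total = np.sum((y - y.mean()) ** 2)
ss_res = np.sum((y - y_hat) ** 2)
r_sq = 1 - ss_res / ss_total          # coefficient of determination

corr = np.corrcoef(y, y_hat)[0, 1]
print(r_sq, corr ** 2)                # identical up to rounding
```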
If β₂ is significant, you go for a third-order polynomial and then test the significance of β₃; if β₂ is not significant, the first-order model is enough. Likewise, if β₃ is significant you go on to a fourth-degree polynomial, and if it is not you stop at the second-degree polynomial, and so on. So this problem gives an idea of how to decide the order of the polynomial.

Let me recall briefly what the polynomial model is. We wish to fit

y = β₀ + β₁x + β₂x² + … + β_k x^k + ε,

and what we learnt is that, instead of fitting this model directly, there are several advantages to fitting

y = α₀ + α₁p₁(x) + α₂p₂(x) + … + α_k p_k(x) + ε,

where p_j(x) is the j-th order orthogonal polynomial and the p_j are mutually orthogonal. We know how to fit this model (please refer to my lecture on polynomial fitting to know more about these orthogonal polynomials):

α̂₀ = ȳ,   α̂_j = Σᵢ p_j(xᵢ) yᵢ / Σᵢ p_j(xᵢ)².

Using these formulas, with the yᵢ given and the xᵢ equally spaced (you can just take them as 1, 2, …, 20), you can compute α̂₀ and α̂_j for j = 1, …, k.
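The construction of the p_j and the formula for α̂_j can be sketched in code. The y values below are only a placeholder straight line (the actual weights are in the problem's table and are not reproduced here); p₁ and p₂ are built by Gram-Schmidt from 1, x, x²:

```python
import numpy as np

# x = 1..20 as in the problem; y is a placeholder straight line, so
# alpha_2_hat should come out as (numerically) zero.
x = np.arange(1, 21, dtype=float)
y = 140.0 + 2.0 * x

p1 = x - x.mean()                            # orthogonal to the constant term
x2 = x ** 2
# Gram-Schmidt: remove from x^2 its components along 1 and p_1
p2 = x2 - x2.mean() - (np.dot(x2, p1) / np.dot(p1, p1)) * p1

assert abs(np.dot(p1, p2)) < 1e-8            # p_1 and p_2 are orthogonal

alpha0_hat = y.mean()                        # alpha_0_hat = y_bar
alpha1_hat = np.dot(p1, y) / np.dot(p1, p1)  # sum p_1(x_i) y_i / sum p_1(x_i)^2
alpha2_hat = np.dot(p2, y) / np.dot(p2, p2)
print(alpha0_hat, alpha1_hat, alpha2_hat)    # 161.0, 2.0, ~0
```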
Now, let me also write down the regression sums of squares. The contribution of the first-order term in the polynomial is

SS_Reg(α₁) = α̂₁ Σᵢ yᵢ p₁(xᵢ),

and similarly SS_Reg(α₂) = α̂₂ Σᵢ yᵢ p₂(xᵢ) and SS_Reg(α₃) = α̂₃ Σᵢ yᵢ p₃(xᵢ), with i running from 1 to n. Why am I talking about all these? Because, given the problem, you have the (xᵢ, yᵢ) values; here the response y is the weight and x is the week. First compute SS_Total = Σ(yᵢ − ȳ)², the total variability in y about its mean. You can check that here it is 26,018, and I want my model to explain this variability. Now, SS_Reg(α₁) will tell me how much of the total variability in y is explained by the linear term; SS_Reg(α₂) will tell me how much is explained by the quadratic term, that is by α₂; and SS_Reg(α₃) is the amount of variability explained by the cubic term. So you see how much of the variability is explained by α₁, how much by α₂, and how much by α₃.
If you see that α₁ and α₂ together have already explained the major part of the variability in y, then you can stop at the quadratic fit; but if a significant part has still not been explained by the quadratic fit, you can go for the cubic term. I am not going into the details of the computations, but you can check that SS_Reg(α₁) is very large, 25,438.75; SS_Reg(α₂) = 489.00; and SS_Reg(α₃) = 1.15. Let me repeat that SS_Total = Σ(yᵢ − ȳ)² = 26,018. So the linear term alone has already explained the major part of the total variability, and α₂ also looks substantial at 489. Now what you can do is make an ANOVA table; let me do that:

Source              df    SS
Regression (α₁)      1    25,438.75
Regression (α₂)      1    489.00
Regression (α₃)      1    1.15
Residual            16    89.30
Total               19    26,018

Each regression term has 1 degree of freedom; the total has 19 degrees of freedom because there are 20 observations, and the residual therefore has 16. You can check that the residual sum of squares is just 89.30.
Now you can compute the mean squares: MS_Res = 89.30/16 = 5.58, while the MS for each regression term equals its SS, since each has 1 degree of freedom: 25,438.75, 489.00, and 1.15. Then the F values: for α₁, 25,438.75/5.58 = 4,558.92; for α₂, 489.00/5.58 = 87.63; and for α₃, 1.15/5.58 = 0.21. Each of these F statistics has (1, 16) degrees of freedom, and the tabulated value is F(0.05; 1, 16) = 4.49. You can see that the first two observed F values are much larger than the tabulated value, so α₁ and α₂ are significant; but 0.21 is smaller than 4.49, so α₃ is not significant. That means you can go for a quadratic fit. The final fit in terms of x is ŷ = 136.227 + 2.68x + 0.167x², which you can check.

But what I want to emphasize is this: SS_Total = 26,018 is my total variability, and I want my model to explain it. The linear term explains a huge part of it, the quadratic term explains 489 more, the cubic term is not significant, and the remaining unexplained part is only 89.30. So I can go for a quadratic fit. However, suppose the cubic term is not significant but the residual is still reasonably high; then you can check α₄, because α₃ might not be significant while α₄ is. In that case you would go for a model with the linear term, the quadratic term, and the fourth-degree term, but not the third-degree term.
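The whole degree-selection procedure can be sketched in code. The weights from the problem's table are not reproduced here, so the y values below are synthetic (a quadratic trend plus noise), and I use QR orthonormalization of the Vandermonde matrix as a convenient stand-in for the tabulated orthogonal polynomials; the logic of the F tests is the same:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.arange(1, 21, dtype=float)
# Synthetic response: quadratic trend plus noise (illustration only).
y = 136.0 + 2.7 * x - 0.17 * x**2 + rng.normal(0.0, 2.0, 20)

# Orthonormalize the columns of the Vandermonde matrix [1, x, x^2, x^3];
# column j of Q is then proportional to the orthogonal polynomial p_j.
V = np.vander(x, 4, increasing=True)
Q, _ = np.linalg.qr(V)

# With orthonormal columns, the SS contributed by degree j is (q_j' y)^2.
coef = Q.T @ y
ss = coef[1:] ** 2                    # SS due to alpha_1, alpha_2, alpha_3
ss_total = np.sum((y - y.mean()) ** 2)
ss_res = ss_total - ss.sum()
ms_res = ss_res / (20 - 4)            # residual df = 20 - 4 = 16

f_stats = ss / ms_res                 # each term carries 1 df
for j, f in enumerate(f_stats, start=1):
    # F(0.05; 1, 16) = 4.49, the tabulated value used above
    print(f"degree {j}: F = {f:.2f} -> "
          f"{'significant' if f > 4.49 else 'not significant'}")
```

With data whose true trend is quadratic, the linear and quadratic F statistics come out large while the cubic one is usually below 4.49, mirroring the table above.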
So it depends on how small the residual is: if the residual is reasonably small, you do not need to go for a higher-order polynomial; but even if the cubic term is not significant, it may happen that the fourth-degree term is significant, and you should go for the fourth-degree polynomial only if the residual part is still large. Otherwise just forget about it and go for the quadratic fit. So that is the problem we discussed from polynomial regression.

Next let me talk about one more problem; call it Problem 8. This problem says: if you are asked to fit a straight line to the given data, what would you say about it? It is somewhat open-ended what exactly the question is looking for. You are given five data points, and here is the scatter plot for the given data, with x on the horizontal axis and y on the vertical axis. Do you have anything to say? You can see that this particular point, (6, 4.5), is not in the usual trend of the data. At some point I talked about influential observations and leverage points; this point (6, 4.5) seems to be an influential point because it does not follow the general trend of the other data. If you fit a model to the remaining data, ignoring this point, you will get a fitted line with negative slope; but if you include this point, your fitted line will look quite different. That is the problem with an influential observation.
So without this point the fitted model has a negative slope, and with this point the fitted model has a positive slope. This clearly says that the point (6, 4.5) is an influential observation. The recommendation here is that if you can identify influential observations in the given data and they are few in number, you may ignore them and fit a model to the remaining data. What could also have been better here: you have data up to x = 3 and then the next point is at x = 6, so some observations between x = 3 and x = 6 would be useful. You can comment in that way as well. So we solved some problems in this tutorial, and in the next tutorial we will again solve some randomly selected problems. We need to stop now; thank you.
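The slope flip is easy to demonstrate. Since the other four data points are not listed in the problem as reproduced here, the values below are hypothetical, chosen only to match the described behaviour: a mild downward trend up to x = 3, plus the point (6, 4.5) from the problem:

```python
import numpy as np

# Hypothetical data: four points with a mild downward trend, plus the
# point (6, 4.5) from the problem, which sits far from that trend.
x = np.array([1.0, 1.5, 2.0, 3.0, 6.0])
y = np.array([3.0, 2.8, 2.6, 2.3, 4.5])

def ls_slope(x, y):
    """Least-squares slope S_xy / S_xx."""
    return np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

print(ls_slope(x[:-1], y[:-1]))  # without (6, 4.5): negative slope
print(ls_slope(x, y))            # with (6, 4.5): positive slope
```

One far-out point with high leverage is enough to reverse the sign of the fitted slope, which is exactly why such points deserve separate scrutiny before fitting.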