Okay. Welcome to your revision session. Today we're going to revise study unit 10 and study unit 11 using assignment 5 and exam paper questions. When we talk about study unit 10, we're talking about chi-square, and because chi-square is a hypothesis test, you need to know the six steps. Step number one, you need to know how to state your null hypothesis and your alternative hypothesis. In the null hypothesis we say the categories are independent, and your alternative will state that they are dependent, because chi-square tests the relationship between two categorical variables. With step number two, you state what you are given in relation to your n and your alpha, the level of significance. Step number three, you need to find the critical value, and for chi-square we find it using the degrees of freedom and your level of significance. And your degrees of freedom is n minus 1. Step number four, you need to be able to calculate your chi-square statistic, which is the sum of your observed value minus your expected value, squared, divided by your expected value. The sum applies to the whole fraction, not only to the top. So it means you need to be able to calculate your expected value, and your expected value is the row total times the column total divided by n. Once you have calculated your test statistic, step number five, you can make your decision, and in making your decision you're going to use your critical value. With chi-square, we only have one region of rejection, in the upper tail, and it starts at your chi-square critical value, which you would have found on the table using your degrees of freedom and alpha. When the test statistic falls in this area, we reject the null hypothesis.
And then step number six, you make your conclusion. So you need to know all these steps in order to do the chi-square test. With the chi-square test, what you also need to remember is, if you use a table... Yes? Degrees of freedom, don't we use rows minus one times columns minus one? Thank you very much, you are right. You are correct. So for the degrees of freedom, we use number of rows minus one times number of columns minus one. Thank you for that reminder. Oh, my mind is not waking up properly today. Okay, so when you are given column one and column two and row one and row two, if you're given a contingency table without the totals, you need to quickly calculate the totals, because you will require the totals to calculate your expected values. And that, in a nutshell, is your chi-square test. You must also remember to use your table. So you go to where it says critical values of chi-square, that's table five. And on the table, remember we don't use those values at the top. We only use the values closer to the table, the upper-tail area or level of significance values. Okay, now let's answer the questions. Don't forget, we also have the templates. I'm not sure, but in your module you don't need an invigilator or IRIS, so you can use the templates if you want when you answer certain questions. You remember the template already? You look for the template that matches the type of question they gave you. So if it's a two by four contingency table, meaning two rows and four columns, then you can use this one. It will give you your expected values, and you can just answer your questions from there. The other thing you need to know is the test statistic, which is the chi-square stat. We just look at which table you can use for that, and then you just use it. Okay, so let's look at the questions. Remember, the questions combine theory and calculations. You need to know both.
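The steps described so far can be sketched in a few lines of Python. This is only an illustration, not the course template: the function names are made up for this sketch, the counts are the 2 by 3 table that is read out later in this session for the distance-from-school question, and the critical value 4.605 is the standard table value for 2 degrees of freedom at alpha = 0.10.

```python
# Minimal sketch of a chi-square test of independence (standard library only).
# Function names are hypothetical; the counts come from the 2x3 table
# read out later in the session.

def expected_frequencies(observed):
    """Expected count per cell: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_statistic(observed):
    """Sum over all cells of (observed - expected)^2 / expected."""
    expected = expected_frequencies(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

def degrees_of_freedom(observed):
    """(number of rows - 1) * (number of columns - 1)."""
    return (len(observed) - 1) * (len(observed[0]) - 1)

observed = [[115, 210, 315], [75, 120, 165]]
stat = chi_square_statistic(observed)
df = degrees_of_freedom(observed)
critical_value = 4.605  # chi-square table: df = 2, alpha = 0.10
reject_h0 = stat > critical_value  # one region of rejection, upper tail

print(round(stat, 2), df, reject_h0)
```

With these counts the statistic comes out at about 1.56 with 2 degrees of freedom, which is below the critical value, so the sketch does not reject the null hypothesis.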
Which one of the following statements is incorrect about the chi-square test of independence between two variables? A, the test statistic has r minus one times c minus one degrees of freedom, where r is the number of rows and c is the number of columns. Remember, we're looking for the incorrect statement. B, the observed frequency for each cell is equal to the row total times the column total divided by n, where n is the sample size. C, the two variables are qualitative. D, the null hypothesis is that the two variables are independent of each other. E, if the observed and the expected frequency for each cell are equal, then the test statistic will be zero. Which one of these statements is incorrect? A, B, C, D, or E? It's B. B is the incorrect statement. A is correct because, as we just did, the degrees of freedom is number of rows minus one times number of columns minus one. C, as we also said, the two variables have to be categorical variables, and if we don't say categorical, we can also refer to them as qualitative variables. D, we also stated that the null hypothesis always says independent: the two variables are independent of each other. And E says if the observed and the expected frequencies are equal, the test statistic will be equal to zero. That is true, because observed minus expected gives you zero at the top, and zero divided by the expected value is zero for every cell, so the sum will be equal to zero. So B is the incorrect statement, because row total times column total divided by n gives the expected frequencies, not the observed ones.
In a contingency table with nine rows and two columns, how many degrees of freedom are there for a chi-square test of independence? Number of rows minus one times number of columns minus one. It's eight. How did we get eight? Rows, nine minus one; columns, two minus one. Nine minus one is eight, times one, which is equal to eight. So the answer is C.
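The two facts used in these answers can be checked with a small Python sketch. The helper names and the cell values in the second check are hypothetical, chosen only to show that the statistic is zero when every observed count equals its expected count.

```python
# Hypothetical helpers for checking the two theory points above.

def df_from_shape(rows, cols):
    """Degrees of freedom for an r x c contingency table."""
    return (rows - 1) * (cols - 1)

def chi_square_from_cells(observed, expected):
    """Chi-square statistic from flat lists of observed and expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

df_9_by_2 = df_from_shape(9, 2)

# Made-up cells where observed equals expected in every cell:
cells = [12.0, 18.0, 28.0, 42.0]
zero_stat = chi_square_from_cells(cells, cells)

print(df_9_by_2, zero_stat)
```

Each numerator is zero, and zero divided by a positive expected count is zero, so the whole sum is zero.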
Now, consider a contingency table with nine rows and two columns. So we already did that. If we need to find the degrees of freedom, we know that it is eight. In a chi-square test of independence, if the level of significance is one percent, what would be the critical value? So alpha is 0.01. It means you need to go to the table and look for degrees of freedom of eight and alpha of 0.01, and where they meet, that value will be your critical value. Remember, degrees of freedom of eight and alpha of 0.01. Do you have an answer? 20.090. And that is B, B is the answer.
Given a two by three contingency table containing observed frequencies, with expected frequencies in brackets, from a sample of 369, calculate the test statistic required to test the independence of the rows and columns. So what they want you to do is calculate the chi-square stat, which is the sum of your observed minus your expected, squared, divided by your expected. If you're going to calculate this manually, you will say 55 minus 42.54, squared, divided by 42.54, and you do the same for all of the cells, adding them up, until you get to 100 minus 93.43, squared, divided by 93.43. So if you want to do it manually, you can do that. Otherwise you can go to the template. We're doing a two by three contingency table, so it means we're going to use the second one. I'll just make it a bit bigger so that I can see the table. We only capture the values that are not in brackets. So it's 55. I'm not going to change my titles to A, B, C. It's 40 and 79, and this one is 34, and so on for the remaining cells. I can just double-check whether my expected values are the same as what they have. 45.5, 54, 48.8, 45.89, and 85.7. So they match exactly what they have, and I'm going to assume that everything was captured correctly.
So the answer we're looking for is a test statistic. We should be seeing 9.40. Let's see if we can find that. This is our test statistic, 9.40. If we want to show it to two decimals like they have, you can use the decimals button to get the same. So the answer is option D. Are you winning? Are you happy? Are you pashasha? Are you grand?
Consider the following contingency table between the distance from school and the school level. What is the expected frequency of a high school learner traveling between three kilometers and six kilometers from home? So the expected frequency for three to six kilometers and high school is given by row total times column total divided by n. Now, you don't have the totals here, so you need to calculate the totals first. You add across to get each row total, you add down to get each column total, and then either you add the row totals or you add the column totals to find your n, which is your grand total. Then, for the cell that has 120 in it, we use its row total times its column total divided by the grand total. Since it works almost the same way, we're going to use the same template that we had previously, the two by three, to capture the values. So I have 115, 210, 315 and 75, 120 and 165. There are 1000 of them. So what are the totals? I know that when I explained, I used the wrong total. The row total is 360 and the column total is 330. So it's the row total of 360 times the column total of 330, divided by 1000, and that should give you the answer. Do you get the same answer? The answer is 118.8. So let's see.
So because they rounded off to a whole number, what did we get? Since it's a whole number, we can also round off to a whole number, an integer, which is 119. So the answer is 119, and that will be option E.
So now they say the test statistic for the test of independence from question five has been calculated as 1.56. It means they calculated the chi-square stat and the answer they got was 1.56. If the level of significance alpha is 10%, so alpha is 0.10, which one of the following statements about the decision is correct? It means we need to go and find the degrees of freedom, which is number of rows minus one times number of columns minus one. How many rows do we have? One, two rows, minus one. How many columns do we have? One, two, three columns, minus one. Two minus one is one. Three minus one is two. So one times two is equal to two. Now we go and find the critical value so that we can make a decision. We need to find the value that creates the region of rejection. We know the degrees of freedom is two. We know alpha is 0.10. So 0.10 and two. What is our critical value? 4.605. Our critical value is 4.605. Now take the test statistic. Where does it fall? If it falls here, in the region of rejection, we reject the null hypothesis. If it falls here, we do not reject the null hypothesis. Where does 1.564 fall? Do not reject. It falls in the do-not-reject area. Therefore it means we do not have enough evidence to reject the null hypothesis. So now let's look at the options. Number A says the null hypothesis is that the distance from school and school level are not independent. What are we looking for? The correct statement. Which one of the statements about the decision and the hypothesis is correct? So number A says not independent. Number B, the test statistic is less than the critical value. We therefore reject the null hypothesis.
Number C, the test statistic is greater than the critical value. We therefore do not reject the null hypothesis. Number D, the test statistic is less than the critical value. Therefore we do not reject the null hypothesis. Number E, the test statistic is greater than the critical value. Therefore we reject the null hypothesis. A, B, C, D or E? D is the only correct answer, because D says the test statistic is less than the critical value, and our critical value is 4.605 and our test statistic is 1.56, and then it says we do not reject the null hypothesis. B said less, but we reject, so that part is not correct. A says the null hypothesis is not independent, but the null hypothesis is always independent. Number C says the test statistic is greater, therefore it's wrong. And number E says the test statistic is greater, therefore it's wrong. The only correct one is less and we do not reject. Correct. D is the answer.
Now, oh, we're done. We're done with chi-square, so now we move on to regression. Let's recap on regression. Where chi-square tests the relationship between two categorical variables, regression tests the relationship between two numerical variables, or what we call quantitative variables. So with regression, you will be given the values of your independent variable X and your dependent variable Y. You can also plot that relationship on a scatter plot. And you can check the relationship by calculating what we call the coefficient of correlation, which is R, and which can be explained in terms of the strength and also the direction. The direction will be either positive or negative, and the strength can be strong, weak, moderate, or perfect. So if I have this relationship, R can be equal to 80%.
Or we can say R is equal to 0.8, which says there is a strong positive relationship, a strong positive correlation, between the two variables X and Y. And that is the correlation coefficient. From the correlation coefficient, we can also find how much of the total variation in the values of Y is explained by the values of X, and we call that the coefficient of determination. The coefficient of determination talks about the proportion of the variation in Y that is attributed to the variation in X. So for example, if my R is 0.8, then my coefficient of determination, which is my R squared, will be 0.8 times 0.8. Let's get there. 0.8 squared is 0.64. And when we interpret that, we can say 64% of the variation in Y is attributed to the variation in X. What you also need to know is that your value of R is between negative one and one, and you should be able to explain it. A value that is equal to minus one or one is a perfect relationship. A value from 0.99 down to, let's say, 0.60 or 0.50 we can call a strong relationship. Anything between 0.50 and 0.40 you can call moderate. And anything less than that, we can say it's a weak relationship. And we know that when R is equal to zero, we say there is no relationship. My pen doesn't want to write. When R is equal to zero, we say there is no relationship. That is in relation to R. In relation to R squared, the value of R squared lies between zero and one. Okay, what else you need to know is that there is also a regression line that you can draw, which gives you the relationship between the values of X and Y. This regression line is Y equals b0 plus b1 times X, where b0 is your intercept and b1 is your slope. The relationship can be described by that equation.
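As a quick check of the arithmetic in this recap, here is a small Python sketch. The function names are invented for the illustration, and it only covers the direction of the relationship and the R squared calculation, not the strength labels.

```python
# Hypothetical helpers illustrating r and r squared.

def coefficient_of_determination(r):
    """R squared is the square of r; r itself must lie between -1 and 1."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    return r ** 2

def direction(r):
    """Direction of the linear relationship, read from the sign of r."""
    if r > 0:
        return "positive"
    if r < 0:
        return "negative"
    return "no linear relationship"

r = 0.8
r_squared = coefficient_of_determination(r)
print(f"{direction(r)}: about {r_squared:.0%} of the variation in Y is explained by X")
```

For r = 0.8 this reports a positive relationship with about 64% of the variation in Y explained by X.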
The intercept, b0, gives you the average estimated value of Y where X is equal to zero. So when the value of X is equal to zero, the value that we are estimating will just be equal to b0. The slope, b1, tells you the change in the values of Y as a result of a change in the values of X, and the sign in front also tells you the direction of that relationship. There will be an increase when it's a plus and a decrease when it's a negative, when you interpret it. And X is the value that you use to estimate a new value. If we have a new value that we want to estimate, we just substitute it in the place of X and calculate. What else do you need to know about this? You should know how to calculate the R value, the slope and your Y intercept, and be able to form the equation of the straight line, the regression line, by using the formulas. I only remember one formula by heart, but I can write all of them anyway. So b1, which is the slope: you find it by taking n multiplied by the sum of X times Y, minus the sum of X times the sum of Y, divided by... I think this is where the challenge is with these formulas. I'm not going to memorize all the formulas by heart. We can just check the formula sheet. So for the slope, what I was trying to do there was multiply this value by n and remove the divide-by-n over there, because it means one and the same thing anyway. At the bottom we then have n times the sum of X squared minus the sum of X, all squared. Ignore that line I drew. That line doesn't exist.
On the formula sheet, the sum of X, all squared, is divided by n, and instead of dividing by n there, we can just multiply the other terms by n, which means one and the same thing. So that is the formula to find the slope of the straight line. Then we also need the mean of X and the mean of Y, and to calculate b0, the Y intercept, we use the mean of Y minus the slope times the mean of X. Then we just substitute the intercept and the slope into the formula to get the regression line. What you also need to remember is that you can use this template, but you need to know how to substitute the values. The template calculates the sum of X and the sum of Y, the sum of X squared and the sum of Y squared, in case they ask you questions relating to those. So you should be able to calculate manually or by using the templates. To calculate the coefficient of correlation, the formula is almost similar to the slope. The top is the same: n times the sum of X times Y, minus the sum of X times the sum of Y. That is divided by the square root of n times the sum of X squared minus the sum of X all squared, multiplied by n times the sum of Y squared minus the sum of Y all squared. That will give you R, and R squared is just the square of your R value. And if they give you the sums of squares and ask you to use them, you need to know that R squared can also be calculated from the sums of squares: it is the regression sum of squares, SSR, divided by the total sum of squares, SST. You should be able to use those formulas as well. So if they give you the sums of squares and ask you to calculate your total variation or your R squared, which is the coefficient of determination or the common variance, then you can just use them.
Okay, so let's look at the questions. Which one of the following statements is incorrect about some of the concepts of linear regression?
A, a correlation coefficient of 0.1 indicates a weak positive linear relationship between two variables. B, the least squares method estimates the regression equation by minimizing the error sum of squares, SSE. Okay. C, the correlation coefficient always takes on values between zero and one. D, the coefficient of determination can be interpreted as the percentage of the total sum of squares that can be explained using the estimated regression equation. E, the slope and the intercept of the estimated regression equation are estimated using the least squares method. Which one of these statements is incorrect? A, B, C, D, or E? C. Yes, C is incorrect, because the coefficient of correlation takes values between minus one and one. A, 0.1 represents a weak positive relationship, because it's positive and it's only 0.1. B, the least squares method estimates the regression equation by minimizing the error sum of squares. I know we didn't touch on this. When we write the equation of the regression line, we leave out the fact that we have errors. Those errors are what we always want to minimize, because if your errors are big, then your regression line might not be good enough to estimate your values. So we always try to make sure that these errors come as close to zero as possible. That is why we do not include them in the equation of the regression line. When we do regression, we always try to minimize those residual errors. So that makes statement B correct. D talks about the coefficient of determination, and we know that it is your SSR divided by your SS total, meaning the part of the total variation that is explained by the regression equation.
And E, the slope and the intercept are estimated using the least squares method as well. So C is the only incorrect statement, which makes it the correct answer to this question.
Part eight, nothing. Oh, there it is. Okay, consider the sample data below and develop a scatter plot. What we didn't do in the template is create the scatter plot, but anyway. Which one of the scatter plots A to F best describes the data? Okay. So now we need to look at this data. Our data says 23 and 25, which makes life difficult now. 23 and 25. So I'm just going to assume that this one is the one we're talking about. 32 and 31. 28. And it's going to be very difficult to do it this way. Miss Liz, I have a suggestion. If you go through the graphs, some of them end at 40-something, and when you look at your data, you don't have 40s. Your max is 32. So you can start by looking at that. That's a good idea. So let's look at the X values. Our highest value on the X is 32 and our lowest is 22. Here it shows an X of 40, and we know that we don't have an X of 40, so that is not correct. Here we also have an X of 40, so that won't be correct either. This one goes to 50, and this one goes to 35. Did we have an X of 35? So probably only one is left if we do it that way. Miss Liz, your answer is A. The answer is scatter plot A. Yes. So the answer will be scatter plot A, because if you look at where the last point is on all of these graphs, following the suggestion, we don't have an X of 50, and this one ends at 35.
And we don't have anything with 35 as part of the X values. Okay. You can also do the same with the Y values, because on some graphs the Y values also go above and beyond. The highest Y value is 31, so the graph should not pass 31. There shouldn't be any value above 31, and the X value that corresponds to 31 is 32. On this one, we can see that the point at 31 is not the highest value, so that makes it incorrect. This one ends at 60. This one ends at 25. We know that the highest value is 31. So the only graph left is plot A. Now let's see the options, A, B, C, D, E, F. It says the correct one is option E. Oh, wait. Okay. So option A is plot D, option B is plot F. Okay. Yeah. Option E is plot A. So you also need to be very careful there, because the option letters don't match the plot letters. These examiners are agents. So the answer is E, for scatter plot A.
Okay. You are tasked with investigating the relationship between Y and X of primary and high school learners using simple linear regression. Which one of the following statements about the investigation is incorrect? They say the reading speed is Y and the age is X. We know that X is your independent variable and Y is your dependent variable. So age is the independent and reading speed is the dependent. Which one of the following statements will be incorrect? A says the dependent variable is age. B, the estimated regression equation will take the form reading speed equals b1 times age plus b0, where b1 is your slope and b0 is your intercept. What are we estimating? The dependent, the Y value, so reading speed is the value that we are estimating. C says reading speed is a quantitative discrete variable.
D, if the correlation coefficient is negative, then there is a negative linear relationship between reading speed and age. So which of these statements is incorrect? A, B, C, or D? A is incorrect, because they say age is X, and X should be the independent variable. B, the estimated value is b0 plus b1 times X, and we know that X is the value of age that we use to estimate. C, we know that the reading speed, which is the Y, should be a quantitative value. Whether it's discrete or continuous does not really matter much, because we know that we are looking at quantitative values. And D, if the correlation is negative, what do we say? We can interpret it to say there is a negative linear relationship between the reading speed and age, where R is negative. So A is the only incorrect answer.
Next, the age in years and the reading speed in words per minute of learners at Kiba Middle School in Hama Baydy are given below, together with the quantities numbered one to five. Which one of the quantities one to five is incorrect? So they want you to check whether those values are correct or incorrect. Now we can bring in the template as well. Let me see, on the template we didn't have enough rows for this, but it's fine. Let's quickly include them. How many values are there? Let's count: one, two, three, four, five, six, seven, eight, nine, 10. There are 10. The template has six, so I'm just going to add rows until I have 10. Okay, let's capture the values. 178, 155, 207, 208, 180, 224, 96. Okay. Now we can go through the quantities one by one and see which one is incorrect. The first one says the sum of X is 113. That's correct, because that is 113. Number two says the sum of X minus the mean, squared. Oh, we don't have that. Don't worry. We can calculate that quickly.
So I'm just going to insert a column there, and I know that I'm breaking some of the things on the template here. What am I breaking? I'm not going to break a lot, and we can always fix it afterwards. So what do we need to do here? Sorry, Lizzie. The mean of the X, isn't it in that block on the right next to this table? It gives you the Y and the X. No, I only have it here. We're not looking for the mean itself. I need to calculate X minus the mean, and that's quick. I do know that I have the mean here. It's 11.3. So I'll come and reference this mean. I'll show you just now. So we want to calculate X minus the mean, and I'm just going to call the column that. We take this X value minus the mean, and mind you, the mean doesn't change, so I'm going to use the dollar sign on the cell where the mean is. Sorry, because of my dollar sign I need to click back on there. I'm not going to use the dollar sign now. I'll show you how to use the dollar sign later on. So 10 minus the mean, and then I press enter. If I go here, it's the X value minus the mean. But because for every row my mean needs to refer to this same cell, there must be a dollar sign before the column letter and a dollar sign before the row number as well, and then enter. But this is only X minus the mean, and it needs to be squared. So I still need to put an open bracket in front, close the bracket, use the power symbol to raise the value to two, and then press enter. If you don't want to do that, you can take everything that is included in the bracket and multiply it by itself. You will still get the same answer. So if I drag this to the bottom, it should do the same for every row. Let me take it back to the power. Let's use the power of two. And then the other thing they said is that this is the sum. So I just need to do the sum.
If you go to your home tab, there is the summation button. You just press the summation and press enter and it will do the sum, or you can drag across from the previous total and it will do the same. So the sum of x minus the mean, squared, is 28.10 to two decimals, which is correct. So number two is also correct. Number three, gosh. It says the sum of x minus the mean of x, times y minus the mean of y, should give you 511. We can do the same with an inserted column. Can we maybe do the manual calculations as well for the previous one? Yes, you can do the manual calculations. You take all the x values, add them together, and find the total, and you have the total there. The first step will be to calculate the mean. The mean of x is the sum of x divided by n, so you take the total of x and divide by how many there are. It's 113 divided by 10, which will give you 11.3. Then, to get this value, you will have to say 10 minus 11.3, squared, plus 15 minus 11.3, squared, plus, and so on for all of them until you get to 10 minus 11.3, squared. And that will give you the answer that you're looking for there. It's the same way with the y; you need to go and calculate it for this one. Because if you have the answer to this one, remember, this answer is not that answer. They are two different answers. So you'll have to go back and do the same thing again manually, just without using the square, and then do the same with the y. Calculate the sum of y divided by n and then calculate the differences. So I need to get the sum of y. And this column heading should actually say that this is the square. I must put this into a bracket and put the square here. This is the square of that.
So if I want to do only x minus the mean, that will be the same formula that we have here without the square. I take what is inside the bracket and put the equal sign in front. So this should be x minus the mean of x, and my mean, let's go back there, is 11.3. So that is without squaring the values. Then I also need to insert a column for y. I must do y minus the mean. So it's equal to the value of y minus the mean of y, which is in that column, and I put the dollar sign in front and the dollar sign in the middle. That should give me y minus the mean of y, and I can just do it for all of them. And the total will be equal to, very surprising. Let's just add them together and see. That's zero. And that is zero because of what we're doing with each value of y. What is the mean of y? I didn't even check that. Yes, it's 204. So we're saying 178 minus 204 is minus 26, 274 minus 204 is 70. When we add all of them together, the answer is zero. The differences from the mean always add up to zero. So for these two columns, it says zero and zero. Now this is where it gets complicated. This one says the sum of x minus the mean of x, times y minus the mean of y. Oh, gosh. Let's do that. That zero is not the sum that we're looking for. We can do it here with an inserted column again. We just need to take this column, multiply it by that column, and take the sum of all the values. Let's go to the equation. It says x minus the mean of x, times y minus the mean of y. So manually we do 10 minus the mean of x, which is 11.3, times 178 minus the mean of y, which is 204, plus. We go to the next one. 15 minus 11.3, times 274 minus 204, plus. And then you do it for all the values. That is number three, and we find that it is 511. So what we did here is x minus the mean of x, times y minus the mean of y.
And that is what we did with those two columns there. So far, that is also correct. Now it also says b0 is 6.91, so let's go and check b0. We can come to our template and look for b0. On the template b0 is minus 1.491. What do they have? b0 is 6.91. So that is incorrect. So we know that this one is correct, this one is correct, this one is correct, and that one is incorrect. And they say b1 is 18.19. On the template b1 is 18.185, which rounds up to 18.19 because of the 5, so that is correct. So 1, 2, 3, and 5 are correct, and only number 4 is incorrect. So let's go and answer the question. Only 4 is incorrect.

When you calculate manually, it's going to take you even longer, because you see all this calculation is not straightforward. You have to take every value of x, subtract the mean of x, and square the answer, then do the next one and square the answer, until you get to the end, and then add them all together. That will give you that option. For this one, you do the same thing without the square. Take the value minus the mean. So it is 10 minus the mean of x, which is 11.3, times this value, which is 178, minus the mean of y. Did we find 20.4 or 204? It's 204. So 178 minus 204. Then you do the same for the next one: 15 minus 11.3 times 274 minus 204, plus 9 minus 11.3 times 155 minus 204, and so on. When you add all of those together you will get 511. And that's what I did on this spreadsheet here. Okay, which also changed my spreadsheet a little bit.

Anyway, moving on. Hopefully in the exam they're not going to give you complex questions like this, because you're only going to get two questions from regression. Oh, one question, depending. The next one says attached, but I don't see anything attached, so I'm going to assume that this was the one that was attached.
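The check of b0 and b1 in the question above can be reproduced from the summary values quoted in the session (28.10, 511, and the means 11.3 and 204). This is a minimal sketch that assumes the usual least-squares formulas, slope = S_xy / S_xx and intercept = mean of y minus slope times mean of x. The template's own formulas are not shown in the session, and the variable names are made up for this sketch.

```python
# Summary values quoted in the session.
s_xx = 28.10   # sum of (x - mean of x) squared
s_xy = 511     # sum of (x - mean of x) * (y - mean of y)
mean_x = 11.3  # 113 / 10
mean_y = 204

# Assumed standard least-squares formulas (not taken from the template itself).
b1 = s_xy / s_xx            # slope
b0 = mean_y - b1 * mean_x   # intercept

print(round(b1, 2), round(b0, 3))  # 18.19 -1.491
```

These agree with the template values read out in the session, which is why the statement b0 = 6.91 is the incorrect one.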
The following regression estimates the reading speed, which is y, of learners at Zekias Malaza Secondary School in Emmerle Cleaning as a function of their age, which is x. So we have the regression line, which they wrote the other way round: y hat is equal to b1x plus b0. Even if they don't write it the way we know it, y hat is equal to b0 plus b1x, you must always know that b1 is the value next to the x, which is the slope. Calculate the estimated reading speed of a 15-year-old learner, so 15 is the new age of the learner. It means you must take your equation. We don't need to rewrite it. We can just use it the way they wrote it: y hat is equal to 10.4x plus 107.9. So what you need to do is 10.4 times 15 plus 107.9, and that's 263.9. The answers are given as integers, so it is 264. So A should be the correct answer.

The estimated reading speed equation of Hofmeyer is given by that, with a correlation coefficient r of 0.686. Which one of the following statements is incorrect? A, a 17-year-old learner is expected to have a reading speed of 298. So here they're asking you to do the following: reading speed is equal to 13.1 times the age, and they say 17, so times 17, plus 61.9. You need to find what that reading speed is. Is it equal to 298? We're looking for the incorrect one. I'm going to go through all of them and then we can do the feedback. There is a moderate positive relationship between the learner's reading speed and age. An increase in the learner's age increases the reading speed by 13.1. And 47.1% of the total variation in the reading speed can be explained by the variation in age. So here they're asking you to do r squared, which is 0.686 squared. So what did you find here as an answer? Let's see, do we have, no, we only have, no, I can't do that one here. So what was the answer? 284.6. 284.6.
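The arithmetic for these two questions can be checked with a short Python sketch. The coefficients and r are the ones quoted in the session; the helper name `predict` is made up for this sketch.

```python
def predict(slope, intercept, x):
    """Estimated y from a fitted line: y hat = slope * x + intercept."""
    return slope * x + intercept


# First question: y hat = 10.4x + 107.9, learner aged 15.
speed_15 = predict(10.4, 107.9, 15)

# Second question: y hat = 13.1x + 61.9, learner aged 17.
speed_17 = predict(13.1, 61.9, 17)

# Coefficient of determination, as a percentage, from r = 0.686.
r = 0.686
r_squared_pct = r ** 2 * 100

print(round(speed_15, 1), round(speed_15))  # 263.9 264
print(round(speed_17, 1))                   # 284.6
print(round(r_squared_pct, 1))              # 47.1
```

Since 284.6 is not 298, the statement about the 17-year-old learner is the one that does not hold, while the 47.1% figure checks out.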
So 284.6 is not 298, and we're looking for the incorrect one, so that is the incorrect statement. Interpreting r, 0.686 falls in the moderate range and it is positive, so it is a moderate positive relationship. That's correct. An increase in the learner's age will increase the reading speed by that much. So what is our slope? Remember, the slope tells us that when the x value increases by one unit, y increases by how much. So if there is a one-unit increase in the learner's age, because if I multiply the slope by one I just get the slope, the learner's average reading speed will increase by 13.1. That is just the interpretation of the slope. And calculate r squared. Do you get 47.1? In your case you will have 0.471. Do you get that? Yes. Yes. If you multiply that by 100 it gives you 47.1%, which is your r squared, the coefficient of determination, which tells you how much of the total variation in the reading speed can be explained by the variation in the age of the children. So that is correct. A is the incorrect one.

Question 18, which I assume will be the last question. Consider the observed and the predicted reading speed for Kiba. Use the given information to calculate the sum of squared errors. Now, for the SSE you can use the template. Let's go to the template. What they're asking you to calculate is the SSE formula, which is the sum of your observed value minus your estimated value, squared. So let me set that up. Since they gave us new values, I need to count the number of values that are there. For some reason my desktop is missing in action. So one, two, three, four, five, six, seven, eight, nine, ten. So we have 10.
So what they're asking you to do is the sum of y minus the estimate, squared. That is SSE. So you say that minus that, squared, that minus that, squared, and you write your answers there, there, there, there. When you get to the bottom, you add them up, and that total will be the answer you are looking for. I will do this on the template as well. Use the templates. We're adding 10 values. I have one, two, three, four, five, six, so I have six rows and I need to add four more. Since we added new columns onto this, I need to include them as well. Now, actually we already have all the x values, and we're using the same y values, so I just need to substitute them. I can even keep the x values and the y values because they are the same. I've just double-checked. They're the same. So the only thing they gave which I'm not going to use is this, because now they have given us the new y values. So I'm just going to write in the new y values. No, I'm not going to write them in, because they have calculated them and these are probably the same. So you can use the template as well. So that is 183628. They are already calculated there on the template, so the template does the work. The only thing I need to fix is the gaps in between all these other values. We already have the answer to the SSE. The answer is 2633. Let me open this bigger. I'm going to hide all these values in between, which are the ones that we just saw. So those who are calculating manually, do you have an answer? What answer do you get there? At the bottom we have 2633. That's the answer. So we can round it off to 2633, which is that answer there, which is D. I'm just going to unhide. Okay. I didn't break anything except the SSR squared. We still have everything working. Now check the coefficient of correlation. There we go. Okay. Any questions?
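For those calculating manually, the SSE steps can be sketched in Python as the sum of squared differences between observed and predicted values. The three pairs below are hypothetical, since the question's ten observed and predicted values are not quoted in the session, and `sum_squared_errors` is a name made up for this sketch.

```python
def sum_squared_errors(observed, predicted):
    """SSE: add up (observed - predicted) squared over all rows."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))


# Hypothetical values for illustration only, NOT the question's data.
observed = [10, 12, 15]
predicted = [11, 11, 13]

sse = sum_squared_errors(observed, predicted)
print(sse)  # (-1)^2 + 1^2 + 2^2 = 6
```

With the question's ten observed and predicted values substituted in, the same function would give the total that the template shows at the bottom of its SSE column.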
No questions? Then we're done for the day. Anyone? Lizzie, what was the answer? 2633. Okay. If you use the Excel add-in for the analysis, where do you get that 2633? Which Excel add-in? The one for regression. No, you won't find it. It's a manual calculation that you have to do, because the SSE is the errors. They are calculated by taking your observed value, which is 178, minus the estimated value, because you have to use the formula to estimate. To estimate that value we use the normal regression formula, b0 plus b1 multiplied by x, and x is 10. We use the actual x value. On the assignment they actually gave you your y. So yeah, they gave you y and y hat. On the Excel template, if you take your x and y values and substitute them at the bottom of the template, it will calculate the SSE and SSR, because they are there on the template. You just use that and adjust your calculations.

You will put this Excel on the group, right? I will just send you this one on the WhatsApp group. Let's save this. We'll call it Lizzie Regression Model Version 2. I will also post it on my private site, so that those who don't have my UNISA or my group can also get it. I will also email it through. Okay, so that concludes today's session. Please, you can have fun on a Saturday, 20 minutes before time. Lizzie? Yes. What session or what work are we going to do on Wednesday? No way. No session. We can discuss that. Let's discuss it in these next 20 minutes. Let me stop the recording.