Yeah, welcome to our final revision session using your assignment questions. Today we'll do Assignment 5 and then we'll be done. On Sunday we can go through the mock paper: session one for the first half and session two for the second half. We'll see how it goes. As in the previous sessions, the downloaded preview of the assessment sometimes doesn't show all the questions, so where a question is missing we'll use the equivalent question from last year's Assignment 5. So, without wasting time, let's get to it. Remember that Assignment 5 covered chi-square tests and regression, the last two study units we did, so they should still be fresh in your mind. Which one of the following statements about the chi-square test of independence is incorrect? A, the alternative hypothesis is that the two variables are independent of each other. B, the expected frequency for each cell equals the row total times the column total divided by the grand total, that is, the sample size. C, the test statistic has (r − 1)(c − 1) degrees of freedom, where r is the number of rows and c is the number of columns. D, the two variables are categorical. E, if the observed and expected frequencies for each cell are equal, then the test statistic will be equal to zero. Which statement is incorrect? That would be option A, because the alternative hypothesis states that the two variables are dependent on each other; independence is the null hypothesis. So you need to know the properties of the chi-square test of independence, also called the chi-square test for a contingency table. Next: in a contingency table with five rows and four columns, how many degrees of freedom are there for a chi-square test of independence?
Remember, the degrees of freedom are the number of rows minus one times the number of columns minus one. That gives option C, 12: there are five rows, and five minus one is four; there are four columns, and four minus one is three; four times three is 12. Because I don't have the information for question three, we're going to skip it; we'll probably find a similar question in a past paper. Consider the contingency table below, used to test the independence between distance from home and school level. What is the expected frequency for primary school learners travelling between three and six kilometres from home to school? Choose the correct answer from the list below. So what do we need to do first? We need the totals. The column total for between three and six kilometres is 330. We can actually do the totals for the whole table if you want. The other column totals are 115 plus 75, which is 190, and 315 plus 165, which is 480. The row totals are 115 plus 210 plus 315, which is 640, and 75 plus 120 plus 165, which is 360. And 640 plus 360 gives a grand total of 1000. Now let's answer the question. The expected frequency is the row total multiplied by the column total divided by n. The row total for primary is 640, the column total is 330 and n is 1000, so 640 × 330 / 1000 = 211.2. The answer is B, 211.2. I'm not sure whether your version asked for that. Hold on, I just want to check, because some of these questions are similar to last year's. We can use this one. The following 2 × 3 contingency table contains the observed frequencies, and the expected frequencies in brackets, from a sample of 364 observations. Calculate the chi-square test statistic required to test the independence of the row and column variables. Choose the correct answer.
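If you want to check the degrees-of-freedom and expected-frequency steps away from the template, here is a minimal Python sketch. The function names are hypothetical, and the table layout (primary and secondary as rows, the 3 to 6 km band as the middle column) is an assumption reconstructed from the totals read out above.

```python
def degrees_of_freedom(rows, cols):
    """Degrees of freedom for a chi-square test of independence: (r - 1)(c - 1)."""
    return (rows - 1) * (cols - 1)


def expected_frequencies(observed):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Assumed layout of the distance-from-school table:
# rows = primary, secondary; middle column = 3 to 6 km.
observed = [
    [115, 210, 315],  # primary, row total 640
    [75, 120, 165],   # secondary, row total 360
]

print(degrees_of_freedom(5, 4))                        # 12
print(round(expected_frequencies(observed)[0][1], 1))  # 211.2
```

The expected frequencies always add up to the same grand total as the observed counts, which is a quick sanity check on your arithmetic.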
Now, in the exam, to save time, remember there are six cells here. The test statistic is the sum of (observed minus expected) squared divided by expected, right? In the exam you can use your template. Let's go back to the template. This is a 2 × 3 contingency table, so go to the 2 × 3 contingency table in the template (the table can be bigger) and enter the values starting from the top corner. Here are our two rows and three columns. All I need to do is change the observed values. I don't have to worry about the values in brackets, because the template calculates the expected frequencies at the bottom. I remove the old values and put in 55, 40, 79 and so on, up to 56. You'll see that if you set them to two decimals, the expected values are exactly the same as the ones in brackets in the question. So given this information, they just want you to calculate the test statistic, and the test statistic is the value at the bottom: 9.40, which is option D. So you can use the template. Remember, it's an open-book exam, so you can use a template for your questions, but you need to know how to use it. If you're not using the template, you need to be patient and work fast, because you will need to calculate 55 minus 45.54, square it and divide by 45.54; plus 40 minus 45.89, squared, divided by 45.89; and so on for all six cells, up to 200 minus 93.43, squared, divided by 93.43. When you're done, the answer should be the same, 9.40. For the question that we have, we can do the same. Instead of 55, we use 75. Let's remove all of them and enter 75, 84, 60, 50, 88 and 41.
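The hand calculation described above, summing (O − E)² / E over every cell, can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name is hypothetical, and because the full observed and expected tables for the 9.40 question aren't reproduced here, the example uses a small made-up 2 × 2 table that is easy to check by hand.

```python
def chi_square_statistic(observed, expected):
    """Sum over every cell of (O - E)^2 / E."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical 2 x 2 table, small enough to check by hand:
# every expected count is 30 * 30 / 60 = 15.
observed = [[10, 20], [20, 10]]
expected = [[15, 15], [15, 15]]
print(round(chi_square_statistic(observed, expected), 2))  # 6.67

# If every observed count equals its expected count, the statistic is 0.
print(chi_square_statistic(expected, expected))  # 0.0
```

The second example is statement E from the first question: when observed and expected frequencies match in every cell, every term is zero.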
And the answer is 6.3 from the data that we have. That's how you do the calculations. This next one follows on from the previous question: they tell you that you answered it in question 5, and they've already calculated the test statistic, so you don't have to calculate it again. They ask: at α = 10%, which one of the following statements about the conclusion is correct? Let's start with the decision. The decision rule is: if your test statistic is greater than your critical value, you reject the null hypothesis. So we need to find the critical value, which depends on α and the degrees of freedom. The degrees of freedom are rows minus one times columns minus one. We have two rows and three columns, so two minus one is one, three minus one is two, and one times two is two. So we need the chi-square critical value for α = 0.10 with two degrees of freedom. For now, I stopped sharing my entire screen because I don't want you to see the whole table. When you go and write the exam, please make sure you have everything you need in front of you, because you won't have the chance to stand up and move around. Make sure you have two pens, two pencils, a ruler, a working calculator, paper, whatever you'll need. You can only stand up once you're done with a session. Okay, I'm going to share my entire screen again so we can continue. First we go to the table of critical values of chi-square. We need the value where α = 0.10 and two degrees of freedom meet, so the critical value is 4.605.
So the critical value is 4.605. Now we need to make a decision. Our test statistic is 1.57. For a chi-square test you can also draw a sketch for yourself. The chi-square distribution is right-skewed, so it looks something like this. For the test of independence the rejection region is always in the upper (right) tail, so we only use one tail for the decision. I'm not good with drawings, so bear with me. The critical value is 4.605, and anything that falls in the shaded area beyond it means we reject the null hypothesis. Our 1.57 falls in the do-not-reject region, so we do not reject the null hypothesis. I'm already giving you the answer. Now, which one of the statements is correct? A, we do not reject the null hypothesis and conclude that distance from school and school level are independent of each other. B, we do not reject the null hypothesis and conclude that distance from school and school level are not independent of each other. C, we reject the null hypothesis and conclude that distance from school and school level are independent of each other. D, we reject the null hypothesis and conclude that distance from school and school level are not independent. E, the alternative hypothesis is that distance from school and school level are independent of each other. A, B, C, D or E? H0 says the variables are independent; HA says they are dependent. Which statement is correct? I've already given you a hint, a partial answer, so which one?
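The critical-value lookup and the right-tailed decision rule from this question can be sketched in Python. This relies on one assumption worth stating loudly: the closed form below only works for exactly 2 degrees of freedom, where the upper-tail probability is exp(−c/2), so c = −2 ln α. For any other degrees of freedom, use the chi-square table as shown in the session. The function names are hypothetical.

```python
import math


def chi_square_critical_df2(alpha):
    """Upper-tail chi-square critical value for exactly 2 degrees of freedom.

    Valid only for df = 2, where P(X > c) = exp(-c / 2), so c = -2 ln(alpha).
    For other degrees of freedom, read the value from a chi-square table.
    """
    return -2 * math.log(alpha)


def decide(test_statistic, critical_value):
    """Right-tailed rule: reject H0 only if the statistic exceeds the critical value."""
    if test_statistic > critical_value:
        return "reject H0"
    return "do not reject H0"


critical = chi_square_critical_df2(0.10)
print(round(critical, 3))      # 4.605
print(decide(1.57, critical))  # do not reject H0
```

The result agrees with the table value of 4.605, and since 1.57 is below it we do not reject H0.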
I think the answer is between A and B. It's A: because we do not reject the null hypothesis, we conclude that distance from school and school level are independent. Yes, it's A. Now we're moving on to the definitions. I know we should have a question similar to this one. There we go. I've already given you the answer; if you were quick to look at the options, you'll know which one it is. Which one of the following statements about the concepts of simple linear regression is incorrect? The wording is going to look a bit different, so read carefully. We're looking for the incorrect statement. A, the least squares method estimates the regression equation by maximising the sum of squared errors, SSE. B, the coefficient of determination gives an indication of how well the estimated regression equation fits the data. C, the coefficient of determination takes a value between 0 and 1. D, the correlation coefficient takes on values between −1 and 1. E, the correlation coefficient takes on the sign of the slope of the estimated regression equation. A, B, C, D or E? Is it A, or is it B? C is correct because the coefficient of determination is r squared, so it can never be negative. D is correct because we know the correlation coefficient can be negative or positive. E is correct because the slope and the correlation coefficient always have the same sign; you cannot have a correlation coefficient whose sign differs from the slope's. So if C, D and E are correct, that leaves us with two statements. Do you know how to interpret the coefficient of determination? Remember, the correlation coefficient tells you the strength and the direction of the relationship, right?
And we can say things are positively or negatively correlated. What about the coefficient of determination? It has everything to do with the total variation: it tells you how much of the variation in your dependent variable can be explained by the variation in your independent variable. So B is correct, and the incorrect statement is A: the least squares method estimates the regression equation by minimising, not maximising, the sum of squared errors. If you remember, the model is y = b0 + b1x plus an error term, and because we try to minimise those errors, the error term is not included in the estimated equation ŷ = b0 + b1x. So A would have been correct if it said "minimising". In last year's version of this question, the least squares statement says the regression equation is estimated by minimising the errors, and that statement is correct. I think they had an error with that version, because the option marked B there is also correct. As you can see, we're looking for the incorrect statement, and B there is correct: a correlation coefficient of −0.1 indicates a negative correlation. C concerns the range of the correlation coefficient, and we know it takes a value between −1 and 1, so that is correct. The coefficient of determination can be interpreted as the percentage of the total sum of squares that can be explained by the estimated regression line, which is also correct. The slope and the intercept of the estimated regression line are estimated using the least squares method; yes, that is how we get ŷ = b0 + b1x. So only one statement there is incorrect, and you can see that on this side they actually marked it as correct.
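To see the two facts from this question in numbers, that least squares gives the line with the smallest SSE and that r takes the sign of the slope, here is a minimal Python sketch. The data are hypothetical, chosen only so the results are easy to verify by hand, and the function names are illustrative rather than anything from the template.

```python
import math


def fit_least_squares(x, y):
    """Least-squares intercept b0 and slope b1 (the line that minimises SSE)."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def r_squared_and_r(x, y):
    """Coefficient of determination and correlation coefficient."""
    b0, b1 = fit_least_squares(x, y)
    y_bar = sum(y) / len(y)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1 - sse / sst                     # share of the variation in y explained
    r = math.copysign(math.sqrt(r2), b1)   # r takes the sign of the slope
    return r2, r


x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
r2, r = r_squared_and_r(x, y)
print(round(r2, 2), round(r, 4))  # 0.6 0.7746
```

Since r² is a proportion of SST, it always lies between 0 and 1, while r lies between −1 and 1 and carries the slope's sign.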
But in this question, they switched them around. I don't know why. Okay, because I don't have all the values at the top for this one, we can use last year's question, although we might not find the answer on that one either; I think there was an error somewhere. Consider the sample data below and develop a scatter plot. Which one of scatter plots A to F best describes the data? You can select two points and check which plot best represents the data. I can't make the screen bigger. So if I choose the point (23, 25), I need to make sure that at 23 on the x-axis the plot has a point at 25 on the y-axis. Then we can use the method of elimination. Here I am at 23, somewhere here, and this graph doesn't start at 25 on the y-axis; it starts at 30. So this one is incorrect. You can use the method of elimination like that. Moving to this one, 23 should be somewhere here and 25 somewhere here. If I draw a line across, maybe with my shaky hands, do you think that dot is on 25? No, it looks like it's above. Let's put a question mark on that one. Let's do another elimination on this one. We know 23 will be somewhere there, because that is 30, and we can see the point is above 25, so we can eliminate it as well. On this side, 23 would be somewhere here and 25 somewhere there, so this one also doesn't fit the data, because there is no point at 25. So it's between plots A and D; let's give D the benefit of the doubt for now. Let's check the point (32, 31). If I'm at 32, the point should be somewhere here, right? And there are no points at 32 and 31.
So I gave that one the benefit of the doubt, but without a point at (32, 31) it is not correct either. I've eliminated all of them except A, so the correct plot should be A. Let's see: the correct plot is plot A. Now, in the answer options, where is plot A? Plot A is option E. Yay, that's correct. So if you get a question like this in the exam, just use the process of elimination and get it over and done with. Look at the answers here, for example: the highest y-value in the data is 22, but this graph goes up to 30. 22 is here on the y-axis, right? So just by looking at the y-axis I can start eliminating. This graph doesn't have any points at 22 or below, so it goes. This one at least allows for 22, so we can keep it. This one doesn't even start at 0, so we can also exclude it. This one starts at 0, so we continue the elimination from there. The challenge is that I don't have the x-values, so I don't know what each point corresponds to, but I can find the y-values. Here the y-values are at least 30, and we don't have any y-value above 22, so that one can go. So we've already eliminated three. We need a y-value of 22, and there it is. We need three points at y = 17: one, two, three, there they are. And there should be two points at y = 18: there are two. So at this point I'm going with E. Looking at this one, there should be three points at 17 and there is nothing there, right? So this one doesn't fit either. If these were at 17, it would be correct, and there should also be two at 18. There should also be two at 10, and I don't have the 10s here, because I've already used these points for the two at 13. So this one won't be correct.
So plot E should have been the correct one on this question, and E is option E. You can see that even though I didn't have the x-values, I could still work it out from the plots with the process of elimination. Okay, on to the next question, about reading speed and age. There should be a similar question in the assignment, and there it is, so we can use this one. You are tasked with investigating the relationship between reading speed y and age x of primary and high school learners using simple linear regression. Which one of the following statements about the investigation is incorrect? So we're looking for the incorrect statement. Always remember: x is the independent variable and y is the dependent variable. A, the age variable is the dependent variable. B, the estimated regression equation will take the form reading speed = b1 × age + b0, where b1 is the slope and b0 is the intercept of the estimated regression equation. C, reading speed is a quantitative discrete variable. D, if the correlation coefficient is negative, then there is a negative linear relationship between reading speed and age. I've already given you the answer, because the answer is showing on this question: the incorrect statement is A, since age is the independent variable. Let's go back to the question from your assignment, which is almost exactly the same as the one we went through. We're looking for the incorrect statement, and it probably reads something like this. We know that y is the dependent variable, so when your question says the y variable, reading speed, is independent, that's what would be incorrect. Age is continuous. If the correlation coefficient is positive, then there is a positive linear relationship. And this is the regression line. Now another question.
Consider the following data on age in years and reading speed in words per minute of learners at Kiba Middle School in Fremont Bay. Which one of the calculated quantities is incorrect? You can use the template to do this. Remember, we have this template. Because the template's labels don't look exactly like the quantities in your question, you need to understand what is happening in it in order to answer your questions. So let's go through it one value at a time. A is the sum of x, which is the total. At the bottom, the shaded cells are where the sums or totals are calculated, so the sum of x is all the x-values added together. B is the sum of each x observation minus the mean, squared. I have it here in this colour. No, sorry, my bad, that one is the sum of x times y. We're looking for Σ(xᵢ − x̄)², and it's this one, in that colour. C is the sum of (x minus the mean of x) times (y minus the mean of y), Σ(xᵢ − x̄)(yᵢ − ȳ); it's this one, in that colour. The data I have here is the same as the data in the question, so I can see the correct answers. Remember, once you've put in your data, the template also does the regression calculations. If you're not sure what each value is, I have a note at the end that tells you what you're calculating. The slope is b1, and this is the intercept, b0. The intercept b0 is −1.49 and the slope b1 is 18.9, rounded off. So the incorrect statement would probably be this one, and I still have the data on the template.
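The quantities the template calculates for this question (Σx, Σ(x − x̄)², Σ(x − x̄)(y − ȳ), then b1 and b0) can be reproduced with a short Python sketch. The Kiba Middle School data aren't reproduced in this session, so the example uses hypothetical numbers; the function and key names are also illustrative.

```python
def regression_summary(x, y):
    """Intermediate sums and least-squares coefficients for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sum_x = sum(x)                                                   # Σx
    sxx = sum((xi - x_bar) ** 2 for xi in x)                         # Σ(x - x̄)²
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))   # Σ(x - x̄)(y - ȳ)
    b1 = sxy / sxx                                                   # slope
    b0 = y_bar - b1 * x_bar                                          # intercept
    return {"sum_x": sum_x, "sxx": sxx, "sxy": sxy, "b1": b1, "b0": b0}


# Hypothetical data, not the values from the assignment.
summary = regression_summary([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(summary["b1"], 2))  # 0.6
print(round(summary["b0"], 2))  # 2.2
```

The code follows the template's order: the sums come first, the slope is b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², and the intercept is b0 = ȳ − b1x̄. That way you can check each option against its own intermediate value.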
It would have been the same as last year's one. Okay, cool. Happiness? Yes, thank you. This next one is cut off. Do you mind if I ask a question? Yes, you can. Is it based on the one we just went through? It's a different question, about the exam. Is that fine? Okay, yeah. You mentioned in the WhatsApp group that we are allowed to use resources. Does that include the template? I'm not going to answer that at the moment; wait a little longer. Okay, cool. So let's do this one; I think it's probably the same. Yes, it is the same. The following regression equation estimates the reading speed y of learners at Zakias Malaza Secondary School in Amalatleni as a function of their age x: ŷ = 10.4x + 107.9. Calculate the estimated reading speed for a 15-year-old learner. Easy, right? They gave you 15, so wherever you see x, you substitute 15. Do the calculation and tell me what 10.4 × 15 + 107.9 is. I got 263.9. Yes, 263.9, which you can round off to 264. That would be A. Okay, let's go back to your assignment. There, the estimated regression equation for the reading speed of learners at Mount View Secondary School is reading speed = 20.2 × age − 60.5, with a correlation coefficient of 0.97. Which one of the following statements is incorrect? A: the coefficient of determination is r squared, so you need to calculate 0.97 squared; if it comes to about 0.94, then that statement is correct. B: a one-year increase in a learner's age increases the reading speed by a certain amount. The sign in front of the slope tells you the direction: if it's negative, it's a decrease; if it's positive, it's an increase.
We use the sign in front of the slope here, not the correlation coefficient, because the statement is about the slope. C says an 18-year-old learner is expected to have a certain reading speed. Wherever you see age, you put in 18 and solve the equation; that gives you the reading speed. So your answer should be 20.2 × 18 − 60.5, and if that gives you the stated value, the statement is correct. D: look at the correlation coefficient and state the direction and the strength of the relationship. Does the statement reflect that? So which one of the statements is incorrect: A, B, C or D? Did you calculate all of them? C is correct, D is correct. I did not calculate all of them. A is also correct. A is correct. So it's going to be B, right? Yes, yes, because, oh yes, I didn't read all of it. That's the problem here, and that's why it's incorrect. This one is correct, that is correct, and that is correct. B would have been correct if it used 20.2, because it describes the slope, and the slope is this value here. Instead it uses the intercept. It should say 20.2: a one-year increase in the learner's age increases the reading speed by 20.2. So the incorrect statement here is B. So when you substituted 18 into the equation, you got about 303, right? Yeah. Next, I think this is the last question. Given the data... I'm not sure what they were asking you to calculate; let's see. Okay: consider the observed and predicted reading speeds for the school. Use the given equation to calculate the sum of squares due to error and choose the correct answer. So you need to calculate SSE. In the exam they will not give you a big table like this, or big numbers like this, because they want to make sure you have enough time. Okay.
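Both reading-speed questions come down to substituting into ŷ = b0 + b1x, plus squaring r for the coefficient of determination. The Python sketch below checks the numbers quoted above. The coefficients are the ones read out in the session, and `predict` is a hypothetical helper name.

```python
def predict(b0, b1, x):
    """Estimated y from the regression line y-hat = b0 + b1 * x."""
    return b0 + b1 * x


# Zakias Malaza question: y-hat = 10.4x + 107.9, age x = 15.
print(round(predict(107.9, 10.4, 15), 1))  # 263.9

# Mount View question: reading speed = 20.2 * age - 60.5, r = 0.97.
b0, b1, r = -60.5, 20.2, 0.97
print(round(r ** 2, 2))               # 0.94, coefficient of determination
print(b1)                             # 20.2, change per one-year increase in age
print(round(predict(b0, b1, 18), 1))  # 303.1, estimate for an 18-year-old
```

The slope, not the intercept, is the change in reading speed per extra year of age. That is why statement B, which quotes 60.5, is the incorrect one.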
Because this was part of the assignment, at the bottom of this template there are sum-of-squares measures like this one, and they're asking you to calculate it. To calculate it, I thought you'd need to change these values, because I thought it used the mean of y, which is calculated at the top. That would mean changing the data set there too, to get this mean of y. Let's go back to the formula. If we double-click on the formula, you'll see that it doesn't use the mean; it uses... let's look at it. Oh, it uses the estimates. So it's your observed values minus your estimated values. But you still need to use the top part, because it uses the regression line to estimate every value, so I need these values in order to calculate the estimates. The estimates are these values here. If I double-click on this one, it shows the formula, and you can see it takes the values from the regression line. If we go back up, you'll see that it takes those values there, so the top part also needs to be replaced. One, two, three, four, five, six, seven, eight, nine, 10. I have 10 values and the template has 10 rows, so I can still use the template as it is; it's just a matter of replacing the values. It's going to take a while. I hope you're also doing it on your side, on your template. 78. I'm just going to do the first line first: 274. Look at my data and tell me if I'm putting anything in wrong, because I'm looking at the keyboard, not at what I'm typing. 96, 246 and 168. Check that everything is correct. Then the next one, which is a bit tricky: 180.36, 271.28 and 180.16. It's not coming out right. I think you need to use commas and not full stops. I have 271.28. Lizzie, I think you need to use commas in your numbers, not full stops. Not full stops? Thank you.
Okay, it's the setting on my laptop. I think I changed my language to English (United States), so it's using the USA number format. I must change my settings. Oh, I changed it to South Africa, I think; I'm not sure. So 162, and so on up to 216. Sorry, Lizzie, did I miss something? Is that 298? Yeah, it's fine. I'm hoping this is right. There are two of them, right? 6.73 and 180. If I were the secretary of any company, they would have fired me long ago, because I take forever to type. I can't type fast. Okay, now I'm going to copy the same information, the x and y, and paste it here at the bottom. The values change. I must check this value: it's the sum of all of them. Now we can answer our question. Let's go to the options. I don't have the answers. Did I capture everything correctly? I think you captured the values perfectly fine, but those aren't the x and y values you're supposed to be using. I still have my document from when I did the assignment, and my x-values are all below 20. The y-values are the ones from the top, if you scroll up a little on your document. All the values you currently have in the x column, I have in the y column, and in my x column I have different values from yours. I can also see the answer in my document: 2633, option D. That's the value listed as my sum of squared errors. Wait, let's go back there. Sure. I think what happened is that you're missing a few values, and I don't think you have them available either. I don't have all the values. So those values keep their decimal places; I don't think you need them. No, there are values with decimals in them.
You need them here, because the formula is the sum of your observed y-values minus your estimated values, squared, where the estimates come from your regression line. So let me double-check my regression line. I must use those two values. Wait, something is not... yes, it is correct, because it's using b1 and b0. I'm going back to the estimates to double-check that all the values do what they're supposed to do, and to remove this. Okay. If I double-click on this one, it should be your y value minus your estimated value, squared, right? And it does that for each one of them. I just want to make sure they didn't make a mistake here too. What are the titles of the x and y columns you're currently using from the assignment? There. Lizzie, I think you need to swap them around. The values you have in y are supposed to be in x, and the values in x are supposed to be in y. I had a similar problem when I did my assignment, and when I swapped the figures around, I got the answer. Okay. Oh, you know what? Now I'm doing something totally different. No, no, we're not swapping anything around. They've given us the predicted values, so these are not x and y values. My bad. In this formula, this is your y and this is your ŷ, and that's what they gave us. So here we should use this value minus this one; actually, I should not change it there. Let's go back. Sorry, my bad, now I see what you mean. So we calculate our y value minus our ŷ value, because they gave us both the y values and the ŷ values.
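When the question already gives you the observed and the predicted values, SSE needs no regression line at all: it is just Σ(y − ŷ)². Here is a minimal Python sketch. The function name is hypothetical, and the numbers are made up because the assignment's full table isn't reproduced in this session.

```python
def sse(observed, predicted):
    """Sum of squares due to error: the sum of (y - y_hat)^2."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))


# Hypothetical values, not the assignment's data.
print(sse([3, 5, 7], [2.5, 5.5, 7]))  # 0.5
```

This is the shortcut from the session: rather than swapping columns in the template, you pair each observed reading speed with its predicted value and sum the squared differences.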
So we just change the template a little, and it gives us 2663. There you go, which is that option. So what they gave us is not x and y; it's the predicted value and the observed reading speed. So you just substitute into the formula. Yes. Sometimes, when you use the template, you need to check the formula: SSE is the sum of your y value minus your estimated value, squared. So you take this value minus that one, square the answer, and do the same for the rest. That should be it, and it should match what you have here. On this version the data is different, but you do the same: SSE is the sum of y minus ŷ, squared, and they gave you y and ŷ, so you just complete it. That's it, and that completes today's session. Are there any comments or questions about the assignment? If not, I'm going to stop recording.