 Okay, so welcome to your 27th session online. Do you have any questions, query, comments before we start with today's session? Let's first check. Have you submitted your first submission of assignment five or have you started looking at your assignment five? You know that it's due next week and there will be no extensions because the max needs to be finalized for exam so that they can calculate your year mark. We'll be done by then. Okay, others, how did you find it? Is it easy, difficult, or are you going to start also? Start. I'm also going to do it this evening, try the first one. Okay, so at least by Saturday if you are still not sure about certain things then we can have those discussions. I will be online for an hour because I don't think we will have any other activities that we can do because after today we would have covered almost all the exam papers that I have. So if we have covered all of them then it means we have covered all questions I had so I don't want to go and create new questions. And I'm going to assume that that will be enough because we did the content and then we did activities and then now we're doing additional activities. So you should practice and then feel comfortable to do your assignment. And those who are planning to then you can use the Saturday to do your assignment as well to go through your assignment. Remember, don't wait until the due date to submit your assignment guys. All right, so let's get to it. So the first hour that we have we will do chi-square and then the second hour I will keep an eye on the time as well. Then we will look at regression. So what I want to do is to stop sharing and then share my entire screen again. Because at the moment I'm not sharing my entire screen so that when we do some activities I should be able to toggle between. Okay, so your first activity, some reason, okay, it's amazing. So we're going to start with chi-square. The easy activity. Consider the following table. 
Which one of the following statements is incorrect? I'll give you some time to do the activity; let me know when you're done and you have the answer. I'll also do it on my side. It's a three by three contingency table. I'm just adding the values. It looks like the same thing that I have here. Which one of the following statements is incorrect? Okay, do you have an answer? Not yet. I can only say one, two and three are still correct, so I'm left with four and five. Okay. So one, two, three and four are correct for me, so five must be wrong. Yeah, five is wrong because 18.35 lies in the rejection region. The null hypothesis has to be rejected; it cannot not be rejected. Another thing I forgot to open is the table. You need to go to the critical values of chi-square. Okay, so already I have one person. The others, are you winning? Number one gives the expected frequency for buyers above 45 and medium car. The value in the table is the observed one, so in order to find the expected frequency we go to above 45 and medium in the expected table, which is 39.6. So number one is correct. Number two: the observed frequency for under 30 and large car is 34. So that one is also correct. The degrees of freedom: remember, degrees of freedom is number of rows minus one, times number of columns minus one. How many rows do we have? One, two, three rows, minus one. How many columns do we have? One, two, three columns, minus one. So it will be two times two, which equals four, which means that statement is correct. For the critical value at alpha of 0.05, we use the degrees of freedom of four and the level of significance of 0.05 and go to the table. We look for 0.05 and the degrees of freedom of four; where they meet, that is 9.488, which is correct. And for making a decision, we know that the rejection region starts at the critical value, which is 9.488.
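As a compact restatement of the two steps just walked through (degrees of freedom, then the table lookup), here is the same working written out. The symbols r and c for the number of rows and columns are chosen here for illustration and are not taken from the exercise sheet.

```latex
\[
df = (r - 1)(c - 1) = (3 - 1)(3 - 1) = 4,
\qquad
\chi^2_{0.05,\,4} = 9.488
\]
```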
Our test statistic, they say it is 18, if it's 18.35, which will fall in the rejection area. So we will reject the null hypothesis and the statement says we cannot reject the null hypothesis. Therefore, that is the incorrect statement. Exercise two, calculate the test statistic. It is a 2 by 2 table. So you will go to the 2 by 2 table. If you are using Excel, go to the 2 by 2 table and capture the values. So we have yes and no. We have main 10 and take 6. If I move this table, I should be getting the same value as they have. And since my values here are not rounded off, I can round them off to 2 decimals. And they should look exactly the same as what they have there. So it means I have the right expected values. So it means my test statistic, which is what they are looking for. Is that correct? If I look at this, it looks like. I have the same as you, Lizzie. I think their answer is maybe a little off. The number one, it would probably be number one. Yeah. Only number one is the closest one. It's the closest one. So that should be because they use the rounded off values and on our Excel, we're using the calculated values. So if I would have saved this 9.51, I should get the same answer as they have 25.49. And this is the importance of rounding off. Before you get to the final answer, 33.51. And I get the same answer as them because of the rounding off. So as you can see there, the test statistic is the same. So since I want to take it back to how I found it, just do and do and do until you go back to your calculations. Hey, number three, let me leave you to it. Let me not do it because then I give you all the answers when they are looking for the critical value. But the question is very tricky. It says referring to the table at 5% level of significance, the critical value of the test statistic is. That's very confusing. But for me, I will assume that they are looking for the critical value. 
So if they are looking for the critical value, you just need number of rows minus 1 and number of columns minus 1. How many rows do you have? How many columns do you have? Then use the degrees of freedom and your alpha of 0.05 to go find your critical value. Remember, you can post your answer on the chat as well. Let me just check the chat. The chat is not lighting up, guys. Come on, guys, you are able to chat. What's the code? So, exercise three. I just wanted to number the exercise. How many rows? There are only two, minus 1. There are only two columns, minus 1. Therefore, our degrees of freedom will be 1. So going to the table, we're looking for 0.05 and 1, and the answer is 3.841. They used the four-decimal table here, but it's one and the same. A researcher wanted to test whether beer preference is independent of gender, and there is a random sample of 150 beer drinkers. Null hypothesis: beer preference is independent of gender. Alternative: beer preference is not independent of gender. The chi-square test statistic is equal to... Let's see if you can get it right. This is a 2 by 2. I see someone has already posted. So I'll just put in the values. It's 20. I'm not going to change the titles. 20 and 40. I think it's a 2 by 3, Rafa. Oh, and Doug. I think we did do this. I always see only one of the two. So it's a 2 by 3 table, and we need to use the 2 by 3 template, OK? So that will be 20, that is 40, and that is 20. And this is a 30, and a 30, and a 10. And the answer is option 5, as Etienne has posted on the chat, OK? Exercise 5: this was copied into the wrong section, so you can ignore it. We will do those exercises later on. Exercise 6. To test if the absence of workers from their jobs occurs at a higher rate on rainy days than on non-rainy days, a sample of 400 days was taken.
A statistician wants to test for independence to infer whether the incidence of absence is higher on rainy days. So is absence related to rainy days? Weather and absenteeism, are they independent? Which of the following statements is incorrect? No, correct. Oh, it's correct. I'm used to seeing incorrect, incorrect. So yeah, we're looking for the correct statement. It is a 2 by 2 table. Is it not 2 by 3? It's 2 by 2; don't count the total. It's rainy, not rainy, yes and no. Oh, I see now. We have an answer. Number 4 is correct. Number 4. That's correct. Let's see on the chat, what do you have? Nothing posted on the chat. Oh, exercise 6: 4, 4. So you're saying it's number 4. So let's see, we're looking for the correct one. There are two rows and two columns. 2 minus 1 is 1, 2 minus 1 is 1, so 1 times 1 is 1. So the degrees of freedom cannot be 4, and number 1 is incorrect. The observed value for non-rainy and absent is 100. It says 103, so it is incorrect. The expected: we calculated the expected for rainy and absent, which is yes. At 5, it should be 8.491. So that one is incorrect. For the critical value at 1%, we go to the critical value table. We're looking for 0.01, which is 1%, and 1 degree of freedom. So the answer should be 6.635, and that is the correct one. And H0, which is the null hypothesis: it says dependent. It should say independent, so that is incorrect. Okay, 7, I think we did do this one. I'm not sure, I can't remember. It is a 3 by 3; add the values 15, 21, 30, 27. And Lizzie, your response will probably be the answer for this. What do you do when the critical value falls exactly on the same spot of your graph as the, what's the word I'm looking for, the test statistic? Oh sorry, when we use the critical value, we say we reject only when the test statistic is greater than it, when it's more than. If it falls exactly on it, or to that side, we do not reject.
So if it's less than or equal, then we will not reject. You do not reject. Yes. Thank you. Are we winning? We're leaning towards number three. We're leaning towards number three, because we're looking for the incorrect one. Number one and number two are correct. Number four is correct. Number five, sorry about the spreadsheet that is on top, you can minimize it, says we can conclude that the mode of transport is independent. So it means we are not rejecting the null hypothesis. We do not reject, so it falls on this side. OK, so let's see, do you have answers on the chat before I give you my answers? Let's see. Only one person answered, and said number five is the incorrect one. So let's look at them. Number one: H0 should always state independent, so number one is correct. H1 should state dependent, so number two is correct. I'm going to skip number three because someone said they are leaning towards that one. I will go to number four, because I can also see that the sign they put in number three is incorrect. Number four says the test statistic is 6.29, which is correct. So number four is correct. Now, if my test statistic is that, I need to go find my region of rejection, which starts at my critical value. How many rows? There are three, so three minus one equals two. How many columns? There are three, minus one, which equals two. So two times two is four. So I'm going to use my alpha of 0.05, because that is what they gave me, and look at the critical value table for 0.05 and four. And I think we did look for that: 0.05 and four is 9.488. So if that is my critical value at 5%, actually, I don't even have to draw another graph because I have a graph here. So we know that our critical value is 9.488. Our test statistic is 6.29, so it falls in the do not reject area. So this says we can conclude that the mode of transport is independent, which is correct because we're not rejecting.
Since our critical value falls in there, do not reject area. So the only incorrect answer here is that because of the sign. Mainly because of that sign, because the sign says, if my test statistic chi-squared, if it's greater than, then we reject the null hypothesis. So it says, we're going to reject the null hypothesis. If my calculated test statistic is less than my critical value of 9,5, of which if it's less, we do not reject anything that falls this side. Do not reject anything that falls here. We reject the null hypothesis. So only number three is incorrect. Complete the table, then answer, which one is incorrect. So I can also mess up the tables. So we can use number two by two. I have 265, I have 950, and then you can use this to calculate the values. So we say 950 minus 265. And that is the value we're looking for. And we are also given. I can just remove all of this. We are given the total here, which is 700. And we are also given the total here, which is 1,000. And in honor for me to find this, I'll just say 1,000 minus 950. And this one is 1,000 minus. Sometimes you might be unable to click on a value because of the highlighted field. So it's fine. Since I know that my field is O, and I can just go and type there on the thing O, and it's on row seven. And it should not O. I'm looking for P minus P, P, because that's the column that I'm looking for, this one. And to get that value, 300 minus 265. To get this value, I can use 50 and 35, or I can use 700 and 685. And my whole table is calculated. I think our answer is option four. We're looking for the correct answer. So number one, that is not correct because number one, it should state independent. Number two, the observed frequency for acceptable and employee zone, which is that value there, it's 685. It is 685. So that is 665. It's not correct. The expected frequency for acceptable and employee pita, the expected value for 265 is 285. So that is 265. That is incorrect because it is this value here. 
Our degrees of freedom: there are two rows and two columns. Two minus one is one. Two minus one is one. One times one is one. So the degrees of freedom is one, and that statement is correct. Number five: suppose the calculated statistic is 40. So instead of using the one that they gave us, they say, let's use 40 at 5%. So we need to use 1 degree of freedom and 5%. Go to the table and that is 3.841. 40 falls in the rejection area. So we reject the null hypothesis, and this says we do not. So that is incorrect. So the only correct answer here is number four. To perform a chi-square test of independence, you require: two or more nominal variables; the distribution to be negatively skewed; the degrees of freedom; the level of significance; a contingency table. I think we did discuss this. No, it was a horrible question. I still don't like it. Yeah, I remember that. So we did discuss that. We need a contingency table in order for us to do the chi-square test for independence. Which one of the following statements is incorrect? So you have a two by three table. Let me also go and complete it. A two by three table. It is too big. I will quickly go and populate. Five, one, five, four, eight, five, three, fifty, and you guys are busy. I will also play around. Just see which is P and which is SS. When you're done, you'll just move your X a little. When you're finished. Oh, yes, I am done. Actually, let me minimize it. Number four. Number four. OK, let's look at the answers. I have to toggle between the two. The expected frequency for social media and politician, for the 700: it's in the second column. It's 791.4, which means this is correct. The expected of traditional media and sports star is 450. That's the last column. So it is 163.75. 163.76. I think when you round it off, you'll get 76. So that is correct. Oh, here. OK. The observed, meaning the values that you were given originally: the observed frequency for traditional and politician is 350. So that is correct.
So we know that that is correct. That is correct. That is correct. Number five. The degrees of freedom. We've got two rows. Two minus one. We've got three columns. Three minus one. So that would be one. That would be two. One times two is two. So that is correct. That would be the incorrect one, because null hypothesis should state independent. Independent, yeah. So on this one, all what they want you to do is go and validate. We know what the degrees of freedom is. So the degrees of freedom, we calculated it previously. That is the degrees of freedom. It's two. We know that. So use the degrees of freedom of two and 10% and check if that value is correct. The critical value is correct. Do the same for 5%, 2.5. And 1%. Remember, we're looking for the incorrect one. So you just need to go to the critical value table and do your answers from there. So open up your critical value table and you just use your two and a range of all those critical values by using 10%. 5%, 0.25, and 1%. And validate if those answers are correct. Or as the level of significance, which is your alpha value, when it decreases, the critical values decreases as well. So it means if your level of, when the level of significance decreases, that's your critical value also decreases. So option 5 as well for this one. Option 5 is the incorrect one. Yes. So if we open the table, come on. Just want to make it smaller. So we know that it's the second row. It is that row that we're looking for, the second row. So if I put it here, line number 2 for 10% is correct because it's 4,60. And 5%, it's 5,9. Sorry, let's do this. That is the answer for number 1, answer for number 2, answer for number 3. They are the same and answer for number 4. That is correct. If you look at this, when the thingy, when not increase, when it decreases, so when it moves down, so when it moves there, the values of your critical values are increasing. 
So when your alpha value are getting smaller and smaller, your critical values are getting bigger and bigger. So when they decrease, it should say there is an increase in the critical value. So that will be the incorrect one. Next one is to calculate the test statistic. Do this. So I already populated the values. So there are 4,000 years. So the value of your test statistic is option number 4. So actually, the Excel spreadsheet helps a lot, especially when you don't have to do the long calculation. So please remember, if, for example, in the exam, they ask you what is the formula to calculate a chi-square test, please don't disappoint me that you don't know the formula. Remember to calculate your test stat. We use chi-square stat of the sum of your observed minus your expected squared divided by your expected. Also, you must know how to substitute this, because you need to know that you need to substitute the observed value minus the expected value squared divided by the expected value plus the next value plus the next value. Because if they give you questions where they already did your summations and they say select which one is the correct option, if, let's say, 1,800 minus 1,722, 1,800 minus 1,722 point whatever the decimals, you need to know that this is how you write it when the decimals plus. And we need to go to the next one, 7971. 700 minus, not plus minus, 791 point the decimals, because I don't remember all the decimals that are there, squared divided by 791 point the decimals. Plus, you need to know how to identify the formula and how to substitute them. Don't just rely on the Excel spreadsheet, the shortcuts. There are still other questions that they can ask you. How to calculate the expected frequency? You need to know how to calculate it. Instead, in case they ask you about the formula. So you just need to know that is the raw total times the column total divided by the grand total. You need to know those things. 
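The two formulas just recited (expected frequency = row total times column total divided by grand total, and the chi-square statistic as the sum of (observed minus expected) squared over expected) can be sketched in a few lines of Python. The function names are illustrative, and the data are the 2 by 3 beer-preference counts read out earlier in the session, assuming they were read row by row (20, 40, 20 and then 30, 30, 10); the session does not confirm that layout.

```python
def expected_frequencies(observed):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(observed):
    """Sum of (observed - expected)^2 / expected over every cell."""
    expected = expected_frequencies(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Assumed layout of the beer-preference table (rows: gender, columns: beer type).
beer = [[20, 40, 20], [30, 30, 10]]

expected = expected_frequencies(beer)
chi2 = chi_square_statistic(beer)

print([[round(e, 2) for e in row] for row in expected])
print(round(chi2, 4))
```

Writing the substitution out by hand for one or two cells and comparing it with the printed expected values is a quick way to practise the formula itself rather than relying only on the spreadsheet.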
We'll tell them to press F2 on the Excel spreadsheet. But you need to know them. So because I know that I'm giving you all these shortcuts and easy way of doing things and calculations to save time. But sometimes they don't just ask you for the calculations. They can ask you or give you options like formulas and then you need to know how to use the formula. Okay, for Dean, given the same statement and the same table, consider the following statements. So since we have used the data and calculated some of the things, it's easy to answer the question. The challenge here is they didn't give the level of significance. How would you answer the question without the level of significance? So let's go back to the previous one because they are all linked. So if I look at this, so the previous one, I know what the test statistic is, is 58. And for all the level of significance that they provided here, we can use any one of them regardless because you can see that the values are small, right? For all of them, you can even take the highest one. Let's say it's for 1%. Let's say our alpha is 0,01. Therefore, our critical value of alpha and the degrees of freedom is 9 points. What did we find? 9.210. So if that is our critical value, then we need to consider these statements, all these statements that are here. So let's draw and select our critical value, which is 9.210. And we know that our test statistic was 58.6, whereas our test statistic will fall in the rejection area. So now, if we know all that information, consider the statements below. We reject the null hypothesis. We do not reject the null hypothesis. Media, platform, and personality are independent. Which statements or statements are correct? Which A and C? Correct. Remember, the null hypothesis states independent. The alternative states dependent, whereas 58.66 lie in the rejection area. So we reject the null hypothesis. 
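The decision step used throughout these exercises can be sketched as a small Python function. The critical values below are copied from a standard upper-tail chi-square table and cover only the combinations used in this session; the function name and the returned wording are illustrative. Following the rule discussed earlier, a test statistic that is exactly equal to the critical value is not rejected.

```python
# Upper-tail chi-square critical values from a standard table,
# keyed by (alpha, degrees of freedom). Not a full table: only the
# entries that come up in this session are listed.
CRITICAL_VALUES = {
    (0.05, 1): 3.841,
    (0.01, 1): 6.635,
    (0.05, 2): 5.991,
    (0.01, 2): 9.210,
    (0.05, 4): 9.488,
}


def decide(test_statistic, alpha, df):
    """Reject H0 (independence) only when the statistic exceeds the critical value."""
    critical = CRITICAL_VALUES[(alpha, df)]
    if test_statistic > critical:
        return "reject H0"
    return "do not reject H0"


print(decide(58.66, 0.01, 2))  # media platform exercise
print(decide(18.35, 0.05, 4))  # car size exercise
print(decide(6.29, 0.05, 4))   # mode of transport exercise
```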
And if we reject the null hypothesis, therefore this statement no longer, we no longer consider that statement. So the only statement that is valid will be that both of them are dependent. So it means social media and personality are dependent on one another. So based on that, it means this one, C is incorrect. We do not reject the null hypothesis, but we are rejecting the null hypothesis. The only statement that is correct in this instance is only A. So your answer is option 1, A. Lizzie, sorry, please just explain again why was option C not valid? I thought if you rejected the null hypothesis, it is automatically means it's independent, not dependent. No, because if I'm rejecting the null hypothesis, I'm saying the null hypothesis is not true, because I'm rejecting it. If I'm not rejecting it, it means I'm accepting it. I'm saying it is true. OK. You know when somebody proposes to you and you reject them, do you marry them because they rejected you? Then it means they are no longer there. So take the statement as that. So when you say you reject the null hypothesis, therefore you say you are rejecting that person. You don't want that person to be in your life. So C would have been a valid option if it said or dependent. So if let's say the answer that we got was 5.6, the chi-square test statistic. So we would be in they do not reject the null hypothesis. Therefore it means the null hypothesis will be valid. And that is why most of the time when we do not reject the null hypothesis. So when we do not reject the null hypothesis, when the null hypothesis says independent and the alternative says they are dependent, when we do not reject the null hypothesis, we say there is no sufficient evidence to show that the two media platform and personality are related. That's how we say it. It's because now we say this statement is no longer valid. We then assert that they are independent. Confused? 
So we say there is no sufficient evidence to show that media platform and personality are related. Therefore it means this statement is no longer valid. We say it in that manner because that is the statistical way. In layman's terms, we can just say we do not reject the null hypothesis because fake news media platform and personality are independent. That would have been a straightforward answer, not a statistical answer. That is if we do not reject the null hypothesis, because then we treat the null hypothesis as true. And here we're not committing a Type I error: a Type I error is rejecting a true null hypothesis, and we're not rejecting it. It is what the researcher wanted to prove, because the researcher wanted to prove that media platform and personality are independent. OK. We are left with one minute for doing chi-square questions. But I think there are not a lot of them left, and you can also do them on your own. I think there are about, let's see, three questions which are related, one after the other. So we can look at the first one. What is the value of the test statistic and the critical value? So we have two answers for the test statistic and we have different critical values. So we can calculate the test statistic quickly because it is a 3 by, what is the table? It's a 3 by 2. So we need a 3 by 2. Let's see if I have a 3 by 2. That's the last one. So since it's a 3 by 2, we can just put in the values 90, 45, 80, 15, and 10. Boys and girls. And girls. I'm just passing time so that you can also do the questions on your side. I've got number four. ST. So when you have time, you can do it that way. You can change the variable names or the data names. So we know that our test statistic is zero. So it means none of those at the top will be valid. So those ones will be incorrect. So we just need to validate only those two. So this is 3 minus 1 and 2 minus 1, because there are two columns. So we'll have 2 and 1.
So 2 times 1: our degrees of freedom equals 2. We were told to use a level of significance, alpha, of 0.05. So 0.05 and 2 gives 5.991. So that is option number 4, because that's what we get. It's the second row. That's 1 and 2. So that will be the answer that we are looking for. What will be the decision with regard to the hypothesis, and the conclusion about the two variables, based on the information I have? Hi. Well, it's 8. What about 8? You said at 8 we'll start the regression. Yes, we will start the regression. Just now. Yes. So you take these two values that you have, you go make the decision, and then you choose whichever statement is correct. You just need to know how to choose the correct statement. You can just use a table, anything. We know what the critical value is: 5.991. Our test statistic will fall on this side. So it falls in the do not reject area. So this is not correct. That is not correct. That is not correct. The only two statements left to choose from are number three and number four. Which one is correct between the two? Option three. Option three will be the correct statement, because option three states we do not reject the null hypothesis, like in the discussion that we just had. We do not reject the null hypothesis because we treat the null hypothesis as true. We can conclude that the specialist type and gender of children are independent of each other. So the other one, number four, would be incorrect. And that is chi-square tests. So, since someone is already looking forward to the regression, let's get to the regression questions. A sample of eight observations of the variables x and y is given, and they also gave you the summation values. Which one of the following statements is incorrect? The coefficient of correlation is negative 0.99, and the coefficient of determination is positive.
And the best line fit is that there is a strong negative relationship between x and y. Estimate the results in connection to the above x and y. Oh, they estimate the results in connection with the above variables, x and y are reliable. So now you can either use the summations manually and calculate your regression questions, or you can just use your template. You can use your template as provided to answer the question. So remember, your x and y, you just need to count how many variables you have. One, two, three, four, five, six, seven, eight. So I have six here. I need to add two more. So you just move on the two above the total and highlight them inside the two columns and say down, and then start capturing your data. Five. And you can do the x first by pressing five, enter, three, enter. See, I'm saying three, and then I put five. Three, enter. Seven, enter. Nine, enter. Two, enter. Four, enter. Six, enter. And eight, enter. And then I go back to the y. 20, 23. And remember, when you enter the data, you must make sure that the x and y values corresponds with one another. 9 and 11, 27, 21, 14. Just to double check my values, x squared is 284. I do get 284. Y squared is 2,930. I do get the same. X and Y is 710. So I get the same values. So once you have captured all your values, then you can just scroll to the left or to the right. And there are your values that you can use. And you can just choose which one is incorrect. So looking at the answers, the first one, it says coefficient of correlation, which is r, is negative 0,99, which is correct because our r, if I can make my values bigger so that you can see. So your r is negative 0,99. So therefore, this is correct. Number two, coefficient of determination is positive. Anyway, because coefficient of determination is r squared. If you have a negative response of r, if your r is negative, if you square the negative value, it will become positive. So your r squared will always be positive. 
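For reference, here is the summation form of the correlation coefficient, which is the kind of calculation the given totals are meant for. This is the standard textbook formula; it is an assumption, not something confirmed in the session, that the spreadsheet template uses exactly this form. Squaring the r from this exercise shows why the coefficient of determination comes out positive.

```latex
\[
r = \frac{n\sum xy - \sum x \sum y}
         {\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]
                \left[n\sum y^2 - \left(\sum y\right)^2\right]}},
\qquad
r^2 = (-0.99)^2 \approx 0.98
\]
```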
The best line, so on the Excel sheet, already you do have the best line there. So my one, this should correspond to the one that they have here. So if not, then it means this one that they have here, it's incorrect. So they have Y hat is equals to minus 2.119, which should be the slope that should be next to the X. So this is incorrect. So we already know which one is incorrect. There is a very strong negative linear relationship and we could see it from the r that is true and the estimated connection of the r liable because of the r, you can just say the r liable and that's how you will answer the questions. So let me give you some time to also answer the following questions. Calculate the coefficient of correlation. So if you're using the Excel sheet, also similar thing. So since they only have one, two, three, four, five values and here we already have eight values, all you can do is go on the big column and delete three of this. Just say delete, I click and say up and then it will take you up and you just capture the values, 50, 50, 25. And we're looking for r. Option five. For some reason, I cannot, oh, there we go. So our r is, if I want the same decimals, they have got four decimals. I can also take it to the four decimals and those are the four decimals. 0,892, okay, I'm getting two three. And I guess this is also based on the rounding of two quickly of some values before you get to the final answer and that is option number five. If I look at the values, they match exactly the same decimation values, they match exactly the same as the values on the table. Let me give you some time. Let me know. I will look at the chat also to see if you have responded to the question. So this is exercise. I will win it. Let's see, okay. It says option four. I didn't capture the data. Something is scratching with this one. 
If I look at, if I look at, Lizzie, maybe you want to capture the data because that option one, according to what I'm seeing in the Excel and what they've got is also wrong. Okay, let me capture it. It could be the commas or something in the Excel, I'm not sure. Okay, so how many they have? They have 10. There's 10. 10. So I'm going to capture it. So you just need to... Lizzie, can I show you what I did with my Excel? I clicked on row eight. I clicked on row eight and I inserted from row eight. Row eight itself. Click on row eight. Row eight. Right click? If you right click on the row and you just say insert, then it moves everything down. Then you don't lose the formatting and things like that. Yeah, I don't want to do it that way. Okay. Just want to make sure that this table stays as is and not get the blue area in order for me not to mess around with the blue area. Let's do undo because this blue area is very important. If you mess up the calculations there, you need to re-align all the calculations. So since we're eight in five, you just click on the B and you just insert down. I guess the way I have done it, I've just complicated the whole thing. So in fact, anyway, insert down. I'm hoping that there will be enough rows. Let's see, 68, 39, 42, 41, 53, 40, 40, 48, and 60. Let's do the same with the Y, 14, 193, 161, 183, 179, 231, 173, 214, 241. You just highlight them and drag the values and date the whole table. So number, then I must go to the blue area because all the questions, the answers I need are on the blue area. I need to go. So the first one says the mean of X is 47.5. So the mean of X there is 47.5. I see why I went wrong. OK, thank you. I was looking at the wrong area. So then number one is fine. Yes. The mean of Y is 202.5, which is correct. The slope. And I think I also did put the description of what is what they. So you should be able to see the labels, the slope, or in terms of the symbol, because I think in the beginning I used the symbol. 
So the slope is b1, which is correct. The y-intercept is your b0, which is 2.51, which is correct. The formula as well should be correct. Let's see the formula. It's incorrect, because they swapped your intercept and your slope around. So the incorrect one here is the regression line. And here it says there is a positive relationship. We can just look at the slope: the slope is positive. And if we also look at the regression, we can see that it is also positive, so that is also correct. So: correct, correct, correct, incorrect, correct.
Now, here they didn't give you the table. You need to calculate the coefficient of determination. What they gave you is SSR, but they didn't give you SST, and your formula for R squared is SSR divided by SST. Since they didn't give us that, there are other ways of calculating SST. Or let's not say SST. Let's calculate the regression, because that will be easier. Let me show you on this table, because you will receive this table when you go and write the exam. Then they will tell you what SST and SSR are, so you don't have to go and think about that, because they would have given it to you. Or they will tell you what SST is here. So they have given you the SSR. You just need the SST, which is the sum of y squared, minus the square of the sum of y divided by n. So we just use that formula. We know that R squared is SSR over SST, so we just need to calculate SST. Just substitute the values: the sum of y squared is 18050, minus the sum of y, 320, squared, divided by n, which is 6. And the answer... my calculator is expiring in seven days, so after this I won't be able to show you how to do the calculations. All right. So we have 18,050 minus, as a fraction, 320 squared divided by 6. And that gives us 983.3. I'm just going to keep it like that, because it carries on as 983.3333... That is our SST; what they gave us is the SSR.
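The calculator steps for this question can be checked with a few lines of Python. This is a minimal sketch using the figures read out in the session; the SSR is taken as 943.6592, which assumes the decimal point sits after 943 as guessed from the exam paper, and the variable names are only illustrative.

```python
# Figures read off the exam question during the session.
sum_y_squared = 18050   # sum of the squared y values
sum_y = 320             # sum of the y values
n = 6                   # number of observations
ssr = 943.6592          # regression sum of squares (decimal point assumed after 943)

# SST = sum(y^2) - (sum(y))^2 / n
sst = sum_y_squared - sum_y ** 2 / n

# Coefficient of determination: R^2 = SSR / SST
r_squared = ssr / sst

print(f"SST = {sst:.4f}")        # SST = 983.3333
print(f"R^2 = {r_squared:.4f}")  # R^2 = 0.9597, i.e. 0.96 to two decimals
```

Keeping all the decimals of SST until the final division avoids the small rounding differences that show up when values are rounded too early.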
They said the SSR is nine, four, three, six... I'm not sure if there is a decimal point there, because with the past exam papers, as I said, it's very difficult to know what the values are. I think it's a decimal, so I'm just going to put the decimal: 943.6592. Since they kept four decimals, I'm also going to keep four decimals on the one below, which is 983.3333. So we have 943.6592 divided by 983.3333, and the answer is 0.96 if we round it to two decimals, which is option number one. So you also just need to know how to use your formulas, because you never know. The formulas will be given to you: the same table, the same document, you will get before you go and write the exam or on the day of the exam. I don't know how your lecturer will want to distribute it, but those are the formulas that you can rely on and use.
Okay, go to the next question. Come on. We have the data. If this is the regression line, which one of the following statements is correct? Here they are actually not expecting you to do any calculations, because they gave you the regression line, so it's easy to work out the rest. They say, if x is 8: so you just substitute 8 for the value of x and calculate the value of y. They are asking you if this is the correct value for the slope and if this is the correct value for the intercept. That's all they are asking, so don't try to go to the Excel sheet. Okay, so let's answer the question. Is number one correct? No, number one is not correct, because our independent variable is our x variable, and therefore our y variable is our dependent variable. That means number two will also be incorrect. Number three, when x is eight: you just come here and substitute the value of eight. Is that correct? You just go to your calculator and calculate 1.66 plus 0.93, open bracket, eight, close bracket, which is 9.1. Is that correct? That is correct. Oh gosh, that is correct, and that is what we're looking for, the correct one. The slope: if you go back and you understand your regression
line, you must always know that your slope is the value next to the x. This is the slope and this is the intercept. So that is incorrect, and that is incorrect. You just need to remember that the slope is the value next to the x: the slope multiplies the x, and the intercept stands on its own.
Next exercise. Now you can go and use your Excel instead of using the summations. But remember, in the exam, if they don't give you the table, and this is very important, then you need to know how to use your summations to answer the questions. Here you can go to the Excel because they gave you the table, which makes life easier. Since I added 10 values, I need to take away some of the values, because here we have one, two, three, four, five. So I need to take away the five that I added, one, two, three, four, five, so I can select from here to the end and just delete. So I have five, and the count will also tell me if I have more than I'm supposed to. So let's capture the data. Let's just make this bigger. We have four, six... okay, I clicked on the screen... eight, nine and twelve. We do the same with the y: 12 5 8 8 39. And then you can answer the questions. Let's just double-check that my summations are adding up: x is 39, y is 182, x times y is 101082, x squared is 341. So as long as I'm adding up on those, it means my calculations are fine. I just want to go up so I can get my correlation. So this is the slope, this is the mean. I'm just retyping them because the titles are on the side and you can't see them right now. I'm just going to check what this was. This one says the coefficient of determination. There is also the coefficient of determination here; I've only calculated it in Excel, in case you want to know what that is. I'm not sure if you are able to see all the values. This is exercise six. So la lady says option three is almost correct, so let's see. The intercept, b zero, is minus three... We're looking for the correct answer, so
that should be minus 3.23. And option two says the slope; there is the slope, and it should say 3.5. So they swapped the slope and the intercept around. The regression line: why are you saying it's almost correct? Because the regression line is correct; you can see it's just the value, the decimal point. Yeah, it's rounded off. Yeah, it's because of the rounding. If you do the calculations manually and you round off too early by dropping decimals, every time you're dropping important decimals, or sometimes you're adding up important decimals, so it will depend on how you round off. So the intercept is minus three point something, plus the slope of 3.554, which is correct. This is the correct regression line; as you can see, it matches that one. But then we have two... oh, maybe you are right, because we have... no, I think the decimal is just a coincidence there as well. And then the next one says the coefficient of correlation is 0.99, which is correct, but the coefficient of determination should be 0.98, so that is incorrect. And if x is 10: remember, on your spreadsheet, if you have downloaded it, you can just go and replace the last value with 10 and press enter, and that is 32. It says 35; it should be 32. So this is the correct answer.
This one is easy, so I'm not going to do it for you. Answer this one. It says you're given the least squares line; estimate what the value of the productivity will be if the dexterity score is 15. So just substitute 15 in there and let me know what you get. 64.2. 64.2. So if it's 64.2, it means what the lady has done was... come on, calculator, work with me... to say 19.2 plus 3 times 15, close bracket: 64.2. And that's how we got the answer. Please excuse me for cutting you off here; can you go back to the previous question? I think it's the first answer that's correct, not
the third one. What are you referring to? The intercept should be negative. There it says the intercept is 3.4 and it's positive; it should be negative 3.2. Even if I just look at the first two decimals, actually the first decimal, the intercept, which is b zero, should be negative, and here on your response it's positive. Regardless of whether the values here are like that or like that, it should be negative. Oh, okay, thank you.
Which one of the following statements is incorrect? One: if r is equal to 86 percent, or 0.86, it implies that the relationship between the two variables examined is strong enough. Two: if r squared is equal to 0.7, it implies that 70 percent of the variation in y is explained by the regression line. Three: if the coefficient of correlation r is highly negative, it cannot be reliable. Four: if r is equal to 0.64, then r squared is equal to 0.4096. Five: r indicates the strength and the direction of a relationship. Let me give you some time to think about it. Are you done thinking? Yeah. Okay, so let's go statement by statement. If r is equal to 0.86, it implies that the relationship between the two variables is strong enough. Is that correct or incorrect? We're looking for the incorrect statement. Is this true? 0.8 means a strong relationship; that's true. An r squared of 0.7 implies that 70 percent of the variation in y is explained by the regression line. Is that true? True. That is true. If the coefficient of correlation r is highly negative, it cannot be reliable: that doesn't make sense. It's wrong; it is incorrect. Only when it's close to zero can it not be relied on, because then there is no relationship. So that is the incorrect one. Whether it's positive or negative, there is still a relationship, so we can still rely on it and use the model, using the x value to find the value of our dependent variable, y. So only this one is incorrect. If r is 0.64, r squared is 0.4096: so you just go to your calculator and
put in the 0.64 and then press the x squared button and see if the answer that you get is the same. So let's go: 0.64 squared equals 0.4096, which is correct. r indicates the strength and the direction of the relationship: that's correct; that's the definition that we use in the notes as well.
Given the test grades and the supervisor grades, determine the equation of your regression line and say which of the statements is incorrect. So I'll also go and capture the data while you capture the data. Let's do that. How many are there? I didn't count: one, two, three, four, five, six, seven, eight. There are eight values. I need to scroll to the side so I can see everything on the table. There are eight, so I need to add four values and insert down. Did I skip one value? I added one too many values; there should be eight. So I can just double-check my sums: 51, 58, 393... oh, because it's not calculating those two values. Let's see: 51, 58, 393, and x squared is 369. So I have the same values and I can rely on that. Let's go. And I can see here we also have an eight; I'm just going to change that value to eight so that we can answer the questions. I'll give you time. Did you give responses? La lady says option four is incorrect. Let's see if option four is incorrect. It's slightly out... my values have shifted, but my table is the same, so I don't have to worry about that. Let's see if I can drag it to the side. Okay, so it says the mean of x is 6.375: that's correct. The mean of y is 7.25: that's correct. The slope is 0.5299: that's correct. The intercept is 3.872, which mine has rounded, so if I take it to four decimals I get the same, and you can do the same here. Sorry about that, I just need to go one level down. Okay, so it should look exactly the same. Let's see. Since I know that the slope is correct and the intercept is correct, then... sorry, my mind is somewhere else. So this is incorrect.
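If the table is not given in the exam, the same spreadsheet values can be reproduced from the summations alone. The sketch below uses the sums read out for this exercise (n = 8, sum of x = 51, sum of y = 58, sum of xy = 393, sum of x squared = 369) with the standard least-squares formulas; the function name is made up for illustration and is not something from the spreadsheet.

```python
def least_squares_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Return (b0, b1) for the line y-hat = b0 + b1 * x, using only the summations."""
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    b0 = sum_y / n - b1 * (sum_x / n)                              # intercept
    return b0, b1


# Summations for the test grade / supervisor grade exercise, as read out in the session.
n, sum_x, sum_y, sum_xy, sum_x2 = 8, 51, 58, 393, 369

b0, b1 = least_squares_from_sums(n, sum_x, sum_y, sum_xy, sum_x2)
prediction_at_8 = b0 + b1 * 8  # substitute x = 8 into the regression line

print(f"mean of x = {sum_x / n}")               # mean of x = 6.375
print(f"mean of y = {sum_y / n}")               # mean of y = 7.25
print(f"slope b1 = {b1:.4f}")                   # slope b1 = 0.5299
print(f"intercept b0 = {b0:.4f}")               # intercept b0 = 3.8718
print(f"y-hat at x = 8: {prediction_at_8:.4f}") # y-hat at x = 8: 8.1111
```

The slope is the value that multiplies x and the intercept stands on its own, so an option that swaps 0.5299 and 3.872 in the equation is the kind of error to look for.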
Number four is incorrect, and if we estimate the value at eight we get 8.1111; we also get 8.1111, so that is the same. So the only option that is incorrect here is option number four. These things can get you dizzy if you're not paying attention.
And with the last six minutes, let's see if we can answer this question. Suppose the coefficient of determination, the r squared value, is reported by the researcher to be 49 percent. Which one of the following statements is correct? So let's go through all the statements and see if they are correct. What I know is I'm not going to be interested in any statement that does not include the 49 percent, so I can just assume that one is not going to be used, and also that one, because it says the coefficient of determination reported by the researcher is 49 percent. So all the statements that have other values that are not 49 percent are not going to describe it. So I'm left with only two statements here. The first one says the explanatory variable explains 49 percent of the variability in the response variable; that says my x explains my y. The second one says my y explains my x. Which one of these statements is correct? Remember, in terms of your r squared, we did have that discussion; we said 70 percent is explained by... So here also you must remember that the variation in y is explained by the variation in x. That is the standard. Let's not put the block there; let's put "the total variation". So that's the standard way of explaining the coefficient of determination. Which one of those statements is correct, number one or number two? Number three? You say number three, but number three says the response variable y explains 49 percent of the variability in x. So number three says x is explained by y. So it's number one. The answer is number one. So you need to be very, very careful when you look at this. Remember, your response
variable is your outcome variable: it is the variable that you want to predict, or the variable that you want to explain. Lizzie, these are agents; they are testing our English. Yeah, they don't get tired of tricks. Yes. So you just need to know how you define certain things and then how we interpret them in layman's terms, in an easy way, something like that. So anyway, that concludes today's session. On Saturday we can continue with other activities, because there are a few more activities on here, but you also have your own time to go through them. If you want to check your responses, you can post on the WhatsApp. So there is 11, there is 12, 13, 14, 15, 16, 17, 18, 19 and 20. There are more questions, more activities that you can go through, and then we can also connect on Saturday and go through them. If we finish within an hour, we are done; if not, we continue, as we have two hours scheduled anyway. Then you should be fine by Sunday: you should be able to write your assignment and submit it well in advance.
And in conclusion of today's session, just to recap on what we discussed a long time ago in terms of the preparation for you to write the exam: at the moment we're finalizing all content-related activities, so we should be done by this weekend with everything. So when we meet on Wednesday, we are going to look at past exam papers. On Wednesday we're going to do what I call a brief overview of the exam paper. We're not going to go into too much detail; we might go into detail, but as a revision type of thing. After that we're going to do revision. I think revision will take us two sessions. We'll do it on Saturday, or maybe we can even start on Wednesday, depending on how quickly I finish that overview of the exam paper. Then on the Saturday
following we will do revision where we look at the assignment questions, because we are asked to discuss the assignment questions and show you how to answer certain questions that were on the assignment. In case some of them repeat themselves and you find them in the exam, you should know how to answer those questions. So we'll do revision for the next two to four sessions, so it will be the Wednesday and the Saturday sessions; we'll see how far we get. In the meantime, while we're busy with those revisions, I will upload on myUnisa, on our site, on my tutor site... this will only be for those who are on my myUnisa site. If you're not one of my students on myUnisa, if you're not linked to my site, you will only benefit from what we're going to do via the recordings or when we do the discussions, because everything that we're going to do will be on myUnisa. So I will load the mock exam paper, which is not timed. You can do it as many times as you want; you can go back and forth or do whatever you want with it. But it will be a complete exam paper with solutions to all the questions. You will get your responses immediately when you answer the questions. I will make sure that you get the responses immediately, with all the activities worked out and how to answer the questions, step by step. I will do that in the background while we're still doing the revision. Once you're done with that, and I can see that many students have already completed it, then once we're done with the revision we're going to go through that exam paper together in one or two sessions. And in the meantime, after that couple of weeks, I will also give you a timed exam paper, a mock exam; let me call it a practice exam paper. This is also only for those on my myUnisa site, so that you can practice and see how you're going to do in the actual
exam, because your exam is timed, so you need to start practicing that. I might load two papers or one paper; it will depend on the time that I have for creating those, because it's time-consuming to create the question paper on myUnisa and in the meantime also do the solutions, because I want to make sure that when you do the activities you also get the solutions immediately. So I will do that, and that will be the format going forward: we do the revision using the assignment, then you need to go and take the mock exam paper, and then later on, before we go and write the exams, you should have an opportunity to write a proper exam, a preparation exam. There might be one or there might be two, depending on the time that I will have. That is all for today.
Those who are not in my group: I will try by all means to download the questions and put them on the drive, the way you get all the other notes, so that you can also have access to them. But I am not going to put up the solutions, because that defeats the point. Those who are going through myUnisa, you will get the solution when you answer the question. When I upload the questions themselves, I'm not going to give you the answers, so you can only get the answers if you go through myUnisa. I'm not sure if you all understand. Otherwise, for the others who are not in my group, I might find a solution for you to also do the test outside of this, but it's a long shot. I am not promising anything. For now my only concern, my only priority, will be those in my group on myUnisa. I need to look after them first. They come first; they are my responsibility. So, Lizzie, going forward, does it mean then that there won't be any recordings, and we'll have to attend these kinds of sessions? No, all the sessions will be recorded until you go and write the exam. Until we finish, we will record all the activities and all the discussions. Okay.
All of them will be recorded, yeah. They will be recorded because you also want to go back, when you do your own revision in your own time, and go through the recordings. All I'm saying is most of the things will be done on my myUnisa site. Yeah, I feel very special right now. No, you should be. Thank you so much. Okay, so then that concludes today's session.