Hello, is anyone there? Yes, we are here. Can you hear me? Okay. I'm going to assume that you can hear me. You will let me know if you can't. We can hear you. Okay.

So today we're going to continue and do assignment five, which covers chi-square and also regression. We're going to use the templates for the first time. I'm going to assume that for the next 45 minutes we're going to cover chi-square. So you need to have your template ready, which is the chi-square template. It looks like this. It's written there: chi-square example template. That's what we're going to use. You will also need your statistical tables ready. We're going to use the critical values of chi-square. This is the table we're going to use. And remember the summary that we have already gone through as well, which covers most of the things you need to know about chi-square tests. And remember, with chi-square, you only need to do the chi-square test for independence. So these are the notes that you're going to go and study, and you're going to use them as well when you are writing the exam.

Let's recap on chi-square. We only use the test of independence. There are six steps, or five steps. Step number one is to state your null hypothesis and your alternative. Your null hypothesis will always state that the two categorical variables are independent. And your alternative will state that they are dependent. So your statements should have only those two.

Step number two is to calculate your expected values. With chi-square for independence, we also call the table a contingency table. The contingency table has a number of rows and a number of columns. They would have given you the observed values.
You will need to calculate the expected value for each of your observed values. That means that if the totals and the grand total on your contingency table were not calculated for you, you need to calculate them before you do anything else. Then you calculate the expected value by taking the row total of that observed value, multiplying it by the column total of that observed value, and dividing by the grand total, which we also call the sample size. That gives you the expected value. In order to calculate your test statistic, you need to have already calculated these expected values.

Step number three is to find the critical value. To find the critical value, you use the chi-square critical value table with alpha and the degrees of freedom. Sorry, it's alpha over two. My bad. Even though it is one-sided, we find the critical value using alpha over two and the degrees of freedom. You find your critical value there.

Step number four: you calculate your test statistic. Your test statistic, chi-square stat, is the sum of your observed minus your expected, squared, divided by the expected. Only the top part is squared, not the whole fraction. That gives you the test statistic.

The last step, step number five, is to make a decision. Now, with a chi-square test, you can go to the table if you don't know what the region of rejection looks like. There you will also see that the distribution is a right-skewed distribution. And I think I gave the critical value wrong: for the critical value we use alpha and the degrees of freedom, not alpha divided by two.
That's my mistake there. It's just alpha and the degrees of freedom. And what I should also mention is the degrees of freedom. The degrees of freedom is your number of rows minus one, times the number of columns minus one.

For step number five, we use the critical value, chi-square of alpha and the degrees of freedom: if the test statistic falls in the shaded area, you reject the null hypothesis. Because the distribution is always positively skewed and the test is one-sided, the rejection region is on the right-hand side. For any test statistic greater than the critical value, you reject the null hypothesis. And the critical value, remember, you find on the table. Also remember that we use the upper tail area, which means we use only the alpha values closest to the table, together with the degrees of freedom that you have calculated. Those are the steps that you always need to remember about chi-square.

Okay, so let's look at the questions. Based on everything that I've just explained, you should be able to answer the questions that follow.

Question one. Which one of the following statements is incorrect about the chi-square test of independence between two variables?
A, the alternative hypothesis is that the two variables are independent of each other.
B, the expected frequency of each cell is equal to the row total times the column total divided by n, where n is the sample size.
C, the test statistic has (r minus one) times (c minus one) degrees of freedom, where r is the number of rows and c is the number of columns.
D, the two variables are categorical.
E, if the observed and the expected frequencies for each cell are equal, then the test statistic will be equal to zero.
Which one of these statements is incorrect? A, B, C, D, or E.
The only thing that we didn't explain is E. What E says is that if we calculate the test statistic, the answer will be zero if your observed frequencies and your expected frequencies are equal. So let's assume that we have this contingency table. It looks like this. It has column one and column two, and row one and row two. Right? And here we have one, one, one, one. Let's make it more interesting: one, two, three, four as our observed values. Then we take our observed values and we go and calculate the expected values, again with column one, column two, and row one, row two. Let's assume that we did everything the usual way: row total times column total, divided by n, the formula for calculating the expected value. I'm not going to calculate it. We're going to make an assumption. Let's assume that, based on this information, they say they are equal. So the expected value for this observed value is one. This will be two, this will be three, and this will be four. So the observed and the expected values are equal.

Let's validate that. We know that our chi-square stat is the sum of your observed minus your expected, squared, divided by your expected. Right? So let's see whether the test statistic will be equal to zero for this data. We'll say one minus one, squared, divided by one, plus two minus two, squared, divided by two, plus three minus three, squared, divided by three, plus four minus four, squared, divided by four. The first term will be zero because one minus one is zero, and zero divided by any value remains zero. So the answer here will be zero plus zero plus zero plus zero, because four minus four is zero, zero squared is zero, and zero divided by any number remains zero. So the answer will be equal to zero. So let's see.
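The check above can be sketched in a few lines of Python. This is only an illustration and not the course template: the function name `chi_square_stat` is made up here, and the expected values are simply assumed to equal the observed ones, exactly as in the dummy example.

```python
def chi_square_stat(observed, expected):
    """Sum of (O - E)^2 / E over all cells, given as flat lists of equal length."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Dummy data from the example: expected values assumed equal to the observed ones.
observed = [1, 2, 3, 4]
expected = [1, 2, 3, 4]
stat = chi_square_stat(observed, expected)
print(stat)
```

Every numerator is zero, so the sum is zero whatever the cell values are, as long as observed and expected match cell by cell.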
We're looking for the incorrect one. So the first one we check, E, says that if the observed and the expected frequencies for each cell are equal, and we made them equal, then the test statistic will be equal to zero. You can see that that is correct, because we made up some dummy data, tested it, and found that it equals zero. You can even use other values, like 20 and 20, 30 and 30, 40 and 40, and you will still get the same result.

D: remember, for a contingency table we said we use two categorical variables. There will always be two categorical variables when you do a chi-square test. So that is correct.

C: the test statistic has the number of rows minus one, times the number of columns minus one, degrees of freedom. What did we learn? We said the degrees of freedom is the number of rows minus one times the number of columns minus one. And this also says that r means the number of rows and c means the number of columns. It's the same thing as what we just explained, because the degrees of freedom is calculated using r minus one times c minus one. That statement is correct.

B: the expected frequency for each cell. Let's go back to our equation. We said the expected value, or the expected frequency, is your row total times your column total divided by the sample size, or the grand total. So let's go back to the question. The expected frequency for each cell is equal to the row total times the column total divided by n, where n is the sample size, which is something you would have learned with the six steps or the five steps of the hypothesis test for chi-square, for a contingency table, for independence. So that is correct.

Option A: the alternative hypothesis is that the two variables are independent. How do we state the null hypothesis and the alternative hypothesis? The null hypothesis always has independent. The alternative will have dependent.
So the statement here says alternative, and it has independent. It should not be independent. It should be dependent on each other, which means the incorrect statement is that one, A.

So as you can see... I don't know if you are saying something, because I have muted you all because I'm hearing myself. I quickly want to switch off and come back. Hello. Yes, I'm sorted. I can't hear myself anymore. What a relief. I apologize for that. Let me know if you're not hearing me. I hope you can hear me. I can hear you. Okay. And the session is still recording, right? I can hear you. Thank you. I just wanted to check that it's still recording.

Okay, so let's continue with question two. In a contingency table with six rows and three columns, how many degrees of freedom do we have? I'll write the formula and you are going to do the calculations. So the degrees of freedom is the number of rows minus one, times the number of columns minus one. How many rows do we have? Nobody wants to talk to me. Six rows, so six minus one. How many columns? We have three columns. Three minus one. Six minus one is five. Three minus one is two. Five times two, and the answer is ten. B is our answer. Is it right? Easy things.

Now consider a contingency table with five rows and two columns. In a chi-square test of independence, if the level of significance is 1%, what is the critical value? So remember, for your critical value, critical values of chi-square, it's alpha and the degrees of freedom. Therefore you need to go and calculate your degrees of freedom, which is the number of rows minus one, times the number of columns minus one. So it will be five minus one, times two minus one. What is your degrees of freedom? It's four. So now let's go and find chi-square of 0.01, because one percent is one divided by a hundred, which is 0.01, and four. So let's go to the table.
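The degrees-of-freedom formula and the table lookup can be cross-checked numerically. The sketch below is an illustration only, not how the course expects you to work: the function names are made up, and the critical-value search assumes an even number of degrees of freedom, where the upper-tail probability of the chi-square distribution has a simple closed form. In the exam you still read the value off the table.

```python
import math


def degrees_of_freedom(rows, cols):
    """(r - 1) * (c - 1) for an r-by-c contingency table (totals not counted)."""
    return (rows - 1) * (cols - 1)


def upper_tail_area(x, df):
    """P(chi-square > x). Closed form that is valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def critical_value(alpha, df, lo=0.0, hi=200.0):
    """Bisection: find x whose upper-tail area equals alpha."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if upper_tail_area(mid, df) > alpha:
            lo = mid  # tail area still too large, move right
        else:
            hi = mid
    return (lo + hi) / 2.0


df_q2 = degrees_of_freedom(6, 3)   # six rows, three columns
df_q3 = degrees_of_freedom(5, 2)   # five rows, two columns
cv_q3 = critical_value(0.01, df_q3)
print(df_q2, df_q3, round(cv_q3, 3))
```

The bisection works because the upper-tail area only decreases as x grows, so there is exactly one x that leaves an area of alpha to its right.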
On the chi-square table, we're looking for four and 0.01. Not 0.1, but 0.01, which means we go further across. And the answer is 13.277, which is option A. And that's how you find the critical value. So you need to know that for chi-square, you go to the critical values of chi-square and use the alpha values, which are the values closest to the table, and your degrees of freedom will be on the left. You need to make sure that you use the right table. You must check and double-check, because sometimes you might be in a hurry and end up using the critical values of t, because that table looks almost exactly the same as the chi-square one. So pay attention to detail when you answer the questions.

Moving on to the next one, unless there is a question. If there is none: consider the contingency table below to test the independence of distance from home to school and the school level. What is the expected frequency of primary school learners traveling more than six kilometers from home to school? Choose the correct answer from the list below.

Now, here is your chance: you can either use the formula or you can use the template. What they're asking you to calculate is the expected frequency. Your expected frequency is the row total times the column total, divided by n. What is missing on this table are your totals. So it means you're going to calculate your column totals, you're going to calculate your row totals, you're going to have the grand total there, and then you substitute into the formula and calculate.

So let's calculate manually. 115 plus 75 is 190. 210 plus 120 is 330, and 315 plus 165 is 480. Let's go to the rows: 115 plus 210 plus 315. This one I cannot do in my head now. 640. And 75 plus 120 plus 165 is 360. And 340? Oh, sorry.
640 plus 340... what am I saying? 640 plus 360 is 1000. Oh, not 340. It's 1000. And also, if we add 190 plus 330 plus 480, we get 1000.

Okay, so now we can substitute into the formula. Remember, they are only interested in primary school learners traveling more than six kilometers. So we are only interested in that cell. We take the row total for the 315, and the row total is 640. These are the rows and these are the columns. So we take the row total, times the column total, which is 480, divided by the grand total, which is 1000. Do the math and give me the expected value. What is the answer? 307.2. That's what you get. And you can round it off to a whole number, because here the options only have whole numbers. Therefore it will be C.

Alternatively, you can use your template. So let's do that. Go to our template. First, we need to identify what type of contingency table we have. Two rows, three columns, right? It's a two by three contingency table. Since we are able to identify that it is a two by three, please remember not to count the total columns as well. Only the data columns count. So we go and look for a table that has two rows and three columns, and it is the second one. We can change the row labels to primary and high school. For the columns we use the information we have here. Let's say this is less than three kilometers, the middle one is the in-between distance, and the last one is greater than six kilometers. And now we can fill in all the values. Let's start with the first column: primary is 115 and high school is 75. The next one has 210 and 120. The challenge with working online is all this, because I need to make space in order to see the table so I can put in the data. The last one is 315 and 165.
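The manual calculation can be double-checked with a short sketch. This is not the course template, just the same row total times column total over grand total rule written out. The function name is made up, and the row and column order (primary, high school; under three kilometers, in between, over six kilometers) is assumed from the way the values were read out.

```python
def expected_frequencies(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Observed counts as read out in the session (totals excluded).
observed = [
    [115, 210, 315],  # primary
    [75, 120, 165],   # high school
]
expected = expected_frequencies(observed)
primary_over_6km = expected[0][2]  # row total 640 * column total 480 / 1000
print(round(primary_over_6km, 1))
```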
All the data is there. So all I need is to go to the expected values. There is the expected value, and I can just format it using the format option, and there's my answer of 307. And that's what we found. Easy, right? The answer is 307. So you can use the templates to answer the question. They can save you time, but you need to practice, because it's not as easy as I'm making it look here, right? So you can either use the template or you can calculate manually. As you can see, doing it manually was about as quick as using the template.

So let's move to question 5. Now we have the following two by three contingency table, and this contingency table contains the observed frequencies, with the expected frequencies in brackets. So they have already calculated the expected frequencies, and they are telling us that this comes from a sample of 348. That means if I add 75, 34, 60, 50, 88 and 41, it should give me 348. They're asking you to calculate the chi-square test statistic using the rows and columns. So you will have to use your observed minus your expected, squared, divided by the expected. It might take you forever in the exam. So we'll go back to our template.

When we use the template, ignore the expected values they gave, because the template already calculates the expected values, right? It is a two by three, so we continue to use the same contingency table. I just want to make it smaller so I can get to the data. Now I need to change all this to reflect what I have. So I have P, I have Q, and I have A, I have B, and I have C. I can just put in the values 75, 60, and 88. Do the same with 34, and 50, and 41. And the total is 348. So it has captured all the data. It automatically calculated my expected values, so I don't have to worry about them. What I need is the test statistic.
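What the template does with this table can be sketched as follows. This is an illustration under assumptions, not the template's actual formulas: the function name is made up, and the layout (row P with 75, 60, 88 and row Q with 34, 50, 41) follows the order in which the values were typed in.

```python
def chi_square_independence(table):
    """Return (test statistic, expected table) for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return stat, expected


observed = [
    [75, 60, 88],  # row P, columns A, B, C
    [34, 50, 41],  # row Q, columns A, B, C
]
stat, expected = chi_square_independence(observed)
print(round(stat, 2))
print([[round(e, 2) for e in row] for row in expected])
```

The expected values it produces for the 75, 60, 88 and 41 cells can be compared with the bracketed values in the question (69.85, 70.49, 82.66 and 46.34) as a check that the data were entered in the right cells.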
If I scroll down on this, you will see that there is a field for the test statistic. There it is, and it says our test statistic is 6.36, which is option A. Isn't it? So like I said on Sunday, please check what type of question you got on your assignment. Use the template and see if you get the answer, because you will be following the same steps.

If you are going to calculate manually, then I suggest that you set a timer to see how long you take to do the calculations, because you will need to say 75 minus 69.85, squared, divided by 69.85, plus 60 minus 70.49, squared, divided by 70.49, plus 88 minus 82.66, squared, divided by 82.66, plus, and so on, up until you get to 41 minus 46.34, squared, divided by 46.34, and then you calculate the whole thing. It's not too many terms, but it's time-consuming, and in both cases you can make a mistake. Whether you are using the template or calculating manually, one mistake will give you the wrong answer. So you need to be very careful when you are putting the values into your template as well, and make sure you're putting in the right values.

Okay, so question 6. It should be almost the last one. Question 6. Yes?

I'm scared to ask a question, but where can I get the contingency table template?

Anyone who has an answer to the question?

It's in the list that you sent.

Were you not part of my e-tutor group? No. Are you part of the WhatsApp group? So I've gone to YouTube to see your videos and the playlist. You are not answering my question. Are you part of my WhatsApp group, the STA 523 group? Yes, on WhatsApp, I am. This afternoon I sent very harsh messages there. And I laughed at that. That's why I'm scared to ask. And I sent some screenshots. Okay, let me have a look at that. Okay, don't worry. Thank you.

All the information I'm using, you all have access to it. From day one, I have never neglected to say this, and I've never withheld any information.
I made sure that it is accessible. It's somewhere very public where everyone can have access to it. Thank you. You just need to pay attention to those communications as well. Very easy.

Before you continue, I just want to find out whether in the exam they allow us to use this template.

Oh gosh. Oh my goodness me. So you guys, I don't know whether you are listening to me or not. Really? I just finished explaining about the template, so it means you are allowed to use it, right?

Okay, no. I'm thinking, if we're going to use multiple screens, isn't that going to flag us?

No, because the people who passed this module, like in November, also used the templates, right? They also had the template. So when I say that in the exam you should check your time, whether you want to calculate manually or you want to use the template, I am giving you options that you can choose from. It's up to you. You need to choose. The only thing that the IRIS system will do is flag certain actions happening on your computer or around you, whether there are people with you, or you look like you are talking, or something like that. You must remember this is still an exam and it's written under exam conditions. No one should be helping you write the exam. But your exam is open book, because it's an online exam. You are required to bring the tables, and the tables are most likely on your computer, so you will be moving around in terms of going to the tables, looking for the values and coming back to your screen. IRIS will not flag you if there is nothing inconsistent. The flagging is about how you behave while you are writing, in terms of standing up, moving, talking, raising hands and doing things that suggest that there is somebody else in the room with you. That's how it gets flagged. But now I am distracted and I'm explaining administrative issues, which we will address at a later stage. Okay, so let's go back to content.
Anything I share with you, you can use. Sometimes I'm even scared to tell you what you need to take, because then it sounds like I'm teaching you how to cheat in the exam, which is not what I want to do. So take some of these things with a pinch of salt. Sometimes I don't want to stress it because the YouTube videos become public. Everybody can listen to them, everybody can share them, and they might reach the wrong people. That means I prefer not to explain administrative issues and hacks and tricks on the YouTube video, but outside of it. So please, let's pay attention. And then, yes, right. Let's move on.

Next, we go to question five. I am so sorry about that: question six. The test statistic for the test of independence from question five has been calculated. But this is not from question five. It should be question four. That is the earlier question where we calculated the expected frequency. We are using the same information. It has been calculated and we are told that it is 1.56. That is the test statistic, our chi-square stat. They calculated it and found that it is 1.56, so there is no need for you to go and do anything extra on this.

The first thing you do when you go to a question is look at the options. The options usually guide you and help you save time in the exam. So look at the options. The options show that you need to make a decision based on the calculated chi-square test statistic. So the question is asking you: based on this chi-square, where does it fall relative to the region of rejection? But you don't have that yet. So let's go and find this region of rejection, the critical value. How do we find the critical value? The critical value uses alpha and the degrees of freedom. So our degrees of freedom is the number of rows minus one, times the number of columns minus one.
How many rows do we have? Oh, because I was so harsh on you guys, you decided to keep quiet. Maybe you even left. The number of rows: there are two, so two minus one, times the number of columns, three, minus one. And your degrees of freedom is? That's two. Two minus one is one, three minus one is two, one times two is two. And they say at the 10% level of significance. So we're going to find our critical value at 0.10 and two. The next step is to go to the table and look for two degrees of freedom. And our alpha is 0.10, because it's 10%. Where they meet is your critical value: 4.605. So we go back and we mark 4.605 here.

Now let's look at our test statistic. Where does our test statistic fall? Is it in the white or in the red? Our test statistic is 1.56. Is it in the red shaded area or the white area? It is in the white area. Do not reject H naught.

Now we need to make a decision. What do we know if we are not rejecting the null hypothesis? How would we have stated the null hypothesis about school level and distance from school? We would have said school level and distance from school are independent. And in the alternative we would have said they are dependent. Now, if we are not rejecting the null hypothesis, how do we conclude? We go to the conclusions and we eliminate any option that says we reject, because we are not rejecting. We do not reject. So let's see which statement is left between A, B and E. Let's start with E, because A and B look almost the same. E says the alternative hypothesis is that the distance from home to school and school level are independent. That is the alternative. Is it correct or incorrect? Correct? Incorrect. For the alternative, that is incorrect. So we're also going to ignore that one.
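The lookup and the comparison just made can be written as a small sketch. It is an illustration only: the function names are made up, and the shortcut for the critical value, minus two times the natural log of alpha, holds only when there are exactly two degrees of freedom, as in this question. For any other degrees of freedom you go to the table.

```python
import math


def critical_value_df2(alpha):
    """Upper-tail chi-square critical value. Valid for 2 degrees of freedom only."""
    return -2.0 * math.log(alpha)


def decide(test_stat, critical):
    """Reject H0 only if the test statistic falls beyond the critical value."""
    return "reject H0" if test_stat > critical else "do not reject H0"


critical = critical_value_df2(0.10)  # 10% level of significance, df = 2
decision = decide(1.56, critical)    # 1.56 is the statistic given in the question
print(round(critical, 3), decision)
```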
So C, D and E are incorrect. We are looking for the correct statement. So let's go to A. A says we do not reject the null hypothesis and conclude that the distance from home to school and school level are independent of each other. And B says we do not reject the null hypothesis and conclude that the distance from home to school and school level are not independent of each other. Which one? A or B? This time I'm not telling you the answer. A or B? If we are not rejecting the null hypothesis, how do we conclude?

It's A.

It will be A, because the null hypothesis says distance from school and school level are independent of each other. So A will be the correct answer. As long as you remember that your null hypothesis always states independent and your alternative states dependent, there is nothing that can go wrong. Always remember that. H naught: independent. HA or H1: dependent.

Okay. Moving on. Now we go into regression, and I think the 45 minutes should be about up. Let's see. Yes, almost. Okay, the extra five minutes I'll put down to the venting part. So for the next 45 minutes or so that are left, we're going to do regression. For regression as well, we're going to use the templates. And this is the template that we're going to be using: regression model example template. You will find it in the same folder where the templates are. It looks like this, the big Excel sheet that we're going to be using.

So what else do we need to know about regression? A couple of things. There is the regression line, which is given by y-hat equals b0 plus b1 times x, where b0 is your intercept and b1 is your slope. x is your variable of interest and y-hat is your estimate. If your x is zero, your estimate will be the same as your y-intercept. Now, your slope is the change in units.
When you add one additional unit to your x variable, it might increase or decrease your y estimate, based on the sign of the slope. A plus increases and a negative decreases the value of your y estimate. That is how we interpret it: for one additional unit increase in x, if the slope sign is negative, we say it is a decrease, because it will decrease the value of your y estimate. If it's positive, we say it will be an increase. That is explaining the slope, which is b1. b0 is your y estimate if the value of x is equal to zero. If x here is zero, b1 times x is zero, and therefore your y estimate will be equal to the intercept.

The other thing you need to remember about regression is the coefficient of correlation, which is r. r lies between negative one and one, where one or negative one means the variables are perfectly correlated. Zero means no correlation. 0.5 is moderately correlated. 0.35 is a weak correlation, whether it's positive or negative. Then you add the sign, which gives the direction. If it's negative, you say it is negatively correlated. If it's positive, you say it's positively correlated. If it's 0.9, negative or positive, you say it's a strong correlation. If it's 0.35 or 0.15, you say it's weak, or for 0.35 or 0.4 you might say it is moderate. So you need to know how to interpret your coefficient of correlation. Also, all the information that I'm sharing with you is part of those summary notes. If you use them well, there's nothing that can go wrong.

Then you have the coefficient of determination. The coefficient of determination tells you how much of the variability in your independent variable... yes, the variability in your independent variable.
Oh, sorry, in your dependent variable: how much of it is influenced by the variability in your independent variable. So you need to know how to explain that as well. Now, your coefficient of determination, which is also R squared, lies between two values, zero and one. It's always positive, because you will say, for example, 95% of the variability in Y is attributed to the variability in X, things like that. Okay, so that's how you will interpret your coefficient of determination. What else do you need to know about these two things? That's all for now. Otherwise, go and read more about the properties of regression.

Remember also the scatter plot. The scatter plot will tell you whether the data values are positively or negatively correlated. If the graph looks like this, with your Y and your X, it tells you that when the value of your X increases, the value of your Y increases, so it will be positive. If it looks like this, it says that when your values of X increase, your values of Y decrease. If it looks like this, it's constant: when the values of X increase, the values of Y stay constant, and the correlation there will be equal to zero. The correlation can be perfect, or it can be 0.90 or 0.98. So this one could be minus 0.98, and this one could be positive 0.98, depending on what you get when you calculate your correlation. And this one can be zero, which means there is no correlation here. So you need to know how to identify and how to plot your scatter plot. Remember, every point of X corresponds with a point of Y. Those are the things.

So now let's answer the questions. Question seven: which one of the following statements is incorrect about some of the concepts of simple linear regression, which is everything that I just explained. So let's see. A: the least squares method, which is the method behind that formula we just used, estimates the regression equation by maximizing the error sum of squares. No.
Of course, we're looking for the incorrect statement, and that is the incorrect one. It doesn't do that. Even though there are some errors, we call them residuals, we don't estimate by maximizing the sum of squared errors. We try to minimize the errors, because we want to make sure that as few as possible of those errors creep in from your least squares estimate. As much as we want to estimate new values, we need to make sure that the estimates are as close as possible to the original values. The bigger the errors, the further off your estimates are. So that is a no-no. Okay, that one you might not have known. I'm not sure how you will find it in your textbooks or in your study guide, or whether they explain such concepts. I know that in my tutorial classes I've never touched on concepts like this, but I guess by looking at these questions we're also trying to fill some of those gaps that we have.

Okay, so I'm going to answer this whole question by myself, and the next one we can do together. Option B: the coefficient of determination gives an indication of how well the estimated regression equation fits the data. Yes. Because it tells us how much of the variability in your y is attributed to your independent variable, and so how well your estimated values of y fit. So that will be correct. A is the incorrect one. The coefficient of determination always takes values between zero and one. I just explained that, and that is correct. The correlation coefficient always takes values between minus one and one. I explained that, and that is correct. The correlation coefficient takes on the sign of the slope. Oh, that's the other thing that I didn't mention. The slope and the coefficient of correlation always have the same sign. So you cannot have a positive correlation and a negative slope. They always have the same sign. That is correct.
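The properties in this question can be seen in a minimal least squares sketch. This is not the regression template: the function name and the five data points are made up purely for illustration, and the formulas are the standard sum-of-squares ones for a simple linear regression.

```python
import math


def simple_linear_regression(x, y):
    """Least squares fit of y-hat = b0 + b1 * x. Returns (b0, b1, r, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx                    # slope: the value that minimizes the error sum of squares
    b0 = mean_y - b1 * mean_x         # intercept: y-hat when x is zero
    r = sxy / math.sqrt(sxx * syy)    # coefficient of correlation, between -1 and 1
    return b0, b1, r, r * r           # r squared: coefficient of determination


# Hypothetical data, invented for this illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1, r, r_squared = simple_linear_regression(x, y)
print(round(b0, 2), round(b1, 2), round(r, 4), round(r_squared, 2))
```

The slope and r share the numerator `sxy`, and both denominators are positive, which is why they always carry the same sign. Flipping the y values to a decreasing pattern makes both negative while r squared stays between zero and one.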
So you need to go and learn about regression. Let's go to the summary notes, because this is the last section that we're going to do. So you're going to learn about the scatter plots. And then here are the explanations of how you interpret the coefficient of correlation. What does it mean when R is smaller than zero, when R is bigger than zero, when it's zero? In terms of the strength and the direction, how do you interpret it? If you are given a scatter plot, what does your scatter plot look like? All of these are also part of the PowerPoint slides that we went through on a weekly basis last year during our e-tutor sessions and so on. The only difference is that this is compacted into a small document. You just need to learn how to use it.
This is for ease of reference: the equations for the sum of squares measures, as well as how you calculate your coefficient of determination by using the sum of squares measures, and how you interpret it. Like you can see there, 100% of the variation in Y is explained by the variation in X. So if it was 98%, it will say 98% of the variation in your dependent variable is explained by the variation in your independent variable. And so on: how you interpret that, and then how you calculate the regression line, which is the least squares method formula that we use, with your Y estimate, your intercept, your slope and the X observation. And how you do the calculations for the slope, the means, the intercept and the regression line. So there are some formulas, but we will use the template for doing that. You can calculate the regression line using the summation formulas, or you can use your calculator. Oh, that's the other thing. You can also use your calculator to calculate the regression line, the slope, the coefficient of correlation and the coefficient of determination. But this is one of the documents that summarizes everything that we did.
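For reference, the quantities named in the summary notes can be written out as follows. These are the standard textbook forms; the notation in the summary document itself may differ slightly, so treat the symbols here as an assumption.

```latex
% Estimated regression line, slope and intercept (least squares)
\hat{y} = b_0 + b_1 x, \qquad
b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}, \qquad
b_0 = \bar{y} - b_1 \bar{x}

% Sum of squares measures
SST = \sum (y_i - \bar{y})^2, \qquad
SSR = \sum (\hat{y}_i - \bar{y})^2, \qquad
SSE = \sum (y_i - \hat{y}_i)^2, \qquad
SST = SSR + SSE

% Coefficient of determination and correlation coefficient
R^2 = \frac{SSR}{SST}, \qquad
r = \pm\sqrt{R^2} \quad \text{(with the same sign as } b_1\text{)}
```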
And you'll see that most of these notes are the same as those in the PowerPoint slides. Okay, so let's look at more questions, unless there is a question. Okay, there are no questions.
Now, when we look at questions like this, where they give you the data and they say consider the sample data below and develop a scatterplot, which one of the scatterplots A to F best describes the data: use the process of elimination. You don't have the whole day in the exam to go through all the points. Choose a few of the points. Let's check which value of X is the minimum and which one is the maximum, and then we do the same with Y, minimum and maximum. Then we come to the graphs, and if a graph does not show that minimum and maximum value, we can eliminate it, so that we are left with only the graphs that we still need to test against our data, instead of going through all this information.
So let's look at X. The minimum value here is 14. Right, the smallest value here is 14. So the X values start at 14, and any graph whose points do not start at 14 on the X axis, we're going to eliminate. That one starts at 20. That starts at 15. This starts at 15. This starts at 10. That starts at 10. So it means D, B and A are out. We are left with three. Process of elimination. We can also go with the maximum value. The maximum value here is 41 for X. So that one goes to 60, that goes to 40, that goes to 41. Yes? Yes. Oh, my apologies, my sister. Thank you. So we have already eliminated four without even looking at the data in detail. We only have two graphs left that we will use to find out which one is the correct one. So let's do that, still looking at our X values. That's very important. The data ends at 41, which is our highest, right? So if a graph has a point beyond 41 on the X axis, then we're going to eliminate it.
There is this point which is bigger than 41, and there is no data point in our table that is bigger than 41. So that one is also eliminated. I'm only left with E. Therefore, E is my answer. The right graph is E. And then you look at the options. Pay attention to the options, because they might have scrambled them. But in your mind, know that E is the only graph that matches the data.
We can also validate that. The data says when X is 41, Y is 10. So we go to 41 on the graph and we expect that point to correspond to 10. We can choose another value. Let's choose 28 out of the blue, which is the last value. When X is 28, we go and look, and somewhere between there and there is 28. We can assume that one of those two points is 28. And the data says Y is 22. So I'm going to assume that this point is 28 and 22. There is no need for me to go and check the other values, because of the process of elimination. If you do that, you will save a whole lot of time in the exam. That's question number eight. There's nothing more I can explain; I just showed you how to find which one is correct.
You can also use individual points for the process of elimination. Say I look for X of 14 and Y of 17 on each graph. You can see that the X axis there does not have a point at 14. That one does not have 14. That one does not have 14 either. And if I'm here and I'm assuming that point is at 14, then the X is fine, but the data says Y is 17, so the point needs to be somewhere here. There should be a point at 14 and 17 somewhere there, and you can see that this one also doesn't work. So you use logic like that to do the process of elimination. There are many ways to skin a cat. Find the shortcut that you feel comfortable using to find out which one is correct or incorrect. Are there any questions? If there are no questions, then we move on to question number nine.
You are tasked with investigating the relationship between reading speed and age of primary and high school learners using simple linear regression. Which one of the following statements about the investigation is incorrect? And yes, we're talking about regression. So, based on what we spoke about in the beginning when we started, there are certain things that I didn't mention outright, right? We didn't mention most of these things. So we need to work through them and find out which one of these statements is incorrect. I just want to point out that on the summary document that we have, there are some notes that explain regression, right? What regression is, what a dependent variable is and what an independent variable is. So you just need to make sure that you know the logic and understand that. Remember, if I have this, I have my x and I have my y. My independent variable influences what my dependent variable will be. So y is your dependent variable and x is your independent variable. I'm already giving you the answers here, so you are going to be able to answer this with ease.
Remember also, we are dealing with numerical values. So, reading speed and age, think about it. What is age? Is age discrete or continuous? Ask yourself that. Reading speed: can you count the reading or do you measure the reading? Think about it, because there is a difference between counting and measuring, right? Counting is discrete, measuring is continuous. If the correlation coefficient is positive, okay, these are the things that we explained in the beginning. We're going to repeat that. The slope is also one of those things that I explained in the beginning, because it tells you how you interpret the regression line. Okay, so let's see which one of the following statements is incorrect. That's your question, not mine.
I'm asking you, after I did all the explanation. A: the independent variable is reading speed. Look at the graph. Tell me if it's correct. B: age is a quantitative continuous variable. Can we count age or do we measure age? We did this in study unit one. We discussed this. C: if the correlation coefficient is positive, then there is a positive linear relationship between reading speed and age. Is that correct? Is that how we interpret the coefficient of correlation? D: the estimated regression equation will take the form reading speed, which is your dependent variable, equals your intercept B0 plus your slope B1 times age, which is your independent variable, where B0 is your intercept and B1 is your slope for the estimated regression equation. Which one is incorrect? A, B, C or D? Nope. Yes, A is the incorrect one. I gave you the answers. They tell you that reading speed is Y, and I told you that Y is your dependent variable. Age is X, and I told you that X is the independent variable. So the independent variable should be age, because reading speed is your dependent variable.
Age is continuous because we measure age in terms of years, months, days, hours, seconds. The minute you are born, the clock ticks, and they tell you that you were born at 12:03. We measure that. When somebody dies, we measure it in terms of the time. They died at 12:03. If it's at night, they will say they died the following day, or they died on Sunday if they died at midnight, right? Because we measure it in terms of time.
Okay, moving on to question ten. Consider the data below, showing the age in years and reading speed in words per minute (WPM) of learners at a certain primary school in Bujanala. Which one of the following calculated quantities is incorrect? Okay, so with this one, I will say use the template. The first one is easy to calculate. You can just take the age values and add them together. That will give you the sum of X.
The second one, you need to calculate the mean, then take every observation of your age, subtract the mean from it, square the answer, add them all together, and it will give you that value. Then C, and now it becomes even more complex. You need to calculate your X mean, which is the mean of age, and the mean of the reading speed, which is your Y. Then you take each observation of age minus the mean of age, multiplied by the observation of reading speed minus the mean of reading speed, and add them all together, and you will get the answer. B zero, you will need to go and calculate B zero. So let's use the template to calculate all of this. So we go to our regression template. Even your calculator won't help here, even if you try to use it.
Now, let's look at our template. Our template is easy. It comes with instructions. In the yellow block it says: to add rows, click in the cell in column B and drag to highlight the number of rows, across until the Y squared column, then click Insert and shift the cells down. You can repeat this step until you have enough rows to complete the X values in the table. You will read it; I will show you how to use it. I'm not going to read the whole paragraph. So all these columns are related, from X up until Y squared. If you add a row, don't start here. Start at column B. Whether you are deleting or adding, you have to start at column B and highlight across until you get to Y squared. If you are adding a few rows, you just highlight the number of rows you are adding, click on Insert, and you say shift down. And it will insert three more rows. But there are a couple of things that you will need to do afterwards. You will have to go to column E up until column K and drag down, so that the calculations follow. Right, those are the steps. If you want to delete rows, you go to column B. You follow the same steps as when you add the rows.
You go to column B, you drag over the number of rows that you want to delete, and you delete, and it's going to say shift cells up, because we're moving them upwards, and you are left with the number of rows that remain. Okay, so that's how you add or delete rows. So now I need to count: one, two, three, four, five, six, seven, eight, nine, 10. I need 10 rows. One, two, three, four, five, six. I have six, so I need four. I'm going to go here on the second one. One, two, three, four, because I only need four rows, and I need to make sure that I keep four rows highlighted. One, two, three, four. I click Insert, shift down, and there are my four rows. I'm going to delete only the X and Y columns in this instance. Only those ones, except the total row. I'm not deleting that. Everything disappears, and I'm going to add the values. I'm going to add the X values first and then I'll go to the Y values. So, 10, enter, 9, enter, 8, enter, 5, enter, 7, enter, 13, enter, 5, enter, 5, enter, 11, enter, and 13. So I should have added all of them, and it's easy to check. Going back up, I must enter the Y values in the same order and not mix them up. So, 150 goes with 10. So, 150, enter, 130, enter, 120, enter, 65, enter, 105, enter, 216, enter. The challenge is I cannot see the values I'm entering. 71, enter. I'm moving it to the side. I've got 66, enter, 172, enter, and 202, enter. Let me double check that I've entered all the values correctly. So, 10 goes with 150, 9 goes with 130, 8 goes with 120, 5 goes with 65, 7 goes with 105, 13 goes with 216, 5 goes with 71, 5 goes with 66, 11 goes with 172, and 13 goes with 202. I've entered the right data. I can just ignore that. I go to my column E, highlight across until Y squared, and drag the formulas down. Now I've got all the things I need. I just want to move this to the side. The values I'm interested in are the totals, because all these summation signs mean total.
Total means summation, adding up all of the values. So, the sum of all x values is 86, and this is my 86. The sum of Y is that, but it's not what we're looking for. What we are looking for is the sum of xi minus the mean, squared. So it means somewhere I'm calculating the mean, and I'm not calculating the mean here. Where is the mean? Our x bar, I am calculating it there. If I want to clear that, I press Escape, then it goes out. Okay, let's go there to this side. So, the mean for x bar is calculated here. It takes the sum of x divided by how many there are, because I have the count at the bottom there. It calculates automatically, because it just counts how many rows I have.
Okay, so let's answer the questions now, because I've made sure that my template has all the values. If you look at the top, it says the square of your x observation minus the mean of x. That is the same as what the question says here: the square of your x observation minus the mean of x. The total will be the summation of x minus the mean of x, squared. All right, that is what it does. So the answer is 88.40. The answer is there. The next one says the sum of your x observation minus the mean of x, times your y minus the mean of y. So here I'm calculating x minus the mean of x, which is this part, and y minus the mean of y, which is that part. I'm doing it in two parts, and here I am combining the two. I'm multiplying the two brackets together. So the summation of that, which is the orange part, is the answer that I'm looking for. And let me just check my battery; I've got 31% left. But we're almost done, don't worry. Just give me a few more seconds. We'll end there, and the rest we can do on Sunday. There are only two or three questions left.
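The column totals on the template can be reproduced in a few lines of Python. This is a sketch that assumes the ten age and reading speed pairs are as they were read back during the double check above; the variable names are only illustrative. It computes the sum of x, the sum of squared deviations of x, the sum of cross products, and then the slope and intercept that follow from those totals.

```python
# Age in years (x) and reading speed in words per minute (y),
# assumed from the values read back during the data entry check
age = [10, 9, 8, 5, 7, 13, 5, 5, 11, 13]
speed = [150, 130, 120, 65, 105, 216, 71, 66, 172, 202]

n = len(age)
x_bar = sum(age) / n       # mean of age
y_bar = sum(speed) / n     # mean of reading speed

sum_x = sum(age)
sxx = sum((x - x_bar) ** 2 for x in age)                          # sum of (x - x_bar)^2
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(age, speed))  # sum of cross products

b1 = sxy / sxx             # slope
b0 = y_bar - b1 * x_bar    # intercept

print("sum of x:", sum_x)
print("sum of (x - x_bar)^2:", round(sxx, 2))
print("sum of (x - x_bar)(y - y_bar):", round(sxy, 2))
print("b1:", round(b1, 3))
print("b0:", round(b0, 3))
```

The first two printed totals should agree with the 86 and 88.40 read off the template; the remaining values can be compared against the options in the question.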
I think it ends at question thirteen. Okay, so that is the sum of that. So, that is correct, that is correct, that is correct. D says the intercept is minus 20.78. If you scroll to your right, you will first find your intercept. Okay, let me just make it smaller. So this is the intercept, and you will see the slope there. The slope, which is B1, you can see I'm also giving you the formula for how I calculated it. E says the slope is 8.75, and here the template says it's 17.498. And B0, which is the intercept, is minus 20.781, you can see there. It's the same, so D is correct. B1 is the one that is not correct, because the option says B1 is 8.75, whereas it should be 17.498. So the incorrect answer is E. That's how you use the template. Otherwise, you can go and calculate manually. The formulas to calculate B1 and B0 are all on the template as well. Or, if you want to use the summations, you can use the summations. I see that on this printout it cut off some of the calculations, so you can also use the notes to see how we calculate them manually. They are in the notes anyway.
Okay, so that concludes today's session. The next questions you can do on your own, and we can discuss them on WhatsApp. I will see you on Sunday when we go through the past exam paper. I still need to find the exam paper, but otherwise, these are the questions. Or maybe let's do this question, because it's very important that I explain this as well. With your regression line, you can estimate the value of Y by substituting the value of X, right? So, in terms of this question, the reading speed of learners at, and I don't know how to pronounce that, Baja Primary is estimated by the following regression line: Y equals 11.3 times X plus 15.9, so your slope is 11.3, with X as the age of the learners. Calculate the estimated reading speed of an 11-year-old. So my X here will be 11 years, and you choose the correct answer. Where I see my X in 11.3X plus 15.9, I must just substitute 11 and calculate. Estimated reading speed of an 11-year-old. What's the answer? We can even finish all of this. 11.3 times 11 is 124.3, plus 15.9 is 140.2. We can take it to a whole number. It will just be 140, so we estimate 140 words per minute. That is question 11.
We can go to question 12. The estimated regression equation for the reading speed of learners at Mount View Secondary School is given by this equation: reading speed equals 20.2 times age minus 60.5, with a correlation coefficient of 0.97. So it means our R is 0.97. Which one of the following statements is incorrect? A, they are asking you to calculate R squared and interpret it. So if I have my R, you just say 0.97 squared. You just press the X squared button on your calculator. What is R squared? 0.9409. If I multiply that by 100, it gives 94.1%. So 94.1% of the total variation in the reading speed, and we know that the reading speed is Y, can be explained by the variation in age, which is X. Therefore, this is correct. That's how you interpret R squared. B, we need to interpret the slope. Where is the slope? That is the slope. What is the sign in front of the slope? It's positive. How do we interpret it? For a one unit increase in X: if the slope is positive, Y increases; if it's negative, Y decreases. So a one year increase in the learner's age increases the reading speed, because the sign in front of the slope is positive. And B says it will increase the reading speed by 60.5.
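The substitution in question 11 and the R squared calculation in option A of question 12 can be checked with a short sketch. The function name `predict` is hypothetical; the numbers are the ones quoted in the session.

```python
def predict(b0, b1, x):
    """Estimated y on the line y_hat = b0 + b1 * x."""
    return b0 + b1 * x

# Question 11: y_hat = 11.3x + 15.9, estimated for an 11-year-old
estimate = predict(15.9, 11.3, 11)
print(round(estimate, 1))    # 140.2 words per minute

# Question 12, option A: r = 0.97, so R squared = r * r
r = 0.97
r_squared = r ** 2
print(f"{r_squared:.1%}")    # 94.1%
```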
But the 60.5 is the intercept, and it has a minus in front of it. It's not the slope. Your slope is positive, so the reading speed does increase with age, but by the slope, 20.2, not by 60.5. That's another thing. Let's come back to that one. C: an 18-year-old learner is expected to have a reading speed of 303. So let's put in the 18-year-old. We just substitute where we see age. We put 18 and calculate: 20.2 times 18 minus 60.5. It's 303, which means this is correct. And D says, because R is positive and close to one, we have a strong positive linear relationship between age and reading speed. That is correct.
So, coming back to B, the one I was not convinced about. The slope is positive, so each additional year of age increases the estimated reading speed, and the size of that increase is the slope, 20.2. The 60.5 is the intercept, with a minus in front of it; it is the constant in the equation, not the change per year. So B, which says the reading speed increases by 60.5, is the incorrect one.
So let's look at the last, last, last one, and then we are done. Oh, why did I choose this one now? Okay, we will do this one on Sunday, because it forces us to use the template, but you can go and try. No, it doesn't force you to use the template per se, because they've given you the predicted values and the original values. So what you do on the template is easy as well. Let's go to the template. This is fairly easy. On the template, if you scroll down, down, down, and you must go to the left as well, there is this section.
So now on your template, we are going to enter our two sets of values. That's what you substitute onto your template, even though one of them is already your estimated value. This is what you do on your template, because we're calculating SSR, and remember SSR is calculated using the Y estimate and the mean of Y. So you take your predicted values and substitute them here. And, sorry, this is not your X, this is your Y, my bad. This is your Y and this is your Y estimate. The only two sets of values that you need to substitute are those two, the Y values and the Y estimates, and the template will calculate this value. That's the value that you are looking for. So: 278, equal, 343, equal, 277, equal, 285, equal, 311, equal, 250, equal, 299, equal, 329, equal, 296, equal, 291, equal. Go to your estimates and put 284.47, equal, 346.79, equal, 27.08, equal, 284.47, equal, 284.47, equal, 284.47, equal. How many are there?
One, two, three, so we're now on 294.85, equal, and 315.63, equal. Something's wrong. 315.63, equal, and 305.24, equal, and 284.47, equal. The only thing that we need is SSR, which is this colour. I just need to make it bigger. And the SSR is 23... What did I do? What am I doing that is so wrong here? Oh, I didn't calculate the mean. Where am I getting the mean? I'm getting the mean from somewhere else, which is not the mean that I'm looking for. I'm taking the wrong mean, so I need to change that. It's something that I didn't take into consideration. So that will be this divided by 10, and we can do that. Then I just need to change this reference, and I'm going to put in a dollar sign because I need to lock the cell, otherwise it changes for all the rows. So drag, drag, drag, drag. There might still be something wrong that I'm doing. Are there 10?
33 is incorrect. 27.08 should be 274.08. It should be 274.08, thank you. Yeah, pleasure. And then the one just above is 346, not 246. Oh yes, three. Thank you for picking that up. Okay, and the answer is 4197. Maybe there's still somewhere where I wrote something wrong. You will have to double check my numbers as well: 284.47, 346.79, 274.08, 284.47, 284.47, 294.85, 315.63, 305.24 and 284.47. Okay, and on this side: 278, 343, 277, 285, 311, 250, 299, 329, 296 and 291. I still get 4196.74, which, if I round it off, is none of those options, but it's close to one of them, which is D. It's similar to D. I'm just curious about that now.
Okay, and that concludes today's session, and we have done all the revisions. I see there is a hand. On your website thingy, under the revision section for assignment five, you literally put in the one that we usually wrote on unicell, with the answers. And I will explain that just now. Wait, I will explain that. I want to ask if you can share one that doesn't have answers, so we can practice without answers. That's why I'm saying I'm going to discuss that, because that is somebody else's, who was helping me get the questions for you guys. I didn't check that before I shared it with everyone. Her name is on there as well, right, and it might get her into trouble. Remember, no assignment... Oh, I also don't want to discuss it right here, now, on the recording. Are there any questions relating to the content? Let's get those out of the way. Any questions? Are we good? Then I can stop the recording so we can have our family meeting, and then we can talk about that.
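For anyone who wants to double check the SSR from the last question without the template, here is a minimal sketch. It assumes the ten observed values and ten predicted values are as they were entered during the session; note that the final read-back lists only nine predicted values, so the third 284.47 (sixth position) is taken from the data entry and is an assumption.

```python
# Observed reading speeds (y) and predicted values (y_hat) as entered in the session.
# Assumption: the sixth predicted value is 284.47, as typed during data entry.
y = [278, 343, 277, 285, 311, 250, 299, 329, 296, 291]
y_hat = [284.47, 346.79, 274.08, 284.47, 284.47,
         284.47, 294.85, 315.63, 305.24, 284.47]

y_bar = sum(y) / len(y)                          # mean of the observed y values

# SSR: sum of squared differences between each estimate and the mean of y
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)

print("mean of y:", round(y_bar, 1))
print("SSR:", round(ssr, 2))
```

With these inputs the result agrees with the 4196.74 obtained on the template once the mean cell was fixed.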