So I will have to jump through the slides to get to where we were. We already answered a lot of them. We did answer this question, did we? Yes. And this one? No, we didn't answer this one. Hang on. Yes, we did. The answer was one. Yes, we ended up here, so we continue from where we were. The final project in this question is to determine if specialist type and gender of children are independent of each other at the 5% level of significance. Consider the contingency table below. The contingency table does not have the totals. We need to choose which statement is incorrect. So the first thing you need to do, before you even start answering the question, is complete the totals on the table. Then you can give me the values as we go along. Quickly do it on your side, and then I will ask you to give me the values. OK, are you done? Do you have the values? The total for speech therapist? 120. For neuropsychologist? 60. Psychiatrist? 40. And the grand total will be? 220. For boys? 165. Girls? 55. OK, so we're looking for the incorrect statement. The null hypothesis is that the specialist type and gender of children are independent. Correct. The alternative hypothesis is that the specialist type and gender are not independent. Correct. Next, the expected frequency for boys and speech therapist: they say it's 90. So we need to take the row total, multiply by the column total, and divide by the grand total. For speech therapist and boys, you take 120, multiply by 165, divide by 220. Is it 90? It is 90, therefore that is correct. The observed frequency for boys consulting with a speech therapist is 90. Is that correct? Remember, observed frequencies are the values given. That is also correct. Number five, the degrees of freedom. We know that we need to say the number of rows minus one times the number of columns minus one. Sorry, can I just ask a question? Yes.
I'm sorry, I have a lot of questions to ask. For the expected frequency I have this formula, but for the observed frequency, do you still use the same formula or what? Observed means the data given, your actual data. OK, no, that was the question. I'm sorry, thank you. Yes, no problem. Number of rows: how many rows do we have? Three rows. We have three rows, so that's three minus one. The number of columns, those are the columns, and you must not count the totals. There are only two of them. Three minus one is two, times two minus one, which is one. The answer should be two. This one says there are three degrees of freedom, so the incorrect answer is number five. OK, so this next one uses exactly the same table as the previous one. 120, 60, 40, I'm just going to copy them. 120, 60, 40, and that was 220. And this was 165, and I'm too lazy to calculate, 55. Number one asks: what is the value of our test statistic? So it means we need to go and calculate the chi-square stat. And here I lied when I said we need to use tables. I think for the chi-square stat you need to go to the Excel template and use that, because the calculation is going to be very long if I use the formula. So that is the formula to calculate the chi-square stat. Come on, what am I writing now? The sum of observed minus expected, squared, divided by the expected. So we need to calculate all the expected values from here. I'm lazy as well. It's zero. So you already went and calculated it. Lizzie, you can use the long way with the formula. I'm going to use your words. No, I mean, now I am going to use the template, so let me just open it. By hand the chi-square stat is going to take you forever. So this was a 3 by 2 table, so it's the second one on our template, and we just put in the values, 90. Lizzie, wrong one. You need to go to the last one. That's a 2 by 3.
So it's the one at the bottom. The 10 by 2. It's a 3 by 2. In the exam, when they give you more than the ones that I gave you, I don't know how you're going. Those who don't know how to use Excel, I feel sorry for you. So you must come for Excel lesson before you go right. If you always use Excel. Pardon? If you always used Excel in the previous exams, because I'm thinking now. No, they don't use Excel because the previous exams, you would have went and wrote in a venue. That's the question I wanted to ask. Since you're writing online and at home, I think you can use any resource you have at your disposal to assist you. So we really like this. Also take time as well. So 90, 30. I'm teaching you things that I'm not supposed to. So I'm going to be in trouble one day because I'm making you. We will deny Alice. Because these videos are published. So people can watch them and they will be like you. This one shows them how to cheat. OK, so that is our test statistic is zero. So therefore it means already we have answered option 1 and option 2. I'm sorry. Option 1, option 2 and option 3 are out. So the only thing that is left now is to go find the critical value. So finding the critical value, we need to use our degrees of freedom. I'm sorry, degrees of freedom. So our critical value, our alpha here, they told us is 5%. So it's 0,05. And the degrees of freedom, we did calculate it previously. Remember, it was 2. So we go and use 2. The other thing I don't have is my table open. OK, it's open. I don't have a table open. So I just have to go and open one of the STA. I'm sorry, I'm going to open anyone that I see. But remember, we're using the template from your lecture that I sent you. So on this one, let's see. And the tables here at the end, yes. So we're looking for the critical values of chi. All the chi square critical values. We need to go to table E4. So we're looking for the degrees of freedom of 2 and alpha of 0,05. That is the critical value. 
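The steps worked through so far (expected frequencies, chi-square stat, degrees of freedom, critical value) can be sketched in Python. This is only an illustration under stated assumptions: the full observed table is not read out in the session, so the counts below are reconstructed from the stated totals (120, 60, 40 and 165, 55) together with the test statistic of 0, and the function name is made up for this sketch. For 2 degrees of freedom the chi-square critical value has the closed form -2 ln(alpha), which is used here in place of table E4; that shortcut does not apply to other degrees of freedom.

```python
import math

# Observed counts: rows are specialist types, columns are (boys, girls).
# Assumption: reconstructed from the stated totals and the statistic of 0.
observed = [
    [90, 30],  # speech therapist, row total 120
    [45, 15],  # neuropsychologist, row total 60
    [30, 10],  # psychiatrist, row total 40
]


def chi_square_independence(table):
    """Return expected frequencies, chi-square stat and degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected frequency = row total * column total / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    # Chi-square stat = sum of (observed - expected)^2 / expected
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    # Degrees of freedom = (rows - 1) * (columns - 1), totals not counted
    df = (len(table) - 1) * (len(table[0]) - 1)
    return expected, stat, df


expected, stat, df = chi_square_independence(observed)

alpha = 0.05
# Closed form for the upper-tail critical value, valid only when df == 2
critical_value = -2 * math.log(alpha)

print("expected (speech therapist, boys):", expected[0][0])
print("chi-square stat:", stat, "df:", df)
print("critical value:", round(critical_value, 3))
```

With these counts the expected frequency for boys and speech therapist is 120 times 165 divided by 220, and the critical value rounds to the 5,991 read from the table.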
So option 4 will be the correct one. Any questions? OK, no questions. It seems like today we're going to finish early because I think all the questions will be done. OK, similar question. I'm not going to repeat all the values that we have there, because now they're asking us to make a decision. They say, what is the decision with regard to the hypothesis and the conclusion about the two variables? So remember, we said our test statistic is 0 and our critical value is 5,991. In order for us to make this decision, we can just draw ourselves a diagram that helps us. I never draw the skewed curve properly because I don't know how to draw. I've never been good with art. And yeah, we said our critical value is 5,991. That's what we got as a critical value, right? 5,991. And we said our test statistic is 0. Anything that falls on this side, to the right of the critical value, we say we reject the null hypothesis. Anything that falls on this side, to the left of it, we say we do not reject the null hypothesis. So this is our critical value, 5,991. Our test statistic, we did calculate it, and we found that it was 0. So it will fall in the white area. It falls in there: do not reject. So how do we conclude? Any option that says reject, we're going to say is incorrect. So "we reject the null hypothesis", those two options, we know that they are not correct. Come on. My screen is very sensitive today. I'm using a touch screen, so you must bear with me when there are so many things happening. So we're left with option 3 and option 4. And option 5, we do not do this. Oh, come on. We're going to finish next year with all these things that are happening. So we do not say we accept. We never say it like that when we work with hypothesis testing.
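The diagram-based decision can be written as a small rule. This is a minimal sketch, assuming the usual right-tailed chi-square test; the function name `decide` is hypothetical, and the numbers are the test statistic and critical value quoted in the session.

```python
def decide(test_statistic, critical_value):
    """Right-tailed decision rule: reject only beyond the critical value."""
    if test_statistic > critical_value:
        return "reject H0"
    # Note the wording: we never say "accept H0".
    return "do not reject H0"


decision = decide(0.0, 5.991)
print(decision)
```

A statistic larger than the critical value, for instance `decide(7.2, 5.991)` with a made-up value of 7.2, would land in the rejection region instead.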
So any time you see a statement saying "we accept", just know that it is statistically the wrong way of stating the decision and the conclusion. So option 3 or option 4. Option 3 says we do not reject the null hypothesis and conclude that the specialist type and gender of children are independent. Number 4, yes, sorry. Is it not saying we do reject? Where do you find your word "not"? It says we do reject. There's no "not" there. Then we assume it's a typo. Yeah, you can zoom in. You're not going to find the "not". There is a typo here, so it should say we do not reject. I hope that in the exam you don't find this kind of typo. I got this question from the 2017 tutorial letter, and I acknowledge that most of the tutorial letters, especially the questions that we use from them, have lots of typos and errors. But let's fix them and then we move on. So number 3 says we do not reject and conclude that the two variables are independent. And number 4 says we do not reject the null hypothesis and conclude that the specialist type and gender of children are dependent. Which one is correct? We had this discussion last time. So which one is the correct one, number 3 or number 4? I was so confused by the discussion last time, but I'm still going to say number 3. Yes, number 3 will be the correct one, because you always need to refer back to the statement. The other thing you need to be aware of is that when they write the conclusion, sometimes they use the null hypothesis and sometimes they refer to the alternative. This statement refers to the null hypothesis, because if we're not rejecting the null hypothesis, then it means we have no evidence against it and we keep working with it.
And therefore, we do not reject and conclude that the specialist type and gender are independent, because we're not rejecting the null hypothesis. We found no evidence of a relationship between the two variables. Alternatively, sometimes they could say we do not reject the null hypothesis and conclude that there is no sufficient evidence that specialist type and gender are dependent. So they could take the alternative, which talks about the dependent side, and use it. But in the way you state your decision, you have to show that you're not saying your alternative is true. So this one was straightforward. They took the null hypothesis statement to make the conclusion. If they had written it from the alternative, it would have said: we do not reject the null hypothesis and conclude that there is no sufficient statistical evidence that specialist type and gender are dependent. So that "no sufficient evidence" tells you the same thing, that we found no relationship, because then we're saying they are independent. Let's go to the one that we ended up on, which is exercise nine. We did exercise nine, because we worked it out and we found that eight, eight, one, and we said only that was not the correct one. That's where we ended up the last time. We need to be here. And I think we did also do this one. Did we? Yes, we also did 10, remember 10. I said we interpret R squared. If we can revisit that, we interpret R squared by saying it is the proportion of the total variation in Y that is explained by the variation in X. Or alternatively, we can rewrite this whole sentence by saying the variation in X explains that proportion of the total variation in Y. One and the same thing.
But written in another way. That's English, which is my third language in South Africa: I have my first language, then a second one, then English, and then others. So in my third language, this is how you can rewrite your sentences. So we also looked at this and we said, if they gave us that the coefficient of determination was 49 percent and they want the statement that explains that 49, anything that has a 70, or anything that does not have a 49, will not interpret the 49. 51 can go with 49, but we're not really interested in that. We're interested in the ones that explain the 49 percent. So the first one said the explanatory variable explains 49 percent of the variability in the response variable. The other thing that you also need to remember is that the explanatory variable is your X and the response is your Y. And the other one says the response variable explains 49 percent of the variability in the explanatory variable. So this one says Y explains X. Going back to how we describe the coefficient of determination, X should explain Y. Y should be explained by X. In English, we have the present tense and the past tense, so that's how the sentences look. So the correct statement here will be only option number one. As we said last week, I just wanted to make sure that we're still on the same page. All right. Lizzie, sorry, just go back one, please. So if option number three, instead of having the word "explains", said "explained", and they changed it around, then option three would be the right one? Yes. Okay. Yeah, that's why I said that in the present tense we use "explains", and in the past tense, for something that happened, we use "explained". These statements are written in the present tense. So, for example.
The catch phrase is explained and explains. Okay. Let me also clarify number five because number five also can be correct in a way, but it's not correct in this instance. It won't be correct if, because number five says only 51% of the variation in the independent variables is explained by the module, right? If they could have said only 51% of the variable or the variation in the independent is not, if there was, is not explained by the model, that statement would have been correct because it will be the other variation that is not explained. It's accounts for the difference from a hundred percent. Yes. Yes. Correct. Okay. I hope your exam won't be this difficult. Now let's go and explain the slope because this question says suppose that the least square regression line for a random sample is y hat is equals to 15.50 which is our intercept minus 0.69 which is our slope times x which is our independent variable. Then the slope implies that. Remember how do we interpret the slope? Okay, so teaching moment again. The slope remember is the change in the values of y. So the change in the values of y divide by the change in the values of x. So we say every one unit, so if I move from here to here, I can also calculate the slope of that point. So we say for every one unit, so this one because it's positive, for every one additional unit increase of your x values, it will yield an increase in the value of blah, blah, blah, of your y hat, your estimated value of y. That is for a positive slope. I hope you do get me. Because this when x increase in one unit, there will be an increase in one unit. An increase of the slope value. So if the slope here, let me not give you the right, the correct answer. If the slope here was 0.54, we would have said for one unit increase in the value of our x, it will yield an increase because it's a positive. It will yield an increase in the value of y by that because we are adding or subtracting. 
So when the slope is positive, because we are adding, it will yield an increase of that much. If the relationship looks like this for the y and the x, we still say for every one-unit increase of x, because we can still calculate the slope from there. The change between this y value and that y value is attributed to the one-unit increase in x. With this one, when x is increasing, y is decreasing, because the slope is negative. This is a negative slope. And therefore, if it's minus 0.54, we would say that every one-unit increase in the value of x will yield a decrease of 0.54 in the value of y. We quote the value without the sign because the negative is taken care of by the word "decrease". So you have y hat is equal to 15.50 minus 0.69x. Which one of these statements is correct? Remember, everything that I was discussing here is the slope. The first value is our intercept. The slope is the value that multiplies with the x. You must always remember that. In order to describe your slope, you need to look at the value that multiplies with the x. So it means option four and option five will be out because they are interpreting the intercept. So you are left with one, two and three. And it will be number two, because number two says there will be a decrease, because the slope is negative. Number one says, actually I thought number one and number two were the same, oh, they are different. Number one says a one-unit increase will yield an increase. So it is referring to the slope being positive, but we know that our slope is negative. Number three talks about a decrease and an increase. We never do it that way, because with the slope we always talk about a one-unit increase in the value of x. You'll never say decrease, increase. It doesn't make sense.
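The reading of the slope can be checked numerically. This is a minimal sketch using the line from the question, with an intercept of 15.50 and a slope of minus 0.69; the function name `predict` and the particular x values are chosen only for illustration.

```python
def predict(x, intercept=15.50, slope=-0.69):
    """Estimated y (y hat) from the least squares line y hat = b0 + b1 * x."""
    return intercept + slope * x


# When x is zero, the estimate is just the intercept.
at_zero = predict(0)

# A one-unit increase in x changes y hat by exactly the slope,
# no matter where on the line we start.
change_from_0_to_1 = predict(1) - predict(0)
change_from_4_to_5 = predict(5) - predict(4)

print(at_zero)
print(round(change_from_0_to_1, 2), round(change_from_4_to_5, 2))
```

Both differences come out negative, which is the "decrease of 0.69 in y for a one-unit increase in x" wording of the correct option.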
So number three is also incorrect. So the only correct answer is number two, as I've just explained. We talk about a one-unit increase in the value of x because x is what we use to estimate the value of y. Every one-unit increase in the value of x will yield a decrease of 0.69 in the value of y. So whatever value of y we are estimating is going to decrease by 0.69. Just a question. Yes. On the previous one, I'm sorry, I'm just kind of confused. The 0.69, you looked at the slope, this figure, 0.69x. So is that the one that you need to look at? Yes. So it tells you how much the y value is decreasing by? Yes. Okay, so if this was positive, the y would actually be increasing, right? Yes. So it's fine, I understand now. Yes. So these two pictures should just give you an idea. Okay. It's just that I heard you talking about minusing something from something, so that's why I asked. Because when your slope is negative, remember, if your x here is zero, then your estimate will be 15.50, right? Yes. So your y hat there would have just been 15.50. But if your x here is one, then from that 15.50 we're going to subtract, because it's one times zero comma 69. We're going to subtract zero comma 69, so it's going to decrease the value of y. Okay, okay. You understand? Yes, I understand, by zero point 69. And if it was positive, we would increase the value of y. Okay? Okay, yes, I understand, thank you. All right. Listen, before you move on, just a hypothetical question: would they throw in a two-unit increase in x? Would they ever do something like that?
No, when we interpret, we always interpret by one unit increase, because we know that you remember, we know that you can estimate the value of your, your value of your y hat, you can estimate it. But yeah, we're not saying estimate what the value of your y hat will be. We say interpret what the slope tells you. That's all what they want. What does the slope tells you? So for every unit increase, one unit increase because we're going to just say if one. So if they say interpret this slope in relation to the unit increase of eight, then it means you need to solve this whole thing. And not the whole thing, anyway. You need to multiply eight times zero comma eight 69, or zero comma 69 times eight, and know how much your value of your y will increase or decrease by. Okay, I got that. You answer that. Thank you. All right. Now, let's interpret either the coefficient of correlation. I think that's what they want us to do. Remember your coefficient of correlation r lies between negative one and one. And remember there is either a perfect relationship or a moderate relationship. And when r is zero, there is no relationship. So you need to remember all that. So based on what you know, which one of the following statement is incorrect? We can read all the statement and then we're going to choose which one will be right. A value of minus zero comma eight listed as a coefficient of correlation r indicate an inverse relationship between two variable X and Y. Remember we're looking for the incorrect one? We will come back. I'll just read out loud. A value of minus 1.4 listed as a coefficient of correlation r cannot indicate an inverse relation between the two variable X and Y. And I'm going to underline the that. In simple linear regression, the coefficient of correlation r and the least square estimate b1, which remember b1 is your slope, of the population slope where they will go. They even explain it of the population slope must have the same numerical value. 
Number four, if all points in the scatter plot lie on the least squares regression line, then the coefficient of correlation must be either one or minus one. Number five, an indication of no linear relationship between two variables would be a coefficient of correlation of zero. Okay. We can start from the bottom and go up. I will do a mix, a mix masala. So I'll pick and choose the statement we can look at first. Number five, an indication of no linear relationship between two variables would be a coefficient of correlation of zero. Is that correct or incorrect? So they are saying r is equal to zero. When r is equal to zero, we say? There's no relationship. No relationship. So number five is correct. Number four says if all points in the scatter plot lie on the least squares regression line, then the coefficient of correlation will be one or minus one. What number four is saying is that if my dots lie like this, then the least squares regression line will just be there, and my r will be equal to one. And if my points lie like this, then my least squares regression line will be there and my r will be equal to minus one. That's what number four is saying. Sounds correct. That is correct. Now, here is where you might feel like, oh, I don't know what this is. When they include words like "inverse", it's just the opposite of positive, so it means negative. If I take these two graphs and flip one, it will look like the other. One is the inverse of the other because one is negative and one is positive. So that is what they mean by inverse. But we need to find out whether the statement as they put it is correct.
Number one, a value of negative zero comma eight listed as a coefficient of correlation indicates an inverse relationship between two variables, because it says when one increases, the other one decreases. Is that correct? Correct. That is correct. Number two, I highlighted the word "cannot" because it says a value of minus 1.4 listed as a coefficient of correlation cannot indicate an inverse relationship between the two variables. Is that correct or incorrect? I would say correct. It is correct because we know that the value of r must be between minus one and one. If they didn't include the word "cannot", this statement would have been incorrect, because we cannot have a coefficient of correlation smaller than minus one or bigger than one. So the only statement that is incorrect would be number three, and it reads: in simple linear regression, the coefficient of correlation r and the least squares estimate b1 of the slope must have the same numerical value. They do not have to have the same numerical value, because even though we calculate both of them using the X and Y values, the formulas are different. If, let's say, my X values are one, two, three and four and my Y values are zero comma eight and so on, and I calculate both, they will not be equal. Remember the sum of squares formulas. Let me open them so that I can demonstrate what I mean. So the formula to calculate the slope looks like this, and the formula to calculate the coefficient of correlation looks like this. So both of them will not be the same. Why? Because the coefficient of correlation also uses the sum of squares of the Y values.
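The difference between the two formulas can be sketched in Python. The names `ss_xy`, `ss_xx` and `ss_yy` stand for the sum of squares measures and are this sketch's own labels, and the four data points are hypothetical, chosen so that every point lies exactly on a line. The slope divides the cross sum of squares by the sum of squares of X, while the coefficient of correlation divides it by the square root of the product of the sums of squares of X and Y.

```python
import math


def sums_of_squares(xs, ys):
    """Return (SSxy, SSxx, SSyy) using the computational formulas."""
    n = len(xs)
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    ss_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    ss_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    return ss_xy, ss_xx, ss_yy


# Hypothetical data: every point lies on the line y = 2x.
xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]

ss_xy, ss_xx, ss_yy = sums_of_squares(xs, ys)

slope = ss_xy / ss_xx                  # b1 = SSxy / SSxx
r = ss_xy / math.sqrt(ss_xx * ss_yy)   # r  = SSxy / sqrt(SSxx * SSyy)

print("slope:", slope)
print("r:", r)
```

Here all the points lie on the line, so r is one, while the slope is two. The two values share a sign but agree in size only when the sums of squares of X and Y happen to be equal.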
So if you look at these, the numerators look exactly the same even though they are written differently, because here they just multiplied through by n, and you can rewrite one as the other. But in the coefficient of correlation we divide by a square root, whereas in the slope we don't: we just divide by the sum of squares of X. Here we divide by the square root of the sum of squares of X times the sum of squares of Y. So they will generally not be the same. And that's how you can answer that question, by just looking at the formulas of the coefficient of correlation and the slope. They will tell you whether you will get the same answer. If you get the same answer, it will be by chance, but usually you will not. Consider the following data for the variables X and Y. Which one of the following statements is incorrect? And I said I'm going to use a calculator on this one. So I'm going to use my Casio calculator, and everybody else can use whatever mechanism they have to answer the question. I'll first put the calculator into stat mode by pressing mode, then two, then two again. Now I'm ready to capture the data: eight equals, four equals, 12 equals, 16 equals and nine equals. And this is taught to the grade 12s. I've just offered a class on the same topic to the grade 12s. Five equals, three equals, seven equals, six equals and five equals. Yes. Can you also just use the table, the spreadsheet that you gave us? Yes, you can use the spreadsheet. Instead of using a calculator. Yes. Okay. So I said today I'm not going to use the spreadsheet, I'm just going to use the calculator.
So that those who don't know how to use Excel at least have a backup. They can use their calculator as well. You can also use the formulas if you want, but that is going to be time consuming. So we've captured our data into the table. I can just double check that I've captured all my information, and I can go on. The first one says I must find my b0. I need to know that my b0 is my intercept, and my intercept on my calculator is the A. So I can go to my calculator and say shift, stat, which is button number one, then five, then press one and equals. And that should be the answer to my A. So that is correct. Then I need to find my b1, which is my slope, and on my calculator b1 is B. Shift, stat, five, then two and equals. It says zero point two seven five, which is correct. Number three says I must check if this equation is correct. So I take both of my values: y hat is equal to A plus Bx. So my equation should be y hat is equal to my A, which is two point five zero seven, plus, because my slope is positive. If my slope was negative, it would have been minus. Then zero comma two seven five times x. So I need to take this and see if it looks exactly the same as what they have. So that is the incorrect one, because... They swapped it around. They swapped them around. Now I can also find my r, because number four gives the coefficient of correlation r. Remember, in the exam you don't have to go through all of them. You just choose the one you find to be the answer and move on. But since we're doing practice, I'm going to go through all the options, because that's how you learn, by doing. Okay, so let's calculate r. We go shift, stat, five, and r is number five, and I press equals. Ah, I didn't press three, it's three. So shift, stat, five and three. I pressed five instead of three. Equals.
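For anyone without the calculator or the spreadsheet, the same quantities can be computed with a short Python sketch. It assumes the first five values entered (8, 4, 12, 16, 9) are the X values and the next five (5, 3, 7, 6, 5) are the Y values; that assignment is not spelled out in the session, but it reproduces the intercept and slope read off the calculator. The variable names are this sketch's own.

```python
import math

# Assumption: first column entered is X, second column is Y.
x = [8, 4, 12, 16, 9]
y = [5, 3, 7, 6, 5]
n = len(x)

# Sum of squares measures
ss_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
ss_xx = sum(a * a for a in x) - sum(x) ** 2 / n
ss_yy = sum(b * b for b in y) - sum(y) ** 2 / n

b1 = ss_xy / ss_xx                      # slope (B on the calculator)
b0 = sum(y) / n - b1 * sum(x) / n       # intercept (A on the calculator)
r = ss_xy / math.sqrt(ss_xx * ss_yy)    # coefficient of correlation
r_squared = r ** 2                      # coefficient of determination

print("b0:", round(b0, 3))
print("b1:", round(b1, 3))
print("r:", round(r, 4))
print("r squared:", round(r_squared, 3))
```

The fitted line is then y hat equal to b0 plus b1 times x, with the intercept first and the slope multiplying x, which is the order to check against the option that swapped the two values.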
And my r is zero comma eight, three, two, five, which is zero comma eight three. And that is correct. The coefficient of determination will always be positive. Yes, because the square of a negative r value will be positive, and the square of a positive will always be positive. And you can use the Excel spreadsheet, which I'm not going to go into to answer the question. On the Excel spreadsheet, when you want to delete, just highlight the rows, right click, say delete and shift up. If you want to add, it depends on how many rows you want to add. If you want to add three rows, you highlight three of them, or if you just want two you highlight two, then insert and say shift down. Then it will shift them down. And once you have captured your data, let's say the calculations don't populate for the values that you are calculating, like those ones there, which are blank. You just go to the top one and drag. It should calculate, because you're just copying the formulas. Excel will do the calculations, and you can just come here and look at the values that you're looking for. It will show you how all of them were calculated. So we move on to 14. Which one of the following statements is incorrect? I'm going to give you time to think and then we will look at it together. Remember you're looking for the incorrect statement. Are we done? Do you have your answer? I'm going for number three. You're going with number? Three, me too. You're going with number three as your incorrect answer? Yeah. Yes. Are you all agreeing with number three? Who else has a different answer? So I assume all of you say number three. So if you were writing an exam today, and this question is out of four marks, let's see if you're getting the full marks or a zero. Ms. Liss, can I differ?
R can be a negative, because if you make it a determination with R squared, it will remove that negative to be positive. Okay. It's also true. Okay. Yeah, I also saw that one. Okay. So are you all changing your answers now? No, I have two answers that are right. I was going for number three, number four. Okay. You're putting anti-depression on us. No, I just want, guys, I just want to know if you understand the work. It's not, there's nothing wrong, because we're still in the practice sessions. We're still learning. So there's nothing wrong. So we're still going to get a lot of the questions incorrect, but we're going to learn from that. So if I'm allowed to change, I will start with, I will go with what the other lady said. I'll first go with number four, because R can be negative. Okay, good. Now, let's see if you're getting a four or a zero in the exam. Number one, we know the definition. I'm not going to go whether this is correct or incorrect. I'm going to give you the definition and you can make up your mind as well. So number one, the coefficient of correlation, it's in the definition that we went through. It measures the strength and the relation. And it measures the strength and the direction of the relationship. So yeah, they already, they saying it measures the strength. So it means it's correct. Even if they could have added also, measures the strength and the direction of the relationship because it tells you two things when you look at coefficient of correlation. Is it strong, perfect or moderate? That is the strength. The direction, is it positive or negative? That gives you the direction of that relationship. So that is one. Then number two, coefficient of determination, that's just the one to know if you know what coefficient of determination parameter looks like. And that is R squared and that's correct. So number one and number two are correct. 
Number three. Remember, I know that in the definition that I gave you, I also said X implies, oh sorry, X explains Y, but you must also remember this: the model still relies on the X values in order to estimate what your Y value is. So whatever the model is, it will explain what happens to the Y hat. So here it talks about your R squared. We can give the definition of R squared in three ways: the first way that I showed you, or we can do it in relation to the regression line. And this says an R squared of 0,70 implies that 70% of the total variation is explained. That can be explained by the variation in X, or it can be explained by the regression line, because the regression line carries the variation in X: it uses the X values in order to get your Y estimate. So you can say it is explained by the regression line, or you can say it is explained by the variation in X. Both of them say one and the same thing, because I could have ignored the regression line and relied only on X, since it tells me the variation in X. And that's what the regression line is. The least squares best fit line carries the variation of the values of X. So option three is also correct. The word explained threw me out. But let's go back. I was thinking about the word explains, yes. I was looking, yeah, you see. It was the other way, yeah. No, no, I understand now. I didn't have the variation in Y. I just saw X. And I knew it would affect the whole equation. I didn't have these group notes when we got to that question. You must always have your notes. No, remember now. Okay, so number four: R can never be negative. Hello. We just said it. R always lies between negative one and one. So R can also be negative.
So this is the incorrect statement in this question. Then the last one: if the coefficient of correlation R is positive, the slope should also be positive. And that's the other way of finding out whether your relationship is positive or negative: by looking at either the R or the slope, because they have to have the same sign. Lizzie, it's amazing how the human mind works. With that number four, my head was still stuck on the R squared from number three above. So when I read number four, I'm reading that R squared can never be negative, which is correct. That's, you know how the mind works. Yeah. Because they used R squared and then R squared again, and then you come to the next one and you still have the R squared in your head. So you need to read your questions carefully and relax when you are in the exam. You had us under pressure. Don't panic. Let's look at 15. A production manager has compared the dexterity scores of five assembly line workers with their hourly productivity, in units per hour. The least squares regression equation is calculated as y hat equals 19.2 plus 3.0x. If a job applicant has a dexterity score of 15, his predicted productivity per hour will be? Option three. Option three. You just substitute: where you see x, you put 15 and work it out. So you will have y hat equals 19.2 plus three times 15, which should give you 64.2. It seems like we did this one too. Why does it look so familiar? We did it. Next one. Prior to being hired, the five salespersons for a computer store were given a standard sales aptitude test. For each individual, the score achieved on the aptitude test and the number of computer systems sold during the first three months of their employment are shown below. So here we have our x as the score on the aptitude test and y as the units sold. So those are the sales values for the three months that they have been employed. They also gave you the sum and sum of squares measures, and they're asking you to calculate the coefficient of correlation.
You can either use the formula, which is r equals n times the summation of xy, minus the summation of x times the summation of y. Oh, I must not divide by n again, because I'm multiplying by n there. Divide by the square root of n times the summation of x squared, minus the summation of x, all squared (again I'm not supposed to divide by n, because I'm multiplying by n there), times n times the summation of y squared, minus the summation of y, all squared. So r = (nΣxy − ΣxΣy) / √[(nΣx² − (Σx)²)(nΣy² − (Σy)²)]. You can use that, or you can go to your calculator or your Excel spreadsheet and do the calculations. So remember, with the formula you just substitute all the values that you see there. So let's see. When you're working with your calculator, and I already had some values on my calculator, I need to clear my calculator, because now everything is stored. So I can just go in here. I'll use one to clear all of them. Reset, and then I will press AC to reset. Oh, but then that reset everything. So I must just start again. I must put the values into the table. 70, equal. You probably already have an answer while I'm explaining. Let me just go up. 25, equal, 15, equal, 10, equal, 40, and I've entered the wrong data. I've seen 80, let's go down. 35, there is my wrong entry. So I must just enter the correct one. Enter, 90, 20. There we go, go out. We're looking for r. Shift, stat, five, r, equal, which is option number five. So if you are using your formulas: your n, there are one, two, three, four, five. So it will be five times 7200, minus 305 times 95, divided by the square root of, open bracket, five times the sum of x squared, which is 21825, minus 305 squared, close bracket, times, open bracket, five times 2575, minus my sum of y, which is 95, squared, close bracket. Let's calculate it manually, because on my calculator I found that it's 0,892283. And because my calculator is not dropping off any decimals, I got that. So let's see, if I use the formula, whether I get the same answer.
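As a cross-check on the formula just read out, here is a minimal Python sketch that plugs in the sums quoted in the session (n = 5, Σx = 305, Σy = 95, Σxy = 7200, Σx² = 21825, Σy² = 2575). The function name `pearson_r` is only an illustrative choice, not something from the course material.

```python
from math import sqrt

def pearson_r(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Coefficient of correlation computed from summary sums."""
    numerator = n * sum_xy - sum_x * sum_y
    # Each bracket is n times the sum of squares minus the squared sum.
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Sums quoted in the session for the aptitude test question.
r = pearson_r(n=5, sum_x=305, sum_y=95, sum_xy=7200, sum_x2=21825, sum_y2=2575)
print(round(r, 4))  # 0.8923
```

The result agrees with the 0,892283 found on the calculator, so the formula route and the stat-mode route give the same coefficient of correlation.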
Let's see, let's use my calculator. I must go back to math mode, so I must go to COMP, one. Fraction: open bracket, five times 7200, close bracket, minus, open bracket, 305 times 95, close bracket, divided by the square root of, open bracket, five times 21825, minus 305 squared, close bracket, open bracket, five times 2575, minus 95 squared, close bracket. Do I close twice? No, once. And let's hope it works. Yeah, I get the same answer, 0,892283. So you can also go and use your Excel, and hopefully you will get the same answer as me. I said I'm tempted to use the formula, use the Excel, but I'm not going there. Okay, for this next one I can use Excel. Hey, it's very long. How does that work now? Just a minute to go. It's going to take me forever to punch these values into the calculator, and if I go wrong, it's going to take me forever again. So, which one of the following statements is incorrect? I'll use Excel on this one. So you can decide for yourself which ones you do in Excel and which ones you don't. So let me use Excel on this one. Because I'm using Excel, how many? One, two, three, four, five, six, seven, eight, nine, 10. So we have 10 of them. So I need 10 rows. At the moment I have seven, so it means I still need to add three more. So I'm just going to go to the ones at the bottom here and add the three rows: highlight three, so that I can insert three rows, insert down. And it's going to take us a longer time to complete. 34, 39, 42, 21, 53. Let's see, I think we did this question on Wednesday as well. We did. Yes. It means I'm repeating some of these questions. Yeah, this was on Wednesday. I remember it because I wasn't sure about the mean of X and Y. I was looking at the wrong place on the Excel. That's how I remember it. But you can maybe still do it for the folks who didn't join on Wednesday. 60. So we can go there. 140, 393, 161, 183. Enter is not working.
So I've been typing in the same column forever. Okay. 140. Probably I'm pressing shift instead of enter. Okay, I must make sure that my other finger is on the right button. 293. 293. Mistake with your calculator. 161, 183, 179, 221, 170, 223, 214, and 241. And I should be able to check whether my values match what they have on the screen. Then I will know that I have the right answers. The sum of X is 475. The sum of Y, they match exactly on those ones. So let's answer the questions. The mean of X and the mean of Y. For those who are using manual calculators and don't know how to use Excel, it's easy. For the mean of X, you just take this value and divide by 10, because there are 10 values. That gives you the mean. So 475 divided by 10 is 47,5. The sum of Y divided by 10 will give you the mean of Y, which will be 202,5. But that's as far as I can get by hand with those values. So let's go look at the other options that we have. So these are the values that you can also look at. So the mean of X is 47,5, and you can reduce the decimals if you want to get exactly the same as what they have. They left their answers at one decimal or two decimals. So this is at one decimal, and the others are at two decimals. So you can adjust all the decimals if you want, or just adjust those ones. Okay, so if we look at this, oh, they are at three decimals now. I have just adjusted because I was looking at this. Anyway, the slope is exactly the same as where I am. The intercept is 2.514, that's exactly the same. The regression line should read 2.514 plus 4.2 times X; it doesn't say that. So that will be the incorrect one. Then, there is a strong positive relationship: because they said strong, looking at the slope alone will not help us to know whether it is a strong, a moderate, a weak or a perfect relationship. We can only see from the slope that it's a positive relationship. In order to answer that strong part, we need to look at our R.
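The hand calculation of the two means can be sketched in a few lines of Python. The Y values are the ten entries read out above and the sum of X is the 475 quoted in the session. The last step relies on a standard property that the session does not state: a least squares line always passes through the point (mean of X, mean of Y), so the slope can be backed out from the intercept of 2.514. Treat that step as an added cross-check rather than part of the tutor's method.

```python
n = 10
sum_x = 475  # sum of X as quoted in the session
y = [140, 293, 161, 183, 179, 221, 170, 223, 214, 241]  # Y values as read out

mean_x = sum_x / n   # 47.5
mean_y = sum(y) / n  # 202.5

# A least squares line passes through (mean_x, mean_y), so
# mean_y = intercept + slope * mean_x. Solve that for the slope.
intercept = 2.514
slope = (mean_y - intercept) / mean_x

print(mean_x, mean_y, round(slope, 2))  # 47.5 202.5 4.21
```

A slope of about 4.21 is consistent with the "2.514 plus 4.2 times X" form of the regression line, which is why an option showing a different equation is the incorrect statement.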
And our R is 0.98, so there is a strong positive relationship, because the R is also positive. Okay, next, the candy bar manufacturer: they have the price and the sales. Referring to the table, what is the percentage of the total variation in the sales of your candy bar that is explained by the regression model? So what are they asking us to find? Pardon? Let's go one back, please. Why do you want to do that? Why did you say number four is wrong? You have the slope, so the equation of the regression line should be your intercept plus the slope times X, so that you know what you need to be substituting. Since I'm using the values that they have, let's substitute. The value of the intercept is 2.514. The value of the slope is 4.210, so 4.210X. Are they the same? Thank you. Thank you. So coming to number 18, what are they asking us? The percentage of total variation in the candy sales explained. We've dealt with this, so even when you go to the exam now, the one thing you will have mastered is this section, this part on the regression line. What are they asking you to calculate? You need to get your coefficient of determination and multiply it by 100 to get that percentage. Thank you very much. So multiply that by 100. That is what they are asking you to calculate. So if you're using your Excel, it's already there, oh, come on. It's already there on the Excel, if you're using Excel. So my fellows who are with me today on the calculator, you're on your own. So let's go ahead and use our calculator and capture the values. So 1.3. I'm not going to keep the zeros at the end, because it will still keep it as 1.3 on the calculator, especially where the last part is zero. So 1.6, 1.8, let me show you. You see, it keeps only 1.8. So typing 1.80 is just a waste of time. And also with the two, you just type two. 2.4.
And the last one is 2.9. Then we go up, up, up, up, up, up to number one and go left, and then 100. So here you have to type 100, 90, 90, 40, 38 and 32. It should be six of them: one, two, three, four, five, six. Go out, shift, stat, five, and we go find R first. And we go three and equal, and that is our coefficient of correlation. In order to find the coefficient of determination, we press X squared and press equal, and that is our coefficient of determination. And then we just need to multiply that by 100. And that gives us 78.39%. Sorry, Lizzie, can I just ask you? Yes. We times it by 100 because the question says, what is the percentage, right? Yes. Yes. Okay. So it's also a variation on the R squared. So if we take R squared, it lies between 0 and 1. Okay. Or, if it's R squared as a percentage, we can say it lies between 0 percent and 100 percent. I don't know what I'm thinking. Number 20, and I think this is the last question. The slope represents: number one says the predicted value of Y when X is equal to zero. Number two says the estimated average change in the value of Y per unit change in the value of X. Number three says the predicted value of Y. Number four, the variation around the line of regression. And number five, the predicted value of X when Y is equal to zero. Is there only one correct answer? There is only one correct answer. So you need to know how to interpret the slope. We did this. Remember, that's the slope. How do we interpret the slope? It represents a change. It's option two. It's option two because the slope is the estimated change in Y when there is a one unit increase, or one unit change, in the value of your X. That is the slope. And I think we are done for the day so far. There is nothing more that I can offer. I've done as many as I can get to. Is it possible to go to the spreadsheet for this one? Unit 11. The Excel spreadsheet. Okay. Do you want to answer this question on the Excel spreadsheet? No, no, no.
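For the candy bar question worked on the calculator above, the same steps (find r, square it, multiply by 100) can be sketched in Python. The six prices are the values typed in during the session. The six sales figures are taken here as 100, 90, 90, 40, 38 and 32, which is an assumption because the spoken read-out is not fully clear. These values do reproduce the 78.39% quoted.

```python
from math import sqrt

price = [1.3, 1.6, 1.8, 2.0, 2.4, 2.9]  # x values typed in the session
sales = [100, 90, 90, 40, 38, 32]       # y values (assumed reading of the read-out)

n = len(price)
mean_x = sum(price) / n
mean_y = sum(sales) / n

# Sums of squared deviations and cross-products around the means.
sxx = sum((x - mean_x) ** 2 for x in price)
syy = sum((y - mean_y) ** 2 for y in sales)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(price, sales))

r = sxy / sqrt(sxx * syy)  # coefficient of correlation
r_squared = r ** 2         # coefficient of determination
percent = 100 * r_squared  # percentage of variation explained

print(round(percent, 2))  # 78.39
```

Note that r itself comes out negative for these values (sales fall as price rises), while r squared is positive, which matches the earlier point that R can be negative but the coefficient of determination cannot.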
Sorry, I don't know. I'm asking, can you open up the spreadsheet, because I wanted to ask you something that's on the spreadsheet. Okay. Yes. Okay. Can you just go down a bit? I don't understand this part here. Can you just briefly explain it to me? This one, yeah, at the bottom. Yes. Okay, so I must take this back to, I think there were six or five rows, I'm not sure, in order to align them. I'm just going to take it up a little bit. You see, I left my measures there. So I need to go back and fix and delete, because I want to align them. So, let's see, are they back on one line? Yes, they are. So, yeah, at the bottom. Yes. On the content slides that we used, there is a section where we talk about using the SSEs and the SSRs with your regression line. So you can use them or not. There were also formulas on how to calculate them. And in previous years, when you went to write in a venue, they would also have given you formulas. Some of those formulas are like the ones on that spreadsheet. So you can also use them to calculate, for example, your R squared. I'm just using R squared as an example. In order for us to calculate R squared, we need SSR, which is the regression sum of squares, divided by the total sum of squares. So we need to calculate SSR and SS total. SS total we can calculate by adding SSR and SSE. But if we are given X and Y values, how do we then calculate this? If I look at, let's take for example, SST, you can see that the formula is straightforward. SST is the summation of your Y value, which is these values, minus the mean, which, if I go up, will be the mean of Y. So for all those values, you take the Y value and subtract the mean. That is what I do here. And then square. So I do that for all of them. So this will be, come on. Okay, I'm going to delete that. I said.
So if you look at this, it says it's the Y minus the mean, and I just square the answer with that part. It squares the value. That gives you the answer for that one, and I do it for all of them and I add them. So the sum at the end is your SST, your total sum of squares. If you look at SSE or SSR, so let's, sorry, can I just stop you there? Yeah. If you look at the formula next to the whole table, the regression sum of squares. Yes. So when dividing, they use the total. This one. Yeah, which figures make up the 1.0 3.025? So that's why I'm coming to the SSR, because if you look at the SSR formula, let me make it bigger, you can see that Y hat is the estimated value. So in order for us to get this SSR, we need to estimate the Y value. If you look at this, this is the estimated Y value there. So how do we calculate that? I use this X value in the formula that we're using here. As you can see there, I'm taking our regression equation and I'm estimating what the value of Y will be for every one of them. I do that. So I estimate all these values, and then I come here and I use the estimated value and the mean of Y. So this will take, click on the right one. This takes the estimated value, subtracts the mean of Y and squares it, and then I calculate the sum at the end. So adding all of them gives SSR. So SSR is this value, twenty-one thousand, is it 21,000, 260,000 and 96, something like that. And SST is this value here, the 21,0623. That's SST, because it's the summation of your values, and that's what I do. Take the SSR, divide by the SST to calculate R squared, and that's the formula. So this is just additional information. In the exam you will not even use that. I don't think they will ever even ask you to do that.
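The spreadsheet steps described above (estimate each Y from the regression equation, then build SST, SSR and SSE as sums of squares) can be sketched in Python as follows. The five data points are made up purely for illustration and are not the values on the tutor's spreadsheet. The line is fitted with the usual least squares formulas.

```python
# Hypothetical data, chosen only to keep the arithmetic easy to follow.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least squares slope and intercept.
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
intercept = mean_y - slope * mean_x

# Estimated Y value (y hat) for every X value.
y_hat = [intercept + slope * xi for xi in x]

sst = sum((yi - mean_y) ** 2 for yi in y)              # total sum of squares
ssr = sum((yh - mean_y) ** 2 for yh in y_hat)          # regression sum of squares
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # error sum of squares

r_squared = ssr / sst

print(round(sst, 2), round(ssr, 2), round(sse, 2), round(r_squared, 2))  # 6.0 3.6 2.4 0.6
```

For this made-up data SSR plus SSE equals SST, so if a question gives any two of the three values, the third one and R squared can still be worked out.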
It's just that on the assignment, sorry to cut you, on the assignment, when I did it the first time, there was a question with SSR and SSE in it and I got confused. Oh, okay. So that's why I want to understand this whole SSR and SSE. So sometimes they have given you the X and Y values. If the question already gave you X and Y, you should be able to calculate them. But sometimes they do not, because they will give you SST and SSE, or SSR and SSE. You can calculate SST because SST, can I write on here? I cannot write. There was one where they had X and X1 and then a minus X as well. So now you have Y, Y minus Y. There was a question like that in one of the activities that we did, like this. So you see, those are not the SSTs and SSRs. So I'm just showing you: for example, if here they had given you SST as a value as well, you can use this to calculate your R squared or R or things like that. On the Excel, let me see, so yeah, but then on here I'm not doing the SSR and SST formula. Lizzie, if you look at your values for your SSR and SST, the 1, 4, 3, 6, they differ from the ones at the top that you have with the other calculations. So you've got two different value sets. Wait, what are you talking about now? Am I confused? So if you look at this value here, sorry, go down. Yeah, so you've got your X and Y values. It's 1, 4, 3, 6. They didn't change. It must be the same as the top one, because your calculations are based on these ones here. Yes, so I was just demonstrating. When you use this, you need to copy the values that you are using here to the bottom. You cannot use different values, so you need to make sure that you copy them. I didn't copy those ones back onto here. I was just demonstrating how they got calculated, because this doesn't change the top part: this is just the hard-coded thing and there are no formulas linking them.
So you will have to go and copy: if you want to use the same formulas, you will have to use the same values there. Can you go to the other side of the spreadsheet? Which other side now? This side? Yes. Okay, so this one is an Excel output, and that's why I have a note there that says this is not automated, it's done manually. I can redo it: for example, in order to get the same values, I can just redo it with these values there. You need to have activated the add-in for data analysis in your Excel, and then you use the regression tool, and from there you select your values. So I select my Y and input the Y values. Don't select the X values as your Y. You need to make sure that you select the right Y values. Keep the label and do not include the total. So our X must stick there. I've included the label because I need that. Don't worry about the bottom part. In terms of the output, I'm going to replace what I have there. You must make sure that when you click on the output range, the cursor is clicked inside that block there, and since I'm replacing, I'm going to click on the summary output and run the thing, and it will ask, do I want to overwrite? Yes, I want to overwrite, because I don't want the same information that is there. And as you can see, this should look exactly the same as what I have in the table. So my R is 0,9964, which is the same; my R squared should correspond to the R squared that is there. Then you can skip all this, because the only values you need are those two. So you can either use Excel, or you can use these formulas, or you can create your own Excel output. You must know that this is your y-intercept. If you followed all the recordings that we did, we already covered this as part of one of the sessions. In fact it is this value there.
So it's minus 6.290; as you can see, it corresponds. And this X value will be your slope; that's your value for the slope, which is that value there, and that looks exactly the same as that one. And yeah, so that is it in terms of this. I think this one you can re-create if you want to use it, but I was just using it as a demonstration of what you get when you take the output from Excel. So you can either use Excel in terms of that, or you can use the data sheet, this one, the automated manual calculations. It is still manual, but it's automated. But the challenge with this is, if you do something wrong here, or let's say you do it like this and you insert, you might look at the wrong information, because it moves everything that is on the Excel sheet, and sometimes it might mess up some of the calculations. Like if I look at this value, it says 6.29. If I go back to my Excel spreadsheet, let's see if it still kept all the values the same way. So it did keep them the same way. But sometimes, if you do something wrong with how you insert your new columns or your new rows there, especially your new rows, they might calculate differently. So you just need to be careful when you do this. That's why I added this, just to give you a guide in terms of how to work with the Excel spreadsheet. Alright, that concludes today's session. I am tired of talking, for the first time ever. Okay, any questions before we switch off the recording? Yes. Not related to today's class: do we have another one on Wednesday? Oh yes, on Wednesday. What are we doing on Wednesday? Oh, on Wednesday we're doing revision. So we'll be doing revision of assignment one, which had chapters 1, 2 and 3, so it means we're doing the assignment, or chapter 1, 2, 3 revision. It's not going to be a detailed revision, because we'll be going through the questions that were asked in the assignment. And remember also, when we do the revision on those assignment questions, the ones that I have might be
different from the ones that you wrote, because remember the questions are randomized. So I will get a different set of questions that is also randomized, because when the lecturer sent us the pack for assignment one, it will be different from what you received as individual students. But the questions are the same. It's just that maybe you got a 56 and then the other one got an 80, but the questions would have been exactly the same. So when we do the revision, I'll just give you hints in terms of how you should have tackled the question and what you need to look out for. This will be in preparation for the exam as well, because similar questions might be asked in the exam. Not the exact questions, but similar ones might be in your exam. So we need to go through those assignment questions. So on Wednesday we do chapters one, two and three, because I think your assignment was on chapters one, two and three, or was it chapters one and two, I can't remember, but it's based on assignment one. Maybe, depending on how long it takes us, we can also start with assignment two as well, and then carry on like that, because we don't have enough time to do each assignment individually. There are five, so it means we would have to use five sessions, and we don't have five sessions in September just to go through the assignments. Maybe we'll use two sessions or three sessions to go through all the assignments, and then in the others we'll start looking at the exam question papers, because I want to go through the format of the exam. The assignments don't help you with that, because in an assignment there were 10 questions or 15 questions that come from the same chapter, maybe eight come from the same chapter and 10 come from the same chapter, which does not help you when you write the exam. There you get two or three questions, and you need to know the logic of how you answer the question: from which chapter, from which section, how do we answer this, what formula do I need
to consider, and so forth. So we need to get into that, and soon we'll start doing a lot of exam prep questions, going through the typical exam paper. The more you do, the more you get used to how to write the exam. So with that, like I said, I'm tired of talking. Not that I don't want to talk to you guys; I think my voice is also going, and it makes my throat uncomfortable. You must keep vodka next to you, it helps. Yes, only if I knew where I could go and buy it. It's Saturday today, so the thing is closed. They only open the bars again on Thursday. Alright, if there are no other questions, then I will see you on Wednesday. Enjoy the rest of your weekend. Bye bye. See you. Bye.