Going to repeat what I just said: welcome to your online session, and I forgot to say, please also complete the register. I've posted it in the chat. If you have any question or comment, there are our email addresses. The first one is for the UNISA Western Cape region. If you have any question about the recordings, any technical issues, or where to access the notes, you can send an email there. For any question or discussion about the module content, you can send an email to me; that is my email address.

This month we are going to open up one-on-one consultations. If you need assistance and you want dedicated time, you will have to send an email to CTNTAT, copy me, and request that dedicated one hour of one-on-one time with me. When you send that email, don't just say "I am struggling with my research" or "I'm struggling with IOP2603704 content". It's not helpful. You have to have a specific question that you need assistance with, and if it is a question from a past exam paper, include that in your email as well. Even though you copied me, they will then know exactly who to send it to, because there might be other people who can assist, and they can book the session knowing the type of assistance you require. So please make sure that you write a detailed email. Don't just say "I'm struggling with hypothesis testing"; that is your whole module. Don't say "I'm struggling with probabilities"; that is the whole first part of your module. You need to be specific.
So if you want to have those discussions with me privately, you need to set up some time by sending an email to cityentat at unisa.ac.za. This is only available for people who attend the sessions, so before they schedule you for a one-on-one session they will check the register. If you do not complete the register and your name or student number is not on it, they might not give you priority when scheduling your consultation. So please make sure that you attend the classes and that you put your student number on the register.

So today we're going to look at correlation. In today's session I'm going to show you how to calculate the coefficient of correlation using a calculator, and also how to calculate it manually. Excuse me, yes, a quick question: some of us are not from the Western Cape, so will that be an issue for the one-on-one sessions? No. As long as you put your student number on the register and you are part of the sessions, you will get assistance. That is why I'm saying it's very important that you attend the sessions and put your student number on the register, so that when they book the sessions they can find your name. Okay, thanks.
Yes, we actually prioritise Western Cape students, but since we're doing this online we're not going to turn anyone away. If the majority of students asking for assistance are from the Western Cape, we will prioritise them first and the other regions will follow. We are also going to prioritise those who attend the classes, because we know that some students prefer not to come to class and wait for the recordings, and then when they don't understand something in the recording they want one-on-one attention. So we need to balance that out: you need to be part of the discussions in order for us to assist you.

Okay, so like I was saying, today we're going to deal with hypothesis testing for correlation, where we use the coefficient of correlation to test whether there is a relationship or not. I'm going to show you how to calculate the coefficient of correlation manually using the formula, how to calculate it on your calculator, especially if you have a scientific calculator, and how to interpret it. Today we are only looking at the relationship between two numerical variables. In the next session, two weeks from now, we're going to look at the chi-square test, which also deals with the relationship between two variables, but this time two categorical or nominal variables. In the last session of August we'll go back to basic probability, and for the sessions in September I will check which other topics we can cover to assist you.
Okay, so before I start with today's session, if you have any question or comment, now is the time. I know you've already asked your first question; any other questions? If there are no questions, we can continue and look at the relationship between two numerical variables by calculating and understanding their correlation.

With correlation, it's easy to visualise the two variables using what we call a scatter plot: we put one variable on the x-axis and the other on the y-axis, and we plot a point where each pair of values meets. I will show you a scatter plot. It shows you the relationship between the two variables, that is, between your independent variable x and your dependent variable y, and by looking at it you can say whether there is a relationship or not, and whether it is positive or negative.

The scatter plot lets you see the relationship; a correlation analysis measures it. If you calculate the coefficient of correlation, it gives you a number that tells you the strength and direction of that relationship: whether it is strong positive or strong negative, moderate or weak, perfect positive or perfect negative, or zero, meaning no relationship. We're going to learn how to interpret all those values as well. With correlation, you also need to remember that no causal effect is implied.
So you cannot say that X causes Y. For example, even if people who smoke tend to die earlier, the correlation on its own does not show causation. It only tells you whether there is a relationship, and how strong it is and in which direction.

So say we have two variables, X and Y, where X is the age of a child in years and Y is the child's weight in kilograms. These are just dummy data. The first child is one year old and weighs 4 kilograms. The second child is three years old and weighs 6 kilograms. The third is five years old and weighs 10, the fourth is also five but weighs 12, and the fifth is one year old and weighs 13 kilograms.

We can put the ages and weights of these children on a scatter plot, with the age of the children on the x-axis and the weight of the children on the y-axis. The first child is one year old and weighs 4, so we plot a point at 1 on the x-axis and 4 on the y-axis. The second child is three years old and weighs 6, so that is the next point. We place the points for all the children, and by looking at the plot we can see the relationship: as the children get older, their weight tends to increase. When one variable tends to increase as the other increases, this is a positive relationship, and we can see that just by looking at the plot.
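As a rough illustration only, the Python sketch below draws a text scatter plot of the five (age, weight) pairs from the example, to show the kind of picture being described. The helper `text_scatter` and the axis ranges are illustrative assumptions, not part of the module; in practice you would draw the plot by hand or in Excel.

```python
# Dummy data from the example: X = age in years, Y = weight in kg.
ages = [1, 3, 5, 5, 1]
weights = [4, 6, 10, 12, 13]


def text_scatter(xs, ys, x_max, y_max):
    """Return a crude text scatter plot: one row per whole y value, top row = y_max.

    Points with identical (x, y) would share a single '*'.
    """
    points = set(zip(xs, ys))
    rows = []
    for y in range(y_max, -1, -1):
        cells = "".join("*" if (x, y) in points else "." for x in range(x_max + 1))
        rows.append(f"{y:>3} | {cells}")
    rows.append("    +-" + "-" * (x_max + 1))                               # x-axis line
    rows.append("      " + "".join(str(x % 10) for x in range(x_max + 1)))  # x labels
    return "\n".join(rows)


plot = text_scatter(ages, weights, x_max=6, y_max=14)
print(plot)
```

Each `*` marks one child. The upward tendency is only rough (the one-year-old weighing 13 kg sits high on the left), which fits the weak r that the session calculates for these data.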
Like we said, the coefficient of correlation, which we denote by r, measures how well a straight line, that is, a linear equation, describes the relationship between your x values and your y values. To calculate r, we take all the x and y values from our data. If we add up all the x values we get their total, and if we add up all the y values we also get their total. Remember the formula for the coefficient of correlation on the slide; to use it, we need several sums. In the numerator, r uses n, the sample size, times the sum of xy, minus the sum of x times the sum of y. These two terms are different. The sum of xy means you multiply each x by its own y and then add all those products: x times y, plus x times y, and so on for every row. The sum of x times the sum of y means you add all the x values to get their total, add all the y values to get their total, and multiply the two totals. We then divide by a square root. Inside it, the first part is n times the sum of x squared, minus the square of the sum of x. The sum of x squared means you square every x value first and then add up the squares. The square of the sum of x means you take the total of the x values, the same total as in the numerator, and square it. That first part is multiplied by n times the sum of your y squared values, minus the square of the sum of y.
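Written out in standard notation, the computational formula for the coefficient of correlation described above is:

```latex
r = \frac{n\sum xy - \sum x \sum y}
         {\sqrt{\left[n\sum x^{2} - \left(\sum x\right)^{2}\right]\left[n\sum y^{2} - \left(\sum y\right)^{2}\right]}}
```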
For that y part, you square every y value and add the squares together to get the sum of y squared, and the term you subtract is the total of the y values, squared. So how do we do that? We can work it out with a table. The formula has the sum of xy, and we know that a sum means a total, so we create a column that calculates xy for each row: one times 4 gives us 4, three times 6 gives 18, five times 10 gives 50, five times 12 gives 60, and one times 13 gives 13. We add all of them, and the sum of xy is 145.

The next one is the sum of x, which means we add all the x values: one plus three plus five plus five plus one gives us 15. Then the sum of y means we add all the y values: 4 plus 6 plus 10 plus 12 plus 13 gives us 45.

We also need the sum of x squared. We take each x value and square it: one times one is 1, three times three is 9, five times five is 25, five times five is 25, and one times one is 1. We add all of them, because the summation sign means a total, and the sum of x squared is 61. We do the same for y to get the sum of y squared: 4 squared is 16, 6 squared is 36, 10 squared is 100, 12 squared is 144, and 13 squared is 169. Add all of them and you get 465. Once you have all these values, you can substitute them into the formula.
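The table of column totals worked out above can be reproduced in a few lines of Python. This is a minimal sketch using the session's dummy data; the variable names are illustrative choices, not anything from the module.

```python
# Dummy data from the example: X = age (years), Y = weight (kg).
x = [1, 3, 5, 5, 1]
y = [4, 6, 10, 12, 13]

n = len(x)                                     # number of rows (children)
sum_x = sum(x)                                 # Σx
sum_y = sum(y)                                 # Σy
sum_xy = sum(xi * yi for xi, yi in zip(x, y))  # Σxy: multiply each pair, then add
sum_x2 = sum(xi ** 2 for xi in x)              # Σx²: square each x, then add
sum_y2 = sum(yi ** 2 for yi in y)              # Σy²: square each y, then add

print(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # 5 15 45 145 61 465
```

Note the difference between `sum_x2` (square first, then add) and `sum_x ** 2` (add first, then square); the formula needs both.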
The other thing you need to remember is n, because the formula also has n. Our n is the number of records, that is, the number of rows we have, not the total of the x and y values. There are one, two, three, four, five children, so in this instance we have five rows and n is 5.

Now we substitute these values into the formula. In the numerator we have n, which is 5, times the sum of xy, which is 145, minus the sum of x, which is 15, times the sum of y, which is 45. We divide by the square root of n times the sum of x squared minus the square of the sum of x, that is, 5 times 61 minus 15 squared, multiplied by n times the sum of y squared minus the square of the sum of y, that is, 5 times 465 minus 45 squared, which is 45 times 45. Once we have substituted all the values, we simplify: the numerator is 50, the denominator is the square root of 80 times 300, and we get 0.3227. That is the coefficient of correlation. Any question?

Like I said, you are not going to be required to calculate this in the exam, but you need to know how it was obtained when you're doing your revision and studying. Now I'm going to use the calculator to find the same values, just to show you that on the calculator it's very quick and easy. You can also use Excel: you put the values of x and the values of y in two columns on your Excel sheet, the way you see I have them here, then you go to Data Analysis, select Correlation, and it will give you the correlation of your x and y. On your calculator, you first need to put the calculator in STAT mode.
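The manual substitution into the formula can be checked with this short Python sketch. It starts from the column totals found earlier and follows the formula term by term; the intermediate names are illustrative.

```python
import math

# Column totals for the example (n = 5 children).
n, sum_x, sum_y, sum_xy, sum_x2, sum_y2 = 5, 15, 45, 145, 61, 465

numerator = n * sum_xy - sum_x * sum_y          # 5*145 - 15*45 = 50
sxx_part = n * sum_x2 - sum_x ** 2              # 5*61 - 15**2 = 80
syy_part = n * sum_y2 - sum_y ** 2              # 5*465 - 45**2 = 300
r = numerator / math.sqrt(sxx_part * syy_part)  # 50 / sqrt(24000)

print(round(r, 4))  # 0.3227
```

If you have Python 3.10 or later, `statistics.correlation(x, y)` from the standard library returns the same Pearson coefficient directly from the raw lists.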
To do that on this calculator, I have the MODE button. Sorry, my calculator was already in STAT mode because I assist a lot of students. When you press MODE it shows 1 for COMP, 2 for STAT and 3 for TABLE. We want STAT mode, so we press 2, the button for statistics, and we get the list of statistical functions. The first option, 1-VAR, is for one variable, where you calculate things like the mean and the standard deviation. Because we're dealing with two variables, we use number 2, A+BX, which is the linear equation, or what we call the regression line, y = a + bx, where a represents the y-intercept and b represents the slope. The slope b and the coefficient of correlation r always have the same sign, so both tell you whether the relationship is positive or negative. The slope tells you how much y changes when x increases by one unit, whereas the coefficient of correlation tells you the strength of the relationship.

So to get the line and the coefficient of correlation, we press 2 and get a table. To enter a value, we type the number and press the equals sign. We do only the x values first; once we're done, we use the arrows to go back up the table and capture the values of y. It's easier that way. So we say one, equals. When you press equals, it places the value in the table. Three, equals, and it puts the three in the table. Five, equals. Five, equals. And the last one is one, equals.

Sorry, are you not able to see the calculator? Oh, sorry, let me share my screen. My bad. I need to share my entire screen. I will start again. So I'm using the calculator whose link you should have on your phones.
You should have this calculator on your phones because I sent you the link. So let's put our calculator in STAT mode: first press the MODE button, then press 2 for STAT. What I was explaining is this part: after you press 2, you get this menu. Number 1 is for one variable, where you calculate the mean and the standard deviation, for the population or for the sample, and so forth. Number 2 calculates the regression line, the straight line that you can draw through your scatter plot, where A is your y-intercept and B is your slope. The slope and the coefficient of correlation have the same sign, so both tell you the direction of the relationship, whether it's negative or positive.

So to calculate the coefficient of correlation, we press 2 and get this table. To capture a value in the table, you type the value and press the equals sign. We do one variable at a time: first X, and then we come back and do the Y variable. It's very important to capture the values exactly as you see them. Do not mismatch the values, otherwise you will not get the right answer. So we go one equals, three equals, and every time you press equals it stores the value in the table. Five equals, five equals, one equals. Now I have all five of my X values. Using my arrows, I first move across to the Y column, then go up until I get to the first row. When I get there, I can start capturing: four equals, six equals, 10 equals, 12 equals, 13 equals. Like I said, you need to make sure that the pairs correspond, for example the last 1 goes with 13 and the second 5 goes with 12. Now I am ready to do my calculations. Once you have captured all the values, they are stored on your calculator, so you can press the AC button.
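The calculator's STAT menus (Sum, Var and Reg) report quantities that can also be computed directly. The Python sketch below is an illustration assuming the same five data pairs; it does not model any particular calculator, and the menu positions mentioned in the session belong to the Casio used there. It computes n, the means, the population and sample standard deviations, the line y = A + Bx, and r.

```python
import math

x = [1, 3, 5, 5, 1]
y = [4, 6, 10, 12, 13]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squared deviations and cross-products about the means.
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

pop_sd_x = math.sqrt(sxx / n)            # population SD of x (divide by n)
sample_sd_x = math.sqrt(sxx / (n - 1))   # sample SD of x (divide by n - 1)
pop_sd_y = math.sqrt(syy / n)
sample_sd_y = math.sqrt(syy / (n - 1))

b = sxy / sxx                   # slope B of the line y = A + Bx
a = mean_y - b * mean_x         # y-intercept A
r = sxy / math.sqrt(sxx * syy)  # coefficient of correlation

print(n, mean_x, mean_y)                      # 5 3.0 9.0
print(round(a, 4), round(b, 4), round(r, 4))  # 7.125 0.625 0.3227
```

The slope and r share the same numerator, `sxy`, and both denominators are positive, which is why B and r always have the same sign.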
Now I am ready to calculate all these values: the sum of xy, the sum of y and so on. Press SHIFT, that button there, and then STAT, which is on button number 1, and you get this menu on your calculator. Option 1 is Type, which lets you change the type of statistical calculation; I'm not interested in that. Option 2 is Data, which takes you back to the table; we're not interested in the table. Option 3 is Sum. If I press 3, you will see all the sum functions. If I want the sum of xy, remember it was 145, and that is on button number 5. So if I press 5 and then equals, I get the same answer that I have in my table. If I need the sum of x squared, meaning the sum of the squared x values, I go back to SHIFT, STAT, 3 for Sum, and it is number 1. In the table we got 61, and when I press equals I get 61, the same. You can press AC first, or you can just continue like I did; I continued without clearing because the values are already stored on the calculator. So SHIFT, STAT, Sum, and the sum of x is on button number 2. If I press 2 and equals, I get 15, which is what I got when I calculated it manually. That's how you get the values of the sum of x, the sum of y, the sum of x squared and the sum of xy. You can also get the value of n, but not from SHIFT, STAT, 3; if you scroll down, it isn't there. You have to go to Var, which is option 4. Var gives you the sample size n, and it gives you the mean of x, which, as I calculated, was 3.
So if I press 2, I get 3 as well, which is the mean of x. I can also find the mean of y: SHIFT, STAT, 4 for Var, and the mean of y is on button number 5. That gives me 9, because 45 divided by 5 is 9. If I go back to SHIFT, STAT, 4, you can also calculate the standard deviation of x and of y, either for the population or for the sample. Because this is a sample, we're not going to use numbers 3 and 6, the population standard deviations; we use 4 and 7, the sample ones. But right now I'm not interested in the sums, the standard deviations and the means. I'm interested in finding r. So let's go find r.

Someone has their hand up. Yes. When I'm presenting, I can't see all the hands, so you need to unmute and stop me to ask a question. Can you hear me? Yes, I can hear you. When I'm doing the first column, the X values, one, three, five, the table has X and Y, but the funny thing is that it goes on the side. It never goes up. You mean on your calculator? Yes, on my calculator. Are you using a Casio? Yes, I'm using a Casio. You have the arrows? Yes, I've got the arrows. So if you first go across and then use the up arrow, does it go up? No, nothing happens. When I type one, three, it just goes on the side. It never gets into the X block; it stays underneath the 1, 2, 3 on the left side of the block. I don't know what kind of Casio it is. It's the same as the one you have there in front, with the fraction button.
Your Casio has this fraction button. Okay. It might be that your calculator needs to be reset. Can we take this offline? After the session I can assist you with your calculator. Are you on the WhatsApp group? Yes, I'm on the WhatsApp group. Okay, then we can take it offline. No problem. It's just that I don't want the session to be about how to use your calculator, because it's not about that. I just felt I needed to show you other ways to do this when you practise the activities, because in your study guide you are required to know how to calculate it, although in the exam you're not going to calculate it.

So let's calculate r. To calculate r, we go back to SHIFT, STAT, that is, SHIFT 1. Now we use Reg, which is regression, on button number 5. If I press 5, I get, remember, A and B, my intercept and my slope, and I also have r. So you capture the values of X and Y on the calculator, then press SHIFT, STAT, 5 for Reg, 3 for r, and equals, and that gives you the same answer. Instead of using this complex formula, I can find it in three steps, SHIFT 1, 5 for Reg, 3 and equals, and that calculates the whole formula. And I have my r.

Now, since I know that my coefficient of correlation is 0.32, which is 32%, how do I interpret it? What does 0.32 mean? To know that, you need to know that the coefficient of correlation is a decimal, and we can take it back to probabilities.
With probabilities, we said they are between zero and one. The coefficient of correlation can be between minus one and one, and never outside that range: your r can only be between -1 and 1. So if you do the calculation and get a value of 2.8, you know that something is wrong with your calculation.

Now, if the dots on your scatter plot follow this pattern, where they all lie perfectly in a straight line, and you calculate the coefficient of correlation, you will get r = 1, which means all the points lie exactly on a straight line. If the scatter plot goes up like this, so that when the values of X go up the values of Y go up, we say this is a positive relationship. And since r equals 1, we say this is a perfect positive relationship.

If the plot looks like this, with the dots scattered everywhere, there is no clear relationship, because you cannot make out whether the values of Y go up when the values of X go up. They somewhat do, but we can't say that confidently. The relationship is still positive, because it still goes up: for some of the higher values of X, the values of Y are also higher. Because the dots are scattered like this, r is 0.18, and when r is 0.18, or less than 0.5, we say the relationship is weak. Because it's closer to zero, we say this is a weak positive relationship. If the plot looks like this, the values are again going up roughly in a straight line.
If I draw a straight line there, you can see that the points almost follow it. It's not 100%, it does not look exactly like the first one, but there is a relationship, because when the values of X increase, the values of Y also increase. When you calculate r from these points, you get 0.85, and we say this is a strong positive relationship.

Sometimes the relationship looks like this, going the other way. The previous ones were going up; this one is going down. When the values of X increase, the values of Y decrease. As you can see, at this point the X value is high but the Y value is low, and at that point the X value is also high and the Y value low, compared with this one, where the Y value is high but the X value is low. When you calculate r for this type of scatter plot, you get -0.92, and we say this is a strong negative relationship, because it declines: when the values of X increase, the values of Y decrease.

So in a nutshell, r lies between -1 and 1. If r equals exactly 1 or -1, we say it is a perfect relationship, positive or negative. The closer r comes to 1 or -1, the more strongly related your X and Y values are.
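To see the two extremes described above, the sketch below applies the session's computational formula to made-up points that lie exactly on a rising line and exactly on a falling line. The helper `pearson_r` and both lines (y = 2x + 1 and y = 20 - 3x) are illustrative assumptions chosen only because every point falls on a straight line.

```python
import math


def pearson_r(x, y):
    """Pearson's r using the computational formula from this session."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    num = n * sum_xy - sum_x * sum_y
    den = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return num / den


xs = [1, 2, 3, 4, 5]
up = [2 * v + 1 for v in xs]     # points exactly on a rising line
down = [20 - 3 * v for v in xs]  # points exactly on a falling line

print(pearson_r(xs, up))    # 1.0
print(pearson_r(xs, down))  # -1.0
```

Any r outside -1 to 1 from this function would point to a data-entry or arithmetic slip, just as a hand-calculated 2.8 would.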
The further r moves away from 1 or -1, the weaker the relationship. At some point you might get a relationship that looks like this, with no pattern that can be defined, where r might be 0.05. Or you might find that the relationship looks like this, a flat line where the Y value stays constant: whether the value of X increases, even to 100, the value of Y stays the same. Then there is no relationship between your X and your Y. (Strictly, if Y is exactly constant, r cannot be calculated because the denominator of the formula is zero, but the conclusion is the same: there is no linear relationship.) So the closer r gets to zero, whether it is positive or negative, the weaker the relationship, until eventually there is no relationship.

In your module, you also need to know that from r you can calculate R-squared, the coefficient of determination. This is a very important statistic, because it tells you how much of the total variation in Y is explained by the variation in X. In statistical terms, it indicates the proportion of the variance in Y that is attributable to the variance in X. So if I have r = 0.18 and I need R-squared, I just take 0.18 and square it. Let's use our own example, where r = 0.32. This is a weak relationship: if we have to interpret it, a weak positive relationship exists between the X values and the Y values.
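The sketch below turns r into a verbal label and computes R-squared. The helper `describe_r` is hypothetical, and its cut-offs are one reading of the rough guide given in this session (below 0.15 none, 0.15 to 0.5 weak, 0.5 to 0.65 moderate, 0.65 and above strong, exactly 1 perfect); other textbooks use different thresholds, so treat them as a convention rather than a rule.

```python
def describe_r(r):
    """Label r with rough cut-offs based on this session's guide (a convention, not a rule)."""
    if not -1 <= r <= 1:
        # r outside [-1, 1] means the calculation went wrong somewhere.
        raise ValueError("r must lie between -1 and 1; recheck the calculation")
    size = abs(r)
    if size < 0.15:
        return "no (or almost no) linear relationship"
    if size == 1:
        strength = "perfect"
    elif size >= 0.65:
        strength = "strong"
    elif size >= 0.5:
        strength = "moderate"
    else:
        strength = "weak"
    direction = "positive" if r > 0 else "negative"  # the sign gives the direction
    return f"{strength} {direction} relationship"


r = 0.3227               # coefficient of correlation from the example
r_squared = r ** 2       # coefficient of determination
print(describe_r(r))     # weak positive relationship
print(round(r_squared, 3))  # 0.104, i.e. about 10% of the variation in Y
```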
Now, to calculate the coefficient of determination, so that I know how much of the variance in Y is attributable to the variance in X, I press the x² button and then equals, and it tells me that only about 10% of the variance in Y is attributable to the variance in X. That's how you interpret R-squared.

In more detail, you can use the following as a guide to interpret the strength of the correlation. If r is exactly -1 or +1, it is a perfect linear relationship: perfect negative if it is -1 and perfect positive if it is +1. If r is -0.7, we say it is a strong negative relationship. If it is -0.5, we say it is moderate. If it is -0.3, we say it is weak, and if it is 0, we say there is no relationship. The exact cut-offs are up to you to define. In this instance I would say anything between 0.50 and 0.65 is moderate, and anything between 0.65 and 0.85 is strong. Values from about 0.85 or 0.90 upwards are very close to perfect, although strictly only exactly 1 is perfect, and 0.90 can also just be called strong. Anything below 0.15 you could describe as no relationship, or at most a very weak one: an r of 0.00 or 0.08 corresponds to no relationship, and an r of 0.10, or 10%, to almost no relationship as well.
An r of 0.20 would be a weak, or very weak, relationship. It depends on how the question is asked in your exam: look at the scale and decide whether the value given is weak or very weak, and whether it is positive or negative. The sign tells you the direction: if there is a minus in front, it is a negative relationship; if there is no sign in front, so the value is positive, it is a positive relationship. That is how you interpret the coefficient of correlation. Once you know the value of r, you can use hypothesis testing to test whether there is a relationship. The null hypothesis states that there is no relationship between X and Y, and we write it with the Greek letter ρ (rho), which looks a bit like a p and represents the population correlation coefficient: H0: ρ = 0. The alternative hypothesis says there is a relationship, H1: ρ ≠ 0, because if ρ is not equal to zero, the population correlation is some nonzero value, positive or negative. To make a decision, we need to calculate the test statistic. For that we need the value of the population correlation coefficient ρ stated in the hypotheses (they will give you that; here it is 0), and r, the sample correlation coefficient, which you will have calculated. You substitute them as r minus ρ, divided by the standard error, which is the square root of one minus r squared (your coefficient of determination) divided by the degrees of freedom, n − 2.
In symbols, t = (r − ρ) / √[(1 − r²)/(n − 2)]. Remember that r can be either negative or positive, depending on how the values are related. Once you have calculated the test statistic, you find the critical value or the p-value and make a decision based on that. So let's look at this example. We need to test whether there is evidence of a linear relationship between X and Y at the 5% level of significance. The question asks us to test the relationship, and it gives us the level of significance, α = 0.05. The first step, as with all the hypothesis tests we have been doing, is to state the null and alternative hypotheses. The null hypothesis states that there is no correlation, H0: ρ = 0; the alternative states that a correlation exists, H1: ρ ≠ 0. We have α = 0.05, and the degrees of freedom are n − 2; using the same example as before, with n = 5, that gives 5 − 2 = 3. Next, calculate the test statistic. Always remember that the population parameter ρ is given in the statement of the hypotheses, and here it is 0. Our r was 0.3227, and I am just using two decimals here: 0.32 minus 0, divided by the standard error, which is the square root of one minus 0.32 squared, divided by 5 − 2. That gives a test statistic of 0.585. Once we have the test statistic, we find the critical value so that we can make a decision. This is a two-tailed test, or what do you call it?
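The test statistic in this example can be reproduced with a short Python sketch of t = r / √[(1 − r²)/(n − 2)], which is the ρ = 0 case used throughout this session (so r − ρ is just r). The function name correlation_t is a hypothetical label; r = 0.32 and n = 5 are the values from the worked example.

```python
import math

def correlation_t(r, n):
    """t statistic for H0: rho = 0 against H1: rho != 0, with n - 2 degrees of freedom.

    Because the hypothesized rho is 0, the numerator r - rho is just r.
    """
    if n <= 2:
        raise ValueError("need at least three pairs so that n - 2 > 0")
    standard_error = math.sqrt((1 - r ** 2) / (n - 2))
    return r / standard_error

t_stat = correlation_t(0.32, 5)   # r rounded to two decimals, n = 5 pairs
print(round(t_stat, 3))           # 0.585
```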
A non-directional test, because the alternative hypothesis uses the "not equal to" sign, which gives us a non-directional or two-tailed test. Since it is two-tailed, there are two rejection regions, so we divide α = 0.05 by two, which gives 0.025. Our degrees of freedom are 3. If we go to the t table and find the column for 0.025 and the row for 3 degrees of freedom, we find the critical values ±3.1824. You do not need to know how to look up the critical value or the p-value; they will be given to you in the exam, just as I am giving you the critical value here. Those values mark the rejection regions. We take our test statistic and locate where it falls. If it falls in the white area, we do not reject, because the white area is the "do not reject H0" region. But if it falls in the shaded areas on the outside, we reject the null hypothesis. Please make sure that you are muted. So where does our 0.585 fall? We know the critical values are here. My arrow on the slide is pointing at the wrong place; 0.585 should be here, in the "do not reject" area. So the decision is not to reject the null hypothesis, and in conclusion we state that there is not sufficient evidence of a linear relationship, or linear association, between X and Y at the 5% level of significance. We always have to go back and relate the conclusion to the null hypothesis. The null hypothesis said there is no relationship, so we can clearly state that there is not sufficient evidence that a linear relationship exists. I have a quick one. Yes.
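The two-tailed decision rule can be written as a tiny Python sketch. Like the exam, it takes the critical value as given (3.1824, the table value quoted above for 0.025 in each tail and 3 degrees of freedom) rather than computing it; decide_two_tailed is a hypothetical name, and the −3.5 in the second call is a made-up value to show the other branch.

```python
def decide_two_tailed(t_stat, critical_value):
    """Two-tailed decision: reject H0 when t falls in either shaded tail,
    i.e. when |t| is at least the (positive) critical value."""
    if abs(t_stat) >= critical_value:
        return "reject H0"
    return "do not reject H0"

# Critical value for alpha / 2 = 0.025 with 3 degrees of freedom, as read from the t table
print(decide_two_tailed(0.585, 3.1824))   # do not reject H0
print(decide_two_tailed(-3.5, 3.1824))    # reject H0 (falls in the left tail)
```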
Where do the −3.1824 and the +3.1824 come from? From the critical value table. In the same way, in your exam they might give you the p-value instead. I am using the critical value here, but they can also give you the p-value. I don't know what the p-value is for this example, because I would have to calculate it to give you the answer. With α = 0.05, for us not to reject the null hypothesis, the p-value needs to be greater than α. So let's say our p-value was 0.250; I'm making it bigger. With a p-value of 0.250, we need to make a decision. The decision rule says that if your p-value is less than α, you reject the null hypothesis. For this two-tailed test, the p-value of 0.250 is greater than 0.05, so we do not reject the null hypothesis. That is how you would use the p-value to make your decision. They will give you that value, because it has to come from statistical tables or software, or they will give you the critical values, which also come from a statistical table. Okay, that is how you do a hypothesis test for the correlation coefficient. Any questions? Then we can do some exercises. Okay, in the absence of questions, let's go on and answer the questions. Remember, with the activities, if there are any calculations I will give you time to do them, but if there are no calculations, we can unmute and have a discussion. So, the first exercise. A group of hospitalized patients who have been diagnosed as suffering from dementia are treated with certain drugs over a period of time. These drugs were prescribed to improve their mental alertness.
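The p-value version of the same decision is even shorter. This sketch assumes the p-value is handed to you, as in the exam; 0.250 is the made-up value from the explanation above, 0.010 is an extra made-up value to show the other branch, and decide_by_p_value is a hypothetical name.

```python
def decide_by_p_value(p_value, alpha=0.05):
    """Reject H0 when the p-value is less than alpha; otherwise do not reject."""
    return "reject H0" if p_value < alpha else "do not reject H0"

print(decide_by_p_value(0.250))   # do not reject H0, since 0.250 > 0.05
print(decide_by_p_value(0.010))   # reject H0, since 0.010 < 0.05
```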
A researcher studied a random sample of 30 of these patients who had been on these drugs for varying amounts of time, hoping to establish a relationship between the number of days of drug treatment and the patients' scores on a mental alertness test. Which is the correct way to express the appropriate null hypothesis for this research? Do we state it as ρ = 0, as μ = 0, or as r = 0? One, ρ = 0. We use number one, as we did in the example: we always use the population parameter, so we write ρ = 0. For the next question I'm not going to read the statement again, because it is the same statement: which is an appropriate test to determine the significance of the relationship between the number of days the drug was administered and the score on the mental alertness test obtained by the sample of patients? Now, think very carefully. What did we use? For correlation we use a t test. Number three. Number two, a t test based on Pearson. Yes, it is the test based on Pearson, and good that you saw that immediately, because I was trying to confuse you by saying only that we use a t test. I mentioned Pearson only once, but we always use the t test to test the Pearson correlation coefficient, and you must always remember that. So we are not using a one-sample t test. Remember that for the one-sample t test, the test statistic is t = (x̄ − μ) / (s/√n): the sample mean minus the population mean, divided by the standard error, which uses the sample standard deviation s rather than the population standard deviation. We are not using that here; that is the test for one sample, so it won't be correct. The next time I meet you, we will be doing the chi-square test.
So the only correct answer here is the test based on the Pearson product-moment correlation coefficient. We're going to skip the next question because it is not related to what we are doing today. Which of the following can take a value of −0.5? One, the probability? Two, the level of significance? Three, the coefficient of correlation? Four, the variance? Is it not three? It is three indeed, because a probability can only lie between zero and one. The level of significance is your α, and it can never be negative. The variance can also never be negative, because the variance is the sum of each X value minus the population mean, squared, divided by N: σ² = Σ(x − μ)²/N. Because we are summing squares, the variance can never be negative. So the only correct answer is the coefficient of correlation, because the coefficient of correlation lies between −1 and +1. If there is no relationship at all between two variables X and Y, which of the following would be the most likely value of the Pearson correlation coefficient r? Three. It is three, because number one tells you that there is a perfect negative relationship, number two tells you that there is a moderate positive relationship, and number three tells you that there is no relationship. A researcher suspects that children's level of anxiety during a test will interfere with their memory. He gives a list of items to be memorized to a sample of children, and then gives them a test to see how many items they can remember. Directly afterwards, he also measures the level of anxiety of each child with an anxiety scale, where a higher score shows a greater level of anxiety. The researcher draws a scatter plot of the relationship between the level of anxiety and the number of items recalled.
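To see why the variance can never be negative, here is a minimal Python sketch of σ² = Σ(x − μ)²/N applied to a few made-up values, including negative ones. The function name population_variance is an illustrative choice.

```python
def population_variance(values):
    """sigma^2 = sum((x - mu)^2) / N.

    Each term (x - mu)^2 is a square, so it is never negative,
    and neither is their average.
    """
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

print(population_variance([-5, -2, 0, 3]))   # 8.5, even though most values are negative
print(population_variance([2, 2, 2]))        # 0.0, the smallest a variance can be
```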
The results are shown here in the scatter plot of the level of anxiety against the number of items remembered. The question is: what can the researcher infer from the graph about the relationship between the level of anxiety and the number of items remembered? There is a relationship. It's definitely a negative relationship. I'm not sure whether it's one or three, but I think it's one; it's definitely a negative relationship. Okay, let's look at every statement. We know it's a negative relationship, so numbers two and four are out, and we are left with one and three, because their first part is correct. Now the second part. Without looking at the values, what does the graph tell me? It tells me that the higher the level of anxiety, the fewer items the children remember. Number one says that as anxiety rises, fewer items are remembered. Number three says that fewer items are remembered as anxiety falls. What I understand from number one is that as the level of anxiety increases, the number of items remembered decreases, which is the same thing I have just said. Number three says fewer items are remembered as anxiety falls, but on this graph fewer items go with higher anxiety, not lower. If fewer items went with less anxiety, and more items with more anxiety, that would have been a positive relationship. So the only correct answer is number one, because it clearly states what you are seeing here: the higher the level of anxiety, the fewer items the children remember. As anxiety rises, fewer items are remembered. Any questions?
Do you agree? Are you happy? Definitely. Thank you. I'm not going to read the same statement again, because this is a continuation. Suppose the researcher uses the data to calculate the Pearson correlation coefficient to determine the size of the relationship between the number of items remembered and the level of anxiety. Which of the following descriptions of the expected value of r seems the most appropriate? You need to think very carefully about this one. Is r equal to zero? Is r greater than zero? Remember, if it is greater than zero, it is a positive relationship. Is r less than zero, which is a negative relationship? Or is r not equal to zero? I think it's three: r is less than zero, it's negative. Yes, r will be a value less than zero, because this is a negative relationship. Quick one. Yes. In relation to the alternative hypothesis, what about number four? If r is not zero, it can be positive or it can be negative. Yes. So what about that one? It can be positive, but it can be negative as well. No, on this graph it cannot be either negative or positive; it can only be one of them. What I'm saying is that the "not equal to zero" sign means it can be either positive or negative, so with reference to number three, where it's negative, number four can fit as well, because if it's not zero, it can be negative. Yes, but remember they are asking you specifically in relation to this graph. Okay. You know that on this graph it can never be a positive relationship, because we can see that it is a negative relationship. The question says: suppose the researcher uses the data; which of the following would be a probable description of r?
So you need to pay very close attention to the question. If they had only asked which description of the correlation coefficient would be correct given that there is a relationship, then it could have been either positive or negative, but it could not be "no relationship", because they said there is a relationship. With this question, though, they have already shown you a negative relationship, so you need to relate your answer to that. Okay, but I'm just thinking that since it's not zero, it can be negative as well, but I hear you, I understand. Yes. "Not equal to zero" says there is a relationship, but with respect to the question asked, it is a negative relationship. Okay, understood, thanks. Next: which of the values given below is the closest to the probable value of the Pearson product-moment correlation? Now I have to take back what I said earlier, that you are not expected to know how to calculate r, because here they are asking you to calculate r. Sorry, I didn't pay attention to all the questions I posted for today. Okay, it's easy to do. We are going to use our calculators, for those with a Casio calculator. I'm not going to look at the values yet; I just want to make sure that I have my X and Y. When you answer in the exam, and there may be more than one question asking you to calculate something, make sure that you clear your calculator of any stored values. At the moment I still have the old values from my table. So if I go to three, I should find... not that. Sorry, I pressed the wrong button.
So we have our table. Remember, you press MODE, then 2 for STAT, then 2 for A+BX, and you get the table. I'm going to enter all the X values, each number followed by the equals sign: 1 =, 2 =, 3 =, 4 =, 5 =, 6 =, 7 =, 8 =. There should be eight of them: one, two, three, four, five, six, seven, eight. Then move across to the Y column and go up, up, up to the first row, and enter 8 =, 7 =, 6 =, 5 =, 4 =, 3 =, 2 =, 1 =. Now I have all the values, so I press AC. Now I'm ready to calculate r. Press SHIFT, then STAT, which is button number 1, then 5 for Reg (regression), then 3 for r, and press equals, and there is your answer. Otherwise you can use the formula r = [nΣxy − (Σx)(Σy)] / √{[nΣx² − (Σx)²][nΣy² − (Σy)²]}: n times the sum of XY, minus the sum of X times the sum of Y, divided by the square root of (n times the sum of X squared, minus the square of the sum of X) times (n times the sum of Y squared, minus the square of the sum of Y). To use it, you first calculate the totals; here n is 8. For the sum of XY, you multiply X by Y in each row: one times eight is 8, two times seven is 14, three times six is 18, and so on, and then you add them up, because all the summations are totals. The sum of X is the total of the X column, and the sum of Y is the total of the Y column. The sum of XY is the total of all those products, which I'm leaving for you to do. Then you also need X squared: 1, 4, 9, 16, and so on to the end, and then you calculate the sum of X squared. Then you do Y squared, which starts with eight times eight.
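The column-totals method described above can be sketched in Python as follows. It builds the same totals you would write at the bottom of the table (n, Σx, Σy, Σxy, Σx², Σy²) for the eight pairs in this exercise and plugs them into the computational formula; the function name pearson_r_from_totals is illustrative.

```python
import math

def pearson_r_from_totals(xs, ys):
    """Computational formula:
    r = [n*Sxy - Sx*Sy] / sqrt([n*Sx2 - Sx**2] * [n*Sy2 - Sy**2]),
    where Sx is the sum of x, Sxy the sum of x*y, Sx2 the sum of x squared, and so on.
    """
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # total of the XY column
    sum_x2 = sum(x * x for x in xs)               # total of the X-squared column
    sum_y2 = sum(y * y for y in ys)               # total of the Y-squared column
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [8, 7, 6, 5, 4, 3, 2, 1]
print(pearson_r_from_totals(x, y))   # -1.0
```

For these data the totals are n = 8, Σx = Σy = 36, Σxy = 120 and Σx² = Σy² = 204, so the numerator is 8(120) − 36(36) = −336 and the denominator is √(336 × 336) = 336, giving r = −1, the same answer as the calculator.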
Eight times eight, now my mind is stuck... eight times eight is 64. So 64, 49, 36 (six times six is 36), and you do all of them and then calculate the sum of Y squared. Could it not also work by just looking at the values? If you look at the table, Y starts very high and then goes down. It can work. Yes, it can work, because it shows that when the value of X increases, the value of Y decreases, and because Y drops by exactly one every time X goes up by one, the points lie on a straight line. That's another way of looking at it. Good point. So you can use your calculator, use the formula, or look at the values. Happy? Happy. A researcher hypothesizes that drug treatment of hospitalized schizophrenia patients improves their mental alertness. He studied a random sample of 27 patients to see whether there is a relationship between the number of days of drug treatment and the patients' scores on a mental alertness test. It seems like we're repeating this question. Which is an appropriate null hypothesis for this test? Four. Number four. What I have also realized is that in your module, most questions are repeated across multiple past exam papers. This is the same as the question we looked at in the beginning, but it came from the October 2017 past exam paper, and it was also asked, almost exactly, in your Tutorial Letter 101 this year; the only difference is that the exam paper had 30 patients and this one has 27. So you can easily get 100% in your module, if you practise and know what you're doing. Which of the following is suitable for representing age versus the height of a group of children? Age is a number, and height is a number. What visualization can you use to show a relationship between two variables? A scatter plot? Both are numerical.
We can use a contingency table for nominal variables, and a histogram for one numerical variable. I'm not sure what your module uses, because I joined your module in the middle: do you use interval or ratio, or do you say numerical? We will get to that when we go back to the beginning of your module and look at the variables. But this is numerical, whether it's ratio or interval; I think you use interval. A histogram is for one interval variable, and here we have two interval, or two numerical, variables, so we use a scatter plot. Which of the graphs below is most likely to represent a Pearson correlation of r = +0.85 between variables X and Y, if the measurements are plotted on a scatter plot: graph A, graph B or graph C? That will be A. A. That will be A. Yay. Because this one is negative, and this one would be close to 0.35 or 0.15. So it is graph A. This one is positive; I just want to make sure that I write it there. And this one is negative. And this one, I'm not sure whether to call it positive, negative or no relationship; it depends on the overall trend of the points. A negative correlation between X and Y implies that a person scoring low on X will generally score ___ on Y. High. Yes, it's option number three, because it's negative: when one increases, the other decreases, and when one decreases, the other increases. So if X is low, Y will be high, and when X is high, Y will be low. That is a negative relationship. The Pearson correlation coefficient can take values ranging between ___. Three. That will be three. Number one would be correct for probabilities, and I'm not sure what number two would be correct for. The Pearson correlation lies between −1 and +1. Pearson's r represents ___.
One, a comparison between the observed frequencies and the frequencies expected if the null hypothesis is true for the distribution of the data across the variables. Two, the relationship between two variables when the way in which they vary together is compared to their individual variances. Three, the difference between two variables relative to their pooled standard deviation. Did you give an answer? Someone has not muted their mic; it has been like this for some time. So which one represents Pearson's r? Is it a comparison between observed and expected frequencies? Is it the relationship between two variables when the way in which they vary together is compared to their individual variances? Or is it the difference between two variables relative to their pooled standard deviation? It's two, the relationship between two variables. It is two, because number one refers to the chi-square test: with the chi-square test we compare observed and expected frequencies. And number three refers to what we did the previous time we met, the t test for independent samples. What is the correlation coefficient between the following values of X and Y? Zero. Last time you said you can just look at the values and make up your mind. That's zero then, number two. You're saying it will be number? Number two. It will be number two, because even without calculating, I can sketch the scatter plot. The first pair says X is 0 and Y is −1, so that point is here. The next says 0 and 0, which is another point, and the last says 0 and 1, which is here. This clearly shows me that there is no relationship, because all the points sit on one vertical line, the Y axis: X never changes while Y does.
Number two is the right answer. If you try to calculate this on your calculator, it will not give you a number: because X never varies, the formula for r divides by zero, and the calculator reports the result as indeterminate. We read that as no linear relationship, in other words r = 0. Pearson's correlation coefficient r represents: one, the size of the relationship between two variables; two, the shape of the relationship between two variables; or three, both of the above? I think it's one. I would also say one; I've not heard of the shape of a relationship. I don't know. If you haven't heard of a shape, what shape is this? A histogram gives you a shape, and a scatter plot also gives you a shape. Remember, with your scatter plot there are two things: it is a ... and ... relationship. Okay, so you're looking at how the points are scattered. Yes, I think it's both. Yes, the answer is three. It is both of them, because the shape tells me whether this is a negative or a positive relationship, and the size tells me whether it is strong or weak. You understand? So r gives you both. From the shape you can see whether the points go up, go down, or are scattered everywhere with no shape at all. Together, the shape and the size describe the strength of the relationship. And with that, we come to the end of our session. Any questions? Those are the questions I could gather from most of the past exam papers I have access to. There are many other exam papers, but most of the questions are exactly the same. That concludes today's session on correlation. A request, if it's possible (if it's not possible, it's fine): could you quickly show us the calculator? The one I have is not the same as the one you have, and we're struggling to follow the process. If it's not an issue, please. Let's do that. Let's do that.
We still have enough time. Let me see if I am able to. Let me connect on my phone as well, to see if I'm able to connect; then I can show you on the phone. I need MS Teams on my phone. Sorry about that. Let's go to one of the questions; we'll use this one. Okay, what do you see? Only the list of attendees. Okay, now? Okay, that works. Now you can see the calculator. On this calculator I'm not sure if I'll be able to point. Can you see when I click? Do you see the arrow at the top? There are two arrows? Yes, when you click, we can see. Okay, and there are those two arrows as well; they are the same as the ones on my calculator, right? Then the MODE button is there. That's MODE. Oh, I see; the calculator looks different. We're going to use STAT/DIST, so you select that. There are numbered options: one, two, and further down four and five. You can see A+BX, paired-variable (X and Y) linear regression. When I click on it, it gives me this table, and then I just put in the values. We can use the same values; let's go to the one we used. Just slow down, please. How did you get to this window? You press MODE, and then STAT/DIST, which is statistics and distributions. Just press on that one. Let me go out, because I'm already in it. So you press STAT/DIST and you get the menu, which has data, summation, variable, min/max and distribution. Look further down, under the type of statistical calculation: number five is A+BX, paired-variable (X and Y) linear regression. If you click on it, it opens a window. We're going to use the very first exercise we did, because it has very few numbers. Sorry, I have 1-VAR (single variable) selected, and it doesn't want to go away.
No, you shouldn't click on 1-VAR. When you go to STAT, past 1-VAR (single variable), the option below it is number five. It says A+BX. Do you see that? No, this 1-VAR just doesn't want to go. I can't see the menu you are reading. Okay, so what you need to do is go out of the calculation. Press MODE again and go back to the normal calculation mode, then press MODE and choose STAT. You will see data and then 1-VAR calculate. Go down, without clicking on anything, and you will see the type of statistical calculation: 1-VAR, and then number five, A+BX. You click on A+BX. Thank you. Yes, and then you enter 1 and press equals, and then 3 =, 5 =, 5 =, 1 =. Now you are done with the X values. To go up, there is an up arrow. Click on it, and the cursor moves up until you get to the first row. Then use the right arrow, which is the one next to MODE; it takes you to the top of the Y column, where it says zero. Then you press 4 =, 6 =, 10 (sorry, delete), 10 =, 12 = and 13 =. Now you have captured all the values, and we need to calculate. Oh, sorry, I clicked the wrong button. My fingers. Let's go back up and close that. Now we need to calculate the values we are looking for. You can go in and out of the table, so you can press AC, and it will say linear regression there. To go back to where we were, you press MODE... sorry, we need to press STAT, not MODE; my bad.
It's the same thing we did before: you press SHIFT, and at the top of the screen it shows an S to tell you SHIFT is active. Then you press 1 for STAT, and you get these options: data, 2-variable calculate, regression calculate. We're going to use regression calculate. Click on regression calculate, and there is your r. You see it there? It has already calculated r, and all the other values are there too. So let me start again. I press MODE and go back to the normal calculation mode; then MODE again, STAT/DIST, and number five, A+BX. Then you capture your values (at the moment I haven't cleared my calculator): 1 =, 3 =, 5 =, and so on until all your values are captured. When you have stored all your values, you press AC, then SHIFT, then 1, and you go to regression calculate. Press regression calculate, and it will have calculated all your values. Now I want to clear all these values so that we can enter the other exercise we did, the one with eight pairs: 1, 2, 3, 4, 5, 6, 7, 8 for X and 8, 7, 6, 5, 4, 3, 2, 1 for Y. To capture that information, we need to clear the calculator by pressing SHIFT and then AC; that should clear all the stored values. Normally we would go back to MODE and STAT, but our calculator is already in linear regression, so all we need to do is capture the data. To capture the data, you press SHIFT, STAT and go to data. Oh, it didn't clear my table. So SHIFT, clear all. Now it has cleared my values: SHIFT, clear all removes any stored values from the calculator. So 1, 2, 3, 4, 5, 6, 7, 8, each followed by equals. Then I go up, up, up, up, up, up, up, up.
Until I get back to the first value; then I move right and enter eight, seven, six, five, four, three, two, one. So I've stored all my values. All I need to do is press AC, then SHIFT, STAT (which is button number one), and regression calculate (which is five), and there I get negative one.
The other exercise that we did was the one with the zeros, and we can do that too. So SHIFT, STAT (1), and I go to data, which is the table. SHIFT, clear: my table is clear, and then I can enter all these values. Zero equals, zero equals, zero equals; there are only three of them. Go up, up, up, move to the right side and put in negative one. On this calculator the negative sign is the (−) key, the one with the brackets. So you enter negative one, equals, zero, equals, and positive one, which is just one, equals. There is no plus sign to enter: a positive one is just the number one. Now I'm ready to calculate: SHIFT, STAT... oh, sorry, what happened now? SHIFT, STAT, data: I have my data, and I just need to go out of it. SHIFT, STAT, regression calculate: something is not right. Okay, I think I've fixed the problem now. SHIFT, STAT, data: there is my data. I go out, then SHIFT, STAT, regression calculate. It says indeterminate, because it cannot calculate r: all the x values are zero, so there is no variation in x and the formula ends up dividing by zero. We interpret this as no relationship, so we should not be alarmed when we see this.
Yes. It's indeterminate.
Yes, so we treat that as no relationship. Okay, it is thinking a lot. Let me go back to the data. Yes, it is because of the zeros. If I go to SHIFT, STAT and open the sum measures (you remember them), the sums involving x are all equal to zero; the only non-zero values are n and the sum of y², because y had some non-zero values in it. And there is your sum of x, which is zero.
Okay, now how do I stop sharing from my phone? It only gives me "share"; I don't want to share, I want to stop sharing.
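The two calculator results above (r = -1 for the eight-point data, and "indeterminate" for x = 0, 0, 0 with y = -1, 0, 1) can be checked with a short Python sketch. It computes Pearson's r from the same sums the calculator stores. The function name `pearson_r` is hypothetical, and returning `None` when the denominator is zero is an assumption chosen to mirror the calculator's error, not something the module prescribes.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from the raw sums.

    Returns None when the denominator is zero, i.e. when all x values
    (or all y values) are identical, so r cannot be calculated.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty lists of equal length")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    # Product of the two brackets under the square root
    denom_sq = (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    if denom_sq == 0:
        return None  # no variation in x (or y): the "indeterminate" case
    return numerator / math.sqrt(denom_sq)


print(pearson_r([1, 2, 3, 4, 5, 6, 7, 8], [8, 7, 6, 5, 4, 3, 2, 1]))  # -1.0
print(pearson_r([0, 0, 0], [-1, 0, 1]))  # None
```

For the zero exercise, n·Σx² − (Σx)² = 3·0 − 0² = 0, which is exactly why the calculator cannot produce a value.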
Okay, since I cannot stop sharing from my phone, I'm going to leave the meeting on that side. Please complete the register if you haven't; I will repost it. Those who haven't completed the register, please make sure that you do. If there are no more questions, then we can call it quits and you can enjoy the rest of your evening.
Thanks a lot, enjoy your evening too.
Thank you, thank you so much.