 I need to have my calculator up. Please make sure that you complete that register and you have your calculators ready. Because today's session, I'm going to show you how to use your calculator. What type of calculator do you have? I have a Casio, but it must be an old one. It's not as fancy as yours. It can't do the fractions. Okay. But I don't think you will be required to do a lot of calculation in your studies. But I'm just going to show you in case they ask you to do some calculation so that then you know how to do certain calculations. Today's session, we're going to be looking at how we do hypothesis testing for the correlations. In a correlation, it's where we look at if two numerical variables are related or are correlated to one another. This is our second last session. Our final session will be on the 27th of September, and then we would have covered everything you need in terms of statistical skills that you require in order to go through your module with ease. Then we will see if we can set up an exam preparation session for us to go through at least one exam paper and see how to answer certain questions from there based on everything that we would have learned in preparation for you to go right there. Then also, I will be sharing with people who have been attending the sessions, we will share the link to a one-on-one session if you want to book such. There are times, so because I've got limited time, I can only offer support to those who have been participating week in, week out within the session, so we will prioritize them first. So look out for that email so they will look at the register and then send out the link to those people to say, if you still want some support or some engagement with me one-on-one, this is the process that you need to follow. So it's not going to be sent out to everyone, possibly it will be sent to those who have been attending the sessions so that we can support you even further. 
Okay, so are there any questions before we start with today's session? Have I started the recording? Yes, it says recording has started. Do you have any questions? Nope. The other person just left, so it's only me and you. Yeah, so it's fine. So like I said, we're going to be looking at correlation and like I said, also in your module, you're not expected as well to do a lot of calculation, but I'm just going to show you how to do some calculation for other measures, but you will be expected to know how to interpret your correlation values as well. So in order for us to go through this, you need to know there are formulas that you're going to go through and also you require a calculator because sometimes you will need to do some calculations. So when we talk about a correlation, because we're talking about the relationship between two numerical values and to visualize them, we always put them on a scatter plot. And a scatter plot shows you a relationship between your independent variable and your dependent variable and remember always everything that sits on your x-axis will be your independent variable because it's your input variable and anything that sits on there y-axis will be your dependent variable, which will be your output variable, which is something that we want to predict or so. In terms of correlation, it is used to measure the strength and the direction, not only the strength, but also the direction of the relationship between the two variables. And sometimes we call this relationship, we want to check if it is a linear relationship. So when we talk about the strength, we talk about whether is it a strong, a weak, a moderate or no relationships type of the strength. And when we talk about direction, we're talking about whether the value that you will be calculating will be negative or positive or when you look at the scatter plot does it show that it is declining or is it going up? 
What we mean by declining is that when one thing goes up, the other thing goes down. When it is ascending, we say that when one thing goes up, the other thing also goes up. So that is what we're going to monitor and look at in terms of correlation. So if I have my independent variable x, which has the scores 1, 3, 5, 5, and 1, and I have my dependent variable y, which has the scores 4, 6, 10, 12, and 13, I can take those two variables and plot them on my scatter plot. Remember, on the x-axis, we plot the independent variable. And on the y-axis, we plot the dependent variable. So in terms of the scatter plot, you can see that 1 corresponds with 4. So this will create a dot. Where I have my independent variable of 1, I will have my dependent variable of 4, and that will be the dot that you place. For 3 and 6, you go and look at 3 and 6, and that will be the dot. For 5 and 10, you go to 5 and you go to 10, and where they meet is the dot. For 5 and 12, you go to 5, you go to 12, and that will be the dot. And for 1 and 13, where x is 1 and y is 13, there is the dot. Now, after plotting these dots on the scatter plot, we can clearly see that this last one is a value that is away from the rest of the values. And that is what we call an outlier. You can look up what an outlier is. It's an extreme value. It's a value that is far apart from the rest of the values. But when you look at the other values, if we ignore the one that is far apart, you can see that when my x values are increasing, my y values are also increasing. And that is what we're going to be talking about in terms of the correlation. Looking at this, we will say this is a positive correlation because my x values and my y values are both increasing at the same time. Okay, so there are measures that you can calculate to measure this relationship. 
Sorry, to measure this relationship, we can calculate it by using what we call a coefficient of correlation, which is also called R, which is a measure that will tell you how well your linear equation describes the relationship between the two variables that you are measuring. So when we talk about the linear equation, when we do the regression, you will understand where this linear regression comes in. But R will tell us how well the two variables relate to one another. So for example, how do we calculate the coefficient of correlation? So you will be given the x and y value, you're independent and dependent. There is a formula that you need to calculate. And on that formula, we always calculate, so this has the bar, this are some of the values that you need to be able to know how to calculate them. So for example, the first value, which these are the values that you are given x and y are the values that you will be given. And if you add all these values for x, you will get 15. When you add all the values of y, you will get 45. And these values, when you add them, the total, we call them the summation. So for example, the summation of x will be equals to 15. The summation of y will be equals to 45. That is what the total is. So total means summation, adding up. Summation means adding up. So now when you have the summation of all the values, you are able to calculate what we call the mean. So if I need to calculate the mean of x, which is x bar, we know that it is the summation of x divided by how many they are. And here we see that the summation of x is 15. And how many they are? They are one, two, three, four, five. They are five. And when we take 15 divided by five, you will get three. Hence, I have the mean of three. The same thing, 45 divided by five will give you the mean of y, which is 45 divided by five. It should be equals to nine, because it's the sum of y values divided by how many they are. 
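The summations and means just described can be checked with a few lines of Python. This is only a sketch using the five x and y values from the example; the variable names are made up for illustration.

```python
# The five observations from the worked example.
x = [1, 3, 5, 5, 1]      # independent variable
y = [4, 6, 10, 12, 13]   # dependent variable

n = len(x)               # how many observations there are

sum_x = sum(x)           # summation of x
sum_y = sum(y)           # summation of y

mean_x = sum_x / n       # x bar: summation of x divided by n
mean_y = sum_y / n       # y bar: summation of y divided by n

print("n =", n)
print("sum of x =", sum_x, "mean of x =", mean_x)
print("sum of y =", sum_y, "mean of y =", mean_y)
```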
So if you have your mean and you have your summation, you should be able to also calculate other measures. So for example, the third column says x observation minus the mean. What it means is that for every value of x, we subtract the mean. So one minus three is minus two. Three minus three is zero. Five minus three is two. Five minus three is two. One minus three is minus two. And if I add all of them, I will get zero, because they cancel out. Minus two plus zero plus two will be zero. Plus two will be two. Minus two will be zero. The fourth column says y minus the mean of y. Hence, I put there the x bar and the y bar. I forgot to put the bar on there, or the line is very small, but there is a bar there. So we do the same. We take each y observation and subtract the mean. So four minus nine is minus five. Six minus nine is minus three. Ten minus nine is one. Twelve minus nine is three. Thirteen minus nine is four. And if you add all of them, you will see that you get zero. The fifth column says x multiplied by y. So it means we multiply our x value with our y value. One multiplied by four is four. Three multiplied by six is 18. Five multiplied by 10 is 50. Five multiplied by 12 is 60. One multiplied by 13 is 13. Add all of them and you get 145. And we do the same for the next one, which is x squared. That means each value of x multiplied by itself. One times one is one. Three times three is nine. Five times five is 25, and the second five also gives 25. One times one, as we already said, is one. The same applies for y squared. Four squared, which is four times four, is 16. Six times six is 36. 10 times 10 is 100. 12 times 12 is 144. And 13 times 13 is 169. Add all of them and you get 465. 
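The columns of the table can be rebuilt in Python as a quick check on the arithmetic. This is a minimal sketch using the same five observations; the list names are illustrative and not taken from the slides.

```python
x = [1, 3, 5, 5, 1]
y = [4, 6, 10, 12, 13]

mean_x = sum(x) / len(x)   # 3.0
mean_y = sum(y) / len(y)   # 9.0

# One list per column of the table.
dev_x = [xi - mean_x for xi in x]        # x minus the mean of x
dev_y = [yi - mean_y for yi in y]        # y minus the mean of y
xy = [xi * yi for xi, yi in zip(x, y)]   # x multiplied by y
x_sq = [xi ** 2 for xi in x]             # x squared
y_sq = [yi ** 2 for yi in y]             # y squared

# Print the rows, then the column totals (the summations).
for row in zip(x, y, dev_x, dev_y, xy, x_sq, y_sq):
    print(row)
print("totals:", sum(x), sum(y), sum(dev_x), sum(dev_y),
      sum(xy), sum(x_sq), sum(y_sq))
```

The deviation columns each add up to zero, and the last three totals are the values that go into the correlation formula.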
So once you have built all the columns that you need, you can then go and calculate what we call the coefficient of correlation, which is your R, which is your Pearson, R coefficient of correlation. Remember our table that we have calculated some of the values. So now we can take the summation because you can see there is the summation of x and y. Remember the total is your summation. So n is how many there are. So we have established that there are only five of them. So we can just substitute five onto n. The summation of x and y, we take the sum, the total of x and y is 145. We're going to substitute it into that formula. The summation of x times the summation of y, it will be, the summation of x will be 15. The summation of y is 45. The summation of x squared. Now, this summation of x and the summation of x squared are different. The summation of x squared, it is 61. The summation of x, remember it's 15. The same, the summation of y squared will be 465. So we just substitute into the formula and calculate your R. So in your exam sometimes, they will just give you the value of R is equals to 0,32 and ask you to interpret that value. This is how you calculate the value, but you might be asked to interpret what does this value of 0,32 mean? And in terms of that, based on the scatterplot, you should be able to see or determine whether your correlation coefficient will be a positive or will be negative. For example, if you look at this scatterplot, all the dots are lined up. It clearly shows from the scatterplot that when the values of x are increasing, the values of y are increasing. And because the dots are in a straight line, usually the R, for this type of a visualization or for this kind of a scatterplot, R will be equals to 1. And when R is equals to 1, we say in terms of the direction because R is equals to 1 is positive. We say it is a positive relationship. And when it is equals to 1, we say it is a perfect relationship because everything lines up. 
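The substitution described above can be sketched in Python. The formula itself is on the slide and not read out in the session, so this assumes it is the standard computational form of Pearson's r, which uses exactly the totals mentioned (n, the sum of xy, the sums of x and y, and the sums of x squared and y squared). The function name `pearson_r` is made up for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson's r from the column totals (standard computational formula)."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

# The worked example: n = 5, sum xy = 145, sum x = 15, sum y = 45,
# sum x squared = 61, sum y squared = 465.
r = pearson_r([1, 3, 5, 5, 1], [4, 6, 10, 12, 13])
print(round(r, 2))

# Dots that sit exactly on an upward-sloping straight line.
r_line = pearson_r([1, 2, 3], [2, 4, 6])
print(r_line)
```

With the example data the numerator is 50 and the denominator is the square root of 24000, which rounds to the 0.32 quoted above; the straight-line data gives 1.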
But when the dots are scattered everywhere, you are unable to form or see a pattern, because clearly with these dots you cannot say that when x is increasing, y is also increasing. It's not the same case. However, in terms of the coefficient of correlation, when you calculate it and you get a coefficient of correlation of 0.18, we say this is a weak positive relationship, because in terms of the direction, it is positive. So let's look at another example. If the dots look like this and when you calculate r you get 0.85, we say this is a strong positive relationship. And when it looks like this, where you can see that when x is increasing, the y values are decreasing, and when you calculate r you get minus 0.92, you can state that there is a strong negative relationship. But this is not the only way you interpret the data. So let's assume that they didn't give you the graph, they just gave you the r. You should be able to use the r value to say whether it is a perfect positive or negative relationship, or a strong positive or negative relationship, or a weak relationship, or there is no relationship. So when r comes closer to 1, we also state that the two variables are closely related or are more related to one another. At some point, you should also be able to compute what we call r squared, which is an important statistic that indicates the proportion of the total variation in your y value, your dependent variable, that is explained by the variation in your x value, which is your independent variable. So how do we interpret the values? If your r is exactly equal to negative 1, we say it is a perfect negative relationship. If it's minus 0.7, we say it is a strong negative relationship. If it's minus 0.5, we say it is a moderate negative relationship. 
If it is minus 0.30, we say it is a weak negative relationship, and if it is 0, we say there is no relationship. The same applies when it is positive, because the only thing that changes is the direction, which will be stated as positive. And this is how you will interpret your r. So you just need to know how to do that. Then, when we go on to test the hypothesis, the way we state our null hypothesis and our alternative hypothesis uses the coefficient of correlation for the population, which is rho, the Greek letter, spelled r-h-o. So to state your null hypothesis, you will always say rho is equal to 0, which means your population coefficient of correlation is equal to 0. Your alternative will state that it's not equal to 0. So we always state it with not equal to 0. Our null hypothesis states that there is no correlation between x and y, because your correlation is equal to 0. Your alternative states that there is a correlation between your x and y. You will need to be able to calculate your test statistic, and in it rho will always be 0. Your r is something that you would have calculated, or it will be given to you. For example, from the first one that we did, we know that our r was equal to 0.32. So you just put that r into the formula, because you have your r and your r squared; I will just take 0.32 squared. Your n is how many observations there are. There were five values, so your n was equal to five, and you can calculate your test statistic. You will need your degrees of freedom to go and find your critical value. Also, r is the positive square root of r squared if b1, the slope of the line, is greater than 0, and r is the negative square root of r squared if b1 is less than 0. So it will be negative if b1 is negative, whatever negative value it takes, and it will be positive if b1 is greater than 0. Okay, so how do we then test the correlation? 
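As a rough sketch of the test statistic just described, here it is in Python. This assumes the formula on the slide is the usual t statistic for a correlation, t = (r - hypothesised value) / sqrt((1 - r squared) / (n - 2)), with n - 2 degrees of freedom. The function name and the `hypothesised_value` parameter are made up for illustration, and the critical value 3.1824 (two-tailed, 5% level, 3 degrees of freedom) is taken as given, the same value used in the worked example in the session, rather than computed.

```python
import math

def correlation_t_statistic(r, n, hypothesised_value=0.0):
    """t statistic and degrees of freedom for testing a correlation."""
    df = n - 2
    t = (r - hypothesised_value) / math.sqrt((1 - r ** 2) / df)
    return t, df

# Worked example: r = 0.32 from five pairs of observations.
t, df = correlation_t_statistic(0.32, 5)
print("degrees of freedom:", df)
print("test statistic:", round(t, 3))

# Two-tailed decision at the 5% level; critical value taken as given.
critical_value = 3.1824
reject_null = abs(t) > critical_value
print("reject the null hypothesis?", reject_null)
```

Because the test is two-tailed, the comparison uses the absolute value of t against the positive critical value.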
So for example, is there evidence of a linear relationship between X and Y at the 5% level of significance? We know how to state the null hypothesis: rho is equal to 0, no correlation. The alternative will state that there is a correlation. Step number two, state whatever you are given. Our alpha value is 0.05. For our degrees of freedom, let's use the n that we were given previously; we said it is five, right? Yes, it was five. So our degrees of freedom is n minus one, oh sorry, n minus two, which is five minus two, which is equal to three. And we go and calculate our test statistic. r is 0.32. Rho is zero because we take it from the null hypothesis; it's always hypothesized to be zero. Then we have the square root of one minus 0.32 squared, divided by n minus two, which is five minus two. And when we calculate our test statistic, we find it is 0.585. And we can then go and make a decision. So we go to the t-table. They will give you the critical values because you don't have a table. Otherwise, they will give you a p-value, and we can always use a p-value. So using the degrees of freedom and your alpha value, you go and find the critical value. But in your case, you don't have to go and find the critical value because you don't have a table. They will give it to you and you create your regions of rejection, because it's two-tailed. It's a non-directional test, as we can see from the alternative hypothesis. So therefore, there are two regions of rejection. And we look at our critical value: it's 3.1824. If you go to the t-table, you will find it with three degrees of freedom and an alpha value of 0.025, because it's two-sided and we divide alpha by two. On the other side, our critical value is minus 3.18. And then we take our test statistic and we check if it falls in a rejection area or in the do-not-reject area. And we can see that 0.585, it falls in a rejection area. 
Therefore, we do reject the null hypothesis. We reject the null hypothesis and we claim that there is no sufficient evidence of linear association at the 5% level of significance. Oh, no, we do not reject, because my arrow is pointing on the wrong side. It should point here. We do not reject the null hypothesis because 0.585 is less than 3.18. It doesn't fall in this area. Yeah, the arrow is pointing at the wrong point. It falls in the do-not-reject area. So we do not reject the null hypothesis, and we can claim that there is no sufficient evidence of a linear relationship, because we said there is no relationship if we're not rejecting the null hypothesis. And that's how you will answer some of the questions. They will just need to know, if you're rejecting the null hypothesis, how do you conclude? So let's look at exercises from your past exam papers to see how the questions are framed. You just need to know how to state the null hypothesis. Let's go back to those statements. What kind of a statement do you make when you state the null hypothesis? You need to know how to interpret your coefficient of correlation and so on. You will see when we look at the exercises and activities that sometimes they will not even ask you to do any calculation. So let's go there. So the first exercise. A group of hospitalized patients who have been diagnosed as suffering from dementia are treated with certain drugs over a period of time. These drugs were prescribed to improve their mental alertness. A researcher studies a random sample of [number unclear] patients who have been on these drugs for varying amounts of time, hoping to establish a relationship between the number of days of the drug treatment and the patients' scores on a mental alertness test. Which is the correct formal way to express the appropriate null hypothesis? Is it number one, number two, or number three? 
How do we state the null hypothesis? Number one. It will be number one. Yes. Same question. Right? Same question. Nothing has changed. Which is an appropriate test to determine the significance of the relationship between the number of days that the drug was administered and the score on the mental alertness test obtained by the sample patients? What will be the appropriate test? Is it going to be a chi-square test? Is it going to be a Pearson product moment correlation? Or is it going to be a t-test for one sample? Number two. It will be number two. Now, I want to go back to the statement. In the previous weeks, we were talking about tests for differences. Now, let's assume in the exam, you have question one, question two, question three. They all give you an almost similar statement. How will you identify that this one is talking about a relationship, so that you need to be looking at correlation? This is the only thing that is different between what we do this week and what we did the previous week. The previous week, it was more about whether there are any differences. You can see that here they don't talk about differences, they talk about a relationship. So the previous one would have said, establish if there are differences between X and Y. With the coefficient of correlation, or a hypothesis test for correlation, they always talk about a relationship. Next week, we're also going to look at the other one, where it also talks about a relationship. The difference between what we're doing this week and what we will be doing next week is that this week, we're talking about numerical values. Number of days, test scores, all of them are numerical values. Next week, we will be talking about categorical values. So it's not going to be actual numerical values that you are able to count or measure. It will be values that you can put into categories. So you should be able to tell the difference when you're looking at the questions. 
So we're moving on to the next exercise. Oh, this is for next week. This is for next week. Don't worry about this. I don't know how it came into this, because it talks about the contingency table. Remember, the two days this week, if it talks about contingency table, it's not what numerical correlation about. It still talks to the relationship, but it's not the one that we're looking for. Okay. The other thing that I forgot to mention when we were explaining the correlation coefficient is that your R value takes any value between the value of minus one and one. It takes any value between that. So it can be negative one and one. It cannot be negative two or negative three. It cannot be, but it can be negative zero comma eight or it can be zero comma eight or negative zero comma three five and so on and so on. As long as the value is between negative one and one. So this question, it looks tricky, right? It says which of the following can take on a value of a negative zero point five? Is it one, a probability, two, level of significance? Remember, level of significance is alpha. Three is coefficient of correlation, which is R. And four, which is the variance. I'm going to use the probability as px. So which one can take a value of negative? Always remember that. Your level of significance is always positive. Your variance will always be positive because you're taking the square root and you're taking the square of the values. So it will always remain positive and this will always be positive and this will always be positive because the probability, remember probabilities are always between the values of zero and one. So they cannot be less than that or more than that. They are always between zero and one. So looking at what I've just explained right now, it leaves us with only one, right? Which is option number three? Three. So that will be the value. 
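The range rule just explained, together with the interpretation scale given earlier in the session, can be sketched as a small Python helper. The session only gives example values (1, 0.7, 0.5, 0.3, 0), so the exact cut-off points between the labels used here are an assumption for illustration, and `describe_r` is a made-up name.

```python
def describe_r(r):
    """Label a correlation coefficient by strength and direction."""
    # r can only take values between -1 and 1.
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")

    size = abs(r)
    if size == 0:
        return "no relationship"

    direction = "positive" if r > 0 else "negative"

    # Assumed cut-offs, based on the example values from the session.
    if size == 1:
        strength = "perfect"
    elif size >= 0.7:
        strength = "strong"
    elif size >= 0.5:
        strength = "moderate"
    else:
        strength = "weak"

    return f"{strength} {direction} relationship"

for value in (0.18, 0.85, -0.92, -0.5, 0, -1):
    print(value, "->", describe_r(value))
```

A value such as negative 2 raises an error, which mirrors the point that r cannot fall outside the range from negative one to one.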
If there is no relationship at all between two variables x and y, what would be the most likely value of your Pearson coefficient of correlation r out of the following? No relationship. Do you still remember what the value of r will be? Number three. r will be equal to zero, which is number three. So with the questions in the exam you will be sailing through, because it seems you just need to know the basic concepts of correlation and that's it; I haven't seen anywhere where you calculate something. So I don't think they will ask you to calculate. Okay, so this one looks a little bit different. A researcher suspects that children's level of anxiety during a test will interfere with their memories. He gives a list of items to be memorized to a sample of children and gives them a test to see how many items they can remember. Directly after, he also tests the level of anxiety of each child with an anxiety scale, where a higher score shows a greater level of anxiety. The researcher draws a scatter plot of the relationship between the level of anxiety and the number of items recalled. The results are presented on the scatter plot below, where our X variable, which is our independent variable, is the level of anxiety, and our Y variable, which is our dependent variable, is the number of items they can recall. Now looking at this graph alone, you can already make up your mind. When the level of anxiety increases, as it goes up, what happens to the number of items? They are going down, because when the level of anxiety is low, the number of items they can recall is higher. When the level of anxiety is high, the number of items they can recall is lower. You need to be able to make an interpretation based on what you are visualizing before you even go to the question. So let's see what the question is asking us. 
What can the researcher infer about the relationship between the level of anxiety and the number of items remembered from the graph presented above? Now how do we interpret the graph that is above? Number one, it says there is a negative relationship as the anxiety rises, less items are remembered. Number two, it's a positive relationship as anxiety falls less, as anxiety falls less, items are remembered. Number three, there is a negative relationship less item remembered as anxiety falls. Number four, no actual relationship the graph shows a negative trend over time. Is it one, two, three, or four? Number one, it will be number one, it is a negative relationship when the level of anxiety increases the fewer items or lesser items they are remembering. So the answer will be number one. Like today we'll finish early, we'll be done by an hour by the rate that we're going. Exercise seven is the same information I'm not going to read again the data, but now looking at the same information, suppose the researcher uses the data collected or calculated uses the data to calculate the Pearson product moment correlation are coefficient to determine the size of the relationship between the number of items remembered and the level of anxiety. Which of the following would be most probable as a description of R? Which of the following description of the expected value of R seems to be the most appropriate? So looking at this are we able to say R is equals to zero therefore it means there's no relationship? Are we saying R is greater than zero therefore it means there is a positive relationship? Or are we saying R is less than zero therefore it means there is a negative relationship? Or are we saying R is not equals to zero, we say R is not equals to zero there is nothing like that, that we calculate, it's either it's zero or any other number between minus 1 and 1. Looking at this graph, which statement? Number 3. It will be number 3 because number 3 it says less. 
So when you see the sign that it looks like this, always think of the values that goes to the left. Remember if I'm starting here at 0, any value that goes to the left will be negative. Any value that goes to the right will be positive, right? So when it's greater than it means it's positive. When it's less than it means it's negative. So this will represent negative value. Moving on to exercise 8. Now here is I said to you that you don't, you are expected not to, what did I say? Not to do any calculations, right? And they give you a table and they ask you to find which one will be the closest. So now let's see if you really need to do any calculations. Which of the following values given below is the closest to the probable value of Pearson product moment correlation coefficient for the value of x and y? Now all what you need to do is let's look at these values. It says if the value of x is 1, the value of y is 8. So I'm gonna put it there because I don't want to go and do some calculation because this says it's a perfect because this is minus 1.0 and this is 1.0 and this is 0, right? So I can just create a scatterplot of this. It says when it's 2, it's 7. When it is 3, is 6. So it goes down if I can write 1, 2, 3, 4, 5, 6, 7, 8. I can just do it like that. 1, 2, 3, 4, 5, 6, 7, 8. Just like that. Just making an illustration x and y. So we say when it is, let's start again because I created new values. When it's 1, it's 8. So it will be there. When it's 2, 7, then it will be there. When it's 3, 6, then it will be there. I scale is not two points so that is why I am struggling to write the right thing in my book. When it's 4, it's 5. When it's 4, it's 5. So it will be somewhere here. When it's 5, it's 4. It will be somewhere there. When it's 6, it's 3. It will be somewhere there. When it's 7, it's 2. It will be somewhere there. When it's 8, it's 1. It will be somewhere there. Looking at this now would you say this is positive or negative relationship? Negative. 
It's a negative relationship, and looking at the dots, are they in a perfect line? Yes. So remember, when it's perfect, it means your r will be either positive one or negative one. So which one will that be? Number one. It will be number one. So you don't even have to go and calculate anything or try and do anything. You just need to know how to visualize the values. The other thing you can do as well is look at your values. You can see there it says one, two, three, four, five, six, seven, eight, and in reverse eight, seven, six, so it means when one is going up the other one is going down. Right? When the values of X are going up, the values of Y are going down. And you can just determine that when something is going up and the other one is going down, your r will be equal to a negative something, and if they lie in a perfect straight line then it will be equal to negative one. If they were scattered all over, then you would have selected zero. But at the moment they are all lined up in a perfect straight line. So that's how you will answer some of these questions. Yeah, so let's go to question number nine. A researcher hypothesized that the drug treatment for hospitalized schizophrenic patients improves their mental alertness. He studies a random sample of 27 patients to see whether there is a relationship between the number of days of drug treatment and the patients' scores on their mental alertness. Which is an appropriate null hypothesis for this research? There's always one way of stating the null hypothesis, regardless of the question or the statement, especially for the correlation. Is it one, two, three or four? Number four. It will be number four. We always use rho. Number ten. Which of the following is suitable for representing the ages versus the heights of a group of children? Age is numeric, height is numeric. So which one do we use? Do we use a scatterplot? 
Do we use a contingency table or do we use a histogram? Number one. It will be number one, a scatterplot. So when you have two numerical values, always remember, two numerical values, then you use a scatterplot. One numerical value, then you use a histogram. Two categorical values, then we use a contingency table, and that is what we're going to learn next week. Okay, exercise 11. Which of the graphs below is most likely to represent a Pearson correlation of r equal to positive 0.85 between variables X and Y, if measurements are plotted on a scatterplot? Which graph, A, B, or C, for a positive 0.85? A. It will be graph A. Graph A is positive; there is clearly a strong positive relationship. And this one is a strong negative relationship, and this one is weak. I don't even know whether to say it's a weak relationship or whether there is no relationship, because it looks like most of the values for Y are constant. So I can't even say it's a weak negative or positive with this one. What would we say? I would say there is no relationship on this one. Yes. So we're almost at the end. A negative correlation between X and Y implies that a person scoring low on X will generally score what on Y? What is a negative relationship? It will be high on Y. Sorry, wait. It says a negative correlation between X and Y implies that scoring low on X will generally mean what? Here is your X. Scoring low on X, for a negative relationship, means you will score high on Y, because your point will be there. If you scored low on Y, your point would be somewhere there, and then the line would be just like that, which is the positive correlation. Sorry. Yes. So this will be low and high. And if it was positive, it would have been low and low. Always visualize. If you are not sure about certain things, sometimes just draw for yourself how the scatterplot would look. You can just do something like this and then something like this to say this is negative, this is positive. 
So if I'm here at low and here at low, then it's a positive relationship. If I'm here at low and here at high, then it's a negative relationship. So find alternative ways to remember how to interpret certain things. Okay. Question 13. I did explain this at some point, and I wrote it on one of the slides at the top right, and I said R lies between certain values. So I hope you still remember that. The Pearson correlation coefficient can take values ranging between. Number three. It will be number three, which is that it ranges between minus one and one. Probability. This one is for the probability. This one is for R. I'm just going to write the R. And this one, I don't know what that is. I don't know any value that can be between that and that. Maybe your alpha. Yes, alpha can take values like that, but for alpha we don't even restrict it to only 10. It can even go to 20 or something like that, because for 80% confidence it will be 20, and so on. I thought we were almost done. So I have a lot of questions from your past exam papers. Okay. Let's look at another one. Pearson R represents. So now you need to know how to interpret your Pearson R, how to interpret your coefficient of correlation. Number one. Does it represent a comparison between observed frequencies and expected frequencies if the null hypothesis is true for the distribution of the data across the variable? Number two. Does it represent the relationship between two variables when the way in which they vary together is compared to their individual variances? Number three. The difference score between two variables relative to their pooled standard deviation. Your keywords are difference and relationship, and in terms of the first one I'm going to highlight the two values. Is it one, two or three? Number two. It will be number two. Number three is something that we dealt with last week. Remember the pooled variance, where we looked at testing for two independent groups.
And number one, we're going to deal with it next week, which is part of the chi-square test for contingency tables, to test the relationship between categorical variables. So, relationship. Exercise 15. What is the correlation coefficient between the following x and y values? You can do the same here as well. On this one there is a negative value, so I'm just going to write zero here in the middle and minus one here. Oh no, really? Let's assume that this is a Cartesian plane that looks like this. At the middle here, there is what we call zero on the x-axis. On the y-axis, any value this side is negative. Any value this side is positive. Now, based on the information that you see in front of you, it asks what the coefficient of correlation will be. If your x is zero, it means x is right on top of the line, and y is minus one. So the dot will be here. The next one says x is zero, which is right on top of the line. Your y is also zero, which will be right on top of the line. So you've already got two points. The next one says x is zero, so I know that on this line x is zero, and y is positive one. So let's assume that positive one is there, so that will be the dot. If I'm looking at this relationship like this, what is my r? Zero. r will be equal to zero, which is option number two. So you don't even need to go and do the calculations that I showed you, to say go and calculate your sum of xy, your sum of x, your sum of x squared, so that you can substitute into the formula and calculate. You don't even have to worry about that. All you need to do is use the scatterplot to find your answers. Okay, Miss Elizabeth, can you just go back to that exercise? Can you please explain to me why we are saying it's zero? I'm trying to understand exactly why we are saying it's zero. What makes it zero? Is it the dot in the middle that makes it zero on the straight line? Okay, sorry.
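For anyone who wants to check these answers with the calculation rather than the picture, here is a minimal Python sketch of the sums mentioned above (sum of x, sum of y, sum of xy, sum of x squared, sum of y squared). It assumes the formula from the session is the standard computational form of Pearson's r. The function name `pearson_r` and the choice to return `None` are only for illustration and do not come from the module. One thing to note: when every x is the same, as in exercise 15, the formula ends up as zero divided by zero, so strictly speaking r cannot be computed. The session treats that case as no relationship and selects r = 0.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from the raw sums (illustrative helper, not from the module).

    Returns None when x or y never changes, because the formula then
    divides zero by zero and r is undefined.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same number of values")

    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    # Top of the formula: how x and y vary together.
    numerator = n * sum_xy - sum_x * sum_y
    # Bottom of the formula (before the square root): how each varies alone.
    denominator_squared = (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)

    if denominator_squared == 0:
        return None  # x or y is constant: r is undefined
    return numerator / sqrt(denominator_squared)


# X goes 1 to 8 while Y goes 8 down to 1: a perfect negative line.
print(pearson_r([1, 2, 3, 4, 5, 6, 7, 8], [8, 7, 6, 5, 4, 3, 2, 1]))  # -1.0

# Exercise 15: x is always 0 while y is -1, 0, 1.
print(pearson_r([0, 0, 0], [-1, 0, 1]))  # None
```

In the second example the top of the formula is already zero, which matches the scatterplot reasoning: the values of X and Y do not vary together at all.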
I see now we've got company. Because you joined late, we had already explained what zero means and all that. Let me go back so that we can be on the same page with everyone. Thank you. I don't think I ever even, no, I don't have. So looking at a typical scatterplot, this is what we call a scatterplot, visualizing two numerical variables. If the dots are lined up perfectly on a straight line, like the way you see them, your R can be equal to, it doesn't have to be equal to one, but for the purpose of explaining the value of your R, I'm going to say if the dots are lined up in a straight line going up, it means it's positive. Therefore, your R will be equal to one for a perfect positive relationship. If your dots are scattered all over the place, it can be what we call a weak relationship, or it can be what we call no relationship. When R is 0.18, we say it's a weak relationship. If R was zero, we say there is no relationship. Similarly, when you have a scatterplot where the values are scattered and your R is equal to 0.85, we say it is a strong positive relationship because the dots are going up. When X is going up, Y is going up. If you look at the bottom one, the dots are also scattered almost everywhere, but at least they form a straight line. But you can clearly also see that when X is going up, R is going down. Oh, sorry, Y is going down. So these values are going down when X is going up. And when you calculate R, you might find that it is minus 0.92. And that we can say is a negative relationship. And R takes a value between negative one and one, and at negative one or one we call it a perfect relationship. As the value comes closer to one or to negative one, we say they've got a stronger relationship. Now, the other way also of interpreting your correlation is based on the scale.
Not only exactly like this, because if it's 0.69, you will say it's a moderate relationship, whether it's negative or positive, because it can take a negative or a positive value. If it's zero, it means there is no relationship. Therefore, it means the dots are scattered all over. You cannot tell whether, when X is increasing, Y is increasing or decreasing. You cannot tell because the dots are just lying all over the place, right? Or alternatively, you can have a scenario where it might look like this. You might have your dots looking like this, in a line. It's not a perfect relationship, because when the values of X are increasing, the values of Y are staying constant, right? So there is nothing there. So your R here will be equal to zero. Or alternatively, you can have a case where it looks like this. The dots are still in a perfect line, but your X is constant. Let's say this is two, and this is your X, this is your Y. As the value of Y increases or decreases, the value of X stays the same. So in the end, that will be equal to zero. So there will be no relationship between X and Y because X is constant or Y is constant. And that is what is happening with that one. So we can go to another exercise that we just did, this one. On this one, you can see that when the values of X are increasing, the values of Y are decreasing. And we did the scatter plot just to demonstrate, and we found that because it's perfect, they line up straight in a linear fashion, therefore your R will be negative one because it's a perfect relationship. Now on this one, let's go back there. We also did some exercises there where we looked at whether it's a negative relationship or a positive relationship. But when it comes to these X and Y values on a scatter plot, because right here it's zero, you can see that the value of X is constant, but the value of Y is either going up or down.
Therefore, there is no relationship here. So we cannot say it's negative one or positive one because there is no relationship. And when there is no relationship, remember how you interpret the values, because then it means your R is equal to zero. And our last question of the day. The Pearson correlation coefficient R represents. Is it one, the size of the relationship between two variables? Two, is it the shape of the relationship between two variables? Or three, is it both, the size and the shape? I know in the beginning I said, what did I say? I said R represents strength and direction. Now let's think about it. If I have this and I say R is positive, I'm just going to add a few more so that I don't get. So if I look at this R, what does it represent in terms of this? Does it represent the shape? What does the shape mean? I don't even know what this means. Does the strength talk to the shape, or does the size talk to the shape or to the strength, or can it be both? Yeah, it's a tricky one. I would think it's probably both. I also think so, because if I look at some of the questions, there was somewhere where they asked about the shape. Let me just go back again to see if we have any question where it spoke about the shape. There is no relationship. There is nowhere where it spoke about the shape. This one was which one is the closest. There was nowhere. So for this one, I'm going to assume that it's both of the above, because this will tell me the size in terms of how big it is, 0.98, and the shape will be, is it scattered everywhere? Is it a negative relationship? Does it go towards the negative or is it constant? Whether it's no relationship or it's scattered everywhere, or is it scattered like that?
That will tell me about the shape of this correlation, which also doesn't say much, because then I'm going to assume that the shape talks to the direction, which is either negative or positive or no relationship, and the size talks to the strength, which is always determined by a numerical value like 0.98. So this is the shape and this is the size. If I'm reading this question the way they phrase it, it means it's both of them. If I may say something, I also agree with you that it's number three, because when you look here it says if the dots form a U or S shape, number one, then the correlation coefficient is not relevant, so it means that we also deal with shapes when it comes to this plotting of the graphs, you understand, so that's how I understand it. So it deals with size and also shape. Yeah, I agree. So it is number three. Are there any questions? Because that's the end of our exercises. If there are no questions then it means we're going to leave early, so I'm just going to put the register back onto the chat. Those who joined late, please make sure that you complete the register before you leave, and remember that next week is our last session until further notice. Then we can start looking at exam preparation. Definitely it means I will need to know when you are writing your exam so that we can plan for that. And the time is on the 8th of November. That will be long before the 8th of November.
Then it means the whole of October you are on your own, but we can arrange sessions for exam preparation, looking at different exam papers and going through them from question one up until the end, covering everything, so that you are prepared by the 8th of November. I'm going to assume that Unisa is going to stop the sessions, especially the group sessions, very soon once the exams start, because the exams start in October and my last session is on the 27th. But that doesn't stop us from engaging. Even if Unisa stops the sessions and says we no longer offer them, we can still continue with our sessions, and we can arrange for that. So I'm going to assume that everyone who is here is also on the WhatsApp group. If I know more information then I will share it there, but other than that I will see you next week. Same time, which is six o'clock until half past seven. Thank you very much, this is very helpful, I really appreciate it. You are more than welcome, it's my pleasure. Are there any questions? If there are none, then happy learning, enjoy your evening, and I will see you next week. Bye.