Moving forward from hypothesis testing, we now turn to correlation. When we do research and measure variables, one thing we can do is look for the relationship between those variables. Correlation is a statistical technique that is used to measure and describe the relationship between two variables. When we look around us, there are many variables that are related to each other, and correlation quantifies the relationship between any two given variables. As a researcher and a psychologist, you can see so many things that you are curious about or want to know about. For example, what is the relationship between studying hours and the final GPA of students? What is the relationship between exercise and weight loss? What is the relationship between personality traits and deviant behavior? So there are many examples where we are interested in calculating the relationship between two variables. So today we will be talking about correlation. What is it? How do we calculate it? We will do it manually, and then we will do it in SPSS.
Usually the two variables are simply observed as they exist naturally in the environment; there is no attempt to control or manipulate the variables. So we just observe how both variables exist in the real environment, without manipulating them and without controlling any extraneous variables. Correlation requires two scores for each individual. For example, if I want to study the relationship between studying hours and GPA, I will need two scores from each person, one on each variable: how many hours that person studies each day, and the GPA of the same person. These scores are normally identified as the X and Y variables. So in correlation, there are two variables, X and Y.
They occur naturally in the environment, and we try to find out what the relationship between the two variables is. So this is basically a graphical representation of the correlation. As I said, we need two variables: one is the X variable and one is the Y variable. Here on the screen, you can see family income in US dollars and the students' average grade on their tests. A scatterplot is basically a double-entry chart in which a single point represents one individual's scores on both variables. For example, suppose this point is me, Saima: my monthly income is around $180,000 and my average grade is about 100. So this one point represents my scores on both variables, X and Y. Similarly, each dot represents one individual and their scores on the two variables. This is called a scatter diagram or scatterplot.
Correlation has different applications, that is, where we use correlation and why we calculate it. One specific application is prediction. How? If there is a relationship between two variables, and it is a strong relationship, we can predict one on the basis of the other. For example, if I find a strong positive correlation between studying hours, that is, how many hours I study each day, and the final grade, then based on that relationship I can predict what my GPA could be if I study three hours every day or four hours every day. We talked earlier about the goals of science and the goals of psychology; one of the major goals is prediction, and that we can do with the help of correlation.
In general, we represent the correlation with the symbol r. We can write r like this; it is the symbol that represents the correlation coefficient, that is, the strength of the relationship between the two variables.
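To make the idea of paired X and Y scores concrete, here is a minimal sketch of computing r by hand. It assumes the coefficient meant here is the standard Pearson r, computed as the sum of products of deviations divided by the square root of the product of the two sums of squares. The study-hours and GPA values are made up for illustration and are not data from the lecture.

```python
from math import sqrt

# Hypothetical paired scores (made up for illustration, not lecture data):
# each position is one student, with an X score (study hours per day)
# and a Y score (GPA) for that same student.
hours = [1, 2, 3, 4, 5]
gpa = [2.0, 2.4, 2.6, 3.3, 3.7]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(gpa) / n

# Sum of products of deviations: how X and Y vary together.
sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, gpa))

# Sums of squared deviations: how much X and Y each vary on their own.
ss_x = sum((x - mean_x) ** 2 for x in hours)
ss_y = sum((y - mean_y) ** 2 for y in gpa)

# Pearson correlation coefficient (assumed to be the r the lecture refers to).
r = sp / sqrt(ss_x * ss_y)
print(f"r = {r:.2f}")
```

With these invented scores, r comes out at about 0.99, a strong positive relationship: students who study more hours tend to have a higher GPA.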
And if we square r, this tells us how much of the variance in the dependent variable is explained. For example, if the correlation between how many hours I study each day and my GPA is, say, 0.6, it means that there is a high correlation between the two variables. I will talk later about how we express the correlation, what the magnitude is, and what the direction is. But just to understand why correlation is applied in prediction: we simply square the correlation value. The square of 0.6 is 0.36, so I can say that 36% of the variance in the dependent variable can be explained by the independent variable. In other words, 36% of the variability in the Y variable can be explained by the variability in the X variable. That is how we use correlation for prediction. This r-squared value, which tells us how much variance in the dependent variable is explained by the independent variable, is called the coefficient of determination. We will talk about it in a while.
Correlation is also used to demonstrate validity. So the second application of correlation is that we calculate the reliability and validity of a test, and we use the correlation coefficient for that. We express validity in terms of a correlation coefficient, and we also express reliability in terms of a correlation coefficient. So these are the three major applications of the correlation coefficient that you, as a researcher, can use: prediction, reliability, and validity. That is why correlation is important, not only for understanding how variables occur in the environment around us, but also for prediction, for reliability, for validity, and for doing science; it has many uses.
Furthermore, a prediction made by a theory can be tested by determining the correlation between the two variables. So correlation is useful not only for prediction, reliability, and validity, but also for testing theories.
For example, we can assess how useful or how accurate a theory is with the help of a correlation coefficient.
The value of the correlation can be affected greatly by the range of scores represented in the data. For example, there are two variables, X and Y. X is the studying hours: three, two, one, four. And Y is my GPA: with three hours I got 2.3, then 2.5 and 1.5, and with four hours I got 3.5. So this is the GPA, the Y variable. So what is the range of the scores in the two variables? For X the range is from 1 to 4, and for Y the range is from 1.5 to 3.5. How much variability there is in the scores is going to affect the correlation. In both variables, whether X or Y, the variability or range of the scores affects the correlation coefficient.
One or two extreme data points, often called outliers, can have a dramatic effect on the value of the correlation. For example, take the same X and Y variables: say X is two, three, four, and Y is five, six, and then, say, 20. Outliers are data values that are very far from the average scores. If there are a few outliers, or even one or two extreme values in the data, the correlation coefficient is going to be affected dramatically. Here you can see that as X increases, Y increases as well, but because of this extreme value, the outlier, the correlation is going to change dramatically.
So remember that correlation is a relationship between two variables, X and Y, and we have discussed a few of the applications of the correlation coefficient.
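The two cautions above, outliers and the range of scores, can be checked numerically. The sketch below assumes the standard Pearson formula for r. The helper name `pearson_r` and all of the scores are hypothetical, chosen only to make the two effects easy to see; they are not the numbers written on the board.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r for paired scores (hypothetical helper; assumes equal-length,
    non-constant lists)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)

# 1) Outlier: six made-up points on a perfect line, then one extreme point added.
line_x = [1, 2, 3, 4, 5, 6]
line_y = [2, 3, 4, 5, 6, 7]
r_clean = pearson_r(line_x, line_y)
r_outlier = pearson_r(line_x + [7], line_y + [0])  # (7, 0) is the outlier

# 2) Range of scores: the same made-up data, full range versus only the
#    middle X values (3 to 6).
full_x = [1, 2, 3, 4, 5, 6, 7, 8]
full_y = [2, 1, 4, 3, 6, 5, 8, 7]
r_full = pearson_r(full_x, full_y)
middle = [(a, b) for a, b in zip(full_x, full_y) if 3 <= a <= 6]
r_restricted = pearson_r([a for a, _ in middle], [b for _, b in middle])

print(f"without outlier: {r_clean:.2f}, with outlier: {r_outlier:.2f}")
print(f"full range: {r_full:.2f}, restricted range: {r_restricted:.2f}")
```

In this invented data, a single extreme point pulls r from 1.00 down to about 0.13, and keeping only the middle of the X range lowers r from about 0.90 to 0.60, even though the underlying scores are otherwise unchanged.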