In this video, let us talk about correlation. So, what is correlation? Correlation describes the linear relationship between two random variables: how is variable A related to variable B? Why do I say linear? Because the most commonly used correlation coefficient measures linear relationships, which is why we call it a linear relationship. We will also mention other correlation types. The magnitude of the correlation coefficient r indicates the strength of the relationship. For example, r = 0.5 indicates a moderate linear relationship: as X varies, Y tends to vary with it, but the points do not lie exactly on a line. Note that r = 0.5 does not mean Y changes by 0.5 units when X changes. It measures how closely the points follow a straight line, not the rate of change. The sign indicates the direction of the correlation, direct or inverse. If r is positive, say r = +0.5, then when X increases, Y also tends to increase, so the variables are directly related. If r is negative, then when X increases, Y tends to decrease, so the variables are inversely related. There are different correlation coefficients. The first is the Pearson correlation coefficient. It is the most widely used one, for example in Excel and in other libraries, and the definition given above is for the Pearson correlation coefficient. Other coefficients include Spearman's rank correlation coefficient and Kendall's tau. In this video, we look at the Pearson correlation. It assumes that X and Y have a linear relationship, and this assumption is very important. The Pearson correlation takes values from -1 to +1. A value of -1 indicates a perfect negative correlation, +1 indicates a perfect positive correlation, and 0 indicates no linear correlation at all. The Pearson correlation coefficient is computed using this formula. I thought of explaining the formula, but it is not needed here.
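For reference, the standard Pearson formula divides the sum of the products of deviations from the means by the square root of the product of the sums of squared deviations: r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √(Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)²). The following is a minimal Python sketch of that formula using only the standard library; the function name `pearson` and the sample values are illustrative assumptions, not taken from the lecture slide.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient r of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator terms: sums of squared deviations for each variable.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical example values: y rises with x, but not perfectly.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(pearson(xs, ys), 3))  # prints 0.775
```

The result is positive (y tends to rise with x) and below 1 (the points do not lie on one straight line), which matches the sign and strength reading described above.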
This formula can be simplified, and you can compute the Pearson correlation coefficient easily using various tools, including free tools on the web. Let us look at an example. If you remember, we had the marks and attendance of 60 students. Here we take part of that data, 8 students, and plot attendance against marks. Do you see a correlation between attendance and marks? It seems so: as attendance increases, performance generally increases, except for these two values. We do not know why those two happened, but there does seem to be a relationship. So the correlation coefficient will be positive, because marks increase as attendance increases. The line tries to follow the points, but the two values that lie well below it pull the coefficient down, so the value comes out as +0.25: positive, but weaker than the overall trend suggests. Now let us look at this figure, which is from Wikipedia, to understand the Pearson correlation coefficient better. A value of +1 indicates a perfect positive correlation: if x increases, y definitely increases. A value of -1 indicates a perfect negative correlation: when x increases, y decreases. Next is noisy data with values of 0.8 and -0.8. There are many points and they are noisy, but they still cluster around a particular line, and 0.8 is a strong correlation, by the way. A value of 0.4 is a weak correlation; it is not really a good correlation. And 0 indicates no correlation at all between x and y.
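The sign and strength readings from the Wikipedia figure can be reproduced on a few synthetic series. This is a minimal sketch, assuming hypothetical data (not the lecture's attendance and marks data) and a fixed, hand-picked "noise" list so that the output is repeatable; `pearson` is the same textbook formula, redefined so the block runs on its own.

```python
import math

def pearson(xs, ys):
    """Pearson r: summed cross-deviations divided by the product of the spreads."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = list(range(10))

# Hypothetical synthetic data for illustration only.
perfect_pos = [2 * x + 1 for x in xs]           # y rises exactly with x
perfect_neg = [10 - 3 * x for x in xs]          # y falls exactly as x rises
noise = [1, -1, 2, -2, 0, 1, -1, 2, -2, 0]      # fixed scatter, sums to zero
noisy_pos = [x + e for x, e in zip(xs, noise)]  # upward trend plus scatter
noisy_neg = [-y for y in noisy_pos]             # same scatter, downward trend

for name, ys in [("perfect positive", perfect_pos),
                 ("perfect negative", perfect_neg),
                 ("noisy positive", noisy_pos),
                 ("noisy negative", noisy_neg)]:
    print(f"{name:>16}: r = {pearson(xs, ys):+.3f}")
```

The perfect series give +1 and -1, while the noisy series give strong but imperfect values of opposite sign: flipping the direction of the trend flips only the sign of r, not its magnitude.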
We cannot say that y increases when x increases, because sometimes x is low and y is high, and sometimes x is high and y is low, so there is no definite correlation between the two variables. Now look at the middle row; it is very interesting. All of these values are +1 or -1, except the horizontal line, where r is undefined because y does not vary. We saw that +1 indicates a strong positive correlation, but it tells us nothing about the slope. That is very, very important. A value of +1 or -1 means the points lie exactly on a straight line, but it does not mean that y changes on the same scale as x. For example, x might increase over 1, 10, 100 while y increases only from about 8 to 20. So the Pearson correlation does not tell you the slope. It tells you the direction, positive or negative, and the strength, that is, whether the relationship is strong or not. That is it. Now look at the last row, which is also very interesting. All of these correlation coefficients are 0, which indicates there is no linear relationship between x and y. But there is a nonlinear relationship: in each of these plots, y clearly depends on x in some pattern. The Pearson correlation coefficient cannot detect this, because it only measures linear relationships. If you applied predictive analytics to this data, for example a classifier, it could capture the relationship quite easily, because a nonlinear relationship exists. But the Pearson correlation coefficient cannot tell you this. So this chart is very important for understanding both the Pearson correlation coefficient and its drawbacks.
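The two drawbacks shown in the middle and bottom rows can be checked numerically. This is a minimal sketch on hypothetical data: two lines with very different slopes both give r of (about) 1, a horizontal line makes r undefined (this `pearson` raises `ValueError` in that case, a design choice for the sketch), and a U-shaped curve y = x² on a symmetric range gives exactly 0 even though y is fully determined by x.

```python
import math

def pearson(xs, ys):
    """Pearson r; raises ValueError when either variable is constant (r undefined)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined: one variable does not vary")
    return sxy / math.sqrt(sxx * syy)

xs = list(range(1, 11))

# 1) Slope does not matter: a steep line and a shallow line both give r close to 1.
steep = [10 * x for x in xs]
shallow = [0.1 * x + 8 for x in xs]
print(pearson(xs, steep), pearson(xs, shallow))

# 2) A horizontal line: y never varies, so r cannot be computed.
try:
    pearson(xs, [5] * len(xs))
except ValueError as err:
    print("flat line:", err)

# 3) A perfect but nonlinear (U-shaped) relationship: y = x^2 on a symmetric range.
sym_x = list(range(-3, 4))
u_shape = [x * x for x in sym_x]
print(pearson(sym_x, u_shape))  # 0.0, even though y is fully determined by x
```

In case 3, the points on the left half pull r negative and the points on the right half pull it positive by exactly the same amount, so they cancel.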
For example, the slope is not reflected in the Pearson correlation coefficient value, and a nonlinear relationship between x and y is not captured by it either. So, you have seen what the Pearson correlation coefficient is, its range of values from -1 to +1, what +1 indicates, and what its limitations are. Can you write down the two limitations of using the Pearson correlation coefficient in diagnostic analysis that we discussed on the last slide? Please write them down, and after that, continue. First, the slope of the relationship is not captured by the Pearson correlation coefficient. Second, nonlinear relationships are not captured: the coefficient does not account for a nonlinear relationship between x and y. For example, x may increase linearly while y does not; we will discuss this in detail in the next video. If I plot x on its own, it increases linearly. If I plot y, it might curve like this. Comparing x and y, there is a very good relationship between them. Although x is linear and y is nonlinear, Pearson does not take this into account: it only tells you how well the points follow a straight line. We will discuss this in detail with an example in the next video. So, let us see what a nonlinear correlation looks like. Take x and x squared. If I plot x versus x squared, you know that x squared increases faster and faster, so the y values curve upward. It is not a linear relationship. The x values are on a linear scale, evenly spaced within the range, but y grows on a different scale. Although x is linear and y is nonlinear, the correlation coefficient is 0.95. The reason is that there is a relationship between x and y: if x increases, y increases. Because x and y increase together, there is a strong positive correlation.
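The x versus x squared example can be sketched as follows, assuming x = 1, 2, ..., 10. The exact Pearson value depends on the range of x, so this range gives about 0.97 rather than the 0.95 on the slide. For comparison, the sketch also computes Spearman's rank correlation, which the lecture names but does not cover; it is swapped in here only to show that the relationship is perfectly monotonic. The helper names `ranks` and `spearman` are illustrative.

```python
import math

def pearson(xs, ys):
    """Pearson r computed from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman's rank correlation: Pearson r applied to the ranks."""
    return pearson(ranks(xs), ranks(ys))

xs = list(range(1, 11))    # x evenly spaced (linear scale); range is an assumption
ys = [x ** 2 for x in xs]  # y = x^2 grows faster and faster

print(f"Pearson  r   = {pearson(xs, ys):.3f}")   # below 1: no straight line fits exactly
print(f"Spearman rho = {spearman(xs, ys):.3f}")  # 1.000: y always increases with x
```

Because Spearman only looks at the order of the values, any strictly increasing y gives exactly 1, while Pearson penalizes the curvature.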
Why 0.95 and not 1? Because Pearson tries to fit a straight line through these points. The line might miss these points, or these ones, and so on, so the value is 0.95; otherwise it would be a perfect 1. This shows that Pearson does not account for a nonlinear relationship between x and y. Ideally, the value should be 1, because y is exactly determined by x and always increases with it, but Pearson's formula cannot detect that. I hope you now understand what I mean when I say that Pearson does not handle nonlinearity: y can be a nonlinear function of x while x itself is linear, and Pearson still cannot fully identify that relationship. So, in this video, we talked about what correlation is, what the Pearson correlation is, and what the drawbacks of the Pearson correlation are. Thank you.