In this video, let us look at Spearman's rank correlation, because we now know the limitations of the Pearson correlation coefficient. We saw that the Pearson correlation coefficient does not capture nonlinear relationships. For that kind of data, we can use other correlation coefficients such as Spearman's or Kendall's rank correlation. Let us look at Spearman's in this course. I am not going to discuss Kendall's rank correlation, because otherwise we would spend too much of this course talking about correlation alone. But let us get a feel for what Spearman's rank correlation is.
Similar to Pearson's correlation coefficient, Spearman's rank correlation varies between plus 1 and minus 1. A negative sign indicates an inverse relationship between x and y: if x increases, y decreases. A positive sign indicates that if x increases, y also increases, so a direct relationship. The value indicates the strength of the relationship between x and y: a value like 0.8 or 0.9 means the variables are strongly correlated, and 0.2 means the correlation is rather weak.
The first point: Pearson's correlation coefficient between ranks is called Spearman's rank correlation. Spearman's rank correlation is nothing but Pearson's correlation coefficient applied to the ranks of the variables we use. We will discuss that in detail in the coming slides. So: rank the variables, then compute Pearson's r. That is exactly what Spearman's rho is.
Let us look at data for seven students. We have attendance and we have final marks. Consider attendance as x and marks as y. The first thing we have to do is rank the available data. We can arrange it in ascending order, so the least value gets the first rank, the next gets the second rank, and so on: first is 45, second is 56, then 80, then 85, 90, 95, 100. Those are the ranks.
Similarly, we can rank the marks. This is the least value, so it gets rank 1, the next is 2, and then 3, 4, 5. There are two 90s. They would take ranks 6 and 7, but we do not want to give tied values different ranks, so we give both the average: 6.5 and 6.5. Now both variables are ranked. If you compute Pearson's r on these ranks, that gives you Spearman's rho. This symbol is rho. Because I used a different font it looks like this on the slide; otherwise it is just a rho, with no line below it. So Spearman's rho is Pearson's r computed on the ranks of the variables. It is that simple.
There is another formula you can use instead of applying Pearson's r to the ranks, and it is very easy. You compute the difference between the two ranks for each student. So the differences here are 0, 0, and this one I wrote as 1.5, but it is 2.5. I was wrong there, so the calculation on the slide is wrong. Let us compute it again later; you can compute it on your own. So the difference is 2.5, then 2, 2, 0.5 and 2. I only wrote the magnitude: if the difference is negative, I still wrote it as positive, because I just want the absolute difference, not the sign. For example, 2 minus 2 is 0, and another one would be minus 2.5, but I just put 2.5. It does not matter, because we take the square of this value, so whenever it is negative it becomes positive anyway. So the squares on the slide are 0, 0, 2.25, 4, 4, 0.25 and 4, where the 2.25 comes from the mistaken 1.5.
Then you take the summation of these d squared values, and there is a formula to compute Spearman's rho. It is 1 minus 6 times the sum of d squared, divided by n times (n squared minus 1), that is, rho = 1 - 6 Σd² / (n(n² - 1)). Here n is the number of samples, which is 7. This formula is a simplified form of computing Pearson's r on the ranks. So this is how you compute Spearman's rho. Now let us look at a plot.
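The two routes described above (rank, then apply Pearson's r; or use the rank-difference formula) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are made up here, and the seven attendance and marks values are hypothetical stand-ins, not the numbers from the slide. The marks include two tied 90s so that the 6.5 and 6.5 averaging shows up.

```python
import math


def average_ranks(values):
    """Rank in ascending order (1 = smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho as Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def spearman_shortcut(x, y):
    """Shortcut formula: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical data for seven students (assumed for illustration only).
attendance = [45, 56, 80, 85, 90, 95, 100]
marks = [30, 45, 70, 60, 75, 90, 90]  # the two 90s tie for ranks 6 and 7

print(average_ranks(marks))  # [1.0, 2.0, 4.0, 3.0, 5.0, 6.5, 6.5]
print(spearman_rho(attendance, marks))
print(spearman_shortcut(attendance, marks))
```

With the tie in the marks, the two printed rho values are close but not identical. The shortcut formula reproduces Pearson's r on the ranks exactly only when no ranks are tied.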
The plot looks like this for these variables. There is a roughly linear relationship; there is a good relationship from here to here. Let us look at the Spearman's rho value and the r value. The rho is 0.74. But this 0.74 is not correct, because I used 1.5 instead of 2.5. If you square 2.5 instead, add up the d squared values and put the sum into this formula, you will get a different value. So, if you use the differences in ranks and compute Spearman's rho with this formula, you get one result, and if you simply apply Pearson's r to the ranks, you get another. This value I took from a website that lets you compute Spearman's rho for free; I entered this data and got this value. So the value shown here is not correct. You can compute it and compare. There is a small difference between the value you get from this formula and the value you get by applying Pearson's r to the ranks, because this data has tied ranks and the formula is exact only when there are no ties. So I would ask you to look into Spearman's rho in more detail.
But this is how Spearman's rho is computed. When we say Spearman's rho is computed using ranks, we mean: rank the variables, then apply Pearson's r to the ranks. You rank them, then you apply Pearson's r to that. Here r indicates Pearson's correlation coefficient. 0.84 is a high correlation, but Spearman's rho does not rate the correlation as high, because the relationship may not be linear, or something is missing. So let us see the exact difference between rho and r in this slide.
In this activity, I want to show you how sensitive Spearman's rho is to outliers. We saw these values in the previous slides. I changed two values: for two students with high marks, I changed the marks in A to 17 and 23. So I created two outliers in these marks.
I changed them myself to create two outliers, because every other value matches well: a high score in B means a high score in A. I changed only these two values to show the outliers. Now, can you predict, without computing it mathematically, what Spearman's rho will be for this plot? And what will the r value, Pearson's correlation coefficient, be for this plot, and how will the two differ? Can you guess how they will vary because of the outliers? Once you have done that, please resume the video to continue.
So, Spearman's rho for this data is 0.81, but the correlation coefficient r is 0.34. Because of the outliers, the line tries to fit all the points, and there is a huge distance between those points and the line. So the r value goes low, but rho is still good; rho is not that sensitive to those outliers. So it can handle outliers as well, because ranks are used to compute the correlation.
Why is Spearman's rho not that sensitive? Because out of all 60 students' data, only 2 data points are off. For example, take the marks 94 and 98 in that example. They will be ranked high, 59 and 60. And in the other subject the marks are 17 and 23, so those will be rank number 1 and rank number 2. So when you compute the difference between the ranks, it might be, say, 58 and 58, and the squares of these values are huge. But there are 58 other values that are closely related: all the other ranks are equal or differ by only 1 or 2. So when you divide by 60 times (60 squared minus 1), the term stays small. That is why Spearman's rho is not as sensitive to outliers as the Pearson correlation coefficient is. The reason is that we rank the values and compare the ranks, and only 2 values do not match.
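The rank argument above can be checked with a small Python sketch. The 60 marks below are hypothetical (an assumed, perfectly matched set with two values overwritten by 17 and 23), so the printed r should not be expected to match the slide's 0.34. The point is only how the two large rank differences of 58 are diluted by the 58 small ones. The helper names are made up for this sketch.

```python
import math


def ranks(values):
    """Ascending ranks, 1 = smallest. Assumes no ties, which holds for the data below."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def pearson_r(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical marks for 60 students: B is 41..100 and A starts out identical.
marks_b = list(range(41, 101))
marks_a = list(marks_b)
# Create two outliers: the two students with the highest B marks get 17 and 23 in A.
marks_a[-2], marks_a[-1] = 17, 23

rank_a, rank_b = ranks(marks_a), ranks(marks_b)
d_squared = [(a - b) ** 2 for a, b in zip(rank_a, rank_b)]

r = pearson_r(marks_a, marks_b)   # Pearson on the raw marks
rho = pearson_r(rank_a, rank_b)   # Spearman: Pearson on the ranks

print(max(d_squared), sum(d_squared))  # 3364 6960
print(round(rho, 2))                   # 0.81
print(round(r, 2))                     # lower than rho on this data
```

Here the two outliers contribute 58 squared each, and the remaining 58 students are shifted by only two ranks, so the sum of d squared is 6960 against a denominator of 60 times 3599. Rho stays at about 0.81 while Pearson's r on the raw marks drops further.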
So, Spearman's rho computes the correlation based on how many of the values agree in rank, and it largely discounts the outliers. I hope you understand how Spearman's rho computes the correlation in the presence of outliers, and that Pearson's correlation coefficient is affected by the outliers when it computes the r value.
Let us do another activity to make this clear. The x values in this plot are 1, 3, 4, 6, 7, on some scale between 1 and 10. But the y values are 1, 10, 100, 1000 and so on, like a log scale. If the next one is 10,000, that means a logarithmic scale. If I want to compute Spearman's rank correlation rho and also Pearson's correlation coefficient r, what will the values be? There is no need to do any mathematical computation. Instead, based on your understanding so far, try to work out the correlation between these 2 variables. A linear fit would draw a line something like this, and some points would lie away from it, while Spearman's will behave differently. Can you guess the rho and r values? Please pause, then resume to continue.
The r value is 0.87, positive, which indicates a high correlation, but some points do not follow the linear relationship between x and y. If x increases, y also increases, but the points do not fit on a straight line. But rho is 1. It means that for every increase in x, y also increases. The increments may not be on a linear scale, but whenever x increases, y increases. So 1 indicates that whenever x increases, y also increases, maybe on a different scale, but there is a relationship; that is why rho is 1. So this shows that rho can handle the case where one variable is on a logarithmic scale and x is on another scale. It is still able to identify the correlation, because we use ranks. You understand, right?
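A short Python sketch of this activity is given below. The x and y values are only approximated from the narration (x between 1 and 10, y growing by powers of ten), so the printed r is illustrative and need not equal the 0.87 on the slide. The helper names are assumptions made for this sketch.

```python
import math


def ranks(values):
    """Ascending ranks, 1 = smallest. Assumes no ties, which holds for the data below."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def pearson_r(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Values approximated from the narration, not read off the actual slide.
x = [1, 3, 4, 6, 7]
y = [1, 10, 100, 1000, 10000]  # grows by a factor of ten at each step

r = pearson_r(x, y)                  # Pearson on the raw values
rho = pearson_r(ranks(x), ranks(y))  # Spearman: Pearson on the ranks

print(ranks(x), ranks(y))  # [1, 2, 3, 4, 5] [1, 2, 3, 4, 5]
print(rho)                 # 1.0
print(r)                   # positive but below 1
```

Because y never decreases when x increases, both variables get the same ranks and rho is exactly 1, while r stays below 1 since the raw points do not lie on a straight line.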
So, if you compute the ranks of the values in the previous plot, you get exactly the same ranks for both variables. All the ranks are the same, so every difference is 0. So you get 1 minus 0, which is 1; that is how rho is computed. But Pearson's correlation coefficient gives 0.87. I just want to show that on the previous slide. If you compute the ranks, the ranks of x will be 1, 2, 3, 4, 5, and the ranks of y will be 1, 2, 3, 4, 5. If you take the differences between these, they are all 0, and 0 divided by whatever value is 0. So 1 minus 0 is 1. Or, if you compute Pearson's r on these ranks, it is a kind of self-correlation, so it will definitely be 1. That is why Spearman's rho is 1, but the r value is not: r is trying to fit a line, maybe something like this, and there are differences between the values and the line. That is why it is 0.87. I hope you understand what Spearman's rho and Pearson's correlation coefficient are.
So, which one to choose is very, very important. Can I just go and use a coefficient value to prove my algorithm, my hypothesis or my inferences? I would say that is not the case. First, always visualize your data. As we saw in the figure, diagnostic analytics means you should have done the descriptive analytics as well. Visualize the data and see whether there are any outliers, what the relationship between x and y is, whether x and y are on the same scale, and whether the relationship is linear. Then you can choose which correlation coefficient you want to report. Pearson's correlation coefficient is the most widely used; if both variables are numerical, it is good. If one of the variables is ordinal, people use Spearman's rho. So look at your data and visualize it. Since you now know the math behind both these correlations, Spearman's and Pearson's, you know which one to choose based on your data. So, visualize it and decide.
Also consider whether there is a nonlinear relationship in the data, then pick the right correlation coefficient. When you do, show why you picked that correlation coefficient, explain it to others, and use that value. Most important: correlation is not causation. Correlation only indicates the relationship between x and y: if x increases, does y also increase, or does it decrease? That is what correlation indicates. Never use correlation as causation. It never tells you that y increases because x increases. There is a correlation between these variables, but it never tells you the cause of y increasing. So never use correlation coefficient values to prove causation in your theory. So, in this video we saw what Spearman's rho is, and we also saw the relationship between Spearman's rho and Pearson's correlation coefficient. I hope you understood both. Thank you.