Hello and welcome, learners, to this online session on the topic of bivariate statistical analysis for the course Data Analytics. I am Mr. Vipul Kondekar from Walton Institute of Technology, Sholapur. In this video session we will be addressing two outcomes: you will be able to calculate statistical parameters like covariance and correlation, and at the end you will be able to interpret the values obtained for certain attributes. These are the contents of this video: we will have a short introduction, we will talk about covariance and correlation, later we will calculate covariance and correlation values, and finally we will make some interpretations based on the correlation or covariance values we get for certain attributes.
When you are doing bivariate analysis, we all know we are trying to analyze two attributes simultaneously. As always, we refer to the contact list table, where you have stored the contacts of 14 different persons and you have information about, let us say, the weight and the height of each person. If I take these two attributes, weight and height, they are two quantitative attributes, and I am interested in whether there is any relationship between them. Can I come up with some statistical measure of that relationship? Yes, it is possible with the help of a statistical measure called covariance.
So, what basically is covariance? It measures the degree to which a relationship exists between the two attributes. When the two attributes show similar variation, that is, when they tend to be directly proportional to each other, you will come up with a positive covariance value.
If you observe that the attributes vary in exactly the opposite way, the result is different. Take the speed of a car: as the speed increases, the time taken by the car to reach the destination goes on decreasing. Suppose 10 different cars are running in a race with different speeds, so they take different times to reach the destination. If you look at these two attributes, you will find an opposite relationship, an inverse proportionality, and so the covariance between the two attributes will be negative. And there are some attributes between which there is no relationship as such, for example what weight you have and what your education level is. These two attributes are more or less independent of each other, so the covariance between two such attributes will be 0, or in real data very close to 0. So the basic point regarding covariance is that it tells you what type of relationship exists between the two attribute values.
Now, mathematically, when you are interested in calculating the covariance value, Sij represents the covariance between the two attributes xi and xj. Let us say xi represents weight and xj represents height. So what is the covariance between weight and height, and how will we calculate it? It is given by an average, calculated with 1 upon n minus 1, where n represents the total number of instances present in your data. Suppose I have a contact list of 14 different contacts, so n is 14. What is this covariance formula talking about? It calculates the summation of products. Products of what? The first attribute value minus its mean, multiplied by the corresponding second attribute value minus the mean of the second attribute.
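Written out, the formula described here can be put as follows. The subscript notation is an assumption made for this write-up: x_{ki} is taken to mean the value of attribute i for instance k, and \bar{x}_i the mean of attribute i.

```latex
s_{ij} = \frac{1}{n-1} \sum_{k=1}^{n} \left( x_{ki} - \bar{x}_i \right) \left( x_{kj} - \bar{x}_j \right)
```

When both deviations in a product have the same sign, the product is positive, which is why attributes that rise and fall together give a positive sum, and attributes that move in opposite directions give a negative one.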
So the subtractions are carried out and then the corresponding products are taken. These products are taken for k equal to 1 to n, which means that if 14 instances are there, the product is calculated for every instance. Those product values are summed up, and the summation gets divided by n minus 1, where n is the number of instances. This is how you can calculate the covariance between the two attributes.
Now, the basic problem with covariance is that its size is determined by the range of the values of the attributes. Range is the difference between the maximum and the minimum value. If that range is large, or if some outlier values are there, they will result in a larger difference between the actual attribute value and its corresponding mean, and that will increase the covariance value. One solution for dealing with this problem is that, instead of using the attribute values as they are, you normalize the attribute values and then use those normalized values for calculating the covariance. Another is to go for a similar measure called correlation.
When you are calculating the correlation between the two attributes, half of the work is the same: you need to calculate the covariance first, and this covariance value is then divided by the standard deviation values of those two attributes. So the correlation coefficient between the two attributes xi and xj is given by the covariance between the two attributes divided by the standard deviation of the ith attribute and the standard deviation of the jth attribute. Covariance divided by the product of these standard deviations gives you the correlation coefficient. This is also called the Pearson correlation coefficient formula.
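To see the range problem and the fix in one place, here is a minimal Python sketch. The height and weight numbers are hypothetical (they are not the contact list from the session), and the function names are chosen only for this illustration. It computes the covariance and the Pearson correlation once with height in centimetres and once with the same heights in metres.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def covariance(x, y):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def std_dev(x):
    """Sample standard deviation: square root of the covariance of x with itself."""
    return sqrt(covariance(x, x))


def pearson(x, y):
    """Pearson correlation: covariance divided by both standard deviations."""
    return covariance(x, y) / (std_dev(x) * std_dev(y))


# Hypothetical data, for illustration only.
height_cm = [158, 163, 168, 172, 180]
weight_kg = [55, 60, 65, 72, 80]
height_m = [h / 100 for h in height_cm]  # same heights, different unit

print("covariance (cm):", covariance(height_cm, weight_kg))
print("covariance (m): ", covariance(height_m, weight_kg))
print("pearson (cm):   ", pearson(height_cm, weight_kg))
print("pearson (m):    ", pearson(height_m, weight_kg))
```

The covariance changes by a factor of 100 just because the unit changed, while the correlation stays the same, since the standard deviation in the denominator is rescaled by the same factor.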
Now, with the Pearson correlation coefficient formula you come up with a statistical parameter, the correlation value. Its significance is this: if the value is positive, it means there is a positive tendency, in other words a direct proportionality, between the two attributes, and as the relationship gets closer to a straight line, the Pearson correlation value gets closer to 1. If an exact straight line relationship is there, the Pearson correlation value is 1. On the same lines, if there is a negative tendency, meaning inverse proportionality, the Pearson correlation value will go closer to minus 1, and for an exact straight line with negative tendency the value will be minus 1.
Next is Spearman's rank correlation. If you look at the formula for calculating the Spearman rank correlation, it is almost the same as that of the Pearson correlation. The only difference is that you are considering the ranks of the attribute values and not the attribute values themselves. So again a product is carried out here, but it is not the product of attribute values, it is the product of ranks. You take the rank of the ith attribute for, let us say, the first instance, minus the mean value of the ranks. Earlier we had xi bar, which was the mean value of the ith attribute; here it is the mean value of the ranks. Then again it gets divided by the standard deviation values of the ranks themselves, and you come up with the Spearman rank coefficient value. So instead of using attribute values, we order all those attribute values, from those ordered values we work out the ranks, and those rank values are used for calculating the Spearman's rank correlation.
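For comparison, the two coefficients can be written side by side. The notation is an assumption made for this write-up: s_{ij} is the covariance, s_i and s_j are the standard deviations of the two attributes, r(x_{ki}) is the rank of the value of attribute i for instance k, \bar{r}_i is the mean of those ranks, and s_{r_i} is their standard deviation.

```latex
% Pearson correlation coefficient
r_{ij} = \frac{s_{ij}}{s_i \, s_j}

% Spearman rank correlation: the same expression applied to ranks
r^{\mathrm{rank}}_{ij} =
  \frac{\dfrac{1}{n-1} \displaystyle\sum_{k=1}^{n}
        \left( r(x_{ki}) - \bar{r}_i \right) \left( r(x_{kj}) - \bar{r}_j \right)}
       {s_{r_i} \, s_{r_j}}
```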
Consider that this data table is available to you for analysis, and you want to analyze two of its attributes. It could be maximum temperature and weight; here, out of the different attributes, I take height and weight. So these are the height values and these are the weight values for the analysis, and we will apply the formula to them.
First we will calculate the covariance. For that you need the mean values. So this is how the average of these particular cell values is calculated: this is the mean of the height and this is the mean of the weight. Once you have these values, I calculate the difference between each actual attribute value and its corresponding mean value, and then the products are taken: height minus average height, multiplied by weight minus average weight. This is done for all the instances. Once I have these products, I sum them, and this summation gets divided by n minus 1. In our case there are 14 instances, so n minus 1 will be 13, and the sum gets divided by 13. This is the covariance value.
Once you have this covariance value, I can calculate the standard deviation of the height and the standard deviation of the weight, and if I divide the covariance value by the product of the standard deviations of height and weight, I get the correlation value. This is the Pearson correlation value. Now, on the same lines, how can I calculate the Spearman rank coefficient?
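Before moving on to ranks, the Pearson steps just described (means, differences from the mean, products, sum, division by n minus 1, then division by the standard deviations) can be sketched in Python. The five height and weight values below are hypothetical and are not the 14-row table used in the session, so here n minus 1 is 4 rather than 13.

```python
from math import sqrt

# Hypothetical values, for illustration only (not the session's table).
heights = [158, 163, 168, 175, 180]
weights = [55, 62, 65, 75, 85]
n = len(heights)

# Step 1: mean of each attribute.
mean_h = sum(heights) / n
mean_w = sum(weights) / n

# Step 2: difference between each value and its mean.
dev_h = [h - mean_h for h in heights]
dev_w = [w - mean_w for w in weights]

# Step 3: product of the two differences, instance by instance.
products = [dh * dw for dh, dw in zip(dev_h, dev_w)]

# Step 4: sum the products and divide by n - 1 to get the covariance.
cov_hw = sum(products) / (n - 1)

# Step 5: sample standard deviations, then the Pearson correlation.
sd_h = sqrt(sum(d * d for d in dev_h) / (n - 1))
sd_w = sqrt(sum(d * d for d in dev_w) / (n - 1))
pearson_r = cov_hw / (sd_h * sd_w)

print("height  weight  dev_h   dev_w   product")
for h, w, dh, dw, p in zip(heights, weights, dev_h, dev_w, products):
    print(f"{h:6d}  {w:6d}  {dh:6.1f}  {dw:6.1f}  {p:7.2f}")
print("covariance:", round(cov_hw, 2))
print("pearson r: ", round(pearson_r, 3))
```

Printing the intermediate columns mirrors the spreadsheet layout of the calculation, which makes it easy to check each row by hand.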
For the Spearman rank coefficient, you take the attribute values and first arrange them, maybe in ascending order. Then I check the position of each attribute value and form this rank matrix. Take the height rank. Suppose the first height is 158, so what is its rank? The rank is 1. The second height value is 163, so its rank is 2. But if you look here, there are two instances having the same height. Their rank I take by averaging, 5 plus 6 divided by 2, so both get the rank 5.5. Similarly, here I find that there are two persons having height 180, so their rank is taken as 9.5. This is how I get the rank values of the heights for all 14 instances.
Now, how can I get the ranks for the weight? Height rank 1 corresponds to a height of 158, the height of 158 corresponds to a weight of 55, and the weight of 55 is at position number 1 among the weights. That is why the weight rank is 1 for height 158. Now let us say, what is the weight rank for height rank 4? Height rank 4 corresponds to a height of 168, the height of 168 corresponds to a weight of 65, and the weight of 65 is at position 3. Hence this weight rank comes out to be 3. Like this you will get the different ranks.
Let us see how I get this rank value of 7.5. This is for the height rank of 9.5, that is, the height of 180. The height of 180 is taken by two different instances: one has weight 85 and the other has weight 75. First I consider the weight 75. The weight 75 occupies two different locations, location number 7 and location number 8. Hence 7 plus 8 divided by 2, which comes out to be 7.5, will be the weight rank. This is how you calculate the ranks for height and weight. Then, if you use these rank values in the formula instead of the actual attribute values, the correlation coefficient value also changes.
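The ranking rule described here, where tied values share the average of their positions, can be sketched in Python as follows. The helper names `average_ranks` and `spearman` and the six data points are hypothetical, chosen only to show the tie handling; they are not the session's table.

```python
from math import sqrt


def average_ranks(values):
    """Rank values in ascending order (1 = smallest).

    Tied values all receive the average of the positions they occupy,
    for example positions 5 and 6 both become rank 5.5.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend 'end' over the run of values equal to the one at 'pos'.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = ((pos + 1) + (end + 1)) / 2  # average of the 1-based positions
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    sx = sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    sy = sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: the Pearson formula applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data with ties, for illustration only.
heights = [158, 163, 165, 168, 170, 170]
weights = [55, 62, 60, 65, 75, 75]

print("height ranks:", average_ranks(heights))
print("weight ranks:", average_ranks(weights))
print("spearman:    ", spearman(heights, weights))
```

Because only the ordering matters, any relationship that is consistently increasing gives a Spearman value of 1 even when it is not a straight line, which is one reason the Spearman and Pearson values for the same data can differ.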
And then the Spearman correlation coefficient value I am getting is 0.96. Earlier the values were the actual attribute values, and there I was getting the Pearson correlation coefficient as 0.94. When I consider the rank values instead of the attribute values, that value changes to 0.96. Keep in mind that a correlation value close to 1 indicates that there is a strong positive relationship between these two attributes, height and weight. So you can calculate these correlation values for height and weight, and for height and maximum temperature. And just think about why the correlation values in case B are smaller than those in case A. These are the various references used for this video. Thank you.