Hello and welcome to this online session on dispersion univariate statistical analysis for the course Data Analytics. I am Mr. Vipul Kondekar from Walchand Institute of Technology, Solapur.
These are the learning outcomes for this video session: you will be able to calculate the dispersion statistic parameter values, and not only calculate them but also interpret them. These are the contents of this session: we will have an introduction, we will talk about the different statistical parameters, then we will see how to do the calculations and what results are obtained for some sample data, and then we will make some observations and interpretations.
Now, let us start. The data available for analysis may be univariate, bivariate or multivariate, depending on the number of attributes you are analysing. If you analyse only a single attribute, that is univariate analysis. If you analyse two attributes simultaneously, that is bivariate analysis. If you analyse more than two attributes together, that is multivariate analysis. All of these analyses can be done using three different approaches, as we discussed: the first approach is frequency calculation, the second is visualization of the attributes, and the third is to come up with statistical descriptors.
When you are analysing univariate data, the statistical descriptor can be of two kinds: a location statistic descriptor or a dispersion statistic descriptor. Today we will be talking about the different dispersion univariate statistic parameters. Dispersion statistic measures tell you how distinct, how different from one another, the values of an attribute are in the given sample.
The most commonly used dispersion statistic parameters start with the amplitude. What do you mean by amplitude? Let us say you have data on the heights of 100 different persons. The amplitude of the height is just the difference between the maximum value of that attribute and its minimum value. So, you have 100 contacts, and let us say the maximum height is 195 and the minimum height is 150, in centimeters. The difference between maximum and minimum, 195 minus 150, comes out to be 45. So, 45 is the amplitude for the attribute height.
Then comes the interquartile range. You know the location univariate statistic parameters, where we calculate the first quartile value, the second quartile value, which is nothing but the median, and the third quartile value. The interquartile range is the difference between the third quartile value and the first quartile value. When you calculate the quartiles you arrange the attribute values in ascending order, so the third quartile can never be smaller than the first quartile, and the interquartile range can never be negative. It describes the width of the box in the box plot. In the earlier video we saw how to visualize the quartile values as a box plot; the width of that box is the interquartile range.
The next important descriptor is the mean absolute deviation. It is needed because the plain mean is not very informative when the values being averaged can be both positive and negative. Suppose you make 10 different measurements, and for each of those 10 measurements you calculate the measurement error. Let us say you are measuring temperature, and you have taken 10 different temperature measurements.
For every measurement you consider the actual temperature and the measured temperature, and the difference between these two is the error in the measurement. Now, it may happen that for certain measurements the measured value is greater than the actual value, and for certain other measurements the measured value is less than the actual value. Hence the measurement error may be positive for some measurements and negative for others. If you just calculate the mean, the positive values may get compensated by the negative values, and the mean you get may be 0. That does not mean that the error in the measurement was 0; it only means that the positive and negative values cancelled each other.
To overcome this problem you come up with a different descriptor, the mean absolute deviation. The mean absolute deviation is the mean of the absolute differences: we are interested in the magnitude of each difference, not its sign. That is why you find the modulus sign in the formula. The difference is taken between the actual value of the attribute and the mean value of the attribute. Let us say the mean of the weight values is 59 and one person has weight 61. Then 61 minus 59 gives a difference of 2. You calculate these absolute differences for all the instances, sum them up, and then calculate their mean. That is why it is called the mean absolute deviation: each difference measures how far the actual value deviates from the mean.
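The cancellation effect and the mean absolute deviation described above can be tried out with a short Python sketch. The error values and the weights below are made-up numbers chosen only for illustration (the weights are picked so that their mean is 59, as in the spoken example), and the helper name `mean_absolute_deviation` is an assumption, not something from the lecture. This version divides by the number of values.

```python
from statistics import mean

# Hypothetical signed measurement errors (measured minus actual), in degrees.
errors = [2.0, -2.0, 1.0, -1.0, 3.0, -3.0]

# The plain mean lets positive and negative errors cancel each other.
mean_error = mean(errors)

# Taking magnitudes first shows that the measurements were not error free.
mean_abs_error = mean([abs(e) for e in errors])


def mean_absolute_deviation(values):
    """Mean of |x_i - mean(values)|, dividing by the number of values."""
    centre = mean(values)
    return sum(abs(x - centre) for x in values) / len(values)


# Hypothetical weights in kg, chosen so that their mean is 59.
weights = [55.0, 57.0, 59.0, 61.0, 63.0]
mad_weights = mean_absolute_deviation(weights)

print("mean of signed errors:  ", mean_error)
print("mean of absolute errors:", mean_abs_error)
print("MAD of weights:         ", mad_weights)
```

The person with weight 61 contributes |61 - 59| = 2 to the sum, exactly as in the example above.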
So, you come up with this important parameter called the mean absolute deviation. The problem with the mean absolute deviation is that outlier values present in the data contribute a lot to it and can significantly increase its value. Outliers create problems in the calculation of the mean absolute deviation.
Then comes the statistical parameter called the standard deviation. The standard deviation is another measure of the distance between the actual observations and their mean value. The standard deviation calculated for some attribute x is denoted sigma x, and it is given by the square root of the average of the squared differences: sigma_x = sqrt( (1/n) * sum for i = 1 to n of (x_i - mu_x)^2 ). Here n is the number of instances, that is, the number of values present in your data. For each and every value you calculate the difference between the value and the mean, and then you square the difference. There is no way of getting negative values here, because you are squaring. You sum up all those squared differences and take their average, and because you have squared the differences, finally you take the square root to compensate for that. So it is the root of the mean of the squared deviations, like an RMS value, and it is called the standard deviation. This is how you can calculate the standard deviation.
Now, in data analytics, always keep the following in mind.
Many times you are analysing samples, not the whole population. In that case, when you calculate the mean absolute deviation and the standard deviation for a sample, there is a slight change. The mean absolute deviation over the sample is denoted MAD-bar x for attribute x, and the standard deviation calculated over the sample is denoted s. What changes in the formulas? Earlier, mu x represented the mean of the whole population, but in these calculations we use the mean of the sample itself, the mean of the sample values, which is denoted x bar. So where you had mu x you will now find x bar. One more change: when you calculate these values for the population, the averaging divides by the total number of instances, n, but for the sample it divides by n minus 1, one less than the total number of values present. Keep this important point in mind. Unless otherwise specified, these values are taken to be calculated over samples.
When you have the standard deviation, you can come up with one more dispersion statistic descriptor, called the variance. The variance is nothing but the squared value of the standard deviation, so the square of s gives you the sample variance.
Now suppose you have this kind of contact list, with the quantitative attributes weight, height and maximum temperature for 14 different instances. So n comes out to be 14 here, and these are the actual values. If you apply the earlier formulas for the sample mean and the standard deviation to this table, you come up with these results. This is the summary of the results that can be obtained for that table, where the statistical parameters considered are amplitude, interquartile range, mean absolute deviation and standard deviation.
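As a rough illustration of how such a summary can be computed, here is a minimal Python sketch for a single attribute. The heights are made-up values, not the 14-row table from the slides, and all variable names are assumptions. Two caveats apply: quartile conventions differ between tools (the sketch uses the default method of `statistics.quantiles`), and dividing the mean absolute deviation by n minus 1 follows the convention described in this lecture, whereas some tools divide by n.

```python
import math
import statistics

# Hypothetical heights in cm; NOT the 14-instance table shown in the lecture.
heights = [150.0, 160.0, 165.0, 170.0, 175.0, 175.0, 195.0]
n = len(heights)

# Amplitude: maximum value minus minimum value.
amplitude = max(heights) - min(heights)

# Interquartile range: third quartile minus first quartile.
# statistics.quantiles uses its default ("exclusive") method here.
q1, q2, q3 = statistics.quantiles(heights, n=4)
iqr = q3 - q1

# Deviations are taken from the sample mean, x bar.
x_bar = statistics.mean(heights)
abs_dev = [abs(x - x_bar) for x in heights]
sq_dev = [(x - x_bar) ** 2 for x in heights]

# Population-style formulas divide by n.
mad_n = sum(abs_dev) / n
sd_n = math.sqrt(sum(sq_dev) / n)

# Sample formulas, as described in the lecture, divide by n - 1.
mad_sample = sum(abs_dev) / (n - 1)
sd_sample = math.sqrt(sum(sq_dev) / (n - 1))

# Sample variance is the square of the sample standard deviation s.
variance_sample = sd_sample ** 2

print("amplitude:", amplitude)
print("interquartile range:", iqr)
print("MAD (n, n-1):", mad_n, mad_sample)
print("standard deviation (n, n-1):", sd_n, sd_sample)
print("sample variance:", variance_sample)
```

The standard library offers `statistics.pstdev` (divides by n) and `statistics.stdev` (divides by n minus 1), which can be used to cross-check the two standard deviation values.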
So, these parameters give you information about the spread, or dispersion, itself. The greater the spread, the greater the interquartile range, the greater the amplitude, the greater the standard deviation and the greater the mean absolute deviation. Each of these values also has its own significance.
Now just think: which of the following statements are correct? The mean is a measure of deviation in the data set. The standard deviation is a measure of dispersion. The range is a measure of central tendency. The median is a measure of dispersion. Just remember that mean, mode and median are all measures of the central tendency of the data. If you know that, the only correct statement here is that the standard deviation is a measure of dispersion.
So, this is all for today's online class. Here are the references used for this video presentation. Thank you.