Hello, and welcome to this online video session on Location Univariate Statistical Analysis. I am Mr. Vipul Kondekar from Walton Institute of Technology, Sholapur. These are the learning outcomes: at the end of this session you will be able to calculate the different location univariate parameter values, and you will be able to draw a box plot of these parameters and interpret it.
These are the contents for this video presentation. First, what is univariate analysis? Data analytics may be done on different attributes, so the question is how many attributes you consider simultaneously for the analysis. If a single attribute is considered, that is univariate analysis. If you analyze two attributes simultaneously, that is bivariate analysis, and if more than two attributes are considered simultaneously, it is multivariate analysis.
Now, analysis of the data can be done in three different ways: you can have frequency calculations for the data, you can have visualization, and you can have some statistical parameters defined. So, let us try to understand the different univariate statistical parameters. Univariate statistical analysis is basically divided into two kinds: one is called location statistics and the other is dispersion statistics. We will concentrate on location statistics, where the statistic values depend on the position of the data. To calculate these location statistic values, the first thing you have to do is rearrange the attribute data.
You arrange the data in either ascending or descending order, and then it becomes very easy to calculate all these statistic values. Usually we arrange the data in ascending order. Once the data is arranged in ascending order, the first location statistic you get is the minimum value. What is the minimum value of an attribute? It is the lowest value observed. Suppose you have data on the weights of different persons: the smallest weight present is the minimum value of that weight attribute. On the same lines, the maximum is the largest value observed.
Then comes the mean value, which is also called the average value. How do we obtain the mean? We sum up all the attribute values and divide the result by the number of instances considered. That averaging gives you the statistical parameter called the mean. For example, in your semester you appear for exams in five or six different courses. After the results are declared, marks are available for each individual subject, but you still come up with the average marks, the average percentage, and that has some significance. That parameter is the mean.
Then comes the value called the mode. The mode is the most frequent value present in your data. Consider an example: you have data about a batsman, a cricket player who has played, let us say, 200 innings, and in each innings he has scored certain runs.
Now, you may be interested in how often certain scores occur, like how many times that player has made a half century or how many times he has made a century. So you calculate the frequency of 100 runs, the frequency of 50 runs, and so on for all the run values, and among those you find the value with the highest frequency. That is the mode of that particular attribute: the most frequent value present in the data for that attribute.
Then come three important statistical descriptors. The first descriptor is the first quartile value of the attribute. What do we mean by the first quartile? It is the value that is larger than 25 percent of all the values. Whatever number of instances you have for a certain attribute, you arrange them in ascending order and then point out the location that separates the lowest 25 percent of the values from the rest. That is the first quartile. The median, which is also called the second quartile, is the value that is larger than 50 percent of all the values. On the same lines, the third quartile is the value that is larger than 75 percent of all the values.
To calculate these descriptors, all you have to do is rearrange the data. Let me show you how we can do that. This is the data table available, and you can take any of these attributes. Let us say you want to analyze the weight attribute. So I take the weight values for the analysis, but I know that these values should be arranged first.
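As a side note, the descriptors defined so far (minimum, maximum, mean and mode) can be sketched in a few lines of Python. This is only an illustrative sketch, not the method used in the lecture, which works in a spreadsheet. The function name `location_basics` and the list `runs` are made up for the example, and the run values are hypothetical, not real cricket data.

```python
from collections import Counter


def location_basics(values):
    """Return minimum, maximum, mean and mode(s) of a list of numbers."""
    ordered = sorted(values)            # ascending order, as in the lecture
    counts = Counter(ordered)           # frequency of each distinct value
    top = max(counts.values())          # highest frequency observed
    # A data set can have more than one mode, so keep every value that ties.
    modes = [value for value, count in counts.items() if count == top]
    return {
        "minimum": ordered[0],          # first value after sorting
        "maximum": ordered[-1],         # last value after sorting
        "mean": sum(ordered) / len(ordered),
        "modes": modes,
    }


# Hypothetical runs scored in ten innings (illustrative values only).
runs = [0, 50, 12, 100, 50, 7, 50, 100, 33, 4]
stats = location_basics(runs)
print(stats)
```

Returning a list of modes rather than a single value is a design choice for the sketch: when two values share the highest frequency, both are reported.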
So, I arrange these values in ascending order. Once the values are arranged in ascending order, the first value represents the minimum value of the attribute, and on the same lines the last value represents the maximum value of the attribute. Then comes the mean value. The Excel tool has built-in functions available, so I can directly get the average value. The average value of the weight here comes out to be 79.
On the same lines, if you want to calculate a quartile value, you can use the function called QUARTILE. You select the range for which you want to calculate the quartile, and then you specify which quartile you want. Here I want the first quartile, so I get the first quartile value. In the same way you can calculate the second quartile for this range, and similarly the third quartile, using the same readily available function: give the range and which quartile you want. This is how you come up with the different quartile values.
So, for the statistical descriptors we have seen: this is the minimum value, this is the maximum value, this is the mean value, then the first quartile value for the weight attribute data is 67, then this is the second quartile and this is the third quartile value. Now that these things are ready, suppose I want to do the same analysis for another attribute, let us say height. The only thing you do is put the height values here and arrange them in ascending order.
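For readers who want to see the arithmetic behind such a quartile function, here is a minimal Python sketch. It assumes the common "inclusive" convention: the position of quartile q in the sorted data is (n - 1) * q / 4, with linear interpolation between neighbouring values. To my understanding this is the convention Excel's QUARTILE function follows, but treat that as an assumption rather than something the lecture states. The name `quartile` and the list `weights` are hypothetical, and the weights are not the values from the lecture's table.

```python
def quartile(values, q):
    """Quartile q (0..4) using the inclusive, linearly interpolated rule."""
    ordered = sorted(values)                 # arrange in ascending order
    pos = (len(ordered) - 1) * q / 4         # fractional index into the data
    lower = int(pos)                         # index of the value just below
    frac = pos - lower                       # how far to move toward the next value
    if lower + 1 < len(ordered):
        return ordered[lower] + frac * (ordered[lower + 1] - ordered[lower])
    return ordered[lower]                    # q = 4 lands exactly on the maximum


# Hypothetical weights in kg (illustrative values only).
weights = [62, 48, 71, 55, 90, 66, 80, 74, 59]

first_quartile = quartile(weights, 1)
median = quartile(weights, 2)
third_quartile = quartile(weights, 3)
print(first_quartile, median, third_quartile)
```

Other quartile conventions exist and can give slightly different answers on small data sets, so results from different tools may not match exactly.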
You will get the minimum, maximum, mean, first quartile, second quartile and third quartile values calculated for this second attribute also. This is how you can calculate these statistic values for any attribute.
Now, once you have these calculations, there is a very useful and interesting way to visualize these univariate location statistic parameters, and that is the box plot. The box plot is also called the box and whisker plot. The plot starts at the minimum value and ends at the maximum value. The box itself starts at the first quartile, there is a divider inside the box at the second quartile, and the box ends at the third quartile. So, with the values we calculated earlier, this is the minimum value, this is the maximum value, and these are the first, second and third quartile values. Taking all these values together, I can draw the box plot.
Now, what information do you get from this visualization? What is the significance of the box plot? The box plot is used to visualize these univariate location statistic parameters, and it gives you information about the skewness and symmetry of the data. If the median is exactly at the center of the box, the data is symmetric; if the median is away from the center, you may say that the data is skewed. The plot also gives you information about the outliers present in the data. This will become clear with an example.
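The pieces of a box plot described above can also be computed directly. The sketch below is a hedged illustration: it uses Python's standard `statistics.quantiles` with `method="inclusive"` for the quartiles and the widely used 1.5 times interquartile range rule to flag outliers. The lecture does not say which outlier rule its tool applies, so the factor `k=1.5` is an assumption, and the function name `box_plot_summary` and the sample values are hypothetical.

```python
import statistics


def box_plot_summary(values, k=1.5):
    """Compute the numbers a box and whisker plot is drawn from."""
    ordered = sorted(values)
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1                               # height of the box
    low_fence = q1 - k * iqr                    # values below this are outliers
    high_fence = q3 + k * iqr                   # values above this are outliers
    inside = [v for v in ordered if low_fence <= v <= high_fence]
    outliers = [v for v in ordered if v < low_fence or v > high_fence]
    return {
        "whisker_low": inside[0],               # smallest non-outlier value
        "q1": q1,
        "median": q2,
        "q3": q3,
        "whisker_high": inside[-1],             # largest non-outlier value
        "outliers": outliers,                   # drawn as separate points
        # Positive: upper part of the box is longer than the lower part;
        # negative: lower part is longer; zero: median centered in the box.
        "median_offset": (q3 - q2) - (q2 - q1),
    }


# Hypothetical weights in kg with one unusually large value.
sample = [48, 55, 59, 62, 66, 71, 74, 80, 150]
summary = box_plot_summary(sample)
print(summary)
```

In this sketch the whiskers stop at the most extreme values that are not outliers, which is why an unusually large value appears as a separate point instead of stretching the plot.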
So, this is the data we have considered for the analysis. It is contact list information about 14 different instances, that is, 14 different persons, and the attributes stored are the name of the person, temperature, weight, height, gender and company. If you look at these attributes, three are quantitative and three are qualitative. These are the statistical parameters calculated for those attributes. If we look at the attribute called weight, 55 is the minimum weight, the maximum weight is 115 and 79 is the average value; this is the mode, and these are the first quartile and second quartile values.
Now, based on these values, this is the output produced by the educational edition of IBM's SPSS tool. These are two box plots for two attributes, one for the height and another for the weight. This is the vertical arrangement, where this is the minimum value, this is the first quartile, the second quartile, the third quartile, and this is the maximum value. As I said, there may be some outliers in your data. Outliers are data values that are far away from the regular values taken by the data. In that case the tool does not include these outliers in the box and whiskers, and they are visible in the box plot as separate points. So, this is how you can visualize the attributes height and weight as box plots, which use the minimum, maximum, first quartile, second quartile and third quartile values for the representation.
Now, just think over this problem. A gardener has collected data on two different types of tomatoes, so you have two box and whisker plots, one for each type. These plots show the mass observed for the tomatoes of type A and of type B. This is the plot we got for type A, and this is the plot for type B.
So, use your knowledge of box plots and univariate statistical parameters and come up with the answer. Can you compare and contrast these two types and advise the gardener on which type of tomato he should grow? Just think and answer. These are the references used for this video presentation. Thank you.