Hello and welcome, all of you, to this online video session for the course Data Analytics. The topic is data scales and descriptive univariate analysis in terms of frequency values. I am Mr. Vipul Kondekar from Walton Institute of Technology, Sholapur.

So, these are the learning outcomes of this video. You should understand the scales of data representation and be able to differentiate between them, and you should be able to calculate data frequencies and make interpretations from those frequency values.

These are the contents of this video. We will have an introduction, then we will talk about the different univariate frequencies. There we will do absolute frequency and relative frequency calculations for one attribute, and then we will make some interpretations from those calculations.

Before we start with the scales for data representation, note that a data analysis depends on how many attributes you are analyzing. If you are analyzing only one attribute it is univariate, if you are analyzing two attributes it is bivariate, and if you are analyzing more than two attributes it is multivariate. Any analysis can be done using three different approaches. You can calculate a frequency table for the attribute, you can use statistical measures that represent the attribute values, or you can use visualization in terms of different plots.

Now, the different scales. Data is basically available in two different forms: it may be qualitative data or it may be quantitative data. Qualitative data can have two different scales. It may have a nominal scale, where only names are given to the attribute values, or it may have an ordinal scale, where the names are there and you can also order those names.
Quantitative data can also have two different scales. It may have an interval scale, where the values are distinct, can be ordered, and the spacing between values is equal. And finally you can have the ratio scale, where the values are distinct, ordering is possible, the spacing is equal, and in addition there is an absolute zero value. That is the important property of the ratio scale: the data has an absolute zero.

So, this is the check table with the four properties. Nominal data is distinctive only. Ordinal data is distinctive and ordering is possible. Interval scale data is distinctive, ordering is possible, and the spacing between the data values is equal. And finally, for ratio scale data all four properties apply: it is distinctive, ordering is possible, the values are equally spaced, and an absolute zero value is possible and has some significance.

Whenever you choose a scale for the data, the amount of information goes on increasing as you go from the nominal scale to the ratio scale. So, you get the most information when the data is represented on the ratio scale and the least information when it is on the nominal scale. Now, just think: weight is an attribute represented on the ratio scale. How can we convert it to the other scales, that is, to the interval scale, the ordinal scale or the nominal scale? Think and answer.

Now, let us come to the analysis in terms of frequencies. When you analyze univariate data in terms of frequency, what is frequency? Frequency is about repetition. A frequency is basically a counter which counts how many times a certain thing is repeated.
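Going back for a moment to the check table of the four scales, it can be written down as a small Python sketch. The names `SCALE_PROPERTIES` and `supports`, and the property labels, are assumptions made for illustration and are not taken from the lecture slides. Counting the properties per scale is only a rough stand-in for "amount of information".

```python
# Check table of scale properties as described in the lecture.
# The dictionary name and the property labels are illustrative assumptions.
SCALE_PROPERTIES = {
    "nominal": {"distinct"},
    "ordinal": {"distinct", "ordered"},
    "interval": {"distinct", "ordered", "equal_spacing"},
    "ratio": {"distinct", "ordered", "equal_spacing", "absolute_zero"},
}


def supports(scale, prop):
    """Return True if the given scale has the given property."""
    return prop in SCALE_PROPERTIES[scale]


# Order the scales by how many properties they carry (least to most).
scales_by_information = sorted(
    SCALE_PROPERTIES, key=lambda scale: len(SCALE_PROPERTIES[scale])
)

for scale in scales_by_information:
    print(scale, sorted(SCALE_PROPERTIES[scale]))
```

Each scale keeps every property of the scale before it and adds one more, which is why the ordering runs from nominal up to ratio.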
So, for univariate data you can come up with four different frequency values.

The first one is called absolute frequency. What is the absolute frequency? It just counts how many times a particular value appears. Say a batsman has played 200 innings and has made a century 31 times. Then indirectly you are saying that the absolute frequency of scoring 100 runs is 31.

Then comes relative frequency. Relative frequency is calculated in terms of percentage, so it is the percentage of times a particular value appears. Say a player has played 100 innings and in 25 of those innings he has scored 50 runs. Then 25 percent is the relative frequency of scoring 50 runs, because 25 times out of 100 innings is 25 percent. So, relative frequency is the absolute frequency itself expressed as a percentage of the total.

Now, when you consider these absolute or relative frequency values in a cumulative way, you come up with two more frequency descriptors. One is absolute cumulative frequency. It is the cumulative number of occurrences for the attribute value in question and all values less than that value. Relative cumulative frequency is again the cumulative calculation, this time of the percentage or relative frequency values. You will understand this better with an example.

So, this is the table of data we are considering for the frequency analysis. It is a stored contact list with six different attributes: friend name, maximum temperature, weight, height, gender and company. Information about 14 different friends is stored here.
So, you have 14 instances. Now, let us say you want to make some frequency calculations. How can you do that? Let us take the attribute height.

The very first thing to do, for the attribute you want to analyze, is to take the unique values of that attribute and arrange them in ascending order. From the earlier table I could check that the minimum height was 158 centimeters and the maximum height was 195 centimeters. These are the in-between values, and I have neglected the duplicate values here. So, these are the different unique height values.

Next, how do I come up with the absolute frequency? You have to look in the database and count how many persons have a height of 158 centimeters. I find that there is only a single person. If you take height 172, you observe that there are two persons in your database with that height. Just check whether it is true: here is one person, Carolina, and here is Leah. These are the two persons with the height value 172, and that is why the absolute frequency of height 172 is 2. The other frequencies are calculated in the same way.

Now, let us come to relative frequency. What is the relative frequency of height 158? There is only one person out of 14 with height 158. Hence the relative frequency of height 158 is 1 by 14, which in terms of percentage comes out to be 7.14 percent. What is the relative frequency of height 172? Two persons out of 14 means 2 by 14, and 2 by 14 comes out to be 14.29 percent. In this way you get the relative frequency for all the unique height values.
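The steps above (unique values in ascending order, then absolute and relative frequency) can be sketched in Python as follows. This is only a minimal sketch: the list `heights` is hypothetical data and not the contact list from the lecture. It was chosen only to agree with the facts stated here, namely 14 persons, minimum 158, maximum 195 and two persons at 172.

```python
from collections import Counter

# Hypothetical heights in cm (NOT the lecture's table); chosen only to match
# the stated facts: 14 persons, min 158, max 195, two persons at 172.
heights = [158, 160, 165, 165, 170, 172, 172, 175, 178, 180, 183, 187, 190, 195]

n = len(heights)
absolute = Counter(heights)  # value -> number of occurrences

# One row per unique value, in ascending order:
# (height, absolute frequency, relative frequency in percent)
table = [
    (h, absolute[h], round(100 * absolute[h] / n, 2)) for h in sorted(absolute)
]

for height, abs_freq, rel_freq in table:
    print(f"{height} cm  absolute={abs_freq}  relative={rel_freq}%")
```

With this data the row for 158 shows 1 and 7.14 percent, and the row for 172 shows 2 and 14.29 percent, the same values worked out by hand above.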
Now, if you consider these frequency values cumulatively, you come up with two more frequencies: one is called absolute cumulative frequency and the other is called relative cumulative frequency. Let us try to understand their significance.

What is the significance of this absolute cumulative frequency coming out to be 10? It means that in your data there are 10 persons with height less than or equal to 180 centimeters. That is how you come up with the answer 10. And what is the significance of the relative cumulative frequency? It is the cumulative sum of the relative frequency values. If you sum up all these percentage values, you come up with the answer 71.42.

Keep one point in mind when you are preparing this frequency table. The last row of the absolute cumulative frequency column should give the total number of instances, and the last row of the relative cumulative frequency column should be 100 percent. Because of rounding we get a value close to it, but ideally it should be 100 percent. So, this is how you should get the results of the frequency calculation for the single attribute height.

On the same lines, you can change the attribute. Let us say you want to analyze the company attribute. Company can be good or bad. The absolute frequency of company good is 7, which means 7 persons represent good company and the remaining 7 represent bad company, so the relative frequency comes out to be 50 percent for each.

And these are the important observations. The value of the relative cumulative frequency in the last row will always be 100, and the last row of the absolute cumulative frequency will represent the total number of instances.
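The two cumulative frequencies and the last-row checks can be sketched in Python as below. Again, `heights` is hypothetical data and not the lecture's table. It is chosen only so that 14 persons are present and 10 of them have height less than or equal to 180. One design choice in this sketch differs from summing the rounded percentages by hand: the relative cumulative frequency is computed from the exact cumulative counts, so the last row is exactly 100 percent, and 10 of 14 comes out as 71.43 rather than the 71.42 obtained by adding rounded values.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical heights in cm (NOT the lecture's table); chosen only so that
# there are 14 persons and 10 of them have height <= 180.
heights = [158, 160, 165, 165, 170, 172, 172, 175, 178, 180, 183, 187, 190, 195]

n = len(heights)
counts = Counter(heights)
values = sorted(counts)  # unique values in ascending order

abs_freq = [counts[v] for v in values]
abs_cum = list(accumulate(abs_freq))  # running total of the counts

# Computed from exact cumulative counts to avoid accumulated rounding error.
rel_cum = [round(100 * c / n, 2) for c in abs_cum]

for v, c, r in zip(values, abs_cum, rel_cum):
    print(f"<= {v} cm  absolute cumulative={c}  relative cumulative={r}%")

# Last-row checks from the lecture.
assert abs_cum[-1] == n
assert rel_cum[-1] == 100.0
```

If either assertion fails on your own table, a value was probably skipped or counted twice while building the frequency column.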
And finally, once you have these relative frequencies calculated, they can be used for a visualization. If the attribute is discrete, for example an integer data type, the relative frequencies can be used to plot what is called the probability mass function. If the attribute is continuous, you can come up with the probability density function from these frequency values.

This is all for this online class. These are the different references used for this video presentation. Thank you.