Module 12 is about descriptive statistics. We are going to talk in detail about how descriptive statistics help us in data science and the different ways we can use them. The name is self-explanatory: descriptive statistics describe certain facts about data that has already been collected through different means. For example, we might say that this class has a lot of boys. That is a description of your collected data. Or, if we are talking about a GCSE class, we might say that the average age of the boys in this class is 16 years. Measures like these are based on past data, which is factual data, and they describe three major areas.

The first area is the measure of central tendency: what our collected data looks like overall. There are several measures here. The best known is the mean, which we normally call the average. In cricket, for example, we hear averages all the time: a batsman or a bowler has such-and-such an average, one batting average in one-day matches and another in T20. These are all descriptive statistics, and they are all measures of central tendency. There are two other measures, which are normally used less: the median and the mode. So the first is the mean, which is the average, the second is the median, and the third is the mode. The mode is the value that occurs most often in the data, and the median is the middle value when the data is put in order.

Then there is the data spread: across the entire population, how spread out is the data?
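As a minimal sketch of the three measures of central tendency described above, assuming a small made-up list of exam marks (the values and variable names are illustrative and do not come from the lecture), Python's standard `statistics` module can compute each one:

```python
import statistics

# Hypothetical exam marks for seven students (illustrative values only).
marks = [35, 62, 62, 70, 71, 75, 98]

mean_mark = statistics.mean(marks)      # arithmetic average of all marks
median_mark = statistics.median(marks)  # middle value once the marks are sorted
mode_mark = statistics.mode(marks)      # the mark that occurs most often

print(f"mean={mean_mark:.2f}, median={median_mark}, mode={mode_mark}")
```

With this data the three measures give three different answers, which is why it helps to look at more than just the average.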
Again, take students. Some students have very low marks and some have very high marks, but most are in the middle. If you take grades, B or C is perhaps the average, D or F is rare, and similarly A and A plus are rare. That is how our data is normally distributed. The measures we call standard deviation and variance describe this data spread. Similarly there is relative standing: where one value stands in relation to the rest of the data. We will discuss all of these one by one.

Before that, it is important to understand something I have already shared a little with you, but let us look at it with more focus: the characteristics of descriptive statistics. How are they different from inferential statistics, or from other topics such as probability distributions?

The first characteristic is that descriptive statistics describe summarized data. The data is presented in a meaningful way. You can drill down further if you wish, but you do not need more analysis to understand it. From the data you have prepared, you get to know the average grades. Or take a blood report as an example: each test has its own range, and you immediately get to know whether the sugar level, the white blood count or the hemoglobin is in the normal range or not. You do not need to study it further. This is again an example of descriptive data. Because you already know the facts (this range is normal, this is acceptable, this is dangerous, this is low, this is high), you can read all of this directly from data that has already been collected. As we saw in the previous slides, we can present it in different forms: tables, charts, dashboards and graphs. All of these are ways of representing descriptive statistics.

Across these different representations you get the central tendency, the spread of the data and the relationships in the data. Inferential statistics is different: there we forecast, we predict, and we assume a result on the basis of the data, for example that if this happens, then we will do that.

Here are some examples, as we discussed. Take the maths scores of the students. The exam has been held, the examiner has marked the papers and announced the result, so this is a fact. Similarly, how many students got an A grade, how many got a B grade, and so on: all of these are facts about that particular class of students. Out of all the students in the whole university, we looked only at the maths results of the GCSE students. We collected the data in the form of papers, organized it subject-wise, the examiner marked it, and the result was produced on that basis. All of these are descriptive statistics about students.

Next, the data in descriptive statistics is always certain. There is no uncertainty about whether something will happen or not, because the student has already achieved his score. We cannot say that he may pass or fail: if it is a pass, it is a pass, and if it is a fail, it is a fail. So descriptive data is always certain.

At the beginning we saw the measures of central tendency: the mean, the median and the mode. When we discuss variables and other topics, our understanding of these concepts will become clearer.

The data spread we measure with the standard deviation, the variance and the range, and the relationship between variables with correlation and covariance. So there are different measures. If my students' average pass rate is this much, then the variance tells us how far the individual results are spread around that average. Correlation tells us how the results relate to other factors: is the result of the urban population better, or the result of the rural population, or the result of special school students? Or we can look at the students of a virtual university and the students of a physical university and see how their scores relate. That is how our data is correlated. Based on this we can do a lot of fact finding, and the further planning of government institutions and of different companies is based on it.

Then there is relative standing, which is about where a value sits in our overall population. There are different ways to measure it. One is to divide the data into four parts: the first, second, third and fourth quartiles. Depending on the type of data, if you divide it into ten parts you get deciles, and if you divide it into a hundred parts you get percentiles.

So these are some of the things we have discussed quickly about descriptive statistics, or descriptive data, and how we use descriptive data. That was all for this module, and I will see you in the next module.
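The measures of data spread and relative standing covered in this module can be sketched in the same way. This is only an illustration: the marks are made up, the variable names are hypothetical, and it assumes the class is treated as the whole population (hence the population variance and standard deviation) and that quantiles use the "inclusive" method. It needs Python 3.8 or later for `statistics.quantiles`.

```python
import statistics

# Hypothetical exam marks for seven students (illustrative values only).
marks = [35, 62, 62, 70, 71, 75, 98]

# Data spread
data_range = max(marks) - min(marks)    # distance between highest and lowest mark
variance = statistics.pvariance(marks)  # average squared distance from the mean
std_dev = statistics.pstdev(marks)      # square root of the variance

# Relative standing: cut points that divide the sorted data into equal parts.
# Dividing into n parts always gives n - 1 cut points.
q1, q2, q3 = statistics.quantiles(marks, n=4, method="inclusive")        # quartiles
deciles = statistics.quantiles(marks, n=10, method="inclusive")          # 9 cut points
percentiles = statistics.quantiles(marks, n=100, method="inclusive")     # 99 cut points

print(f"range={data_range}, variance={variance:.2f}, std dev={std_dev:.2f}")
print(f"quartile cut points: {q1}, {q2}, {q3}")
```

The second quartile cut point is the median, so relative standing and central tendency meet at that one value.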