So in statistics, we are going to quickly revisit two things. One, we are going to revise measures of central tendency, and secondly, we are going to talk about measures of dispersion. Why am I doing measures of central tendency? Because it is a prerequisite for dispersion. All right, so let us start with a quick revision of measures of central tendency. Now, there are many measures that you would have already studied in your junior classes: mean, median, mode. In fact, within mean also we have arithmetic mean, geometric mean and harmonic mean. We also have something called quartiles, something called deciles, something called percentiles. All those things are there alongside the measures of central tendency, but we are primarily going to focus our attention only on mean, median, and a bit of mode also, because mode is important from the JEE point of view, not so much from your school point of view.
Now, before we talk about the mean, first of all we'll talk about the different types of data that we normally come across. There are two types: our data is either ungrouped, which we normally also call discrete data, or our data could be grouped, which is also called continuous data. So when I'm talking about these three measures of central tendency, I'll also be talking about how they are applied to ungrouped data and how they are applied to grouped data.
Okay, so let us start with the concept of mean. Mean is nothing but a weighted average; you must have studied the mean in your junior classes also. So let us say your data is ungrouped or discrete, and your data is something like x1, x2, x3, all the way till xn, and their frequencies are f1, f2, f3, till fn.
Okay, then we find the mean, which we also represent by the symbol x-bar; many a time you will find that books use the symbol mu for it as well. The mean is found by taking the summation of each data value multiplied by its frequency, which we call summation f_i x_i, divided by the total number of data, summation f_i. That is, x-bar = (Σ f_i x_i) / (Σ f_i). This method of finding the mean is called the direct method, and sometimes it becomes very heavy if your data and the frequencies are themselves very heavy. For example, say you are finding the mean salary of people living in a place like Mumbai, which is overcrowded. Your x_i will be heavy, because salaries there are normally higher owing to the living standards, and since there are many people living in Mumbai, your f_i will be heavy too, so it would take a big amount of computation to find the mean. So the direct method is not preferred unless your data is manageable, that is, light in terms of x_i and f_i. So let me call this one, the direct method, method number one, and we follow something else, which is method number two. In method number two, we find the mean by using the formula x-bar = a + (Σ f_i d_i) / (Σ f_i). Just give me one minute. Sorry, some urgent work.
Yeah, now in this formula, a is our assumed mean. That's why this method is also sometimes called the assumed mean method or the shortcut method. You can take any one of your x_i's as the assumed mean. And d_i is nothing but the deviation, that is, the difference of the data from the assumed mean: d_i = x_i - a. And what do we find here? We find summation f_i d_i by summation f_i. Now guys, let me explain a very, very important thing. The principle behind this is something called shift of origin. What is shift of origin? I know this is not coordinate geometry, but there are terms in statistics that we call shift of origin and change of scale. First let me explain shift of origin. It means that if you add or subtract a particular constant from every data value that you have, the mean will also get added or subtracted by the same constant. So if you have, let's say, x1, x2, x3, etc., and you subtract a, which is a constant, from each of the data, your mean will also get shifted by that same constant a. So once you have found your d_i's and you evaluate summation f_i d_i by summation f_i, you have found nothing but the mean of the shifted data, using the direct method. Your actual mean is then obtained by adding the shift amount, a, back to that answer. So what did I do? First I shifted the data, then I calculated the mean of the shifted data, then I added a back to regain the mean of my original data. Are you getting this point?
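The shift-of-origin argument above can be written out in one line. This is only a short sketch using the lecture's symbols, taking d_i = x_i - a as the definition of the deviation:

```latex
\bar{x} = \frac{\sum f_i x_i}{\sum f_i}
        = \frac{\sum f_i (a + d_i)}{\sum f_i}
        = a\,\frac{\sum f_i}{\sum f_i} + \frac{\sum f_i d_i}{\sum f_i}
        = a + \frac{\sum f_i d_i}{\sum f_i},
\qquad d_i = x_i - a .
```

Nothing here depends on which value is picked for a, which is why any assumed mean gives the same answer.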
Right, so this is the principle on which the formula of the assumed mean method or shortcut method works. Is that fine, guys? Any questions so far? Okay. Now there's another method that we follow, called the step deviation method. In that, we calculate the mean by using the formula x-bar = a + h × (Σ f_i u_i) / (Σ f_i). Here a is again the assumed mean. One of the data could be taken as the assumed mean; however, it is not mandatory, and you can actually take any value to be your assumed mean. And u_i is nothing but d_i by h, that is, u_i = (x_i - a) / h. What is h? h is basically a step size that we choose, and we divide the d_i's by h to bring them to a convenient size. This result is based on the principle of shift of origin and change of scale. Shift of origin I have already explained; let me explain change of scale. See, when you subtracted a, you subtracted a fixed value from each of the data, which means you shifted the origin. And when you divide by h, you are changing the scale. Now remember, your mean is dependent on both shift of origin and change of scale. So when you divide by h, you have to multiply by h back, and when you shift by a, you have to add a back; only then will you get the actual mean. Are you getting this point? This method of finding the mean is called the step deviation method. Is that fine, guys?
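As a rough sketch of how the three methods above agree, the following Python snippet computes the mean by the direct, assumed mean and step deviation methods. The function names, the sample values and frequencies, and the choices a = 25 and h = 10 are assumptions picked for illustration, not something fixed by the methods themselves.

```python
def mean_direct(x, f):
    """Direct method: sum(f_i * x_i) / sum(f_i)."""
    return sum(fi * xi for xi, fi in zip(x, f)) / sum(f)


def mean_assumed(x, f, a):
    """Assumed mean (shortcut) method: a + sum(f_i * d_i) / sum(f_i)."""
    d = [xi - a for xi in x]  # shift of origin: d_i = x_i - a
    return a + sum(fi * di for di, fi in zip(d, f)) / sum(f)


def mean_step_deviation(x, f, a, h):
    """Step deviation method: a + h * sum(f_i * u_i) / sum(f_i)."""
    if h == 0:
        raise ValueError("step size h must be non-zero")
    u = [(xi - a) / h for xi in x]  # shift of origin, then change of scale
    return a + h * sum(fi * ui for ui, fi in zip(u, f)) / sum(f)


# Illustrative values and frequencies (assumed for this sketch).
x = [5, 15, 25, 35, 45]
f = [7, 3, 5, 2, 8]

m_direct = mean_direct(x, f)
m_assumed = mean_assumed(x, f, a=25)
m_step = mean_step_deviation(x, f, a=25, h=10)

# All three should agree up to floating-point rounding.
print(m_direct, m_assumed, m_step)
```

Changing a or h (to any non-zero h) should leave the result unchanged, which is the whole point of adding a back and multiplying h back.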
Right, remember that questions on shift of origin and change of scale have been asked in JEE as well, so be very, very clear about it. How these formulas have come about is very important for us to know. Is that fine? You can apply the same even if the data is continuous: the above three methods are applicable to grouped data as well. It's just that your x_i will now become your class mark, so instead of the data you will use the class mark. Everybody knows the class mark: it is the average of the upper and lower class limits. Grouped data is data that is classified into classes, and each class has an upper limit and a lower limit. Remember, we deal with two types of classes, inclusive and exclusive; all this is known to you from your junior classes. An inclusive class is one where both the upper and the lower limits are included in the class, something like 0 to 4, 5 to 9, 10 to 14. Then there's something called an exclusive class; I don't need to tell you this because you have already done it in class 9 and 10. Say the classes are 0 to 5, 5 to 10 and so on. Here the lower limit is included but the upper limit is excluded: the 5 is excluded from the first class and comes in the next class, so 5 will be included in 5 to 10. Anyway, I'll start with a quick example so that we are fine with finding the mean at least. I don't want anybody to make a mistake in finding the mean.
Let me just write down this question: calculate the mean of the following data. Let's say we have marks and the number of students: 0 to 10, seven people; 10 to 20, three people; 20 to 30, five people; 30 to 40, two people; and 40 to 50, let's say eight people. All of you, please take some time to calculate the mean of this data, and please use the step deviation or the shortcut method for this. Yes, 25.4 is what we get. All right, so let's quickly solve this.
The first step is to find the class mark, which is x_i. It is the average of the upper and the lower class limits, so that's going to be 5, 15, 25, 35, 45; after the first one you just keep adding the class width, 10. Next we find the deviation, d_i. For that I have to assume a mean, so I'll assume 25 to be my assumed mean. Remember, you have no constraint to choose the middle one of the x_i's; you can choose any value as your assumed mean. I can choose 500 as my assumed mean, I can choose minus 200 as my assumed mean. It's my call; I can shift the origin anywhere I want to. But preferably you should choose the central one of the x_i's, because it makes your life easy, and why, you'll come to know shortly. So 5 minus 25 is minus 20; then this will be minus 10, this will be 0, this will be 10 and this will be 20. The next column that we make is the f_i d_i column, which means multiply the f_i column with the d_i column. That will give you minus 140, minus 30, 0, 20 and 160. As you realize, one of the entries becomes zero.
So it really saves a bit of time. Then find summation f_i d_i, which here would be 180 minus 170, that is 10. And we need the total number of data, summation f_i, which is 7 plus 3 plus 5 plus 2 plus 8, so 25. So I will use the assumed mean method, that is, a plus summation f_i d_i by summation f_i, and that is going to be 25 plus 10 divided by 25. That's 25 plus 0.4, so the answer will be 25.4 marks. Do not forget to put the unit; because you are finding the mean of the data, it carries the same unit as the data. Is that fine?
Okay, so if I have to do the same problem by the step deviation method, what we do is find something called u_i. u_i is nothing but d_i by h, and this h could be anything. I can take h here as 10 because I'm dealing with a class width of 10. I can choose it as 1 also, I can choose it as 100 also, it's your call, but do not choose it as 0. So when I divide the d_i's by 10, I get minus 2, minus 1, 0, 1 and 2. What it does is just take off that extra zero from each d_i, which makes your data look lighter. Next we find f_i u_i, which will be minus 14, minus 3, 0, 2 and 16. Then summation f_i u_i will be 1, and your mean will now be a plus h into summation f_i u_i by summation f_i, that is, 25 plus 10 into 1 by 25, which again gives you 25.4 marks. So this is by using the step deviation method.
Okay, now what happens to the mean when data is combined? For the combined data, remember this simple formula.
So say you have two data sets, A and B. Let's say data A has size n1, meaning the number of entries, or what we call the size of the population, and its mean is x1-bar. And there's another data set B whose size is n2 and whose mean is x2-bar. Then, if you combine A and B, the mean of the combined data is given by the expression (n1 x1-bar + n2 x2-bar) / (n1 + n2). Super simple formula, everybody knows this; I'm just recollecting it for you.
So now let's move on to the concept of median. How do we find the median for ungrouped data? We follow two steps. Step number one: arrange the data in ascending or descending order, according to the magnitude of the data. Please note that the data is not arranged from higher to lower frequency or lower to higher frequency; it is arranged with respect to the magnitude of the data involved. Step number two: if n, the number of data, is odd, then your median, which we normally refer to as Q2, is the central data. Now guys, the letter Q has come from the word quartile: the second quartile is actually your median. There's also something called the first quartile and something called the third quartile, Q1 and Q3, which we're not going to study in detail anyway. So the median is the second quartile; basically, the median is nothing but the central data.
So for odd n, your median is the ((n + 1)/2)th data. If n is even, then your median, or Q2, will be the average of the (n/2)th and the (n/2 + 1)th data. Is that fine?
For grouped data, or continuous data, where the data is written in terms of classes, we have to follow a formula. I'll take an example to explain how it works, so let me take the same question over here. The first thing that we do is find something called the cumulative frequency. What is cumulative frequency? It is basically a column which tells you how many people have scored marks less than 10: your answer will be 7. How many people have scored less than 20? 10. How many have scored less than 30? 15. Less than 40? 17. And less than 50? 25. This is called the cumulative frequency table or the cumulative frequency column. So just ask yourself how many people have scored less than the upper class limit of every class; the moment you ask these questions you will automatically get the numbers, and those numbers will be your cumulative frequency column. So basically this 7 is copied as it is, then 7 plus 3 is written here, then 10 plus 5 here, 15 plus 2 here, and 17 plus 8 here; keep adding like this. Remember, the last entry, which we call n, should match the summation of the frequencies, so summation f_i should be equal to n.
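The cumulative frequency column described above is just a running total, which can be sketched in a few lines of Python. The variable names are assumptions for this sketch; the frequencies are the ones from the marks table.

```python
from itertools import accumulate

# Frequencies of the classes 0-10, 10-20, 20-30, 30-40, 40-50.
frequencies = [7, 3, 5, 2, 8]

# Running total: each entry counts the students who scored less than
# the upper limit of the corresponding class.
cumulative = list(accumulate(frequencies))
n = cumulative[-1]  # the last entry is the total number of data

print(cumulative)  # [7, 10, 15, 17, 25]
print(n == sum(frequencies))  # True
```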
Now the next step. Step number 1 was finding the cumulative frequency; step number 2 is to find n by 2, which here is 25 by 2, that is 12.5. Then we see which cumulative frequency is just greater than or equal to 12.5. Here 15 is just greater than 12.5, so we call the class 20 to 30 the median class, because your median will be a value from the interval 20 to 30. And then we use the formula: median = l + ((n/2 - c) / f) × h. Let me tell you what l, c, f and h are. l is the lower limit of the median class, which is 20 in this case. n by 2 I have already told you. c is the cumulative frequency of the class preceding the median class, f is the frequency of the median class, and h is the class width, or class height, of the median class. With this formula I can get my answer very easily: l is 20, n by 2 is 12.5, c is 10, the frequency of the median class f is 5, and the class height h is 10. So it becomes 20 plus 2.5 by 5 into 10, that is, 20 plus 2.5 into 2, which is 20 plus 5, so 25 will be your median. I am not sure how you are getting 21.66; just check your calculation, because my calculation says it is 25. So 25 marks will be your median. Is the idea clear, guys, how we find the median of the data? Now many people ask me where we get this formula from. It is basically nothing but using the formula of an arithmetic progression.
So in an arithmetic progression, the nth term is the first term plus (n - 1) into d. Here the first term is your l, the role of (n - 1) is played by (n/2 - c), and the role of d is played by h by f. Let me explain. I want to find my 12.5th data. My 10th data is 20 and my 15th data is 30, and I want to know what my 12.5th data is; that is your median, which we call Q2. So you are basically looking at how the data is distributed within this class, taking it to be evenly spread. From the 10th to the 15th, the data goes from 20 to 30, so for a difference of 5 in frequency the data changes by 10, which means for every unit increase in frequency there is a jump of 2 in the data. Now how many more should I go to reach 12.5? I should go 2.5 more. For 1 it is 2, so for 2.5 it will be 5. Add 5 to 20 and you get your median, 25, and that is how we get this formula. Is that fine, guys?
So how this formula comes about and works is clear to you. Now, without wasting much time, we will go on to the concept of mode. Again, for ungrouped data, the mode is the data which occurs with maximum frequency. And if, let us say, there are two data which occur with the maximum frequency, so there is a clash between them, then we use something called the empirical formula to break that clash. The empirical formula says mode = 3 median - 2 mean, to be used only when there is such a clash. Is that fine?
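Going back to the grouped-data median, the formula l + ((n/2 - c) / f) × h can be sketched in Python as below, together with the empirical mode relation. The function names are assumptions for this sketch, the code assumes all classes share the same width h, and the inputs 22 and 21 passed to the empirical formula are just illustrative numbers.

```python
from itertools import accumulate


def grouped_median(lower_limits, frequencies, h):
    """Median of grouped data: l + ((n/2 - c) / f) * h, equal class width h assumed."""
    cumulative = list(accumulate(frequencies))
    half = cumulative[-1] / 2
    # Median class: the first class whose cumulative frequency is >= n/2.
    k = next(i for i, cf in enumerate(cumulative) if cf >= half)
    l = lower_limits[k]                    # lower limit of the median class
    c = cumulative[k - 1] if k > 0 else 0  # cumulative frequency before it
    f = frequencies[k]                     # frequency of the median class
    return l + (half - c) / f * h


def empirical_mode(mean, median):
    """Empirical (approximate) relation: mode = 3 * median - 2 * mean."""
    return 3 * median - 2 * mean


median = grouped_median([0, 10, 20, 30, 40], [7, 3, 5, 2, 8], h=10)
print(median)  # 25.0

# Illustrative inputs only: median 22, mean 21.
print(empirical_mode(mean=21, median=22))  # 24
```

The interpolation step (half - c) / f * h is exactly the "for 1 it is 2, so for 2.5 it will be 5" reasoning, with h / f playing the role of the common difference.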
Empirical formula means it is not an exact formula; it is a rough estimate, just like the empirical formulas we use in chemistry. But if you have grouped data, then we normally use the formula mode = l + ((f1 - f0) / (2 f1 - f0 - f2)) × h. Let me explain the meaning of these symbols with an example, so let us move on to the next page. Let us say this is your data, the same table, and let me introduce one more class: 50 to 60, with a frequency of, let us say, 5. Here l is the lower limit of the modal class, and the modal class is the class which has the highest frequency. f1 is the frequency of the modal class, f0 is the frequency of the class preceding the modal class, f2 is the frequency of the class succeeding the modal class, and h is the class height or class width. So here the modal class is 40 to 50, and my mode will be l, which is 40, plus (8 - 2) by (16 - 2 - 5) into 10. That becomes 40 plus 6 by 9 into 10; 6 by 9 is two-thirds, so it is 40 plus 6.67, which is approximately 46.67. Is that fine?
Now, the mode can also be found using a histogram. Let me take a very small dummy example. Let us say this axis is your frequency and this one is your data. If you represent the data in the form of a histogram, you can find the mode by connecting the top corners of the tallest bar crosswise to the top corners of the bars next to it, and wherever those two lines intersect, just extend this down, and
this will give you the position of the mode. Yeah, yeah, I changed the value actually. Now, mode is not that important, because we are not going to use it, but it's just for your information. Remember, all the central tendencies are dependent on shift of origin and change of scale. Right, so next we are going to start with the most important concept, the one that is actually important for your CBSE curriculum, which is called measures of dispersion. But before that I would like to take some questions.
All right, so this question came in AIEEE 2007, and I would like you to quickly answer it. This is the type of question that you can expect in your JEE exam or other regional entrance exams; unlike school exams, they don't normally give you a data-based question. All right, I have just got a couple of responses; what about the others? Gargi? It is a simple question, based on combining data. The average marks of the boys in a class is 52 and that of the girls is 42, and the average marks of boys and girls combined is 50, so what is the percentage of boys in the class? Let's say the number of boys is x and the number of girls is y. The mean of the combined data would be (52x + 42y) / (x + y), and this is given to you as 50. This implies 52x + 42y = 50x + 50y, which means 2x = 8y, that is, x = 4y. So the number of boys is 4 times the number of girls, which means the boys make up 4 out of every 5 students, or 80%.
Again, another simple question, which came in AIEEE 2005. Yes, yes, you will find a lot of concepts matching with this in physics and chemistry as well. 24, of course. You can use your formula, mode is equal to
3 median minus 2 mean. Here the median is 22 and the mean is 21, so it's going to be 66 minus 42, which is 24. So option D is correct.
The next question came in JEE Main 2015. You can see how easy the questions in this topic have been, so it is a very scoring topic; you should never miss out on any question asked on statistics in these exams. The question says: the mean of data comprising 16 observations is 16. If one of the observations, valued 16, is deleted and three new observations valued 3, 4 and 5 are added to the data, then the mean of the resultant data will be? All right, again a very simple question. 16 observations whose mean is 16 will sum up to 16 into 16. You subtracted 16 and added 3, 4 and 5, so your new summation is 16 into 16 minus 16, which is 16 into 15, plus 3 plus 4 plus 5. You deleted one and added three, so you will have 18 data in total. So the mean is 240 plus 12 by 18, that's 252 by 18, which is 14. So option D is correct in this case.
All right, so now we'll talk about measures of dispersion. Measures of dispersion is a very important concept in statistics which helps you know how uniform the data is, or how scattered it is. Normally we use these four parameters for measuring the dispersion of the data. One is called the range, which is nothing but the difference between the largest and the smallest of the data, let's say L minus S. Then we use something called the quartile deviation, which is half of Q3 minus Q1, that is, (Q3 - Q1) / 2. Now you will ask what Q3 and Q1 are. Q3 is
basically nothing but the upper quartile, that is, the data present at the three-fourth position of the entire sample, just like the median is the one present at the center of the data. Similarly, Q1 is the lower quartile, the data at the one-fourth position. This is just for your awareness, because in JEE they can ask you a question like "which of the following is not a measure of dispersion", so you should know which of them are measures of dispersion. There are other related things as well which I'll just point out here: the interquartile range, which is Q3 minus Q1, and the coefficient of quartile deviation, which is (Q3 - Q1) / (Q3 + Q1). And then we have the third type of measure of dispersion, which we call the mean deviation, and the fourth one, which we call the standard deviation. Out of these four measures of dispersion, the last two are important for us, because they are asked in your school exams and very commonly in the JEE Main exam as well.
Okay, so before I start with mean deviation and standard deviation, guys, you may take a small break of, let's say, five minutes. Right now it's 11:16 by my watch; we'll resume at, let's say, 11:23 a.m.
All right guys, welcome back. We'll start with the mean deviation first: the concept of mean deviation from any value a. Now what is the meaning of mean deviation? It means the average of the absolute deviations of the data from x equal to a. So you would be given some value x equal to a, and you have to find the average of the absolute deviations. When I say
absolute deviation, it means the mod of the difference. You don't have to take the algebraic deviation; you have to take the mod of the difference of each data value from this value x equal to a. So this tells you how the data is scattered about the position x equal to a, and this a could be anything: it could be your mean, it could be your median, it could be your mode also. When you are finding the absolute deviations of the data from the mean, that is what we call the mean deviation from the mean. If you are finding them from the median, we call it the mean deviation from the median. Similarly, you could also find the mean deviation from the mode, but we will mainly be focusing on the first two; those are of importance to us.
People ask me, why do we find the deviation at all? Weren't mean, median and mode sufficient? What is the use of the measures of dispersion? See guys, many decisions cannot be taken just on the basis of the mean and median, or for that matter the mode. I will tell you a very simple case study to understand the importance of the mean deviation, or for that matter measures of dispersion. Let's say I am a businessman who wants to set up a car factory. To set up a car factory, of course, I have to buy heavy machines, I have to hire a lot of labourers, and I have to rent or lease a big amount of land to start manufacturing the cars. Now, when I am planning to set up a car factory, I must do some market research, where I try to identify which type of cars will sell the most in that area, depending upon the economic status of the area. So let's say I hire a
statistician for this, and I ask him to do a survey for me to find the mean salary of the people living in that society. Let's say after one month of hard work he comes and tells me that the mean salary of the people living there is 40,000 rupees. So I start making cars which are affordable for a person earning a salary of 40,000 rupees. Let's say I start making Nano cars, just like Tata did. Now, after I started production, I realized that even after one year there was no sale happening. Everything that could be done right, I did: I did the advertisement for it, my car showrooms were at accessible locations, but nobody bought my Nano car. What possibly went wrong? When I actually went into the marketplace, I realized that the people there were earning either very little, let's say less than 5,000 rupees per month, or very high, let's say 1.5 lakh or 2 lakh per month. Now, the person who's earning 5,000 cannot afford a Tata Nano, because Tata Nano cars are priced at around 1.5 to 2 lakhs. He's still in his bicycle mode; he needs a bicycle, not a Nano. Whereas those who are earning a very high salary have their BMWs, they have their MUX, right? Why would they buy a Tata Nano? So neither of these parties needs my car, and there's no sale. So the mistake I made in this decision was that I based it on the mean salary alone; I did not bother to check how scattered my data was. If this person had come and told me that the mean salary is 40k, but you could expect a deviation from this salary of, let's say, plus or minus, you
know, 35,000 rupees, or any such figure, I would have based my decision on that rather than just on the mean. Such strategic errors can lead to a lot of losses. See, I set up the entire car factory, I hired so many people, and everything got wasted because none of my cars got sold. That is why dispersion is so important in statistics, and in the data surveys and market research that we do.

So with this small story, I'll start with the first concept, mean deviation. Let's talk about mean deviation from the mean. There are a lot of means in this expression: mean deviation from the mean means the average deviation from the mean position. We find it using a simple formula: the summation of the absolute deviation of each data value from the mean, weighted by its frequency, by summation fi. Basically, you're finding the average of the deviations. Do you remember, it's very much like the direct method that we used for finding the mean. So this is your deviation, this is your mod di, the deviation from the mean, and you're finding summation fi into mod di by summation fi, that is, Σ fi |xi - x̄| / Σ fi. Why a mod? Because we have to take the absolute deviation, not the algebraic deviation. By the way, guys, please note this: if you do Σ fi (xi - x̄) / Σ fi, that means if you take the algebraic sum of the deviations from the mean position, this will always come out to be 0. That's why we don't take the algebraic deviation; we take the absolute deviation.

Similarly, we can find the mean deviation from the median. Here the formula remains mostly the same: the mean deviation from the median is nothing
but summation fi into mod of xi minus the median, by summation fi; let me write a mod over here. So this gives you the mean deviation from the median.

Let's take a small example to understand this. Let's say for this question I ask you to calculate the mean deviation of the data from the median. Remember, we had already found the median for this data; it was 25 marks, so please use this result. First of all you'll write your xi, that's going to be 5, 15, 25, 35, 45. Then you'll find the absolute deviation from the median, |xi - Q2|: that's going to be 20, 10, 0, 10, 20. Then we find fi into |xi - Q2|, so multiply this column with this column: that's going to be 140, then 30, then 0, then 20, then 160. Sum this up: Σ fi |xi - Q2| will be 350. So your answer, the mean deviation about the median, would be 350 divided by the total number of data, which here is 25. So the answer is, yes, 14. So 14 marks will be your mean deviation from the median, given that the median is 25 marks. That's actually a very high deviation. More the deviation, less uniform the data; less the deviation, more uniform the data. If the deviation is more, the amount of scatteredness in the data is more; if the deviation is less, the scatteredness is less. Please remember this.

Now there's something called the standard deviation. It was realized that there were some bottlenecks in the calculation of mean deviation: the mean deviation about the mean and about the median are normally not equal, and this resulted in a lot of confusion. So a consortium of mathematicians decided, let's follow a standard deviation
rather than following a mean deviation. They said that in order to calculate the standard deviation, which we normally represent by the symbol sigma, we'll follow a new approach to find out how scattered the data is. Instead of using mod of xi minus x bar, the absolute deviation from the mean position, they took (xi - x̄)² into the frequency, summed it, divided by the total number of data, and took the under root of it: σ = √( Σ fi (xi - x̄)² / Σ fi ). This very much resembles the root mean square velocity that you must have learned in the kinetic theory of gases. It plays mostly the same role, to find out the scatteredness, so you can think of it as a new measure of scatteredness. Just as for measuring temperature you may use Celsius, Fahrenheit or Kelvin, this is just another calibration scale, you can say, to measure the scatteredness of the data. Please note that the result of this will not match the mean deviation. Normally we say the mean deviation about the mean is roughly four-fifths of the standard deviation; this is just an empirical rule of thumb that you should know. The mean deviation calculated from the mean will never be greater than the standard deviation. Now don't ask me which is more accurate and which is less accurate, because each has its own way of calculation. We cannot say Celsius is more accurate than Fahrenheit, or Fahrenheit more accurate than Kelvin; they have their own way of measuring temperature, that's it.

Now let us try to find out the standard deviation for the data which we discussed a little while ago. We already know the mean, so we don't have to break our heads finding it; I think the mean of this data was 25.4 marks. So again, write xi, then you have to find (xi - x̄)², then you have to find fi into xi minus
x bar whole square. xi again is 5, 15, 25, 35, 45. The difference from the mean for the first one is 20.4, and 20.4 square is 416.16. Then we have 25.4 minus 15, the whole square, that's 108.16. Then you have 0.4 the whole square, which is 0.16. Then we have 9.6 the whole square, which is 92.16. And then we have 45 minus 25.4, the whole square, which is 384.16. Then you have to multiply this with the frequency: 416.16 into 7 gives 2913.12; 108.16 multiplied by 3 gives 324.48; 0.16 multiplied by 5 is 0.80; 92.16 multiplied by 2 is 184.32; and finally 384.16 multiplied by 8 is 3073.28. Now we have to find the sum, so let's add them: 2913.12 plus 324.48 plus 0.8 plus 184.32 plus 3073.28, that gives me 6496. So your standard deviation is the under root of 6496 by 25, that's the under root of 259.84, which is approximately 16.12 marks. Is that fine?

Now you realize that the process is too cumbersome: first we have to find the difference of each data value or class mark from the mean position, square it, multiply by the frequency, sum this up, divide by the total number of data, and take the under root. So we'll come up with another formula which will relieve us of these cumbersome calculations. But before I move on, there is something very important. This is the standard deviation; there is also something called variance, which we write as the square of the standard deviation, σ². Please remember this. Now I'll tell you the use of variance. Variance helps you find the standard deviation of the combination of two data sets. And you realize that it's the variance that we
calculate first: this part, the quantity under the root, is actually the variance. Many a time the variance is sufficient to know how scattered the data is, and we don't have to do the extra operation of taking the root. Please note that every operation comes with a time cost: if you do an additional operation on the data, the computer takes that much more time. So if taking the under root is not required, and you can assess the data by just looking at the variance, please do not waste your time and the computer's time on the under root. For people who will be pursuing economics, or some course in statistics, the use of variance is in saving that additional step when you find the scatteredness of the data.

Remember one very important thing: both standard deviation and variance are independent of the shift of origin. That means if you add or subtract a constant to each data value, your mean will change, but your standard deviation and variance don't change. This is very important. Second, they still depend on the change of scale: if you multiply or divide the data by a constant, the standard deviation gets multiplied or divided by the absolute value of that constant, and the variance by its square. Direct questions on this have been asked in JEE and other regional entrance exams previously, so you should know this. There is a question that was asked on this a couple of years back; let me bring it out. It was asked in AIEEE 2006. Suppose a population A has 100 observations 101, 102, all the way till 200, and another population B has 100 observations 151, 152,
all the way till 250. If V_A and V_B represent the variances of the two populations respectively, then find the ratio V_A : V_B. Can anybody tell me the answer for this, except for 3D? Exactly, Akash, it will be one. The ratio V_A by V_B in this case is simply one, because the variances are equal. Remember, this data is just a shifting of that data; this is called a shift of origin. If there's a shift of origin, the mean will change, the mode will change, the median will change, whatever, but the deviation of the data will remain the same. It's like you had some set of points and you shifted them to some other location: the mean has changed, but the scatteredness of the data remains the same. So your answer is one in this case.

Let's do this question, again from AIEEE, 2008. Please type your response in the chat box. Okay, Akash says option D; what about others? That's a simple question to answer. The mean is given to be 6 and the variance is 6.8; which are the possible values of a and b? So what is your variance? Variance is nothing but the summation of the squared deviations of the data from the mean position, by the total number of data. So (6 - a)² plus (6 - b)², then (8 - 6)² which is 2², then (5 - 6)² which is 1², and then (10 - 6)² which is 4², all divided by 5, is 6.8. And of course we have (a + b + 23) by 5 equal to 6, which means a + b is equal to 7. In fact, in all of the options the sum is 7, so this is of no use to me; I have to figure it out from the variance. So (6 - a)² + (6 - b)², if I'm right, is going to be 34 minus 21, that's 13, not 23, sorry. So of course 0 and 7 doesn't satisfy it. 5 and 2 also doesn't satisfy
it, 1 and 6 also doesn't satisfy it. Yes, 3 and 4 satisfies it, because 3 square plus 2 square is 13. So option D is correct in this case.

All right, let's take this question, with a small correction: read this value as 255. The mean deviation of the numbers 1, 1 + d, 1 + 2d, all the way till 1 + 100d from their mean is 255; then the value of d is equal to? Please solve this and let me know your response in the chat box. Remember, it's a mean deviation question, not a standard deviation question, so be careful. Okay, 10.1, fine. So let's solve this. Your mean deviation from the mean is nothing but Σ fi |xi - x̄| / Σ fi. Since each data value is present only once, your fi's are 1 each, so this is just Σ |xi - x̄| by the total number of data, and the total number of data here is 101, so n is 101. What is the mean? Since it's an arithmetic progression, the mean is the first plus the last term by 2, that's 1 + 50d. Now if you take the difference of 1 from 1 + 50d, you get 50d. If you take the difference of 1 + d from 1 + 50d, you get 49d. This continues till you reach d, then there is a 0, and then it goes d, 2d, all the way till 50d again. This total divided by 101 is given to us as 255. If you take d common, it is twice of 1 + 2 + 3 all the way till 50, divided by 101, equal to 255. So this is d into 2 into n(n + 1)/2 with n = 50, and the 2 and 2 get cancelled, so 50 × 51 × d divided by 101 is equal to 255. The 51 cancels with 255 to leave 5, and that 5 cancels with 50 to leave 10. So this implies d is 101 by 10, that's 10.1. Please note it has the same unit as the data; this unit will be the same
as the data. Mean, mean deviation, standard deviation: all of these quantities have the same unit as the data itself, so do not forget to mention the unit in the exams.

A little while ago I figured out that the standard deviation formula was very cumbersome to apply. This was the formula we were working with earlier, so now we'll try to simplify it. We'll spend some time on the simplification of the standard deviation formula. Let's say I start with this formula, call it formula one: σ² = Σ fi (xi - x̄)² / Σ fi. Let me open the bracket: I can write it as Σ fi (xi² - 2 x̄ xi + x̄²) divided by Σ fi. Expanding the numerator, it becomes Σ fi xi² - 2 x̄ Σ fi xi + x̄² Σ fi, and let me divide each term individually by the denominator Σ fi. Now here you would realize that in the last term Σ fi and Σ fi get cancelled, and in the middle term Σ fi xi / Σ fi is nothing but the direct method of finding the mean, so it can be written as x̄ again. So it becomes Σ fi xi² / Σ fi - 2 x̄ · x̄ + x̄², that's nothing but Σ fi xi² / Σ fi - x̄². So here comes another formula that we should all know: σ² = Σ fi xi² / Σ fi - x̄², and the standard deviation is the under root of this. Please note that the squaring is only on xi, not on fi. So this is your second formula for finding the standard deviation; the earlier one was your first formula. Now, the advantage of this formula is that you can deal with xi squares directly.
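As a quick numerical check of the identity just derived, here is a minimal Python sketch that computes the variance of the marks data from the earlier worked example (class marks 5, 15, 25, 35, 45 with frequencies 7, 3, 5, 2, 8) using both the first and the second formula. The function names are hypothetical, chosen only for this illustration.

```python
from math import sqrt

# Marks data from the worked example: class marks and their frequencies.
marks = [5, 15, 25, 35, 45]
freqs = [7, 3, 5, 2, 8]

def weighted_mean(xs, fs):
    # Direct method: sum(f * x) / sum(f)
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)

def variance_formula_one(xs, fs):
    # First formula: sum(f * (x - mean)^2) / sum(f)
    m = weighted_mean(xs, fs)
    return sum(f * (x - m) ** 2 for x, f in zip(xs, fs)) / sum(fs)

def variance_formula_two(xs, fs):
    # Second formula: sum(f * x^2) / sum(f) - mean^2
    m = weighted_mean(xs, fs)
    return sum(f * x * x for x, f in zip(xs, fs)) / sum(fs) - m ** 2

v1 = variance_formula_one(marks, freqs)
v2 = variance_formula_two(marks, freqs)
print("variance, formula one:", round(v1, 2))
print("variance, formula two:", round(v2, 2))
print("standard deviation   :", round(sqrt(v2), 2))
```

Both functions should agree up to floating-point rounding, and the square root should match the 16.12 marks obtained by hand.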
You don't have to compute (xi - x̄)²; that is not required. So this is a little faster than the previous formula. Both will give you the same result, but this will give it to you a little faster. Please make a note of this.

Now let me give you a simple question based on this formula. This came in JEE Main 2014: find the variance of the first 50 even natural numbers. The even natural numbers are 2, 4, 6, 8, all the way till 100; these are your first 50 even natural numbers. Here you would realize that the other formula which we discussed, Σ fi xi² / Σ fi minus mean square, would be more useful. fi is 1 because each data value occurs once, so this becomes just Σ xi² by the total number of data, minus the mean square. Now what is the mean in this case? It is the first plus the last term divided by two (sorry, divided by two, not by the total number of terms), so that's simply 51. Now Σ xi², let me write it in a crude form, is 2² + 4² + 6² all the way till 100², by the total number of data, 50, minus 51². If I take a 4 common from the numerator, it's 4 into (1² + 2² + ... + 50²) by 50, minus 51². We all know the sum of the squares of natural numbers from 1 to n: that's n(n + 1)(2n + 1)/6. So I can use that formula with n = 50: 4 × 50 × 51 × 101 by 6, and of course we have a 50 down, minus 51². So 50 and 50 will go off, and 4 by 6 becomes 2 by 3. Let me not cancel the 3 with 51, because I will lose a 51. So it's two-thirds of 51 into 101, minus 51². I can take 51 common: 51 into (202 by 3 minus 51), that's 51 into (202 - 153) by 3, that's 51 into 49 by 3. So 51 into 49, that's
going to be, in fact, I can cancel the 3 with 51 and get 17; why am I multiplying this out? Yes, 17 into 49 we can do: 49 into 17 is 833 as your answer. So which of the options is correct? Option B will be correct in this case. So again, there's a utility for the formula that we have just derived: you don't have to sit and calculate the difference of each data value from the mean and square it; you can directly square the data itself and use this formula.

Let's take this question, which came in AIEEE 2003. In an experiment with 15 observations, the following results were obtained: Σ x² was 2830 and Σ x was 170. One observation, 20, was found to be wrong and was replaced with the correct value 30. Find the corrected variance of the data. Okay, 3D gives the response as option A; what about others? All right, almost everybody is giving the response as 78. For the corrected variance I need this formula: Σ x'² by n, minus the mean square, where the mean is Σ x' by n and x' denotes the corrected values. Since 20 was found to be wrong and was replaced with 30, the new Σ x² will be the old one minus 20² plus 30², that's 2830 plus 500, which is 3330. And your new Σ x will be 170 minus 20 plus 30, which is 180. So using the formula, my corrected variance will be 3330 by the total number of data, which is 15, minus (180 by 15) the whole square. So this is 222 minus 12², and 12² is 144, so your answer will be 78, which means option A is correct.

Let's take another one, which came in AIEEE 2005. Okay, I've noted all your
responses. I want at least two more people to answer this before I start discussing it. Okay, Akar, Shreya. All right, so mostly people are saying 18. Let's solve this. The question is: let x1, x2, till xn be n observations such that Σ xi² is 400 and Σ xi is 80; then a possible value of n among the following is? Now, if you calculate your standard deviation, you use the formula: under root of Σ fi xi² / Σ fi minus the mean square, where the mean, by the direct method, is Σ fi xi / Σ fi. If your fi's are all 1, and the total number of data is n, you can write this as the under root of Σ xi² / n - (Σ xi / n)². Now of course, if you're under rooting a quantity, this quantity cannot be negative; then only will the root be real. That implies Σ xi² / n should be greater than or equal to (Σ xi / n)². Basically it's saying that the mean of the squares should be greater than or equal to the square of the mean. So if you use this, Σ xi² is 400 and the total number of data is n, so 400 by n should be greater than or equal to 80 × 80 by n × n. Since n is positive we can cancel it off, which means 5n should be greater than or equal to 80, so n should be greater than or equal to 16. So whichever option has n greater than or equal to 16 would be your answer. Only one of the options, 18, satisfies that; the rest are all less than
16, so option B will be correct.

Now guys, so far we have discussed two formulas for finding the standard deviation. One was Σ fi (xi - x̄)² / Σ fi, and we simplified it to get the second formula, Σ fi xi² / Σ fi minus the mean square, each taken under the root. In fact, there is further simplification that you can do to formula number two, because this formula itself is not that fast: dealing with squares of huge data is time consuming, and secondly you still have to sit and find the mean. So please listen to the simplification that we can do on this. We all recall from our shortcut method of finding the mean that we used to have a term di, which was xi - A, and we used to find the mean using the formula A + Σ fi di / Σ fi. Now I'm going to use this in the second formula. How? Let's check. What I'll do is substitute xi with A + di, and x̄ with A + Σ fi di / Σ fi. The second formula says the variance σ² is Σ fi xi² / Σ fi minus the mean square. Instead of xi I write A + di, so it becomes Σ fi (A + di)² / Σ fi minus (A + Σ fi di / Σ fi)². Let me simplify this for you. In the first bracket I can write A² + 2A di + di², the whole by Σ fi, and the second one I can expand using the formula for (a + b)². If you simplify, from the first part you get A² Σ fi / Σ fi, then 2A Σ fi di / Σ fi, then Σ fi di² / Σ fi, and you subtract A² + 2A Σ fi di / Σ fi + (Σ fi di / Σ fi)². So fi and fi will
get cancelled, A² and A² will get cancelled, and the 2A term will get cancelled with the other 2A term, leaving behind σ² = Σ fi di² / Σ fi - (Σ fi di / Σ fi)² as your variance. And this is the third formula I'm going to give you for finding the standard deviation: the under root of Σ fi di² / Σ fi - (Σ fi di / Σ fi)²; remember, in the first term the square is only on di. Please note that this formula is much faster than the previous one. The first advantage is that there's no need to calculate the mean, and that saves you half the time. Secondly, you're dealing with di's, which are much smaller than your xi's, so the squaring will be much faster. I call this the shortcut formula for calculating standard deviation and variance. Use this formula whenever you get questions in your school exams as well.

So guys, without wasting much time, we'll move on to the fourth formula, a still further simplification. I'll start with the third formula which we just discussed: the standard deviation is the under root of Σ fi di² / Σ fi - (Σ fi di / Σ fi)². Now many a time you'd realize that these di's have a lot of zeros, that is, a common factor. So we use the step deviation approach, which we used for the mean, to write ui. Remember, in step deviation you divided di by h, a suitable step size. We'll use the same approach here: we replace di with h into ui in this formula. As a result, under the root, Σ fi di² becomes Σ fi h² ui², so we get Σ fi h² ui² by Σ fi, minus Σ fi
h into ui by Σ fi, the whole square. You can pull the h² out from both terms, which gives you yet another formula that we use for finding the standard deviation: σ = h × under root of [Σ fi ui² / Σ fi - (Σ fi ui / Σ fi)²]. So we realize the structure is more or less the same; just minor modifications happen in each of these formulas. Altogether we have four formulas for standard deviation. If you take my suggestion, I prefer the third and the fourth. Don't use the first and the second, because they are time consuming unless your data is very simple.

So guys, moving on to the last leg of the chapter, the coefficient of variation. I'll not take more than five minutes, so just bear with me. What is the coefficient of variation? It is nothing but the standard deviation expressed as a percentage of the mean: if you take your standard deviation, divide it by the mean and multiply by 100, you get the coefficient of variation. How is it helpful? It helps in comparing the scatteredness of multiple data sets. Let's say I conduct a test in NPS HSR, NPS Koramangala, NPS Rajajinagar and NPS Yeshwanthpur, each out of different marks: in Yeshwanthpur out of 100 marks, in NPS HSR out of 50 marks, in NPS Rajajinagar out of 20 marks, and in NPS Koramangala out of 500 marks. Now if I want to know whose performance is more uniform or more non-uniform, I would have to calculate their respective means and standard deviations. And how do I compare their standard deviations? By calculating them as a percentage of the mean. So higher
is the coefficient of variation, more is the scatteredness. If the coefficient of variation is more, scatteredness is more, or you can say uniformity is less; if the coefficient of variation is less, uniformity is more. So you would be given two data sets and asked in your school exam which one is more uniform. In that case you have to find the standard deviation and the mean for both, and calculate their respective coefficients of variation; whichever is less, that data will be more uniform. Is that fine?

I will give you a simple question that you can do as homework and submit to me on the group itself by taking a snapshot of your working. Let us say there are two manufacturers, A and B, which manufacture polythene bags. The pressure that a polythene bag can take, in kgs, falls in the classes 0 to 10, 10 to 20, 20 to 30, 30 to 40 and 40 to 50, and bags from both manufacturers were tested for the pressure they can take. For manufacturer A, it was found that 10 polythene bags could take a pressure of 0 to 10 (let me write it in white), 15 could take 10 to 20, 18 could take 20 to 30, 7 could take 30 to 40, and around 2 could take 40 to 50. For manufacturer B, 9 could take a pressure of 0 to 10, 18 could take 10 to 20, 16 could take 20 to 30, 11 could take 30 to 40, and 3 could take 40 to 50. Now the question is: which manufacturer produces polythene bags that withstand more uniform pressure? So guys, please send this answer to me on the group. Remember, these are your frequency tables for both of them; for both of them
you have to find the standard deviation and the mean, and then calculate their respective CVs; let me call them CV1 and CV2. Whoever's CV is lesser, that will be more uniform. Is that fine? So guys, thank you so much. Over and out from Centre Mechanic. In the next class we'll be taking up mathematical reasoning and 3D geometry with you. Okay, thank you so much for coming online on a holiday. Bye-bye, over and out from my side.
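For checking the homework working, the procedure described above can be sketched in Python. This is only an illustrative sketch under stated assumptions: each class is represented by its midpoint (5, 15, 25, 35, 45), the assumed mean A = 25 and step size h = 10 are arbitrary choices for the step deviation (fourth) formula, and the function name is hypothetical.

```python
from math import sqrt

# Class marks (midpoints) of the pressure classes 0-10, 10-20, ..., 40-50 kg.
class_marks = [5, 15, 25, 35, 45]
freq_a = [10, 15, 18, 7, 2]   # manufacturer A, from the homework table
freq_b = [9, 18, 16, 11, 3]   # manufacturer B, from the homework table

def coefficient_of_variation(xs, fs, assumed_mean=25, h=10):
    """CV in percent, using step deviations u = (x - A) / h."""
    n = sum(fs)
    us = [(x - assumed_mean) / h for x in xs]
    mean_u = sum(f * u for u, f in zip(us, fs)) / n
    mean_u_sq = sum(f * u * u for u, f in zip(us, fs)) / n
    mean = assumed_mean + h * mean_u            # step deviation mean
    sd = h * sqrt(mean_u_sq - mean_u ** 2)      # fourth formula for sigma
    return 100 * sd / mean

cv_a = coefficient_of_variation(class_marks, freq_a)
cv_b = coefficient_of_variation(class_marks, freq_b)
print("CV of A (%):", round(cv_a, 2))
print("CV of B (%):", round(cv_b, 2))
print("More uniform:", "A" if cv_a < cv_b else "B")
```

Changing `assumed_mean` or `h` should not change the result, since the standard deviation does not depend on the shift of origin and the factor h is restored outside the root.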