Statistics. Statistics has been part of your maths curriculum since your class 9 days, when you learned the basics. In class 10 you learned the measures of central tendency, right? What are the measures of central tendency? Mean, median and mode. Let me write it down. Mean, written as x̄. Median, written as Q2, Q for quartile; the second quartile is actually your median. The median is the central data when the data has been arranged in ascending or descending order; we'll talk about all those things. And mode: the mode is the data that occurs most frequently, that is, the data with the highest frequency is called the mode of that frequency distribution.

Now, what are we going to learn in class 11? First let me give you a brief overview. Of course, we need to remember the measures of central tendency, particularly mean and median. Mode is not required from your school point of view. You should be aware of the formula for mode that you learned in class 10, but we are not going to use it for class 11 statistics.

So in class 11, the overview of statistics is as follows. We have to study what we call the measures of dispersion. In fact, for JEE as well, we focus only on measures of dispersion. Number one is the mean deviation, and we're going to talk about two types of mean deviation: one about the mean (or from the mean, whatever you want to call it) and the other from the median. Now you understand why we need the revision of mean and
median, because to find the mean deviation, which is one of the measures of dispersion, we need to know how to find the mean and how to find the median. So I will just do a quick recap with you; not much of our time should go into that.

The second thing we are going to learn is another measure of dispersion, called standard deviation. So these are two ways of measuring the dispersion of data. We'll also be talking about variance; standard deviation and variance are related, so I'm writing them under one subtopic. The third thing we are going to learn is the coefficient of variation. So these are the three concepts we are going to learn in this chapter.

All right, first a quick revision of finding the mean and the median. As you all know, whenever we are given data, it comes in one of two forms. One is discrete data, which many people call ungrouped data, although you can have discrete grouped data as well. What I mean is that for discrete data there is no class involved. The other is continuous data, where you require classes to represent the data most efficiently. So we'll learn how to find the mean and median for both types of data, discrete and continuous.

Let's get started with a quick revision of the mean. I will not take much of your time; I will take this up through certain slides. First we'll talk about the mean where the data is discrete and ungrouped. Discrete data means single, individual values. Ungrouped means the
frequency is going to be one for each data, so every data occurs only once.

So what is the mean? The mean is nothing but a weighted average. Why do I call it a weighted average, and not just an average? Because the number of times a data occurs, that is, its frequency, attaches a weight to that data. If a data occurs many more times than the other data, it has more weightage, so your mean will be more inclined towards that particular data. Here is an example where the weight assigned to every data is one, because the frequency is one. For such cases you just add all the data and divide by the total number of data; that gives you the mean. Simple, as you all already know from your class 9 and 10 days. For example, if you have 1, 3, 4, 2 and 5, the mean is the sum of all of them divided by 5. The sum is 15, and 15 divided by 5 is 3. Everybody knows this. Any doubts, any concerns, please do let me know.

Now we'll talk about discrete and grouped data. Here on your screen there is a data x_i, and the data occurs as per this frequency: 1 occurs three times, 3 occurs two times, 4 occurs three times, 2 occurs four times, 5 occurs five times. So what is the mean of this data? It is a case of discrete grouped data: discrete because these are single, individual values; there is no continuity in the data and no class involved. In such cases we use a simple formula: summation x_i f_i by summation
f_i. So basically (1×3 + 3×2 + 4×3 + 2×4 + 5×5) divided by the total number of data. This is your Σ x_i f_i, which is 54, and this is your Σ f_i, which is 17, so the mean comes out to be about 3.17. You don't have to worry about the values here; the concept is what is important. This method of finding the mean is called the direct method.

However, you also know two more methods, which we call the shortcut method and the step deviation method. We'll revise them too. Let me take the very same data over here: x_i is 1, 3, 4, 2, 5 and the frequencies f_i are 3, 2, 3, 4, 5. What we did earlier was the direct method. But that method becomes cumbersome and calculation-intensive if your x_i is very large and your f_i is also very large, because the product f_i x_i will be considerably large. Whether you do those calculations manually or on high-end machines, the time order increases. So it's not only about calculating the mean, it's about calculating it fast. Time order becomes very important. Tomorrow, when you become a coder or a developer, you will be asked to write code with a lower time order, which executes faster. As technology evolves, people don't want to spend too much time waiting for a computation to happen. Imagine you give a computation to your calculator and you have to wait five seconds for the answer; I'm sure you would not like that. In the same way, when doing calculations, we always expect a lower time order. So in this case what do we do? We take an assumed mean, which could be any
one of the data. For example, I can take 3 as the assumed mean. In reality the assumed mean can be any number. It could be zero also, but if you take zero it becomes the same as the direct method. In our junior classes, I remember my teacher used to tell me: take one of the x_i's, preferably the middle one, as your assumed mean. She was not exactly right in saying that, but she was not wrong either. The reason for taking one of the data as the assumed mean is this. What we do next is compute d_i, the deviation: we subtract the assumed mean from each data, d_i = x_i - A. So x_i - A for that particular data becomes zero, and our calculation is considerably reduced. Not only that: when we add up f_i d_i, which we'll see next, the positive part and the negative part compensate for each other, so the sum Σ f_i d_i becomes very small.

We'll see it in this example. I calculate d_i = x_i - A: this will be -2, this will be 0, this will be 1, this will be -1, and this will be +2. Then we calculate f_i d_i, the product of the frequency with d_i: -6, 0, 3, -4 and 10. As you can see, 10, -6 and -4 cancel out, ultimately leaving only 3. The total number of data here is: five, eight, eight plus nine is 17. The mean is given by the formula A + Σ f_i d_i / Σ f_i. So in this case my answer is 3 + 3/17, which matches the previous answer we discussed, 3.17.

Now I would like to ask you one small thing, a very simple question. I would like to
ask you: how does this formula come about? Have you ever wondered? Anybody? What is the reason for that formula? Yes, Shashod, would you like to discuss? Okay, that formula comes from the concept that the mean depends upon... yes, but now tell me how this formula comes about. No, actually, what you are saying is mathematically correct, but what is the logic behind it? Let's understand the logic through a simple example.

Let's say your teacher gave you a test, and later she realized that she had given more marks for a particular question: it was a three-marker and she had marked it out of six. Let's say everybody got that question right, so she accidentally gave six marks to everybody when the question was only supposed to be worth three. She realizes she has given three extra marks to everybody, so she decides to subtract three marks from everybody's paper. Now tell me, if the mean of that test for the class was x̄ and she subtracts three marks from everybody's paper, what would be the new mean, say x̄'? How is it related? x̄ minus three, absolutely correct. Everybody agrees.

In the same way, let's say these are the marks scored by one, two, three, four, five... sorry, 17 students in a class. Then somebody realized that three marks extra had been given to everybody, so she decides to subtract three marks from everybody's marks. When you subtract three marks from everybody's marks, you get the new data, which is your d_i. So
this is three marks reduced from everybody's marks. The mean of this reduced data is Σ f_i d_i / Σ f_i; this is the mean marks after you have subtracted three from everybody's marks. And it equals your original mean, from before the subtraction, minus three. So in the formula, what do you do? You do 3 + Σ f_i d_i / Σ f_i. This 3 is nothing but your assumed mean.

In statistics this property is stated as: the mean is dependent on origin. This is a very important statement. Mean is dependent on origin means that if you subtract something from, or add something to, every data you have, the mean suffers the same fate. If your teacher subtracts three marks from everybody, the mean also goes down by three. If your teacher gives five extra marks to everybody, the mean also goes up by five. Are you getting my point?

So what do you do when you realize your data is very heavy (which in this case it is not; I'm just taking this on an experimental basis)? You subtract something from all the data to get reduced data, which is manageable. When the computer does the calculation f_i d_i, it will find it convenient, because d_i has a smaller magnitude and the multiplication becomes faster. Right now you're seeing very simple data, but the data could be huge. For example, say you're finding the mean of the marks scored by all the students in India. If you take all the actual marks, a computer like this might take a lot of time, because you're dealing with crores of data. So in
that case what do we do? We use the fact that the mean depends upon the origin: if you shift the data, your mean also gets shifted by the same value. So I shift, I calculate the mean of the shifted data, and then I compensate by adding back the quantity by which I shifted. Is this fine? Any questions? That is the reason this formula comes out.

By the way, this is called the shortcut formula. The next formula we are going to talk about is the step deviation formula. In the step deviation formula we go one step further: we divide this data d_i by a certain value, which we call the step size, h. It could be any number, just do not divide by zero. It could be 1, 2, 100 or 5000; it makes no difference to your answer. It is up to you to figure out, but we normally choose as the step size a quantity that divides all of these data easily, so that there are no fractions. However, just for demonstration purposes, let's say I keep my step size as 2. If you divide these by 2, I get u_i as -1, 0, 1/2, -1/2, 1. I know I may have made my life complicated here, but that is just for illustration. Next we find f_i u_i: multiply the frequency with u_i. This gives -3, 0, 3/2, -2, 5. Then we find Σ f_i u_i, which in this case is 3/2. In the step deviation method the formula becomes: mean = A + h × Σ f_i u_i / Σ f_i, which as per
your given situation will be 3 + 2 × (3/2) / 17, which is nothing but 3 + 3/17 again; the answer is 3.17. But what is important is to understand where this formula has evolved from. What is the core principle behind it? The core principle is that the mean is dependent on scale. These two statements I have written are very important. Mean is dependent on scale means that if you halve your data, the mean also becomes half; if you double your data, the mean also becomes double. So remember: it depends upon origin, so if you shift your data by a particular value, the mean also gets shifted by the same value; and it depends on scale, so if you scale your data up or down by a certain factor, the mean suffers the same fate.

Why do we need this? For the same reason we need the shift. You shift to make the quantities smaller; similarly you multiply or divide by something to make the quantities smaller or bigger, as convenient for you to manage. So when you compute Σ f_i u_i / Σ f_i, what are you actually doing? You are finding the changed mean, say x̄', and this is nothing but (original mean - assumed mean) divided by h. If you compare the two, x̄ automatically becomes A + h × Σ f_i u_i / Σ f_i. That is how this formula comes about. This is called the step deviation method. I hope you already know it, so this is just a quick recall. Any questions here?

Now we'll quickly revise continuous data. I think I have already prepared a slide for it. Recall that for continuous data we are given some classes. Classes are also of two types: one is called an inclusive class, the other an exclusive class. Let me tell you, the mean doesn't depend upon
whether the class is given to you in exclusive format or inclusive format. Now, what is an exclusive class? Can somebody tell me? Something like this: 1 to 5, 6 to 10, 11 to 14. When you write classes like this, it is an example of an exclusive... sorry, an inclusive class. Why is it called an inclusive class? Can somebody tell me? I'm just revising something you already learned in class 9. No, that's not the reason. Yes, of course the starting and ending values are not the same. Because the upper limit is included, yes. So this 5 is a part of this class.

But let's say I write the same as exclusive classes. All of you know how to convert an inclusive class into an exclusive class. We take the difference between the upper limit of a class and the lower limit of the next class, divide it by 2, and then subtract it from the lower limit and add it to the upper limit of each class. Here the difference is 1, and 1 by 2 is 0.5. So we get 0.5 to 5.5, 5.5 to 10.5, and 10.5 to 14.5. I have now made these exclusive classes. Why is it called an exclusive class? Because the upper limit is not a part of its own class; it is a part of the next class. If some data is 5.5, you will count it under the class 5.5 to 10.5, not the previous one. You already know this, come on; I'm just telling you your class 9 statistics. So it is called exclusive because the upper limit is excluded from the class, and inclusive because the upper limit is included in the class itself. Is it fine?

Now, your mean calculation doesn't depend on whether the class has been written in inclusive or exclusive format, because you actually work with something called the class mark. What is the class mark? The class mark is the average, or mean, of the lower limit and the upper limit of a
class. For example, for this class the class mark will be (1 + 5)/2 = 6/2 = 3. If you check, whether you do (1 + 5)/2 or (0.5 + 5.5)/2 makes no difference to the answer; the class mark is still 3. So here we work with class marks. You can think of the class mark as a representative of the class. I'm sure you have seen local body elections: every locality votes for a candidate, and that candidate is its representative. For example, I live in JP Nagar, so one person will be the representative of JP Nagar, one will be for Koramangala, one for HSR Layout, one for Rajajinagar, one for Yeshwanthpur. The class mark is like a representative of that class.

For finding the mean we use the same three techniques, the direct method, the shortcut method and the step deviation method, using the class marks. Your class marks become your data, and you just find the mean using them. As you can see here, you find Σ f_i x_i and use the formula Σ f_i x_i / Σ f_i; this is the direct method. If you have to use the shortcut method, you take one of your class marks as your assumed mean: you can take 7, you can take 11, you can take 15, you can take 3. Guys, let me tell you, it is not necessary to take one of your x_i's as the assumed mean, as I already told you. What is the assumed mean? It is the quantity by which you want to shift all your data, and it is up to you what you take. You can take 100 also; it makes no difference to the answer, but it would not make your life easy. So when our teacher used to say take one of your x_i's as the assumed mean, she was not completely right and she was not
completely wrong either. She was right in the sense that it makes our life easy, and not right because I could take anything; I could shift the origin by any amount I want. Got the point?

All right, we'll jump directly to business. We'll take a very simple question. This question came in JEE... no, sorry, not JEE, AIEEE 2007. I would request everybody to solve it within one and a half minutes. By the way, today is the last class, so I'm expecting everybody to be present. Sorry, the last class for this year, don't worry. I'm aware that your exams start on Monday and most of you will get busy with that, so we won't disturb you during your exam time. But feel free to ask doubts; don't hesitate. Put your questions on the group and either I or some of your colleagues will reach out to you. Last five seconds: five, four, three, two, one. Not many people have voted, only 11 of you, but most of you say option A.

Now, what is given? The average marks of the boys in a class is 52 and that of the girls is 42. Let the number of boys be x and the number of girls be y. The total marks that boys and girls together have scored is 52x + 42y, so the mean marks will be (52x + 42y)/(x + y). You can treat 52 and 42 as your data, with x as the frequency of 52 and y as the frequency of 42, divided by the total number of data. This is given to be 50. So from here, 52x + 42y = 50x + 50y, that is, 2x = 8y, so x = 4y. Now what are they asking? They are asking for the percentage of boys in the class. The percentage of boys in the class
will be x/(x + y) × 100. x is already 4y, so this gives 4/5 × 100, which is 80 percent. Folks, what is wrong with you? I think even a grade 9 student would be able to solve this question properly. This is not expected; come on, this is a simple question that you should get correct. Right, an AIEEE 2007 question.

All right, next question. Since you disappointed me, you will get one more question. JEE Main 2015: the mean of a data set comprising 16 observations is 16. If one of the observations, valued 16, is deleted and three new observations, valued 3, 4 and 5, are added to the data, find the mean of the resultant data. Let's solve this within two minutes as well. Time starts now.

All right: five, four, three, two, one. Most of you have said option D. Let's check whether D is correct. You have been given 16 observations whose mean is 16, so the sum of the observations, let me write it as Σ x_i, will be 16 × 16. Now one observation, 16, is removed, so 16 is removed from the total sum, and you add three more observations. This gives 256 - 16 + 3 + 4 + 5; let me call this Σ x_i', which is nothing but 252. Your new mean will be Σ x_i' divided by the total number of data. Remember there were 16 data initially; one is removed, so 15, but three are added, so 18. So 252/18, which is 14. Absolutely correct, folks, option D is right. Now you can imagine how easy the statistics questions in JEE Main are. I'm sure nobody would like to miss out on such easy marks.

So we'll now move on to the median. What is the median? The median is the central data. We'll first talk about discrete and ungrouped data. For finding the median, the very first thing we do is arrange the data in ascending or descending
order. It could be either way; it makes no difference, because the middle data is the middle data whether you arrange in ascending or descending order. If your number of data n is odd, the median is given by the (n + 1)/2-th observation. Let's take an example: 1, 3, 4, 2, 5 is your data. First you arrange it in ascending order, so it becomes 1, 2, 3, 4, 5. As you can all see very clearly, the third data is your middle data, and the middle data is what we call the median. So the (5 + 1)/2-th observation, that is, the third observation, which is 3 in this case, is your median. The median is also written as Q2. Q2 is your second quartile, Q1 is your first quartile, Q3 is your third quartile. Quartiles are not required for you, so don't worry too much about them. Any questions regarding the median when the number of data is odd?

So what do you do when the number of data is even? When n is even, your median is given by the mean of the (n/2)-th data and the (n/2 + 1)-th data. For example, let's say you are given the data 1, 3, 6, 8, 11, 14 and I ask you for the median of this discrete data. Here the number of terms is even, so the middle will be somewhere between 6 and 8. So we take the (6/2)-th data, which is the third data, and the fourth data, add them and divide by 2. The third data is 6, the fourth is 8, and divided by 2 the answer is 7. Many of you may be surprised; you'll say, sir, 7 doesn't even occur in the data. That is fine; the median need not be a part of the data. Yes, why do we do that? It's a very good question. We basically assume
that the data in the gap is uniformly distributed. Between any two data we assume that, had there been data, it would have been uniformly distributed between them. You can say that is a disadvantage in statistics. Even when we derive the formula for continuous data, which we'll see in some time, it comes from the assumption that within a class the data is uniformly distributed, even though in reality that may not be the case. For example, let's say that in the class 1 to 10 the frequency is 6, or let me take it as 5. What we assume is that these five data are spread uniformly between 1 and 10: your first data is 1, the second is 3, then 5, then 7, then 9. That is how we distribute these five within 1 to 10. Of course 10 is not part of this class; it comes in the next one. In reality this may not be the case; in reality all your data may be 4. So this is the disadvantage, and that is why we say keep the class size as small as possible, so that it is closer to the actual population. Are you getting my point? So statistics has a very big advantage or disadvantage, whatever you may call it: it assumes that within a class the data is uniformly spread. Here also, between 6 and 8 we consider the gap to be uniformly divided, so if one data has to come in between, it has to be 7, the mean of 6 and 8.

What does the median actually tell us about the data? Very good question. The median tells you about the central data. Let me give you a very simple statement that my teacher told me when she was teaching me statistics: the difference between mean and median is that the mean tells you the economic status of the society, while the median tells you the social status of the society. What is the difference between them? See, in a country like India, you know that only 20% of people, in fact it is
applicable to the worldwide scenario, are rich, and 80% of people are poor, or not very rich, I would say. If you calculate the average income, it will come out in crores, or in very high lakhs. But is that the salary drawn by a person in the middle stratum of society? Let's say this is the highest data, the rich, this is the poor, and this is the middle-level income. We say these are middle-class people, upper-class people and lower-class people with respect to their income. Do you think this middle-class income would be the average income of all the people living in the society? No. When the rich are very rich and the poor are very poor, the mean salary comes out somewhere in between, and it will not match the middle-class family's income. The middle-class family's income is your median; the mean is the average of all the incomes. Only if your data is symmetric will the mean and the middle-class income match; otherwise they will not. Are you getting my point?

Let me give you a more practical example. Let's say your class is given a test with maximum marks 50. A few of the students did exceptionally well; they got around 48, 49, 49 and a half. And of course some people did very badly. Let's say there were 31 people in the class, to be precise. You arrange them as per their marks and give them a ranking, rank 1 to rank 31. So rank 16 happens to be the middle person in performance. Do you think his marks will be the average of the marks of the class? It may or may
not be. Yes or no? Not necessarily, absolutely. So if I have to know how skewed my data is, I have to know the median also.

Now this is something I would like to discuss with you. When you plot the data against its frequency, your data may sometimes be very symmetrically placed, like this. In such a case this point is the mean, this point is also the median, and this point is also the mode. That is the situation when the frequency distribution of the data is very symmetrical. But this may not always be the case. You may sometimes have a frequency distribution like this. In this case your data is such that the mode is less than the median, which is less than the mean. Such data is called positively skewed data. So when your mean and median do not match, it gives you an idea about the skewness of the data. Skewness is a very important parameter which tells you how your mean differs from the median and the mode. Makes sense?

Now I would request everybody to note this down, because it is quite likely that JEE may test you on skewness also. By the way, this axis is your frequency and this is your data. If your data distribution is like this, you will find that the mode is more than the median and the median is more than the mean. Such a distribution is called a negatively skewed distribution. So, just to answer your question, Aditya: we study median and mean to test whether the data is positively skewed or negatively skewed. This is used a lot in share trading, actually. Skewness of the data is a very important aspect in
share trading. There are a lot of aspects, Aditya, which I am not sharing with all of you because they are not part of your syllabus. There is something called Karl Pearson's coefficient of skewness and Bowley's coefficient of skewness, which tell you how your distribution is shaped. This one is called a symmetrical distribution; in a symmetrical distribution your mean and median will actually be equal. But when your data is positively skewed, your mean will be more than the median.

However, I would like to discuss one more thing with you, which I'm sure you studied in junior classes. For moderately symmetrical data, the distance between the mean and the median is one third of the distance between the mean and the mode. So if you take the three to the other side, 3 mean - 3 median = mean - mode, and I'm sure you learned this formula in your junior classes: mode = 3 median - 2 mean. This is actually an approximation, and it only works for moderately symmetrical data. Many people in the junior classes used to ask me, sir, when we already know this formula, why do we separately learn the formula for mode? We can just use 3 median - 2 mean. But this is just an approximation; that is why we call it an empirical formula. I'm sure you have done empirical formulas in chemistry: the empirical formula gives you an idea of the ratio in which the numbers of atoms of the elements are related to each other in a compound. In the same way, this is an approximation, not an exact formula. Please keep in mind that it is only applicable to moderately symmetrical data; if the data is heavily skewed, this formula may give a faulty result.
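To see that the empirical relation is only an estimate, here is a small Python sketch. The data set is made up for illustration (it is not from the lecture), and `empirical_mode` is just a hypothetical helper name.

```python
from statistics import mean, median, mode

def empirical_mode(mean_value, median_value):
    """Estimate the mode with the empirical relation: mode ~ 3*median - 2*mean."""
    return 3 * median_value - 2 * mean_value

# Illustrative, positively skewed data (an assumption, not from the lecture).
data = [1, 2, 2, 2, 3, 3, 4, 5, 9]

m = mean(data)        # 31/9, a little under 3.45
q2 = median(data)     # middle value of the sorted data
actual = mode(data)   # most frequent value
estimate = empirical_mode(m, q2)

print(f"mean={m:.2f} median={q2} mode={actual} estimate={estimate:.2f}")
```

Here mean > median > mode, so the data is positively skewed, and the estimate lands close to the true mode but not exactly on it. That is the sense in which the formula is empirical.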
Anyway, you will learn more statistics in your undergrad, don't worry. You will learn about kurtosis (leptokurtic, mesokurtic, platykurtic), moments, everything is there. By the way, that was your median for discrete ungrouped data.

So let's talk about discrete grouped data. For discrete grouped data, we first make a table called the cumulative frequency table. First of all, arrange the data in ascending or descending order. Here I have given an example where the data is in ascending order, and this column is the frequency of the data, because we are dealing with grouped discrete data. Now, what is cumulative frequency? Just ask yourself how many data are less than or equal to 1; whatever answer you get, write it in the cumulative frequency column. How many data are less than or equal to 1? Three, so write 3. How many are less than or equal to 2? You will say 3 + 4, so this 7 comes from there; we add this to this and write it here. How many are less than or equal to 3? 3 + 4 + 2, which is 9, which is also obtained by adding 7 + 2. Similarly you complete the cumulative frequency table. Please note that the last entry should match the total number of data; if there is a difference between the two, you have made a mistake in addition.

Now what do we do? We take this total and divide it by 2, so you get 8.5. The data whose cumulative frequency is just more than 8.5 will be your median. Here 9 is just greater than 8.5, so this 3 will become your median, your second quartile. Is this clear, everybody? Any questions? So for a
discrete grouped data, this is how we find the median.

Next we'll talk about continuous data. For continuous data we use this formula: median = l + ((n/2 - c) / f) × h. Let me explain what l, c, f and h are; n you have already seen. First of all, we make a cumulative frequency column. The median class is the class whose cumulative frequency is just greater than n/2, where, as you know from the previous discussion, n is the total frequency.

I'll take an example. Say this is the question: these are the class intervals given to us and this is the frequency of each interval. First we make the cumulative frequency table, which you already know how to make: 3, 3 + 2 = 5, 5 + 1 = 6, 6 + 2 = 8. This last number is your n. Now, in the formula, which I'll write down again, median = l + ((n/2 - c) / f) × h, l is the lower limit of the median class. What is the median class? Here n/2 is 4, and the cumulative frequency just greater than 4 is 5. The class opposite that cumulative frequency is called the median class. Next, c is the cumulative frequency of the class just before the median class, f is the frequency of the median class, and h is the class height of the median class. Now can somebody tell me where we get this formula from? What is the basis of this formula?
That is more important. If it is equal to n/2, then the lower limit will become the median? No. It is like the subdivision formula? Okay, try to understand. What is the meaning of this 3? It means that between 1 and 5, three data are over. So your fourth data is actually 5; got the point? When the total number of data is 8, and I write the data as x1, x2, x3, x4, x5, x6, x7, x8, your middle data is somewhere over here, between x4 and x5. So your fourth data is 5 and, let me just give you an idea, 9 is your sixth data. Sorry, seventh data? Have I made a mistake here? Sixth data, yes, I am so sorry, sixth data. Now you want to find your 4.5th data. From 5 to 9 there will be two data. By the way, how can there be two data? This is counting from the fourth data, so between 5 and 9, excluding 9, there should be two more data. So you see how much more data you have to pass, and h/f is the jump in the value for every unit change in the frequency. It is like the unitary method. The class from 5 to 9 is divided by 2, that is, h = 4 is divided by f = 2, which means that for every change of one in the frequency there is a change of 2 in the value. Are you getting my point? So if you want something in between, say your 4.5th data... see, actually there is some mistake over here; I think I have not written it
properly. But what I want to explain is that this is the jump you have to take, and this is how much the data changes per jump. You add to the lower limit whatever jump you need to take to reach the middle value.

I'll give you a simple example to make you understand. Let me take some arbitrary numbers. Say n/2 is the 30th data, and you know your 20th data and your 36th data; call these known values a and b. If you want to reach the 30th data, how will you do it? First you see what change is happening between these two: the change is b - a, and you divide it by the number of frequencies in between, which is 16. Then you see how much of a jump you have to take: 30 - 20 = 10. So you add (30 - 20) × (b - a)/16 to a. Basically the same thing is happening in the formula: a is the lower limit of the median class, 30 - 20 is nothing but n/2 - c, b - a is your class height, and 16 is your frequency. Does it make sense? It is actually an arithmetic progression, absolutely. You are assuming that your data is uniformly distributed between the 20th and the 36th. So (b - a)/16 is the increment in the data for every unit change in frequency: the 21st data will be a + (b - a)/16, the 22nd data will be a + 2(b - a)/16, and so on. If I have to jump by 10, I'll do a + 10(b - a)/16. This is the reason behind this formula. Many times we approximate it; we don't always go by the exact value, so plus or minus a small value is
permissible. So this is the reason behind this formula. Knowing this is very important, because we are going to use it when finding the mean deviation from the median.

But before we move on, I would like to take a small question. By the way, median is also dependent on origin and scale; just like the mean, the median depends on a change of origin and on a change of scale. So let's take a question, an AIEEE 2003 question: the median of a set of nine distinct observations is 20.5. If each of the largest four observations is increased by 2, then the median of the new set is which of the following? I'll give you one minute to answer this question. Very good. All right, five, four, three, two, one, go. Nice. Just because I told you the median depends on the origin, many of you have voted for option B. Most of you have voted for D, but some of you have also voted for B and C. Let's check.

What is the question? There are nine distinct observations and the median is 20.5, and each of the largest four observations is increased by 2. With nine data, the median is the fifth data. Say the data are a1, a2, a3, a4, a5, a6, a7, a8, a9, arranged in ascending order, so a6 to a9 are the four largest. If you add 2 to each one of them, the median is not going to be affected; it is unchanged. The question was not about a change of origin: it never said you are increasing the entire data set by 2. You are only fiddling with the last four data, which do not include the median. So the median will remain the same, option D. Simple question.

Now, coming to the main part of this topic: we are going to study the mean deviation, so now we have started the actual part. There are a lot of types of measures of dispersion, and we are going to focus upon a few of
them. By the way, what are measures of dispersion and why do we need them? A measure of dispersion is used to measure how scattered your data is. Why do we need to learn about the scattering of the data? See, mean and median are central figures; they do not give you an idea of how scattered your data is.

For example, say I give you a test out of 300 marks, which is your JEE Main total, and the mean marks come out to be 180. Looking at this 180, should I be very happy or very sad? It could be either. 180 doesn't mean the entire class has performed well, and as a teacher my aim is to see the performance of the whole lot, not of a few. What if five of you out of 20 have scored around 280, 290, 295, some even 300, and the other 15 have scored around, let's say, 190, 60, 40, but when you took the average it became 180? For a person who is looking only at the mean, it would be a wrong analysis to say that the class has performed well. No: only five people performed well, and exceptionally well at that, but most people performed at a mediocre level or even pathetically, and because of the top five the average came out to be 180. So if I base my decision on the central tendency here, which is the mean, my analysis will be completely wrong. And imagine such decisions being taken on a very large scale, where a lot of money and a lot of decision making is involved; it can lead to catastrophe. So I cannot base my decision just on a mean, a median or a mode. I also have to see how scattered my data is. Let's say my data is very scattered; let's say I'm just placing
some points in space, and this is the mean value. I also need to see how far my data is scattered from it: say how far the data xi is from this position, or how far xj is. This scattering also tells us a lot of the story.

I'll give you another example. Say I want to set up a clothes shop: T-shirts, shirts, jeans, whatever clothes people wear. I want to set it up in Yashwantpur, so I hire a statistician and say, go around Yashwantpur and figure out what type of population lives there. This fellow comes back after five days of research and tells me that the average age of the people living in Yashwantpur is, say, 20 years. So I think, oh, this is a young population; I should keep T-shirts, jeans, six-pocket pants and whatever a 20-year-old crowd requires. And once I set up the shop, I realise that not a single person comes to buy. Say I'm the only shop in that locality, and still nobody buys a single piece of clothing. What do you think may have gone wrong? Say the population contained only five-year-olds and 35-year-olds. Five-year-olds need diapers; they won't wear jeans and they don't want T-shirts. The 35-year-olds are happy in their lungi, or whatever you call it. So my entire investment went for a toss. Why? Because I did not study the scattering of the age groups. But suppose the same guy had come and told me that I could expect a deviation of plus or minus 15 years in this data. Then I would have been mentally prepared: oh, there could be a
five-year-old also and a 35-year-old also. On an average, mind you; I'm not saying this is the range of the data, don't get me wrong. 15 is the average of the deviations. This figure, along with 20, would have been of more benefit to my business. Are you getting my point? The figure 20 alone was catastrophic. Yes, absolutely, there could be one-year-olds and there could be 50-year-olds. A one-year-old does not need jeans; they are happy with their small diapers and all. And 50-year-old people don't want to spend; they are quite relaxed in their dhoti, lungi, or whatever older people normally wear. So my business is gone.

Now imagine the same thing done on a much larger scale. What went wrong with the Nano business? What did Tata do? They thought that in a country like India a one-lakh car would do wonders, because they basically looked at the mean income. But it did not sell. Why? Because India is a land of great diversity: there are multi-millionaires, and there are people who cannot afford food even twice a day. A multi-millionaire will not buy a Tata Nano; he will go for a Mercedes, a BMW, an Audi. A very poor person needs a bicycle to get around; why would he need a Tata Nano? So it was a complete flop. Now, I'm not saying this happened because Tata did not study measures of dispersion; they are too big a company to make that kind of mistake. But such decision-making errors may creep in if you don't take dispersion into account. Are you getting my point?

So there are different types of measures. One of them is the range, which is very simple: it is the difference between the maximum and the minimum values. And there are many more, such as the quartile deviation and the interquartile range, but those are not
a part of your syllabus. What we mainly have to study is the mean deviation; let me put up the slide. Mean deviation is another measure of dispersion.

First of all, let us understand the meaning of mean deviation. The term itself is made up of two words: it is the average of the deviations. Deviation is how far your data deviates, and normally we take this deviation as an absolute difference. As an example, if you want to calculate the mean deviation from a value a, what do we do? We subtract a from every data value, call it xi, sum up the absolute deviations from a, and divide by the total number of observations: MD(a) = Σ|xi - a| / n. This is the core formula that you are going to use for mean deviation from the mean and mean deviation from the median, because a can be your mean and a can also be your median. So I've given you a very generic formula. When you want to study how much scattering there is in a given frequency distribution from a given point, and that point can be the mean or the median, we take the difference of every data value from that point and take the absolute value.

Why do we take the absolute value? Why don't we take the algebraic value? Suppose I calculate the sum of the differences of every data value from the mean and divide it by the total number of data, without taking the modulus. What will happen? The answer will actually become zero, yes or no? If you take the signed differences from the mean and add them up, the sum is always zero, so this will not serve our purpose. That is why the modulus is taken. Am I clear here?

So, as discussed, we are going to talk about two types of deviation: one is the mean
deviation from the mean and the other is the mean deviation from the median. For the mean deviation from the mean, let me write down the formula: we take the difference of each data value from the mean, take the modulus, sum it up, and divide by the total number of data: MD(x̄) = Σ|xi - x̄| / n. This formula is for discrete ungrouped data. If your data is grouped, there is a small change: you multiply each absolute deviation by its frequency and divide by the sum of the frequencies: MD(x̄) = Σ fi|xi - x̄| / Σ fi. This is for discrete grouped data.

Let me take this up as a question before we move on. By the way, I did not write the mean deviation from the median, so let me write that also. It is the same: MD(Q2) = Σ|xi - Q2| / n for discrete ungrouped data, and MD(Q2) = Σ fi|xi - Q2| / Σ fi for discrete grouped data. Now we can go to the question.

Aditya asks, sir, but the skew of the data does the same thing, right? No. The skew of the data tells you how the mean and the median differ from each other; it doesn't tell you the scatteredness of the data. Got it, Aditya? Mean and median together just tell you how skewed the data is, not how scattered it is. There is a difference.

Let's take a simple example. Compute the mean deviation about the mean and about the median for the marks of the seven students given in this question. First of all, let's find the mean. The mean is the sum of the marks divided by the total number of students. If you sum the marks up, this will come out to be 140/7, so the mean is 20. I
hope I've done the calculation right: 42, 56, 82, this is 104, and 104 and 36 is 140. For the median, you first have to arrange the data in ascending or descending order. The lowest mark is 14, then 15, 17, 21, 22, 25, 26. The total number of data is 7, so the (7 + 1)/2-th data will be your median, which is the fourth data. The fourth data is 21, so this is your median.

Now, what is the question? Find the mean deviation from the mean. Guys, don't get confused: the first "mean" means average, so this is the average deviation from the mean. On an average, how far is the data from the mean position? That means you add up all the deviations and take their average. So for the mean deviation from the mean, we first have to see how much each data value deviates from the mean. Let me write down the data. First find the absolute deviation of each data value from 20: this is going to be 5, 3, 6, 6, 2, 1, 5. Take the sum of these: 8, 14, 20, 28. So your mean deviation from the mean will be 28 divided by the total number of data, 7, which is 4.

If you want to find the mean deviation from the median, take the difference of each data value from the median, 21. This will be 4, 4, 7, 5, 1, 0, 6. Find the sum: 15, 20, 27. So the mean deviation from the median will be 27/7, which comes out to approximately 3.86. Is the idea clear? Any question regarding how to find the mean deviation from the mean and the mean deviation from the median when the data is given as discrete ungrouped data? Okay, so we'll
take one example of grouped data also, discrete grouped data. Can I go on to the next slide? Any questions?

The question is: find the mean deviation about the mean for the following data. I would request everybody to try this out first, and if possible give me your response in the chat box. Many people ask me, sir, what method should be used to find the mean? In this case the data is quite simple, so you can go with the direct method, Σ fi xi / Σ fi. If you have found the mean, please confirm it with me. Is that your mean? That one is wrong. Aditya, your mean is correct. In the interest of time: your mean should have come out to be 7.5. As you can see on your screen, I have taken the products fi xi: 2 × 2 is 4, then 40, 6 × 10 is 60, 7 into... I think I missed a data value over here: this is 8, this is 10 and this is 12. So this table gives you Σ fi xi, which comes out to be 300. The total number of data was 40, so your mean should come out to be 7.5. If anybody has made a mistake in finding the mean, let me tell you, the entire process will go for a toss.

Next we find the absolute deviation of each data value from the mean. Subtracting 7.5 from 2 gives you 5.5 in absolute value; 7.5 - 5 is 2.5; then 1.5, 0.5 and so on. This column is nothing but your |xi - x̄| column. Next is fi × |xi - x̄|. Remember this is grouped discrete data, so you have to multiply by the frequency. This is the table you should all be getting. Take the sum of it and it will come out to be 92. So your mean deviation from the mean will be 92 divided by the total number of
data, which is 40, and that will give you the answer 2.3. Aditya, why 2.8? Have you done your calculations properly? Check. Yes, a lot of mistakes can happen, but don't worry: these types of questions are only asked in the school exam; they may not be asked in the competitive exams.

Anyway, let's now go on to finding the mean deviation for grouped data, where you have classes involved. Not to worry: with classes, all we need to do is take the class marks as the data and work in the same way as in the previous question. I would request everybody to try one question on finding the mean deviation from the mean for this data, and I'll do it along with you. 10 to 20, 20 to 30... by the way, I should not waste time writing these; let's directly write down the class marks. The class marks will be 15, 25, 35, 45, 55, 65 and 75; I just took the middle value of each class. The frequencies are 2, 3, 8, 14, 8, 3, 2.

Let us find the mean using the assumed mean method. For that I have to take one of the data values as the assumed mean; let us take a to be 45. Then di = xi - a: 15 - 45 is -30, then -20, -10, 0, 10, 20, 30. Then calculate fi di: -60, -60, -80, 0, 80, 60, 60. Oh, my data is so symmetrical that Σ fi di actually becomes zero. So the mean is a + Σ fi di / Σ fi, which is 45 + 0 divided by the total number of data. I think the total is 40, correct? 5, 13, 27, 35, 38, 40. So the answer is 45 itself.

Next we have to make a table of |xi - 45|, since 45 is the mean. Thankfully you just have to take the mod of this column, because your a was also 45, so we don't have to rework the whole thing. And your fi |di| is just the mod of this column, which is 60, 60, 80,
oh sorry, I should not write it as di anymore; I should write it as xi minus the mean, which is 45. This sum will come out to be 400. So your mean deviation from the mean will be 400 divided by the total number of data, 40, which is 10. Is the process clear? Guys, you have to get hands-on with solving these questions, or else in due course you will keep forgetting these concepts.

Now, I was not going to take a separate question on the mean deviation from the median, because the process is exactly similar; all you need to do is find the median. So let's do one thing: in the same question, let us find the mean deviation from the median. Let me move on to a new page. Same question, changed slightly: instead of the mean, let's find the mean deviation from the median, so that we also get practice in finding the median. I'm not solving it again; I'm just writing down the data again to save time. Has anybody found the median? The median is also coming out to 45. It should, because as you can see this data is very symmetrical: 14, then 8 and 8, 3 and 3, 2 and 2. So it is obvious that we get the same answer as the mean, which was 45.

Anyway, what do we do first in such cases? We make the cumulative frequency table. The cumulative frequencies will be 2, 5, 13, 27, 35, 38, 40, and this last value is your n. For the median class, we see which cumulative frequency is just greater than n/2, that is, just greater than 20. It is this one, 27, so this is your median class: the 40 to 50 class. So 40 is your l, 14 is your f, h is 10, the class height, and 13 is your c in the formula. So your
median is given by l + ((n/2 - c) / f) × h. So it is 40 + ((20 - 13) / 14) × 10: n/2 is 20, c is 13, f is 14 and the class height is 10. As you can see, 7 cancels into 14 to leave 2, and 10 by 2 is 5, so the answer is 45 again. Absolutely correct, Aditya, the median is also 45. And if the median is also 45, I don't think we need to do this question, because the mean deviation about the median will come out to be the same answer as in the previous case. Is that fine? The mean and the median are the same, so the deviation from the mean and the deviation from the median will both give the same result. Any questions?

Now let's see some questions which have been asked in the JEE exams. If the mean deviation of the numbers 1, 1 + d, 1 + 2d, ..., 1 + 100d from their mean is 225, then what is the value of d? Let's have around two minutes for this question. The poll is on. Only two people have answered so far. I'll stop the poll in another 30 seconds, so please answer, those who want to. Five, four, three, two, one, go. Okay, 50 percent of you have said option C. Let's check.

First of all, what is the mean of the data? Add them up and divide by the total number of data. By the way, how many data are there? Can somebody tell me? There are 101 data. So 1 appears 101 times, and the multiples of d run from 1 to 100; the sum of 1 to 100 is 100 × 101/2. Dividing by 101, your mean will be 1 + 50d. Now start subtracting each data value from this. If you subtract 1, you get 50d; if you subtract 1 + d, you get 49d. This goes on all the way until you reach d, then 0, then again d, and the whole story repeats up to 50d. Am I right? In other words, take the mod of each deviation and add them up. Oh, I'm so sorry, I will not be able to write it in that corner; I will write it on the right side. So the sum of the mods is Σ|xi - x̄|,
and this will come out to be |d| times the sum of 1 to 50, counted twice, since the deviations d, 2d, ..., 50d occur on both sides of the mean. The sum of 1 to 50 is 50 × 51/2. Dividing by the total number of data gives you the mean deviation from the mean: 2|d| × (50 × 51/2) / 101, and this should be 225. Actually they should have given 255. Let me do one thing, let me take 255, because that gives a clean figure. So the 2 and the 2 cancel, and 50 × 51 |d| / 101 = 255 gives you 10|d|/101 = 1, so |d| will come out to be, correct me if I'm wrong, 10.1. I'm so sorry, they should have given 255 here, that is 51 × 5; a small error in the question as given. Those who answered 10.1 made an approximation, but it is actually 255 here instead of 225. Sorry about that, there was an error in the question. Yes, that's right, it should be 255, not 225.

One more question we'll take, an AIEEE 2005 question: if in a frequency distribution the mean and the median are 21 and 22 respectively, then its mode is approximately? 45 seconds to answer this question, just 45 seconds, as you have actually seen this kind of question earlier. The mean is 21, the median is 22, what is the mode, approximately? Guys, use the formula I have already discussed with you, the one I called the empirical formula for moderately symmetrical data; I gave it to you a little while ago. Five, four, three, two, one, go. Most of you went with option D. Let's check. As I told you, the empirical formula is mode = 3 median - 2 mean. So it is 3 × 22 - 2 × 21, which is 66 - 42 = 24, the approximate value. But remember this is just an approximate value, not the exact value; it all depends upon how your data is.
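Going back to the mean deviation questions worked above, the whole procedure fits in a few lines of Python. This is a minimal sketch under stated assumptions: the function names `mean_deviation`, `weighted_mean` and `grouped_median` are hypothetical, the seven marks are listed in the order implied by the running sums read out in the lecture, and the arithmetic-progression check uses 255 (the corrected value) rather than 225.

```python
def mean_deviation(values, center, freqs=None):
    """Mean deviation about `center`: sum(f * |x - center|) / sum(f)."""
    if freqs is None:
        freqs = [1] * len(values)          # ungrouped data: every frequency is 1
    total = sum(freqs)
    return sum(f * abs(x - center) for x, f in zip(values, freqs)) / total

def weighted_mean(values, freqs):
    """Direct method: sum(f * x) / sum(f)."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)

def grouped_median(lower_limits, freqs, h):
    """Median of continuous data: l + ((n/2 - c) / f) * h."""
    n = sum(freqs)
    cumulative = 0                         # c: cumulative frequency before the class
    for l, f in zip(lower_limits, freqs):
        if cumulative + f > n / 2:         # first class whose cf is just greater than n/2
            return l + (n / 2 - cumulative) / f * h
        cumulative += f
    raise ValueError("empty frequency table")

# Seven students' marks (mean 20, median 21).
marks = [25, 17, 14, 26, 22, 21, 15]
md_mean = mean_deviation(marks, sum(marks) / len(marks))            # 28 / 7
md_median = mean_deviation(marks, sorted(marks)[len(marks) // 2])   # 27 / 7

# Classes 10-20, ..., 70-80 with class marks 15, ..., 75.
lowers = [10, 20, 30, 40, 50, 60, 70]
class_marks = [l + 5 for l in lowers]
freqs = [2, 3, 8, 14, 8, 3, 2]
x_bar = weighted_mean(class_marks, freqs)
q2 = grouped_median(lowers, freqs, 10)
md_grouped = mean_deviation(class_marks, x_bar, freqs)               # 400 / 40

# 1, 1 + d, ..., 1 + 100d with d = 10.1.
d = 10.1
ap = [1 + k * d for k in range(101)]
md_ap = mean_deviation(ap, sum(ap) / len(ap))

print(md_mean, round(md_median, 2), x_bar, q2, md_grouped, round(md_ap, 2))
```

The printed values should agree, up to rounding, with the answers worked out by hand: 4 and about 3.86 for the seven marks, 45 for both the mean and the median of the class-interval data, 10 for its mean deviation, and 255 for the progression.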
Okay, so with this we move on to another type of scatteredness parameter, and that is variance and standard deviation. Variance and standard deviation are further ways of measuring the scatteredness of the data. Let me define variance first. Variance is signified by sigma square, so this is the symbol for variance. Variance is also called the second moment about the mean.

What is a moment, by the way? This is not important for you, but you may note it down. The r-th moment of any data about the mean is defined as summation fi into (xi minus x bar) to the power of r, by summation fi. You study moments when you are studying the parameter which tells you how peaked your data is, which we call the kurtosis measure. So this is used in the measurement of kurtosis of a frequency distribution. It is not there in your syllabus, so this is just an idea, in case you see a word like second moment or first moment. The first moment is always zero, because if you take summation fi (xi minus mean) by the total number of data, it will always give you zero. The second moment is the variance, the third moment is used for measuring skewness, and the fourth moment is used for the measure of kurtosis. Kurtosis tells you how peaked the data is. Anyway, does the power of the difference matter? Why not, it will definitely matter.

Anyway, as I've already given to you, the variance is nothing but a summation. If I talk about discrete ungrouped data, it will be summation (xi minus x bar) whole square by the total number of data. This is for discrete ungrouped data. If your data is discrete and grouped, a small change in the formula will happen: a frequency will come into the picture. The frequency will be multiplied to each squared difference, and you divide by the total frequency. This I should have written on the other side, because ultimately you'll be taking this as notes, so I don't want to mess this up,
and the variance will be given by summation fi (xi minus x bar) the whole square, by summation fi. This is for discrete grouped data, and you can say continuous data also. In continuous data your xi will just become the class mark.

Now, standard deviation. Standard deviation is written as sigma. It is just the under root of the variance. So sigma will be the under root of summation (xi minus x bar) the whole square by the total number of data, if your data is discrete ungrouped data, and it is the under root of summation fi (xi minus x bar) the whole square by summation fi, if your data is discrete grouped data or continuous data.

Now guys, one more important thing I would like to tell you here. There are some ridiculous mistakes made by students in the exam. One ridiculous mistake is cancelling out these fi's. Please do not do that. This is a summation of the product of these two, are you getting the point? Even though the bracket is not written, please do not think that you are summing up fi and then multiplying it with this. No, it is the sum of the products of fi with (xi minus x bar) the whole square. Don't make those mistakes. Are you getting my point?

Okay, now when it comes to finding out the standard deviation and the variance, this formula sometimes gives you a lot of, you can say, intensive calculations, especially when your x bar is of fractional value. So it's not useful when your x bar has decimals. Why? See, you are not given a calculator in the exam, neither in the JEE exam nor in your school exam. So let's say x bar is a decimal. Subtracting will also give you a decimal, and squaring a decimal which has, let's say, two decimal places, imagine how painful it will be. So this formula is basically fine if you're dealing with such data where
the mean is coming out to be a whole number, or, if it is a decimal, a decimal like 0.5, whose squaring is very easy. But this formula will not work well if your x bar has decimal places running up to, let's say, two or three decimals. It will become a really, really painful process. So what do we do? We simplify this formula even further. What you see here is basically the first version of the formula. I'll give you more versions of the formula.

So let's take the second version of the formula. In the second version I will try to simplify the variance and the standard deviation. By the way, variance and standard deviation are just the square and the square root of each other. So let me simplify this formula for you. Even though I've taken the case where the frequency is there, it is equally valid if the frequency was not there.

I'm going to open up the square here and write the numerator as summation fi into (xi square minus 2 x bar xi plus x bar square), divided by summation fi. Now open the brackets. It'll be summation fi xi square, minus 2 x bar summation fi xi, plus x bar square summation fi. So what have I done? I've just opened the brackets in the numerator. Now x bar is a constant, remember, the mean is a constant, so the mean will not participate in the summation process. That is why I've taken 2 x bar out as common. Now individually divide each term by the summation fi that is there in the denominator. In the last term, summation fi by summation fi will get cancelled off. And this term, summation fi xi by summation fi, I'm sure everybody would recall, is actually the mean, isn't it? So it becomes summation fi xi square by summation fi, minus 2 x bar into x bar, which is 2 x bar square, plus x bar square. So this will simplify to summation fi xi square by summation fi, minus x bar the whole square. If you want, you can also write it as summation fi
xi square by summation fi, minus (summation fi xi by summation fi) the whole square. Both are the same thing, it's just that the mean I have written out as a direct formula. So this is your formula for the variance, and standard deviation is just the under root of it.

Now what is the benefit of this formula? It has just one simple benefit. If you see this formula, nowhere does it require you to find the difference of the data from the mean. So even if your mean was a figure having two decimal places, you are not using it to take the difference, and you're not squaring that difference either. Here you're dealing directly with your data xi, or with the class marks if it is continuous data. There's no involvement of the difference xi minus x bar. So this has a better time order vis-a-vis the previous formula. Better time order means it is faster as compared to the previous formula. Don't worry, we'll take an example for this. Okay, any questions, any concerns?

Let's take a small question. Find out the variance and standard deviation of this data. First of all, our main purpose would be to find out the mean of the data. For the mean, let's use our direct method, I think the data is not very ugly. So let's find out the class marks first: 5, 15, 25, 35. The frequencies are 2, 3, 5, 5. Make a column for fi xi, so this will be 10, 45, 125, 175. Sum this up, so summation fi xi will give you 355, correct? Now pay attention, everybody. The mean is summation fi xi by summation fi, isn't it? Your total number of data, I think, is just 20, oh sorry, my bad, 15. So this will give you 355 by 15, and it actually comes out to be 23.67. Now when two decimal places come, imagine you doing (xi minus x bar) whole square. It will really be a painful task, my dear students, a very, very
painful task, correct? So the first formula that we had derived, I will not use it. Which formula will I use? I will use the second formula. So now I will find out xi square and fi xi square. We all know the squares of this data: 25, 225, 625 and 1225. Now multiply with the frequency: this will be 50, this will be 675, this will be 3125 and this will be 6125. Sum this up, and the sum comes out to be 9975.

Now let's use the second formula. Let's find out the variance first. Variance is summation fi xi square by summation fi minus the mean square, isn't it? So it is 9975 by 15, minus 23.67 the whole square. I've already done this calculation: this comes out to be about 104.73 with the mean rounded to 23.67 (if you carry the exact mean, 355 by 15, you get about 104.89). So your standard deviation will be the under root of this, which comes out to be about 10.23. So here you can see this effort is much faster vis-a-vis finding out the difference and squaring the difference, and especially with a decimal value of the mean that process becomes very, very lengthy. So this method is very useful when you have a mean which is running into decimals. Got the point? Got the purpose of deriving the second formula?

But many people, including me, are not happy with this formula either. The reason being, I still have to find out the mean. So I'll give you a formula where even the mean is not required for finding out the variance. Let's look into the third formula. The third formula is where the mean is not required. I'll write down the topic name: variance and standard deviation without finding the mean. Why do we need the mean when we have to find variance and standard deviation? We can actually do without it also. So what I'm going to do is use the same formula that I derived on the previous page, and I will better this formula. How? Let's see. Now all of you please recall, when we
were finding the mean by using the shortcut method, there was something called di. What was di? xi minus a, correct? And our mean came out to be a plus summation fi di by summation fi, correct? I'm going to use these two in our formula of the variance. The standard deviation we can find out later on by square rooting the answer. So instead of xi I will write a plus di, so this xi I replace with a plus di. And this mean, sorry, the square of the mean, I will replace with (a plus summation fi di by summation fi) the whole square. Let us expand it.

Most of you would be thinking, sir, are you simplifying it or are you complicating it? Don't worry, just wait for the simplification to take place. You will realize that this becomes a very easy and, you can say, very efficient formula to use. I'm just expanding the squares. Have a bit of patience, because this may take a bit of time to simplify, but on simplification it will become very easy.

Now see here. Let me open the brackets in the numerator and divide by the denominator. The first part becomes a square summation fi by summation fi (a square comes out because a is a fixed value, as per the assumption that you have made), plus 2a summation fi di by summation fi, plus summation fi di square by summation fi. Oh, the square is on this whole bracket, not on that term alone. The second part becomes a square, plus 2a summation fi di by summation fi, plus (summation fi di by summation fi) the whole square, and this last yellow term I'm just keeping as it is. Now the 2a terms get cancelled with each other. In the first term summation fi by summation fi gets cancelled, and that a square also gets cancelled with the other a square, leaving you only with summation fi di square by summation fi, minus (summation fi di by summation fi) the whole square. So this is your new formula for the variance, and the standard deviation is just the under root of the variance. Now, how is this formula faster than the previous two formulas?
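Before comparing speeds, the claim that all three versions give the same variance can be checked numerically. The following is a minimal sketch, not part of the lecture: the function names are hypothetical, the data are the class marks 5, 15, 25, 35 with frequencies 2, 3, 5, 5 from the example solved earlier, and the assumed mean a = 15 is an arbitrary choice (any value of a should give the same result).

```python
from math import sqrt


def variance_definition(x, f):
    """First version: sum f_i (x_i - mean)^2 / sum f_i."""
    n = sum(f)
    mean = sum(fi * xi for fi, xi in zip(f, x)) / n
    return sum(fi * (xi - mean) ** 2 for fi, xi in zip(f, x)) / n


def variance_mean_of_squares(x, f):
    """Second version: sum f_i x_i^2 / sum f_i - mean^2."""
    n = sum(f)
    mean = sum(fi * xi for fi, xi in zip(f, x)) / n
    return sum(fi * xi ** 2 for fi, xi in zip(f, x)) / n - mean ** 2


def variance_shortcut(x, f, a):
    """Third version: uses d_i = x_i - a for an assumed mean a, no true mean needed."""
    n = sum(f)
    d = [xi - a for xi in x]
    mean_d = sum(fi * di for fi, di in zip(f, d)) / n
    return sum(fi * di ** 2 for fi, di in zip(f, d)) / n - mean_d ** 2


class_marks = [5, 15, 25, 35]
freqs = [2, 3, 5, 5]

v1 = variance_definition(class_marks, freqs)
v2 = variance_mean_of_squares(class_marks, freqs)
v3 = variance_shortcut(class_marks, freqs, a=15)

print(round(v1, 2), round(v2, 2), round(v3, 2))  # 104.89 104.89 104.89
print(round(sqrt(v3), 2))  # 10.24
```

All three agree at about 104.89 when the exact mean is carried; the 104.73 obtained by hand comes from rounding the mean to 23.67 before squaring.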
It is basically faster than the previous formulas because the mean is not required here. So what are you working with here? You're working with the assumed mean and with the deviations from it. There is no mean required to solve this question. Let's bring the same old question back and try to solve it. You yourself will realize how efficient the process is.

So, the very same question. Now all of you see how fast it will be. First of all, find out the class marks, which are 5, 15, 25, 35, and the frequencies are 2, 3, 5, 5. Then take one of the values as your assumed mean. Let's say I take 15 as my assumed mean. Find out the deviations. The deviation is xi minus the assumed mean, which is going to be minus 10, 0, 10, 20. Find out fi di, which is minus 20, 0, 50, 100. Find out di square, which is 100, 0, 100, 400. Then find out fi di square, which is nothing but the product of the frequency column with the di square column, which is 200, 0, 500, 2000. The summation of this will be 2700.

Now use the formula. Variance is equal to summation fi di square by summation fi (by the way, summation fi was 15, if I'm not mistaken), minus (summation fi di by summation fi) the whole square. So this will give you 2700 by 15, minus, summation fi di, if you see, is going to be just 130, so 130 by 15 the whole square. Let me just do the calculation part of it and then I'll tell you the answer that comes out. I have to use a calculator, because I don't want to keep everybody waiting because of this. So 180 minus (130 by 15) the whole square, there you go, I get about 104.89. Is it the same answer that we got? Yes, it is the same answer, apart from the small rounding of the mean in the earlier calculation. So your standard deviation is just the under root of this, which comes out to be around 10.24. This formula is fast because we do not require the mean in this case. No mean is required. So this formula is
sometimes called the shortcut formula, or the shortcut method.

Okay, one very important thing I would like to highlight over here, but before I give you questions, there is one more formula, which is actually not a very important one, but still I will discuss it with you. In the previous formula we derived that your variance was summation fi di square by summation fi, minus (summation fi di by summation fi) the whole square, right? Now if you recall, when we were doing the mean by using the step deviation method, we used a column called ui, and ui was di by h, right? So I can use this again to make this formula slightly more interesting. Instead of di, I can use ui into h. Just replace your di with ui into h, and h square can be taken out as common. So the variance is h square into [summation fi ui square by summation fi, minus (summation fi ui by summation fi) the whole square]. This is yet another formula for finding out the variance, so please make a note of it. And the formula for standard deviation is just the square root of this, that's it.

Now I personally feel this formula is not very fast, because ultimately you have to do the same activities that you already did for the shortcut method. So this is just for your extra knowledge that you must keep in mind, but trust me, in my personal experience this is not a fast formula. So four formulas altogether we have learned, only for standard deviation and variance, and each of these formulas has its own importance. So don't be like, sir, why are we learning four formulas when we can manage with one? No, it depends from situation to situation. Every formula has its own importance.

So guys, here is the very important thing I would like to highlight. If you recall the second formula of standard deviation, we learned that variance is summation xi square by the total number of data, minus the mean square. And since the variance is an average of squares, it can never be negative. So this means
that summation xi square by n will always be greater than or equal to the square of the mean. Sometimes we state it as: the mean of the squares is greater than or equal to the square of the mean. It's a very interesting inequality property. Those of you who have done PRMO, or who were part of the PRMO/RMO batch last year, learned something called the Chebyshev inequality, and this follows from it. So this is a very interesting inequality, and JEE has asked one question based on this in one of its exams. We'll take that question in some time.

That was number one. The second important thing I would like to highlight over here: standard deviation, variance and mean deviation, whether it is the mean deviation about the mean or the mean deviation about the median, are all independent of origin. Independent of origin means they do not change if you shift the origin. Let's say you decide to subtract two from every data: the standard deviation is not going to change, the variance is not going to change, the mean deviation from the median is not going to change, and the mean deviation from the mean is not going to change. Are you getting this point? So they are independent of the change of origin. But yes, they are dependent on scale. So if you decide to halve your data, or if you decide to multiply your data by two, these guys are going to change.

"Sir, why is it whole squared in the first term? I didn't get that." I've used the normal formula, what's the problem? Which one, this one? Oh, I got your message, okay.

All right, so let's see questions. Let's see what type of questions have come in the JEE exams. This is an AIEEE 2006 question. Let's solve this, time starts now. Okay, five, four, three, two, one, go. One thing I wanted to ask: how many of you are from NPS HSR? Most of you have said C, by the way. Okay, let's check how many of you are from NPS HSR. Okay, let me
run a poll for this also. So please press A if you are from NPS HSR, B if you are from Yashwantpur, and C if you are from a different school, a non-NPS school. Okay, I'm running a poll, guys, come on, vote. This is not a question that you're supposed to solve. Don't tell me, sir, I forgot my school in the corona time. That means 11 of you are not paying attention in the class, because I only got 10 votes. How many of you are from NPS HSR, vote for A. How many of you are from NPS YPR, vote for B. And how many of you are from a different school altogether, let's say DPS or SSRVM, vote for C. Pranav Dharma Gadi, you are from this school. He's not responding. Prathik, you're from this school, and Prathik, you're wasting everybody's time. Why are you raising your hand, Pranav? I've asked you to vote, right? Pranav, vote, man, option A. People from NPS HSR, press the A button. My goodness, are you people dumb or something? I'm asking you to press A if you're from NPS HSR, why are you not pressing A? Where is Prathik? I know for sure Prathik is from NPS HSR, why are you not voting? Who else is from NPS HSR? Prathik, where are you? I can see your messages but you did not vote. Okay, so eight of you are from non-NPS schools, five of you are from NPS YPR, and most of you have not voted. Anyway, this is very important because we wanted to conduct some classes in the summer break from the center, which is in HSR only, so I wanted to know the number.

All right, so this question is a five second question. Most of you spend around two minutes and give a wrong answer. See these two data sets. There are two populations, population A and population B. Population B is nothing but 50 plus the population A data, so 151 is 50 plus 101, 152 is 50 plus 102, and so on and so forth, correct? If you shift your data by 50, does it make any change in the variance? No. So V A and V B
will be equal, so V A by V B will be equal to one. So option A is correct.

Akash, we'll let you know about that, because we are still deciding on something. It is only in speculation right now, we have not finalized it. Yes, even those classes will start around the 15th only, the 15th or 16th. Is it fine? Any problem with this question?

Okay, let's take another one, an AIEEE 2012 question. Let me put the poll on for this. So read the two statements. Let x1, x2, and so on up to xn be n data, let X be their arithmetic mean and sigma square be their variance. Statement one says the variance of 2x1, 2x2, and so on up to 2xn is 4 sigma square. Statement two says the arithmetic mean of these is 4X. Simple question, we'll conclude this in the next 30 seconds. Five, four, three, two, one, go. Most of you have said option D. Let's check.

Guys, what did I discuss with you about the mean? The mean is dependent on the change of scale. So if you take 2x1, 2x2 and so on, and you take the mean of all the data, it's going to be two times the original mean that you had, which is X in this case, so 2X and not 4X. So this result is definitely false. Statement two is false, and only one of the options says that. Anyway, we'll complete it. What about the variance? See, what is variance? Variance is summation (xi minus x bar) the whole square by the total number of data, isn't it? Let me write X for the mean in this case, because they have used X. Now if you change your data by a multiple of two, and I call the new variance sigma dash square, even your mean will become 2X. So you have summation (2xi minus 2X) the whole square by the total number of data, you can bring two square out, and this is nothing but 4 sigma square. So your new variance will be 4 sigma square, and statement one is actually true. So statement one is true but statement two is false, so option number D is absolutely
okay.

The last part of this chapter, which is just a small part, is the coefficient of variation. What is the coefficient of variation? It is used to compare the scatteredness of two or more data. Why do we need to compare the scatteredness of two or more data? Because you want to know which is more uniform and which is more non-uniform, right?

Now let me give you a very simple situation. There are three students, S1, S2, S3. Let's say these three students take their exams out of different total marks. Let's say S1 is a PU student. He takes his exam out of 600 and he scores 540. S2 is a CBSE student. He takes his exams out of 400 and he scores around 384. And let's say S3 is a Cambridge board student who takes his exam out of, let's say, 1000, and he scores around 940 marks. Who has performed better? In order to know that, you basically find out their percentage marks. So S3 has 94 percent, S2 has 96 percent and S1 has 90 percent. So S2 is the better performer, right? So you use percentage marks to know which student has scored better, irrespective of whatever the total marks were, isn't it?

In the same way, if you have two or more data involved, of course they will have different means and different standard deviations. So how would you know which data is more scattered? Because the scatteredness of the data depends upon the value of the standard deviation read in light of the mean of the data. So we define something called the coefficient of variation. The coefficient of variation is defined as the standard deviation of the data expressed as a percentage of the mean, that is, sigma by x bar into 100. So see, let's say for some data A the standard deviation is 40 and the mean is 100. For another data the standard
deviation is 20, and let's say the mean was 30. Which data is more scattered? Please do not try to analyze it just on the absolute value of the standard deviation. Here it looks as if the first data is more scattered, because it is 40 and the other is 20. But in reality the mean of the second, if you see, is just 30, as compared to 100 over here. So think of it as if there are two students: one has scored 40 marks out of 100, another has scored 20 marks out of 30. Who is the better performer? The second guy is the better performer, isn't it? (Sorry, I should say B over here, my bad, I wrote A.) So if you want to compare whether data set A or data set B is more scattered vis-a-vis the other, you find out the coefficient of variation of each. In the first case your coefficient of variation comes out to be 40 percent, and in the second case it comes out to be 66.7 percent. The coefficient of variation is more for B, which means data B is more scattered, right? So the higher the coefficient of variation, the more scattered is the data, and the lower the coefficient of variation, the lower is the scattering of the data. Is it clear? So please do not base your judgment just on the value of the standard deviation. Read it in light of the mean value. Got the point? Any question with respect to coefficient of variation?

So what type of questions can you expect? Of course JEE will not ask you any question on this, but in your school exam they can give you two data and ask you to find out which of the two is more variable. For that, you find out the standard deviation and the mean for both the data, and just check which has the higher coefficient of variation. That one will be more scattered, and the one with the lower coefficient of variation will be less scattered. Okay, so with this we close this chapter, and now you can take a small break. I will not give you a long break.
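The comparison of data A and data B can be reproduced in a few lines. This is a minimal sketch under the definition given above (standard deviation as a percentage of the mean); the function name `coefficient_of_variation` is an illustrative choice, not something from the lecture.

```python
def coefficient_of_variation(std_dev, mean):
    """Coefficient of variation in percent: (standard deviation / mean) * 100."""
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return 100 * std_dev / mean


# Data A: standard deviation 40, mean 100
cv_a = coefficient_of_variation(40, 100)
# Data B: standard deviation 20, mean 30
cv_b = coefficient_of_variation(20, 30)

print(round(cv_a, 1))  # 40.0
print(round(cv_b, 1))  # 66.7

# The data set with the larger coefficient of variation is the more scattered one
more_scattered = "A" if cv_a > cv_b else "B"
print(more_scattered)  # B
```

Even though A has the larger standard deviation, B has the larger coefficient of variation, so B is the more scattered data set relative to its mean.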