As Salaamu Alaykum, and welcome to lecture number 11 of the course on statistics and probability. You will recall that in the last lecture we began the discussion of a very important concept in statistics, and that is the concept of dispersion. I discussed with you the absolute measures of dispersion, which measure the dispersion in the same units as those in which the raw data are expressed, and I also discussed with you the concept of relative measures of dispersion. When we divide an absolute measure of dispersion by the corresponding average, we obtain what is called a relative measure of dispersion. The most important property of this kind of measure is that it is a pure number, and therefore, using relative measures of dispersion, we are able to compare the variability of two or more data sets. We discussed the range, the coefficient of dispersion, the quartile deviation, which is also called the semi-interquartile range, and the coefficient of quartile deviation, and after that, you will recall, we began to discuss the mean deviation.
Today we will continue with the discussion of the mean deviation, and I will discuss with you in detail first the case when we deal with raw data and then the case of the frequency distribution of a continuous variable. You will recall that at the end of the last lecture I told you that, whereas the quartile deviation measures dispersion around the median and the range measures it around the mid-range, the mean deviation is a measure in which our attempt is to measure the dispersion around the arithmetic mean. Let me explain this to you with the help of an example. Suppose that we have data regarding the number of fatalities in motorway accidents in one particular week in one particular city. The number of deaths due to motorway accidents on Sunday is 4, for Monday the number is 6, and for the remaining days the numbers are 2, 0, 3, 5 and 8.
Adding these numbers, the total number of deaths for the week comes out to be 28, and dividing this number by 7, the mean number of deaths for the week comes out to be 4 deaths per day. Now, you know that our focus at this time is not on the average value but on the variability in the phenomenon, that is, the variation in the number of deaths from day to day. So, students, in order to measure the distance of any individual x value from the mean, we will simply compute all these distances, as you now see on the screen. In the third column of the table that you see, the deviations of the x values from the arithmetic mean, which is 4, come out to be 0, plus 2, minus 2, minus 4, minus 1, plus 1 and plus 4. Students, you must have noted that some of these deviations are positive and some are negative. Well, that is obvious: if our x value is greater than x bar, then x minus x bar will come out to be positive, and if our x value is less than the mean, then the deviation will naturally come out to be negative.
Now, in order to measure the overall spread of the x values around the mean, you might expect that we could simply add these deviations, but you will note that this does not work. If I add all these deviations, students, the sum comes out to be 0, as you just saw on the screen, and this would mean that the overall dispersion of the x values from the mean is 0. Obviously this is not correct. There clearly is variability in these values around the mean, so something is wrong, and we have to think of some way of overcoming this problem. The easiest way is to take the absolute values of all these deviations. So, as you now see on the screen, the absolute values of the deviations are 0, 2, 2, 4, 1, 1 and 4. Adding these absolute deviations we get 14, and in this way we have achieved a non-zero sum.
Dividing this sum of 14 by the number of days in the week, that is 7, our mean deviation comes out to be 14 over 7, and that is 2. Now, students, there are two things to note. One is that what we have computed is the average, the mean value, of the absolute deviations, as I just explained. So the complete name of this measure is the mean absolute deviation, but for short we just say mean deviation. The second point is that if you look at the absolute values geometrically, there is no problem, because any one of these absolute values gives us the horizontal distance between the mean and an individual value, whether that individual value lies to the right of the mean or to the left of it. Some of these distances are small and some are large, and in order to get an overall idea we find the average of these distances. The average of the small and the large distances will be an intermediate value, and that intermediate distance acts as the measure of dispersion of the values around the mean. So the formula for the mean deviation is simple: it is the sum of the absolute deviations divided by the number of deviations.
That was the case when we are dealing with raw data, as in the example that we just discussed. Now, what will be the formula in the case of grouped data? In the case when we have a frequency distribution, as you now see on the screen, the formula is sigma f into modulus of d, divided by n. To apply this formula there are two or three basic steps that you should keep in mind. The first step is to compute the midpoint of every class, and that will be denoted by x. The second step is to compute the arithmetic mean of the data set using exactly the same formula that we discussed in an earlier lecture, and that is sigma f x over sigma f. The third step is to construct the column of x minus x bar, that is, from the midpoint of every class you subtract the arithmetic mean that you have just computed.
Some of these deviations will be negative and some will be positive, and we denote them by small d. But, as I told you a short while ago, we need to take the absolute values of these deviations, so we construct one more column, and that is the column of modulus of d. The last column that we need to construct is the column of f into modulus of d, because we multiply the absolute deviation of every class by the frequency of that class prior to summing, in order to obtain sigma f modulus of d. Once we have this sum, of course, we divide it by the total number of observations in order to obtain the mean deviation. Those are all the steps, and I would encourage you to pick up an exercise from your textbook and practice all these steps on it, so that you feel at home with this formula.
Let us now consider the graphical representation of the mean deviation. As you will recall, I conveyed to you in the last lecture that the range is a horizontal line segment starting from x naught and going up to x m, and that the quartile deviation is half of the horizontal distance between the first quartile and the third quartile. I also told you that the same holds for the mean deviation and for the standard deviation, which we will discuss in a little while. The basic concept is just as in the case of the range and the quartile deviation, namely that we are measuring the horizontal spread of our distribution, and so in both of these cases too the measure of spread is depicted as a horizontal line segment. So, as you now see on the screen, the mean deviation is expressed as a horizontal line segment drawn below the x axis, starting from the middle of the distribution, that is, from the point which represents the arithmetic mean. We have now talked about the mean deviation in considerable detail. There remains one very important point that I want to convey to you.
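The mean deviation calculation described above, for raw data and for grouped data, can be sketched in Python as follows. This is only an illustrative sketch: the function names are made up for this example, and the small frequency table at the end is hypothetical data, not data from the lecture. Only the motorway figures come from the lecture.

```python
def mean_deviation(values):
    """Mean (absolute) deviation of raw data about the arithmetic mean."""
    n = len(values)
    mean = sum(values) / n
    # Sum of |x - x bar|, divided by the number of observations.
    return sum(abs(x - mean) for x in values) / n


def grouped_mean_deviation(midpoints, freqs):
    """Mean deviation for a frequency distribution: sigma f|d| over n."""
    n = sum(freqs)                                             # total frequency
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n    # sigma fx / sigma f
    # d = x - x bar for each class midpoint, weighted by the class frequency.
    return sum(f * abs(x - mean) for x, f in zip(midpoints, freqs)) / n


# Motorway deaths, Sunday to Saturday, as given in the lecture.
deaths = [4, 6, 2, 0, 3, 5, 8]
md_raw = mean_deviation(deaths)
print(md_raw)  # 2.0

# Hypothetical frequency table (illustration only, not from the lecture).
example_midpoints = [5, 15, 25]
example_freqs = [2, 6, 2]
md_grouped = grouped_mean_deviation(example_midpoints, example_freqs)
print(md_grouped)  # 4.0
```

For the raw data the function reproduces the value of 14 over 7, that is 2, obtained by hand.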
You see, the step of taking the absolute values of the deviations is not extremely defensible from the mathematical point of view. The argument is that we introduce a kind of artificiality into the calculation of the mean deviation by ignoring the algebraic signs of the deviations. So, from this particular standpoint, the formula of the mean deviation is not an extremely preferable formula. In the measure that we will discuss after this, the standard deviation, you will notice, students, that this problem has been overcome in such a way that our method is mathematically defensible.
But before we talk about that, I still have to discuss with you the relative measure of dispersion corresponding to the mean deviation. The method is exactly the same as it was in the case of the range or the quartile deviation: we simply divide the mean deviation by the mean and obtain what is called the coefficient of mean deviation. The moment we divide the mean deviation by the mean we obtain a pure number, and if I multiply this quantity by 100 my answer will be expressible in percentage form.
Students, let me convey to you one more point regarding the mean deviation. Sometimes the mean deviation is computed by calculating the deviations not around the mean but around the median. So, as you now see on the screen, in this case the formula for the mean deviation becomes sigma modulus of x minus x tilde, divided by n, and in the case of grouped data the formula is very similar. The only difference is that we have an f coming after the sigma sign, and x represents not the individual data values but the midpoints of the various classes. And in what situation will we compute the deviations around the median instead of the mean?
Clearly, students, this should be done in the case when the median is the more appropriate average for our data set. I have already conveyed to you that when our data set contains a few very high or very low values compared with the bulk of the data, it is the median which is a better average to use than the arithmetic mean. So in such a situation we can use this modified formula, and in this case the corresponding relative measure of dispersion is given by: coefficient of mean deviation is equal to mean deviation divided by the median.
Alright, let us now begin the discussion of the standard deviation, the most important and the most widely used measure of dispersion. Students, I told you a short while ago that the problem with the mean deviation is that the step of taking absolute values is not extremely defensible from the mathematical point of view. The solution that we have found is that, rather than taking the absolute values, we square the deviations. The moment you square the deviations, students, you will find that the problem disappears. If your deviation is a negative number, the square is going to be positive, and if your deviation is itself positive, obviously the square is going to be positive. So in this way, by taking the squares, we overcome the problem that the negative and positive deviations summed to 0. When we do this, and after that add all these squares and divide by the number of observations, the quantity that we obtain is called the variance. Formally speaking, the variance is defined as the sum of the squares of the deviations of the x values from the mean, divided by the number of observations.
Symbolically, the variance is equal to sigma x minus x bar whole square, over n. The quantity that we have just defined, the variance, is also of great importance in statistical analysis, and when you go on to further study in the subject you will realize its significance all the more. For the moment, what I want to say is that before computing the standard deviation we talk about the variance, as I have just done. Once we have defined the variance as the sum of the squares of the deviations of the observations from the mean, divided by n, then by taking the positive square root of this quantity we get what is called the standard deviation. So, as you now see on the screen, the standard deviation is given by the square root of sigma x minus x bar whole square, over n.
Let us take the same example that I was discussing a short while ago with reference to the mean deviation. As you will recall, the x values in that example were 4, 6, 2, 0, 3, 5 and 8, and these data pertained to the number of deaths in motorway accidents during one particular week. Now, as before, when we take the deviations x minus x bar, our values come out to be 0, plus 2, minus 2, minus 4, minus 1, plus 1 and plus 4. This time we are not going to take the absolute values of these deviations. Rather, we square each and every one of them, and doing so, the third column in our table comes out to be 0, 4, 4, 16, 1, 1 and 16. So you have seen that none of the squares is negative, and now when I add these squares the sum is 42, and dividing 42 by 7 my variance comes out to be 6.
Now another very important point arises here. Students, you will be interested to know that because of this process of squaring, the variance is expressed in square units. In the example that we have just considered, we will say that the variance is equal to 6 squared deaths. Now obviously, from the point of view of ordinary conversation, this has absolutely no meaning. What do we mean by 6 squared deaths?
But from the mathematical standpoint this is exactly how it is: because of the squaring of those deviations, the variance has to be expressed in square units. How do we overcome this problem? What I told you a short while ago, that the standard deviation is defined as the positive square root of the variance, is the answer to this problem, students. The moment you take the square root, we are back to our original unit of measurement. So, in this example, when I take the square root of the variance, that is, the square root of 6, the answer is 2.45. That is, in this example we are saying that the standard deviation of the number of deaths is 2.45 deaths.
Let us try to understand this point in a little more detail. You will recall that in this example the average number of deaths per day was 4. In other words, we are saying that if there had been the same number of deaths from motorway accidents on every single one of those seven days, that number would have been 4. But we know that on some days there were fewer deaths and on some days there were more. By way of the standard deviation, we are in effect saying that the difference between the average number of deaths, that is 4, and the number of deaths on any particular day is, on the average, equal to 2.45 deaths.
Students, this formula of the standard deviation that I have just discussed with you can cause a little difficulty in computation if your x bar is not a whole number. In the example that we just had, it was very simple, and x bar was equal to 4, so we had no problem in computing the deviations of the individual x values from x bar. But if your x bar is a number like 2.3174, then you will agree that the deviations we have to compute will come out in decimals, and when we square them the number of decimal places will increase considerably, and the calculation will become a bit cumbersome.
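The variance and standard deviation of the motorway example can be checked with a short Python sketch. The function names are chosen for this illustration only. Note that, as in the lecture, the sum of squared deviations is divided by n and not by n minus 1.

```python
import math


def variance(values):
    """Sum of squared deviations from the mean, divided by n."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def standard_deviation(values):
    """Positive square root of the variance."""
    return math.sqrt(variance(values))


# Motorway deaths for the seven days of the week, from the lecture.
deaths = [4, 6, 2, 0, 3, 5, 8]

squared_deviations = [(x - 4) ** 2 for x in deaths]  # the mean is 4
var = variance(deaths)
sd = standard_deviation(deaths)

print(squared_deviations)  # [0, 4, 4, 16, 1, 1, 16]
print(var)                 # 6.0
print(round(sd, 2))        # 2.45
```

The standard library functions `statistics.pvariance` and `statistics.pstdev` use the same divide-by-n definition, so they could be used in place of the hand-written functions.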
So, in order to overcome this problem, I will present to you the shortcut formula of the standard deviation. As you now see on the screen, according to the shortcut formula the standard deviation is equal to the square root of sigma x square over n, minus sigma x over n whole square. The reason this formula is convenient is that in order to apply it, all you need is to construct a column of x square, find the sum of the x column as well as the sum of the x square column, substitute them in the formula, and you obtain your standard deviation. Applying this formula in our example, sigma x comes out to be 28 and sigma x square comes out to be 154. Substituting these values in the shortcut formula, our answer is 2.45, exactly the same as what we had earlier.
Students, the formulas that I have just discussed with you, the original formula as well as the shortcut formula, are valid in the case of raw data. What do we do in the case of grouped data? Exactly the same thing as we have been doing all through: we insert the f in the formula. As you now see on the screen, the original formula becomes: standard deviation is equal to the square root of sigma f into x minus x bar whole square, divided by n. And the shortcut formula becomes: standard deviation is equal to the square root of sigma f x square over n, minus sigma f x over n whole square.
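As a check on the shortcut formula for raw data, the following Python sketch computes the standard deviation both ways for the motorway example and confirms that the two answers agree. The function names are illustrative only.

```python
import math


def sd_by_definition(values):
    """Square root of sigma (x - x bar)^2 over n."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)


def sd_by_shortcut(values):
    """Square root of [sigma x^2 over n, minus (sigma x over n)^2]."""
    n = len(values)
    sum_x = sum(values)                    # sigma x
    sum_x2 = sum(x * x for x in values)    # sigma x squared
    return math.sqrt(sum_x2 / n - (sum_x / n) ** 2)


deaths = [4, 6, 2, 0, 3, 5, 8]
sum_x = sum(deaths)                   # 28
sum_x2 = sum(x * x for x in deaths)   # 154

sd_short = sd_by_shortcut(deaths)
sd_def = sd_by_definition(deaths)
print(sum_x, sum_x2, round(sd_short, 2))  # 28 154 2.45
```

The shortcut form needs only the two column totals, which is what makes it convenient by hand. On a computer, when the values are very large compared with their spread, the subtraction in the shortcut form can lose precision, so the definition form is often the safer choice there.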
Let us now apply this formula to an example. As you now see on the screen, suppose that we have data regarding the life of bulbs produced in a factory. Suppose that we have taken a sample of 100 bulbs and put them to test in order to determine their life, and upon completion of the test the lives of these bulbs came out as you see in the first and second columns of the table. Four bulbs lasted between 0 and 5 hundred hours, 9 bulbs had a life somewhere between 5 and 10 hundred hours, 38 bulbs lasted between 10 and 20 hundred hours, and 33 between 20 and 40 hundred hours. Also, 16 bulbs had a life of 40 hundred hours or more. Suppose we want to measure this variation by way of the standard deviation. According to the shortcut formula that I conveyed to you, all we have to do is first of all determine the midpoint of every class, that is x, and then construct two more columns, a column of f x and a column of f x square. So, as you now see on the screen, the column of f x gives us the sum 2437.5 and the column of f x square gives us the sum 78781.25. Substituting these values in the formula, the standard deviation comes out to be about 13.9 hundred hours, in other words about 1390 hours.
So, I hope you understand, students, from all the discussion that we have had until now, that this figure of 1390 hours does not represent the mean life of these bulbs. As you can see on the screen, the mean life of these bulbs comes out to be 24.375 hundred hours. The figure of 1390 does not represent the mean life. Rather, it represents the dispersion, the variation of the individual life lengths of those bulbs from the mean.
Like the mean deviation, the standard deviation can also be depicted graphically. As you now see on the screen, the standard deviation is also represented by a horizontal line segment which is drawn under the x axis and which starts from the center of the distribution, that is, from the point which represents the arithmetic mean.
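The bulb example can be reproduced with the grouped shortcut formula in a few lines of Python. One assumption is needed: the lecture describes the last class only as 40 hundred hours or more, so the sketch assumes it runs from 40 to 60 hundred hours, giving a midpoint of 50. That assumption is consistent with the column totals of 2437.5 and 78781.25 quoted in the lecture.

```python
import math

# Class midpoints in hundreds of hours for the classes 0-5, 5-10, 10-20, 20-40.
# The last midpoint (50) ASSUMES the open class "40 or more" is 40-60;
# this is inferred from the quoted totals, not stated in the lecture.
midpoints = [2.5, 7.5, 15.0, 30.0, 50.0]
freqs = [4, 9, 38, 33, 16]

n = sum(freqs)                                               # 100 bulbs
sum_fx = sum(f * x for x, f in zip(midpoints, freqs))        # sigma fx
sum_fx2 = sum(f * x * x for x, f in zip(midpoints, freqs))   # sigma fx^2

mean_life = sum_fx / n                                       # in hundreds of hours
sd_life = math.sqrt(sum_fx2 / n - (sum_fx / n) ** 2)         # grouped shortcut formula

print(sum_fx, sum_fx2)      # 2437.5 78781.25
print(mean_life)            # 24.375
print(round(sd_life, 1))    # 13.9
```

Carried to two decimal places the standard deviation is 13.92 hundred hours, so the figure of 1390 hours is a rounded value.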
Students, the standard deviation is an absolute measure of dispersion: it expresses the scatter of the distribution in exactly the same units as the units of the data itself. As we have seen in the example of traffic deaths, the standard deviation was expressed as so many deaths. And now we have seen in the example of the life length of the bulbs that 1390 is 1390 hours, exactly the same unit in which our original data were expressed. What is the relative measure of dispersion corresponding to the standard deviation? Students, as you now see on the screen, if I divide the standard deviation by the mean I obtain what is called the coefficient of standard deviation, and if I multiply this quantity by 100 I obtain what is called the coefficient of variation, a quantity which is of considerable importance in statistical analysis. Because we are multiplying s over x bar by 100, our result will be in the form of a percentage. And, as I have said before, when dispersion is expressed in percentage form relative to the mean, the advantage is that we can compare the variability of two different data sets even if their units of measurement are very different.
Let me explain this point to you with the help of an example. Suppose that in a particular year the mean weekly earnings of skilled factory workers in one particular country were 19.50 dollars with a standard deviation of 4 dollars, while for another country the figures were 75 rupees and 28 rupees respectively. You have seen that for one country the figures have been quoted in dollars while for the other country the corresponding figures have been quoted in rupees. And the second thing is that, from the figures I have given you as they stand, it cannot be clearly stated in which country there is greater variation in the weekly earnings of the workers.
In such a situation the coefficient of variation comes to our rescue. As you now see on the screen, for country number 1 the coefficient of variation is 4 over 19.5 into 100, and that is 20.5 percent, whereas for country number 2 the coefficient of variation is 28 over 75 into 100, and that is 37.3 percent. So you have seen that for country number 2, where earnings were quoted in rupees, the variation of the individual earnings relative to the mean is 37.3 percent, whereas in the first country, where earnings were quoted in dollars, the variation in the earnings relative to the mean is only 20.5 percent. And so it should be obvious that in the second country this variability is nearly double the variability in the first country. To repeat, in the first country the coefficient of variation is only 20.5 percent, but in the second country it is 37.3 percent.
Let us consider another example. Suppose that the crop yield from 21 acre plots of wheat land cultivated by ordinary methods averages 35 bushels with a standard deviation of 10 bushels. The yield from similar land treated with a new fertilizer averages 58 bushels, also with a standard deviation of 10 bushels. Students, in the example that you have just seen, at first glance you might think that the variability is the same in the untreated land and the treated land. But if you reflect on it a little, you will note that, relative to the mean, the variability in the yield from farm to farm has decreased in the treated land as compared with the untreated one. And how do I come to this conclusion? The easiest way is to compute the coefficient of variation. So, as you now see on the screen, the coefficient of variation for the untreated land is 10 over 35 into 100, which is 28.57 percent, but the same quantity for the treated land comes out to be 10 over 58 into 100, which is 17.24 percent. In the untreated land the average yield is 35 bushels and the standard deviation is 10 bushels.
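The four coefficients of variation quoted above can be reproduced with a very small Python sketch. The function name is chosen for this illustration only, and the inputs are the means and standard deviations given in the two examples.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation divided by the mean, expressed as a percentage."""
    return sd / mean * 100


# Weekly earnings example: the units differ (dollars and rupees),
# but the coefficient of variation is a pure number in both cases.
cv_country1 = coefficient_of_variation(4, 19.50)
cv_country2 = coefficient_of_variation(28, 75)

# Wheat yield example: same standard deviation, different means.
cv_untreated = coefficient_of_variation(10, 35)
cv_treated = coefficient_of_variation(10, 58)

print(round(cv_country1, 1), round(cv_country2, 1))    # 20.5 37.3
print(round(cv_untreated, 2), round(cv_treated, 2))    # 28.57 17.24
```

Because the units cancel in the division, the dollar and rupee figures can be compared directly, and the same standard deviation of 10 bushels gives a smaller percentage when the mean is larger.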
But in the treated land the average yield is much higher than in the untreated land, because in the treated land the average yield is 58 bushels. Now, students, the point to understand is that the significance of this figure of 10, this variation of 10, with reference to 35 is very different from its significance with reference to 58. What I mean to say is that 10 is a much smaller dispersion relative to 58 than it is relative to 35. And this is exactly the point that is conveyed when we compute the coefficient of variation.
Alright, students, today we have covered quite a lot of ground. I discussed with you the concept of the mean deviation as well as the coefficient of mean deviation, and then we went on to talk about the variance and the standard deviation. The last thing I discussed with you was the relative measure of dispersion known as the coefficient of variation. I would like to encourage you to practice with these concepts by attempting quite a few numerical questions, and also by studying these concepts not only in your textbook but in other books as well. My very best wishes to you, and until next time, Allah Hafiz.