Assalamu alaikum, welcome to lecture number 7 of the course on statistics and probability. Students, you will recall that in the last lecture we discussed two very interesting diagrams, the stem-and-leaf plot and the dot plot. After that, I began the discussion of the concept of central tendency and, in that context, I discussed with you in some detail the concept of the mode. Today, I will continue with that concept and will discuss with you the non-modal as well as the bimodal situation, and after that we will go on to other measures of central tendency.
Let us first briefly revise the concept that we covered at the end of the last lecture, that is, the computation of the mode in the case of the frequency distribution of a continuous variable. You will recall that I used exactly the same example that we have been using for quite some time now, that of the EPA mileage ratings of cars, and to that frequency distribution I applied the formula x-hat = L + [(fm - f1) / ((fm - f1) + (fm - f2))] × h, where L, fm, f1, f2 and h all had their particular meanings, which I described. Applying this formula, we obtained x-hat equal to 37.825 miles per gallon.
After that, we considered the graphic location of the mode. We located the mode on the x-axis of the frequency polygon of the same data and saw that it was lying exactly under the topmost point of the frequency polygon. As you saw, in this diagram our mode lies on the x-axis, under the polygon, almost in the middle, and last time too I said to you that most distributions are hump-shaped in this way, so that their maximum point lies somewhere in the middle. That is why the mode is regarded as a measure of central tendency. Let us consolidate this concept by going back to the example of the ages of the managers of child care centers that was considered in the last lecture.
Students will recall that the example was as follows. The following table contains the ages of 50 managers of child care centers in five cities of a developed country. The ages were 42, 26, 32 and so on. Convert this data into a frequency distribution and find the modal age. Now students, you will recall that, following the various steps involved in the construction of a frequency distribution, we obtained the classes 20 to 29, 30 to 39, 40 to 49 and so on, and the frequencies were 6, 18, 11, 11, 3 and 1.
In order to find the mode, we note that in this example we are dealing with a continuous variable, that is age. Hence the mode is obtained by the formula x-hat = L + [(fm - f1) / ((fm - f1) + (fm - f2))] × h. In order to apply this formula, the first step is the determination of the class boundaries because, as you know, L stands for the lower class boundary of the modal class, and as indicated in the last lecture the class boundaries for this example are 19.5 to 29.5, 29.5 to 39.5 and so on. The maximum frequency fm is 18 and it lies in the age group 30 to 39. Therefore, students, the class 30 to 39 is the modal class and the lower boundary of this particular class is 29.5; in other words, L is equal to 29.5.
Now of course fm has already been found, but we also need to determine f1 and f2. f1 is the frequency of the class preceding the modal class and f2 is the frequency of the class following the modal class, and going back to the table we see that f1 is equal to 6 and f2 is equal to 11. The class interval of the modal class, h, is 10, because 39.5 minus 29.5 is 10. Substituting all these values in the formula, we have x-hat = 29.5 + [12 / (12 + 7)] × 10, and the mode comes out to be 35.8. Hence we conclude that, in this particular sample of managers of child care centers, the modal age, in other words the most frequent age, can be regarded as 35.8 years.
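The computation just described can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `grouped_mode` and the variable names are my own, and the class boundaries beyond 39.5 are inferred from the "and so on" pattern of ten-year classes rather than quoted from the lecture.

```python
def grouped_mode(lower_boundary, f_m, f_1, f_2, width):
    """Mode of grouped continuous data: L + (fm - f1) / ((fm - f1) + (fm - f2)) * h."""
    return lower_boundary + (f_m - f_1) / ((f_m - f_1) + (f_m - f_2)) * width


# Class boundaries for the managers' ages; the classes after 29.5-39.5 are
# inferred from the ten-year pattern (an assumption, not stated in the lecture).
boundaries = [(19.5, 29.5), (29.5, 39.5), (39.5, 49.5),
              (49.5, 59.5), (59.5, 69.5), (69.5, 79.5)]
frequencies = [6, 18, 11, 11, 3, 1]

# The modal class is the class with the largest frequency.
# This sketch assumes it is neither the first nor the last class.
m = frequencies.index(max(frequencies))
lower, upper = boundaries[m]

modal_age = grouped_mode(lower, frequencies[m], frequencies[m - 1],
                         frequencies[m + 1], upper - lower)
print(round(modal_age, 1))  # 35.8
```

Only the modal class and its two neighbours enter the formula, so the inferred boundaries of the later classes do not affect the result.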
Let us now locate the mode on the graph of this particular frequency distribution. Students, you will recall that in lecture number 4 we constructed the histogram of this distribution, and it was as you now see on the screen. Also, the frequency polygon was as you now see, and superimposing the frequency polygon on top of the histogram we obtain this rather interesting picture. But students, at this time we want to locate the mode on this graph. As you know, the mode is a value of our variable x and it is denoted by x-hat, and because in this particular example x-hat has come out to be 35.8, we locate this value on the x-axis of this diagram. It is a point almost directly under the highest point of our frequency polygon.
Please note that I said almost directly under. As you will recall, the frequency polygon is constructed by joining the points which are plotted against the midpoints of the various classes, and if we determine the midpoint of the modal class, that midpoint is equal to (29.5 + 39.5) / 2, which is 34.5. So x-hat, which we have determined from the formula that I have presented to you, is equal to 35.8, which is somewhat to the right of 34.5. In fact, students, you should note that some statisticians even regard the midpoint of the modal class as the mode of that particular frequency distribution. But the formula that I have presented may be regarded as a better way of determining the accurate value of the mode.
The mode has some very desirable properties. It is easily understood, it is easily ascertained and, one very important point, it is not affected by a few very high or very low values. Students, in what situations should we use the mode? Actually it is a very valuable concept in certain situations, and I will try to explain this point with the help of an example. Suppose that the manager of a men's clothing store is asked about the average size of hats that are sold.
He will probably think not of the arithmetic or the geometric mean size, or even the median size. Instead, he will in all likelihood quote that particular size which is sold most often. Students, this concept of the most frequent value is of far more use to a businessman than the arithmetic mean or the geometric mean. The modal size of all clothing is the size which this businessman must stock in the greatest quantity and variety in comparison with any other size. So you see that, from the inventory point of view, this particular concept is very significant and much more important than some other measures of central tendency which in other situations may be very, very important.
I said to you that I will also discuss with you the non-modal and the bimodal situation. It can happen, students, that there is no mode because all the values are occurring equally often, especially if it is a small data set. You can also have situations where you have more than one mode, and if you have two, that will be called a bimodal situation and the graph will be as you now see on the screen. Students, here is a very easy method of differentiating between a unimodal and a bimodal situation. You know that the Pakistani camel has one hump but the Chinese camel has two humps. That is the easy way to remember it.
Students, this brings us to the end of the discussion regarding the mode. Let us now begin the discussion of the arithmetic mean, that value which is numerically the most representative of a variable series. Of course, as you know, this is the most widely used average. It is very easy to calculate and it is the most well accepted. Let us have its formal definition. The arithmetic mean, or simply the mean, is a value obtained by dividing the sum of all the observations by their number. In the case of sample data, the notation is x-bar, and x-bar is equal to the summation of xi over n, where i goes from 1 to n. Students, this is capital sigma and it denotes the sum.
Small sigma denotes the standard deviation, which we will be doing in a later lecture. In this formula, of course, xi represents the i-th x value, in other words the i-th value of our data set. But for simplicity we also many times write this formula as simply sigma x over n; that is, the subscript i is dropped.
Let me now explain this concept with the help of a simple example. Suppose that we have information regarding the receipts of a news agent for 7 days of a particular week, and suppose that the receipts are 9.90 pounds, 7.75 pounds, 19.50 pounds and so on for the 7 days of the week, as you see on the screen. In order to compute the mean sales per day, all we have to do is add the 7 values and divide by 7, and doing that our mean sales value comes out to be 37.12 pounds. Students, let us try to interpret this result. This value, 37.12 pounds sterling, represents the amount that would have been obtained on each day if the same amount had been received every day.
Students, just now we did the arithmetic mean in the case of raw data. How do we compute the arithmetic mean in the case of a frequency distribution? The point to understand here is that in a frequency distribution the identity of the individual observations is lost, as I mentioned earlier. What we do is assume that every value falling in a particular class is equal to the midpoint of that class, and using that midpoint we compute the arithmetic mean or the geometric mean or any other such measure. The midpoint of any class is also called its class mark. Now, if we have k classes in a frequency distribution of a continuous variable, then our class marks will be x1, x2 and so on up to xk, and the corresponding frequencies will be f1, f2 and so on up to fk. Using these values, the formula for the arithmetic mean will be x-bar equal to sigma fi xi over sigma fi, where i goes from 1 to k. It can also be written as x-bar is equal to sigma fi xi over n.
The simple reason is that the sum of the frequencies is equal to n, the total number of observations in our data set. For simplicity, we drop the subscript i and our formula is x-bar is equal to sigma fx over n. Let us apply this formula to the same example that we are very fond of, the EPA mileage ratings of the cars. In this example, the midpoints of the classes, as explained earlier, are found by adding the lower limit of any class to the upper limit of that class and dividing by 2, and doing so the midpoints are 31.45, 34.45, 37.45 and so on. Now, in order to compute the arithmetic mean, we first need to construct the column of fx; that is, multiply every value of the x column by the corresponding value of the f column. So 31.45 into 2 gives us 62.9, 34.45 into 4 is 137.8, and so on. Adding the column of fx, we obtain 1135.5, and dividing by 30 our arithmetic mean comes out to be 37.85 miles per gallon. And what is the interpretation of this value? It is that the average mileage of these cars tested by the Environmental Protection Agency is 37.85 miles per gallon.
You will recall that a short while ago we computed the mode of this very example, and that came out to be 37.825 miles per gallon. You can see that the difference is very small. But there is a difference nevertheless, and that is to be expected. After all, the mode and the mean are two different measures of central tendency. When the formula itself is different, your answer will inevitably be somewhat different.
Students, here I would like to discuss with you a very important concept, and that is the concept of grouping error. Grouping error arises because of the fact that you have assumed in this computation that all the values falling in a particular class are equal to the midpoint of that class. Obviously, in reality, the values falling in a particular class are not all equal to its midpoint.
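The formula x-bar = sigma fx over n can be sketched in Python as follows. Because the full EPA table is not quoted in this part of the lecture, the sketch applies the formula to the managers' age distribution from earlier instead; the resulting figure of 42.5 years is my own computation and is not a value stated in the lecture. The class limits after 40 to 49 are inferred from the ten-year pattern, and the function name `grouped_mean` is hypothetical.

```python
def grouped_mean(class_limits, frequencies):
    """Arithmetic mean of grouped data: sum(f * x) / sum(f), with x the class mark."""
    # Class mark (midpoint) = (lower limit + upper limit) / 2.
    class_marks = [(low + high) / 2 for low, high in class_limits]
    sum_fx = sum(f * x for f, x in zip(frequencies, class_marks))
    return sum_fx / sum(frequencies)


# Managers' ages; the classes after 40-49 are inferred from the pattern
# (an assumption, the lecture only says "and so on").
class_limits = [(20, 29), (30, 39), (40, 49), (50, 59), (60, 69), (70, 79)]
frequencies = [6, 18, 11, 11, 3, 1]

print(grouped_mean(class_limits, frequencies))  # 42.5

# Check of the EPA totals quoted in the lecture: sum of fx = 1135.5, n = 30.
print(round(1135.5 / 30, 2))  # 37.85
```

Every observation in a class is treated as if it sat at the class mark, which is exactly the assumption that gives rise to the grouping error.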
The moment you use the x values, the class marks, the midpoints, in your computation of the mean, you are introducing this grouping error. The answer that you get will not be exactly the same as the answer that you would have got by using the raw data. Let us see what we have in this very example. You will recall that the mileage ratings of the 30 cars were 36.3, 30.1 and so on, and adding these values and dividing by 30 the mean mileage rating comes out to be 37.82. Students, the difference between the true value of the mean, that is 37.82, and the value of the mean that we obtained from the grouped data, that is 37.85, is as you can see very slight. And this is the reason why the grouping error does not carry much importance in the computation of the arithmetic mean. Experience has shown that in the computation of the arithmetic mean the grouping error is never serious. But of course, in the case of the standard deviation and some other quantities, the grouping error can have quite a significant effect on our answer.
Students, the arithmetic mean is the predominantly used measure of central tendency of a data set. In this example, if you look again at the frequency polygon and try to locate the arithmetic mean on the x-axis, you will again find that it is lying in the center of the distribution. Not the exact center, but more or less in the middle of the data set.
Students, the arithmetic mean has many desirable properties. As I said earlier, it is very easy to understand and to calculate, and one of the very important points is that it is based on every single observation in our data set. But the problem is that, precisely because it is based on every value, this mean is sometimes distorted. If the values in your data set do not differ much from one another, if they are more or less the same, then the arithmetic mean is very appropriate.
But if there are a few very high or very low values in your data set, their effect would be to drag the arithmetic mean in their direction, so that it no longer represents the bulk of the data, which is what it is expected to do. Let me explain this point with the help of an example. Suppose one walks down the main street of a large city center and counts the number of floors in each building. Suppose the following answers are obtained: 5, 4, 3, 4, 5, 4, 3, 4, 5, 20, 5, 6, 32, 8 and 27. Students, if we compute the arithmetic mean of this data set, the answer is 9. But you should note that this is not at all a good representative of this data, because 12 out of the 15 buildings that we have just counted have 6 floors or less. Yes, 12 out of 15, a very high percentage of the data, the bulk of the data: all those buildings have 6 floors or fewer, whereas our arithmetic mean is saying that on the average there are 9 floors per building. So this is exactly the way extreme values distort the arithmetic mean. In this example, the 3 skyscraper buildings have created a disproportionate effect on the arithmetic mean.
Students, we have just discussed the concept of the arithmetic mean, that arithmetic mean which is called the simple arithmetic mean or, in other words, the unweighted arithmetic mean. But then we have another very important concept, and that is the weighted arithmetic mean. In many situations we have to give weightage to our values, a different weightage for every value depending on the situation. Let me explain this to you now with the help of an example. Suppose that in a particular high school there are 100 freshmen, 80 sophomores, 70 juniors and 50 seniors. And also suppose that on a given day 15 percent of the freshmen, 5 percent of the sophomores, 10 percent of the juniors and 2 percent of the seniors are absent. The problem is: what percentage of the students in that school, on the whole, is absent on that particular day?
If we compute the simple arithmetic mean of these values, the answer that we get will be incorrect. If we do compute the simple arithmetic mean of the 4 values that we have, we obtain (15 + 5 + 10 + 2) / 4, equal to 8. In effect we are saying that, on the average, in any one category of students, 8 percent of the students are absent. But it is possible for us to find out why this calculation is incorrect, and I will discuss this with you step by step.
As you saw, in the freshman category we have 100 students, and 15 percent of them were absent. So this means that the total number of students absent from this category is 15. In contrast, in the sophomore category the total number of students is 80 and 5 percent of them are absent, and that means that the total number of students who are absent in this category is only 4. In exactly the same way, if 10 percent of the 70 juniors are absent, this means that 7 students are absent, and if 2 percent of the 50 seniors are absent, this means that only one student is absent. In this way, the total number of students who are absent in this school on that particular day is 27. And if we divide 27 by the total number enrolled, that is 300, and multiply by 100, the percentage comes out to be 9 percent. You saw that when we computed the simple arithmetic mean our answer was 8 percent, and that was wrong.
Students, this brings us to a very important observation, and that is that in this kind of situation, where the numbers of students enrolled in the various categories are not equal, we cannot use the simple mean for the absenteeism figures; that is, we cannot give those absenteeism figures equal weightage. We have to assign to them different weights in accordance with the group size for each one of those categories.
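The step-by-step reasoning above can be sketched in Python. The variable names are my own; the figures are the ones from the school example. The last line uses the group sizes as weights, which is the same thing as dividing the total number of absentees by the total enrolment.

```python
group_sizes = [100, 80, 70, 50]   # freshmen, sophomores, juniors, seniors
percent_absent = [15, 5, 10, 2]   # percentage absent in each category

# Simple (unweighted) mean of the four percentages: the incorrect answer.
simple_mean = sum(percent_absent) / len(percent_absent)
print(simple_mean)  # 8.0

# Number of students actually absent in each category.
absent_counts = [w * x / 100 for w, x in zip(group_sizes, percent_absent)]
print(absent_counts)  # [15.0, 4.0, 7.0, 1.0]

# Overall percentage absent: 27 out of 300.
overall_percent = 100 * sum(absent_counts) / sum(group_sizes)
print(overall_percent)  # 9.0

# The same result written as a weighted mean: sum(w * x) / sum(w).
weighted_mean = (sum(w * x for w, x in zip(group_sizes, percent_absent))
                 / sum(group_sizes))
print(weighted_mean)  # 9.0
```

The two routes agree because multiplying each percentage by its group size is just a way of counting the absent students in that group.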
In this example, the group sizes are acting as the weights, and we need to multiply each one of the absenteeism figures by these weights in order to obtain the correct answer. The formal formula for the weighted arithmetic mean is sigma wi xi over sigma wi, and applying it in this example we obtain sigma wi xi equal to 2700 and sigma wi equal to 300. Dividing 2700 by 300, we obtain the value 9. It is exactly the same answer as the correct value that we obtained a short while ago. So it is obvious that there are certain situations where all the x values cannot be regarded as being of equal weightage, and in that situation we modify the formula of the arithmetic mean and apply the formula of the weighted mean.
The next concept that I am going to pick up is that of the median. Let me explain this concept with the help of an example. A short while ago we discussed the example in which we talked about the number of floors in the buildings, and you will also remember that the arithmetic mean did not represent the data properly because, whereas 12 out of 15 of the buildings had 6 floors or less, our arithmetic mean came out to be 9 floors. In this situation, the median may come to our rescue. The median, students, is the middle value of a data set once we have arranged those values in either ascending or descending order. In other words, the median is defined as a value which divides a set of data into two halves, one half comprising observations greater than the median and the other half observations smaller than the median. More precisely, the median is a value at or below which 50 percent of the data lie.
Students, the median can be ascertained very easily in many situations, and especially in the case of raw data. Going back to the same example that I just discussed, the numbers of floors in the buildings at the center of the city are 5, 4, 3, 4 and so on, and arranging these values in ascending order we obtain 3, 3, 4, 4, 4 and so on.
Picking up the middle value, we obtain the median x-tilde equal to 5; the median number of floors is 5. Out of those 15 buildings, 7 buildings have 5 floors or less and 7 buildings have 5 floors or more, so that 5 is the middle value in that ordered data set. In this example, the arithmetic mean was distorted toward the few very high values. As you will all agree, the median of 5 is much more representative of this data than the mean of 9. This is the advantage of the median. If there are some extreme values in your data set, they do not affect the median, simply because the median is not computed by, for example, adding all the values and dividing by their number. The extreme values are not involved in it. It is simply the middle value.
Let us consider another example. Suppose that the retail prices of motor cars for several makes and sizes are available to us, and the values are 415 pounds sterling, 480, 525 and so on. In this data set there are 9 values in all, and after having arranged them in ascending order the median price comes out to be 719. Once again, you notice that it is very simple. All you had to do was to pick up the middle value.
But suppose you have an even number of values. If you have 8 values, then the fourth value is not the middle value, and if you say, all right, I will pick up the fifth one, then that is not the middle value either. So what do you do in this kind of situation? What we do is take the arithmetic mean of the two middle values. I will explain this to you now with the help of the example that you now see on the screen. Suppose that the numbers of passengers travelling on a bus at six different times during a particular day were 4, 9, 14, 18, 23 and 47. This data is already arranged in order. Data need not come ordered like this, but because we want to compute the median, we arrange it in ascending order. Now, you can see that there are two middle values, 14 and 18.
If we look at the two middle values jointly, there are two values before them and two values after them. Computing the arithmetic mean of the two, we obtain the median equal to 16 passengers.
All the examples that we have done until now pertain to raw data. What will we do if we have the case of a discrete frequency distribution? Let me explain this point with the help of an example. Suppose that we have the data regarding the number of pupils per class in some particular comprehensive school. Now, there is only one class having 23 students. There is no class in which there are 24 students. But then we have three classes having 26 students each, and similarly we can interpret the entire table. Now the point is that if we write this data in the form of raw data, it will be an arranged data set. In other words, this very frequency distribution converts to the raw data 23, 25, 26, 26, 26, and then 27 six times, 28 nine times, and so on. In total there are 45 values, and that is an odd number, so we do not have any such problem as having two middle values. With 45 values, the middle one is the 23rd value. So you simply have to pick up the 23rd value, and that is your median.
But if I have to convert this distribution back to raw data, then what is the point? I had converted the raw data into a frequency distribution precisely so that it would be in a concise form. I made this point just to tell you that, even though the data is concisely presented in the form of a discrete frequency distribution, our goal is still to pick up the 23rd value, and this is very easily achieved if we construct the column of cumulative frequencies, as you now see on the screen. The cumulative frequencies are 1, 2, 5, 11, 20, 28, 38 and 45, and the cumulative frequency 28 corresponds to the x value 29, whereas the cumulative frequency 20 corresponds to the x value 28. It is obvious that the 23rd value does not lie among the first 20, but it does lie among the first 28, and hence its x value is 29.
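The cumulative-frequency method just described can be sketched in Python. The function names `kth_value` and `median_from_table` are hypothetical. The frequencies are obtained by differencing the cumulative frequencies quoted in the lecture; the last two class sizes (30 and 31) are NOT stated in the lecture and are assumed here purely so that the table is complete. They do not affect the median.

```python
from bisect import bisect_left
from itertools import accumulate


def kth_value(values, cumulative, k):
    """Return the k-th smallest observation (1-indexed) of a frequency table."""
    # The k-th observation lies in the first row whose cumulative frequency reaches k.
    return values[bisect_left(cumulative, k)]


def median_from_table(values, frequencies):
    """Median of a discrete frequency distribution with values in ascending order."""
    cumulative = list(accumulate(frequencies))
    n = cumulative[-1]
    if n % 2 == 1:
        return kth_value(values, cumulative, (n + 1) // 2)
    # Even n: average the (n/2)-th and the (n/2 + 1)-th observations.
    lower = kth_value(values, cumulative, n // 2)
    upper = kth_value(values, cumulative, n // 2 + 1)
    return (lower + upper) / 2


# Pupils per class. The values 30 and 31 are assumed placeholders.
pupils_per_class = [23, 25, 26, 27, 28, 29, 30, 31]
number_of_classes = [1, 1, 3, 6, 9, 8, 10, 7]

print(list(accumulate(number_of_classes)))  # [1, 2, 5, 11, 20, 28, 38, 45]
print(median_from_table(pupils_per_class, number_of_classes))  # 29
```

The lookup never expands the table back into 45 raw values, which is the whole point of using the cumulative frequency column.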
So the 23rd observation, the median, is a class that contains 29 students. Once again, how do we interpret this result? It means that for the 45 classes in that school, the average class size is 29. But average in what sense? Not in the modal sense, not in the arithmetic mean sense, but in the sense of the median. That is, if we arrange all those classes in ascending order of class size, then the class in the middle has 29 students. Twenty-two classes come before it in the ordered list and twenty-two classes come after it. I think you will appreciate that the median is a measure of central tendency in which this concept of center comes out very, very clearly.
All right, now that we have discussed the mode, the mean and the median in considerable detail, students, let us consider another example that illustrates all three of these concepts. Displayed in the following table are the annual attendance figures, in millions of visitors, of 32 US public zoological parks. The attendance figures in millions are 0.6, 0.9, 0.2 and so on. For these data, measures of location can yield such information as the average attendance for the zoos, the middle attendance figure and the most frequently occurring attendance figure. So we would like to compute the mean, the median and the mode for the attendance figures listed in the above table. Now students, of course the computation of the mean is very simple. All we have to do is add all the values and divide by 32, and in this way we obtain x-bar equal to 1.3 million. As far as the computation of the median is concerned, the first step is to arrange the data in an ordered array, and when we do that we have 0.3, 0.4, 0.5, and after that 0.6 occurs 5 times, and then we have 0.7 and so on.
Now, in order to compute the median, the first point to be noted is that in this example, students, we are dealing with an even number of values, that is 32, and you will recall that in the earlier discussion it was mentioned that whenever we are dealing with an even number of values we compute the average of the (n/2)th and the ((n + 2)/2)th values of the ordered data set. But students, here let me discuss with you another way of looking at this situation. As there are 32 values, we can say that the median is located at the ((n + 1)/2)th value, and that is the 16.5th value. Now, what do we mean by the 16.5th value? By the 16.5th value we mean the value that is located halfway between the 16th value and the 17th value. Let me explain this point to you through the figure that you now have on the screen. If you look at the x-axis, that is the real line, you see the 15th, 16th, 17th and 18th values marked on it. So the middle of the distance between the 16th and the 17th values can be regarded as the 16.5th value. In other words, by the 16.5th value we will mean the average of the 16th value and the 17th value, which is exactly the same as what we have been considering earlier. After all, students, is not 16 exactly 32 over 2, and is not 17 exactly equal to (32 + 2) over 2? We are still doing exactly the same thing, taking the average of the (n/2)th value and the ((n + 2)/2)th value.
In this particular example, the 16th value is 1.0 and the 17th value is 1.1, and therefore, finding their average, the median comes out to be 1.05 million. Last but not least, we are interested in the computation of the mode. By inspecting the attendance figures, we find that 0.6 is occurring 5 times, whereas all the other figures are occurring less often. Hence the mode is equal to 0.6 million.
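The rule for the position of the median in raw data can be sketched in Python. Since the full zoo attendance table is not quoted in the lecture, this sketch uses the two raw-data examples from earlier, the building floors (odd n) and the bus passengers (even n). The function name `median` is my own; Python's standard library also offers `statistics.median`, which follows the same rule.

```python
def median(data):
    """Middle value of the ordered data; for even n, the mean of the two middle values."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)-th value, which is index mid when counting from 0.
        return ordered[mid]
    # Even n: average of the (n/2)-th and the ((n + 2)/2)-th values.
    return (ordered[mid - 1] + ordered[mid]) / 2


floors = [5, 4, 3, 4, 5, 4, 3, 4, 5, 20, 5, 6, 32, 8, 27]
passengers = [4, 9, 14, 18, 23, 47]

print(sum(floors) / len(floors))  # 9.0  (mean, pulled up by the tall buildings)
print(median(floors))             # 5    (8th of 15 ordered values)
print(median(passengers))         # 16.0 (average of 14 and 18, the 3.5th position)
```

For the six passenger counts, (n + 1)/2 is 3.5, so the median sits halfway between the 3rd and the 4th ordered values, just as the 16.5th value sits between the 16th and 17th in the zoo example.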
To summarize the zoo example, the mean or average attendance at the zoos is 1.3 million; the median, in other words the middle attendance figure for these 32 zoos, is 1.05 million; and the mode, that is the most frequently occurring attendance figure, is 0.6 million.
Students, in today's lecture we have discussed three different measures of central tendency. We started with the mode, which was actually a continuation of what we had started last time, and then we went on to the arithmetic mean and also its modified version, the weighted arithmetic mean. Lastly, we have discussed the median, and we have computed the median in the case of raw data as well as in the case of the frequency distribution of a discrete variable. Next time I will discuss with you the computation of the median in the case of the frequency distribution of a continuous variable. I will take the same EPA mileage ratings example that we are extremely fond of, and in that situation we have a formula that will enable us to locate the median in the case of a frequency distribution of a continuous variable. Also, I will discuss with you a very interesting concept, and that is the empirical relationship between the mean, the median and the mode. In the meantime, I would like to encourage you to attempt quite a few questions from the exercises of your textbook and also from other books, as many as you can get hold of. I wish you the best, and Allah Hafiz.