As-Salaam-Alaikum. Welcome to lecture number 9 of the course on statistics and probability. You will recall that in the last lecture and the one before it, we discussed various measures of central tendency, in other words, averages. We started with the mode, went on to the arithmetic mean and the weighted mean, and after that we discussed the median. You will also remember that, building on the concept of partitioning involved in the median, we talked about the quartiles, deciles and percentiles. Today, we will discuss the geometric mean and the harmonic mean, and after that I will also briefly mention a few other measures of central tendency.
So, let us begin with the geometric mean. The geometric mean of n positive values x 1, x 2, and so on up to x n is defined as the positive nth root of their product. In other words, G is equal to the nth root of x 1 into x 2 into x 3, and so on up to x n, where each of the x values is greater than 0. Note that I told you that these values should be positive. You can see that if any value in your data set is equal to 0, then your geometric mean will become 0, because when you multiply that value with all the other values, regardless of what the other values might be, this one zero value will make the product equal to 0. That is why the geometric mean is defined in this manner, as the nth root of the product of the values when the values are all positive.
The second thing to note is that it is easy to apply this formula if the number of data values is not very large. But if you have a considerably large number of values, students, you will agree that it may be quite laborious to find the nth root. Of course, if you have a computer at your disposal, then it is not difficult. But if you are doing it with an ordinary calculator, it may be quite difficult to find the ninth root, for example, if you have nine values. This problem is overcome by the use of logarithms.
Taking logarithms to the base 10, our formula becomes log of G is equal to 1 over n times log of x 1 plus log of x 2 plus so on up to log of x n. Hence, the formula becomes log of G is equal to sigma log x over n, that is, the sum of the logarithms divided by n. But since we are interested in finding G, we take the antilog, and the final formula is that G is equal to antilog of sigma log x over n.
Let me apply this concept of the geometric mean to a very simple example. Suppose that we have the values 45, 32, 37, 46, 39, 36, 41, 48 and 36. If we want to apply the original formula, then we will need to compute the numerical value of the ninth root of the product of all these values, as you now see on the screen. But in order to simplify the computations, we will take the log of each of these values, and we obtain the figures as you now see. Log of 45 is 1.6532, log of 32 gives us 1.5052, and so on. Adding this column of log x, the sum is 14.3870. Dividing that by the total number of observations, that is 9, log of G comes out to be 1.5986. The last step is to take the antilog, and doing so, the geometric mean of these 9 values comes out to be 39.68.
This is the formula for the geometric mean in case of raw data. Now, let us see what will be the modified version of this formula in case of grouped data. As you now see on the screen, the formula in this case will be that G is equal to the nth root of x 1 raised to f 1 into x 2 raised to f 2 and so on up to x k raised to f k, where n is the sum of the frequencies. In other words, each value of x has to be raised to the power of its frequency f before we go to the step of taking the nth root. In terms of logarithms, the formula becomes log of G is equal to 1 over n multiplied by f 1 into log of x 1 plus f 2 into log of x 2 plus so on up to f k into log of x k, which is equal to sigma f log x over n, and taking the antilogarithm, G is equal to antilog of sigma f log x over n.
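The two logarithmic formulas above can be sketched in Python as follows. This is only an illustration: the function names `geometric_mean` and `grouped_geometric_mean` are assumptions chosen for this sketch, not something shown in the lecture. The grouped version is checked against the same nine values, written as distinct values with frequencies (36 occurs twice).

```python
import math

def geometric_mean(values):
    """Raw data: G = antilog( sum(log x) / n ), logs to base 10."""
    if any(v <= 0 for v in values):
        raise ValueError("the geometric mean needs all values to be positive")
    log_sum = sum(math.log10(v) for v in values)
    return 10 ** (log_sum / len(values))

def grouped_geometric_mean(xs, freqs):
    """Grouped data: G = antilog( sum(f * log x) / n ), where n = sum(f)."""
    n = sum(freqs)
    log_sum = sum(f * math.log10(x) for x, f in zip(xs, freqs))
    return 10 ** (log_sum / n)

# The nine values from the lecture example.
values = [45, 32, 37, 46, 39, 36, 41, 48, 36]
g_raw = geometric_mean(values)

# The same data as distinct values with frequencies (36 appears twice).
distinct = [45, 32, 37, 46, 39, 36, 41, 48]
freqs = [1, 1, 1, 1, 1, 2, 1, 1]
g_grouped = grouped_geometric_mean(distinct, freqs)

print(round(g_raw, 2), round(g_grouped, 2))  # both round to 39.68
```

Working in logarithms is also the safer route on a computer when n is large, since the plain product of many values can overflow.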
Obviously, this formula is much easier to handle than the first one in which we had to take all those powers. Students, let me now apply this formula to the example of the EPA mileage ratings that we have been dealing with. As you now see on the screen, the x values, that is, the midpoints of the various classes, are 31.45, 34.45, 37.45 and so on, exactly the same as what we had earlier. Now, in order to compute the geometric mean, we have to take the logarithm of each one of these x values and multiply that logarithm by the corresponding frequency. In this manner, we obtain the last column that you now see on the screen, and the sum of this column comes out to be 47.3042. Dividing it by the total number of cars that we had, that is 30, we get 1.5768, and taking the antilog of this number, the geometric mean comes out to be 37.74 miles per gallon.
You will recall that when we found the arithmetic mean of the same data set, the answer was somewhat different. That is to be expected: the formula of the arithmetic mean is quite different from that of the geometric mean, and so it is obvious that we will have some difference. The question arises: different formulas are available to us, so which formula should be applied? In which situation should we apply the arithmetic mean, and in which situation the geometric mean?
As far as the geometric mean is concerned, let me explain this point to you with the help of an example. Suppose that a firm's turnover has increased over 4 years as follows. In 1958, it was 2000 pounds, whereas in 1959 it became 2500, so that the turnover expressed as a percentage of the earlier year is 125. In other words, the turnover for 1959 is 125 percent of the turnover of 1958. Similarly, the turnover for 1960 is 5000 pounds, and that is 200 percent of the turnover of 1959. The turnover for 1961, which is 7500 pounds sterling, is 150 percent of the turnover of 1960. Finally, the turnover for 1962, which is 10,500 pounds sterling, is 140 percent of the turnover of 1961.
So, you have noted that the turnover of that company is not increasing at a constant rate; it is increasing at different rates over different years. Now, suppose that the owner of the company is interested in finding the average rate of growth of the turnover per year. If he finds the arithmetic mean of the four percentages, he will obtain 125 plus 200 plus 150 plus 140 divided by 4, equal to 153.75 percent, as the average annual rate of turnover growth. But students, if we use this figure to compute the turnover year by year from 1958 to 1962, we will find that the answer is incorrect. As you now see on the screen, if we utilize this figure, we obtain 153.75 percent of 2000 equal to 3075, 153.75 percent of 3075 equal to 4728, 153.75 percent of 4728 equal to 7269, and 153.75 percent of 7269 equal to 11176. That means the turnover comes out to be 11176 pounds sterling, whereas the actual turnover figure for 1962 is 10,500 pounds sterling.
Students, you saw that the arithmetic mean gave us an incorrect result at the end. Let us see what we get if we use the geometric mean in this situation. As you now see on the screen, the geometric mean of the four percentages is the fourth root of 125 into 200 into 150 into 140, and this comes out to be 151.37 percent. If we utilize this value to obtain the individual turnover figures, we find that 151.37 percent of 2000 is 3027, 151.37 percent of 3027 is 4583, and continuing in this manner, the final figure comes out to be 10500, exactly the same as what we had in the original data.
Students, if this company's turnover had increased at a constant rate, that rate of growth would have been 51.37 percent per year. Please note, I did not say 151.37 percent. I said that the rate of increase, if it were constant, if it were the same from year to year, would be 51.37 percent.
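The comparison just described can be reproduced with a short Python sketch. The variable names and the helper `compound` are illustrative assumptions. The turnover figures are the ones quoted in the lecture, with 10,500 pounds as the 1962 figure. Note that the lecture rounds at each step, so its intermediate figures differ slightly from the unrounded ones here.

```python
import math

# Turnover in pounds sterling for 1958 to 1962, as quoted in the lecture.
turnover = [2000, 2500, 5000, 7500, 10500]

# Year-on-year growth factors: 1.25, 2.0, 1.5, 1.4
ratios = [later / earlier for earlier, later in zip(turnover, turnover[1:])]

arithmetic = sum(ratios) / len(ratios)                   # 1.5375
geometric = math.prod(ratios) ** (1 / len(ratios))       # about 1.5137

def compound(start, factor, years):
    """Apply the same growth factor to `start` for `years` consecutive years."""
    return start * factor ** years

am_final = compound(turnover[0], arithmetic, len(ratios))  # about 11176, too high
gm_final = compound(turnover[0], geometric, len(ratios))   # 10500, matches the data

print(round(arithmetic * 100, 2), round(am_final))
print(round(geometric * 100, 2), round(gm_final))
```

The geometric mean reproduces the final figure because compounding multiplies the growth factors, and the geometric mean is, by construction, the single factor whose fourth power equals their product.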
So, I hope, students, that this example has been able to convey to you very clearly the kind of situation in which the geometric mean is the most appropriate average to use. But of course, this kind of situation is encountered relatively less frequently in practice. The most commonly used measure of central tendency is, in any case, the arithmetic mean.
Let us consider another example that illustrates the interpretation of the geometric mean. Suppose that a factory worker receives a 5 percent increase in salary this year and a 15 percent increase next year. It should be noted, students, that a 5 percent increase in salary means that if the salary was rupees 100 at the beginning of the first year, then it becomes 105 at the beginning of the second year. In other words, the salary at the beginning of the second year is 105 percent of the salary at the beginning of the first year, or we can say that the ratio is 1.05. Similarly, a 15 percent increase in salary means that if the salary was 100 at the beginning of the second year, then it becomes 115 at the beginning of the third year. In other words, the salary at the beginning of the third year is 115 percent of the salary at the beginning of the second year, or 1.15.
Now, students, if we compute the arithmetic mean of these two values, we obtain 1.05 plus 1.15 divided by 2, and that is equal to 1.10. In other words, according to this particular formula, we can say that on the average the salary at the end of a year is 110 percent of the salary at the beginning of the year, that is, the average annual increase is 10 percent. But students, it is important to note that in this particular situation it is the geometric mean and not the arithmetic mean that provides the correct answer. The geometric mean of the two values 1.05 and 1.15 comes out to be 1.09886.
In other words, according to this particular formula, on the average the salary at the end of a year is 109.886 percent of the salary at the beginning of the year. That is, according to this formula the average percent increase is 9.886, not 10 as we obtained earlier.
Now, in order to verify that it is the geometric mean and not the arithmetic mean that provides the correct answer, students, let us assume that the monthly earning of the factory worker was rupees 3000 to start with and he received two increases of 5 percent and 15 percent respectively. So raise number 1 is 3000 into 0.05, and that is 150 rupees. Therefore, at the beginning of the second year his salary became rupees 3150. Also, raise number 2 is rupees 3150 into 0.15, and that is equal to rupees 472.50. Therefore, students, adding the two increments in salary, the total increment is rupees 622.50. Now, if we calculate the raises according to the geometric mean that we obtained a short while ago, we have rupees 3000 into 0.09886 equal to rupees 296.58 as raise number 1, and rupees 3296.58 into 0.09886 equal to rupees 325.90 as raise number 2. Students, if we add these two increments we obtain rupees 622.48, that is, approximately 622.5, the same, up to rounding, as what we obtained just now when we added the two actual increments. Hence, it is clear that in this type of situation it is the geometric mean and not the arithmetic mean that provides the correct answer.
The next measure of central tendency that I am going to discuss with you is the harmonic mean. Students, the harmonic mean is very interesting. It is defined as the reciprocal of the arithmetic mean of the reciprocals of the values. Let me repeat it for you. The harmonic mean is defined as the reciprocal of the arithmetic mean of the reciprocals of the values. Symbolically, it is equal to n over summation 1 over x.
As you can see, if we take the reciprocal of this formula, we get summation 1 over x divided by n, and you know that if you divide a sum by the total number of values, then that is the arithmetic mean of those quantities. So summation 1 over x over n means that we are talking about the arithmetic mean of the 1 over x values. That quantity is equal to 1 over the harmonic mean, and when we take its reciprocal we obtain the harmonic mean, as I said earlier, equal to n over summation 1 over x. Now, this is the formula in case of raw data. In the case of grouped data, that is, a frequency distribution, the formula becomes n over summation f into 1 over x, where x represents the midpoints of the various classes and n is the sum of the frequencies.
Let me illustrate the computation of the harmonic mean with reference to a simple example. Suppose a car travels 100 miles with 10 stops, each stop after an interval of 10 miles. Suppose that the speeds at which the car travels these 10 intervals are 30, 35, 40, 40, 45, 40, 50, 55, 55 and 30 miles per hour respectively. The question is: what is the average speed with which the car travelled the total distance of 100 miles? At first glance, it seems that we should simply find the arithmetic mean of the 10 speeds that we have. If we do that, we obtain 30 plus 35 plus so on up to 30 divided by 10, and that is equal to 42 miles per hour. But students, if we study this problem carefully, we will find that the arithmetic mean that we have just computed gives us an incorrect answer.
I will explain this point in some detail. As you know, speed is defined as the total distance travelled over the total time taken. So, if we want the average speed for that 100 mile journey, we should divide the total distance by the total time taken. But since the speed is different in each of the 10-mile intervals, we will have to do the computation step by step for each one of those intervals.
Doing that for the first interval of 10 miles, the speed is 30 miles per hour, and hence the time taken to cover that interval of 10 miles will be given by the formula distance over speed, because speed is equal to distance over time, so naturally time is equal to distance over speed. Applying that, the time comes out to be 10 divided by 30, that is 0.3333 hours. Proceeding in exactly the same manner for all the intervals, we obtain the successive times as 0.2857 hours, 0.2500 hours and so on. Adding all these times, the total time taken to travel the 100 miles comes out to be 2.4881 hours, and dividing the total distance of 100 miles by the total time taken, that is 2.4881 hours, the true average speed comes out to be 40.2 miles per hour, which is not the same as 42 miles per hour, the figure that we obtained when we found the arithmetic mean.
So, you have seen that the arithmetic mean has not helped us. Let us see what we get if we apply the harmonic mean in this particular question. We have the column of x, that is, the column of speeds, as 30, 35, 40, 40 and so on. In order to compute the harmonic mean, the first step is to find the reciprocal of each one of these x values. Doing that, the first reciprocal comes out to be 0.0333, the second one 0.0286 and so on, and adding all of them, sigma 1 over x comes out to be 0.2488. Substituting this value in the formula of the harmonic mean, we obtain 10 over 0.2488 equal to 40.2 miles per hour, and students, this is exactly the same answer as what we obtained a few minutes ago as the correct average speed of this vehicle.
Students, you have seen how to compute the harmonic mean. Now, the key question is: in what situation should we be using the harmonic mean? I think that by now you might be a bit confused about which situation calls for the arithmetic mean, which for the geometric mean, and which for the harmonic mean. Actually, it is not such a big problem.
If we follow two or three basic rules, the whole thing will become quite simple. The first rule that I would like to draw your attention to is that when values are given as x per y, where x is constant and y is variable, that is the situation when the harmonic mean is the appropriate average to use.
We have just discussed this example. We have seen that there is a vehicle which is travelling a total distance of 100 miles, but there are 10 stops in that entire interval of 100 miles, and each stop is after an interval of 10 miles. This implies that the distance between any two stops is constant. When the quantities are expressed as x per y and x is constant, then we use the harmonic mean, and in this question the distances are constant.
The next thing is the speed of the vehicle, which is changing from interval to interval. In the first interval, it is 30 miles per hour. In the second interval, it is 35, and in the third interval, it is 40. The figure 30 is actually representing distance over time, because it is a speed. The next figure, which is 35, is also representing distance over time. The third figure, which is 40, is also distance over time. But the distance in all three is constant: 10 miles, 10 miles, 10 miles. Distance over time is of the form x per y, where x, the distance, is constant, and y represents time. Time is varying across the intervals because the speed is different.
I hope that this lengthy discussion has conveyed to you the point that whenever we have values which represent rates x per y, and the numerator x in that problem is constant but the denominator y is varying, then it is the harmonic mean and not the arithmetic mean which is the suitable average.
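The car example can be checked with a short Python sketch that computes the average speed both ways: as total distance over total time, and as the harmonic mean of the ten speeds. The function name `harmonic_mean` and the variable names are assumptions made for this illustration.

```python
def harmonic_mean(values):
    """H = n / sum(1/x): reciprocal of the arithmetic mean of the reciprocals."""
    return len(values) / sum(1 / v for v in values)

speeds = [30, 35, 40, 40, 45, 40, 50, 55, 55, 30]  # miles per hour
segment = 10                                       # miles between stops (constant)

# Direct route: time per interval = distance / speed, then total distance / total time.
total_time = sum(segment / s for s in speeds)      # hours
true_average = segment * len(speeds) / total_time

arithmetic = sum(speeds) / len(speeds)             # 42.0, overstates the speed
harmonic = harmonic_mean(speeds)                   # agrees with true_average

print(arithmetic, round(true_average, 1), round(harmonic, 1))
```

The two routes agree because the constant distance of 10 miles cancels: 100 over the sum of 10 over x is the same as 10 over the sum of 1 over x.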
But what do we do if we are averaging rates x per y, and it is not x which is constant but rather y which is constant? The answer to this question is that in this scenario it is the arithmetic mean which is the appropriate average. Let me explain this to you with reference to an example. Suppose that there are 10 students in a particular class and suppose that they obtain the following marks in a test out of 20: 13, 11, 9, 9, 6, 5, 19, 17, 12 and 9. If we find the arithmetic mean, the average marks come out to be 110 over 10, and that is 11. I did it in a very simple way: I simply added those 10 marks and divided by 10. But students, remember that I told you that all these marks were out of 20. If we look at these marks from this angle, we will realize that we are dealing with 10 quantities of the form x per y. We have 13 over 20, 11 over 20, 9 over 20 and so on. Adding these 10 quantities and dividing by 10, we obtain 110 over 20 divided by 10, which is 110 over 10 into 20, and which is equal to 11 over 20. This is exactly the same result that I obtained a short while ago when I simply added the marks without using the value 20. After all, when I told you earlier that the mean marks are 11, I meant that on the average a student is obtaining 11 out of 20.
Let us now consider another example to illustrate the fact that it is the harmonic mean which is the appropriate average to use in those situations where values are given as x per y and the numerator x is constant and y is variable. Suppose that an investor buys 18,000 dollars' worth of a company's stock on each of two occasions, first at 45 dollars a share and then at 36 dollars a share. Now, students, on the first occasion the investor buys shares at the rate of 45 dollars a share. So we can say that 45 dollars is equivalent to one share, or in other words 1 dollar is equivalent to 1 over 45 of a share, which means that 18,000 dollars are equivalent to 1 over 45 into 18,000 shares, and that is 400 shares. Similarly, on the second occasion the investor buys shares at the rate of 36 dollars a share.
Therefore, if 36 dollars are equivalent to one share, then 18,000 dollars are equivalent to 1 over 36 into 18,000 shares, and that is 500 shares. Hence, students, in all the investor spends 18,000 plus 18,000, that is 36,000 dollars, on 400 plus 500, that is 900 shares. In other words, the investor has bought 900 shares at an average rate of 40 dollars per share, and this is so, obviously, because 40 into 900 is equal to 36,000.
Now, students, instead of carrying out all these lengthy calculations, we find that the harmonic mean yields the same result very quickly. The point to be noted is that since on both occasions the money value of the stock was 18,000, the harmonic mean is the appropriate average to use. Do not forget the rule that when we have expressions like x per y and x is constant, that is the occasion when the harmonic mean is the appropriate average. In this example, the harmonic mean of the two prices, that is 45 dollars and 36 dollars, is computed by the formula n over summation 1 over x, which is equal to 2 over 1 over 45 plus 1 over 36. Upon solving this expression, the harmonic mean comes out to be 40. In other words, the average price that the investor has paid per share is 40 dollars, and students, this is exactly the same result as what we arrived at a short while ago.
I said that when quantities are expressed as x per y and x is constant, the harmonic mean is the appropriate average to use, but if the quantities expressed as x per y are such that y is constant, then it is the arithmetic mean which is the appropriate average to use. Now, we have completed our discussion regarding the arithmetic mean, the geometric mean and the harmonic mean.
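Both halves of this rule can be illustrated with a small Python sketch using exact fractions. The variable names are assumptions for this illustration. The figures are the ones from the share-purchase example (constant numerator: 18,000 dollars each time) and the test-marks example (constant denominator: marks out of 20).

```python
from fractions import Fraction

# x constant (dollars spent), y variable (shares bought): use the harmonic mean.
prices = [45, 36]   # dollars per share
amount = 18000      # dollars spent on each occasion

shares = [Fraction(amount, p) for p in prices]              # 400 and 500 shares
direct_average = Fraction(amount * len(prices)) / sum(shares)  # 36000 / 900
harmonic = len(prices) / sum(Fraction(1, p) for p in prices)   # 2 / (1/45 + 1/36)

# y constant (every test is out of 20): the arithmetic mean is appropriate.
marks = [13, 11, 9, 9, 6, 5, 19, 17, 12, 9]
mean_fraction = sum(Fraction(m, 20) for m in marks) / len(marks)

print(direct_average, harmonic)  # 40 40
print(mean_fraction)             # 11/20
```

Exact fractions are used here only so that the equalities hold without rounding; ordinary floating-point arithmetic would give the same figures to within rounding error.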
Before we bring this particular discussion to an end, students, I would like to convey to you an interesting relationship that exists between these three measures of central tendency, and it is very simple. The relationship is that for any data set of positive values, the arithmetic mean is greater than or equal to the geometric mean, and the geometric mean is greater than or equal to the harmonic mean. This means that there will be no such data set in which the arithmetic mean comes out to be smaller in magnitude than the geometric or the harmonic mean, or in which the geometric mean comes out to be smaller than the harmonic mean. If all the values in a particular data set are the same, students, in that situation the arithmetic mean will be exactly equal to the geometric mean, and the geometric mean will be exactly equal to the harmonic mean. That happens only in the situation when all the values in a data set are equal. Otherwise, the arithmetic mean will always come out to be greater than the geometric mean, and the geometric mean will come out to be greater than the harmonic mean.
Students, I have discussed with you the five most important or most widely used measures of central tendency: the arithmetic, geometric and harmonic means, the median and the mode. But you will be interested to know that there are some other measures of central tendency as well. Two of these are the mid range and the mid quartile range, and I will discuss them with you one by one.
First, the mid range. Suppose that we have n observations such that the smallest is denoted by x naught and the largest by x m. Then the mid range is defined as x naught plus x m over 2. Students, if you look at this formula from a geometrical point of view, it is obvious that x naught, the smallest value, lies on the extreme left of our distribution, and x m, the maximum value, lies on the extreme right of the distribution.
When you add these two and divide by 2, it is obvious that the answer will lie somewhere in the middle of the distribution.
The other one that I want to discuss is the mid quartile range. If we have n observations x 1, x 2, so on up to x n, and if q 1 and q 3 represent their first and third quartiles respectively, then the mid quartile range is defined as q 1 plus q 3 over 2. Just as before, it is exactly the same logic: if you add the value of q 1 to the value of q 3 and divide by 2, you will get a value in the middle of your distribution. So, in this manner, both the mid range and the mid quartile range act as measures of central tendency of the dataset.
Students, I hope that these two formulas that I have given you at the end make you realize that there are many ways in which you can define measures of central tendency. It is possible that some other formula comes to your mind which can be yet another new method of measuring the central tendency of a dataset. So, why don't you give it a try?
Students, we are coming close to the end of today's lecture, and in today's lecture and the one before we have discussed in detail the concept of central tendency. Next time we will discuss another very important concept, and that is the concept of dispersion. Dispersion is an important technical term representing the variability that exists in our dataset. Before we move on to the next topic, I would like to revise with you briefly the core concept of central tendency. As I told you earlier, whenever we conduct a statistical enquiry we collect a certain amount of data, and the first thing a statistician wants to do with this data is to organize it in a summarized and comprehensible form, and we do things like the frequency distribution and diagrammatic representations like the histogram and the frequency polygon.
But students, usually that is not sufficient for our purposes, and the statistician is interested in ascertaining one single number that represents that entire dataset in some definite way. This number is called an average, and it is a measure of the central tendency of the dataset. As you have noticed, there are many different ways of finding the central tendency, and different measures are appropriate in different situations.
This brings us to the end of today's lecture. I would like to encourage you to attempt the assignment for this week and also as many other problems as you can handle. Next time, inshallah, we will begin the concept of dispersion. Best of luck and Allah Hafiz.