Assalamu alaikum, welcome to lecture number 5 of the course on statistics and probability. You will recall that in the last lecture, I discussed with you the construction of the frequency distribution of a continuous variable. We also did the relative frequency distribution and the percentage frequency distribution, and we learnt how to draw the histogram, the frequency polygon and the frequency curve. Today, we will discuss the different frequency curves that we encounter in real life, and then we will go on to the cumulative frequency distribution and the cumulative frequency polygon. First of all, let us revise a little of what we did last time. You will recall that in the last lecture I conveyed to you that the frequency polygon is the diagram that we obtain by plotting the class frequencies against the midpoints of the classes and connecting the points so obtained by straight line segments. In the EPA mileage rating example, our frequencies were 2, 4, 14, 8 and 2 against the various classes that we had formed, and our frequency polygon came out as you now see on the screen. You will also recall that I conveyed to you that the frequency curve is obtained by smoothing the frequency polygon that we have drawn. In the example of the EPA mileage ratings, the frequency curve came out as you now see on the screen. The dotted line that you are now seeing is, of course, the frequency curve, and the one made up of straight line segments is the frequency polygon. The purpose of the frequency curve is to display the overall pattern of the distribution. It is not necessary for your curve to pass through all of those points. Perhaps you remember that I stressed this point last time as well. It should be noted that the frequency curve is actually a theoretical concept.
Take the histogram that you drew: if you greatly increase the number of classes of your frequency distribution and narrow the class interval, the distribution that you obtain will look something like this. The smaller the class interval and the larger the number of classes, the narrower the rectangles become, as you are now seeing on the screen. If you take this same concept further and keep making the rectangles narrower and narrower, you ultimately arrive at what we call the frequency curve. In spite of the fact that the frequency curve is actually a theoretical concept, it is very useful in analyzing real-world situations. The reason is that very close approximations to these theoretical curves are often generated in real life. In other words, these theoretical curves approximate those real-life phenomena so closely that it is valid to utilize the mathematical properties of these curves to analyze those real-world situations. Students, I will now discuss with you the various types of frequency curves that we encounter in practice. We have the symmetrical frequency curve, the moderately skewed frequency distribution, the extremely skewed frequency distribution, the U-shaped frequency curve and also the uniform distribution. Let us discuss these one by one. First of all, the symmetric frequency distribution is of the shape that you now see on the screen. The distinguishing feature of this distribution is that if you place a vertical line in the center of the distribution, the left-hand side will be the mirror image of the right-hand side. That is, the two sides will look exactly alike, each a reflection of the other, and this is what is known as the symmetric frequency curve. Next, the moderately skewed frequency curve. Students, within this we have two categories, the positively skewed and the negatively skewed.
The positively skewed distribution is the one in which the right tail is longer than the left tail, whereas the negatively skewed distribution is the one in which the left tail is longer than the right tail. As you are noting, if you place a mirror in the middle of either of these, the left-hand side is not the mirror image of the right-hand side. It is this lack of symmetry that is called skewness. Both of the distributions that we have just considered are moderately skewed, but we also have the extremely skewed frequency distribution. As you now see on the screen, an extremely skewed distribution is one in which the maximum frequency occurs at the end of the frequency distribution. As you can see, this curve looks like a J, and therefore it is also called a J-shaped distribution. As an example, if you consider the death rate of the adult population of any country, you will realize that the shape of that distribution will be like a J. Why? Because for the lower adult ages the death rate is lower, but for the advanced ages the death rate is higher, and so the shape is like that of a J. That was the extremely negatively skewed distribution, which you can also call J-shaped, but of course you can also have the extremely positively skewed distribution, which looks like a reverse J, as you can now see on the screen. Let me illustrate this type of distribution with the help of the following example. The following are the numbers of 6's obtained in 50 rolls of 4 dice: 0, 0, 1, 0, 0, 0, 2 and so on. Construct a frequency distribution and a line chart, and discuss the overall shape of the data. Students, the first thing to note here is that you roll four dice and you are interested in determining how many of the dice show a 6. So the first figure, 0, means that you rolled those four dice and a 6 did not appear on any of them. Similarly, the second 0 means the same thing, but after that we have the number 1.
This means that of the four dice that were rolled, one showed a 6. Similarly, you can interpret all the values. Now we would like to convert this raw data into a frequency distribution, and all we have to do is construct a column of x, where x represents the number of 6's that we obtain by rolling four dice. As you can see on the screen, the x values are 0, 1, 2 and 3. In order to tally the raw data into our frequency distribution, we construct the column of tally marks and the column of frequencies. As you can see, 0 is the most frequent value: f is equal to 28 corresponding to x equal to 0. Similarly, f is equal to 17 corresponding to x equal to 1, f is equal to 4 against x equal to 2, and f is equal to only 1 against x equal to 3. If you look at the raw data again, you can see that the value 3 occurred only once. The line chart of this distribution is obtained by taking x along the x-axis and f along the y-axis. As you can see on the slide, students, the first line is the tallest: the first frequency is the greatest, and therefore the first line has to be the tallest one. The second is shorter than the first, the third shorter still, and the last line is the shortest. If we draw a freehand curve on top of this line chart, we obtain the shape that you now see, and I am sure you will agree with me that this can be regarded as a reverse J-shaped distribution. Please keep in mind that we should not draw a freehand curve in the case of a discrete variable the way we have in this particular example; I have done so here only to illustrate the shape of this particular distribution. Students, I would like to convey to you a very interesting point here. You have seen that this distribution is extremely positively skewed. The question is: does this data set indicate that the dice that were rolled were unfair?
I ask this because we might think that if we were using absolutely fair dice, we would expect to obtain some sort of symmetric distribution. Perhaps intuitively we do have a feeling of this kind, but you will be interested to know that if these dice were absolutely fair, we could have expected to obtain frequencies very close to what we have obtained. You will be studying this point in detail when I discuss the binomial distribution with you, but at the moment I would like you to concentrate on this very interesting fact: four tosses of a fair die yield such a sharply skewed distribution. A relatively less encountered distribution is the U-shaped distribution. If you consider the death rate of not just the adult population but the entire population of a country, you will agree that you get something like the U-shaped distribution. The infant mortality rate is higher than the death rate at ages 20 to 50, and then for the advanced ages the death rate is higher again. Another rather less frequently encountered distribution is the uniform distribution. Let me illustrate this distribution with the help of a very simple example. Suppose that a fair die is rolled 120 times and the following frequency distribution is obtained. X represents the number of dots on the uppermost face. Obviously X takes the values 1, 2, 3, 4, 5 and 6, and the frequencies are 19, 22, 20, 21, 19 and 19. The line chart of this distribution is, as you see on the screen, a set of vertical lines which are almost equally tall. If we draw a freehand curve along the top points of these vertical lines, students, we obtain more or less a horizontal line. In other words, our freehand curve is not a curve but a line, and because it is horizontal we can say that we are dealing with a uniform distribution. The point to be noted is that since the die was absolutely fair, every side of the die had an equal chance of coming on top.
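As a rough check on the two fair-dice claims above, here is a minimal Python sketch. It assumes the binomial model that the lecture defers to a later session (each of the four dice shows a 6 with probability 1/6, independently), and it takes the frequency of x = 4 to be 0 because the tabulated frequencies already add up to 50. The function and variable names are illustrative only.

```python
from math import comb

def expected_sixes(n_rolls=50, n_dice=4, p_six=1/6):
    """Expected frequency of k sixes (k = 0..n_dice) in n_rolls, under a binomial model."""
    return [
        n_rolls * comb(n_dice, k) * p_six**k * (1 - p_six)**(n_dice - k)
        for k in range(n_dice + 1)
    ]

# Observed frequencies for x = 0, 1, 2, 3 from the lecture; x = 4 assumed to be 0.
observed_sixes = [28, 17, 4, 1, 0]
for k, (obs, exp) in enumerate(zip(observed_sixes, expected_sixes())):
    print(f"x = {k}: observed {obs:2d}, expected {exp:5.2f}")

# One fair die rolled 120 times: every face is expected equally often.
observed_faces = [19, 22, 20, 21, 19, 19]
expected_per_face = sum(observed_faces) / len(observed_faces)
print(f"expected per face: {expected_per_face:.0f}")
```

Under these assumptions the expected frequencies for the four dice come out at roughly 24, 19, 6, 1 and 0, which is the same reverse-J pattern as the observed 28, 17, 4, 1, while the single die gives an expected 20 per face against the observed 19 to 22.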
Out of 120 tosses of a fair die, therefore, we could have expected to obtain X equal to 1 twenty times, X equal to 2 twenty times, X equal to 3 twenty times, and so on. And we note that the frequencies that we have actually obtained are very close to these expected values of 20, 20, 20 and so on. So, the gist of the whole discussion is that whenever we are dealing with an equally likely situation of the type described in this example, we are dealing with the uniform distribution. Students, out of all the curves that I have discussed with you, the most commonly encountered one is the moderately skewed distribution. Just go to any school and start measuring the children there. Measure their heights, their weights, their shoulder lengths, their blood pressures, their body temperatures and what have you. Collect that data, make your frequency distribution and draw the histogram, and you will find that it is like a moderately skewed distribution. Of course, all of these pertain to the continuous variable, but a similar situation holds for the discrete frequency distribution. As you can now see on the screen, in the case of a discrete variable also you will have either a symmetric distribution or a positively skewed or a negatively skewed one. Let us now go to the concept of the cumulative frequency distribution, which I have already done with you when I discussed the discrete frequency distribution. You will recall that in lecture number 3, when we were dealing with the discrete frequency distribution, I conveyed to you that you start adding the frequencies: the first frequency as it is, then that plus the next one gives you the second cumulative frequency, and so on. The result is called the cumulative frequency distribution. Now we apply this concept to the continuous frequency distribution, and we use the same example, that of the EPA mileage ratings.
So, as you can recall and see on the screen, the frequencies of the distribution that we had for that example were 2, 4, 14, 8 and 2. If we add these the way we did in lecture number 3, we obtain the cumulative frequency column as 2, 6, 20, 28 and 30. Each cumulative frequency tells you how many observations in the distribution are equal to or smaller than a given x value. So, it gives us the total number of observations starting from the lower class boundary of the first class up to the upper class boundary of the particular class that we are considering. For example, if we consider the cumulative frequency of the third class in this example of the EPA mileage ratings, it is 20, and what does it mean? It means that the mileage of 20 cars lies somewhere between 29.95 and 38.95. Similarly, the mileage of 28 cars lies anywhere between 29.95 and 41.95. This type of cumulative frequency distribution is called the less-than type of cumulative frequency distribution. Why is that? Because any cumulative frequency tells you how many observations are less than the upper boundary of that particular class. For example, in our same example of the EPA mileage ratings, 20 cars have mileage less than 38.95, or you can say 20 cars have mileage 38.95 or less. The graph of this distribution is called the cumulative frequency polygon. Students, you take the upper class boundaries along the x-axis and the cumulative frequencies along the y-axis, as you can now see on the screen. The cumulative frequencies are plotted on the graph paper against these upper class boundaries, and the points so obtained are joined by means of straight line segments. In the example of the EPA mileage ratings, our cumulative frequency polygon comes out the way you now see on the screen. This graph of the cumulative frequency distribution is also called the ogive. The polygon should be closed on the left-hand side, and this is achieved by adding a class at the top of the frequency table, as you can now see on the screen.
If you add one more class at the beginning, so that you have a class from 26.95 to 29.95, obviously the frequency of that class is going to be zero, because none of the cars actually fell in that category. Now, if you want your cumulative frequency polygon to be closed from the right-hand side also, this is achieved by dropping a perpendicular from the last point down to the x-axis, as you can now see on the screen. Actually, this step is not very important from the statistical point of view. Let us consolidate all these ideas with the help of the example of the ages of managers of child care centers that we discussed in the last lecture. As you will recall, the statement of the example was: the following table contains the ages of 50 managers of child care centers in five cities of a developed country. The ages were 42, 26, 32 and so on. We converted this data into a frequency distribution in the last lecture, and today we are interested in constructing the cumulative frequency distribution. Students, you will recall that the frequency distribution that we obtained last time had 20 to 29, 30 to 39, 40 to 49 and so on as the age groups, and the frequencies were 6, 18, 11, 11, 3 and 1. Now, in order to construct the cumulative frequency distribution, the first thing that we remind ourselves of is that the cumulative frequency is a running total of the frequencies through the classes. The cumulative frequency for each class interval is the frequency for that particular class interval added to the preceding cumulative total. Adopting this process in this particular example, we note that the frequency of the first class is 6, and hence the cumulative frequency of the first class will also be 6. The cumulative frequency of the second class interval is obtained by adding the frequency of the second class, 18, to the cumulative frequency of the preceding class, that is 6, and hence the cumulative frequency of the second class is 24.
Proceeding in this manner, we obtain all the cumulative frequencies, and we note that the cumulative frequency of the last class is 50, which is exactly equal to the sum of the frequencies. Hence our column of cumulative frequencies reads 6, 24, 35, 46, 49 and 50. Now the important question is: how do we interpret this column of cumulative frequencies? Well, it is quite simple. Similar to what we did in the last example, students, in this example we note that 24 of the 50 managers are 39 years of age or less, and if you look at the upper limit of the class which has the cumulative frequency 24, you see that the upper limit is equal to 39. In the column of frequencies you see that 18 managers are in the 30 to 39 age group and 6 managers are in the 20 to 29 age group. So it is obvious that if you add 6 and 18, you get the number 24, which corresponds to the combined age group 20 to 39. In other words, 24 managers are 39 years of age or less, and this is exactly what I said earlier. Similarly, we can interpret all the other cumulative frequencies. For example, 46 of the 50 managers, that is 92 percent of the managers, are 59 years of age or less, that is, less than 60 years old, and so on and so forth. Next, let us consider the graphic representation of the cumulative frequency distribution. Students, proceeding in exactly the same way as we did in the previous example, we obtain the cumulative frequency polygon that you now see on the screen. Taking the upper boundaries along the x-axis and the cumulative frequencies along the y-axis, with one class added at the beginning with zero cumulative frequency, yields this attractive cumulative frequency polygon. Students, let me now share with you some real-life applications of the concept of cumulative frequency. It is used in many real-life situations, including sales accumulated over a fiscal year, sports scores during a contest (in other words, the accumulated points), years of service, points earned in a course, and costs of doing business over a period of time.
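To make the managers example concrete, the running total and its "or less" reading can be sketched in Python as below. This is only an illustration: the upper limits 69 and 79 are an assumption that continues the 20 to 29, 30 to 39, 40 to 49 pattern (the lecture confirms the limits only up to 59), and the variable names are hypothetical.

```python
from itertools import accumulate

# Upper class limits; 29 to 59 are stated in the lecture, 69 and 79 are assumed.
upper_limits = [29, 39, 49, 59, 69, 79]
frequencies = [6, 18, 11, 11, 3, 1]

# Running total: each entry is the class frequency added to the preceding total.
cumulative = list(accumulate(frequencies))
total = cumulative[-1]

for limit, cf in zip(upper_limits, cumulative):
    print(f"{limit} years or less: {cf} managers ({100 * cf / total:.0f}%)")
```

The last cumulative frequency equals the total of 50, and the fourth line of the printout reproduces the statement that 46 of the 50 managers, 92 percent, are 59 years of age or less.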
For example, going back to the first one that you have on the screen, sales accumulated over a fiscal year: that means that during a year the sales that you add to your record go on accumulating. So these are the real-life situations that relate to this particular technical concept. Students, today I have discussed a very basic but important concept, and that is the concept of the frequency distribution of a continuous variable, its frequency curve and its cumulative frequency polygon. Let us now consolidate all that we have done by considering another example. The example that you are now seeing on the screen pertains to a sample of 40 pizza products produced by various companies, and the variable of interest here is the cost of a slice of pizza in American dollars. Let us first consider the construction of a frequency distribution for this particular data set. The first thing to note is that the data of the example that we did before, that of the EPA mileage ratings, was correct to one decimal place, whereas this data is recorded to two decimal places. Even so, we will proceed in exactly the same manner as we did in the last example. If you go back to the data set, you find that the smallest value is 0.52 and the largest value is 1.90. Subtracting x0 from xm, the range comes out to be 1.38. Let us suppose that we decide that we would like to have eight classes. As you will recall, the next step is to divide the range by the number of classes so that you obtain an approximate value of your class interval. So, as you can see on the slide, in this particular example, dividing the range by the number of classes, which is 8, we obtain 0.1725, but of course we would like to round it to a more convenient number, and hence we say that our class interval is equal to 0.20. Now the next step, as you will remember, is to decide the lower limit of the first class.
We have various options, but if I take 0.51 as the lower limit of the first class, that will make it quite convenient for me, because, as you now see on the screen, if I do that I get class limits of 0.51 to 0.70, 0.71 to 0.90 and so on. Once again I will remind you that you may feel that the class interval is no longer 0.20, but although it appears to be 0.19, it is actually 0.20. The point to note here is that whenever you form class limits in such a way that you maintain a gap between the upper limit of a class and the lower limit of the next class, you should not subtract the lower limit of a class from its upper limit in order to find the class interval. What you should do is subtract the lower limit of any class from the lower limit of the next class, and you will find that you have exactly the class interval that you want. As you now see on the screen, in this particular example, if you subtract the lower limit of the first class from the lower limit of the second, you obtain 0.71 minus 0.51, which is exactly 0.20, the class interval that you wanted. Similarly, you can subtract the upper limit of a class from the upper limit of the next class, and in this example 0.90 minus 0.70 also gives you exactly 0.20. The next concept is that of the class boundaries. You see, the golden rule is that when you first form the class limits, they carry exactly as many decimal places as your data, and later, when you form the class boundaries, the procedure automatically adds one more decimal place, so that no problem arises when the tallying of the data begins. So, as you now see on the screen, in this particular example the upper limit of the first class is 0.70 and the lower limit of the second is 0.71, and when you add the two and divide by two, the upper boundary of the first class, which is also the lower boundary of the second class, comes out to be 0.705.
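The arithmetic of the pizza example (range, approximate class interval, class limits and class boundaries) can be sketched in Python as follows. This is a minimal illustration, not the lecture's own procedure: amounts are held as whole cents to avoid floating-point rounding, the helper names are hypothetical, and the rounded interval of 0.20 and the starting limit of 0.51 are simply the choices made in the lecture.

```python
smallest, largest = 52, 190      # 0.52 and 1.90 dollars, held as whole cents
desired_classes = 8

data_range = largest - smallest             # 138 cents, i.e. 1.38
raw_width = data_range / desired_classes    # 17.25 cents, i.e. 0.1725
width = 20                                  # rounded to a convenient 0.20
first_lower = 51                            # lower limit of the first class, 0.51

def class_limits(first_lower, width, largest):
    """Yield (lower, upper) class limits in cents until the largest value is covered."""
    lower = first_lower
    while lower <= largest:
        yield lower, lower + width - 1      # upper limit is 0.01 below the next lower limit
        lower += width

def class_boundaries(lower, upper):
    """Class boundaries in thousandths of a dollar: halfway across the 0.01 gap."""
    return lower * 10 - 5, upper * 10 + 5

classes = list(class_limits(first_lower, width, largest))
for lower, upper in classes:
    lb, ub = class_boundaries(lower, upper)
    print(f"{lower / 100:.2f} to {upper / 100:.2f}   boundaries {lb / 1000:.3f} to {ub / 1000:.3f}")
```

Note that with these particular choices the largest value, 1.90, is the upper limit of the seventh class, so the sketch produces seven classes rather than eight; the lecture does not state how many classes appear on the slide.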
Since there is no value in my data set equal to 0.705, there is no problem regarding the tallying of the data in this frequency distribution. Of course, the next step is to actually tally the entire data in your frequency distribution. Students, I will be very happy if you take this up as an exercise: tally all the data of this example in your frequency distribution, and then go on to draw the histogram, the frequency polygon and the frequency curve of this data set. After that, judge the frequency curve: is it positively skewed, negatively skewed or of some other shape? You should also try to interpret the shape that you are getting. Also, if you cumulate the frequencies, you will get an idea of the number of pizza products which cost up to a certain value, as I explained with reference to the last example. I told you a little while ago that the most frequently encountered frequency curve is the moderately skewed one, moderately positively skewed or moderately negatively skewed. I have not worked on the data of this example myself, but I do have a hunch that when you draw its frequency curve, it too will have something of this shape, the moderately skewed shape. What is the reason for this? Why is it that so often our curve has a shape like this, a shape that we can also call hump-shaped, and which is generally not absolutely symmetric but approximately symmetric, slightly positively skewed or slightly negatively skewed? Let us go back to the example that I gave you, where you walk into any school and just start measuring the children: their heights, their weights, their body temperatures, their blood pressures and so on and so forth. Let us take just one of these phenomena. Let us talk about the weight of the children of one particular class. In any one class, they will be more or less of the same age.
So, as far as age is concerned, we can say that it is like a constant value. Now, if you weigh the children of this particular age group, would you not agree with me that the weight of the majority of the students will be close to the average weight for that age? You will find most of the children at around that weight. You will find very few children whom you would call underweight, and likewise you will find few children whom you would call overweight. What I am telling you applies to countless phenomena: the majority of the values lie in the middle of the range, and only a few values lie at the lower extreme or the upper extreme. Take height. Most adult males in Pakistan may be around 5 feet 10 inches, 5 feet 8 inches, 5 feet 11 inches, or 9 or 7. You will find very few people taller than 6 feet, and likewise few people at 5 feet 4 inches or 5 feet 3 inches. This is exactly why you obtain a hump-shaped frequency distribution. The frequency, the number of observations, rises in the middle of the distribution, around where the average value lies, and it declines at the extremes of the distribution. And the most fascinating thing, students, is that this excess and this defect occur in a more or less balanced manner. Allah Almighty has established this universe on justice, beauty and balance. For so many phenomena, when you construct the distribution, you find that the excess and the defect from the average value occur in a balanced manner, and you get something approximately like a symmetric distribution. Take marks. Suppose you are enrolled in a college with a large number of students, an exam is held, and we have the resulting set of marks. If you draw the histogram of these marks, you will find that its shape is more or less the same.
For the majority of the students, where the frequency rises, the marks will be around 50 percent, 55 percent or 60 percent, depending on the caliber of the students; students with very high marks will be few, and students with very low marks will also be relatively few. So generally you would get something like the symmetric distribution. Of course, we should also consider the counterexamples. There are situations in which we will not get something that is approximately symmetric. Stay with this same marks example, and suppose that the particular exam I am talking about was set by a professor who is very, very fond of setting difficult exams, and the exam turned out to be very tough. Now you will agree that in this case a great many students will have quite low marks, and the bulk of the frequency will shift towards the lower marks. That is, the number of students with marks like 20 percent, 25 percent or 15 percent will perhaps become very large. So, the frequency rises not in the middle part, where it is like 50 marks or 55 marks, but shifts towards the left side and rises against the value of 20 marks. In this case your distribution becomes positively skewed. You see, it rises in the beginning, and then it drops and tails off towards the other end. Now take the exact opposite situation. Think of a test which is very, very easy. The professor has given a very easy test, so the situation is now exactly the opposite of the previous one. There are a great many students who are getting 85, 90 and so on. So, in this case your distribution becomes negatively skewed. You start from a low frequency, and as you proceed towards 85 or 90, your frequency rises. So, the rise is not in the middle of the scale of marks but towards the right side, and your distribution is negatively skewed.
This brings us to the end of today's lecture. Students, in the last lecture and in today's lecture, you have dealt with a very important concept and that is the frequency distribution of a continuous variable. I have discussed with you the relative frequency distribution and the percentage frequency distribution. Also, we learnt how to draw the histogram, the frequency polygon and the frequency curve. In addition, we did the cumulative frequency distribution and the cumulative frequency polygon also known as the ogive. In the next lecture, I will discuss with you some very interesting diagrams such as the stem and leaf plot and the dot plot. Also, I will begin with you the concept of central tendency of a data set. In the meantime, I would like to encourage you to practice all these concepts that you have done until now and also to attempt the assignment that you will find on the website. Best of luck and Allah Hafiz.