Assalam-o-Alaikum, and welcome to lecture number 3 of the course on statistics and probability. Students, you will remember that last time we discussed two main concepts: sampling and methods of data collection. Under sampling, we had a brief discussion of the various sampling techniques, and we gave the most time to simple random sampling, with which we will be spending most of our time during this course. The second topic was methods of data collection, and there we discussed the various instruments or methods of data collection, such as the interview method, the questionnaire method, direct personal observation and so on. In today's lecture, I will discuss with you various techniques of representing the data that we have collected. In particular, today I will deal with the techniques for representing qualitative data, which include tabulation, the bar chart, the multiple bar chart, the component bar chart, the pie chart and some other things of the sort. You can see on the screen a tree diagram which gives you the two broad categories of data. As mentioned in the first lecture, you will distinctly remember that there are two types of data: qualitative data and quantitative data. Today, I will be picking up the qualitative category, and I will discuss with you the various ways of representing qualitative data in a univariate situation as well as in a bivariate situation. For the univariate situation, we will be constructing the frequency table and drawing the pie chart and the bar chart. For the bivariate situation, we will be constructing the bivariate frequency table and drawing the component bar chart and the multiple bar chart. Let us begin with an example. Suppose that we are carrying out a survey of the first-year students of a co-educational college in Lahore.
Suppose that, in all, there are 1200 students studying in the first year of this college, and we are interested in finding out how many of them have come from Urdu medium schools and how many from English medium schools. So, what will we do? Obviously, we will conduct an interview. I mean, that is one way we can do it. We will interview all of these students and simply find out which kind of school they have come from. As a result, we will obtain a set of data like what you can now see on the screen. We will have a set of observations like Urdu, Urdu, English, Urdu, English, English. That is, as the answers keep coming in, your data builds up accordingly. Now, the question is, what should we do with this data? Obviously, the first thing that comes to mind is to count the number of students who said Urdu medium and the number who said English medium. Suppose that, out of those 1200 students, 719 said that they have come from Urdu medium schools and 481 said that they have come from English medium schools. This will result in a table of the kind that you can now see. In the first column we will obviously write Urdu medium and English medium, and in the second column we will write the number of students belonging to each of these categories. You will have noticed that at the head of the second column I have written a letter, and that is f. Now, what does f represent? f is the notation for the term frequency, and this is a very, very important term in statistical terminology. And what do I mean by frequency? It means how frequently something happens. How many times did it happen? Since 719 out of the 1200 students said that they have come from Urdu medium schools, the frequency of that first category of students is 719. Similarly, the frequency for English medium schooling is 481. So we have the frequencies. But I think you will agree that this information is not as useful as it would be if I were to convert these figures into percentages.
So, that is the next step, as you now see on the screen. We will simply divide the frequency of the first cell, 719, by the total, 1200, and multiply by 100 in order to get the percentage of students falling in the first category, that is, Urdu medium. So, as you can see, approximately 60% of the students in the first year of this particular college have come from Urdu medium schools and approximately 40% have come from English medium schools. Students, what we have just accomplished is an example of a univariate frequency table pertaining to qualitative data. Now, I have used three terms in this sentence: univariate, frequency table and qualitative. Why univariate? Because we are dealing with just one variable in this example, and that is the medium of schooling, that is, whether the school they came from was Urdu medium or English medium. The second term I have used is frequency table. As I have just told you, when we count the number of items falling in a particular category, we call that number the frequency: how frequently that particular thing happened. And the third term: I have said that this pertains to qualitative data. This has been discussed many times. Obviously, Urdu medium does not mean 1.79 and English medium does not mean 3.21. You cannot express this numerically, so we are definitely dealing with qualitative data. Let us now see how we can represent this information in the form of a diagram. One of the very interesting and effective ways of representing this kind of data diagrammatically is to draw a pie chart. A pie chart consists of a circle which is divided into two or more parts in accordance with the number of categories that we have in our data. As you have seen, our variable, medium of schooling, was divided into two categories: Urdu medium and English medium.
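The percentage step described above, dividing each frequency by the total and multiplying by 100, can be sketched in a few lines of Python. The frequencies 719 and 481 are the ones quoted in the lecture; the function name `percentage_table` and the rounding to one decimal place are assumptions made for illustration only.

```python
# Frequencies from the lecture's survey of 1200 first-year students.
frequencies = {"Urdu medium": 719, "English medium": 481}


def percentage_table(freq):
    """Return {category: percentage of the total}, rounded to one decimal place.

    Hypothetical helper: the name and the rounding are illustrative choices.
    """
    total = sum(freq.values())
    return {category: round(100 * f / total, 1) for category, f in freq.items()}


percentages = percentage_table(frequencies)

# Print a small univariate frequency table with a percentage column.
for category, f in frequencies.items():
    print(f"{category:<15} f = {f:>4}  {percentages[category]:>5.1f}%")
```

The exact figures come out as 59.9% and 40.1%, which is why the lecture speaks of roughly 60% and 40%.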
In this particular example, accordingly, our pie chart, that is, our circle, is divided into two parts. As you can now see on the screen, Urdu medium has the larger part of the circle because, as you will remember, about 60% of the students came from Urdu medium schools. Similarly, the smaller part of the circle is for English medium, because about 40% of the students came from these schools. Now, the question is, how do we decide at what angle we are supposed to cut this circle? Well, the answer is very simple. All we have to do is to convert our frequencies into angles. And we do that by dividing the frequency of any cell by the total and multiplying by 360, because, as all of us learned in elementary school, there are 360 degrees in a circle. So, dividing 719 by 1200 and multiplying by 360, we get an angle of 215.7 degrees. You can also say 216 degrees. Therefore, when you start making the diagram, you will set your angle to 215.7 or 216 degrees. In this way, you obtain a very attractive and beautiful diagram called the pie chart. Students, the next diagram that I will discuss with you is the simple bar chart. This is also going to be used in the case of a univariate frequency table pertaining to qualitative data. A simple bar chart is a diagram in which we draw bars either vertically or horizontally. Most of the time, we draw them vertically. The widths of these bars are equal, but the lengths of the bars vary depending on the size of our data. So, let us consider an example. Suppose that we have data about the turnover of a company for a period of 5 years, as you can now see on the screen. Suppose that this turnover is for the years 1985 to 1989, and the figures are 35,000, 42,000, 43,500, 48,000 and 48,500 rupees. Now, the question is, if we want to represent this information through a bar chart, how will we proceed? All we have to do is to take the years along the x-axis and to construct a scale for turnover along the y-axis, as you now see on the screen.
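The pie chart angle calculation described above (frequency divided by the total, multiplied by 360) can be sketched as follows. This is only an illustration using the lecture's figures; the function name `pie_angles` is an assumption, not something named in the lecture.

```python
def pie_angles(freq):
    """Convert category frequencies into pie-chart angles in degrees.

    Hypothetical helper: each angle is frequency / total * 360.
    """
    total = sum(freq.values())
    return {category: 360 * f / total for category, f in freq.items()}


# Frequencies quoted in the lecture.
angles = pie_angles({"Urdu medium": 719, "English medium": 481})

for category, angle in angles.items():
    print(f"{category:<15} {angle:6.1f} degrees")
```

The two angles must add up to the full circle of 360 degrees, which is a quick check that no category has been left out.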
Next, against each year, we will draw vertical bars of equal width and of different heights in accordance with the turnover figures that I just shared with you. As a result, we obtain a simple and yet very beautiful diagram, as you now see on the screen. Students, I would like to convey to you a very important point, and it is a mathematical point. The point is that even though these bars have widths, these widths have no mathematical significance. It is only the length of the bar which conveys the figure that we are trying to represent. So, the question is, why did we have these widths at all? We could simply have drawn a line. Well, actually, that is true. We could have simply drawn vertical lines, and we did not need any width for any of these bars. This is done only because the moment you assign a certain width, the chart becomes very attractive, particularly so if you color the bars in any beautiful color of your choice. Students, what we have discussed until now is the univariate situation. Let us now discuss the bivariate situation. You see, in the real world, most of the time we are not dealing with just one variable. We are interested in phenomena in which many variables play together and interact with each other. So, if we want to begin with a very simple example, let us go back to the first-year students of that co-educational college that I was talking about. Suppose that we are interested not only in knowing, overall, how many students have come from Urdu medium schools and how many from English medium schools, but also in distinguishing between girls and boys in this regard. We should know, out of the female students, how many came from Urdu medium and how many from English medium, and similarly for the male students. So, what will we do in such a situation? Obviously, we have to collect data that covers not only the medium of schooling but also the sex of the student.
Suppose we do that: we interview every one of those 1200 first-year students of that college, we ask him or her what the medium of his or her school was, and we also note down the gender of the student. By doing so, of course, we will get a table in which we will now have three columns, as you can see on the screen. The first one gives us the student number, 1, 2, 3, 4 and so on, the second gives us the medium of schooling, and the third gives us the gender of the student. If you look at the table and the data is as you see it, it means that the first student we recorded was a girl who had come from an Urdu medium school. The second student was a boy, and he had also come from an Urdu medium school, and so on and so forth. Now the question is, how will we summarize this type of data? In this case we will construct a frequency table which is called a bivariate frequency table. It will consist of a box of the type that you now see on the screen. In this, the upper row is called the box head and the first column is called the stub. Now, it is our choice whether we want to write the sex of the student on the top or the medium of schooling on the top. It doesn't matter; that is your choice. Suppose that we write the student's gender in the box head and the medium of schooling in the stub. That will result in the table that you now see. Now you have the overall structure of the table. But the question is, how will we fill it? It is obvious that we will have to count these data in four categories. We have to know how many students were male and came from Urdu medium schools, how many were female and came from Urdu medium schools, how many were male and came from English medium schools, and how many were female and came from English medium schools. Students, suppose that by doing this we get the figures that you now see on the screen. 202 students were male and came from Urdu medium schools.
Whereas 517 were female students who came from Urdu medium schools. If you pay attention, 202 and 517 add up to 719, exactly the same figure that we had earlier when we were not considering the sex of the student. Similarly, 350 students were male and came from English medium schools, and there were 131 girls who came from English medium schools. These two figures, 350 and 131, add up to 481, exactly the same figure that we had earlier for English medium. Students, what we have just accomplished is a bivariate frequency table pertaining to qualitative data. Note that again I have used three terms. Bivariate, which I am sure you now readily recognize. Frequency table, because all the figures are frequencies of the various joint events that we were considering. And qualitative data, because the gender of the student and the medium of schooling are both non-numerical. Now let us see how we will represent this type of data diagrammatically. For this we have a very interesting diagram called the component bar chart. It is also called the sub-divided bar chart. What we have to do in this particular case is, first of all, to draw a simple bar chart using one of the two variables that we have. As you can see on the screen, we first draw the bars according to gender. Since there were fewer male students than female students, the bar for the males is shorter than the bar for the females. Once we have done this, the next step is to divide each of the two bars into two parts, and we will do this division according to the medium of schooling. Now, it is our own choice whether we want to keep English medium or Urdu medium at the bottom of each bar. Suppose we decide that we will allocate the lower part of the bar to English medium and the upper part to Urdu medium. If we do that, we get the diagram that you now see on the screen. Now you can see that among the male students there was a greater number of English medium students than among the female students.
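The bivariate frequency table described above, with gender in the box head and medium of schooling in the stub, can be sketched in Python as a check on the row and column totals. The four joint frequencies are the ones quoted in the lecture; the dictionary layout and the helper name `marginal_totals` are assumptions made for illustration.

```python
# Joint frequencies from the lecture: (medium of schooling, gender) -> count.
joint = {
    ("Urdu", "Male"): 202,
    ("Urdu", "Female"): 517,
    ("English", "Male"): 350,
    ("English", "Female"): 131,
}


def marginal_totals(joint_freq, axis):
    """Sum joint frequencies over the other variable.

    Hypothetical helper: axis 0 totals by medium (stub), axis 1 by gender (box head).
    """
    totals = {}
    for key, f in joint_freq.items():
        totals[key[axis]] = totals.get(key[axis], 0) + f
    return totals


row_totals = marginal_totals(joint, 0)  # totals per medium of schooling
col_totals = marginal_totals(joint, 1)  # totals per gender

# Print the table with gender in the box head and medium in the stub.
print(f"{'Medium':<10}{'Male':>8}{'Female':>8}{'Total':>8}")
for medium in ("Urdu", "English"):
    male, female = joint[(medium, "Male")], joint[(medium, "Female")]
    print(f"{medium:<10}{male:>8}{female:>8}{row_totals[medium]:>8}")
print(f"{'Total':<10}{col_totals['Male']:>8}{col_totals['Female']:>8}{sum(joint.values()):>8}")
```

The row totals reproduce the univariate figures 719 and 481, and the column totals (552 males, 648 females) give the heights of the two bars in the component bar chart.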
This is very clearly depicted in the sub-divided bar chart. Students, this component bar chart which we have just discussed is a very effective and useful diagram. Its biggest benefit is that you get a comparison of both the variables in one view. You can compare the number of male students with the number of female students, and you can also compare the proportion of English medium students among the males with the proportion of English medium students among the females. The next diagram that we will discuss today is the multiple bar chart. A multiple bar chart is also a very interesting and very beautiful diagram, and it is used in a situation where we have two or more related sets of data. Let us consider an example. Suppose we have, as you can now see on the screen, data about the imports and exports of Pakistan for the years 1970-71 to 1974-75, and suppose that we wish to represent this information in the form of a multiple bar chart. What will we do for this? This time, for each year we will be drawing two vertical bars, one for imports and the other for exports, in such a way that both the bars are adjacent to each other. That means they will be touching each other. For example, when you draw the bars for 1970-71, the first bar will be 370 units long and the second bar will be 200 units long. Similarly, when you draw for 1971-72, the first one will be 350 units long and the second one 337. In this manner, you will get a diagram as you now see on the screen. Now, one thing which is very important is the shading of the diagram. Because, for every year, the first bar represents imports, it is natural that we should use one color for imports. Similarly, the second bar represents exports, and hence we use a different color for exports. As a result, we get a very interesting and beautiful diagram, as you now see. In this example, we had two related variables, imports and exports.
Of course, the multiple bar chart can also be used effectively if we have three related pieces of information. If we had production data as well, we could have drawn one more bar against each year, adjacent to the first two, and we would have used a different color for that third bar. Students, what is the basic difference between a component bar chart and a multiple bar chart? This is a point on which students are often confused, although it is very simple. The only thing to remember is that the component bar chart is to be used when we are dealing with totals and their components. For example, we had the total number of male students, out of which so many were English medium and so many were Urdu medium. Similarly, we had the total number of female students, out of which so many were English medium and so many were Urdu medium. On the other hand, we will use the multiple bar chart where the pieces of information are related but do not add up to give you one meaningful total. Imports and exports cannot be added to give you one quantity the way the components could in the first example. Students, what we have discussed so far in today's lecture pertains to one or more qualitative variables. Let us now start the discussion of the quantitative situation. As you can see on the screen, the quantitative variable is of two types. These are, as of course we discussed in our first lecture, the discrete variable and the continuous variable. For the discrete variable, we will be constructing a frequency distribution and drawing a line chart. For the continuous variable, again we will be constructing a frequency distribution, and we will be drawing the histogram, the frequency polygon and the frequency curve. Let us first consider the discrete case with the help of a very simple example. Suppose we walk into the nursery class of a small primary school and we count the number of books and copies that every student has in his or her bag.
Obviously, we will get data in whole numbers, as you can now see on the screen: 3, 5, 7, 9 and so on. This is the number of books and copies that the various children have in their bags. Now we will convert this data into a frequency distribution. The first thing to do is to denote our variable by x and then make a column of the x values that we have in our data. So, as you now see, we will have a column headed number of books, the number of books is denoted by x, and the values are 3, 4, 5, 6, 7, 8 and 9. The reason is that in the nursery class of the school that we visited, the minimum number of books that any child had was 3 and the maximum was 9. So, we have the column of the variable x. Next, we need to count the number of times the various values of x occur in our data. For this purpose, we will construct two more columns adjacent to the column that we have just constructed. The first of these two columns is for tally marks and the second is for frequency. So, as you can see, we now have three columns, of which we have to fill the second. As you saw a short while ago, our data consists of the values 3, 5, 7, 9 and so on. If we want to do this process of tallying manually, then of course the easiest way is to pick up the values one by one and put a tally mark in the second column of our table. Because the first value is 3, we will put a stroke in the second column against x equal to 3, as you can now see on the screen. The next value is 5, and hence, as you can now see, our second tally stroke will be against x equal to 5. So, it is a very simple process. We pick up each value of our data set and mark it against the proper value in the x column. Continuing in this way, we obtain the distribution that you now see on the screen. We have tallied all the values in the tally marks column and, as a result, our entire data is in this table.
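The tallying procedure described above can be sketched in Python using `collections.Counter`. Please note the assumptions: only the first four observations (3, 5, 7, 9) come from the lecture, the rest of the list is invented purely for illustration, and the helper `tally_marks`, which writes each completed group of five strokes as `||||/`, is a hypothetical text rendering of hand-drawn tally marks.

```python
from collections import Counter

# Hypothetical observations: only the first four values (3, 5, 7, 9) come from
# the lecture; the remaining values are invented for illustration.
books = [3, 5, 7, 9, 5, 6, 4, 6, 8, 6]


def tally_marks(count):
    """Render a count as text tally marks, one '||||/' per complete group of five."""
    groups = ["||||/"] * (count // 5)
    if count % 5:
        groups.append("|" * (count % 5))
    return " ".join(groups)


# Counter does the tallying: it maps each x value to how often it occurs.
frequency = Counter(books)

# Print the three columns: x, tally marks, frequency f.
for x in range(min(books), max(books) + 1):
    print(f"{x:>2}  {tally_marks(frequency[x]):<12} {frequency[x]}")
```

With the real class data of 45 observations the same code would apply unchanged; only the `books` list would differ.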
Now, the thing to note, as you can see, is that after every 4 vertical strokes, the 5th stroke has been placed across them so that it intersects the first 4 lines. This is only for convenience. The reason is that it is easier for us to count the strokes if they have been grouped into sets of 5 than if all of them were separate vertical strokes. Now, the question is, why is this table called a frequency distribution? The reason is that the total frequency, 45, has been distributed among the various values of x. One of those 45 values has been allocated to x equal to 3, 3 of the 45 values have been allocated to x equal to 4, 9 of the values have been allocated to x equal to 5, and so on. Quite simply, we have distributed the total frequency among the various categories, and that is why it is appropriate to call it a discrete frequency distribution. Let us now consider the graphical representation of the table that we have just constructed. The best way of doing this is by way of the line chart. The line chart is in a way quite similar to the simple bar chart that we discussed a short while ago when we were dealing with the univariate frequency table. There, you will remember, we drew bars whose lengths corresponded to the values that we were trying to represent. But you will also remember that those bars had widths, which we colored in order to make the chart very attractive, and that I said at that time that the width has no mathematical significance. Here, we will not draw those widths. This is going to be more accurate from the mathematical standpoint. All we have to do is to take the x values along the x-axis and the frequencies along the y-axis, as you now see on the screen. We will be drawing vertical lines against each value of x in accordance with the frequencies that we have.
You will remember that the frequency of x equal to 3 was 1, and hence the first vertical line is only 1 unit tall. The second frequency was 3, and accordingly the line is 3 units tall. We proceed similarly for all the values, and as such we get a simple and yet effective way of representing a discrete frequency distribution. What is important here is that we have used separate lines rather than the continuous curve that we usually draw when we are drawing graphs. This is very important. The reason is that we are dealing with a discrete variable, and our graph must convey the concept of discontinuity. If, instead of drawing these separate lines, we had plotted the points and joined them with a continuous curve, that would have given an impression of continuity which, as I mentioned earlier, is not appropriate for this kind of example. In a child's bag, there will be either three books or four; there cannot be three and a half. Therefore, this is a very important point: the reason why a line chart is a better way of representing a discrete variable is that the separate lines convey the concept of discontinuity. Students, what we have just done is the tabular and diagrammatic representation of a discrete variable. In connection with the discrete frequency distribution, there are two or three more concepts that I would like to convey to you. What we constructed just now were the frequencies, the absolute frequencies. As you will remember, and as you can also see on the screen, there was only one student who had three books and three students who had four. Now, this information is not very useful if we do not have the total number of students. If you had lost that 45, then you would not have been able to judge what the situation is. So, what we do is to construct a column of relative frequencies, and the result is called a relative frequency distribution. Now, what does relative frequency mean? Extremely simple.
All we do is divide every frequency by the total frequency, and that gives us the relative frequency of that particular x value. We can convert these relative frequencies into percentages and, as we all know, percentages are something that an ordinary person understands very quickly, even if he may not grasp a relative frequency as quickly. All we have to do is multiply the relative frequency of any x value by 100, as you all know. That is a very simple procedure. So, 1 over 45 into 100 gives you the percentage of students who had three books in their bags, and 9 over 45 into 100 gives you the percentage of students who had five books in their bags. Another very interesting and important concept is that of cumulative frequencies. See, it is possible that we are interested in this kind of information: how many children in this class do not have more than three books at home, whereas, according to their requirement, they have to study six or seven subjects every day and should accordingly have brought six or seven books. So, suppose we are interested in how many students brought only three, four or at the most five books, and not more than that. How will we know this? As you can see in the slide, we have one student who brought three books, three students who brought four and nine students who brought five. Obviously, we have to add these three numbers in order to get the total number of students who brought five or fewer books, that is, one plus three plus nine, in other words, 13. So, out of 45, 13 students did not bring more than five books. You can see that this is rather cumbersome if we have to work it out directly from the frequencies each time. A very convenient way of getting over this problem is to construct another column, of cumulative frequencies. All we have to do is this. The first frequency, one, remains as it is.
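The relative frequency, percentage and cumulative frequency calculations described above can be sketched as follows. Only the frequencies that the lecture has quoted so far are used (1, 3 and 9 for x = 3, 4 and 5, with the total of 45); the remaining rows of the table are deliberately left out rather than guessed, and the variable names are illustrative assumptions.

```python
from itertools import accumulate

TOTAL = 45  # total number of students, as stated in the lecture

# Frequencies quoted in the lecture for x = 3, 4, 5. The remaining rows of the
# table are not reproduced here, so this is a partial table only.
x_values = [3, 4, 5]
freqs = [1, 3, 9]

# Relative frequency = frequency / total frequency.
relative = [f / TOTAL for f in freqs]

# Percentage = relative frequency * 100 (rounded here to one decimal place).
percent = [round(100 * f / TOTAL, 1) for f in freqs]

# Cumulative frequency = running total of the frequencies.
cumulative = list(accumulate(freqs))

print(" x   f  rel.f     %   cum.f")
for x, f, r, p, c in zip(x_values, freqs, relative, percent, cumulative):
    print(f"{x:>2} {f:>3}  {r:.3f} {p:>5.1f} {c:>6}")
```

The last cumulative value printed, 13, is the number of students who brought five or fewer books, matching the sum one plus three plus nine worked out in the lecture.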
In the cumulative frequency column, one plus three, which equals four, is written against x equal to four. Four plus nine, which equals thirteen, is written against x equal to five. Thirteen plus thirteen, which equals twenty-six, is written against x equal to six, and so on. As you can see, the last cumulative frequency is 45, exactly the same as the total number of students that we had in our data set. Now, by reading this column, we get to know very easily how many students brought five books or fewer, seven books or fewer, three books or fewer. For example, as you can see, we can say at a glance that 13 students brought five books or fewer, and 26 students brought six or fewer. So, this is the advantage of a cumulative frequency distribution. In today's lecture, we have discussed many topics. I started with the tabular and diagrammatic representation of qualitative data, and we discussed both cases, the univariate situation and the bivariate situation. After that, we dealt with the tabular and diagrammatic representation of a discrete quantitative variable. Next time, we will be discussing the tabular and diagrammatic representation of a continuous quantitative variable. In particular, we will be doing the continuous frequency distribution, the histogram, the frequency polygon and the frequency curve. We will also be doing the cumulative frequency distribution for the continuous situation, and we will be drawing the cumulative frequency polygon, which is also called the ogive. In the meantime, I would like to encourage you, students, to practice the various concepts that we have discussed today, and I would recommend that you attempt at least four or five questions of the exercise of tab 2 in your textbook. Best of luck, and until next time, Allah Hafiz.