Okay, so by the end of this session we should be able to construct tables and charts for numerical data and for categorical data. Categorical data is the easier case. We summarize it in a table called a frequency table, and we graph it with one of three charts: a bar chart, a pie chart or a Pareto diagram. A Pareto diagram is a bar chart and a line chart together. In your module you mostly won't need to explain the Pareto diagram, but I have included it in the slides as well. So we normally summarize and visualize categorical data with the frequency table, the bar chart and the pie chart.

A frequency table is a summary table. It takes the values of a categorical variable and counts how many observations fall into each category. Remember the blood type example, with blood types O, A, B and AB? Let's go there quickly. For blood type O we count one, two, three, so the count is three. Those counts are what we call frequencies. So if I build a summary table, I put my categories, the blood types O, A, B and AB, in one column and the count or frequency in the next. How many O's? One, two, three, so three. How many A's? One, two, so two. How many B's? There is one. And how many AB's? There is one. That column is my count or frequency, and that is the summary table.

From the summary table I can calculate percentages. A percentage takes each count and divides it by the total number of observations.
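The counting step behind a frequency table can be sketched in Python with the standard library's `collections.Counter`. This is a minimal sketch, not the only way to do it: the helper name `frequency_table` is hypothetical, and the order of the seven observations below is assumed; only the counts (O = 3, A = 2, B = 1, AB = 1) come from the example.

```python
from collections import Counter


def frequency_table(values, categories):
    """Count how many observations fall into each category (hypothetical helper)."""
    counts = Counter(values)
    # Report every category in the given order; Counter returns 0 for absent ones.
    return {category: counts[category] for category in categories}


# Assumed order of the seven observations; only the counts come from the example.
blood_types = ["O", "A", "O", "B", "A", "AB", "O"]
table = frequency_table(blood_types, ["O", "A", "B", "AB"])
total = sum(table.values())

for category, count in table.items():
    print(f"{category:>2}: {count}")
print(f"Total: {total}")
```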
Three plus two is five, plus one is six, plus one is seven, so the total is seven. I can double-check that, because seven people were randomly selected, so the counts must add up to seven. To calculate a percentage, I take every count and divide it by seven.

But before the percentage there is what we call the relative frequency, which I'll write as RF here. The relative frequency is just the count divided by the total: for type O it is three divided by seven, which is approximately 0.43 (not 43%, but 0.43). To get the percentage, I multiply the relative frequency by 100, so percentage = relative frequency × 100, and 0.43 × 100 gives 43%. That's how you create your frequency table. So, back to our notes: I have explained the relative frequency distribution and the percentage frequency distribution, and together with the counts those make up what we call a frequency table.

Another example: for South African retail banks, I counted how many people bank with each one. 30 people bank with ABSA, 44 with Capitec and 40 with FNB, and so on. Those are our frequencies, which we also call counts. We add all of them to get a total at the end: 30 + 44 + 40 + 30 + 22 + 34 = 200, so the total here is 200. To calculate the relative frequency of ABSA, we divide its frequency by n, our sample size: 30 divided by 200 is 0.15. To calculate the percentage, we multiply by 100, which gives us 15%.
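The relative frequency and percentage formulas can be sketched the same way. This minimal sketch assumes the bank counts above; "Bank D" and "Bank E" are placeholder labels, because the session does not name the banks with 30 and 22 customers, Standard Bank's 34 comes from the discussion of the summary table, and the function names are hypothetical.

```python
def relative_frequencies(freqs):
    """Relative frequency = frequency / total (n)."""
    n = sum(freqs.values())
    return {category: f / n for category, f in freqs.items()}


def percentage_frequencies(freqs):
    """Percentage = relative frequency x 100."""
    return {category: rf * 100 for category, rf in relative_frequencies(freqs).items()}


# "Bank D" and "Bank E" are placeholder names for the unnamed banks.
banks = {"ABSA": 30, "Capitec": 44, "FNB": 40, "Bank D": 30, "Bank E": 22, "Standard Bank": 34}

rf = relative_frequencies(banks)
pct = percentage_frequencies(banks)
for bank, count in banks.items():
    print(f"{bank:<14} {count:>3}  RF={rf[bank]:.2f}  {pct[bank]:.0f}%")
```

The relative frequencies add up to 1 and the percentages to 100, which is a quick check on the arithmetic.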
These percentages are what we call percentage frequencies. So with the summary table you can say that 40 people bank with FNB, which is 20% of the people, or that 34 people bank with Standard Bank, which is 17% of the people, a relative frequency of 0.17. That's how you use the frequency table. Remember, the relative frequency is the frequency divided by the total.

Now an exercise: calculate the relative frequency of A and C. You are given a summary table, or frequency table, for a random sample of 500. If you're not sure the total is 500, add 160 + 246 + 94, which gives 500. With that total of 500, you take the frequency and divide by the total n, so here it is 160 divided by 500, which is 0.32. Yes, correct. That's how you calculate it.

We can also visualize categorical data with a bar chart. In a bar chart, each bar represents one individual category. The height of the bar, how tall it is, represents the frequency; sometimes we use the relative frequency or the percentage frequency instead. The other thing about the bar chart is that there is a space between the bars: because the data is categorical, one bar does not start where the previous one stops. Individual categories are not linked to one another, so they have spaces in between. Those are the things you need to know about the bar chart: the bars represent categories, and the height represents the frequency, the percentage or the relative frequency.
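To make "height represents frequency" and "gaps between bars" concrete without a plotting library, here is a minimal text-only sketch. The bars are drawn horizontally, so bar length plays the role of height; the function name `text_bar_chart`, its `scale` parameter and the placeholder bank labels are assumptions for illustration. In practice you would draw the chart in Excel or a plotting library.

```python
def text_bar_chart(freqs, scale=1.0, symbol="#"):
    """Return one text bar per category; bar length is proportional to its frequency.

    A blank line separates consecutive bars, mirroring the gaps between bars
    in a bar chart of categorical data.
    """
    label_width = max(len(category) for category in freqs)
    lines = []
    for i, (category, count) in enumerate(freqs.items()):
        if i > 0:
            lines.append("")  # gap: categories are separate, not adjacent
        bar = symbol * round(count * scale)
        lines.append(f"{category:<{label_width}} | {bar} {count}")
    return lines


# "Bank D" and "Bank E" are placeholder names for the unnamed banks.
banks = {"ABSA": 30, "Capitec": 44, "FNB": 40, "Bank D": 30, "Bank E": 22, "Standard Bank": 34}
print("\n".join(text_bar_chart(banks, scale=0.5)))
```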
With a bar chart for categorical data, the bars have spaces in between.

Another way to visualize categorical data is the pie chart. With a pie chart, you need to know that each slice of the pie represents a category: this slice, that slice and every other slice each represent one category. How big a slice is represents the frequency or the percentage, so the bigger the slice, the bigger the percentage or frequency. That is what you need to know about pie charts: the slices represent categories, and the size of each slice represents the percentage or frequency.

The third chart for visualizing categorical data is the Pareto diagram. A Pareto diagram shows the categories as bars, ordered from the largest frequency to the smallest, and shows their cumulative percentages as a line graph.

Question: which of the following graphical representations can be used to visualize qualitative data? Remember, qualitative data is categorical data. Think about the charts we spoke about. Is it the histogram, the pie chart, the scatter plot, the ogive or the frequency polygon? It is the pie chart. I'm sorry, I see people raising their hands. Apologies, guys. No, it's fine. Okay.

Now we move on to how we visualize numerical data. Numerical data, remember, is data that can be counted or measured. When we visualize numerical data, we start with what we call an ordered array, meaning we order the data from smallest to largest. Once we have ordered the data in that manner, we can create what we call a stem-and-leaf plot, also called a stem-and-leaf diagram or stem-and-leaf display. With numerical data we can also create a summary table called a frequency distribution, together with a cumulative distribution table.
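Returning briefly to the pie chart and the Pareto diagram described above: both can be derived from the same frequency table. In this minimal sketch (hypothetical function names, the bank counts with the same placeholder labels), a slice's angle is its share of the total times 360 degrees, and the Pareto table sorts the categories from largest to smallest frequency before accumulating the percentages for the line. The drawing itself is left out.

```python
def pie_slice_angles(freqs):
    """Each slice's angle is its share of the total times 360 degrees."""
    n = sum(freqs.values())
    return {category: f / n * 360 for category, f in freqs.items()}


def pareto_table(freqs):
    """Sort categories from largest to smallest frequency and add cumulative percentages."""
    n = sum(freqs.values())
    rows, running = [], 0
    for category, f in sorted(freqs.items(), key=lambda item: item[1], reverse=True):
        running += f
        rows.append((category, f, running / n * 100))  # cumulative % for the line
    return rows


# "Bank D" and "Bank E" are placeholder names for the unnamed banks.
banks = {"ABSA": 30, "Capitec": 44, "FNB": 40, "Bank D": 30, "Bank E": 22, "Standard Bank": 34}

for category, angle in pie_slice_angles(banks).items():
    print(f"{category:<14} slice {angle:5.1f} degrees")
for category, f, cum_pct in pareto_table(banks):
    print(f"{category:<14} {f:>3}  cumulative {cum_pct:5.1f}%")
```

Python's `sorted` is stable even with `reverse=True`, so ABSA and Bank D (both 30) keep their original order in the Pareto table.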
A frequency distribution is just a summary table for numerical data: the frequency table summarizes categorical data, and the frequency distribution summarizes numerical data. Once we have a frequency distribution, we can create a histogram from it. We can also create a frequency polygon or an ogive. Later on, in chapter 11, we will learn about scatter plots: when you have two numerical variables and you want to compare them, you create a scatter plot.

So how do we use an ordered array? An ordered array, like I said, is your data put in ascending order, from the lowest value to the highest value. In my data I have day students and night students, and I have their ages. You can see that for both the day students and the night students, the ages are ordered from lowest to highest; for the night students, for example, the first values are 18, 18, 19. Because the data is ordered, you can also see clearly whether there are any outliers, that is, values far outside the other values. Say there is an 80-year-old day student at the college: that would be an outlier, because the value is far away from the rest. Or say I have a 60-year-old day student or night student: that person would be far outside the other values, and that is what we call an outlier. We'll learn more about outliers when we do the measures of central tendency and measures of variation in study unit three.
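An ordered array is just the data sorted in ascending order. The sketch below sorts the day students' ages (the values that can be read off their stem-and-leaf plot in this session; the unsorted order shown is assumed) and then adds the hypothetical 80-year-old from the outlier example. The `largest_gap` helper is a hypothetical, informal way to show how far a value sits from its neighbours in the ordered array; it is not a formal outlier rule, which comes with study unit three.

```python
def ordered_array(values):
    """Return the values in ascending order (lowest to highest)."""
    return sorted(values)


def largest_gap(sorted_values):
    """Largest jump between neighbouring values, as (gap, lower value, upper value)."""
    gaps = [(b - a, a, b) for a, b in zip(sorted_values, sorted_values[1:])]
    return max(gaps)


# Day students' ages; the unsorted order is an assumption.
day_ages = [19, 22, 16, 27, 18, 20, 42, 17, 21, 18, 25, 32, 19, 17, 20, 38, 22, 18]

ordered = ordered_array(day_ages)
print(ordered)
print(largest_gap(ordered))

# Informal check: a hypothetical 80-year-old lands at the end, far from 42.
with_outlier = ordered_array(day_ages + [80])
print(largest_gap(with_outlier))
```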
Once you have ordered your data, you can create a stem-and-leaf plot. A stem-and-leaf plot groups the data by splitting each value into a stem and a leaf, where every stem can have many leaves. The leaf is always the last digit and the stem is the digits before it; how many digits go into the stem depends on your data.

For two-digit values we use a tens stem-and-leaf plot: the first digit is the stem and the second digit is the leaf. For three-digit values, a hundreds stem-and-leaf plot, the first two digits are the stem and the last digit is the leaf. If we have decimals, say 1.2, the value before the decimal point is the stem and the digit after it is the leaf. For 11.2, the two digits before the decimal point, 11, are the stem and the digit after it, 2, is the leaf; for this the data should always be rounded to one decimal place. And for four-digit values, a thousands stem-and-leaf plot, the first three digits are the stem and the last one is the leaf. You can see the pattern here, right? The last digit is always the leaf, and the digits before it are the stem.

Now take the two data sets we had, the day students and the night students. The ages are two-digit values, so we create a tens stem-and-leaf plot: for every value, the first digit is the stem and the last digit is the leaf. When we create a stem-and-leaf plot, we look for the values that share the same first digit.
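The "last digit is the leaf, everything before it is the stem" rule can be sketched as a small function, assuming non-negative values. The numbers 27, 345 and 1234 are made up to illustrate the tens, hundreds and thousands plots; 1.2 and 11.2 are the decimal examples above. For one-decimal data the value is scaled by 10 first, so the digit after the decimal point becomes the leaf. The function name is hypothetical.

```python
def stem_and_leaf(value, decimals=0):
    """Split a non-negative value into (stem, leaf).

    The leaf is the last digit and the stem is everything before it.
    For one-decimal data (decimals=1) the value is scaled by 10 first,
    so the digit after the decimal point becomes the leaf.
    """
    scaled = round(value * 10 ** decimals)
    return scaled // 10, scaled % 10


print(stem_and_leaf(27))        # tens plot: (2, 7)
print(stem_and_leaf(345))       # hundreds plot: (34, 5)
print(stem_and_leaf(1234))      # thousands plot: (123, 4)
print(stem_and_leaf(1.2, 1))    # one decimal place: (1, 2)
print(stem_and_leaf(11.2, 1))   # one decimal place: (11, 2)
```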
For the day students, the first digit is 1 for the first group of values, then 2 for the next group, then 3 for two values and 4 for one. So our stems are 1, 2, 3 and 4. Then we create the leaves. In a stem-and-leaf plot we always draw a vertical line that separates the stems from the leaves, with the leaves on the right. We write every digit that comes after the stem, including repeats. For stem 1, the leaves are 6, 7, 7, 8, 8, 8, 9, 9. For stem 2 they are 0, 0, 1, 2, 2, 5, 7. For stem 3 they are 2 and 8. For stem 4 it is 2. That is the stem-and-leaf plot for the day students, and you do the same for the night students.

Later on, probably next week or the week after, when we do the measures of central tendency on Wednesday, we will look at exam and assignment questions on the stem-and-leaf plot. They might not ask you directly to draw one; in a multiple-choice question they might ask you to identify the stem-and-leaf plot that matches the data. You also need to be able to convert a stem-and-leaf plot back into a list of values. If they give you a stem-and-leaf plot, the leaves are not 6, 7, 7, 8, 8, 8, 9, 9: you read each leaf together with its stem, so this value is 16, this is 17, this is 17, this is 18, and so on up to 27, 38 and 42. That's how you read a stem-and-leaf plot. From the plot, or from a table of the values, they can ask you questions such as what your highest value is.
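Building the tens stem-and-leaf display for the day students' ages, and reading it back into values, can be sketched as follows. The function names are hypothetical, and for simplicity stems with no values would be left out of the display (none are missing for this data).

```python
from collections import defaultdict


def stem_and_leaf_display(values):
    """Build a tens stem-and-leaf display for non-negative two-digit integers.

    Stems with no values are omitted in this sketch.
    """
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)  # stem = tens digit, leaf = units digit
    return [f"{stem} | {' '.join(str(leaf) for leaf in leaves[stem])}" for stem in sorted(leaves)]


def read_back(display):
    """Convert display rows such as '2 | 0 0 1' back into values (20, 20, 21)."""
    values = []
    for row in display:
        stem, leaf_part = row.split("|")
        values.extend(int(stem) * 10 + int(leaf) for leaf in leaf_part.split())
    return values


day_ages = [16, 17, 17, 18, 18, 18, 19, 19, 20, 20, 21, 22, 22, 25, 27, 32, 38, 42]
display = stem_and_leaf_display(day_ages)
print("\n".join(display))
print(read_back(display))
```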
Other possible questions: what is the second highest value? What is the smallest value? What is the middle value? They can ask as many questions as they like, and you can answer them either by converting the stem-and-leaf plot into a table of values or by reading the plot directly. From the plot I can see that the smallest value is 16 and the highest value is 42. For the fifth largest value, I count the leaves from the top, not the stems: 42, 38, 32, 27, 25, so the fifth largest value is 25. That's how you use the stem-and-leaf plot to answer some of these questions, and when we do the exercises we will practise more exam-style questions.

Here is one type of question you might get. A stem-and-leaf display describes two-digit integers between 20 and 80. For one of the classes displayed, the row has stem 5 and leaves 2, 4 and 6. What numerical values are described here? Which option represents the values of that stem? It is option three, because you need to know that this row represents 52, 54 and 56. That's the stem-and-leaf plot.

In the next five minutes: one of you asked a question in the group, I think about an assignment question on the frequency distribution table. They will not ask you to build a frequency distribution table; they will give you a frequency distribution and ask questions around it. But you need to understand the basics of how it is built, so I'm not going to go into detail, but I'm going to highlight the gist of it.
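Going back to the stem-and-leaf questions above: once the plot is read back into values, they reduce to simple operations. A minimal sketch using the day students' ages; the helper names are hypothetical.

```python
def values_from_row(stem, leaves):
    """Values in one row of a tens stem-and-leaf display."""
    return [stem * 10 + leaf for leaf in leaves]


def kth_largest(values, k):
    """k-th largest value (k = 1 is the maximum), counting repeats separately."""
    return sorted(values, reverse=True)[k - 1]


day_ages = [16, 17, 17, 18, 18, 18, 19, 19, 20, 20, 21, 22, 22, 25, 27, 32, 38, 42]

print(min(day_ages), max(day_ages))    # 16 42
print(kth_largest(day_ages, 5))        # 25
print(values_from_row(5, [2, 4, 6]))   # [52, 54, 56]
```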
To see how a frequency distribution is built, take this example: a manufacturer of insulation selected 20 random winter days and recorded the highest daily temperature, in degrees, on each day. These are the temperatures. My data is already ordered from lowest to highest: the lowest value is 12 and the highest is 58. Based on this information, I first calculate the range of the data, the distance from the smallest value to the highest value. Don't get confused with what I explained on Saturday about the range; I'm going to get there. Here, the range helps me build the frequency distribution table: it lets me choose a class width so that I can count how many values fall into each class. The range is 58 - 12 = 46, so the distance from the smallest to the largest value is 46 degrees.

That gives me an idea of how many classes to create. When you create a frequency distribution, you don't want too many class intervals, because then you get a very big frequency distribution without meaningful information in it; you want to keep the number of classes small. With a range of 46, I can create five class intervals. Choosing five class intervals means I am creating five categories from my numerical data, that is, I group the numerical data into five groups.
If you ever need to create a frequency distribution at work and show people how it's done, the number of classes is usually between 5 and 15, because you don't want too many classes. It depends on your data: with thousands of data values you can create 15 classes, but with only 10 values you might choose two or three groups. Here we create five classes, so we group the data into five groups. To get the class width, I take my range, 46, and divide it by five. Let me be precise before I do that, so that you don't ask me a lot of questions: 46 divided by five is 9.2. Yes, I'm going to "slaughter maths" here, as people say.

I can now use 9.2 to set my class width. If you use a statistics program, it creates the classes automatically; statistical programs, and Excel, call them bins. That's bins as in B-I-N-S, not beans as in the beans we eat. A statistical program creates the bins for you, and you can tell it to create only five classes. Because I'm doing this manually, I want my class width and my upper and lower class limits to be as clean as possible. So 46 divided by five gives me 9.2. I can either use a class width of 9, or I can round 9.2 up to the next ten, which makes my starting and ending points cleaner. There's nothing wrong with classes like 5.5 to 10.5, or limits built from 9.2; you can do it that way. But for this demonstration, so that nobody tells me that 46 divided by five is not 10, let me say it up front: I have rounded it up, and in that way I slaughtered maths.
I slaughtered maths to the highest degree, because normal rounding of 9.2 gives 9, but I'm going to use a class width of 10. That means the distance between the lower limit and the upper limit of each class will be 10. Knowing that, I look at my data and choose a starting point. My lowest value is 12 and my highest is 58, so classes in steps of 10 starting at 10 will cover all of them. I start at 10 and add my class width of 10, which gives 20, so my first class starts at 10 and ends at 20.

When you create a frequency distribution table (and I know we have two minutes left, I'll wrap up shortly), you write the class as "10 to less than 20". That means the class includes 10 but does not include 20, so that the next class can start at 20. Adding 10 gives 30, so the second class includes 20 but not 30. The next class includes 30 and ends at 40, because I'm just adding 10 each time. The distance is the same for every class: 20 - 10 = 10 and 30 - 20 = 10, which is my class width, the distance between each class's lower limit and upper limit. I hope you got that.

After that calculation, I create my classes: 10 to less than 20, 20 to less than 30, and so on. Because I said I'm creating five classes, I create five of them, and since I started at 10, the last one ends at 60. Those are my classes.
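The range, class width and class limits from this example can be sketched as follows. The `round_to=10` parameter encodes the choice made here to round 9.2 up to the next ten, and the start point (the largest multiple of the width not above the lowest value, 12) reproduces the start of 10; both are assumptions about how to automate those choices, and the function names are hypothetical.

```python
import math


def class_width(lowest, highest, num_classes, round_to=10):
    """Range / number of classes, rounded UP to a multiple of round_to."""
    data_range = highest - lowest
    raw_width = data_range / num_classes
    return data_range, raw_width, math.ceil(raw_width / round_to) * round_to


def make_classes(start, width, num_classes):
    """Classes (lower, upper) meaning 'lower to less than upper'."""
    return [(start + i * width, start + (i + 1) * width) for i in range(num_classes)]


lowest, highest = 12, 58
data_range, raw_width, width = class_width(lowest, highest, 5)
start = (lowest // width) * width          # assumed rule: 12 -> 10
classes = make_classes(start, width, 5)

print(data_range, raw_width, width)        # 46 9.2 10
for lower, upper in classes:
    print(f"{lower} to less than {upper}")
```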
Now I take my classes and start counting the observations and assigning them to classes. The observations are all these values. Any value that is at least 10 but less than 20 goes into the first class: one, two, three, and the next value is 21, which is bigger than 20, so the count is three. Then I move to the next class, 20 to less than 30: one, two, three, four, five, six; the next value is not included, so the count is six. And so on until I complete the whole table. That column is my frequency. If I add all the frequencies, I get a total of 20, because there were 20 days.

The percentage is calculated as we did with the summary table: the frequency divided by the total, times 100. For the first class that gives 3 ÷ 20 = 15%, and you do the same for every class. So on three of the winter days, or 15% of the days, the temperature was at least 10 but less than 20 degrees. On two days, or 10% of the days, the temperature was between 50 and 60. That is frequency and percentage.

Then there is the cumulative frequency, which they gave you in your assignment. You need to be able to go from the frequency to the cumulative frequency and, from the cumulative frequency, back to the actual frequency. The first cumulative frequency equals the first frequency, three, because the class is 10 to less than 20, so only three days were below 20. For the second class, we take the three from the cumulative frequency column, not from the frequency column, because that is less confusing, and add the six from the second class interval: 3 + 6 = 9. You do the same for the next class: 9 + 5 = 14.
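The frequency, percentage and cumulative columns, and the step back from cumulative frequency to frequency, can be sketched as below. The raw temperatures are not listed in this transcript, so the sketch starts from the class frequencies: 3, 6 and 5 for the first three classes and 2 for the last, as stated; the frequency of 4 for 40 to less than 50 is not read out but follows from the total of 20 (20 - 14 - 2). `count_into_classes` is shown on a few made-up values to illustrate that a boundary value such as 20 goes into the class that starts at 20. Function names are hypothetical.

```python
def count_into_classes(values, classes):
    """Frequency per class, where each class (lower, upper) includes lower but not upper."""
    return [sum(1 for v in values if lower <= v < upper) for lower, upper in classes]


def cumulative(freqs):
    """Running total: each entry adds this class's frequency to the previous cumulative value."""
    totals, running = [], 0
    for f in freqs:
        running += f
        totals.append(running)
    return totals


def frequencies_from_cumulative(cum):
    """Undo the running total: frequency = this cumulative value minus the previous one."""
    return [c - prev for prev, c in zip([0] + cum[:-1], cum)]


classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
freqs = [3, 6, 5, 4, 2]
n = sum(freqs)  # 20 days
percentages = [f / n * 100 for f in freqs]
cum = cumulative(freqs)
cum_percentages = [c / n * 100 for c in cum]

for (lower, upper), f, p, c, cp in zip(classes, freqs, percentages, cum, cum_percentages):
    print(f"{lower} to <{upper}: f={f} ({p:.0f}%), cf={c} ({cp:.0f}%)")

print(frequencies_from_cumulative(cum))   # [3, 6, 5, 4, 2]
# Made-up values: 20 falls in "20 to less than 30", not in the first class.
print(count_into_classes([12, 19, 20, 21], classes))
```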
This is the last thing I'll do, then we are done. What does a cumulative frequency of 14 mean? It means that on 14 days the temperature was less than 40, so it includes the 10s, the 20s and the 30s, everything below 40. That is the cumulative frequency. So if the assignment asks you for the frequency of the class 30 to less than 40, you need to be able to say 14 minus 9, which equals 5. That is your frequency. For the cumulative percentage frequency, you take the cumulative frequency and divide it by the total: for example, 9 divided by 20 gives 0.45, or 45%. And that is your frequency distribution.

With that, I'm going to stop today's session. On Saturday we will continue where we left off with the histogram, and then do the rest: the ogive, the frequency polygon and the scatter plot. Then we'll do lots of other exercises. So let's recap in the one minute I'm going to give myself. We started with study unit one: what statistics is, the types of variables and the levels of measurement. Then we looked at how we summarize categorical data and numerical data. On Saturday we will start by recapping, revising and refreshing our memories, then continue where we left off and do lots and lots of exercises, so that you know what you need to do to complete assignment one. With that, if there are no questions, I am going to stop recording and leave. Unless there is a question about the slides, I'm going to get out of the slides now.

Will you share the recording once it's completed?
Yes, I will share the recording. I will give you questions on Saturday.