Hello, we will now discuss frequency tables. A frequency table, or frequency distribution as it is also known, groups data into several categories referred to as classes (sometimes the word bins is used instead of classes) by listing all of the categories along with the number, or frequency, of data values in each of them. So for instance, if you're a teacher with a bunch of exam scores ranging anywhere from 30 to 100 percent, you don't just list every single exam score and how many times it occurred. No, you group the data into classes or bins, which is the proper terminology to use.
So, the anatomy of a frequency table. What I have displayed is a frequency table that shows the ages of lottery winners: 20 to 29, 30 to 39, and so forth. The left column represents my classes or bins, and the right column represents the frequencies. As you can see, the 20 to 29 age range has a frequency of two. For 70 to 79, there's a frequency of nine.
Lower class limits are the smallest numbers belonging to each class. So if you look at your classes or bins and take the smallest number for each category, 20, 30, 40, 50, 60, 70, these are called the lower class limits. In contrast, the upper class limits are the largest numbers that can belong to each class. So 29, 39, 49, 59, 69, 79, these right-hand numbers, are called the upper class limits.
The class width is the difference between the lower class limits of two consecutive classes, or between the upper class limits of two consecutive classes. Between 20 and 30 there's a difference of 10, between 30 and 40 there's a difference of 10, between 40 and 50 there's a difference of 10. So the class width is 10 in this case. Remember, you look at the lower class limits of two consecutive classes and subtract. It is not the upper limit minus the lower limit of the same class; 29 minus 20 would give you 9, not 10.
Class midpoints are found like this: you literally take the lower class limit plus the upper class limit and divide by two.
That is, lower class limit plus upper class limit, divided by two. You're averaging the limits of your class or bin. So 20 plus 29 over two, that's where the 24.5 came from. The 34.5 came from 30 plus 39 divided by two, and so forth. Those are class midpoints.
Class boundaries. See, the issue with frequency tables, depending on how you make them, because different textbooks have different preferences, is this: notice that my first class cuts off at 29 and my next class starts at 30. Well, when we're talking about ages, using whole years is fine, but what if we were dealing with data where you had something like a 29.5 in there? What class would 29.5 belong to? That's why we have class boundaries. To get the class boundaries, you literally take the upper class limit of one class and the lower class limit of the next class, and you average them. So 29 and 30, you average them to get 29.5. 39 and 40, you average them to get 39.5. That's the halfway point between the upper class limit of one class and the lower class limit of the next class.
Here is another way to find your boundaries. For whole-number data, you can subtract .5 from each lower class limit and add .5 to each upper class limit. That would still get you the same boundaries listed here. For data with one decimal place, you would add or subtract .05 instead. So however many decimal places your data go out to, you go one more decimal place and put a five at the end. For no decimal places, it's .5. For one decimal place, it's .05, added to or subtracted from each of the class limits.
So it's not really a big deal to master finding class boundaries. The important thing is that you're able to create your classes. Like I said, people have different preferences for how to create frequency tables. It depends on who you're dealing with.
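To tie those definitions together, here is a small Python sketch that recomputes the class width, midpoints, and boundaries for the lottery-winner age classes from the table. The function names and the `decimals` parameter are my own for illustration, not part of the lecture, and the sketch assumes every class has the same width.

```python
# Classes from the lottery-winner age table: (lower class limit, upper class limit).
classes = [(20, 29), (30, 39), (40, 49), (50, 59), (60, 69), (70, 79)]


def class_width(classes):
    """Difference between the lower class limits of two consecutive classes."""
    return classes[1][0] - classes[0][0]


def class_midpoints(classes):
    """Average of the lower and upper class limit of each class."""
    return [(lower + upper) / 2 for lower, upper in classes]


def class_boundaries(classes, decimals=0):
    """Boundaries for data recorded to `decimals` decimal places (assumed parameter).

    Whole-number data uses a half-step of 0.5, one decimal place uses 0.05, and so on.
    """
    half_step = 0.5 * 10 ** (-decimals)
    boundaries = [classes[0][0] - half_step]  # below the first class
    boundaries += [upper + half_step for _, upper in classes]  # above each class
    return boundaries


print(class_width(classes))       # 10
print(class_midpoints(classes))   # [24.5, 34.5, 44.5, 54.5, 64.5, 74.5]
print(class_boundaries(classes))  # [19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5]
```

Notice the boundaries list has one more entry than there are classes, because each class needs a boundary below it and above it.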
So with frequency tables, sometimes they'll use what is called interval notation for the classes. For instance, for the class 20 to 29, they may write 20, 29 enclosed within brackets: [20, 29]. You've likely seen this before. It's the same thing as saying anything from 20 to 29, including 20 and 29 themselves, is class 1. For 30 to 39, you have 30, 39 enclosed within brackets, [30, 39]. It means the exact same thing.
Here is another way we might do this, because we have to deal with what happens between the classes. Between 29 and 30, what do you do with that 29.5, if that was a possible data value? Well, one way is to still keep interval notation, but instead of 20 to 29, write the first class as 20 to 30 and the next class as 30 to 40. As you can see, the upper class limit of the first class and the lower class limit of the second class are now identical. So if you have an age of 30, where is it supposed to go? To handle this, they put a bracket around the lower value in each class, which means you include that value, so 20 is in the first class, and they put a parenthesis around the upper class limit of each class: [20, 30), [30, 40), and so on. What that means is that 30 does not belong to the first class, but anything smaller than 30 and greater than or equal to 20 does. For class 2, the parenthesis by the 40 means 40 does not belong to that class, but everything smaller than it, down to and including 30, does. 40 actually belongs to the third class, which is why it has the bracket there, and so forth. So these are just the various ways these frequency tables can be created.
Now, relative frequency tables. A relative frequency table includes the same class limits and the same frequency column, but it also calculates the relative frequency, which is the fraction of the data values that fall in each class. So it's a frequency written as a percentage, a fraction, or a decimal.
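As a rough sketch of that definition in Python, here is the arithmetic for the first three classes of the exam-score table used in this lecture (frequencies 1, 4, and 7 out of 35 students). The remaining frequencies are not read out, so they are left out here, and the total of 35 is passed in separately. The function names are my own, purely for illustration.

```python
def relative_frequencies(frequencies, total):
    """Each class frequency divided by the total number of data values."""
    return [f / total for f in frequencies]


def running_totals(frequencies):
    """Cumulative frequency: the current class plus every class before it."""
    totals = []
    running = 0
    for f in frequencies:
        running += f
        totals.append(running)
    return totals


first_three = [1, 4, 7]  # frequencies of classes 1 to 3 in the exam-score table
total_students = 35      # total number of students stated in the lecture

relative = relative_frequencies(first_three, total_students)
cumulative = relative_frequencies(running_totals(first_three), total_students)

for f, rel, cum in zip(first_three, relative, cumulative):
    # The .1% format multiplies by 100 and rounds to one decimal place.
    print(f"{f:>2}  {rel:6.1%}  {cum:6.1%}")
```

The relative column comes out as 2.9%, 11.4%, and 20.0%. The cumulative column is computed from the running counts (1, 5, and 12 out of 35) rather than by adding already-rounded percentages, which helps avoid small rounding drift.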
So for instance, I'm dealing with exam scores. I have my classes or bins listed in the first column, then I have my frequencies listed in the second column. So how many students do I have here? Looks like I have 35 students, right? Alright, so to find the relative frequency, you look at class 1, which has a frequency of 1. So 1 out of 35 of the exam scores belongs to class 1. 1 out of 35 is 0.029, which is 2.9%. Second class, 4 out of 35 is 0.114. Move the decimal to the right two spots, that's 11.4%. Class 3, you have 7 out of 35, that's 0.2, so that's 20%, and so forth.
And this last column is called a cumulative frequency, given here as a percentage. For class 1, your cumulative frequency is 2.9%. That is everything in class 1, which is the only class we have so far. When we get to the cumulative frequency for class 2, it includes the total frequencies for class 1 and class 2. Everything in class 2 and before it. When we get to the cumulative frequency for class 3, we're adding up classes 1, 2, and 3. That's how cumulative frequency works. It adds up the current class and everything before it. So for class 4, the cumulative frequency includes everything from classes 1, 2, 3, and 4. And notice at class 5, you finally get to 100%. Depending on how you rounded, your final cumulative frequency might be 99% or it could be 101%. Sometimes rounding errors cause us to get something slightly different from 100%, just to warn you.
So, use the relative frequency distribution of exam scores to find the indicated information. What are the lower class limits? Remember, that is the smallest number in every single class. We have 5 classes, so we need 5 lower class limits. We have 50, 60, 70, 80, 90. What about the upper class limits? The bigger number in each class. So we have 59, 69, 79, 89, and 99. Then the class width.
Remember that is the difference between the lower class limits or the upper class limits of 2 consecutive classes. What is the difference between 50 and 60? 60 minus 50 is 10, 70 minus 60 is 10, and so on. So 10 is the class width.
Class midpoints. Remember how we said to find these? You add up the lower class limit and upper class limit of each class and you divide by 2. So for class 1, 50 plus 59 over 2, we get 54.5. Here is a little hint. Instead of averaging the lower and upper class limits for every single class, you can just add 10 to 54.5 to get 64.5, then you get 74.5, 84.5, and 94.5. You literally just add the class width. And there are your 5 class midpoints.
Alright, class boundaries. Remember I said if you have whole-number data you can literally add or subtract .5 from your class limits. So for instance, you take 50 and subtract .5 and you get 49.5. So my class boundaries will be 49.5, add 10, 59.5, add 10, 69.5, add 10, 79.5, add 10, 89.5, add 10, 99.5. Notice that these are the boundaries for my classes that begin at 50 and end at 99. You have a lower boundary of 49.5 for the very first class, and you cap out at 99.5 for the last class. These boundaries, well, guess what, they're your class boundaries. They separate each of your classes, both below and above.
Next, some tips for collecting data, kind of some fun information. Reported data is where people are asked to give data directly to those collecting it. So I ask you what your weight is and you tell me. When data is measured, those collecting the data actually measure it themselves. So instead of me asking for your weight, I have a scale and I'm like, step on the scale, let me get your weight. Of course, you should never really ask anyone what their weight is. You might get slapped, potentially.
So I have here the weights of 100 people that were collected, and the results for the last digit of their weight are shown in the table. When the last digit of the weight was a 0, I had a frequency of 46. A last digit of 1 had a frequency of 1. When the last digit of the weight was a 5, that's 30 people. So that's how all that was distributed. Remember, that's the last digit of the weight. So notice 0 and 5 had really high frequencies. Do you think all those people truly had weights that ended in 0 and 5? Do you think this data was actually measured, or was it just reported? I would say it was reported. I don't think I had a scale and was weighing all 100 people, because so many people had a 0 or a 5. When you ask someone their weight, they will typically give you some sort of nice, pretty number, so there are lots of 0s and 5s for last digits. So I'd say the data is reported, because a lot of people gave a weight that ended in a 0 or 5, since we are just so used to nice, pretty numbers. I may weigh, or someone may weigh, 160 pounds or 163 pounds, but when someone asks me my weight, I could tell them 165 or tell them 160, or I could totally lie and tell them a totally different number. So is the data accurate? No, the data is not accurate in this case.
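If you wanted to run the same last-digit check on your own data, a minimal Python sketch might look like the following. The function names are my own and the short `sample_weights` list is hypothetical, just to show the tally. The counts for digits 0, 1, and 5 are the ones read out from the table above; the other digits are not given, so they are left out.

```python
from collections import Counter


def last_digit_counts(weights):
    """Tally the last digit of each whole-number weight."""
    return Counter(int(w) % 10 for w in weights)


def share_ending_in_0_or_5(counts, total):
    """Fraction of all weights whose last digit is 0 or 5."""
    return (counts.get(0, 0) + counts.get(5, 0)) / total


# Hypothetical weights, only to show how the tally works.
sample_weights = [160, 165, 163, 150]
print(last_digit_counts(sample_weights))

# Counts stated in the lecture for 100 people (other digits not read out).
table_counts = {0: 46, 1: 1, 5: 30}
print(share_ending_in_0_or_5(table_counts, 100))  # 0.76
```

If every last digit were equally likely, which is roughly what you might expect from weights read off a scale, only about 2 out of 10, or 20%, would end in 0 or 5. A share of 76% is far above that, which is the reason for saying the data looks reported rather than measured.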