Our first stop in this module will be classifying data. So first off, statistics by definition is using data from a sample to make inferences about a population. When we say population, I'm talking about everything being considered or everything being talked about, like, for instance, everyone in the United States. When we say sample, I'm talking about some of the population. So for instance, maybe there was an election coming up and you wanted to poll all of the registered voters in the United States on who they would vote for for the next president. Well, it's kind of impossible to go out and ask every single registered voter in the United States. It could be expensive and take a lot of time. So what you have to do is pick a sample, only some of the population, to get that data and create predictions.
So there are several different types of data. Let's talk about some of the different ways we can categorize data. First, there is parameters versus statistics. A parameter is a number that represents or uses everything, every person or every subject, in the population. For instance, a class is asked to raise their hands. My class in this case is my population. It's what's being taken into consideration. They're asked if they like math and 40% raise their hands. Since everyone in the class, my whole population, was used, this 40% is a parameter. You have population, parameter; they both start with p.
A statistic is a numerical characteristic of a sample. In other words, only some of the subjects from the population were used to create the piece of data or the numeric description. For instance, voters are randomly asked, and that's your key there that you're dealing with a sample. You're taking all the voters, the population of voters, and you're randomly asking some of them who they will be voting for in an upcoming election. 40% plan to vote for Candidate A.
This makes this a statistic because only some of the population was used, meaning we had a sample, meaning the calculation we get from that is a statistic.
So classify each of the following as a parameter or statistic. In part A, you have a college algebra class. That's what's being taken into consideration here. And that class has a mean score of 82 on its first exam. So you have your numeric descriptor here, this 82. To find that 82, they used everybody in that class. They took everyone's exam score and found the average. Because everyone in the class was used, this 82 represents the entire population, so we have a parameter.
In part B, a man tracks his steps for one week of a year and says his average weekly step count is 39,000. So we have this 39,000. How was this 39,000 calculated? Well, this is the average weekly step count. Did we use all 52 weeks in the year? No, we only used one week. We used one week to create this 39,000, which we believe to be the average weekly step count. As a result, this 39,000 is a statistic. We only used one week of the 52 weeks in a year.
Now part C, a Honda dealership has 50 used cars. The mileage of 40 cars is added together and divided by 40. The result is 42,000. So here I have my Honda dealership with 50 used cars, and the result is 42,000, the average mileage. Well, did I use all 50 cars to find that calculation? No, I used 40 cars. So as a result, this is a statistic. I used a sample. I used a piece of the population. Only 40 out of 50 cars were used. So that's parameter versus statistic.
Next, quantitative versus categorical, or qualitative, data. We can take data and break it up into two categories. You have quantitative and you have qualitative. We'll define these in just a minute. If data are quantitative (data is a plural word), then they can be further classified as discrete or continuous. Those will also be defined on one of the future slides.
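Before moving on, the parameter versus statistic exercises above can be sketched in a few lines of Python. This is only an illustration: the lesson gives no raw mileages, so the data below are made up, and the helper name `classify` is hypothetical. The one idea it encodes is the rule from the examples: if every member of the population went into the number, it is a parameter; otherwise it is a statistic.

```python
import random
from statistics import mean

# Hypothetical mileages for the 50 used cars on the lot
# (invented data; the lesson only gives the final averages).
random.seed(0)
population = [random.randint(20_000, 65_000) for _ in range(50)]

# Parameter: the mean mileage computed from all 50 cars.
parameter = mean(population)

# Statistic: the mean mileage computed from only 40 of the 50 cars.
sample = random.sample(population, 40)
statistic = mean(sample)


def classify(n_used, n_population):
    """Label a numeric summary by how much of the population went into it."""
    return "parameter" if n_used == n_population else "statistic"


print(classify(len(population), len(population)))  # all 50 cars -> parameter
print(classify(len(sample), len(population)))      # 40 of 50 cars -> statistic
print(classify(1, 52))                             # 1 of 52 weeks -> statistic
```

The two means will usually be close but not equal, which is the whole point of treating the sample value as an estimate of the population value.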
So, qualitative variables, also called categorical variables or categorical data. If you look at the word qualitative, you see the beginning of the word looks almost like the word quality. And that's what qualitative variables are. You're looking at qualities, attributes, characteristics. For instance, someone's gender, someone's state of residence, someone's shirt color, anything that identifies somebody. It could even include their social security number. That's an identifier. That's an attribute or characteristic. It's a label given to something.
Next is quantitative. If you look at the beginning of the word, you see the word quantity. That's exactly what quantitative variables are. They represent a quantity. They are numerical measurements or counts of individuals. They are values that can be added or subtracted, and there is significance in that sum or difference. So, number of shoppers in a grocery store, length of a song, how much rainfall you have in a day, how many siblings someone has, the temperature, and goals scored by a soccer player. Those are all examples of quantitative variables. They are either counts or measurements.
So let's look at quantitative variables. I have my examples listed here from the previous slide. Let's explain what a discrete variable is, or what discrete data are. Discrete variables are countable values. So the only possible values for something could be one, two, three, four, five, a million, a million and one, and so forth. They're nice, neat whole numbers, basically, or natural numbers. You could have zero in there as well. But the point is you have these nice, neat, countable numbers. There are no fancy decimals or anything like that. So one, two, three, four. If I say, for instance, five, you know immediately the next number that would have to happen would be six. There's nothing in between.
Whereas continuous variables, or continuous data, are measured. So like someone's weight, height, speed, et cetera.
The key thing to think about here is whether your data can take on decimals to as many places as possible. So if I had data values like two and three, with a continuous variable there exist infinitely many other data values between two and three. You could have 2.25. You could have 2.573. You could have 2.921 and so forth. So between two and three, there exist all these different data values, infinitely many of them. And it could be one decimal place, two decimal places, three decimal places. For instance, if I'm measuring the height of a plant from the ground, you could say it's one foot. Well, I could say it's 1.12 feet because I want to be more precise with my measurement. So continuous variables take on decimals to as many places as could be possible, and in between any two data values there exist infinitely many others. So once again, these are measurements.
So number of shoppers in a grocery store, that's one, two, three, four, five, six, and so forth. That would be discrete. These are countable whole numbers. Length of a song: you could say a song is two and a half minutes, or 2.5 minutes, and I could say no, it's 2.512 minutes. I want to be more precise. So length of a song, that's a measurement. It's time, and that is continuous. The daily rainfall amounts, that is also continuous. It rained one and a half inches last night. No, it rained 1.572 inches last night. You can be as precise with that measurement as you like. Number of siblings, one, two, three, four, five, takes on values of nice, pretty whole numbers. Therefore, number of siblings is discrete. Temperature, well, 98.6 degrees Fahrenheit. I say 98.62. You could have decimals to as many places as you want when you're talking about temperature, so that's continuous. And then goals scored by a soccer player, that's one, two, three, four, five, so that's discrete.
Now here's a little hint for you as you go through your homework.
Even if you get a question correct, it's probably wise to regenerate the question and look at more examples so you can practice labeling them as discrete or continuous, because seeing examples is the best way to understand classifying data. So even if you get the question right, please regenerate the question. Look at more examples.
All right, so let's look at quantitative versus qualitative data. Here, if the data are quantitative, then I'm also going to classify them as discrete or continuous. So Jamie runs 10 laps around the track. If I'm at the starting line of the track and Jamie's running around, she crosses the starting line once, that's one lap. She crosses it again, that's two, then three, four. I'm counting every time she crosses that starting line. So first, this is quantitative because it's in numeric form. Not only that, it is discrete because we're dealing with nice, pretty whole numbers. What about the color of Jeff's car? Is that numeric? Nope. It's an attribute or characteristic, so that has to be qualitative. Next, the area of a piece of property is 5.5 acres. First off, we do know for sure that our data would be quantitative in this case. Is it discrete or continuous? Well, when you start seeing those decimals, there's a good possibility it's continuous. With 5.5, I could say 5.51. You could say 5.512. Because we can keep adding decimal places, we have continuous here: quantitative, continuous.
Next is levels of measurement. This is a totally different way of classifying data. These are literally like a hierarchy where the most basic or generic label is nominal, then ordinal, then interval, then ratio. It's an order; it's a hierarchy. Ratio is on the top and nominal is on the bottom, so to speak. The nominal level of measurement contains data that cannot be ordered, nor can it be used in calculations. So it's non-numeric. We're talking about names. We're talking about labels.
We're talking about categories. We're talking about survey responses of yes, no, or undecided. We're talking about social security numbers. Those are all nominal. They're names, labels, categories.
Ordinal means data can be ordered. Not only can it be ordered, but the order has significance to it. Differences between data values cannot be measured, so you can't subtract the data values. There's no meaning there. What this means is, if you had course grades of A, B, C, D, or F, you could order those, but you can't subtract them. That doesn't make any sort of sense. So this is an example of ordinal data. You can order the data and it makes sense. You could also order it the other way, as F, D, C, B, A.
Interval level of measurement. You have a definite ordering, but there's no zero starting point. So you could have negative numbers, or numbers that happened before zero. The differences between data values can be measured and they are meaningful. Years, for instance, are an example of an interval level of measurement. The reason is that there's no zero starting point: before our AD years, we had BC. So 1000, 2000, 1776, 1492, those are all interval level.
Next, the ratio level of measurement. It contains data with a zero starting point. The data can be ordered and differences are meaningful. So basically it's the same thing as the interval level, except the key thing here is we have a zero starting point. So like prices of college textbooks: you have a zero starting point where zero represents no cost, and you can compare data values. Like, a $100 book costs twice as much as a $50 book. So you're able to work with the data and make comparisons. There is a zero starting point. That's the key for ratio.
So remember, nominal is categories which cannot be ordered. Ordinal is categories that can be ordered.
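One way to keep the four levels straight is to write down which comparisons make sense at each one. The table below is only a sketch of the definitions just given; the names `MEANINGFUL` and `allows` are made up for this illustration and are not standard terminology or part of any library.

```python
# Which comparisons are meaningful at each level of measurement,
# following the definitions in the lesson (lowest level first).
MEANINGFUL = {
    "nominal":  {"order": False, "difference": False, "ratio": False},
    "ordinal":  {"order": True,  "difference": False, "ratio": False},
    "interval": {"order": True,  "difference": True,  "ratio": False},
    "ratio":    {"order": True,  "difference": True,  "ratio": True},
}


def allows(level, operation):
    """Return True if the operation is meaningful at the given level."""
    return MEANINGFUL[level][operation]


# Ratio level: textbook prices have a zero starting point,
# so "a $100 book costs twice as much as a $50 book" makes sense.
print(allows("ratio", "ratio"), 100 / 50)             # True 2.0

# Interval level: the difference between two years is meaningful...
print(allows("interval", "difference"), 2000 - 1776)  # True 224
# ...but a ratio of two years is not, because there is no zero starting point.
print(allows("interval", "ratio"))                    # False

# Ordinal level: letter grades can be ordered but not subtracted.
print(allows("ordinal", "order"), allows("ordinal", "difference"))  # True False
```

Each level keeps everything the level below it allows and adds one more thing, which is why the levels form a hierarchy.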
So nominal and ordinal kind of go together very closely with each other. The difference is ordering. Interval and ratio kind of go together. The difference between them is the zero starting point. Interval has no zero starting point and ratio has a zero starting point. Interval and ratio are typically numeric data values.
So, levels of measurement: classify each of the following as nominal, ordinal, interval or ratio. Weights of dogs. Well, these are numeric, therefore I'm dealing with interval or ratio. When you're looking at the weights of dogs, is there a zero starting point? Yes, there is. You can't have negative weights. Zero means something weighs nothing. So you have a zero starting point. As a result, this is ratio. Next, letter grades in a college algebra class. Well, what category did we already say letter grades fall under? Since they're non-numeric and they can be ordered, you have ordinal. Letter grades are ordinal. Years that are leap years. Well, what did we say about years? We said there is no zero starting point, which means interval.
Here is a useless trivia fact for you. I always thought leap years happened every four years, but 1900, 1800, and 1700 were not leap years. 2000 was and 1600 was. So leap years are pretty much every four years, except when a year is divisible by 100. In that case, once it's divided by 100, the result must also be divisible by four, which is the same as saying the year must be divisible by 400.
Now, color of cars in a parking lot. This isn't really numeric, it's just categories or labels, and you can't order it. So we're simply dealing with nominal. Nominal is the most basic.
So let's do a little extra example here of levels of measurement. What about time in minutes that has passed while waiting for food at a restaurant? Is there a zero starting point? Yes, you can't have negative time. So the data values are numeric, there is a zero starting point, so guess what? That's ratio. What about ranks of cars by a car magazine?
Well, this is actually going to be ordinal because there are orders and rankings. Length of a side of a rectangular pen. When you're dealing with length, there is always a zero starting point. It's numeric, so we're dealing with ratio again.
Mood levels of happy, all right, and sad. These are non-numeric, so instantly I'm either nominal or ordinal. Can you order them? Yeah, you could actually order happy, all right, and sad. You could order these responses by levels of happiness or levels of sadness. I could mix them up, and if I gave them to you, you would be able to put them in some sort of logical order. Therefore, these are ordinal.
What about social security numbers or barcodes? Those are numeric, but they're not interval or ratio, because if you subtract social security numbers, there's no significance there. For instance, you could be born after me but have a social security number that comes before mine. A social security number is just meant to be an ID for us. And that's the same thing for a barcode. It's an ID given to a product. So as a result, we're not interval or ratio. We're not ordinal either, because the order of social security numbers has no significance. Like I said, you could be born after me and have an earlier social security number. So this one is kind of tricky. The level of measurement is nominal.
So like I said, please, please, please, if you get the homework questions right, regenerate the exercise, try it out again, write down the examples, and write down what they are so you can study them and refer back to them, because that's the best way to understand this. So I hope you enjoyed classifying data. Thanks for watching.
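As a study aid, the questions asked in the practice problems above can be written as a small decision procedure. This is a sketch, not an official rule set: the function name `level_of_measurement` and its three yes/no inputs are made up for this illustration, and you still have to answer the three questions yourself for each example.

```python
def level_of_measurement(differences_meaningful, can_be_ordered, has_zero_start):
    """Pick a level from the three questions asked in the practice problems.

    differences_meaningful: does subtracting two values mean something?
    can_be_ordered:         does the order of the values have significance?
    has_zero_start:         is there a zero starting point?
    """
    if differences_meaningful:
        # Numeric data: interval or ratio, decided by the zero starting point.
        return "ratio" if has_zero_start else "interval"
    # Labels or categories: nominal or ordinal, decided by ordering.
    return "ordinal" if can_be_ordered else "nominal"


# Examples from the lesson, with the three answers filled in by hand.
examples = {
    "weights of dogs":         (True,  True,  True),
    "letter grades":           (False, True,  False),
    "years":                   (True,  True,  False),
    "color of cars":           (False, False, False),
    "minutes waiting":         (True,  True,  True),
    "mood (happy/all right/sad)": (False, True,  False),
    "social security numbers": (False, False, False),
}

for name, answers in examples.items():
    print(f"{name}: {level_of_measurement(*answers)}")
```

Social security numbers are the tricky case: they look numeric, but subtracting them means nothing and their order has no significance, so all three answers are no and the result is nominal.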