In order to learn mathematics effectively, it's vitally important to understand the terms of mathematics. Unfortunately, mathematicians are terrible at coming up with new names for things, and so we tend to recycle old names. But we use these names in a very specific way. So remember, definitions are the whole of mathematics. All else is commentary.

In statistics, one of the fundamental definitions is that of data. Data values consist of information gathered from observations. Now there's an unfortunate tendency among human beings. The more important something is, the more ways we have to talk about it. And what this often means is that we'll use different terms to refer to the same thing. So instead of data, we might talk about measurements. A measurement consists of information gathered from observations. And you'll notice that data and measurements are both defined the same way, and to a mathematician, that means these are essentially the same thing.

Some examples. You might record someone's ethnicity, Lithuanian. You might measure someone's height, 6 feet 4 inches. You might find out their favorite type of music, gamelan. And you might determine their academic rank, 57th in class.

Two other important ideas are those of population and sample. The set of all data values of interest forms the population. A portion of the data values of interest forms the sample. For example, if you're interested in the heights of all students at a university, the heights of all students at the university are the population, the heights of all students in a class are a sample, the heights of all students in the quad are a different sample, and if you want to take it to extremes, the height of a single student is a very small sample.

This notion of sample and population allows us to describe the goal of statistics: given the data values in a sample, infer the data values in the population.
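To make the population and sample idea concrete, here is a minimal Python sketch. The student names, heights, class roster, and quad list are all made up for illustration, and the mean is just one possible summary you might compare; nothing here comes from real data.

```python
# Hypothetical heights in inches for every student at a tiny made-up university.
# This whole dictionary plays the role of the population.
population = {"Ana": 64, "Ben": 70, "Chen": 68, "Dee": 76, "Eli": 62, "Fay": 66}

# Made-up groups of students: one class, and whoever is in the quad right now.
class_roster = ["Ana", "Ben", "Chen"]
quad = ["Dee", "Eli"]


def sample_of(names):
    """Return the heights of the named students: a portion of the population."""
    return [population[name] for name in names]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


class_sample = sample_of(class_roster)   # one sample
quad_sample = sample_of(quad)            # a different sample
single_sample = sample_of(["Fay"])       # a very small sample

# Compare a summary of the sample with the same summary of the population.
population_mean = mean(list(population.values()))
class_mean = mean(class_sample)

print("population mean:", round(population_mean, 2))
print("class sample mean:", round(class_mean, 2))
```

Each sample is built only from values that are already in the population, which is exactly what "a portion of the data values of interest" means.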
So if we're interested in the heights of all students at a university, you could just get the heights of all the students. But that takes a lot of work. So we could do the following. Given the height of one student, what can you say about the heights of all students? Or given the heights of all students in a particular class, what can you say about the heights of all students? And in both cases, we have the data from a sample, either one student or all students in a particular class, and we want to infer information about all students.

Now data is typically classified into four different types of values. First, there's nominal. This is a named value. For example, red, Republican, Russian, these are all names. Next, there's ordinal. These are nominal values where the order matters. For example, first place, least liked, neither approve nor disapprove. Beyond that, we have interval. These are ordinal values where the difference between the values is meaningful. In particular, the same difference will have the same meaning. We'll take a look at what that means later on. For example, the temperature in degrees Celsius. And finally, ratio. These are interval values where the ratio between values is meaningful. For example, weight.

Now it's easy to confuse ratio and interval data, and the important idea here is that in ratio data, zero remains zero if you change the units. In interval data, zero could change if the units change. So a temperature of zero depends on whether you're speaking Celsius or Fahrenheit, but a weight of zero is zero whether it's in pounds, ounces, or kilotons.

For example, suppose you record someone's nationality. What type of data is this? For whatever reason, it's easier for human beings to figure out when something is not than when something is. What that means is that it's easiest for us to work our way backwards through the types of data and eliminate the types that it is not. So let's consider: is this ratio data?
In order for this to be ratio data, we have to be able to form a ratio between nationalities, and this doesn't make sense. You can't divide one nationality by another. Could this be interval data? In interval data, we have to be able to talk about the difference between values, and that means we have to be able to find one nationality minus another. But we can't do that. You can't subtract nationalities. In order to be ordinal, we have to be able to put things in some order that matters. But again, if we have nationalities, we can't really put them in any reasonable order. In particular, we can't get people to agree on what the order of the countries should be. So that leaves nominal, and the question is, are the nationalities names? And in fact, nationalities are named values, and so the nationality of a person is nominal data.

Suppose we record their height. So again, we'll go backwards through our data types. Let's see if this is ratio data. So the key question here is whether the ratio between values is meaningful, and the answer here appears to be yes. If someone is 6 feet tall, and another person is 5 feet tall, the ratio of their heights, 6 to 5, is a meaningful value. So height is ratio data.

How about something a little bit more complex, like someone's academic rank? Is the data ratio? Not really. If someone is 15th in class, and another person is 30th in class, the ratio of 15 to 30 doesn't really tell us anything useful. Here it seems like there might be an argument made that this is interval data, but the important thing to remember about interval data is that the same difference has the same meaning. So your academic rank is based on your GPA, but there might only be a small difference in GPA between the person who is 1st and the person who is 2nd, but a larger difference between the 2nd and 3rd ranking students, or possibly even a smaller difference between the 2nd and 3rd ranking students. So here the same difference, one place in the ranking, is going to have different meanings, and that rules out interval data. How about ordinal data?
Well, in ordinal data the order matters, and yes, this does seem to be ordinal data because there is a definite order. The person who is 15th is better academically than the person who is 16th.
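The backward elimination used in these examples can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function `classify` and its three yes/no arguments are hypothetical names invented here, and the answers passed in are the judgments made above, not something the code works out for itself. The two unit conversions at the end illustrate the earlier point that zero stays zero for ratio data but can move for interval data.

```python
def classify(order_matters, differences_meaningful, ratios_meaningful):
    """Work backward from ratio to nominal and return the first type that fits.

    Each type builds on the one before it, so ratio needs all three answers
    to be yes, interval needs the first two, and ordinal needs only the first.
    """
    if order_matters and differences_meaningful and ratios_meaningful:
        return "ratio"
    if order_matters and differences_meaningful:
        return "interval"
    if order_matters:
        return "ordinal"
    return "nominal"


# The judgments made in the discussion above, written as yes/no answers.
examples = {
    "nationality": classify(False, False, False),
    "height": classify(True, True, True),
    "academic rank": classify(True, False, False),
    "temperature in Celsius": classify(True, True, False),
}

for name, data_type in examples.items():
    print(name, "->", data_type)


def celsius_to_fahrenheit(c):
    """Standard conversion: zero Celsius does not map to zero Fahrenheit."""
    return c * 9 / 5 + 32


def pounds_to_ounces(lb):
    """Standard conversion: zero pounds is still zero ounces."""
    return lb * 16


print(celsius_to_fahrenheit(0))  # interval data: zero moves when units change
print(pounds_to_ounces(0))       # ratio data: zero stays zero
```

The hard part is still the human judgment behind each yes or no; the function only records the order in which the questions get asked.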