So if the goal of statistics is to go from information about the sample to information about the population, it's helpful if we're not overwhelmed by the amount of data in the sample itself. What we'd like is some way of summarizing the information in the sample. This leads to what are known as summary statistics, and the first group of those is known as the measures of center.

One way of looking at the basic problem is the following. Suppose we have a set of data D. This set could include thousands or even millions of data values. We should treat each value as a unique individual, but in practice we can't, so we pick a representative. And this leads us to one useful definition: a statistic is a value that represents some feature of a set of data.

So how can we do that? Well, let's work with some actual data. Suppose ten students take a quiz, and the quiz scores are 4, 5, 8, 9, 9, 7, 10, 9, 2, and 8. We want to pick a representative value, so the important question is: what shall we pick as a representative?

The least obvious answer is that we'll apply some arcane mathematical formula to these numbers and obtain the number 7.1. But how is 7.1 representative? It's not even a data value. That would be like picking a multi-millionaire to represent you in the government. Who would do something like that?

So there are much better answers. For example, we might pick 9. Since more students got 9 than any other score, 9 is a representative value. Another possibility: maybe we could pick 2 as a representative value, since every student got at least 2. By a similar argument, we might take 10 as a representative value, since every student got 10 or less.

This analysis suggests three important statistics, which we might call the three M's. First, there's the mode. Given a set of data values D, the mode is the most common data value.
The mode is probably the most natural and obvious of the measures of center: if the data values held an election, the mode would be the winner of that election.

We might also talk about the minimum. Given a set of data values D, the minimum is the least data value. And likewise, we could talk about the maximum. Given a set of data values D, the maximum is the, wait for it, greatest data value. The maximum and the minimum together suggest another measure of center, which is known as the mid-range. The mid-range is the value midway between the minimum and maximum values.

For example, let's try to find the mode, minimum, maximum, and mid-range for our set of data values. Definitions are the whole of mathematics, so let's pull in our definitions for the mode, the minimum, the maximum, and the mid-range. If we look at our set of data values, we see that 8 appears more often than any other value. It's the mode. The least of the data values is 1. That's the minimum. The greatest among the data values is 10. That's the maximum. And the mid-range will be the midpoint of 1 and 10. That's (1 + 10)/2, or 5.5. This is the mid-range.

Now there are a few special cases we have to deal with. When we find the mode, we're looking for the most common of the data values. But every now and then we'll run into a data set like this one. If we look at this data set carefully, we see that both 2 and 3 are equally common, and they appear more often than any of the other data values. So which of these two is most common? In a case like this, we say the set is bimodal, and that the modes are 2 and 3.

What about a set like this? Here we see that several values are equally common, and appear more often than the others. Typically we'll say a set like this has no mode. In some sense there are too many things competing for most common, and nobody wins.
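The definitions above translate almost word for word into code. Here is a minimal Python sketch, not a definitive implementation. The function names `modes`, `midrange`, and `describe_mode` are made up for illustration, and the rule "three or more tied values means no mode" is an assumption about where "several" begins. The data used is the list of ten quiz scores from the start of the lesson, which need not be the data set behind the worked example above; the two tie examples are hypothetical lists, since the sets shown in the lecture aren't reproduced here.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest count, smallest first."""
    counts = Counter(values)
    top = max(counts.values())  # assumes values is non-empty
    return sorted(v for v, c in counts.items() if c == top)


def midrange(values):
    """Return the value midway between the minimum and the maximum."""
    return (min(values) + max(values)) / 2


def describe_mode(values):
    """Apply the lecture's conventions for ties.

    Assumption: exactly two tied values is 'bimodal', and three or
    more tied values counts as 'several', meaning no mode.
    """
    tied = modes(values)
    if len(tied) == 1:
        return f"mode is {tied[0]}"
    if len(tied) == 2:
        return f"bimodal, modes are {tied[0]} and {tied[1]}"
    return "no mode"


# The ten quiz scores from the start of the lesson.
quiz = [4, 5, 8, 9, 9, 7, 10, 9, 2, 8]
print(modes(quiz), min(quiz), max(quiz), midrange(quiz))

# Hypothetical data sets illustrating the two special cases.
bimodal_example = [1, 2, 2, 3, 3, 4]
crowded_example = [1, 1, 2, 2, 3, 3, 4]
print(describe_mode(bimodal_example))
print(describe_mode(crowded_example))
```

For the quiz scores this gives a mode of 9, a minimum of 2, a maximum of 10, and a mid-range of 6.0. Returning a list of tied values from `modes`, rather than a single number, keeps the bimodal and no-mode decisions separate from the counting itself.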
Now if we accept an answer like "no mode exists," then the mode can be calculated for any data set, because all we have to do is find out which data value is the most common. But the minimum, maximum, and mid-range can't always be computed, at least not in any meaningful way.

For example, suppose we take a survey of favorite colors, and we code our responses as 1 being the color blue, 2 being red, 3 being green, 4 being puce (no idea what color that is), and 5 being none. And so our responses might look like this. Now we can look at this data and say, well, our minimum is 1, the maximum is 5, and the mid-range is 3. But these values have no meaning, because the data is nominal. The numbers 1, 2, 3, 4, and 5 are just names that mean blue, red, green, puce, and none. There's no order to the colors, so a minimum or maximum color doesn't make any sense. And since there's no order, the middle color doesn't make any sense either.
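One way to see why the minimum and maximum mean nothing for nominal data is to recode the same responses with a different, equally arbitrary numbering. The Python sketch below does that. The list `responses` is made up for illustration (the survey data shown in the lecture isn't reproduced here), `coding_a` is the numbering described above, and `coding_b` is a hypothetical alternative; the helper names are also just illustrative.

```python
from collections import Counter

# Hypothetical survey responses, stored as labels rather than codes.
responses = ["blue", "red", "blue", "green", "none", "puce", "blue", "red"]

# The coding from the lecture, and a second arbitrary coding of the same labels.
coding_a = {"blue": 1, "red": 2, "green": 3, "puce": 4, "none": 5}
coding_b = {"none": 1, "puce": 2, "blue": 3, "green": 4, "red": 5}


def mode_label(labels):
    """Return the most common label; counting needs no numeric coding."""
    return Counter(labels).most_common(1)[0][0]


def decoded_extremes(labels, coding):
    """Return the labels that receive the smallest and largest codes."""
    lowest = min(labels, key=coding.get)
    highest = max(labels, key=coding.get)
    return lowest, highest


print(mode_label(responses))
print(decoded_extremes(responses, coding_a))
print(decoded_extremes(responses, coding_b))
```

With this made-up data the mode is blue no matter how the colors are numbered, while the "minimum color" is blue under one coding and none under the other. The mode survives relabelling; the minimum, maximum, and mid-range do not.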