A number of fundamental ideas in probability and statistics emerged from 19th and 20th century Russia, and one of the most far-reaching is something known as Chebyshev's theorem. Chebyshev was a 19th century Russian mathematician who discovered and proved a very important and useful result. There are a number of ways to state and apply this result, but for our purposes we can use this version: given any set of data with population standard deviation sigma, and any value of k greater than or equal to 1, at least 1 minus 1 over k squared of the data values are within k standard deviations of the mean.

Let's see what this means. In general, we can pick a value of k and compute the fraction 1 minus 1 over k squared. Chebyshev's theorem then guarantees that at least this fraction of our data values will be within k standard deviations of the mean. We might have more data values in this interval; we will never have fewer. That is an absolute guarantee.

For example, suppose a set of test scores has a mean of 70 and a standard deviation of 5. What fraction of the scores are within two standard deviations of the mean, and what scores will be included in this interval? So we'll open up the packaging. By Chebyshev's theorem we have k equals 2, since we're asking about two standard deviations of the mean, and we have our standard deviation itself equal to 5. So 1 minus 1 over k squared is 1 minus 1 over 2 squared, or 3 fourths, and we know that at least 3 fourths of the scores are within two standard deviations of the mean. What scores are actually going to be in this interval? Well, since these are the scores within two standard deviations of the mean, the interval runs from two standard deviations below the mean, that's 70 minus 2 times 5, to two standard deviations above the mean, that's 70 plus 2 times 5. In other words, the scores from 60 to 80.
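The computation above is simple enough to sketch in a few lines of Python. This is just an illustration of the formula from the transcript; the function names are my own.

```python
# Chebyshev's theorem: for any k >= 1, at least 1 - 1/k^2 of the data
# values lie within k standard deviations of the mean.

def chebyshev_bound(k):
    """Minimum fraction of values within k standard deviations of the mean."""
    if k < 1:
        raise ValueError("Chebyshev's theorem requires k >= 1")
    return 1 - 1 / k**2

def chebyshev_interval(mean, sigma, k):
    """Interval guaranteed to contain at least chebyshev_bound(k) of the values."""
    return (mean - k * sigma, mean + k * sigma)

# Test scores with mean 70 and standard deviation 5, taking k = 2:
print(chebyshev_bound(2))            # 0.75, i.e. at least 3/4 of the scores
print(chebyshev_interval(70, 5, 2))  # (60, 80)
```

Note that the bound is a guarantee, not an estimate: the actual fraction inside the interval can be larger, never smaller.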
The remarkable thing about Chebyshev's theorem is that all we have to know about a data set is the mean and the standard deviation, and we can say quite a bit about how the values fall within that data set. For example, suppose a set of test scores has a mean of 72 and a standard deviation of 4. What can we say about the fraction of students who got a 60 or higher? Well, if you don't play, you can't win. Chebyshev's theorem starts with a set of data values with a population standard deviation and any value of k greater than or equal to 1, so let's just pick a value of k and see what happens. If k equals 2, then our computation gives 1 minus 1 over 2 squared, that's 3 fourths, so at least 3 fourths of the scores are within two standard deviations of the mean. These will be scores between 72 minus 2 standard deviations and 72 plus 2 standard deviations, that is, between 64 and 80. So we can say that at least 3 fourths of the scores are between 64 and 80. That tells us something, but it doesn't really give us information about 60 or higher, since our interval only goes down to 64.

So let's try a different value of k, say k equals 3. If k equals 3, then 1 minus 1 over 3 squared is 8 ninths, so at least 8 ninths of the scores are within three standard deviations of the mean. That's between 72 minus 3 standard deviations and 72 plus 3 standard deviations, or scores between 60 and 84. Now, since all of these scores are 60 or higher, and there may be some even higher scores, we can say that at least 8 ninths of the scores were at least 60.

One of the things this allows us to do is compare scores that come from different populations. For example, suppose one student gets an 80 and another student gets a 70. Getting the higher grade might not mean a lot if the class is easier. So maybe we know that the mean of the first class is 70 and the mean of the second class is 65. But even knowing that might not be enough.
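The trial-and-error choice of k in the 60-or-higher example can be made systematic: the right k is just the distance from the mean to the cutoff, measured in standard deviations. A small sketch of that idea, with hypothetical function names:

```python
# Find the k whose Chebyshev interval just reaches a given cutoff,
# then apply the 1 - 1/k^2 bound.

def k_for_cutoff(mean, sigma, cutoff):
    """Number of standard deviations between the mean and the cutoff."""
    return abs(mean - cutoff) / sigma

mean, sigma = 72, 4
k = k_for_cutoff(mean, sigma, 60)  # (72 - 60) / 4 = 3.0
bound = 1 - 1 / k**2               # 8/9

# At least 8/9 of the scores lie in [60, 84], and every score in that
# interval is 60 or higher, so at least 8/9 of the scores are at least 60.
print(k, bound)
```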
So let's say we know the standard deviations as well, 5 and 1. Now that we have all this information, we can ask, and get a meaningful answer to, the question of which student did better relative to their class. We might proceed as follows. In the first class, if k equals 2, then 1 minus 1 over 2 squared equals 0.75, so at least 75% of the exam scores must be within two standard deviations of the mean, or between 70 minus 2 times 5 and 70 plus 2 times 5, that is, between 60 and 80. This means that the first student, who got an 80, which is right at the top of the range, did better than at least 75% of the students. So you might say they did real good. But that would be wrong. You should say they did really well.

But how about the second student? In the second class, if k equals 5, then 1 minus 1 over 5 squared equals 0.96, so at least 96% of the exam scores must be within five standard deviations of the mean, or between 65 minus 5 times 1, that's 60, and 65 plus 5 times 1, that's 70. And since this student got the 70, they did better than at least 96% of the students. So the second student did better relative to their class than the first student.

The thing to notice here is that for large values of k, the fraction within k standard deviations of the mean is going to be very high. For example, if k equals 5, then 1 minus 1 over 5 squared is 0.96, so at least 96% of the data values must be within five standard deviations of the mean. If k equals 10, then 1 minus 1 over 10 squared is 0.99, so at least 99% of the data values must be within 10 standard deviations of the mean. So Chebyshev's theorem is an example of a very important idea: values far from the mean are very unusual. Now again, the remarkable thing about Chebyshev's theorem is that the only thing we have to know about our data set is the mean and the standard deviation. But at the same time, it doesn't give us a lot of very precise information.
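The two-student comparison above follows the same pattern both times, so it can be wrapped in one small helper. This is a sketch of the transcript's reasoning, not a standard library routine; it assumes the student's score sits at the top of its Chebyshev interval, i.e. above the mean.

```python
# Chebyshev lower bound on the fraction of a class a student outscored,
# assuming the student's score is above the class mean.

def fraction_outscored(score, mean, sigma):
    """Lower bound, via Chebyshev, on the fraction of scores at or below
    this student's score."""
    k = (score - mean) / sigma   # standard deviations above the mean
    if k < 1:
        raise ValueError("bound is only informative for k >= 1")
    return 1 - 1 / k**2

# Student 1: 80 in a class with mean 70 and sd 5 -> k = 2
# Student 2: 70 in a class with mean 65 and sd 1 -> k = 5
print(fraction_outscored(80, 70, 5))  # 0.75
print(fraction_outscored(70, 65, 1))  # 0.96
```

The second student's larger k, and hence larger bound, is exactly why they did better relative to their class despite the lower raw score.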
So we know that at least some percentage of the values lie within k standard deviations of the mean, but it would be nice to know a more exact amount. For that, we have to know a little bit more about the distribution of the data values.