Now, Markov's inequality requires only knowing the mean, but it's also very limited. It only gives us a bound on the probability that x is greater than or equal to some number, and if that number is less than or equal to the mean, the bound isn't actually very useful. But you get what you pay for: again, Markov's inequality only requires that we know the mean. So can we pay a little more and get a better bound? One possibility is to incorporate the standard deviation sigma.

Since the standard deviation measures how far, on average, the values are from the mean, let's consider the values that are more than k standard deviations from the mean. These are the values for which |x − μ| ≥ kσ. An absolute value is hard to work with; however, |x − μ| ≥ kσ is equivalent to (x − μ)² ≥ k²σ², and consequently the probability of the two events is the same. So if x is a random variable (no longer required to be non-negative) with pdf f(x), then P(|x − μ| ≥ kσ) = P((x − μ)² ≥ k²σ²).

Since (x − μ)² is non-negative, Markov's inequality applies: the probability that (x − μ)² is greater than or equal to k²σ² is at most the mean of (x − μ)² divided by k²σ². But the mean of (x − μ)² is the variance of the random variable, and the variance is the square of the standard deviation, σ². So Markov's inequality bounds our probability by σ²/(k²σ²), and since the numerator and denominator share a factor of σ², we can cancel it, leaving 1/k².
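The bound we just derived is easy to check numerically. Here is a quick Python sketch (not from the lecture): we draw a large sample from an exponential distribution with mean 1, which is an arbitrary illustrative choice, and compare the observed fraction of values more than k standard deviations from the mean against 1/k².

```python
import random
import statistics

# Sanity check of the bound P(|X - mu| >= k*sigma) <= 1/k^2 on
# simulated data. The exponential distribution (mean 1, sd 1) is
# just an example; the bound holds for any distribution.
random.seed(0)
xs = [random.expovariate(1.0) for _ in range(100_000)]
mu = statistics.fmean(xs)
sigma = statistics.pstdev(xs)

for k in (1.5, 2.0, 3.0):
    tail = sum(abs(x - mu) >= k * sigma for x in xs) / len(xs)
    print(f"k = {k}: observed tail {tail:.4f}, bound 1/k^2 = {1/k**2:.4f}")
```

For a skewed distribution like this one, the observed tail frequencies come in well under the bound, which previews the conclusion of the coin-flip example later: Chebyshev's guarantee is often quite loose.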
Consequently, we have Chebyshev's inequality: let x be a continuous random variable with mean μ and standard deviation σ; then the probability that x is more than k standard deviations away from the mean is less than or equal to 1/k².

We can interpret Chebyshev's inequality in several ways. First, the probability of being more than k standard deviations away from the mean is at most 1/k². But remember, under the frequentist interpretation, probabilities are frequencies, so we could also say that at most 1/k² of the values are more than k standard deviations from the mean. However, we often turn Chebyshev's inequality around to talk about the values within some number of standard deviations of the mean: if at most 1/k² of the values are more than k standard deviations from the mean, then at least the remainder, 1 − 1/k², of the values are within k standard deviations of the mean. In fact, this is how we introduced Chebyshev's theorem quite some time ago.

So suppose a grove of trees has a mean height of μ = 1.75 meters with a standard deviation of 0.15 meters. Let's find the probability a tree's height is more than 2.05 meters or less than 1.45 meters. For the application of Chebyshev's, it's important to note that our cutoffs, 1.45 and 2.05, are symmetric about the mean: 1.45 is 1.75 minus 2 standard deviations, while our upper cutoff, 2.05, is 1.75 plus 2 standard deviations. So Chebyshev's inequality applies with k = 2, and the probability of being more than 2 standard deviations away from the mean is at most 1/2², so the probability is at most one fourth. Equivalently, the probability a tree's height is within the interval between 1.45 and 2.05 meters is at least the rest, 1 minus one fourth, or three fourths.

So how accurate is Chebyshev's? Well, let's try it out in a case where we know the exact probability. Suppose we have a fair coin that we flip 10 times.
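The grove calculation can be checked in a few lines of Python. This is just a sketch of the arithmetic above, with the cutoffs taken to be symmetric about the mean as in the example:

```python
import math

# The grove example worked out numerically: mean height 1.75 m,
# standard deviation 0.15 m, cutoffs 1.45 m and 2.05 m.
mu, sigma = 1.75, 0.15
lower, upper = 1.45, 2.05

# Both cutoffs sit the same number of standard deviations from the mean,
# so Chebyshev's inequality applies with that common k.
k_upper = (upper - mu) / sigma
k_lower = (mu - lower) / sigma
assert math.isclose(k_upper, k_lower)  # symmetric about the mean, k = 2

k = k_upper
outside = 1 / k**2       # P(|X - mu| >= 2*sigma) <= 1/4
inside = 1 - outside     # at least 3/4 of heights within 2 sigma
print(f"k = {k:.0f}: outside <= {outside:.2f}, inside >= {inside:.2f}")
```

The symmetry check matters: Chebyshev's inequality bounds a two-sided tail, so it only applies directly when both cutoffs are the same number of standard deviations from the mean.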
We'll use Chebyshev's to estimate the probability that the coin lands heads fewer than 3 or more than 7 times. Note that this is a binomial experiment, and for a binomial experiment with n trials and probability of success p, the mean is np and the standard deviation is the square root of np(1 − p). For this experiment we find a mean of 5 and a standard deviation of about 1.58.

Our outcomes, fewer than 3 or more than 7 heads, suggest we take our cutoffs from 3 and 7: 7 is 5 plus 1.26 standard deviations, and 3 is 5 minus 1.26 standard deviations, which suggests k = 1.26. Or does it? Another way of expressing these outcomes is 2 or fewer and 8 or more. These describe the same outcomes, and 8 is 5 plus 1.90 standard deviations while 2 is 5 minus 1.90 standard deviations. So is k equal to 1.26 or 1.90? The correct answer is neither. Remember, our statement of Chebyshev's applies to a continuous distribution. While the binomial itself isn't continuous, we'll treat it as if it had a continuous probability density function, and in that case we have to consider the outcomes that would round to fewer than 3 or more than 7 heads. If x is the number of heads, those outcomes correspond to x ≥ 7.5 or x ≤ 2.5, and so we use k = 2.5/1.58, which is about 1.58.

Chebyshev's then guarantees the probability of being outside the interval is at most 1/k², or about 0.40. Now, since this is a binomial probability, we can find the exact binomial probability, and it's 0.1094. So Chebyshev's is a rather generous overestimate of the actual probability.
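Both the Chebyshev bound and the exact binomial tail from this example can be reproduced with the Python standard library. A sketch of the comparison:

```python
import math

# Coin-flip example: n = 10 fair flips. Estimate P(X <= 2 or X >= 8),
# the outcomes that round to "fewer than 3 or more than 7 heads" after
# the continuity correction, then compare with the exact value.
n, p = 10, 0.5
mu = n * p                          # 5.0
sigma = math.sqrt(n * p * (1 - p))  # about 1.58

k = (7.5 - mu) / sigma              # continuity-corrected cutoff, about 1.58
chebyshev_bound = 1 / k**2          # 1/2.5 = 0.4

# Exact binomial tail: P(X <= 2) + P(X >= 8).
def pmf(x):
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

exact = sum(pmf(x) for x in range(0, 3)) + sum(pmf(x) for x in range(8, 11))

print(f"Chebyshev bound:   {chebyshev_bound:.4f}")
print(f"Exact probability: {exact:.4f}")
```

The exact tail is 112/1024 ≈ 0.1094, well under the Chebyshev bound of 0.40, confirming how generous the bound is here.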