For reasons we'll talk about in a little bit, the normal distribution is one of the most important concepts in probability and statistics. The normal distribution is a probability density function, which means that it conforms to certain properties, and the important ones are sometimes summarized in what's called the 1-2-3 rule. First of all, for any normal distribution curve, the area under the curve is equal to 1; that comes from it being a probability density function. It's also symmetric about the mean, mu. The 1-2-3 rule is a summary of three further properties. If I look at the area within one standard deviation of the mean, it works out to be about 0.68; if you want to think about it another way, 68% of the area under the curve is within one standard deviation of the mean. Likewise, within two standard deviations of the mean, we have an area of about 0.95: 95% of the area under the curve is found within two standard deviations. And finally, if we go out to three standard deviations, about 99.7% of the area under the curve can be found within three standard deviations of the mean.
Currently, the vogue is to refer to this set of observations as the 1-2-3 rule. That's actually not a very helpful name, because 1, 2, 3 doesn't help you remember what's important: 68%, 95%, 99.7%. In older books, they might call it the 68-95-99.7 rule.
Let's look at some implications of this. Suppose I have a normally distributed random variable X with a mean of 20 and a standard deviation of 5, and a rather pedestrian problem I can ask at this point is: what's the probability that my random variable falls between 10 and 30? Now, there are a number of ways we can approach this, but the thing you want to start off with is drawing the normal distribution curve. And here's a useful feature.
All normal distribution curves look like this. The exact shape really doesn't matter, because we're not going to infer anything directly off the curve; this is mainly a way to keep all of our values straight. The one important thing that we do care about is that the mean is located right here at the peak of the curve. So here's 20, and what that means is that the values 10 and 30 are respectively to the left and to the right of the peak. That's going to become important in some contexts.
So the next thing we want to do is draw our boundary at X equals 10. Again, it doesn't really matter where that boundary is, other than that we want it to be in the correct position relative to the mean. Then we draw in our second boundary at X equals 30. The region of interest is the region between X equals 10 and X equals 30. That's the probability we're looking at: the probability that X is between those two numbers. And it's really convenient if we just shade the region that we care about.
Now at this point, we'll make a useful observation. If we take a look at our lower value, 10, we note that it's the mean, 20, minus two standard deviations (each of size 5). So the lower value is two standard deviations below the mean, and the upper value, 30, is our mean plus two standard deviations, so it is two standard deviations above the mean. In this particular case, then, the shaded area represents the region of the curve that is within two standard deviations of the mean. And that's good, because the 1-2-3 rule tells me that the area under the curve within two standard deviations of the mean is 0.95. And because this is a probability density function, the area is the probability. And so we can translate that into a statement.
The probability that our random variable X is between 10 and 30, which is to say within two standard deviations of the mean, is 0.95. Again, we often restate that as 95%.
To some extent, every problem involving a normal distribution can be handled in exactly the same fashion. This involves what I call the Scotch tape and scissors approach to mathematics: if you know how to put together the relevant area, using Scotch tape to glue two pieces together or scissors to get rid of the areas that you don't want, you can answer any question involving the normal distribution.
So for example, suppose I have a normally distributed random variable, this time with mean 50 and standard deviation 5, and let's find the probability that our random variable exceeds 65. Our process is really not too different from the first case. We'll draw our normal distribution curve; every normal distribution curve looks pretty much the same. We want X greater than 65, which means we'll put down a boundary line at X equals 65. Note that our boundary is going to be somewhere to the right of the mean: the peak occurs at the mean, X equals 50, and 65 is someplace off to the right. I'll set that down. Then I want to shade the area of interest. I'm looking for where X is bigger than 65, so that's the region to the right of the boundary.
And I'll make a quick observation here. 65 is three standard deviations above the mean: it's our mean plus three standard deviations. That tells me where this is located. My 1-2-3 rule tells me that the area within three standard deviations of the mean is 0.997, or 99.7%. Well, what do I know? The area under the entire normal distribution curve is 1. So if this area in the center is 0.997, that tells me the area on the outside, in what are called the wings or the tails of the normal distribution, is going to be 0.003.
And because the normal distribution curve is symmetric about the mean, each of these tails has half of that area. So it's half of 0.003, which is 0.0015. And because we have a probability density function, the area is the probability, so we can state that the probability that X is greater than 65 is 0.0015.
Now the 1-2-3 rule is actually pretty good for a quick back-of-the-envelope estimate of probabilities associated with the normal distribution. But for more general problems, we need some way of calculating the area of arbitrary regions under the normal curve. And so we fall back on one of several things. In this day and age of high technology, we can resort to a calculator, spreadsheet, smartphone app, any number of things. They'll have a built-in function, usually called NORMDIST or some variation of that. What this function does is tell you the area under the normal distribution curve up to a certain value. So you tell it X equals 17, and it'll tell you the area under the normal distribution curve up to X equals 17. And again, the area is the probability.
Now, does this change how we approach problems using the normal distribution curve? The answer is absolutely not. Our problems are handled in exactly the same way; only minor details change. So let's take a look at that same problem. We have a normally distributed random variable with a given mean and standard deviation, and again I want to find the probability that our random variable exceeds a certain value. So what shall we do? Well, let's draw the normal distribution curve again. I know it's tedious to draw, but it's very useful to have. We'll go ahead and set down our boundary at X equals 65. Again, the only thing that really matters is that our boundary is where it should be relative to the mean. But even that really doesn't matter.
Then we're going to shade the area that we want, which is the region where X is bigger than 65. And here's the minor difference. If I use the NORMDIST function, what it tells me is the area under the normal distribution curve up to this point, X equals 65. In this case, it's 0.99865. And since the area under the entire normal distribution curve is equal to 1, the area to the right is what's left over: 1 minus that amount, which is 0.00135. And again, because this is a probability density function, the area is the probability. So I can say the probability that X is greater than 65 is 0.00135.
Now, you might notice that that's a different answer. This one is using NORMDIST, whereas before, with the 1-2-3 rule, I found this probability to be 0.0015. The natural question to ask is: which one is correct? Technically, neither of them is exactly correct, because both of these involve roundoff. But the NORMDIST value is more accurate. Again, the 1-2-3 rule will give you a good approximation to the actual probabilities. But if something really critical depends on having the value as correct as possible, you don't want to use the 1-2-3 rule. You want to use NORMDIST: your calculator, spreadsheet, or cell phone app, something that will let you find that probability much more precisely. But if you just need a quick approximation because you want a rough answer to a question, the 1-2-3 rule is actually fairly effective for a lot of the problems we would solve using the normal distribution.
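If you'd like to check these numbers yourself, here is a minimal sketch in Python. It is not the calculator or spreadsheet function from the lecture; it assumes Python 3.8 or later and uses the standard library's statistics.NormalDist, whose cdf method plays the same "area up to a certain value" role. The helper names prob_between and prob_above are made up for this illustration. The code is just the Scotch tape and scissors idea: subtract one area from another, or subtract from 1.

```python
from statistics import NormalDist


def prob_between(mean, sd, lo, hi):
    """Area under the normal curve between lo and hi (hypothetical helper)."""
    dist = NormalDist(mu=mean, sigma=sd)
    # "Scissors": take the area up to hi and cut away the area up to lo.
    return dist.cdf(hi) - dist.cdf(lo)


def prob_above(mean, sd, x):
    """Area under the normal curve to the right of x (hypothetical helper)."""
    dist = NormalDist(mu=mean, sigma=sd)
    # The whole curve has area 1, so the right tail is what is left over.
    return 1.0 - dist.cdf(x)


# Areas within 1, 2, and 3 standard deviations of the mean (standard normal).
central = {k: prob_between(0, 1, -k, k) for k in (1, 2, 3)}

# First example from the lecture: mean 20, standard deviation 5, P(10 < X < 30).
p_10_30 = prob_between(20, 5, 10, 30)

# Second example: mean 50, standard deviation 5, P(X > 65).
p_above_65 = prob_above(50, 5, 65)

# The same tail estimated with the 1-2-3 rule: half of what lies outside 0.997.
rule_estimate = (1 - 0.997) / 2

for k, area in central.items():
    print(f"within {k} standard deviation(s): {area:.4f}")
print(f"P(10 < X < 30) = {p_10_30:.4f}")
print(f"P(X > 65), from the cdf    = {p_above_65:.5f}")
print(f"P(X > 65), from 1-2-3 rule = {rule_estimate:.5f}")
```

Running this should show the two-standard-deviation area coming out slightly above the rounded 0.95, and the tail beyond 65 coming out near 0.00135 rather than 0.0015, which is the same gap between the rule and the computed value discussed above.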