Welcome to our lecture on the normal distribution. The normal distribution is a continuous probability distribution: a probability distribution for a continuous random variable. As you know, we've studied this before, and the probability distribution for a continuous random variable is called a probability density function. The probability is interpreted as the area under the curve, and the total area under the curve is designed to be equal to one. So if we're looking for the probability in a particular interval, between two values of the random variable, we're looking at the area under the curve between those values divided by the total area under the curve, which is one. That's a proportion, and a proportion is a percentage, is a probability. It all works together. The random variable here is a normal random variable, but since it's a continuous random variable, it can take on an infinite number of values inside any given interval, unlike a discrete random variable. Therefore, the probability that the random variable X is exactly equal to any particular value has to be zero. So we can't even ask questions like: what's the probability that this random variable, which we know to be normally distributed, is exactly equal to 27.42? Because it's a continuous random variable and there's an infinite number of possible values it can take on, if we ask for the probability that the random variable is equal to one particular value, it's going to be zero. And of course, as with any other continuous random variable, the area under the whole curve is one. We'll see how to do this shortly, but basically we get probabilities by getting the area under the curve inside a particular interval, and that's the proportion of times, under identical circumstances, that a value in that range is likely to occur.
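The point about exact values can be checked numerically. The sketch below is only an illustration: it uses Python's standard-library `statistics.NormalDist`, and the mean of 25 and standard deviation of 5 are made-up values, since the lecture gives none for this example. The helper name `interval_probability` is also just a label chosen here. It shows the probability of landing in a shrinking interval around 27.42 heading toward zero.

```python
from statistics import NormalDist

# Hypothetical parameters, chosen only for illustration.
X = NormalDist(mu=25, sigma=5)


def interval_probability(dist, low, high):
    """Area under the density curve between low and high."""
    return dist.cdf(high) - dist.cdf(low)


# Probability of falling within +/- h of 27.42 for smaller and smaller h.
widths = [1.0, 0.1, 0.01, 0.001]
probs = [interval_probability(X, 27.42 - h, 27.42 + h) for h in widths]
for h, p in zip(widths, probs):
    print(f"within {h} of 27.42: {p:.6f}")

# An interval of zero width has zero area, so P(X == 27.42) is 0.
point_prob = interval_probability(X, 27.42, 27.42)
print(f"exactly 27.42: {point_prob}")
```

The narrower the interval, the smaller the area, which is why only interval questions make sense for a continuous random variable.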
That's the proportion of the area under the curve inside the interval over the total area under the curve. There are three characteristics of the normal distribution. You know the typical bell-shaped curve, the picture you've seen; you're going to see it on the next slide. It's symmetric about the mean, mu. That's true of all normal distributions: they're all symmetric around the mean. Because they're symmetric around the mean and they have that bell shape, the mean is equal to the median is equal to the mode. If it's symmetric around the mean, that means 50% of the observations are on one side of the mean and 50% of the observations are on the other side of the mean, and that's how we define a median. And of course, the mode is the value with the highest frequency, and that's what the bell shape tells us. That highest point, that highest frequency, is at the mean. Another interesting characteristic is that if you look at the picture of the normal distribution, as you get further and further out, away from the mean, into the tails of the distribution, the curve approaches the horizontal axis asymptotically, which means it just gets closer and closer and closer but never actually meets it. And what that means to us is that there's always, at least in theory, some positive probability that we will get some outlandish, crazy, extreme value for a normally distributed random variable. Of course, that's according to the theoretical normal distribution. In real life, we very rarely see that. This is the function, the actual function, for a normally distributed random variable. X is the random variable, f of X is the height of the curve. I'm not even going to read it out to you because it's so simple and so clear, all of you can see it well, okay. What do we have? What is in that formula? We see X, that's the random variable. We see some constants: 1, 2, e and pi.
And we see two other symbols, mu and sigma: mu, the population mean, and sigma, the population standard deviation. Those are parameters. Every normal distribution has its own mu and sigma and therefore, as we'll see, every normal distribution has its own slightly different location, that's what mu tells us, and spread, or dispersion, that's what sigma, the population standard deviation, tells us. Now, if I wanted to scare you, what I would tell you now is that if you wanted to compute the area under the curve between one value of X and another value of X, you could just use your calculus and do an integral. Of course, I don't really want to scare you, and what I'll tell you is that those values have all been tabulated for us and we will have a much easier time with it than if we had to remember our calculus. What can we say about the normal distribution? Basically, it's defined by two parameters, mu and sigma. Mu is the population mean and sigma is the population standard deviation. If somebody asked you how many normal distributions there are, the answer is: infinitely many. Any mu-sigma combination gives you a normal distribution. However, there's one normal distribution we call z, and that one is special. It's the one we have a special table for, it's the one that we always refer to, and it's very important in this course. That's the normal distribution where mu is zero and sigma is one. That's called the z-distribution or the standard normal distribution. As you note, it's one of trillions and trillions of normal distributions you could have selected, but the one that's special to us is the standard normal distribution. So what is the standard normal distribution? It's the case of the normal distribution where mu is zero and sigma is one. We can take any normal distribution and convert it to a standard normal distribution z, very much like what we did a while back in getting z-scores.
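The lecture points to the formula on a slide without reading it aloud. For reference, the standard textbook form of the normal density, written with the same ingredients the lecture lists (the variable x, the parameters mu and sigma, and the constants 1, 2, e and pi), together with the integral that gives an interval probability and the standard normal special case, is:

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\; e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}},
\qquad -\infty < x < \infty

P(a \le X \le b) = \int_{a}^{b} f(x)\,dx

% Standard normal: \mu = 0, \sigma = 1
f(z) = \frac{1}{\sqrt{2\pi}}\; e^{-z^{2}/2}
```

The slide may arrange the terms differently; this is the usual form, not a copy of the slide.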
The way we do that is we take x, the value of the normal random variable, minus the mean of the distribution, mu, divided by the population standard deviation, sigma, and that's the transformation. If we take all the data we have, every single data point, subtract the mean and divide by the standard deviation, we have a new set of data, a new distribution. The original distribution, x, is normally distributed with a mean mu and a standard deviation sigma. The transformed distribution is the standard normal distribution, z, and has a mean of zero and a standard deviation of one. The nice thing about having a standard normal distribution to refer to is that we have only one, as opposed to normal distributions in general, where we have millions and millions of them, one for every mu-sigma pair. For this one distribution, all the calculations have already been done for us and put into tables. There are different kinds of tables. The one we're going to be using most is on the next slide and we'll take a look at it in a minute. And a similar table is at that link. The nice thing about that link is that it's open to the world and there are tables for all kinds of statistical distributions. Here's the table we're going to be using for the normal distribution. This is a table, a z-table, for the standard normal distribution. The main thing about this is to look at the picture, and one thing you'll see and hear over and over again in this lecture is that in order to do problems, to get probabilities from the normal distribution, you must draw the picture and shade in the area that you want. That's the best way to do it. What this table does is, for any z-value, any value from a z-distribution, it will tell you the area under the curve between zero and that value. You see the blue shaded area? So let's just take an example. You see the table in front of you. The slide doesn't show the whole table, which you do have. Take the z-value of 0.50, one-half.
It just gives me an opportunity to tell you something else. This table looks like a two-dimensional table, but it's really not. It's just a bunch of z-values and the associated areas under the curve. The only reason you have the columns is to fit it onto a page, so we don't have just two long columns going down over several pages. So for a z-value of 0.50, you first go to the 0.5 row and over to the 0.00 column, and you see the value from the table is 0.1915. What we know then is that the area under the z-curve between the mean of zero and a z-value of 0.50, one-half, is 0.1915, or in other words, 19.15% of the distribution is between zero and 0.5. Now, what happens if we want to know the area under the curve between zero and negative 0.5, negative one-half? This table only gives us that blue area, and it's all on the positive side. Well, remember one of the characteristics of the distribution is that it's symmetric. The right side is exactly a mirror image of the left side and vice versa. So if we want to know the area under the curve between zero and negative one-half, it's also 0.1915. And that means that 19.15% of the distribution is between zero and negative one-half, and another 19.15% of the distribution is between zero and plus one-half. How about between zero and 0.54? What do I do to get that? All right, we need the z-value of 0.54. So we go to the 0.5 row and we go over, not to the 0.00 column, but to the 0.04 column, and we end up with 0.2054. Very good. Thank you. How about another one? Let's say somebody asks you how much area is between zero and 0.68. So again, you go to the 0.6 row. See that? First column, the row for 0.6. Then you move over the columns, 0.60, 0.61, 0.62, all the way to where it says 0.08. That's really the second decimal place on top. So 0.68 is 0.6 and then you go to the 0.08, and you find the area would be 0.2518.
That's between zero and 0.68. Remember those columns? They start from 0.00, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, all the way to 0.09. That's really the second decimal place. Right? So let's say I ask for the area between zero and 1.35. Well, you go to 1.3, the row is 1.3, and you go over to the 0.05 column. So look at 1.3 and 0.05, that's 1.35, and the area from the table is 0.4115. That means between zero and 1.35, the area would be 0.4115. So here we just show you some important probabilities. Take plus or minus one standard deviation: roughly 0.34 plus 0.34 works out to 68%. That really tells you that when something's normally distributed, roughly 68% of the population is between plus one and minus one standard deviations. Another way of saying this, by the way, is that when something's normally distributed and you convert to z-scores, 68% of your z-scores will be between minus one and plus one. Notice plus or minus 1.645 gives you 90% of the area underneath the normal distribution. The most important one in this course, which we'll use over and over again, is 1.96. Plus or minus 1.96, meaning you're going from zero to plus 1.96 and from zero to minus 1.96, gives you 95% of the area underneath the normal curve. Plus or minus 2 is 0.9544, plus or minus 2.575 is about 99% of the area, and plus or minus 3 gives you 0.997. Those of you who know a little bit about quality control know they talk about six sigma, and that's related to this, because six standard deviations away from the mean gives you very little area. That's why they talk about six sigma quality. They're talking about just a couple of defects per million. It's very hard to get six standard deviations away from the mean when something's normally distributed. These are some of the values you're going to see a lot in this particular course. If you know how to use the normal distribution, and I hope you now know how to read the table, you can solve all kinds of problems.
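The table lookups above can be reproduced without a printed table. The following is a minimal sketch, not the lecture's method: it computes the same "area between zero and z" quantity from the error function in Python's standard `math` module, and the function name `area_zero_to_z` is a label chosen here for illustration.

```python
import math


def area_zero_to_z(z):
    """Area under the standard normal curve between 0 and z.

    By symmetry the area for -z equals the area for +z, so the sign is dropped.
    """
    return 0.5 * math.erf(abs(z) / math.sqrt(2))


# Lookups like the ones read off the z-table in the lecture.
for z in (0.50, 0.54, 1.35):
    print(f"0 to {z:.2f}: {area_zero_to_z(z):.4f}")

# The area within +/- z of the mean is twice the one-sided area.
for z in (1, 1.645, 1.96, 2, 2.575, 3):
    print(f"+/- {z}: {2 * area_zero_to_z(z):.4f}")
```

Computed values can differ from a printed table in the last decimal place, because tables are rounded to four digits.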
For example, suppose I tell you the weight of adult men is normally distributed; ND is going to be our symbol for normally distributed. I tell you that it's normally distributed with a mean of 150 pounds and a sigma of 10 pounds. What's the probability that a randomly selected male will weigh between 140 and 155 pounds? Again, we never ask you what is the probability that an adult man weighs 150 pounds, because technically the answer is zero. Nobody in the world, nobody on the planet, weighs exactly 150 pounds. This is a continuous measurement. 150 pounds means 150 with 8 million zeros after it. Nobody weighs that. Technically, even though you think you weigh 150, it's rounded. We never ask that kind of question, but we might ask between 140 and 155. That we can ask. That's an interval. To get the answer, we're going to use the standard normal distribution, and notice the question asks for between 140 and 155. We can't do that directly from the table. We'll do 140 to 150 and 150 to 155. Of course, we don't have a table that has pounds and those kinds of numbers. We have to convert everything into z-scores. We're going to talk about x-scores; those are in the original units, like pounds. The z-score, the z-value, is never in units. It's a pure number. You don't talk about pounds or dollars. We're going to convert the 140 into a z-score using the formula z equals x minus mu over sigma, so z for 140 becomes 140 minus 150 over 10, and the pounds cancel the pounds. You have a pure number. It's minus 1. You've got a z-score of minus 1, or another way of saying this, you're 1 standard deviation below the mean. Well, the area for minus 1 is the same as for plus 1. It's symmetric. If you look at your table for 0 to 1, you'll find you have 0.3413. That's the amount of area that comes out of the z-table. What about 155? Well, if you convert 155 into a z-score, 155 minus 150 over 10, that's plus 0.50.
That's half a standard deviation above the mean. Looking at the table for 0.50, you find the area under the curve between 0 and 0.5 is 0.1915. You can add these up. It's like little pieces. You can add them up because you know the area under the entire curve is 1. That makes it a probability distribution. So we add 0.3413 to 0.1915. We get 0.5328. Roughly 53.28% of adult men will weigh between 140 and 155 pounds. Another way of saying this: if I ask you what is the probability that a randomly selected adult man will weigh between 140 and 155, that's 0.5328. Suppose we know that IQ is normally distributed with a mean of 100 and a standard deviation of 10. What percentage of the population, then, will have IQs ranging from 90 to 110? That's part A. We'll look at part A first. The mean is 100. We're going on both sides of the mean. There's no picture here, so you should really draw it on your own. But when you draw the picture, you'll see that you have one piece of the area under the curve attached to the mean on the plus side and one piece attached to the mean on the minus side. So if we take 90 minus 100 divided by 10, we get a z-value of negative 1. We take 110 minus 100 divided by 10, and we get a z-value of plus 1. The area under the standard normal curve between 0 and 1 is 0.3413. Therefore, because the distribution is symmetric, the area under the curve between 0 and negative 1 is also 0.3413. And when we add those two pieces together, we get 0.6826. The question was, what percentage of the population will have IQs ranging from 90 to 110? The answer is 68.26% of the population will have IQs ranging from 90 to 110. Now for part B, all we're doing here is the same exercise, just going out a little further. Here we're looking for IQs ranging from 80 to 120. Same distribution, same mean, same standard deviation. We compute the z-value for the value 80: 80 minus 100 divided by 10, you get a z of negative 2. For 120, 120 minus 100 divided by 10, you get a z of plus 2.
And we find from the table that the area under the curve between 0 and 2 is 0.4772. The area under the curve between 0 and negative 2 is exactly the same as the area between 0 and plus 2, so it's also 0.4772. You add those two pieces together and you get 0.9544: 95.44% of the population has IQs between 80 and 120. There are a lot of different types of problems you're going to see that involve the normal distribution, so we're going to show you all kinds of problems and how to solve them. This one we're going to call the salary problem. Suppose we know the salary of a college graduate from a particular school is normally distributed with a mean, and that's mu, the population mean, of 40,000, and sigma, the population standard deviation, of 10,000. And now you're asked, what proportion of college graduates at this school will earn 24,800 or less? That's part A. Part B asks, what proportion of college graduates will earn 53,500 or more? Part C asks, what proportion of college graduates will earn between 45,000 and 57,000? Part D asks us to calculate the 80th percentile at this college. We'll see in a minute that every Z-score is associated with a particular percentile; there's a Z-score for each percentile. And part E asks for the calculation of the 27th percentile. So these are five different problems that we could ask you. Remember, in these kinds of problems, you know mu and sigma. In the real world, you're not generally going to know mu and sigma. But again, this is practice. We want you to have practice using the Z-table, and we're going to assume that you do know mu and sigma. If you know mu and sigma, you can solve all kinds of problems. Okay, part A asks, what proportion of college graduates at that particular college are going to earn 24,800 or less? Now again, all these problems you have to convert into Z's, because you don't have a table with 24,800 in dollars, and you don't want to use calculus, which would be the other way to do this.
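The weight and IQ examples worked through above all follow the same recipe: convert each endpoint to a z-score, get the area between zero and each z, and add the two pieces because the interval straddles the mean. A minimal sketch of that recipe follows. The function names are labels chosen here, and the error function stands in for the printed z-table, so the last digit can differ slightly from the table-based answers.

```python
import math


def z_score(x, mu, sigma):
    """Standardize x: z = (x - mu) / sigma."""
    return (x - mu) / sigma


def area_zero_to_z(z):
    """Area under the standard normal curve between 0 and z (sign ignored)."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))


def prob_straddling_mean(low, high, mu, sigma):
    """P(low < X < high) when low is below the mean and high is above it."""
    if not low <= mu <= high:
        raise ValueError("this recipe assumes the interval contains the mean")
    left_piece = area_zero_to_z(z_score(low, mu, sigma))
    right_piece = area_zero_to_z(z_score(high, mu, sigma))
    return left_piece + right_piece


# Weight example: mean 150 pounds, sigma 10 pounds.
weight = prob_straddling_mean(140, 155, mu=150, sigma=10)
# IQ example as stated in the lecture: mean 100, sigma 10.
iq_a = prob_straddling_mean(90, 110, mu=100, sigma=10)
iq_b = prob_straddling_mean(80, 120, mu=100, sigma=10)

print(f"weight 140 to 155: {weight:.4f}")
print(f"IQ 90 to 110:      {iq_a:.4f}")
print(f"IQ 80 to 120:      {iq_b:.4f}")
```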
So we're going to convert to a Z-score so we can use the Z-table. We convert the 24,800 to a Z-score. Using the formula Z equals X minus mu over sigma, 24,800 minus 40,000 over 10,000 gives you a Z-score of minus 1.52. In effect, what you've done is convert the 24,800 dollars into a pure Z-value of minus 1.52. If you look at the picture, what the table can give you is the area between zero and minus 1.52. But it's symmetric, so the area between zero and minus 1.52 is the same as the area between zero and plus 1.52. That's 0.4357. So the area between 24,800 and 40,000 would be 0.4357. But the question wanted the left tail, below 24,800. Now, there are other tables that can actually solve this directly if you wish, but with our table, we use the fact that the distribution is symmetric. Going from 40,000, which is a Z of zero, all the way to negative infinity would be half the distribution. So how much is in the left tail? If 24,800 to 40,000 is your 0.4357, the left tail would have 0.0643 of the area. So the answer to our question is 6.43% of college graduates will earn 24,800 or less. Just remember that these two pieces, the part that goes from negative infinity to minus 1.52 and the part of the Z-distribution that goes from minus 1.52 to zero, have to add up to a half, right? It's half of the distribution. We know that we've got 50% on the left side and 50% on the right side. So once I knew that 0.4357 was the area between 24,800 and 40,000, I knew the left tail, that's what we call that, has 0.0643 in it. As you can see, with these problems, you must draw the picture. If you don't draw it, you'll never get it right. Part B asks, what proportion of college graduates will earn 53,500 or more? Now look at the picture: that's asking for the right tail.
From 53,500 all the way to plus infinity, we need the right tail. Well, given the table that we have, we need the area between 40,000 and 53,500. Converting the 53,500 to a z-score, z equals 53,500 minus 40,000 over 10,000; that's z equals x minus mu over sigma. So 53,500 becomes a z of plus 1.35. Well, that we can find on our table, zero to plus 1.35. If you look it up in the z-table, go over to the column that has 0.05 in it. Look at the 1.3 row and the 0.05 column, and you're going to find the value in the z-table of 0.4115. So now we know the area between zero and plus 1.35 is 0.4115. That's the same as the area between 40,000 and 53,500. The question asks for 53,500 or more. Well, again, we're looking at the right side of the distribution. You know that you've got half to the right of the mean of 40,000 and half to the left. Well, if you've got 0.4115 between 40,000 and 53,500, the right tail will have 0.0885, right? So the answer to the question is 0.0885, or 8.85% of college graduates at this school will earn 53,500 or more. Part C of this problem asks, what proportion of college grads will earn between 45,000 and 57,000? That sounds like some of the problems we had before, but when you draw the picture, you see that it's not. The mean is at 40,000. Both of those values are on the right side of the mean. We know by now that we're going to be looking things up in the Z-table between the mean and the Z-value, and one thing that we see right away is this problem is a little different from what we had before. But first, let's convert the salaries from the X scale to the Z scale. We're going to standardize 45,000 and 57,000. For 45,000, the equivalent value in the Z-distribution is 0.5. For 57,000, the equivalent value in the Z-distribution is 1.7. Now, if we don't want to use calculus, we can get from the Z-table the area under the curve between the mean and any Z-value. That's not exactly what we want here, but we know that's what we can get.
Sometimes we have to work with what we can get and then manipulate it a little bit into what we want. So let's just see what we can get. The area under the curve between the mean of zero and the Z-value of 0.5 from the table is 0.1915. You see that's outside the shaded area, and in the picture, the shaded area is what we want. But let's just see what else we can get. The area under the curve between the mean of zero and a Z-value of 1.7, we can get that. And that whole big area, which you can see outlined in the picture, is 0.4554. So if we just use subtraction, we can take the little piece and subtract it from the big piece and we'll end up with the shaded area, which is exactly what we want. So when you take 0.4554 minus 0.1915, the result is 0.2639, or in other words, 26.39% of college grads from this particular school earn between 45,000 and 57,000. The next problems we're going to do involve percentiles. Again, as I noted, every Z-score is associated with a percentile. In fact, a Z-score of 0 is the 50th percentile. That means if you take any test that's normally distributed, for example, many of you took the SAT exam, and suppose they told you your Z-score on the test. You know how to get a Z-score: X minus mu over sigma. If they convert your score into a Z-score and they tell you, oh, your Z-score on the SAT was zero, that means you're at the 50th percentile. In fact, we know that the mean, median, and mode are all the same for the normal distribution. So again, you have to bear in mind that a percentile can be converted into a Z-score and vice versa, so you can go from one to the other, and we'll see how to do that using the Z-table. Okay, let's do a salary problem with percentiles. You've been asked to calculate the 80th percentile of college graduates' salaries. So first you've got to find out what Z-score is associated with the 80th percentile. Now, that means you need another 30 percent of the area beyond the mean.
We know that a Z-score of zero is the 50th percentile. Well, for the salary problem, 40,000 was the Z-score of zero. That was the mean. All right, but we need another 30 percent of the area under the curve. Why do we need another 30 percent? Because we've got .5000, the 50th percentile, and we add .30 to it. That'll give us the 80th percentile, right? So we've got to find the Z-score whose table area is the closest to .3000 that we can find. It turns out to be the Z-value of .84. You go to the Z-table and look at a Z-value of .84. Remember where to get that? That's the 0.8 row, and then you go to the .04 column on top. That's .84. The area there will be very close to .3000. So a Z-score of plus .84 is the 80th percentile. On any test in the world, if you get a Z-score of .84, you know you're approximately at the 80th percentile. So for this problem with the salary, we know the Z. Remember the formula, Z equals X minus mu over sigma. We know Z, right? We don't know the X. That's the unknown. We know mu and sigma. Those were given right away. So we solve. This is simple algebra: .84 equals X minus 40,000 dollars over 10,000 dollars. Now, you do a little bit of algebraic manipulation and you find out that X minus 40,000 is 8,400, so X works out to 48,400. It's 40,000 plus 8,400. How do we check that? Because 48,400 minus 40,000 is 8,400, and 8,400 over 10,000 gives you .84. It's just regular algebra. And that's the answer to the question. So if you're a graduate earning 48,400, you're at the 80th percentile for salary at this college. Part E asks for the 27th percentile. That's going to be a negative Z. Remember, zero is the 50th percentile. Anything below the 50th percentile is a negative Z. All right, so which Z-score will give you the 27th percentile? If you look at the diagram, you'll see what you need. If you look at the left tail now, you want it to have .2700.
There are other tables with which you can solve this directly, but the one that we're using gives you the area from zero to some Z. So we need to find the Z-value whose area is the closest to .2300. Because if we get the value that gives you .2300, we know the left tail is going to have .2700, because the two together have to add up to a half. Well, the closest to .2300 on the Z-table is at a Z of .61. And we know it's negative here. Actually, though, this is not really .2300, it's .2291. This is close enough for all purposes. We're rounding it. So roughly, a Z of minus .61 is associated with the 27th percentile. So now we just do the algebra. Remember, Z equals X minus mu over sigma. We know mu and sigma. And now we know a Z of minus .61 is the 27th percentile. So we write minus .61 equals X minus 40,000 over 10,000. With a little algebraic manipulation, you find that X equals 33,900. It's got to be a value below 40,000, because if you're getting 40,000 at this college as your starting salary, you're at the 50th percentile. That's a Z of zero. If you get 33,900, you're roughly at the 27th percentile. And that's the answer to the question: X equals 33,900. These problems are all very simple once you understand how to connect percentiles to Z-scores. As you know by now, the only way to understand statistics is just to keep doing problems till you grasp it. It's not very difficult, but I'm not asking you to memorize formulas; try to understand what you're doing. And with these kinds of problems, what you really have to do is draw the picture. If you draw the picture and understand what you're looking at, you shouldn't have any difficulty at all. But do lots of problems.
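As a way to check answers to the five salary questions, here is a minimal sketch that swaps the printed z-table for Python's standard-library `statistics.NormalDist`. This is not the lecture's method: the `cdf` call gives the area to the left of a value directly (like the "other tables" the lecture mentions), and `inv_cdf` goes from a percentile back to a z. The helper `percentile_value` and its `round_z` option are names invented here; rounding z to two decimals is an assumption made to mimic reading the nearest table entry.

```python
from statistics import NormalDist

salary = NormalDist(mu=40_000, sigma=10_000)
std_normal = NormalDist()  # the z-distribution: mu = 0, sigma = 1

# Part A: left tail, 24,800 or less.
part_a = salary.cdf(24_800)

# Part B: right tail, 53,500 or more.
part_b = 1 - salary.cdf(53_500)

# Part C: both values above the mean, so subtract the smaller area
# from the larger one.
part_c = salary.cdf(57_000) - salary.cdf(45_000)


def percentile_value(dist, pct, round_z=True):
    """Value of dist at the given percentile, via x = mu + z * sigma.

    With round_z=True, z is rounded to two decimals, as it would be
    when picking the closest entry in a printed z-table.
    """
    z = std_normal.inv_cdf(pct / 100)
    if round_z:
        z = round(z, 2)
    return dist.mean + z * dist.stdev


# Parts D and E: the 80th and 27th percentiles.
part_d = percentile_value(salary, 80)
part_e = percentile_value(salary, 27)

print(f"A: {part_a:.4f}  B: {part_b:.4f}  C: {part_c:.4f}")
print(f"D: {part_d:,.0f}  E: {part_e:,.0f}")
```

With `round_z=False` the percentile answers shift by a few tens of dollars (roughly 48,416 and 33,872), which is the price of rounding z to the nearest table entry.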