Welcome to our lecture on the normal distribution. If you watched the lecture on probability distributions in general, you'll remember that for any continuous probability distribution, and the normal distribution is one example, the probability distribution is actually a curve. It's called a probability density function. The probability that X, a random variable, is equal to any one particular value must be zero, because it's a continuous random variable. It can take on any value inside an interval, including fractional values, so the chance of hitting any single value exactly is zero. That's why we frame probability questions in terms of intervals. What's the probability that the random variable will take on a value between zero and one, let's say?

We also talk about the probability as the area under the curve. Look at the shaded area on the slide, and let's say that interval runs from zero to one. If we get the area under the curve in that interval, which is a definite integral, and divide it by the area under the entire curve, we get the proportion of times that the random variable takes on a value inside that interval. A proportion is a probability, and so that's how we get our probability. Of course, you'll also remember that the formulas for continuous probability distributions are constructed so that the total area under the curve is equal to one. So dividing the shaded area by the total area is just dividing by one, and that makes our lives a little bit easier.
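As a rough numerical illustration of the area idea, here is a minimal Python sketch. It assumes, purely for illustration, that the curve is the standard normal density, and it uses a simple trapezoid rule for the integral. The names `density` and `trapezoid_area`, the cut-off of plus or minus 10 standing in for "the entire curve", and the step count are all choices made for this sketch, not something taken from the slides.

```python
from math import exp, pi, sqrt

def density(x):
    # Assumed example curve: the standard normal density (mean 0, std dev 1).
    return exp(-0.5 * x * x) / sqrt(2 * pi)

def trapezoid_area(f, a, b, steps=20_000):
    # Approximate the definite integral of f from a to b with the trapezoid rule.
    width = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * width)
    return total * width

shaded = trapezoid_area(density, 0.0, 1.0)      # area over the interval 0 to 1
whole = trapezoid_area(density, -10.0, 10.0)    # stands in for the entire curve
probability = shaded / whole                    # a proportion, i.e. a probability

print(round(shaded, 4), round(whole, 4), round(probability, 4))
```

Because the total area comes out essentially equal to one, the shaded area and the ratio agree to the printed precision, which is the "dividing by one" point made above.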
Let's look at some of the properties of the normal distribution, the famous bell-shaped curve. There are three key characteristics. One, it's symmetric about the mean. That means the right side of the mean is exactly the same as the left side. Two, you can see the highest point is right in the center. The mean equals the median equals the mode. That's why it's called a bell-shaped curve. And finally, as you move further and further away from the mean, f of x gets smaller and smaller, the same on the right as on the left. The curve runs from minus infinity to plus infinity. It's asymptotic. That means it gets closer and closer to the horizontal axis but never quite touches it. Here's another way of understanding that. If IQ is normally distributed, and let's say 100 is the mean, you have very few people who are going to have IQs over 150 and very few who have IQs below 50. The bulk is always going to be around 100.

What you see here is the formula that produces that curve, the probability density function for the normal distribution. I like to scare my students when I teach in class. I can't do it here because you'll figure it out in a minute. I like to say, oh, yes, we're going to use that formula and we're going to use calculus. You'll have to do integrals. It's not going to work here, but I do want you to look at it. You're never going to use the formula. Thankfully, everything we need has been tabulated for us and we're going to be able to look up normal probabilities in a table. We'll see how very shortly. But do take a look at it. The formula gives the y-value of the curve for any particular x-value, any value of the random variable x. It computes f of x. f of x is a relative frequency, a density, not a probability. It can't be a probability because, again, we know the probability that x is equal to any particular value is, right, zero.
We can only look at values inside of intervals if we're looking at probabilities. But that's not really the most important thing. The most important thing is to know that there is a curve, and if we have the right information, every particular normal distribution can be drawn. You can do it yourself. You can do it with a computer program. Look at the formula. What do we have? We've got a lot of constants, right? We've got 1, we've got 2, we've got pi, we've got e. Those are all constants. We've got x, that's the random variable. What's left? The only things left that are not constants and not the random variable itself are mu and sigma. These are the parameters of the normal distribution. Mu is the mean of the normal distribution that you're working with. Sigma is the standard deviation. Remember from the very first lecture, mu and sigma are parameters of the population. For every mu and sigma combination, we have another normal distribution. Every normal distribution is characterized by a particular value of mu and a particular value of sigma. So really, there are an infinite number of normal distributions. How are we going to use one table to compute probabilities for the normal distribution we're interested in? We're going to see that shortly.

As you've been told, there are infinitely many normal distributions. You can create your own. There are programs that will do it for you. The program will ask you for mu and sigma, and you can say mu is 12.67 and sigma is 3.99. It'll draw you a normal distribution. It's symmetric; you know the properties. But there's only one that we call z. That's the standard normal distribution. The standard normal distribution has a mu of 0 and a sigma of 1. That's why you have a z table. It's like our template, and it shows you what the normal distribution looks like when mu is 0 and sigma is 1. By the way, because of that property, you can transform any normal distribution.
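The point that mu and sigma are the only inputs besides x can be sketched in a few lines of Python. The slide's formula is not reproduced in this transcript, so this sketch assumes it is the usual normal density, f(x) = 1 / (sigma * sqrt(2 * pi)) * e^(-(x - mu)^2 / (2 * sigma^2)). The function name `normal_pdf` is made up for illustration; the values 12.67 and 3.99 are the ones mentioned above.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    # Height of the normal curve at x for mean mu and standard deviation sigma.
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2 * pi))

mu, sigma = 12.67, 3.99   # the example parameter values from the lecture
for x in (mu - 2 * sigma, mu - sigma, mu, mu + sigma, mu + 2 * sigma):
    print(f"x = {x:6.2f}  f(x) = {normal_pdf(x, mu, sigma):.4f}")

# The standard normal, z, is the special case mu = 0 and sigma = 1.
print(f"standard normal height at 0: {normal_pdf(0.0, 0.0, 1.0):.4f}")
```

The printed heights are the same at equal distances on either side of mu and largest at mu itself, which is the symmetry and the central peak described earlier.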
It doesn't matter what you're working with. As long as you know it's a normal distribution, you can transform it so that you can use the z table, using the formula z equals x minus mu, over sigma. That's called standardizing the data; we discussed this early in the course. This way, you never need to use calculus in this course. Instead, you use the template, the z distribution, the standard normal distribution.

Now we're going to look at the standard normal distribution, the z, and see how to read this table. Very important. Maybe one of the most important things you can learn is how to read the z table. It's quite easy. Look at the table. Suppose you want to know how much area you have from 0 to 1. In other words, z is going to be 1. By the way, 0 to minus 1 gives the same answer as 0 to plus 1. That's why the table doesn't have negatives. It's the same thing; it's symmetric. Let's do 0 to plus 1. Look underneath the z. You see 0.0, 0.1, 0.2; those are row headings. You want 0 to 1, which is actually 1.00. So you go down the row headings underneath the z and find 1.0. Now you want the second decimal place. The second decimal place comes from the columns on top, where it has 0.00, 0.01, 0.02, 0.03. So if you want to know how much area there is between 0 and 1.00, you see it's highlighted, it's circled: 0.3413. And if I ask you how much area there is between 0 and minus 1, ignore the sign; it's symmetric. So it's the same answer, 0.3413. In other words, 34.13% of the population will be between the mean and 1 standard deviation above it. How much between 0 and minus 1? Also 0.3413. We're going to practice using the z table.
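For readers who want to check table entries on a computer, here is a minimal sketch of the two ideas just described: standardizing, and the area from 0 to z. It leans on the error function `math.erf` from Python's standard library, using the identity that the area from 0 to z equals 0.5 * erf(z / sqrt(2)). The function names `standardize` and `area_0_to_z` are hypothetical, and the values 110, 100 and 10 in the last line are illustrative inputs only.

```python
from math import erf, sqrt

def standardize(x, mu, sigma):
    # z = (x - mu) / sigma: how many standard deviations x lies from the mean.
    return (x - mu) / sigma

def area_0_to_z(z):
    # Area under the standard normal curve between 0 and z.
    # abs() ignores the sign, which is valid because the curve is symmetric.
    return 0.5 * erf(abs(z) / sqrt(2))

print(round(area_0_to_z(1.00), 4))    # 0 to plus 1
print(round(area_0_to_z(-1.00), 4))   # 0 to minus 1, same area by symmetry
print(standardize(110, mu=100, sigma=10))
```

Both areas come out to about 0.3413, in line with the circled table entry.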
Now first, before I even get to a particular problem, if I ask you how much area there is between 0 and infinity, the answer is a half. Likewise from 0 to minus infinity, it's a half, because the area under the whole curve is 1. So those numbers in the table, the four decimal places, you can read as a proportion or a probability.

Suppose you're asked for the probability, or area, from 0 to 1.28. Zero is the middle. Remember, this is the z table. The mean is zero. That's why there's a zero in the middle. Now how do you find 0 to 1.28? The first column, under the z, holds the row headings. The column headings across the top, 0.00, 0.01, 0.02, give the second decimal place. So for 0 to 1.28, first you go to the row heading 1.2 underneath the z. Okay, you don't have the 8 yet. To get that extra 8, you go to the column that has 0.08, and you look where the 1.2 row and the 0.08 column intersect. That's 1.28. And you see the answer is 0.3997. That's a probability or a proportion. So what is the area from 0 to plus 1.28? It's 0.3997. And guess what? If you need minus 1.28, you don't need another table, because it's symmetric. So 0 to minus 1.28 will also be 0.3997. Almost 40% of the area is between 0 and 1.28.

Here's another problem. How about 0 to 0.87? Again, you look at the row headings underneath z. You go to 0.8. Now we need the second decimal place. By now you've figured it out. You're going to get that second decimal place from the column that has 0.07. So that's where 0.87 is. And notice, it's circled for you. There's your answer. How much area do you have from 0 to 0.87? 0.3078. So 30.78% of the area under the curve is between 0 and 0.87. Now we're asking you to figure out 0 to minus 0.87. Think about it. You'll know the answer.
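The row-plus-column layout of the table can be imitated in a short sketch. This is an illustration under assumptions, not the table from the slides: the helper name `table_entry` is made up, rows are passed as tenths (12 means the 1.2 row) and columns as hundredths (8 means the 0.08 column), and the areas come from `math.erf` rather than from a printed table, so a last digit could in principle round differently.

```python
from math import erf, sqrt

def table_entry(row_tenths, col_hundredths):
    # Rebuild z from its row heading (first decimal) and column heading (second decimal).
    z = (10 * row_tenths + col_hundredths) / 100
    # Area from 0 to z, rounded to four decimals like a printed z table.
    return round(0.5 * erf(z / sqrt(2)), 4)

columns = range(10)
print("  z  " + " ".join(f"{c / 100:6.2f}" for c in columns))
for row in (8, 12):   # the 0.8 row and the 1.2 row used in the examples
    cells = " ".join(f"{table_entry(row, c):6.4f}" for c in columns)
    print(f"{row / 10:4.1f} " + cells)

print("0 to 1.28:", table_entry(12, 8))
print("0 to 0.87:", table_entry(8, 7))
```

Reading across the printed 1.2 row to the 0.08 column, or the 0.8 row to the 0.07 column, mirrors the two lookups worked through above.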
Now we say, remember these probabilities. Again, we don't mean memorize them. But these are very important ones that come up a lot, and we've rounded them for you. From 0 to plus 1, that was 0.34 and change. It's symmetric, so we double it. Going from minus 1 all the way to plus 1, you get approximately 68% of the area under the normal curve. This is true of z-scores in general. If something is normally distributed and you look at all the z-scores, roughly 68% of them will be between minus 1 and plus 1. How about from 0 to plus 1.645 together with 0 to minus 1.645? You can figure it out from the table. It's about 90% of the area. This number comes up a lot: 1.96. Going from 0 to 1.96 and from 0 to minus 1.96, you get 95% of the area under the normal curve. Plus and minus 2 gives 0.955. Plus and minus 2.575 gives about 99% of the area, and finally plus and minus 3 gives 0.997, or 99.7% of the area. Those of you who are going to take more advanced management courses will hear the term six sigma. That's a certain level of quality, just a couple of defects out of a million. This is a very high level of quality control. That's 6 standard deviations from the mean. Notice your table doesn't go that high. I think it stops at 3.99. You'd have to get a special table or look on the internet. So this shows us how much area you have within plus and minus 1 standard deviation, plus and minus 1.645, and so on, and these are very useful values. Again, don't memorize them, just know how to look them up.

Let's do an example. We're going to take weight, the weight of adult males. Suppose we know that the weight of adult males is normally distributed with a mean, mu, of 150 pounds and a standard deviation, sigma, of 10 pounds. That's a particular normal distribution. Now the question is, what's the probability that a randomly selected male will weigh between 140 pounds and 155 pounds?
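A side check before the solution: the rounded "useful values" listed above (68%, 90%, 95%, 95.5%, 99%, 99.7%) can be reproduced with a few lines of Python. This sketch assumes the identity that the area between minus z and plus z equals erf(z / sqrt(2)); the function name `central_area` is made up for illustration.

```python
from math import erf, sqrt

def central_area(z):
    # Area between -z and +z under the standard normal curve,
    # i.e. twice the 0-to-z area read from the table.
    return erf(z / sqrt(2))

for z in (1.0, 1.645, 1.96, 2.0, 2.575, 3.0):
    print(f"+/- {z:<5}  area = {central_area(z):.4f}")
```

The printed areas agree with the rounded figures quoted above.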
Now notice we have an interval, because we know that's the only way we can ask probability questions for a continuous random variable. We do have the standard normal distribution tabulated so that we can look things up. But how do we get the probabilities for a normal distribution with a particular mean and a particular standard deviation? It's normal, but it's not standard normal. For the standard normal, the mean would be zero and the standard deviation would be one. That's not what we have here. That's basically what the solution is all about. Let's take a look at the solution.

Now you see a picture of the normal distribution. You're going to see that repeatedly, practically all throughout the semester, and not just because we like to make pretty pictures. You must do the same thing when you solve your problems. Anytime you're looking for a probability from the normal distribution, draw the picture. The very first thing you do, draw the picture. It's almost impossible to answer questions correctly, certainly not every single time, without drawing the picture. Don't be lazy. Don't try to speed things up by not drawing a picture. You will be sorry and you'll have to do things again. You'll end up wasting time. Let me tell you what I do. I've been doing this a long time. Every time I have to do a problem, I draw the picture.

Here we have the picture of the distribution in question. Mu is 150, so you see mu is at the 50% mark. The distribution, as always, since it's a normal distribution, is symmetric about the mean. The left side is 50%, the right side is 50%, and each side is the mirror image of the other. But what are we asking for? We're asking for the probability, the area in the interval between 140 pounds and 155 pounds. In essence, we're looking at two different non-overlapping areas. Why? Because we want to use the z-table that gives us the area under the curve between zero and z.
The z-table that we've been working with has the blue shaded area in the picture at the top. For every z-value in the table, you can get the area under the curve between zero and that z-value. We have to work with what we can get. We can't, in one step, get the area between 140 and 155 pounds. Well, we could if we wanted to use calculus and work with the formula that was up a few slides ago. But if we don't have to and we don't want to, then let's not. So we have one piece of the distribution between 140 and 150 on the left side of the mean, and one piece between 150 and 155 pounds on the right side of the mean. But again, we have the problem of not having this particular normal distribution tabulated. However, as we know, we can transform any normal distribution into a standard normal. We have the formula to do that. So what we want to do is translate this picture from an x-distribution to a z-distribution, and that's why you see the additional scale drawn under the x.

On the left-hand side, you see the formula. We want to transform the 140 into a z. So z is equal to 140 minus 150 in the numerator, divided by 10. 10 is the standard deviation. And the result is negative 1. That's why there's a negative 1 on the z-scale underneath, lining up with the 140. So the area under the curve between 140 and 150 is the same as the area under the curve in the z-distribution between 0 and negative 1. And when we look that up in the z-table, we find an area of 0.3413. In fact, we saw that for a different problem a minute or so ago. On the right side, we want to standardize that 155, turn it into a z. z is equal to 155 minus 150, divided by 10. And we end up with 0.5 for the z-value. You see the 0.5 on the z-scale underneath the x of 155. And so again, the area under the curve in the x-distribution between 150 and 155 is exactly the same as the area under the curve between a z-value of 0 and a z-value of 0.5.
And when you look that up in the table, you see that the value is 0.1915. It's important to note that these are non-overlapping areas. Because of that, they're mutually exclusive, and we can add them. We can use the addition rule for probabilities. 0.3413 plus 0.1915 gives you 0.5328 for the answer to the question. So the probability that an individual male chosen at random will weigh between 140 and 155 pounds is 0.5328. In other words, 53.28% of the population will be in that interval.

Here's another problem. Say we know that IQ is normally distributed with a mean of 100 and a standard deviation of 10. What percentage of the population will have, A, IQs ranging from 90 to 110, and B, IQs ranging from 80 to 120? What's the first thing you have to do? Always, always, always draw a picture. Take a look at the picture that's there. This is the same kind of picture you're going to be drawing all throughout this topic and other topics as well. We've got a picture of the normal distribution, the original one, the X. We've got a Z scale drawn under it for part A. And remember, if you have a multiple-part problem, you draw another picture for each part. Take my word for it. If you think you'll be saving time by not doing that, you're actually going to be wasting time, because you'll eventually have to go back and do it. Part A, where we have to standardize the 90 and the 110, is much like the problem we did before, where we've got non-overlapping pieces. We're going to be looking at the piece between 90 and 100 on the X distribution, and at the piece between 100 and 110 on the X distribution. We're going to turn those into Z values so that we can look things up in the Z table. At 90, the Z value is 90 minus 100, over 10, or in other words, negative 1. We actually saw that value before, and the area was 0.3413. At 110, we have 110 minus 100, over 10.
And that works out to a Z value of plus 1, which makes sense because we're talking about values that are symmetric about the mean. The areas we're interested in are non-overlapping areas on either side of the mean, but they're the same size. So it's 0 to negative 1 on one side, 0.3413, and 0.3413 on the other side. To answer the question, what percentage of the population will have IQs ranging from 90 to 110, we have to add those two probabilities up. When you take 0.3413 plus 0.3413, you get 0.6826. But of course the question asked about percentages, and to convert a probability, which is a proportion, to a percentage, all you do is multiply by 100%. So the answer to part A is 68.26%.

Part B of the problem is very much the same, but the two pieces, the two non-overlapping areas on either side of the mean, are a little larger. That's really all it is. At an IQ of 80, the Z value ends up being negative 2. At an IQ of 120, the Z value ends up being plus 2. Looking at the Z table, the area under the curve between 0 and 2 is 0.4772. Since the normal distribution is symmetric around the mean, that's 0.4772 on the right side and 0.4772 on the left side. Add those pieces together and you end up with a really big number, 0.9544. So in this particular example, with this particular mean and standard deviation, 95.44% of the population will have IQs between 80 and 120.
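The "two non-overlapping pieces, then add" procedure used in the weight example and in both parts of the IQ example can be sketched as follows. This is a minimal illustration, not the lecture's own tooling: the function names `area_0_to_z` and `straddling_area` are made up, and the areas come from `math.erf` rather than from the printed table, so the last digit can differ slightly from the four-decimal table arithmetic.

```python
from math import erf, sqrt

def area_0_to_z(z):
    # Area between 0 and z under the standard normal curve (sign ignored by symmetry).
    return 0.5 * erf(abs(z) / sqrt(2))

def straddling_area(low, high, mu, sigma):
    # Probability of landing between low and high when low <= mu <= high:
    # standardize each endpoint, look up each piece, and add the two pieces.
    z_low = (low - mu) / sigma
    z_high = (high - mu) / sigma
    return area_0_to_z(z_low) + area_0_to_z(z_high)

weight = straddling_area(140, 155, mu=150, sigma=10)   # weight example
iq_a = straddling_area(90, 110, mu=100, sigma=10)      # IQ part A
iq_b = straddling_area(80, 120, mu=100, sigma=10)      # IQ part B

print(f"weight 140 to 155: {weight:.4f}")
print(f"IQ 90 to 110:      {iq_a:.2%}")
print(f"IQ 80 to 120:      {iq_b:.2%}")
```

Note that `straddling_area` only applies when the interval contains the mean; an interval entirely on one side of the mean needs a subtraction instead of an addition.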
In this example we're looking at the salary of auto mechanics. Suppose we know it's normally distributed with a population mean of $40,000 and a population standard deviation of $10,000. Question A asks what proportion of auto mechanics will earn $24,800 or less. B asks what proportion will earn $53,500 or more. C asks what proportion will earn between $45,000 and $57,000. D asks for the 80th percentile, and finally E asks for the 27th percentile. Since we know it's normally distributed, we can use the Z table.

Question A: what proportion of auto mechanics will earn $24,800 or less? You see the diagram. Always draw the picture; you've been told this several times. This is called the left tail. The table only tells us 0 to Z, so we take the $40,000, and that's going to be a 0. The mean is always going to be 0. Now what is $24,800 as a Z value? The formula you're going to use, you see it there, is X minus mu, over sigma. So it's $24,800 minus $40,000, over $10,000, or minus 1.52. This is just saying that $24,800 is 1.52 standard deviations below $40,000. Now you can use the Z table and look up 1.52. It doesn't matter whether it's plus or minus; it's symmetric. So we go from 0 to 1.52, by now you know how to read the Z table, and you'll find the area is 0.4357. So the area from 0 to minus 1.52 is 0.4357. Or, in X values, $40,000 down to $24,800 is the same as 0 to minus 1.52 standard deviations, and we know that area is 0.4357. But we know that the entire area left of 0 is a half, and from 0 to plus infinity is a half. So you take a half minus 0.4357 and you get the area in that left tail, 0.0643. So 6.43% of auto mechanics are going to earn $24,800 or less.

B asks what proportion of auto mechanics are going to earn $53,500 or more. That's a right tail problem. Always remember this: the mean cuts the normal distribution in half, half on the right, half on the left. Always keep that in mind.
You can even write 0.500 on the right and 0.500 on the left, just to remind yourself. Now we want to work with $53,500. Remember, the table is 0 to Z, so we look at $40,000 to $53,500. That's the same as 0 to plus 1.35 once we convert the X values to Z values. So $53,500 is a Z value of plus 1.35. How do I get that? You take $53,500 minus the mean of $40,000 and divide by $10,000, and you get plus 1.35. Now the table is always 0 to something, so from 0 to plus 1.35, using the Z table, we find that the area is 0.4115. Remember, the entire area from 0 to infinity is 0.500. Subtract 0.4115 from 0.500 and you get the area in the right tail, which is 0.0885. So the answer to the question is that 8.85% of auto mechanics will earn $53,500 or more.

Part C is the hardest problem to solve, and if you don't draw a picture you're not going to get it right. The question asks what proportion of auto mechanics are going to earn between $45,000 and $57,000. Remember, the way the table works is 0 to Z. 0 is like home base. So we want to turn everything into Z values. The $40,000, as you know, becomes 0. The Z table has a mean of 0, so you just turn the $40,000 into 0. The $45,000 becomes 0.5. It's half a standard deviation away, in standard deviation units: 45,000 minus 40,000, over 10,000. That's x minus mu over sigma. The $57,000 turns into a Z score of 1.7. How do you know?
57,000 minus 40,000, over 10,000, that's 1.70. Now you can't get the area between 0.5 and 1.7 directly. So what you've got to do is take 0 to 1.7, looking at the table, and you get the bigger piece, which is 0.4554. Now you subtract. See, the area we want is in red there. To get that red area, you take away the smaller piece, which is 0 to 0.5, and that's 0.1915. So you have two pieces. You take the big piece, 0.4554, subtract the smaller piece, 0.1915, and what's left is the answer to the question, the area between $45,000 and $57,000. The answer turns out to be 0.2639. Again, there's no way to do this without having the picture. You highlight what you're looking for, you notice you need the big piece minus the small piece, and you get the area between 0.5 and 1.7. This is the most difficult question, and it's not that difficult.

Some normal distribution problems are in terms of percentiles. Parts D and E of this example ask you to compute percentiles of this distribution. It sounds complicated and you may get a little nervous, but there's really nothing to stress out about. You already know how to do this. What's the 50th percentile in a normal distribution? That's the mean. In the z-distribution, that means we have a z-score of 0. If you take a standardized exam and you're at the 50th percentile mark, and the test scores are normally distributed, that means you have scored at the median and the mean and the mode, because, as we know, in a normal distribution the mean is equal to the median is equal to the mode. Let's look at how we can get other percentiles using the z-table.

Part D is asking us to calculate the 80th percentile. Remember what a percentile is. The 80th percentile is that value of the distribution for which 80% of the values are smaller and 20% are larger. So we split the distribution, as you can see from the picture, with 80% smaller and, on the right side, 20% larger. How does that 80% break down? Well,
remember, at the mean, which in this case is $40,000, everything to the left of the mean is half of the distribution. 0.500 plus 0.3 gives you 0.8; that's the 80th percentile. So just take a minute, draw your picture, and make sure you understand how everything fits together. Now, if we know the percentile, we know the area under the curve. We're going to be using our z-table kind of backwards, in reverse of the way we used it before. The z-table we've been using, and that we're going to continue using as examples in these lectures, is the 0 to z table, where for any z-value you have the area under the curve between the mean and that z-value. But in this case we need the z-value. We want to know, what's the z-value for the 80th percentile? We know the area under the curve, and you can see from the picture that the area under the curve, which would be the blue shaded area displayed on the table, is 0.3, or 0.3000. What we're actually doing is sort of mucking around in the middle of the table and finding the area, the probability, closest to 0.3000. When we do that, we find that the z-value associated with it is 0.8 in the row heading and 0.04 in the column heading. We have a z-value of 0.84.

That's not the complete answer yet, because what we wanted to know is the 80th percentile of the x's, of the salaries of mechanics. But that's just simple algebra. In step 4, if we know z, which is 0.84, and we know mu and we know sigma, we just plug all those values in and solve for x: x equals 40,000 plus 0.84 times 10,000. We end up with an x-value at the 80th percentile of 48,400 dollars. First of all, take a look at the picture and make sure it's on the correct side of the distribution. It's a very quick check. We're supposed to be on the right side, and 48,400 is higher than 40,000, so yes. At least we see we didn't make a gross error. That's the answer. The 80th percentile of this distribution is 48,400 dollars.

Okay, now we want to calculate the 27th percentile. First we have to find the z-score that's associated with the 27th percentile.
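The "backwards" use of the 0 to z table in the 80th percentile calculation can be imitated with a small search. This is a sketch under assumptions: the name `reverse_lookup` is made up, the grid of z-values from 0.00 to 3.99 mirrors the range the lecture attributes to the printed table, and the areas are computed with `math.erf` instead of being read off the table.

```python
from math import erf, sqrt

def area_0_to_z(z):
    # Area between 0 and z under the standard normal curve.
    return 0.5 * erf(z / sqrt(2))

def reverse_lookup(target_area):
    # Scan z = 0.00, 0.01, ..., 3.99 and return the z whose 0-to-z area
    # is closest to the target, like hunting through the body of the table.
    grid = [i / 100 for i in range(400)]
    return min(grid, key=lambda z: abs(area_0_to_z(z) - target_area))

mu, sigma = 40_000, 10_000
z80 = reverse_lookup(0.80 - 0.50)   # area needed to the right of the mean
x80 = mu + z80 * sigma              # solve z = (x - mu) / sigma for x

print(z80, round(x80))
```

For a percentile below the 50th, the same search applies to the area between the percentile and the mean, and the resulting z-score is then given a negative sign.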
Every z-score is associated with some percentile, so which z-score is associated with the 27th percentile? Look at the diagram, you always draw this. The 27th percentile leaves 50 minus 27, or 23 percent, of the area between it and the mean, and you can figure out that roughly 23 percent of the area is between 0 and minus 0.61. You can see that in the z-table we have for you. The area between 0 and 0.61, or minus 0.61, is 0.2291, which we're rounding to 23 percent. That means the z-score of minus 0.61 is associated with the 27th percentile. Bear in mind that any percentile below the 50th has a negative z-score. A z-score of 0, as you know, is the 50th percentile. If you're below the 50th percentile, the z-score has got to be negative. So in this case we know it's going to be a negative number, and it turns out to be minus 0.61. Now just do the algebra. Minus 0.61 equals x minus 40,000, over 10,000. That's the x minus mu over sigma we've been using. When you solve it, you'll see the answer is 33,900. That's x. And as a check, notice you're below the 40,000. 40,000 would put you at the 50th percentile; 33,900 is the 27th percentile.

This is the way to solve it with the 0 to z table. If you have access to a cumulative z-table, and I think we provide one if you want it, you can get the answer directly. You can look up any percentile. Actually, from any z-score you can get a percentile. If you know you're below the 50th percentile, you look in the negative part of the cumulative table, find the z-score that goes with the 27th percentile, and you get the answer directly. If it's a positive z-score, then we know the value is above the 50th percentile. So that's another way to do it. We're doing it this way because many times you don't have access to the cumulative table. Remember, if you're using the cumulative table, you'll see there are negative z-scores and positive z-scores.

Practice, practice, practice. You will learn this material much better if you do many, many problems. The more problems you do, the better, the easier, the
faster it's going to be for you to complete the problems and get them correct.
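For checking practice answers, a cumulative lookup of the kind just described is available in Python's standard library as `statistics.NormalDist` (Python 3.8 or later), whose `cdf` method gives the area to the left of a value and whose `inv_cdf` method goes from a percentile back to a value. The sketch below applies it to parts A through E of the auto mechanics example. It is a cross-check under that assumption, not the table method taught in the lecture, and the variable names are made up for illustration.

```python
from statistics import NormalDist

# The salary distribution from the example: mean $40,000, std dev $10,000.
salary = NormalDist(mu=40_000, sigma=10_000)

part_a = salary.cdf(24_800)                       # left tail: $24,800 or less
part_b = 1 - salary.cdf(53_500)                   # right tail: $53,500 or more
part_c = salary.cdf(57_000) - salary.cdf(45_000)  # big piece minus small piece
part_d = salary.inv_cdf(0.80)                     # 80th percentile
part_e = salary.inv_cdf(0.27)                     # 27th percentile

print(f"A: {part_a:.4f}  B: {part_b:.4f}  C: {part_c:.4f}")
print(f"D: {part_d:,.0f}  E: {part_e:,.0f}")
```

Parts A through C should agree with the table answers to about three decimal places. Parts D and E come out near 48,416 and 33,872 rather than exactly 48,400 and 33,900, because the table method rounds the z-score to two decimals (0.84 and minus 0.61) before solving for x.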