Welcome to our lecture on the normal distribution. If you watched the lecture on probability distributions in general, you'll remember that for any continuous probability distribution (and the normal distribution is one), the probability distribution is actually a curve, called a probability density function. The probability that X, a continuous random variable, is equal to any one particular value must be zero. A continuous random variable can take on any value in an interval, including fractional and decimal values, so the chance that it lands on one exact value is so small that it is zero. Instead, we talk about intervals. What's the probability that the random variable will take on a value between zero and one, let's say? We frame probability questions in terms of intervals. In addition, we talk about probability as the area under the curve. Look at the shaded area under the curve on the slide, and say that interval is between zero and one. If we get the area under the curve over that interval, which is a definite integral, and then get the area under the entire curve, we can divide one by the other. That gives the proportion of times the random variable takes on a value inside that interval. A proportion is a probability, and that's how we get our probability. Of course, you'll also remember that the formulas for continuous probability distributions are constructed so that the total area under the curve is equal to one. So dividing the shaded area by the total area is just dividing by one, which makes our lives a little bit easier.
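To make "probability equals area under the curve" concrete, here is a minimal Python sketch we've added as an illustration (it is not from the slides). It approximates the definite integral numerically with Simpson's rule, using the standard normal density from Python's `statistics.NormalDist` (Python 3.8+). `area_under_curve` is a hypothetical helper name.

```python
from statistics import NormalDist

density = NormalDist(mu=0.0, sigma=1.0).pdf  # height f(x) of the standard normal curve

def area_under_curve(f, a, b, n=10_000):
    """Approximate the definite integral of f from a to b with Simpson's rule (n must be even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# Shaded area: P(0 < X < 1) as the area under the density between 0 and 1
interval_area = area_under_curve(density, 0.0, 1.0)
# Total area under the curve, approximated over [-10, 10]; the tails beyond add a negligible amount
total_area = area_under_curve(density, -10.0, 10.0)

print(round(interval_area, 4))               # 0.3413
print(round(total_area, 4))                  # 1.0
print(round(interval_area / total_area, 4))  # 0.3413: dividing by a total area of 1 changes nothing
```

In practice you never integrate by hand or by code in this course; the z table has already done this work for you.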
Let's look at some of the properties of the normal distribution, the famous bell-shaped curve. There are three key characteristics. One, it's symmetric about the mean. That means the right side of the mean is exactly the same as the left side; that's the symmetry here. Two, the highest point is right in the center, where the mean equals the median equals the mode. That single peak in the middle, with matching sides, is why it's called a bell-shaped curve. And three, as you move further and further away from the mean, f(x) gets smaller and smaller, on the right the same as on the left. The way we show that is that x ranges from minus infinity to plus infinity. The curve is asymptotic: it gets closer and closer to the horizontal axis but never quite touches it. Here's another way of understanding that. If IQ is normally distributed with, let's say, a mean of 100, very few people are going to have IQs over 150 and very few will have IQs below 50. The bulk is always going to be around 100. What you see here is the formula that produces that bell-shaped curve, the probability density function of the normal distribution. I like to scare my students when I teach in class. I can't do it here because you'll figure it out in a minute. I like to say, oh yes, we're going to use that formula, we're going to use calculus, and you'll have to do integrals. So it won't work here, but I do want you to look at it. You're never going to use the formula. Thankfully, everything we need has been tabulated for us, and we're going to be able to look up normal probabilities in a table. We'll see how very shortly. But do take a look at it. The formula gives the height of the curve, f(x), for any particular value x of the random variable. f(x) is a relative frequency, a density, not a probability, because, again, we know the probability that X is equal to any particular value is zero.
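The formula on the slide is the normal density, f(x) = 1 / (σ√(2π)) · e^(−(x − μ)² / (2σ²)). As a sketch (our own illustration, assuming a mean of 100 and a standard deviation of 10, the IQ parameters used later in this lecture), here it is in Python, with quick checks of the three properties: symmetry, a peak at the mean, and tails that shrink toward the axis without reaching it. `normal_pdf` is a hypothetical name; `statistics.NormalDist.pdf` is the standard library's version, used only as a cross-check.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density: f(x) = 1 / (sigma * sqrt(2*pi)) * exp(-(x - mu)**2 / (2 * sigma**2))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

mu, sigma = 100, 10  # assumed IQ parameters: mean 100, standard deviation 10

# Symmetric about the mean: equal heights at equal distances on either side
print(normal_pdf(90, mu, sigma) == normal_pdf(110, mu, sigma))  # True
# Highest point at the mean (where mean = median = mode)
print(normal_pdf(100, mu, sigma) > normal_pdf(101, mu, sigma))  # True
# Far from the mean the height is tiny but still positive
print(0 < normal_pdf(150, mu, sigma) < 1e-6)                   # True
print(0 < normal_pdf(50, mu, sigma) < 1e-6)                    # True
# Agrees with the standard library's implementation of the same density
print(math.isclose(normal_pdf(105, mu, sigma), NormalDist(mu, sigma).pdf(105)))  # True
```

Notice that the only inputs besides x are mu and sigma; everything else in the formula is a constant.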
We can only look at values inside intervals if we're looking at probabilities. But that's not really the most important thing. The most important thing is to know that there is a curve, and if we have the right information, every particular normal distribution can be drawn. You can do it yourself, or you can do it with a computer program. Look at the formula. What do we have? We've got a lot of constants, right? We've got 1, we've got 2, we've got pi, we've got e. Those are all constants. We've got x, the random variable. What's left? The only things left that are neither constants nor the random variable itself are mu and sigma. These are the parameters of the normal distribution. Mu is the mean of the normal distribution you're working with, and sigma is its standard deviation. Remember from the very first lecture that mu and sigma are parameters of the population. Every combination of mu and sigma gives another normal distribution; each normal distribution is characterized by a particular value of mu and a particular value of sigma. So there are really an infinite number of normal distributions. How are we going to use one table to compute probabilities for whichever normal distribution we're interested in? We'll see that shortly. Since there are infinitely many normal distributions, you can create your own. There are programs that will draw them for you. The program will ask you for mu, and you can say mu is 12.67 and sigma is 3.99, and it'll draw you a normal distribution. It's symmetric; you know the properties. But there's only one that we call z: the standard normal distribution, which has a mu of 0 and a sigma of 1. That's why you have a z table. It's like our template: it shows you what the normal distribution looks like with mu equal to 0 and sigma equal to 1. And because of that, you can transform any normal distribution.
It doesn't matter what you're working with. As long as you know it's a normal distribution, you can transform it so that you can use the z table, with the formula z equals x minus mu, over sigma. That's called standardizing the data, which we discussed early in the course. This way, you never need to use calculus in this course. Instead, you use the template: the z distribution, the standard normal distribution. Let's look at the standard normal distribution, the z, and see how to read its table. This is very important; maybe one of the most important things you can learn is how to read the z table. It's quite easy. Look at the table, and suppose you want to know how much area there is from 0 to 1; in other words, z is 1. By the way, the area from 0 to minus 1 is the same as the area from 0 to plus 1. The table doesn't list negative values because it doesn't need to: the distribution is symmetric. Let's do 0 to plus 1. The first column isn't really a column of values: underneath the z you see 0.0, 0.1, 0.2, and so on. Those are row headings. We want 0 to 1, which is really 1.00, so go down the row headings under the z to 1.0. Now you want the second decimal place, which comes from the column headings across the top: 0.00, 0.01, 0.02, 0.03, and so on. So the area between 0 and 1.00, highlighted and circled, is 0.3413. If I ask how much area is between 0 and minus 1, ignore the sign; it's symmetric, so it's the same answer, 0.3413. In other words, 34.13% of the population will be between the mean and 1 standard deviation above it, and another 34.13% between the mean and 1 standard deviation below it.
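Here is a minimal Python sketch of standardizing and of what a zero-to-z table entry means, using `statistics.NormalDist` from the standard library (Python 3.8+). `standardize` and `area_zero_to_z` are hypothetical helper names. Because the cumulative area to the left of 0 is exactly one half, the zero-to-z entry equals the cumulative area up to |z| minus 0.5; taking the absolute value is the "ignore the sign, it's symmetric" rule.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)

def standardize(x, mu, sigma):
    """z = (x - mu) / sigma: how many standard deviations x lies from the mean."""
    return (x - mu) / sigma

def area_zero_to_z(z):
    """Area under the standard normal curve between 0 and z (a zero-to-z table entry).

    By symmetry the sign of z does not matter, so we use abs(z).
    """
    return STANDARD_NORMAL.cdf(abs(z)) - 0.5

print(round(area_zero_to_z(1.0), 4))   # 0.3413
print(round(area_zero_to_z(-1.0), 4))  # 0.3413 (same area, by symmetry)
print(standardize(110, mu=100, sigma=10))  # 1.0 (hypothetical values: one SD above the mean)
```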
Let's study how to use this. First, before I even get to a particular problem: if I ask you how much area there is between 0 and infinity, the answer is a half. From 0 to minus infinity, it's also a half, because the area under the whole curve is 1. So you can read the four-decimal numbers in that table as proportions or probabilities. Suppose you're asked for the probability, or area, from 0 to 1.28. Zero is in the middle; remember, this is the z table, and the mean is 0. So how do you find 0 to 1.28? The first column, under the z, holds the row headings. The numbers across the top are the true columns, and they give the second decimal place: 0.00, 0.01, 0.02, and so on. For 0 to 1.28, first go to the row heading 1.2 under the z. That doesn't have the 8 yet; to get that extra 8, go to the column headed 0.08. Where row 1.2 and column 0.08 intersect, that's 1.28, and you see the answer is 0.3997. That's a probability, or a proportion. So the area from 0 to plus 1.28 is 0.3997. And guess what? If you need minus 1.28, you don't need another table, because it's symmetric: 0 to minus 1.28 is also 0.3997. Almost 40% of the area is between 0 and 1.28. Here's another problem: how about 0 to 0.87? Again, look at the row headings underneath z and go to 0.8. Now we need the second decimal place, which is in the column headed 0.07. That's where 0.87 is, and notice it's circled for you. There's your answer: the area from 0 to 0.87 is 0.3078, or 30.78% of the area under the curve. Now try 0 to minus 0.87 yourself. Think about it, and you'll know the answer.
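As a sketch of where the table's numbers come from, the Python below (standard library only, Python 3.8+) rebuilds rows of a zero-to-z table and reads them by row heading and column heading, the way the lecture describes. `table_row` and `look_up` are hypothetical names; z is handled in whole hundredths to avoid floating-point surprises when splitting it into row and column.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # cumulative area to the left of z under the standard normal

def table_row(row_heading):
    """One row of a zero-to-z table: areas for row_heading + 0.00, 0.01, ..., 0.09, to 4 places."""
    return [round(PHI(row_heading + col / 100) - 0.5, 4) for col in range(10)]

def look_up(z):
    """Read the zero-to-z table: row heading gives the first decimal, column the second."""
    hundredths = round(abs(z) * 100)   # symmetry: ignore the sign; e.g. 1.28 -> 128
    row, col = divmod(hundredths, 10)  # 128 -> row 1.2, column 0.08
    return table_row(row / 10)[col]

print(table_row(1.2))  # the whole 1.2 row
print(look_up(1.28))   # 0.3997
print(look_up(-1.28))  # 0.3997
print(look_up(0.87))   # 0.3078
```

Running `look_up(-0.87)` answers the question left for you to think about.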
Let's do an example with weight: the weight of adult males. Suppose we know that the weight of adult males is normally distributed with a mean, mu, of 150 pounds and a standard deviation, sigma, of 10 pounds. That's a particular normal distribution. Now the question is: what's the probability that a randomly selected male will weigh between 140 pounds and 155 pounds? Notice that we have an interval, because that's the only way we can ask probability questions for a continuous random variable. We have the standard normal distribution tabulated so that we can look things up. But how do we get probabilities for a normal distribution with some other particular mean and standard deviation? This one is normal, but not standard normal; for the standard normal, the mean would be 0 and the standard deviation would be 1. That's not what we have here, and that's what the solution is all about. Let's take a look at the solution. You see a picture of the normal distribution. You're going to see that repeatedly throughout the semester, and not just because we like to make pretty pictures. You must do the same thing when you solve your problems. Anytime you're looking for a probability from the normal distribution, draw the picture. It's the very first thing you do. It's almost impossible to answer these questions correctly, certainly not every single time, without drawing the picture. Don't be lazy, and don't try to speed things up by skipping the picture. You'll be sorry, you'll have to do things all over again, and you'll end up wasting time. Here we have the picture of the distribution in question. Mu is 150, and you see mu at the 50% mark. As always for a normal distribution, it's symmetric about the mean: the right side is 50%, the left side is 50%, and each side is the mirror image of the other. What are we asking for?
We're asking for the probability, the area, in the interval between 140 pounds and 155 pounds. In essence, we're looking at two different, non-overlapping areas. Why? Because we want to use the z table, which gives the area under the curve between 0 and z, as in the shaded picture at the top of the table we've been working with. For every z value in the table, you get the area under the curve between 0 and that z. We have to work with what we can get. We can't get the area between 140 and 155 pounds in one step. We could if we used calculus on the formula from a few slides ago, but if we don't have to, and we don't want to, then let's not. So we have one piece of the distribution between 140 and 150, on the left side of the mean, and one piece between 150 and 155 pounds, on the right side of the mean. Again, this particular normal distribution isn't tabulated. But, as we know, we can transform any normal distribution into a standard normal, and we have the formula to do that. What we want to do is translate this picture from an x distribution to a z distribution. That's why you see the additional scale drawn under the x. On the left-hand side, you see the formula turning 140 into a z: z equals 140 minus 150 in the numerator, divided by 10, the standard deviation. The result is negative 1. That's why there's a negative 1 on the z scale, lining up with the 140. The area under the curve between 140 and 150 is the same as the area under the z curve between 0 and negative 1, and when we look that up in the z table, we find an area of 0.3413. In fact, we saw that for a different problem a minute or so ago. On the right side, we standardize 155, turning it into a z: z equals 155 minus 150, divided by 10, and we end up with 0.5 for the z value.
And you see the 0.5 on the z scale underneath the x of 155. So again, the area under the x curve between 150 and 155 is exactly the same as the area under the z curve between 0 and 0.5, and when you look that up in the table, you see the value is 0.1915. It's important to note that these are non-overlapping areas. Because of that, they're mutually exclusive, and we can add them using the addition rule for probabilities: 0.3413 plus 0.1915 gives 0.5328. So the probability that a male chosen at random will weigh between 140 and 155 pounds is 0.5328; in other words, 53.28% of the population will be in that range. Here's another problem. Say we know that IQ is normally distributed with a mean of 100 and a standard deviation of 10. What percentage of the population will have (A) IQs ranging from 90 to 110, and (B) IQs ranging from 80 to 120? What's the first thing you have to do? Always, always, always draw a picture. Take a look at the picture there; it's the same kind of picture you're going to be drawing throughout this topic and others. We've got a picture of the original normal distribution, the X, with a Z scale drawn under it. And remember, if you have a multiple-part problem, draw a separate picture for each part, and take my word for it: if you think you'll save time by skipping that, you'll actually waste time, because you'll eventually have to go back and start the whole thing over. For part A, we have to standardize the 90 and the 110, much like the problem we just did, where we've got non-overlapping pieces. We're going to be looking at the piece between 90 and 100 on the X distribution and the piece between 100 and 110 on the X distribution.
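Here is a minimal Python sketch of the split-at-the-mean recipe from the weight problem, using `statistics.NormalDist` from the standard library (Python 3.8+). `table_area` and `prob_between_straddling_mean` are hypothetical helper names: the first rounds each lookup to four places to mimic the printed table, and the second assumes the interval contains the mean, as in the weight problem and the IQ problem just posed.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu = 0, sigma = 1

def table_area(z):
    """Zero-to-z table entry: area between 0 and |z|, rounded to 4 places like a printed table."""
    return round(Z.cdf(abs(z)) - 0.5, 4)

def prob_between_straddling_mean(a, b, mu, sigma):
    """P(a < X < b) for X ~ Normal(mu, sigma) when a <= mu <= b.

    Split at the mean into two non-overlapping pieces, look each up, and add them.
    """
    if not a <= mu <= b:
        raise ValueError("this recipe assumes a <= mu <= b")
    z_a = (a - mu) / sigma  # left piece: from z_a up to 0
    z_b = (b - mu) / sigma  # right piece: from 0 up to z_b
    return round(table_area(z_a) + table_area(z_b), 4)

# Weight of adult males: mu = 150 lb, sigma = 10 lb
print(prob_between_straddling_mean(140, 155, mu=150, sigma=10))  # 0.5328
# IQ with mu = 100, sigma = 10: parts A and B
print(prob_between_straddling_mean(90, 110, mu=100, sigma=10))   # 0.6826
print(prob_between_straddling_mean(80, 120, mu=100, sigma=10))   # 0.9544

# Exact area straight from the cumulative distribution function, no splitting needed
weight = NormalDist(mu=150, sigma=10)
print(round(weight.cdf(155) - weight.cdf(140), 4))  # 0.5328
```

The last line shows the alternative: a cumulative function gives the area to the left of any value, so one subtraction handles an interval whether or not it straddles the mean. Exact IQ answers computed that way differ from the table sums in the fourth decimal place (about 0.6827 and 0.9545), because the table rounds each piece before adding.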
But first we have to turn those into Z values so that we can look things up in the Z table. At 90, the Z value is 90 minus 100 over 10, or negative 1; we've seen that value before, and the area was 0.3413. At 110, we have 110 minus 100 over 10, which works out to a Z value of plus 1. That makes sense: the values are symmetric about the mean, so the two areas we're interested in are non-overlapping, one on each side of the mean, and the same size. So it's 0.3413 from 0 to negative 1 on one side and 0.3413 on the other. To answer the question of what percentage of the population will have IQs ranging from 90 to 110, add those two probabilities: 0.3413 plus 0.3413 gives 0.6826. But the question asked for a percentage, and to convert a probability, which is a proportion, into a percentage, multiply by 100%. So the answer to part A is 68.26%. Part B is very much the same, except the two non-overlapping areas on either side of the mean are a little larger; that's really all it is. At an IQ of 80, the Z value is negative 2, and at an IQ of 120, the Z value is plus 2. Looking at the Z table, the area between 0 and 2 is 0.4772. Since the normal distribution is symmetric around the mean, that's 0.4772 on the right side and 0.4772 on the left. Add those pieces together and you get a really big number, 0.9544. So with this particular mean and standard deviation, 95.44% of the population will have IQs between 80 and 120. In this next example, we're looking at the salary of auto mechanics. Suppose we know it's normally distributed with a population mean of $40,000 and a population standard deviation of $10,000. Question A asks what proportion of auto mechanics will earn $24,800 or less. B asks what proportion of auto mechanics will earn $53,500 or more. C asks what proportion of auto
mechanics will earn between $45,000 and $57,000. D asks for the 80th percentile, and finally E asks for the 27th percentile. Since we know salary is normally distributed, we can use the Z table. Question A was what proportion of auto mechanics will earn $24,800 or less. You see the diagram; always draw the picture, as we've been told several times. This is called the left tail. The table only tells us the area from 0 to Z, so we take the $40,000, the mean, which always converts to a Z of 0. What is $24,800 as a Z value? You see the formula there: $24,800 minus $40,000, divided by $10,000, is minus 1.52. That just says that $24,800 is 1.52 standard deviations below $40,000. Now you can use the Z table and look up 1.52, plus or minus, so we go from 0 to 1.52. By now you know how to read the Z table, and you'll find the area is 0.4357. So the area from 0 to minus 1.52 is 0.4357. Back in X values, the area from $40,000 down to $24,800 is the same as the area from 0 to minus 1.52 standard deviations, and we know that area is 0.4357. But we also know that the entire area to the left of 0 is a half, just as the area from 0 to plus infinity is a half. So take 0.5000 minus 0.4357, and you get the area in the left tail: 0.0643. In other words, 6.43% of auto mechanics earn $24,800 or less. B asks what proportion of auto mechanics earn $53,500 or more. That's a right-tail problem. Always remember: half and half. Zero cuts the normal distribution into half on the right and half on the left, 0.5000 on each side, so always keep that in mind. Now we want the area from $40,000 to $53,500; remember, the table goes from zero to Z. That's the same as 0 to plus 1.35 in Z values: $53,500 is a Z value of plus 1.35. How do I get that? Take $53,500 minus the mean of $40,000 and divide by $10,000, and you get plus 1.35. Now, the table always goes from zero to something, so from
zero to plus 1.35, using the Z table, we find that the area is 0.4115. Now we need the right tail. Remember, the entire area from 0 to infinity is 0.5000, so subtract 0.4115 from 0.5000 and you get the area in the right tail, 0.0885. So the answer to the question is that 8.85% of auto mechanics will earn $53,500 or more. Part C is the hardest problem to solve, and if you don't draw a picture, you're not going to get it right. The question asks what proportion of auto mechanics will earn between $45,000 and $57,000. Remember that the table works from zero to Z; zero is like the base. So we turn everything into Z values. The $40,000 becomes zero, since the Z distribution has a mean of zero. The $45,000 becomes 0.5: it's half a standard deviation away, in standard deviation units. How do you know? $45,000 minus $40,000, over $10,000; that's x minus mu over sigma, and you get 0.5. $57,000 turns into a Z score of 1.7: $57,000 minus $40,000, over $10,000, is 1.70. Now, you can't get the area between 0.5 and 1.7 directly. What you have to do is look up 0 to 1.7 in the table, which gives you the bigger piece, 0.4554. Then, to get the area marked with the red lines, you subtract the smaller piece, which is 0 to 0.5, or 0.1915. So you have two pieces: take the big piece, 0.4554, subtract the smaller piece, 0.1915, and what's left is the area between $45,000 and $57,000. The answer turns out to be 0.2639. Again, there's no way to do this without the picture: you highlight what you're looking for, and you notice you need the big piece minus the small piece to get the area between 0.5 and 1.7. This is the most difficult question, and it's not that difficult if you draw the diagram. Some normal distribution problems are posed in terms of percentiles. Parts D and E of
this example ask you to compute percentiles of this distribution. It sounds complicated, and you may get a little nervous, but there's really nothing to stress out about: you already know how to do this. What's the 50th percentile in a normal distribution? It's the mean, which in the Z distribution is a Z score of zero. If you take a standardized exam and you score at the 50th percentile, you have scored at the median, and, if the test scores are normally distributed, also at the mean and the mode, because in a normal distribution the mean equals the median equals the mode. Let's look at how we can get other percentiles using the Z table. D asks for the 80th percentile. Remember what a percentile is: the 80th percentile is the value of the distribution below which 80% of the values fall, with 20% above it. So we split the distribution, as you can see in the picture: 80% smaller on the left, 20% larger on the right. How does that 80% break down? Remember that at the mean, which in this case is $40,000, everything to the left is half the distribution, 0.500, and 0.500 plus 0.300 gives 0.800. That's the 80th percentile. So take a minute, draw your picture, and make sure you understand how everything fits together. If we know the percentile, we know the area under the curve, so we're going to use the Z table backwards, the reverse of the way we used it before. The Z table we've been using, and will continue using in these lectures, is the zero-to-Z table: for any Z value, it gives the area under the curve between the mean and that Z. In this case we need the Z value for the 80th percentile, and we know the area: from the picture, the area under the curve, the blue shaded area shown on the table, is 0.3000. So we're using the table backwards: we're searching through the body of the table and
finding the area, the probability, closest to 0.3000. When we do that, we find the associated Z value is 0.84: 0.8 in the row heading and 0.04 in the column heading. So we have a Z value of 0.84. That's not the complete answer yet. What we want is the 80th percentile of X, the salary of mechanics, but that's just simple algebra in step 4. We know Z, which is 0.84, and we know mu and sigma, so we plug those values in and solve for X: 0.84 equals X minus 40,000, over 10,000, so X equals 40,000 plus 0.84 times 10,000, which is $48,400. First of all, take a look at the picture and make sure the answer is on the correct side of the distribution. As a very quick check: we're supposed to be on the right side, and $48,400 is higher than $40,000, so at least we didn't make a gross error. That's the answer: the 80th percentile of this distribution is $48,400. Now we want to calculate the 27th percentile. First we have to find the Z score associated with the 27th percentile; every Z score is associated with some percentile. So which Z score goes with the 27th percentile? Look at the diagram (you always draw this), and you can figure out that 50% minus 27%, or 23%, of the area lies between 0 and the Z score we want. Searching the Z table for the area closest to 0.2300, the area between 0 and 0.61 (or minus 0.61) is 0.2291, which rounds to 23%. That means a Z score of negative 0.61 is associated with the 27th percentile. Bear in mind that any percentile below the 50th has a negative Z score. As you know, a Z score of 0 is the 50th percentile, so if you're below the 50th percentile, the Z score has to be negative. In this case, it turns out to be minus 0.61. Now the algebra: minus 0.61 equals X minus 40,000, over 10,000, the x minus mu over sigma we've been using, and when you solve it you'll see the answer is X equals $33,900. As a check, notice that you're below $40,000, and $40,000
would put you at the 50th percentile, so $33,900 is the 27th percentile. That's how to solve it with the zero-to-Z table. If you have access to a cumulative Z table (I believe we provide one if you want it), you can get the answer directly: from any percentile you can look up the Z score, and from any Z score you can look up the percentile. If you know you're below the 50th percentile, look in the negative part of the cumulative table, find the Z score that goes with the 27th percentile, and you get the answer directly. If the Z score is positive, you know you're above the 50th percentile. So that's another way to do it. We're doing it this way because many times you won't have access to the cumulative table. Remember, if you use the cumulative table, you'll see it has both negative and positive Z scores. Practice, practice, practice. You'll learn this material much better if you do many, many problems. The more problems you do, the easier and faster it will be for you to complete them and get them right.
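As a check on parts A through E, here is a minimal Python sketch using `statistics.NormalDist` (Python 3.8+). Its `cdf` plays the role of a cumulative Z table, and `inv_cdf` reads that table in reverse. `z_for_zero_to_z_area` and `x_from_z` are hypothetical helpers that mimic the lecture's method: search a four-place zero-to-z table for the closest entry, then solve z = (x − mu)/sigma for x.

```python
from statistics import NormalDist

MU, SIGMA = 40_000, 10_000
salary = NormalDist(mu=MU, sigma=SIGMA)  # auto-mechanic salaries, in dollars
Z = NormalDist()                         # standard normal, for the table-style steps

# A: left tail, P(X <= 24,800); cdf gives the cumulative area to the left
p_a = salary.cdf(24_800)
# B: right tail, P(X >= 53,500) = 1 - area to the left
p_b = 1 - salary.cdf(53_500)
# C: P(45,000 <= X <= 57,000) = big piece minus small piece
p_c = salary.cdf(57_000) - salary.cdf(45_000)

def z_for_zero_to_z_area(target):
    """Use the zero-to-z table backwards: find z in 0.00..3.49 whose 4-place entry is closest to target."""
    grid = [k / 100 for k in range(350)]
    return min(grid, key=lambda z: abs(round(Z.cdf(z) - 0.5, 4) - target))

def x_from_z(z):
    """Step 4: solve z = (x - mu) / sigma for x."""
    return MU + z * SIGMA

# D: 80th percentile; area from the mean is 0.80 - 0.50 = 0.30, on the right side
z_80 = z_for_zero_to_z_area(0.80 - 0.50)
# E: 27th percentile; area from the mean is 0.50 - 0.27 = 0.23, on the left side, so z is negative
z_27 = -z_for_zero_to_z_area(0.50 - 0.27)

print(f"A: {p_a:.4f}")                 # A: 0.0643
print(f"B: {p_b:.4f}")                 # B: 0.0885
print(f"C: {p_c:.4f}")                 # C: 0.2640 (table lookups rounded to 4 places give 0.2639)
print(z_80, f"{x_from_z(z_80):,.0f}")  # 0.84 48,400
print(z_27, f"{x_from_z(z_27):,.0f}")  # -0.61 33,900
# Exact percentiles from the inverse cdf, without rounding z to two decimals
print(f"{salary.inv_cdf(0.80):,.0f}", f"{salary.inv_cdf(0.27):,.0f}")  # 48,416 33,872
```

The small differences in part C and in the exact percentiles come only from rounding z to two decimals and table areas to four; the method is the same.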