Here we'll do a quick example, where we identify the sample and the population in a survey. The question states: a survey asked 2,000 US households if they currently own at least one pet. The results show that 69% of households do own at least one pet. Identify the sample and the population in this situation. Remember, with a survey like this, the population is the group that we're interested in knowing something about. But it's usually not feasible to study the entire population, so we gather data from a small subset of that group. In this case, we're interested in knowing about all US households, so that's the population. Notice that this wasn't explicitly stated, but it's clear from the problem statement that that's what we're interested in knowing about. Because it's infeasible to study all households in the US, we take this sample of 2,000 households. We gather data from them, and we use that to draw inferences about the entire population.
Here we're looking at the idea of representative samples. If we're looking to measure something about a population, we want to gather a sample to measure, and we want to make sure that when we do that, the sample represents the population, that it looks similar to the population as a whole. So we'll look at a couple of examples here. First of all, to find the average annual income of all adults in the United States, suppose we sampled representatives in the Congress of the United States. It turns out this is not a very representative sample. The salary for representatives in Congress is set at a fixed number, and that number is relatively high compared to the average income for all adults in the United States. So it's not representative, because if you look at the whole population, there are some people who make very little and some people who make a lot. And in the Congress, there's a fixed value that's unlikely to be similar to the average value of all adults in the US.
So it's not a very good representative sample. We would say, no, this is not representative. The second example says: to find out the most popular cereal among children under the age of 10, you could stand outside a large supermarket one day and poll every 20th child under the age of 10 who enters the supermarket. It's not clear that there's any bias in this one. This seems like a pretty good way to find an answer to this question. If you poll children of the right age group coming into a supermarket, you're likely to get a pretty representative sample of all children. Now, you may want to pick different areas of the country, for instance, since there could be differences depending on where you look, but without going any deeper, it doesn't look like there are any obvious red flags that this would not be representative. So this one looks fairly good. And the lesson from these is just that when you're gathering a sample, it's important to look for a representative one, one that's likely to look similar to your population. You don't want a sample that's chosen too narrowly or that's chosen with some sort of obvious bias.
This is a simple example that illustrates a way that a sample can be biased. Here, a coach is interested in how many cartwheels the average college freshman can do at his university. Eight volunteers from the freshman class step forward. After observing their performance, the coach concludes that college freshmen can do an average of 16 cartwheels in a row without stopping. Is this sample random and representative? In general, a good sample is random and representative. A simple rule of thumb for deciding whether a sample is random or not is just to think about whether or not every member of the population is equally likely to be selected. If so, there's randomness involved. To decide whether or not the sample is representative, think about whether the sample looks similar to the population.
Here, the biggest source of bias that we observe (and bias means that the results will be skewed) is voluntary response bias. Voluntary response bias arises when, rather than picking people to ask, the coach asks for volunteers. In this case, people who are able to do more cartwheels are more likely to step forward and volunteer for the study. Because of that, we conclude that this probably isn't a very good sample for this study. Voluntary response bias also comes into play in surveys that have questions where certain responses are more favorable than others.
In this example, we're going to decide which type of sampling is being used in each description. In the first situation, we have a soccer coach who selects six players from a group of boys aged 8 to 10, then seven players from a group of boys aged 11 to 12, and finally three players from a group of boys aged 13 to 14 to form a rec team. Notice the key here, which is that the coach has divided the group into segments based on their ages. So there's a segment from 8 to 10, a segment from 11 to 12, and then a segment from 13 to 14. And from each segment, the coach has selected several players. After dividing into segments, the sampling could be either stratified or clustered, depending on what happens next. But the fact that we're choosing a few from each group makes it stratified. If we chose a couple of whole groups, that would look more like cluster sampling. So this first one is stratified sampling. Now, with stratified sampling we often select the same number from each group. In this case, the coach didn't do that: he selected six from one group, seven from another, and three from another. It's still stratified, though, because some players were chosen from every group. In the second part, there's a pollster who interviews all human resource personnel in five different high tech companies.
And this may be hard to see at first, but the fact that this pollster selected five companies leads you to think that they looked at all the companies that were out there, thought of each company as a group, and selected all human resource personnel from a few of these groups. So by dividing them into groups, again, we can think about either stratified or cluster sampling starting that way. And then because the pollster selected everyone from a few groups, that's cluster sampling. If they had selected a few from all the groups, that would be stratified sampling. But the fact that they selected a few full groups makes it cluster sampling. The third one: a high school educational counselor interviews 50 female teachers and 50 male teachers. Notice again that there's a separation into categories. The teachers have been separated into male teachers and female teachers. They've been divided into groups. And then from those groups, some have been selected, an equal number from each, which again looks like stratified sampling. Next, a medical researcher interviews every third cancer patient from a list of cancer patients at a local hospital. And the key here is that term "every third", which is what identifies this as systematic sampling. Systematic sampling is where we have a list like this, and we pick some step, like three, and we check every third, or we could pick every fifth or every tenth. Whatever the step is, that systematic movement through the list is what makes this systematic sampling. The next one: a high school counselor uses a computer to generate 50 random numbers and then picks students whose names correspond to the numbers. Notice how there's no division into groups. There's no systematic process. This is the full population of students, and we just select random numbers from that full group. And that's what makes this simple random sampling.
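As a rough sketch of those last two methods, here is how systematic and simple random selection might look in Python. The roster of 300 names and the seed are made up for illustration; they are assumptions, not part of the problem.

```python
import random

# Hypothetical roster; the problem does not give an actual list.
roster = [f"student_{i:03d}" for i in range(1, 301)]

# Systematic sampling: step through the list and take every third name.
step = 3
systematic_sample = roster[step - 1::step]  # 3rd, 6th, 9th, ...

# Simple random sampling: every name is equally likely to be chosen,
# with no grouping and no fixed step.
rng = random.Random(42)  # seeded only so the sketch is repeatable
simple_random_sample = rng.sample(roster, k=50)

print(len(systematic_sample), systematic_sample[:3])
print(len(simple_random_sample))
```

Note that `sample` draws without replacement, so no name can be picked twice.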
That's kind of the simplest version, where we're looking at the full population and using a random number generator to just select without any division into groups or anything else. Lastly, a student interviews classmates in his algebra class to determine how many pairs of jeans a student at his school owns on average. Notice here that this student is looking for information about the whole school, but rather than looking at a full student list and selecting randomly from it, or selecting every third student, or even dividing them into groups, the student just asks the students nearby, the students that are in his class next to him, which makes this a convenience sample.
In this example, we'll see how to select a simple random sample from a population. The population we're given is a set of six quiz scores and there are 10 students.
In this example, we'll draw a dot plot, and we're given data that represents the ages of 30 randomly chosen NBA players. The first thing we need to do is draw an axis that will cover the full range of the data. Just scanning through, it looks like the lowest value here is around 20 and the highest value is about 36. So let's make sure our range will at least cover those values. Let's draw an axis here with values from 20 up to 36. Now that we have our axis, all we need to do is read through each of these data points and put a dot for each one. So for the first value at 22, we just put a dot above the 22. Then we have 28, so we'll put a dot at the 28. Notice it hovers a little bit above the marker, but it doesn't really matter how high we put them as long as we place them at consistent heights, just so we can visualize the final result. That'll make more sense as we draw more of these. The next value is at 20, then at 24, then at 26, then 21, 27, and then we get another 28. So this is the second time we've seen 28. We won't draw the second dot at the same place as the first one, but we'll put it right above it.
So now there's a second dot at 28. Then we go to 31 and 29 and just continue on entering these. We hit another 24 here. And again, we just put this one above the first one at that location. Then we have another 22, another 21, then a 25, another 22, another 25, 30, another 29, another 20, and so on. There's our 36, another 24, the first 23, another 36, another 24, another 29, and we'll just finish this out. So there we see our dot plot, where each time we run into a value we've already drawn a dot for, we just draw one a little bit higher. Notice that the height of these stacks tells us how frequent an age is. So the more frequent ones are over here. And then the 36s are what we would call outliers. They're far out from the other data points. And, for instance, 30 is relatively rare. There's only one of those, and so on. So there's a lot we can tell from this plot. And we'll draw other types of plots later on that fit this same kind of pattern of looking for where the data is clustered and where it's spread out. But a dot plot is a very simple way to observe that at first.
Here we're going to build a simple frequency table. The problem states that 19 people were asked how many miles, to the nearest mile, they commute to work each day. Their responses were recorded in this data set. The frequency table lists each possible data value and the number of times that data value occurs. So we put two columns like this, one for the data values and one for their frequencies. Now we'll go through the data set, and each unique data value that we see, we'll list in the left hand column. Once we've filled in all these data values, we just need to count how many times they occur. So for instance, I notice that two appears twice in the data set, so it has a frequency of two. Three appears only once, so it has a frequency of one. And so on. And I fill in the rest of the table. And it's really as simple as that.
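The counting step can be sketched with Python's `collections.Counter`. The 19 commute distances below are invented for illustration, since the original data set isn't reproduced here; they are only chosen so that 2 appears twice and 3 appears once, as in the walkthrough.

```python
from collections import Counter

# Hypothetical commute distances in miles (19 responses); not the original data.
commutes = [2, 5, 7, 3, 2, 10, 18, 15, 20, 7, 10, 18, 5, 12, 13, 12, 4, 5, 10]

# Count how many times each distinct value occurs.
frequency = Counter(commutes)

# Print the table with the data values in increasing order.
print("Miles  Frequency")
for value in sorted(frequency):
    print(f"{value:>5}  {frequency[value]:>9}")
```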
Once we've filled in the frequency table, a quick check that we can do is add up all the frequencies, and they should add up to how many data points we have. And here if we add up these frequencies, we do find a total of 19.
In this example, we'll build a frequency table for categorical data. We're given a sample of NBA players with their position, which is a categorical variable. It divides them into categories: point guards, shooting guards, small forwards, power forwards and centers. And we're going to build a frequency table to go along with this. So the categories will be the divisions, and then we'll count the frequency in each category. Our frequency table will start with two columns, one for the position and one for the frequency. And then we'll add a third column for relative frequency. It's not entirely necessary every time, but it's a good idea to include it when you can. So for the positions, we have point guard, shooting guard, small forward, power forward and center as our five categories. And then we just go through and count how many there are of each one. So for point guards, for instance, we have one, two, three, four of them. So the frequency is four. For shooting guards, you can count one, two, three, four, five, six, seven, eight, nine. For small forwards, we have one, two, three, four, five, six, seven. For power forwards, we have one, two, three, four and five. And then for centers, we have one, two, three, four and five. And notice that if we add up those frequencies, we should get the total number that we have, which is 30. You can check that by adding those frequencies. For the relative frequency, we need to divide each frequency by the total number that we have in our sample, which is, again, 30. So for the relative frequency of point guards, we would divide four by 30, which we could write as a fraction.
Or if we wanted to, we could write that as a decimal, which comes out to about 0.133 or 13.3 percent. Then for the next one, we would divide nine by 30, which works out to 30 percent. Seven out of 30 is 23.3 percent. And then five out of 30 is about 16.7 percent, and that's for both of the last two categories. So again, a frequency table is pretty easy to construct. All you have to do is count how many fall into each category. In this case, the categories were the positions of the players.
Here we're going to build a grouped frequency table using the data set of NBA players with their points per game shown below. So we have 30 values ranging from 1.4 up to about 24. And we're told to use a class width of five for our grouped frequency table. So we want to start low enough that we can cover all of them. And rather than starting at one, let's start at zero just to get a nice round number and make it easy for ourselves. So in our frequency table, the first column will be points per game. And our first category, the first class, will start at zero and go up to five. But remember, we won't go all the way to five because we don't want to overlap our classes. So the first one's going to go up to just less than five. Let's use 4.9 to represent just less than five. And then the next one will start at 5.0 and go up to just below 10, so we'll stop at 9.9. And then we'll go from 10 to 14.9, 15 to 19.9, and 20 to 24.9. We don't need any more classes because that's as high as we need to go to cover everyone in this data set. So the next column will be the frequency. And then lastly, we'll have the relative frequency. Since there are 30 values here in our data set, once we have the frequencies, we'll just divide each one by 30 to get the relative frequency. We won't show this in detail for all of these classes, but just for the first class, from zero to 4.9, we'll go through and select all the ones that fit into that category.
So between 0.0 and 4.9, in the first row, we find 4.9. In the second row, we find 1.4 and 2.0. And in the third row, we find 3.0 and 4.3. And that looks like all of them. So there are a total of five that fall into that range. And then we can continue this on for all the others, but I won't show the counting. You can go through and do that yourself, and I'll show you the results here. Once you count all the others, you should get 11 for the second class, five for the third class, four for the fourth class, and then five for the final class. And then if we divide each of these by 30 to get the relative frequency, five divided by 30 is 0.16 repeating, so you can round that to 0.167 or 16.7%. And then 11 out of 30, we could round to 36.7%; five out of 30 again is 16.7%. And then four out of 30, we could round to 13.3%, to get the relative frequency for each. So it's really just counting. The only thing to keep track of for grouped frequency tables is to make sure your classes are all the same width and that none of them overlap. That's the important piece.
Here we'll build a histogram using the data set for the players in the NBA with their points per game listed here. In the last example, we built a grouped frequency table. And now we're going to build a histogram that matches it. So we're going to take the frequency table we built earlier and just draw this histogram to represent the same picture. To do that, we'll start with a grid, and the x values will range from 0 up to 24.9, or up to 25. So we'll have 0 here, 5, 10, 15, 20, and 25. And each of these classes will fit between two of those values. So the first class goes between 0 and 5. The next one goes between 5 and 10, and so on. And of course we know that it goes just up to 5 but not including 5, and so on, but we'll draw it as if it goes all the way to 5 just to make the picture as simple as possible. Then on the vertical axis, we have the frequency. So here we have the points. Here we have the frequency.
And the highest frequency we see is 11. So we need to go up to at least 11. So let's have 10 here and 5. And then we just draw a bar to the right height for each category. In the first class, the frequency is 5. So we'll draw a bar up to 5, ranging from 0 to 5. Then from 5 to 10, the frequency was 11. So that one goes all the way up to the top of the graph. Then from 10 to 15, that frequency was again 5. From 15 to 20, it goes down to 4. So we'll draw it a little shorter, down at the tick mark for 4. And the last one goes back up to 5. So each class, each category, gets a bar with the height representing the frequency. It's relatively simple once you've built the frequency table, which just consists of counting the ones that fit into each class.
Here we'll build a bar chart for this data set, which is the positions for the players in the NBA sample. Again, we have 30 observations, each falling into one of five categories: point guards, shooting guards, small forwards, power forwards, and centers. First, we need to find the frequency of each category, which means building the frequency table. But since we've already done that in a previous example, I won't go through in detail and count those. Rather, we'll just put the results here in this frequency table. The five positions and the frequencies: there are four point guards, nine shooting guards, seven small forwards, five power forwards and five centers. Now, just like with the histogram, we start with our grid. And this time, the horizontal axis will represent our categories. So we'll have five spots for our bars to go. And then the vertical axis will again represent frequency. So the horizontal axis is position, the vertical axis is frequency. The highest frequency we see is nine. So we need to make sure we go up to at least nine. Let's again go to 10. And now we're ready to draw a bar for each position.
So rather than having the bars connect, like they would with the histogram, with a bar chart, since we're thinking of these as separate categories that don't flow into one another, we'll draw the bars with some separation. So at the point guard position, we'll draw a bar that goes up to four, since there are four of those. And then at the shooting guard position, we'll draw one that goes up to nine. Small forwards have a frequency of seven, and power forwards and centers each have a frequency of five. So again, we're drawing a bar where the height represents the frequency of that category. But unlike with the histogram, these bars are separated because we're indicating that these are separate categories, and there's not a flow from one to the next.
Let's build a stem and leaf plot. The question says, suppose you gather data on how long it took you to get ready in the morning. For 40 days, you measured the amount of time between when your alarm went off and when you left the house. The results are below, rounded to the nearest minute. And we want to build a stem and leaf plot for this. For a stem and leaf plot, we divide each value into its stem and its leaf. The stems are generally the tens place, although you can tweak this and make it, for instance, the ones place. But in this case, since our values are two digit values, the first digit will represent the stem and the second digit will represent the leaf. If you look through, you'll notice that all these numbers start with either one, two or three. That's the first digit. So our stems will be one, two, and three. And now we'll go through, and for each value, we'll place that leaf in the correct category. So the first value is 35. So we'll put a five under the leaves category in the third row. Then for 28, we'll put an eight here. Then 25, we'll put a five here. Then 23, and so on; we'll put another three. Then 32 means we'll put a two in the third row. Then 29, 19, 21, 13, and so on.
I won't fill all of these out in detail; I'll show the final result in just a moment. Now, what you often find with a stem and leaf plot is that we write the leaves in the order of their value, rather than the order in which they appear in the data set. So instead of writing eight, five, three, three, nine, one, we would write the one first and then the threes and the five and so on. That takes some extra work because we have to do some sorting. But when I show the results in a second, they'll be shown that way, sorted by value. Now that we have all the stems and leaves written, notice how it looks. It looks almost like a histogram or a bar chart turned sideways, where the length of each string of leaves represents the frequency of that category. So it divides the data into categories by tens, and we can see that the most frequent category is the range from 20 minutes to 29 minutes. And yet, unlike a normal grouped frequency table, by grouping them we haven't lost any information. We could reconstruct the entire data set from this stem and leaf plot if we had to.
In this example, we'll construct a scatter plot for some data that we're given. We're given the sizes of several TVs and the price that goes along with each one. Now, our data is split into three groups just so that the table could fit easily here. But side by side, you'll see a size and a price, where the size is given in inches and the price is given in dollars. And we want to construct a scatter plot where we compare these two variables. Now, when we construct a scatter plot, we have to pick which variable will be x and which one will be y. It's not crucial that we pick them in the right order, because it turns out that if they get switched, much of the analysis we can do is still the same. But if we can, it would be nice to pick x and y in a reasonable way. And usually we want to think about how x determines y.
In other words, is there one of the variables that seems to control the other one? Would we say that the size determines the price of the TV? Or would we say that the price determines the size? It's probably more likely that we would say that the price depends on the size, or the size determines the price. So we could call x the size and y the price. So let's say our x-axis represents size and our y-axis represents price. Again, we'll try to be consistent with these, where the first column is x and the second column is y. But it turns out that if you switch the order, it doesn't change too much, at least at this point. So now we need to put a scale on each axis. The sizes go up to about 60 inches. So let's make sure we include at least up to 60. Let's say we go up to 70 here. And if we start down at zero, that means we need to divide this evenly so that we get to 70. So halfway there would be 35. And then if we mark these off by fives, we get 5, 10, and so on. And the prices range from 200 to about 2800. So let's say we include all the way up to 3000. And let's mark it in increments of 500. Now for each value in our list, let's take the first one, for instance: the size is 43 inches, the price is $500. So on the horizontal axis, the size axis, we'll go up to 43, which is around here between 40 and 45. And then on the price side, we'll be right at 500. So we want to find out where those two cross, which is right around here. So our first point will be right there at 43 inches and $500. For the next one, we'll go to 55 and $900. So 55 is right here between 50 and 60. $900 is right here, right under 1000. So those two look like they cross right around here, which gives us our second point. And then we'll continue for the rest of them without showing each one in detail. I'll just show the final picture here in a moment after we've drawn all these points. So there's the final result.
And even though it's hand drawn and imperfect, we can see the general trend, which is that larger TVs tend to cost more. So there's this general upward trend as you scan through this picture. And that's really the value of a scatter plot: to look for an association or a connection between two variables, to see if there's a relationship. And this one looks like there's this upward trend. Later on, we'll talk about how to draw a line or some other curve that represents this shape. But for now, just notice that there's that relationship between the two.
Find the median of the salaries listed below. So these are the salaries from the NBA data set. We have 30 values. We want to find the median. Remember that the median is the middle data point. Now, helpfully, these salaries have been listed for us in order from smallest to largest. So notice we have the smallest value here. And then they increase along this row, and then down to the next row, until we reach the largest value at the bottom right. And that's important when you're finding the median: order them from least to greatest or from greatest to least, so that when we look in the middle of that list, the middle value is truly in the middle between the highest and lowest, so that half of the data points fall below it and half of them fall above it. Since it's listed like this for us, we can look halfway through. Since there are six rows, the middle falls at the end of the third row and the beginning of the fourth row. So the middle is right between this value and this value. Since there are two values in the middle, which will happen every time we have an even number of data points, we need to find the number halfway in between those two, which we do by averaging them together. So find the average of those two numbers in the middle, and that will be the median. If we had an odd number of data points, like say we had 29 players, we would find the middle value and that would be our median.
We wouldn't have to find any average, but anytime there's an even number, there will be two in the middle. So we need to find whatever is halfway in between those two. So the median will be halfway between 4,469,160 and 4,767,000. If we add those together and divide by 2, what we get is 4,618,080, and that's the median. Half of the players make less than 4,618,080, half of the players make more. So that's the middle data point, and it's a good measure of the center.
Here we'll find a weighted average using a student's scores in a class. So we have several assignments: three tests, homework, project, and final exam. And we're given the student's score on each of these assignments as well as a weight for each assignment or a number of points for them. Now notice that usually we'd be given either the weight or the points. In this case we're given both, and if you look closely, there are a total of a thousand points that they could earn. If you add up all those points you should get a thousand, and if you divide each of those point values by a thousand, you'll get the percentages that are listed here. So really the same information is given in the weight column or in the points column. You may see that for some of your classes the scores are given with weights; sometimes they're given with points. It's really the same thing. It's just written in a different way. So we're going to show the calculation with both just to compare, but we'll get the same answer either way. For part A, then, let's use the weights. We can multiply each score the student got by the weight that's associated with it, and once we multiply those and add them all together, the answer we'll get at the end is the average score. So the weights are nice because they're already scaled for us, so that just by multiplying the score times the weight and adding those together, we'll get the final weighted average.
So all I have to do is take 85 percent times 20 percent, plus 92 percent times 20 percent, plus 87 percent, and so on. If we multiply all those and add them all together, the answer we get at the end is about 0.899, which works out to 89.9 percent. So on a standard 10 point scale, this student will be very close to an A, just under 90 percent. Now let's do the same thing with the point system. We're going to get the same answer, of course, but this time we have to do two steps. First, we multiply their score in each category by the number of points that category is worth, so for the first category that's 85 percent times its points, and then we do that for all of them. Once we multiply and add all those up, we get 899 points, so they earned a total of 899 points. Then when you divide that by the total number of points they could have earned, which was a thousand, you get 0.899, which again works out to 89.9 percent. So notice you're doing the same work both ways. It's just that when you use the weights, you've already divided by a thousand before you start; when you use the points, you have to divide by a thousand at the end to get the final score. Either way you do it, you're using the same values, and this is how you do a weighted average, whether you're given the weights as percentages or as points out of a total that you could earn.
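To compare the two methods side by side, here is a small Python sketch. Only the scores 85, 92, and 87 and the 20 percent weights on the first two tests come from the walkthrough; the other scores and weights are assumptions, picked so that the weights add to 100 percent and the result lands on the same 89.9 percent.

```python
# Scores as percentages. Homework, project, and final exam scores are assumed.
scores = {"Test 1": 85, "Test 2": 92, "Test 3": 87,
          "Homework": 95, "Project": 90, "Final exam": 93}

# Weights as fractions of the grade. Only the first two are from the walkthrough.
weights = {"Test 1": 0.20, "Test 2": 0.20, "Test 3": 0.20,
           "Homework": 0.10, "Project": 0.10, "Final exam": 0.20}

# The same information written as points out of 1,000.
points = {name: round(w * 1000) for name, w in weights.items()}

# Method 1: multiply each score by its weight and add.
weighted_average = sum(scores[name] * weights[name] for name in scores)

# Method 2: add up the points earned, then divide by the total points.
points_earned = sum(scores[name] / 100 * points[name] for name in scores)
total_points = sum(points.values())
percent_from_points = 100 * points_earned / total_points

print(round(weighted_average, 1), round(percent_from_points, 1))
```

With these assumed values, both methods print 89.9.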
Here we'll find the mode of a data set that summarized for us we're already given the frequency table rather than just the raw data so most of the work is done for us the mode is the most frequent data value and the nice thing about having the data given to us as a frequency table is all you have to do is look through the frequencies and find the highest frequency notice the highest frequency is four and occurs three times there are three values which occur four times and are tied for most frequent and those are 21 22 and 24 so there are actually three modes and this can happen a lot in our case the modes are 21 22 and 24 here we'll find the range of a data set that we're given and the data set is the heights of the players given in the MBA data set the range is really simple it's just the difference between the smallest and the largest so all you have to do is look through this list and find the smallest number and the largest number that occur if you scan through you should be able to pick out that the lowest number is 1.78 1.78 meters is the shortest player in the sample that we listed and the tallest player is 2.13 meters so the range is just the difference between those two we take 2.13 minus 1.78 the range is the difference or 0.35 meters that's a really simple way of measuring how spread out the data is if the range is larger it's more spread out if the range is smaller it's more tightly clustered together so 0.35 when compared to the values in this data set is a fairly small range so they're fairly tightly clustered together all of these players are relatively tall here we'll calculate the standard deviation by hand using the formula instead of using the built-in function in the calculator just to illustrate how the formula works we have five data points and we want to calculate the standard deviation the first thing we need to do is to calculate the mean so the mean remember is the sum of the data points divided by the number of data points so we add them 
up and divide by five, and we find that the mean is 10.2. Now the standard deviation of this data set is the square root of the sum of the squared deviations divided by n minus 1, and all of that's inside the square root. So we'll need to calculate these things individually. We need to find each deviation, the difference between each data point and the mean, square those, add them up, divide that answer by n minus 1, and then take the square root of that answer. This can get kind of tedious, which is why we only do this with small data sets, and from this point on we'll use the built-in function in the calculator. But here we'll just illustrate the formula once. We'll use this table to organize everything. We have the data values listed, and we'll calculate the deviation for each one by simply subtracting: this number 20, for instance, minus 10.2, and then 4 minus 10.2, and 15, and so on, and we'll calculate the deviation for each data value. Once we've done that, we square them all. Remember, the reason that we square them is that if we tried to average these deviations, we would get zero, because the positive ones and the negative ones would cancel each other out. But by squaring them, we end up with all positive numbers, so when we average those, they don't cancel each other. Then here all the squared deviations are filled in. Now we need to add those squared deviations, divide by n minus 1, or 4 in this case, and then take the square root of the answer. Adding these all up, we get 210.8, and then if we divide that by 4, we get 52.7. But we're still not done. The last step is to take the square root of that. The square root of 52.7 is about 7.26, so the standard deviation of this data set is 7.26, which is roughly the distance that a typical data point is from the center. Again, this example mostly illustrates why we don't usually calculate the standard deviation by hand; even for a small data set, it gets pretty tedious. A random sample of 11 statistics students produced the following data, where x is the third exam score out of 80
and y is the final exam score out of 200. Here we'll use a graphing calculator to find the equation of the least squares regression line so that we can predict the final exam performance, which is y, based on the third test score, which is x. To use the calculator, we first need to enter the data. If we enter the STAT menu and choose Edit, we can enter the data here, which I've already done. In the first list we've entered the x's, and in the second list we've entered the y's. Now if we go back under the STAT menu, we want to calculate, so we'll scroll over to the CALC menu, and we want linear regression. The form we've been using is this number four, LinReg(ax+b), so a is the slope and b is the y-intercept. If we select that, we don't need to change anything here, because we entered the x's in list one and the y's in list two, so nothing needs to be changed. We can just scroll down to Calculate, and it gives us the equation. The form of it is y equals ax plus b, and we get the values for a and b, as well as values for r squared and r. Now we have a better sense of what this r represents: that's the correlation coefficient. The correlation coefficient is about 0.66, which means there's a moderate linear relationship, and it's positive. So there we go: there's our regression line, and there's the r value that goes with it. Here we're given a regression equation, which we developed in a previous example, to predict the price of a house based on its square footage. If x represents the square footage, we can predict the price in thousands of dollars. So if we entered a thousand for the square footage, for x, we could calculate a value for y that would be the predicted price for houses with that square footage. Now we're going to use this equation to predict the price of homes with two different square footage values. All we have to do in each case is replace x with the given value for square footage. In the first case, we have y hat equals 0.099 times 2700 plus 160.8, which works out to 428.1, so that
corresponds to 428,100, and that technically means that the average house price we would expect for all the houses that are 2700 square feet would be 428,000, approximately. Then in the second case, we can make a similar prediction for houses with 4,500 square feet, and that works out to just over 600,000. Then the question asks, which prediction do you expect to be more reliable? For this we really have to go back and look at the data, which we don't have in front of us. But if you go back in the textbook and look at the data, you'll notice that in the range of houses that we have data for, 2700 square feet falls within that range and 4,500 square feet does not. None of the houses we have in our dataset are nearly as big as 4,500 square feet. So without any external information, it seems more likely that the first prediction would be more reliable, because we've seen houses that are similar to it in our dataset. The first example we call interpolation, where we are predicting within our data range. The second one we call extrapolation, where we're predicting beyond our data range. In general, extrapolation is dangerous, because we don't know quite what could be happening beyond the range of data we've actually looked at. So in general, interpolation is more reliable than extrapolation, and the first prediction is more likely to be reliable, even though we don't have a lot of information about it at the moment. Here we'll do a linear regression problem with several steps. First, we're given some data, which is a set of quarterbacks in the NFL during the 2019 season, and for each quarterback we're given their height and their weight. We're going to compare these two. Now, without any other information at the moment, we're going to assume that height is going to represent x and weight will represent y, just because they're ordered that way. Later on in the problem we might have more information that'll tell us
which one should be x and which one should be y, but for now we'll just assume that and change if necessary. First we're going to calculate the correlation coefficient, and to do this we'll use the calculator. So first we need to go to the calculator and enter this data. To enter the data, we'll hit the STAT button and hit Enter to get into the Edit menu, and under list one we'll enter the x values, the heights that we saw. The first quarterback had a height of 75 inches, then 74, 74, 70, 71, 74, 76, 77, 72, and 74. Then we can scroll over to the second list for their weights, and we'll enter these as well. Once we have all the data entered, we can go back into the STAT menu and scroll over to the CALC menu and down to the linear regression option. Again, we've entered x's in list one and y's in list two, so we don't need to change anything. If those were switched, it wouldn't change the correlation coefficient anyway, so for this first part it didn't really matter which way we entered them, but later on it will be significant. So we'll just hit Calculate, and it goes ahead and gives us the equation for the line, which we don't need just yet. All we're looking for at the moment is the value of r, which is about 0.6. The first part of the question just asked for the value of r, so now that we have that, we can go back to the notes and enter it. Now for the second part of the question, we want to know: is there a strong linear relationship? Based on the value of r, we know first of all that there is a positive relationship, since r is positive. We wouldn't necessarily say it's a strong relationship, because it's less than 0.8, which again is not a magical number, but it's sort of an agreed-upon level for a strong relationship. But with 0.6, we might say there is a moderate relationship, and often a moderate relationship is good enough to continue on, so we'll continue to do the rest of the problem. So we wouldn't necessarily say it's a strong relationship, but it is a moderate linear relationship. The next part
of the question asks for the regression line, which we've already calculated. Now notice the direction here: the regression line is going to predict the weight from the height. In other words, the weight is going to depend on height, which means that weight should be y and height should be x, which is the way that we already set it up, so it's good that we did so. The regression line that we got from the calculator earlier is the one that we'll write here. If we go back to the calculator, we can see what we had there: a is about 3.01 and b is negative 4.43. Now that we have that, we can move on to the rest of the problem. Here we want to graph the data as well as the regression equation, and I'll actually use the calculator to do this. We'll graph both the data and the equation on the same window. We already have the data entered, so if we open the stat plot menu under second, y equals, we turn on this first stat plot and leave it as a scatter plot the way it is. x and y are in the right order. Now when we graph, we'll see those points, as long as we're scaled to the right window. Then also, if we hit y equals, we can graph the equation, which was 3.01x minus 4.43. Now before we graph, we should go to the window and make sure that our x values and y values are in the right range. The x values, the heights, range from 70 up to 77, so we should make sure that x covers at least that range. Let's say we make x min 68 and x max 79. Then the y values, the weights, range from 200 up to 233, so let's say we go from 190 to 240. Now when we graph this, we see those data points as well as the line that passes through them. Notice what this graph gives us. First of all, it points out that outlier on the lower right-hand side, and that's the first entry. Lamar Jackson is on the upper end of the height scale: he's at 75 inches, so he's one of the taller players, but he's actually the lightest of all of them at 200 pounds. So if we actually removed him from the data set, our r value would be much, much higher
because he's what's making the data points not follow a straight line. Since he's kind of out in space by himself, that messes up some of the strength of the linear trend that would be there otherwise. So sometimes there are outliers like this that will lower the value of r, but if you remove them, the value of r will be better. Of course, you can't just throw away outliers because your data would look better without them, but keep that in mind. Next, we can predict the weight of a quarterback who is 73 inches tall. Given a height, we can predict their weight using this equation. So a is 3.01, and then if we plug in 73 for x, we can predict that y hat, their weight, for a quarterback of this height would be about 215 pounds. Lastly, we get this question: does Drew Brees weigh more or less than the weight predicted by the regression line based on his height? We can do the same thing we did in the last part, where we predict what someone who is the height of Drew Brees, 72 inches, would tend to weigh. According to this equation, that comes out to 212 pounds, approximately. The question asks, does he weigh more or less than the predicted value? The predicted value is 212 pounds and he only weighs 209 pounds, so he weighs less than what's predicted. So here are the answers to all the parts of this question: we found the correlation coefficient, we interpreted it, we found the regression line and graphed it alongside the data, and then we made a couple of predictions based on that regression line. The scores on a college entrance exam are normally distributed with a mean of 52 points and a standard deviation of 11 points, and we're asked to find what two scores encompass 95 percent of the test takers. Since the data is normally distributed, the empirical rule tells us that 95 percent of the data will be within two standard deviations of the mean. Since one standard deviation is 11 points, two standard deviations is 22 points, twice that. So 52 minus that
and 52 plus that will form the boundaries of this range that includes 95 percent of the data. Therefore we decide that 95 percent of the values lie between 30 and 74. IQ scores are normally distributed with a mean of 100 and a standard deviation of 15. Here we're asked to use the empirical rule to find the data that is within one, two, and three standard deviations of the mean. Remember, the empirical rule states that 68 percent of the data is within one standard deviation of the mean, meaning that if we go one standard deviation below the mean and one standard deviation above the mean, that range will hold 68 percent of the data. 95 percent of the data falls within two standard deviations, and almost all the data, or around 99.7 percent of the data, falls within three standard deviations. So if we work from the mean down three standard deviations and up three standard deviations, we'll encompass almost all the data. In this case, with a mean of 100 and a standard deviation of 15, one standard deviation below will be 85 and one standard deviation above will be 115: 100 minus 15, 100 plus 15. So 68 percent of the data, in other words 68 percent of people, will have an IQ score between 85 and 115, 95 percent of people will have an IQ score between 70 and 130, and almost everyone will have an IQ score between 55 and 145. Here we have a picture that illustrates this situation, where each unit on the axis is one standard deviation. Suppose you know that the prices paid for cars are normally distributed with a mean of 17,000 and a standard deviation of 500. We want to use the 68-95-99.7 rule, or the empirical rule, to find the percentage of buyers who paid in any given range. The first thing to do when working with a problem like this is to draw a picture, and here's the picture for this example. The center is the mean, and then each unit on the axis is one standard deviation. So we have it centered at 17,000, and we go up to 17,500, 18,000, 18,500, and we could keep going, but that's all we need. And then on
the lower side we go down to 16,500, 16,000, and 15,500. Here I've also filled in all the percentages for each range. Notice that the empirical rule tells us that 68 percent of the data falls within one standard deviation of the mean. Since everything is symmetric, each half of that holds half of that 68 percent, and that's where those 34 numbers came from. Then I know that within two standard deviations of the mean I have 95 percent of the data. If I have 68 percent in the middle, and by going out another standard deviation I get the 95 percent, that means the yellow regions together must make up that 27 percent that gets us from 68 percent to 95 percent. So if the two yellow regions together hold 27 percent, each of them holds half of that, or 13.5 percent. We can repeat this process for the green regions. Again, going out to a third standard deviation, we know that that holds 99.7 percent of the data, so the green regions together must hold 4.7 percent of the data. Since everything up through the yellow regions held 95, and then just by adding the two green regions we got the 99.7, the green regions must be that 4.7 percent. Therefore each of them holds half of that, or 2.35 percent. Then outside the 99.7 is 0.3 percent of the data, and again, because it's symmetric, each half of that, or each tail, contains 0.15 percent. Again, the goal is not to memorize these percentages, but just to realize how we got them and be able to re-derive them at any point. But now that we have them, we can use them to solve the problem. Part a asks what percentage of buyers paid between 16,500 and 17,500. We find those two points on our picture, and between them we add up the blocks and find that 68 percent of buyers were in that area. Similarly for part b, between 17,500 and 18,000, we locate those points, and between them there's just one region with 13.5 percent. For part c, we look between 16,000 and 17,000, and there are two blocks there; adding them up, we get 47.5 percent. For part d, we're looking between 16,500
and 18,000, and adding up those three blocks we get 81.5 percent. So 81.5 percent of people paid somewhere between 16,500 and 18,000 for their car. Part e asks what percentage paid below 16,000, so that's right here, and below 16,000 we have two blocks to take care of: 2.35 percent and 0.15. Adding them together, we get 2.5 percent, so 2.5 percent of buyers paid less than 16,000 for their car. Lastly, the final part asks what percentage paid above 18,500, and again there's only one block up there, so 0.15 percent of buyers paid above that amount. And here we have all the answers summarized. Here we're told that female adult height in some population is normally distributed with a mean of 65 inches and a standard deviation of 3.5 inches, and we're asked to find the z scores of two heights. Remember that a z score is the data value minus the mean, divided by the standard deviation, and this z score represents how many standard deviations this data point is above or below the mean. If the z score is positive, it's above the mean; if the z score is negative, it's below the mean. For the first example, then, to calculate z we take 58 minus the mean, 65, and divide by the standard deviation, 3.5. 58 minus 65 is negative 7, and that divided by 3.5 is negative 2. What that means is that the first data value is two standard deviations below the mean. For the second point we do the same thing, now taking 71 minus 65 and dividing by 3.5. 71 minus 65 is 6, and that divided by 3.5 is about 1.71, so the second data point is between one and two standard deviations above the mean. Scores on an IQ test are normally distributed with a mean of 100 and a standard deviation of 15. We're asked to find the IQ score that corresponds to each of the following z scores. So here we're given z values, and we're asked to work backward from them to the data values. Again, remember though that a z score is simply a given data value minus the mean, divided by the standard deviation. Here we're given the z score, the mean, and the standard deviation, and
asked to solve for the data value. For the first one, for example, we know that z is negative 1.5. The data value is the unknown part that we're going to find. So the z score is x minus the mean, 100, divided by the standard deviation of 15. Now to solve for x, I'm going to multiply both sides by 15, and we find that negative 22.5 equals x minus 100. Then add 100 to both sides to get x equals 77.5. So a z score of negative 1.5 corresponds to an IQ score of 77.5. For part b, we do the same process. The z score equals the data value minus the mean, divided by the standard deviation, so we multiply both sides by 15 and find that 30.75 equals x minus 100, and then adding 100 to both sides, x equals 130.75. So a z score of 2.05, or just over two standard deviations above the mean, corresponds to an IQ score of around 131. This question asks, what is the margin of error on a poll with a sample size of a thousand people? With the simplifying assumptions that we've made, the margin of error is one divided by the square root of the sample size, as a percentage. So this will give us a decimal, and then we'll write it as a percentage. To do that, some people write that this times 100 percent is the margin of error, and that times 100 percent just converts the decimal into a percentage. Again, there are some simplifying assumptions that lie behind this, but for our purposes this is good enough. In this example, n is a thousand, so the margin of error is one divided by the square root of a thousand, times a hundred percent. One divided by the square root of a thousand is 0.0316 and so on, and multiplying by a hundred percent, or converting to a percentage, gives us 3.16 percent, approximately. It's fairly common for a poll to have a margin of error of around three percent, and when you see that, you can tell that the sample size is around a thousand people. In this example, we're going to find a sample size that corresponds to a given margin of error. So if we want a poll to have a margin of error of two
percent or less, what's the minimum sample size needed to make that happen? Remember, the margin of error looks like one over the square root of n, as a percentage, so we can write times a hundred percent just to indicate that we're making it a percentage. So if we want this to equal two percent, then really what we want is one over the square root of n to equal 0.02. As a decimal, we want one over the square root of n to equal 0.02, or as a percentage, we want it to equal two percent. Now we need to solve for n, and it takes a little bit of algebra to do so. There are a few ways to do this. One: if we want to get n by itself, we could move the square root of n out of the denominator by multiplying by it on both sides. If we multiply it over there, we get one equals 0.02 times the square root of n. Then again, we're trying to get this part by itself, so we'll divide both sides by 0.02, which gives us one over 0.02 equals the square root of n. Now there's a shortcut to this. At the step where we have one over the square root of n equals 0.02, you can actually flip both sides of that equation upside down and get the square root of n over one equals one over 0.02, which gets you to the same place that we already reached, just by a shortcut. Don't get too lost in that step if it doesn't make sense, but there are shortcuts to some of this algebra. Once we get to this point, though, we just have one step left, which is that we need to square both sides. Before I do that, I'm going to simplify one divided by 0.02, and that simplifies to 50. But now to get rid of that square root, we just need to square both sides, and the square and the square root will cancel each other. When we square 50, we get 2,500 equals n. So that's our answer: if we sampled at least 2,500 people, we would be guaranteed to have a margin of error of two percent or less, and if we sampled more than 2,500, our margin of error would be less
than two percent. But as long as we sample 2,500, we'll be guaranteed to have a margin of error of two percent. Consider randomly selecting a family with two children, where the order in which different-gender siblings are born is significant. That is, a family with a younger girl and an older boy is different from a family with an older girl and a younger boy. What would the sample space look like for this family? In other words, what are the different possibilities for children that this family could have? Remember, the sample space is a list of all the possible outcomes. Well, in a family with two children, either they could both be girls, or they could both be boys, or you could have one girl and one boy. Now, we're told that the order matters, so having a girl then a boy is different from having a boy then a girl. So these are the four possibilities, and that's our sample space. Suppose we toss a fair coin and then roll a six-sided die once. Describe the sample space S. Again, the sample space is a list of all the possibilities for what could occur. If we're flipping a coin and rolling a die, then we can think about it this way: either the coin is going to be heads or tails, and then the die is going to be a number from one through six. So let's suppose the coin comes up heads. Well, then the number could be one, or it could be two, or three, or four, or five, or six. And then the coin could come up tails, and the die could be either one, two, three, four, five, or six. So we have six more possibilities with the coin coming up tails, and that is a full list of all the possibilities that could occur with this experiment. Assume you're rolling a fair six-sided die. What is the probability of rolling an odd number? Remember, a basic probability is the number of ways that our event could occur, so in this case the number of ways to roll an odd number, divided by the number of total possibilities for what could occur when we run this experiment. So all we have to do is count up how many ways this event could happen
and count up the number of total possibilities, and then divide the two to get the probability. Our sample space, or our list of total possibilities, is the numbers one, two, three, four, five, and six, because those are the outcomes that we can have when we roll a die. Of those, three of them are odd numbers: the one, the three, and the five. So there are three ways to roll an odd number and six total possibilities, so the probability is three out of six, which we could simplify to one half, or we could write as 0.5 or fifty percent. Any one of them is an acceptable answer. Of course, at the beginning of this problem, you might be able to think about it and realize that the die is going to come up either an even number or an odd number, and since there's an equal number of even numbers and odd numbers, the chance that it comes up odd is one out of two. That again illustrates that there are often many different ways to solve a probability problem, and as long as you use one of the correct ways, everything's fine. Suppose you draw one card from a standard 52-card deck. What's the probability of drawing an ace? Here I've put a picture of a standard 52-card deck just for reference. Again, we have four suits: the clubs, hearts, spades, and diamonds. Within each suit there are 13 cards: the ace, then the numbers two through 10, and then the three face cards, jack, queen, king. So you've got 52 total cards to draw from, so the number of possibilities, or the denominator in our probability, is going to be 52. Now out of those 52 possibilities, four of them correspond to drawing an ace, because there are four aces. So the probability of drawing an ace is simply four out of 52. Lisa's cookie jar contains the following: five peanut butter cookies, 10 oatmeal raisin, 12 chocolate chip, and eight sugar cookies. If Lisa randomly selects one cookie from this jar, what is the probability she gets a peanut butter cookie? Again, probability is simply the number of ways that our scenario can happen, so in this case the number of
peanut butter cookies in the jar, divided by the number of total possibilities for what can happen, so the total number of cookies. So all we have to do is count the number of cookies in total, divide the number of peanut butter cookies by that, and that's our probability. If there are five peanut butter, 10 oatmeal raisin, 12 chocolate chip, and eight sugar cookies, that's a total of 35 if we add up those four numbers, and there are five peanut butter cookies. So the probability of drawing a peanut butter cookie is five out of 35: the number of ways that can happen divided by the total number of possibilities. Once we have that, we can either leave the answer this way, or we can simplify the fraction to one seventh, or write it as a decimal or a percentage. Again, any one of them is fine as long as we get the right number. Consider the following information about student enrollment at FCC. We have a table that breaks down the gender of students, and we're told how many male students there are and how many female students there are. Then we're told that we randomly select one person from all these students, and we want to know, what is the probability that the one we selected is a male student? Well, again, the probability is simply the number of ways that that could happen. Out of this group, there are 2,580 ways to select a male student, because that's how many male students there are, and there are a total of 6,233 students. I got that number by adding together the female students and the male students. So that's the probability, 2,580 out of 6,233, and written as a decimal, that's approximately 0.414. Here we're given a breakdown of some students at FCC by their residence: some live in Frederick County, some live out of county, some live out of state, and some are employees of the college. Furthermore, we're told that each student can only be assigned to one category, so there's no overlap; no one falls into any two categories. We randomly select a person from this group, and we want to know what the probability is that
that person we selected is from out of county, so we're looking at this second category. Well, again, the probability is the number of ways that could happen, so 245 is the number of people that we could select in that category, divided by the total number of people there are. If we add up the 5,819, the 245, the 141, and the 28, we get a total of 6,233. So that's the probability: the number of ways of selecting an out-of-county student divided by the number of ways of selecting any student, and again, if we write that as a decimal, we get about 0.039. This example has what's called a contingency table, or a two-way table. We have 130 FCC students broken down by gender and by dominant hand, so there are male and female, and then right-handed and left-handed. Since we're breaking them down by two categories, there are four total possibilities: female right-handed, female left-handed, and then male right-handed and male left-handed. We're also helpfully given these totals for each row and each column. If we weren't given those, we could easily find them. For instance, the total right-handed is simply the male right-handed plus the female right-handed; adding those together, we get 105, and so on. You can add up each row and column, and then once you have these row and column totals, if you add together the two column totals you get 130, and if you add together the two row totals you also get 130. That's a nice check to make sure that everything works out and that you haven't added something incorrectly. Now, though, we're asked, if we randomly select one person from this group, what is the probability that the student is left-handed? All we have to think about is the number of ways that we could select a left-handed student, divided by the total number of ways that we could select any student. Well, the number of ways to select a left-handed student is just the number of left-handed students there are, so if we go to the left-handed column, we notice the total there is 25, which again we could
have gotten by adding the 13 and the 12. So there are 25 left-handed students, so 25 ways to select a left-handed student, and there are 130 total students all in all. So the probability of selecting a left-handed student is 25 divided by 130, and if we write that as a decimal, it comes out to about 0.192. Here's a simple example to demonstrate the law of large numbers. Again, the law of large numbers states that if we do an experiment over and over and over again, the results will line up with the probabilities that we expect before we run the experiment. It's similar to one of the ways that we define probability. For instance, we could define the probability of flipping a coin and getting heads as 50 percent by saying that if we did this over and over and over again, the proportion of heads that we would get would be 50 percent. Here, though, we'll illustrate this law. Consider this experiment: let's roll a six-sided die over and over and over again and take the average of the results that we get. For instance, for the first 10 results we might get something that looks like this table, where we get a four the first time, then a two, then a one, then a six, and so on and so forth. If we average together these results, we get 3.3. If each number from one through six is equally likely, we expect to get the middle number as our average. Well, the middle of those six numbers is between three and four, so 3.5. So our expected result is 3.5 as the average for these rolls. After 10 rolls we've gotten pretty close, we got to 3.3, but the law of large numbers says that if we do this more and more times, we should get closer and closer to the average of 3.5. Here's a graph of what happens if we roll the die even more times. After just a few rolls, there's some volatility in the results. After maybe the first 10 rolls, the average might be as low as three, but over time, as we roll the die more and more times, taking the average after each roll of all the rolls up until that point, slowly this average will start to creep
back toward 3.5, toward the expected value. It may diverge a little bit, but eventually, over time, it's going to settle down to that average. So the law of large numbers essentially says that in the long run the average will be 3.5. Even if in the short term it isn't quite that, over time it'll settle down to that average, and over time these results get closer and closer to the average that we expect. And that's the law of large numbers: over time, the empirical probability that you observe approaches the theoretical probability that you calculate before ever running the experiment. Suppose you roll a fair six-sided die once. What is the probability of rolling a six or an odd number? There are a couple of ways to do this problem, and I'll do two. For the first, I'll use what we know about "or" probabilities. We know that the probability of one event or another happening is the sum of their individual probabilities minus the overlap, minus what happens when both of them occur. In this case, a six and an odd number are mutually exclusive events; these events cannot happen at the same time. If you roll a six, you didn't roll an odd number, and if you roll an odd number, you didn't roll a six. So the probability that both of them happen together is zero. In this case, the probability of rolling a six or an odd number is just equal to the probability of rolling a six plus the probability of rolling an odd number, because there's no overlap, so that doesn't affect the calculation. The probability of rolling a six: well, there's one six and six total numbers, so the probability of rolling a six is one out of six. Rolling an odd: well, there are three odd numbers, one, three, and five, and six total possibilities. Adding those up, we get four out of six, or two-thirds, or 0.67, or 67 percent if you like. The other way to do this problem is simply to list the sample space, or to picture it at least, and to think about how many outcomes correspond to either a six or an odd number. Well, all the odd
numbers are one, three, and five, and the six is the extra one, so there are four outcomes that correspond to the event that was specified, rolling a six or an odd number, and there are six total possibilities. So again we get 4 divided by 6. There are at least two different ways of doing this problem, and both give the correct answer, so either way is fine. But for problems where the sample space is so large that we don't want to list it or picture it, the first method will be more useful.

Suppose you draw one card from a standard 52-card deck. What is the probability that you get an ace or a face card? Again, the rule for "or" says that the probability of one thing or another thing happening is the sum of their individual probabilities minus the probability that they happen at the same time, minus the overlap. The reason is that if you simply sum the probabilities of the individual events, you double count the overlap, so we have to subtract the overlap once. Here, an ace is not a face card, so these events are mutually exclusive, or disjoint; they can't happen at the same time. If we're only drawing one card, we're going to get an ace, or a face card, or something else, but we're not going to get something that's both an ace and a face card. So in this case the probability that we get an ace or a face card is simply the sum of their probabilities, because the probability of both happening together is zero; we just subtract zero at the end, which doesn't change anything. The probability of drawing an ace: there are four aces, the ace of clubs, diamonds, hearts, and spades, and there are 52 total cards, so 4 out of 52. The face cards: there are three face cards in each suit, the jack, queen, and king, and four suits, so that's a total of 12 face cards out of 52. Adding them together, we get 16 out of 52.
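
If you'd like to check the ace-or-face-card calculation with a computer rather than by hand, here is a minimal Python sketch. It is not part of the original lesson; the names `deck`, `prob`, `is_ace`, and `is_face` are made up for illustration. It builds the 52-card deck, applies the "or" rule (sum of the individual probabilities minus the overlap), and compares that with simply counting the cards that qualify.

```python
from fractions import Fraction

# Build a standard 52-card deck as (rank, suit) pairs.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = [(rank, suit) for suit in suits for rank in ranks]


def prob(event):
    """Probability that one randomly drawn card satisfies `event`."""
    favorable = sum(1 for card in deck if event(card))
    return Fraction(favorable, len(deck))


def is_ace(card):
    return card[0] == "A"


def is_face(card):
    return card[0] in {"J", "Q", "K"}


p_ace = prob(is_ace)                                   # 4 out of 52
p_face = prob(is_face)                                 # 12 out of 52
p_both = prob(lambda c: is_ace(c) and is_face(c))      # overlap: 0

# "Or" rule: add the individual probabilities, subtract the overlap.
p_or_rule = p_ace + p_face - p_both

# Direct count of the cards that are an ace or a face card.
p_or_count = prob(lambda c: is_ace(c) or is_face(c))

print(p_or_rule, p_or_count)  # prints 4/13 4/13
```

Both approaches agree, and `Fraction` reduces 16 out of 52 automatically.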
We could also simplify 16 out of 52 down to 4 out of 13, and that's another way of thinking of it. You could focus on just one suit and say: in this suit there's one ace and three face cards, so four cards meet the description out of the 13 cards in the suit. All the suits are the same, so my chances of drawing an ace or a face card are 4 out of 13. So again, there are multiple ways of analyzing this problem, and if we think correctly, all of them get to the correct answer.

Here we're given a contingency table, or two-way table, that contains information about 130 FCC students, broken down by gender and by whether they're right-handed or left-handed. We randomly select one person from this group and ask: what's the probability that our randomly selected student is female or left-handed? Again, an "or" probability is the sum of the individual probabilities minus the overlap. We have to worry about subtracting the overlap in this case because these events are not mutually exclusive; in other words, our randomly selected student could be both female and left-handed. We're randomly selecting from a group of 130 students, so that's going to be the denominator on each of these pieces. The number of ways to select a female student is 71, because there are 58 right-handed female students and 13 left-handed female students, for a total of 71. The number of ways to select a left-handed student is 25, since there are 13 female left-handers and 12 male left-handers. The probability that our selected student is female and left-handed is 13 out of 130, because there are 13 students that meet that description and 130 total students. So we add 71 plus 25, which is 96, and subtract 13, which gives 83. Our final answer is 83 out of 130, and if we write that as a
decimal, that's approximately 0.638. We can also think about this by adding up all the students that meet this description, female or left-handed. For the female students, we add up the categories in that row. The left-handed students are this column, but we've already counted the first 13, so if we just add on the extra 12, we'll have all the students that are either female or left-handed. Notice that 58 plus 13 plus 12 gets you to 83 as well. The only category that isn't counted is the right-handed males, the only ones who don't fit the description of either female or left-handed. So again, multiple ways to get to the correct answer.

Here's another contingency table, or two-way table. A survey was given, and this table breaks down the people who responded by whether they received a speeding ticket in the last year and by whether their car was red: speeding ticket, no speeding ticket, red car, not red car. We're given the totals for each row and each column, but if we weren't, we could easily find them. We're asked to use this table to find the probability that a randomly selected person from this group has a red car or got a speeding ticket in the last year. That probability is the sum of the individual probabilities minus the overlap. Since these events are not mutually exclusive, since they can happen together, we need to consider the overlap and subtract it. We're selecting from a group of 665 total people, so that'll be the denominator for each piece. The total number of people in this group who had a red car was 150: 15 of them got a speeding ticket and 135 didn't. The total number of people who got a speeding ticket is 60, because 15 people with a red car got a speeding ticket and 45 people without a red car got a speeding ticket, for
a total of 60. Then we need to subtract the overlap, those with a red car and a speeding ticket, because we've added up this row and this column, which double counted the overlap of 15. So we need to subtract 15 out of 665. We could also do this problem by adding up those with a red car and then adding on only the extra ones that we need, the people who got a speeding ticket but didn't have a red car. Or we could add up the column of speeding tickets and then add on the people who had a red car without a speeding ticket. Either way, we get that this probability is 195 out of 665, and if we write that as a decimal, it's approximately 0.293. So close to 30 percent of the subjects in this survey either had a red car or got a speeding ticket in the last year.

If you pull a random card from a standard 52-card deck, what is the probability that it's not a heart? In other words, what's the probability that it's one of the other suits? The probability that something doesn't occur is always one minus the probability that it does occur. So we can either calculate the probability that it does occur and subtract that from one, or we could count the number of ways to draw a card that's not a heart and do it directly. I'll do it using the formula. There are 13 hearts and 52 total cards, so the probability we don't draw a heart is 1 minus 13 out of 52, which is 39 out of 52. That's exactly what we would find if we added up all the cards that aren't hearts: there are 13 spades, 13 clubs, and 13 diamonds, for 39 total cards that are not hearts. And of course this simplifies to three-fourths, or 0.75, which also makes sense: if one-fourth of the cards are hearts, three-fourths of the cards are not hearts. So the probability of not drawing a heart is 3 out of 4.

A multiple choice question has
five answers, and exactly one of them is correct. If you were to guess randomly, what's the probability of not getting the correct answer? This example is pretty simple, but it illustrates again how we calculate the probability that something doesn't occur. The probability that you don't get the correct answer is one minus the probability that you do. There are five total possibilities for what you could guess and one of them is correct, so 1 minus one-fifth is four-fifths: a four-fifths chance that you won't get the correct answer if you randomly guess. Of course, we can get to that answer directly: if there are five answers and one of them is correct, the other four are not correct, so when you randomly guess, you have four ways to choose an incorrect answer out of five total possibilities, and that probability is 4 out of 5. Again, there are multiple ways to get to the right answer, and any one of them is fine, but here we're illustrating the formula that lets us calculate the probability that something doesn't occur by taking one minus the probability that it does occur.

According to the FCC website, female students make up 57 percent of the fall 2014 student body. If one student is randomly selected from the student body, what is the probability that the student is not female? Again, we're finding the probability of something not occurring, and that's always one minus the probability that it does occur. The probability that our randomly selected student is female is equal to the proportion of students that are female, and 57 percent as a proportion is 0.57, so we get 1 minus 0.57. If we were dealing with percentages, we could write 100 percent minus 57 percent, which would be fine. Either way, we get 0.43, or 43 percent. So if 57 percent of students are female, the other 43 percent are not, and the chance that we draw a student that isn't female is 43 percent.

Determine whether these events are
independent for each of these descriptions. Remember, two events are independent if they have no effect on each other; in other words, one happening has no impact on the probability of the other one happening.

The first scenario says a fair coin is tossed two times, and the two events are the first toss being heads and the second toss being heads. These events are independent, because what happens the first time has no impact on what happens the second time. The coin doesn't remember what happened the previous time, so each flip is an independent result.

The second scenario says the two events are that it's raining in Frederick, Maryland, and that it's raining in the nearby town of Thurmont. These events are not independent, because if it's raining in Frederick, it's probably also raining nearby. Those two events have some impact on each other's probability.

The third scenario says you draw a red card from the deck, meaning either a heart or a diamond, and then you draw a second card without replacing the first. If I drew a red card the first time, then the deck looks different than it did before I drew that card. Now there are 51 total cards, and only 25 of them are red while 26 are not red, which means that the probabilities for the second draw have changed. So these events are not independent: because I didn't replace the card, the deck looks different and the probabilities have changed.

In the fourth one, though, if I draw a face card and then replace it and reshuffle the deck before drawing a second card, that makes the two events independent, because by replacing the card and reshuffling I've returned all the probabilities to where they were. The deck looks the same as it did before I drew the first card: there are still 52 total cards, and all the cards are back in the deck. So drawing without replacement, like in number three, the events are not independent, but drawing with replacement, the events are independent, like in
number four.

Suppose you flip a coin and then roll a six-sided die once. What is the probability that you get tails and an even number? When we're dealing with an "and" probability, if the two events are independent, the probability of one and the other occurring is the product of the individual probabilities. These events are independent, so we can do this; flipping the coin has no effect on what the die is going to do. The probability of flipping a coin and getting tails is 1 out of 2, and the probability of getting an even number is 3 out of 6: three even numbers and six total possibilities. Multiplying these together, it simplifies to one-fourth, or 0.25, or any equivalent fraction. We could calculate the probability of these two things happening together by multiplying their probabilities only because the events were independent.

In this example, we'll draw without replacement from the population of the US. Before, we said that when drawing without replacement, the events are not independent. However, when we're dealing with such a large population, drawing one person out doesn't change the probabilities for the remaining pool enough to really matter in the problem. The change is so tiny that we can treat it as negligible. So we're going to assume that these two trials are independent even though we're drawing without replacement, and that's what the sentence in the middle of the problem statement says. About 9 percent of people are left-handed. Suppose two people are selected at random from the US population. Because the sample size of two is very small relative to this large population, it's reasonable to assume that these two selections are independent; in other words, we won't change the problem in any significant way by assuming they're independent. What is the
probability that both are left-handed? In other words, what's the probability that the first person is left-handed and that the second person is left-handed? Since we can assume that they're independent, this probability is the product of the two individual ones. Since 9 percent of people are left-handed, the probability of drawing a left-handed person is 9 percent, or 0.09. So we multiply 0.09 by itself and get 0.0081, so there's a little less than a 1 percent chance that the two people we select randomly are both left-handed.

Assuming that the probability of having a boy is 0.5, or 50 percent, find the probability that a family with three children has three boys. The probability of having three boys is the probability of having a boy first and a boy second and a boy third. We're assuming these events are all independent; in other words, the first child has no effect on what the second child will be. So the probability of these events all occurring together is the product of the individual probabilities, and since the probability of having a boy each time is 0.5, we multiply 0.5 together three times and get 0.125. So there's about a 12.5 percent chance that this family will have three boys in a row. Another way to do this problem is to list out the whole sample space of what could happen with three children: boy-boy-boy, girl-girl-boy, and so on and so forth. What you'll find is that there are eight possibilities and one of them is boy-boy-boy, so the probability of that happening is 1 out of 8, and 1 out of 8 is exactly 0.125. Either way we do the problem, we get the same answer.

If you pull two cards out of the deck, and the implication is without replacement, what is the probability that both are spades? As we said before, drawing cards without replacement means that the events are not independent, because
once I've drawn the first card, that changes what the deck looks like for the second card. When the events are not independent, the probability of two things happening together is the probability of the first thing happening times the probability that the second thing happens given that the first one already occurred. If the first one didn't occur, it doesn't matter what happens the second time; the whole scenario has failed. But if the first card was a spade, then we can continue forward, calculate the probability that the second card is also a spade, and multiply those together. In this case, the probability the first card is a spade is 13 out of 52, because there are 13 spades and 52 total cards. If that first card was a spade, there are 51 cards left, and 12 of them are spades. So we multiply the probability that we draw a spade the first time, 13 out of 52, by the probability that we draw a spade the second time assuming the first card was a spade, 12 out of 51. Multiplying those two together, we get, in decimal form, a probability of about 0.0588, or about a 6 percent chance.

Here we're going to find the probability of several different events using a bag of M&Ms. We've got a breakdown of colors with the number of each color listed, and we're going to pull two M&Ms out of the bag without replacing the candy. So in each experiment, we pull one M&M out, hold it out, and then reach in and pull out a second one randomly, and then we reset the bag before the next experiment. For each part of this problem, we'll assume we're starting with the same distribution of colors given here.

First, we want to find the probability that we'll draw two red candies, meaning that we draw a red one and then on our second draw we also get a red one. This is a multiplication rule problem where the two events are not independent. If we replaced the candy and reshuffled the bag, then they would be
independent, because the probability on each draw would be the same. But because we hold that first one out before we grab the second one, the events are not independent, so we have to think carefully about what happens at each step. The probability of drawing a red one and then another red one means that we first find the probability of drawing a red one on the first pull, and then multiply that by the probability that, assuming we did get a red one the first time, we also get a red one the second time. In other words, we assume that the first half of the problem went the way we needed it to. If we drew a blue one the first time, for instance, the experiment would be over, because there would be no way to draw two red ones. So first we calculate the probability that the first one is red, then we assume that happened and, under that condition, find the probability that the second one is red. For the first one, there are 12 red candies, and if you add all the counts together you'll find that there's a total of 106 M&Ms in the bag to begin with, so the probability the first one is red is 12 out of 106. Now if we assume that happened, there are 105 candies left, because we're holding one out, and of those 105 only 11 are red. So the second probability is 11 out of 105, and it's the conditional probability, on the condition that the first one was also red. When we multiply those, we get 132 out of 11,130, which is about 0.0119.

For the next part, we want to find the probability of first drawing a blue candy and then a brown candy, in that order. The probability of drawing blue then brown is again the probability of first drawing a blue one and then, assuming that happened, the probability the second one is brown, given
that the first one was blue. Again, there are 106 to begin with, and 22 of them are blue. If the first one we drew is blue, there are 105 left, but all of the brown ones that were there at the beginning are still there, so there are still 24 browns. If we multiply 22 out of 106 by 24 out of 105, we get 528 out of 11,130, which is about 0.0474.

For the next part, we want the probability of not drawing two green candies. If you try to do this directly, you'll get lost very quickly, because the number of ways you could draw something other than two green candies quickly gets out of hand. You could draw a red then a yellow, or a brown then an orange, or two browns, or a yellow and a green, and there are just too many possibilities to keep track of. It's much easier to first find the probability of drawing two green ones and then remember that the probability of that not happening is just one minus that answer. That calculation looks just like the ones we've done previously, especially part a. So we'll have one minus the probability that you draw a green one first times the conditional probability that you draw a green one second. The probability you draw a green one first is 17 out of 106, and then, assuming that happened, there will be 105 total, of which 16 will be green. So this is 1 minus 272 out of 11,130, which works out to 10,858 out of 11,130, or about 97.6 percent.

For the last part, we're going to find the probability of drawing an orange and a yellow. This looks a lot like part b, where we drew a blue and a brown, but in part b we were given an order, specifically blue then brown. In this case, notice carefully that order is not mentioned; it's just the probability that one of them is orange and one of them is yellow. What's easy to miss, but what we need to keep track of, is that we could draw orange then yellow, or we could
draw yellow then orange, and notice the word "or" tells us we're going to add the two probabilities together. So for the probability of getting an orange and a yellow, we could do orange then yellow or yellow then orange, and we find these individual probabilities the same way we did the previous ones. For the first one, drawing an orange first, there are 13 out of 106, and then to draw a yellow there will be 105 total, and if we drew an orange the first time, all 18 yellows would still be in the bag. On the other hand, if we draw a yellow first, there would be 18 out of 106, times all the oranges, 13 out of 105. If you look carefully, those two products are actually the same, so we could have just calculated the probability of orange then yellow, for instance, and doubled it, and the answer would come out the same. There are a couple of ways to do this, but this way keeps everything easy to see, so we can keep track of all the possibilities and make sure we're accounting for them all. It turns out this total is 468 out of 11,130, which is about 4.2 percent.

We're again given the data about the 130 FCC students, broken down by gender and whether they're right-handed or left-handed. Now, though, we're going to calculate conditional probabilities from this table. The first part asks: what's the probability that a randomly chosen student from this group is female, given that the student is left-handed? When we're looking at a contingency table and dealing with conditional probabilities, we look for the word "given." The given tells us what category to restrict to. In other words, if we've drawn a student and we're given that the student is left-handed, we know that we're in the category of left-handed students; the given has restricted us to just that column. So we zoom in on that column, and that's all we have to worry about. So the probability that the student is female given that
the student is left-handed tells us that we're working with a pool of 25 students, all the left-handed students, and we're asking what proportion of those are female. Well, there are 13 students that are female and left-handed, and there are 25 left-handed students, so the probability that the randomly chosen student is female, given that the student is left-handed, is 13 out of 25.

Similarly, for part two, if we want to know the probability that a randomly chosen student is right-handed given that the student is male, we need to restrict ourselves to the category that we're given. We're given that the student is male, so we know that we're in this row, the male students. We're asked, if we're choosing just from this pool of 59 people, what proportion of those are right-handed? There are 47 right-handed students in this category and 59 total students, so the probability is 47 divided by 59.

There are 21 novels and 18 volumes of poetry on a reading list for a college English course. How many different ways can a student select one novel and one volume of poetry to read during the quarter? This example uses the fundamental counting principle, which says that if we're making two choices and we know how many ways there are of making each choice, we multiply those two together to get the number of ways of making the two choices in order. We're choosing two things: a novel and a volume of poetry. There are 21 ways to choose a novel and 18 ways to choose a volume of poetry, so the number of ways to select both together is the product of those two, or 378. Notice that if we flipped things around and made the choice of poetry first and the novel second, it would just be 18 times 21, which is still 378. So the order that we think of the choices in doesn't matter; it just matters that we consider that there are two separate choices that
we're making, and we have a certain number of ways of making each choice.

Assume you work at a pizza parlor and you're offering a special on large two-topping pizzas. Your toppings are broken into two categories, meat and veggies, and the toppings must be chosen with one from each category; in other words, customers have to choose a meat and a veggie. How many different two-topping pizzas can you make under this restriction? Again, we're making two choices in order: the meat to put on the pizza and the veggie to put on the pizza. If we multiply the two together, we'll have the total number of ways of making these two choices, and that's just the fundamental counting principle. There are four options for meat and three options for veggies, so multiplying the two together we get 12 total options for how you can make this two-topping pizza under this restriction.

An apartment complex offers apartments with options in four categories: the number of bedrooms, the number of bathrooms, which floor the apartment is on, and which view you get. We want to know how many apartment options are available. Essentially, we're making four choices, and the fundamental counting principle again says that if we can figure out how many ways there are to make each choice, the number of ways of making all these choices together is simply the product of those numbers. There are three ways to choose how many bedrooms to have, two ways to choose how many bathrooms to have, two floors to choose from, and three views to choose from. Multiplying all these together, we get 36 total options for what kind of apartment you could select.

You've been given the job of scheduling the movies for the FCC movie marathon. You have four choices for movies: an action movie, a comedy, a drama, and a horror movie. Luckily for you, you know the fundamental counting principle. How many ways are there to
order these four movies? We're going to watch all four movies, but we can pick the order we want to put them in. So we have four choices to make: which movie goes first, which goes second, which goes third, and which goes fourth. We start with the first one. How many movies do we have to pick from for the first slot? There are four movies, so we can put any one of those four in the first slot. Once we've done that, the first slot is taken, so for the second slot there are only three movies left to choose from. Similarly, for the third slot we've already placed the first and second ones, so there are only two movies left, and for the last slot there's only one movie left, because all the others have been placed. If we multiply these together, 4 times 3 times 2 times 1 is 24, so there are 24 possible ways that you could order these movies. And we know that 4 times 3 times 2 times 1 is called four factorial. Anytime we're ordering a certain number of things in the same number of slots, we can use a factorial to calculate that. Organizing four things in four slots is four factorial; organizing 10 things in 10 slots, there are 10 factorial ways of doing that.

You have 18 CDs, and you need to arrange eight of your favorites on the shelf near your stereo. How many ways can you arrange these CDs, assuming that their order makes a difference to you? There are two ways to do this problem, and I'll illustrate both. The first is to say: I have eight slots to fill, and I have 18 CDs to choose from. In the first slot, I can put any one of those 18. For the second slot, I only have 17 left, because one of them is in the first slot. Then I have 16 options for the next spot, 15 for the next, and then 14, 13, 12, and 11, and you can multiply all these numbers together to find out how many ways
there are to arrange eight of these 18 CDs in order. If you multiply these together, you'll get 1,764,322,560 possibilities.

The other way to do this is to use the formula. Since order matters in this case, we'll use the permutation formula, where n is 18, because that's the number we're choosing from, and r is 8, the number that we select from that pool and then organize. That would be 18 factorial divided by (18 minus 8) factorial, which is the (n minus r) factorial, or 18 factorial divided by 10 factorial. If you pull up the calculator and type in 18, then go to the math menu, scroll over to the probability menu and down to number four for the factorial, and then divide that by 10 factorial, again using the math probability menu, number four, we get the same answer, 1,764,322,560.

The other way to do this is to use the built-in function for permutations. Notice number two on the probability menu is nPr. First we have to enter n, the number that we're choosing from, which is 18, since we have 18 CDs to select from. Then we go to math, over to the probability menu, select nPr, and then put in r, the number that we actually select in order, which is 8. Press enter, and we get the exact same answer. So there are really two ways of doing this. You can think about the slots that we have to fill and how many options we have for each one, and get this descending product that goes from 18 down to 11 and then stops because we run out of slots. Or you can think of the problem as a permutation problem and use either the formula or the built-in function in the calculator. Any way you do it, you get the same answer as long as you do it correctly, and any one of those methods is fine.

The math club has 18 members. According to the bylaws, they need to have a president, a vice president, and a secretary, and we want to know how many different ways these positions can be filled. We have three positions to fill, and there are 18 people to choose from. That means there are 18 options for who can be president.
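
Going back to the CD problem for a moment: if you don't have the calculator handy, the same three approaches can be checked in a few lines of Python. This is only a sketch and not part of the original calculator walkthrough; the variable names are made up for illustration, and `math.perm` (available in Python 3.8 and later) is used here as a stand-in for the calculator's nPr function.

```python
import math

n, r = 18, 8  # 18 CDs to choose from, 8 slots on the shelf

# Method 1: fill the slots one at a time: 18 * 17 * ... * 11.
by_slots = 1
for options in range(n, n - r, -1):
    by_slots *= options

# Method 2: the permutation formula n! / (n - r)!, here 18! / 10!.
by_formula = math.factorial(n) // math.factorial(n - r)

# Method 3: a built-in permutation function, playing the role of nPr.
by_builtin = math.perm(n, r)

print(by_slots, by_formula, by_builtin)
# prints 1764322560 1764322560 1764322560
```

All three agree on 1,764,322,560 arrangements.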
Once there's a president, though, there are only 17 people left to choose from for the vice president, and then 16 people left to choose from for the secretary. We multiply these options together using the fundamental counting principle, and we get 4,896 possibilities for how we could fill these three positions. The other way to do this problem is to think of it as a permutation problem. Order matters when we select these people, because once we select them we then organize them into these positions. So we use the permutation formula, where n is 18, since we're choosing from 18 people, and r is 3, since we select three people and organize them into the positions. We get n factorial divided by (n minus r) factorial, so if we go to the calculator and type in 18 factorial divided by 15 factorial, we get the same answer we found the first way. And of course, if we use the built-in permutation function, we'll get the same thing as well: 18, go to the math probability menu, select nPr, then 3, because we're selecting three and organizing them, and we get the exact same answer. So again, there are multiple ways to do this problem, but the fundamental counting principle is probably the simplest to understand.

A group of four students is to be chosen from a 35-member class to represent the class on the student council. How many ways can this be done? There's no mention of order and no mention of position within this committee; we're just selecting four students, and we want to know how many ways we can select them. This is a combinations problem, so we're going to use the combinations formula. Remember, the combinations formula in general is n factorial over the product of (n minus r) factorial and r factorial. You can either keep this formula handy or use the built-in function in the calculator; I'll illustrate both for this example. In this case n, the number that we're choosing from, is 35, and r, the number we're choosing, is four, so
choose four students from this group of 35. So n factorial is 35 factorial, n minus r is 35 minus 4, or 31, and r is 4. Going to the calculator, we type in 35, find the math probability menu, and select number four for the factorial. Then we divide by the product of 31 factorial and 4 factorial: 31, math probability menu, factorial, times 4 factorial. We get 52,360 ways to select this group of four students. The other way, of course, is to use the built-in combinations function. Number three on the math probability menu is nCr, for combinations, so we type in n, which is 35, select nCr, and then type in whatever r is, in this case 4, and we get the exact same answer, 52,360. So there are 52,360 ways to choose four items from a pool of 35 items.
How many different ways can Harry Potter choose three players to be chasers from a choice of 10 players? Since there's no mention of order within the three players that are chosen, and it's just a question of which players we select, this is a combination problem. So we're looking for 10C3: n is 10, the number we're choosing from, and r is 3, since we're choosing three out of these 10. I'm just going to use the built-in function in the calculator, so I type in n, which is 10, then go to math, probability, down to nCr, then 3, and the answer is 120. So there are 120 total ways that these three people can be chosen from a pool of 10.
How many different ways can a director select four actors from a group of 20 actors to attend a workshop on performing in rock musicals? Again there's no mention of order, so all we're worried about is how many ways there are to pick four people: how many different combinations of four people are there out of a pool of 20? Again I'll just use the built-in function on the calculator, so I type in n, which is 20, go to the math probability menu, select number three, nCr, then 4, and there are 4,845 ways to select four people from a group of 20.
The Big Pen Company makes pens in four colors, blue, black, red, and green, with three
tip styles: extra fine, fine, or medium. What is the probability of picking one pen at random and having it be a black pen? Now we're going to illustrate doing probability using counting methods. A probability is the number of ways to get a black pen divided by the total number of possibilities. Well, how many different pens are there? If we're selecting a color and a tip style, there are four options for the color and three options for the tip, so there are 12 total options. That's the denominator, the number of possibilities. Three of those 12 are black: black extra fine, black fine, or black medium. So there are three ways to get a black pen and 12 total possibilities, and therefore the probability that our randomly selected pen is black is 3 out of 12. Now of course we could also say it's one fourth, because there are four colors and each of these colors is equally likely, but the reason we did it this way was to illustrate a simple example of how the fundamental counting principle can be used to count up the number of possibilities in order to do a probability problem.
In a certain state's lottery, 48 balls numbered 1 through 48 are placed in a machine and six of them are drawn at random, so we get six distinct numbers with no repetition. If the six numbers drawn match the numbers that a player had chosen, the player wins a million dollars. In this lottery the order that the numbers are drawn in doesn't matter; it just matters which numbers you get. We want to compute the probability that you win the million-dollar prize if you purchase one lottery ticket. The fact that the order doesn't matter is significant: it means that this is a combination problem. What we're looking for is the number of ways that you could draw six different numbers from 1 through 48, so the number of ways that we can select six out of 48 numbers, using combinations. If we use the calculator, 48 is n, so we enter 48, go to the
math probability menu, select nCr, then 6. There are 12,271,512 ways that you can select six numbers out of these 48 numbers. That's the number of total possibilities for what could come up when these six numbers are drawn. Only one of them is the winning combination, though, so the probability of winning is the number of ways to get the winning ticket, which is one, divided by the total number of possibilities for what ticket you could buy, and that's the 12,271,512. So it's a tiny probability. In decimal form it's zero point, then seven zeros, followed by an eight, or about 0.00000008. So there's a tiny chance of winning, given the number of possibilities that can occur.
If you select a four-digit PIN, what is the probability that there are no repeated digits? Let's start by counting the number of total possibilities for what PIN you could select. If there are four digits, essentially you're making four choices: what number goes into each digit. We have the options of one through nine, which is nine options, or a zero, so there are 10 options for each position. Multiplying them together, we get 10,000 total possibilities for what PIN you could select. Now the second part is a little trickier. To calculate the probability that there are no repeated digits, we need to count the number of ways that you could select a PIN without repeating digits. Once we count the number of ways of doing that, if we divide that by these 10,000 total possibilities, we'll have the probability. So, counting the number of ways to build a four-digit PIN without repeating any digits: it turns out that this is equivalent to choosing four digits out of the 10 (zero, one, two, three, four, five, six, seven, eight, or nine) without repetition and putting them in order. You can think of having a bag with the numbers zero through nine: you pick out four of them one at a time, not allowing any repetition, and you want to know how many ways you can do this. The order matters, because 1234 and 4321 are different PINs and they're counted separately in the 10,000, so that's why it's a permutation problem. The tricky part of this problem is recognizing that that is how to solve it: we're selecting four of the numbers zero through nine and arranging them. Once we see that, the calculation is pretty straightforward. We'll use the calculator again and the built-in permutation function: 10, math, probability, nPr, 4. There are 5,040 possibilities for how you could build a PIN without repeating digits, which is also 10 times 9 times 8 times 7 by the fundamental counting principle. So the probability that that happens randomly is the number of ways that can happen divided by the total number of possibilities: 5,040 divided by 10,000, which as a decimal is 0.504. So there's about a 50% chance that no digits repeat.
Assume that you're still in charge of planning the FCC movie marathon. Remember, we have four types of movies: action, comedy, drama, and horror. Assume somebody also brought a thriller, so now we have five types of movies total. To decide the order of the movies, you throw all the names in a hat and plan to draw one name out at a time, and the order will be determined by how they're pulled out of the hat. What we want to know is: what is the probability that the thriller is played fourth and the horror movie is played fifth? Again, anytime we calculate a probability, it's simply the number of ways that our event could happen divided by the total number of ways that the scenario could play out. So we need to count up the number of ways that precisely this could happen, the thriller played fourth and the horror movie played fifth, and divide that by the number of total possibilities for how these movies could be ordered. Let's start with the numerator: how many ways could the thriller be played fourth and the horror movie be played fifth? If we use the fundamental counting principle, there are five slots to be filled by the five movies that we're ordering. In the first slot we have three
options, because there are five options total, but if the thriller is going to be played fourth and the horror movie is going to be played fifth, then those two are not possibilities for the first slot. So there are three possibilities, the other three movies. Then there are two movies that aren't the thriller or the horror that can go in the second slot, and then one movie that can go in the third slot, and then the thriller goes in the fourth slot and the horror movie goes in the fifth slot. You can also fill in these slots by filling in the fourth and fifth slots first, with ones, and then recognizing that there are three movies left to go into the remaining three slots. In any case, 3 times 2 times 1 times 1 times 1 tells us that there are six ways that the thriller could be played fourth and the horror movie could be played fifth. The other way to do this is to use permutations, and the way to do that is to notice that if the thriller is fourth and the horror is fifth, the number of ways that could happen is just the number of ways that we could order the first three movies. There are three slots that we still have to fill and three movies to fill them, so it's 3P3, a permutation where n is 3 and r is 3, and of course that turns out to be exactly six as well. Then the denominator, the number of ways that we could order these movies in total without this restriction: again we can use the fundamental counting principle. There are five movies that can go in the first slot, four in the second slot, three in the third slot, and then two, and then one, and 5 times 4 times 3 times 2 times 1 is 120. So there are 120 total possibilities. That too we could do as a permutation problem, where we have five slots to fill and five movies to choose from, and we're organizing them, so n and r are both 5. Again, if we use the calculator for that we get 120. But it turns out that these permutation problems can all be done with the fundamental counting principle, so I've
shown that here. Then of course the probability that this scenario happens is the number of ways that the scenario happens divided by the total number of possibilities. So if we divide 6 by 120, that's the probability, and of course we can simplify that fraction down to 1 over 20 or write it as a decimal, which is 0.05. So there's a five percent chance that, if we randomly organize the movies, the thriller will end up in the fourth slot and the horror movie will end up in the fifth slot.
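
If you'd rather check the movie marathon answer on a computer than on the calculator, here is a minimal Python sketch of the same reasoning. It is not part of the original walkthrough: the function name marathon_probability and the movie labels are just illustrative choices, and math.perm and math.comb assume Python 3.8 or later. It lists every possible ordering of the five movies, counts the ones with the thriller fourth and the horror movie fifth, and compares that with the permutation formula. It also spot-checks a few of the calculator counts from the earlier examples.

```python
from fractions import Fraction
from itertools import permutations
from math import comb, factorial, perm


def marathon_probability():
    """Probability that the thriller is played 4th and the horror movie 5th."""
    movies = ["action", "comedy", "drama", "horror", "thriller"]
    orderings = list(permutations(movies))  # every possible draw order
    favorable = [
        order for order in orderings
        # positions are 0-indexed, so slot 4 is index 3 and slot 5 is index 4
        if order[3] == "thriller" and order[4] == "horror"
    ]
    return Fraction(len(favorable), len(orderings))


# Same probability from the counting formulas: 3P3 favorable over 5P5 total.
formula_probability = Fraction(perm(3, 3), perm(5, 5))

# Spot checks of counts quoted in the earlier examples (nPr and nCr).
counts = {
    "18P8 (8 of 18 CDs in order)": perm(18, 8),
    "18P3 (club officers)": perm(18, 3),
    "35C4 (student council)": comb(35, 4),
    "10C3 (chasers)": comb(10, 3),
    "20C4 (actors)": comb(20, 4),
    "48C6 (lottery tickets)": comb(48, 6),
}

# The factorial formula n! / (n - r)! agrees with the built-in perm function.
assert factorial(18) // factorial(10) == perm(18, 8)

print(marathon_probability(), float(marathon_probability()))
print(formula_probability)
for label, value in counts.items():
    print(label, value)
```

Using Fraction keeps the answer as an exact 6 out of 120 reduced to lowest terms, and the brute-force listing is only practical here because five movies give just 120 orderings.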