Okay, this video is on sampling methods, different ways we might gather samples for experiments or observational studies. We're going to talk about six different methods here. Voluntary response and convenience sampling: not so good, and we'll talk about why. Systematic sampling, cluster sampling, and stratified sampling: good, because they use an element of randomization when gathering the sample, and we'll talk a little bit more about that. And then simple random sampling, which is the best in terms of getting the most random sample. It's not always the best from a methodology standpoint, but it's the best in terms of randomization. Now, the definition of simple random sampling is difficult, but let's see if we can get our heads around it. Know that simple random sampling is sometimes abbreviated as SRS. You'll hear your teachers talk about taking an SRS, and your book will use that as well. Here's the definition: an SRS of size n, size 5 for example, consists of five individuals from the population chosen in such a way that every set of five individuals has an equal chance to be the sample actually selected. Let's see if we can wrap our heads around that. What that means is, if I want to randomly sample five people from the high school, then literally this is true: there's an equal chance of five tall people being selected as five short people, as five men, as five women, as five football players, as five theater participants. Any five you can imagine has the same chance of being selected. If your methodology allows for that, then it's a simple random sample, and we'll talk in another video about how we do that. Just another view of it here, simple random sampling. Let's say this is the population I'm selecting from.
My method is a simple random sample if these five people in the front row have exactly the same chance of being selected as these five people in the back row, and as any other five people that you see in the picture. If those groups of five all have an equal chance of being selected, then my methodology is a simple random sample. We'll talk about that methodology and how we do that in another video. Stratified sampling is a second method. It's a good method because it includes random selection. In stratified sampling, we subdivide a population into at least two different subgroups based on some shared characteristic. Then we draw a random sample from each subgroup, and those subgroups are called strata. Let's say I want to do a survey on park use in Eden Prairie. If I just did a simple random sample, it's possible my random sample might not include any children, or might include only seniors. If I'm interested in surveying on park use, well, that's not a very good survey, because I'd like to get a sense for how the entire population uses parks. A better method might actually be to break up the population into groups, and here we've sort of done it by age: children, youth or teens, 18 to 55-year-olds, and seniors. My method would be to break it up into groups, and within each group I would make a random selection. I might randomly choose these three children, these three youth, these three 18 to 55-year-olds, and these three seniors, so I've made a random selection within each group. I've used randomization to do that, that's good, we'll talk about why, and I've covered the population that way. I've randomly selected 12, based on the circles I put up there. It's not a simple random sample because, based on my methodology, there's absolutely no way I can select 12 seniors. And the definition of a simple random sample says any 12 you can imagine might be selected using your methodology. So that's stratified sampling.
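If it helps to see that contrast in code, here is a minimal Python sketch, not the method from the other video. The resident names, the group sizes, and the helper functions `simple_random_sample` and `stratified_sample` are all made up for illustration. The one thing it leans on is Python's standard `random.sample`, which draws without replacement so that every set of n is equally likely.

```python
import random
from collections import Counter


def simple_random_sample(population, n, rng):
    """Draw n individuals so that every set of n is equally likely."""
    return rng.sample(population, n)


def stratified_sample(population, stratum_of, per_stratum, rng):
    """Split the population into strata, then draw an SRS inside every stratum."""
    strata = {}
    for person in population:
        strata.setdefault(stratum_of(person), []).append(person)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, per_stratum))
    return sample


# Hypothetical population: 25 residents in each of four age groups.
age_groups = ["children", "youth", "18 to 55", "seniors"]
residents = [(f"resident_{i}", age_groups[i % 4]) for i in range(100)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# SRS of 12: any 12 residents could come out, including 12 seniors.
srs = simple_random_sample(residents, 12, rng)

# Stratified sample of 12: always exactly 3 from each age group.
stratified = stratified_sample(residents, lambda r: r[1], 3, rng)

print(Counter(group for _, group in srs))
print(Counter(group for _, group in stratified))
```

The stratified counts come out as three per group every time, which is exactly why it can't be a simple random sample: a sample of 12 seniors has no chance of being selected.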
We break the population up into groups that are alike on a characteristic, in this case age, and then within each group we randomly select. Stratified sampling is frequently confused with cluster sampling, so work for a moment to see if you can understand the difference. In cluster sampling, we divide the population into sections or groups. This is the state of New York, and you see the counties here. Then we randomly select the clusters. Let's say, for example, we randomly selected the five red counties. Then we choose all members from the selected clusters and use all of them as part of our survey. So notice the difference. In cluster sampling, we're randomly selecting the groups, or the clusters, and then we're surveying everybody in each cluster that we randomly select. In stratified sampling, we broke into groups and we used every group. We didn't randomly select groups there, and our random selection took place within the group. So stratified, we randomly select within each group. Cluster, we randomly select the groups. An example of cluster sampling at the high school might be to break the high school up into classrooms, randomly select three classrooms, the clusters, and then survey every student in those classrooms. That would be cluster sampling. Systematic sampling is an acceptable method. It includes randomization as well. We pick some random starting point and then select every kth element in the population: every third, every fifth, every fiftieth. So I might stand outside a grocery store, pick a random starting point, and then select every 30th person that walks in and ask them to answer a question. That way, at least there's randomization there, and I'm not picking out people I like or people I recognize or people my age. I just picked a random starting point and I'm picking every 30th person.
Or in a manufacturing sense, and these are dated photos, if I'm making iPads or iPhones, I might pick a random starting point and then pick every 10th or every 50th iPhone off the assembly line and test it for quality. That's systematic. It's acceptable because it includes randomization. Now, the next two methods are not so good. Voluntary response. For voluntary response, I want you to think about the phone call at home, "Excuse me, sir or madam, do you have a minute to answer a question?", or the survey that pops up on your computer and asks you for a response. The problem with voluntary response is that individuals choose for themselves whether they're going to be part of the sample or not. And that introduces something called bias, and I'll explain that here. Let's say you get a phone call at home on a controversial topic, say gun control. "Excuse me, sir or madam, do you have a moment to answer a few questions on gun control?" Well, we could make a pretty good argument that the people who choose to stay on the line for that are folks who have a strong opinion one way or the other and want to make sure that opinion gets counted. Those who are neutral or don't care so much would simply hang up or say, "No, I've got better things to do." Well, that introduces bias, in that the results are going to favor those with strong opinions and miss those in the middle. And that's a problem if we're really trying to get a sense for how a broad population thinks about the issue of gun control. We should really be working hard to make sure we have everybody's opinions, even those of people who don't have strong opinions. So, voluntary response doesn't include any element of randomization in its simplest sense. In a more sophisticated sense, those who do phone surveys will try to introduce randomization and correct for people who hang up. In a computer pop-up, that might be much more difficult to do. So, pure voluntary response, not great.
Why would we do it? Well, it can be cheap, it can be quick, it can be easy. Another example of cheap, quick, and easy is convenience sampling, where we just grab the folks that are easiest to reach. Picture the person with the clipboard at the mall who's just trying to grab people who walk by and say, "Excuse me, sir or madam, do you have a few moments to answer a question?" Now, I don't know about you, but when I see that person, I do everything I can to avoid eye contact. I might shield my eyes with my hand. I'll put my head down, I'll try to walk by them. I'm a busy, busy guy, or maybe I'm not. I just don't want to talk to these folks. Well, that's a problem, because they're not going to get the opinions of busy, busy guys like me or folks who just don't want to talk to them. Now, my 86-year-old mother-in-law, on the other hand, she'll talk to anybody about anything for as long as they'd like to talk. She's chatty. So that person with the clipboard hits her and she goes, "Yes, absolutely. Let's go have a cup of coffee. Let's go over to Caribou. Let's sit down. Let me show you the pictures of my grandkids. Have any more questions? I'd love to answer more." I'm making a joke here, obviously. But in convenience sampling, the problem is that, in my example, your results will be biased towards 86-year-old mothers-in-law, or more generally towards people who have an opinion or people who have the time. And again, they're not necessarily getting a sample that reflects the entire population of people that are shopping at the mall. That's bias. Results favor a certain outcome. They favor what 86-year-old mothers-in-law have to say. No element of randomization there. Well, let's practice this a little bit. You should be able to identify each of the six methods I've just described based on examples I give you, and have some sense for when and why they might be used.
So pause me here for a minute, read these five examples, and see if you can identify which of the six methods these are: convenience, voluntary response, systematic, stratified, cluster, or simple random sample. All right, well, let's see how you did. Let's put some answers up here. The first one: we obtained a sample by selecting three lawn mowers from each manufacturer. So we've used every manufacturer, and within each manufacturer we've selected three lawn mowers. That's stratified. We broke our population up into groups, manufacturers, and then within each group we made a random selection. The second one: a sample of products obtained by selecting every 100th item. That "every 100th", every kth, you should recognize as systematic. Third one: random numbers generated by a computer are used to select serial numbers of cars. That's a simple random sample. We'll talk about methods for doing that in another video, but it doesn't include any of the elements of the five other methodologies. Fourth one: an auto parts manufacturer obtains a sample of all stocked items from each of 12 randomly selected retail stores. That word "all" is a bit of a clue. So I've randomly selected retail stores from all the retail stores where I sell my parts. The retail stores are clusters, and then within each retail store we've sampled all stocked items. So that's cluster sampling. Last one: a carmaker conducts a marketing study involving test drives. We take a sample of 10 men and 10 women in each of four age brackets. So I have four age brackets, and I've broken them up by gender, so I have eight strata altogether. And then within each stratum, or subgroup, I've selected 10 people. So that's stratified sampling. I used all groups and randomly selected within each group. Here are five more. Quickly, pause me, read these, and see if you can identify the methodology. All right, let's see how you did. So here are the answers.
First one: a carmaker conducts a marketing study interviewing potential customers who happen to request test drives. They happen to be there. It's convenient, so that's convenience sampling. Is there an element of voluntary response there? Of course, because anybody could say yes or no to this. So if you called it voluntary response, we might have a hard time arguing with you. Motorola selects every 50th pager. There's the every kth again. That's systematic. A dean at Ohio University surveys all students from each of 12 randomly selected classes. So we randomly selected groups, the classes, the clusters, and then within each cluster or group we've surveyed all students. That's cluster sampling. Conducting research for a psych course, a student at Boston U interviews 40 students who are leaving the cafeteria. Again, it's convenient to grab them as they walk out of the cafeteria. Is there an element of voluntary response there? Of course, because they can say yes or no. But first and foremost, it's a convenience sample. Last one: an IRS auditor randomly selects 15 taxpayers with less than 25,000 in gross income and 15 taxpayers with gross income above 25,000. We've created two subgroups based on income and randomly selected within each group. That's stratified. When you randomly select within the group, that's stratified. Now, we've got six methods to choose from. Typically, the goal is to avoid something called bias, and a methodology is biased if it favors a certain outcome. So the convenience sampling in the mall is biased because it favors the outcome of talking to 86-year-old grandmothers. The voluntary response is biased because it favors the outcome of talking to those with strong opinions. So these are biased methods that we'd like to avoid. Sometimes they're used anyway because they're cheap, quick, and easy. Ideally, we'd like to have an element of random selection in what we do.
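To make "favors a certain outcome" concrete, here is a small Python simulation, offered as a rough sketch rather than a model of any real survey. The opinion scale, the mix of opinions in the population, and the response rates are all assumptions I've invented for illustration. The only idea carried over from above is that people with stronger opinions are more likely to respond.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population of 10,000 opinion scores:
# -2 strongly against, 0 neutral, 2 strongly for.
population = [rng.choice([-2, -1, 0, 0, 0, 1, 2]) for _ in range(10_000)]

# Assumed response rates: the stronger the opinion,
# the more likely the person is to stay on the line.
response_rate = {0: 0.05, 1: 0.30, 2: 0.90}

# Voluntary response: each person decides for themselves whether to answer.
voluntary = [x for x in population if rng.random() < response_rate[abs(x)]]

# Simple random sample: chance, not the respondent, decides who is included.
srs = rng.sample(population, 500)


def share_strong(sample):
    """Fraction of the sample holding a strong opinion, for or against."""
    return sum(abs(x) == 2 for x in sample) / len(sample)


print("population:", round(share_strong(population), 2))
print("voluntary: ", round(share_strong(voluntary), 2))
print("SRS:       ", round(share_strong(srs), 2))
```

Under these made-up rates, strong opinions are heavily overrepresented among the voluntary responders, while the simple random sample stays close to the population. That gap is the bias.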
So SRS, stratified sampling, systematic, and cluster are all significant steps up in quality from the biased methods, in that they reduce bias by using chance, or randomization, to select folks. By being random in your selection, you avoid getting only 86-year-old mothers-in-law or only those who have strong opinions. So know that the four methodologies that use randomization are better and preferred, but there are two others that might get used. You're responsible for knowing the difference between these six, when and why each might get used, and the pros and cons of each.
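For the other two randomized methods, cluster and systematic, here is a minimal Python sketch along the lines of the classroom and grocery store examples. The classroom names, the student labels, the numbered shoppers, and the helpers `cluster_sample` and `systematic_sample` are all hypothetical, just one way you might write it.

```python
import random


def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters, then keep every member of each one."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members


def systematic_sample(population, k, rng):
    """Pick a random start among the first k items, then take every kth item."""
    start = rng.randrange(k)
    return population[start::k]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical high school: 10 classrooms with 20 students each.
classrooms = {
    f"room_{r}": [f"student_{r}_{s}" for s in range(20)] for r in range(10)
}

# Cluster: the randomness is in WHICH classrooms; everyone inside is surveyed.
chosen_rooms, surveyed = cluster_sample(classrooms, 3, rng)

# Systematic: 300 shoppers in arrival order, every 30th after a random start.
shoppers = list(range(1, 301))
every_30th = systematic_sample(shoppers, 30, rng)

print(chosen_rooms, len(surveyed))
print(every_30th)
```

Notice where the random call sits in each function. In the cluster version it picks the groups, and in the systematic version it picks only the starting point.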