 Greetings, this is Monica Wahee, lecturer at Library College, bringing you your lecture on section 1.2 on the topic of sampling. So here are your learning objectives for this particular lecture. At the end of this lecture, the student should be able to define sampling frame and sampling error. The student should be also able to give one example of how to do simple random sampling and one example of how to do systematic sampling. The student should be able to explain one reason to choose stratified sampling over other approaches, state two differences between cluster sampling and convenience sampling, and give an example of a national survey that uses multi-stage sampling. So let's jump right into it here. So we're going to go over in this lecture sampling definitions and then those different types of sampling I mentioned in the learning objectives, simple random sampling, stratified sampling, systematic sampling, and then convenience and multi-stage sampling. So let's start with some sampling definitions. What is a sample? Okay, so we're going to revisit that concept from the previous lecture. We're also going to talk about sampling frames and what errors mean and errors of sampling frames. And then we're also going to just go right back over that and make sure you understand before we go on and talk about the different types of sampling. So we take a sample of a population because we want to do inferential statistics. Remember that we want to infer from the sample to the population. And it's just not necessary to measure the whole population, it would be impractical, and it's cost a lot. And actually, what you'll find is if you ever do an experiment when where you actually do measure the whole population, you'll find that if you get, you know, a pretty good proportion of the population, you just take that, that's all you really needed to talk to. 
So ultimately, we save resources, especially in health care, when we do a good job of sampling and use that to infer to the population rather than having to take a census of the whole population all the time. So that brings us to the concept of sampling frame. So the sampling frame is the list of individuals from which a sample is actually selected. And the list may be this physical concrete list, like you can have a list of students enrolled at a nursing college, or in my other lecture I gave an example of a list of nurses who work at Massachusetts General Hospital. That could be your list, you go to human resources and get that. Or it could be a theoretical list. It could be like the list of patients who present to the emergency department today. Obviously, when you go into work, at the beginning of your shift, you're not going to know who's on that list yet, but it could be a theoretical list. But whatever that list is, that is your sampling frame. So that those are the people who actually could be selected for your study. So the sampling frame is the part of the population from which you want to draw the sample. And you want to work at such that everybody from your sampling frame has a chance of being selected for your sample. In other words, you don't want to leave anyone that should be in your sampling frame out in the cold. That leads us to the concept of under coverage. So what is it, it's omitting population members from the sampling frame, they're supposed to be on the list, but they're not there. So how can this happen. Well, let's say you did what I was suggesting in the previous slide, you got a list of nursing students, you know, from a college. Let's say somebody signed up that day or somebody was just admitted that day, maybe they didn't make it into the database in time and you're missing them. Or even like that HR list I talked about at MGH. 
Well, you know, I know how nurses are. Sometimes they'll temp in different places, and maybe they're not on the payroll, maybe they're through a temp agency. And so then we would miss those nurses from the sampling frame. And then, you know, people who present at the emergency department at night might be different than those in the day. And so if you're really trying to sample from people who present to the emergency department, you can't just look at some small period of time, you'd have to look at the whole 24-hour cycle. So if you omit population members from your sampling frame, they don't even get a chance to be in your sample. And that's called under coverage. Now I'm going to shift around, we're jumping around with a few different definitions. And we're going to talk about errors. Now this is something that took me a while to get used to in statistics: there are actually two kinds of errors in statistics. The first kind I call, and this is my own terminology, a fact-of-life error. It's just an error that happens when you do statistics. It's not bad or good, it's just what happens. And in this case, I'm going to describe one of those. It's called sampling error. So sampling error just simply says the population mean will be different from your sample mean, and the population percentage will be different from your sample percentage. So what does that mean? That means that if I cut corners, like I said I could, and just take a sample to infer to the population, the sample won't match the population exactly. If I actually do one of those experiments I was telling you about, where I have the population data and I just take a sample and compare the means, they will be different. Okay, I mean, there might be this huge coincidence where they're the same, but they're typically different. Same if you do percentages. And we just know this is going to happen in statistics. We account for it, we have ways of dealing with it.
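The experiment described above, where you have the whole population's data and compare its mean to a sample's mean, can be sketched in a few lines of Python. This is only an illustration: the population of 10,000 blood pressure readings is made up, and the function name `sampling_error` is an assumption for this sketch, not something from the lecture.

```python
import random
import statistics

def sampling_error(population, n, rng):
    """Draw a random sample of size n and compare its mean to the population mean."""
    sample = rng.sample(population, n)  # sampling without replacement
    population_mean = statistics.mean(population)
    sample_mean = statistics.mean(sample)
    return population_mean, sample_mean, sample_mean - population_mean

rng = random.Random(2024)  # fixed seed so the run is repeatable
# Hypothetical "census": 10,000 made-up systolic blood pressure readings.
population = [rng.gauss(120, 15) for _ in range(10_000)]

for trial in range(3):
    population_mean, sample_mean, error = sampling_error(population, n=50, rng=rng)
    print(f"population mean {population_mean:.2f}, "
          f"sample mean {sample_mean:.2f}, sampling error {error:+.2f}")
```

Each trial draws a different sample, so the sample mean lands a little above or below the population mean each time. That gap is the sampling error, and it shows up even though nothing was done wrong.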
The point is that there's always going to be sampling error. Whenever you take a sample from a population and try to make a mean or percentage in the sample, it's just not going to be exactly what's in the population. It's fine. But then there are other errors in statistics which are actually bad, and they mean you made a mistake. They're literally mistakes. And so as you go through learning about statistics, it's almost like you have to sit down and ask somebody, is this one of those fact-of-life errors, or is this one of those errors you want to avoid? Well, we just talked about sampling error. That's just a fact-of-life error. But the error you want to avoid is non-sampling error. That's basically what you get from using a bad list. I had an example in my life where I wanted to study a whole bunch of providers, right? And my friend gave me this list of providers and said, this is the entire list of all the providers in this particular professional society. But when I sent the email to that list, I found there were not only duplicates on this list, but a lot of people emailed me back and said, why are you sending this to me? I'm not a provider. I'm not part of this professional society. And also some people who were in that professional society, who had heard about the survey, emailed me and said, why didn't I get the survey? So this was a bad list. Some people had been left out of the sampling frame. So people who were in the society somehow weren't on my email list. And that's a problem, right? This was actually a mistake. I mean, you have to pay careful attention that everyone in the population who is supposed to be represented in your sampling frame is actually there. So I should have really done a better job of calling the professional society and making sure that this list was a good list. So sampling error is caused by the fact that regardless of what you do, your sample will not perfectly represent the population.
Whereas non sampling error. Yeah, I was sloppy. It was poor sample design, sloppy data collection, inaccurate measurement instruments, you can have bias and data collection other problems introduced by the researcher. So this is your fault if there's non sampling error. But sampling error, that's just the fact of life. Little whiplash here, we're going to now move on to the concept of simulations. So a simulation is defined technically as a numerical facsimile, or representation of a real world phenomenon. So it's like working through a pretend situation to see how it would come out in the case it was real. And this, you know, when you study statistics, you end up doing a lot of simulations. And remember how I've been talking about an experiment you could do if you somehow did a census and had a whole bunch of data on a population. You could do an experiment where you just took a sample from that population and looked at their mean to see the sampling error. That's an example of a simulation. So to just conclude this little section, it's really important to do your best to avoid non sampling error. And this is achieved by making sure you do not have under coverage when sampling from your sampling frame. So this puts together some of our vocabulary. But just remember sampling error is a fact of life. Okay, now we're going to specifically talk about different types of sampling. And we're going to start with simple random sampling. Okay, so first, we're going to start with just explaining what is meant by simple random sampling. Then we're going to talk about two different methods of doing simple random sampling. They work the same way they achieve the same thing. It's just that depending on how you're doing your research, one might be more convenient for you than the other. Finally, we will go over the limits of simple random sampling. Because all these sampling methods seem perfect. But then you got to take a look at their limitations. 
So let's first of all, define simple random sampling. So here's a definition, a simple random sample of n measurements from a population is a subset of the population selected in such a manner that every sample of size n from the population has an equal chance of being selected. Well, it's kind of complicated. But what it means is, is that if you use the proper approach for simple random sampling, whatever sample you get, you could have had just as easy a chance of getting another batch, another group of people from that sample. In other words, like let's say you have a list of the population of students in the class, so I'm going to define a class as a population. And you want to take a sample of five students from this bigger class. If you take a simple random sample, it means that all the different groups of five students you could pick from the list has an equal chance of being the sample group you actually pick. Now, you can just imagine that if you race into the class right at the beginning, and you take your sample of five and not everybody's in the class. What does that sound like right a sampling frame problem, maybe an under coverage problem, maybe biases creeping in there right. And so you just got to be careful if you're going to do simple random sampling that you start with a list with everybody in your sample frame because every single sample that you could possibly take should have equal chance of ending up being your sample. And I'll kind of explain it by explaining the two different methods that can be used of obtaining that sample. So one of the best things that you can do is just start with a really good list of all the people in your population. So maybe, you know, if I was going to study, I used to work at the army. So let's say I was going to study all the people who were active duty in the US Army, I would like to get a list of all of those people from an accurate place at the army. And I would like to have them have a unique ID. 
Okay, and that's true in the army, everybody in the army has a unique numerical ID. So what I would do, like in here, if you were looking at students, you take maybe take a student ID. So then you take the IDs from everybody on the list. And you cut them up, like you print them out, and you cut them up, and you put them in a hat, right, or a bag where you can't see in it. And then you mix them all up where you can't see it. And you draw five of them out or like in the picture, you know, what they did was mix up all those papers and now they're not looking. And they're drawing a few out. Okay. So what did you just do you just made sure First of all, that everybody in the population had an ID number, and that when you printed it out and cut it up all you didn't lose any of them if you drop them on the floor or something that's not simple random sampling, you got to make sure you keep all of them. And that you put them all in the hat and that you didn't look and you drew out five or whatever, because then any five of those slips of paper could have been drawn and therefore you're meeting with simple random sampling. Okay, that method will work right. Another method that works that might work better if you can't do this ID thing where you cut a paper is where you simply just make your own list of unique random numbers, right, you just make your own list. And then you assign those to the population. A great example is if you're, you know, kind of teaching kids and you want to put them in a random order, maybe you're going to do a game or something like that. Well, all you do is you you get, like, let's say you have 10 kids you number one to 10 you put it in a hat, and then you pull out the first number, let's say it's five you give it to the first kid, right. And then you just keep pulling out numbers and giving them to the kids and then tell them to stand in order, right. 
So you generate a list of random numbers as long as the list of the population so I said what if you have 10 kids well if you have, you know, 500 names then you get 500 numbers. 500 numbers, and they don't have to be one through 500, they just have to be unique. Okay, I like smaller numbers. So I'd say keep them small, but you can do what you want. And then, in any case, you randomly assign these numbers, you can use the hat, big on hats to this population. And then, you know, you ask them to stand in order or somehow you figure out it's kind of like a raffle you call out who's got number one, you know, and whoever says yes, you're like, you're lucky you get to be in my study, you know, so you can take the first five numbers in the order, right. And that's that'll achieve the same thing as the last method, you'll get a simple random sample, it's just two different ways of doing it. So ultimately, being in the simple random sample means that the sample has an equal chance of being selected out of the hat that this group of people, or group of whatever has an equal chance of being selected. And you'll see this picture on the left here is bingo, some of you may play bingo, you know, they pull balls out of there and they call off the names of the balls will each ball has a unique actually a letter and a number unique ID on there. And that's how they make them random. That's they take a simple random sample of these bingo balls each time that they do a bingo game. So I described to you the first method of doing that using an old fashioned hat. The second method, you know, where you generate your own numbers and you just make sure they're unique and then you assign them to things and put them in order. Well, that's my electronic hat. That's how I handle it. If I have for, for example, somebody sends me an excel sheet with a list of hospitals on it. I'll just assign each hospital random number and sort them in order. And I'll sample the top few hospitals. 
That'll be how I get a simple random sample of hospitals. That way I'm not biased, picking out my favorite hospitals where all my friends work, right? If I do it that way, the first method or the second method, all members of the population have an equal probability of being selected into the sample and, more importantly, all possible samples, all possible groups, have an equal chance of being selected. Of course, I only did it once, so I only got one of them. But the other ones that weren't selected had an equal chance of being selected. All right, you probably saw that the limit is this whole list. Even if I'm sampling hospitals, right, I still need a list of hospitals to sample from. And you may not know who's going to show up in the emergency department that day. If you do, well, you're psychic, because most people are not. So how would you sample from them using simple random sampling? So simple random sampling is okay when you've got a list, like hospitals, but it's not so good when you don't know who's going to show up that day. And even if you do simple random sampling, you need a good list. I made a mistake once where I did a survey with a bunch of professionals using a professional society list. And when I sent out the survey, I learned that there were people on the list who were no longer part of the society, that it was an old list. And more importantly, there were people who had joined the society that had not made it onto that list. So I was getting under coverage. So like if you were doing a study with students, you know, what if they just left off the part-time students? Then you'd be missing them. So this is a great example of non-sampling error. And so if you're going to do simple random sampling, you do need a list, and you really want to research it and make sure it's the best list possible. So I just went over the characteristics of simple random sampling and two different methods you can use to sample from a list. And I also mentioned the limits of it.
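Both methods of simple random sampling described above can be sketched in Python. This is a minimal sketch under stated assumptions: the student IDs and hospital names are made up, and the function names `draw_from_hat` and `sort_by_random_number` are hypothetical labels for the "old fashioned hat" and the "electronic hat".

```python
import random

def draw_from_hat(ids, k, rng):
    """Method 1: put every ID in the hat and draw k slips without looking."""
    return rng.sample(ids, k)

def sort_by_random_number(items, k, rng):
    """Method 2: give every item a unique random number, sort, keep the first k."""
    numbers = rng.sample(range(1, len(items) + 1), len(items))  # unique, no repeats
    ranked = sorted(zip(numbers, items))  # order items by their assigned number
    return [item for _, item in ranked[:k]]

rng = random.Random(7)  # fixed seed so the run is repeatable
student_ids = [f"S{n:03d}" for n in range(1, 31)]    # hypothetical class of 30
hospitals = [f"Hospital {c}" for c in "ABCDEFGHIJ"]  # hypothetical list of 10

print(draw_from_hat(student_ids, 5, rng))
print(sort_by_random_number(hospitals, 3, rng))
```

Either way, every possible group of the requested size has the same chance of coming out. Both functions still start from a complete list, which is the limitation already mentioned.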
Now we'll talk about a different kind of sampling, stratified sampling. So we're going to go over what it is. And then, just like simple random sampling had all these steps to it, there are different steps in stratified sampling. And I'll give you some examples. And then of course, just like simple random sampling, stratified sampling has limitations. And I'll talk about those. So I first wanted to just remind you what the word stratified means, or what strata are. The singular is stratum, and more than one is strata. Now you see that rock on the slide, you see that big horizontal line across it? That's a stratum. Those are strata, right, those are strata of rock. If you study geology, the geologists will explain that where those breaks are, it means something happened, often in the weather or the environment or whatever. But the reason why I put this picture up there is I want you to sort of imagine those layers. Because what we do in stratified sampling is first we divide our list, and of course you need a list, we divide our list into layers. Okay, so remember how I was just talking about simple random sampling, like what if I sampled from hospitals? Well, I could take this hospital list and divide it into layers by, for example, how close they are to the city. I could say urban, suburban, and rural. I could first put them into those strata. Okay. And if I was doing that, I'd be doing stratified sampling. Same with students. I could put them in, you know, first-year nursing students, second-year students, and I'd have them divided into strata first. So why would you do that? Why not just do simple random sampling? Well, if you think about it, let's say that you've got a class like statistics, and maybe, you know, there are not that many first-year students in it. So let's say a very small proportion is that way.
If you do simple random sampling, you might just by luck miss all of them. Right. And so if you're really concerned about what a minority thinks, then you can make sure to get representation from that stratum by doing stratified sampling. Because the first thing you do is you put that list into groups. And then you take a simple random sample from each of the strata. So here are the steps. So step one, divide the entire population, the whole list you have, into distinct subgroups called strata. And remember, each individual has to fit into one of those categories. So if you have somebody who's sort of halfway between first year and second year, or you've got a hospital that's kind of on the border, you've got to choose, you've got to put it in one of those groups. Step two. Well, it's not really step two, but you've got to think about the strata, like what are they based on? They've got to be based on one specific characteristic such as age, income, or education level. You know, a great example is you could take people of all different incomes, right, that's a quantitative variable. But you can put them in strata by, you know, less than a certain amount, and then this amount to that amount, and that to the next. You can make, you know, four or five strata. And then you just want to make sure that all members of each stratum share the same characteristic. And then you could do step four, which is draw a simple random sample from each stratum. So like in the case I was describing, where maybe you have a class with very few first-year students, if you take a random sample of five from each stratum, then you're kind of getting almost like extra votes from a small minority, right? Like you're kind of treating them equally, even though there's a way bigger group of the other people you're taking exactly five from. But that's the risk you take, because you want to make sure you hear from that small group.
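The steps above (split the list into strata on one characteristic, then draw a simple random sample from each stratum) can be sketched in Python. This is a sketch under stated assumptions: the class of 60 students with only 4 first-years is invented, and `stratified_sample` and its parameters are hypothetical names.

```python
import random
from collections import defaultdict

def stratified_sample(records, stratum_of, per_stratum, rng):
    """Group records into strata, then draw a simple random sample from each one."""
    strata = defaultdict(list)
    for record in records:
        strata[stratum_of(record)].append(record)  # each record lands in exactly one stratum
    return {
        name: rng.sample(members, min(per_stratum, len(members)))
        for name, members in strata.items()
    }

rng = random.Random(11)  # fixed seed so the run is repeatable
# Hypothetical class: 4 first-year students and 56 second-year students.
students = ([(f"S{n:02d}", "first year") for n in range(1, 5)]
            + [(f"S{n:02d}", "second year") for n in range(5, 61)])

sample = stratified_sample(students, stratum_of=lambda s: s[1], per_stratum=3, rng=rng)
for year, chosen in sample.items():
    print(year, [student_id for student_id, _ in chosen])
```

Taking three from each stratum guarantees the small first-year group is heard from. It also gives those four students far more weight per person than the 56 second-years, which is the extra votes trade-off described above.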
If you just do simple random sampling, that group is so small you might just accidentally miss it. So here are some examples of stratified sampling. You'll see this in the youth behavioral risk factor surveillance surveys that they do in high schools, where they'll stratify by grade, right? Because if they did a simple random sample, you know, a lot of students drop out junior and senior year, so they'd probably get too many freshmen and sophomores. And so they're going to want to look at getting a certain amount of freshman classes, a certain amount of sophomore classes, a certain amount of junior classes, and a certain amount of senior classes, so they can have enough of each to make good estimates, right? And in hospitals, they often sample providers from each department, right? Like they don't just do a simple random sample of providers if they're asking about provider satisfaction, or whether you know about a policy. They won't just do that because they might, for example, miss everybody in the ICU. Or if you're studying ICUs and you have multiple ICUs there, then you would want to maybe stratify by ICU, just to make sure, even if one of them is smaller, that you have a good, solid representation from each ICU. So those are the reasons that push you to do stratified sampling. It's not always necessary. But when you have these situations where you have these distinct groups, especially with a little one involved, and you want to hear from everybody, you really want to consider stratified sampling. So of course, there are limitations. And I've been sort of leading up to this. What you end up doing is oversampling one of the groups, usually the smallest group. If you take the same number of people from that stratum as you take from the big stratum, it's like the smallest group is having all these powerful votes.
And the biggest group is weaker. You know, they're made equal when they're not technically equal in the population. But that's the way it goes, right? And in higher level statistics, there are ways to adjust back for that, to sort of take a penalty for that and go back and say, well, we can extrapolate this back to the real population proportions. It's possible, but it takes some post-processing, is just the issue. And also, like simple random sampling, it's not really possible to do without a list beforehand. And it's also hard to do because you actually have to split the list into groups, into these strata. So let's say I had these hospitals and I didn't know where they were. I didn't know exactly if they were urban or rural or suburban. Well, that adds another level of complexity to this whole stratified sampling. So in summary, I just went over what stratified means. It means, you know, putting things in groups and then sampling from each group. And I described the steps involved. And stratified sampling goes a lot more easily if the strata happen to be equal in size to begin with. You know, I gave the example of the high schools. Usually there's maybe slightly fewer people in junior and senior year, but it's kind of close. And it's always nice, like if you're comparing ICUs, for example, if the ICUs are roughly the same size, because then you don't have to worry about this whole thing where one of them is smaller but it's getting an equal vote. Alrighty, now we are going to move on to talk about systematic sampling. Okay, well, systematic sampling actually can be done with or without a list. So it's a little more flexible than the kinds of sampling we've been talking about. Systematic sampling, it's easier for me to define it by describing the steps you go through to do it. So I'm just going to explain how to do it. And then you'll understand it. In fact, you'll understand why it's called systematic.
So whether you have a list or not, what you have to do for step one is arrange all the individuals of the population in a particular order. Now, if it's a list, you just put it in whatever order you want. But if we're talking about, for example, patients coming into the ER, well, they come in in the order that they want to. So they already are arranged in a list, right, you just don't know what that list is. Okay, then step two is pick a random individual as a start. So let's say I had a list of hospitals, and let's say it was just sorted by state, right? Let's say I picked a random individual, maybe I went down seven on the list and I picked that hospital. Or maybe you're at the ER, you start your shift, and the seventh patient who is admitted to the ER, you pick that person. I just picked seven, I mean, you could have picked five, you could have picked 20, you know, you just pick a random person. Then the next step, step three, is take every kth member of the population into the sample. Now don't try this in Scrabble, kth is not a word in Scrabble. Okay, it's just a word in statistics-ese. And what kth means, spelled k-t-h, is every so many. So let's pick a number and fill it in for k. So let's pick the number three. So let's say after you picked your first hospital from the list or the first patient from the ER, and it doesn't matter what number you chose for that, then you take every third after that. So every third patient that comes in after that, you ask them if they want to be in a study. Or every third hospital after that original random one, I pick and I say, okay, this is going to be part of my systematic sample. So as you see, it's pretty simple to do. It's easy to do if you have a list. It's easy to do if you don't have a list. The deal is just that you have to pick k. Well, first you pick a random place to start, then you pick k, and then you just keep going every so many.
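The three steps above can be sketched as a small Python generator. It follows the lecture's example of starting at the seventh individual and then taking every third one after that. The name `systematic_sample` and the numbered arrivals are assumptions for illustration. Because it reads individuals one at a time, the same function works on a list or on a stream of arrivals with no list in hand.

```python
import random

def systematic_sample(arrivals, start, k):
    """Yield the individual at 1-based position `start`, then every k-th one after it."""
    for position, individual in enumerate(arrivals, start=1):
        if position >= start and (position - start) % k == 0:
            yield individual

# Hypothetical shift: 20 patients, numbered in the order they arrive.
patients = range(1, 21)
print(list(systematic_sample(patients, start=7, k=3)))  # → [7, 10, 13, 16, 19]

# The starting point is meant to be random; one way to pick it:
rng = random.Random(5)
random_start = rng.randint(1, 10)
print(random_start, list(systematic_sample(patients, start=random_start, k=3)))
```

Many textbooks restrict the random start to the first k positions so the sample spans the whole list. The sketch allows any start, matching the description above.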
So you could do this with classes, you could take out a list of classes available at your college next semester, you should pick a random number like three, you know, and it's sorted some way so you go to the third class and you circle that, then you pick another random number like five and then after that you pick every fifth class. So after the third one you go, four, five, six, seven, eight, and then nine, 10, 11, 12, 13, and you keep picking classes. Okay, this is not career advice. Okay, do not pick your classes that way. This was just an example. All right. So as you probably guessed, I'm going to be negative Nellie again, there are problems with systematic sampling. If already things are set up boy, girl, boy, girl, for example, if you pick like an even number, you're going to get all boys or all girls, right. And I noticed this actually when I was doing a study in the lab. We wanted to study like whenever they put the assay through the machines, we thought some of the assays weren't running right. And so we wanted to take a sample. And I wanted to take a systematic sample. But I wanted to take a systematic sample like every seven days. And that's a week. And so I asked my colleague, does the lab vary day by day in what assays it runs, because if it always runs the sexually transmitted disease assays it saves them up and runs them all on Friday. And I'm sampling from every Friday, that's all I'm going to get. Right. That's actually called periodicity. You don't have to remember that I don't think I've ever even seen that written. It's just I remember my lecturer in my class telling us that that's what you have to worry about with systematic sampling. It's not real common problem, though. But what's awesome about it is you can do it in a clinical setting. So you can sample patients that way coming into a clinic or coming into a central lab or like an emergency room. 
And that's why this is a particularly powerful way to sample. If you have an ongoing sort of patient influx, when you design your research, you could simply say, once you decide how many people you need to recruit for your sample, that you would use systematic sampling and just have somebody in the clinic inviting every kth patient who qualifies into your study. So systematic sampling is easy to do with or without a list. You just pick a random starting point, and then you pick every kth individual. Next, we're going to move on to cluster sampling. So what is up with cluster sampling? Why do we even need other kinds of sampling? I just went over so many kinds. I mean, you could use stratified, systematic, or simple random sampling. Why would you even need another kind? Well, cluster is very special. It's special because it's the kind of sampling you use when you think there's a problem at a particular geographic location. Typically, that's how cluster sampling is used. And I'll explain it further. Imagine, for example, there's a particular factory that is believed to emit fumes that cause problems with people's health. Well, you can't do simple random sampling all over the nation, right, or you won't even get people by that factory. You can't really easily do stratified or systematic sampling there. Cluster sampling is what's designed for when you want to study something that's coming from a geographic location. So when you do cluster sampling, you start by dividing a map into geographic areas. So I'm from Minnesota, and I know that there was a mine there with vermiculite in it. And it was contaminated. A lot of people got sick from it. But they didn't know that's what was going on. So they first, I think, divided Minnesota into different geographic areas.
After dividing the area into these different geographic areas, some with the bad thing in it and some without the bad thing in it, you randomly pick clusters, or areas, from the map. So if you'll see there on the screen, there's a map of the state of Virginia. And it's all been divided into different groups. And then this cluster is highlighted. You'd usually probably pick more than one cluster. Sometimes it's only four or five. But the idea is you try to enroll all of the individuals in the cluster. It's usually people, although you can do it with animals. If there's a disease going around among animals, you know, you divide the area up into clusters, and then you try to measure all the animals in the cluster. So as you can imagine, not only is this sort of practically difficult, but there are reasons why people live together, right? People live in communities. I mean, people don't just randomly scatter themselves. You know, cultural communities grow. Communities grow around art. Affluent communities have different people in them than communities that have less money. So sometimes the people located in a cluster are all similar in a way that makes the problem hard to study. And this is especially true if you're studying some geographic thing, like maybe a factory or a sewage plant, that you think might be causing cancer. If you're in an area where there's a lot of pollution anyway, from other things, and a lot of low-income people live there, because if you're high income, you can afford not to, well, they're already being exposed to higher rates of carcinogens and probably have a higher cancer rate. It's hard to tell what the independent effect might be of that thing in that geographic location because of the other similarities of the people around.
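The cluster sampling procedure described above (divide the map into areas, randomly pick a few areas, then enroll everyone in the picked areas) can be sketched in Python. This is only a sketch under stated assumptions: the eight areas and their residents are invented, and `cluster_sample` is a hypothetical name.

```python
import random

def cluster_sample(people_by_area, n_clusters, rng):
    """Randomly pick whole areas, then enroll every individual in the picked areas."""
    chosen_areas = rng.sample(sorted(people_by_area), n_clusters)
    enrolled = [person for area in chosen_areas for person in people_by_area[area]]
    return chosen_areas, enrolled

rng = random.Random(3)  # fixed seed so the run is repeatable
# Hypothetical map: 8 areas with 4 residents each.
people_by_area = {
    f"Area {i}": [f"A{i}-P{j}" for j in range(1, 5)] for i in range(1, 9)
}

chosen_areas, enrolled = cluster_sample(people_by_area, n_clusters=2, rng=rng)
print(chosen_areas)
print(enrolled)
```

The randomness applies only to which areas get picked. Inside a picked area nobody is sampled out, so whatever the residents of that area have in common comes along with them.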
And so this is why cancer ends up being a really tough nut to crack, because where we see high rates, there are often a lot of different geographic issues going on. And cluster sampling doesn't really help tease that out. So to wrap this up, cluster sampling is used when geography is important. If there is something geographically located in a certain spot and you can't move it, then you are kind of stuck doing cluster sampling. So briefly, the map around that area is divided into different sub-areas, right? Not all the areas are picked; just a few are randomly picked. And then all of the people in each picked area are sampled. And of course, it's biased towards the people living in the area. If you pick an area with a bunch of affluent people, you'll get affluent people. Pick an area with a bunch of immigrants, you'll get immigrants. And so cluster sampling is not perfect, but you're kind of stuck with it when there's a situation with geography. How I remember it is, when I used to live in Florida, we'd like to drive up to Georgia because they had the best pecan clusters. That's a type of dessert with pecans and caramel and stuff. So when I think of cluster sampling, I think of those pecan clusters that are only really good in Georgia. So that's my way of remembering that cluster sampling has to do with geography.
Now I'm finally going to talk about the last two types of sampling that I'm going to cover in this lecture: convenience sampling and multi-stage sampling. They're both fairly quick, so I'm going to just cover them quickly. First, we're going to start by talking about convenience sampling. And we like that name, right? It's convenient. Convenience sampling can be used under low-risk circumstances, like if the findings of what you're doing aren't really that important.
Like for instance, let's say that you wanted to know what ice cream is the best from the restaurant next to the hospital. Let's say a new restaurant opens up, and you're going to go off your diet, you're going to go get some ice cream, but you don't want to waste it, right? So you want to ask people, what's the best one? You might ask your coworkers, you might ask the people at the restaurant, hey, what's the best ice cream? But the results are not so reliable, because you might end up on Yelp and see that other people disagree. So convenience sampling is basically using results or data that are conveniently or readily obtained. In my master's degree, one of the things I did was survey people anonymously who were coming to a health fair. I sat at a booth and I gave them the survey, with three questions in it. That was definitely a convenience sample, you know, just people showing up for the health fair. And this can be useful when there aren't a lot of resources allocated to the study. Like, I was a starving master's student, right? I didn't have any money. So that was perfect for me, convenience sampling. And also, the questions I was asking them were just about characteristics of whether or not they had risk for diabetes. Well, I'm not a doctor, and I wasn't going to do anything about it. But it was interesting. So it wasn't a very high-risk survey to fill out. And convenience sampling is convenient because it uses an already assembled group for surveys, like I was doing at the health fair. An example might be to ask patients in the waiting room to fill out a survey, or to ask students in a class. You know, sometimes when I'm teaching, I'll do a convenience sample of whoever's sitting there. I'll say, hey, is the homework that I assigned you this week too hard? Well, it's always too hard. I don't even know why I do the survey.
But anyway, I mean, sometimes as a teacher, you'll just want to do a convenience sample just to get a gauge on where the class is. But there are problems with it, right? You can't just use it for everything, even though it's nice and convenient. There's bias in every group, right? So if I let everybody go on break, and then I ask whoever's still sitting there if the homework's too hard, I might get a totally different answer than if I waited for everybody to come back, right? And just about any time you waltz into a room, you get this. Like when I went to the health fair, who do you think is there? A bunch of sick people? No, there's a bunch of health-minded people there, and so I'm going to get a bunch of bias, right? And also, more importantly, when you do convenience sampling, you often miss important subpopulations. So remember with stratified sampling how sometimes people don't group evenly into the different strata? Maybe they kind of do in high schools, but especially when it comes to job classifications, workplaces usually have fewer bigwigs than they do lackeys, right? And if they just have a few bigwigs, then if you do a simple random sample, you might miss all of them. So maybe you try a stratified sample. On the other hand, if you walk into the break room that is used by the lackeys and you say, hey, want to fill out my work satisfaction survey, then all of the responses you get are going to be from the lackeys. You're not going to get any representation from the upper job classes, because they don't go in that lounge. So you'd be missing them. So that's the main problem with a convenience sample: the results can be severely biased because you're only asking a small, biased group of people who probably are all alike in some way. It's not a very representative sample.
Next, I'm going to talk about multi-stage sampling.
So, you know, if you have a kid and the kid's crying, and somebody's like, what's up, you say, well, the kid's going through stages. Well, that's exactly what you're doing when you're doing multi-stage sampling: you're going through stages. It's basically like mixing and matching the different sampling approaches I just talked about, only you do one stage, and then a second stage, and then a third, and then a fourth, or maybe even more. And that's how you get your sample. So if you're imagining, wow, I've got to start with a lot of people, you're probably right. Here's an example I made up of a way that you could do multi-stage sampling. You could start with stage one as a cluster sample, right? Remember, that's where you take out a map and then you divide it into areas. Well, let's divide it into states and take two census regions of states, so about 10 states from those clumps. Okay, now we've limited it to that. Now let's go to stage two of our multi-stage sampling. From each of those states, we could take a simple random sample of counties, right? So we go and look at all the counties and we take that random sample. Then after we get those counties, in stage three, we could take a stratified sample of schools from each county. Some of the counties will be totally rural, some will be totally urban, but most will have some mix. So we'll take a few schools from the urban areas and a few schools from the rural areas. So in stage three, we take a stratified sample of schools from the simple random sample of counties from the cluster sample of states. Okay, now we've got our schools. Stage four could be a stratified sample of classrooms. So once we've figured out our urban schools and rural schools, we could go in there and look at all the classrooms, freshman, sophomore, junior, senior, and take a stratified sample of those.
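Here is one way the four made-up stages could be sketched in Python. Everything in it is an assumption for illustration: the nested frame is invented, every county is given both urban and rural schools (real counties would not all be mixed), and the numbers picked at each stage are arbitrary. Stage one is also simplified to a random pick of states rather than whole census regions.

```python
import random

GRADES = ["freshman", "sophomore", "junior", "senior"]

def build_frame(n_states=10, n_counties=6, n_schools=4, n_rooms=3):
    """Build a made-up frame: state -> county -> setting -> school -> grade -> classrooms."""
    frame = {}
    for s in range(n_states):
        counties = {}
        for c in range(n_counties):
            settings = {"urban": {}, "rural": {}}
            for setting in settings:
                for k in range(n_schools):
                    school = f"S{s}-C{c}-{setting}-{k}"
                    settings[setting][school] = {
                        g: [f"{school}-{g}-room{r}" for r in range(n_rooms)]
                        for g in GRADES
                    }
            counties[f"S{s}-C{c}"] = settings
        frame[f"S{s}"] = counties
    return frame

def multistage_sample(frame, rng, n_states=2, n_counties=3,
                      schools_per_setting=1, rooms_per_grade=1):
    """Whittle the frame down in four stages and return the chosen classrooms."""
    classrooms = []
    # Stage 1: randomly pick states (standing in for the cluster stage).
    for state in rng.sample(sorted(frame), n_states):
        counties = frame[state]
        # Stage 2: simple random sample of counties within each state.
        for county in rng.sample(sorted(counties), n_counties):
            # Stage 3: stratified sample of schools, strata = urban / rural.
            for setting, schools in counties[county].items():
                for school in rng.sample(sorted(schools), schools_per_setting):
                    # Stage 4: stratified sample of classrooms, strata = grade.
                    for grade, rooms in schools[school].items():
                        classrooms.extend(rng.sample(rooms, rooms_per_grade))
    return classrooms

rng = random.Random(42)
frame = build_frame()
selected = multistage_sample(frame, rng)
# 2 states x 3 counties x 2 settings x 1 school x 4 grades x 1 room = 48
print(len(selected))
```

The made-up frame starts with 10 x 6 x 2 x 4 x 4 x 3 = 5,760 classrooms and ends with 48, which shows how much you have to start with if you are going to whittle it down over several stages.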
So it's basically mixing and matching, but you're right, you've got to start with a lot to begin with if you're going to whittle it down in a whole bunch of stages. It doesn't have to be four stages; I just gave you four.
Now I'm going to give you a real-life example. This is the National Health and Nutrition Examination Survey, or NHANES. Definitely not a master's project. This is done by the Centers for Disease Control and Prevention in the United States, right? So what I'm kind of hinting at is that the kinds of places doing multi-stage sampling are governments. Not only do you have to start with a whole bunch of people and things, individuals, states, schools, and what have you, but it's also a lot of work to do all the sampling, and it had better be for a good reason. And the National Health and Nutrition Examination Survey is a good reason. That's a survey that's done by the CDC to try to measure America's health. Of course, it's doing inferential statistics, right? It's taking a sample and trying to extrapolate that information back to the population. And so it's got to be really careful about how it does its sampling. It can't just waltz in and do a bunch of convenience sampling. So this is how it does it. Just briefly, in stage one they start by sampling counties. Then from those counties, they sample something called segments, which are defined in the census; they're different areas. From those segments, those areas, they sample households. And by that they mean wherever you live is a household. Even if you live in a dorm, that's a household, or if you live in assisted living, that's a household. An apartment, a house. So they sample those. And once they knock on the door of your household, they sample individuals from the household. So they use four stages of sampling. And that's a real-life example of multi-stage sampling.
So in summary, convenience and multi-stage sampling.
With respect to convenience sampling, you want to avoid it unless it's really a low-risk question you're asking about, or unless it's really the only type of sampling possible under the circumstances. When you have situations where you have patients with a very rare disease, convenience sampling from your rare disease clinic is probably reasonable. It's also used when resources are low. So those are a few good reasons to use convenience sampling. But it's really something that you want to use only if it's the thing you're stuck with. It's much better to look towards the other sampling approaches I described. And then finally, multi-stage sampling is usually used in large governmental studies. So don't expect to actually design anything alone with multi-stage sampling. I showed you those four stages for that survey that the CDC does. Hundreds of people work on that. Even just for the sampling, tons of people work to try to set that up. It's very difficult. But I wanted you to know about that kind of sampling because it's important in health care. It happens a lot.
So in conclusion, we made it through the sampling lecture, didn't we? I first started by describing some definitions you needed to be able to understand all these different types of sampling. Then I went into simple random sampling and showed you how to do it two different ways, what it achieves, and also its limitations. We next talked about stratified sampling, why you do that and how you do that, and the limitations of that one too. Then we got into systematic sampling, which is a little more flexible and pretty easy to explain. Next, we talked about cluster sampling and why you might need to pull that tool out of your sampling toolbox. And then finally, we covered convenience sampling and multi-stage sampling.
Alrighty, well, I hope you better understand sampling now and can keep all of these different types of sampling straight in your mind.