Chapter 5 is on experimental design and sampling methods, and it's going to seem very non-math-y after what we've been going through with normal curves and z-scores and percents and percentiles and linear regression. But it's extremely important to what we'll do the second term in this class. So the vocabulary and the methods matter greatly to the analysis we'll do with statistics in the second term. This video is quite short, just trying to do some quick vocabulary review on experiments and observational studies. So, a reminder: in an observational study, we observe individuals, we measure variables of interest, but we're not actively involved in trying to influence the responses. We're just watching and recording, or researching data that already exists. In an experiment, on the other hand, we are actively involved with the individuals in some way. The term of art we've used here is that we're going to impose a treatment of some kind and see what sort of impact that has on the response variable that we're measuring. Now, a survey is a type of observational study, and we'll talk about surveys in a future video. Just know that when we're asking people questions, for example, we're not influencing anything, we're just gathering data. Now, we're going to talk about sampling methods in a future video, but if we're doing an experiment or an observational study, it's almost always impossible to do it on the entire population that we're interested in studying, and so we take a sample. The population represents the entire group of individuals that we're interested in. Taking a sample is sort of the pragmatic approach to how we study that, so we study a subset. So a sample is the subset of the population that we actually examine, and it's important that you know the difference, and it's important that you be able to define a sample and define the population that the sample represents.
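If it helps to see the sample-versus-population idea concretely, here's a minimal sketch in Python. The population of 1,000 ID numbers, the sample size of 50, and the choice of a simple random draw are all assumptions made up for illustration. They aren't from any study in this chapter, and how to sample well is a topic for another video.

```python
import random

# Hypothetical population: 1,000 individuals labeled by ID number
# (an assumption for illustration, not data from the lecture).
population = list(range(1, 1001))

# Fixed seed so the sketch gives the same sample every time it runs.
rng = random.Random(5)

# The sample: 50 individuals drawn without replacement from the population.
sample = rng.sample(population, k=50)

print(len(population))                     # size of the whole group we care about
print(len(sample))                         # size of the subset we actually examine
print(set(sample).issubset(population))    # every sampled individual comes from the population
```

The thing to notice is just the relationship: the sample is smaller than the population, and everything in it comes from the population.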
The activity of pulling a subset out of the population is called sampling, and there are many methods for doing sampling that we'll talk about in another video. But that's the idea: studying part of the population to gather information on the whole. We want to do that well, and we'll talk about how to do that in another video. It is possible to attempt to contact everybody. If I wanted to survey my class of AP students, that wouldn't be hard. I could do a census there. It wouldn't necessarily be hard at the high school either, but if I was interested in the opinions of voters in the United States on the debate right now over Obamacare, for example, it would be very hard to talk to everybody in the US. So we would take a sample to do that. The picture shows humans, but keep in mind, it doesn't have to be humans. My population might be dogs, small dogs, and my sample might be a subset of small dogs, and the experiment I might do might be related to a new dog food. Or if I'm Apple, and I'm making iPhones, the population might be all the iPhones made in a day, and the sample might be the 10 or 20 or 50 that I pull off the assembly line to test for quality. So just remember, a population doesn't have to be full of humans. A quick example here. Would you please pause me and do this for practice? Read this. Identify: is this an observational study or an experiment? Why? Identify the explanatory and response variables here, and identify the sample and population. So pause me, read this, and do that for practice. Okay, in this case you should have said that this was an observational study. We're not imposing a treatment here. We didn't ask anybody to go intentionally use or not use a cell phone, so most likely we're looking back over data gathered about folks through some sort of survey mechanism. Now, explanatory and response variables.
When you identify or try to name an explanatory and response variable, this is picky, but you ought to identify them as things that vary. They are variables. So the explanatory variable here is not cell phones. The explanatory variable here is something that varies, and it's not obvious here, but it might be minutes of cell phone use in a day or hours of cell phone use in a week or something. We're clearly trying to establish a link between how much you use your cell phone and brain cancer. The response variable here is not brain cancer. That's not a variable. That's not something that varies. We could make it a variable quite easily by saying did or did not get brain cancer, and now it's a variable and it has two values, did get it or didn't get it. So when you name variables, make them something that varies. The sample here is 469 people who have brain cancer, along with the 469 people they were matched against. We'll talk about this in a future lesson. This is called a matched pairs design, and the matching on sex, age, and race is meant to eliminate lurking variables here. So the sample is the subset, the 469 plus those that they were matched with. The population here may be a bit more difficult, and if you get stuck on identifying the population, first remember it's big, bigger than the sample. And you can ask yourself, who or what does the sample represent? Really we're studying people who use cell phones here, and so I think that the population here is people who use cell phones, and what we're interested in is whether they get brain cancer. Here's one more. For practice, read this, same questions. Pause me and read it and attempt to answer the questions for practice. Okay, in this case you should have said we have an experiment. Clearly we're imposing a treatment here. We've intentionally taken some students and divided them into two groups. One group uses animation, the other uses text to study cell biology, so we've imposed a treatment.
I'm going to ask you what the treatment is here, and there are actually two treatments here. The treatments are using the animation to study or using the text to study. Explanatory and response variables, things that vary. The explanatory variable here is type of instruction. It has two values, animation or text, so that's a variable. For the response variable, it says we're looking at the increase in understanding of cell biology. So if we were measuring this, perhaps there was a pre-test and a post-test or something like that to see which group had the greater increase in understanding of cell biology. So it could be increase in test scores, for example. Here it tells us what the sample is. It's a group of first year college students. It doesn't tell us how many, so we're limited in our ability to be specific, but it's a group of first year college students. The population is interesting here. We want to be careful. We don't want to blow it up. It's not college students, because we're not studying all college students here, and fourth year college students might have a different capacity for learning than first year college students. So really about the best we can do here is ask, what does our sample represent? Perhaps all first year college students would be okay, or all first year college students studying cell biology might be okay. But we don't want to make it bigger than it really is. So the sample represents the population, and we try to describe the population that way. Alright, so quick review there. You should have a good comfort level with the difference between an experiment and an observational study. You should be able to define explanatory and response variables well. Remember when you define them, they're variables, so make them sound like things that vary, that can take on different values. And you should be able to pull out samples and populations and be able to explain the difference between the two.
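To make "things that vary" concrete for the animation-versus-text example, here's a rough sketch in Python. Every number in it is made up for illustration. The example doesn't give scores or even a sample size, so the six students, the pre-test and post-test scores, and the helper name mean_increase are all assumptions, and the output says nothing about the real study.

```python
from statistics import mean

# Made-up records for illustration only: (type_of_instruction, pre_test, post_test).
students = [
    ("animation", 52, 71),
    ("animation", 60, 74),
    ("animation", 48, 66),
    ("text", 55, 63),
    ("text", 50, 61),
    ("text", 58, 64),
]

# Explanatory variable: type of instruction. It varies, taking two values.
explanatory_values = sorted({treatment for treatment, _, _ in students})

def mean_increase(records, treatment):
    """Average of the response variable (post-test minus pre-test) for one treatment group."""
    increases = [post - pre for t, pre, post in records if t == treatment]
    return mean(increases)

# Response variable: increase in test score, summarized for each treatment.
for treatment in explanatory_values:
    print(treatment, mean_increase(students, treatment))
```

Notice that both variables are named as things that take on different values: type of instruction is either animation or text, and increase in test score is a number that differs from student to student.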