Hello, everybody. It's Monika Wahi, Labouré College lecturer for statistics. We're on to Section 1.3, Introduction to Experimental Design. And here are your learning objectives. So at the end of this lecture, you should be able to, first, state the steps of conducting a statistical study, and then select one step of developing a statistical study and state the reason for that step. You should be able to name one common mistake that can introduce bias into a survey and give an example. You should be able to explain what a lurking variable is and give an example of that. And you should be able to define what a completely randomized experiment is. So let's get started. This lecture is going to cover four basic topics. First, we're going to look at the steps to conducting a statistical study. You may think there are a lot of steps to conducting a study. This is from the point of view of the statistician. Okay, then we're going to go over basic terms and definitions. And by now you're probably used to the fact that in statistics, certain words are reappropriated and they mean something specific. So we'll talk about that. Then we'll talk about bias, what it is and how to avoid it when designing your studies. Finally, we'll talk about randomization and the particular topics you need to think about when you're randomizing. So let's get started. We're going to start with, of course, basic terms and definitions. And so first, we're going to review these steps that I keep talking about for conducting a statistical study. There's some vocabulary that comes up, so we're going to talk about those vocabulary terms, and then I'm also going to give you a few examples from healthcare. So here are these steps I keep talking about. These are the basic guidelines for planning a statistical study. The first thing you want to do is state your hypothesis. And you know, I've been a scientist a while now.
And I can't tell you how many times a group of us gets together, and people are all curious and they start thinking, let's do a study. And it's only halfway through our conversation that I suddenly say, hey, wait a second, we don't have a hypothesis. What's our hypothesis? So it's easy even for scientists to forget that that's really step one: you have to have a hypothesis. And whatever hypothesis you pick, the hypothesis is about some individuals. If I have a hypothesis about hospitals, those are the individuals; if I have a hypothesis about patients, those are the individuals. So step two is to identify the individuals of interest. It's important to actually nail that down, because am I talking about the patients in the hospitals, or am I talking about the hospitals? So make sure that after you percolate and decide on your hypothesis, you understand who the actual individuals of interest are. And that's because you're going to have to measure variables about these individuals. So step three is to specify all the variables you're going to need to measure about these individuals, and of course, they relate to the hypothesis. So it's a good thing that was step one, right? Step four is to determine whether you want to use the entire population in your study or a sample. If you already have a bunch of data, like the census data, you might as well use the entire population. But typically, if you don't have the data, you're going to want to sit down and think about using a sample. And if you do that, while you're sitting down you should probably also choose the sampling method, on the basis of what I talked about in the sampling lecture. So now you've figured out your hypothesis, you've got your individuals, you've figured out your variables, and you've figured out whether you're going to do a census or a sample, and if a sample, what type of sample. Step five is you think about the ethical concerns before data collection.
If you're going to be asking some sensitive questions, you think about privacy. If you're going to be doing some invasive procedures, you think about how painful that would be and how hard that would be on somebody, especially if they're not even sick, you know, they're just healthy, and you're just doing an experiment on healthy people to better understand biology. So you have to really sit down and think about these ethical concerns, and they may slightly change your study design. Finally, after you get steps one through five taken care of, that's when you actually jump in and collect the data, which is step six. And like I was saying, when I meet with my scientist friends and we get all excited about an idea, we're often talking about step six. We're like, oh, we should do a survey, we should do this, we should do that. And I end up saying, hey, we actually have to go back to step one and start talking about a hypothesis, because I suddenly realize I don't even know what data to collect, right? If you don't go through the steps in order, you really aren't doing it right. Step seven is, after you get the data, you finally use either descriptive or inferential statistics to answer your hypothesis. And that's what statistics is about; it's here for that. And then finally, after you use the statistics, you have to write up what you find, even if you're at a workplace and they asked you to do a little survey. That happened once when I was working somewhere; they wanted us to do a survey. Their hypothesis was that they didn't have enough leadership programs and they weren't building good leaders they could promote. And so I was on a team that did the survey. We didn't really publish it everywhere, but we made an internal report, right? And in that internal report, we had to do step eight, which is to note any concerns about data collection or analysis that came up while we were doing the report.
And we also had to make recommendations for future studies, for if you wanted to study this in future groups of employees. In science, what it usually ends up being is a peer-reviewed literature report, right? You do a scientific study, maybe you get a grant, and then you do all these steps, and step eight is where you actually prepare a journal publication. And in that, you have to note any concerns about your data collection or analysis: anything that might have gone wrong, or not gone exactly the way you planned, or something you need to take into account to properly interpret what the study found. You also want to make recommendations for future studies, especially if you screwed something up, or especially if you answered a really good question; no reason to perseverate on that question, why don't we move forward and ask the next one? Now, these are a lot of steps to remember, so I'm going to help you try to remember them in clumps. Let's look at the first clump, which is steps one through three: state the hypothesis, identify the individuals of interest, and specify the variables to measure. So let's give an example of that. Let's say our hypothesis was that air pollution causes asthma in children who live in urban settings. That's how we'd state it, or we could state it as a research question, like, does air pollution cause asthma in children who live in urban settings? In that case, the individuals would be children in urban settings, and the variables we'd have to measure are air pollution, at least, and asthma, at least. And of course, we'd want to know more things about these individuals, these children; we'd probably measure their income, and where exactly they were living, and how old they were, and whether they're male or female, and these kinds of things. But that just kind of helps you think about the first three steps together.
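If it helps to see the first three steps all in one place, here's a little sketch in Python of the asthma example as a planning checklist. The field names here are just my own illustration, not any standard notation:

```python
# Steps 1-3 of planning a statistical study, using the air pollution
# and asthma example. The field names are illustrative, not standard.
study_plan = {
    "hypothesis": "Air pollution causes asthma in children who live in urban settings",
    "individuals": "children in urban settings",
    "variables": [
        "air pollution exposure",            # relates directly to the hypothesis
        "asthma diagnosis",                  # the outcome of interest
        "income", "location", "age", "sex",  # extra context about the individuals
    ],
}

# Every variable should trace back to the hypothesis or describe the individuals.
print(study_plan["individuals"])  # prints "children in urban settings"
```

The point of writing it down like this is that each piece depends on the one before it: the individuals come from the hypothesis, and the variables are things you measure about those individuals.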
Now let's think about the second clump of steps together, four, five, and six: determine if you're going to use a population or a sample, and if it's a sample, pick the sampling method; look at the ethical concerns; and then actually collect the data. When you do that, you can either, quote unquote, collect data by using existing data, like downloading data from the census or from Medicare; they have data sets available that are de-identified, so you don't know who exactly is in there. Or you can collect data yourself, like doing a survey, or getting a bunch of patients who will allow you to measure them. When you use a government data set, often you can make population measures out of it, and so you don't really have to go through a lot of sampling or ethics steps, because they've already provided the data for you and it's confidential, and that's kind of your data collection. But most of the time, what you'll see, especially for studying patients and treatments and cures and things like that, those are on a smaller scale. So you end up collecting data from a sample for those estimates. And again, you need to choose a sampling approach, and then you need consent if your study is legally found to be human research. So I just want to share with you, in case you didn't know: if you want to go do research on humans, whether you're a nursing student or a medical student or a dental student, or you're a dentist, a physician, a nurse, whatever, you can't just make up a survey or study design and go out and do it. You have to get approval from an ethical board. And that ethical board will tell you, if what you're doing is legally considered human research, that you need to get consent from the patients or the participants in your study if they're humans. And if you're collecting data about children, for example, you have to get the consent of their parents and the assent of the children.
And in the United States, the way we have it set up, it's called an Institutional Review Board for the Protection of Human Subjects in Research, or IRB for short. So I just want to make sure that if you ever do design a study, you know about this IRB thing, and you realize you have to go through this ethical board and make sure that they're cool with it before you can move on to the next step of conducting your statistical study. All right, finally, we're on to the last clump of steps, which is seven and eight. That's using descriptive or inferential statistics to answer your hypothesis; in step six, you collected the data, and now we're going to do the statistics. And then step eight is noting any concerns about your data collection or analysis and making recommendations for future studies. So you can kind of imagine this is where we're sitting in our offices and writing up our research, whether we're writing an internal report to our bosses or we're writing for the scientific literature to publish for everybody. At this point, I just want to remind you that it matters whether you picked a census or a sample for your study design, because if you picked a census, you're going to do a certain kind of analysis, and if you picked a sample, you're going to do a different kind of analysis and statistics. So again, that all cycles back to your study design. And what's important here is I want to talk to you about the two main types of studies. Now, within these two categories you have different subtypes, but these are the two main types that you can have. The first is called an experiment. An experiment is where a treatment or intervention is deliberately assigned to the individuals. So you can imagine that if you enter a study and they assign you to take a drug that you weren't taking before, that would be an experiment.
But another thing could happen. I mean, you could do this to individuals, you could do it to animals, but, to keep using the example of hospitals, we could choose some hospitals and say, hey, you need to try a new policy as the intervention. And that was assigned by the researcher, so that makes this an experiment. And the reason why we have experiments is sometimes you need them: the purpose is to study the possible effect of the treatment or the intervention on the variables measured. So that's one option, an experimental study, where the researcher assigns the individuals to do certain things in the study. There's another kind of study, the other kind, which is called observational. And the way you can think about it is, in an experiment, the researcher does something; they intervene, they give a treatment, right? But in an observational study, the researcher doesn't do that. The researcher just observes. So if you enroll in the study and you say, do I have to take a drug? Am I supposed to eat something? What am I supposed to do? And the researcher just says, no, we're just going to measure you, we're just going to ask you questions and measure things about you; we're not going to tell you to do anything different. Then you're in an observational study. So no treatment or intervention is assigned by the researcher in an observational study. Now, let's say you're taking a drug anyway, you know, maybe you have migraines and you're taking a migraine drug. Well, you can just keep taking it or you can stop taking it. They don't care; they might ask you about taking the drug, but they're not going to assign you to take it if it's an observational study. I wanted to give you a couple of real-life examples. So the Women's Health Initiative, up on the slide, was mainly an experiment.
Okay, this was run by the United States government, but of course it had the cooperation of many, many universities and healthcare centers and, most importantly, women. So women in America, women who were postmenopausal, volunteered to be in the study. And the study actually had two separate sections: the experiment section and the observational study section. They really wanted women to qualify for the experiment. And the purpose of the experiment was to study whether hormone replacement therapy, which is a therapy for unpleasant symptoms that women can get if they're postmenopausal, whether that therapy is good for women or bad for women. Because they thought maybe it helps the postmenopausal symptoms, but they also thought maybe it causes cancer, right? They didn't know. So what they had to do was get a bunch of women who agreed, you know, that they would take whatever was assigned to them, and they had to assign the drug to some of these women. So that's what made it an experiment. The problem is not all the women qualified for the study, so they had a separate observational study: if a woman did not qualify to get the experimental drug assigned to her, then she could be in the observational study. And because these are big government studies, why not? You know, if somebody wants to be in a study, why not study them; just put them in the observational section. Another very huge, popular, long-running observational study, this one actually started out of Harvard, is called the Nurses' Health Study. Some really smart person figured out a long time ago that nurses are smart people: they understand their own health, they understand other people's health, and they're good at filling out surveys about health. So they started studying nurses and regularly sending them surveys. Of course, they didn't tell the nurses what to do.
They didn't assign the nurses any sort of drug to take, or any diet or intervention or anything. They just observe the nurses: they send the nurses a survey about the nurses' health, and then the nurse fills out that information. I think it's every two years that they do that, and they're still doing it. Also at this point, I do want to point out the concept of replication. Just the word replication in regular speech means to copy, right? Like if you ever have a new roommate, you might need to replicate your key, so you have a copy of the key for the new roommate. Well, part of the whole science thing is that studies must be done rigorously enough to be replicated. So those are the little keywords in there. A rigorous study means one that's done really carefully: thinking about sampling very carefully, avoiding, for example, non-sampling error, not being sloppy, not getting a lot of undercoverage, using a good sampling frame. You know, I'm just giving you examples that you might know about, but there are a lot of things that have to be done in research to do it properly. It's just like driving or anything else; you really have to keep your eye on a lot of different things, and you want to try to do them perfectly. And the main reason why you want to do that is so that somebody can try to do the same experiment you did, or roughly the same experiment. They can't do exactly the same, right? If I study this hospital over here, and somebody wants to study that hospital over there, well, they're going to get different people in there, right? But even so, if that person decides that they want to study that hospital over there, if I did my study rigorously, then it won't be so hard for that person to replicate how I did the study. And then we can see whether that person's study and my study get the same thing, or whether something's slightly off, or what's going on.
And so replicating the results of both observational studies and experiments is necessary for science to progress. So you'll notice that a lot of experiments are done on drugs before they can be approved to be given to everybody, because they can't just do one study; they have to replicate it to make sure that the findings are all coming in about the same and that we can deduce some information from them. You really just don't want to rely on one study for your findings. So I just went over several steps that we need to follow when we're doing a statistical study, and we actually have to follow them in order. And you also have to determine the type of study you're doing, you know, an experiment or an observational study. There are a ton of study decisions you have to make, so you've got to keep that in mind. Now we're going to talk about avoiding bias, specifically in survey design. Now, you can do a lot of different kinds of studies, but let's just talk about surveys, because that happens a lot in nursing. Nurses interact with patients a lot, with the community, with each other. And often they gather information about those interactions, or attitudes, or how the healthcare system functions, by using a survey. So surveys can provide a lot of useful information. But it's important that in all aspects of survey design and administration, when you're giving it, you think about minimizing bias: trying to get a representative sample, trying to get accurate measurements. And so several considerations should be made. You want to think about non-response and also voluntary response. Okay, so I talked a lot about sampling in the previous lecture. But just because you invite someone to participate in your study, like maybe you're doing systematic sampling and every third patient you ask, would you like to fill out a survey, that doesn't mean they're going to, right?
And so if that person says no, thank you, even though they were sampled, that's called non-response. So if I was helping you with a survey and you said, hey, I was getting a lot of non-response, I would look at the proportion. If you approached 100 people and 80 said no, you know, that's only a 20% response rate, and an 80% non-response rate. If many people are refusing your survey, the few who actually complete it are likely to have a biased opinion. I've noticed this in situations where things are really bad. Okay, like I remember going to a subway station, and it was flooded and really in a bad situation, and there was a man handing out surveys from the transportation authority. And he was like, please take my survey, please take my survey. And everybody was waving past him; they didn't want to grab a survey. Well, you know me, I've got a bleeding heart for surveys, so I took his survey and I filled it out. You know, I think the transportation authority's not so bad. Right? I lived in Florida; there's no transportation there, right? Here in Massachusetts, we've got a great transportation system, even if it's flooded or doesn't work half the time, right? It's way better than not having one. Well, I'm not the only one who grabbed a survey; a bunch of nice Pollyannas like me grabbed a survey. So probably the transit authority thinks that everybody loves the subway, when everybody was waving past this poor guy because they were so disgusted, because the station was flooded, right? So if a high proportion of people are refusing your survey, the few people who will actually fill it out are going to be kind of weird, probably like me. You're going to get a bunch of happy people, when most of the people who said no might be sad people. And so the reason they may not be completing your survey may have to do with how they feel about your topic. This is not just in terms of satisfaction. Let's say you want to ask about how many drinks per night somebody has.
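Before we go further, the response-rate arithmetic from that example of 100 people approached can be sketched as a tiny calculation (the numbers are the ones from the example; the variable names are just for illustration):

```python
# Response rate from the example: 100 people approached, 80 refused.
approached = 100
refused = 80
responded = approached - refused  # 20 people actually completed the survey

response_rate = responded / approached    # 0.20, i.e. a 20% response rate
non_response_rate = refused / approached  # 0.80, i.e. an 80% non-response rate

print(f"{response_rate:.0%} response, {non_response_rate:.0%} non-response")
# prints "20% response, 80% non-response"
```

The arithmetic is trivial, but checking that proportion is the first diagnostic: a low response rate is the warning sign that the opinions you did collect may be biased.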
Okay, do you think a lot of people who are struggling with alcoholism are going to want to fill out that survey? You know, how about illegal drugs or other illegal activity? People who are into that don't always feel so good about talking about it. And so, you know, you might get a few people to fill out your survey, but those are not necessarily the people who are engaging in the behavior. So the fact that we have the freedom to choose whether or not we want to be in a survey is great, but from a researcher's standpoint, you have to be careful. If you get low response rates, you need to ask yourself who's not responding and, you know, am I missing a good share of opinion there? And then when you get people who do respond, you've got to be careful with that, too: respondents may lie on purpose. If you've got a pretty cool survey but you suddenly ask a question that's too personal, people might just lie. Say you're doing a student satisfaction survey about how the front desk runs at a dorm or something. If you ask a question like, have you ever cheated on a test? You know, everybody's probably going to say no. Also, if you ask a question where people don't really know the answer offhand, they're not going to get it right. Like if you ask a kid who's been living in the house forever, when your parents bought that house, how much did it cost? I mean, they're not going to know. Maybe they'll know, but probably not. So you want to be careful when you design your questions that you're not asking anything that's so personal everybody's going to lie about it, or asking a question where, even if people try to be accurate, they're probably not going to give you the right answer, because it's just too hard to think about. Respondents to surveys may also lie without meaning to, inadvertently.
Again, if you ask a question about something that happened a really long time ago, they're probably not going to get it right. This is called recall bias. You know how you can look back at a time in your life, especially if you went through something really harsh, like if you were part of a sports team and you went to state and it was really tough, and you don't remember the tough part, right? You sit around singing your sports songs and you say, hey, that was awesome. Well, that's recall bias, right? Because after winning state, everything looks rosy. But, you know, on the bus there, it really wasn't that easy. So people tend to have recall bias; it's influenced by events that have happened since the original event. So if you're giving people a survey and you're saying, well, before you applied for nursing school, did you think this or did you think that, they might just tell you and think they're telling you the truth, but they're actually wrong; if you managed to go back in time and ask them then, they'd tell you something different. So again, you can kind of screw up your own data by screwing up your own questions. So you want to think about how you word your questions. You can also screw up your questions by introducing a hidden bias. Something happened to me recently, where a company sent me a free app. They said, try a free app, and I downloaded it. And it was awful. Okay. And then about a month later, they sent me a survey. And these were the questions: when you use the app, what time of day do you use it? How do you use it? Do you read scientific literature? Do you read news? And the problem was, I couldn't really answer any of this, because from the day I downloaded it, I never used it. It was so bad. Right? So question wording may induce a certain response. They were asking me, how do you use this?
But they didn't give me a choice of, I don't. So I had to say something. I don't even know what I said. I mean, there was nothing I could say to be honest, because of that bias. So you have to be careful that you aren't too rosy about whatever your topic is and assume everybody loves everything. I mean, you've got to put out questions like, are you even using the software? Did you have any problems with the software? Right? Just assuming they're using it, and liking it, and using it like it's supposed to be used, is a big assumption. Order of questions and other wording may induce a certain response, and you'll see this a lot if you take a public opinion poll. I used to do a lot of polling. We'd ask questions like, how likely are you to vote for candidate X: very likely, somewhat likely, somewhat unlikely, and not at all likely. And people would say, I don't know, not at all likely. And then you'd say, well, what if you knew that candidate X supported this new proposition, Proposition 69? Then would you be more likely to vote for candidate X? And so that's why order of questions and other wording matter: they're trying to see, if I add this fact or that fact, is that going to make the person like the candidate better? So you do have to think about the order you put the questions in. If you want to ask about two different subjects, think about which subject should come first, because it might color the respondents' answering of the subsequent subject. And also on the slide, I wanted to point out that the scales of questions may not accurately measure responses. Do your feelings always fit on a scale from one to five? Well, you know, Yelp's kind of figured out that people's feelings about restaurants tend to fit on a scale of one to five. I'd have a lot of trouble filling that out if they gave me a scale of one to 17, right?
But sometimes people have more granular feelings about things; maybe they need a longer scale, one to seven. You'll see a lot of pain scales where they offer more than just five choices, because pain can maybe go from one to seven or one to ten. So think about your scales when you're creating these questions, because that's your choice if you're designing the study. Another point to be made is the influence of the interviewer. Now, we don't have as much interviewing going on these days, because we have the internet, where we can do anonymous surveys and people just fill them out themselves, self-report, and we have robocalls that use an automated voice, obviously not a person, and you can get survey data that way. But there are always situations where you actually have to interview people, especially if somebody's really sick in bed and you have to show up there; you have to talk to them. And even on the phone, you have to interview people, and they can hear your voice, right? So you've got to think about how you're pairing up whoever's being interviewed with whoever's interviewing. I've found that it's best to have the interviewer come from the same population as the research participant, in general. The only time that can be a problem is if they're from the same community and there's a privacy issue. But it can be very helpful, for the most part, not always, to have your interviewers actually be from the population that you're studying, you know, from the individuals that you would be studying. So for instance, say you need to interview some African American teenage men. Like, I recently saw a study on how healthcare in the United States really isn't suited for them, and it needs to improve and better cater to this population.
Well, let's say you wanted to better understand that. The best thing would be to hire a young African American male and train him on how to be a good interviewer and a good data collector, because you'd probably get the best data that way. On the other hand, let's think of different ways that could go. You could take a person who is older, who is maybe of a different race, and maybe that would change how this young African American male would respond to this interviewer. I mean, the interviewer could be in many ways like the respondent, but the respondent's perception might change how they answer. All verbal and nonverbal influences matter, you know: clothing, the setting that the person's being interviewed in. And so I'm not saying there's really a solution to all this; I'm just saying make some good decisions. Like, I remember working on a data set where there were some questions that had been asked of some older men about their sexual function. And the data looked funny to me, and the statistician who was there during data collection told me that they had chosen young female nursing students to interview these older men about their sexual habits. And I just said, you know, that might be subject to interviewer influence. And then you, of course, have to worry about vague wording. Just because it looks clear to you doesn't mean it looks clear to everyone. There are simple ways of avoiding vague terms in a survey when you can just put a number on it. So instead of asking a person if they've waited a long time in the waiting room, you can say, more than 10 minutes. You can say exactly, like, within the last month, have you done a certain activity, or, within the next year, do you expect to change schools, or whatever. And so wherever you can, try to use numbers or something very specific: you know, instead of, go to the clinic, say, go to the public health clinic at this particular corner, or whatever.
And then you're going to get some pretty accurate information. But sometimes you're stuck using vague terms because you're studying vague terms, right? I was doing a study of attitudes towards controllable lifestyle in medical students. So we asked this question: how important is having a controllable lifestyle to you in your future career? Well, what does that mean? That's pretty vague. So what we did was use grounding, anchoring language. We added the sentence: a controllable lifestyle is defined as one that allows the physician to control the number of hours devoted to practicing his or her specialty. So even though we're talking about something kind of waffly and watery and loosey-goosey like controllable lifestyle, who knows what that means, we grounded it. That's not to say that sentence can't be interpreted differently by people; it certainly can be. But if you're stuck with vague wording, try to put some grounding language in it, so everybody's at least sort of led in the same direction with their thoughts before they answer the question. Now, I also want to point out, you've probably noticed there are all these issues you have to think about when doing surveys. There's this other issue called the lurking variable. Well, you know, lurk means to sneak around behind the scenes, right? Behind the scenes, a lurking variable is a variable that's associated with a condition, but it may not actually cause it. I remember when I was studying epidemiology, they talked about how a lot of people who unfortunately got into motorcycle accidents had tattoos. So therefore they said, nobody should get a tattoo; you might get in a motorcycle accident. Well, that's a great example of a lurking variable. Yeah, a lot of people who do get into motorcycle accidents have tattoos, but the tattoos don't cause the accidents. We also know that having more education increases income.
But people of the same education level do not all make the same income. There's this thing, you know, called sexism, and there's racism. So it matters whether you're a woman or a man, and it matters what color your skin is. If you've got darker skin, it doesn't matter that you have the same education as somebody with lighter skin; you're still going to make less money. And so you have these lurking variables behind the scenes. So when people ask, why are some people making less income, is it because they're less educated, whatever, well, you've got to also look for the lurking variables. Current studies show that the reason women and African Americans make less money on the whole is not explained by fewer of them working or fewer of them getting degrees. It's really these lurking variables. And so, you know, you've got to think critically. And I guess what I would say is, whenever you do a survey, if you're studying something that has a lot of lurking variables associated with it, make sure you measure those variables. Like early studies that were looking to see if drinking a lot of alcohol causes lung cancer; some of them forgot to really measure how much these people smoked. Because we know smoking causes lung cancer. And we know if you're hanging out in a place with a lot of drinking where they allow smoking, you'll see a lot of people smoking too. They seem to go hand in hand. So you don't want to miss measuring variables that you think might be lurking variables. It's no problem to measure them and not use them later, but just make sure they're included. So as a final note on bias, I just want to point out that survey results are so important for healthcare and for the progression of science that you owe it to even the simplest survey to think about all of these things.
These possible things that could go wrong, just with the wording of questions or with how you're approaching things; really consider how you can improve it. It's really important to pay attention to avoiding bias when you're designing and conducting your survey. So think about all these things at the design phase. Finally, I'll get into the last section of this lecture, which is about randomization, which I think a lot of us have heard about. So I'm going to explain the steps to a completely randomized experiment. And after I go through all that, I'm going to also talk about the concept of a placebo and the placebo effect. Then we're going to briefly touch on blocked randomization and also define for you what is meant by blinding. So why ever randomize, right? What randomizing is, is when you take a bunch of respondents or participants in your study and you randomly choose what group they go in. And if you remember, like I was talking about experiment versus observational study, we can't do that in an observational study. This is definitely an experiment, because you're telling them what group to go in, right? So randomization is used to assign individuals to treatment groups. And when you randomly assign them, you're not picking; you're using dice or some sort of random method. It helps prevent bias in selecting members for each group. It distributes the lurking variables evenly, even if you don't know about the lurking variables, even if you aren't measuring them. By using this randomization method, they get equally allocated to each group. So just to remind you how you actually do that: first, remember the steps to a statistical study; you have to follow those. And after you get to the point where you have ethical approval, that's when you start the data collection step.
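By the way, the random assignment I keep describing, done with a computer rather than dice, can be sketched in a few lines of Python. This is just a minimal illustration; the function name and the participant IDs are made up for the example:

```python
import random

def randomize(participants, seed=None):
    """Randomly assign participants to group A (treatment) or group B (placebo).

    Shuffling the whole list and splitting it in half gives every
    participant an equal chance of landing in either group, which is
    what spreads the lurking variables evenly between the groups.
    """
    rng = random.Random(seed)  # seed only so the example is repeatable
    shuffled = participants[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"A": shuffled[:half], "B": shuffled[half:]}

# Hypothetical participant IDs for illustration
groups = randomize(["P01", "P02", "P03", "P04", "P05", "P06"], seed=42)
print(groups)
```

Notice that nobody chooses who goes where; the shuffle does, which is the whole point.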
And that's where you start recruiting your sample: you know, hanging up signs saying, be in my study, and people come in, and you see if they qualify. And if they qualify, you've got this sample group, right? And what you do with those people is you say, thank you for being in my study, and you measure the confounders, which is another word for lurking variables. You also measure the outcome, whatever you're trying to study. In randomized experiments, and I've been involved in a lot of these, they're often studying drugs for lowering blood pressure. So they'll often have maybe two or three groups they're randomizing people into, but they don't do that first. The first thing to do is get everybody in there and measure their blood pressure, right? The outcome. Because they want to know the before; they've got to take a picture of the before. And they also measure confounders like smoking. Remember, smoking is not good for your blood pressure. Other things are not good for your blood pressure either, like not exercising; they'll measure all of those things. Okay, now here's where the whole randomization happens. I show this picture of a die, but we usually use a computer for it. So we've got all these people together, and now, randomly, we put them in different groups. In this example on the slide, we're just going to pretend there are two groups: we're going to give one group treatment and the other group placebo, which is an inactive treatment. It's fake. It doesn't work. Of course, the treatment and the placebo are going to look the same to the people taking them; we're going to fool them, they won't know. But the reason why in real life you can't do that with a blood pressure study today is we know that high blood pressure is really bad for you.
So it's really unethical to give somebody a placebo; you've got to give them some sort of drug to lower their blood pressure. So usually when we do studies of new blood pressure drugs now, group A is the new treatment and group B is the old treatment; they usually give the new treatment to group A and the old one to group B to see if they can find a better treatment. But if we were talking about something like Alzheimer's, especially late-stage Alzheimer's, there's no treatment. Okay. And so what's on the slide here, group A that gets treatment and group B that gets this sham pill, this placebo, that would be ethical then. But let's just cross our fingers that someday that's not ethical anymore and we do get a treatment, right? Okay, so after you put them in the two groups, what's sort of missing from the slide is that time passes. People in group A take their treatment. And in this example on the slide, people in group B take the fake treatment, the placebo, and neither of them, usually, knows what's happening. But it takes a while, right? And in the olden days, before we knew high blood pressure was bad, these were the study designs. And this is what ended up happening: at the beginning, where they measured the confounders and the outcome, everybody had high blood pressure; they all looked the same. But after treatment, group A would go down, whereas group B would go down a little bit from the placebo effect, which I'll explain on the next slide. But that's how we learned that you can make blood pressure go down with these different pills. Finally, after that time passed, it could be six weeks, it could be years, however long that took, when it was over, we'd measure the confounders again, because they could have changed.
And the outcome, which in my example was blood pressure, or, you know, how serious somebody's Alzheimer's disease was, if we were doing that. So, I promised you on the last slide that I'd talk more about what a placebo is and the placebo effect. I found this great picture of old placebos from the National Institutes of Health. So a placebo is this fake drug that's given, and it's actually kind of hard to make placebos. Just imagine a drug you may need to take. Imagine we had to study it; we'd have to make a fake version that tasted like it and looked like it. Because otherwise, the people who are randomized to the placebo group would be able to totally tell they were in the placebo group, and that's not good. So the reason why you need a placebo is there's this thing called the placebo effect. And that occurs when there is no treatment, but the participant assumes she is receiving treatment and responds favorably. Now, sometimes I talk about one of my favorite epidemiologist comedians, Ben Goldacre. He reported, I think in one of his TED talks, about a study where everybody they enrolled had, I guess, a mild disease. And they told everybody either they were going to give them nothing, or they were going to give them a pill that's a placebo, it doesn't do anything, or they were going to give them an injection that's a placebo injection, it doesn't do anything. And what they found is, of the three groups, the people who got the fake injection did the best. The people who got the fake pill, the placebo pill, did second best, and the people who didn't get anything did the worst. And his point is, that's what the placebo effect is. For some reason, when we're getting injected, even with just saline, we think we're getting some sort of drug, and it psychologically, or however, affects our bodies.
The same thing when we're taking a pill. I don't know if you've ever seen kids saying, oh, I need medicine, I need medicine, and then the parent gives them an M&M, right? They think it's a pill; they're happy with it. But actually, the placebo effect can cause real effects on your health; it can make you feel better just because you think you're taking a drug. And so that's why it's super important to include a placebo group, or a comparison group like I described with blood pressure, in all your studies. Because if you just have one group taking the drug, they'll all say it's good. They would say it's good if it was water, right? So the placebo is given to what's called a control group. Now, if you're studying something like acupuncture, you can't really give a placebo acupuncture. So what they'll do is hang up a little curtain and kind of tap you, and you don't know whether you're getting real or what's called sham acupuncture. Things like that have to happen when you're studying these interventions that aren't pills; those are called attention controls, right, like the sham acupuncture. So in any case, you've got to think about this, because you need a control or comparison group that's fair whenever you're testing a new thing in a randomized experiment. I promised you I'd talk a little bit about blocked randomization. I won't get much into it. But sometimes when you go to randomize, you get this whole group of people, they're all about the same, and you're going to split them into a group A and a group B; one's going to get maybe a drug and the other's maybe going to get the placebo. Sometimes you get worried that the groups are going to be unbalanced with respect to a particular lurking variable. In blood pressure, we'd always care about smoking; we want an equal number of smokers in each group.
You know, a lot of times we care about gender; we want equal numbers of men and women in each group. So if you're worried about that with your randomization, you can't just randomize people one at a time, because you might just randomly put too many men in one group. What you do instead is blocked randomization. So see, I drew all these blocks on the screen, and you'll see that there's nobody in them; they're just blank, I just put x, x, x. So this is before you do your study: you have these blank blocks. And as you enroll people, remember, you have to measure them and make sure they qualify for your study; as you get them in, you just write them into the blocks. So here I just put their fake initials. Let's say XYZ came in first, that's a woman, and then maybe NFW came in, that's another woman; you just keep putting the women there. And then when the men come in, you put them in, and you fill up the blocks. Then here's the trick: you randomize the entire blocks. So block one and block three ended up in group A, and, magic, you've got equal men and women there. And then group B: equal men and women. And so that's how you do it with blocks. But, you know, there are some limitations to this. Like, if you get multiple races in your study, maybe four or five racial groups, you make a block of five, and you've got to fill up the whole block before you randomize it. And sometimes you're in an area where certain racial groups are rarer, and you might have trouble filling up your blocks. So there are some limitations to this too. Now, I mentioned the situation where, if you're going to do an experiment, not an observational study but an experiment, and you're going to randomize people either to a drug or some sort of intervention versus a placebo, or a new drug versus an old drug, you really don't want them to know what group they're in.
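Before we get to blinding, that block trick can also be sketched in Python. This is a minimal version of the scheme from the slide, under my own simplifying assumptions: each block holds exactly one woman and one man, and whole blocks are then randomized to group A or group B. The initials are the fake ones from the slide plus two made-up men:

```python
import random

def blocked_randomize(participants, seed=None):
    """Blocked randomization as described in the lecture.

    `participants` is a list of (initials, gender) tuples. Fill each
    block with one woman and one man, then randomly assign whole
    blocks to group A or group B, so gender comes out balanced.
    """
    rng = random.Random(seed)  # seed only so the example is repeatable
    women = [name for name, gender in participants if gender == "F"]
    men = [name for name, gender in participants if gender == "M"]
    # Pair one woman with one man per block (a simplification of
    # filling blocks as people enroll)
    blocks = [list(pair) for pair in zip(women, men)]
    rng.shuffle(blocks)
    half = len(blocks) // 2
    group_a = [p for block in blocks[:half] for p in block]
    group_b = [p for block in blocks[half:] for p in block]
    return group_a, group_b

# XYZ and NFW are the women from the slide; ABC and DEF are made up
people = [("XYZ", "F"), ("NFW", "F"), ("ABC", "M"), ("DEF", "M")]
a, b = blocked_randomize(people, seed=1)
```

However the shuffle lands, each group ends up with one woman and one man, which is exactly the "magic" balance the blocks buy you.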
I mean, because you have to be ethical: before they enter the study, you have to tell them you're going to put them in one of two groups, but you've also got to tell them, you're not going to know what group you're in while it's going on. So blinding is where a person is deliberately not told of the treatment assignment, so he or she is not biased in reporting study information. And it doesn't have to be just the participant in the study; it can be research staff. The most common case is a participant being blinded to treatment or placebo. But I've worked on studies of, like, Alzheimer's disease, right? They'll want to take the patients, or the participants in the study who might have Alzheimer's disease, and look at the MRI of their head. And often they'll also have a neurologist interview them, and they'll also see a neuropsychologist. And they often want those three different groups, the imaging group, the neuropsychology group, and the neurology group, not to know about each other's opinion of this particular patient. So they'll blind them to each other's opinions. So blinding is much more complicated than just blinding the participant to whether they're in the placebo group or the drug group. But double blind is a really important concept. And that means that both the participant and the study staff do not know the treatment assignment. So everybody who's operating with the patient doesn't know it. So you're probably thinking, that's really pretty serious, right? Like, what if that person gets sick and goes to the emergency room, and they're taking an experimental drug, or they could be taking placebo; who knows what they're taking? Well, in that case, what happens is there's an unblinding procedure. It has to be there as part of ethics. It's already set up in the study.
If somebody goes to the emergency room, there's a person who can be called to unblind the participant, who's now a patient. And once they're unblinded, they learn what they were taking; even if they were taking placebo, the whole thing's over, right? Even the study staff learn. It's just a fact of life; it has to happen sometimes. But for the most part, what we try to do is keep things double blind, because it makes things the least biased and the most fair. So, to end the section on randomization: the purpose of randomization, why we go through all this when we're testing treatments especially, is that it's used to reduce bias. And, especially if you have a particular variable you're concerned about, like gender, or, like we were talking about, race or smoking status, you can use a blocked randomization to even out each group. And then blinding further prevents bias, right, because people don't know what they're taking and the study staff don't know what they're giving them. And the reason why you have to really think about blinding is that the placebo effect is necessary to take into account. You're always going to get the placebo effect every time you give somebody something. So you've got to account for that in your study design. So in conclusion, I went over the steps to conducting a statistical study in order and gave you tips on how to remember them. We looked at some basic terms and definitions. And we talked about how to avoid bias in survey design, because there are a lot of different considerations. And finally, we talked more in depth specifically about randomization in experiments. All right, now you know a lot, maybe too much. I hope you enjoyed my lecture.