Well, hello, everybody. I see we have one person. Welcome to our lecture today; let's hope a few more show up. I'm Monika Wahi, and I'm here to present easy power calculations in G-Power. I call myself a data scientist, but what I really am is an epidemiologist, a biostatistician, and an informaticist. You'll notice I did not say I was a mathematician. I'm not bad at math, but I'm not really good at it either. I've been listening a lot to the trial of Sam Bankman-Fried, and no matter what you think of all these fraudsters, they were really good at math. I am not that good at math. I'm very good at ethics and governance, so you can probably trust me more, but I use Excel to calculate things, and I also use G-Power. G-Power is a dedicated application that, as I looked up just before I started, was developed in Germany in the 90s. It's free, you can download it, and I'll show you where to get it and how to use it. You'll see it looks kind of like Windows XP. Applications that look like that were often built in that era, and this one definitely was. But don't judge it by its cover. Windows XP wasn't that bad, and I think it's a really good interface. This application was built just to calculate sample size. It's for power calculations and nothing else. You might think of a kitchen gadget that does just one little thing, like removing eggshells, and say, what a waste, I am not putting that in my kitchen, my kitchen is so small. But this is not a kitchen. You can have dedicated apps for certain little tasks in statistics or biostatistics. And why is that not a bad idea?
I will tell you why that's not a bad idea: that way you make sure your math is right. That gets back to the fact that math is just not my strong suit. You really have to have your mind on the math when you're using PROC POWER in SAS. There are so many ways to do power calculations: you can use a calculator, you can use a spreadsheet. The advantage of G-Power is that it is menu driven and well documented, and you can just use it. It helps you stay unconfused about what you're doing, and what you're doing is really important. If you're calculating sample size for an epidemiologic study, that's going to impact the budget a lot. I've been in some challenging situations where I'm the one making these calculations. People say, okay, for every 10 people we add, we can afford to collect less data from each of them. And others say, okay, but if you don't have enough people, you won't have a study. So you end up having all this political discussion around the sample size calculation, and the last thing you want to do is get it wrong, right? That's why I just love using G-Power. In fact, if you just download G-Power and start to use it, you'll probably think, okay, I don't know what to do. The best way to explain the power of G-Power is to use it in an actual biostatistical context, which is why I'm holding this session today: I want to show how you use G-Power to estimate sample size as if you're writing a grant and it's going to matter. But before I get started, I want to encourage you to sign up for my free online workshop that I'm holding in November. It's called Application Basics, and the theme is integrating application pipelines. As you probably guessed from my little introduction, I love to use applications. I use R, I use SAS, I use G-Power.
I use Zoom; I use a lot of different applications. But when you're a data scientist, your job is often not to make applications or even to use them. Your job is to put them together into pipelines, to figure out which applications are the right ones for what you're doing. That's more of a management function. Unfortunately, a lot of us in biostatistics or healthcare informatics didn't go through business school or computer programming school, so there's a lot of terminology and knowledge around applications that we just don't have, even though we're smart. Or at least I'll speak for myself: I'm pretty smart, but there was a time in my life when I did not know any of what I'm teaching in the workshop. That's why I'm holding the workshop, so you can learn what I learned and get a crash course in it. I used to hold this workshop over three days, but I managed to get it down to two and put it over a weekend, so if you work during the week, this is perfect for you. It's Saturday and Sunday, November 18th and 19th. The sessions start at 12pm Eastern time, and each will run about four or five hours depending on how many people show up, because it's a real workshop. We're going to go through an online course I've already programmed, which is one of the core courses in my data science mentoring program; you can find out more about the program if you're interested. I give you free access to this course, and then we all meet on Zoom. It's a workshop, so you've got to be there, paying attention and doing your challenges together. Hopefully it'll be a memorable experience where you do a little networking, learn a little something about applications, and add to your data science toolkit.
So if that sounds like fun, I encourage you to sign up for my free Application Basics workshop. All right, now back to our regularly scheduled program: G-Power. Before I go to the actual software, I just want to explain the use case. You'll want to download these slides because I've got these links in them. I actually had a blog post documenting this use case, but I don't think people read it much, so I thought I'd just talk about it here. So what is G-Power? As I was saying before, G-Power is a standalone application that you download and install, and it is only for doing power calculations. The term power calculation is synonymous with sample size calculation. I don't know why we say power calculation. You do end up calculating power, but that's not the point; the point is to calculate sample size. But if you want to feel cool, you say power calculation. When I was in Florida, I used to be married, and my husband would say "power calculation" with the accent they have in Florida, but we're from Minnesota, so we'd say "power." Anyway, the advantage of using G-Power is the interface. The graphical user interface, with these dropdowns you'll see, helps you make sure you're doing things the right way. I actually have done a calculation in G-Power, thought it was right, and then noticed I had set a dropdown wrong. You can take a screenshot of what you did and say, oh, I didn't do that right. So it's super helpful: you can tell if you screwed something up. And it does all kinds of power calculations.
What's cool about it is that it does all kinds, but what's not cool about life is that you usually end up doing only certain things. Whatever job you're in, you usually focus on a domain, so maybe there's only one part of G-Power you ever use. That's really obvious with me, because you'll see me demonstrating only the things I use all the time. But that's going to happen to you too, right? That's just work. Now you're probably thinking, you're selling G-Power; well, there are disadvantages. The disadvantage that people who are really good at math and programming would cite is that it's really manual. You have to click, click, click, calculate, click, click, click, calculate. And you've got to be organized about the results, because each time you do a new calculation, it erases the old result and you get a new one. So you end up doing data collection on your own answers. Part of what I'm going to demonstrate is how you do this manual operation and keep track of what happened so you can go and communicate with the rest of your team. All right, so this is the scenario we're going to look at today. We're going to look at bleeding on probing, which is an oral health measure. BOP, you know, it sounds like "bop, bop." Well, it's not fun. Bleeding on probing is not fun. It happens when you have gingivitis. I don't know if this has ever happened to you, where maybe you haven't had an opportunity to brush your teeth, and your gums swell up and start bleeding and feeling inflamed. If you went to the dentist when that happens, they have a dental probe, and they can probe six sites on each of your teeth. So let's say they take one of your teeth and probe all six sites. They look at each site and see if any blood comes out. If blood comes out, it's a yes, and if blood doesn't come out, it's a no.
So if they probe six sites on one tooth and blood comes out three times, you have a BOP of 0.5 on that tooth, right? And if you still have 20 teeth in your mouth and you multiply that by six sites, that's 120 measurements, so there are a lot of measurements in this dental software. I'll tell you, dental EMR software is crazy. The visualizations are awesome, and it's calculating BOP for you on the fly. It's cool stuff. But what if you're doing research? You can use the dental software, but you still have to think about what your sample size is. So if somebody did my whole mouth, all six sites on all the teeth, and I got a whole-mouth BOP of 0.75, it means that 75% of the sites bled. You might be thinking, well, what if just the sites over here are bleeding and not over there? That's taken care of by other oral health measures. The reason I'm going over BOP is that BOP is our outcome in the study. The treatment was a treatment for gingivitis, and the outcome tells us whether or not your gingivitis went away. That means the outcome value is going to be somewhere between 0 and 1, and everybody is going to have an outcome in that range. That's important because when we approach our power calculation, we have to have an effect size. So I'll continue with the right side of the slide. CX stands for chlorhexidine mouthwash. Basically, what's happening is your mouth is infected. There are pathogens, and the chlorhexidine mouthwash gets rid of them, so the inflammation goes down and the bleeding goes down. The problem is it doesn't taste very good. It tastes pretty bad, actually, and it stains your teeth. So the person I was working with had developed a natural mouthwash called NSM.
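To make the BOP scoring described above concrete, here is a minimal Python sketch. The function name `bop` and the yes/no lists are illustrative assumptions, not anything from the dental software; the sketch only shows that BOP is the fraction of probed sites that bled.

```python
def bop(site_bled):
    """Return bleeding on probing: the proportion of probed sites that bled.

    site_bled is a list of True/False values, one per probed site
    (hypothetical input format, chosen for illustration).
    """
    if not site_bled:
        raise ValueError("no sites were probed")
    return sum(site_bled) / len(site_bled)


# One tooth, six sites, blood at three of them.
one_tooth = [True, False, True, False, True, False]
print(bop(one_tooth))  # 0.5

# Whole mouth: 20 teeth x 6 sites = 120 sites, 90 of which bled.
whole_mouth = [True] * 90 + [False] * 30
print(bop(whole_mouth))  # 0.75
```

Because the result is always a proportion, every participant's outcome lands between 0 and 1, which is what the power calculation later relies on.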
We were going to use NSM as the comparison, so we were really doing a comparison study. You might say, well, you wouldn't really expect an effect size from that. But let's just pretend we were doing a treatment study, like comparing to placebo or something. Actually, I'll cut to the chase. We studied this mouthwash for something else, and I was really surprised it worked. We never did this study; we were just calculating sample size for it, and I think we decided it was too much trouble. But I really thought this mouthwash was going to be like a placebo. I didn't think it was going to work. Anyway, we really didn't know what to expect. My colleague is really cool. She did a kind of pilot study so we could see what the average BOP was in her patients, but honestly, I just had to make some guesses. Okay, so now we're going to go over to G-Power. First, I want to point out that if you download the slides, they'll lead you to this web page. What's important about this web page is that it's actually in German; I told Chrome to translate it to English. When you go down here, you'll see it's at a university in Germany, in Düsseldorf. There's all this stuff on here, and it's actually a little tricky to find where you download the application. It's down here under Download. You'll love these license terms, which basically say this is open for anybody. You can take screenshots, you can do anything. Here it says download for Windows and download for Mac. You download that and run it, and it'll install on your computer. I'm running Windows here. So here it is. There we go. If you just run it after you install it, this is what you see. Now, normally when I present software, I go through the menus, but honestly, I didn't even notice these menus until today.
Like when I brought it up, I was like, whoa, I never even noticed those menus, because literally I only look down here. So I'm going to show you how this basically works. When you're preparing a study, the first thing you have to do is design your study, which is what I was just describing. I was describing a study design where we have two independent samples and a continuous outcome, which is bleeding on probing, that 0.7 or whatever it's going to be. That design is basically a t-test design, because we have these two mouthwashes. What if you don't have that design? Let's say we had three mouthwashes. Then we probably wouldn't be doing a t-test at the end; we'd be doing an analysis of variance. G-Power defaults to t tests here, but let's stop for a second and pretend we had three groups, so we're going to do an analysis of variance. We would have to change this test family. Remember, when you're doing an ANOVA, the first thing you do is an F test, so we'd have to change it to F tests, and then you see all the stuff changes, right? So if we had three mouthwashes and we're doing the F test, we want an ANOVA, and I'd go to this choice and look for what I wanted. I once had a customer who did a repeated measures ANOVA, and that's how I learned about study designs like that, because those are more psychology designs and I usually do epidemiology. But if I had to do it again, I could. I know how those study designs go, and this is exactly where I would go to figure out how many people to put in each group. All right, but we're back on bleeding on probing, so I'm going to go back to t tests. Then what we're going to do is choose the option for my situation.
And for my situation, when you look at it here, remember how I said I'm almost always looking at the difference between two independent means. In this case, that's the NSM group versus the CX group. But sometimes I'm also doing paired t-tests. Let's say the study design was different: we get people's BOP at baseline, then we give them some mouthwash, and then we do a follow-up BOP. That would be a paired t-test. But we're not doing that. We're doing two independent groups. So basically I'm asking the question, how many people do I need in each group, the NSM group and the CX group? So let's choose that. Now I have the choice of type of power analysis. Just to remind you, when you're calculating sample size, what you're really doing is taking a big equation that has a whole bunch of variables in it, setting the values of some of those variables, and then solving for the remaining one. That is total algebra, which is why it's hard on me. I don't have any trouble choosing the values of the variables I want to enter. I just have trouble working them through the formula and getting the result, and that's what G-Power does for me. So here's the type of power analysis, and notice how they name each one and then explain what it is. For example, we could do sensitivity, which is where we're solving for effect size. But we don't want that. We want a priori, which is where we're solving for sample size. So we're going to compute sample size, and we're going to give it alpha, power, and effect size. Okay, so now, everybody with me, I want you to look at this bottom part. We filled in the top: we chose the test family, we chose exactly which test within the test family we're going to do, and we chose exactly what we wanted to solve for.
And now we have to choose these input parameters. Then we'll click Calculate and our output parameters will come in. Because I do a lot of biological studies, I always do a two-tailed test. That's what we were taught in biostatistics, because you don't know which way it will go. Like I said, I didn't think NSM worked. Well, what if it worked better than chlorhexidine? We've got to accept not knowing. Now, the effect size shown here is just left over from when I was playing with G-Power before we started. We don't really know what the effect size was, and my colleague is so cool that she went out and tried to do some pilot studies. So just for the sake of running this calculation, let's pretend we thought the effect size was 0.2. The rest of it we just fill in. This is alpha, and remember, we only really care about alpha equals 0.05. And this is power, which is one minus the beta error. I always go with power at 80%, and so does pretty much everyone else, but for some reason G-Power defaults to 0.95, so I always change this down to 0.8. I would say that since the 90s, alpha equals 0.05 and power equals 0.8 have been industry standards in biostatistics. So if you go to compute sample size, I wouldn't change either of those. If you're getting advice from me, then you probably don't know more than me, and you'd have to know more than me to know what you're doing when you change these. Now, with this allocation ratio N2/N1, remember it's a t-test, so we're going to have two groups. This is just me saying that I want it one to one: I want the same number in the chlorhexidine group as in the NSM group. So now I've entered our parameters, and I'm going to go down here and click Calculate, and everything is going to get filled in here. Are we ready? Here we go. Ta-da! Okay, so what just happened? I just love this output. Here's the critical t. But let's go down here.
This is what we're looking for. Sample size in group one, 394 people, and in group two, 394 people. Do you think we could do this? No, we cannot do 394 people per group at the dental office. So if that's what I pick for effect size, this is not going to work. That's when I started my documentation, so let me move over to it. You can actually download this; it's linked from the slides and from that blog post. This is a spreadsheet. Let's go to the sample size estimates. Look at this. It says assumptions: alpha equals 0.05, power equals 80%. The outcome is the difference of differences in BOP, that is, the difference in BOP between visit one and visit three, compared between CX and NSM. What does that actually mean? It means there's going to be a difference between visit one and visit three in the CX group. Let's say BOP goes down 0.2. And there's going to be a difference in the NSM group. Let's say it goes down 0.1. What we're testing is this 0.2 versus this 0.1, the difference between the differences. Who knows what it will be, right? So I figured I can just set up a range of effect sizes, choose each one, and see what I get for a sample size, because everything else is fixed. So what I did was put in, see this 0.2, and we got 394 in each group. That's literally what I just did on screen. That's what's cool about this. In the past, I've done power calculations using PROC POWER in SAS and kept a bunch of documentation, and sometimes I can't even reconstruct exactly what I did. This documentation makes it easy. Another thing you can do, and this is a Word document here, is take a screenshot of your screen. This isn't exactly the screen, but if this were the one we were going to settle on, I could take a screenshot of it, and then I would remember forever what I settled on.
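For anyone who wants to sanity-check the 394 outside of G-Power, here is a rough Python sketch. It is not G-Power's algorithm: G-Power works with the noncentral t distribution, while this uses a well-known normal approximation with a small-sample correction term, so treat it as a cross-check under that assumption. The function name `n_per_group` is made up for illustration.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-tailed, two-independent-groups t-test.

    Normal approximation plus a correction term (z_alpha^2 / 4);
    an approximation, not the exact noncentral-t calculation.
    d is the standardized effect size, with 1:1 allocation assumed.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_beta = z.inv_cdf(power)
    n = 2 * (z_alpha + z_beta) ** 2 / d ** 2 + z_alpha ** 2 / 4
    return ceil(n)


# Same inputs as the talk: effect size 0.2, alpha 0.05, power 0.8.
print(n_per_group(0.2))  # 394
```

The point of the sketch is the shape of the problem: alpha, power, and effect size go in, and sample size comes out. For a real grant, the G-Power output (or another exact tool) is the number to report.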
But of course, each time you run one of these, you're making a new calculation, and that's why I put the results in the spreadsheet. So let's look at these calculations, now that we're sure I'm doing it right, because we just recalculated this one. Remember, total N is two times this. So I said, okay, what if there were a difference of 0.9? Well, think about what I just said about the difference of differences. That's probably not going to happen. If the difference were really that big, we'd only need 42 in a group, but that's a pretty big difference. So this is telling me that this is going to be pretty hard. There's going to be an inflection point with these, right? Down here at the smaller differences is probably where reality is. So I say, okay, what can we afford up here? Let's say we could afford to get 64 in each group. I'm going to extrapolate a little. Let's pretend our new mouthwash was funded by some big corporation. They really care, and my friend works at a university, so they really want this independent research on their multimillion-dollar new mouthwash. If that were really going on, then maybe they would fund the study with 64 people in each group. The issue would be that if there were a smaller difference in differences at the end between the chlorhexidine and the NSM, we wouldn't be able to see it. If God knows the difference is 0.2 (I love poking God), if the truth is 0.2 and we are up here, we're not going to be able to detect that. Now that leads us to the conversation of, well, is that even clinically significant? Would that represent a big enough difference to matter between chlorhexidine and this new mouthwash? I don't know the answer to that. I'm not a dentist. But if you were a dentist, you would know the answer, and it would probably be no.
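The "we wouldn't be able to see it" point can be put in numbers. Below is a small Python sketch, again using a normal approximation rather than G-Power's exact noncentral-t method, that estimates power for a given effect size and per-group n. The function name `approx_power` is an assumption for illustration.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-tailed, two-independent-groups t-test.

    Normal approximation only; slightly optimistic for small samples.
    d is the standardized effect size, with equal group sizes assumed.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n_per_group / 2)  # expected test statistic under the alternative
    # Probability of landing beyond either critical value.
    return z.cdf(shift - z_alpha) + z.cdf(-shift - z_alpha)


# With 64 per group, a true effect size of 0.2 is detected only about 20% of the time.
print(round(approx_power(0.2, 64), 2))

# The same 64 per group gives roughly 80% power when the true effect size is 0.5.
print(round(approx_power(0.5, 64), 2))
```

In G-Power terms, this corresponds to switching the type of power analysis from a priori to post hoc: sample size becomes an input and power becomes the output.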
That's just kind of what ends up happening. As you can see, it's not that hard to use G-Power. It's wonderful because at the end you feel comfortable with your results, but it doesn't solve your whole problem, in that you still have to make these decisions. That's where the spreadsheet comes in, because I am good at helping people make these decisions. Now I want to show you something else in G-Power. What we just did is fill in these input parameters and get these output parameters, and one of the things I did was just pick 0.2 for the effect size. I don't know if you noticed that there's this little Determine button here. So let's say I click Determine, and this drawer comes out. You can't hear it, but it makes a little noise. It's super cool. I love these. The Germans, they know how to design stuff. Anyway, let's say I wanted it to calculate the effect size for me. I click on that and this comes out. So this one is n1 not equal to n2. Technically, I probably should be looking at this one, because it was an equivalence study: are the chlorhexidine and the natural mouthwash the same? But let's just pretend I was doing what I normally do, which is a treatment study where I'm trying to see if treatment is better than no treatment, or new treatment is better than old treatment. And so this would be the null hypothesis, n1 equals n2, and here are the mean in group one and the mean in group two. So let's pretend that my colleague had some of her gingivitis patients use the new natural mouthwash (it's natural, so it's just not dangerous) and others use the chlorhexidine. And she said the mean difference in BOP between time one and time two was 0.1 in one of the groups and 0.2 in the other. Those are small differences, so you would still have to do a test.
And then we would need the standard deviations, and I'm not going to fake these. If you really do not know what to fill in here, and it's a big deal and you can't decide, then the answer is a pilot study. Literally, that's the answer. Go get 10 or 20 people in each group if you can. It depends on what you're measuring. If you're measuring something like blood pressure or gingivitis, you know, bleeding on probing, something that's pretty easy to measure clinically, just get 10 or 20 people in each group and get something out of it. Then enter the means and the SDs here. If it's hard to measure, like labs or something, try to look up some historical data from former patients, or see if you can find something in the literature. In this case, it was just an easy clinical measure, but sometimes I'll really scour the literature and assemble a whole table of estimates, so I know I can put something in here. You can always recalculate a bunch of times, but I need to feel like I'm in the ballpark. Anyway, say I've filled this in. If I hit Calculate here, it'll just calculate the effect size down here. But if I hit Calculate and transfer to main window, it will fill in the effect size over here, and then I can calculate here and get that 394. I guess this ended up coming out the same way. All right, now I'm going to go back to the spreadsheet. Basically, we made these agreements, and I did a couple of simulations. This is me doing my own little pilot study. Let's see, here's another one I did. In this one, chlorhexidine reduces BOP by 0.6 and NSM reduces BOP by 0.5.
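What the Determine drawer does with the means and SDs can be sketched in a few lines of Python. This assumes equal group sizes, so the pooled SD is the root mean square of the two group SDs. The function name `cohens_d` is illustrative, and the SD of 0.3 is a purely hypothetical value chosen to show the arithmetic, not pilot data from the talk.

```python
from math import sqrt


def cohens_d(mean1, mean2, sd1, sd2):
    """Standardized effect size for two independent groups of equal size."""
    pooled_sd = sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return abs(mean1 - mean2) / pooled_sd


# Mean BOP reductions of 0.6 (CX) and 0.5 (NSM) from the scenario above;
# the SD of 0.3 in each group is a made-up placeholder.
d = cohens_d(0.6, 0.5, 0.3, 0.3)
print(round(d, 2))  # 0.33
```

The takeaway is that the same 0.1 difference in means can be a small or a moderate effect size depending on the SDs, which is why a pilot study or literature estimate of the spread matters so much.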
So here's the effect, 0.1, and here's the effect size. You can see what happened here. I'm simulating pilot studies, calculating the effect size using that Determine button, and then getting the N in each group. Actually, just for fun, let's sort this by N in each group, if I can remember how to use Excel. Let's do smallest to largest, because we want to save money. So it looks like at 64, we can answer these three scenarios, because the effect size is here. If the effect size is any smaller, we almost double our sample, so we're up to 100 here, and now we're up to 146. And this is in each group, so you've really got to double this in your head, because that's your total. I keep hanging out at this 146 in each group. I'm really being imaginative here, but if we really had sponsorship from a company, and they said to me, Monika, I'll fund you at any of these levels, I'd say, honey, just fund us here. Because honestly, if we can't find a difference at an effect size of 0.33, then it's going to be hard to make your marketing materials, to be perfectly honest. Here we go. This was just a screenshot that I saved from that blog post, and I guess on the blog post we had that pilot data. I don't remember; I wrote this a long time ago. And this is just a summary on the slide of what I was just doing. So does everybody understand what just happened? I downloaded the G-Power application before you got here, and it's in English. I just realized the website is written in German, but the application is in English. I installed it and was playing with it. But what I really want to show you here is not even so much G-Power. It's the fact that you've got this awesome calculator.
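The record-keeping step, one row per scenario sorted by N in each group, can also be sketched in Python. This is a stand-in for the spreadsheet, not a reproduction of it: the per-group n comes from an approximation (normal approximation plus a correction term) rather than from G-Power itself, and the effect sizes 0.4 and 0.5 are assumptions inferred from the 100 and 64 mentioned in the talk.

```python
import csv
import io
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-tailed, two-group t-test (1:1 allocation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2 + z_alpha ** 2 / 4)


# Candidate effect sizes (0.2 and 0.33 are from the talk; 0.4 and 0.5 are assumed).
effect_sizes = [0.2, 0.33, 0.4, 0.5]

rows = []
for d in effect_sizes:
    n = n_per_group(d)
    rows.append({"effect_size": d, "n_per_group": n, "total_n": 2 * n})

# Smallest sample first, because we want to save money.
rows.sort(key=lambda row: row["n_per_group"])

# Write the table as CSV text so it can be pasted into a spreadsheet.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["effect_size", "n_per_group", "total_n"])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Keeping the assumptions (alpha, power, tails, allocation ratio) next to a table like this is what makes the choice defensible years later.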
That calculator solves the math problem you have when you're helping somebody with their sample size, so you can get on with the hard part: making these decisions and keeping this documentation, these spreadsheets. This happened years ago, and if somebody said, we chose this sample size, this level, I can totally defend it. I totally know what's going on. And that's really what you need, because at the end of the study, do you think you'll remember what you were doing at the beginning of the study? That's why you have this documentation. All right. Well, I hope you liked my presentation. If you weren't here at the beginning, let me remind you that I'm holding my free online workshop next month, and next month starts in one day. Oh, happy Halloween, everybody. It's on November 18th and 19th, a Saturday and Sunday. You might be thinking, why would I spend a few hours, starting at noon Eastern time on Saturday and Sunday, with you, Monika? And the answer is because you'd have fun. We'll be learning about application basics and integrating application pipelines. It's a workshop, so other people like you will be there, and we'll all be on Zoom. We'll be talking about applications like G-Power, but kind of not like G-Power, because as data scientists we get data out of applications, and we often don't understand exactly how they were built, but we don't want to go get a degree in computer science. So this is a crash course in how applications are built, the terminology around them, how they're designed, and who builds them. When people, especially data scientists, go through my workshop, they often have epiphanies about why certain things are a certain way. They say, oh, I always wondered about that. So please come along and sign up if you're interested. It's free.
It's also based on my free online course, Application Basics, which is one of the core courses in my online data science mentoring program, which you might be interested in. So if you sign up for the free workshop, you get the free online course, and you get to come and network with everybody and learn about applications. It'll be a good time. If you're the kind of person who likes to learn stuff in your free time, who goes to the library or to an art museum, then you're the right kind of person to come to this workshop, because you'll learn a lot in a short amount of time and hopefully have a lot of fun. Thank you very much for showing up to my G-Power presentation. I hope to see you again, and I hope you have a good week.