It's LinkedIn Learning author Monica Wahee with today's Data Science Makeover. Watch while Monica Wahee demonstrates how to prepare a data set for a meta-analysis in R.

Hi everyone! From time to time, I get requests from people wanting to do a meta-analysis. When that happens, I first warn them that in a systematic review and meta-analysis, the devil is in the study design, not the analysis. I explained that in a blog post that I've linked in the description. But assuming you have a good study design, the next step is making a data set that you can make a forest plot with, and that's what we are focusing on here.

For this demonstration, we are using the open source statistical software R and the R package rmeta. We are also using the meta.DSL command. I'm just pointing this out because rmeta has other analyses. And we are combining the odds ratios for a discrete estimate, a yes-no outcome, between treatment and control groups.

So, here's a data dictionary for the minimal data set. Name is a character string you can use to label the article you are getting the estimates from. I like to put the articles in chronological order on the plot, so I can look for time trends. I usually include the first author and the year in the name label. Then for the next two fields, we have trt_denom and cnt_denom. Denom stands for denominator. So trt_denom literally means the number of people in the treatment group, and cnt_denom means the number of people in the control group. Then the next two fields are trt_num and cnt_num. Num stands for numerator. So the first variable has the number of people in the treatment group who got the outcome, and the second variable has the number of people in the control group who got the outcome. Remember, these numbers have to be equal to or less than the denominators, so that if you put the numerator over the denominator, you have the rate of the outcome in that group.
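The data dictionary just described can be sketched as a small R data frame. This is only an illustration: the study labels and counts below are invented and are not the demonstration data from the blog post. The column names follow the dictionary, and the check confirms that each numerator is no larger than its denominator before the rates are computed.

```r
# Hypothetical studies and counts, made up for illustration only.
meta_data <- data.frame(
  name      = c("Author A 2018", "Author B 2019", "Author C 2021"),
  trt_denom = c(120, 85, 200),  # people in the treatment group
  cnt_denom = c(115, 90, 210),  # people in the control group
  trt_num   = c(30, 22, 41),    # treatment group members who got the outcome
  cnt_num   = c(45, 31, 66),    # control group members who got the outcome
  stringsAsFactors = FALSE
)

# Numerators must be less than or equal to their denominators.
stopifnot(
  all(meta_data$trt_num <= meta_data$trt_denom),
  all(meta_data$cnt_num <= meta_data$cnt_denom)
)

# Rate of the outcome in each group: numerator over denominator.
meta_data$trt_rate <- meta_data$trt_num / meta_data$trt_denom
meta_data$cnt_rate <- meta_data$cnt_num / meta_data$cnt_denom

print(meta_data)
```

Running the check before any analysis catches data entry mistakes, such as a numerator typed into a denominator column.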
In other words, trt_num divided by trt_denom equals the rate of the outcome in the treatment group. Oh, yeah, and if you want to know the total number of people in the study, you have to do math. It's not in here. You need to add trt_denom to cnt_denom to get the total sample size for the study.

Oh, and I was just recording one outcome here, but you can include more if you want. Let's pretend this has to do with COVID-19, and one outcome was the number of people infected and another one was the number of people who died. How morbid! Well, it's a pandemic. Okay, so in that case, you still only need one denominator for each group. You would just have two sets of numerators. I usually add a suffix. For the treatment group, I'd have trt_num_infected and trt_num_died, and I'd also need those two columns for the control group. So you just have to be careful when you are making the plot to call up the right fields.

Okay, now let's look at the fake data I made for you for demonstration. The blog post explains the scenario. It has to do with ice cream flavors and happiness research being done by Disney characters. But anyway, notice that our data are structured exactly the way we designated in our dictionary.

So now we are ready to go to R. You will see that first we read in our data set, which is called Goofy. Remember, Disney characters? Before this, you need to make sure you install the package rmeta. So this is just us calling up rmeta. Okay, here is where we are using our meta.DSL command. See how we are making an object called calc on the left side of the arrow? And what are we putting in calc? These variables, which you will recognize, in this order: treatment denominator, control denominator, treatment numerator, and control numerator. We set our data to Goofy. And for the names that will appear on the y-axis, we set that to our names variable in our data set.
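The call described above can be sketched as follows. This assumes the rmeta package is installed. The data frame here is a hypothetical stand-in for the Goofy data set with invented values, since the actual file and its contents are not shown in this transcript, and the label column is assumed to be called name. The arguments are passed by name (ntrt, nctrl, ptrt, pctrl) so the order of denominators and numerators is explicit.

```r
# install.packages("rmeta")  # run once if rmeta is not yet installed
library(rmeta)

# Hypothetical stand-in for the Goofy data set; all values are invented.
Goofy <- data.frame(
  name      = c("Author A 2018", "Author B 2019", "Author C 2021"),
  trt_denom = c(120, 85, 200),  # people in the treatment group
  cnt_denom = c(115, 90, 210),  # people in the control group
  trt_num   = c(30, 22, 41),    # treatment group members who got the outcome
  cnt_num   = c(45, 31, 66),    # control group members who got the outcome
  stringsAsFactors = FALSE
)

# DerSimonian-Laird random effects meta-analysis of odds ratios.
# Order: treatment denominator, control denominator,
#        treatment numerator, control numerator.
calc <- meta.DSL(
  ntrt  = trt_denom,
  nctrl = cnt_denom,
  ptrt  = trt_num,
  pctrl = cnt_num,
  names = name,   # labels for the studies on the plot
  data  = Goofy
)
```

With more than one outcome, only the two numerator arguments would change, for example to the columns carrying the infected suffix, while the denominators stay the same.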
Next, after we make that calc object, we run a summary of it. Let's do that now. See what this gives you? It's the info about the odds ratios you are getting ready to plot.

Okay, now let's go back and plot them. See what we are plotting? We are plotting that calc object. Okay, let's run the plot. So this is what you get with the default settings. You'll see the squares are scaled to the size of the sample, and the confidence intervals are on there. You'll also see the telltale combined estimate diamond at the bottom. And that, my friends, is how you make your meta-analysis data and forest plot gorgeous.

Thank you for watching this Data Science Makeover with LinkedIn Learning author Monica Wahee. Remember to check out Monica's data science courses on LinkedIn Learning. Click on the link in the description.
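For reference, the summary and plot steps from the demonstration can be sketched as below. This again assumes rmeta is installed and uses a hypothetical stand-in for the Goofy data set with invented values, so the numbers printed will not match the video.

```r
library(rmeta)

# Hypothetical stand-in for the Goofy data set; all values are invented.
Goofy <- data.frame(
  name      = c("Author A 2018", "Author B 2019", "Author C 2021"),
  trt_denom = c(120, 85, 200),
  cnt_denom = c(115, 90, 210),
  trt_num   = c(30, 22, 41),
  cnt_num   = c(45, 31, 66),
  stringsAsFactors = FALSE
)

# Same call as in the demonstration: denominators first, then numerators.
calc <- meta.DSL(trt_denom, cnt_denom, trt_num, cnt_num,
                 names = name, data = Goofy)

# Per-study odds ratios with confidence intervals, plus the combined estimate.
print(summary(calc))

# Forest plot with the default settings.
plot(calc)
```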