Our next speaker is Dr. Megan Davis. Megan is an associate professor in the Department of Environmental Health and Engineering at the Johns Hopkins School of Public Health. In her presentation today, Megan will use an example to illustrate how researchers can use data from NHANES to address environmental health questions. Megan?
Great. Thank you so much. Can everyone see my slides? Just doing a quick double-check. Yes. Wonderful. So I'm going to talk about kind of a personal story of getting into using NHANES, and I'm going to go through a couple of different strategies and give you some tips and tricks, but really frame it around one use case of my own. And to start, I'd like to orient everyone to what I mean by environmental health and how this relates to the way we use data. This is the classic environmental health paradigm, where we start with exposure and go through the different ways that the exposure gets into the body, how it interacts with molecular targets at the cellular level, and how that then leads to the pathophysiology that manifests as clinical disease. Some people might think of the piece in the middle as classic toxicology, you know, pharmacokinetics, pharmacodynamics, and what that means in terms of the pathophysiology. And some people might look at your E and your O, your exposure and your outcome, as classic epidemiology. That's just to relate this to things that you might have heard of and might be doing. But what I'll say is that the data needs you have along this framework are going to depend entirely on your research question. And NHANES is very useful because it contains a lot of different things that we can benchmark within this paradigm. There can be data on exposure, often self-reported, held within the questionnaire survey data. There can be laboratory biomarkers of exposure, so measurements of different toxicants, for example, inside the body.
There can be laboratory biomarkers of disease. These would be measures of altered structure and function. And then there can be survey or exam data related to disease outcomes. And so really the first step is your research question. You really need to think about the question first. I think it's perhaps tempting just to go browse NHANES and look at what you could look at. But I teach my own students, and I know at least one of them is on today, that you start with a question. And then you ask, well, what do I need? Do I need exposure data? Do I need to think about upstream factors related to exposure? These would be risk factors for exposure to environmental agents, whether these agents are toxicants or chemicals, physical exposures, or biological exposures. Do I need to think about exposure at the individual or population level? Am I looking at biomarkers or am I looking at survey data? And then do I need outcome data? Do I need to link the exposure to biological effects or clinical outcomes? What's my target population? What demographic or life stage is important for this question? What's the geographic scope? What is the timing of the exposure measurement relative to life stage or disease latency? If you're looking at cancer outcomes and current exposures, well, an exposure that is going to lead to a cancer outcome will often have a latency of decades. What's the measurement technique? What kind of error do I expect within this measurement? And then in terms of the outcome, again relative to life stage or disease latency, what's the timing I need to think about? And do I need to worry about the measurement technique as it relates to measurement error? And so I'd really like to give a shout-out to Drs. Roger Peng and Elizabeth Matsui, some of my own mentors, who have written this really lovely book. You can get it for free, and the link is here on Leanpub. It's called The Art of Data Science.
And this is really about connecting that question to the aspects of data analysis that are important. The process is what they call an epicycle. It's iterative. You start with your question. You take a peek at the data. You identify that maybe there's a mismatch between the data you have available and the question you have. So you circle back and you refine that question. Again, you go back to the data. You start building your models, et cetera. But throughout these steps, you are also developing a priori expectations for what you will see. That makes it easy to identify when the data don't match your expectations and you need to revise them. And then also, in terms of publishability, you want to start with a literature search as you are coming up with this question and looking at whether NHANES can answer it. You might want to first check and see if this question has already been answered in NHANES. I'm not going to go into great detail, because Yutaka has done a lovely job of setting this up, but NHANES is a one-stop shop for answering a lot of different questions that you might have within the realm of public health. And because most data are publicly available, and because there are excellent documentation and analytic guidelines available, I think it's pretty easy for people to begin to answer a question with this data set. Most people are going to work within the continuous NHANES. These are the two-year cycles. And what you'll find is that data become available over time as the laboratory assessments are conducted. So you may not have all the data available from, for example, NHANES 2017-2018 today. You may see that some of those come out later. And this is just a screenshot of what it might look like when you go into one of these two-year cycles. This is from NHANES 2015-16.
And you see that there are demographic data, dietary data, examination data, laboratory data, questionnaire data, and then the data that are limited access. And again, there is the extensive documentation on how to use the data. If we go into, for example, the questionnaire data in NHANES 2015-16, you can see that there are a lot of different instruments. These are small sections of the questionnaire that relate to specific topic areas. And if you go into the laboratory data, you can see, again, that the same kinds of things are grouped together. So your cotinine and hydroxycotinine from serum analysis are grouped together, but the ones from urine analysis are separate from the ones from serum. And these data have been used by many different people to answer environmental health questions. I'm just giving a couple of examples from my colleagues here at the Department of Environmental Health and Engineering at the Johns Hopkins Bloomberg School of Public Health. But I actually started out as a microbiologist, very interested in the bacterium Staphylococcus aureus and how it operates and persists in the environment. And so I've had a question about whether Staph aureus, in addition to causing infection outcomes, so skin boils, skin or soft tissue infections, et cetera, could also have inflammatory effects. That's because a number of these strains can carry certain genes that encode proteins that are quite inflammatory, such as enterotoxins. And I was interested to see if they could play a role in non-infection outcomes. So my question was around whether Staph aureus environmental exposures are associated with respiratory outcomes in children. And so I went to NHANES to see what was available. And there really isn't any environmental Staph aureus assessment in NHANES. This is a pretty new area of inquiry.
But there is Staph aureus nasal colonization, that is, in the nose. This is one of the niches where Staph aureus can hang out for a period of time, and in about 20% of people it can be persistently there all the time. And it was measured in two cohorts, NHANES 2001-2002 and 2003-2004. And then I looked at outcome measurements in NHANES. There are both children and adults, and there are people with and without asthma. And so I then altered my question. I went back to that epicycle and iterated: okay, I can't ask about environmental exposure, but I can ask about exposure, because I might look at Staph aureus nasal colonization as a marker of internal dose of Staph aureus, and I can look at this in the U.S. population. When I look at the outcome measurements, respiratory symptoms are queried and lung function is tested in most years. But I only have Staph aureus data from 2001-2002 and 2003-2004. And so here's the laboratory data from 2001-2002. This is the methicillin-resistant Staphylococcus aureus, or MRSA, data set, but it also includes methicillin-susceptible Staph aureus, or MSSA. And I did a lit search to see what had already been done in NHANES. There were data on lead and Staph aureus colonization showing a very interesting relationship, where at the highest quartile of lead exposure there were actually reduced odds of Staph aureus colonization, at least for MSSA. And then it was a little bit different for MRSA, where the drug-resistant forms of Staph aureus were actually enhanced by this measure of lead exposure. The scenario here, as postulated by the authors, was that lead could exert selective pressure, but it was not clear; they did not have the data in NHANES to differentiate between their different scenarios for how this might occur. So if I go back to the questionnaire data in NHANES, and this is just from 2001-2002, you can see that there are respiratory health questions asked.
And, you know, there are a lot of different kinds of questions within this instrument. You can see that there are questions about coughing, phlegm, wheezing, et cetera. And so if I put all this together, I have the same demographic questionnaire in 2001-2002 and 2003-2004, the same Staph aureus lab data in 2001-2002 and 2003-2004, and the respiratory questionnaire, again, in both of these. And so I want to merge within cohort on SEQN, which is the identifier variable. That means that I need to take the demographic data set, the Staph data set, and the respiratory data set, bring them all together, and make sure that I have everything lined up by participant. I did this separately for 2001-2002 and 2003-2004. And then I appended these data sets to each other. They won't match on the identifier variable because they are two cross-sectional studies. And then I had to set the correct survey weighting. And this is just a shout-out that there are actually really great weighting documentation and tutorial guides on the NHANES website. And so within my whole population, I looked at the association between Staph aureus nasal colonization and respiratory outcomes, and these are just some of the ones that I've selected to share with you. You can see that there was an association in the entire population for emergency room visits for wheeze. But when I started to look a little more closely at the data, I realized that there was effect modification by age. And so here I have a smoothed estimate of the association between Staph aureus and these various outcomes. Each outcome is a different color. And you can see that the odds are increased, so there are higher odds of having these outcomes in association with Staph aureus nasal colonization among those who are under the age of 30, particularly in children. And I have it truncated at age five because it's very difficult to diagnose asthma before age five.
And I have it truncated at age 85 because of the increase in comorbidities, respiratory comorbidities in particular, that occurs over time. And you see that there's a bit of an inflection point, which is why I chose 30 as my cutoff for two strata. And so now here I present the data for those who are of an age to be diagnosed with asthma up to 30, and then 31 to 85. And you can see that now this relationship is much more consistent. And there was significant effect modification. And so, back to the original question. I did a tiny pilot study in addition to this, and I conducted the NHANES analysis, and with these as my preliminary data, in addition to the publication, I had two internal grants, a K award and an R21 award, additional publications and a commentary, and numerous presentations, including a workshop at the National Academy of Sciences. And a shout-out to my student who's on here, who has already given a presentation on an additional analysis in NHANES, is working on this within her dissertation, and has a manuscript in preparation, and there are additional grant proposals in prep. And so this is just to say that sometimes you can't fully answer your question in NHANES, but it can be an excellent starting point for the kind of data you will need to move forward. And so what about null results? Well, you might look at an exposure and an outcome and see no association. And the important thing to think about, especially when you're starting to ask that original question, before you even get to the analysis, is: is this data set designed to answer your question? And if it was sufficient to answer the question, are you in the correct target population? Was the exposure assessment conducted correctly? Was the outcome assessment appropriate for the kinds of analysis that you're trying to do? Was there potential for bias or measurement error? And was there sufficient statistical power?
And, you know, if you're mostly checking the boxes here, then this is really useful, because these existing data sets can help you triage. Triage is a medical term that we use when we're looking at patients and trying to decide: okay, if I've got a really critical patient that I don't expect is going to do so well, and I have a really critical patient that I do expect is going to do well, I might need to focus my attention on the one that I think I can save. Well, some research questions are not going to go very far. And so this can be very helpful to lift some up to the point where you are moving forward with them, and to maybe put some on the back burner and say, maybe I'll come back to you when there is a breakthrough in terms of the way we measure exposure or the way we measure outcomes. So, some acknowledgments for the incredible work of many of my collaborators who have been part of this process for me, and I'm now going to stop sharing and go to the Q&A. And I think I've gotten us back on time, which was one of my goals.
Great, Megan, thank you. So before we start Q&A, I would like to remind everyone one more time: please use the Q&A box to ask a question to the speakers, and the chat box for everything else. Thank you, Rob. Sorry, things can do.
So we have a question relating to interpreting a weighted estimate. If you combine two waves or more of NHANES data, how do you interpret a weighted estimate?
Yeah, and Yutaka, you may want to jump in on these as well, since you are kind of there at the mothership, so to speak. So what I did is I did use survey weighting, because I wanted to estimate within the US population. So the estimate is to the US population, if that makes sense. Yutaka, do you want to add to that?
I'm sorry, I was answering other questions and missed that. So could you repeat that? Sorry.
It was about the survey weighting and the interpretation of what you're looking at when you use versus do not use the survey weighting.
Yeah, so I mean, yeah. There's a link I just put in the answer. Basically, to combine cycles you have to actually recalculate a new weight. And the link explains not only how to calculate these weights, but it also gives a worked example and talks about the interpretation of those weights when you do an analysis combined across cycles.
Great. And I'm seeing that most of the Q&A are actually, I think, residual from our very first presentation. So maybe we can go ahead and catch up. Sure. So I think you definitely talked about this, Yutaka, regarding response rates and concerns about that in NHANES. There's a question about what changes NHANES is considering to increase the response rate among participants. Oh, and you're muted, by the way.
That was the question I was trying to answer. So there have been ongoing efforts. We are making some changes, but I sort of hesitate to go into the detail, in part because I don't remember exactly. Suffice it to say that some innovative techniques have been proposed and used in other surveys, and we have been trying to introduce those. That's probably as much as I can say. I'm sorry. If you search the NHANES site or some other sources, you may find more information.
Okay, great. And Yutaka, just a couple more questions related to mortality data in NHANES. So one is: are the NHANES mortality data only available at the RDC?
Yeah. No, there are public mortality data, but those are not the full data. They don't include all the causes of death, and the causes are grouped together. Also, some of the data are synthetic, to prevent disclosure. So if you would like to analyze the mortality data to the full extent, it's recommended to use the restricted data at the RDC.
I think the public data are made available as sort of feasibility data, not for the real study, just to see how many people have died or to get an idea of the feasibility.
Okay. And then the final question I have actually references the longitudinal paper that you mentioned in your talk, regarding National Death Index mortality data. So the question is, is there a more updated National Death Index data set, for 2020, that's currently available, because I think the one referenced in your paper was up through 2011.
Yeah. Again, I don't remember exactly. It's updated periodically, every like five. And then this just came in. Oh, sorry. I think, if I remember correctly, there's a newer version available. But there are lots of things I can't remember. I'm sorry. But again, going to the website will clarify; it lists the most recent mortality data available.
Okay, great. And then this just came in, and I think this might go to you, Megan. Could you explain how you combined two sets of data collected in two rounds in your asthma research?
Yeah, that's a great question. And I think a lot of people have this question, because they may want to combine even more than two, but since I had Staph aureus in just these two, what I did is I downloaded all of the instruments. These would be the files that are in a SAS format, for all of the different pieces I needed, first from 2001-2002 and then from 2003-2004. I put them in different folders. I then merged the data sets on the SEQN identifier within 2001-2002, and then I did that separately for 2003-2004. And then I appended the two data sets, meaning I did not merge on SEQN, but I stuck them together.
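As a rough illustration of the merge-then-append steps just described, here is a minimal Python sketch using pandas. It is not the speaker's actual code: the talk mentions SAS-format files but no particular software, and everything except the SEQN identifier (the column names, the values, and the `make_cycle` helper) is made up for illustration.

```python
import pandas as pd

# Tiny synthetic stand-ins for the NHANES component files of one cycle.
# Real NHANES files are SAS transport files, which pandas can read with
# pd.read_sas(path, format="xport"). SEQN is the real participant
# identifier; all other column names and values here are hypothetical.
def make_cycle(seqns, cycle_label):
    demo = pd.DataFrame({"SEQN": seqns, "age": [8, 25, 47, 63]})
    staph = pd.DataFrame({"SEQN": seqns, "staph_colonized": [1, 0, 0, 1]})
    resp = pd.DataFrame({"SEQN": seqns, "wheeze_past_year": [1, 0, 1, 0]})

    # Step 1: merge WITHIN a cycle on SEQN so each row is one participant.
    # validate="one_to_one" raises if SEQN is duplicated in either table.
    merged = demo.merge(staph, on="SEQN", how="left", validate="one_to_one")
    merged = merged.merge(resp, on="SEQN", how="left", validate="one_to_one")

    # Keep track of which cycle each row came from.
    merged["cycle"] = cycle_label
    return merged

cycle_a = make_cycle([101, 102, 103, 104], "2001-2002")
cycle_b = make_cycle([201, 202, 203, 204], "2003-2004")

# Step 2: append (stack) ACROSS cycles. No join on SEQN here, because the
# two cycles are separate cross-sectional samples of different people.
combined = pd.concat([cycle_a, cycle_b], ignore_index=True)

print(combined)
```

The left merges keep everyone in the demographics table even when a component is missing for that participant, which is one reasonable choice; an analysis restricted to people with a nasal swab might filter afterwards instead.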
And the reason is that the SEQN values are all independent, meaning that you're not going to be able to match on the SEQN variable, which is the identifier, from one cycle to the other, because this is not a classic longitudinal study. Even though there are ways, as Yutaka has said, that you can do some longitudinal analyses, for the most part you have to treat these as separate cross-sectional studies. And then I went into the survey weighting documentation in order to create the correct weighting variable, and that may depend on not just what questionnaire data you have, but also what laboratory data and what examination data you have. And whether it was performed on a subset or not may adjust the weighting. So you have to read very carefully within the documentation, not just for the survey weighting, but also within the documentation for each instrument and each section. So it takes a little bit of setup. But once I had the data set set up correctly, the next thing I did is I made sure that I could replicate analyses that had been performed previously in NHANES, and then I moved forward with the exploratory data analysis, the model building, and then obviously the finalization of the model prior to publication. So hopefully stepping through it that way is a helpful way to think about it.
Young Ju, I don't know if there's time for a question that I've thought of. No. No, okay. I'm sorry. I'm sorry. Megan did such a great job with the time. So I need to move on. We're putting it behind. We're really good. I understand. I'll hold it till later. Hold on for your dear love. Our next speaker is...
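On the weighting step mentioned in the answer above, a minimal Python sketch of one common construction follows. It assumes the two-cycle case described in the talk (2001-2002 plus 2003-2004) and uses the two-year examination weight WTMEC2YR, dividing it by the number of cycles combined. That halving rule, and the choice of the examination weight over an interview or subsample weight, should be confirmed against the NHANES weighting documentation for the components actually used; other cycle combinations can require different weights. The data values and the `weighted_prevalence` helper are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic appended data set (two cycles already stacked together).
# WTMEC2YR is the NHANES two-year examination weight; the values and the
# outcome column are made up for illustration.
combined = pd.DataFrame({
    "cycle": ["2001-2002"] * 3 + ["2003-2004"] * 3,
    "WTMEC2YR": [10000.0, 30000.0, 20000.0, 15000.0, 25000.0, 20000.0],
    "staph_colonized": [1, 0, 0, 1, 1, 0],
})

# Assumption: for two combined two-year cycles, the combined weight is the
# two-year weight divided by the number of cycles. Check the NHANES
# weighting guidance before reusing this for any other combination.
n_cycles = combined["cycle"].nunique()
combined["wt_combined"] = combined["WTMEC2YR"] / n_cycles

def weighted_prevalence(df, outcome, weight):
    """Weighted mean of a 0/1 outcome (a point estimate only)."""
    return float(np.average(df[outcome], weights=df[weight]))

unweighted = combined["staph_colonized"].mean()
weighted = weighted_prevalence(combined, "staph_colonized", "wt_combined")

print(f"unweighted prevalence: {unweighted:.3f}")
print(f"weighted prevalence:   {weighted:.3f}")
```

This gives only a weighted point estimate. Standard errors and confidence intervals for NHANES also need the design variables for strata and primary sampling units (SDMVSTRA and SDMVPSU), handled by software that supports complex survey designs.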