 on how to enhance content, application, and analysis of the National Health and Nutrition Examination Survey data. For our conference today, we have a hashtag, so if you're on Twitter, please use the hashtag NHANES2020. Before we start, I'd like to explain how we want to communicate with you. First, let me introduce our moderators, Carrie and Rob. Carrie is a clinical informationist at Welch Medical Library and Rob is a basic science informationist. When you have a question for a speaker, please use the Q&A box; Rob will moderate after the speaker is done. For anything else, please use the chat box, which Carrie will moderate. We will also use the chat box to communicate with you and to send you resources. Okay, that's enough emphasis on Q&A.
Now we want to know your affiliation, so please tell us. The options are based on the number of registrants. The largest group of registrants comes from the Johns Hopkins School of Public Health, followed by the schools of medicine and nursing. We have a fair number of folks coming from NYU, thanks to Dr. Stella Yi, who is one of our speakers. We also have several from Towson, Harvard, NIH, and other institutions. If you choose "other," you can use the chat box to write your institution. Kevin, if we have results, can you show us?
It looks like a few other people are still voting, so I can't close it just yet. We'll give it a minute; about 85% of people have voted. For those of you who have not voted, you should be able to see a polling question, and it is open right now.
Many of you asked about recording. The session will be recorded, and all the registrants will get a notification when the recording is ready. The slides will be available as well. We'll see the results. Should we go to the second poll, Kevin? The second question is, how much experience do you have with NHANES data? If you've never heard of it, but came here because it sounds interesting, that's number one: never heard of it, never used it.
If you have some experience, you've played with it, you know how to find the data, you've searched the variables, that would be number two. If you are familiar with the data but have not either presented at a conference or published, that would be the third option. And if you have a publication, that would be the fourth option. It seems like just a few more people are still polling. So we have a lot of "never used." This is fantastic, because our first speaker, Yutaka, will give an extensive overview of NHANES. For those of you who have published, I would love to hear from you in case we do another NHANES symposium next year. I'm always recruiting speakers.
Now let me introduce our first speaker, Dr. Yutaka Aoki. Yutaka is a senior service fellow at the division of national health and nutrition examination surveys at the National Center for Health Statistics. He's an environmental epidemiologist cross-trained in biostatistics and toxicology. Yutaka is a Hopkins alum, yay, with a PhD in environmental health sciences and a Master of Health Science in biostatistics. Today, he will give an extensive overview of NHANES. If you're interested in learning about NHANES, this will be a great opportunity for you. Yutaka.
Thank you for the nice introduction. I wanted to start my talk by mentioning that I was a user of NHANES data, like some of you are and, hopefully, will be. I first analyzed NHANES data 15 years ago, and about eight years ago I started working for NHANES. I'm still using it, finding more uses, and having more ideas, so I hope you will have a similar experience in the future. I will cover a lot of information, which I hope will be useful as you consider using NHANES data for your research. Here's the outline of my talk. Most of the information I cover can be found on the NHANES web page. But you really don't need to remember this.
Just type NHANES into any search engine, and usually the NHANES page will come up at the top of the results.
So, goal and history. NHANES, the National Health and Nutrition Examination Survey, is designed to assess the health and nutritional status of adults and children in the United States. The survey is unique in that it combines interviews and physical examination, and really the key word here is examination. It hence provides US population-based estimates of health conditions, including awareness, treatment, and control of selected diseases, nutritional status, dietary behaviors, and environmental exposures, many of which are based on objective measures coming from the examination. Blood pressure, BMI, urine iodine, blood lead: these all come from the examination. Most of the collected data are available for free download. And beyond providing these estimates, there are many more uses for NHANES data.
A brief history of NHANES: it was started in 1960 as the National Health Examination Survey, covering only adults. Then in 1971, as you can see, the acronym got longer, with N for nutrition. In HHANES, the first H was for Hispanic, but all the other earlier surveys covered participants of any race, Hispanic background, or ethnicity. Age ranges, as you can see, expanded over the years. These earlier surveys were periodic, but starting in 1999, NHANES became continuous, with a two-year cycle. It has been covering all ages, although some of the data were not collected for some age groups.
Overall design of NHANES. The NHANES sample represents the US civilian non-institutionalized population residing in the 50 states and DC. About 5,000 people are sampled each year. Oversampling is performed to allow reliable estimates for subgroups such as Hispanics, non-Hispanic blacks, older persons, and low-income whites. Starting in 2011, non-Hispanic Asians also have been oversampled.
Stella will talk more about this later in her talk. NHANES uses a multistage probability sampling design, meaning that counties, the locations, are selected at the first stage, and in subsequent stages lower-level units are selected: segments, households, and eventually the participants.
Data collection starts with sending an advance letter to the prospective participants. Screening is done with informed consent. They may go through an in-home interview, and with further consent, they may take part in the exam component of NHANES. The screening is done at the doorstep of the prospective participants. As I said, if consent is obtained, we move on to the traditional computer-assisted personal interview, or CAPI, interviews. The household interview collects data on various health conditions, health-related behaviors and exposures, health care utilization, prescription medications, and dietary supplements.
The second component, the examination, is performed at the mobile examination center, or MEC, which consists of four trailers, one, two, three, four, and inside it looks like this, with many rooms. Just to emphasize, some of the interviews are also conducted as a part of the MEC. I'm going to take you through a quick virtual tour of the MEC. It's going to be very quick. Note that some components pictured are not performed every survey cycle. Reception area; cardiovascular health; hearing and vision; anthropometry and body composition, where an image like this is collected for the whole body; muscular strength, which is actually grip strength; private interviews for dietary behavior and some other components; oral health; respiratory health; and the laboratory, where blood and other samples are collected and processed. Some laboratory tests are performed in the MEC on the spot, such as the pregnancy test and complete blood count, and the rest of the samples are processed and sent to CDC and other labs. About 500 assays are performed.
Other laboratory tests include the biochemistry profile, nutritional biomarkers, diabetes, lipids, C-reactive protein, hormone tests, and infectious diseases. Some environmental exposures are also measured using biomarkers in blood or urine, such as metals, pesticides, and various other chemicals. Some of these are measured in pooled samples, allowing assessment of time trends and estimation for select race, Hispanic origin, sex, and age groups.
So there are lots of data. How do you know what data and what variables are available? You can search variables or browse on the NHANES website. If you go to the NHANES site, there are many tabs on the left side. One of them is "search variables," where you can search variables using the usual keyword search. Also on the left-hand side are other tabs for web pages specific to two-year cycles, so you can go there and look at the documentation. I think Megan will have some examples of this later in her talk.
Serum, plasma, urine, DNA: these are the specimens that may be stored for future research, with separate consent from participants.
It takes about three and a half hours to go through the exam for adults and teens. This is an average, and it can take longer, so it's somewhat burdensome for certain people, such as older people. For children, it's one to two hours. As you can see, participants really invest their time and effort to take part. So what's the benefit for the participants? Certainly, one benefit is that they get the results. On the spot at the MEC, they may find out that they have high blood pressure or that they have caries, and as soon as abnormal values are encountered in laboratory tests, participants are informed. They can also call for STD results in a secure manner. They also get final reports. What else? Participants also receive cash remuneration for their time and effort, and transportation costs are reimbursed.
Non-participant parents receive remuneration for their child or children. There is additional remuneration for other components too. So we make efforts to increase the participation rate, but the response rates for NHANES have, unfortunately, been decreasing. Like many other surveys, NHANES used to have response rates of more than 80%. In recent years they have been declining, and in the most recent years they have come down to about 50%. This, as you can imagine, is a very serious issue that we remain concerned about.
How are the data used? Here are some examples of previous contributions. Pediatric growth charts are based on NHANES. Blood pressure guidelines are based on NHANES. For blood lead, the current CDC reference value, 5 micrograms per deciliter for ages 1 to 5, is also based on NHANES, using 2007 through 2010 data. The value of 5 was actually the 97.5th percentile value for this age group, and, as you can see, it has no direct health relevance. 132 Healthy People 2020 objectives were based on NHANES. There are lots more.
For the sake of time, I'll skip that and move on to the topic of what kinds of NHANES studies you might carry out. They could be descriptive or analytical, in different contexts, such as change over time or comparisons by group, such as minority, demographic, or SES groups. On minority health, Stella will talk more later in her talk. The design could be cross-sectional or longitudinal. I will come back to longitudinal; it may come as a surprise to you, because NHANES is a cross-sectional survey. In many of these studies, there are usually underlying dependent variables, independent variables, and covariates, and there are a lot of choices for these variables in NHANES. Okay, so now let's go through just some examples.
This is change over time for hypertension. Here is another change over time, for HPV; the New York Times headline read, "HPV Vaccine Is Credited in Fall of Teenagers' Infection Rate." Group comparison: this is a cross-sectional comparison and sort of a shameless personal plug. I was the senior author of this article, in which we described the blood lead levels of children living in HUD-assisted housing. We compared their blood lead levels to those of children not living in HUD-assisted housing, living somewhere else. We found that children living in HUD-assisted housing actually had lower blood lead levels, adjusting for their socioeconomic status and other important potential confounders. Here's another example, again a personal plug. I was the lead author of this one. We found that blood lead is linearly associated with cardiovascular disease mortality risk. This was longitudinal and based on mortality data linked to NHANES. NHANES data were used as the baseline, and follow-up outcome data were obtained from another data set that is linked to NHANES.
The next section is analytic and methodological considerations. It is really just some tips for NHANES-based research. For any research, you start by searching for a topic, and for NHANES-based studies you also want to learn what data are available in NHANES. What you are looking into is what has already been researched using NHANES or other data sets, and what can be researched properly with NHANES data. You are looking for a sort of unbeaten path, so to speak. A small difference from epidemiologic studies in general is that there you go from searching for a topic to asking what kind of data you will need to collect. With NHANES, the data have already been collected, so you might look at the whole thing slightly differently: given the data available, what can I study?
Another important thing to do is to learn how to analyze complex survey data. You can take a course, which could be a short course or a quarter- or semester-long course, or you can teach yourself. At Hopkins, I believe there is a quarter course on complex survey data analysis. For simple analyses, it may suffice to go through some online tutorials, and at NHANES we also have tutorials for that. Another thing I can recommend is to pull out some previous papers and try to replicate their findings based on public NHANES data. I did that when I first touched the NHANES data 15 years ago, and it's very gratifying to find that your estimates exactly match the values in the published papers. You might then go on to consider the use of limited access data, which I'll talk about at greater length later.
Strengths of NHANES-based studies. These are obvious: NHANES data are nationally representative, so we have good generalizability. The data are also collected in a consistent manner, which improves internal validity. And the very practical advantage is that there are lots of data that are free for you to use.
Some limitations. One limitation to note is that local estimation will be limited. NHANES goes to only about 15 locations per year, and no geography at the state level or below is disclosed. Sample size may be small for some subgroup analyses you might want to do. There are some manipulations of the data to protect confidentiality; top coding for age and some other data masking are done in the public data. And as I mentioned earlier, declining response rates have been a concern for many of us and for data users in recent years. Here are some analytic considerations.
Here, I mentioned top coding. For age, the solutions would be to use categorized age, to impute, or to use limited access data, which provide exact age, and so forth. Another one just came to mind as I prepared this presentation: when analyzing data from multiple surveys, there's a concern about confounding due to year of survey. For example, suppose you are looking at two variables, X and Y, and they both have decreasing time trends. You might see that X and Y are associated with each other, but if you look at the data for each survey, there's no association. So the time of survey is actually operating as a confounder. One easy solution is to adjust for survey cycle, for example by including survey cycle indicators in the regression. There are many more, but I'm going to go to the next one.
The next thing to consider is the complex survey design of NHANES. In general, it's better to avoid using methods for simple random samples. There's an exception to this, as some sophisticated users can deviate from this with a certain justification, for sure. Josh will talk about this later in greater detail. But typically, complex-survey-appropriate extensions of the usual simple random sample methods are available, so it's highly recommended that you learn these complex survey methods and use them. Usually, you just need to pay attention to two important design considerations. One is weighting, and the other is variance estimation. Implementation of these could be relatively simple. For weighting, you need to choose the proper sampling weight, and then you can just put in the weight variable like this. For variance estimation, strata and PSU also should be specified properly.
Limited access data are accessible through the Research Data Center, or RDC.
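As a rough illustration of the survey-cycle confounding point made above, the following is a minimal Python sketch on simulated data, not NHANES data. The five cycles, the trend sizes, and all variable names are invented for illustration, and the sketch deliberately ignores sampling weights and the survey design so that only the confounding by cycle is visible. It fits a pooled regression of Y on X and then a regression that adds one indicator per survey cycle.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 5 survey cycles, 400 simulated participants each.
n_cycles, n_per_cycle = 5, 400
cycle = np.repeat(np.arange(n_cycles), n_per_cycle)

# X and Y both decline from cycle to cycle, but within any one cycle
# they are generated independently of each other.
x = 10.0 - 1.0 * cycle + rng.normal(0.0, 1.0, cycle.size)
y = 20.0 - 2.0 * cycle + rng.normal(0.0, 1.0, cycle.size)


def ols(design, response):
    """Ordinary least squares coefficients for the given design matrix."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef


# Pooled model: intercept + X, ignoring which cycle each row came from.
pooled_design = np.column_stack([np.ones_like(x), x])
slope_pooled = ols(pooled_design, y)[1]

# Adjusted model: one indicator column per cycle (these replace the
# single intercept) + X.
indicators = (cycle[:, None] == np.arange(n_cycles)[None, :]).astype(float)
adjusted_design = np.column_stack([indicators, x])
slope_adjusted = ols(adjusted_design, y)[-1]

print(f"pooled slope for X:         {slope_pooled:.3f}")
print(f"cycle-adjusted slope for X: {slope_adjusted:.3f}")
```

In this simulation the pooled slope is clearly positive even though X and Y are unrelated within each cycle, while the cycle-adjusted slope is close to zero. A real analysis would combine this adjustment with the weighting and variance estimation discussed in the talk.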
I put "accessible" in quotation marks because some data may be used only as matching variables and cannot be directly accessed even at the RDC. How do you do this? RDC-based research goes through a proposal process. You submit a proposal to be reviewed on disclosure grounds. Once the proposal is approved, you conduct the analysis at the RDC and retrieve results, sometimes called output, that are acceptable on disclosure grounds. But no raw data can be taken out of the RDC. There are some user fees. There are many RDC locations across the country, and here I listed three that are not far away from Hopkins: downtown DC, Rockville, and Hyattsville. The RDC may be closed during the COVID-19 epidemic.
Types of limited access data include sensitive data, such as drug use for adolescents, and personally or geographically identifying information, such as old age values, detailed ethnic origin and ancestry, and geocodes. As I mentioned for matching variables, these geocodes, generally speaking, can be used mostly only for matching or merging with external data, and no geographic identifier at the state or lower level can be directly accessed even at the RDC. There are some linked administrative data: mortality, Medicare, Medicaid, Social Security benefits, and the housing assistance programs of HUD. Some of these could be used for longitudinal studies.
Okay, the latest update. Field operation of NHANES data collection has been on pause since March due to the COVID-19 epidemic. Meanwhile, we have some surplus trucks that haven't been used, so we provided them to the DC government to be used for the COVID-19 response.
Okay, I guess this is the last slide. As mentioned earlier, most of the information I covered can be looked at again, or learned about in more depth, at our website. You can explore variables, learn about design and analysis, use tutorials, and so forth.
If you still have questions, the best way to find out more is to send an inquiry through the CDC-INFO email form, which is available from this website. This is really the recommended route for inquiries. Okay, so I guess that's all I have. I will entertain your questions. Thanks. Stop sharing.
Thanks, Yutaka. We have a number of questions. Some relate to NHANES scope, so I'll start with those. We have a question about amputation. Is amputation a variable, or considered in the collection of data for NHANES?
There are so many data. I don't remember such data having been collected, but I may be wrong. The best way to find out is just to go to that search page and put in "amputation" to see if it has been collected. And if we get questions like "will this be collected in the future?", that's hard to answer. Generally, you find out when the data come out, and there's no other good way to find out, unfortunately.
Okay, thank you. A couple of questions relate to children specifically. There's a question about whether learning assessments are administered in NHANES, and also whether there's screening for behavioral functioning in children.
Yeah. Again, I'm talking from memory, so please confirm this at the website. But there have been some components for assessing ADHD types of outcomes that have been conducted for some select years. I think some of them are restricted, limited use, because of privacy concerns. So there are some such data, but I don't think they have been collected every year for all children. There are some data to use on the behavioral or developmental aspects of children.
Great. We had a question that relates, I guess, to methodology, regarding the longitudinal character of NHANES. The question asks for an explanation of how longitudinal analyses are possible. Is it just overall trends over time?
Or is it truly longitudinal in nature, where data on the same patients are available in different years?
Yeah. The way this works is that, in these longitudinal studies, most of the time NHANES data are used as baseline data to form a baseline cohort, and the outcome data come from other data sources, such as mortality data, which come from the National Death Index. Our center has that data set, and there's a matching. Some of the data are provided publicly: some exact mortality data are provided through the RDC, but some, like the feasibility data, some of the simpler data, are available publicly. Also, the other administrative data I mentioned, Medicare, Medicaid, and Social Security, could be used as a sort of passive follow-up, using those other data sets to pick up outcomes such as death, or when people start receiving benefits for disability, which means they had some disability episodes. So, like I said, the baseline cohort is formed by NHANES, and then other data sets are used to do the longitudinal follow-up. I hope that answers the question. You can also pull out some of the actual longitudinal analysis publications. I had one, but you can search for others and read the methods of those studies to get a better idea. Thanks.
Okay. Then there are a couple of questions related to data analysis. You touched on this, and I don't know if you want to say any more about weighting, but there's a question asking whether NHANES requires any weighting. I think you sort of addressed that, but I didn't know if you wanted to speak more.
Well, when it comes to the use of public data, we actually can't require much. People do whatever they want to do. Josh is nodding, and sometimes we summarize, but generally speaking, for example, for producing a national estimate of something, weighting certainly is needed.
And in order to produce a reasonable, valid, appropriate variance for those estimates, strata and PSU also should be considered. That's the principle; there may be more later for sophisticated users, really sophisticated users. But somebody starting to use NHANES should always weight. That's my advice, and it's also an official recommendation of NHANES and of NCHS.
So, Rob and Yutaka. Yutaka, thank you so much for your talk. Unfortunately, we don't have enough time to answer all these great questions, so if you can type the answers, that'll be great. We want to move on to the next speaker.
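To make the weighting and variance advice from the talk and the Q&A above more concrete, here is a minimal Python sketch of a weighted mean with a design-based standard error. It assumes Taylor linearization with PSUs treated as sampled with replacement within strata, which is one common approach and not necessarily the exact method any particular NHANES tutorial uses. The function name `weighted_mean_with_se`, the column names, and the eight toy records are all hypothetical; a real analysis would use the weight, stratum, and PSU variables documented for the relevant NHANES cycle, preferably with an established survey analysis package.

```python
import numpy as np


def weighted_mean_with_se(y, weight, stratum, psu):
    """Weighted mean and linearized standard error for a stratified,
    clustered sample (PSUs nested within strata, with-replacement)."""
    y = np.asarray(y, dtype=float)
    weight = np.asarray(weight, dtype=float)
    stratum = np.asarray(stratum)
    psu = np.asarray(psu)

    total_weight = weight.sum()
    mean = (weight * y).sum() / total_weight

    # Linearized contribution of each record to the ratio estimator.
    z = weight * (y - mean) / total_weight

    variance = 0.0
    for h in np.unique(stratum):
        in_h = stratum == h
        psu_ids = np.unique(psu[in_h])
        n_h = psu_ids.size
        if n_h < 2:
            raise ValueError("each stratum needs at least two PSUs")
        # Sum the contributions within each PSU of this stratum, then
        # measure how much the PSU totals vary within the stratum.
        totals = np.array([z[in_h & (psu == j)].sum() for j in psu_ids])
        variance += n_h / (n_h - 1) * ((totals - totals.mean()) ** 2).sum()

    return mean, np.sqrt(variance)


# Toy data (invented): a measurement, a sampling weight, and design variables.
y = [120, 130, 110, 140, 125, 135, 115, 145]
weight = [1000, 1500, 2000, 500, 1200, 800, 900, 2100]
stratum = [1, 1, 1, 1, 2, 2, 2, 2]
psu = [1, 1, 2, 2, 1, 1, 2, 2]

mean, se = weighted_mean_with_se(y, weight, stratum, psu)
unweighted_mean = float(np.mean(y))

print(f"unweighted mean: {unweighted_mean:.2f}")
print(f"weighted mean:   {mean:.2f}")
print(f"design-based SE: {se:.3f}")
```

The point of the sketch is only that the weight changes the point estimate, while the stratum and PSU identifiers determine how the variance is computed; treating the same records as a simple random sample would generally give a different standard error.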