...and you can do it. Great. Great. You're good.
Well, hi everyone. I organized this symposium, and I introduced every talk but not mine, so I have no name yet. I am Yongjin, data informationist at Welch Medical Library. We're really running behind, so that's enough about me. Disclaimer: what I'm presenting may look like real research, but it's not. It's more of an idea at this stage. I'm still finding some flaws in my methodology, but it's an interesting idea. The previous speakers have focused on NHANES data. My focus is what happens after you researchers study the data, find something interesting and significant, and publish. Those publications are my data. There are thousands of research articles published using NHANES data, so my question is: what stories do they tell? As you can see, my research question is not really scientific.
To analyze publications and citations, you first have to get all the publications, so I used PubMed. I'm looking at 1980 to 2019, 40 years. The way I harvested them is that I searched the full NHANES name, "National Health and Nutrition Examination Survey," in quotation marks, looking only in title and abstract. I also searched the abbreviation NHANES and ORed it with the first search. One of the questions that came up when Yutaka was presenting was whether there is a repository where we can look at all the publications. The answer was no. That's why the search was complicated.
After I got more than 14,000 citations, I started cleansing to eliminate false positives. I found almost 3,000 studies using the Korean National Health and Nutrition Examination Survey, so I deleted those. There were also about 20 using the South African National Health and Nutrition Examination Survey. Then I also deleted articles not written in English and, in terms of publication type, anything editorial. I also deleted CDC's Morbidity and Mortality Weekly Report, because I only care about journal articles. And because I'm looking at 1980 to 2019, I deleted anything before 1980 and anything from this year. This is not complete.
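The search and cleaning steps described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the speaker's actual code: the `[tiab]` and `[dp]` field tags and the E-utilities ESearch endpoint are standard PubMed features, but the record fields (`title`, `abstract`, `language`, `pub_types`, `journal`, `year`) and the exclusion patterns are hypothetical choices made for illustration. The sketch only builds the request URL and does not contact PubMed.

```python
import re
from urllib.parse import urlencode

# NCBI E-utilities search endpoint for PubMed.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query(start_year=1980, end_year=2019):
    """Full survey name OR abbreviation, title/abstract only, within a year range."""
    full_name = '"National Health and Nutrition Examination Survey"[tiab]'
    abbreviation = "NHANES[tiab]"
    return f"({full_name} OR {abbreviation}) AND {start_year}:{end_year}[dp]"


def esearch_url(term, retmax=100):
    """Build (but do not send) an ESearch request URL for the query."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"


# Assumed false-positive pattern: other national surveys with near-identical names.
FALSE_POSITIVE = re.compile(r"\b(korea\w*|knhanes|south africa\w*)\b", re.IGNORECASE)


def keep(record, start_year=1980, end_year=2019):
    """Apply the cleaning rules from the talk to one record (hypothetical dict layout)."""
    text = f"{record['title']} {record['abstract']}"
    if FALSE_POSITIVE.search(text):
        return False  # Korean or South African survey
    if record["language"] != "eng":
        return False  # not written in English
    if "Editorial" in record["pub_types"]:
        return False  # editorial publication type
    if record["journal"].lower().startswith("mmwr"):
        return False  # Morbidity and Mortality Weekly Report
    return start_year <= record["year"] <= end_year
```

A rule this blunt would also drop a US NHANES paper that merely mentions Korea in its abstract, which is one reason a cleaning pass like this stays incomplete and needs manual review.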
So if you want this data, you want to make an appointment with our informationist, probably Kerry if necessary, but if you don't know anyone, contact us. So this is what I have: 11,056. So I'm looking at this, well...
I'm sorry to interrupt. We got some feedback that your slides are a little small. I think you're in presenter mode, possibly.
Okay. Is it better? So, my query: I have 11,056. So I'm looking at this data. At a brief glance, you can see that the number of publications using NHANES has been growing rapidly since 2000.
In terms of method for analyzing the publications, I first did topic analysis using MeSH. If you have never heard of MeSH, again, it's time to make an appointment with your librarian. You go to PubMed. Let's say you find an article. This one is published by Hopkins authors: "Near Vision Impairment and Frailty: Evidence of an Association." It is published in the American Journal of Ophthalmology. If you scroll down forever, you will find MeSH. MeSH stands for Medical Subject Headings. These are the terms used by the National Library of Medicine to describe what a study is about. So you can see that this one is about Aged, meaning 65 and older, and then "Aged, 80 and over," which is older than Aged. And then you can see that it's about frailty and, of course, about humans. When you look at MeSH, there are about 10 to 20 terms assigned to each article. Some of them have an asterisk, a star sign, at the end. It indicates that these terms are major terms for describing the study. So these are my data. I did not collect all MeSH terms, because there are too many and some of them are minor. I collected only major MeSH terms. When I look at them for 1980 to 1989, there are 234 major MeSH terms. If you look, this is the previous graph I showed you: in 1980 to 1989 there are not many publications. I rendered them using a treemap. As you know, a treemap is not very precise, but it can give you a general idea.
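Collecting only the starred (major) MeSH terms and counting them per decade can be sketched as follows. This is an illustrative sketch, not the code used in the talk: it assumes each record carries its MeSH entries as strings in the MEDLINE style, where an asterisk marks a major topic and a slash separates the heading from its subheadings (for example `*Lead/blood` or `Obesity/*epidemiology`). The record layout and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def major_descriptor(mesh_entry):
    """Return the heading name if the entry is starred as a major topic, else None."""
    if "*" not in mesh_entry:
        return None
    # Keep only the heading; drop subheadings and the asterisk itself.
    return mesh_entry.split("/")[0].replace("*", "").strip()


def decade_label(year):
    start = year // 10 * 10
    return f"{start}-{start + 9}"


def major_terms_by_decade(records):
    """Count, per decade, how many articles carry each major MeSH heading."""
    counts = defaultdict(Counter)
    for rec in records:
        # A set, so one article contributes at most once to each heading.
        majors = {d for d in map(major_descriptor, rec["mesh"]) if d}
        counts[decade_label(rec["year"])].update(majors)
    return counts


# Toy records, invented for illustration only.
records = [
    {"year": 1985, "mesh": ["*Lead/blood", "Humans", "Obesity/*epidemiology"]},
    {"year": 1988, "mesh": ["*Lead", "Aged, 80 and over"]},
    {"year": 2015, "mesh": ["*Obesity", "Frailty/*epidemiology", "Aged"]},
]
counts = major_terms_by_decade(records)
```

The number of unique major terms in a decade is then `len(counts[decade])`, and the per-term counts are what a treemap or a simple bar chart would be drawn from.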
From left to right, as frequency decreases, the boxes get smaller. From top to bottom, they also get smaller. If you look at the bottom right, that's where the less frequent ones are. What I noticed is that in 1980 to 1989, the number of articles assigned Obesity was actually smaller than the number tagged with Lead. I also found something interesting that I didn't expect: cancer was a major topic in studies using NHANES.
In the next decade, the number of unique major MeSH terms increased to almost 600. We're talking about this period, when the number of publications grew slightly. If you look at the treemap, the thing that jumps out at you is that a lot of new MeSH terms are used to describe the studies. That's the first thing. Then obesity became the number one condition assigned to the studies. Unfortunately, that is still here.
Going to the next decade, the topics or subjects assigned to NHANES studies grew to 1,500. We are looking at the period when the number of publications grew substantially. If you look at this treemap, again, the thing that jumps right out at you is that there are so many new topics assigned to the studies that they are not even squares; they are almost dots. Obesity is still the number one topic. And again, lead is still here as one of the major topics in NHANES studies.
Going to the most recent decade, the terms have now grown to almost 2,500. So many new topics and new terms have been tagged to NHANES publications that it's no longer possible even to see the dots. It indicates that there are just so many; maybe there's one topic per study. Again, obesity is still the number one condition.
I am going to pause a little bit here because I don't want to lose anyone completely. Looks like you're following, so I'm going to move on. For method two, I analyzed title words. For any of you who have never heard of text analysis, this might be a little bit confusing.
I'm not an expert myself, but bear with me. What I did was harvest title words. Remember the 11,056 publications: I harvested all their titles. So this is an article co-authored by Stella, "Asian American Dietary Sources of Sodium and Salt Behaviors Compared with Other Racial/Ethnic Groups, NHANES, 2011-2012." I harvested the titles of all the articles. So it looked like this, without frequency. What do I do with this? First, I found that there are over 7,800 unique title words. The word that appeared most in titles was "adult." Now I'm going to ask everyone: what condition appeared most in titles? Can someone tell me if anyone answered my question, the number one condition used in title words?
Hi, Yonju, someone said hypertension, someone said diabetes and obesity.
Whoever said obesity has been paying attention to me, because you remember that in my treemap obesity was number one in MeSH tagging.
When I looked at these different words in terms of frequency, what jumped right out at me were words that represent subgroups or minority groups, such as African American, Hispanic American, Asian American, and Native American. I decided to pursue the studies on minority health. What do I mean by minority health? I'm trying to find titles that have African American, Asian American, Native American, minority, racial, or disparities in them. What did I find? The articles that have those words, the name of a minority group or "minority" or "disparities," in their title look like this. If the authors wrote a title like that to describe their study, these are the studies that I can claim are about minority health. So it looks like it's increasing. One thing I want to explain is that I didn't double dip. If there's a study comparing African Americans and Asian Americans, it's not two studies, it's only one study. But in terms of the increase, if you put that in context and compare it with all the studies published, then it's not that impressive.
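The title-word count and the "no double dipping" rule can be sketched as below. This is a rough sketch under assumptions, not the speaker's method: the stop-word list, the tokenizer, and the exact keyword pattern are hypothetical, and the toy titles other than the one quoted in the talk are invented for illustration. A title that names several groups is flagged once, so each study counts as a single minority health study.

```python
import re
from collections import Counter

# Assumed stop words; the talk does not say which words were dropped.
STOPWORDS = {"a", "an", "and", "by", "for", "from", "in", "of", "on", "the", "to", "with"}

# Assumed keyword pattern based on the terms listed in the talk.
MINORITY = re.compile(
    r"african[- ]american|asian[- ]american|native american|"
    r"minorit(?:y|ies)|racial|disparit(?:y|ies)",
    re.IGNORECASE,
)


def title_words(title):
    """Lowercase alphabetic tokens of a title, minus stop words."""
    return [w for w in re.findall(r"[a-z]+", title.lower()) if w not in STOPWORDS]


def is_minority_health(title):
    """True if the title has any keyword; a study is flagged once however many match."""
    return bool(MINORITY.search(title))


def minority_share_by_year(records):
    """Percentage of each year's publications whose title is flagged."""
    totals, flagged = Counter(), Counter()
    for rec in records:
        totals[rec["year"]] += 1
        if is_minority_health(rec["title"]):
            flagged[rec["year"]] += 1
    return {y: round(100 * flagged[y] / totals[y], 1) for y in sorted(totals)}


records = [
    {"year": 2017, "title": "Asian American Dietary Sources of Sodium and Salt Behaviors "
                            "Compared with Other Racial/Ethnic Groups, NHANES, 2011-2012"},
    {"year": 2017, "title": "Blood lead levels in US adults"},
    {"year": 2018, "title": "Hypertension in African American and Asian American adults"},
    {"year": 2018, "title": "Obesity trends in adults"},
    {"year": 2018, "title": "Hypertension and diabetes in older adults"},
]
word_counts = Counter(w for rec in records for w in title_words(rec["title"]))
share = minority_share_by_year(records)
```

Reporting the share rather than the raw count is what puts the apparent growth in context, since the denominator (all NHANES publications) grows as well.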
If you look at the percentage, the proportion of minority health studies among all NHANES publications is decreasing. Again, when data are presented in a nice graph like this, it may look very authoritative, but I'm finding some flaws in my method, so I have to work on this. I am thinking that it's not going to be significantly different.
It was also interesting to look at subgroups. I looked for the subgroups Native American, Asian, African American, Mexican, and Hispanic/Latino. When I compare these groups, it looks like this. Here I did double count. If a study compared, let's say, Mexican and Asian, it gets one for Mexican and one for Asian. What I found is that there are more studies about Mexican Americans than about any other ethnic group. African American was number one until 2004 and was then outnumbered by another group. The studies on Asian Americans are growing, and Native American is very, very minor.
Another thing I looked at was journal titles. Many journals have published these studies. These are the top 20 journals in 1980 to 1989. Remember that the total for this period, from '80 to '89, is only 233 publications, so there are not many. But if you take a look, the American Journal of Epidemiology is number one, and you recognize these titles. There are two environmental health journals. And I said that I was surprised to find cancer, and here it is: two cancer journals made the top 20 in 1980 to 1989.
Going to 1990 to 1999, the graph has become a little longer because we have 686 publications. The American Journal of Epidemiology is still number one. You recognize JAMA and Pediatrics. Slowly, an obesity journal is making the top 20, and an environmental health journal is still here.
In the more recent decade, we have a big increase. Now the number one is the American Journal of Clinical Nutrition. You still find Pediatrics and JAMA.
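For the subgroup comparison, where a study that names two groups counts once for each, a sketch might look like this. The group labels follow the talk, but the regular expressions and the toy titles are assumptions made for illustration, not the speaker's actual matching rules.

```python
import re
from collections import Counter, defaultdict

# Hypothetical patterns, one per subgroup named in the talk.
SUBGROUPS = {
    "Native American": re.compile(r"\bnative american", re.IGNORECASE),
    "Asian": re.compile(r"\basian\b", re.IGNORECASE),  # \b avoids matching "Caucasian"
    "African American": re.compile(r"\bafrican[- ]american", re.IGNORECASE),
    "Mexican": re.compile(r"\bmexican", re.IGNORECASE),
    "Hispanic/Latino": re.compile(r"\b(?:hispanic|latino|latina)", re.IGNORECASE),
}


def subgroup_counts_by_year(records):
    """Count titles per subgroup per year; a title adds one to every group it names."""
    counts = defaultdict(Counter)
    for rec in records:
        for group, pattern in SUBGROUPS.items():
            if pattern.search(rec["title"]):
                counts[rec["year"]][group] += 1
    return counts


records = [
    {"year": 2004, "title": "Diabetes in Mexican American and Asian adults"},
    {"year": 2004, "title": "Hypertension among African American adults"},
    {"year": 2004, "title": "Lead exposure in Caucasian and Hispanic adults"},
]
counts = subgroup_counts_by_year(records)
```

Because of the double counting, the subgroup totals for a year can exceed the number of studies, so these counts should not be summed into a minority health total.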
The New England Journal disappeared from the top 20 last decade. And then you see Ethnicity & Disease. So this explains the increase in minority health studies. And again, Obesity is making the top 20, and Environmental Health Perspectives is one of them.
The last decade is the most recent period. We have a lot of publications, over 7,200. Now PLOS ONE is publishing a lot, becoming number one. This is very impressive given that PLOS ONE was created, I think, in 2006. Obesity is moving up, and Pediatrics is still here. We don't see JAMA anymore. Another impressive thing is that we have three environmental health journals making the top 20.
Is there a story to tell? I don't know yet, but my impression is that not just the number of articles but the number of journals has increased, which you won't find surprising because journals are popping up every day. What I want to explore, if I had a lot of time, is whether the growing number of journals actually leads to more and more specialized journals. Meaning that if an article is about eye disease, it tends to be published in an ophthalmology journal as opposed to other journals. Same thing with environmental health studies: they tend to be published more in environmental health journals, hence three environmental health journals making the top 20. And the same thing with health disparities. This is a significant issue in terms of readership. If an article is published in a specialized journal, then the readership will be smaller.
But I want to emphasize a new way to approach literature review. If we have 10,000 or 20,000 articles to review, the traditional method will not work, because the review will be very old by the time it's finished. So that's what I'm trying to come up with: a big data approach to literature review.
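The journal ranking per decade, along with the number of distinct journals, can be sketched in a few lines. The record layout and the toy data are hypothetical; only the idea of a top-N list per decade comes from the talk.

```python
from collections import Counter, defaultdict


def journals_by_decade(records):
    """Map each decade's starting year to a Counter of journal name -> article count."""
    by_decade = defaultdict(Counter)
    for rec in records:
        by_decade[rec["year"] // 10 * 10][rec["journal"]] += 1
    return by_decade


def top_journals(records, n=20):
    """Top-n journals per decade, most frequent first."""
    return {d: c.most_common(n) for d, c in sorted(journals_by_decade(records).items())}


# Toy records, invented for illustration only.
records = [
    {"year": 1984, "journal": "American Journal of Epidemiology"},
    {"year": 1987, "journal": "American Journal of Epidemiology"},
    {"year": 1989, "journal": "JAMA"},
    {"year": 2014, "journal": "PLOS ONE"},
    {"year": 2016, "journal": "PLOS ONE"},
    {"year": 2017, "journal": "PLOS ONE"},
    {"year": 2018, "journal": "Pediatrics"},
    {"year": 2019, "journal": "Obesity"},
]
top = top_journals(records, n=2)
distinct = {d: len(c) for d, c in journals_by_decade(records).items()}
```

Tracking `distinct` alongside the ranking is one way to start on the specialization question: whether the articles are spreading across more journals over time, not just growing in number.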
I also want to emphasize that titles are important, not just for my study. When people search for an article, whether in Google Scholar or in PubMed, titles play a very big role in retrieval. When I was looking at titles, there were a lot of words that I found not very helpful for searching. So be very thoughtful when your article is accepted. The next thing to think about is: can people find it easily? So think very carefully about the title.
I didn't say much about this, but data must be cited properly in the references. One of the reasons it's difficult to retrieve publications that use a particular data set, such as NHANES, is that many authors cite it in the text, which is considered an informal citation, when they should cite it in the references, which makes it easier to retrieve.
So here is the beautiful picture that I showed on the cover. To harvest data from PubMed, you can also access PubMed through Python, so you may want to try that. For the MeSH heading analysis, I got code from Blizzard Circus, the NIO. So if you are from NIO, she will be a great resource to learn about research metrics and data management. So that's it from me. I am going to stop sharing, and you will have questions. You know, in life, questions are guaranteed, answers are not, so feel free to ask a question.
So Yonju, there was kind of a question and a comment. You may have addressed it in your last slide, but there was a question about the visualization tool you used for showing the MeSH term analysis. Was that using PyMed?
PyMed, maybe I didn't do it right. I couldn't import MeSH from PyMed. If you have any question about how I did it, I will be happy to share with you personally once I clean up the code. I will share my code on GitHub, but the code for using MeSH to analyze articles, for that I need permission from Alisa, because I borrowed it from her.
But eventually, eventually, if no one asks me any reference or research questions, I will have a lot of time, and I will be able to work on this and make better plans. So I am over time. Where was I? Okay. We have a question in the Q&A: "I would like to learn how you actually performed it." Oh, actually, I am breaking my rule. Rob, ask me a question.
So I wanted to, yeah, so I see that one question, and then there was a comment I wanted to share: articles with non-English abstracts are harder to find, so they recommend having English abstracts. Although I guess in PubMed those are typically represented, or there might be a translated abstract in PubMed where the title is in brackets; at least that's been my experience. There was also a comment about using standard terminology, which I know has been a challenge in other areas where I've worked with groups trying to do systematic reviews. I don't know if you, or any of the other presenters, have any comments about the sorts of terminology you've come across, in terms of making the kinds of literature reviews you're doing more systematic and easier to do.
Do you have an answer, Rob? You're the systematic review expert, and Kerry too.
Well, I've certainly run across it in the toxicology realm, where there can be some deficiencies in the MeSH hierarchy in terms of describing some toxicology concepts. So I can speak to that, but not so much to other ways of describing this data. But that was just a comment, not a question. So we can go to the next one, which is a question, again, about your data analysis tools and what you used. I think what you just said was that you'll share your analysis and how you generated those pictures.
Yes, I can share my code. If you want to learn how to do it, there are many sources that you can use.
If you are affiliated with Hopkins, and also NIU (I looked up NIU), you have access to LinkedIn Learning, formerly known as Lynda.com, so you can learn data visualization. The treemap, I don't recommend using. As I said, it's not very precise. It's difficult to tell which box is bigger than which, and the boxes get really small. I tried it just to get an idea; once I know something is going on, I will go for something else. There are many better resources than me, because I'm just starting to study this. If you're interested in learning, I can send you the link to how I learned it.
Now we are done with the presentations. So I would like to ask the speakers, as a final closing, to give advice to our attendees who are interested in using NHANES for a study but have very little experience, because a lot of our attendees have very little experience. What would be your advice? One minute each. Start with Megan.
Yeah, great. My one piece of advice is to always start with a good research question, and then identify whether NHANES is the right vehicle to answer that question. There are many criteria for a good question. I think it needs to be of interest, it shouldn't already have been answered, and it should be something that is feasible to achieve. So based on whatever criteria you have, just spend a little time on the question first. That's my piece of advice.
Yutaka?
Yeah, actually, before I go to my advice, I have to mention something. I said that the places visited by NHANES are not disclosed. There is an important exception to that: the state of California and Los Angeles County have been visited many times in past NHANES cycles, and that fact is out. You can also access the data necessary to do estimation specific to California and Los Angeles through the RDC. Thanks. My advice would be similar to Megan's. It's easy to use NHANES, but it's important to be a good subject matter expert.
If you keep studying a subject matter, there is always, well, I shouldn't say always, but you can find something to study using NHANES. In my almost 15 years of experience, there are things where I think, oh, I can do this, I want to do this, so the list keeps getting longer and longer. If you invest your time and effort, I think something like that will happen to you.
Thank you, Yutaka. Again, as I said in the beginning, I'd like to invite you to come back and give a full-length seminar, because I know you usually give an overview of over an hour and you had to cut a lot of material. So we'd love to have you come back. Now, next, Stella, one piece of advice, one minute.
Yeah, I mean, I would echo what Megan was saying in terms of starting out with a research question and a conceptual framework. In particular, if you're interested in digging into some of the disparities pieces that I mentioned today in my talk, maybe NHANES isn't the best study design for you, or maybe it is. So I think it's about identifying your research question and matching it to the data that's available, and not trying to make a round peg fit in a square hole, or vice versa. I think I mentioned this at the end of my talk, but I really encourage everyone to think carefully about what representativeness actually means. Don't be afraid to dig into the survey documentation and ask, oh, how come Native Hawaiians and Pacific Islanders are not here, or who's in that "other" category? And I would say that increasingly, over the last 10 years or so, more and more attention has been placed on the "other" group, on the multiracial group. There are a lot of question marks about what to do with those individuals. I think past practice has been to delete them, but I don't think that's going to be the correct way to move forward.
So, as many of you are newer to NHANES, not to say that you're necessarily young in your career, but newer to NHANES, I would say, don't be afraid to dig in there and understand who you're actually representing and whether that really is adequate to address the research question you're trying to ask.
Thank you, Stella. Also, I asked you and you already said yes, so I'd like to invite you back to give a full-length talk. For those of you who are still staying with us, if you're interested in knowing about all these new classes coming up, I will talk about how you can get in touch with me. Now, Josh.
I would echo what Stella said: check the documentation, read it, reread it, and it'll answer a lot of your questions. They put a lot of work into it. You should put a lot of work into reading it and take its recommendations to heart. And if you are going off the beaten path in terms of analysis, I would highly recommend collaborating with a statistician or epidemiologist who is skilled in these particular practices. We can be very helpful in helping you make the most out of NHANES, because it's an excellent resource for learning in many different fields.
Thank you, Josh. And, okay, I'd like to introduce Gayan Yanokian, Executive Director of the Biostatistics Center. Gayan, could you give us a few words on how people can get in touch when they need help from Josh, and on how your center can help our researchers at Hopkins?
Great, thank you. I promise I'll keep it under one or two minutes, because we're just finishing up. Just to reiterate, my name is Gayan Yanokian, and Josh and I are colleagues in the Biostatistics Department in the School of Public Health at Johns Hopkins, but we also belong to a smaller group called the Johns Hopkins Biostatistics Center. We're an applied arm within the department, and we provide biostatistics and data management support.
It's a wider group of us who essentially do consulting, sharing of best practices, and education at all stages of research, from study design, including different types of design such as observational studies and clinical trials, to data management, setting up databases, and data quality control, to analysis and publication. For those of you within Hopkins, we're also partners with the Institute for Clinical and Translational Research, ICTR. So if you just want to find us quickly, it's the ICTR biostatistics program. We offer walk-ins, virtual for now, as well as the ability to submit short requests for smaller projects. And then of course those might develop into larger projects and maybe even long-term collaborations. We also work with researchers outside Hopkins, and we have a mechanism for that. To find us on the website, the easiest way is to go to Biostatistics at Johns Hopkins, and there is a panel on the left-hand side that says Biostatistics Consulting Center. There are lots of sections there explaining our combined experience, the types of projects we do, and how to work with us, especially for those outside of Hopkins. The last small thing is the email: it's Johns Hopkins Biostatistics Center, so JHBC at JHU.edu is another way to contact us. So I'll just stop here.
Thank you, thank you. Maybe in 30 seconds, could you talk about the walk-in clinics that you offer?
Of course. We actually offer two types. One is for faculty and staff. Those are three per week, Tuesday, Wednesday, and Thursday, and it's a nice split between different software programs. On Tuesday we have SAS. Keith Carson is actually one of the participants today, and she's the one who runs that clinic. It's a full hour. She can answer other questions as well, but her primary software is SAS. We have Leah Jagger, who works in R, and that's the Wednesday clinic at 11. So the one on Tuesday is at 1:30, Wednesday at 11.
And then Thursday is at 11 as well. That's me, and I primarily work with Stata. Again, any other type of question is welcome; this is primarily for short walk-ins. Just imagine you have a quick question about something. Sometimes it's hard to tell, so we might advise you to submit a request so we can work more extensively on your question. But those are the three days: Tuesday, Wednesday, Thursday.
Thank you. Kevin, could you reopen the third poll? Regarding the voting for next year's data topic, I was told that there are many people who didn't get to vote. Maybe they had left. So maybe we can just do it later. But I would like to say a few things. If you are on Twitter, I will be tweeting information about upcoming classes and events using the hashtag NHANES 2020. You can also follow Megan and Stella and me. Also, for all the registrants, I will be sending the survey tomorrow, so if you fill in your email address, you will receive information. Or you can email me, and I will include you when I send out the information. So I think that's it. This has been really wonderful. I don't know what else to say. I didn't really expect it when I was talking individually. So thank you so much for coming. Hopefully we will meet again, really meeting again on site with our attendees.
Thank you for all the work you did. Thank you, Yonju. Thank you, Yonju. Thank you so much. Thank you, Rob. Thanks, everyone who came. Thanks to everyone. Wonderful job. OK.