Our next speaker is Dr. Stella Yeh. Stella Yeh is an assistant professor in the Department of Population Health at NYU School of Medicine. She's a chronic disease epidemiologist with a research focus on policies and community programs to improve lifestyle behaviors in immigrant communities. Stella is also a Hopkins alum, having completed her PhD in epidemiology at the Bloomberg School of Public Health. In her presentation today, Stella will be discussing how research analyzing NHANES can help to address the growing diversity of the United States, with a particular focus on the Asian American population. Stella?
Okay, hi. Good morning, everyone. Let me go ahead and share my slides. Okay, great. Is the slideshow showing up? Not yet? Okay. Great. Okay, good morning, everyone. I know a number of my colleagues and trainees and other people are on the call today, so hello, everyone. So I'll just get started with a brief outline of what I'm going to cover today. I'm going to start by talking a little bit about racial/ethnic diversity and research. We'll touch briefly on the model minority stereotype, and then we're going to discuss NHANES and Asian American representation, and then close with some summary thoughts.
So many of you are familiar with the fact that the U.S. population is rapidly diversifying with regard to racial/ethnic makeup, and there have been several reports published by the Brookings Institute in the past few years that have highlighted these demographic trends. In fact, this transition to being majority minority, meaning majority non-white, has already occurred for 22 of the 100 largest metropolitan areas in the United States. So this is a map that displays the most populous minority racial/ethnic group in those 22 cities, with the number in parentheses illustrating the percent of the population that is white.
So, for example, in New York City, where I'm located right now, the largest racial/ethnic minority group is Hispanic, and 49 percent of the population is white. And it's likely that the racial/ethnic distributions of these metropolitan areas represent a microcosm of the future diversity of the United States. Sorry, I'm having a hard time advancing my slides. Okay. Now, some of the growth of the population has been driven by the growth of the Asian American and Hispanic populations over the past 50 years. In fact, the Asian American population grew faster than any other racial/ethnic group in the United States between 2000 and 2010. And Asian Americans make up 5.6 percent of the U.S. population. So on an absolute level, it's still a small proportion of the overall U.S. population despite this growth.
Now, I include this slide just to remind everyone who we are talking about when we are talking about Asian Americans. There are a number of diverse ethnic heritages that are included under that monolithic category of Asian Americans. On the left-hand side of the slide here, you can see a list of 20 countries. This is not an exhaustive list. But, of course, if you label yourself as having one of these countries of origin, you would be categorized as being Asian American once you're in the United States. And I'd like to point out that this is not particularly different for other racial/ethnic groups. So, for instance, if you're immigrating here from Guatemala or from Mexico, you'll be categorized as Hispanic in the United States. Or similarly, blacks from the Caribbean are grouped together with blacks from Africa or black Americans. So there is a lot of advocacy being undertaken by many of our colleagues, supported by the Robert Wood Johnson Foundation, pushing for data disaggregation within racial/ethnic groupings, which is beyond the scope of this particular presentation. So I won't go into depth.
But to some degree, much of what I'm saying throughout this presentation with regard to disaggregated subgroups really applies to all different racial/ethnic and immigrant groups. So just to close out the quick stats about Asian Americans, there are about 17.3 million Asian Americans in the U.S. as of 2010. The largest groups are Chinese, Filipino, Indian, Vietnamese, Korean and Japanese. And I highlight these six groups because they make up approximately 86 to 87% of the Asian American population.
Now, the shift in racial/ethnic diversity is troubling for healthcare and health research because we also know that current health knowledge, treatments and evidence-based practices have been developed primarily with white participants and patients. And despite calls to diversify research and recruitment, we are still lagging behind where we should ideally be. This brings us to NHANES and other large national health survey data sets. So Yutaka previously referred to the declining response rates over time, which is a larger concern for these types of studies. But I'd also like to spend a few minutes unpacking some of the key features of most national data collection efforts and the implications that they might have on population capture, and therefore on diversity and representativeness.
So first is the use of random digit dialing, or RDD. Random digit dial recruitment relies primarily on calling those with landline telephone access, although there have been more recent designs and efforts to include sampling of wireless users. But RDD really fails to capture specific individuals. So for instance, those without any kind of telephone access, those who are medically underserved, or those who are unavailable at the times when surveys are conducted. So for example, working class individuals who have non-standard work hours. So RDD has really come under criticism for generating data that is representative primarily of older and white Americans.
Another key feature of national surveys is that the geographical distributions of different racial/ethnic subgroups are not fully accounted for. For example, the map that I have here on this slide depicts the geographic distribution of Asian Americans across the United States. So the different colors represent the largest Asian group in that state. So for example, in California, the turquoise means that Filipinos are the largest Asian subgroup in that state. Or in Texas, the largest Asian subgroup is Asian Indian. But this distribution is often not accounted for in study design, recruitment, or sampling. So the result is that the Asian American racial/ethnic category across different datasets can actually represent very different Asian subgroups, with differing social determinants of health, health outcomes, cultural norms, and so on. That really makes it difficult to understand the current state of health disparities for the Asian American population.
Lastly, another key feature of most national surveys is that they usually only collect data in English and Spanish. This is particularly problematic for the Asian American population because 33% of the Asian American population, compared to 9% of the total U.S. population, have limited English proficiency, and 75% of Asian Americans versus 22% of the U.S. total speak a language other than English at home. Moreover, these linguistically isolated individuals tend to have lower income, less access to care, and worse health outcomes. So since these individuals are systematically not being included in large data collection efforts, our current understanding of health disparities is likely an underestimate of the health needs of Asian Americans.
As I mentioned before, I want to take a quick aside and mention the model minority stereotype, because understanding what it is and the consequences that it has on health research for Asian Americans is really critical.
So just very briefly, to define the model minority stereotype, it's this idea that Asian Americans are believed to have high educational attainment, low crime rates, a lack of juvenile delinquency, a lack of mental illness, and close family ties, and that they're law-abiding and have a hard work ethic. Now, one of the consequences of the model minority stereotype, in particular with regard to health research, is that Asian Americans are not considered a community of color or a disparity group. And in fact, Asian Americans are the most understudied racial/ethnic minority group in the United States. An analysis showed that 0.01% of Medline articles included any Asian Americans in the study sample from 1966 to 2000, and 0.17% of the total NIH budget funded research studies that focused on Asian American or Native Hawaiian Pacific Islander populations from 1992 to 2018. And you would at least anticipate that these percentages would be somewhat close to the Asian American share of the overall U.S. population, which is 5.6%. So there is a clearly disparate amount of funding being allocated towards these populations. And this prolonged disparate funding will lead to worsening disparities over time.
So I wanted to just show a couple of examples of how this actually looks in the real world. The first is a paper that was published in JAMA a few years ago now, looking at prevalence and incidence trends for diabetes. But when you look at the graph stratified by race/ethnicity, you can see that there are blacks, Hispanics, and whites on the graph, but Asians are missing. And this is something that you will often see in the literature when you're trying to dig deeper and figure out exactly what Megan was talking about, like setting up your research question, what data actually exists, what analysis already exists. You'll see this absence of data on Asian Americans and Native Hawaiian Pacific Islanders. Here's another example from the American Journal of Public Health.
This is looking at the effect of Obamacare on racial/ethnic disparities in health insurance coverage. I know this graph is a little bit small, but it is also stratified by racial/ethnic group, and Asian American and Native Hawaiian Pacific Islanders are also not included in this analysis. And this was using census data. So this would presumably be on everyone. This is a line in their methods section: "We focused on whites, blacks, and Hispanics in line with much of the literature on health disparities." So it's just perpetuating the idea that Asian Americans are not a health disparity population.
So turning now to NHANES, and Asian American representation in NHANES. NHANES really recognized the important gaps in representation in health data and surveillance. And so NHANES started oversampling Asian American and Pacific Islander populations, which Yutaka referred to, starting with the 2011-12 survey wave. Just to go into a little bit more detail about the oversample: in 2011-12, the sample size for APIs was increased to approximately 750. In prior years, it was approximately 100. And they have been included as an oversample in every subsequent wave starting from 2011-12. There are some health outcome measures for APIs that can be analyzed using one two-year data set. But many of them require multiple combined waves to get point prevalence estimates, for example. Asian Pacific Islander is now available as a race/ethnicity category. However, disaggregated data are not publicly available. That's something that you would need to access through the limited access data set process that Yutaka referred to.
However, while we are seeing many publications that now include the Asian American Pacific Islander data, so this is an example, and I purposely did not put author lists on this, we are also observing the fact that newer analyses that have this new data in them are actually still not including Asian Americans routinely.
So, you know, part of these patterns are attributable to the fact that these analyses are trend analyses that have baseline values from 15 to 20 years ago. But this also points to a larger limitation of the NHANES data: despite Asian Americans having been added, it will be several years or even several survey waves before meaningful trends can be presented for Asian Americans using NHANES, let alone any kind of longitudinal analyses with linked data sets.
So, we have also undertaken a comparison to assess the difference between NHANES and other national data collection efforts and the large-scale local data collection efforts that we perform in partnership with community-based organizations that serve the Asian American population. We refer to this as our CHRNA, or Community Health Resources Needs Assessment. So, using data from the 2013-15 CHRNA, we found that our community-partner-collected data captured more Asian Americans who were lower income and had lower education. They were more likely to be on public insurance or uninsured and to report fair or poor health compared to the NHANES sample. So, this points to the importance of complementing national data, which may under-represent the linguistically isolated and Asian Americans with lower socioeconomic levels, with community-sourced data.
So, I wanted to offer a few suggestions for best practices for analyzing NHANES data for minority groups. As I mentioned before, some of these lessons learned don't apply only to Asian Americans, but also apply to other racial/ethnic minority and immigrant groups as well. So, number one, I suggest presenting all available data in its most granular form. This is a recent analysis using NHANES where we assessed disparities in sources of added sugar and high glycemic index foods across racial/ethnic groups.
And as you can see, we presented data on all racial/ethnic groups and in the most granular form available. This is particularly important at a national level for the Hispanic population, where those coming from countries other than Mexico, for example those coming from El Salvador or Cuba, which have been large sources of Hispanic immigrants in recent years, are also contributing meaningfully to the growth of the Hispanic category.
Okay, number two, it's really important to contextualize the data using basic language in the methods and in the discussion. So, for example, in this added sugar analysis, this was included in the methods: "Data for Hispanic and Asian American subgroups were unavailable due to limited sample sizes." So, you'll often see, you know, that data are not presented on Asian Americans or on other Hispanics because of limited sample sizes, but here we actually present those data, and we just qualify the fact that we didn't look at subgroups. Also, we added a few sentences in the discussion and in the limitations. In the discussion we said, "In addition to important differences in demographics within Asian Americans by subgroup and country of origin, there is also large variability in cardiometabolic risk." So, just contextualizing the findings: we don't have subgroup data, but it's something that should be considered when interpreting these results. And then in the limitations section: "We believe our findings are largely generalizable to U.S. children. However, prior studies have determined that Asian Americans included in NHANES are skewed to higher income and better educated individuals. As such, the results may not represent low income and less educated Asian Americans."
So, a couple of other best practices. Number three, dig deeper. So, understand who has actually been included in the analyzed NHANES wave with regard to socioeconomic status and subgroup.
So, just looking quickly at the NHANES site, they have this sentence: "To facilitate oversampling of Asian Americans, which began in 2011, selected survey materials were translated into Mandarin Chinese, both traditional and simplified, Korean, and Vietnamese." So, what this tells us is that NHANES may adequately capture and represent Chinese, Korean, and Vietnamese speakers, and potentially English speakers across all groups, including Filipinos and South Asians, who tend to have a higher level of English proficiency than those other groups. But it may also be less representative of some South Asian subgroups that do have limited English proficiency, or of Pacific Islanders, for example. You may or may not want to actually include this kind of detail in your manuscripts, but the contextualization is important for making more meaningful comparisons with other published literature, which you are often doing within the discussion section. So, if you're trying to figure out why a result is kind of strange or off, it may be because you're comparing apples to oranges across your different subgroup populations.
And then number four, if possible, present in tandem with locally collected community data. So, we have taken this approach where we'll use national or other administrative data sets, but then we'll also make comments about the community-based data that we have. I'll just add a caveat to that, however: this strategy has been met with mixed success. Some peer reviewers don't really like this, and they don't really understand why we're doing it. So, if it's not possible to do that within your manuscript, you can always publish that separately and then refer to it within your analysis.
So, just to summarize, as we have seen now through these three presentations, NHANES analyses are an important starting point for planning, resource allocation, and policymaking.
And to illustrate this, I put together this non-exhaustive and sort of crude representation of a cascade of events that really starts with a primary data source like NHANES. So, for example, you may do an analysis, which could lead to a publication. The analyses across some topic area are compiled, maybe in the form of a systematic review. This could lead to further study, maybe in the form of funded grants. The compiled knowledge can also directly lead to evidence-based policy and practice, or it may be fed into complex models such as AI or other micro-simulation models. And so, the idea is that if racial/ethnic minority groups, including Asian Americans, are not included at the outset, then it essentially stymies the representativeness of the entire pipeline of future research and practice. And it's the systematic lack of data that will perpetuate the continued mismatch between the health research literature and the growing diversity of the U.S., and ultimately the health of the nation. So, I'll finish with my acknowledgment slides, and I'm happy to take any questions. Thank you.
Thank you, Stella. So, we have a couple of questions that have come in. One relates to datasets on other underrepresented groups. So, are there any other national datasets that have indigenous representation? The questioner is finding that they have largely been left out of these kinds of national surveys.
I'm guessing by indigenous the person means Native American and Alaska Native types of populations. That I don't know the answer to, unfortunately. Like I mentioned at the beginning of the talk, there are a lot of limitations with the national datasets. I think that there has been a lot of progress for those indigenous populations with one-off cohort studies that have been developed, mostly in partnership with those indigenous communities.
And so it's kind of similar in that sense to a lot of the themes that I was talking about with the Asian American experience: we found that it's really the community-based local data collection that is informing a lot of the grant efforts and a lot of the publications. And again, you'll probably find most of those indigenous populations grouped in the "other" category in a lot of the national analyses that you do see. I don't have a good answer for a catalog of indigenous population research, because that's not my area. But again, I think some of the themes really do cut across for Asian Americans and for other smaller underrepresented groups.
Thank you. So another question relates to sample size. So the question is, what is a good minimum sample size for a minority subgroup for inclusion in analysis? And what number might be too low? Say if you're wanting to have an Asian American subgroup.
So, I don't know if Josh or Yutaka have a better answer for this. We undertake a number of different strategies in our section. We always try to look at the data that's available. That's the first strategy. We also have adopted a strategy that is used by the New York City Department of Health and Mental Hygiene, which involves flagging estimates that are potentially unstable. So maybe those that have a large relative standard error; we will often flag those estimates. And then, if we're doing an analysis and a lot of our estimates are being flagged, we don't feel comfortable sharing the results because they just don't seem very reliable. You can also combine across survey waves. And of course, that's limited, right?
Like, if you want to look at the prevalence of some phenomenon and you have to combine six years of survey data to get the sample size you want, then there's a limitation to that too. So, unless Josh or Yutaka or Megan have another idea, I don't typically put a number on a minimum sample size. It's usually a more comprehensive exercise that you have to go through to decide whether or not it's a reliable estimate.
I completely agree with what Stella has said and endorse what she said a hundred percent. If I could endorse it more, I would. And it also depends on what the scientific question is. Are you trying to get a prevalence estimate, or are you trying to get, let's say, a logistic regression coefficient? And in that case, it also depends on what your outcome is. If your outcome is a low prevalence outcome, you're going to have higher standard errors just by virtue of looking at something that's low prevalence to begin with. So if you've got a low prevalence outcome to begin with and you're looking in subgroups, that compounds the problem. So it depends a lot on what you're trying to learn and what the scientific question is as well.
Yeah, so something I can add is that, as Stella mentioned, we can talk about the reliability of estimates as opposed to what sample size we need. NCHS, the National Center for Health Statistics, has some standards for this sort of thing. For prevalence, a couple of years ago, maybe two years ago, a new guideline on how to assess the reliability of a prevalence estimate was published. I'm not going to go into detail about that, but you can look at it. That was for prevalence. The other estimates usually take the form of means.
Traditionally, the cutoff is that we'd like the relative standard error, meaning the standard error divided by the point estimate, to be 30% or less. Then we don't worry too much about it. But it's not very strict. You know, we publish some of the quote unquote unreliable estimates, with an RSE of 30% or 40% or more. But we flag them and tell the users that these are not reliable. So that's one way to go. And there are some very sophisticated mathematical manipulations by which the RSE cutoff, or the other prevalence guidelines, could be translated into a minimum sample size. And sometimes there's a statistic called deff, the design effect, which could be used to do that sort of translation, but it's a loaded question. So I'll refrain from going into the detail, but you can learn about it. But one attitude is, you know, you already have the data, so try to estimate. Once you estimate it in a correct way, you can get an idea of how reliable it is, and you can make a decision: either this is too unreliable and we're not going to publish it, or it may be published because, as Stella mentioned, in a meta-analysis, if there's a collection of estimates, each one may be unreliable, but when used as a whole, as a set of estimates, they may be pooled and combined to produce more reliable estimates. So anyway, I guess I'll stop.
Yeah, thank you, Yutaka, for adding that. I think I'll just also advise that what we often see in the literature is that the Asian American sample is just not even presented, right? So at the minimum, just look at it, include it, look at it. If it's not great, well, we find in our group that sometimes we can't include it either, right? And it does come down to sample size.
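The rule of thumb described above (relative standard error = standard error divided by the point estimate, with estimates above roughly 30% flagged rather than dropped) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the NCHS or New York City Department of Health procedure: the names `Estimate`, `relative_standard_error`, and `flag_unstable` are hypothetical, the numbers in the usage example are made up, and the standard errors are assumed to have been computed beforehand with survey-design-aware software.

```python
from dataclasses import dataclass

# Assumption: the "30% or less" rule of thumb mentioned in the discussion.
RSE_THRESHOLD = 0.30


@dataclass
class Estimate:
    label: str    # subgroup or outcome name (hypothetical)
    point: float  # point estimate, e.g. a mean or a prevalence
    se: float     # design-based standard error, computed elsewhere


def relative_standard_error(point: float, se: float) -> float:
    """Return SE divided by the absolute point estimate."""
    if point == 0:
        # RSE is undefined at zero; treat it as maximally unstable.
        return float("inf")
    return se / abs(point)


def flag_unstable(estimates, threshold: float = RSE_THRESHOLD):
    """Return (label, rse, flagged) tuples; flagged means RSE > threshold."""
    rows = []
    for e in estimates:
        rse = relative_standard_error(e.point, e.se)
        rows.append((e.label, rse, rse > threshold))
    return rows


if __name__ == "__main__":
    # Made-up numbers purely for illustration.
    demo = [
        Estimate("subgroup A", point=0.10, se=0.02),
        Estimate("subgroup B", point=0.08, se=0.04),
    ]
    for label, rse, flagged in flag_unstable(demo):
        note = "flag as potentially unreliable" if flagged else "ok"
        print(f"{label}: RSE = {rse:.0%} ({note})")
```

Consistent with the discussion, a flagged estimate is still reported with a caveat rather than silently removed, so that it remains available for later pooling.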
But the problem is that if there's this sort of perpetual "sample size, sample size, sample size," and the data are never reported, then there's no use in NHANES spending all this money to do this oversample if nobody's going to publish on the data. So I think that's part of it too, right? Just look at the data that's there, and then it becomes an exercise of deciding whether or not you actually want to present on it.
Thank you so much. I think we need to take at least a five-minute break. Thank you, Stella, again. You should really come back to us, because I know your talk is really an hour's seminar, and the same goes for Yutaka; I will invite you later. I think we're going to take a five-minute break, right? Or a six-minute break. So please come back at 10:45, and don't be late, because we're going to vote for our next symposium topic. We're going to choose the data. So let's take a break. So sorry, Stella, again, for cutting short the great conversation.
Don't worry.
It would have been very nice if we were on site as we originally talked about.
Yeah. Yeah, that would have been great, because I'm looking at the chat box, and I'm looking up the project for you right now. It's being led by the Asian Pacific Islander American Health Forum, but it's really for data disaggregation. It's an effort that started in 2015, and it was a collaborative effort between PolicyLink and the Robert Wood Johnson Foundation, and it's about data disaggregation across all racial/ethnic groups. And in fact, you may be familiar with the fact that in 2017 or so they made new proposals to change the way that the census collects racial/ethnic subgroup information, and unfortunately the OMB turned them down.
So that was at the federal level, and so if you were to take the census now, or if you've already taken the census, you would have seen that the collection of subgroup data is not really represented well in the 2020 census. But the Health Forum has continued the work and is looking towards data disaggregation efforts. I want to say right now they're focusing on four states. I think it's California, New York, Michigan and Nevada, in terms of working with the state governments to improve the collection of disaggregated data for different subgroups within racial/ethnic groups. But this is more information on the project. I'm excited to have this conversation.
All right, let's move on. Kevin, do we have the winner for next year's symposium? And here is the winner. You know, it's a library, and I found it surprising, because I would say the number one question has been HCUP, but this is definitely the winner. I cannot argue with the data.