So, good afternoon. My name is Vence Bonham. I'm the Senior Advisor to the Director of the National Human Genome Research Institute. We're pleased that you're joining us this afternoon. I want to welcome you to the 23rd lecture of the Genomics and Health Disparities Lecture Series. This series aims to highlight the opportunities for genomics research to address health disparities and health inequities within our country. This lecture is hosted by the National Human Genome Research Institute in collaboration with our colleagues at the National Institute on Minority Health and Health Disparities, the National Heart, Lung, and Blood Institute, the National Institute of Diabetes and Digestive and Kidney Diseases, and the Office of Minority Health and Health Equity at the Food and Drug Administration. Speakers are chosen by the cosponsors to present their research on the ability of genomics to improve the health of all populations, with a focus on issues of health disparities and health inequities. The speakers in the series approach this problem from different areas of research, including basic science, population genomics, and translational, clinical, and social science research. Today, Rear Admiral Richardae Araojo, who serves as the Associate Commissioner for Minority Health and Director of the Office of Minority Health and Health Equity in the Office of the Commissioner at the U.S. Food and Drug Administration, will introduce our speaker. Dr. Araojo.
Thank you, Vence, and good afternoon, everyone. It's truly a pleasure to be here with all of you today and to have the privilege to introduce our distinguished speaker, Dr. Eimear Kenny. Dr. Kenny is the founding director of the Institute for Genomic Health and Associate Professor of Medicine and Genetics at Mount Sinai. She is a statistical and population geneticist.
She leads a multi-disciplinary team of geneticists, computer scientists, clinicians, and other medical professionals working on problems at the interface of AI, very large-scale genomics, and medicine. Her goal is to accelerate the integration of genomics into clinical care, particularly in diverse and underserved populations. She is also principal investigator in six large international programs focused on genomic research, medicine, and health, and is in the top 20 of NIH-funded genomics researchers in the United States. She is a scientific advisor to many genomics and genomic medicine initiatives across government, nonprofits, and industry. She has published over 90 papers in leading journals like Science, Nature, Nature Genetics, and the New England Journal of Medicine, with over 13,000 citations. And her work has been featured in many media outlets, including the New York Times. She received her BA in biochemistry from Trinity College Dublin, her PhD in computational genomics from Rockefeller University, and her postdoctoral training in population genetics at Stanford University. Please join me in welcoming Dr. Kenny.
So thank you very much, and thank you for that warm welcome. Thank you very much to my hosts at the FDA Office of Minority Health and at NHGRI, my NIH home, I think. So, before I start, my disclosures. I wanted to spend a little bit of time today, for this talk, looking back at our field and how far it's come in the last 20 years. I think there have been huge leaps forward in terms of technology and in the ways in which we've come together as a global community of genomics, and in how that has impacted genomics, not just in research, but also in medicine and actually in society. And because I'm here giving the talk virtually at NHGRI today, I really wanted to give a nod to the important birthday that is coming up and a historical perspective on the field of human genomics.
So about 20 years ago last summer, the first draft of the human genome from the Human Genome Project was released, the back-to-back papers for which came out in 2001, I think it was February 15th. And that really was an enormous landmark achievement for humanity. And at the time, many people put their bets on what the future of human genomics would look like. I don't know if anybody would have predicted where we are today and how far we've come. So in the subsequent two decades, as a field, we've generated vast databases of variants of human genomes, and catalogs and tools for the community to use, often open source and publicly available, or available to research communities. We've surveyed the sequence variation and the genome structure of humans all across the world. And we've done this in ways where we openly share methods. We openly, to the best extent we can, share data, and we foster relationships, these days thinking about participants not only as samples, but as real partners in research. And this has particularly been through the emergence of global biobanks and community partnerships. Just last year, the Human Pangenome Reference project was launched, which can be seen as the successor to the Human Genome Project in that it's leveraging what have been just dramatic improvements in genome technology. Just a month or so ago, the first end-to-end, diploid, phased human genome was generated, and now the endeavor is to generate those for 350 diverse genomes around the world for our reference genome, so that the scientists of the future will be using tools that look very different from the tools we use today. So it really is true that genomic technology is creating our biggest databases of knowledge. Just last year at the AGBT conference around this time, Illumina announced that 150 petabytes of data had come off their sequencers that year. And I was at that conference and I saw that number at the time.
My eyes went really wide, so I went to the interwebs and I thought, what other entities are generating data like this? And if you go online to YouTube, they announced that they generated 693 petabytes of social media data in 2019. If you look at reports from other large research efforts like, for example, folks at the Large Hadron Collider at CERN, they generated 25 petabytes of data in 2019. So genomic data is really becoming one of the biggest types of data we have in the world. And really what that means is that the opportunities to think about how we learn from that data and utilize that data for good in medicine and in society are really exploding. And not only that, but we're really getting to numbers, in terms of assaying the genomes of humans, that are topping hundreds of millions. So we really are getting to a point in our field where we are uncovering genetic variation in a considerable slice of humanity. And of course, not only are we uncovering genomic variation, but often we link that to information about outcomes, about disease, about cellular phenotypes, tissue-level phenotypes, and organismal phenotypes, not only in medicine, but also in anthropometry, for biomarkers, for lots of different diseases and traits. And here's one of the databases. It's an NHGRI- and EMBL-supported database called the GWAS Catalog, which is a repository for information from genome-wide association studies since their inception back in 2005 up until today in 2020, and which these days has grown to represent hundreds of millions of participants, but also, broadly, over 3,000 diseases and traits. So it's a very exciting time to be a researcher in human genomics. And the outcome of this is that we have seen genomics come into new arenas, certainly in medicine, not only in terms of diagnostics and family planning, but also into new arenas of preventive health and pharmacogenomics.
And of course, you as a citizen can take your credit card and go to a company as a consumer and get your genome sequenced or genotyped. But with this exciting technology and the growth of knowledge and databases come some challenges that I think we have to face. Here's one of them that is the focus of my research, which is that genomics is failing on diversity. And by that, what I mean is that the enormous databases that we have built in the last two decades, which are the engine of data that we're using to drive discoveries and knowledge that we're bringing into other arenas, are biased in their representation of humans on the planet. So take the GWAS Catalog as an example, but really you could pick any database as an example. It looks very similar. About 80% of participants in the GWAS Catalog are from European ancestry populations, actually usually specifically from Northwestern European ancestry populations. And there is under-representation, or sometimes even a lack of representation, of many other populations and places on the planet. So I want to talk a little bit about what the impact of this bias in our knowledge bases has been. And I want to walk us through some examples from genomic medicine and bring us into examples in the knowledge of common disease and even the future of preventive health. So two of the genes that are possibly the most characterized genes in the genome, which were discovered back in the mid-90s, are the BRCA1 and BRCA2 genes. They were discovered because carrying a pathogenic variant in either of these genes puts individuals at risk for breast cancer. Fast forward about 25 years, and we actually know a lot more clinically about what is going on in individuals who harbor a pathogenic variant in one of these genes.
We know about this now not only in terms of average lifetime risk, but also within different age groups, and not only in women and for breast cancer, but also in men for breast cancer, and for many other cancers like ovarian cancer, prostate cancer, pancreatic cancer, and melanoma. But this was quite shocking to me at the time we wrote a paper that came out in Genome Medicine in 2019, led by Noura Abul-Husn, who's the co-director and clinical director of the Institute for Genomic Health: even though we had known so much about these genes and variants in these genes, we still didn't know a lot about them in diverse populations. So this was something we could remedy by looking in diverse populations in New York City and cataloging the prevalence of BRCA1 or BRCA2 variant-positive individuals in different populations in New York City with recent ancestry from different countries in the world. And in doing so, we found that there is quite a difference in the prevalence of harboring a BRCA1 or BRCA2 mutation in different populations, ranging from about one in 50, particularly in Ashkenazi Jewish populations due to well-known founder mutations in that population, to one in 500 in, for example, populations from the Dominican Republic. But actually, one of the more shocking things about that study was that 74%, or three-quarters, of people who harbored a BRCA1 or BRCA2 mutation were unaware of the cancer risk that they harbored. And so that says that even though we have learned a lot in research, and hopefully increasingly we'll learn a lot about risk across diverse populations, that information is not actually getting into clinical care as much as we would like or as quickly as we would like. So that has really led, I think, to a lot of interest and investment in generating genomic data within health systems for the study of how we translate findings from genomic work and implement them in clinical care.
This was something that really brought me back to New York City as I was finishing my postdoc and looking for faculty jobs. As a population geneticist with these interests, New York City just seemed like the place to be, a canonical melting pot of the U.S. that really represents diversity of populations not only from the U.S. but also from around the world. So New York City has its five historic boroughs of Staten Island, Brooklyn, Queens, the Bronx, and Manhattan. And if you divide that into the 70 or so storied neighborhoods of New York, then you can paint those neighborhoods different colors based on the majority population group living in each neighborhood. And this is using data from the U.S. census from 2010, hopefully soon to be updated with the 2020 census. And what you can see in New York City is that the neighborhoods are very colorful, representing neighborhoods that are majority Hispanic, white, Black, or Asian, or, in purple, neighborhoods that have no one majority population group. But that's only one way to look at diversity in New York City. Another way to think about diversity is in terms of people's stories and their heritage and their experience and their families and their ancestries. And this is the topic of a website called Humans of New York that I recommend to anyone, if you're interested, where the author just walks around the streets of New York and meets people and asks them questions about themselves. And in those anecdotes and stories you learn a lot about the rich cultural history of New Yorkers and their ties to New York City, and also beyond New York City, to where they or their family may have come from.
So in the School of Medicine at Mount Sinai we have established a biobank, which is a repository of DNA. This was established in 2006, and we started recruiting in 2007. It really took a lot of foresight on behalf of leadership in the hospital to understand early on that many of the databases that were growing in the field of human genomics research may not be representative of the patients that we are serving in the hospital, and that therefore it was worth making an investment to develop our own resource for linking genomic information to health outcomes and driving research in diverse populations. So today there are about 60,000 participants enrolled in the BioMe Biobank. It's more or less a population-based biobank. There was a little bit of ascertainment for particular diseases in the beginning, but after a while, more or less the only inclusion criterion is that you are a patient in the Mount Sinai health system. At this point we've generated, through both grant funding and industry partnerships, sequence data, whole genome, whole exome, and array data, for approximately half the participants, with another tranche of data due next year. At the point of enrollment for the biobank we collect census-level data that includes questions about people's personal and family health history, but also questions about people's ancestry and self-reported identity. So if we take that information and we ask about the diversity in the biobank, we can see in the BioMe Biobank that about 36 percent of participants were born outside the mainland US.
This is not unexpected. It's very consistent with the denizens of New York City, and if you go to nyc.gov you see similar statistics about the demographics of New York City. And you can see, by painting different countries in the world where people were born, that there's quite a lot of representation here, with particular enrichment for people who were born in areas of the Caribbean. And if you go back two generations, so BioMe participants with three or more grandparents born outside the US, you see that that patterning gets even richer: about 65 percent of participants have recent ancestry from somewhere else in the world. The patterns change a little bit, because of course now we're going back in time to approximately the beginning of the 20th century and world events that were happening at the time, but you can see that there are about 160 countries in the world represented here in terms of ancestry and recent country of origin and stories of diaspora migration to New York City. Now if you think about that from a genetic lens, then you start to see another lens upon ancestry. This is a tool that we use in population genetics a lot. It's called a principal component analysis, and we're looking at this data on two dimensions only. In fact the data really is very, very multi-dimensional, but visually that's hard to see, so we look at it first on two dimensions. Every dot here represents the genome of an individual, and the distance between any two dots represents simply how similar or dissimilar the genomes of those individuals are to each other. We use this as a tool in population genetics to then make an inference about the genetic ancestry of participants, and to help us out here I've also included, in colors, populations from seven continents: Africa, the Middle East, Europe, South Asia, Oceania, East Asia, and the Americas. So straight away I think you can notice something very striking, and that is that, at least in New York City, the genetic ancestry of New Yorkers is
represented very much on a continuum of genetic diversity that is really overlapping with ancestry from seven continents around the world. And at least in genetic ancestry space, it is very hard to divide people up into distinct groups. Rather, what we're seeing here is a continuum of genetic ancestry that is very consistent with our history as a species, our origins in sub-Saharan Africa, and our migrations throughout almost everywhere in the world since then. One of the interests in my group is to also think about ancestry not just in a more ancient context but in a very recent context, particularly in the types of personal experiences and migrations and family stories people have about their migration to or through New York City. And so we use a tool in genomics called identity-by-descent sharing. This is basically where you may share long haplotypes of your genome, identical, with a relative who may be cryptic to you in your history by virtue of being separated by 5, 10, even 50 generations, but with whom you still share identical tracts of genetic ancestry. For the most part these are people walking around in a population that you may not know you're cryptically related to, but we can pick this up genetically. And what this tells us a lot about, certainly at a population level, is more fine-scale demographic events, for example bottlenecks, patterns of endogamy, patterns of migration, and patterns of admixture. One of the interests in my group is really to understand how this recent population structure, particularly as it coincides with an era in human history where we have really accelerated intercontinental travel and exploded in our numbers on the planet, how all of these factors can actually impact diseases and health outcomes. So to do this, and I won't go into too much detail because this paper was just accepted in Nature Communications and it's also up on bioRxiv, but just to let you know, these types of methods involve a lot of kind of heavy
lifting on the software engineering side, and so we had to develop a method called iLASH, using locality-sensitive hashing, to allow us to detect IBD at a population level for every pair of individuals in BioMe in reasonable time. This is work that was led by Ruhollah Shemirani and Gillian Belbin, with colleagues José Luis Ambite and Chris Gignoux. And so if we detect these distant signatures of cryptic relatedness at a population level in New York City, and then we use community detection to find communities of people who have more IBD sharing within the community than between communities, and then we correlate that with information that we know about recent country of origin or cultural or other types of population group labels, then it turns out that this way of detecting recent history and demography in New York City is highly correlated with country of origin and different population groups. So for example, you can see that we can distinguish with quite good accuracy people of Puerto Rican ancestry from people of Dominican ancestry or people from the Dominican Republic. And this turns out to be important if you're interested in understanding whether there are particular health outcomes that are enriched, or sometimes even depleted, so protective, in those communities, particularly when these communities are understudied. So this was something we did by linking these communities to the health system data and repurposing billing codes, so ICD-9 codes, for research. This data is not data that's collected for research, but you can be opportunistic and link this data, because it does carry information about the health of patients in our health system, and use it for discovery research. We used an approach called phenome-wide association, and we looked for ICD-9 billing codes, which we had collapsed into something we call phecodes, that represent diseases that might be enriched, in this case, in the Dominican community and not so in other
communities. And straight away we found a particular cluster of these phecodes that were highly enriched in Dominican populations, and these phecodes encoded a number of codes that were peripheral artery disease or related to peripheral artery disease. So for those who don't know, peripheral artery disease is a narrowing of arteries that reduces blood flow to the limbs. It's usually caused by atherosclerosis; it's usually a downstream effect of atherosclerosis. It can cause pain from compromised blood flow, and the progression of this condition can lead to things like ulcers or even amputation. But much like atherosclerosis, there are many interventions for this disorder, including things like statins. But of course we don't know that there's necessarily a genetic component to why we're seeing an enrichment of peripheral artery disease in populations from the Dominican Republic, because again, we didn't collect this data for research, it's just health systems data. So there could be lots of reasons. There could be something environmental going on, something societal; there could be things like issues of health access or where people are getting treated. So we tried to rule in or out what might be going on. Certainly we did not find anything previously published in the literature to suggest that this might be a disorder impacting this population at increased rates. So we decided to apply some genomic approaches that were really tailored for the unique history of admixture that occurred in populations in the Caribbean, particularly in the Dominican Republic. Due to the history both of colonization of indigenous populations that were living on the island and of the slave trade that very much came through these parts of the Caribbean, descendants today in the Dominican Republic, or living elsewhere in the US but with ancestry from the Dominican Republic, share that history in their genomes in terms of genetic ancestry. Here's one way
of looking at it. At the bottom of the plot is what's called an admixture analysis, where each bar represents a genome, and you can see that in populations from the Dominican Republic individuals can harbor African and European and Native American genetic ancestry, but at different rates. In fact you can almost tell that no one person is exactly the same as another: individuals can have high degrees of African genetic ancestry or low degrees of African genetic ancestry. And just to show you, looking at other populations in the Caribbean, you get similar patterns, but slightly different depending on the different population histories of islands in the Caribbean. So that feature of admixture allowed us to use a method that has actually been around for a couple of decades. It fell a little bit out of favor in the era of GWAS, where we're very much looking at SNP-based association, but this is a haplotype-based association that leverages the ancestry along a genome. So if we look at admixture in the Caribbean, in the Dominican population, on average individuals have 40 percent African ancestry, 50 percent European ancestry, and 10 percent Native American ancestry. However, if you look at the karyotypes of individuals, and this GIF is cycling through I think 20 different individuals, you can see that at every diploid locus of a given chromosome the haplotype itself may come from Africa, Europe, or the Americas, and any given individual may have a different haplotype at a specific locus. So we leveraged that feature of genomes in the Dominican population to perform admixture mapping. In other words, that's similar to SNP-based mapping, but also including the local ancestry inference of African, Native American, or European ancestry at the locus. This work was led by Sinead Cullina and Genevieve Wojcik, and sure enough we found a signal on chromosome 2 that was shared predominantly in the Native American and European tracts of admixture in the
populations, and this was at a locus that includes the fibronectin 1 gene, which had been previously associated with vascular thickening and has been observed with increased expression in different aortic valves and other vascular tissue. So the particular variant that was fine-mapped to this locus is a very good candidate as a variant that's enriched on the background of European ancestry in Dominican populations and that is linked to peripheral artery disease in those populations. The other thing we can learn about from population structure is founder effects. In fact we learned, to our surprise, that there are many fine-scale populations in New York that exhibit signatures of a founder population. Maybe it's not so surprising, because founder effects occur through the process of migration itself, in addition to other processes of cultural endogamy or sometimes isolation. But from a genetics perspective this is interesting, because of course after a bottleneck there is an increased chance that any given Mendelian variant might rise to appreciable frequency in a founder population where there has not yet been enough time for purifying selection to select it out. And indeed we found that was true. When we looked in the Puerto Rican population, which harbored a founder effect, we discovered a mutation in the COL27A1 gene that in a recessive state is causal for Steel syndrome. This had been described clinically previously, and then the molecular etiology of the disorder was discovered through clinical sequencing the year before. But now that we had that information, we could link back to our health system data and start to look at the clinical characteristics of the disease. And sure enough, in the five individuals who were homozygous for this variant, if we look in their medical records, they exhibit very similar clinical features, including congenital hip dislocation, extreme short stature, and things like cervical spine anomalies and large joint replacement
before the age of 50. At the time we published this work in a paper in eLife, and this work was led by Gillian Belbin. But now that we had data actually linked to a health system, we could go a little step further and explore and clinically characterize what was going on with this variant. One of the things that we were particularly interested in was whether this disease was fully recessive or whether there might be a semi-dominant or heterozygous effect. Certainly this was important, because although only five people carried both copies, 170 people carried one of them. And so when we did a PheWAS, sure enough, it turned out that even if you carried one copy of this variant, that was associated with other spinal and joint anomalies. Clinical experts who went in and did a careful manual chart review in individuals below the age of 55 uncovered that harboring even one copy of this mutation resulted in spine degradation ranging from severe, almost as severe as in the recessive state, to moderate, to asymptomatic, or at least asymptomatic as far as we could detect. So this further clinical characterization of this variant really helped us understand what might be going on. In fact, based on this research, we think that this single variant drives spine and joint degradation in upwards of two percent of individuals with Puerto Rican ancestry, and we could tell this by looking at where in the world this variant was segregating. This was exciting also because, working in a hospital, this didn't just stop with a research finding. We could cross the hall and talk with our clinical colleagues and think about points of entry to a health system where kids who were born with this disorder might come in. We worked with colleagues in pediatric endocrinology and orthopedics who might see children with severe hip dysplasia. We also talked to our colleagues in the medical genetic testing laboratory at the time, which has now spun out into a company called Sema4, and they, based on
our evidence and emerging evidence from the clinical literature, developed a test to diagnose this disorder, and now patients who come in through this system can be referred to medical genetics for diagnosis of this disorder. This is an example of a syndrome that was fairly under-studied and under-characterized in the literature, that is much more prevalent than previously thought, much more nuanced in its clinical characterization, and certainly very under-diagnosed, if our health system is an example. We did search the notes and clinical records for any mention of anybody having a diagnosis of Steel syndrome and we could not find any, even though, based on numbers, there must be hundreds of individuals of Puerto Rican descent in our health system who have this disorder. And as we sequence more and more humans on the planet, I think we're going to see this more and more with Mendelian variants, in terms of understanding what they're doing at a population level and understanding that they are probably, in some cases, more prevalent than we had previously appreciated, and maybe not always as penetrant as we had thought based on our ascertainment of severe clinical cases, but sometimes, and usually, under-treated, even when, for some diseases, good therapeutics exist.
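As a rough back-of-the-envelope illustration of the carrier arithmetic discussed above (and not the analysis used in the study), the sketch below converts a one-copy carrier frequency into expected genotype counts. It rests on several assumptions that are not from the talk: Hardy-Weinberg proportions, which a founder population may well not satisfy; a reading of the "two percent" figure as the fraction of people carrying one copy; and a hypothetical cohort size of 10,000. The function names are illustrative only.

```python
import math


def allele_freq_from_carrier_freq(carrier_freq: float) -> float:
    """Solve 2*q*(1-q) = carrier_freq for the smaller root q.

    Assumes Hardy-Weinberg proportions; carrier_freq is the fraction
    of people with exactly one copy of the variant.
    """
    if not 0.0 <= carrier_freq <= 0.5:
        raise ValueError("carrier_freq must be between 0 and 0.5")
    return (1.0 - math.sqrt(1.0 - 2.0 * carrier_freq)) / 2.0


def expected_genotype_counts(carrier_freq: float, n_people: int) -> dict:
    """Expected number of people with zero, one, or two copies."""
    q = allele_freq_from_carrier_freq(carrier_freq)
    p = 1.0 - q
    return {
        "no_copies": p * p * n_people,
        "one_copy": 2.0 * p * q * n_people,
        "two_copies": q * q * n_people,
    }


if __name__ == "__main__":
    # 0.02 is the assumed one-copy carrier frequency;
    # 10_000 is a hypothetical cohort size chosen for illustration.
    for genotype, count in expected_genotype_counts(0.02, 10_000).items():
        print(f"{genotype}: {count:.1f}")
```

Under these assumptions, a cohort of 10,000 would be expected to include about 200 people with one copy and roughly one person with two copies. A founder effect or endogamy would tend to push the number of two-copy individuals above this simple expectation.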
Okay, so in the remaining time I'm going to talk a little bit about moving from genomic medicine into thinking about preventive health, and come back again to the GWAS Catalog. Because of course, one of the things that's happened very recently, really only in the last couple of years, is that the GWAS Catalog has achieved a size that has reached that sort of tipping point in terms of predictive ability, where genomic discovery data can now be used to predict out of sample for those outcomes and diseases and disorders. But again, we're still in this situation where we know that the genomic databases are biased in representation. So a couple of years ago, and this is work led by Alicia Martin, who was a grad student at the time in Carlos Bustamante's lab while I was a postdoc there, we posed the simple question: does this bias in our discovery genomic databases impact our prediction of these outcomes and traits in diverse populations? In this paper, which was published in AJHG in 2017, we showed, using the 1000 Genomes data, that if you naively take a polygenic risk score, at least the ones that were available to us around that time, and you predict the trait in non-European samples, those predictions are actually quite biased. So in this example I'm showing you height, where if you take a PRS for height that was generated using predominantly European populations, it would predict that all non-European populations are shorter than European populations. And of course that's not true at all. There is a lot of variation in height in humans on the planet, but it's not organized by continents at all. On average across continents, height seems to be very similar around the planet, and so the empirical phenotypic evidence does not support this observation in risk prediction. That led us to conclude at the time that the differences in polygenic scores were reflecting a technical difference rather than true biology, and we noted that the prediction actually decays
with increasing genetic divergence between the discovery population and the target population. At the time we concluded that neutral human evolution was sufficient to explain these differences; in other words, differences in neutral variant frequencies and in LD architecture between populations were really what was driving them. Since that paper, and another at the time that came out from David Balding's group in London, many people are now doing research in this arena, and it turns out that our simple explanation probably isn't sufficient by itself. I'm going to get to that in a minute. But of course, when we think about using genomic data to predict in samples that were not part of the original study, we also have to consider other things we know in genomics, for example selection, and the impact of non-genetic factors like the environment, particularly for a species that has so much exposure to so many different environments around the world and lives in a very complex, structured society. These things will also affect our ability to predict. So here's a very good example of that, and I'm biased here because it was also a project that I worked on, in which we uncovered a variant in a gene called TYRP1. In a recessive state, this has a big impact on hair color in populations from the Solomon Islands in Melanesia; in fact, in two copies it explains about 50% of the variance in hair color. As you can see in this photo, this child has beautiful blonde hair as a result of harboring two copies of this variant. At the time we went looking for it in databases from Europe, because one theory was that this variant traveled to this part of the world in the early days of sailing ships that came from Europe to Australasia. In fact, we showed in that paper that we didn't think that was the case at all: we couldn't find it outside
of the Solomon Islands at the time, and we showed no evidence that this had a link back to European genetic ancestry. So we concluded in this paper that this was a variant that arose independently, driving blonde hair in the Solomon Islands maybe even before there were blonde people in Europe. To my excitement, last year, when the first tranche of UK Biobank exomes was released and there were a number of blogs about the initial findings, it turned out that the same missense variant also occurred in white Britons, in three copies out of 50,000 individuals. So it is very, very rare, and by itself it was not a strong enough observation to make a link to a phenotype, but in aggregate with other missense or pathogenic variants in that gene, they were linked to blonde hair. I point this out because that particular variant, which most likely recurred independently in that part of the world, although we still need to find that out, does seem to drive blondism in different parts of the world. But because the frequency and the effect size are so differently ascertained, that variant will be a very poor predictor from the UK Biobank to populations in Melanesia, and vice versa. So to follow up some of the work that we did in the AJHG paper, we took empirical data from a really wonderful study called Population Architecture using Genomics and Epidemiology (PAGE), which is an NHGRI-funded consortium. Work from this study was published last summer in Nature, led by Gen Wojcik. This was a consortium whose goal was to investigate ancestrally diverse populations across the Americas, to gain a better understanding not just of how genetic factors influence disease but also of how genetic risk transfers across populations. At the time, this was one of the biggest studies ascertained in non-European populations across the Americas: there were 17,000 African American, 22,000 Hispanic/Latino, about 4,000 Native Hawaiian, about four
and a half thousand Asian, actually Japanese-ancestry, populations, and then about 600 or so Native Americans and others in this group. This data was also linked to over 200 traits and outcomes, but we focused our first flagship paper on 26 of these, mainly traits and biomarkers underlying cardiometabolic conditions, some lifestyle traits, and one or two diseases, for example type 2 diabetes. Using methods that we built that allowed us to do a parsimonious joint analysis across populations together (I won't go into it, but we had to do a lot of work, because existing tools would not work out of the box), we were able to do a very well-calibrated genome-wide association study for all these traits, and this little heat graph is showing you some of the results. One of the things I want to point out is that even though, at the time we did this GWAS, there existed GWAS in predominantly European populations for all of these traits and conditions, usually five if not tenfold bigger in size, we still discovered about 26 novel loci. Unsurprisingly, that was because the loci we discovered were either rare or absent in European populations. The other thing we were able to do was replicate about 1,400 known trait-variant associations from predominantly European populations in the diverse populations of PAGE, and this really allowed us to dig into some of the questions around replication of genome-wide associations in diverse populations. One of the first things we looked at was effect sizes across population groups. This is looking at standardized z-scores from the PAGE analysis in two of our biggest populations, African American and Hispanic/Latino, compared to standardized effect sizes from predominantly European-population GWAS for the same conditions in the GWAS Catalog. What we discovered was that the effect sizes across all loci, in a mix of 26 conditions, were attenuated in the non-European populations. So in Hispanic/Latino the effect size
was attenuated to 0.86, and even more so in African American populations. Now, we couldn't rule out that there could be some interplay of winner's curse going on here in the study design, but the fact that there are two different populations with different effects suggests that something more complicated is going on. I certainly don't think that the biology or the causal variants are necessarily different; in fact, I think they're mostly shared. What I think is really going on is that the estimates of the effect size are different across the populations. One of the nice ways we could follow this up was to take the 50,000 participants in PAGE and do a meta-analysis and fine-mapping of that data with a very large European-ancestry study of height called GIANT, and compare that to a similar meta-analysis that just added 50,000 more European participants. When we did this, we showed that the posterior probability of the top-ranked SNP in the credible SNP sets, for almost 400 loci in that analysis, was significantly higher when you meta-analyze using diverse populations, even when the diverse populations are a very small percentage of the meta-analysis as a whole. The reason for that is shown here in this example, where we zoom in on a particular locus in both analyses. At the top you see the LocusZoom plot of the signal at the DOT1L locus, and the magnitude of the association is pretty similar: it's a slightly lower p-value in the European meta-analysis compared to the PAGE meta-analysis, so you get a slight improvement of signal. However, when you go down to try to understand what is the causal variant driving that signal, in the middle row, you see that in the European-only analysis you have four variants that you have very poor power to discriminate between, versus being able to identify, with very strong signal, a single variant in the meta-analysis that includes diverse populations. And the reason for
that is that LD between those four SNPs is very strong in European populations and broken apart in African American and Hispanic/Latino populations, so now you have power to discriminate causality. Okay, so I'm going to finish up with a few thoughts. As I mentioned, there are now very many people working on this question of how you generalize genomic risk predictions, not only into diverse populations but also into different age groups and different clinical or social contexts. To do this, I think we have to consider what issues need to be explored further. I already mentioned linkage disequilibrium differences, which matter particularly when you are using non-causal alleles to predict, as is common in GWAS with array and imputed data. But there are other technical things that many people have done nice work to show. In the GIANT study, for example, there was a great paper showing that even when you're using best practices in the field to control for population stratification, residual population stratification can actually bias your risk prediction in the discovery population. Even when you know the causal variant, as with the TYRP1 variant I showed you, allele frequencies and even effect sizes can differ between populations. And of course there is often background positive and negative selection at play. This tends to be thought of as a fairly uncommon event in human genomes relative to the types of effect sizes associated with disease outcomes; however, in this era of thinking about genomic risk for thousands of diseases and traits in many different populations, it crops up quite a lot and needs to be considered. And of course it's not only the genetic effects; I think even more often the non-genetic effects at play, which are linked to our own extraordinarily complex demographic history and to the environments and societies we live in, can really have an impact on our risk
prediction and interpretation of risk. In these early days, I think the evidence suggests that the relative quantitative contributions of these factors can be condition- or trait-specific, population-specific, and context-specific, highlighting that when we're thinking about calibrating polygenic risk in humans we ought to jointly consider these complexities. So I'm going to end there, with just a couple of thoughts on some of the work that I showed. On cryptic relatedness: as we build bigger and bigger databases, and particularly as those databases migrate toward being more population-based, the types you find in health systems or in biobanks, I think we're going to find more and more cryptic relatedness, and it's going to explode as we genotype or sequence more individuals. This can impact prevalence of risk, it can point us to populations with specific risk, and it can show how genetic risk or protection is shared among people who may live in very different places in the world. Genomics tells us about our susceptibilities to disorders and disease, but, speaking as a population geneticist, it can also reveal the broader story of our ancestry and our history, and how that itself is correlated with social, environmental, and behavioral determinants of health. Often when we're using genomics to look at population groups, what we're really controlling for is the environment rather than the genetics, to help us uncover the important factors going on. And lastly, to embrace opportunity in this field, I think we really need to embrace complexity and embrace diversity. I'm just going to finish up by thanking the people who led the work that I showed today. Dr. Gillian Belbin, who's an assistant professor in the Institute for Genomic Health, led the work on IBD. Sinead Cullina is a grad student who contributed to the work on the PAGE project that was led by Gen Wojcik, who is now an
assistant professor at Johns Hopkins University. Ruhollah Shemirani led the development of iLASH. Alicia Martin led the work on PRS in diverse populations, and Chris Gignoux is a close collaborator on almost all of this work, as is Noura Abul-Husn, who is the co-director of the Institute for Genomic Health. With that, I'll thank my PAGE collaborators, the thousands of participants who make this work possible, and of course my funders; many of my grants are supported by NHGRI. Thank you. Thank you so much, Dr. Kenny, for an outstanding presentation sharing all of your extensive work with us. It's so very informative. I know we are right at time, but I do want to take a couple of questions that we received. So I'm going to start with the first one: how do you generally approach issues around linking socially constructed race to genetic or genomic indicators, and what are more scientifically relevant ways to describe race? What would your guidance be to people who are newly considering these types of concerns? Yeah, and I think that people are not new to this; people have been considering these questions for decades, but it comes around in cycles. Again, I think this is something that's really at the forefront, that a lot of people are considering, maybe particularly how it relates to medicine and how we use race or ethnicity as a variable in science and medicine. I think the answer is very nuanced, because we can use genetic ancestry, which is maybe more of a proxy for our demographic history as a species in different timeframes; I gave you two examples, more ancient timeframes and more recent timeframes. That often tracks with shared biology by virtue of shared genetic variation, which can be a very good way to think about much more of a continuum, not necessarily a population group but a continuum of us, from an individual level to a family level to populations to our
species. However, I think that identity is a very nuanced and complex issue, and information about identity, although it may be more linked to society and geography and politics, can also track with things that are important for our health, like healthcare access or food security. From a research perspective, that can be very important information if you're trying to learn how those factors impact people's health. But from a medical perspective, maybe there are ways we can come away from those groupings in arenas where it causes more harm to think about people that way, and we could be more advanced and think about people using, for example, aspects of genetic ancestry that might be more appropriate. Thank you, Dr. Kenny. Another question is: what environmental covariates do you have access to within BioMe, and what are your aspirations for using genetic data to tease out the role of various environmental factors? Yeah, I confess this is new to me, but it's very exciting. Because we're working with health systems data, we can start to think about how we use features of both the exposome and epigenetic-type biological markers, but also the types of information collected by local and state governments, entities, and institutes about the environment around us: the built environment, the atmospheric environment, the cultural environment, and things like that. It's certainly not my expertise, but there are wonderful researchers in this space who have been thinking a lot about which environmental components really matter for specific outcomes. I think genetics is maybe a little new to this space, but we can bring a genetic layer in and think about modeling those things together, or with other biological markers, and increase the dimensions of the data that we're using to learn. I think we have to do that very thoughtfully and carefully and not overinterpret, but I think that certainly from
a genetics perspective it makes sense to me that this is a big missing variable in much of the work that we do, and I have no doubt that we will uncover a lot of nuanced information for discovery. Thank you, Dr. Kenny. I know we are a couple of minutes over time, so I just want to thank you so very much for your presentation today, and I want to thank everyone who joined us. We really appreciate it. Thank you all. Thank you very much. Thank you.