Good afternoon, everyone. I'm Eric Green, Director of the National Human Genome Research Institute, and I want to welcome you to this fifth lecture of the Genetics and Health Disparities Lecture Series. For those of you like me who are a little disoriented, having the podium on this side of the stage, it was like an alternate reality for me when I walked in. Twenty-two years I've been here, and the podium is always there. And I came in and I thought I was in the wrong amphitheater. There are complicated reasons why it's here, but I believe it's a permanent move, so get used to it. You know, the world is changing as we know it, and the podium position is one such feature. For those of you who are new to this lecture series, I should remind you this is actually cosponsored by four NIH institutes: the National Institute of Diabetes and Digestive and Kidney Diseases, the National Heart, Lung, and Blood Institute, the National Institute on Minority Health and Health Disparities, and the National Human Genome Research Institute. And we're pleased to be joined by another cosponsor, which is the Office of Minority Health at the Food and Drug Administration. For those again who aren't familiar with the series, the speakers have been chosen by these five different organizations to present research that demonstrates how genomics has great capability for improving health equity. The speakers in the series have approached the problem from different areas of genomics, from basic science, to population genomics, to translational and clinical research. And the lecture topics have really ranged across disciplines and frequently across the globe, as certainly you'll hear today with today's speaker. I'm actually not going to introduce today's speaker. I will tell you up front, I've heard her give talks probably four or five times over the last two or three years, and you're in for something special.
But to introduce today's speaker, actually, I'm going to invite to the stage a representative from one of our cosponsors, and that is the Office of Minority Health at the Food and Drug Administration. And representing that organization is Dr. Marty Mendoza. So Marty, please, if you'd come to the stage, and please introduce today's speaker.
Good afternoon, everyone. My name is Martin Mendoza from the Office of Minority Health, Office of the Commissioner at FDA. And on behalf of FDA Commissioner Dr. Robert Califf, Assistant Commissioner for Minority Health Dr. Jonca Bull, and Dr. Green, it is my distinct honor and pleasure to introduce our speaker for today's lecture, Dr. Sarah Tishkoff. Now as Eric alluded to, Dr. Tishkoff is one of those speakers that, due to her long history of excellence in this field, really needs no introduction. So I'm going to try and keep my remarks brief, just to allow more time for discussion and questions. But I want to share some of the highlights of Sarah's career to date with you, because she's just really, really impressive. So Dr. Tishkoff, Sarah, started off her academic career by earning her bachelor's from UC Berkeley, followed up with a master's from Yale, and then a doctorate in genetics also from Yale in '96. She went on to do a postdoc at Penn State under Dr. Andrew Clark. And from 2000 to 2007, she served as both assistant and associate professor at the University of Maryland, College Park, in the Department of Biology. So glad to have you back home here in Maryland, Sarah. Currently Sarah is the David and Lyn Silfen University Professor in Genetics and Biology at the University of Pennsylvania, holding appointments in both the School of Medicine and the School of Arts and Sciences. At UPenn, Sarah studies genomic and phenotypic variation in ethnically diverse Africans.
Her research combines fieldwork, laboratory research, and computational methods to examine African population history and how genetic variation can affect a wide range of practical issues. Sarah is a recipient of an NIH Pioneer Award, a David and Lucile Packard Career Award, a Burroughs Wellcome Fund Career Award, and a Penn Integrates Knowledge endowed chair. In 2003, she was featured as one of the 10 most brilliant young scientists in the U.S. by Popular Science magazine, and has been the keynote speaker at several symposia, including the NHGRI symposium in honor of the 10th anniversary of the Human Genome Project. She serves on the board of several journals and has been a reviewer for Nature, Nature Genetics, and Science, and several other journals. She has a long, very impressive history of NIH funding, one that most PIs can only dream about, with her current research being supported by multiple NIH R01 grants, as well as funding from the National Science Foundation. Her publication record, when I looked it up last night on PubMed, is well over a mile long, with over 100 peer-reviewed manuscripts, as well as numerous editorials, reviews, chapters, and books. Sarah's research has been the focus of numerous news articles and perspectives in Science, Nature, and Nature Genetics, and her science has been featured in hundreds of media reports, including The New York Times, The Washington Post, The Baltimore Sun, The LA Times, U.S. News & World Report, National Geographic, CNN, ABC News, and Sports Illustrated, just to name a few. She's been interviewed by the Discovery Channel, PBS, the BBC, NPR, and WTOP, and really I could just keep going on forever. I could spend the whole hour talking about Sarah. So instead, I'll just say that we are so honored to have Dr. Tishkoff with us today, and we can't wait to hear more about her research on evolution and adaptation in Africa, implications for health and disease. Sarah?
Thank you for that. Nice introduction.
I'll have a really hard time living up to that, like, amazing introduction. So I am really excited to be here today to tell you about my research, and I'm going to start off just talking about what I think are some of the key challenges generally in human genomics research. One of the big outstanding things that I think we need to accomplish is just simply to continue to characterize genomic and phenotypic diversity across a broad range of ethnically diverse human populations. We're trying to better understand the evolutionary processes that generate and maintain that variation and to understand how gene-gene, gene-protein, and gene-environment interactions contribute to both normal variable traits and disease risk. And as you heard, the focus of my research is on Africa, and there's a number of reasons for that, but here is one of the primary ones. These red dots are showing the location of fossils of anatomically modern humans in Africa, and the oldest is dated to about 150,000 to 200,000 years ago. So modern humans arose in Africa. And then somewhere in the past 50,000 to 100,000 years, small numbers of individuals migrated out of Africa and across the rest of the globe. And that demographic history has really shaped the pattern of variation that we see today. Now when we left Africa, it turns out, as is the nature, apparently, of the human species, we like to admix. We actually admixed a bit with Neanderthals, archaic populations that were outside of Africa, and Denisovans, another archaic population. So they've contributed, it's been estimated, roughly two to six percent of the non-African genome. We've also shown this was probably going on in Africa as well, but we don't have any good ancient DNA samples from that region yet. Now, this slide kind of cracks me up because it's like Estée Lauder's version of ethnic diversity amongst beautiful models. But the point is just there's not a lot of diversity.
Right? At the genome level, we're talking about less than 0.1% divergence, reflecting this recent common ancestry. But we do know that there's a lot of structural variation. There's insertions and deletions and inversions, and they're really challenging to characterize. And I think we still have a ways to go with that. And it's been shown that that could be accounting for a pretty decent proportion of variation between individuals as well. But every study has shown that the majority of genetic variation is within populations and only about 15% between populations. So I've long been interested in studying African diversity, as I said, because it's the site of origin of modern humans. It's also important for understanding the African diaspora and African American ancestry. It's an important region to study, to understand the genetic and environmental factors that contribute to both communicable and non-communicable diseases. So the three biggest communicable diseases, the three biggest killers right now in Africa, are HIV, TB, and malaria. But a lot of non-communicable diseases like diabetes and hypertension are really on the rise, particularly in urban areas. So we need to have more research done in that area. And then, of course, everybody in this room knows about the Precision Medicine Initiative. And one of the things that's been stated about that is that one of the goals is to identify variation at genes that play a role in drug response, for example. We know that these exist. And there's been shockingly little done in Africa. I'm going to talk to you today about a little bit of preliminary data that we have. But when we went to find comparative data sets, they almost didn't exist. So apologies to those who've heard my talks before, but I have to give you a little bit of background about Africa. So there are over 2,000 ethnic groups in Africa.
And they speak languages that have been classified into four major language families. In blue is the distribution of populations that speak Afroasiatic languages. And those would include Semitic speakers from the Middle East and Cushitic speakers in the Horn of Africa. In red is the distribution of Nilo-Saharan-speaking groups. And these are mainly nomadic pastoralists. You might have heard of the Maasai, for example. The widest-ranging language family is Niger-Kordofanian. And that originated in Western Africa. And the biggest subfamily is called the Bantu set of languages. They originated on the border of Nigeria and Cameroon. And there was a major migration within the past 4,000 years to the east and south and to the west and south. And it really shaped the genomic landscape in Africa. And then lastly, you have populations that have been called Khoisan. And those are the groups that speak with a click, like the San, if you saw The Gods Must Be Crazy. They have traditionally practiced a hunting and gathering lifestyle. There are actually two groups in Tanzania, the Hadza and the Sandawe, who also speak with clicks. And that was actually what really got me originally interested in doing fieldwork in Africa, to study the relationship of those populations. So I got initially interested in this research when I was in Ken Kidd's lab at Yale, and at that time they had two African populations that were claimed to be representative. And those happened to be two Central African hunter-gatherer groups, so-called Pygmies, who were not at all representative. And there was just so little known about genetic variation, because there just wasn't a lot of DNA material that was available from ethnically diverse populations. So starting in 2001, myself, my students, and African collaborators have been doing fieldwork in Africa, and I thought I'd just show you a couple of shots so you can see what it's like.
We are focusing on minority populations who practice indigenous lifestyles. They're typically in very rural areas, requiring use of a four-wheel drive vehicle. And this is work that was carried out largely by Simon Thompson and Alessia Ranciaro. And this is from Tanzania. These are members of the Hadza hunter-gatherers, who speak with a click. This is our anthropometric measurement setup. More recently, I had funding actually from NIGMS to go to Cameroon. These are my colleagues from the University of Yaoundé Medical School and my grad student, Megan Rubel, and Alessia and Eric Mbunwe. And just to give you an idea of some of the challenges we've run into, there's just the challenges of working in really remote areas. This was in Ethiopia during the rainy season. You can imagine what that was like. And then in Cameroon, this was our biggest hazard. It was these big giant logging trucks, because most of the time we're not on the paved roads. We're on, you know, dirt roads. And when you're behind them, they're kicking up a ton of dust and they're going like five miles an hour and you can't pass them. And when they're behind you, they're going like 95 miles per hour and they're running you off the road. So in all of these regions, we're collecting biological samples. We're getting DNA, RNA. Now we're getting frozen plasma from blood. We're getting fecal samples to study the microbiome, urine samples to look at metabolism. And then we're getting very detailed ethnographic information and information about diet and whatever health information we may be able to obtain. It's a huge challenge to process these. It took many years just to figure out how to collect this data because there's typically no electricity. So now we bring a generator with us, or if we had to, we resorted to using the car battery at times. One really important component of this work is we've tried very hard to do this in as ethical a manner as we can.
And what that means is it takes a long time. I'd say the average time is about five years for any region we go to. Because a lot of time is spent initially in community discussions talking about the project, the risks and benefits and questions they may have, and making sure they understand. And then we have to go through all the ethical reviews, both in the countries where we work as well as at our local university. We also think it's really important to return results to participants, and I could tell you they really, really appreciate that. They don't typically read Science, but we translated this into the local language in sort of layman's terms. And of course training and capacity building is absolutely crucial right now. The H3Africa program is putting a lot of money into this, and I think it's great and something we need to continue to do. So we also measured phenotypic variation, very detailed anthropometric variation, not just height and weight but like limb length and grip strength, skin pigmentation. We looked at cardiovascular, lung, and blood phenotypes, metabolic function, lactose tolerance, glucose tolerance, we've done taste perception tests, and when possible we look at infectious disease status. So our approach is what I would call an integrative evolutionary genomics approach. We're using different types of omics technology to look at how those are impacting each other and how environment is interacting with these to impact the phenotype. Today I'm just going to focus on one of these. I'm going to focus on our genomic studies. So working with populations in Africa, one of the interesting things we can do is to try to disentangle genetic and environmental factors that are influencing variable traits. So in this case, for example, we can look at individuals, say, from the same ethnic group, so you're controlling for genetic ancestry, but living in a rural versus an urban location. Or you can do the opposite.
For example, these are two ethnic groups that live in the same village. These colors represent different ancestries. The Fulani have an innate resistance, relative resistance, to malaria for reasons we don't know. And the San in Botswana have an increased susceptibility to TB relative to the other populations. Here's another striking example just looking at BMI. On the left is a woman from the San population, who speak with clicks, from Southern Africa. On the right is a Herero, a Bantu-speaking pastoralist. So we're seeing pretty striking differences. So I'm going to tell you now about a study that was published a while ago. But it's still the largest study to date of ethnically diverse Africans. We looked at over 2,500 Africans from 121 ethnic groups shown here. But you can see there's a lot of gaps. There's still a lot to be studied. We looked at 98 African Americans from four regions in the U.S. and greater than 1,500 individuals from comparative non-African groups. So this is just showing the levels of genetic diversity. The higher the bar, the more diversity there is. It's color coded by geographic region. And we see that Africans are the most diverse. And we see decreasing variation as you go west to east across Eurasia, into East Asia, Oceania, and the Americas. That's reflecting that bottleneck when we left Africa and a series of founder events as we migrated across the globe. This is a phylogenetic tree made from pairwise genetic distances between populations. You can't see any details. But I want to just point out that here, shown in colors, are the African populations. These are the non-African populations. We see that populations cluster by geography, but not by what we would think of as racial classification. So here are people from India, Central Asia, Europe, East Asia, Americas, Oceania, North Africa. And even in Africa we see clustering by geography, Eastern Africa, Western, Central, and Southern Africa, with a couple of exceptions.
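As an editorial aside, the diversity comparison described above (more diversity in African populations, less after bottlenecks and founder events) can be illustrated with one common summary statistic, expected heterozygosity. The talk does not say which statistic the bars show, so this choice is an assumption, and the allele frequencies below are made-up toy numbers rather than data from the study.

```python
def expected_heterozygosity(allele_freqs):
    """Mean expected heterozygosity, 2p(1-p), across biallelic sites.

    allele_freqs: list of alternate-allele frequencies, one per site.
    """
    if not allele_freqs:
        raise ValueError("need at least one site")
    return sum(2.0 * p * (1.0 - p) for p in allele_freqs) / len(allele_freqs)


# Hypothetical frequencies at five sites (illustrative only). The second
# population has more alleles near loss or fixation, as one might expect
# after a founder event.
toy_freqs = {
    "source_population": [0.50, 0.40, 0.30, 0.45, 0.20],
    "founder_population": [0.50, 0.10, 0.00, 0.30, 0.05],
}

for name, freqs in toy_freqs.items():
    print(name, round(expected_heterozygosity(freqs), 3))
```

In this toy case the founder population scores lower because several of its sites have drifted toward fixation, which is the qualitative pattern the slide describes.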
Now another way we can look at ancestry is to use a computational approach. This is a method, software called STRUCTURE, some of you might have heard of that. And we just ignore all the labels. And we just use the genetic data to try to infer the number of ancestral populations or population clusters. And they're represented by the different colors here. Then we put the labels back on. And this is actually made up of a number of lines, and each one represents a person. And they can have ancestry from these different ancestral population clusters. Here are the non-Africans. People who self-identify as Middle Eastern or European are shown in blue. People from India here, Pakistan, Central Asia. We see a lot of admixture here. East Asia, Oceania, and the Americas. And you can see in Africa a lot of diversity. And so more actually than we're seeing on a global level. So lots of variation amongst populations in Africa. Just pointing out some of the trends: in orange, these are the groups that speak the Niger-Kordofanian or Bantu languages from Western and Central Africa. In purple are people who speak Afroasiatic languages from Ethiopia; in red, Nilo-Saharan languages. We think they came from southern Sudan. And then the hunter-gatherer groups like the Hadza, Sandawe, Pygmies, and the San. Now, if we do the same thing and we just look in Africa, and then we pool people together just for ease of presentation, what I'm showing you here are the three groups representing the groups that were in the HapMap study. And of course we now know that the 1000 Genomes Project has focused mainly on the Yoruba from Nigeria. It's now expanded to include other African groups. But they all speak a Niger-Kordofanian language. So they all can trace back their ancestry to that region. So genetically they look very similar to that Yoruba population. And this is what's being missed. A huge amount of diversity in Africa.
And we can see that the variation we see in East Africa is different from Southern Africa, Western, and Northern. And we think that this is due to both the demographic history and adaptation to very diverse environments. And as I'm going to show you, we know that there are some functionally important variants that are common and specific to particular regions. So we really need to be encompassing a broader range of African populations in human genomics research. What about African-American ancestry? This is going to be absolutely no big surprise. Predominantly, West African ancestry from Niger-Kordofanian-speaking populations. And then the next ancestry we see in large proportions is European, ranging from zero to greater than 50%. Very small amounts of ancestry from other regions. But keep in mind, these are just four populations. If you look in other regions of the U.S., you'll see something slightly different. And this is of course reflecting the history of the slave trade, which was predominantly from Western Africa. Now note that one of the biggest sources of slaves was actually Angola. And we know very little about variation there. It's a place that's been hard to go to. There are landmines. So maybe in the future. Then together with Carlos Bustamante's group, we wanted to see if we could look at the fine-scale variation in Africa. So we looked at several hundred thousand single nucleotide polymorphisms, genetic variants. And you can see that if we do a principal component analysis, where each dot represents a person, and if they cluster close together, it means they're genetically more similar to each other, it looks remarkably like the map of Africa. And this is because of isolation by distance. Populations migrated. They tend to marry or, you know, interbreed with people who are nearby. And they tend to differentiate a little bit. But that's a really subtle difference.
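As an editorial aside, the principal component analysis described above can be sketched on a toy genotype matrix. This is a minimal illustration only, not the pipeline used in the study: the genotypes are invented, the function name `first_principal_component` is hypothetical, and power iteration on the sample-by-sample matrix is just one simple way to get the first component without external libraries.

```python
from math import sqrt


def first_principal_component(genotypes, iterations=200):
    """Return each individual's coordinate on PC1 (up to sign and scale).

    genotypes: list of rows, one per individual, each a list of 0/1/2
    alternate-allele counts at the same SNPs.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Center each SNP column so PCA captures variation around the mean.
    col_means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    centered = [[row[j] - col_means[j] for j in range(m)] for row in genotypes]
    # Sample-by-sample similarity (Gram) matrix.
    gram = [[sum(a * b for a, b in zip(ri, rk)) for rk in centered]
            for ri in centered]
    # Power iteration from a fixed, deterministic starting vector.
    vec = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        nxt = [sum(g * v for g, v in zip(row, vec)) for row in gram]
        norm = sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            raise ValueError("genotype matrix has no variation")
        vec = [x / norm for x in nxt]
    return vec


# Made-up genotypes: the first three individuals resemble each other,
# as do the last three.
toy_genotypes = [
    [0, 0, 1, 2, 2, 1],
    [0, 1, 0, 2, 2, 1],
    [0, 0, 0, 2, 1, 1],
    [2, 2, 1, 0, 0, 1],
    [2, 1, 2, 0, 0, 1],
    [1, 2, 2, 0, 1, 1],
]

pc1 = first_principal_component(toy_genotypes)
for label, score in zip("ABCDEF", pc1):
    print(label, round(score, 3))
```

Individuals with similar genotypes end up with similar PC1 coordinates, which is the sense in which dots that cluster together on the slide are genetically more similar.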
Because what we originally wanted to do was try to see if we could trace ancestry to different ethnic groups in Africa, and we couldn't do it. Because those differences are really, really tiny. What we could do is, along principal component one, here is a sample of African-Americans, Europeans, and the Western African groups we looked at. We can, with high accuracy, determine if an African-American individual has more European ancestry or African ancestry. And Carlos's group came up with a method in which we can essentially slide across the chromosomes. Here are chromosomes 1 through 22. You can slide along and look at regions where there's 100% African ancestry, 100% European ancestry, or mixed ancestry in green. We see a lot of variation. This would be, you know, typical. But you see some people who self-identify as African-American where we don't see a lot of African ancestry. Was it mislabeling, or is it just telling you that people may identify in different ways? But keep in mind that, you know, imagine that there was a gene that plays a role in drug metabolism that was located right here. Someone self-identifies as African-American, or maybe the doctor looks at them and says, you know, I think you're African-American. Say that there were differences in how effective drugs are in different populations. They might say, well, I'm going to give you this medication based on what I perceive your ancestry to be. But at that position, they're actually 100% European. And that's why a more personalized approach is important. All right. So I want to tell you about some new data that we have. We've now done high-coverage whole-genome sequencing. And this is part of the Simons project, in collaboration with David Reich's group. And this was work being done by a very talented postdoc, Shaohua Fan. We've sequenced 94 individuals from 44 African populations from very diverse regions, as you can see here. We can construct a phylogenetic tree.
And we can see it's similar to the other tree. If we were to, oops, I just went to the wrong thing. If we were to look at non-Africans, they'd be coming off here. So here are sort of the Ethiopian, the Afroasiatic and Nilo-Saharan-speaking groups. The Hadza, Sandawe, the Bantu and Niger-Kordofanian-speaking groups. Pygmies from Central Africa, who have some of the oldest lineages. And the absolute oldest lineages are the San. So if you trace their lineages back, they're at the root of this tree. So one thing that we can do is there is a computational approach where basically you try to phase these genomes, and that is somewhat challenging to do. But if you can, you can compare the genomes of individuals from different populations. We look at the coalescence of lineages. And from that, you can try to infer the time of divergence and also the historic population sizes. And so we have inferred that the deepest, and this is very consistent with what other people found, the deepest split is between the San and other populations at about 100,000 to 120,000 years ago. Then the San split off from the other hunter-gatherers, the Hadza and the Sandawe and the Pygmies, roughly 60,000 to 80,000 years ago. Then we have more recent divergence between the Niger-Kordofanian, Nilo-Saharan, and Afroasiatic between about 22,000 to about 40,000 years ago. And relatively, well, not that recent, the divergence of the Hadza and the Sandawe roughly 20,000 years ago. And interestingly, subgroups of the San diverged around 30,000 years ago. And that timing actually happens to be around the time of the last glacial maximum, about 22,000 years ago, when it was very arid and dry. It may be that populations separated into different refugia. And that's when that divergence occurred. So in other words, though, this substructure we see has been around for a very long time in Africa.
When we try to infer historic population sizes, interestingly, these hunter-gatherer groups historically have the largest population sizes. And in fact, if we were to put Europeans on here, they'd be right around here. Now, I want to talk about some other new data that we have looking at genes, ADME genes, that play a role in drug metabolism. This is in honor of being invited partly by the FDA. And this is work that has been done by a talented graduate student who has actually just been rotating in my lab and a postdoc, Nancy Lowe. So we looked at genetic diversity at 21 ADME genes in diverse Africans from eastern and southern Africa. And these are the genes shown here. And the first thing I'm showing you is just the level of diversity. And the different colors represent the different ethnic groups. So the point is just to show that there's a lot of variation depending on the gene in terms of the level of diversity. We see one of the ones that has the highest diversity is NAT2, that's a gene that plays an important role in metabolizing drugs that are used to treat TB. Now, out of these genes, we found a total of a little bit over 13,000 single nucleotide variants. About 70% are in the database and about 30% are novel, looking at these African populations. Now, if we look at the ones that are causing a non-synonymous substitution, so meaning a mutation that causes an amino acid change, we found a total of 220. But now look at this, the majority of those are novel. So when we look at the non-synonymous, they tend to be novel. And 82% are predicted to be potentially deleterious when we use the software called PolyPhen. So the novel ones are sort of the open boxes. And we could see that a larger proportion of the novel ones are predicted to be deleterious. And here, I'm trying to show that of all these non-synonymous SNPs, 138 of these, they tend to be at very low frequency, all right? So many of these non-synonymous variants tend to be at low frequency.
And the potentially damaging variants often are restricted to one or a small number of populations. So here I'm showing the number of populations that have a minor allele frequency. For example, here, the possibly damaging and probably damaging, it's sort of the orange colors, the ones that are common. So you could see that often we see these variants in a single population, sometimes we see them in two, typically not more than five. But it's something to be aware of, that you have these potentially deleterious mutations and they may be very specific to certain populations, and they can differ quite a bit in allele frequencies. So we also see that several of these genes have an excess of non-synonymous variation that might be due to selection maintaining diversity. Again, NAT2 is one of those. And we can use another test of selection called Tajima's D. It's based on the allele frequency distribution. Without getting into details, I'll just say that if you have a positive value, that can be consistent with the type of selection that maintains diversity. We call it balancing selection. And if it's negative, that could be consistent with a single variant being under very strong selection and rising rapidly to high frequency. And what's interesting is amongst these genes, you see a really different pattern. Some of them are maintaining diversity. Some of them you see less diversity than you'd expect. And some of these, for example, the one at the very end, UGT2B7 at the end here, we're seeing really divergent patterns between populations. So some are showing evidence of balancing selection, and some are showing evidence of positive directional selection. So I want to move on now from showing the patterns of genomic diversity. I want to talk a little bit about phenotypic diversity in Africa. And this is work being done by a postdoc, Matt Hansen.
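As an editorial aside, the Tajima's D test mentioned above compares two estimates of diversity: the mean number of pairwise differences and the number of segregating sites scaled by sample size. A minimal sketch of the standard formula follows; the haplotypes are invented toy data, the function name `tajimas_d` is hypothetical, and this is not the analysis code used for the ADME genes.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(haplotypes):
    """Tajima's D for equal-length haplotypes given as strings of '0'/'1'."""
    n = len(haplotypes)
    if n < 4:
        raise ValueError("need at least four haplotypes")
    length = len(haplotypes[0])
    # S: number of sites where more than one allele is observed.
    seg_sites = sum(
        1 for j in range(length) if len({h[j] for h in haplotypes}) > 1
    )
    if seg_sites == 0:
        raise ValueError("no segregating sites; D is undefined")
    # pi: mean number of differences over all pairs of haplotypes.
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)


# Toy data: variants at intermediate frequency (two distinct haplotype classes).
intermediate = ["1111", "1111", "1111", "0000", "0000", "0000"]
# Toy data: every variant is a singleton (an excess of rare alleles).
rare = ["1000", "0100", "0010", "0001", "0000", "0000"]

print(round(tajimas_d(intermediate), 2))
print(round(tajimas_d(rare), 2))
```

The first toy sample gives a positive value, the pattern the talk associates with balancing selection, and the second gives a negative value, the pattern associated with a variant sweeping to high frequency. Demographic history can produce the same signs, so a value on its own is only suggestive.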
So we've measured 22 traits, cardiovascular and anthropometric traits, in between 500 and 5,600 individuals in this particular study. And these are groups that have very different diets. So it includes people who are practicing hunting and gathering. So they have a diet very high in nuts and tubers and somewhat in meat. People who practice a pastoralist lifestyle have a diet very rich in milk, blood, and some meat. They also exchange corn with neighboring groups. And then the agriculturalists will have a lot of grains in their diet. One of the first anthropometric traits I'm going to mention is skin color. And I also mention this because there's been a lot of discussion lately about race, race and genomics. When people think about races they often think about skin color. It's a really terrible way to classify people because it's thought to be an adaptive trait. We see dark skin color in many places where there's a lot of UV light. What people don't realize is that in Africa there's a huge amount of variation actually in skin color, with the most lightly pigmented populations being the San and the most darkly pigmented being the pastoralists who speak Nilo-Saharan languages. And we now know that this is due to different genetic variants. Now looking at some anthropometric traits, let's start actually with height. And we've color coded these according to the diet, the subsistence pattern. And we could see a huge variation in height. The shortest-statured populations are the so-called Pygmy populations from Central Africa. And the tallest are the pastoralists, the East African pastoralist groups. If you look at BMI, the pastoralists have the lowest BMI, so they're very tall and thin. Some people have said that may be an adaptive trait. And consistently the agriculturalists, the food producers, tend to have the highest BMI.
If we look at some cardiovascular traits like blood pressure, we see significant differences, with significantly lower blood pressure in the pastoralist populations compared to the others. We've also looked at a number of lipid biomarkers like triglycerides, for example. Again, you can see significant differences between populations and also based on subsistence patterns. Now if we look at the extremes of the phenotypic distribution for these different traits, we can ask the question, which subsistence patterns are enriched at those extremes? For many of the anthropometric traits, the pastoralists tend to be enriched for the extremes of being taller, for example, or longer limb length, as opposed to the hunter-gatherers. But if you look at, for example, the cardiovascular traits, like here is, let me see if that's correct, pulse, it seems like there's an enrichment for higher pulse rate in the hunter-gatherers, and LDL is very high in the pastoralists. Now we used a different data set, a million single nucleotide polymorphisms, to infer ancestry the way I told you before in a number of these individuals for which we have phenotype data. And we ask the question, what's the association between ancestry and these phenotypes? And then we compared it to what you get if you label people based on diet. When we look at height, ancestry is really strongly correlated with height, and diet not at all. Okay, so that's consistent with a strongly genetic trait. But if you look at blood pressure, it's also very strongly correlated with ancestry. You look at BMI, not so much, right, as we would expect. So now we're trying to look at genome-wide association studies to try to identify genetic variants associated with these traits. I also want to mention, this is literally hot off the press. I barely had a chance to look at this data, but we have a grant from NIDDK, this is together with Chuck Burant at the University of Michigan.
And we started looking at metabolic variation in diverse Africans. So here we're looking at 450 individuals, and the different colors represent different ethnic groups. There's no point labeling because it gets too complicated. And this is looking at a lipidomics panel of about 400 lipids. What you can see from this, though, is we are clearly seeing clustering based on ethnicity. Okay, so we do see differences between ethnic groups. And this becomes even more striking if we just, like, tease this apart a little bit. If we look at three different pastoralist groups from three geographic regions, they actually are easily distinguished. So here's a case where their genetic ancestry is different, but the diet is the same, and we can distinguish them here. Here's the case of the Hadza. There are only 1,000 Hadza hunter-gatherers. They live around this lake called Lake Eyasi. We sampled from two regions. One of them is living a more traditional lifestyle. The other one's a more settled lifestyle. We can actually distinguish them with this metabolomic analysis. We then did, oh, just to show you, in terms of differences in lipids, you can look at things like the total amount of lipids. When we do that here, we actually see the highest in one of the agro-pastoralist groups. Or you can look at the relative composition. We see that triglycerides are very low in the hunter-gatherers and very high in the pastoralists. We then looked at untargeted metabolomics. You can detect about 4,000 metabolites. And again, if we label these based on diet, we're seeing pretty striking differences. Here's the pastoralist, agriculturalist, and agro-pastoralist in red and blue. And then the hunter-gatherers in the red here. And here, if we just focus on the named compounds, of which there are about 300, again, we can clearly distinguish based on diet. And we also see differences amongst ethnic groups. So as I said, hot off the press. 
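The distinction drawn above between the total amount of lipids and their relative composition can be illustrated with a minimal sketch. The function name `lipid_summary`, the lipid class names, and all numbers are hypothetical and chosen only for illustration; they are not data or methods from the study.

```python
def lipid_summary(concentrations):
    """Return (total, relative_composition) for one sample.

    concentrations: dict mapping a lipid class name to its measured
    amount, with every amount in the same (arbitrary) unit.
    """
    total = sum(concentrations.values())
    if total <= 0:
        raise ValueError("total lipid amount must be positive")
    # Relative composition: each class as a fraction of the total.
    relative = {name: amount / total for name, amount in concentrations.items()}
    return total, relative


# Hypothetical samples, not real measurements.
sample_a = {"triglycerides": 20.0, "phospholipids": 60.0, "cholesteryl_esters": 20.0}
sample_b = {"triglycerides": 120.0, "phospholipids": 60.0, "cholesteryl_esters": 20.0}

total_a, rel_a = lipid_summary(sample_a)
total_b, rel_b = lipid_summary(sample_b)
print(total_a, round(rel_a["triglycerides"], 2))
print(total_b, round(rel_b["triglycerides"], 2))
```

Two samples can differ in total lipid amount, in the share contributed by one class such as triglycerides, or in both, which is why the two views are reported separately.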
Now the next part is to do a more detailed analysis and integrate it with the genetic data. I'm super excited about what we're going to find from this. So for the last part of my talk, I want to talk about our studies looking at genetic signatures of natural selection, adaptation in Africa. This is important both to learn about human evolutionary history, but also because for mutations associated with disease in modern populations, it's been hypothesized that maybe they were adaptive in the past. And maybe that's why they're so common. So if we can find them, it gives us a clue about disease risk. Now here is a subset of the populations we've looked at. Some were at high altitude, some at low altitude, with very different diets and very different infectious disease exposures. So they've undergone local adaptation most likely. I'll briefly mention sort of a classic example, which is the genetic basis of lactose tolerance, done by Alessia Ranciaro in my lab. So the ability to drink milk as adults, or lactose tolerance, has long been known by anthropologists to be an adaptive trait. Most mammals can't drink milk as adults and most of the world's population can't drink milk as adults. Only people whose ancestors practiced dairying are able to do that. So mainly we see it really common in Northern Europe and then less common in the Middle East, not common in East Asia, not common in Native Americans, not common in most West Africans, which is why it's also not common in African Americans, lactose tolerance that is. But we do see this to be common in East African pastoralists. And in 2002 there was a beautiful study by Leena Peltonen's group. They found the genetic variant near the lactase gene. It's a regulatory variant. It's altering gene expression in Europeans. So we sequenced that region in the Africans and they didn't have it. So we knew they had to have something else. 
So we gave them a lactose tolerance test where basically we give them the sugar lactose and everybody drinks this at one time, and then we use just a diabetes monitoring kit and you look at the blood glucose levels. That's actually more challenging than you might think. They have very calloused hands. It's sometimes hard to do that, and it is a timed test. You do this every 20 minutes over about an hour and a half, and then you look at the maximum rise in blood sugar, and that's how you characterize whether they're lactose tolerant. And then we sequenced this region. We looked for associations and we found three novel variants very close to the variant associated with lactose tolerance in Europeans. And these are all located about 14,000 base pairs upstream of lactase. Now more recently, Alessia looked at the distribution of these variants in globally diverse populations, and what's striking is the most common one, which we see at position 14010, is really specific to Kenya and Tanzania. We see it a bit in Southern Africa, where we think it was introduced by migration. This other variant appears to have arisen in the Middle East and was introduced by migration into the Horn of Africa, and another variant seems to have arisen in the Horn of Africa. So again, take-home point. You can have functionally important variants. They arose through natural selection and they can be geographically restricted. We also saw a whopping signature of natural selection. If you simply plot, these are individuals who are homozygous for the allele associated with lactose tolerance. And if you look at genetic variants over 3 million base pairs, and if they're homozygous, you just draw them as a solid line, and when they're not homozygous, you stop the line. We see homozygosity extending about 2 million base pairs. If you look at the chromosomes that have the ancestral allele, it extends about 1800 base pairs. 
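The plotting rule described above (draw a solid line while genotypes stay homozygous, stop at the first non-homozygous site) can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis used in the study: the function name `homozygous_tract`, the 0/1/2 genotype coding, and the toy positions are all hypothetical.

```python
def homozygous_tract(positions, genotypes, core_index):
    """Length in base pairs of the unbroken homozygous run around a core SNP.

    positions: SNP coordinates in ascending order.
    genotypes: one individual's genotypes at those SNPs, coded as the
               count of one allele (0 or 2 = homozygous, 1 = heterozygous).
    core_index: index of the SNP of interest.
    """
    if len(positions) != len(genotypes):
        raise ValueError("positions and genotypes must have the same length")
    if genotypes[core_index] == 1:
        return 0  # heterozygous at the core site: no tract to draw

    # Extend left until the next site is heterozygous or the data end.
    left = core_index
    while left > 0 and genotypes[left - 1] != 1:
        left -= 1

    # Extend right in the same way.
    right = core_index
    while right < len(genotypes) - 1 and genotypes[right + 1] != 1:
        right += 1

    return positions[right] - positions[left]


# Toy data, not real coordinates.
positions = [100, 500, 900, 1500, 2100, 3000]
long_tract = homozygous_tract(positions, [1, 2, 0, 2, 2, 1], core_index=3)
short_tract = homozygous_tract(positions, [2, 1, 0, 2, 1, 0], core_index=3)
print(long_tract, short_tract)
```

Comparing tract lengths between individuals carrying the derived allele and those carrying the ancestral allele gives the kind of contrast described in the talk, with much longer tracts expected around a recently selected variant.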
So there's been this rapid sweep of chromosomes that have that variant, leaving a really striking signature, and we can use computational approaches to estimate the age of this mutation to be somewhere between 3,000 and 7,000 years old. By contrast, the European mutation, we estimate it to be about 9,000 years old, which corresponds really well with the archeological data for the origins of cattle domestication about 8,000 to 10,000 years ago in North Africa and the Middle East, but it wasn't introduced south of the Sahara Desert until about 5,000 years ago. So a striking example of gene-culture co-evolution. But that's a Mendelian trait. We could say that Mendelian traits are the low-hanging fruit, so let's look at a really complex trait. Let's go for height. It's one of the most complex traits we know about, at least in Europeans. It's highly heritable. Studies in now tens of thousands, hundreds of thousands of Europeans show that there are hundreds of loci involved, most of them of tiny effect. As I mentioned, in Africa we see a really broad range of height, from very short-statured central African hunter-gatherers, commonly referred to as pygmies, to very tall and thin pastoralists in East Africa, and it's thought this may be adaptive to different environments. So I'm gonna tell you about our studies of the genetic basis of short stature in central African pygmies. The distribution of these groups is shown in gray. They're typically divided into two groups, the Eastern pygmies and the Western. We focused on three groups in Cameroon whose mean male height is 152 centimeters, and we looked at the neighboring Bantu-speaking agriculturalists, who are considerably taller. Now there have been all kinds of theories about why there might be selection for short stature. But I'm gonna focus on the last one, which had to do with a demographic study that showed that pygmies, all pygmies, this trait actually arose around the world. 
We see this in the Philippines, Papua New Guinea, we see it in the Amazon. And what they all have in common is that they die at a really, really young age. So the chance of living to age 15 in this group that we're studying is about 40%. If they make it to age 15, the expected lifespan according to this research is 24 to 25 years of age. And it's really remarkable. So they're largely dying from infectious disease, but there are some other hazards of this lifestyle, like falling out of trees, as well. And if you're dying at that young of an age, there may be selection to reproduce at a younger age. And that's what they showed, that they were reproducing at a significantly younger age. Now it's thought, this has been debated, but some people have argued that there's a lack of an adolescent growth spurt in this population. And that could be then a trade-off: if you're having earlier reproduction, maybe it's at the expense of not having this adolescent growth spurt, but we still need to do more work on that. There have been very few physiologic studies done. Most of them were like in the 1980s on a handful of individuals, but they've all shown disruptions to the growth hormone IGF-1 pathway. So we're talking about a very different physiologic mechanism than what we see in Europeans, accounting for differences amongst Europeans in height. So if we look at the ancestry, here are the pygmies, they have a lot of admixture with the neighboring Bantu agriculturalists. And their height is strongly correlated with that admixture. So the more pygmy ancestry you have, the shorter you are, and the more Bantu ancestry you have, the taller you are. Very consistent with this being a genetically influenced trait. We then wanted to see how the genomes of the pygmy hunter-gatherers compare to those of Bantu agriculturalists and Maasai pastoralists. 
And we did all kinds of tests for scans of selection, and I won't get into those, but the places where we saw them in the genome are shown here. And what we noticed was a cluster on chromosome three. And we have almost no power. This is a tiny population, and this is a general problem when looking at minority populations, particularly indigenous populations: we will never have sample sizes at the level that we do in urban populations. But we can still get information, and in this particular case, because we hypothesize that this is an adaptive trait, if we just focus in on those regions that we identified as being targets of selection and then do an association with height, we have much more power. And in fact, we were able to find some significant associations, and several were in this 15 megabase region on chromosome three, and they encompassed several genes. One of them is DOCK3, associated with height in non-Africans. CISH plays a very important role in IL-2 signaling and has been shown to be associated with resistance to a number of infectious diseases. And it turns out that when CISH is overexpressed, it shuts down human growth hormone receptor. So that made me wonder, could there be selection for immune function? Because that's one of the reasons they're dying so young, and maybe that's indirectly influencing height. We then, I think this is out of order. I think I skipped a slide. We then did whole genome sequencing and we looked for regions of the genome that are highly differentiated, and we saw that these tend to be regions enriched for pathways that include genes involved in neuroendocrine signaling, reproduction, metabolism and immune function. And interestingly, we see an enrichment for genes that play a role in pituitary function. So these are some of the genes that stood out as being very differentiated specifically in this pygmy population. 
I find that kind of intriguing, that it's like a bunch of arrows all pointing in that direction. One of the top hits was thyrotropin-releasing hormone receptor. And I thought that was interesting because I've spoken with anthropologists who noted that pygmies don't get goiter to the same extent as the neighboring Bantu populations. And it made me wonder, is there something different about thyroid function in this population? Could it be maybe even a biological adaptation to a low iodine environment? But again, it has pleiotropic effects and it could be affecting other things like growth, thermal regulation, reproduction and immune response. This was supposed to have come earlier. I apologize, this was just to show that we got this data from sequencing. And when we looked in that sequencing study for variants that were very common in the pygmies and specific, or close to specific, to that population, we found about 25 clusters. And one of the biggest was right in that chromosome three region, but we missed it. We did not find it in the other study. And that's because these variants are in 100% LD over almost 200,000 base pairs, and it's essentially a pygmy-specific haplotype, and the Illumina array didn't tag it. So we wouldn't have found it if we hadn't done sequencing. And interestingly, it encompasses this gene called HESX1, and there's a non-synonymous variant. This is interesting because that gene plays an important role in development of the anterior pituitary, the site of production of growth hormone. This non-synonymous variant, however, is common, at about 20% frequency, in other African groups. So we suspect there are important regulatory regions flanking this, and we're following up on that. The other variants we found are about 300,000 base pairs upstream of POU1F1, which is known as Pit-1 in mouse. That is the transcription factor that is key for regulating growth hormone expression. These were very differentiated. 
When we genotyped them in a larger set of samples, we found an association with stature. So our hypothesis is that there are alterations in the growth hormone IGF-1 pathway that are important. They may be influencing pituitary function, and it could be that short stature is a byproduct of selection acting on pleiotropic loci. So if we look at this pathway, here are our candidate genes. In the growth hormone pathway, growth hormone results in an increase of IGF-1 expression from the liver. That triggers expression of these genes that play a role in bone growth, but it's also important for insulin metabolism, fat metabolism, and infectious disease. It alters cytokines, and those cytokines are often regulating gene expression. We just went to Cameroon. We are going to measure all of these intermediate traits to better understand what may be this co-adaptive complex in this population. All right, in my last like three minutes or something, three to five minutes, I'm going to talk about a commentary, a perspective piece that I wrote with my colleagues, Michael Yudell, Dorothy Roberts, and Rob DeSalle, that was published in Science not long ago, and it really seems to have had a pretty big impact. The title was "Taking race out of human genetics." And apparently, this was kind of a controversial figure that Science put together. We're going to shred up this concept of race. Now what we didn't mean to say is that there are no differences, and we didn't mean to say that it's not important to study different ethnic groups in genomics research. But the problem is that when biomedical researchers talk about race, they often are implying a biological classification, and there have been a lot of problems with biological classifications of race. They've been used for a number of bad things, including eugenics, justification for genocide, for example, the Holocaust, colonialism, slavery, and many other social inequities. 
And the problem with using race as an identifier is that there's just no clear definition. I challenge you to come up with a definition of race, because we've talked about this at many different workshops and we never can. And that's because it is typically classified based on both sociocultural and biological characteristics. As I hope I've shown you in my talk, it does not correlate with patterns of population structure inferred from genetic data. They don't form these clusters that correspond with what, you know, we or the Census Bureau is referring to as races, at least in the U.S. population. Now we know that ethnic groups differ in risk of disease. In fact, that's a classic feature of a genetic trait, a genetic disease, that they tend to differ in prevalence in ethnic groups. But we know that people whose ancestors lived in different environments have been subject to different selective forces and different demographic histories. Some diseases may be influenced, as I've shown you, by geographically restricted alleles. And inclusion of ethnically diverse populations is important for distinguishing pathogenic from non-pathogenic variation, particularly nowadays, when we're doing all the sequencing and precision medicine. And you can imagine you're sequencing a genome, maybe there's an orphan disease, whatever. You see a variant. Is it pathogenic? You look at the database and you say, I've never seen this before, it must be pathogenic. Not necessarily, because if you look in Africans, there's a good chance you may see it. So, you know, it's going to be important for that as well. And information about individual ancestry can provide, in my opinion, important medical information for both diagnosis and treatment. But we have to be wary of racial profiling and of ignoring the continuous nature of variation and admixture. So, for example, you know, cystic fibrosis, it's very common in Europeans. 
But in the African-American community, there's a lot of admixture with Europeans. So, someone who self-identifies as African-American may have that variant. We can't make generalizations based on continental region of origin because there can be a lot of variation amongst populations within a region like Africa. And very importantly, people differ in prevalence of disease not just due to genetic factors but due to environmental factors. And a key issue is to disentangle these factors and their interaction. So, the goal should be personalized medicine. Information about race can be important if you're trying to distinguish sociocultural and environmental risk factors for health disparities. But my argument would be, depending on the study, depending on the question that you're looking at, give more detailed information. Be as detailed as you possibly can. You might want to talk about ethnicity, geographic ancestry, religion, if you're looking at a disease that's common, say, in Ashkenazi Jews. It depends on what you're looking at. But more information is better. I'll just end with that, and thanks to the many, many people who contributed. And of course, all the funding institutes, including here at NIH, and in particular, thanks to all the people in Africa who contributed. Thanks. I'm happy to answer any questions that you may have. Don't be shy. I know there's got to be some. Do you have any data about Neanderthals out of Africa? Did they come out of Africa in an earlier radiation? Or where did they start? So the fossil record shows Neanderthals present in Eurasia, I think roughly 300,000 to 400,000 years ago. My understanding is that people think that there may have been an original split, possibly in Africa. The ancestors migrated out. So there was a divergence more than 400,000 years ago, and then they migrated into that region. That's my understanding. 
But at the same time, then again, the species preceding modern humans, Homo erectus, we know that Homo erectus left Africa a long time ago. I think 1.8 million years ago there's evidence for that. In fact, the history of the Homo lineage is that we like to migrate long distances. There have been multiple migrations out of Africa. So that species, Homo erectus, left Africa 1.8 million years ago and then evolved into different archaic populations. That's probably the origin of Denisovans, possibly Neanderthals as well. I'm not as knowledgeable about that. And then this group that was on the island of Flores, that was like a very short-statured group, with a small head and small brain. And that lineage may have diverged maybe even a million years ago. They might have been on that island for a million years, and they were walking around until like 15,000, 20,000 years ago. Okay, thank you. Sarah, you said something today that I hadn't quite appreciated before, which is that, like in the case of Angola, where we know a lot of people came from originally, we have not captured that genomic diversity because it's hard to get into that country to do the studies. And that probably applies, I would imagine, for a number of other countries, for other applications, especially in Africa. Are there any systematic attempts to try to contact and enlist individuals who have left countries like that fairly recently? So if we can't get into the country, can we get the handful of people that have come out of that country to participate? That's a great idea. I'm not aware of any such efforts. I think, of course, in the US, we have a lot of people who are recent immigrants from Africa. And that's another problem, again, when people use classifications like black, and that could represent people from all over Africa who have very diverse ancestries. I mean, that'd be a great idea, actually, to try to identify people with that ancestry, or coming from that region, I should say. Yes. 
Oh, I'm sorry. Yes. So I know some of this data that you're going to integrate with the genomics is just hot off the presses, but I'm interested in how you're going to approach the question of causality in these populations where so frequently ancestry and lifestyle are intrinsically linked. That's going to be the big challenge, clearly. And I think one of the ways we might be able to tease it apart is, I was mentioning that for several of these groups, the pastoralists, for example, we have pastoralist people who have a very similar diet, let's say, but they have very different genetic ancestry. That might be one way we can start to disentangle this. It's a big challenge, generally, when you have traits that are associated with ancestry. Because we try to knock that out when we do genome-wide association studies, and I worry we're throwing the baby out with the bathwater, because I think we may be missing important regions because we're overly correcting for structure, but those might actually be the interesting regions. And the other thing though is there's a lot of admixture in many, you know, actually most of the groups in Africa. So I'm thinking through that admixture, we might be able to have some success at trying to map some of these variants that are playing a role in these traits. The other thing for the metabolomics, I didn't show this, but in some of our earlier studies, we were able to find metabolites that were clearly due to environmental exposure. So we found differences between Ethiopians and Tanzanians for something called indica. And it's from a plant called Indigofera. And it turns out that the people in Tanzania, the Maasai, use the branch as a toothbrush. It's also used for medicinal purposes and it's super common in Tanzania. So that's almost certainly due to an external source, but it'll be interesting to also look at metabolites that are not due to an external source. 
And we can even infer, by looking at ratios of metabolites, you know, pathways, doing a pathway analysis, indirectly infer enzyme function, I guess you could say. Thank you very much for that really interesting talk. There's an observation in the field in which I work, which is organ transplantation, that African-American recipients do much worse, have much worse outcomes, regardless of the organ transplanted. So could the variation, the wider genetic variation, account for this? And if so, by what kind of pathways, and how would one go about studying this? That's a good question. And I think, you know, so I assume you're saying that even when someone who's of African-American ancestry gets a transplant from someone else who identifies as being of African-American ancestry, even then, there tend to be more complications. So again, I'm not that knowledgeable. All I know is that HLA plays an important role. And I know there's a huge amount of variation in HLA. And again, we talked about this individual variation. What if, at that locus, one individual has more European ancestry than African? That could be one source. But amongst African populations, we know there's a lot of variation. And in fact, that comes out in most of our scans of selection, right? Because HLA of course plays an important role in resistance to infectious disease. And so I'm sure it's a source of, or a target of, selection. We just don't know that much about HLA variation. So myself and other groups are interested in characterizing that more. So I think hopefully we'll find out more soon. Yeah. Hi, you mentioned at the very beginning of the talk how many of these African populations have very wide variations in skin tone, even though they're all greatly exposed to sunny UV rays. Not all. It's quite variable actually. Okay. Had you noticed any difference in functional vitamin D with respect to these differences in skin tone? 
I didn't show that. I started to measure it. I just did like a preliminary analysis in about 300 people. And I found completely the opposite of what I was expecting to find. So the hypothesis with vitamin D is that there'd be adaptation for dark skin in Africa for obvious reasons. You know, strong UV exposure, protection against melanoma. Also UV can degrade folate, right? So we could see why there'd be selection for dark skin. Then people migrate out of Africa. They go to regions with less UV. But UV is also important for production of vitamin D. And so there might have been selection for lighter skin. And let's see if I remember this correctly. So you might predict then that people with darker skin might have lower vitamin D levels. And when I looked in the eastern African populations, I found the opposite. I found the Nilo-Saharans, the darkly pigmented individuals, had really high vitamin D compared to the more lightly pigmented individuals that tend to be a little bit more in the north of Ethiopia. Now, I don't know if that's dietary. I don't know if it has to do with, you know, they don't wear a lot of clothes, many of these pastoralist groups we're studying, whereas the people in the north are completely covered. I don't know. But I found that intriguing, that I didn't see what I was expecting to see. That's interesting. Thank you. Hi, that was great. You mentioned that you put a lot of effort into giving the information back to the communities and they enjoy that. I'm curious to see if there are any topics that you gave back that weren't well accepted by the communities, such as maybe relationships to other parts of the... Oh, no, no, no. They didn't mind that at all. So I mean, I'm sure it's a cultural thing, because I know there's a lot of concern in Native American populations about ancestry. They don't want to be told that this is your ancestry because they already know their ancestry. They know their story. 
I honestly found the opposite. In many of the African groups that we worked with, that was what they were most interested in. They were so interested, and we would talk about it, and particularly my Kenyan student, Jibril Hirbo, when he went back to Kenya, he went to 22 different populations and he would sit down with them and they would discuss their oral history and sort of, how does this genetic data match or not match with your oral history? And they loved it. I mean, at least the ones we met didn't have a problem with that. We don't return individual results, obviously. Anything that could indicate paternity, no individual results or that sort of thing. But we keep them updated regularly, so we'll have to see for future studies how they feel. Thanks. Hi, Dr. Tishkoff. My question is about your opinion on, I don't know what other term to use other than race-based drugs, like BiDil. And as an extension of that, what your opinion is on how the precision medicine movement can avoid the slippery slope of that genre of medicines and further contributing to the polarization of race in this country? So in theory, hopefully it'll do exactly the opposite, because I mean, the goal is gonna be to base it hopefully more on the individualized genome. I think the most important thing will be to include as much diversity as possible and to look at drugs that are effective in people with all kinds of genetic backgrounds, and to target it to the individual depending on what their particular genetic background is in that particular region of the genome that is influencing metabolism. I don't know if that answers your question or not. Yeah, it does. Because, that's a complex question. My understanding is there were things that had to do with like patenting it, and if you targeted it towards this one group then you could patent it. Hopefully we avoid that. And we're just looking at things that are effective in large, diverse groups of people. 
And I'm thinking that the precision medicine initiative is gonna help a lot in that way. Okay, thanks. Thanks again to Dr. Tishkoff. Can we get one more round of applause for her? Thank you.