Good morning, everyone. Welcome to week seven of our Current Topics series. Thank you for coming. This week, we're honored to have with us Dr. Lynn Jorde from the University of Utah School of Medicine, where he holds the H.A. and Edna Benning Presidential Endowed Chair in the Department of Human Genetics and also serves as chair of the department. Dr. Jorde received his degrees from the University of New Mexico. His lab studies the evolution of mobile elements and the effects of these elements on the human genome. He's also interested in natural selection in humans and has identified genes that have helped Tibetan populations adapt to living at high altitudes. He has also used whole genome sequencing to uncover disease-causing mutations and to estimate the human mutation rate. Dr. Jorde has served on several advisory panels for the National Science Foundation and NIH, and in 2012 he was elected a fellow of the American Association for the Advancement of Science. Finally, Dr. Jorde has received 12 teaching awards from the University of Utah, as well as one from the American Society of Human Genetics. I'm pleased to say he'll be bringing that excellent teaching style here to NIH this morning, and I'm sure you'll learn a lot from today's talk, which is intended to provide an overview of population genetics. Please join me in welcoming Dr. Jorde to the NIH this morning.

Well, thanks very much, Tierra. It's a pleasure to be here again. Before I start, let me say that I'm happy to entertain questions at any point in the talk, so if something comes up that you'd like to know more about, don't be shy about asking. I should also disclose that I have no commercial interest related to this presentation.
So this morning, what I'd like to talk with you about is, first of all, an overview of patterns of human genetic variation among populations, because that really is the essence of population genetics, but also at the individual level, because now, particularly with whole genome data, we can really dissect patterns of similarity and difference between individuals, giving us, I think, a very different and much fuller perspective on human genetic variation. We'll talk about the implications of our findings in human population genetics for the concept of race, something that always stirs a certain amount of controversy and something that I think can be illuminated by our genetic data. We'll talk about linkage disequilibrium, a fundamental population genetic process that has been very important in disease gene identification. Throughout, we'll be talking about the relevance of genome sequencing data for these topics. There are a number of applications of human genetic variation. One is in deciphering human history, because the history of our species is written in our genome, and more and more we have the technology to make inferences about that history. I'll give you a few examples of how genetic data can be used to infer human history going back hundreds of thousands of years. We can also infer individual ancestry, and I'll give you some examples of that; I think this is much more informative than traditional self-identified population categories. Genetic variation is also now commonly used, as you know, in forensics. Tens of thousands of cases every year are solved using DNA data. So this is a very important and, to some extent, unanticipated application of basic population genetics: principles such as Hardy-Weinberg equilibrium and linkage disequilibrium help to exonerate the innocent and to convict the guilty.
And finally, perhaps most importantly, principles of population genetics are used to find, identify, and understand disease-causing genes, and we'll be talking about some of those applications. Of course, mutation is the fundamental source of genetic variation in our species and others. We can now estimate the human mutation rate directly by sequencing families. We sequenced a human family from Utah a few years ago and estimated the human mutation rate to be about 1.3 × 10⁻⁸ per base pair per generation. There have now been several estimates from families that all come up with about the same number, roughly one in 100 million base pairs per generation for single nucleotide variants. What that means is that, with about three billion base pairs in a haploid genome, we transmit about 30 new DNA variants each time we make a gamete. I really like this quote about mutation from Lewis Thomas, the science writer: "The capacity to blunder slightly is the real marvel of DNA. Without this special attribute, we would still be anaerobic bacteria and there would be no music." So I think we should be thankful for our mutations, because some mutations, under natural selection, lead to adaptation to a changing environment. Others, of course, cause disease. Another thing we've learned by sequencing families is that the mutation rate goes up substantially with advanced paternal age. We've known for some time that certain autosomal dominant diseases increase in frequency with the age of the father. But now, from whole genome sequences in families, we estimate that there are about two additional mutations for each additional year of paternal age after around age 30, because spermatogonia continue to undergo mitotic divisions throughout the life of the male. So at least three quarters of all new mutations in mammalian species can be attributed to males.
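The arithmetic behind these figures can be sketched in a few lines of Python. The per-base rates and the three-billion-base-pair haploid genome come from the talk; the linear paternal-age term (about two extra mutations per year of paternal age beyond roughly 30) is a simplification for illustration, not a fitted model, and the function names are hypothetical.

```python
def expected_de_novo_snvs(rate_per_bp: float, haploid_genome_bp: float) -> float:
    """Expected number of new single nucleotide variants carried by one gamete."""
    return rate_per_bp * haploid_genome_bp


def paternal_age_excess(father_age: float, reference_age: float = 30.0,
                        extra_per_year: float = 2.0) -> float:
    """Extra de novo SNVs relative to a father aged reference_age.

    Assumption: a simple linear increase after reference_age, as described
    in the talk; published estimates come from regression on family data.
    """
    return max(0.0, father_age - reference_age) * extra_per_year


# Roughly 1e-8 per bp per generation over a 3-billion-bp haploid genome.
print(round(expected_de_novo_snvs(1e-8, 3e9)))    # 30
print(round(expected_de_novo_snvs(1.3e-8, 3e9)))  # 39
print(paternal_age_excess(40))                    # 20.0
```

With the rounder rate of one in 100 million the count is about 30 per gamete; the 1.3 × 10⁻⁸ estimate gives a figure closer to 40.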
So in addition to wreaking a lot of the havoc in the world in general, males also wreak most of the havoc in the genome, at least at the level of single nucleotide variants. Given that these mutations are happening all the time and that we're transmitting them from generation to generation, a natural question is: if we look at aligned DNA bases, how much do we actually differ? Identical twins are nature's clones, so for all intents and purposes they differ at none of their DNA base pairs. There are, of course, somatic mutations that cause small differences, but we can say that twins are essentially genetically identical. You probably know that any pair of unrelated humans differs at about one in a thousand base pairs. I think that's a very important result, because it tells us that at the level of DNA, this most fundamental biological unit, we are 99.9 percent identical. If we compare ourselves to our nearest evolutionary relative, the chimp, we are about 99 percent identical; we are about 99 percent chimp at the DNA level. Compared with a mouse, as you would expect with some 70 million years of separation, we differ at one-sixth to one-third of our base pairs. And if we look at something very different, like broccoli, we are, thankfully, mostly different. So proportionally, humans differ at only a small number of sites, one in a thousand. But because we have three billion base pairs in a haploid genome, any pair of haploid genomes, including the two genomes that you received from your parents, differs at about three million single nucleotide variant sites. So there is actually a lot of variation for evolution to work with. We can put this in context by comparing the amount of variation in humans with that of other great ape species, using a paper published just last year that sequenced 79 great apes.
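The step from "one in a thousand" to "about three million differences" is a single multiplication. A minimal sketch, assuming a haploid genome of three billion base pairs as in the talk (the constant and function name are illustrative):

```python
HAPLOID_GENOME_BP = 3_000_000_000  # approximate haploid genome size used in the talk


def differing_sites(fraction_different: float, genome_bp: int = HAPLOID_GENOME_BP) -> float:
    """Expected number of aligned bases that differ between two haploid genomes."""
    return fraction_different * genome_bp


# Divergence per base pair, as quoted in the talk.
comparisons = {
    "two unrelated humans": 1 / 1000,
    "human vs chimp": 1 / 100,
}
for label, fraction in comparisons.items():
    print(f"{label}: about {differing_sites(fraction):,.0f} differing sites")
```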
And we see that for humans, on average, there are around three million single nucleotide variants per individual when we compare an individual genome to the reference. For common chimps, it's nearly double; for gorillas, more than double; for orangutans, about three times as much. So humans, at least relative to other great ape species, are somewhat depauperate in genetic variation. What this suggests is that we were founded by a relatively small number of individuals not so very long ago, so we haven't had as much time to accumulate variation. Another important kind of genetic variation, and one that population geneticists are using more and more, is copy number variation. Here we have a couple of genes, A and B, that exist in an extra copy in a genome. Copy number variants are often defined as deletions or duplications larger than 1,000 base pairs, sometimes larger than 500 base pairs. Altogether, they account for a substantial amount of inter-individual variation: each human is heterozygous for at least 100 copy number variants, or more if you define them as being a bit smaller. They are another important source of variation, and in some cases they have been implicated in diseases like schizophrenia and autism. So having asked how much individuals differ from each other, we can ask how much populations differ from each other. Of course, this has been a central focus of population genetics for a long time. I'll show you some data from a fairly widely distributed series of human populations. We've collected many of these over the years: 850 individuals in 40 different populations distributed across the major continents of the world. There's a substantial amount of phenotypic variation in these individuals, and these are photographs of some of the people who were sampled in the course of these studies.
One of the ways that we can look at variation among populations is with a simple tabulation of allele frequencies. Let's say we have three populations, and for simplicity we're looking at three single nucleotide variants; these are the major allele frequencies, that is, the frequencies of the more common allele. We can assess variation among populations simply by comparing these allele frequencies. One of the things that we typically do is to estimate average heterozygosity, a fundamental measure of variation: for each locus, we assess the proportion of heterozygous individuals, typically by direct counting, or we can make a Hardy-Weinberg calculation, and then we average that heterozygosity across loci. One way we apply this is to estimate a quantity called FST, which is used very often in population genetic analysis. We can think of FST as the proportion of genetic variation in a whole sample, let's say the whole world, that arises because of differences among populations, that is, because of subdivision. A simple measure of FST is shown here: FST = (HT − HS) / HT. We look at the total heterozygosity in our sample, HT, let's say the average heterozygosity across all humans in the world, and we subtract from that HS, the average heterozygosity within each subpopulation. So if we divide our populations into continents, we would look at the average heterozygosity in each continent, subtract that from the total, and then normalize by dividing by the total. You can imagine that if the within-population heterozygosity were very high, in fact if there were as much variation within populations as there is in the whole sample, then FST would be zero. What that says is that there's no differentiation across human populations: every subpopulation has just as much variation as the entire population.
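The heterozygosity and FST calculation just described can be sketched in Python for biallelic SNVs. This is a minimal sketch under stated assumptions: it uses the Hardy-Weinberg expected heterozygosity 2p(1 − p) rather than direct counts, weights subpopulations equally, and averages HS and HT over loci before taking the ratio. Published estimators (Weir and Cockerham's, for example) add sample-size corrections that are omitted here.

```python
def expected_het(p: float) -> float:
    """Hardy-Weinberg expected heterozygosity at a biallelic locus."""
    return 2.0 * p * (1.0 - p)


def fst(freqs: list[list[float]]) -> float:
    """FST = (HT - HS) / HT, with HS and HT averaged over loci.

    freqs[k][l] is the frequency of one allele at locus l in subpopulation k.
    Subpopulations are weighted equally (an assumption of this sketch).
    """
    n_pops, n_loci = len(freqs), len(freqs[0])
    h_s = h_t = 0.0
    for locus in range(n_loci):
        column = [freqs[k][locus] for k in range(n_pops)]
        # HS: mean heterozygosity within subpopulations at this locus.
        h_s += sum(expected_het(p) for p in column) / n_pops
        # HT: heterozygosity of the pooled sample, from the mean allele frequency.
        h_t += expected_het(sum(column) / n_pops)
    h_s, h_t = h_s / n_loci, h_t / n_loci
    if h_t == 0.0:
        raise ValueError("all loci are monomorphic in the pooled sample")
    return (h_t - h_s) / h_t


print(fst([[0.5, 0.3], [0.5, 0.3]]))  # identical subpopulations: 0.0
print(fst([[1.0, 0.0], [0.0, 1.0]]))  # each subpopulation fixed for a different allele: 1.0
```

The two calls mark the extremes: no differentiation gives 0, and subpopulations that are each internally uniform but fixed for different alleles give 1.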
On the other hand, if all variation exists between populations, in other words if within-population heterozygosity were always zero so that every subpopulation is essentially a clone, then FST would be one. So this is a way of saying how much of the variation in a sample is due to subdivision, due to the fact that this is not a completely randomly mating population. If we look at measures of FST using different kinds of genetic systems (these are short tandem repeats, these are a couple of kinds of mobile element systems, here's a 250K SNP chip), what we see is very consistent across systems: FST, the amount of variation due to subdivision, typically runs between about 10 and 15%. We see similar results for sequence data as well. So most of the variation in human populations is found within any major subdivision, within, let's say, Asia or Africa (a little more in Africa), but the bottom line is that if we look at the variation within one major human population, we see 85 to 90% of human genetic variation in that population. We only get an extra 10 to 15% if we look at the rest of the world. So we are really minimally differentiated, which I think is another important point with some real social implications. Now we can compare FST for these genetic systems with FST for a measure of skin pigmentation, which is highly differentiated across continents, and we see essentially the opposite result: about 90% of variation is found between major continents. So for this very visible trait that people often use to classify populations, there is a lot of variation among populations, essentially the reverse of what we see for genetic systems. And if we look at some of the genes that underlie skin pigmentation, they also vary tremendously among populations, as you would expect. Here is a tabulation that we did on the samples I showed you earlier, using a 250K SNP chip.
We did it simply to ask what proportion of alleles are shared among populations. We divided our populations into Sub-Saharan Africa, Europe, East Asia, and the Indian subcontinent. With that SNP chip, which of course consists mostly of common SNPs whose minor allele frequency exceeds 5%, we found that about 80% of the minor alleles were shared in all four groups, 88% in at least three, and 92% in at least two; 7% were African-specific, and less than 1% were specific to any of the three non-African populations. So the bottom line is that SNPs with frequencies greater than about 5% are typically old polymorphisms, because a polymorphism usually has to have some age to attain a higher frequency, and they tend to be shared among populations. In fact, none of these SNPs was fixed in one population and absent in another, so none of them could be used on its own to differentiate populations. Here is a similar result from the 1000 Genomes data, using an earlier version of dbSNP that consisted mostly of common SNPs. These are the Asian 1000 Genomes populations, the European-derived populations (this is actually a sample from Utah), and then African populations. Most of these SNPs are shared by all three groups. Somewhat more are found in Africa than in Europe and Asia, but mostly they are shared. These are relatively common SNPs, for which the average allele frequency difference between populations is right around 15%. But now, more recently, we can look at rarer SNPs identified by sequencing, and we see a very different pattern. Most of these are not shared among populations. They're rare enough that they arose relatively recently, and therefore they tend not to be shared among continental populations. In fact, for SNPs whose minor allele frequency is less than 5%, less than 2% are shared across continents.
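A tabulation like the one just described (shared in all groups, in at least three, in at least two, or specific to one group) can be sketched in Python. The input format, a mapping from each SNP to the set of groups in which its minor allele was observed, and the group abbreviations are hypothetical choices for this sketch, and the toy data are made up.

```python
def sharing_summary(presence: dict[str, set[str]], groups: list[str]) -> dict[str, float]:
    """Fraction of SNPs whose minor allele is seen in at least k groups, and
    fraction seen in exactly one group.

    presence maps a SNP identifier to the set of groups in which its minor
    allele was observed (a hypothetical input format for this sketch).
    """
    group_set = set(groups)
    observed = [found & group_set for found in presence.values()]
    total = len(observed)
    summary = {}
    for k in range(1, len(groups) + 1):
        summary[f"in_at_least_{k}"] = sum(len(s) >= k for s in observed) / total
    for g in groups:
        summary[f"only_{g}"] = sum(s == {g} for s in observed) / total
    return summary


# Toy data with hypothetical group labels: AFR (Sub-Saharan Africa), EUR (Europe),
# EAS (East Asia), IND (Indian subcontinent).
groups = ["AFR", "EUR", "EAS", "IND"]
presence = {
    "snp1": {"AFR", "EUR", "EAS", "IND"},
    "snp2": {"AFR", "EUR", "EAS", "IND"},
    "snp3": {"AFR", "EUR"},
    "snp4": {"AFR"},
}
print(sharing_summary(presence, groups))
```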
So it's much, much more common to see population specificity with these rare alleles, which is what we would expect given population history, but it is a very different picture from the one we see for the more common SNPs. We can also look at differences among populations using a simple genetic distance measure, and I'll take you through how we estimate it to give you the basic principle. The simplest form of genetic distance between populations i and j is the absolute value of the difference in allele frequencies: the allele frequency in population i minus the allele frequency in population j. So if we look back at our little matrix of allele frequencies, our distance for locus 1 would simply be the absolute value of this number minus that one. Then we average this over all of our SNVs, and we might have half a million or a million of them, to get the genetic distance between that pair of populations. You can imagine that this gets much more complex to evaluate as we add populations: with 50 populations, we have a 50 by 50 matrix of genetic distances. We can use these genetic distances to build a population network that displays the similarities among populations. Let's take that first single nucleotide variant. Here are our three populations. We subtract p2 from p1, the frequencies of this SNV in populations one and two, and we use that difference to place a node between populations one and two. A commonly used approach then averages these two allele frequencies, p1 and p2, and subtracts that average from p3 to give us the distance between the averaged pair and the third population. So we can see very simply that populations one and two are more closely related and population three is more distantly related. That's essentially how these networks are built.
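The distance described here, the mean absolute allele frequency difference over loci, and the averaging step from the three-population example can be sketched as follows. The function names and the allele frequencies are hypothetical, chosen only to illustrate the calculation.

```python
def allele_frequency_distance(p_i: list[float], p_j: list[float]) -> float:
    """Mean over loci of |p_i - p_j|, the simple distance described in the talk."""
    if len(p_i) != len(p_j):
        raise ValueError("populations must be typed at the same loci")
    return sum(abs(a - b) for a, b in zip(p_i, p_j)) / len(p_i)


def distance_matrix(freqs: list[list[float]]) -> list[list[float]]:
    """Pairwise distance matrix; freqs[k] holds population k's allele frequencies."""
    n = len(freqs)
    return [[allele_frequency_distance(freqs[i], freqs[j]) for j in range(n)]
            for i in range(n)]


def distance_to_merged(freqs: list[list[float]], i: int, j: int, k: int) -> float:
    """Average the frequencies of populations i and j, then compare with k
    (the averaging step from the three-population example)."""
    merged = [(a + b) / 2 for a, b in zip(freqs[i], freqs[j])]
    return allele_frequency_distance(merged, freqs[k])


# Hypothetical frequencies for three populations at three SNVs.
freqs = [[0.9, 0.6, 0.2], [0.8, 0.6, 0.3], [0.3, 0.1, 0.7]]
for row in distance_matrix(freqs):
    print([round(d, 3) for d in row])
print(round(distance_to_merged(freqs, 0, 1, 2), 3))
```

Here populations 0 and 1 come out closest, and the averaged pair sits at a distance of 0.5 from population 2.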
Now, this is a whimsical analysis that my colleague Steve Guthrie did a few years ago, illustrating how you can use this technique to understand not just genetic distances but all kinds of variation. The New York Times published this matrix of disagreements on decisions in the U.S. Supreme Court. It's a nine by nine matrix showing the percent of the time that each pair of justices disagreed. This is just like a genetic distance, except in this case it's a disagreement distance. You see that Justices Thomas and Scalia disagreed only nine percent of the time. Well, that makes sense. Whereas Thomas and Stevens disagreed most of the time, as did Scalia and Stevens. But you have to stare at a matrix like this for a while before you can really intuit the pattern. Steve was interested in learning some of these techniques, so he put this matrix into a program that made a neighbor-joining network. You can immediately see the two wings of the court, conservative here and more liberal here, and the swing-vote justice, Kennedy. So these networks can very easily portray relationships among individuals or populations, and that's one of the reasons we like to use them. Here's an application of that technique, a neighbor-joining network built from 100 autosomal Alu polymorphisms. Alus are mobile elements that insert into the genome. There are thousands of polymorphic Alus, that is, Alu insertions present in some individuals and absent in others. We like them for these kinds of studies because if two people share an Alu at a given spot in the genome, we know that they share a common ancestor in whom that Alu insertion occurred. So these give us polarity: we know that absence of the Alu is the ancestral state and presence of the Alu is the derived state, and they are virtually never precisely deleted. So they are very good markers of events in population history.
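For readers who want to see what a neighbor-joining program does with a distance matrix like these, here is a textbook version of the standard Saitou-Nei algorithm in plain Python. It is a sketch, not the software used for the analyses in the talk, and the four-taxon matrix is a hypothetical example built from a known tree so that the output can be checked.

```python
from itertools import combinations


def neighbor_joining(dist, labels):
    """Standard (Saitou-Nei) neighbor joining on a symmetric distance matrix.

    Returns (joins, final_edge): each join is (a, b, length_a, length_b), the
    two nodes merged and their branch lengths to the new internal node; the
    final edge connects the last two remaining nodes.
    """
    # dict-of-dicts so that new internal nodes can be added by name
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(labels)}
         for i, a in enumerate(labels)}
    nodes = list(labels)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Q-criterion: join the pair minimizing (n - 2) * d(a, b) - r(a) - r(b).
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        length_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        length_b = d[a][b] - length_a
        joins.append((a, b, length_a, length_b))
        u = f"({a},{b})"
        # Distance from the new internal node u to every other remaining node.
        d[u] = {u: 0.0}
        for c in nodes:
            if c not in (a, b):
                d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    a, b = nodes
    return joins, (a, b, d[a][b])


# Hypothetical additive distances for the tree ((A,B),(C,D)):
# tips are 1 unit from their parent node, and the two parent nodes are 3 apart.
labels = ["A", "B", "C", "D"]
dist = [[0, 2, 5, 5],
        [2, 0, 5, 5],
        [5, 5, 0, 2],
        [5, 5, 2, 0]]
joins, final_edge = neighbor_joining(dist, labels)
print(joins)       # [('A', 'B', 1.0, 1.0), ('C', 'D', 1.0, 1.0)]
print(final_edge)  # ('(A,B)', '(C,D)', 3.0)
```

Because the toy distances are exactly additive, the algorithm recovers the original branch lengths. Real distance matrices, such as allele frequency distances or the justices' disagreement rates, are only approximately tree-like, which is one reason network displays are useful.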
So we looked at this series of populations and made a neighbor-joining network using the techniques I just described, and we see some interesting patterns in a diagram like this. Here are African populations, and we see quite a lot of variation among them. Here is a group of European populations, with substantially less variation, then East Asian and South Indian populations, giving us a nice portrayal of human genetic diversity in the Old World. We also see that there's quite a long branch separating these Sub-Saharan African populations from the others, and, as I mentioned, more variation here. The ancestral state, which would be absence of the Alus, is closest to this group of populations, suggesting that they are the descendants of the ancestral population of modern humans. These are bootstrap support levels: this branch appears in 100% of bootstrap replicates, this branch in 97%, and this branch in 97%. So with just 100 polymorphisms, we have quite good confidence in this result. Now here is a similar exercise done with a 250K SNP chip on 40 populations, and we see very much the same patterns again. Here's a series of African populations. Here are the European populations. Here are populations from the Indian subcontinent and Pakistan. Here you see a fairly long branch length for Native American populations, branching off from an Asian cluster, as we would expect. And down here are a couple of South Pacific populations, again with a long branch length, indicating a founder effect because they were founded by a relatively small number of individuals. In general, the pattern is quite consistent with what we saw for those Alu polymorphisms. This is a completely different set of populations, published a few years ago in Nature, where once again we see a very similar geographic patterning of genetic distances, both for half a million SNPs and for a smaller number of copy number variants.
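The bootstrap idea behind those support values can be sketched in Python: resample loci with replacement, recompute distances, and count how often a grouping reappears. As a simplification (an assumption of this sketch, not the procedure behind the published figures), the grouping checked here is whether two populations are each other's nearest neighbors; a full bootstrap rebuilds the entire tree in every replicate. The function names and data are hypothetical.

```python
import random


def mean_abs_diff(p: list[float], q: list[float], loci: list[int]) -> float:
    """Allele-frequency distance computed over a (possibly resampled) list of loci."""
    return sum(abs(p[locus] - q[locus]) for locus in loci) / len(loci)


def mutual_nearest_support(freqs: list[list[float]], i: int, j: int,
                           replicates: int = 1000, seed: int = 1) -> float:
    """Fraction of bootstrap replicates in which populations i and j are each
    other's nearest neighbors. Requires at least three populations."""
    rng = random.Random(seed)
    n_pops, n_loci = len(freqs), len(freqs[0])
    others = [k for k in range(n_pops) if k not in (i, j)]
    hits = 0
    for _ in range(replicates):
        # Resample loci with replacement, keeping the original number of loci.
        loci = [rng.randrange(n_loci) for _ in range(n_loci)]

        def dist(a: int, b: int) -> float:
            return mean_abs_diff(freqs[a], freqs[b], loci)

        d_ij = dist(i, j)
        if all(d_ij < dist(i, k) and d_ij < dist(j, k) for k in others):
            hits += 1
    return hits / replicates


# Hypothetical frequencies: populations 0 and 1 are identical, 2 differs at every locus.
freqs = [[0.1, 0.5, 0.9], [0.1, 0.5, 0.9], [0.6, 0.1, 0.4]]
print(mutual_nearest_support(freqs, 0, 1))  # 1.0
print(mutual_nearest_support(freqs, 0, 2))  # 0.0
```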
So the bottom line here is that we see a very consistent picture of human genetic variation regardless of the sampling frame and regardless of the kind of genetic system that we examine. Another thing that we see very clearly from these data concerns heterozygosity. In this case we're looking at haplotype heterozygosity, so these are groups of linked SNPs, and we're asking how much they vary. We see the greatest variation in Africa, and then a progressive decline in variation as we go from Africa to Europe to East Asia, and then to the more recently founded Polynesian and American populations. This is a very reproducible pattern, and it reflects what's termed a serial founder effect. The largest ancestral population was in Africa; a subset of that population went out to found Europe and Asia, producing a founder effect there; a subset of the Asian population in turn went on to found the Americas. This continued serial founder effect as humans spread across the globe resulted in less and less genetic variation the further we go from Africa. This is a nice diagram, published in a review a couple of years ago, that outlines those major patterns: an out-of-Africa movement something like 80,000, maybe 100,000 years ago, then movement into Eurasia, and finally, about 20,000 years ago, into the Americas, and very recently into Polynesia. One of the interesting questions, and something I'll come back to in a minute, is whether these anatomically modern humans, people who looked just like you and me, mixed with the Neanderthals they encountered in Europe after coming out of Africa. We'll come back to what the genomic results tell us about that. But that's a nice summary of the origins of modern humans across the world. There are other sources of information on our origins, though. The supermarket shelf is a good one.
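Why successive founding events erode variation can be illustrated with the standard Wright-Fisher drift result, under which expected heterozygosity shrinks by a factor of (1 − 1/(2N)) each generation. The sketch below applies that formula to a chain of small founder populations. It ignores new mutation and migration, and the population sizes and durations are made up for illustration; they are not estimates from the studies in the talk.

```python
def heterozygosity_after_drift(h0: float, effective_size: float, generations: int) -> float:
    """Expected heterozygosity under Wright-Fisher drift with no new mutation:
    H_t = H_0 * (1 - 1 / (2N)) ** t."""
    return h0 * (1.0 - 1.0 / (2.0 * effective_size)) ** generations


def serial_founder_series(h0: float, bottlenecks: list[tuple[float, int]]) -> list[float]:
    """Apply successive founder bottlenecks, each given as (effective size, generations)."""
    series = [h0]
    for size, generations in bottlenecks:
        series.append(heterozygosity_after_drift(series[-1], size, generations))
    return series


# Illustrative, made-up parameters for three successive founding events.
for h in serial_founder_series(0.30, [(1000, 100), (500, 100), (200, 50)]):
    print(round(h, 3))
```

Each founding event multiplies the heterozygosity it inherits by a factor below one, so populations further along the chain carry less variation, which is the qualitative pattern described in the talk.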
I ran across this at the supermarket 10 years or so ago, and I was surprised to learn that Adam and Eve's skeletons had been stolen. I didn't even know they had been discovered. But because there were more amazing photos inside, I actually bought this, and this is what I learned: all that's left was Eve's leg. And it looks like the identity of the perpetrator may have been established. It's kind of interesting what you can learn from supermarket tabloids. Well, another way that we can look at genetic variation is through something called principal components analysis. We should go through this, because it is how genetic data on populations and individuals are often displayed now. It is basically a data reduction technique. Imagine that you're looking at 1,000 individuals and you want to assess the genetic differences and similarities among them. You would have a 1,000 by 1,000 matrix to explore, so we need some way of reducing the variation in that matrix down to something we can actually look at. That's what principal components analysis is for. Here's a very simple example. Let's imagine we're looking at height and weight. We can plot them like this, and we can run a line through that set of points, much like a regression line, that accounts for as much of the variation in height and weight as possible. It's probably a representation of overall size. Then we could run another line, at right angles to the first, to account for the next greatest amount of variation. That's what principal components analysis does. It takes a huge matrix, in this case 850 by 850, where each of these dots is an individual. We look at the amount of allele sharing between each pair of individuals, then we run a line through that multidimensional matrix and ask which single line accounts for as much variation among individuals as possible, and we plot the individuals along that line.
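The core of that procedure, finding the single axis that captures the most variation among individuals and placing each individual on it, can be sketched in plain Python with power iteration. Assumptions of this sketch: genotypes are coded as allele counts (0, 1, 2), the individual-by-individual matrix is the covariance of mean-centered genotypes (closely related to, but not identical with, the allele-sharing matrix mentioned in the talk), and no per-SNP standardization is applied. Dedicated tools such as EIGENSOFT handle these details for real data; the toy genotypes are hypothetical.

```python
import math


def first_pc_scores(genotypes: list[list[float]], iterations: int = 200) -> list[float]:
    """Positions of individuals on the first principal component (unit length,
    proportional to the PC scores; the overall sign is arbitrary).

    genotypes[a][s] is the count (0, 1 or 2) of one allele at SNP s in individual a.
    """
    n, m = len(genotypes), len(genotypes[0])
    # Mean-center each SNP column.
    means = [sum(row[s] for row in genotypes) / n for s in range(m)]
    x = [[row[s] - means[s] for s in range(m)] for row in genotypes]
    # Individual-by-individual matrix G = X X^T / m.
    g = [[sum(x[a][s] * x[b][s] for s in range(m)) / m for b in range(n)]
         for a in range(n)]
    # Power iteration converges to the leading eigenvector of G.
    v = [float(a + 1) for a in range(n)]  # arbitrary non-constant starting vector
    for _ in range(iterations):
        w = [sum(g[a][b] * v[b] for b in range(n)) for a in range(n)]
        norm = math.sqrt(sum(val * val for val in w))
        if norm == 0.0:
            raise ValueError("no variation, or starting vector orthogonal to the first PC")
        v = [val / norm for val in w]
    return v


# Toy data: two individuals carrying mostly one set of alleles, two the other.
genotypes = [[2, 2, 0], [2, 1, 0], [0, 0, 2], [0, 1, 2]]
print([round(s, 3) for s in first_pc_scores(genotypes)])
```

In the toy data the first axis separates the first two individuals from the last two, just as the first component in the talk separates one group of populations from the rest.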
So the first principal component here, we can see, separates this group of Sub-Saharan African individuals from the other populations, consistent with a founder event in which a subset of the ancestors of this population went on to found the rest of the world. If you look at the second axis, it's basically a west-to-east axis: Europe, West Asia, Central Asia, all the way out to East Asia, with these groups plotting in here closest to their ancestral population. So it's a very convenient way of representing, in just two dimensions, as much of human diversity as we can. Here's a plot for just Eurasian populations, and what you see is that it creates essentially a map of Eurasia. Here is Northern Europe, Southern Europe, Central Asia, East Asia, Southeast Asia, and then the Indian subcontinent, with Nepalese individuals out here distributed quite widely. This tells us that geography affects genetic relationships among populations, because for the vast majority of our history, you were much more likely to mate with someone five kilometers away than with someone 5,000 kilometers away. We still see the signatures of that relative isolation when we look at genetic variation in populations. Over the last few hundred years, of course, this has begun to change and break down, and I'll show you some examples of that and how it affects our genomes. But in many cases we can distinguish between fairly closely related populations. We just published this analysis of a couple of Tibetan populations. They speak different dialects, and they are largely distinguishable from one another on a plot like this. Here are different Mongolian populations, here and here, again distinguishable on a principal components plot. So if we're doing an association study, for example, this kind of display helps us to detect population stratification.
We can then use individuals' scores on these axes to control for that stratification if we need to. Now, here's a great example. This was published by Carlos Bustamante's group a few years ago, looking at 3,000 individuals from Europe. These are color-coded; each point is an individual, and these are two principal components. They used a 500K SNP chip and looked at allele sharing among pairs of individuals, and this essentially reconstructs a map of Europe. The countries here pretty much correspond to the locations of the individuals here, although some individuals fall closer to members of other populations as a result of gene flow through time. So this is by no means perfect, but they estimated that for the majority of the individuals in their sample, they could trace their birthplace to within a few hundred kilometers based on their genetic profile. So in many important ways, our history is written in our genomes. Now, I like to compare this plot from 2008 to one that we published some 30 years ago doing pretty much the same thing, but with only 15 loci instead of 500,000. We weren't able to look at individuals, because you wouldn't have adequate resolution with just 15 loci, so we looked at allele frequencies in populations. But what you see again, with just 15 loci, is a map of Europe. So it's quite interesting to see this reproduced on a much grander scale, at the individual level, and with a larger number of populations. So far, I've been talking about data based primarily on SNP microarrays. But as I'm sure you're aware, SNP arrays miss an important part of variation, namely variation due to less common alleles. They're also typically selected for diversity in a specific population, usually populations of European ancestry, so we worry about ascertainment biases in the data that we get from SNP microarrays. Sequences, on the other hand, give us information about rare variants.
And in most ways we can consider sequence data to be unbiased, so they permit a number of inferences that simply aren't possible from microarray data. The reason is shown here. This was an early study done by Andy Clark comparing allele frequency spectra. These are alleles whose minor allele count in the sample is one, two, three, four, and so on. This is what you would expect at equilibrium: in a population of constant size, you expect an excess of rare alleles. For the HapMap data, which were based on SNP microarrays, you can see a real deficiency of these rare alleles, because those arrays were designed for more common SNPs. And for two sequence data sets available at that time, from Perlegen and NIEHS, there was actually an excess of rare alleles over what you would expect at equilibrium. It's this class of alleles that tells us a lot about population history, about population size, and about growth rates. So sequence data give us information that microarray data don't give us accurately. One of the things this allows, and this is from the 1000 Genomes data, is accurate inference of population sizes and migration rates through time for human populations. These bars represent the sizes of populations. This is the African founder population, and this is its effective size. The estimate here is that about 50,000 years ago, a small part of that population went out to found Eurasia, and then there was rapid expansion of that derived population: very, very rapid population growth from an initial bottleneck, with subsequent migration among populations. So although we think of out of Africa as a single event, it was probably multiple events, and there were probably back-to-Africa migrations as well, at least to some extent. With sequence data, we can portray human history much more accurately and in greater detail.
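The equilibrium expectation in such a plot can be made concrete. Under the standard neutral model with constant population size, the expected number of sites at which the derived allele appears i times in a sample of n chromosomes is proportional to 1/i, so rare variants are already the largest class, and an observed excess beyond that points to growth. The sketch below uses derived-allele counts (the unfolded spectrum, which requires knowing the ancestral allele), whereas the figure in the talk uses minor-allele counts; the function names and the six-site sample are hypothetical.

```python
def expected_neutral_sfs(n: int, theta: float = 1.0) -> list[float]:
    """Expected unfolded site frequency spectrum for n sampled chromosomes under
    the standard neutral model with constant population size:
    E[number of sites with derived-allele count i] = theta / i, i = 1..n-1."""
    return [theta / i for i in range(1, n)]


def observed_sfs(derived_counts: list[int], n: int) -> list[int]:
    """Tally polymorphic sites by derived-allele count (1..n-1) in the sample."""
    sfs = [0] * (n - 1)
    for c in derived_counts:
        if 1 <= c <= n - 1:
            sfs[c - 1] += 1
    return sfs


def singleton_ratio(derived_counts: list[int], n: int) -> float:
    """Observed over expected fraction of singletons; values above 1 indicate an
    excess of rare variants relative to a constant-size population."""
    obs = observed_sfs(derived_counts, n)
    exp = expected_neutral_sfs(n)
    return (obs[0] / sum(obs)) / (exp[0] / sum(exp))


# Hypothetical derived-allele counts at six sites in a sample of n = 4 chromosomes.
counts = [1, 1, 1, 1, 2, 3]
print(observed_sfs(counts, 4))  # [4, 1, 1]
print(round(singleton_ratio(counts, 4), 3))
```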
So here is an allele frequency spectrum like the one I just showed you, now for 2,400 exomes from the Seattle group, and we again see this excess of very rare variants, more than we would expect in a constant population. What this reflects is population growth, and I'll show you an example of that in a second. One of the interesting findings of this study is that 73% of all protein-coding single nucleotide variants and 86% of the deleterious SNVs are very young: they've arisen within the past 5,000 to 10,000 years as human populations exploded, because a growing population does not efficiently eliminate these rare variants, including the deleterious ones. Another interesting finding from this study is that we see more deleterious single nucleotide variants in European and Asian populations than in African populations. The reason is that European and Asian populations went through an intense bottleneck as they came out of Africa and then expanded very, very rapidly, retaining those rare variants, including the ones that are deleterious. Not necessarily lethal ones, which would be eliminated quickly, but other deleterious variants. From a population genetic perspective, this helps to explain why we see more deleterious rare variants in European and Asian than in African populations. To understand why population expansions increase the frequency of rare variants, let me use this little example. Here we have an individual who has had two children. If that individual has received a new variant, a de novo variant, from one of his parents, then with each child there's a one-half chance that he will not pass on the new variant. So with only two children, the extinction probability is one half times one half, or one quarter. So there's a good chance that the new variant will simply go extinct in one generation.
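The extinction arithmetic in this example can be sketched in Python, together with its extension over many generations. The multi-generation part is a simple branching-process model under a loudly idealized assumption: every carrier has exactly the same number of children, and each child inherits the variant independently with probability one half. The function names are hypothetical.

```python
def one_generation_extinction(children: int) -> float:
    """Probability that none of a carrier's children inherits the new variant,
    with each child inheriting it independently with probability 1/2."""
    return 0.5 ** children


def eventual_extinction(children: int, generations: int = 1000) -> float:
    """Probability the variant is lost within `generations` generations, in an
    idealized lineage where every carrier has exactly `children` children.

    The number of carrier children per carrier is Binomial(children, 1/2), whose
    probability generating function is ((1 + s) / 2) ** children; iterating it
    from s = 0 gives the probability of extinction by each successive generation.
    """
    q = 0.0
    for _ in range(generations):
        q = ((1.0 + q) / 2.0) ** children
    return q


print(one_generation_extinction(2))    # 0.25
print(one_generation_extinction(10))   # 0.0009765625
print(eventual_extinction(2, generations=50))
print(eventual_extinction(10))
```

With two children per carrier the lineage only replaces itself on average, so the probability of loss keeps creeping toward 1 over the generations; with ten children per carrier it settles just under one in a thousand.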
On the other hand, let's say he's from Utah and he has 10 children. Now the extinction probability goes to one half to the tenth power; in other words, there is only about a one-in-a-thousand chance that the allele will go extinct in this generation. This represents a rapidly growing population, and if this person's descendants also have many children, the extinction probability stays low. So variants that arise during a time of rapid population growth tend not to be eliminated, simply because the extinction probability in any generation is quite low. That helps to explain why we see this excess of rare variants in human populations, particularly in populations that have undergone a bottleneck followed by extreme expansion. Now, I said we'd come back to the issue of mixture with Neanderthals, because people are naturally interested in this. As our ancestors came out of Africa, did they mix with Neanderthals? The separation of human and Neanderthal ancestors took place something like 300,000 to 400,000 years ago, but the question is whether, when these populations were near each other some 50,000 to 70,000 years ago, there was gene flow between them. We now have very good evidence from nuclear DNA sequencing of Neanderthal skeletons that about one to three or four percent of modern human DNA has Neanderthal origins, but only among non-Africans. So as humans went out of Africa and encountered Neanderthals, probably first in the Middle East, there was a small amount of mixture. Instead of the African replacement hypothesis, we now refer to a leaky replacement hypothesis: Neanderthals were mostly replaced, but probably not entirely. In fact, we see Neanderthal DNA in pretty much all non-African populations. One of the interesting questions is whether some of the shared sequences could have adaptive significance, and there is now some evidence, based on surveys of the 1000 Genomes data, that in fact they do.
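The two-children versus ten-children comparison from the family example reduces to one line of arithmetic. A minimal sketch, assuming (as the example does) that each child independently inherits the new allele from a heterozygous parent with probability one half:

```python
def extinction_probability(n_children, transmission=0.5):
    """Probability that a heterozygous parent passes a new allele to none of
    n_children, assuming each child independently inherits it with
    probability `transmission` (1/2 under Mendelian segregation)."""
    return (1 - transmission) ** n_children


if __name__ == "__main__":
    for k in (2, 10):
        print(k, extinction_probability(k))
    # 2 0.25
    # 10 0.0009765625
```

One half to the tenth is 1/1024, which is where "about one in a thousand" comes from.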
For example, genes that encode keratin filaments appear to have been selected for in these mixed Neanderthal-modern human populations. Here is a plot showing the probability of Neanderthal ancestry in CEU Europeans, CHB East Asians, and sub-Saharan Africans. You can see sections in this European individual with a very high, almost 100 percent, probability of Neanderthal origin, whereas for sub-Saharan Africans you typically see no evidence of a Neanderthal contribution. So this is another interesting application of the 1000 Genomes data: searching for Neanderthal genes, and for those that may have been selected for adaptation to new environments as populations came out of Africa many thousands of years ago. And of course you can send your DNA to some of the direct-to-consumer testing companies and they will estimate your proportion of Neanderthal genes, typically between 1 and 3 percent. This is finding your inner Neanderthal, as they say. One of the interesting questions that arises as we look at population similarities and differences is what genetics can tell us about the concept of race. I put this in quotes because it's a term I personally don't use in writing, but it is certainly used, and I think often misunderstood. Here's a quote from an editorial in the New England Journal in the last decade stating quite unequivocally that race is biologically meaningless. There was a response in the New York Times by Sally Satel, a psychiatrist, who said, and this is deliberately provocative, "I am a racially profiling doctor." Her argument was that self-identified population affiliation gave her information about response to some of the drugs she prescribes as a psychiatrist. So the question is, how useful a concept is this, and what can genetics do to illuminate our understanding?
About 10 years ago, this article made the cover of Scientific American. Steve Olson, a science writer, and Mike Bamshad, my colleague, were the co-authors, and the question was, does race exist? According to Scientific American, science has the answer. I always get suspicious any time they say that science has the answer, but I think science, and genetics in particular, can give us at least some insight. We can start by looking at DNA sequence differences among individuals. We've gone over this concept already, and I thought I would use some political figures for this example: let's say we have sequences from Rick Santorum, Mitt Romney, Hillary Clinton, and, I almost hated to put him in, John Edwards. Our question is, how different are they? We can make a matrix of DNA sequence differences. We see that Romney and Santorum differ at just two bases here. Clinton and Santorum differ at five, Edwards and Santorum at six, and Edwards and Clinton at only one. This is a hypothetical example, but now we can put this pattern into a tree, or network, as we did before, and again we see some very discernible patterns, a clustering. We can do the same thing with real DNA sequence, and some time ago we did this with sequence from the angiotensinogen locus. It's 14 kb of sequence, so a relatively small amount. We asked, for these major population groups (Asia, Europe, Africa), how similar people are to one another for the DNA at this locus. What we see is that for this gene, an individual of African descent is sometimes more similar to people from Asia or Europe than to others from Africa. Partly this reflects the relatively small amount of genetic variation at one locus, but it says that for any given gene, it's very difficult to trace population origin from that gene alone.
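The matrix-of-differences step can be sketched as a count of mismatched positions (a Hamming distance) between aligned sequences. The four sequences below are invented toy strings, built only so that the pairs quoted in the talk (Romney-Santorum 2, Clinton-Santorum 5, Edwards-Santorum 6, Edwards-Clinton 1) come out with those counts; the remaining pairs are simply whatever these toy strings give, and nothing here is real sequence data.

```python
from itertools import combinations


def count_differences(seq_a, seq_b):
    """Number of aligned positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def difference_matrix(sequences):
    """Pairwise difference counts for every pair in a {name: sequence} mapping,
    keyed by alphabetically ordered name pairs."""
    return {
        (x, y): count_differences(sequences[x], sequences[y])
        for x, y in combinations(sorted(sequences), 2)
    }


# Hypothetical toy sequences chosen to match the counts quoted in the talk.
toy = {
    "Santorum": "AAAAAAAAAA",
    "Romney":   "CCAAAAAAAA",
    "Clinton":  "AAGGGGGAAA",
    "Edwards":  "AAGGGGGTAA",
}

if __name__ == "__main__":
    for pair, d in difference_matrix(toy).items():
        print(pair, d)
```

A tree-building method such as neighbor joining would take a matrix like this as its input.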
And conversely, if we know your population origin, we can't necessarily predict your genotype at that locus. This also reflects the sharing of DNA that has gone on through the history of our species, because human populations have mixed and migrated fairly extensively, and the mosaic patterns we see in many of these diagrams reflect that. This is actually something Darwin himself was aware of. He wrote long ago that "it may be doubted whether any character can be named which is distinctive of a race and is constant." In other words, characters tend to be shared across populations, and no single character is going to delineate a specific population group. We then took that same group of individuals, used about 200 loci, and again made a diagram. Now you see that these individuals, who are from Africa, Europe, and East Asia, so geographically somewhat separated groups, each fall into a group consistent with their continent of origin. One of the things you notice is that the branches are much, much longer within populations, consistent with the FST estimate that says most variation occurs within populations. But there is enough between-population variation that we can begin to see a pattern according to ancestry. That may seem a little paradoxical compared with the diagram I just showed you for angiotensinogen, but it makes sense once you consider that 200 loci carry a lot more information about ancestry, about population history. In a way it's like looking at height alone in males and females. If we measure everyone's height and try to determine sex from that, we're going to be wrong a lot of the time, but there is, on average, a difference.
If we add another characteristic, like waist-hip ratio, we get more accurate separation of the two groups, and the more characters we look at, the more accurately we can discern them. That's essentially what we're saying: with more genetic information, we can more accurately discern the histories of these continental populations, at least at a very basic level. Here's another example, now using more single-nucleotide variants, and you can see in this neighbor-joining network that there appear to be groups, and in fact they correspond to various worldwide populations: New World populations, Asian populations, African populations, a Spanish population, a South Indian population. But we shouldn't be too misled by this, because we can add populations with a more complex history, such as African Americans, some of whom fall into the group with African populations while others trend toward other groups because of the complex history of this population. The same is true of Puerto Ricans, who again have a complex ancestral history: some fall in with the Spanish group, others closer to an African group. So the point is that, especially as human populations become more mobile, it's very difficult to classify every individual into a nice, neat category. Here's a similar exercise that my graduate student Wilford Wu carried out a year or so ago with the Complete Genomics data. This is whole-genome sequence, and we see very much the same kind of pattern. These are individuals from the 1000 Genomes Project, sequenced by Complete Genomics, and in general the population groups do tend to fall together. But there are interesting exceptions. For example, individuals from Mexico are distributed in various places throughout the graph, once again illustrating their complex demographic history.
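The height and waist-hip ratio analogy, and the earlier contrast between one gene and about 200 loci, can be illustrated with a small simulation. This is a sketch under simplifying assumptions that are ours, not the speaker's: two hypothetical populations whose allele frequencies differ modestly (0.6 versus 0.4) at every locus, independent loci in Hardy-Weinberg proportions, and each individual assigned to whichever population makes its genotypes more likely. Real populations are not two discrete groups with fixed frequencies, which is exactly the caveat the admixed examples raise.

```python
import math
import random


def simulate_genotypes(freqs, n_individuals, rng):
    """Diploid allele counts (0, 1 or 2) per locus, drawn under Hardy-Weinberg."""
    return [
        [(rng.random() < p) + (rng.random() < p) for p in freqs]
        for _ in range(n_individuals)
    ]


def log_likelihood_ratio(genotype, freqs_a, freqs_b):
    """log P(genotype | pop A) - log P(genotype | pop B), summed over independent
    loci. Binomial coefficients cancel in the ratio, so they are omitted."""
    total = 0.0
    for k, pa, pb in zip(genotype, freqs_a, freqs_b):
        total += k * math.log(pa / pb) + (2 - k) * math.log((1 - pa) / (1 - pb))
    return total


def classification_accuracy(n_loci, n_per_pop=1000, delta=0.1, seed=1):
    """Fraction of simulated individuals assigned to their own population."""
    rng = random.Random(seed)
    freqs_a = [0.5 + delta] * n_loci  # hypothetical population A
    freqs_b = [0.5 - delta] * n_loci  # hypothetical population B
    correct = 0
    for genotype in simulate_genotypes(freqs_a, n_per_pop, rng):
        correct += log_likelihood_ratio(genotype, freqs_a, freqs_b) > 0
    for genotype in simulate_genotypes(freqs_b, n_per_pop, rng):
        correct += log_likelihood_ratio(genotype, freqs_a, freqs_b) <= 0
    return correct / (2 * n_per_pop)


if __name__ == "__main__":
    for loci in (1, 10, 200):
        print(loci, classification_accuracy(loci))
```

With one locus, assignment is only modestly better than a coin flip (about 60 percent correct in this setup); with a couple of hundred weakly differentiated loci, the two toy populations separate almost perfectly.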
Another thing Wilford did that was interesting from a genomics point of view: he included the same subjects sequenced both in the 1000 Genomes database and by Complete Genomics, and on average the between-platform differences were about 348,000 variants. A lot of that has to do with relatively low coverage in the 1000 Genomes database, so we would expect it. It's actually encouraging that each of these pairs, which are the same individual on two different platforms, did at least cluster together. But you can see the between-platform difference quite clearly in this slide. Here's one more example of the point I'm making. This is a principal components plot for American populations of African, European, Asian, and Hispanic descent. Again, some individuals of African descent, for example, are closer to members of other populations than to many of the other individuals of African descent, so it's very difficult to put a self-identified group into a nice, neat compartment. What this tells us is that if we look at enough polymorphisms, enough SNPs or single nucleotide variants, we can often learn something about population affiliation: the non-overlapping parts of these circles. But the converse, and this is where people sometimes get confused, is not true. If we know your population affiliation, we can't predict your SNP genotype, because populations typically differ only in the frequencies of SNPs, and there's a lot of overlap. I think that's a very important point to make, especially to the general public. It really highlights, I think, the fallacy of thinking typologically, which is what racial categories tend to embody. Humans really don't fall into discrete groups like this.
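The converse point, that population membership does not predict an individual's genotype, follows from Hardy-Weinberg proportions. A minimal sketch, using hypothetical allele frequencies of 0.6 and 0.4 in two populations (our assumption, not numbers from the talk):

```python
def hwe_genotype_probs(p):
    """Hardy-Weinberg probabilities of carrying 0, 1 or 2 copies of an allele
    whose frequency is p."""
    q = 1.0 - p
    return [q * q, 2 * p * q, p * p]


def overlap(probs_a, probs_b):
    """Probability mass shared by two genotype distributions (1.0 = identical)."""
    return sum(min(a, b) for a, b in zip(probs_a, probs_b))


if __name__ == "__main__":
    pop_a = hwe_genotype_probs(0.6)  # hypothetical population A
    pop_b = hwe_genotype_probs(0.4)  # hypothetical population B
    print("population A:", [round(x, 2) for x in pop_a])
    print("population B:", [round(x, 2) for x in pop_b])
    print("overlap:", round(overlap(pop_a, pop_b), 2))
```

Even with a 20-point difference in allele frequency, the heterozygote is the most likely genotype in both populations, and the two genotype distributions share 80 percent of their probability mass.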
What the genetic data tell us is that there's a tremendous amount of overlap in genetic information across human populations. Here's a good example of that, and also of how self-identified population affiliation can be misleading. Wayne Joseph was a school principal in California. He was raised in a family in Louisiana that self-identified as African American. He sent his saliva to a direct-to-consumer testing company, and this is what he learned: at least according to their estimates, he was 57% European, 39% Native American, maybe 4% Asian (although that could just be an error term), with no apparent African ancestry. So in his case, his self-identified population affiliation appears to have been completely wrong. This didn't change anything important for him; culturally he maintained the same affiliation. But it shows how self-identified affiliation can be wrong, or misleading. So I think a much more useful concept than race is individual ancestry, because we can now estimate genetic ancestry for individuals, at least at a broad level. Someone with this apportionment of ancestry would likely self-identify as African American, as very likely would someone with this one. And yet their ancestries and genetic makeup could be quite different, which is why I think it's much better to assess ancestry at the individual level rather than to use these categories. I'll give you an example from my own genetic testing. A few years ago I sent my DNA to one of the companies, I guess it was 23andMe, and they assess your paternal and maternal ancestry. The paternal side is based on the Y chromosome. I have this particular Y haplogroup, I1*, and it was kind of amusing to learn I share it with Jimmy Buffett and Warren Buffett.
They don't know that, and it hasn't done anything for my singing or my investing ability, but my grandparents all came from Norway, so this is consistent with what I know about my ancestry. For my maternal line, my mitochondrial DNA, the haplogroup I have is quite common and fairly widely spread throughout Europe, so that again makes sense. Then, using ancestry-informative markers across the genome, they essentially attempt to paint your chromosomes by ancestry. I was hoping I would have something exotic, but according to this, at least, my ancestry derives 100% from Europe. I was hoping my rambunctious Viking ancestors might have brought something interesting into the genome, but it doesn't look like that's the case. Here, by contrast, we're looking at the ancestry of a Berber woman from North Africa: an African, but one for whom 86% of the ancestry is predicted to be European-derived. We see quite a lot of mixture in that ancestry, and even more for a self-identified African American. The important point is that for this individual, some regions of the genome would be African-derived and other regions European-derived. If we're interested in genetically influenced disease susceptibility, what we really want is to look at those specific regions and their genetic makeup rather than at a self-identified population affiliation. So for biomedicine, I think these findings have some important implications. First, they tell us that if we look at a large number of independent polymorphisms, we can learn about population history and about ancestry. But, very importantly, these variants typically differ only in their frequency, and they typically overlap a lot among populations. Here's an example of that.
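To make the chromosome-painting idea concrete: once each stretch of a chromosome carries an ancestry label, the genome-wide percentage is a length-weighted average, while the question that matters for a particular disease region is a lookup at that position. The segments below are hypothetical (positions in megabases) and the labels are placeholders; actual local-ancestry inference is a statistical problem this sketch does not attempt.

```python
from bisect import bisect_right


def ancestry_proportions(segments):
    """Length-weighted share of each ancestry label over (start, end, label) segments."""
    totals = {}
    for start, end, label in segments:
        totals[label] = totals.get(label, 0.0) + (end - start)
    painted_length = sum(totals.values())
    return {label: length / painted_length for label, length in totals.items()}


def ancestry_at(segments, position):
    """Label of the segment containing `position`, or None if it is unpainted.
    Assumes segments are sorted by start and do not overlap."""
    starts = [start for start, _, _ in segments]
    i = bisect_right(starts, position) - 1
    if i >= 0 and segments[i][0] <= position < segments[i][1]:
        return segments[i][2]
    return None


# Hypothetical painted chromosome (positions in Mb).
painted = [(0, 45, "European"), (45, 80, "African"), (80, 120, "European")]

if __name__ == "__main__":
    print(ancestry_proportions(painted))
    print(ancestry_at(painted, 60))
```

Here the chromosome is mostly European-derived overall, yet a locus at 60 Mb sits in an African-derived segment, which is the region-by-region view the talk argues for.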
This was a study of response to ACE inhibitors in African American and European American populations, a very large meta-analysis, and it addressed the question of whether African Americans tend to respond less than European Americans to ACE inhibitors for lowering blood pressure. What we see is that the decrease in blood pressure, here systolic pressure, in response to ACE inhibitors is a few millimeters of mercury less in African Americans than in European Americans, but there's a wide distribution and a large amount of overlap, so, as you can see, many of the African American patients would respond better to an ACE inhibitor than would many of the European American patients. So rather than using this average difference as an indicator of who should get the drug, it would be much better to look directly at genotypes in individuals to predict response. We see a good example of that with EGFR inhibitors and non-small cell lung cancer. EGFR inhibitors such as gefitinib and erlotinib inhibit tyrosine kinase activity, and they're estimated to be effective in treating this condition in roughly 10 percent of Europeans, but a higher percentage of Asians. So one might imagine using population affiliation as an indicator of who should get this drug to treat non-small cell lung cancer. But it's interesting that if you look at somatic mutations in EGFR, which are gain-of-function mutations, we see them in about 10 percent of European patients with this condition and in a higher percentage of Japanese and other Asian patients. In fact, 70 to 80 percent of those who have the mutations respond to the drug, and fewer than 10 percent of those without them respond. So looking at the gene itself, looking for gain-of-function mutations, is a much better indicator of who is going to be a good responder than the more blunt population category. One more example of this is the calibration of warfarin dosage.
This is a standard clinical algorithm that takes population affiliation into account, but here are the results of genetic testing for VKORC1 and CYP2C9, which both influence warfarin response (VKORC1 encodes the drug's target and CYP2C9 metabolizes it), and here you see a much bigger difference between this genotype category and this one than we see across populations. So again, individual testing gives us a much better prediction of response than population affiliation does. What this tells us, I think, is that the genetic variation we've seen is correlated with geographic location, but it tends to be distributed continuously across space. It's difficult to delineate specific borders or boundaries between populations. So race, going back to the question raised earlier, may not be biologically meaningless, but it is biologically very imprecise. It is a very blunt tool, and we can use better tools, genetic tools, to infer individual ancestry, which I think will provide more medically relevant and useful information. I want to go on now to the topic of linkage disequilibrium, but everyone has been sitting very still for about an hour, so I'd like to invite you to stand up and stretch for a minute before we do the last half hour of the lecture. I think it's cruel and unusual punishment to make you sit here for 90 minutes. It violates your Eighth Amendment rights. Oh, yes, absolutely. How did you define what were Neanderthal genotypes? Okay, that's... Since there are no Neanderthals around. Oh, but several have been sequenced. From frozen material, as it were? Well, I'm not sure about the exact provenience, but several Neanderthal specimens have been sequenced, including one at 42X coverage, so that has given a baseline for the Neanderthal genome. And they're taken from geographically somewhat diverse areas? Yeah. One was in Croatia, another much further east. I don't remember the exact location.
But the point is they're closer to each other than to anything else. Yes, the Neanderthal sequences are much more similar to each other and quite divergent from human sequences. Another question, on the finding of much greater genetic diversity within Africa than in other populations: how much of that is due to population substructure within Africa, that is, FST values between different populations in Africa? Yeah. There is, as you'd probably expect, more substructure as we look across Africa, that population having been resident in Africa longer and having had more time to subdivide and differentiate. It also has a larger effective size, and the larger the effective size of a population, the more variation you see. So in all the different kinds of systems we've looked at, you tend to see about 20 to 25 percent more variation in samples of persons of African ancestry than in those of non-African ancestry. Okay. Well, it looks like everyone has sat back down, so we'll go on to what I think is a very interesting application of a population genetic concept, linkage disequilibrium, to disease gene mapping. Let me ask: how many of you are familiar with the concept of linkage disequilibrium? Okay, I see just a few hands. So let's go through this, because it turns out to be very important for our understanding not just of SNP data but also of genome data. Linkage disequilibrium can be described as the non-random association of alleles at linked loci. Let's imagine two loci, A and B, with alleles big A and little a at the first and big B and little b at the second. At equilibrium, we're going to see all possible combinations. But in disequilibrium, where there's a non-random association of alleles, we see big A with big B and little a with little b, but very seldom the other combinations. That, in essence, is what we mean by linkage disequilibrium.
We can quantify this by looking at the allele frequencies: big A and little a at 60 and 40 percent, big B and little b at 70 and 30 percent. Under equilibrium, we would expect haplotypes carrying big A and big B to be seen 42 percent of the time in this population, that is, the frequency of big A multiplied by the frequency of big B. That's essentially random association, very much like Hardy-Weinberg, but extended to two loci. Similarly, we would expect big A and little b together on the same chromosome copy, the same haplotype, 18 percent of the time: 60 percent times 30 percent. That's what we would expect under linkage equilibrium. But suppose we assay a population and see a real excess of the big A-big B haplotype and of the little a-little b haplotype, and a paucity of the other two. That would be linkage disequilibrium: we're finding these alleles in combination much more often than we would expect given their frequencies. What this usually suggests is that alleles in stronger linkage disequilibrium have had less opportunity for crossing over to occur between their respective loci. So over many generations, we're going to find big B and big C together on the same haplotype more often than big A and big B, because A and B, being farther apart, will have had their alleles broken up by recombination much more frequently than the B and C pair. That implies we can use linkage disequilibrium patterns to infer how close together any two loci are. It's another way of doing linkage analysis, but it has some advantages. We don't necessarily need family data, whereas in traditional linkage analysis you are of course counting recombinants from one generation to the next. And we can use microarrays or sequence data, so we can look at a large number of single nucleotide variants spaced as closely as a kilobase or so.
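The arithmetic above can be written out directly. Using the allele frequencies from the talk (A at 0.6, B at 0.7), the sketch computes the haplotype frequencies expected under linkage equilibrium. It then takes a hypothetical set of observed haplotype frequencies, made up to show an excess of AB and ab while keeping the same allele frequencies, and computes the usual disequilibrium coefficient D = p(AB) - p(A)p(B) together with the standard normalized measures D' and r². Both loci are assumed to be polymorphic.

```python
def expected_haplotypes(p_A, p_B):
    """Haplotype frequencies expected under linkage equilibrium
    (each is the product of the two allele frequencies)."""
    p_a, p_b = 1 - p_A, 1 - p_B
    return {"AB": p_A * p_B, "Ab": p_A * p_b, "aB": p_a * p_B, "ab": p_a * p_b}


def ld_stats(hap):
    """D, D' and r^2 from the four haplotype frequencies (which should sum to 1)."""
    p_A = hap["AB"] + hap["Ab"]
    p_B = hap["AB"] + hap["aB"]
    p_a, p_b = 1 - p_A, 1 - p_B
    d = hap["AB"] - p_A * p_B
    # D' scales D by its largest possible magnitude given the allele frequencies.
    d_max = min(p_A * p_b, p_a * p_B) if d >= 0 else min(p_A * p_B, p_a * p_b)
    d_prime = d / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_A * p_a * p_B * p_b)
    return {"D": d, "D_prime": d_prime, "r2": r2}


if __name__ == "__main__":
    print(expected_haplotypes(0.6, 0.7))
    observed = {"AB": 0.55, "Ab": 0.05, "aB": 0.15, "ab": 0.25}  # hypothetical
    print(ld_stats(observed))
```

For the expected frequencies D is zero by construction; for the made-up observed set, AB is 13 percentage points more common than equilibrium predicts.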
And we can do association studies that effectively incorporate not just the last two or three generations of recombination to map loci, but essentially all the generations of recombination that have occurred since a variant arose, because for any given mutation a population is, in essence, just one big pedigree. If all of these individuals in these different families inherited a mutation from this founder back here, linkage disequilibrium lets us look at the recombinations that have occurred between this mutation and nearby SNPs throughout the generations. So in principle it allows us to map loci more finely than we could with recombination mapping, that is, linkage analysis, in, say, three generations of a family. That's the advantage of linkage disequilibrium, and it's one of the reasons the number of papers published on it has grown so much over the last 30 years. Back in the 1980s, when I first became interested in the topic, there were about 50 papers a year on linkage disequilibrium; you could read a paper a week and know everything that was going on. It has now kind of plateaued, but at around 1,200 to 1,500 papers a year. So it has gained a lot of interest and popularity as a gene mapping tool. But many factors can influence linkage disequilibrium patterns. One is chromosome location, just as with recombination: because recombination is more common near telomeres, the relationship between the linkage disequilibrium between two loci and their actual distance is going to vary. We also know that there is less recombination within genes than in extragenic regions, which again affects the relationship between linkage disequilibrium and physical distance. Sequence patterns also affect recombination and therefore linkage disequilibrium.
High GC content is associated with more recombination, as is the presence of inserts such as Alu elements. We now know of recombination hotspots every 50 to 100 kb in the genome, and particular motifs bound by the zinc finger protein PRDM9 are associated with a high proportion of hotspots. Interestingly, there is more variation in a repeat unit of PRDM9 in African populations than in non-African populations. That is one of the reasons we see more recombination in African populations, along with their population history. And finally, the evolutionary factors we're interested in in population genetics (natural selection, gene flow, mutation, and genetic drift) all influence patterns of linkage disequilibrium. So linkage disequilibrium can be rather complex to interpret. Here's an example: the age of a population. Of course, all populations really have the same age, but when we talk about an older population, we mean one that was founded longer ago, like the African population. In such populations there have been many generations for recombination to occur, which means there will be many different haplotypes in relatively small blocks. On the other hand, if we look at, for example, the Finnish population, most of which was founded relatively recently from a small number of individuals, fewer generations have passed for recombination to occur. So we tend to see larger haplotype blocks and more disequilibrium. That means a mutation here will be associated with more nearby polymorphisms, even after many generations, even in modern populations, whereas a mutation that occurred in the older population will tend to be associated with a smaller number of polymorphisms.
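The role of population age has a simple textbook form. With recombination fraction c between two loci and random mating, D shrinks by a factor of (1 - c) each generation, so D_t = D_0 (1 - c)^t. The sketch below applies that formula; the recombination fraction and generation counts are illustrative assumptions, not estimates for the African or Finnish populations, and the formula ignores drift, selection, and new mutation.

```python
import math


def ld_after(d0, c, generations):
    """Expected linkage disequilibrium after `generations` of random mating,
    starting from d0 with recombination fraction c: D_t = d0 * (1 - c) ** t."""
    return d0 * (1 - c) ** generations


def generations_to_fraction(c, fraction):
    """Generations needed for D to decay to `fraction` of its starting value."""
    return math.log(fraction) / math.log(1 - c)


if __name__ == "__main__":
    c = 0.01  # illustrative recombination fraction between two loci
    for t in (200, 2000):  # hypothetical "younger" and "older" populations
        print(t, ld_after(0.25, c, t))
    print(generations_to_fraction(c, 0.5))
```

Ten times as many generations leaves far less of the starting disequilibrium, which is why an older population shows shorter haplotype blocks.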
If we look at patterns of disequilibrium in these populations (we're looking now back at the angiotensinogen locus), each of these little units is a SNP at that locus. We can interpret this plot much like a mileage chart. For those of you who remember mileage charts from atlases, this might be San Francisco, this would be New York, and here would be the distance between San Francisco and New York. If this were Los Angeles, then this would be the distance between San Francisco and Los Angeles. Likewise, for these SNPs, this is the amount of linkage disequilibrium between this pair of SNPs, and this is the amount between these two rather distant SNPs. Red indicates high disequilibrium, and what you see is much more disequilibrium at this locus in the more recently founded Eurasian population than in the African sample, consistent with what we know about population history. So one of the questions we want to ask is, how general are these patterns across the genome? How much does linkage disequilibrium vary with genomic location and with population? I would say that about 10 or 12 years ago, our knowledge of haplotype structure across the genome was kind of like this map of the world from 1544. I think these maps are fascinating. At the time, Europe was reasonably mapped out, Asia to some extent, and North America was not even on the map, so it was a pretty low-resolution and misinformed map of the world. Well, the HapMap project, which I know all of you have heard about, sought to create a better, more accurate haplotype map of the human genome. It started with 600,000 SNPs and was later expanded. There were three populations: 90 Utah CEPH individuals representing people of European ancestry, 90 Yoruba from Nigeria, and 90 East Asians. By no means a complete sample of human diversity, but a small subsample.
The idea was to evaluate patterns of linkage disequilibrium and haplotype structure across the genome in these different populations. And I think the result was a map that looks more like this one. By 1688 we had a much better-resolved map of the world (somehow California has still escaped notice here), and in the same way we now had a much, much better map of linkage disequilibrium. This has led to some very useful applications: understanding genome-wide human haplotype diversity, detecting recombination hotspots, detecting genes that have experienced strong natural selection, and of course detecting disease-causing mutations. In this last part of the talk, I'll go through a few examples of those. One of the take-home messages from that project was that many SNPs throughout the genome are redundant. If you have this SNP here, then you almost certainly have these alleles here. So these tag SNPs are really all we have to genotype. The others add little information because they're in strong linkage disequilibrium with the tags, meaning that we don't have to type nearly as many polymorphisms to get complete coverage of the human genome. More are needed in individuals of African descent, but still far fewer than the total number of SNPs that have been discovered. That has led to this success story, and you've all seen this slide or some version of it: the many hundreds of replicated associations across the genome using SNPs selected from the HapMap project. These data also allow us to detect recombination hotspots, because what we often see is blocks of linkage disequilibrium with strong associations among the alleles within each block, but essentially no association between one block and the next because of a recombination hotspot, where recombination is elevated at least 10-fold over the rest of the genome, rapidly dissociating those groups of alleles from one another.
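Tag SNP selection can be sketched as a greedy covering problem, in the spirit of the greedy r²-threshold tagging commonly applied to HapMap data. This is our illustration, not a description of the HapMap pipeline: repeatedly pick the untagged SNP that is in strong LD (r² at or above a threshold) with the most untagged SNPs, and count all of those as captured. The r² matrix below is invented, with two blocks of correlated SNPs separated by low LD, as if a hotspot lay between them.

```python
def greedy_tag_snps(r2, threshold=0.8):
    """Choose tag SNPs so that every SNP has r^2 >= threshold with at least one tag.
    `r2` is a symmetric matrix (list of lists) with 1.0 on the diagonal."""
    untagged = set(range(len(r2)))
    tags = []
    while untagged:
        # Candidate covering the most untagged SNPs wins; ties go to the lower index.
        best = max(
            untagged,
            key=lambda i: (sum(1 for j in untagged if r2[i][j] >= threshold), -i),
        )
        tags.append(best)
        untagged -= {j for j in untagged if r2[best][j] >= threshold}
    return tags


# Hypothetical pairwise r^2 for five SNPs: block {0, 1, 2} and block {3, 4}.
r2 = [
    [1.00, 0.95, 0.90, 0.10, 0.05],
    [0.95, 1.00, 0.92, 0.10, 0.10],
    [0.90, 0.92, 1.00, 0.05, 0.10],
    [0.10, 0.10, 0.05, 1.00, 0.85],
    [0.05, 0.10, 0.10, 0.85, 1.00],
]

if __name__ == "__main__":
    print(greedy_tag_snps(r2))
```

Two tags stand in for five SNPs here; raise the threshold high enough and every SNP must be typed on its own.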
Of course, that is going to influence our estimates of distance among loci: if there's an intervening hotspot, we're going to see unexpectedly low linkage disequilibrium. We estimate that there are as many as 50,000 or so recombination hotspots throughout the genome, and that about 60% of all recombination occurs in just 6% of the genome, much of it focused at these highly active hotspots. What's very interesting is that hotspots vary among species. In fact, in chimpanzees the locations of hotspots are completely different from those in humans. Chimpanzee PRDM9 has a different DNA-binding zinc finger array and recognizes different motifs, so that explains part of it. We also see variation even among human populations in the location and activity of hotspots. So this is really helping us understand a very important property of genomes: how frequently, and where, they shuffle and recombine. Another thing these linkage disequilibrium patterns allow us to do is detect regions that have undergone very strong natural selection. The idea is diagrammed here. Imagine a new DNA variant arising on a haplotype background. If it's neutral, that is, if it does not undergo natural selection, it will in some cases slowly increase in frequency. But as it does so, the background haplotype, the other SNPs associated with it, becomes shorter and shorter because of recombination occurring generation after generation. So a neutral variant that attains high frequency will have relatively low disequilibrium with other nearby SNPs. But now imagine that this is an advantageous variant that sweeps very rapidly to high frequency. It's going to carry the nearby variants along with it, also to high frequency, and we're going to see long regions of homozygosity in populations because of selection not only of this advantageous variant but also of nearby SNPs.
So we can look for regions with this signature, long stretches of homozygosity around a high-frequency variant, as a way of detecting variants that have undergone very rapid positive selection. Here is another illustration of the idea. If there's positive selection for this variant, it will pull along the adjacent variant it's associated with here, and after a while most, maybe all, members of the population will have this combination of variants. You'll see a region of homozygosity here. We can compare that, for example, to purifying selection, where variants arise but, because they're deleterious, simply get eliminated. This approach has now been used in a number of very interesting applications. For example, it showed that variation at the G6PD locus was strongly selected for malaria protection. This cytochrome P450 locus underwent selection for sodium retention. A very interesting story is the lactase enhancer: independent herding populations, some in Europe and some in Africa, have hereditary lactase persistence, so they can digest milk throughout their lives. Variants in an enhancer element have undergone strong selection in those independent populations, a good example of convergent evolution within just the last 10,000 years. Several skin pigmentation loci have likewise undergone rapid selection as humans encountered new environments. And Tierra mentioned work that we and others have done on the high-altitude hypoxia response in Tibetan populations. Tibet is one of those great natural experiments on humans: people moved to an altitude of 15,000 feet or so and successfully adapted, in part by altering their response to high-altitude hypoxia. We've detected selection, and now specific variants, at members of the HIF pathway that help confer that high-altitude adaptation. These were all discovered by exploiting these properties of linkage disequilibrium in populations.
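One published way to quantify the "long region of homozygosity" signature is extended haplotype homozygosity (EHH): the probability that two randomly chosen chromosomes carrying a given core allele are identical over the whole interval from the core out to some marker. The sketch below is a simplified, one-sided version on hypothetical toy haplotypes; real scans extend in both directions and compare EHH decay between alleles, and the function name and data here are assumptions for illustration.

```python
from collections import Counter
from math import comb

def ehh(haplotypes, core_index, core_allele, end_index):
    """One-sided EHH from core_index out to end_index (inclusive).

    Fraction of pairs of core-allele carriers that are identical
    at every SNP in the interval."""
    carriers = [h for h in haplotypes if h[core_index] == core_allele]
    n = len(carriers)
    if n < 2:
        raise ValueError("need at least two chromosomes carrying the core allele")
    # Group carriers by their allele string over the interval.
    counts = Counter(h[core_index:end_index + 1] for h in carriers)
    identical_pairs = sum(comb(k, 2) for k in counts.values())
    return identical_pairs / comb(n, 2)

# Hypothetical phased haplotypes: position 0 is the core SNP,
# positions 1-4 are flanking SNPs on one side.
haplotypes = [
    "10110", "10110", "10110", "10110", "10111",  # allele 1: long shared background
    "00101", "01011", "01100", "00010",           # allele 0: diverse backgrounds
]

for end in range(1, 5):
    print(end, ehh(haplotypes, 0, "1", end), round(ehh(haplotypes, 0, "0", end), 3))
```

EHH stays high far from the core for the swept-like allele 1 and collapses almost immediately for allele 0, which is the contrast these selection scans look for.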
Now I'll say that population genetics is also guiding the development of sequence analysis. As we analyze more and more exomes and whole genomes, the 1000 Genomes Project provides a very useful set of control sequences. Whenever we sequence a group of patients and find a variant, one of the questions is whether that variant is absent, or at least very rare, in other populations, and the 1000 Genomes Project has given us one of the important sets of control sequences for that kind of analysis. I think we need population genetic analysis to inform us about the nature and behavior of rare variants, because now that we're able to do whole genome sequencing, rare variants are often the ones we're most interested in for their association with disease. Evolutionary principles, population genetic principles, also help us determine when a variant is actually functionally significant, because we can find associated variants, but figuring out which ones actually have functional relevance is, in many cases, quite a challenge. So we, and others as well, incorporate purifying selection: we look at regions that have undergone purifying selection as a way of prioritizing candidate variants when we're doing genome analysis. I'll just mention the software we've developed in the last few years, VAAST, and now Pedigree VAAST, which has just come out. VAAST is a tool for analyzing sequence data, and Pedigree VAAST makes use of sequence data in pedigrees; that's just coming out in Nature Biotechnology. One of the things we use is evidence of purifying selection to assign functional significance. And, of course, evolutionary conservation among species is again very useful in deciding which variants might actually have functional relevance.
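To make the filter-then-prioritize logic concrete, here is a deliberately simple sketch: drop candidate variants that are common in a control panel, then rank the rest by a cross-species conservation score (the kind of value tools such as GERP or phyloP report). This is not the VAAST method. The `Variant` fields, the 1% frequency cutoff, and all the numbers are hypothetical assumptions.

```python
from dataclasses import dataclass

@dataclass
class Variant:
    variant_id: str      # hypothetical identifier
    control_freq: float  # allele frequency in a control panel (hypothetical)
    conservation: float  # conservation score; higher = more constrained (hypothetical)

def prioritize(variants, max_control_freq=0.01):
    """Keep variants rare in controls (assumed 1% cutoff), then rank by conservation."""
    rare = [v for v in variants if v.control_freq < max_control_freq]
    return sorted(rare, key=lambda v: v.conservation, reverse=True)

candidates = [
    Variant("var_a", 0.15, 5.2),    # common in controls: filtered out
    Variant("var_b", 0.0, 4.8),     # absent from controls
    Variant("var_c", 0.002, 1.1),   # rare but poorly conserved
    Variant("var_d", 0.0005, 6.0),  # rare and highly conserved
]
print([v.variant_id for v in prioritize(candidates)])
```

The frequency filter uses the control population, and the ranking stands in for evidence of purifying selection; a real pipeline would weigh these jointly rather than in two hard steps.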
So I'll just wrap up by saying that what I hope you've seen today is that genetic variation contains a lot of useful information about the history of our species. I think it gives us a more subtle and nuanced view of issues like race and how relevant they are, or are not, to medicine, and I think it gives us some useful alternatives. Population genetic analysis, and our understanding of concepts like linkage disequilibrium, have been of fundamental significance in gene mapping efforts. And now, as we try to understand the role of rare and common variants in disease, understanding the evolutionary processes that give rise to those variants is again turning out to be of key significance. I also hope you've seen that population genetics, which people sometimes associate with a lot of heavy math, is actually fun; I hope I've convinced you of that. I'll leave you with this picture of the lovely Wasatch Mountains. This is my backyard, where I enjoy playing. And here are some of the people who contributed to the work I told you about. I want to thank you for your kind attention, and I'm happy to take any questions. I've got about three or four minutes here. Yes, sir. Like plants and microorganisms, do humans have significant numbers of mobile elements, transposons and such, and how does this complicate genetic analysis? Yeah, that's a great question. We estimate that at least half of our genome is derived from mobile element insertions. If you look at Alus, they make up about 11 percent. There are more than a million 300-base Alu elements in the human genome, mostly inserted earlier in the course of primate evolution, but some of them inserted since humans diverged along their own independent lineage. Another 17 to 20 percent are LINE-1 elements. And one of the interesting questions is what effect these have on the genome. We know that occasionally you can have, for example, transduction of other genetic material as an L1 pops out and goes someplace else.
Sometimes it takes other material with it because it has a rather weak poly-A signal, so it is sometimes involved in the transfer of other genetic elements. And because these are highly methylated, CG-rich sequences, they may affect gene regulation depending on where they land. So we think that they do occasionally have effects on the genome. And of course we now have some very good examples in which these elements have inserted into a specific gene and caused loss of function. There are also some good examples in which they mediate unequal crossover. The BRCA1 gene is full of Alu elements, and that's one of the reasons you see so many deletions in BRCA1: these Alus mediate unequal crossover and cause deletions. So they do have some interesting effects. And I think because they've been difficult to identify, they have been somewhat challenging to understand, but a lot of work is being done in that area. Other questions? Okay, well, thanks very much.