Good morning, everyone, and greetings to those of you joining us through the live feed. This is week seven of the course, and it's my pleasure to introduce to you today Dr. Lynn Jorde, who is a professor of human genetics at the University of Utah School of Medicine. Dr. Jorde's research interests are in the areas of gene mapping and evolutionary genetics. His group is currently analyzing variation in genes of the renin-angiotensin pathway, with the goal of better understanding the role of these genes in susceptibility to hypertension, and his work also addresses a number of other disorders, including schizophrenia, polycythemia, juvenile idiopathic arthritis, and inflammatory bowel disease. His evolutionary genetics research focuses on analyzing worldwide genetic variation in human mitochondrial and nuclear DNA, primarily mobile elements. The goal of this work is to better understand the geographical origin and migration of humans and how these data might be used to determine the relevance of race, "race" in quotation marks, in biomedical settings, which Dr. Jorde will be talking a little bit about this morning. Finally, Lynn is one of the most wonderful lecturers that I know, so I'm very sure that you're going to enjoy today's lecture and learn a lot from him. Today's lecture is intended to provide you an overview of the field of population genetics, and it's my pleasure to introduce to you, once again, Dr. Lynn Jorde.

Well, thank you very much, Andy. It's a pleasure to be back here again. This morning, I would like to introduce you to the field of population genetics. My talk will be divided into three parts. First, we'll talk about patterns of human genetic variation, both among human populations and among individuals, which we can now look at with some precision.
Second, we'll talk about the implications of this work for concepts of race, something that is controversial and that I think is illuminated by our genetic studies. And then finally, in the third part of the talk, we'll discuss how population genetics, evolutionary genetics, informs our understanding of things like linkage disequilibrium, the HapMap and its design, and our continuing search for genes underlying susceptibility to complex disease. So, of course, the story starts with mutation, the generator of genetic variation. We estimate, based on phylogenetic analysis, that the human mutation rate is about 2.5 × 10⁻⁸ per base pair per generation. What that means is that each gamete carries about 75 or so new DNA variants. Now, I should say that some of the new genome analyses of families suggest that this rate is overestimated and that the actual mutation rate may be about half of this, so it'll be interesting to see, as those studies come out, where the mutation rate finally lands. But we now think it might be roughly half of the usually cited phylogenetic estimate. Here's a quote from Lewis Thomas that I like very much. He said, "The capacity to blunder slightly is the real marvel of DNA. Without this special attribute, we would still be anaerobic bacteria and there would be no music." I think that's a lovely quote, and it reminds us of why we should be thankful for our mutations. Well, one of the questions we can ask as we look at DNA variation in individuals, in populations, and in species is how much we differ at the DNA level. If we look at aligned DNA base differences, identical twins, being nature's clones, have, for all intents and purposes, zero DNA sequence differences. A famous figure now is that if we compare any two unrelated humans for aligned DNA base differences, we vary at only about 1 in 1,000 DNA bases.
The difference is several times greater than this if you include copy number variants, but for alignable sequence we are, as has been said many times now, 99.9% identical. If we compare ourselves in the same way to our nearest relative, the chimpanzee, we differ at about 1 in 100 base pairs. So in a sense, we are 99% chimp. If we compare ourselves to the mouse, about 1 in 3 bases differs. And fortunately, if we go out substantially further, we are pretty different from broccoli. What this means is that, given that we have 3 billion DNA base pairs in a haploid genome, there are about 3 million differences between any two humans: a tremendous reservoir of genetic variation that accounts for the diversity we see in a room like this. So we can ask the question, how much do populations differ? That'll be the first area that we look at this morning. Here we see a map of the world with the populations that I'll be talking about designated. We've been trying to sample more and more extensively across the world, and in collaboration with the Sorenson Molecular Genealogy Foundation in Salt Lake City, which has collected more than 100,000 DNA samples from all over the world, we've been able to fill in a number of gaps. So I'll be telling you about variation in nearly 1,000 individuals representing 40 human populations, really quite a diverse set of individuals; here are photographs of some of the subjects I'll be telling you about. To assess variation in populations, the standard approach is to look at allele frequencies. If we imagine that we have three SNPs, single nucleotide polymorphisms, that we're analyzing in three populations, we can see that there are differences in the frequencies of these three SNPs in populations 1, 2, and 3. We look at that variation to assess patterns of similarity among populations.
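The round numbers quoted so far (about 75 new mutations per gamete, about 3 million differences between two people) all come from multiplying a per-base rate by the size of the haploid genome. A minimal Python sketch of that arithmetic, using only the approximate values given in the lecture; applying the chimpanzee figure to the whole genome treats all of it as alignable, which is a simplifying assumption.

```python
# Rate-times-genome-size arithmetic behind the lecture's round numbers (illustrative only).
HAPLOID_GENOME_BP = 3_000_000_000   # ~3 billion bp per haploid genome
MUTATION_RATE = 2.5e-8              # phylogenetic estimate, per bp per generation

def expected_count(rate_per_bp, genome_bp=HAPLOID_GENOME_BP):
    """Expected number of events (new mutations or differences) across a haploid genome."""
    return rate_per_bp * genome_bp

new_per_gamete = expected_count(MUTATION_RATE)          # phylogenetic rate
new_if_half_rate = expected_count(MUTATION_RATE / 2)    # if family studies halve the rate
human_vs_human = expected_count(1 / 1000)               # ~1 difference per 1,000 aligned bp
human_vs_chimp = expected_count(1 / 100)                # assumes the whole genome is alignable

print(f"new mutations per gamete: ~{new_per_gamete:.0f} (~{new_if_half_rate:.1f} at half the rate)")
print(f"two unrelated humans: ~{human_vs_human:,.0f} differences ({100 * (1 - 1 / 1000):.1f}% identical)")
print(f"human vs. chimp: ~{human_vs_chimp:,.0f} differences ({100 * (1 - 1 / 100):.0f}% identical)")
```

Halving the mutation rate halves the expected number of new mutations per gamete, which is why the family-based estimates matter for these figures.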
This is one of the very few equations, I think one of two, that I'm going to show you, so I'm not going to torture you with mathematics this morning. And this one is pretty simple. It shows how we estimate a statistic called FST, which is very commonly used in population genetics to assess variation between populations. FST is the proportion of genetic variation that is due to differences between populations. To get it, we start with the total heterozygosity, the total variation across all of our samples; that's H sub T. From that total, we subtract the average heterozygosity within each subdivision, H sub S. That is, we look at heterozygosity within populations, in this case within continents, and ask how much heterozygosity there is on average within our populations. Then we standardize by the total, so FST = (H_T - H_S) / H_T. If FST is zero, all of the variation that we observe exists within populations: the average within subdivisions equals the total, and there's no difference between populations. On the other hand, if FST is one, all of the variation exists between populations. The only way we could get this would be for H sub S to be zero; in other words, all of our populations, or in this case continents, would consist of identical clones, with no variation within subdivisions. So an FST of one means maximum variation between subdivisions; an FST of zero, none. If we look at this statistic for a series of polymorphisms in the samples that I've just been telling you about (STRs, restriction site polymorphisms, Alu insertion polymorphisms, L1 insertion polymorphisms, and a 250K SNP array), we see that the FST value for our human populations is pretty consistently somewhere between 10 and 15%.
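A minimal Python sketch of the FST calculation just described, assuming biallelic SNPs, equal-sized subpopulations, and expected heterozygosity 2p(1 - p). Combining SNPs as a ratio of summed heterozygosities is one common convention, not necessarily the one used in the studies cited here, and the frequencies below are hypothetical.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity, 2p(1 - p), at a biallelic SNP with allele frequency p."""
    return 2 * p * (1 - p)

def fst(freq_table):
    """FST = (H_T - H_S) / H_T, with heterozygosities summed over SNPs.

    freq_table: one row per SNP, giving that SNP's allele frequency in each
    subpopulation. Subpopulations are assumed equal in size, so H_T is
    computed from the unweighted mean frequency.
    """
    h_t = h_s = 0.0
    for freqs in freq_table:
        h_s += sum(expected_heterozygosity(p) for p in freqs) / len(freqs)
        h_t += expected_heterozygosity(sum(freqs) / len(freqs))
    if h_t == 0:
        return 0.0  # every SNP is monomorphic in the pooled sample: nothing to partition
    return (h_t - h_s) / h_t

print(fst([[0.5, 0.5, 0.5]]))  # same frequency everywhere: all variation within populations
print(fst([[0.0, 1.0]]))       # fixed differences: all variation between populations
print(fst([[0.4, 0.6]]))       # a modest frequency difference gives a small FST
```

The two extreme cases reproduce the limits described above: FST of zero when H_S equals H_T, and FST of one when H_S is zero.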
We see FST values in this range in many, many studies, with different kinds of DNA polymorphisms and different sets of populations: typically roughly 10 to 15% of genetic variation can be ascribed to differences between these major populations. That's a relatively small amount, telling us that there isn't that much variation between human populations for these largely neutral DNA variants. Now we can compare that with skin pigmentation, something that has often been used in classifying populations, and we see the opposite pattern. There, about 90% of the variation exists between populations, a very different pattern from what we see when we look at actual DNA variation. So these are traits that have been strongly selected for difference and divergence in human populations. If we look at just the three original HapMap populations, the European-derived CEU, the African YRI, and the two Asian populations, that's clearly a limited sample of human diversity, and it gives us a relatively high FST value, about 15%. If we start to sample more populations, FST tends to go down; that is, we see less variation between populations as we sample more of the world's diversity. Looking at 27 populations, FST has gone down to 12%, and looking at 40 populations, it goes down to 11%. So the important point here is that as we sample more evenly across the globe, the apparent level of differentiation tends to go down. It's not going to go to zero, because of course there is variation among human populations. But importantly, it can be overestimated if we sample selectively from specific populations. Another way of talking about the extent to which we're similar is simply to ask what proportion of SNPs are shared among populations. Here we're looking at common SNPs from the ENCODE database, where the minor allele frequency is greater than 5%.
The important point here is that about 80% of these variants, of the minor alleles, are shared among the three major continental populations. Fewer than 1% are seen only in Asian populations, and fewer than 1% are restricted to European populations. About 6% are specific to African populations, reflecting more diversity in Africa, with diversity outside of Africa being largely a subset of what we see in Africa. Now, this paper just came out last week: the publication of two complete African genomes. In comparing those genomes, we're now looking at whole-genome sequences, so this includes not just common variants but also rare variants. And we can see that, as we compare a Yoruba, a Khoisan, an Asian, and a European individual, there's still a lot of sharing of variants among these individuals, even when we're looking at rare variants. There's not as much sharing as with common variants, because common variants tend to be older and are therefore more likely to be shared among populations, but there is still an interesting level of DNA sharing among these individuals for whom we now have complete genome sequences. So how do we actually assess genetic distances, the differences between populations? Well, we can use simple genetic distances. We define the distance between two populations, call them i and j, by the difference in their allele frequencies, where p_i and p_j are the allele frequencies in the two populations that we're comparing. If we go back to the frequencies I showed you a few minutes ago, we have three populations and just three SNPs. To assess the distance between populations one and two for SNP one, we simply take the difference between its frequencies in populations one and two. That's a very simple genetic distance estimate.
Then, to get the overall distance between populations one and two, we would just average this distance with the distances derived from the other two SNPs. So it's really pretty simple: on average, how different are these populations in terms of their allele frequencies? From that, we can build a network of similarity among populations. If we have our three populations and our distance, p1 minus p2, we can join those two populations at a node. Then we can take the average of their frequencies, p1 and p2, and subtract that from the frequency in the third population, which gives us another node in our network. That's how we can show the similarity of these three populations to one another in terms of their allele frequencies. Populations one and two, as you can see just by looking at the frequencies, are a little more similar to each other than they are to population three, and that's what the network displays for us. Now we do this for a series of polymorphisms, in this case Alu insertion polymorphisms. These are short interspersed nuclear elements that have inserted into the human genome recently enough that some people have an Alu at a specific chromosome location and others don't. We assess the frequencies of these, here 100 Alu polymorphisms from some work we did a few years ago, in various human populations, and we see some interesting patterns. First of all, populations do tend to group together according to their continent of origin. This isn't really a surprise to a biologist: populations that live closer together are more likely to exchange mates and more likely to have a common history, so there is a correlation between ancestry and geographic location. So we see populations from the major continents essentially grouping together.
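The two steps just described, averaging per-SNP frequency differences and then comparing the third population with the average of the closest pair, can be sketched in Python. The allele frequencies are hypothetical, and taking the absolute value of each frequency difference is an assumption made so that distances cannot be negative.

```python
from itertools import combinations

# Hypothetical allele frequencies for three SNPs in three populations.
freqs = {
    "pop1": [0.2, 0.5, 0.7],
    "pop2": [0.3, 0.4, 0.6],
    "pop3": [0.8, 0.1, 0.2],
}

def genetic_distance(f_i, f_j):
    """Mean absolute allele-frequency difference across SNPs."""
    return sum(abs(a - b) for a, b in zip(f_i, f_j)) / len(f_i)

# Step 1: find the most similar pair of populations and join them at a node.
pair = min(combinations(freqs, 2),
           key=lambda ij: genetic_distance(freqs[ij[0]], freqs[ij[1]]))
d_pair = genetic_distance(freqs[pair[0]], freqs[pair[1]])

# Step 2: average the pair's frequencies and compare the remaining population to that average.
merged = [(a + b) / 2 for a, b in zip(freqs[pair[0]], freqs[pair[1]])]
(other,) = set(freqs) - set(pair)
d_other = genetic_distance(merged, freqs[other])

print(f"{pair[0]} and {pair[1]} join first at distance {d_pair:.2f}")
print(f"{other} joins that node at distance {d_other:.2f}")
```

With more than three populations the same join-and-average step would simply repeat until every population is in the network.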
The other thing we notice in this Alu network is that there's a lot more diversity among the African populations than in the rest of the world, and we'll come back to that. If we assess the statistical significance of these results, these bootstrap support levels (percentages) are very high, telling us that these groupings have substantial statistical support. If we look at a similar network, now based on LINE-1 (L1) insertion polymorphisms, we see the same pattern. If we look at a network based on 40 populations typed on a 250K SNP array, again we see that same pattern: here are a series of African populations, European populations, a series from South Asia, East Asia, and the New World, over here in yellow. So these populations do tend to group together according to their geographic location. We've now added to that network the HGDP (Human Genome Diversity Project) samples, another 40 populations, and again we see the same pattern, so it's really quite robust. This is another analysis, published in Nature a couple of years ago by another group with a somewhat different sample of populations, looking at both CNVs and SNPs, and once again we see the same pattern: a reassuring level of consistency across studies. Now we plot haplotype heterozygosity, that is, how much haplotype variation there is in each population across the world, against distance from East Africa. What we see is that haplotype diversity steadily declines as we go further and further away from Africa, and the pattern is even more apparent if we look just at non-African populations, with a very high, statistically significant negative correlation. So the further away from Africa we go, the less diversity we see, especially in terms of haplotypes.
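Bootstrap support values like those mentioned for these networks are typically obtained by resampling loci with replacement and recording how often a grouping reappears. A minimal Python sketch for the three-population case, assuming the simple mean-absolute-difference distance and hypothetical allele frequencies (the last SNP deliberately disagrees with the others); it illustrates the idea and is not the software used for the published networks.

```python
import random
from itertools import combinations

# Hypothetical allele frequencies at seven SNPs; the last SNP disagrees with the rest.
freqs = {
    "pop1": [0.2, 0.5, 0.7, 0.1, 0.4, 0.9, 0.5],
    "pop2": [0.3, 0.4, 0.6, 0.2, 0.3, 0.8, 0.9],
    "pop3": [0.8, 0.1, 0.2, 0.6, 0.9, 0.3, 0.5],
}

def genetic_distance(f_i, f_j):
    """Mean absolute allele-frequency difference across SNPs."""
    return sum(abs(a - b) for a, b in zip(f_i, f_j)) / len(f_i)

def closest_pair(freqs):
    """The pair of populations that would be joined first in the network."""
    return min(combinations(sorted(freqs), 2),
               key=lambda ij: genetic_distance(freqs[ij[0]], freqs[ij[1]]))

def bootstrap_support(freqs, pair, n_reps=1000, seed=0):
    """Fraction of replicates, with SNPs resampled with replacement,
    in which `pair` is still the closest pair of populations."""
    rng = random.Random(seed)
    n_snps = len(next(iter(freqs.values())))
    target = tuple(sorted(pair))
    hits = 0
    for _ in range(n_reps):
        idx = [rng.randrange(n_snps) for _ in range(n_snps)]
        resampled = {pop: [f[k] for k in idx] for pop, f in freqs.items()}
        if closest_pair(resampled) == target:
            hits += 1
    return hits / n_reps

best = closest_pair(freqs)
print(best, bootstrap_support(freqs, best))
```

Support drops below 100% only in replicates where the discordant SNP happens to be sampled many times; with hundreds of loci that mostly agree, such replicates become rare, which is consistent with the high support levels described above.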
So all of these bits of evidence are consistent with what I think is now pretty well accepted about modern human origins: that we have a recent common origin in Africa. Anatomically modern humans, people who look like us, arose in Africa roughly 200,000 years ago, stayed in Africa for at least 100,000 years, developed variation as a result of mutation, and then a small subset went out to colonize the rest of the world. As a result, variation in the rest of the world tends to be less than in Africa and tends to be a subset of what we see in Africa, all very consistent with a recent, common African origin of our species. Now, here is a somewhat different take on human origins. I was in the supermarket a few years ago, and my eye was caught by this headline: Adam and Eve skeletons had been stolen. I wasn't even aware that they had been found, but because I was promised more amazing photos inside, I had to buy it, and what I discovered was that all that's left is Eve's leg, and the identity of the perpetrator may have been established. All right. Inevitably, if we're talking about differences among populations and among individuals, the issue of race comes up. What does genetics now tell us about traditional concepts of human race? I think what you'll see is that our view of race becomes much more nuanced and much more complex as we begin to look at genetic data. But first we can ask: why does race even matter? Why does it keep coming up in our discussions? Well, certainly the prevalence of many diseases is known to vary by population, along lines that correspond to traditional racial designations: things like prostate cancer, hypertension, type 2 diabetes, and so forth.
We know that some relatively common disease-predisposing variants vary among populations, such as the clotting factor V Leiden variant, which is substantially more common among Europeans than among other groups. There's evidence that response to some drugs may vary among populations. African Americans may, on average, be less responsive to ACE inhibitors and beta blockers for lowering blood pressure, and I emphasize "may" because we're going to come back to that point and what it actually means. The forensic databases commonly used by the FBI and by crime labs are grouped along traditional, quote, racial lines: Caucasian (I put that in quotes), African American, Hispanic, and so forth. So these designations are used commonly in many ways. The question is, what can genetics tell us about their validity? It's interesting to look at some comments on race over the last decade. An editorial in the New England Journal a few years ago asserted that race is biologically meaningless. In a response in the New York Times, a psychiatrist, Dr. Sally Satel, wrote, "I am a racially profiling doctor," a deliberately provocative comment. Her point was that she uses population affiliation in part to help decide on drugs and dosages for her patients. A statement a few years ago from the American Anthropological Association said that genetic data show that any two individuals within a particular population are as different genetically as any two people selected from any two populations in the world. That's a fairly bold claim, and we'll see what the genetic data actually do tell us. When there are so many divergent opinions on an issue, I think it's time to look at data. It was interesting to me that the cover of Scientific American a few years ago asked, "Does race exist?" And this is the part that caught my attention: "Science has the answer."
Now, as scientists, whenever we see that science has the answer, we get a little bit skeptical. But let's look at the data. The way we start is by tabulating DNA sequence differences among individuals. So now, instead of looking at populations, we're looking at individuals. We can pick a few individuals whose DNA sequences we have hypothetically obtained: here we have George Bush, John McCain, Hillary Clinton, and I just couldn't resist putting in John Edwards. I'm not sure anyone wants to share DNA with John Edwards these days, but we can compare their sequences. As we make these DNA networks, what we're interested in is the sequence difference between each pair of individuals. If we compare Bush and McCain, we see two base differences between them, so we put a two in our little matrix between Bush and McCain. If we compare Bush and Clinton, we see five differences, so we put a five here. Bush and Edwards, six; Edwards and McCain, four; and so forth. So we make a matrix of DNA differences among our pairs of individuals, and from that we can make one of these diagrams, or trees, that shows how similar they are. This is hypothetical, but it gives you a display of the distances, the differences, among our pairs of individuals. If we're looking at just a few people, we can easily look at the matrix itself and see the pattern. But if we're looking at hundreds or even thousands of individuals, it becomes much more difficult to deduce a pattern from, let's say, a 1,000-by-1,000 matrix. These displays help us to see the pattern very easily. Now, Steve Guthrie, a gastroenterologist who worked with us a few years ago, saw this matrix in the New York Times. It was a matrix of percent disagreement among the nine Supreme Court justices.
He was learning population genetics at the time, and he thought, well, this is a good exercise for building a tree. If you look at this matrix, you can see some pattern. You can see, for example, that Justices Thomas and Scalia have only 9% disagreement, so they're pretty similar. But it's still not so easy to deduce the whole pattern until you make a diagram, a tree, and then you can see the pattern very easily: the conservative wing of the court here, the other wing of the court over here, showing up very nicely on this display. We did the same thing for DNA sequence, here the angiotensinogen gene, an important component of the renin-angiotensin pathway. We looked at 14 kb of sequence in that gene and then asked how similar the members of three continental populations, Asians, Europeans, and Africans, are to one another. What we see is that, for this single gene and these 14 kb of sequence, an individual from Africa is sometimes more similar to people from Asia or from Europe than to others from Africa. What that reflects, at least in part, is the mixed ancestry of individuals with regard to specific genes: our complex history of migration and mixing. When we look at humans, we see genes from Europe in Africa, and genes from Asia in Europe. So we humans do have a mixed and very complex history of migration. There is no such thing as a "pure" human population, and the genetic data tell us that very clearly. We may think we've discovered something, but Charles Darwin long ago said that "it may be doubted whether any character can be named which is distinctive of a race and is constant." So Darwin was well aware that the characters he was looking at tended to vary in frequency among populations, but that seldom could you define a population by any given characteristic.
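The procedure described above (count pairwise differences, fill in a matrix, then turn the matrix into a tree) can be sketched in Python. The lecture does not say which tree-building method was used; UPGMA (average-linkage clustering) is swapped in here as one simple option, and the four aligned sequences are hypothetical. The same `upgma` function would accept any symmetric dissimilarity matrix, such as percent disagreement between justices.

```python
from itertools import combinations

# Hypothetical aligned sequences (equal length, no gaps) for four individuals.
seqs = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTTC",
    "C": "ACCTAGGTTA",
    "D": "TCCTAGGATA",
}

def count_differences(s, t):
    """Number of aligned positions at which two sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s, t))

def difference_matrix(seqs):
    """Pairwise differences, keyed by frozenset({name1, name2})."""
    return {frozenset(pair): count_differences(seqs[pair[0]], seqs[pair[1]])
            for pair in combinations(seqs, 2)}

def upgma(names, dist):
    """Average-linkage (UPGMA) clustering; returns the tree as nested tuples."""
    size = {name: 1 for name in names}  # cluster -> number of individuals in it
    d = dict(dist)
    while len(size) > 1:
        a, b = min(combinations(size, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        for c in size:
            if c not in (a, b):
                # Distance to the new cluster = size-weighted average of the old distances.
                d[frozenset((merged, c))] = (size[a] * d[frozenset((a, c))]
                                             + size[b] * d[frozenset((b, c))]) / (size[a] + size[b])
        size[merged] = size.pop(a) + size.pop(b)
    return next(iter(size))

matrix = difference_matrix(seqs)
for pair, n in matrix.items():
    print(sorted(pair), n)
print(upgma(list(seqs), matrix))
```

UPGMA was chosen only because it is short to write; it assumes roughly clock-like divergence, and neighbor-joining is a common alternative for population and individual networks.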
Now, what we did a few years ago was to look at a larger number of variants, in this case Alu, STR, and restriction site polymorphisms, which is what we had available at the time: close to 200 polymorphisms. We asked the same question: how similar are these individuals if, instead of looking at sequence from one gene, we look at 190 independent variants? Now we see some pattern: our samples from Asia tend to group together, those from Europe tend to group together, and those from Africa tend to group together. We're using a lot more information, so we're picking up more about the ancestries of these individuals. Notice also that most of the branch length is seen within populations. This is consistent with the FST statistic I showed you, which said that most of the variation we see is within major populations. But there is enough variation between populations, just enough, that if we look at a lot of characters, we can begin to see a reflection of the partial isolation of these populations through their history. The analogy I like to use is height in males and females. If we look only at height, we're going to see quite a lot of overlap between our male and female populations. But if we add another characteristic, say waist-hip ratio, we now have more information that allows us to discern males from females, and so there's less overlap between them. That's what we're doing as we look at more genetic characters: we're learning more about the ancestries of these individuals and starting to see more of the non-overlapping parts of those circles I showed you. So if we now use a larger number of characters, here a 10K SNP array, we start to see a pattern in this diagram of individuals; don't worry about reading the labels, they correspond to populations.
If we use a large number of SNPs across the human genome, we start to see individuals sorting according to their population of origin. I should point out that these populations are pretty well separated from each other geographically, but we do see, for example, African populations here, a European population, a South Indian population, South Pacific, New World, Asian, and so forth. So again, there is some record of ancestry if we look at lots of independent DNA characters. Recently, my colleague Mark Yandell in my department looked at the whole-genome sequence data now publicly available for 10 individuals. With whole-sequence data, individuals again tend to sort out according to continent of origin, which is not too surprising: if we can do this with 10,000 SNPs, we would expect to see a similar pattern with whole-sequence data. Another thing that Mark found that I think is pretty interesting is that there is some variation depending on the sequencing platform used. Down here, one and two are the same African sample, but they look rather different depending on whether they were sequenced on an Illumina or an ABI SOLiD platform. In fact, there are 557,000 differences between them generated by differences in platform. So although there is consistency here with ancestry, there is also some platform variation in our diagram. You might ask, what is whole-genome sequence telling us that we can't get from just a sample of SNPs? I want to tell you about a very new study in which one of my postdocs was able to use the presence of Alu insertions in whole-sequence data to essentially mark ancient regions of the genome. He compared the published NIH sequence with the Venter sequence, HuRef.
The idea here is that if we look at whole-sequence variation, the average time to coalescence, that is, the time back to a common ancestor for any genomic region across these two genomes, is about 460,000 years. But Alu insertions are rare events, occurring only about once in every 20 births, so they tend to be preserved just in very old regions of the genome. In other words, between my brother and me, there's very little chance that I would have an Alu insertion at a spot on chromosome 5 that he doesn't have. Because these are rare events, they mark ancient parts of the genome, where the average coalescence time is actually about 900,000 years. So for these regions, regions in which an Alu has inserted, which tend to be very old, we can look at sequence variation essentially to probe very ancient history in our species and in our ancestors. What Chad was able to do in this paper, which was just published in PNAS in January, was actually to estimate the effective population size of human ancestors 1.2 million years ago. What's interesting is that that estimate is only about 18,000. In other words, at that time our ancestors, our pre-human ancestors, had a very small population size. And the effective size of anatomically modern humans is also quite small. What that suggests is that our ancestors were fairly close to extinction at one point in their history. What I find remarkable is that we can learn this, with quite a lot of precision, from just two human DNA sequences, because whole-genome sequence gives us a lot of information. So we can make estimates like this that would be impossible without that much variation to look at. There are a lot of interesting things we can do with these whole-genome sequences, and we're very happy that they are publicly available so that we can all look at them and think of interesting things to do with them.
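To see how diversity, coalescence time, and effective population size are connected, here is a minimal Python sketch of the textbook neutral-coalescent relations for a diploid population of constant size: expected pairwise diversity pi = 4 * Ne * mu, and an expected coalescence time of 2 * Ne generations for two sequences. This is not the method of the PNAS study; the 25-year generation time is an assumption, and pi and mu are the round figures quoted earlier in the lecture. With those inputs the sketch gives Ne of about 10,000 and a pairwise coalescence time of about 500,000 years, the same order as the 460,000-year figure above.

```python
# Textbook neutral-coalescent relations (diploid, constant population size):
#   expected pairwise diversity   pi = 4 * Ne * mu
#   expected coalescence time     T  = 2 * Ne generations for two sequences
PI = 1e-3        # ~1 difference per 1,000 aligned bp between two people (from the lecture)
MU = 2.5e-8      # per-bp, per-generation mutation rate (phylogenetic estimate)
GEN_YEARS = 25   # assumed generation time in years (illustrative choice)

def effective_size(pi, mu):
    """Solve pi = 4 * Ne * mu for Ne."""
    return pi / (4 * mu)

def coalescence_years(ne, generation_years):
    """Expected time back to the common ancestor of two sequences, in years."""
    return 2 * ne * generation_years

for label, mu in [("phylogenetic rate", MU), ("half that rate", MU / 2)]:
    ne = effective_size(PI, mu)
    years = coalescence_years(ne, GEN_YEARS)
    print(f"{label}: Ne ~ {ne:,.0f}, pairwise coalescence ~ {years:,.0f} years")
```

Because both quantities scale as 1/mu, the lower mutation rate suggested by the family studies would double both the Ne and the time estimates.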
Here is another way of looking at genetic distances, at similarities and differences among populations, called principal components analysis. It's used very commonly in population genetics. I won't go into details, but basically it displays differences among our individuals, in this case in three dimensions: a first principal component running this way, a second one running this way, and a third one, the third dimension. So this is a three-dimensional display of differences and similarities among individuals. The point here is that if we look at only 10 SNPs, a small collection of variation across the genome, we can't really see much of a pattern: we don't have very many characters to look at, and there's a lot of overlap. If we look at 100 SNPs in the same individuals, with the same kind of display, we start to see a little bit of pattern, but there's still a great deal of overlap; there's not really a discernible pattern in three dimensions. If we look at 1,000 SNPs, we start to see a discernible pattern, and in fact these groups correspond to the major continental populations. If we look at 250,000 SNPs, there's even more of a pattern, and we can see individuals sorting into populations: populations from India here, Asian populations, European, and African populations. So again, with lots of information, we can start to see individuals sorting into their populations of origin. I've highlighted the three HapMap populations here to show that they fall essentially where we would expect them to. If you interpret these dimensions, the first one is roughly Africa versus non-Africa, the out-of-Africa dimension of this plot. The second one is pretty much east-west across the Old World, and the third one is a north-south orientation. So it gives us three dimensions of human genetic variation.
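A minimal principal components sketch with NumPy, run on simulated genotypes (0, 1, or 2 copies of an allele) for two hypothetical populations whose allele frequencies differ modestly. Centering each SNP and scaling by sqrt(2p(1 - p)) is a common normalization, assumed here rather than taken from the studies in the lecture; the principal component scores come from a singular value decomposition of the normalized matrix.

```python
import numpy as np

def pca_scores(genotypes, n_components=3):
    """PCA of an individuals x SNPs matrix of 0/1/2 allele counts.

    Each SNP is centered and scaled by sqrt(2p(1 - p)); SNPs that are
    monomorphic in the sample are dropped first.
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2
    keep = (p > 0) & (p < 1)
    g, p = g[:, keep], p[keep]
    z = (g - 2 * p) / np.sqrt(2 * p * (1 - p))
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

# Simulated data: two populations of 50 people sharing ancestral frequencies plus drift.
rng = np.random.default_rng(0)
n_snps = 1000
base = rng.uniform(0.1, 0.9, n_snps)
freq_a = np.clip(base + rng.normal(0, 0.1, n_snps), 0.01, 0.99)
freq_b = np.clip(base + rng.normal(0, 0.1, n_snps), 0.01, 0.99)
geno = np.vstack([rng.binomial(2, freq_a, size=(50, n_snps)),
                  rng.binomial(2, freq_b, size=(50, n_snps))])

scores = pca_scores(geno)
print(scores.shape)
print("mean PC1, population A:", scores[:50, 0].mean())
print("mean PC1, population B:", scores[50:, 0].mean())
```

Rerunning with `n_snps = 10` should give much more overlap between the two groups on the first component, mirroring the 10-SNP versus 250,000-SNP comparison in the lecture.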
This is a similar principal components plot, looking at 850 individuals in two dimensions. We have essentially a cline going from west to east, and again we see individuals sorting out. But I also want to point out that there is overlap. Especially now that we're sampling more broadly, we can't draw sharp boundaries among these various populations, and that's a very important point. Here we're looking at Eurasia alone, and you start to see essentially a map of the world as you look at the genetic similarities of these populations for 250,000 SNPs. The important point here, though, is that if we look at multiple polymorphisms, 10,000 or 100,000 or a million, we can predict population affiliation with some accuracy, because these many SNPs are, as I said, telling us about the non-overlapping parts of the circles. But what we can't do, and this is a critical point, is go the other way, because these SNPs vary only in frequency among populations. From population affiliation, we can't infer what someone's allele at a given SNP is going to be, and from a single SNP, we can't infer population affiliation. That, I think, is a critical point. And that brings us to the question: if we have enough genetic information, could we classify everybody into a population? Let's look at the network we viewed a few minutes ago showing various human populations, but let's add a population with a complex history, African Americans. We see that some individuals group with people from Africa down here; others don't really fall into a group. If we look at Puerto Ricans, another group with a complex history, some tend to fall in with people from Spain, others closer to people from Africa. So there are many human groups that don't fall neatly into any of these categories.
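The asymmetry just described (many SNPs can predict population affiliation, a single SNP cannot, and population affiliation cannot predict an individual's genotype) can be made concrete with a simple likelihood calculation. This is a hypothetical two-population sketch assuming independent SNPs in Hardy-Weinberg equilibrium and made-up allele frequencies of 0.6 and 0.4; it is not the method behind the figures in the lecture.

```python
from math import comb, exp, log

def genotype_prob(g, p):
    """Hardy-Weinberg probability of carrying g (0, 1, or 2) copies of an allele at frequency p."""
    return comb(2, g) * p**g * (1 - p) ** (2 - g)

def posterior_pop_a(genotypes, freqs_a, freqs_b, prior_a=0.5):
    """Posterior probability that an individual belongs to population A rather than B.

    Assumes independent SNPs in Hardy-Weinberg equilibrium and allele
    frequencies strictly between 0 and 1 in both populations.
    """
    log_a = log(prior_a) + sum(log(genotype_prob(g, p)) for g, p in zip(genotypes, freqs_a))
    log_b = log(1 - prior_a) + sum(log(genotype_prob(g, p)) for g, p in zip(genotypes, freqs_b))
    # Clamp the exponent so that extremely strong evidence for B cannot overflow exp().
    return 1 / (1 + exp(min(log_b - log_a, 700.0)))

# Population -> genotype: knowing the population still leaves every genotype possible.
print([round(genotype_prob(g, 0.6), 2) for g in range(3)])

# Genotype -> population: one SNP is weak evidence...
print(posterior_pop_a([2], [0.6], [0.4]))
# ...but 20 hypothetical SNPs pointing the same way are strong evidence.
print(posterior_pop_a([2] * 20, [0.6] * 20, [0.4] * 20))
```

A single homozygous genotype shifts the odds only from 50:50 to about 69:31, whereas the evidence multiplies across independent SNPs, which is why large SNP panels sort individuals so much better than any single marker.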
And as we sample such groups, as we learn more about them, and as human migration continues to increase, we'll find that a lot of groups, and a lot of individuals, don't fall into any specific group. We have complex ancestries. And that, I think, emphasizes the fallacy of thinking typologically. This is one of the reasons I don't use the term race in my own publications: I think it tends to encourage thinking in terms of types and typologies when, as we've seen, most human genetic variation is shared among populations. Now, let me ask you a question. This man, Wayne Joseph, grew up in Louisiana in a Creole family. Looking at his appearance, what do you think his ancestry is? Any guesses? French. You can tell this is a trick question, can't you? Well, he was raised as an African-American. He was a high school principal in California. He sent his DNA off to a company to be tested to find out about his ancestry. Now, I should say that we have to take these ancestry estimates with a grain of salt; there are a number of assumptions involved. But what he got back was that he was 57% European, 39% Native American, 4% East Asian, and apparently no African ancestry at all, despite his self-reported ancestry. He retained his culture, of course, but it's interesting to see how one's self-reported ancestry can differ completely from one's DNA-measured ancestry, at least as accurately as that can be measured. And I think this points up a very important distinction in our discussion about race: the difference between individual ancestry, which can be very complex, and race, which is a very blunt tool. An individual with, let's say, 90% African ancestry and 10% European ancestry would be considered African-American in the United States. But an individual with only, say, 30% or 40% African ancestry would also likely self-identify as African-American in the United States, even though their genetic constitution is very different.
So that's why I think it's so important to understand this difference between individual ancestry and what we refer to as race. Now, a few months ago, out of curiosity, I sent my DNA to a company to learn something about my own ancestry, and I was really disappointed. This is about as boring a genome as you could find. According to the company, at least, my ancestry was 100% European, with no interesting variation whatsoever. I should say that all of my grandparents came from Norway, but I was hoping there might be some rogue genes in there somewhere. You can compare me with this woman from the Berber population in North Africa, who has a much more interesting genome. Here we see mostly European ancestry, I think 86%, but also some significant African ancestry and a little bit of Asian ancestry, as estimated by the markers used by this company. The important point here, though, is that segments of this person's chromosomes might be of European origin or might be of African origin. Biomedically significant genes in this person might be of European origin or of African origin. So once again, ancestry gives us a much more complex view of this person's genetic legacy than a category like race does. Going back to my own ancestry, I also got a report on my Y chromosome. It belongs to a Y haplogroup that is pretty common in Northern Europe, and I learned that I share it with Jimmy Buffett and Warren Buffett. It hasn't done anything for my singing or my investing, but it was an interesting factoid. And I don't have Genghis Khan's Y chromosome, though I don't know that we know what Genghis Khan's Y haplogroup really was. This is sort of historical frolicking, I think, but it is interesting to do this sort of recreational genomics. I also got a report on my maternal haplogroup, which, not surprisingly, is also common in Europe.
Well, what do these kinds of patterns imply for biomedicine, for biomedical research? We've seen that if we look at a lot of DNA polymorphisms, we can learn something about ancestry and population history, with some important qualifications: not everyone falls into groups, and there are, of course, assumptions made in drawing these inferences. But responses to many therapeutic drugs, for example, may involve variation in just a few genes, and they are also going to be affected by environment. What that means is that those genetic classifications don't necessarily tell us much about biomedically significant phenotypes, because the relevant genetic variants, as we've said, typically differ just in frequency among populations and therefore show a lot of sharing, a lot of overlap, among populations. Here's a great example. We talked earlier about the effects of ACE inhibitors in African-Americans versus European-Americans. This was a meta-analysis published a few years ago looking at the decrease in blood pressure after administration of ACE inhibitors in thousands of European-American and African-American hypertensive patients. What we see is that, on average, there is about a 5 mm Hg difference in response; that is, African-Americans, on average, don't respond quite as much as European-Americans to ACE inhibitors in terms of lowering their blood pressure. But you can also see that there's a lot of overlap between these two curves. A lot of African-Americans would actually respond better than a lot of European-Americans to an ACE inhibitor. So, once again, using a category like race to predict response gives us some information, but it can also mislead us. Now, here's another good example: the drug gefitinib, which is sometimes used to treat non-small cell lung cancer. It's an EGFR tyrosine kinase inhibitor.
It is effective, at least for a while, in about 10% of European patients and roughly 30% of Asian patients. So you might be tempted to think that we could use population affiliation to help predict who's going to respond to this drug; after all, there's a three-fold difference between populations. But somatic mutations in EGFR are seen in about 10% of European patients and about 30% of Japanese patients. And what's really interesting is that about 80% of those who have these EGFR mutations respond to gefitinib, whereas only about 10% of those without the mutations respond. So by looking directly at mutations in this gene, we get a much better predictor of response to the drug than by looking at population affiliation. Oh, I'm glad to see my anti-virus is working, okay. Well, that leads us to a theme that you've heard about and will hear more about in this series. Hal McLeod will talk about personalized medicine and the notion that, now that we can look at individual variation, that is a much more appropriate target, and a much more appropriate means of deciding therapy once we have the information and can make the prediction, than broad categories like race or population affiliation. So what I've told you about genetic variation and race is that we do see a correlation between geographic location and genetic variation, but if we sample enough of that variation, we start to see essentially continuous, uninterrupted variation across space. It's hard to delineate specific borders between populations. And so I think that our traditional concepts of race may actually be biologically meaningless. That might be an overstatement, but race is biologically very imprecise; it is a blunt tool. Concepts like ancestry, looked at at the individual level, are certainly going to be more informative, and we hope that personalized medicine, when it becomes a reality, will be a lot more useful medically than categories like ethnicity or race.
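The gefitinib figures can be combined in a few lines to show why the mutation is the better predictor. The sketch below assumes, for simplicity, that the 80% and 10% response rates quoted for mutation carriers and non-carriers hold in every population (an assumption; the lecture doesn't say so) and computes the response rate each population would show as a mixture of carriers and non-carriers. Under that assumption it gives about 17% for European and 31% for Japanese patients, in the same range as the population figures quoted, whereas the mutation itself separates 80% responders from 10%.

```python
# Figures quoted in the lecture
EGFR_MUTATION_FREQ = {"European": 0.10, "Japanese": 0.30}
RESPONSE_IF_MUTATION = 0.80
# Assumption for this sketch: non-carriers respond at the same rate in every population
RESPONSE_IF_NO_MUTATION = 0.10

def population_response_rate(mutation_freq):
    """Expected response rate in a population that is a mix of carriers and non-carriers."""
    return (mutation_freq * RESPONSE_IF_MUTATION
            + (1 - mutation_freq) * RESPONSE_IF_NO_MUTATION)

for population, freq in EGFR_MUTATION_FREQ.items():
    print(f"{population}: {population_response_rate(freq):.0%} expected to respond")
print(f"Mutation carriers: {RESPONSE_IF_MUTATION:.0%}; non-carriers: {RESPONSE_IF_NO_MUTATION:.0%}")
```

The gap between populations (17% versus 31%) is real but small next to the gap between genotypes (80% versus 10%), which is the argument for targeting individual variation rather than population labels.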
And finally, I think, and this is a point that can't be emphasized enough, there is nothing in genetics that supports racist thinking, the idea that one group is in some way superior to another. In fact, I think there is a lot of evidence that contradicts that kind of thinking, because with genetic data we can ascertain how similar we all are to one another, how much variation we share. So I think genetics is actually an important tool that can help combat racist thinking. What I would like to do now, because I think 90 minutes is too long for humans to sit in one place, is show you a nice pretty picture taken just a few miles from my house in Utah. And I'm going to ask you to stand up for about a minute or so and just stretch. So you'll have a little break, and then we'll go on to the third part of the talk. Yeah, I do, actually. That first peak is called Box Elder Peak, and I've skied from the top of that. And I climbed the one in the back, Timpanogos. [inaudible] Okay, I think we'll go ahead and get started again. I hope you enjoyed your break. In the last part of my talk, what I would like to discuss, and you'll be discussing more of this as the series proceeds, is how our understanding of population genetics and evolutionary genetics helps us understand haplotype distributions and the concept of linkage disequilibrium, and how it helps us design more effective gene mapping studies. So this is really a bridge between population and evolutionary genetics on the one hand and gene mapping on the other. The two have become, I think, really intertwined over the last decade or so, and as somebody with interests in both areas, I've been very gratified to see the mutual interest between population genetics and gene mapping.
So if we look at SNP frequencies across human populations, we find, at least roughly, that a SNP with a minor allele frequency greater than 1% occurs about once every 300 base pairs in the genome. This depends a little on the populations you sample and so forth, but roughly we can say that there are at least 10 million SNPs in the genome with a minor allele frequency greater than 1%. These would be considered polymorphisms under the traditional definition. Of common single nucleotide polymorphisms, that is, those with a minor allele frequency greater than 5%, we have about 5 million or so, at least roughly. That means that even at relatively modest cost, let's say a tenth of a cent per SNP, genotyping all 5 million of those variants would cost $5,000 per person, or maybe $2,000 or $3,000, but still a lot of money per sample. So if we want to do a case-control association study comparing 1,000 cases and 1,000 controls, which these days is pretty much a minimum sample size, it would cost $10 million to genotype all 5 million of those SNPs. So this was a real problem. Did we really need to test all of these SNPs to assess variation in a case-control study? And would the SNP association tests reveal disease genes? I want to start with just a couple of very simple definitions, because I know we have a diverse audience here. First, we'll define a haplotype as the DNA sequence found on one member of a chromosome pair. So in this individual we see two haplotypes with these two sets of alleles, Big A and Little A, and so forth. And we transmit those haplotypes to our offspring. Now, as you know, during meiosis crossovers can occur between homologous pairs of chromosomes like this, resulting in recombination of alleles. And now this parent transmits a new haplotype, a haplotype with a new combination of alleles, to his offspring.
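The genotyping-cost argument earlier in this passage is simple arithmetic, sketched here in Python. The genome size of about 3 billion base pairs is an assumption added for the SNP-density check; the other figures are the ones quoted in the lecture.

```python
GENOME_BP = 3_000_000_000      # assumption: approximate size of the haploid human genome
SNP_SPACING_BP = 300           # one SNP with MAF > 1% per ~300 bp (lecture's figure)
COMMON_SNPS = 5_000_000        # SNPs with MAF > 5% (lecture's figure)
COST_PER_SNP_USD = 0.001       # "a tenth of a cent per SNP"
CASES, CONTROLS = 1_000, 1_000

snps_over_1pct = GENOME_BP // SNP_SPACING_BP          # density check on the 10 million figure
cost_per_person = COMMON_SNPS * COST_PER_SNP_USD      # genotype every common SNP in one person
study_cost = cost_per_person * (CASES + CONTROLS)     # ...and in every case and control

print(f"SNPs with MAF > 1%: about {snps_over_1pct:,}")
print(f"Cost per person:    ${cost_per_person:,.0f}")
print(f"Cost of the study:  ${study_cost:,.0f}")
```

Halving the per-SNP price only halves the bill, which is why reducing the number of SNPs that need to be typed mattered so much.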
So these very fundamental concepts of crossover and recombination are, of course, what allow us to establish the relative distances between loci in the human genome. We can ask how often these crossovers occur. Over time, we expect more crossovers between loci that are located further apart. So A and B are further apart, and we observe more crossovers between that pair of loci than between B and C. What that means is that after many generations we're going to find alleles Big B and Big C together on the same copy of a chromosome more often than we will find Big A and Big B together, because A and B recombine, so that we get Big A with Little B, and eventually these alleles tend to be reassorted by recombination. What we're saying is that there is more linkage disequilibrium between B and C than between A and B. That is, the alleles at B and C are found together more frequently than we would expect by chance, and that's what we mean by linkage disequilibrium. So here's a little diagram illustrating the idea again. Linkage disequilibrium is the non-random association of alleles at linked loci. At equilibrium, if we have our two loci, A and B, we would expect to see essentially every possible combination, Big A with Big B, Little A with Little B, and so on, as we look at copies of chromosomes in our population. We can assess the frequencies of those alleles, Big A and Little A, Big B and Little B. Under linkage equilibrium, we would predict that the frequency of chromosome copies carrying Big A and Big B together should equal the product of those alleles' frequencies in the population. That is, 60% times 70%, or 42%. Big A with Little B we should see 60% times 30%, or 18%, of the time, and so forth. In other words, the alleles at these loci are independent of each other; they're at equilibrium, and we can multiply their respective frequencies to get the haplotype frequency.
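The equilibrium calculation just described, and a deviation from it, can be written out directly. This sketch uses the standard two-locus coefficient D (observed frequency of the Big A / Big B haplotype minus the product of the allele frequencies) and its normalized form r, the correlation coefficient the lecture turns to a little later. The "observed" haplotype frequencies are hypothetical, chosen to keep the same allele frequencies (60% Big A, 70% Big B) as the worked example.

```python
import math

def expected_haplotypes(p_A, p_B):
    """Haplotype frequencies expected under linkage equilibrium (product of allele frequencies)."""
    p_a, p_b = 1 - p_A, 1 - p_B
    return {"AB": p_A * p_B, "Ab": p_A * p_b, "aB": p_a * p_B, "ab": p_a * p_b}

def ld_d_and_r(freqs):
    """Return (D, r) from the four haplotype frequencies, which must sum to 1."""
    p_A = freqs["AB"] + freqs["Ab"]   # frequency of Big A
    p_B = freqs["AB"] + freqs["aB"]   # frequency of Big B
    d = freqs["AB"] - p_A * p_B       # deviation from the equilibrium expectation
    r = d / math.sqrt(p_A * (1 - p_A) * p_B * (1 - p_B))  # D rescaled to a correlation
    return d, r

equilibrium = expected_haplotypes(0.6, 0.7)
print(equilibrium)               # AB ≈ 0.42, Ab ≈ 0.18, aB ≈ 0.28, ab ≈ 0.12
print(ld_d_and_r(equilibrium))   # D and r ≈ 0: the loci are independent

# Hypothetical observed frequencies: same allele frequencies, but Big A travels with Big B
observed = {"AB": 0.55, "Ab": 0.05, "aB": 0.15, "ab": 0.25}
print(ld_d_and_r(observed))      # D ≈ 0.13, r ≈ 0.58
```

If only the Big A / Big B and Little A / Little B haplotypes existed, r would be 1: knowing the allele at one locus would predict the other perfectly.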
But suppose we see a substantial deviation from those expected frequencies, as in this diagram, where Big A and Big B are found together on the same copy of a chromosome, that is, on the same haplotype, more frequently than we would expect by chance, and Little A and Little B are also found together more frequently than we expect by chance. Then we have linkage disequilibrium between A and B. In other words, there haven't been enough recombinations to randomly reassort these alleles in our population, and that suggests that A and B are likely fairly close together. So let's imagine how linkage disequilibrium would arise. Imagine a cystic fibrosis-causing mutation arising some thousands of years ago; this could be Delta F508, the common CF-causing mutation. When it first arises, it occurs on a specific haplotype background; that is, we see these particular alleles nearby. The uppercase alleles designate the ancestral chromosome on which that CF mutation first occurred. So for a few generations, every time we see the CF mutation, we're going to see these alleles nearby, and the alternative alleles shown here are not associated with the CF-causing mutation. But over time, crossovers are going to occur, breaking up these associations. So on this chromosome copy in the present-day population, we see our CF mutation co-occurring not with Big D, as it originally did, but with Little D. But we still tend to see associations between our mutation and SNPs that are very, very nearby. We still see uppercase G most of the time when we see our disease-causing mutation, because it's so close that recombination still hasn't had time to reassort our mutation with the alternative allele, Little G. This is what we mean by linkage disequilibrium, and in fact this approach was used to pinpoint the cystic fibrosis gene back in the late 1980s. So there are some advantages in using linkage disequilibrium to map disease-causing loci.
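The erosion of association in the cystic fibrosis example follows a standard textbook result: under random mating in a large population, and ignoring drift, the disequilibrium D between two loci shrinks by a factor of (1 - c) each generation, where c is the recombination fraction between them. The sketch below applies that formula. The recombination fractions (using the rough rule of thumb that 1 Mb corresponds to about 1 cM) and the 400-generation span (about 10,000 years at an assumed 25 years per generation) are assumptions for illustration, not estimates for the Delta F508 mutation.

```python
import math

def ld_after(d0, c, generations):
    """Expected LD (D) after t generations of random mating: D_t = D_0 * (1 - c) ** t."""
    return d0 * (1 - c) ** generations

def generations_to_halve(c):
    """Generations until D falls to half its starting value, for recombination fraction c."""
    return math.log(0.5) / math.log(1 - c)

GENERATIONS = 400  # hypothetical: about 10,000 years at an assumed 25 years per generation
for label, c in [("SNP ~1 Mb away (c = 0.01)", 0.01), ("SNP ~10 kb away (c = 0.0001)", 0.0001)]:
    remaining = ld_after(1.0, c, GENERATIONS)
    print(f"{label}: {remaining:.3f} of the initial D remains; "
          f"half-life {generations_to_halve(c):.0f} generations")
```

The distant SNP loses almost all of its association, while the nearby one keeps most of it, which is the pattern of Little D versus Big G in the example above.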
The first advantage is that we don't necessarily have to use family data. In standard linkage analysis we look directly at recombinations in families and have to count them; linkage disequilibrium, by contrast, is something we can estimate in populations. And we now have, of course, microarray technology that allows us to genotype very dense arrays of SNPs. The real advantage of linkage disequilibrium is that it in essence incorporates many, many past generations of recombination: essentially every recombination that has occurred since the mutation arose. I like to show this as a contrast. Here we have a series of three-generation families that we might use in standard linkage analysis. We would count recombinations in these families to estimate the distance between loci, but we're limited to the number of generations that we can actually collect. With linkage disequilibrium, what we're effectively doing is going all the way back to the common ancestor in whom that disease-causing mutation first occurred. Our hope is that these families all share the same disease-causing mutation, descended from that common ancestor. With linkage disequilibrium we're incorporating the effects of the recombinations that have occurred over the many generations since the mutation first occurred. What that means is that linkage disequilibrium can allow us to localize the disease-causing mutation more finely, because we effectively have a lot more recombinations to look at. With regard to any specific mutation, a population is essentially one big, complicated pedigree tracing back to the original ancestor in whom the mutation first took place. Linkage disequilibrium has had kind of an interesting history.
When I first became interested in it, and I have to admit it was a long time ago, back in about 1982, a guy named David Barker, who was a postdoc of Ray White, came to my office with four brand-new RFLPs. Back in those days, four new polymorphisms was a big deal; he got the lead article in that issue of the American Journal of Human Genetics. We were looking at linkage disequilibrium patterns, and that's when I got interested in this phenomenon, because it has interesting properties in population genetics. If you look back at that time, only about 20 articles per year were published on linkage disequilibrium. You could read one paper every other week or so and know everything there was to know about the topic. Now we're getting close to 2,000 papers a year; you would have to read 30 or 40 papers a week to keep up with that literature. Not that all of them are necessarily worth reading, but this indicates how much interest there is in linkage disequilibrium now relative to, say, 25 years ago. Well, the question is: is there a simple, uniform relationship between physical distance in the genome and linkage disequilibrium between pairs of loci? In other words, if we know the amount of linkage disequilibrium, the amount of non-random association between two loci, how well can we predict how far apart they actually are? This is the relationship we would expect: as the distance between loci increases, linkage disequilibrium decreases, eventually going to zero. It's often measured with r, a correlation coefficient. In complete disequilibrium, when we can perfectly predict the allele at one locus from the allele at a nearby locus, r is equal to 1. If there's no relationship, in other words if the loci are at equilibrium, r is equal to 0; there's no correlation. This is the relationship that we expect.
Well, some years ago we looked at a number of RFLPs near the adenomatous polyposis coli gene, and we were interested in the physical distance between those polymorphisms. Each of these points represents a pair of polymorphisms, and the question was: does disequilibrium between pairs of polymorphisms decrease as we look at polymorphisms that are further and further apart? Here we looked across about 600 kb, and indeed we did find a significant negative relationship between disequilibrium and the distance between these polymorphic markers. That was an early indication that linkage disequilibrium could potentially be used to finely localize genes on chromosomes. Now here's another example of this plot, which I've reduced in size. It has a series of points here, and then a series of points that doesn't correspond to our relationship at all, so it's not the monotonic decrease in disequilibrium with distance that we would expect. This was an analysis we did in the neurofibromatosis type 1 region. We found that all pairs of these markers were in substantial disequilibrium, with r values greater than 0.82, but there was another marker, just 68 kb away from its nearest neighbor, that showed no disequilibrium with any of the other markers. We had a lot of hypotheses at the time to account for this; it was a GC-rich region, and recombination is somewhat associated with GC content. But what we ultimately learned from the HapMap data is that there is a recombination hotspot right in this area. The recombination hotspot located right here explains why this polymorphism is not in disequilibrium even with one that's only about 70 kb away. So we don't always see the uniform relationship that we might expect between physical distance and the linkage disequilibrium between polymorphisms, because, among other things, we have recombination hotspots throughout the human genome. And there are a number of factors that can affect
linkage disequilibrium patterns. Chromosome location: recombination is more common near telomeres than elsewhere, so we tend to see disequilibrium dissipating more quickly in telomeric regions. DNA sequence patterns, such as GC content: we found that Alu elements influence recombination a little, actually increasing it by a few percent, and of course there are a lot of Alu elements throughout the human genome. There are recombination hotspots every 50 to 100 kb. And what I think is especially interesting is that evolutionary factors influence patterns of linkage disequilibrium in the human genome: things like natural selection, gene flow, gene conversion, and genetic drift. All of the factors that population geneticists and evolutionary geneticists like to think about influence patterns of linkage disequilibrium, because disequilibrium reflects the histories of populations and the factors that have affected genetic variation in them. So there are some interesting implications of our population genetic studies for disequilibrium patterns. We've seen that there is continental variation, and even variation within the major continents, that is going to affect stratification patterns and should be taken into account when we design case-control association studies. The fact that African populations were founded further back in time, in other words that they have, we can say, a greater age, implies that we should see less linkage disequilibrium in those populations. That is, there's been more time for recombinations to occur, so we're going to see linkage disequilibrium persisting over shorter distances. We've also seen greater divergence of African populations. What that implies is that what we sometimes call admixture linkage disequilibrium would be especially effective in populations that reflect mixtures of African and non-African populations. We don't have time here to talk about admixture disequilibrium, but it's starting to be applied
and with some level of success. So here's a way in which population genetics, I think, informs our understanding of haplotype structure and gene mapping. If we think of populations that were founded a long time ago, such as those in Africa, there have been many generations for recombinations to occur, as we see here, and that means we're going to see relatively short groups of haplotypes, or haplotype blocks as they're sometimes called. In contrast, if we look at a population that was founded relatively recently, and an example might be the population of Finland, most of which was founded just a couple thousand years ago, there hasn't been as much time for recombinations to occur. So we have fewer haplotypes, in larger blocks; there's more disequilibrium and less haplotype diversity. If we think about a mutation that occurred in that population a couple thousand years ago, it's going to be in disequilibrium with a large number of SNPs, because there hasn't been much time for recombination to cause those associations to decay. In contrast, a mutation that occurred in an African population will have had more time during which recombination could reduce its association with nearby SNPs, so we tend to find that mutation in association with a smaller number of SNPs. In other words, we're going to need more SNPs in the older population than in the recently founded one to find associations. But conversely, in the older population we can map the location of a mutation more finely, because it's in association with just a few nearby SNPs. So these are some important attributes of population history that help inform association studies. If we look at some real data, this is the kind of display we get from the program Haploview, and let me explain what we're looking at here, because we see these all the time in the association study literature. This is a map of linkage disequilibrium. Each of these little columns here represents a SNP, and they are arrayed according to their
physical location across the chromosome. Each of these squares, like this red square here, indicates the linkage disequilibrium between a pair of SNPs. So for this adjacent pair of SNPs right here, we have red: high disequilibrium. For this pair, this one and this one, we have little disequilibrium. An analogy would be the mileage charts that some of us have used, where we can take any pair of cities, let's say New York and San Francisco, and ask what the distance between them is. Here, what we're asking is: for each pair of SNPs, what is the disequilibrium between them? And the pattern that should really strike you as you look at this is that there is a lot more disequilibrium in this Eurasian sample than in the African sample, consistent with this population having been founded much more recently and having less haplotype diversity. We see more disequilibrium, and SNPs occurring in much larger haplotype blocks, in the Eurasian samples than in the African ones, and that has important implications for study design. So I've shown you a few examples of disequilibrium in specific regions, but one of our questions is how general these patterns are. If we look across the genome, what kinds of patterns do we see? About 10 years ago, our knowledge of linkage disequilibrium across the human genome was a lot like this map of the world from 1544: we really didn't know much about patterns of disequilibrium or haplotype structure across the genome. If you look at this map, we have a fairly good representation of Europe, and some of Africa and Asia, but you see that North America is completely absent. Well, that's roughly how our knowledge of haplotype structure across the genome stood 10 or 12 years ago, and that's what led to the HapMap project. I want to mention this, and you'll hear more about it in other lectures as well. The original idea was to look at a large collection of SNPs, 600,000, which eventually went to a million and
then more after that, in individuals from three major populations: 90 of them, in 30 trios, were from the Utah CEPH collection, representing northern Europe; 90 were Yoruba; and 90 were East Asian individuals. The idea was to look at patterns of linkage disequilibrium and haplotype structure in these different populations, and at the extent to which these vary among populations and across the genome. Interesting issues came up in the early discussions of the HapMap, and I was lucky enough to be part of those discussions. One of the issues was how best to sample human diversity if you can only sample a few populations, and the decision was to aim for a fairly broad sampling, but by no means a complete sampling, of human diversity. There were, of course, sample size and statistical power issues, issues involving SNP ascertainment and density, and also a number of ethical, legal, and social issues, such as informed consent. There was even some discussion of whether the three populations should be named or left unidentified, because of concerns about potential stigmatization.
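The Haploview-style display described earlier is a matrix of pairwise LD values, and the HapMap tagging idea is a way of exploiting redundancy in that matrix. The sketch below computes r² for every pair of SNPs from a handful of hypothetical phased haplotypes (written as 0/1 allele strings), prints the upper triangle like a mileage chart, and then picks tag SNPs with a simple greedy rule: repeatedly choose the SNP that captures the most still-untagged SNPs at r² ≥ 0.8. The threshold, the greedy rule, and the data are assumptions for illustration; real tagging tools use more refined criteria.

```python
def r_squared(haplotypes, i, j):
    """r^2 between SNPs i and j, from phased haplotypes written as 0/1 strings."""
    n = len(haplotypes)
    p_i = sum(h[i] == "1" for h in haplotypes) / n
    p_j = sum(h[j] == "1" for h in haplotypes) / n
    p_ij = sum(h[i] == "1" and h[j] == "1" for h in haplotypes) / n
    d = p_ij - p_i * p_j
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    return 0.0 if denom == 0 else d * d / denom  # monomorphic SNPs get 0

def ld_matrix(haplotypes):
    m = len(haplotypes[0])
    return [[r_squared(haplotypes, i, j) for j in range(m)] for i in range(m)]

def print_triangle(matrix):
    """Upper triangle only, like a mileage chart or a Haploview LD plot."""
    for i, row in enumerate(matrix):
        print("     " * i + " ".join(f"{row[j]:.2f}" for j in range(i + 1, len(row))))

def greedy_tags(r2, threshold=0.8):
    """Pick SNPs until every SNP has r^2 >= threshold with at least one chosen tag."""
    m = len(r2)
    untagged = set(range(m))
    tags = []

    def captured(s):
        # every SNP tags itself; otherwise it needs r^2 at or above the threshold
        return {t for t in untagged if s == t or r2[s][t] >= threshold}

    while untagged:
        best = max(range(m), key=lambda s: (len(captured(s)), -s))  # ties -> lowest index
        tags.append(best)
        untagged -= captured(best)
    return tags

# Hypothetical haplotypes: SNPs 0-2 form one block and SNPs 3-4 another,
# the kind of break a recombination hotspot between them would produce
HAPS = ["11100", "11111", "00011", "00000", "11100", "00011", "11111", "00000"]

matrix = ld_matrix(HAPS)
print_triangle(matrix)
print("tag SNPs:", greedy_tags(matrix))
```

Here two tags stand in for five SNPs; on real data the same redundancy is what lets a genome-wide panel capture most common variation without typing every common SNP.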
On the question of naming, the population geneticists felt that because population history affects haplotype structure and disequilibrium, not naming the populations would mean not knowing their histories, and that would be a severe liability. So for that reason we did decide to name the populations, and I think that's added a lot of usefulness and information to the HapMap samples. Subsequent to that, I think our map of the world, our map of the genome, our map of haplotype structure, improved quite a lot. As you can see, California for some reason is still missing from this map, but by and large our knowledge of disequilibrium in the human genome improved a great deal. There have been a number of interesting applications of the HapMap: first of all, understanding worldwide, genome-wide patterns of haplotype diversity; detecting recombination hotspots throughout the genome; detecting genes that have experienced natural selection; and then, of course, detecting disease-causing mutations. So here's an example looking at the decay of disequilibrium across genomic regions in the HapMap populations: the Asian, European, and African populations. You can see that, as we would expect with more recombinations, disequilibrium decays more quickly with physical distance in the African population, the Yoruba. And in this isolate population, more recently founded, we see that disequilibrium doesn't decay quite as rapidly: again, more recent history, fewer recombinations, more linkage disequilibrium. So we start to get a picture of patterns of disequilibrium across the world with these data. One of the really important consequences of the HapMap is that we've learned that, because of the pattern of disequilibrium across the genome, a lot of SNPs are effectively redundant. If we know that this person has a C at this position, they have a T at another position and an A at another position, because of linkage disequilibrium, whereas person B here has
an A at this position, a G here, and a C here. What that tells us is that we only need to type this one SNP to know the genotypes at these others. In other words, we don't have to type all 5 million common SNPs; we can type a subset of them, what we call tagging SNPs, and get a pretty good picture of the diversity across the genome by looking at that subset of variation. That in itself is a huge saving: being able to type maybe a million SNPs in non-African populations and essentially capture the haplotype diversity across the genome, instead of typing all 5 million, saves a great deal of money. A number of studies have looked at the portability of the HapMap tagging SNPs across populations, and they find that in general the tags are pretty portable; that is, you can infer patterns of disequilibrium from one major population to another within a continent and get it right most of the time. We've also been able to detect recombination hotspots by looking at regions where disequilibrium suddenly declines. We define a recombination hotspot as a 1 to 2 kb region where recombination is elevated 10-fold above background, and it's been quite interesting to discover that there are tens of thousands of hotspots in the human genome, roughly one every 50 to 100 kb. We recently looked at the whole-genome sequences of a family of two parents and two offspring and found 155 recombinations, 92 of which were in recombination hotspots. In general, the data tell us that most crossovers, at least 60%, occur in only about 10% of the genome, so hotspots really are significant in accounting for most crossovers. Another really interesting finding from studies of recombination hotspots is that they are not at all congruent in humans and chimpanzees. Even though our DNA sequences are about 99% the same, our hotspots are very different, suggesting that hotspots evolve very rapidly. They may not be sequence dependent; they may involve epigenetic mechanisms. So there are all kinds of interesting
questions that can be addressed with data such as those of the HapMap. Another thing these data allow us to do is detect natural selection in the genome, and this slide sketches out how we do that. Imagine that a variant of interest has occurred. As we saw, it occurs on a specific chromosome background, so at first we will see it in association with nearby SNPs, but of course when it first occurs its allele frequency is very low. If it's a neutral variant, it may rise in frequency through time as a result of genetic drift, but that increase is going to be very slow, and as this variant, the red star, increases in frequency, recombination means it is associated with fewer and fewer nearby SNPs. That is, by the time the variant gets to, let's say, 10% in frequency, we are going to see very little disequilibrium between it and, let's say, this SNP. But if there's been recent positive selection for that variant, say because it confers some sort of adaptive advantage, it will rise quickly to, let's say, 10% or 12%, and because selection has caused it to rise in frequency so quickly, it will still be in disequilibrium with many nearby SNPs. We'll have a very long range of linkage disequilibrium around that variant, and we will see that when we look at genomic data: a region in which there's high disequilibrium over an unexpectedly large distance. This is a signature of strong positive selection on the variant: it has risen to high frequency and sits in a large linkage disequilibrium block. This approach has been used to detect natural selection involving a number of phenotypes: malaria resistance, hemochromatosis, sodium retention, hereditary lactase persistence, several genes for skin pigmentation, and so forth. So that's another interesting application of these data: finding regions that have been strongly affected by positive selection
in human populations. And of course linkage disequilibrium has had some real successes in localizing single-gene disorders, where the loci were first mapped roughly using linkage analysis and the disease-causing gene was then pinpointed using linkage disequilibrium analysis. (My display has just frozen; there we go.) These kinds of studies are very successful if most cases of the disease are caused by a single mutation. That makes sense: if this mutation is the only one, or the principal one, that causes disease, then we're going to be able to easily detect association between the disease phenotype and nearby SNPs. But imagine if instead we have multiple disease-causing mutations: then sometimes we're going to see the disease with one genotype and sometimes with a different genotype. So when there are multiple disease-causing mutations, when there is substantial allelic heterogeneity, that presents real challenges in doing case-control association studies. So one of our issues is how we can reduce that heterogeneity and enhance the genetic signal. Clearly, consistent trait definition and the use of intermediate phenotypes will help to decrease heterogeneity, as will identifying subtypes of disease: those with early onset, atypical expression, or severe expression. This is where clinicians can be especially helpful, because typically the clinicians understand those subtypes and can tell the geneticist which group of cases should be used in an association study. We can also use our knowledge of evolutionary history to define populations in a very strict fashion, so that we have as uniform an evolutionary history as possible, and there may be situations in which population isolates will be of special utility. So the bottom line is, and some of you have probably seen these kinds of displays before, that we can now point to quite a few genome-wide association studies that have been successful in uncovering susceptibility variants for common complex
disease. As you know, there's still a lot left to be discovered; most of the heritability remains unexplained. But because of, I think, intelligent study design, much of it informed by our knowledge of population genetics, these kinds of studies have been much, much more successful over the last couple of years than previously, and this is something else you will hear about: Karen Mulkey will be talking about this later in this series. So, to summarize what I've told you this morning: genetic variation, whether we look at SNP microarrays or, as we're now starting to, at whole-genome sequence, does contain useful information about our population history and our ancestry. I think our studies of genetic variation give us a more informed and more nuanced view of the concept of race, and tell us much more about medical relevance than broad categories like population affiliation or race do. Population genetic analysis has played a central role in understanding linkage disequilibrium and how it is applied to mapping and localizing disease-causing genes. And finally, I hope you've gotten some sense of something that not everyone appreciates: population genetics can actually be fun. It can tell us interesting, fun things about ourselves, our populations, and even our phenotypes in general. Finally, I want to acknowledge a number of my colleagues at the University of Utah, people in my lab, and other colleagues who have contributed to the work I've told you about. The mobile element work that I touched on just a little I've done in collaboration, over many years now, with my colleague at LSU, Mark Batzer. Some of the samples that I told you about were gathered by the Sorenson Molecular Genealogy Foundation, Scott Woodward and Edgar Gomez in particular, and I think that is helping to increase our knowledge of genetic variation. So I want to thank all of these people for their contribution
to our research, and I'd like to thank all of you for your attention. I think we have a couple of minutes for questions. Okay, I'm just told that usually we don't have questions at these, so I'll ask one. This approach of analyzing linkage disequilibrium: does it work for very, very rare mutations? I would have thought a mutation would have to be pretty common for it to work. It could potentially work for a rare mutation. The problem is that you need a fairly large sample of affected individuals who are unrelated, or at least not closely related, and if you've got a really rare mutation it might be very difficult to get a large enough sample, say 50 to 100 cases, so that's where it would be challenging. Also, there are a lot of disease classes that are caused by multiple different types of mutations, cardiomyopathies for one. Wouldn't it be difficult to use linkage disequilibrium in such a case? Yes, if there's substantial allelic heterogeneity. If you look at, say, BRCA1, where there are hundreds of different mutations, each of those mutations occurs on a different haplotype background, so you're not going to see a consistent pattern of association. So yes, if there's strong allelic heterogeneity, linkage disequilibrium isn't always very useful. Thanks. Has anyone applied phylogenetics to SNPs? Phylogenetics to SNPs? Ah, yes, there are. Are the results meaningful? Are you talking about phylogenetics across species or within species? Within species. If you look at patterns in humans, they're very consistent with what we've seen with other markers and with whole sequence: essentially every kind of polymorphism we look at, at least among autosomal polymorphisms, gives us a pretty similar phylogenetic pattern.
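The talk explains faster decay of disequilibrium in older populations as the result of more accumulated recombination. One standard way to make that concrete is the textbook recursion D_t = D_0 (1 - c)^t, where c is the recombination fraction between two SNPs and t is the number of generations of random mating. The sketch below assumes an idealized population with no drift, selection, or mutation. The recombination fraction and generation counts are hypothetical round numbers chosen for illustration; they are not estimates for real populations.

```python
def ld_after(d0: float, c: float, generations: int) -> float:
    """Linkage disequilibrium D after `generations` of random mating.

    Uses the textbook recursion D_t = (1 - c) * D_{t-1}, i.e. D_t = D0 * (1 - c)**t,
    which assumes an infinitely large population with no drift, selection, or mutation.
    """
    if not 0.0 <= c <= 0.5:
        raise ValueError("recombination fraction c must lie in [0, 0.5]")
    return d0 * (1.0 - c) ** generations


# Hypothetical example: two SNPs ~100 kb apart (c = 0.001, assuming roughly 1 cM/Mb),
# starting from the maximum D0 = 0.25 (both SNPs at allele frequency 0.5).
for label, gens in [("more recently founded population", 1_000), ("older population", 5_000)]:
    print(f"{label}: D = {ld_after(0.25, 0.001, gens):.4f}")
```

With these assumed numbers, D falls to about 0.092 after 1,000 generations but to about 0.0017 after 5,000. The physical distance is the same in both cases; the older population has simply accumulated more recombination.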
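The tagging-SNP idea from the talk works as follows: if knowing one SNP tells you the alleles at others, you only need to genotype the one. Below is a minimal sketch of a greedy pairwise tagger. It computes r² between biallelic SNPs from phased haplotypes coded 0/1. It then repeatedly picks the SNP that captures the most still-untagged SNPs at r² ≥ 0.8. This is similar in spirit to pairwise tagging tools, but it is a simplified illustration, not the procedure actually used for the HapMap. The 0.8 threshold, the SNP names, and the toy haplotypes are all assumptions.

```python
from typing import Sequence


def r_squared(a: Sequence[int], b: Sequence[int]) -> float:
    """r^2 between two biallelic SNPs coded 0/1 across the same phased haplotypes."""
    n = len(a)
    p_a = sum(a) / n
    p_b = sum(b) / n
    p_ab = sum(x & y for x, y in zip(a, b)) / n  # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                         # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return 0.0 if denom == 0 else d * d / denom


def greedy_tags(snps: dict[str, Sequence[int]], threshold: float = 0.8) -> list[str]:
    """Greedily pick tags until every SNP has r^2 >= threshold with some chosen tag."""
    names = list(snps)
    covers = {
        s: {t for t in names if s == t or r_squared(snps[s], snps[t]) >= threshold}
        for s in names
    }
    untagged = set(names)
    tags = []
    while untagged:
        # Prefer the SNP covering the most untagged SNPs; ties go to the first listed.
        best = max(names, key=lambda s: (len(covers[s] & untagged), s in untagged))
        tags.append(best)
        untagged -= covers[best]
    return tags


# Hypothetical haplotypes (one list per SNP, one entry per chromosome).
haplotypes = {
    "snp1": [1, 1, 1, 0, 0, 0],
    "snp2": [1, 1, 1, 0, 0, 0],  # identical to snp1: redundant
    "snp3": [0, 0, 0, 1, 1, 1],  # complementary to snp1: still r^2 = 1
    "snp4": [1, 0, 1, 0, 1, 0],
    "snp5": [1, 0, 1, 0, 1, 1],
}
print(greedy_tags(haplotypes))  # ['snp1', 'snp4', 'snp5']
```

Five SNPs collapse to three tags here because snp2 and snp3 carry no information beyond snp1. That is the kind of redundancy that lets a subset of SNPs stand in for the full set.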
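The talk defines a recombination hotspot as a 1 to 2 kb region with recombination elevated tenfold above background. The sketch below turns that definition into a simple scan over a recombination-rate map given in fixed-width windows. Two choices here are assumptions made for illustration: "background" is taken as the median window rate, and adjacent flagged windows are merged into one hotspot. The rate values are hypothetical. The last line checks the family figure quoted in the talk (92 of 155 crossovers in hotspots).

```python
from statistics import median


def find_hotspots(rates: list[float], window_kb: float = 1.0,
                  fold: float = 10.0) -> list[tuple[float, float]]:
    """Return (start_kb, end_kb) intervals whose rate is >= fold x background.

    Background is the median window rate (an assumption of this sketch);
    adjacent flagged windows are merged into a single interval.
    """
    background = median(rates)
    if background <= 0:
        raise ValueError("background rate must be positive")
    hotspots: list[tuple[float, float]] = []
    start = None
    for i, rate in enumerate(rates + [0.0]):  # sentinel 0.0 closes a trailing hotspot
        hot = rate >= fold * background
        if hot and start is None:
            start = i
        elif not hot and start is not None:
            hotspots.append((start * window_kb, i * window_kb))
            start = None
    return hotspots


rates = [0.5] * 20          # hypothetical cM/Mb in consecutive 1-kb windows
rates[7] = rates[8] = 12.0  # a 2-kb stretch elevated 24-fold
rates[15] = 4.0             # elevated 8-fold: below the 10x cutoff
print(find_hotspots(rates))  # [(7.0, 9.0)]

print(f"{92 / 155:.0%} of the family's crossovers fell in hotspots")  # 59%
```

Only the 2-kb stretch qualifies; the eightfold bump is ignored. The 59% from the single family is close to the roughly 60% figure cited in the talk.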
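The selection signature described in the talk is a variant at appreciable frequency that still sits in unusually long-range disequilibrium. One widely used way to quantify it is extended haplotype homozygosity (EHH), introduced by Sabeti and colleagues in 2002. EHH for a core allele at a given distance is the probability that two randomly chosen chromosomes carrying that allele are identical over the whole stretch from the core out to that distance. The minimal sketch below extends only to the right of the core for simplicity. The haplotypes are invented toy data, not real genotypes.

```python
from collections import Counter
from math import comb


def ehh(haplotypes: list[str], core: int, allele: str, end: int) -> float:
    """Extended haplotype homozygosity for carriers of `allele` at marker `core`.

    EHH = probability that two randomly chosen carrier chromosomes are identical
    over every marker from `core` through `end` (inclusive; this sketch only
    extends to the right, i.e. end >= core).
    """
    if end < core:
        raise ValueError("this sketch only extends to the right of the core")
    carriers = [h for h in haplotypes if h[core] == allele]
    n = len(carriers)
    if n < 2:
        raise ValueError("need at least two carriers of the core allele")
    counts = Counter(h[core:end + 1] for h in carriers)  # identical extended haplotypes
    return sum(comb(k, 2) for k in counts.values()) / comb(n, 2)


# Hypothetical phased haplotypes; marker 0 is the core SNP.
# "1" = derived (putatively selected) allele, "0" = ancestral allele.
haps = [
    "100000", "100000", "100000", "100001",  # derived carriers share a long haplotype
    "010101", "001100", "011010", "000111",  # ancestral carriers are diverse
]
for end in range(1, 6):
    print(end, ehh(haps, 0, "1", end), round(ehh(haps, 0, "0", end), 3))
```

In this toy data, EHH for the derived allele stays at 1.0 out to marker 4. EHH for the ancestral allele is already 0 by marker 2. A long haplotype around an allele that has reached appreciable frequency is the pattern the talk describes as a signature of recent positive selection.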
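The talk and the Q&A both note that allelic heterogeneity weakens case-control association: when disease mutations sit on different haplotype backgrounds, no single marker allele is consistently enriched in cases. The sketch below compares two scenarios with a Pearson chi-square test (1 df, no continuity correction) on a 2x2 table of allele counts. All counts are hypothetical. The model is deliberately simplified: "heterogeneous" just means that only some of the cases' disease chromosomes carry the marker allele.

```python
from math import erfc, sqrt


def allelic_chi2(case_marker: int, case_other: int,
                 ctrl_marker: int, ctrl_other: int) -> tuple[float, float]:
    """Pearson chi-square (1 df, no continuity correction) on a 2x2 allele-count table.

    Returns (statistic, p_value); for 1 df the upper-tail p-value equals erfc(sqrt(x / 2)).
    """
    a, b, c, d = case_marker, case_other, ctrl_marker, ctrl_other
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / denom
    return stat, erfc(sqrt(stat / 2))


# Hypothetical data: 100 cases and 100 controls (200 alleles each);
# the marker allele is at 30% frequency in controls (60 of 200).
# Single founder mutation on the marker haplotype: 60% marker frequency in cases.
print(allelic_chi2(120, 80, 60, 140))
# Allelic heterogeneity: only some mutations sit on that haplotype: 40% in cases.
print(allelic_chi2(80, 120, 60, 140))
```

With the same sample size, the statistic drops from about 36.4 (p on the order of 1e-9) to about 4.4 (p around 0.04). This is why the talk stresses reducing heterogeneity, through consistent trait definition, disease subtypes, and well-defined populations, to recover the genetic signal.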